Helicone Community Page

Updated 12 months ago

Monitoring Self-Hosted LLM Inference Workloads

Hi Helicone Team,

Hope you're well 🙂!
So, a question: I am moving all my LLM inference workloads to servers I host at home.
I have a static IP, and they are all OpenAI-compatible endpoints.

So instead of http://oai.hconeai.com/v1, I have http://my-static-ip:8080/.
I have a managed Helicone account, and I would still like to use it for monitoring purposes.

What is the best way to accomplish this? Is it the gateway?
And if yes, is there a way to overcome the 1 request per second restriction?

I know this is maybe a very specific use case, but I would appreciate any hints here.
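For readers with a similar setup, here is a minimal sketch of what routing an OpenAI-compatible client through Helicone's generic gateway to a self-hosted endpoint might look like. The gateway base URL and the Helicone-Auth / Helicone-Target-Url header names are assumptions based on Helicone's gateway documentation, and the model name and placeholder keys are illustrative only; verify the details (including any per-second rate limits on the gateway) against the current docs or with the Helicone team.

```python
# Sketch: send requests to Helicone's gateway and have it forward them to a
# self-hosted, OpenAI-compatible server while logging them to the managed
# Helicone account. Header names and gateway URL are assumptions; confirm
# them against Helicone's documentation before relying on this.
from openai import OpenAI

client = OpenAI(
    # Point the client at the Helicone gateway instead of the local server.
    base_url="https://gateway.helicone.ai",
    # The self-hosted server may not check this key at all.
    api_key="not-needed-for-local-server",
    default_headers={
        # Key from the managed Helicone account (placeholder value).
        "Helicone-Auth": "Bearer <HELICONE_API_KEY>",
        # Where the gateway should forward the request: the home-hosted endpoint.
        "Helicone-Target-Url": "http://my-static-ip:8080",
    },
)

response = client.chat.completions.create(
    model="my-local-model",  # whatever model name the local server exposes
    messages=[{"role": "user", "content": "Hello from a self-hosted endpoint"}],
)
print(response.choices[0].message.content)
```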
Attachment: image.png
5 comments
Hi @adi! I am so sorry for the delay here!
We can totally work something out. Are you free to hop on a quick call to resolve this?

https://calendly.com/justintorre/1-1-justin-1
You can schedule some time above, or we can chat async on Discord.
No worries. Sorry, I saw this just now. Thank you so much for the offer.
I scheduled a call for us for tomorrow.