Helicone Community Page

Updated 12 months ago

Monitoring Self-Hosted LLM Inference Workloads

Hi Helicone Team,

Hope you're well 🙂!
So, a question: I am moving all my LLM inference workloads to servers I host at home.
I have a static IP, and they are all OpenAI-compatible endpoints.

So instead of http://oai.hconeai.com/v1, I have http://my-static-ip:8080/.
I have a managed Helicone account, and I would still like to use it for monitoring purposes.

What is the best way to accomplish this? Is it the gateway?
And if yes, is there a way to overcome the 1 request per second restriction?

I know this is maybe a very specific use case, but I would appreciate any hints here.
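For readers with a similar setup, here is a minimal sketch of what routing an OpenAI-compatible client through Helicone's generic gateway to a self-hosted endpoint might look like. The gateway base URL and the Helicone-Auth / Helicone-Target-Url header names are assumptions based on Helicone's gateway documentation, and the model name and placeholder keys are illustrative only; verify the details (including any per-second rate limits on the gateway) against the current docs or with the Helicone team.

```python
# Sketch: send requests to Helicone's gateway and have it forward them to a
# self-hosted, OpenAI-compatible server while logging them to the managed
# Helicone account. Header names and gateway URL are assumptions; confirm
# them against Helicone's documentation before relying on this.
from openai import OpenAI

client = OpenAI(
    # Point the client at the Helicone gateway instead of the local server.
    base_url="https://gateway.helicone.ai",
    # The self-hosted server may not check this key at all.
    api_key="not-needed-for-local-server",
    default_headers={
        # Key from the managed Helicone account (placeholder value).
        "Helicone-Auth": "Bearer <HELICONE_API_KEY>",
        # Where the gateway should forward the request: the home-hosted endpoint.
        "Helicone-Target-Url": "http://my-static-ip:8080",
    },
)

response = client.chat.completions.create(
    model="my-local-model",  # whatever model name the local server exposes
    messages=[{"role": "user", "content": "Hello from a self-hosted endpoint"}],
)
print(response.choices[0].message.content)
```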
Attachment: image.png
5 comments
Hi @adi! I am so sorry for the delay here!
We can totally work something out. Are you free to hop on a quick call to resolve this?

https://calendly.com/justintorre/1-1-justin-1
You can schedule some time above, or we can chat async on Discord.
No worries. Sorry, I saw this just now. Thank you so much for the offer.
I scheduled a call for us for tomorrow.