Hope you're well! So, a question. I'm moving all my LLM inference workloads to self-hosted servers at my home. I have a static IP, and they all expose OpenAI-compatible endpoints.
So instead of http://oai.hconeai.com/v1, I have http://my-static-ip:8080/. I have the managed Helicone account, and I'd still like to use it for monitoring.
What's the best way to accomplish this? Is it the gateway? And if so, is there a way around the 1 request per second restriction?
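Roughly what I'm imagining, if the gateway is the right approach (a sketch only; the gateway URL and the Helicone-Target-Url header are my guesses from the docs, and my-static-ip / the model name are placeholders for my setup):

```python
from openai import OpenAI

# Guessed setup: point the OpenAI client at Helicone's generic gateway
# and ask it to forward requests to my home server, so Helicone still
# logs everything. Header names here are assumptions, not confirmed.
client = OpenAI(
    base_url="https://gateway.hconeai.com",   # assumed generic gateway URL
    api_key="not-needed-for-local",           # my local server ignores this
    default_headers={
        "Helicone-Auth": "Bearer <HELICONE_API_KEY>",
        "Helicone-Target-Url": "http://my-static-ip:8080",  # my home endpoint
    },
)

resp = client.chat.completions.create(
    model="my-local-model",  # whatever model my server serves
    messages=[{"role": "user", "content": "ping"}],
)
print(resp.choices[0].message.content)
```

If that's roughly how it's meant to work, great; if not, happy to be pointed at the right pattern.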
I know this is probably a very specific use case, but I'd appreciate any hints.