Helicone Community Page

Updated 4 months ago

Helicone's impact on latency in LLM calls

Am I crazy or does Helicone add a bunch of latency to every single LLM call? I'm using the Vercel AI SDK
18 comments
Hi David! It should not, unless you are using Cache or Rate Limiting
ah okay, my understanding is that we need to hit your server first and then you forward to whatever LLM provider. But maybe I don't get what the baseUrl param does
how can I check whether I'm using cache or rate limiting? feel free to share a link and happy to do some reading
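For reference, the proxy-style setup being discussed here looks roughly like this with the Vercel AI SDK: the `baseUrl` points the SDK at Helicone's gateway, which then forwards the request to the provider. The gateway URL and `Helicone-Auth` header below follow Helicone's documented OpenAI proxy integration, but treat this as a sketch to verify against the current docs.

```ts
import { createOpenAI } from "@ai-sdk/openai";
import { generateText } from "ai";

// Proxy integration: traffic goes through Helicone's gateway (one extra
// network hop), which logs the request and forwards it to OpenAI.
// URL and header names are taken from Helicone's OpenAI proxy docs.
const openai = createOpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: "https://oai.helicone.ai/v1",
  headers: {
    "Helicone-Auth": `Bearer ${process.env.HELICONE_API_KEY}`,
  },
});

const { text } = await generateText({
  model: openai("gpt-4o-mini"),
  prompt: "Say hello",
});
console.log(text);
```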
If you want to use our async integration instead of passing traffic through Helicone, you can use our OpenLLMetry integration

https://docs.helicone.ai/getting-started/integration-method/openllmetry
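A rough sketch of that async setup, assuming the OpenLLMetry (Traceloop) Node SDK: the app still calls the provider directly, and traces are exported to Helicone out of band, so nothing extra sits on the request path. The trace endpoint and option names here are assumptions; check the page linked above for the exact values.

```ts
import * as traceloop from "@traceloop/node-server-sdk";

// Async logging via OpenLLMetry: LLM calls go straight to the provider,
// while traces are shipped to Helicone in the background.
// baseUrl below is an assumed Helicone trace endpoint -- confirm in the docs.
traceloop.initialize({
  appName: "my-app",
  baseUrl: "https://api.helicone.ai/v1/trace",
  apiKey: process.env.HELICONE_API_KEY,
  disableBatch: true,
});
```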
this link is broken i think
[Attachment: image.png]
in the last page you sent
does this sdk require a node environment or can it run on edge runtimes?
is this intuition correct?
Hey @David Alonso, great question! I am not sure, I am double checking with the OpenLLMetry team
nice, but then there is added network latency right? so the async method would lead to faster inference iiuc
well faster response time for the user i mean
That's correct, but it should really be marginal.
I assume they haven’t replied, but super interested to hear back!
Thanks for following up, their answer was not very helpful lol
What edge runtime environment are you looking to run this in, @David Alonso?