Helicone Community Page

Updated 3 months ago

Since 2024/06/06 at 11:30pm UTC cache: true on embeddings hangs for 335-340 seconds

I've been using Helicone for a while, and I'm sure these queries are cache hits given the content/context. Here's the curl request:

Plain Text
curl -X "POST" "https://oai.hconeai.com/v1/embeddings" \
     -H 'Content-Type: application/json' \
     -H 'Authorization: Bearer sk-XXX' \
     -H 'Openai-Organization: org-XXX' \
     -H 'Helicone-Auth: Bearer sk-XXX' \
     -H 'Helicone-Cache-Enabled: true' \
     -d $'{
  "model": "text-embedding-ada-002",
  "input": "Can I drink alcohol while pregnant?"
}'


If I set Helicone-Cache-Enabled to false, it returns as expected.
13 comments
Based on the merge timestamp and the PR title, this looks like a likely culprit: https://github.com/Helicone/helicone/pull/2067
Hi, I am taking a look now.
My Helicone Org ID is 08ebd77a-72a0-44c5-8b94-b6b2923439b5
I've narrowed it down to a very odd behavior that only exists when the cache header is set to true.

To replicate, try out these two different curl requests (and make sure to keep the formatting intact):

Normal expected behavior:
Plain Text
curl -X "POST" https://oai.hconeai.com/v1/embeddings \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer sk-XXX' \
  -H 'OpenAI-Organization: org-XXX' \
  -H 'Helicone-Auth: Bearer sk-XXX' \
  -H 'Helicone-Cache-Enabled: true' \
  -d $'{
    "model":"text-embedding-ada-002",
    "input":"Can I drink alcohol while pregnant?"
    }'


Unexpected behavior with same header content (slowly streamed response):
Plain Text
  curl -X "POST" https://oai.hconeai.com/v1/embeddings \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer sk-XXX' \
  -H 'OpenAI-Organization: org-XXX' \
  -H 'Helicone-Auth: Bearer sk-XXX' \
  -H 'Helicone-Cache-Enabled: true' \
  -d $'{"model":"text-embedding-ada-002","input":"Can I drink alcohol while pregnant?"}'
This is not evident when sending the same requests to OpenAI directly.
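To double-check that the two request bodies above differ only in whitespace, here is a rough sketch in plain shell. The variable names are made up for illustration, and the normalization is deliberately crude (it drops newlines and runs of two or more spaces), so it is only valid for payloads like these where the string values contain single spaces.

```shell
# Pretty-printed body from the request that behaves normally.
pretty='{
    "model":"text-embedding-ada-002",
    "input":"Can I drink alcohol while pregnant?"
    }'

# Compact body from the request that streams slowly.
compact='{"model":"text-embedding-ada-002","input":"Can I drink alcohol while pregnant?"}'

# Crude normalization (assumption: no value contains consecutive spaces):
# delete newlines, then delete any run of two or more spaces.
normalized=$(printf '%s' "$pretty" | tr -d '\n' | sed 's/   *//g')

if [ "$normalized" = "$compact" ]; then
  echo "bodies match after whitespace normalization"
else
  echo "bodies differ beyond whitespace"
fi

# The raw byte counts still differ, which is the only difference on the wire.
echo "pretty: ${#pretty} bytes, compact: ${#compact} bytes"
```

If the bodies match here, the difference in behavior comes down to formatting (and therefore Content-Length), not the JSON content.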
Thank you, I was able to recreate it. This is being looked into at the moment.
Hi, just wanted to confirm the duration of the delay. Is it 300+ seconds or milliseconds?
But in those two curl requests, it looks like a fast time to first byte, and then it streams the vector line by line.
So that 300+ second delay may be representative of something else.
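One way to separate a slow time to first byte from a slowly streamed body is curl's `--write-out` timing variables. The sketch below is only a suggestion: the function name, `TIMING_FORMAT`, and the `OPENAI_API_KEY` / `HELICONE_API_KEY` environment variables are placeholders, not anything Helicone provides.

```shell
# Format string for curl --write-out: time until the first response byte
# and total time for the whole transfer, both in seconds.
TIMING_FORMAT='ttfb=%{time_starttransfer}s total=%{time_total}s\n'

# Hypothetical helper: sends one embeddings request through the proxy with
# caching enabled, discards the response body, and prints only the timings.
# Assumes OPENAI_API_KEY and HELICONE_API_KEY are exported (placeholder names).
time_embeddings_request() {
  curl -sS -o /dev/null -X POST "https://oai.hconeai.com/v1/embeddings" \
    -H 'Content-Type: application/json' \
    -H "Authorization: Bearer ${OPENAI_API_KEY}" \
    -H "Helicone-Auth: Bearer ${HELICONE_API_KEY}" \
    -H 'Helicone-Cache-Enabled: true' \
    -d "$1" \
    -w "$TIMING_FORMAT"
}

# Example call with the compact body:
# time_embeddings_request '{"model":"text-embedding-ada-002","input":"Can I drink alcohol while pregnant?"}'
```

A small ttfb with a large total would point to slow streaming of the body, while a large ttfb would point to a delay before the response starts.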
Hey, could you hop on a call with me so we can debug this live?
Please book a call via https://cal.com/yashkarthik or lmk what time works best for you
Sounds good. Booked 15 min for today.