Since 2024/06/06 at 11:30pm UTC, cache: true on embeddings hangs for 335-340 seconds

Scotter

I've been using Helicone for a while, and I'm confident these hangs happen on queries that should be cache hits, given their content/context. Here's the curl request:


curl -X "POST" "https://oai.hconeai.com/v1/embeddings" \
     -H 'Content-Type: application/json' \
     -H 'Authorization: Bearer sk-XXX' \
     -H 'Openai-Organization: org-XXX' \
     -H 'Helicone-Auth: Bearer sk-XXX' \
     -H 'Helicone-Cache-Enabled: true' \
     -d $'{
  "model": "text-embedding-ada-002",
  "input": "Can I drink alcohol while pregnant?"
}'

If I set Helicone-Cache-Enabled to false, it returns as expected.


Scotter

Based on its merge timestamp and title, this PR looks like a likely culprit: https://github.com/Helicone/helicone/pull/2067

Justin

Hi, I'm taking a look now.

Scotter

My Helicone Org ID is 08ebd77a-72a0-44c5-8b94-b6b2923439b5

Scotter

I've narrowed it down to a very odd behavior that only occurs when the Helicone-Cache-Enabled header is set to true.

To replicate, try out these two different curl requests (and make sure to keep the formatting intact):

Expected behavior (multi-line JSON body):


curl -X "POST" https://oai.hconeai.com/v1/embeddings \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer sk-XXX' \
  -H 'OpenAI-Organization: org-XXX' \
  -H 'Helicone-Auth: Bearer sk-XXX' \
  -H 'Helicone-Cache-Enabled: true' \
  -d $'{
    "model":"text-embedding-ada-002",
    "input":"Can I drink alcohol while pregnant?"
    }'

Unexpected behavior with the same headers and a single-line JSON body (the response streams in slowly):


curl -X "POST" https://oai.hconeai.com/v1/embeddings \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer sk-XXX' \
  -H 'OpenAI-Organization: org-XXX' \
  -H 'Helicone-Auth: Bearer sk-XXX' \
  -H 'Helicone-Cache-Enabled: true' \
  -d $'{"model":"text-embedding-ada-002","input":"Can I drink alcohol while pregnant?"}'
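The two repro requests differ only in the whitespace of their JSON bodies; both parse to the same object with identical `model` and `input` values. A minimal sketch to confirm this at the byte level, assuming a POSIX shell and GNU coreutils' `sha256sum` (on macOS, `shasum -a 256` is the equivalent). The raw-byte digest is only illustrative: the thread does not say how Helicone derives its cache key, so treating that key as a hash of the unmodified body is a hypothetical.

```shell
# Rebuild the two request bodies from the repro byte-for-byte.
# Assumes a POSIX shell and GNU coreutils' sha256sum.

# Multi-line body (the "expected behavior" request), with its 4-space indents.
pretty=$(printf '{\n    "model":"text-embedding-ada-002",\n    "input":"Can I drink alcohol while pregnant?"\n    }')

# Single-line body (the request whose response streamed in slowly).
compact='{"model":"text-embedding-ada-002","input":"Can I drink alcohol while pregnant?"}'

# Byte lengths: the only difference is the extra newlines and indentation.
pretty_len=$(printf '%s' "$pretty" | wc -c)
compact_len=$(printf '%s' "$compact" | wc -c)
diff_bytes=$((pretty_len - compact_len))

# SHA-256 of the raw bytes. A cache keyed on the unmodified body would see
# these two digests (hypothetical: the actual Helicone key derivation is not
# described in this thread).
pretty_hash=$(printf '%s' "$pretty" | sha256sum | cut -d' ' -f1)
compact_hash=$(printf '%s' "$compact" | sha256sum | cut -d' ' -f1)

echo "pretty:  sha256=$pretty_hash"
echo "compact: sha256=$compact_hash"
echo "whitespace difference: $diff_bytes bytes"
```

Running it prints two different digests for what is semantically the same request, which is consistent with (but does not prove) the cached path handling the two bodies differently.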

Scotter

This doesn't happen when I send the same requests to OpenAI directly.

Cole

Thank you, I was able to recreate it. is looking into this at the moment.

yash karthik

Hi, just wanted to confirm the duration of the delay: is it 300+ seconds or 300+ milliseconds?

Scotter

300+ seconds

Scotter

But in those two curl requests, the time to first byte looks fast, and then the embedding vector streams in line by line.

Scotter

So the 300+ second delay may point to a different underlying issue.

yash karthik

Hey, could you hop on a call with me so we can debug this live?

yash karthik

Please book a call via https://cal.com/yashkarthik, or let me know what time works best for you.

Scotter

Sounds good. I booked 15 minutes for today.


Helicone Community Page

Since 2024/06/06 at 11:30pm UTC cache: true on embeddings hangs for 335-340 seconds