I've narrowed it down to a very odd behavior that only exists when the cache header is set to true.
To replicate, try out these two different curl requests (and make sure to keep the formatting intact):
Normal expected behavior:
curl -X "POST" https://oai.hconeai.com/v1/embeddings \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-XXX' \
-H 'OpenAI-Organization: org-XXX' \
-H 'Helicone-Auth: Bearer sk-XXX' \
-H 'Helicone-Cache-Enabled: true' \
-d $'{
"model":"text-embedding-ada-002",
"input":"Can I drink alcohol while pregnant?"
}'
Unexpected behavior with same header content (slowly streamed response):
curl -X "POST" https://oai.hconeai.com/v1/embeddings \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer sk-XXX' \
-H 'OpenAI-Organization: org-XXX' \
-H 'Helicone-Auth: Bearer sk-XXX' \
-H 'Helicone-Cache-Enabled: true' \
-d $'{"model":"text-embedding-ada-002","input":"Can I drink alcohol while pregnant?"}'