Helicone Community Page

Updated 3 months ago

Partial chat stream chunks when using Helicone w/ Ruby's OpenAI Gem

Hey Helicone team, I wanted to bring this to your attention. Ruby devs are finding that using Helicone as a proxy is interfering with the chat stream. https://github.com/alexrudall/ruby-openai/issues/251
11 comments
Taking a look! Will let you know when we have any updates
Looking into this, was able to recreate it
[Attachment: Screenshot_2023-08-15_at_3.34.51_PM.png]
I was able to fix it by adding this header:
"helicone-stream-force-format" => "true"

Could you give that a shot and let me know if that works for you?
Sure. I’ll give it a try tomorrow. What’s happening here behind the scenes?
It works. I'd be curious to hear more about it on the GH issue
Putting this in production. I'm still seeing some occasional tokens dropped out of the stream. I'm not 100% sure
Thanks for this!
Would you be able to elaborate on the helicone-stream-force-format header?
Hey yeah sure -

So basically there's an issue when we pull the stream in from OpenAI on our workers: the chunks arrive "half chunked", so we only receive the first few characters of a chunk. The force-format flag inspects each chunk, and if it's shorter than 50 characters we hold it in a buffer and prepend it to the next chunk.

This was a hack and not well tested, which is why it's behind a flag.

TypeScript
  if (
    proxyRequest.requestWrapper.heliconeHeaders.featureFlags.streamForceFormat
  ) {
    // Holds a short ("half") chunk until the next chunk arrives.
    let buffer: any = null;
    const transformer = new TransformStream({
      transform(chunk, controller) {
        if (chunk.length < 50) {
          // Chunk looks truncated: stash it instead of forwarding it.
          // Note: if two short chunks arrive back to back, the first one
          // is overwritten here, which would drop its tokens.
          buffer = chunk;
        } else {
          if (buffer) {
            // Prepend the stashed partial chunk to the current one.
            const mergedArray = new Uint8Array(buffer.length + chunk.length);
            mergedArray.set(buffer);
            mergedArray.set(chunk, buffer.length);
            controller.enqueue(mergedArray);
          } else {
            controller.enqueue(chunk);
          }
          buffer = null;
        }
      },
    });
    body = body?.pipeThrough(transformer) ?? null;
  }


You can see the TransformStream above. I think a better approach would be to format the stream against a known regex and keep appending chunks until the regex matches, then send the complete chunk back.
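
Something like this is what I have in mind (an untested sketch, assuming the standard SSE framing where each event is terminated by a blank line): buffer incoming bytes and only emit once a complete event has accumulated, carrying any partial event over to the next chunk.

TypeScript
  const decoder = new TextDecoder();
  const encoder = new TextEncoder();
  let pending = "";

  const transformer = new TransformStream<Uint8Array, Uint8Array>({
    transform(chunk, controller) {
      // stream: true keeps multi-byte characters intact across chunk splits
      pending += decoder.decode(chunk, { stream: true });
      // SSE events are separated by a blank line
      const events = pending.split(/\r?\n\r?\n/);
      // The last piece is either "" (clean split) or a partial event;
      // keep it buffered until more bytes arrive.
      pending = events.pop() ?? "";
      for (const event of events) {
        controller.enqueue(encoder.encode(event + "\n\n"));
      }
    },
    flush(controller) {
      // Forward any trailing bytes when the upstream closes
      if (pending) controller.enqueue(encoder.encode(pending));
    },
  });
  // body = body?.pipeThrough(transformer) ?? null;  // same wiring as above

This never drops data: a chunk that doesn't complete an event just stays in the buffer, no matter how many short chunks arrive in a row.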

Sorry for the issue. I dug into why this happens in the first place a while back and couldn't figure it out. We don't call OpenAI any differently or manipulate the stream, so I'm pretty confused about why it happens.

The OpenAI Python and TypeScript libraries actually handle this case themselves, so we never gave it too much attention. It only affects folks using third-party libraries like the Ruby gem or hand-rolling their own SDK.

In the meantime, I added a new issue to improve the chunk-merging strategy: https://github.com/Helicone/helicone/issues/731
Interesting. If you happen to know where in the source the Python/TS libs handle this, maybe we can port over some of that solution
This is a great idea! I can dig back into it, it’s been a while since I’ve done so