Partial chat stream chunks when using Helicone w/ Ruby's OpenAI Gem

SScotter

Hey Helicone team, I wanted to bring this to your attention. Ruby devs are finding that using Helicone as a proxy interferes with the chat stream. https://github.com/alexrudall/ruby-openai/issues/251

11 comments

aayoKho

Taking a look! Will lyk when we have any updates

CCole

Looking into this, was able to recreate it

CCole

I was able to fix it by adding this header:
"helicone-stream-force-format" => "true"

Could you give that a shot and let me know if that works for you?

SScotter

Sure. I’ll give it a try tomorrow. What’s happening here behind the scenes?

SScotter

It works. I'd be curious to hear more about it on the GH issue

SScotter

Putting this in production. I'm still seeing some occasional tokens dropped out of the stream. I'm not 100% sure

aalexr

Thanks for this!

CCole

would you be able to elaborate on the helicone-stream-force-format header?

JJustin

Hey yeah sure -

So basically there is an issue when we pull the stream in from OpenAI on our workers: the chunks sometimes come in "half chunked", so we only receive the first few characters of a chunk. The force format flag looks at each chunk, and if it is shorter than 50 characters (bytes, in the code) we hold it back and prepend it to the next chunk.

This was a hack and not well tested which is why it is behind a flag.

  if (
    proxyRequest.requestWrapper.heliconeHeaders.featureFlags.streamForceFormat
  ) {
    let buffer: any = null;
    const transformer = new TransformStream({
      transform(chunk, controller) {
        if (chunk.length < 50) {
          buffer = chunk;
        } else {
          if (buffer) {
            const mergedArray = new Uint8Array(buffer.length + chunk.length);
            mergedArray.set(buffer);
            mergedArray.set(chunk, buffer.length);
            controller.enqueue(mergedArray);
          } else {
            controller.enqueue(chunk);
          }
          buffer = null;
        }
      },
    });
    body = body?.pipeThrough(transformer) ?? null;
  }

You can see the TransformStream here. I think a better approach is to format the stream with a known regex and keep appending the chunks until the regex is matched and then send it back.
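
One thing worth noting about the snippet above: it keeps only the most recent short chunk and has no flush step, so two short chunks in a row, or a short final chunk, would be lost. Below is a minimal sketch of the "keep appending until a known pattern matches" idea. It is not Helicone's implementation. It swaps the regex for a fixed delimiter check and assumes events are separated by "\n\n" (LF LF), as in OpenAI's server-sent events output. The names `lastEventBoundary`, `EventAligner`, and `createEventAlignedStream` are hypothetical.

```typescript
// Hypothetical sketch, not Helicone's implementation.
// Assumption: events are separated by "\n\n" (LF LF).

const LF = 0x0a;

/** Returns the index just past the last "\n\n" in `bytes`, or -1 if none. */
function lastEventBoundary(bytes: Uint8Array): number {
  for (let i = bytes.length - 2; i >= 0; i--) {
    if (bytes[i] === LF && bytes[i + 1] === LF) return i + 2;
  }
  return -1;
}

/** Accumulates bytes and releases them only up to a complete event boundary. */
class EventAligner {
  private pending: Uint8Array = new Uint8Array(0);

  /** Adds a chunk; returns the complete events available so far, or null. */
  push(chunk: Uint8Array): Uint8Array | null {
    const merged = new Uint8Array(this.pending.length + chunk.length);
    merged.set(this.pending);
    merged.set(chunk, this.pending.length);
    const end = lastEventBoundary(merged);
    if (end === -1) {
      this.pending = merged; // no complete event yet, keep everything
      return null;
    }
    this.pending = merged.slice(end); // keep the trailing partial event
    return merged.slice(0, end);
  }

  /** Returns whatever is left when the upstream closes, or null. */
  flush(): Uint8Array | null {
    const rest = this.pending;
    this.pending = new Uint8Array(0);
    return rest.length > 0 ? rest : null;
  }
}

/** Wraps EventAligner in a TransformStream for use with pipeThrough. */
function createEventAlignedStream(): TransformStream<Uint8Array, Uint8Array> {
  const aligner = new EventAligner();
  return new TransformStream<Uint8Array, Uint8Array>({
    transform(chunk, controller) {
      const ready = aligner.push(chunk);
      if (ready) controller.enqueue(ready);
    },
    flush(controller) {
      const rest = aligner.flush();
      if (rest) controller.enqueue(rest);
    },
  });
}
```

Under these assumptions it could be wired in the same way as the flagged transformer, e.g. `body = body?.pipeThrough(createEventAlignedStream()) ?? null;`. Every byte is either emitted or held, and the flush step forwards any remainder when the upstream closes.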

Sorry for the issue. At some point I tried digging into why this happens in the first place, and I couldn't figure it out. We do not manipulate or call OpenAI any differently, so I am pretty confused about why this happens.

The OpenAI Python and TypeScript libraries actually handle this case, so we never gave it too much attention. It only affects folks who are using third-party libraries like the Ruby gem or hand-rolling their own SDK.
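
For anyone hand-rolling a client, here is a rough sketch of the general technique that makes a client tolerant of split chunks: buffer the incoming text and only parse complete lines. It is not taken from the OpenAI SDKs. It assumes each event is a single `data:` line carrying either a JSON chat completion chunk or the `[DONE]` sentinel, and the names `SseLineBuffer` and `extractDeltas` are hypothetical.

```typescript
// Hypothetical client-side sketch; not taken from the OpenAI SDKs.
// Assumption: each event is one `data:` line (JSON chunk or "[DONE]").

/** Buffers streamed text and returns the payload of each complete `data:` line. */
class SseLineBuffer {
  private decoder = new TextDecoder();
  private text = "";

  /** Feeds one network chunk; returns the `data:` payloads completed by it. */
  feed(chunk: Uint8Array): string[] {
    // stream: true keeps multi-byte characters split across chunks intact.
    this.text += this.decoder.decode(chunk, { stream: true });
    const lines = this.text.split("\n");
    this.text = lines.pop() ?? ""; // the last piece may be an incomplete line
    const payloads: string[] = [];
    for (const line of lines) {
      if (line.startsWith("data:")) payloads.push(line.slice(5).trim());
    }
    return payloads;
  }
}

/** Extracts content deltas from payloads, stopping at the [DONE] sentinel. */
function extractDeltas(payloads: string[]): string[] {
  const deltas: string[] = [];
  for (const payload of payloads) {
    if (payload === "[DONE]") break;
    const content = JSON.parse(payload)?.choices?.[0]?.delta?.content;
    if (typeof content === "string") deltas.push(content);
  }
  return deltas;
}
```

Because a payload is parsed only after its terminating newline arrives, it does not matter where the proxy or network happens to split the bytes.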

In the meantime, I added a new issue to improve the chunking merge strategy: https://github.com/Helicone/helicone/issues/731

SScotter

Interesting. If you happen to know where in the source the Python/TS libs handle this, then maybe we can port over some of that solution.

JJustin

This is a great idea! I can dig back into it, it’s been a while since I’ve done so
