Hey yeah sure -
So basically there is an issue when we pull the stream in from OpenAI on our workers: the chunks come in "half chunked", meaning we sometimes receive only the first few characters of a chunk. To work around it we have this force-format option, which looks at each chunk and, if it is shorter than 50 bytes, holds it back and merges it with the next chunk.
This was a hack and not well tested, which is why it is behind a flag.
if (
  proxyRequest.requestWrapper.heliconeHeaders.featureFlags.streamForceFormat
) {
  let buffer: any = null;
  const transformer = new TransformStream({
    transform(chunk, controller) {
      if (chunk.length < 50) {
        buffer = chunk;
      } else {
        if (buffer) {
          const mergedArray = new Uint8Array(buffer.length + chunk.length);
          mergedArray.set(buffer);
          mergedArray.set(chunk, buffer.length);
          controller.enqueue(mergedArray);
        } else {
          controller.enqueue(chunk);
        }
        buffer = null;
      }
    },
  });
  body = body?.pipeThrough(transformer) ?? null;
}
You can see the TransformStream above. I think a better approach is to match the stream against a known regex: keep appending chunks to a buffer until the regex matches, and only then send the buffered data on.
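A rough sketch of that idea, not what is deployed today. It assumes the upstream is a standard server-sent-event stream in which every event ends with a blank line, so the "known regex" here is just the event delimiter. The names `splitCompleteEvents` and `createEventAligner` are made up for illustration and do not exist in the Helicone codebase. Unlike the 50-byte length check, this version keeps accumulating across several short chunks in a row and flushes whatever is left when the stream closes.

```typescript
// Assumption: events are separated by a blank line (LF or CRLF), as in SSE.
const EVENT_DELIMITER = /\r?\n\r?\n/;

// Splits buffered text into the complete events it contains and the
// trailing partial event (possibly empty) that still needs more data.
function splitCompleteEvents(pending: string): {
  events: string[];
  rest: string;
} {
  const parts = pending.split(EVENT_DELIMITER);
  // The last piece has no delimiter after it yet, so it is incomplete.
  const rest = parts.pop() ?? "";
  return { events: parts.filter((e) => e.length > 0), rest };
}

// Hypothetical helper: re-chunks a byte stream so that every emitted chunk
// is exactly one complete event.
function createEventAligner(): TransformStream<Uint8Array, Uint8Array> {
  const decoder = new TextDecoder();
  const encoder = new TextEncoder();
  let pending = "";

  return new TransformStream<Uint8Array, Uint8Array>({
    transform(chunk, controller) {
      // stream: true keeps multi-byte characters split across chunks intact.
      pending += decoder.decode(chunk, { stream: true });
      const { events, rest } = splitCompleteEvents(pending);
      pending = rest;
      for (const event of events) {
        // Delimiters are normalized to "\n\n" when re-emitting.
        controller.enqueue(encoder.encode(event + "\n\n"));
      }
    },
    flush(controller) {
      // Emit any trailing partial data instead of silently dropping it.
      pending += decoder.decode();
      if (pending.length > 0) {
        controller.enqueue(encoder.encode(pending));
      }
    },
  });
}
```

If this were swapped in behind the same flag, the wiring would presumably stay the same as today: `body = body?.pipeThrough(createEventAligner()) ?? null;`.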
Sorry for the issue. I tried digging into why this happens in the first place a while back, and I couldn't figure it out. We do not manipulate anything or call OpenAI any differently, so I am pretty confused about why this happens.
The OpenAI Python and TypeScript libraries actually handle this case, so we never gave it too much attention. It only affects folks who are using third-party libraries (for Ruby, for example) or hand-rolling their own SDK.
In the meantime, I added a new issue to improve the chunking merge strategy: https://github.com/Helicone/helicone/issues/731