In JS land, this problem (streaming, resuming, recovering, multi-client, etc) has been fully solved by https://durablestreams.com - and it can be self-hosted, or managed via Cloudflare DO.
> Stop reading here if you just wanted the how-to. Because I’m going to talk about what I think is better, and that is probably too ‘commercial’ for some folks.
> I work for Ably, and I’m building a dedicated transport for AI applications that...
It's honest. If they genuinely think it's better, it's fair to say so. The article up to that point seems well written and shows command of the domain (I've solved much of the same set of problems).
Should be at the top of the post, not at the end.
Agreed. I found it informative and appreciated reading it, even assuming it was AI-generated.
I built this Clojure lib for robust, high-scale LLM calls, where the consumer is usually an HTTP request waiting on an SSE stream. https://github.com/jhancock/aimee
The article states: "Most applications are built on an architecture like the one above, where there are a number of stateless horizontally scaleable server replicas that can handle client requests."
Using the library I built, I have yet to worry about this: Clojure's core.async, the HTTP libraries, and the JVM are so rock solid that I don't have a fragile set of stateless servers. Sure, at some point there are rare edge cases, but it's nice to get very far along without worrying about them.
I'm not sure what you mean by fragile stateless servers. If they're stateless, what's fragile about them?
> HTTP is just not a good transport for streaming LLM tokens and for building async agentic applications
I don't know that I agree this is a problem with SSE or HTTP. Something like a Redis Streams-backed SSE setup would solve most of the 'challenges' presented in the post.
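To make the Redis-Streams-backed-SSE idea concrete, here's a minimal sketch of how it could work: tokens get appended to a durable, ID-ordered stream, and a reconnecting client replays everything after its SSE `Last-Event-ID`. Redis itself is stood in for by an in-memory class so the sketch runs without a server; `TokenStream` and its methods are hypothetical names, and with real Redis you'd use XADD / XRANGE instead.

```python
class TokenStream:
    """In-memory stand-in for one Redis Stream of LLM tokens (hypothetical)."""

    def __init__(self):
        self.entries = []  # list of (id, token), ids strictly increasing

    def append(self, token):
        # Redis would assign "<ms>-<seq>" IDs via XADD; a counter works here.
        entry_id = len(self.entries) + 1
        self.entries.append((entry_id, token))
        return entry_id

    def read_after(self, last_id):
        # Roughly XRANGE stream (last_id, +]: everything the client hasn't seen.
        return [(i, t) for (i, t) in self.entries if i > last_id]


def sse_frames(stream, last_event_id=0):
    """Format unseen tokens as SSE frames; the id: field enables resumption."""
    for entry_id, token in stream.read_after(last_event_id):
        yield f"id: {entry_id}\ndata: {token}\n\n"


stream = TokenStream()
for tok in ["Hello", ",", " world"]:
    stream.append(tok)

# First connection: the client receives all three frames, then drops after id 2.
first = list(sse_frames(stream, last_event_id=0))

# On reconnect the browser sends Last-Event-ID: 2, so only the missed
# token is replayed instead of restarting the whole completion.
resumed = list(sse_frames(stream, last_event_id=2))
print(len(first), len(resumed))
```

Because the stream is durable and every frame carries an `id:`, resume, recovery, and multi-client fan-out all fall out of the same replay mechanism, which is most of what the post's "challenges" amount to.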