1 comment

  • yogthos 7 hours ago

    My general thesis here is that context rot and the other problems agents often exhibit largely stem from the way we structure code, which is not conducive to LLMs. Even small models that you can run locally are quite competent at writing small chunks of code, say 50–100 lines or so. And any large application can be broken up into smaller isolated components.

    In particular, we can break applications up by treating them as state machines. For any workflow, you can draw a state chart where nodes do some computation and then transition to other nodes in the graph. The problem with traditional coding style is that we implicitly bake this graph into function calls. You have a piece of code that does some logic, like authenticating a user, and then it decides what code should run after that. That creates coupling, because now you have to trace through code to figure out what the data flow actually is. This is difficult for agents because it makes the context grow in an unbounded way, leading to context rot. When an LLM has too much data in its context, it doesn’t really know what’s important or what to focus on, so it ends up going off the rails.
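    To make the coupling concrete, here's a tiny sketch of the traditional style (all helper names are made up): the authentication code itself decides what runs next, so the graph only exists implicitly in the call chain.

```python
def lookup_user(request):
    # Stand-in for a real db query (hypothetical helper).
    return {"name": "alice"} if request.get("token") == "ok" else None

def render_login_page():
    return "login page"

def load_dashboard(user):
    return f"dashboard for {user['name']}"

def authenticate(request):
    # The node hard-codes its successors, baking the graph into the calls:
    user = lookup_user(request)
    if user is None:
        return render_login_page()
    return load_dashboard(user)
```

    To see where a request ends up, you have to read every branch of every function along the way.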

    But now, let’s imagine that we do inversion of control here. Instead of having each node in the state graph call the next, why not pull that logic out? We could pass a data structure around: each node gets it as input, does some work, and returns a new state. A separate conductor component manages the workflow; it inspects the state and decides which edge of the graph to take.
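    A minimal sketch of that conductor idea, with illustrative names (not from any particular framework): each node is a plain state-in, state-out function, and only the conductor knows the graph.

```python
def fetch_order(state):
    # Pretend we loaded the order; real code would hit a database.
    return {**state, "order": {"id": state["order_id"], "total": 42}}

def charge_card(state):
    return {**state, "charged": True}

def run(graph, start, state):
    """The conductor: call each node, then ask the edge-selection
    function which node comes next. None means the workflow is done."""
    node = start
    while node is not None:
        handler, choose_next = graph[node]
        state = handler(state)
        node = choose_next(state)
    return state

graph = {
    "fetch": (fetch_order, lambda s: "charge" if s.get("order") else None),
    "charge": (charge_card, lambda s: None),
}

final = run(graph, "fetch", {"order_id": 7})
```

    Neither node mentions the other; swapping the edge lambdas rewires the workflow without touching node code.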

    The graph can be visually inspected, and it becomes easy for a human to tell what the business logic is doing. The graphs don’t hold much data either, because they’re declarative: they’re decoupled from the implementation details, which live in the logic of each node and are abstracted over by its API.
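    Since the graph is plain data, it can be dumped straight into something like Graphviz DOT for inspection. The node names below are made up for illustration.

```python
# Declarative graph: node -> {edge label -> next node}. No logic here.
workflow = {
    "parse_request": {"ok": "authenticate", "error": "error_handler"},
    "authenticate":  {"ok": "load_profile", "error": "error_handler"},
    "load_profile":  {"ok": "render", "error": "error_handler"},
}

def to_dot(graph):
    # Render the graph as Graphviz DOT for visual inspection.
    lines = ["digraph workflow {"]
    for node, edges in graph.items():
        for label, target in edges.items():
            lines.append(f'  {node} -> {target} [label="{label}"];')
    lines.append("}")
    return "\n".join(lines)

print(to_dot(workflow))
```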

    Going back to the user authentication example: the handler could get a parsed HTTP request, try to look up the user in the db, check whether the session token is present, and so on. It would then update the state to add the user, or set a flag stating that the user wasn’t found or wasn’t authenticated. The conductor can look at the result and decide either to move on to the next step or to call the error handler.
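    A sketch of that node under the same assumptions (the user table is a stand-in for a real db): the node only reads and writes the state, and never decides what runs next.

```python
FAKE_USERS = {"abc123": {"id": 1, "name": "alice"}}  # stand-in for a db

def authenticate_node(state):
    # Read the parsed request off the state, look up the user,
    # and record the outcome as flags rather than branching elsewhere.
    token = state.get("request", {}).get("session_token")
    user = FAKE_USERS.get(token)
    if user is None:
        return {**state, "authenticated": False, "error": "user not found"}
    return {**state, "authenticated": True, "user": user}

# The conductor, not the node, inspects the result and picks the edge:
result = authenticate_node({"request": {"session_token": "abc123"}})
next_node = "load_profile" if result["authenticated"] else "error_handler"
```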

    Now we basically have a bunch of tiny programs that know nothing about one another, and the agent working on each one has a fixed context that doesn’t grow in an unbounded fashion. On top of that, we can have validation boundaries between nodes, so the LLM can check that a component produces correct output, handles whatever side effects it needs to correctly, and so on. Testing becomes much simpler too, because now you don’t need to load the whole app; you can just test each component to make sure it fulfills its contract.
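    One way a validation boundary could look, sketched with made-up names: a contract check runs between nodes, and a unit test needs nothing but a dict in and a dict out.

```python
def auth_node(state):
    users = {"abc123": {"id": 1}}  # stand-in for a db lookup
    user = users.get(state.get("session_token"))
    if user is None:
        return {**state, "authenticated": False}
    return {**state, "authenticated": True, "user": user}

def check_contract(state, required):
    # Validation boundary: the node must emit these keys, or we fail fast.
    missing = [k for k in required if k not in state]
    if missing:
        raise ValueError(f"node broke its contract, missing: {missing}")
    return state

# Testing the node requires no app server or wiring at all:
out = check_contract(auth_node({"session_token": "bad"}),
                     required=["authenticated"])
```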

    What’s more, each workflow can itself be treated as a node in a bigger workflow, so the whole thing becomes composable. And the nodes themselves are like reusable Lego blocks, since the context is passed in to them.
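    The composition falls out for free in the sketch above: because a whole workflow is just a state-to-state function, it plugs into a bigger graph like any other node (again, all names are illustrative).

```python
def run(graph, start, state):
    # Same conductor loop as before: walk nodes until an edge returns None.
    node = start
    while node is not None:
        handler, choose_next = graph[node]
        state = handler(state)
        node = choose_next(state)
    return state

inner = {
    "double": (lambda s: {**s, "n": s["n"] * 2}, lambda s: "inc"),
    "inc":    (lambda s: {**s, "n": s["n"] + 1}, lambda s: None),
}

def math_pipeline(state):
    # The entire sub-workflow exposed as one state -> state function.
    return run(inner, "double", state)

outer = {
    "math": (math_pipeline, lambda s: None),
}

result = run(outer, "math", {"n": 3})  # 3 doubled, then incremented: 7
```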

    This whole idea isn’t new; workflow engines have been around for a long time. The reason they never really caught on for general purpose programming is that coding this way doesn’t feel natural. There’s a lot of ceremony involved in creating the workflow definitions, writing contracts for them, and jumping between those and the implementation of the nodes. But the equation changes when we’re dealing with LLMs: they have no problem doing tedious tasks like that, and all the ceremony helps keep them on track.