Architecture-First: How We're Solving the AI Agent Context Problem
Every AI-native team eventually hits the same wall. Agents lie, drift, miss constraints, and produce plausible-looking output that breaks against requirements they were never shown. Our instinct is always to blame the model. The real problem is almost always the context provided to these agents: too much of it, too little, the wrong kind, or context completely disconnected from the actual structure of the system. The fix isn't better prompting. We believe it's a development process where your architecture is the spine everything else connects to, and your work items are the mechanism that delivers precisely scoped context to every agent at every stage. As we say here at Workalaya, “Every agent needs exactly what it needs. Nothing more, nothing less.”
The senior engineer model
I hear this all the time: treat your AI agents like junior developers. Review their work constantly, guide them, correct them. That advice is sound, but to grow into a really effective AI-native team, your goal shouldn’t be to manage junior engineers. It should be to create the right conditions so that they can turn into senior engineers.
A senior engineer or an architect approaching a large codebase doesn't read the whole thing. They navigate to the right service, open the relevant files, recall the interfaces they need to touch, and work. They carry architectural diagrams and a mental model of the system at altitude and use them to scope what deserves close attention. Everything outside that scope stays out of active consideration, and that is by design.
That skill, knowing which context is load-bearing for a given task and which is just noise, is what separates a senior engineer from a junior one. A junior engineer's struggle with a large codebase is rarely a technical gap. It's the inability to scope attention correctly. They try to hold too much, the real problem gets buried, and the output suffers.
An agent without that scoping does exactly the same thing: it produces plausible-looking output that misses constraints it was never shown. The question you have to answer before any agent touches a task is the same one a senior engineer answers instinctively: what does this task actually require?
More context makes it worse
So what does this task really require? The obvious response is to load everything in. Give the agent the full repo, let it reason across the whole codebase, and trust the context window.
The research says otherwise. Liu et al. (2024) found that LLM accuracy follows a U-shaped curve over where the relevant information sits in a long context: highest when it is at the very start or end, dropping sharply when it is in the middle. In some conditions, adding context made performance worse than providing none at all. Levy et al. (2024) found reasoning degrading at inputs of around 3,000 tokens, well below the advertised limits of any current model. Chroma's 2025 Context Rot report tested 18 models including GPT-4.1, Claude 4, and Gemini 2.5. Every single one degraded as input length grew.
The mechanism is structural. Softmax forces attention weights to sum to one (Xiao et al., 2024), so every token in the window competes for the same fixed attention budget. As irrelevant tokens are added, the share left for the tokens that matter tends to shrink. The model doesn't signal that it's struggling; it just produces worse answers. And context that doesn't contribute to the current problem isn't neutral: Shi et al. (2023) showed that adding topically related but task-irrelevant information dropped consistent problem-solving to below 30%.
The agent doesn't know what to ignore, and that hurts its output. So you, the human, have to make that decision before it touches any task.
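The softmax point can be made concrete with a toy calculation. This is a minimal sketch, not a model of any real transformer: it assumes a single attention step, one relevant token with a fixed score, and distractor tokens that all share a lower fixed score. The function names and the score values are our own illustrative assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def relevant_weight(n_distractors, relevant_score=4.0, distractor_score=1.0):
    """Attention weight on one relevant token among n_distractors others.

    The scores are made-up constants (an assumption for illustration);
    a trained model computes them from queries and keys.
    """
    scores = [relevant_score] + [distractor_score] * n_distractors
    return softmax(scores)[0]


# The relevant token keeps the same score throughout, yet its share of
# the fixed budget shrinks as more distractors join the window.
for n in (10, 100, 1000, 10000):
    weight = relevant_weight(n)
    print(f"{n:>6} distractors -> weight on relevant token: {weight:.4f}")
```

Nothing about the relevant token changes between runs. Its weight falls only because the weights must sum to one and there are more tokens to share the total.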
Architecture as the spine
The solution to the context problem, then, is what good engineers already do: compartmentalize. Hold the system at the right level of abstraction for the work in front of you, and nothing else. The challenge is making that instinct explicit, consistent, and machine-readable.
We built on two ideas to get there. Simon Brown's C4 model gave us a clean framework for describing a system at different zoom levels, from the whole product and its external relationships, down through deployable units, logical components, and finally the code itself. Eric Evans' concept of bounded contexts gave us a principled way to define what belongs inside a given scope and what doesn't. Both influenced how we think about structure.
But we extended them. In our process, the architectural spine doesn't just carry diagrams. It carries everything: artifacts, work items, project statuses, implementation decisions. The same structure that describes your system at altitude is the structure that organizes every piece of work being done inside it. Architecture and execution share a spine. That's what makes precise context delivery possible.
You don't need a perfect codebase
The practical objection is that this kind of context management requires a clean, well-modeled system, which most codebases of course aren't.
But it turns out you don't need to restructure anything. You need to build context infrastructure on top of what you have. That can mean manifest files that describe what each component owns, the interfaces it exposes, and the contracts it depends on. Smart indexes that pull the right schema fragment or API contract for a specific task without loading everything. Pointers that map work items to the files and components they touch.
The goal isn't a perfect architecture. It's the ability to describe your system at the right levels of abstraction, and to encode that description as lightweight, maintainable context infrastructure. That works on any codebase, frankly, at any stage of maturity.
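To make the three pieces concrete, here is a minimal sketch of a manifest, an index, and work-item pointers. It is an illustration under stated assumptions, not our production tooling: `ComponentManifest`, `ContextIndex`, the service paths, the contract names, and the work item ID are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class ComponentManifest:
    """Hypothetical manifest: what a component owns, exposes, depends on."""
    name: str
    owns: tuple[str, ...]             # path prefixes this component owns
    exposes: tuple[str, ...]          # contracts it publishes
    depends_on: tuple[str, ...] = ()  # contracts it consumes


@dataclass
class ContextIndex:
    """Hypothetical index that resolves a work item to a scoped package."""
    manifests: list[ComponentManifest] = field(default_factory=list)
    contracts: dict[str, str] = field(default_factory=dict)      # name -> fragment
    pointers: dict[str, list[str]] = field(default_factory=dict)  # work item -> files

    def owner_of(self, path: str) -> ComponentManifest | None:
        """Return the first component whose owned prefix matches the path."""
        for manifest in self.manifests:
            if any(path.startswith(prefix) for prefix in manifest.owns):
                return manifest
        return None

    def context_for(self, work_item_id: str) -> dict:
        """Collect only the files, components, and contracts a work item touches."""
        files = self.pointers.get(work_item_id, [])
        components: list[ComponentManifest] = []
        for path in files:
            owner = self.owner_of(path)
            if owner is not None and owner not in components:
                components.append(owner)
        needed: list[str] = []
        for component in components:
            for name in component.exposes + component.depends_on:
                if name not in needed:
                    needed.append(name)
        return {
            "files": list(files),
            "components": [c.name for c in components],
            "contracts": {n: self.contracts[n] for n in needed if n in self.contracts},
        }


# All names and contract fragments below are made up for the example.
index = ContextIndex(
    manifests=[
        ComponentManifest("billing", ("services/billing/",), ("InvoiceAPI",), ("CustomerAPI",)),
        ComponentManifest("customers", ("services/customers/",), ("CustomerAPI",)),
        ComponentManifest("search", ("services/search/",), ("SearchAPI",)),
    ],
    contracts={
        "InvoiceAPI": "POST /invoices {customer_id, amount_cents}",
        "CustomerAPI": "GET /customers/{id} -> {id, email}",
        "SearchAPI": "GET /search?q=term",
    },
    pointers={"WI-101": ["services/billing/invoice.py"]},
)

package = index.context_for("WI-101")
print(package["components"])
print(sorted(package["contracts"]))
```

In this sketch the billing work item resolves to the billing component plus the two contracts it exposes or consumes. The search service never enters the package, which is the point: the lookup decides what the agent sees before the agent starts.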
The key insight is that the agent doesn't need your entire repo. It needs the right map.
The work item hierarchy
And the right map evolves from your architecture and how you organize your work items. Our process organizes engineering work into four levels, and each level maps to a zoom level in our architectural spine. Each carries a specific context package. There is no ambiguity about what an agent receives at any stage.
The first level captures business value: the problem being solved, the product goal, the outcome. This is the domain of product and architecture stakeholders, tied to the highest zoom level of the spine.
The second level is the full cross-container product slice that delivers that value. Anything touching multiple parts of the system to produce a coherent piece of functionality. A thinker at this level needs to know which containers are involved, how they communicate, and where the integration points are. This is still human-led architectural thinking.
The third level is where agent work begins. It belongs to exactly one container, and its context package is scoped accordingly: enough of the broader system view to understand integration contracts, plus the full internal view of the target container. The agent knows its container's structure, the interfaces it must respect, and the boundaries it cannot cross.
An optional fourth level breaks that work down further to the component unit inside a container. The agent at this level receives the relevant component's structure, the specific interface or data contract it is implementing, and the schema fragments it will touch, nothing above that. How this work fits into the level above it was already decided by a human. The agent isn't discovering scope. It's been given one.
This hierarchical decomposition is the process of narrowing context to exactly what the next layer of work requires. By the time a work item reaches an agent, the scope has been defined at every level above it. The agent isn't discovering what matters. It's been told, and given its own confined playground, which means it can still innovate, but inside a ‘box’.
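The four levels described above can be sketched as a small data model. This is a minimal illustration under our own assumptions: the level names (`VALUE`, `SLICE`, `CONTAINER`, `COMPONENT`), the `WorkItem` fields, the package keys, and the example items are all hypothetical labels, not a fixed schema.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical names for the four work item levels."""
    VALUE = 1      # business value, human-led
    SLICE = 2      # cross-container product slice, human-led
    CONTAINER = 3  # exactly one container, agent work begins
    COMPONENT = 4  # optional, one component inside a container


@dataclass
class WorkItem:
    id: str
    level: Level
    title: str
    parent: WorkItem | None = None
    container: str | None = None           # set at levels 3 and 4
    component: str | None = None           # set at level 4
    contracts: tuple[str, ...] = ()        # interfaces the work must respect
    schema_fragments: tuple[str, ...] = ()  # level 4 only


def context_package(item: WorkItem) -> dict:
    """Build the scoped package an agent receives for a work item."""
    if item.level < Level.CONTAINER:
        raise ValueError(f"{item.id}: levels 1 and 2 are human-led, no agent package")
    if item.container is None:
        raise ValueError(f"{item.id}: agent work must belong to exactly one container")
    if item.level == Level.CONTAINER:
        # Full internal view of one container plus its integration contracts.
        return {
            "work_item": item.id,
            "scope": f"container:{item.container}",
            "contracts": list(item.contracts),
        }
    if item.component is None:
        raise ValueError(f"{item.id}: a level 4 item must name its component")
    # Component structure, the contract being implemented, schema fragments.
    return {
        "work_item": item.id,
        "scope": f"component:{item.container}/{item.component}",
        "contracts": list(item.contracts),
        "schema_fragments": list(item.schema_fragments),
    }


# Example items are invented for illustration.
value = WorkItem("V-1", Level.VALUE, "Reduce failed payments")
slice_ = WorkItem("S-1", Level.SLICE, "Retry failed card charges", parent=value)
task = WorkItem("T-1", Level.CONTAINER, "Add retry scheduler", parent=slice_,
                container="billing", contracts=("PaymentGatewayAPI",))
sub = WorkItem("C-1", Level.COMPONENT, "Persist retry attempts", parent=task,
               container="billing", component="retry-store",
               contracts=("RetryAttempt",),
               schema_fragments=("retry_attempts table",))

print(context_package(task))
print(context_package(sub))
```

The package builder refuses to produce anything for the two human-led levels, and the level 4 package carries nothing from the levels above it. The parent links exist for humans tracing the decomposition, not for the agent.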
What this produces
An AI agent is an inherently non-deterministic creature. The right context package puts it in what we call a deterministic cage, not limiting its capability, but bounding the universe it has to reason about. It gets what the task requires. It doesn't get what it doesn't. It can't drift into assumptions about parts of the system outside its context, because those parts were never shown to it. And that is made possible by a well-engineered design and development process, unified by an intentional architectural spine.
This is what we teach in Workalaya Academy's AI-Native Engineering course. Six weeks, free, built for experienced engineers who want to operate at the level this moment requires.
If that's you, apply here. Cohort 1 is open.
References
- Liu, N. F. et al. (2024). Lost in the Middle: How Language Models Use Long Contexts. TACL.
- Levy, M. et al. (2024). Same Task, More Tokens: The Impact of Input Length on LLM Reasoning. ACL.
- Shi, F. et al. (2023). Large Language Models Can Be Easily Distracted by Irrelevant Context. ICML.
- Xiao, G. et al. (2024). Efficient Streaming Language Models with Attention Sinks. ICLR.
- Chroma. (2025). Context Rot: How Increasing Input Tokens Impacts LLM Performance.
- Brown, S. C4 Model for Software Architecture. c4model.com
- Evans, E. (2003). Domain-Driven Design. Addison-Wesley.