Agent apps eventually hit the same problem: to behave well in a specific job, they need project context. That means memory. They need to know the client’s repo, deployment habits, hidden constraints, business logic, previous decisions, failure cases, style preferences, and all the local details that separate “general intelligence” from “actually useful at work.”
But memory makes the product heavier. Retrieval gets noisy. Latency gets worse. Context gets expensive. The client starts asking why the agent is getting slower instead of smarter. So the industry looks for a shortcut: make the agent itself better instead of making it carry so much memory.
This is where “skills” come in. A SKILLS.md file sounds like distilled expertise. In reality, it is mostly a system prompt with a nicer name. It can work for small, stable projects. It can encode procedures, checklists, conventions, and common mistakes.
But in large, dynamic serving environments, it breaks down, because rule-based expert systems do not scale. It does not matter whether the rule is written in a markdown file, hidden inside a system prompt, compiled into a function, or inserted as a clause in the agent loop. The structure is still the same: “When you see X, do Y.” That is not experience. That is a brittle rule pretending to be judgment.
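To make that structure concrete, here is a minimal Python sketch that reduces a “skill” to a trigger plus a fixed action. The `Rule` type, the rule names, and the batch-size heuristic are hypothetical illustrations, not taken from any real SKILLS.md format or agent framework.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Rule:
    """A 'skill' reduced to its structure: a trigger and a fixed action."""
    name: str
    when: Callable[[dict], bool]  # "When you see X..."
    action: str                   # "...do Y."


# Hypothetical rules of the kind a skills file might encode.
RULES = [
    Rule("low-util-raise-batch", lambda s: s["symptom"] == "low_gpu_util", "increase batch size"),
    Rule("slow-add-cache", lambda s: s["symptom"] == "high_latency", "add a cache"),
]


def apply_rules(situation: dict) -> Optional[str]:
    """Return the action of the first rule whose trigger matches, else None.

    Triggers only inspect the symptom, so any constraint the rule's author
    did not anticipate (here, free memory) never influences the answer.
    """
    for rule in RULES:
        if rule.when(situation):
            return rule.action
    return None


# Same symptom, very different hidden constraint.
roomy = {"symptom": "low_gpu_util", "free_memory_gb": 20.0}
tight = {"symptom": "low_gpu_util", "free_memory_gb": 0.5}
print(apply_rules(roomy))  # increase batch size
print(apply_rules(tight))  # increase batch size
```

Both situations get the same answer, because the trigger only looks at the visible symptom. The nearly exhausted memory in the second case is exactly the kind of local constraint the rule was never written to see.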
Real engineering work is not clean enough for that. The hard part is not knowing a rule. The hard part is knowing when the rule does not apply. The hard part is sensing that a reasonable-looking solution will fail because of latency, memory pressure, concurrency, hardware behavior, deployment risk, or some ugly local constraint that never appears in the docs.
That is why experience-heavy engineers are still expensive in 2026. Embedded systems engineers, infra engineers, GPU kernel engineers, database engineers, serving engineers: companies are not paying them only for knowledge. They are paying for first-attempt accuracy under messy constraints. They know what not to try.
The next idea was to let the model reason more deeply: LLM wikis, autoresearch, layered decision-making, retrieval over internal docs, recursive planning. Instead of stuffing everything into static rules, lean on the model’s size and reasoning ability to organize knowledge dynamically. Better, but still not enough.
The more practical shift was proxying agents through CLI shells.
Why? Because the shell gives the agent a real feedback loop. It can inspect files, run tests, read logs, apply patches, observe errors, and try again. The harness turns the agent into something like a compiler loop driving reflexively toward a goal: generate, check, revise, repeat. That is powerful. But it exposes the real business problem.
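The generate, check, revise, repeat loop can be sketched in a few lines. This is a minimal Python illustration, not any real harness's API: `harness_loop`, `generate`, `check`, and the flat `cost_per_attempt` are hypothetical, and real attempts rarely cost the same amount each time. What it shows is the business side: the loop can converge, but every failed attempt is still billed.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class LoopResult:
    solution: Optional[str]  # None if the loop gave up
    attempts: int
    cost: float              # paid for every attempt, including failed ones


def harness_loop(
    generate: Callable[[List[str]], str],   # proposes a candidate from past errors
    check: Callable[[str], Optional[str]],  # error message, or None if it passes
    cost_per_attempt: float,
    max_attempts: int = 10,
) -> LoopResult:
    """Generate, check, revise, repeat, charging a flat cost per attempt."""
    errors: List[str] = []
    for attempt in range(1, max_attempts + 1):
        candidate = generate(errors)
        error = check(candidate)
        if error is None:
            return LoopResult(candidate, attempt, attempt * cost_per_attempt)
        errors.append(error)  # revise: the next generation sees this error
    return LoopResult(None, max_attempts, max_attempts * cost_per_attempt)


# Toy usage: the "agent" only finds the working patch on its third try.
CANDIDATES = ["patch-a", "patch-b", "patch-c"]


def toy_generate(errors: List[str]) -> str:
    return CANDIDATES[min(len(errors), len(CANDIDATES) - 1)]


def toy_check(candidate: str) -> Optional[str]:
    return None if candidate == "patch-c" else f"{candidate}: tests failed"


result = harness_loop(toy_generate, toy_check, cost_per_attempt=50.0)
print(result.solution, result.attempts, result.cost)  # patch-c 3 150.0
```

In the toy run the loop succeeds, but only after paying for two failures. A better prior would show up here as a `generate` that returns the working patch on the first call: 50 instead of 150 in these made-up units.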
Who cares if you can solve the problem reactively? A compiler can afford trial and error. A company often cannot.
For real production work, the expensive part is not whether the agent eventually gets there. The expensive part is everything that happens before it gets there: wasted GPU hours, failed deploys, broken migrations, degraded latency, corrupted assumptions, security regressions, and senior engineers babysitting the loop. In capitalism, “eventually correct” is not the same product as “successful on the first serious attempt.”
Companies are paying for things to succeed before the compiler feedback loop kicks in. That is the part people keep missing.
Shell-based harness architectures may scale the feedback loop. Skills files may package common procedures. Memory may preserve local context. LLM wikis may organize knowledge. But none of them automatically becomes experience. Experience is the prior that prevents expensive attempts.
The real question for agent products is not: “Can the agent fix it after enough retries?” The real question is: “Can it choose the right path before the retry loop becomes the cost?”