Humans Are the Adaptive Control Layer
Frontier models are extremely good at compressed inference, but they are not yet alive in the way humans are alive.
Humans are dynamic, reactive learners. We are continuously rewritten by news, pressure, social feedback, memory, emotional salience, ambition, taste, and consequence. We hear something, form a take, test it against reality, update our strategy, and carry that update forward as part of who we are.
Most deployed AI systems do not work like this. Their weights are static. They can generalize inside a context window, retrieve memories, use tools, and adapt in context, but the underlying model is not continuously becoming a new entity from lived experience. The current winning architecture is still mostly static model + context + tools + retrieval + optimization harness.
That creates a strange window.
The Strange Window
A fast-growing human paired with strong coding and general agents may be more capable than the model alone, not because the human has higher raw throughput, but because the human supplies the online learning loop.
The AI supplies speed, recall, code generation, drafting, search, and parallel exploration. The human supplies judgment, taste, urgency, priority shifts, social context, adversarial awareness, and the ability to change what game is being played.
You can see this most clearly in coding agents. The model can write the patch, run the test, inspect the error, and try again. But the human is still the one who notices that the test is fake, that the UI technically works but feels wrong, that the pull request solved the literal issue while missing the product, or that the whole task should be reframed. The agent is fast inside the current game. The human often supplies the update that changes the game.
The same thing happens in product taste. A model can generate ten landing pages, ten onboarding flows, ten pricing explanations, and ten launch tweets. But the useful loop is not generation. It is exposure to reality: a user flinches, an investor asks the wrong question, a demo dies in the first thirty seconds, a friend says the thing looks like slop. The human updates from that pressure and carries the scar into the next prompt. The model can store the feedback. The human becomes different because of it.
There is a smaller version in every research loop. The agent can summarize papers, scrape competitors, rank claims, and make a map. The human decides that one boring detail is actually the wedge, that the market is lying about what it values, or that the right question is not the one the search started with. That update is not just more context. It is a change in prior.
So the real claim is not that humans are better than AI at execution. That is too broad. The sharper claim is:
Humans are currently the adaptive control layer above models that generalize in context but do not learn from experience.
The open technical question is whether this remains true.
Continual learning, test-time training, online RL, long-term memory, self-improving agents, and tool-using harnesses all try to close this gap.[1] But live weight updates are expensive, hard to audit, hard to roll back, and dangerous under strong optimization pressure.
Static inference plus external memory may remain the dominant economic form factor for a long time. Or the next real capability jump may come from agents that can learn during deployment without needing full retraining.
The danger is that economic, fast, and verifiable will not arrive together. We will adopt systems when they are useful and cheap, not when they are fully legible. The dangerous window is not self-improvement itself. It is self-improvement that becomes practical before it becomes verifiable, which is why the control problem quickly becomes a verification problem.
Verification Under Asymmetry
The bottleneck may not be generation. It may be verification and steering.
If we build recursively self-improving AI, the dangerous part is not merely that it writes better code or designs better models. The dangerous part is that its proposed improvements may become too abstract, too strategic, or too high-dimensional for humans to understand.
At that point, the human is no longer steering. The human is rubber-stamping.
Reward hacking becomes the central failure mode. A self-improving system can optimize the score, the benchmark, the overseer, the reporting channel, or the institution around the evaluation. Even if it appears to be improving, we may not know whether it is improving the thing we actually care about.[2]
This is not hypothetical. In toy environments, agents have learned to exploit simulators, scoring functions, and broken feedback channels instead of doing the intended task.[3] In real agent work, the weaker version is already familiar: an agent writes a narrow test that passes, declares the bug fixed, and leaves the actual workflow broken. The problem is not that the agent is malicious. The problem is that the measurable proxy was easier to satisfy than the thing we meant.
The strongest objection is that maybe some of these hacks are not failures. Sometimes the system is exposing that the spec was wrong, not that the system is bad. The benchmark may be the obsolete interface. Specification-gaming cases often reveal a broken metric or a better strategy, and the correct response is to change the metric.
That is true if we can verify that the hack serves the real underlying goal. Benchmarks are proxies. Technically fulfilling the proxy is not the same thing as fulfilling what we meant. Adapting the rules is correct when the adaptation is verifiable, and catastrophic when the system is exploiting the gap between the proxy and the thing we actually care about.
This is where human taste still matters. Taste is not just preference. It is prior selection under weak verification. It is the trained ability to say: this solution satisfies the words but violates the spirit; this result is ugly but directionally correct; this clever shortcut is actually the product; this impressive benchmark score is strategically empty. Taste is the part of the loop that still notices when the proxy has drifted away from the target.
If a more capable system acts and we keep changing our rules to match its behavior without being able to verify the underlying goal, we have not adopted a more efficient rule set. We have ceded the objective-setting function. The system sets the rules by acting, and we adapt. Loss of control can feel like pragmatism from the inside.
Formal verification sounds like the escape hatch, but it only works after the hard part is already done. A proof can show that a program satisfies a spec. It cannot tell us whether the spec is the thing we actually wanted. Autoformalization moves the ambiguity into formal language, but then the human still has to understand the formalized target. If the model writes the spec, writes the program, and verifies the program against the spec, the trust problem has not disappeared. It has moved to whether we understand the spec well enough to trust the whole loop.[4]
That is tolerable for narrow software where the target can be pinned down. It is much less comforting for AGI. We can name what we want in natural language. We can gesture at helpfulness, honesty, harmlessness, flourishing, agency, and control. But once the task becomes implementing the future behavior of a mind more capable than us, the spec itself is the frontier. We do not yet know how to write it.
The alignment problem is therefore not only technical. It is also a power-asymmetry problem.
The question is not whether a superior intelligence would be kind. That is too anthropomorphic. The cleaner question is:
Why would a more capable agent remain inside constraints imposed by a less capable agent unless those constraints are built into its objective, environment, incentives, and enforcement mechanisms?
Humans did not dominate animals because we were morally entitled to. We dominated because capability asymmetry plus instrumental incentives produced control. If AI systems become capable enough to improve themselves, route around oversight, and manipulate their evaluators, then alignment stops being an abstract ethics problem and becomes a question of enforceable control.
Humans may stay relevant longer than expected not because we beat AI at thinking, but because we are still the fastest adaptive control loop attached to a model that does not yet learn from experience, and the only anchor of trust for goals we cannot yet formalize.
The moment AI gets its own safe, economic, verifiable learning loop, the real alignment problem begins.
References
- OpenAI: Weak-to-strong generalization. A framing for supervising stronger systems with weaker supervisors.
- Concrete Problems in AI Safety. Introduced reward hacking as a central practical safety problem.
- DeepMind: Specification gaming. Examples of agents exploiting proxy objectives and simulators.
- Vitalik Buterin: Formal verification. Useful recent explanation of formal verification's strengths and limits.