The Alchemist's Error: What AI Cannot Automate and Why It Matters

There is a recurring seduction in the history of human thought: that a sufficiently deep understanding of a system's underlying patterns could allow one to simply impose a desired outcome upon it, that the distance between intention and result could be collapsed entirely by those who knew enough. It is an intoxicating idea, and is by no means a product of modernity.

The Hermetic tradition, which emerged in the early centuries AD and reached its peak influence during the Renaissance, was built around this premise. Hermeticism held that the universe operated according to a set of hidden correspondences: patterns that, if properly understood and invoked, could be used to effect transformation in the material world. Alchemy was its best-known practical expression. The pursuit of the Philosopher's Stone was less about greed than about the belief that mastering the language of nature could make transmutation not just possible but inevitable. The error was not in the seriousness of the inquiry but in the assumption that intent, once refined to sufficient precision, could be executed without remainder, and that friction was merely a symptom of incomplete knowledge rather than an irreducible property of complex systems.

That assumption has never really gone away. It was the animating spirit behind the industrial revolution's promise of autonomous production, and it is the same instinct driving the current belief that AI, given the right instructions or enough iterations, can simply execute intent on behalf of a human mind. What follows is an attempt to examine why that assumption remains as mistaken as it has always been, not because the tools are not impressive, but because the problem they are being asked to solve was never one of tooling in the first place.

I. Intent Is a Property of Mind

I shall reconsider human knowledge by starting from the fact that we can know more than we can tell. — Michael Polanyi, The Tacit Dimension (1966)

The mind generates far more than the hand can produce, and that gap is not a deficiency to be engineered away but a fundamental condition of how human cognition works. We can all hear a great symphony in our minds if we try. The capacity for imagination is broadly distributed, and most people with enough exposure to music have some intuitive sense of what a compelling piece of it might feel like. But to actually write and execute that music is a skill won through sacrificing years of effort, one that requires not just the vision but an entirely separate and laboriously acquired capacity to translate that vision into a form that can exist outside the mind. The vision and the execution are not the same thing, and crucially, they do not develop together. One is nearly effortless, and the other is the work of a lifetime.

No abstraction of execution has ever subsumed the faculty that directs it. Just as the industrial revolution modularized the construction of complex objects while leaving the skill needed to design, adapt, and reason about those objects as a distinctly human burden, AI has accelerated the production of smaller building blocks in software engineering while leaving the harder problem conspicuously intact.

Consider what the actual work of software engineering in a professional setting looks like in practice. A team is asked to build a new feature, say, a reporting dashboard for a financial services client. The code itself, at this point, is almost trivially generatable; any competent engineer with access to modern tooling can produce a working implementation in a fraction of the time it would have taken five years ago. However, the code is perhaps ten percent of the problem. The other ninety percent is everything that has to happen before a single line gets written: understanding what the stakeholders actually need versus what they asked for, reconciling competing priorities across product, compliance, and engineering, determining the correct level of granularity at which to model the underlying data, anticipating the edge cases that will only become visible once the feature is in front of real users, and negotiating the inevitable gap between what was scoped and what is actually feasible within the constraints of the existing system. None of that is a problem of physical output. It is a problem of intent, and intent, it turns out, is not something that can be extracted from a prompt.

Intent, in this sense, is not the casual awareness of what you want but the deeper capacity to understand a problem with enough fidelity to know what a good solution actually looks like before you build it. Intent is generative, contextual, and irreducibly dependent on the kind of loose associative reasoning that emerges from embodied experience and continuous engagement with a problem over time. It is not a prompt, and it is not even a well-designed set of instructions. The capacity to generate those instructions, to align them correctly with the problem at hand, and to recognize when they have drifted from the original intent is itself irreducibly human. The act of formulating the instructions is where most of the value is produced; the execution that follows is, by comparison, the easy part. The Hermeticists believed that the right formula could make the universe execute their will, and they were misguided for precisely the same reason that running an AI agent in a loop is not a substitute for understanding the problem it is being asked to solve.

II. The Loop Is Not a Strategy

The belief in the possibility of AI, given present computers, is the belief that all that is essential to human intelligence can be formalized. — Hubert Dreyfus, Alchemy and AI (1965)

It is worth being precise about what code actually is, because the current discourse around AI tends to obscure a distinction that matters enormously. Code is the industry-standard mechanism for expressing deterministic intent, a formal language precise enough that a machine can execute it without ambiguity, producing the same result given the same inputs every time. This is what determinism looks like in practice: given sufficient specification, the outcome is guaranteed. The formalist assumption, which is something different and more subtle, and one the Hermeticists would have recognized immediately, is the belief that the judgment required to arrive at that specification can itself be fully encoded as explicit rules or instructions, such that if you specify the what with enough precision, the how follows automatically. It is the conflation of these two things, the determinism of execution with the supposed formalizability of the reasoning required to direct it, that underlies most of the confusion about what AI can and cannot do.

Auto-regressive LLMs cannot, by themselves, do planning or self-verification... LLMs should be viewed as universal approximate knowledge sources. — Subbarao Kambhampati et al., LLMs Can't Plan, But Can Help Planning in LLM-Modulo Frameworks, ICML (2024)

AI systems are not a resolution of the conflation between execution and the judgment required to direct it; they are a particularly compelling instance of it. They are not unlike an extraordinarily well-read interlocutor who has absorbed an almost incomprehensible volume of text and learned, with remarkable precision, how ideas tend to follow one another. They do not understand what they have read in the way a person does; they have instead developed a finely calibrated sense of correspondence. This is akin to a feel for which patterns tend to co-occur and which responses tend to satisfy the implicit expectations of the person asking. Think of it less as reasoning and more as navigation: the system moves through an enormous space of possible continuations, surfacing whichever path feels most coherent with everything it has encountered before. It is a capability of genuine utility, but one that arrives at coherence the way a river finds the sea, not by understanding the destination, but by following the path of least resistance through the landscape it was shaped by.

The text layer through which we interact with these systems is not a more accessible form of specification; it is a less precise one. The same prompt will not reliably produce the same output, which is already disqualifying for any problem that requires exactness, and a set of instructions specific and complete enough to deterministically describe a solution is no longer meaningfully different from code itself. You have simply written the program in a less precise language and asked something else to translate it, which introduces ambiguity at every step. Hubert Dreyfus spent much of his career arguing against early AI researchers on exactly this point, contending that the kind of tacit, skilled expertise humans develop through experience resists complete formalization, and that assuming otherwise is a category error rather than merely an engineering challenge. The promise of natural language as a substitute for formal specification is, on close inspection, not a simplification of the problem but a deferral of it.
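The contrast between deterministic code and a sampled text layer can be made concrete with a minimal Python sketch. Everything in it is an assumption made for illustration: `monthly_total`, `sample_continuation`, and the three weighted continuations are hypothetical, and the weighted draw is a toy stand-in for sampling, not a model of any real system.

```python
import random

def monthly_total(amounts):
    """Deterministic code: the same inputs always produce the same result."""
    return round(sum(amounts), 2)

# Hypothetical readings of one fixed, loosely worded request.
# The weights are illustrative numbers, not measurements.
CONTINUATIONS = {
    "sum by month": 0.5,
    "sum by quarter": 0.3,
    "average by month": 0.2,
}

def sample_continuation(rng):
    """Toy stand-in for sampling: draw one reading according to its weight."""
    options = list(CONTINUATIONS)
    weights = list(CONTINUATIONS.values())
    return rng.choices(options, weights=weights, k=1)[0]

def distinct_outputs(runs, seed):
    """Collect the different readings produced by repeating the same request."""
    rng = random.Random(seed)  # seeded only so the demonstration is repeatable
    return {sample_continuation(rng) for _ in range(runs)}
```

Calling `monthly_total` twice with the same list returns the same number both times, while `distinct_outputs(200, 0)` collects more than one reading of the same request. Narrowing the request until only one reading remains is, in effect, writing the specification that the code already was.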

Code was never the bottleneck. The bottleneck was always gathering enough information and deriving the correct level of abstraction at which you can execute against an idea with the right level of fidelity. While failure modes can be anticipated in advance, they rarely survive contact with production environments in the form you expected. Misalignment between the understanding of a problem and the intent behind its solution is the most common source of failure in software, and it has nothing to do with how fast the code gets written.

Philosophers have not done justice to the distinction which is quite familiar to all of us between knowing that something is the case and knowing how to do things. — Gilbert Ryle, The Concept of Mind (1949)

Producing correct software requires more than knowing what the output should look like. It requires the kind of practiced judgment that comes from deeply understanding the problem, and that judgment is not the same kind of thing as the instructions it eventually produces. Knowing that a solution needs to handle edge cases is not the same as knowing how to anticipate which edge cases will matter. Software at the level of execution is deterministic, while the reasoning required to arrive at what to execute is not, and no amount of additional instruction has ever made it so. Running AI in a loop does not change this; each iteration drifts incrementally further from the original intent, producing output that resembles a solution while quietly solving a different problem altogether, and the degradation is gradual enough to be mistaken for progress. The belief that enough iterations can substitute for human judgment is not an engineering strategy; it is the same avenue the Hermeticists explored centuries before us, pursued now with different instruments and the same underlying assumption.
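One way to picture the drift described above, offered as an illustration rather than as evidence, is a toy random walk in Python. The assumptions are loud ones: intent is reduced to a single fixed number, each iteration adds a small random error to the previous output, and the names `distance_from_intent` and `average_distance` are hypothetical. Nothing here measures a real AI system.

```python
import random

def distance_from_intent(iterations, seed, review_each_step):
    """Toy model: every pass perturbs the previous output a little.

    Assumption: 'intent' is a fixed number and each iteration builds on
    the prior output rather than on the original goal.
    """
    rng = random.Random(seed)
    intent = 0.0
    output = 1.0  # the first draft starts slightly off target
    for _ in range(iterations):
        candidate = output + rng.gauss(0.0, 1.0)
        # A reviewer who still holds the original intent rejects any
        # step that moves the output further away from it.
        if review_each_step and abs(candidate - intent) > abs(output - intent):
            continue
        output = candidate
    return abs(output - intent)

def average_distance(review_each_step, trials=200, iterations=100):
    """Average final distance from intent over many seeded runs."""
    total = sum(
        distance_from_intent(iterations, seed, review_each_step)
        for seed in range(trials)
    )
    return total / trials
```

In this sketch the reviewed loop can never end further from the intent than it started, while the unreviewed loop wanders. The only difference between the two is a participant who still knows what the target was, which is the part the loop itself does not supply.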

III. The Prosthetic, Not the Replacement

None of this is an argument against the utility of these tools; it is an argument for intellectual honesty about where that utility actually lives. The discourse around autonomous agents in particular has outrun the evidence by a considerable margin. The idea that a sufficiently well-specified goal can substitute for the judgment required to pursue it is precisely the formalist assumption the previous section identified as the core error, dressed now in the language of product roadmaps and hype-driven LinkedIn ramblings rather than natural philosophy.

Agents don't always act as humans intend... when AI systems pursue goals autonomously, they can sometimes take actions that seem reasonable to the system but aren't what humans actually wanted. — Anthropic, Our Framework for Developing Safe and Trustworthy Agents (2025)

Anthropic's own research found that their engineers, among the earliest and most sophisticated adopters of these tools, reported being able to fully delegate only 0-20% of their work to AI. The real value is narrower and more specific: handling well-specified, verifiable, low-stakes work that would otherwise consume time a skilled engineer could spend thinking at a higher level, surfacing problems earlier, generating boilerplate that can be reviewed and corrected, reducing the mechanical friction of exploration. Used this way, AI is not a replacement for judgment but a tool for compressing the feedback cycles through which judgment develops. The engineer who uses it to pressure-test a hypothesis, surface edge cases they had not considered, or rapidly aggregate context before a difficult conversation is developing expertise, not outsourcing it.

The distinction turns on how you engage with the output, and to understand why it matters it is worth being precise about how memory and expertise actually work. Cognitive psychologists distinguish between two fundamentally different modes of knowledge retrieval: recognition, the ability to identify something as familiar when encountered, and recall, the ability to independently produce and apply knowledge without external cues. The distinction is not merely academic.

The key process in memory is retrieval. — Endel Tulving, Elements of Episodic Memory (1983)

Tulving's insight, foundational to modern memory research, is that encoding information and being able to retrieve it under novel conditions are separate capacities that must be developed independently. A naive answer can be just as convincing as a correct one if enough supporting signals surround it, and passive consumption of AI output is particularly dangerous precisely because it satisfies the recognition threshold without developing the recall capacity underneath it. Recognition is cheap and easily replicated by a fluent probability machine. Recall, being able to produce and apply knowledge independently under novel conditions, is what actually constitutes expertise, and it is built only through active retrieval, error correction, and repeated engagement with problems from multiple angles. Research by Riedl and Bogert, studying over 52,000 individuals, found that higher-skilled practitioners use AI feedback more productively specifically because they seek it after failures rather than successes, using it to close genuine knowledge gaps rather than to confirm what they already know.

Higher-skilled decision-makers seek AI feedback more often and are far more likely to seek AI feedback after a failure, and benefit more from AI feedback than lower-skilled individuals. — Riedl & Bogert, Effects of AI Feedback on Learning, the Skill Gap, and Intellectual Diversity, Northeastern University (2024)

The disposition that produces those outcomes is the Socratic one: treat the tool as an interlocutor to interrogate rather than an oracle to defer to, push back on its conclusions, ask for the counterargument, stress-test its outputs against your own developing judgment. Used that way, AI becomes what it is actually well-suited to be, a means of accelerating the accumulation of understanding, not a substitute for it.

I worry much more about the oversight and supervision problem than I do about my skill set specifically... having my skills atrophy or fail to develop is primarily going to be problematic with respect to my ability to safely use AI for the tasks that I care about. — Anthropic engineer, How AI is Transforming Work at Anthropic (2025)

The value of the tool is bounded by the competence of the person using it. The fundamentals of software engineering have not shifted in any meaningful way; the merit of understanding how to coordinate the pieces, how to derive intent from ambiguity, and how to recognize when a solution has quietly drifted from the problem it was meant to solve is still where an engineer's value lives. The code was never the problem. It still isn't.

Conclusion

Every wave of technological abstraction has produced the same prediction, that the new capability would reduce or eliminate the need for deep human expertise, and every wave has instead revealed a new layer of complexity that only deeper expertise could navigate. The Hermeticists pursued a formula that would make the hard work of understanding unnecessary. The same error, restated in the language of autonomous agents and agentic loops, is no less seductive and no less mistaken. The practitioners who fall behind in this transition will not be those who lack access to these tools but those who use them to avoid the friction of genuine inquiry, running loops instead of asking harder questions, consuming outputs instead of interrogating them.

The alchemists never found the Philosopher's Stone, but the serious ones among them produced the foundations of modern chemistry as a byproduct of the search, because the discipline they were actually building was not the one they thought they were pursuing. Something similar is available to practitioners today. The engineers who approach AI as an instrument for compressing the distance between curiosity and understanding, between a half-formed hypothesis and a pressure-tested one, between encountering an unfamiliar problem and developing genuine fluency with it, are not falling behind the automation wave. They are using it to develop judgment faster than any previous generation could, and because their ability to learn and aggregate understanding is expedited, the quality of the work compounds alongside the volume. That is the actual leverage on offer, and it requires nothing more than a willingness to engage with these tools on honest terms, as a means of becoming more capable rather than a substitute for the effort that capability has always demanded.

Tulving, Endel. Elements of Episodic Memory. Oxford University Press, 1983.
Kambhampati, Subbarao et al. LLMs Can't Plan, But Can Help Planning in LLM-Modulo Frameworks. Proceedings of the 41st International Conference on Machine Learning, PMLR, 2024. proceedings.mlr.press
Riedl, Christoph and Bogert, Eric. Effects of AI Feedback on Learning, the Skill Gap, and Intellectual Diversity. Northeastern University, 2024. arxiv.org
Resurrecting Socrates in the Age of AI: A Study Protocol for Evaluating a Socratic Tutor to Support Research Question Development in Higher Education. 2025. arxiv.org
Anthropic. How AI is Transforming Work at Anthropic. December 2025. anthropic.com
Anthropic. Measuring AI Agent Autonomy in Practice. February 2026. anthropic.com
Anthropic. Our Framework for Developing Safe and Trustworthy Agents. August 2025. anthropic.com
Anthropic. Trustworthy Agents in Practice. April 2026. anthropic.com
Polanyi, Michael. The Tacit Dimension. Doubleday, 1966.
Dreyfus, Hubert L. Artificial Intelligence. The Annals of the American Academy of Political and Social Science, vol. 412, 1974. (The Dreyfus quote also appears in the original Alchemy and AI, RAND Corporation, 1965, which is not freely available online.)
Dreyfus, Hubert L. What Computers Can't Do: A Critique of Artificial Reason. Harper & Row, 1972.
Dreyfus, Hubert L. What Computers Still Can't Do. MIT Press, 1992.
Ryle, Gilbert. The Concept of Mind. Hutchinson, 1949.
Yates, Frances A. Giordano Bruno and the Hermetic Tradition. University of Chicago Press, 1964.
