You cannot use words to convey what warm sunshine feels like on your skin.

Not the physics — the feeling. The specific quality of ease that starts at the surface and radiates inward. The way it slows your breathing. The particular warmth that is different from a heater, different from a bath, different from every other source of warmth you’ve encountered. You know this feeling so intimately you could recognize it with your eyes closed. And you cannot put it into words.

You can say “warm sunshine on my skin” to another human, and it works. But what actually happened? You didn’t describe the sensation. You couldn’t — there is no sequence of words that transmits the warmth, the weight of light, that quality of ease. What you did was something more elegant and more dangerous: you used a label to activate a memory. You pointed at shared experience and trusted that your listener had the same one.

This works between humans with remarkable reliability. It works so well that we forget it’s happening. We treat language as if it describes reality, when in fact most of the time it merely indexes it — pointing to experiences that the reader must already possess in order for the words to mean anything at all.

The Invisible Operating System essay identified a vast substrate of tacit assumptions that human civilization runs on and AI lacks. It also named a ceiling: some of that substrate is constitutively tacit, meaning it can never be made explicit regardless of how hard we try. But it didn’t explain why the ceiling exists. It cited Polanyi — “we know more than we can tell” — and moved on to the implications.

It also introduced Nate Jones’s framework for the evolution of AI input — four disciplines diverging from what we used to call “prompting”: prompt craft, context engineering, intent engineering, and specification engineering. Each operates at a different altitude. Each requires the one below it. The essay promised to return to what these disciplines demand of organizational knowledge.

This essay goes back to the ceiling and looks up. And then it follows the implications downward, into what the ceiling means for the most ambitious of Jones’s disciplines — intent engineering and specification engineering — and why the experiential structure of language sets hard limits on what even the best-specified knowledge substrate can achieve.


The Explanatory Gap

In 1974, the philosopher Thomas Nagel asked a question that looks simple and isn’t: What is it like to be a bat?

Bats perceive the world through echolocation. We can study every detail of the neurology — the ultrasonic pulses, the cochlear processing, the spatial mapping in the auditory cortex. We can build complete computational models of how bat sonar works. We can know, in the fullest scientific sense, every physical fact about bat perception.

And yet. We still don’t know what it’s like to perceive through echolocation. We don’t know the texture of that experience from the inside. All of our knowledge is about the mechanism; none of it captures the experience.

In 1983, the philosopher Joseph Levine gave this problem a name: the explanatory gap. There is a gap between any physical description of an experience and the experience itself — a gap that no amount of additional description can close. Not because our science is insufficient, but because description and experience are different kinds of thing.

Frank Jackson made the point vivid with a thought experiment. Imagine Mary, a brilliant neuroscientist who has spent her entire life in a black-and-white room. Through textbooks, monitors, and exhaustive study, she has learned every physical fact about color vision — every wavelength, every neural pathway, every photoreceptor response. She knows everything about what happens when humans see red.

Then she walks outside and sees a ripe tomato for the first time.

Does she learn something new?

Nearly everyone’s intuition says yes. She learns what red looks like. And that piece of knowledge — the experiential knowledge — was not contained in any of the propositional knowledge she had before. She had a complete description. She was still missing the experience.

This matters for our purposes because every organizational document in the world has a Mary problem. The document contains propositions. The reader supplies the experience. The meaning was never in the message. It was in the receiver. And we’ve been so successful at this division of labor — so seamlessly good at it — that we forgot the division existed.


Language as Indexing

Ludwig Wittgenstein saw this from a different angle. He asked: what makes sensation words meaningful? When you say “pain,” what gives the word its content?

The naive answer is that “pain” refers to the sensation. But Wittgenstein dismantled this with a thought experiment he called the beetle in the box. Imagine everyone carries a box with something inside it they call a “beetle.” No one can look in anyone else’s box. Everyone is sure they know what a beetle is because they have one. But the word “beetle” doesn’t get its meaning from the thing in the box — because nobody can compare beetles. The beetle “drops out of consideration.” What remains is the shared, public practice of using the word.

At first, this seems to undermine the experiential index thesis. If the private experience doesn’t determine meaning, then language isn’t indexing experience — it’s just performing a shared social practice.

But follow the thread. The shared social practice depends on the shared experience. It works because humans, by virtue of having the same kind of bodies and the same kind of nervous systems, developed the same behavioral repertoire around the same experiences. “Pain” works as a word not because it describes the quale, but because all humans who have felt pain developed similar responses to it — wincing, withdrawal, crying out — and the word grew from and into that shared behavioral landscape. The experience doesn’t determine the meaning in isolation, but it makes the social practice possible in the first place.

Now remove the shared experience. Give the word to an entity that has never been in pain, has no body, has never winced or withdrawn. The social practice that gave the word meaning does not transfer. The word arrives, but without the experiential substrate that made it work between humans, it is an empty symbol — pointing to something the receiver cannot access.

This is exactly what happens when an AI agent reads your incident response playbook and encounters the instruction “assess the severity.”


The Grounding Problem

In 1990, the cognitive scientist Stevan Harnad formalized what Wittgenstein had described philosophically. He called it the symbol grounding problem.

Imagine you speak no Chinese. Someone hands you a Chinese-to-Chinese dictionary. You look up a character. The definition is in Chinese. You look up those characters. More Chinese. You can look up characters forever and never reach meaning, because every definition is in terms of other definitions. The symbols are all defined in terms of each other. None of them are connected to anything outside the system.
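
As a toy illustration only, the structure is easy to show in a few lines of Python. The entries below are invented; the point is that every definition resolves to other entries in the same dictionary, so following them never exits the symbol system.

```python
# Toy dictionary-go-round: every entry is defined only in terms of other entries,
# so no lookup ever reaches anything outside the symbol system.
dictionary = {
    "large": "not small",
    "small": "not large",
    "not": "the opposite of so",
    "so": "not the opposite",
}

def resolve(word: str, hops: int = 6) -> None:
    """Chase definitions; every hop lands on more symbols, never on the world."""
    for _ in range(hops):
        definition = dictionary.get(word, "?")
        print(f"{word} -> {definition}")
        word = definition.split()[-1]  # follow the last word of the definition

resolve("large")
# large -> not small
# small -> not large
# large -> not small
# ... and so on: the lookups cycle through symbols without ever grounding.
```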

This is the structural condition of every AI language model. The tokens are defined by their statistical relationships to other tokens. The model has learned, with extraordinary sophistication, how symbols relate to each other. But no symbol in the system is grounded — connected through causal interaction to the thing it refers to.

Humans escape the dictionary-go-round because our words are grounded in sensorimotor experience. We learned “red” not from a definition, but from seeing red things. We learned “heavy” from lifting heavy objects. We learned “warm” from feeling warmth. Our entire linguistic system rests on a foundation of direct sensory interaction with the world — a foundation that is simply absent in any text-processing system.

The experiential index thesis says: this isn’t just a problem for sensation words. It’s a problem for most of language.


The Metaphor Beneath Everything

George Lakoff and Mark Johnson spent forty years demonstrating that abstract thought is not abstract at all. It is embodied metaphor — built from concrete, bodily experience and projected onto concepts that have no physical referent.

We speak of understanding as seeing: “I see what you mean,” “that’s a clear explanation,” “let me shed some light on this.” We speak of control as verticality: “she’s on top of the situation,” “he’s under my authority,” “standards are rising.” We speak of difficulty as physical weight: “that’s a heavy burden,” “lighten the workload,” “this weighs on me.” We speak of progress as forward motion: “we’re moving ahead,” “the project is on track,” “we’ve hit a roadblock.”

These are not poetic embellishments. They are the structural architecture of how we think about these concepts. They are so ubiquitous that they are invisible — which is exactly what makes them dangerous for specification.

Your deployment runbook says: “Ensure the service is healthy.”

The word “healthy” is a metaphor grounded in the embodied experience of biological wellness — vitality, responsiveness, absence of distress, normal functioning. Every engineer on your team understands it, not because the word describes what “healthy” means for this specific service, but because they all have bodies that have been healthy and sick, and they project that embodied understanding onto the service. They know, from bodily experience, what “healthy” feels like. They translate.

Your AI agent has never been healthy. It has never been sick. It has no body from which to project. The metaphor doesn’t land. The word is an experiential index pointing to a shelf the agent cannot reach. So it does what any system does with an unresolvable reference: it guesses. It infers from context. Sometimes it guesses right. Sometimes it deletes your production database.

And the terrifying thing is: this isn’t a special case. This is most of the language in most of your documents.


The Taxonomy of Unspecifiability

The Invisible Operating System essay drew a line between tacit knowledge that can be made explicit (the explication project) and tacit knowledge that cannot (the constitutive ceiling). But the research behind that ceiling reveals not a single barrier but a stratified landscape — five distinct levels at which language indexes experience rather than describing it, each with different properties and different implications.

Level 1: Raw sensation. Color, pain, warmth, taste, the feeling of acceleration, the sound of thunder. Pure qualia. The paradigm cases from philosophy of mind. No specification can transmit these; the best a document can do is label them, trusting that the reader has had the experience.

Level 2: Embodied metaphor. Abstract concepts structured by bodily experience. “Healthy service,” “clean architecture,” “solid foundation,” “sharp analysis,” “deep understanding.” This is the largest category in enterprise documentation and the most tractable. The metaphors can be unpacked into operational definitions — “healthy” can be specified as “responding to health checks within 200ms, CPU below 80%, error rate below 0.1%.” The specification engine’s highest-value work lives here: detecting embodied metaphors and prompting for operational translation. The key insight is that these metaphors are translatable because the evaluation function for correctness is deterministic — you can verify whether the service meets the criteria. The metaphor obscures a testable condition.

Level 3: Emotional and social intelligence. “Read the room.” “Use good judgment.” “Handle this diplomatically.” This is where the ceiling hardens. These instructions point to a shared emotional substrate built through years of embodied social interaction — the ability to sense discomfort, to calibrate tone, to know when someone’s “fine” means “not fine.” Unlike Level 2, the evaluation function here is not deterministic. You cannot write a test for “did you read the room correctly?” because the correct answer depends on embodied social perception that is itself constitutively tacit. Documents that delegate to social intelligence are delegating to an operating system that only embodied, socially developed beings have — and there is no specification that can substitute.

Level 4: Procedural expertise. Riding a bicycle. Debugging a complex system. Negotiating under pressure. Knowing when the sourdough is ready. Polanyi’s paradigm cases — knowledge that lives in practiced neural patterns, not in propositions. “The experienced engineer will know what to look for” is a statement that indexes thousands of hours of embodied practice.

Level 5: The intersubjective substrate. Organizational culture. Team dynamics. “How things work around here.” Not individual experiences but shared experiences — the accumulated residue of a community’s history, compressed into norms, expectations, and reference points that no individual member could fully articulate but that all members can navigate. This is the Invisible Operating System proper.


What This Means

The specification gap is not a documentation problem. It is not a problem of effort, process, or tooling. It is a structural property of human language.

Language evolved as a coordination mechanism for embodied social beings who share a common biological substrate. It was never designed to be a standalone description of reality. It was designed to be a set of pointers — efficient, compressed, beautiful in their economy — that activate shared understanding in beings who already possess the relevant experiences.

For the entire history of written communication, this worked. Every reader was human. Every reader had a body. Every reader had felt warmth, navigated social situations, understood what “healthy” means from the inside. The experiential indexes resolved automatically, silently, perfectly. Nobody noticed they were there.

AI is the first reader that breaks the indexing system.

Not because AI is stupid. Not because the models are too small or the training data insufficient. Because the experiential indexes in human language are pointers to embodied experience, and a system that has no body, has never been warm, has never been sick, has never read a room, has never felt the satisfaction of a clean solution — that system processes the words but cannot dereference the pointers. It operates on symbols whose most important content is stored in a library it cannot access.


The Ceiling of Intent Engineering

In Part 1, I introduced Nate Jones’s hierarchy of AI input disciplines: prompt craft, context engineering, intent engineering, specification engineering. Each builds on, and requires, the one beneath it.

What the experiential index thesis reveals is that each discipline hits a harder ceiling as its language becomes more experientially loaded. Context engineering is the most tractable — most context is propositional (“This is a production environment. The database is Postgres 16.”). Specification engineering is where embodied metaphor becomes dangerous (“Ensure the service is healthy” seems complete but resolves through embodied understanding the agent doesn’t have). Much can be translated. Some cannot.

Intent engineering hits the hardest ceiling. It answers: what does the organization want? And organizational intent is the most experientially saturated category of all.

Consider Jones’s paradigm case: Klarna. Their AI agent resolved 2.3 million conversations in the first month. Slashed resolution times. Projected $40 million in savings. Then customer satisfaction cratered — because the agent was optimizing for speed when the organizational intent was relationship quality.

“Relationship quality” is an experiential index operating at Levels 3 and 5 simultaneously. It indexes the embodied experience of what a good relationship feels like (Level 3: emotional intelligence — warmth, attentiveness, the felt sense that someone cares) and the intersubjective organizational understanding of what “quality” means in Klarna’s specific culture, for Klarna’s specific customers, given Klarna’s specific history (Level 5: the intersubjective substrate).

No specification document could have prevented the Klarna trap — not because the document writers were lazy, but because the intent itself was an experiential index. The humans who handled those customer interactions carried the intent in their bodies: the felt sense of when a conversation needed to slow down, when empathy mattered more than efficiency, when the customer’s tone shifted from irritation to distress. They didn’t consult a document. They read the room. They used embodied social intelligence calibrated by years of human interaction.

Jones’s “6 Reasons Your Work Is Hard” framework identifies the axes of difficulty: reasoning, effort, coordination, emotional intelligence, judgment, domain expertise, ambiguity. The experiential index thesis explains why these axes automate on different timelines. Reasoning and effort are largely propositional — they can be specified. Emotional intelligence and judgment are experiential indexes pointing to embodied understanding. They don’t automate on a different timeline because they’re “harder” in some generic sense. They resist automation because the language we use to describe them is not description at all — it’s shorthand for experiences that only embodied social beings have.


What Can Be Built

This does not mean specification is futile. It means specification must be understood as translation — the conversion of experiential indexes into propositional content that doesn’t require embodied experience to interpret.

Some translations are straightforward and enormously valuable. “Ensure the service is healthy” → “Verify that the /health endpoint returns 200 within 200ms, CPU utilization is below 80%, and error rate over the trailing 5 minutes is below 0.1%.” The embodied metaphor is replaced by operational criteria. An AI agent can now act on this with precision. Wherever the evaluation function for correctness is deterministic — wherever the translation can be tested — specification works.
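
To make the translation concrete, here is a minimal sketch of the operational version as an executable check. The ServiceMetrics fields, thresholds, and is_healthy function are illustrative assumptions rather than a prescribed implementation; what matters is that every criterion is deterministic and testable.

```python
from dataclasses import dataclass

@dataclass
class ServiceMetrics:
    health_status: int        # HTTP status code returned by the /health endpoint
    health_latency_ms: float  # response time of the /health probe, in milliseconds
    cpu_utilization: float    # CPU utilization as a fraction, 0.0 to 1.0
    error_rate_5m: float      # error rate over the trailing 5 minutes, as a fraction

def is_healthy(m: ServiceMetrics) -> bool:
    """Deterministic stand-in for 'the service is healthy': every criterion is testable."""
    return (
        m.health_status == 200
        and m.health_latency_ms <= 200
        and m.cpu_utilization < 0.80
        and m.error_rate_5m < 0.001
    )

# A snapshot that satisfies the specification:
print(is_healthy(ServiceMetrics(200, 120.0, 0.55, 0.0004)))  # True
```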

Some translations are possible but require human judgment that a specification engine can prompt for. “Use appropriate communication” → “What does ‘appropriate’ mean in this context? For this audience? At this level of escalation?” The engine can’t answer the question. But it can ask it, surface related documents that may contain the answer, and draft the specification from the human’s response. This is explication — the systematic practice of converting experiential indexes into propositional content, one question at a time.
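
A minimal sketch of that prompting loop, under the same hedges: scan for experientially loaded terms and emit the clarifying question each one requires. The term list and question templates below are invented for illustration; a real specification engine would need far richer detection.

```python
# Hypothetical term list: each experiential index maps to the question a human
# must answer before the instruction can become propositional content.
EXPERIENTIAL_TERMS = {
    "appropriate": "Appropriate for which audience, and judged by what criteria?",
    "healthy": "Which measurable conditions count as healthy for this service?",
    "severity": "Which observable signals map to each severity level?",
    "good judgment": "What decision rule applies when the obvious options conflict?",
}

def explication_prompts(document: str) -> list[str]:
    """Return one clarifying question per experiential index found in the document."""
    lowered = document.lower()
    return [
        f'Found "{term}": {question}'
        for term, question in EXPERIENTIAL_TERMS.items()
        if term in lowered
    ]

runbook = "Assess the severity, then use appropriate communication with the customer."
for prompt in explication_prompts(runbook):
    print(prompt)
```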

And some translations are impossible — not because we haven’t tried, but because the content being indexed is constitutively experiential. No amount of words can transmit what it feels like to debug a system at 3 AM with production down and customers angry. That experience shapes how an engineer reads every runbook in your organization, and no document can capture it. The honest response is to measure these — to know how much of a document is propositional content an AI can act on versus experiential index it can only guess at — and to tell the AI agent explicitly where its understanding ends.
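
Even the measurement can be sketched, crudely. The function below is an assumption-laden proxy rather than a validated metric; it exists only to show that the ratio of propositional content to experiential index is something a tool could compute and report alongside a document.

```python
def experiential_density(document: str, experiential_terms: set[str]) -> float:
    """Rough fraction of tokens that are experiential indexes rather than propositional content."""
    words = [w.strip(".,;:!?\"'").lower() for w in document.split()]
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in experiential_terms)
    return hits / len(words)

# Illustrative term list and document; both are invented for this example.
terms = {"healthy", "appropriate", "severity", "diplomatically", "clean"}
doc = "Ensure the service is healthy and escalate diplomatically if severity is high."
print(f"{experiential_density(doc, terms):.0%} of tokens are experiential indexes")  # 25%
```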

The Invisible Operating System described what breaks when AI enters a world built for humans. This essay explains why the break is structural — why the ceiling exists and what the taxonomy of unspecifiability looks like from beneath it. Part 2 will examine what we build in response: infrastructure designed with honest awareness that language is an indexing system, not a description system, and that the most important organizational knowledge lives in a layer that no document can fully capture — but that a specification engine can measure, partially translate, and honestly flag.

That is what it means to make documents honest about their own limitations. Not perfect. Honest.