
Taste Was Always the Job

A hand selecting a pocket watch from scattered clock parts and gears on a workbench

Slavoj Žižek has this thought experiment about sex toys. You take a vibrator and a fleshlight, plug them into each other, turn them both on, and let them go at it. Meanwhile the two actual humans sit at a nearby table, drinking tea and having a real conversation.[1]

The machines are doing it for us, buzzing in the background, and I'm free to do whatever I want... we have a nice talk; we have tea; we talk about movies. I talk with a lady because we really like each other.

I keep thinking about this in the context of AI and work. We've spent decades optimizing for the wrong thing: the mechanical part, the execution, the buzzing. And now that machines can handle the buzzing, we're sitting at the table for the first time, realizing the conversation is what mattered all along. Some of us are discovering we're great conversationalists. Others are discovering they never learned how to talk.

In February 2026, Paul Graham posted that taste is now the core differentiator.[2] Sam Altman told Fortune that even non-technical people can contribute to AGI teams if they have taste.[3] Suddenly taste was the word on everyone's lips. But the framing was wrong. Taste isn't becoming the job. It was always the job. We just couldn't see it through the fog of execution.


The Fog of Execution

For most of human history, execution was so expensive that it obscured what actually mattered.

  • A monk hand-copying a manuscript in 1440 couldn't afford to ask whether the book was worth reading. He had six months of lettering ahead of him.
  • A portrait painter in 1830 couldn't question whether the subject was worth depicting. The commission paid rent.
  • A recording engineer in 1955 couldn't waste studio time wondering if the song was any good. Professional tape cost $50 a reel, roughly $570 in today's dollars.[4]

The cost of making something was so high that the question of whether you should make it was a luxury.

  • 1440, the printing press. Execution cost ↓: book production went from months to days; cost per copy dropped ~80%. Taste demand ↑: the press didn't make editors obsolete. It invented them. Publishers, literary critics, and curators emerged because someone had to decide what was worth the ink.
  • 1839, photography. Execution cost ↓: capturing a likeness went from weeks of portrait sitting to seconds of exposure. Taste demand ↑: everyone said painting was dead. Instead, painters freed from representational labor became artists. Impressionism, Expressionism, and Cubism all followed within decades.
  • 1920s, recording and radio. Execution cost ↓: music distribution went from venue capacity to infinite broadcast reach. Taste demand ↑: A&R executives, producers, label curators. An entire industry built around a single question: what deserves to be heard?
  • 1985, desktop publishing. Execution cost ↓: typesetting went from $500/page specialists to anyone with a Macintosh and a LaserWriter. Taste demand ↑: 99% of flyers were hideous. Graphic design became a real profession specifically because execution without taste produced garbage at unprecedented scale.
  • 2005, YouTube and blogging. Execution cost ↓: publishing went from broadcast licenses and editorial gatekeepers to a WordPress install and a webcam. Taste demand ↑: 500 hours of video uploaded per minute by 2025. The people who built audiences were editors and curators, not the fastest typists.
  • 2025, AI agents. Execution cost ↓: software went from teams of engineers, months of sprints, and millions in payroll to an afternoon with an agent. Taste demand ↑: this is where we are now.

Then the cost dropped. And every single time, the same pattern emerged: the technology didn't eliminate the need for human judgment. It created entirely new professions dedicated to it. The people who thrived weren't the ones who could operate the new technology fastest. They were the ones who could decide what the technology should be pointed at.

This is Jevons Paradox applied to creative and knowledge work.[5] When coal got cheaper in the 1860s, England didn't use less coal; total consumption increased, because cheaper energy made more applications economically viable. Resistance to the idea was fierce: people assumed efficiency would reduce total demand. Instead demand exploded, because the constraint had been masking latent uses that were always there.

The same thing is happening with execution. When building software gets 10x cheaper, we don't build 1x the software with 10x less effort. We build 100x the software, and the question of whether any of it is good becomes the entire question.
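The 10x-cheaper, 100x-more arithmetic is just elastic demand. Here's a toy model, with every number assumed purely for illustration: if the count of projects worth building follows a constant-elasticity curve Q = a·C⁻ᵉ with e = 2, then a 10x cost drop yields 100x the software, and total spend rises rather than falls.

```python
# Toy Jevons model: demand for software as a function of unit execution cost.
# Assumes constant-elasticity demand Q = a * C^(-e) with e > 1 (elastic),
# i.e. cheaper execution unlocks latent projects. All numbers illustrative.

def projects_built(cost_per_project: float, elasticity: float = 2.0, a: float = 100.0) -> float:
    """Number of projects worth building at a given execution cost."""
    return a * cost_per_project ** (-elasticity)

before = projects_built(cost_per_project=1.0)  # 100 projects at unit cost
after = projects_built(cost_per_project=0.1)   # 10x cheaper -> 10,000 projects

print(f"{after / before:.0f}x more software")                    # 100x
print(f"total spend: {before * 1.0:.0f} -> {after * 0.1:.0f}")   # 100 -> 1000
```

With any elasticity above 1, an efficiency gain increases total consumption; that's the paradox. The question of which of those 10,000 projects deserve to exist is the part the curve doesn't answer.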


GitHub reported 1 billion AI-assisted code contributions in 2024, up from effectively zero two years before.[7] Y Combinator's W25 batch drew 4x the applications of the year prior.[6] Morgan Stanley flagged $2 trillion in SaaS market cap at risk from AI-driven commoditization of software.[11] The supply of software is exploding. The supply of good judgment about what software should exist has not changed.

Every drop in the cost of execution is a corresponding increase in the demand for taste. For someone who can look at the flood of output and say this one matters, that one doesn't, and here's why. Taste was always the job. We're just finally able to see it.


The Thirty-Year Window

We had a specific, historically unusual window (roughly 1995 to 2025) where you could build an entire career on execution alone. Could you get the code to compile? Could you ship on time? Could you scale to a million users without the servers falling over? The difficulty of execution created a fog thick enough that we rarely asked the more important question: should this thing exist at all? Is it good? Does anyone actually need it?

I've watched this fog lift in real time. A year ago, building a feature meant weeks of scoping, architecting, debugging, deploying. The process was so consuming that we rarely paused to ask whether the feature was right because we were too deep in the work of making it function. Now an agent builds it in an afternoon. The backlog that used to represent months of work gets cleared in days. And suddenly you're face-to-face with the question you'd been too busy to ask: is this actually good?

"I now write 95% of my code from my phone. I'm mass-producing software. I'm often mentally exhausted by 11am."

Simon Willison, co-creator of Django, describes this shift viscerally.[8] The cognitive load didn't decrease when AI took over the typing. It shifted. From the mechanical act of writing code to the much harder work of deciding what code should be written, evaluating whether the output is correct, and directing the next iteration. The exhaustion is real, but it's a different kind of exhaustion: judgment, not labor.

And the squeeze isn't evenly distributed:

  • Senior engineers amplify deep experience through agents. They thrive.
  • Juniors onboard faster than ever. AI closes the knowledge gap.
  • Mid-career engineers, the ones whose value was reliable execution, face the greatest pressure.

But the real shift isn't about individuals. When every company can spin up infinite AI employees, throughput stops being the constraint. The speed at which one person can go alone hits an asymptote. Institutions have to relearn how to go far together:

  • What work actually matters?
  • How do you review AI output at scale?
  • How do you build trust in decisions you didn't make?
  • How do you train people when the work keeps changing?
  • How do you redesign organizations around a surplus of intelligence bottlenecked by judgment?

This is the part most "taste" discourse misses. It's not just about individual discernment. Meaningful leverage under these conditions isn't about how much one person or one organization can produce. It's about how much context people, teams, and institutions can coordinate across humans and agents. The bottleneck has moved from doing the work to deciding which work matters, and from individual decisions to institutional judgment.


How Taste Is Actually Built

So taste is the job. But how do you actually develop it? That depends on what you think taste is. The discourse has fractured into at least five positions:

  • Choosing what to make (Paul Graham, Greg Brockman[2]): selection is the skill. Build the right thing.
  • Choosing what not to make (Eric De Castro[13]): restraint is the moat. Saying no is harder than saying yes.
  • Trained instinct (Emil Kowalski[9]): learnable through exposure, analysis, and practice.
  • Pattern recognition AI can learn (Paras Chopra, Nan Yu of Linear[14]): if taste is just good judgment, models will get there.
  • Conviction, not taste (Julie Zhuo[15], Ivan Zhao[16]): taste-as-prediction is trainable; will is the real differentiator.

These aren't contradictory. They're describing different layers of the same thing. Selection, restraint, instinct, judgment, conviction. The question is which layer matters most when AI can already handle the first few. I think the answer is all of them, in sequence, and we're currently watching AI climb the stack from the bottom.

But here's what they all agree on, even if they don't say it directly: taste was always the job. It was always underneath, always the thing that separated the best work from the functional. Execution just made it invisible.

Here's another way to see it. For any system you want to build, there are a million valid implementations. Different architectures, abstractions, tech stacks, tradeoffs. Most of them work. Very few are good. Taste is the prior that lets you navigate to the coherent region instead of sampling from "things that compile." That prior is built from years of accumulated context: shipping the wrong abstraction, optimizing for the wrong metric, building the feature users asked for instead of the one they needed.

Same tools, same AI. One path ships in 5 steps. The other is still firefighting at 9.

So how do you build it? Emil Kowalski has a useful analogy: when the first car came out, nobody cared about color or silhouette because the competition was a horse.[9] Basic transportation was the miracle. But once cars were everywhere, design became the differentiator, because the functional problem was solved. Software is at this exact inflection point. Shipping something that works is no longer impressive. An agent can do that. The question is whether what you shipped is worth using.

Kowalski's framework is three things:

  1. Surround yourself with great work. Deliberate exposure to the best in your field.
  2. Think critically about why you like it. Not "I like this design" but "this design works because the hierarchy guides my eye from the value prop to the CTA without me having to think about where to look next."
  3. Practice your craft relentlessly. Close the gap between your judgment and your output.

Taste isn't inborn preference. It's trained instinct.

I've noticed this pattern in my own work and it's been surprisingly direct. The weeks I spend reading great essays, studying products I admire, analyzing why a specific interface or API feels right, those are the weeks I make noticeably better decisions about what to build and how to build it. The weeks I'm heads-down executing without pausing to look up, I ship more but the work is mediocre. Speed without taste is just faster mediocrity.

Taste is also a reading list. The blogs you follow, the products you study, the people whose judgment you respect enough to learn from. The act of selection: choosing what's worth your attention, which ideas to absorb, which frameworks to internalize, which trends to ignore. That selection process is taste in action, long before you open an editor.


The Window Is Closing

Every previous technology in the timeline automated execution but couldn't touch judgment. A camera captured light but couldn't decide what was worth photographing. A printing press reproduced text but couldn't tell you what was worth reading. Clean division: machines handle production, humans handle selection.

AI breaks this. It automates cognitive execution, and it's climbing the taste stack faster than most people realize.

Taste is not one skill. A friend described it to me as a meta-capability[23], and the framing stuck: the ability to hold an entire system in your head and make judgments across every dimension simultaneously. When you review a pull request and something feels off, you're not doing one thing. You're evaluating whether this abstraction will survive the next three features, whether the API surface makes sense to someone who hasn't read the implementation, whether the test coverage exercises the failure modes that matter in production or just the ones that were easy to write.

All of that collapses into a single feeling: this isn't right.

That feeling is taste. It's the synthesis that no individual metric captures. And it's why AI-generated code still feels like slop even when it compiles and passes tests: each function is reasonable in isolation, but together they form an architecture nobody would have chosen on purpose. The model pattern-matched on what code looks like rather than reasoning about what the codebase needs.

But AI is building its own prior. Each generation climbs higher:

  • SWE-bench Verified: 4% → 94% in three years.[18][22] SWE-bench Pro, the harder architecture-level benchmark: 0% → 78% in one model generation.[21][22]
  • Humanity's Last Exam, 3,000 expert-level questions: 3% → 57% in fifteen months.[20][22]
  • Each new ARC-AGI version resets frontier scores to near zero, and each gets solved within a year. ARC-AGI-3, interactive agentic environments: every frontier model under 1%.[19]
AI performance relative to human baseline: image classification, reading comprehension, language understanding, competition math, and PhD-level science have all crossed the human line, with SWE-bench Verified at 94%. Older benchmarks already crossed; newer, harder ones are closing fast. Source: Stanford AI Index 2025, Anthropic.[17][18]

Think of taste as a stack of layers, surface to deep:

  • Code style and formatting
  • Component-level design decisions
  • System architecture and long-term maintainability
  • Product direction and user understanding
  • Vision: what should exist, for whom, and why

AI automates from the bottom up. With each generation, another layer crosses the line. The floor keeps rising, and the room keeps shrinking.

Marc Andreessen points to institutional navigation as one durable human advantage: the messy, political work of getting organizations to actually adopt change.[10] Sequoia makes a similar argument, that humans occupy "the edge" where intelligence meets reality, navigating trust, cultural context, and ethical judgment that models can't perceive.[12] These are real frictions. But they're not the fundamental bottleneck. The fundamental bottleneck is taste itself: holistic, multi-dimensional judgment is genuinely hard to develop. The deeper question is whether AI can develop judgment that requires caring about the outcome.

An AI can evaluate which design is better. It can't care which one ships.

Taste is pattern recognition plus instinct, and pattern recognition is learnable. Conviction is something else: having a stake in the outcome, refusing to ship what you don't believe in. That requires having something to lose. But conviction compounds too. The engineers I know who make consistently good calls aren't just opinionated. They've been wrong enough times to calibrate. Each regret sharpened the next decision.


The Feedback Loop

The people thriving right now were always exercising taste but were bottlenecked by execution. AI removed the bottleneck. Now they're amplified. The people struggling are discovering they don't know what to build when building is free.

But this isn't a race against AI. It's a feedback loop.

AI gets better, which raises the bar for human judgment. Humans develop deeper taste, which produces better work for AI to learn from. Each pushes the other higher. Two forces that don't compete so much as co-evolve.

But the loop only works if you have something to feed into it.

Engineers with taste enter this compounding cycle. Each AI iteration sharpens their judgment, which produces better prompts, which yields better output, which reveals subtler patterns. Engineers without taste never enter the loop. More iterations don't help when you can't tell good output from bad. The gap doesn't close. It widens with every generation.

The same compounding is happening at the lab level. Frontier labs now use their own models to build the next generation. Most of the code for future Claude models is written by Claude. Releases that took quarters ship in weeks. This is recursive self-improvement in practice, and once a lab establishes a lead, each cycle widens it. The better your model, the faster your next one arrives.

Recursive improvement vs. linear scaling: once AI writes its own code and designs its own training runs, the gap widens with every generation. The lab that compounds first doesn't just lead. It accelerates away.
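The divergence is easy to see in a toy simulation, with growth rates assumed purely for illustration: one lab's per-generation gain scales with its current capability, the other's is a fixed step, and the gap between them only widens.

```python
# Toy model of recursive self-improvement vs. linear scaling.
# Assumption (illustrative): the recursive lab's per-generation gain is
# proportional to its current capability; the linear lab gains a fixed step.

def simulate(generations: int = 8, rate: float = 0.5, step: float = 0.5):
    recursive, linear = 1.0, 1.0
    history = []
    for gen in range(generations):
        history.append((gen, recursive, linear))
        recursive *= 1.0 + rate  # gain scales with current capability
        linear += step           # gain is a fixed increment
    return history

for gen, r, l in simulate():
    print(f"G{gen}: recursive={r:5.2f}  linear={l:4.2f}  gap={r - l:+.2f}")
```

The exact numbers are arbitrary; the shape isn't. Any process whose gains feed back into the next cycle eventually outruns any process with constant gains, which is the whole argument for why an early compounding lead is hard to close.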

I notice this in my own work. Patterns I spent years developing intuition for now show up in model suggestions. Not perfectly, but recognizably. That doesn't make the intuition less valuable. It frees me to operate at a layer I couldn't reach before, because I was spending all my time on the layers below it. The floor rises, and so does the ceiling.

Every wave of automation in the timeline didn't just raise the bar. It changed what the bar measured. The printing press didn't make scribes more valuable. It created editorial judgment, a skill that didn't exist before the technology demanded it. AI will do the same. The taste that matters most three years from now is probably a kind of judgment we don't have a name for yet, one that emerges from the interaction itself.

I've never been more excited to work. The parts of building I always cared about most, the decisions about what should exist and why, used to be buried under months of execution. Now they're the whole job. The fog lifted, and what's underneath is the work I wanted to be doing all along.


Thanks to Xiuyu Li for reading a draft of this and pushing back on the parts that needed it.

References

  1. Slavoj Žižek on synthetic sex. Big Think interview. The vibrator-and-fleshlight thought experiment.
  2. Paul Graham on taste in the AI age. Feb 14, 2026. Greg Brockman QT: 'Taste is a new core skill.' ~3.7M combined impressions.
  3. Sam Altman on taste and AGI teams. Fortune, Feb 27, 2026
  4. Recording tape costs in the 1950s. Professional recording tape was ~$50/reel in 1955 dollars (~$570 adjusted)
  5. Jevons Paradox. William Stanley Jevons, 1865. Increased efficiency of coal use led to increased total coal consumption.
  6. Y Combinator W25 batch growth. 4x application increase year-over-year
  7. GitHub Octoverse 2024: 1B AI-assisted contributions. Reported Oct 2024
  8. Simon Willison on Lenny's Podcast. Apr 2026. 95% of code from phone, exhausted by 11am, 'hundreds of small prompts'
  9. Emil Kowalski: Developing Taste. Taste as trained instinct: exposure, analysis, practice. The car analogy.
  10. Marc Andreessen on Latent Space. Apr 3, 2026. Institutional resistance as the real bottleneck.
  11. Morgan Stanley: AI's Impact on $2T SaaS Market. 2025. SaaS market cap risk from AI commoditization of software.
  12. Sequoia: From Hierarchy to Intelligence. Humans at 'the edge' where intelligence meets reality
  13. Eric De Castro: Taste Is the Only Moat. Feb 2026. Taste defined by what you refuse to do — restraint as the moat.
  14. Is taste a 'new core skill'? Techies debate. Feb 2026. Paras Chopra and Nan Yu (Linear) on whether AI can learn taste as pattern recognition.
  15. Julie Zhuo: When AI Has Better Taste Than You. Conviction vs. taste — will as the real differentiator when AI can predict quality.
  16. The Ivanisms that power Notion: Ivan Zhao on taste and conviction. Taste as pattern recognition is replicable; movement-making conviction is not.
  17. Stanford HAI AI Index Report 2025. Comprehensive benchmark tracking: image classification, reading comprehension, language understanding, competition math, PhD-level science QA.
  18. SWE-bench Verified Leaderboard. Real-world software engineering benchmark. 4.4% → 81% resolved in ~2.5 years (Oct 2023 – Apr 2026).
  19. ARC Prize: ARC-AGI Benchmarks. ARC-AGI-1: o3 hit 87.5% (Dec 2024), now saturated at ~98%. ARC-AGI-2: launched at 4%, now ~85%. ARC-AGI-3: interactive/agentic, all frontier models <1% (Mar 2026).
  20. Humanity's Last Exam. 3,000 expert-level questions. Launch scores ~3-4% (Jan 2025), Gemini 2.5 Pro reached ~18% within months.
  21. SWE-bench Pro: Architecture-Level Coding Benchmark. Scale AI, Sep 2025. Multi-file edits, 100+ line changes.
  22. Anthropic: Project Glasswing. Apr 7, 2026. Claude Mythos Preview: SWE-bench Verified 93.9%, SWE-bench Pro 77.8%, HLE 56.8%.
  23. Xiuyu Li. Private conversation, Apr 2026.

If any of this resonated or you see it differently, I'd like to hear from you.