Karpathy Says 'Vibe Coding' Is Evolving Into 'Agentic Engineering' — Here's What That Means for Developers
By EndOfCoding
Andrej Karpathy (the Stanford PhD, former Tesla AI director, and OpenAI co-founder who coined the term 'vibe coding') has declared a conceptual evolution. In a widely circulated post from early May 2026, Karpathy described a shift from 'vibe coding' to what he calls 'agentic engineering': a more disciplined, systematic approach to AI-assisted development where developers act as engineering directors of fleets of AI agents, rather than lone coders vibing with a single model.
The original vibe coding framing (talking to AI in natural language, accepting the output, iterating fast) captured a genuine workflow shift that millions of developers adopted. But Karpathy's new framing acknowledges that the most capable practitioners have moved beyond that early phase. Agentic engineering isn't the death of vibe coding. It's vibe coding with structure: the same AI-native mindset, but applied at a higher level of abstraction where your role is orchestration, verification, and system design rather than line-by-line prompting.
This post unpacks what Karpathy actually said, what the practical difference is between vibe coding and agentic engineering, and how developers should think about their own progression along this spectrum.
What You'll Learn
By the end of this post you'll understand:
├── Karpathy's specific framing of agentic engineering and how it differs from vibe coding as he originally defined it
├── The practical workflow differences between a vibe coder and an agentic engineer (task decomposition, agent orchestration, verification patterns)
├── Which tools and techniques define the agentic engineering stack in 2026 (Claude Code, Devin 2.0, Windsurf 2.0, multi-agent pipelines)
├── How to assess where you are on the vibe coding to agentic engineering spectrum and what skills to build next
└── Why this conceptual shift matters for how you market yourself and your work in an AI-saturated job market
What Karpathy Actually Said
The framing came from Karpathy's public commentary in early May 2026, building on his original 2025 vibe coding definition. The key shift:
Vibe coding (original 2025 framing):
├── 'Fully give in to the vibes, embrace exponentials, and forget that the code even exists'
├── Describe what you want in natural language
├── Accept and iterate on AI output
├── Speed over precision, prototype over production
└── Developer as: creative director / product person
Agentic engineering (2026 evolution):
├── 'You are the engineering director of a fleet of AI agents'
├── Decompose complex work into agent-executable tasks
├── Define verification criteria before agent execution
├── Review agent output systematically, not naively
└── Developer as: systems architect / engineering manager
The key difference:
├── Vibe coding: one developer, one AI session, one feature
└── Agentic engineering: one developer, multiple parallel agents,
    multiple features running simultaneously with structured handoffs
Karpathy's point isn't that vibe coding was wrong — it's that the tools have matured to the point where the most effective practitioners have naturally evolved their approach. The 'vibe' is still there (AI-native, natural language, fast iteration), but it's been formalized into an engineering discipline.
The Practical Difference: Vibe Coder vs. Agentic Engineer
Here's how the same task looks at each level:
Task: 'Build a user authentication system for our Next.js app'
Vibe coder approach:
├── Opens Claude Code
├── Prompts: 'Add Supabase auth to this app with email/password
│ and Google OAuth'
├── Reviews output, fixes issues turn by turn
├── Tests manually in the browser
└── Ships when it looks right
Time: 2-4 hours. Quality: good for MVP. Issues: may miss edge cases,
RLS policies may be incomplete, error states may be underhandled.
Agentic engineer approach:
├── Writes a spec first:
│ - Auth flows: email/password, Google OAuth, password reset
│ - Protected routes: /dashboard, /settings, /profile
│ - RLS policies: users can only read/write their own data
│ - Error states: invalid credentials, expired session, rate limit
│ - Acceptance tests: automated test for each flow
│
├── Launches parallel agents:
│ Agent 1 (Claude Code): Implement Supabase auth + RLS policies
│ Agent 2 (Claude Code, new session): Write acceptance test suite
│ Agent 3 (Claude Code, new session): Generate error state UI components
│
├── Reviews each agent's output against the spec (not just vibes)
├── Runs the acceptance test suite before marking complete
└── Ships when tests pass, not just when it looks right
Time: 3-5 hours. Quality: production-ready. Issues: requires spec-writing
upfront, requires test infrastructure, requires reviewing 3x the output.
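In this example the agentic engineer approach takes roughly an hour longer (3-5 hours versus 2-4), but the output has been checked against a written spec and an automated test suite rather than a manual pass in the browser. That is what catches the categories of bugs the vibe coding run tends to miss: incomplete RLS policies, underhandled error states, and untested edge cases.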
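One way to make "review against the spec, not just vibes" concrete is to treat the spec as data and check that every item in it is claimed by at least one acceptance test. The sketch below is only an illustration in Python: the `Spec` class, the `uncovered` helper, and the test names are hypothetical and are not part of any tool named in this post.

```python
from dataclasses import dataclass, field


@dataclass
class Spec:
    """Hypothetical spec for the auth feature; item names mirror the example above."""
    flows: list[str] = field(default_factory=list)
    error_states: list[str] = field(default_factory=list)

    def items(self) -> list[str]:
        return self.flows + self.error_states


def uncovered(spec: Spec, tests: dict[str, list[str]]) -> list[str]:
    """Return spec items that no acceptance test claims to cover.

    `tests` maps a test name to the spec items it exercises.
    """
    covered = {item for claimed in tests.values() for item in claimed}
    return [item for item in spec.items() if item not in covered]


auth_spec = Spec(
    flows=["email/password", "google oauth", "password reset"],
    error_states=["invalid credentials", "expired session", "rate limit"],
)

# Illustrative mapping that a test-writing agent might report back.
reported_tests = {
    "test_login_with_password": ["email/password", "invalid credentials"],
    "test_login_with_google": ["google oauth"],
    "test_session_expiry": ["expired session"],
}

print(uncovered(auth_spec, reported_tests))
```

With this input the check reports `['password reset', 'rate limit']`, which is the kind of gap that "ships when it looks right" would not surface.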
The Agentic Engineering Stack in 2026
The tools that enable agentic engineering have all shipped major updates that make the workflow practical:
Orchestration layer:
├── Claude Code (Anthropic) — May 2026 release
│ ├── Agent View: visual display of all active agents and their status
│ ├── /goal command: set a high-level goal, Claude Code decomposes
│ │ it into subtasks and executes them in sequence
│ ├── Background sessions: launch agents that run while you work
│ │ on other things, notified on completion
│ └── Multi-agent coordination: agents can hand off context
│
├── Devin 2.0 (Cognition) — launched May 2026
│ ├── Agent-native IDE: built for long-running autonomous tasks
│ ├── $20/month starter plan: enterprise-class autonomous coding
│ │ accessible to individual developers
│ └── Async task execution: assign a task, Devin works independently
│
└── Windsurf 2.0 (Cognition, formerly Codeium) — April 2026
  ├── Agent Command Center: manage multiple AI agents in one UI
  ├── Devin integration: route tasks to Devin from within Windsurf
  └── Task queue management: prioritize and monitor agent workloads
Verification layer (new emphasis in agentic engineering):
├── Automated test suites that agents must pass before output accepted
├── Type checking and linting as hard gates on agent output
├── CLAUDE.md / project-level instructions as agent contracts
└── Human review checkpoints at architectural decision points
Spec layer (what differentiates agentic engineering from vibe coding):
├── PRDs and user stories written before agent execution
├── Acceptance criteria defined as test cases, not prose
├── Architecture decision records for choices that affect multiple agents
└── Clear scope boundaries: what each agent is responsible for
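The "hard gates" idea in the verification layer can be sketched as a small runner that accepts agent output only when every check exits with status 0. This is a minimal sketch, not how any of the tools above implement it: `run_gates`, `accept`, and the stand-in commands are assumptions for illustration, and in a real project you would substitute your own test, type-check, and lint commands.

```python
import subprocess
import sys
from typing import NamedTuple


class GateResult(NamedTuple):
    name: str
    passed: bool


def run_gates(gates: dict[str, list[str]]) -> list[GateResult]:
    """Run each gate command and record whether it exited with status 0."""
    results = []
    for name, command in gates.items():
        completed = subprocess.run(command, capture_output=True, text=True)
        results.append(GateResult(name, completed.returncode == 0))
    return results


def accept(results: list[GateResult]) -> bool:
    """Accept agent output only if every gate passed."""
    return all(result.passed for result in results)


# Stand-in commands so the sketch runs anywhere: one gate passes, one fails.
demo_gates = {
    "tests": [sys.executable, "-c", "raise SystemExit(0)"],
    "lint": [sys.executable, "-c", "raise SystemExit(1)"],
}

gate_results = run_gates(demo_gates)
for result in gate_results:
    print(f"{result.name}: {'pass' if result.passed else 'FAIL'}")
print("accepted" if accept(gate_results) else "rejected")
```

The design choice worth noting is that acceptance is all-or-nothing: a single failing gate rejects the output, regardless of how good the diff looks.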
Where You Are on the Spectrum
Karpathy's framing implies a progression. Here's a diagnostic:
Level 1 — AI-Assisted Coding:
├── You write code; AI completes snippets
├── AI is a smarter autocomplete
└── No fundamental change to how you think about development
Level 2 — Vibe Coding (original framing):
├── You describe features in natural language
├── AI generates the implementation
├── You review and iterate
├── Fundamentally faster than traditional coding
└── Where most developers are today (2026)
Level 3 — Agentic Coding:
├── You orchestrate multiple AI sessions
├── You think in tasks that agents can execute
├── Agents run in parallel; you review the outputs
├── Test suites as verification, not just manual review
└── Where power users are in 2026
Level 4 — Agentic Engineering (Karpathy's new framing):
├── You write specs before agents execute
├── Multiple agents with defined contracts and handoffs
├── Systematic verification at every stage
├── You are the engineering director; agents are your team
└── Where top 5-10% of AI-native developers operate
Level 5 — AI-Native Architecture:
├── The system itself is agentic (production agents, not just dev agents)
├── You design systems where AI agents run business logic
├── Developer as architect of human-AI systems
└── Emerging — a small number of companies operate here today
Most vibe coders today are operating at Level 2-3. Karpathy's framing describes Level 4, and signals that Level 5 is the next frontier.
How to Develop Agentic Engineering Skills
Moving from vibe coder to agentic engineer is a skill progression, not a tool swap:
Skill 1: Spec writing before execution
├── Before launching any agent on a complex task, write:
│ - What problem this solves (1-2 sentences)
│ - What the output looks like when done correctly
│ - What the acceptance tests are
│ - What the agent should NOT do (scope boundaries)
└── Practice: start every Claude Code session with 'here's the spec'
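The four-part spec above can be kept as a small template so that gaps show up before an agent is launched. The following is a sketch under assumptions: `TaskSpec`, `missing_parts`, and `render` are hypothetical names, and the rendered text is simply something you could paste at the start of a session, not a format any tool requires.

```python
from dataclasses import dataclass


@dataclass
class TaskSpec:
    """The four parts of a pre-execution spec (hypothetical structure)."""
    problem: str
    done_when: str
    acceptance_tests: list[str]
    out_of_scope: list[str]

    def missing_parts(self) -> list[str]:
        """Name any part that is still empty, so gaps surface before launch."""
        parts = {
            "problem": self.problem.strip(),
            "done_when": self.done_when.strip(),
            "acceptance_tests": self.acceptance_tests,
            "out_of_scope": self.out_of_scope,
        }
        return [name for name, value in parts.items() if not value]

    def render(self) -> str:
        """Format the spec as plain text to paste at the start of a session."""
        lines = [
            "Here's the spec.",
            f"Problem: {self.problem}",
            f"Done when: {self.done_when}",
            "Acceptance tests:",
            *(f"- {test}" for test in self.acceptance_tests),
            "Do NOT:",
            *(f"- {item}" for item in self.out_of_scope),
        ]
        return "\n".join(lines)


reset_spec = TaskSpec(
    problem="Users cannot reset a forgotten password.",
    done_when="A reset email link lets the user set a new password and sign in.",
    acceptance_tests=["reset link stops working after one use"],
    out_of_scope=[],  # deliberately left empty to show the check firing
)

print(reset_spec.missing_parts())
print(reset_spec.render())
```

Here `missing_parts()` returns `['out_of_scope']`, a reminder that the scope boundaries have not been written yet.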
Skill 2: Task decomposition
├── Break complex features into agent-executable units
├── Good agent task: self-contained, verifiable, 30-90 minute scope
├── Bad agent task: 'build the entire backend'
└── Practice: take your last 5 features and decompose them in writing
Skill 3: Parallel agent orchestration
├── Identify which tasks are independent (can run in parallel)
├── Identify which tasks have dependencies (must run sequentially)
├── Launch parallel agents in separate Claude Code sessions
└── Practice: next feature, deliberately split into 2-3 parallel agents
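Separating independent tasks from dependent ones is a topological sort problem, and Python's standard `graphlib` module can group tasks into "waves" that are safe to run in parallel. The sketch below assumes a hand-written dependency map; the task names and the `parallel_waves` helper are illustrative, and actually launching the agents is left out.

```python
from graphlib import TopologicalSorter


def parallel_waves(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into waves; no task in a wave depends on another task in it.

    `dependencies` maps each task to the set of tasks it must wait for.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything unblocked right now
        waves.append(ready)
        sorter.done(*ready)  # mark the wave finished to unblock the next one
    return waves


tasks = {
    "auth + RLS policies": set(),
    "error state UI": set(),
    "acceptance tests": set(),
    "wire UI to auth": {"auth + RLS policies", "error state UI"},
    "run acceptance suite": {"wire UI to auth", "acceptance tests"},
}

for number, wave in enumerate(parallel_waves(tasks), start=1):
    print(f"wave {number}: {wave}")
```

For this map the first wave contains the three independent tasks, followed by "wire UI to auth" and then "run acceptance suite", each alone in its own wave.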
Skill 4: Systematic verification
├── Define acceptance tests BEFORE agent execution
├── Run tests on every agent output, not just the final one
├── Treat a failing test as valuable signal, not a failure
└── Practice: write tests for your next feature before prompting
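To illustrate defining acceptance tests before execution, the sketch below writes the cases first and then evaluates two candidate implementations against them. Everything here is hypothetical: the "users can only read their own data" rule is borrowed from the auth example earlier in this post, and `permissive_draft` stands in for a first attempt an agent might return.

```python
from typing import Callable, Optional

# Acceptance cases written before any implementation exists.
# Each case: (requesting user id, row owner id, expected access decision).
CASES = [
    ("alice", "alice", True),   # users can read their own data
    ("alice", "bob", False),    # users cannot read someone else's data
    (None, "alice", False),     # anonymous requests are rejected
]

AccessCheck = Callable[[Optional[str], str], bool]


def failures(check: AccessCheck) -> list[tuple]:
    """Return the acceptance cases that `check` gets wrong."""
    return [case for case in CASES if check(case[0], case[1]) != case[2]]


def permissive_draft(user_id: Optional[str], owner_id: str) -> bool:
    """A first draft: any signed-in user gets access."""
    return user_id is not None


def owner_only(user_id: Optional[str], owner_id: str) -> bool:
    """A revised draft: only the owner of the row gets access."""
    return user_id is not None and user_id == owner_id


print(failures(permissive_draft))
print(failures(owner_only))
```

The first draft fails exactly one case, `('alice', 'bob', False)`, which tells you what to ask for next. The revised draft returns an empty list.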
Skill 5: Agent contract design
├── CLAUDE.md as a contract that all agents must follow
├── Clear naming conventions, coding patterns, architectural constraints
├── Agents should be able to execute correctly without your presence
└── Practice: review your CLAUDE.md — does it fully constrain agent behavior?
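A lightweight way to review a CLAUDE.md is to check that it contains a heading for each thing the contract is supposed to constrain. The sketch below assumes the file is written in Markdown with `#`-style headings; the required section names are hypothetical examples, not a format Claude Code mandates.

```python
import re

# Hypothetical section names; choose whatever your own contract needs.
REQUIRED_SECTIONS = (
    "Coding patterns",
    "Naming conventions",
    "Testing requirements",
    "Never do",
)


def missing_sections(markdown: str, required=REQUIRED_SECTIONS) -> list[str]:
    """Return required names that do not appear as Markdown headings."""
    headings = {
        match.group(1).strip().lower()
        for match in re.finditer(r"^#+\s+(.+)$", markdown, flags=re.MULTILINE)
    }
    return [name for name in required if name.lower() not in headings]


example_contract = """# Project contract
## Coding patterns
Use server components by default.
## Naming conventions
Files in kebab-case.
## Never do
Never commit secrets.
"""

print(missing_sections(example_contract))
```

For the example text this reports `['Testing requirements']`. A check like this only confirms the sections exist; whether their contents actually constrain agent behavior still needs a human read.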
Common Challenges
'Is agentic engineering just software engineering with extra steps?'
It's closer to software engineering than original vibe coding was. The key difference is that agentic engineering is optimized for AI execution speed and scale. Traditional software engineering assumed a human would execute every step. Agentic engineering assumes agents execute most steps, which changes how you structure tasks (smaller, more discrete), write specs (machine-executable, not just human-readable), and verify output (automated tests first, with human review reserved for architectural decision points).
'Do I need to learn all of this to stay relevant?'
No. Most developers will land at Level 3 (agentic coding) and that will be a productive, well-compensated place to work. Karpathy's Level 4 framing is aspirational: it describes the highest-leverage way to use current tools, not the minimum viable skill level.
'What's the fastest way to move from Level 2 to Level 3?'
Start writing acceptance criteria before you prompt. This single habit changes how you interact with AI coding tools more than any other change. When you know what 'done' looks like before you start, you evaluate agent output objectively rather than optimistically.
'Will agentic engineering make individual developers less relevant?'
The opposite. Agentic engineers who can orchestrate 5-10 agents simultaneously are delivering 10-50x the output of traditional developers. The constraint isn't the number of developers; it's the number of developers who can orchestrate agents effectively. That skill is currently scarce and commands premium compensation.
Advanced Tips
Treat your CLAUDE.md as an agent contract, not just a preferences file. The most effective agentic engineers maintain a CLAUDE.md that fully constrains agent behavior: coding patterns, naming conventions, testing requirements, architectural constraints, what to never do. When an agent can execute correctly from your CLAUDE.md alone, you've reached engineering-grade agent management.
Use the /goal command in Claude Code to practice task decomposition. Set a high-level goal and watch how Claude Code breaks it into subtasks. If the decomposition looks wrong, that's a signal your spec was underspecified, so refine it before execution.
Build a personal agent playbook. Document the task types where agents consistently perform well in your codebase and those where they consistently need more direction. Over time, this becomes a routing guide: these task types get a full spec and parallel agents; these simpler tasks get a quick prompt.
Invest in test infrastructure early. The verification layer is what separates agentic engineering from advanced vibe coding. If you don't have an automated test suite, agents have no objective completion criterion. Even a minimal Playwright E2E test suite gives you a hard gate on agent output.
The Vibe Coding Academy Advanced Track covers agentic engineering workflows in Module 11 (Multi-Agent Development), including hands-on orchestration exercises with Claude Code's Agent View. The conceptual foundations are covered in the Vibe Coding Ebook Chapter 6, which has been updated with Karpathy's agentic engineering framing. Follow ongoing developments at EndOfCoding.
Conclusion
Karpathy's framing of agentic engineering isn't a rejection of vibe coding — it's its natural evolution. The developers who got the most value from vibe coding in 2025 are the ones who are now most capable of making the transition to agentic engineering in 2026. The tools to support this transition have arrived simultaneously: Claude Code's Agent View and background sessions, Devin 2.0's autonomous execution model, Windsurf 2.0's Agent Command Center. The gap between what a skilled agentic engineer can build alone and what a traditional team can build is widening. The good news is that the path from vibe coder to agentic engineer is learnable and incremental: start writing specs before you prompt, add acceptance tests before execution, and practice parallel agent orchestration on your next feature. The Vibe Coding Academy curriculum covers this progression from beginner prompt-to-feature workflows through Advanced Track multi-agent orchestration. Follow Karpathy and the AI-native engineering community at EndOfCoding for ongoing developments.