Codex Levels 0-10: Definitions for All 11 Stages
About 10 minutes
Codex maturity cannot be measured by the number of features used. The important question is how much work can be delegated and how well context, verification, permissions, and review support that delegation.
This page defines Codex maturity from Level 0 through Level 10. Because Level 0 is included, the model has 11 stages. Not every project needs Level 10. Choose a level that matches the risk of the changes, team size, and frequency of operation.
The concrete examples share one theme: building and operating a Next.js personal portfolio site. Following a single project makes it easy to compare how delegation expands, from a small project-card edit to adding projects, updating the skills page, and running publication workflows.
The 11-Stage Model
| Level | Title | Core capability | Typical deliverable |
|---|---|---|---|
| Level 0 | Chat Advisor | Coding advice | Answer or sample code |
| Level 1 | Repository Reader | Repository understanding | Related-file map and investigation results |
| Level 2 | Focused Editor | Bounded editing | Single-responsibility diff |
| Level 3 | Verified Implementer | Verified implementation | Small feature and test results |
| Level 4 | Context Engineer | Persistent context | AGENTS.md and shared rules |
| Level 5 | GitHub Collaborator | Issue and PR collaboration | Branch, PR, and review record |
| Level 6 | Harness Builder | Safe working environment | Permissions, approvals, checks, and constraints |
| Level 7 | Tool Operator | External tool operation | Browser checks and MCP or connector actions |
| Level 8 | Parallel Orchestrator | Parallel task management | Task plan, worktrees, and subagent results |
| Level 9 | Workflow Operator | Recurring operations | CI, scheduled runs, and triage workflows |
| Level 10 | Agent Platform Architect | Organization-scale platform design | Agent roles, evaluations, audit, and improvement loops |
Level 0: Chat Advisor
State: You ask coding or design questions and apply the answer manually. Codex does not inspect the target repository, edit files, or run commands.
Typical work: Explain an error, generate a function example, or compare design options.
Limitation: The answer may not match the actual dependencies, conventions, or implementation.
Move forward when: Codex can inspect the repository and explain the code using real files as evidence.
Level 1: Repository Reader
State: Codex reads the working tree and explains code structure, related files, and likely causes. Investigation is the main activity; editing is optional.
Typical work: Locate the authentication entry point, trace a bug, or identify tests affected by a proposed change.
Completion standard: The result includes file paths and code evidence that a human can verify.
Move forward when: You can define in-scope and out-of-scope areas and delegate a small edit.
Level 2: Focused Editor
State: Codex makes a change limited to one file or one responsibility. A human reads the diff for unintended changes.
Typical work: Add one validation rule, correct copy, or perform a small refactor using an existing pattern.
Completion standard: The request defines what may change, what must not change, and the expected result.
Move forward when: Tests or linting become part of the same definition of done as the edit.
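The Level 2 completion standard can be sketched as a small scope check. This is a minimal illustration, not a Codex feature: the `ScopedEditRequest` shape, the `findOutOfScopePaths` helper, and the file paths are all hypothetical names chosen for the portfolio example.

```typescript
// Hypothetical shape for a Level 2 request; none of these names come from Codex itself.
interface ScopedEditRequest {
  mayChange: string[]; // path prefixes the edit is allowed to touch
  mustNotChange: string[]; // path prefixes that must stay untouched
  expectedResult: string; // human-readable statement of the outcome
}

// Returns the changed paths that fall outside the agreed scope:
// anything under a forbidden prefix, or anything not under an allowed prefix.
function findOutOfScopePaths(request: ScopedEditRequest, changedPaths: string[]): string[] {
  return changedPaths.filter((path) => {
    const forbidden = request.mustNotChange.some((prefix) => path.startsWith(prefix));
    const allowed = request.mayChange.some((prefix) => path.startsWith(prefix));
    return forbidden || !allowed;
  });
}

// Illustrative request for a project-card copy change (paths are assumptions).
const editRequest: ScopedEditRequest = {
  mayChange: ["components/ProjectCard.tsx"],
  mustNotChange: ["app/", "package.json"],
  expectedResult: "The project card shows the updated description.",
};

const outOfScope = findOutOfScopePaths(editRequest, [
  "components/ProjectCard.tsx",
  "package.json",
]);
console.log(outOfScope); // paths a reviewer should question
```

A reviewer could run a check like this against the list of changed files before reading the diff itself, so that unintended changes surface first.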
Level 3: Verified Implementer
State: Codex updates several related files and completes a small feature or bug fix, including verification.
Typical work: Update a form, validator, and tests, then run the targeted test command.
Completion standard: The final report includes changed files, commands run, success or failure, and anything unverified.
Move forward when: Repeated conventions and commands are moved into repository instructions.
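One way to make the Level 3 final report checkable is to give it a fixed shape. The sketch below assumes a hypothetical `VerificationReport` type and `reportStatus` helper; the field names, file paths, and test command are illustrative and do not describe an actual Codex output format.

```typescript
// Hypothetical report shape; field names are illustrative, not a Codex output format.
interface CommandResult {
  command: string;
  succeeded: boolean;
}

interface VerificationReport {
  changedFiles: string[];
  commandsRun: CommandResult[];
  unverified: string[]; // anything the agent could not check
}

type ReportStatus = "failed" | "incomplete" | "verified";

// A report counts as verified only when every command passed,
// at least one command ran, and nothing is listed as unverified.
function reportStatus(report: VerificationReport): ReportStatus {
  if (report.commandsRun.some((result) => !result.succeeded)) return "failed";
  if (report.commandsRun.length === 0 || report.unverified.length > 0) return "incomplete";
  return "verified";
}

// Illustrative report for a contact-form change (paths and command are assumptions).
const formReport: VerificationReport = {
  changedFiles: [
    "components/ContactForm.tsx",
    "lib/validateContact.ts",
    "lib/validateContact.test.ts",
  ],
  commandsRun: [{ command: "npm test -- validateContact", succeeded: true }],
  unverified: ["Form layout on mobile widths"],
};

const formStatus = reportStatus(formReport);
console.log(formStatus);
```

Treating a non-empty `unverified` list as "incomplete" rather than "verified" keeps the gaps visible to the human reviewer instead of hiding them behind passing tests.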
Level 4: Context Engineer
State: Files such as AGENTS.md continuously provide the technology stack, editing rules, verification commands, and prohibited actions.
Typical work: Use a root AGENTS.md as an entry point to domain-specific rules and skills.
Completion standard: New threads receive the same working agreement without repeating it in every prompt.
Move forward when: Work expands from the local tree into issues, branches, pull requests, and reviews.
Level 5: GitHub Collaborator
State: Codex reads GitHub issues and pull requests and assists with branch creation, implementation, PR creation, and review feedback.
Typical work: Extract acceptance criteria from an issue, draft a PR summary, or address review comments.
Completion standard: Commit scope, branch policy, review ownership, and merge permission are explicit.
Move forward when: Permissions, approvals, and safety checks are systematized instead of specified case by case.
Level 6: Harness Builder
State: You design a harness that gives Codex repeatable rules, skills, checks, permissions, approval conditions, prohibited actions, and failure procedures.
Typical work: Require approval for production builds, protect folders, run standard checks after edits, and validate shared policies automatically.
Completion standard: Dangerous actions stop for approval, routine changes follow reproducible procedures, and policy violations are detected.
Move forward when: Tools beyond files and the shell are connected with limited purpose and permissions.
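The harness rules described for Level 6 can be pictured as a policy that every proposed action passes through. The following is a sketch under stated assumptions: `HarnessPolicy`, `ProposedAction`, and `decide` are hypothetical names, the folder and commands are examples for the portfolio site, and a real harness would rely on the permission and approval settings of the tool in use rather than on this code.

```typescript
type Decision = "allow" | "require-approval" | "deny";

// Hypothetical policy shape; not a Codex configuration format.
interface HarnessPolicy {
  protectedFolders: string[]; // path prefixes that may never be edited
  approvalRequiredCommands: string[]; // command prefixes that pause for a human
  checksAfterEdit: string[]; // standard checks to run after an allowed edit
}

type ProposedAction =
  | { kind: "edit"; path: string }
  | { kind: "command"; command: string };

interface HarnessVerdict {
  decision: Decision;
  followUpChecks: string[];
}

// Edits to protected folders are denied, listed commands wait for approval,
// and allowed edits are followed by the standard checks.
function decide(policy: HarnessPolicy, action: ProposedAction): HarnessVerdict {
  if (action.kind === "edit") {
    const { path } = action;
    const isProtected = policy.protectedFolders.some((folder) => path.startsWith(folder));
    return isProtected
      ? { decision: "deny", followUpChecks: [] }
      : { decision: "allow", followUpChecks: policy.checksAfterEdit };
  }
  const { command } = action;
  const needsApproval = policy.approvalRequiredCommands.some((prefix) =>
    command.startsWith(prefix),
  );
  return { decision: needsApproval ? "require-approval" : "allow", followUpChecks: [] };
}

// Illustrative policy (folder and commands are assumptions).
const portfolioPolicy: HarnessPolicy = {
  protectedFolders: ["public/resume/"],
  approvalRequiredCommands: ["npm run build"],
  checksAfterEdit: ["npm run lint", "npm test"],
};

console.log(decide(portfolioPolicy, { kind: "command", command: "npm run build" }));
```

Keeping the policy as data means the same rules can be validated automatically and reused across threads instead of being restated in each prompt.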
Level 7: Tool Operator
State: Codex uses browsers, MCP, connectors, images, or document tools and verifies evidence outside the codebase.
Typical work: Inspect an article list in a browser, read GitHub or CMS post data through a connector, or compare a screenshot with an implementation.
Completion standard: Read and write permissions, sensitive-data boundaries, and consequential actions requiring confirmation are defined for every tool.
Move forward when: Independent work can be split into non-conflicting parallel tasks.
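The Level 7 completion standard asks for boundaries on every tool. As a rough illustration, a per-tool permission table might look like the sketch below. The `ToolPermission` type, the `checkToolUse` helper, and the tool names are hypothetical; they do not correspond to an MCP or connector API.

```typescript
type Access = "none" | "read" | "read-write";

// Hypothetical per-tool permission entry.
interface ToolPermission {
  tool: string;
  access: Access;
  confirmBeforeWrite: boolean; // consequential actions pause for a human
}

type ToolCheck = "allowed" | "needs-confirmation" | "blocked";

// Tools without an entry are blocked, so a newly connected tool has no access by default.
function checkToolUse(
  permissions: ToolPermission[],
  tool: string,
  wantsWrite: boolean,
): ToolCheck {
  const permission = permissions.find((entry) => entry.tool === tool);
  if (!permission || permission.access === "none") return "blocked";
  if (!wantsWrite) return "allowed";
  if (permission.access !== "read-write") return "blocked";
  return permission.confirmBeforeWrite ? "needs-confirmation" : "allowed";
}

// Illustrative table (tool names are assumptions).
const toolPermissions: ToolPermission[] = [
  { tool: "browser", access: "read", confirmBeforeWrite: false },
  { tool: "cms", access: "read-write", confirmBeforeWrite: true },
];

console.log(checkToolUse(toolPermissions, "cms", true));
```

Blocking undefined tools by default matches the requirement that permissions are defined for every tool before it is used.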
Level 8: Parallel Orchestrator
State: Multiple threads, worktrees, cloud tasks, or subagents execute independent work in parallel.
Typical work: Separate implementation, test additions, and documentation updates, then perform integration verification.
Completion standard: File ownership, dependencies, integration order, and conflict ownership are explicit. Multiple tasks do not edit the same files simultaneously.
Move forward when: Parallel work becomes part of recurring or event-driven operations.
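The rule that multiple tasks do not edit the same files can be checked before any parallel work starts. The sketch below is one possible approach, assuming a hypothetical `ParallelTask` shape in which each task declares the files it owns; the task names and paths are illustrative.

```typescript
// Hypothetical task plan entry: each task declares the files it owns up front.
interface ParallelTask {
  name: string;
  ownedFiles: string[];
}

// Returns every file claimed by more than one task, mapped to the claiming task names.
function findOwnershipConflicts(tasks: ParallelTask[]): Record<string, string[]> {
  const owners: Record<string, string[]> = {};
  for (const task of tasks) {
    for (const file of task.ownedFiles) {
      owners[file] = [...(owners[file] ?? []), task.name];
    }
  }
  const conflicts: Record<string, string[]> = {};
  for (const file of Object.keys(owners)) {
    if (owners[file].length > 1) conflicts[file] = owners[file];
  }
  return conflicts;
}

// Illustrative plan with one overlap (paths are assumptions).
const taskPlan: ParallelTask[] = [
  { name: "implementation", ownedFiles: ["app/projects/page.tsx"] },
  { name: "tests", ownedFiles: ["app/projects/page.test.tsx"] },
  { name: "docs", ownedFiles: ["README.md", "app/projects/page.tsx"] },
];

const ownershipConflicts = findOwnershipConflicts(taskPlan);
console.log(ownershipConflicts);
```

An empty result means the plan can be split across worktrees or subagents; any entry points to a file that needs a single owner or an explicit integration order.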
Level 9: Workflow Operator
State: Codex participates in recurring workflows such as CI investigation, issue triage, dependency analysis, and documentation synchronization.
Typical work: Perform first-pass CI failure analysis, identify low-coverage areas, or run scheduled consistency checks.
Completion standard: Triggers, stop conditions, timeouts, notifications, retries, audit logs, and human handoff conditions are defined.
Move forward when: Individual workflows are managed as a shared platform and continuously evaluated.
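Part of the Level 9 completion standard (retries, a stop condition, an audit log, and human handoff) can be sketched as a small runner. This is a simplified, synchronous illustration with hypothetical names (`WorkflowPolicy`, `runWithPolicy`); triggers are represented only as a label, and timeouts and notifications are left out to keep the example short.

```typescript
// Hypothetical workflow policy; the trigger is only a label in this sketch.
interface WorkflowPolicy {
  trigger: string;
  maxAttempts: number; // stop condition for retries
  handoffTo: string; // who takes over when retries are exhausted
}

type StepOutcome =
  | { kind: "ok"; summary: string }
  | { kind: "error"; message: string };

interface RunResult {
  status: "completed" | "handed-off";
  attempts: number;
  auditLog: string[];
}

// Runs the step until it succeeds or maxAttempts is reached, logging every attempt.
// When retries are exhausted, the run is handed off to a human instead of looping.
function runWithPolicy(
  policy: WorkflowPolicy,
  step: (attempt: number) => StepOutcome,
): RunResult {
  const auditLog: string[] = [`triggered by ${policy.trigger}`];
  for (let attempt = 1; attempt <= policy.maxAttempts; attempt++) {
    const outcome = step(attempt);
    if (outcome.kind === "ok") {
      auditLog.push(`attempt ${attempt} succeeded: ${outcome.summary}`);
      return { status: "completed", attempts: attempt, auditLog };
    }
    auditLog.push(`attempt ${attempt} failed: ${outcome.message}`);
  }
  auditLog.push(`handed off to ${policy.handoffTo}`);
  return { status: "handed-off", attempts: policy.maxAttempts, auditLog };
}

// Illustrative first-pass CI analysis that fails once, then succeeds.
const ciPolicy: WorkflowPolicy = { trigger: "ci-failure", maxAttempts: 2, handoffTo: "maintainer" };
const ciRun = runWithPolicy(ciPolicy, (attempt) =>
  attempt === 1
    ? { kind: "error", message: "log download failed" }
    : { kind: "ok", summary: "failure traced to a lint rule" },
);
console.log(ciRun.status, ciRun.auditLog);
```

The explicit handoff branch is the important design choice here: a recurring workflow that cannot finish should stop and name an owner rather than retry indefinitely.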
Level 10: Agent Platform Architect
State: Multiple agents, tools, harnesses, evaluations, and audit controls form a reusable platform across projects or an organization.
Typical work: Separate planning, implementation, review, and security roles, then improve rules and skills using evaluation results.
Completion standard: The platform has role separation, least privilege, quality metrics, cost limits, audit trails, failure shutdown procedures, and improvement loops.
Level 10 does not remove humans. Humans still define specifications, permissions, quality standards, and exception handling while governing the work of the agent system.
Assess Your Current Level
Use the highest stage you can reproduce routinely, not the highest stage that succeeded once.
| Question | Level |
|---|---|
| Do you use answers manually without repository inspection? | Level 0 |
| Can Codex investigate using real code evidence? | Level 1 |
| Can you review a tightly scoped diff? | Level 2 |
| Can Codex complete a small feature with tests? | Level 3 |
| Are working rules applied persistently through AGENTS.md or similar files? | Level 4 |
| Can issues, PRs, and reviews be handled consistently? | Level 5 |
| Are permissions, approvals, and checks managed as a harness? | Level 6 |
| Can external tools be used within defined permission boundaries? | Level 7 |
| Can multiple tasks run in parallel without conflicts? | Level 8 |
| Can recurring workflows be monitored and recovered? | Level 9 |
| Can multiple workflows be evaluated, audited, and improved? | Level 10 |
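The table can be read as a simple procedure. The sketch below assumes that levels are cumulative, meaning a level counts only if every lower level is also reproducible; the page implies this through its "Move forward when" conditions but does not state it as a rule. The `assessLevel` function and its input format are hypothetical.

```typescript
// answers[0] is the answer to the Level 1 question, answers[9] to the Level 10 question.
// Each answer should reflect what is reproducible routinely, not a one-time success.
// Assumption: levels are cumulative, so counting stops at the first "no".
function assessLevel(answers: boolean[]): number {
  let level = 0; // Level 0 is the fallback when no question is answered "yes"
  for (let index = 0; index < answers.length && index < 10; index++) {
    if (!answers[index]) break;
    level = index + 1;
  }
  return level;
}

// A team with AGENTS.md in place but no consistent issue and PR handling:
// the "yes" for Level 6 does not count because Level 5 is not yet reproducible.
const currentLevel = assessLevel([true, true, true, true, false, true]);
console.log(currentLevel); // 4
```

Stopping at the first "no" reflects the advice to report the highest reproducible stage rather than the highest stage that worked once.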
See the references for the external specifications and background sources used on this page.[1][2][3]
References
Summary
- Codex Levels 0-10 measure the maturity of context, verification, permissions, and operations supporting delegation.
- Levels 0-3 cover advice through verified implementation; Levels 4-6 cover instructions, GitHub, and harnesses; Levels 7-10 cover tools, parallelism, recurring operations, and platform design.
- The goal is not the highest level. The goal is a reproducible level appropriate for the project’s risk and scale.