Codex Levels 0-10: Definitions for All 11 Stages

Reading time: About 10 minutes

Target audience: Developers and teams that want to assess their current Codex maturity and identify the next capability, rule, or verification practice to add
Prerequisites: You have read "How to Use Codex Levels"

Codex maturity is not measured by the number of features you use. It is measured by how much work you can delegate and how well context, verification, permissions, and review support that delegation.

This page defines Codex maturity from Level 0 through Level 10. Because Level 0 is included, the model has 11 stages. Not every project needs Level 10. Choose a level that matches the risk of the changes, team size, and frequency of operation.

The concrete examples share one theme: building and operating a personal portfolio site with Next.js. Using a single project makes it easier to compare how delegation expands from a small project-card edit to adding projects, updating the skills page, and running publication workflows.

| Level | Title | Core capability | Typical deliverable |
| --- | --- | --- | --- |
| Level 0 | Chat Advisor | Coding advice | Answer or sample code |
| Level 1 | Repository Reader | Repository understanding | Related-file map and investigation results |
| Level 2 | Focused Editor | Bounded editing | Single-responsibility diff |
| Level 3 | Verified Implementer | Verified implementation | Small feature and test results |
| Level 4 | Context Engineer | Persistent context | AGENTS.md and shared rules |
| Level 5 | GitHub Collaborator | Issue and PR collaboration | Branch, PR, and review record |
| Level 6 | Harness Builder | Safe working environment | Permissions, approvals, checks, and constraints |
| Level 7 | Tool Operator | External tool operation | Browser checks and MCP or connector actions |
| Level 8 | Parallel Orchestrator | Parallel task management | Task plan, worktrees, and subagent results |
| Level 9 | Workflow Operator | Recurring operations | CI, scheduled runs, and triage workflows |
| Level 10 | Agent Platform Architect | Organization-scale platform design | Agent roles, evaluations, audit, and improvement loops |

Level 0: Chat Advisor

State: You ask coding or design questions and apply the answer manually. Codex does not inspect the target repository, edit files, or run commands.

Typical work: Explain an error, generate a function example, or compare design options.

Limitation: The answer may not match the actual dependencies, conventions, or implementation.

Move forward when: Codex can inspect the repository and explain the code using real files as evidence.

Level 1: Repository Reader

State: Codex reads the working tree and explains code structure, related files, and likely causes. Investigation is the main activity; editing is optional.

Typical work: Locate the authentication entry point, trace a bug, or identify tests affected by a proposed change.

Completion standard: The result includes file paths and code evidence that a human can verify.

Move forward when: You can define in-scope and out-of-scope areas and delegate a small edit.

Level 2: Focused Editor

State: Codex makes a change limited to one file or one responsibility. A human reads the diff for unintended changes.

Typical work: Add one validation rule, correct copy, or perform a small refactor using an existing pattern.

Completion standard: The request defines what may change, what must not change, and the expected result.
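One way to make this completion standard checkable is to compare the files in a diff against the scope stated in the request. The sketch below assumes the scope is written as glob patterns. The function name `out_of_scope`, the patterns, and the file paths are hypothetical and are not part of Codex.

```python
from fnmatch import fnmatch


def out_of_scope(changed_files, allowed, forbidden):
    """Return the changed paths that break the scope stated in the request.

    A path is a violation if it matches a forbidden pattern or if it
    matches none of the allowed patterns.
    """
    violations = []
    for path in changed_files:
        is_allowed = any(fnmatch(path, pattern) for pattern in allowed)
        is_forbidden = any(fnmatch(path, pattern) for pattern in forbidden)
        if is_forbidden or not is_allowed:
            violations.append(path)
    return violations


# Hypothetical request: edit one project card, leave dependencies alone.
changed = ["components/ProjectCard.tsx", "package.json"]
violations = out_of_scope(
    changed,
    allowed=["components/ProjectCard.tsx"],
    forbidden=["package.json", "app/*"],
)
print(violations)  # ['package.json']
```

A check like this does not replace reading the diff. It only flags files that should not have been touched at all.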

Move forward when: Tests or linting become part of the same definition of done as the edit.

Level 3: Verified Implementer

State: Codex updates several related files and completes a small feature or bug fix, including verification.

Typical work: Update a form, validator, and tests, then run the targeted test command.

Completion standard: The final report includes changed files, commands run, success or failure, and anything unverified.
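The four parts of the final report can be written down as a small data structure, which makes a missing part easy to spot. This is a minimal sketch under assumptions: the class names, the file paths, and the test command are hypothetical, and Codex does not emit reports in this shape by default.

```python
from dataclasses import dataclass, field


@dataclass
class CommandResult:
    command: str
    passed: bool


@dataclass
class FinalReport:
    changed_files: list[str]
    commands: list[CommandResult]
    unverified: list[str] = field(default_factory=list)

    def is_complete(self) -> bool:
        """Complete means at least one changed file and one command are named."""
        return bool(self.changed_files) and bool(self.commands)

    def summary(self) -> str:
        lines = ["Changed files:"]
        lines += [f"  {path}" for path in self.changed_files]
        lines.append("Commands run:")
        for result in self.commands:
            status = "PASS" if result.passed else "FAIL"
            lines.append(f"  {status}  {result.command}")
        lines.append("Unverified:")
        lines += [f"  {item}" for item in self.unverified] or ["  (nothing reported)"]
        return "\n".join(lines)


# Hypothetical report for a contact-form change.
report = FinalReport(
    changed_files=["components/ContactForm.tsx", "lib/validateContact.ts"],
    commands=[CommandResult("npm test -- validateContact", passed=True)],
    unverified=["Behavior in the production build"],
)
print(report.summary())
```

Keeping `unverified` as an explicit field lets a reader tell "nothing was left unverified" apart from "the report never said".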

Move forward when: Repeated conventions and commands are moved into repository instructions.

Level 4: Context Engineer

State: Files such as AGENTS.md continuously provide the technology stack, editing rules, verification commands, and prohibited actions.

Typical work: Use a root AGENTS.md as an entry point to domain-specific rules and skills.

Completion standard: New threads receive the same working agreement without repeating it in every prompt.

Move forward when: Work expands from the local tree into issues, branches, pull requests, and reviews.

Level 5: GitHub Collaborator

State: Codex reads GitHub issues and pull requests and assists with branch creation, implementation, PR creation, and review feedback.

Typical work: Extract acceptance criteria from an issue, draft a PR summary, or address review comments.

Completion standard: Commit scope, branch policy, review ownership, and merge permission are explicit.

Move forward when: Permissions, approvals, and safety checks are systematized instead of specified case by case.

Level 6: Harness Builder

State: You design a harness that gives Codex repeatable rules, skills, checks, permissions, approval conditions, prohibited actions, and failure procedures.

Typical work: Require approval for production builds, protect folders, run standard checks after edits, and validate shared policies automatically.

Completion standard: Dangerous actions stop for approval, routine changes follow reproducible procedures, and policy violations are detected.
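The idea of stopping dangerous actions for approval can be illustrated with a small decision function. This is only a sketch of the policy logic, not Codex's own permission or approval mechanism. The command prefixes, protected folders, and the `decide` function are hypothetical.

```python
import shlex

# Hypothetical policy: command prefixes that must stop for human approval.
APPROVAL_REQUIRED = [("npm", "run", "build"), ("git", "push"), ("rm",)]

# Hypothetical protected folders that routine changes must never touch.
PROTECTED_DIRS = ("content/published/", ".github/")


def decide(command: str, touched_paths: list[str]) -> str:
    """Return 'deny', 'needs_approval', or 'allow' for a proposed action."""
    tokens = tuple(shlex.split(command))
    # Protected folders take priority over everything else.
    if any(path.startswith(PROTECTED_DIRS) for path in touched_paths):
        return "deny"
    # Commands whose leading tokens match a listed prefix wait for approval.
    if any(tokens[: len(prefix)] == prefix for prefix in APPROVAL_REQUIRED):
        return "needs_approval"
    return "allow"


print(decide("npm run build", []))                             # needs_approval
print(decide("npm run lint", ["components/ProjectCard.tsx"]))  # allow
print(decide("npm run lint", [".github/workflows/ci.yml"]))    # deny
```

Writing the policy as data means the same rules apply to every task instead of being restated in each prompt.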

Move forward when: Tools beyond files and the shell are connected with limited purpose and permissions.

Level 7: Tool Operator

State: Codex uses browsers, MCP, connectors, images, or document tools and verifies evidence outside the codebase.

Typical work: Inspect an article list in a browser, read GitHub or CMS post data through a connector, or compare a screenshot with an implementation.

Completion standard: Read and write permissions, sensitive-data boundaries, and consequential actions requiring confirmation are defined for every tool.

Move forward when: Independent work can be split into non-conflicting parallel tasks.

Level 8: Parallel Orchestrator

State: Multiple threads, worktrees, cloud tasks, or subagents execute independent work in parallel.

Typical work: Separate implementation, test additions, and documentation updates, then perform integration verification.

Completion standard: File ownership, dependencies, integration order, and conflict ownership are explicit. Multiple tasks do not edit the same files simultaneously.
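The rule that parallel tasks must not edit the same files can be checked before any task starts. The sketch below assumes each task declares the files it owns. The task names, file paths, and the `find_conflicts` function are hypothetical.

```python
from itertools import combinations


def find_conflicts(ownership: dict[str, set[str]]) -> dict[tuple[str, str], set[str]]:
    """Return the files claimed by more than one task, keyed by task pair."""
    conflicts = {}
    for task_a, task_b in combinations(sorted(ownership), 2):
        shared = ownership[task_a] & ownership[task_b]
        if shared:
            conflicts[(task_a, task_b)] = shared
    return conflicts


# Hypothetical plan: the docs task wrongly claims a component file.
plan = {
    "implementation": {"app/projects/page.tsx", "components/ProjectCard.tsx"},
    "tests": {"tests/projects.test.tsx"},
    "docs": {"README.md", "components/ProjectCard.tsx"},
}
conflicts = find_conflicts(plan)
print(conflicts)  # {('docs', 'implementation'): {'components/ProjectCard.tsx'}}
```

An empty result means the plan is safe to run in parallel as far as file ownership goes. Dependencies and integration order still need to be stated separately.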

Move forward when: Parallel work becomes part of recurring or event-driven operations.

Level 9: Workflow Operator

State: Codex participates in recurring workflows such as CI investigation, issue triage, dependency analysis, and documentation synchronization.

Typical work: Perform first-pass CI failure analysis, identify low-coverage areas, or run scheduled consistency checks.

Completion standard: Triggers, stop conditions, timeouts, notifications, retries, audit logs, and human handoff conditions are defined.
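Several of these controls can be shown together in one small runner: a retry limit, a time budget, an audit log, and a human handoff. This is a sketch under assumptions, not a Codex feature. `WorkflowPolicy`, `run_with_policy`, the trigger name, and the notification target are hypothetical. The time budget is checked only between attempts and does not interrupt a running task.

```python
import time
from dataclasses import dataclass


@dataclass
class WorkflowPolicy:
    trigger: str            # what starts the workflow (recorded, not enforced here)
    timeout_seconds: float  # overall time budget across all attempts
    max_retries: int        # retries allowed after the first attempt
    notify: str             # who receives the handoff


def run_with_policy(task, policy: WorkflowPolicy, audit_log: list[str]) -> str:
    """Run task() until it returns a truthy value; return 'done' or 'handoff'."""
    deadline = time.monotonic() + policy.timeout_seconds
    for attempt in range(1, policy.max_retries + 2):
        if time.monotonic() > deadline:
            audit_log.append("stop: time budget exhausted")
            break
        try:
            succeeded = bool(task())
        except Exception as exc:
            succeeded = False
            audit_log.append(f"attempt {attempt}: error {exc!r}")
        else:
            audit_log.append(f"attempt {attempt}: {'ok' if succeeded else 'failed'}")
        if succeeded:
            return "done"
    # Retries or time ran out: record the handoff instead of looping forever.
    audit_log.append(f"handoff: notify {policy.notify}")
    return "handoff"


policy = WorkflowPolicy("ci_failure", timeout_seconds=60, max_retries=2,
                        notify="site-maintainers")
log: list[str] = []
outcome = run_with_policy(lambda: False, policy, log)
print(outcome, log)
```

The important property is that every path ends in either a recorded success or a recorded handoff, so a failing workflow cannot run silently.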

Move forward when: Individual workflows are managed as a shared platform and continuously evaluated.

Level 10: Agent Platform Architect

State: Multiple agents, tools, harnesses, evaluations, and audit controls form a reusable platform across projects or an organization.

Typical work: Separate planning, implementation, review, and security roles, then improve rules and skills using evaluation results.

Completion standard: The platform has role separation, least privilege, quality metrics, cost limits, audit trails, failure shutdown procedures, and improvement loops.

Level 10 does not remove humans. Humans still define specifications, permissions, quality standards, and exception handling while governing the work of the agent system.

Use the highest stage you can reproduce routinely, not the highest stage that succeeded once.

| Question | Level |
| --- | --- |
| Do you use answers manually without repository inspection? | Level 0 |
| Can Codex investigate using real code evidence? | Level 1 |
| Can you review a tightly scoped diff? | Level 2 |
| Can Codex complete a small feature with tests? | Level 3 |
| Are working rules applied persistently through AGENTS.md or similar files? | Level 4 |
| Can issues, PRs, and reviews be handled consistently? | Level 5 |
| Are permissions, approvals, and checks managed as a harness? | Level 6 |
| Can external tools be used within defined permission boundaries? | Level 7 |
| Can multiple tasks run in parallel without conflicts? | Level 8 |
| Can recurring workflows be monitored and recovered? | Level 9 |
| Can multiple workflows be evaluated, audited, and improved? | Level 10 |
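The self-assessment can be sketched as a short function. It rests on one assumption that this page does not state outright: levels are cumulative, so a "yes" at a higher level does not count if an earlier question is answered "no". The function name `assess` and the sample answers are hypothetical.

```python
def assess(answers: list[bool]) -> int:
    """Return the assessed level from answers to the Level 1..10 questions.

    answers[0] is the Level 1 question, answers[1] the Level 2 question,
    and so on. Answer True only if the practice is reproduced routinely.
    Level 0 is the baseline when the first answer is False.
    """
    level = 0
    for routinely_reproduced in answers:
        if not routinely_reproduced:
            break  # Assumption: a gap stops the count, even if later answers are True.
        level += 1
    return level


# Parallel work (Level 8) succeeded once, but AGENTS.md (Level 4) is not in place.
answers = [True, True, True, False, False, False, False, True, False, False]
print(assess(answers))  # 3
```

In this example the team is assessed at Level 3, and the next step is persistent context rather than more parallelism.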

See the references for the external specifications and background sources used on this page.[1][2][3]

  1. Prompting - Codex
  2. Custom instructions with AGENTS.md
  3. Security - Codex

  • Codex Levels 0-10 measure the maturity of context, verification, permissions, and operations supporting delegation.
  • Levels 0-3 cover advice through verified implementation; Levels 4-6 cover instructions, GitHub, and harnesses; Levels 7-10 cover tools, parallelism, recurring operations, and platform design.
  • The goal is not the highest level. The goal is a reproducible level appropriate for the project’s risk and scale.