How to Use Codex Levels

Estimated time: about 10 minutes

Audience: developers who want to use Codex for implementation, verification, and pull requests instead of one-off code generation

When you first use Codex, asking it to fix a piece of code can already feel useful. But to use it reliably in real work, you need more than one-off code generation. You need to design how work is split, how results are verified, and how changes are reviewed.

Codex Levels are a map for that maturity curve. Level 0 is advice. Level 1 is repository understanding. Levels 2-3 are local editing and verification. Levels 4-6 cover context, GitHub collaboration, and harness design. Levels 7-8 add external tools and parallel work, and Levels 9-10 move into recurring operations and platform design.

What Codex Levels Mean

Codex Levels describe how much work you can safely delegate to Codex. A higher level does not mean humans disappear. It means the human role moves from individual implementation details toward specification, verification, permissions, and review design.

Stage	How Codex is used	What the human designs
Level 0	Coding advice outside a repository	Assumptions, questions, usage decisions
Level 1	File reading and code investigation	Scope and investigation criteria
Level 2-3	Bounded edits, tests, and small feature completion	Change scope, acceptance criteria, verification commands
Level 4-6	AGENTS.md, GitHub collaboration, and harnesses	Working agreements, review ownership, constraints, approvals
Level 7-8	Browser, MCP, external tools, parallel work	Tool permissions, task ownership, conflict avoidance
Level 9-10	CI, recurring tasks, team operations	Operating rules, auditability, quality standards

Use the table as a delegation checklist, not as a personal skill ranking.

Core Principles From the Official Docs

OpenAI’s Codex documentation describes Codex as an agent that receives prompts, reads files, edits files, and calls tools while working through a task. In other words, Codex is not just a chat interface that returns an answer. It acts inside a working environment.

That means your prompt should include:

the target files or feature
steps to reproduce the problem
the expected behavior
verification commands to run
areas that must not change
evidence you want in the final report

For example:

Fix the project detail page in src/app/projects/[slug]/page.tsx.

Conditions:
- Do not change the existing ProjectCard component.
- Show projects with featured: false on the detail page.
- Add a regression test for the failing case in tests/projects/slug.test.ts.
- Run npm test -- tests/projects/slug.test.ts after the change.
- Report anything that remains unverified.

If you only say “fix this,” Codex has to guess what completion means. Clear completion criteria align Codex’s work with your review.
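One way to make the checklist above hard to skip is to treat a request as structured data and render the prompt from it. The following is a minimal sketch under that assumption: the TaskRequest shape, its field names, and the helper functions are hypothetical and are not part of any Codex API.

```typescript
// Hypothetical request shape mirroring the prompt checklist.
// Field names are illustrative and not part of any Codex API.
interface TaskRequest {
  goal: string;
  targetFiles: string[];
  reproductionSteps: string[];
  expectedBehavior: string;
  verificationCommands: string[];
  doNotChange: string[];
  reportEvidence: string[];
}

// Report which checklist fields are still empty.
function missingFields(task: TaskRequest): string[] {
  const checks: Array<[string, boolean]> = [
    ["goal", task.goal.trim() === ""],
    ["targetFiles", task.targetFiles.length === 0],
    ["reproductionSteps", task.reproductionSteps.length === 0],
    ["expectedBehavior", task.expectedBehavior.trim() === ""],
    ["verificationCommands", task.verificationCommands.length === 0],
    ["doNotChange", task.doNotChange.length === 0],
    ["reportEvidence", task.reportEvidence.length === 0],
  ];
  return checks.filter((check) => check[1]).map((check) => check[0]);
}

// Render a labeled list section, or nothing if the list is empty.
function section(title: string, items: string[]): string[] {
  if (items.length === 0) return [];
  return [title + ":"].concat(items.map((item) => "- " + item), [""]);
}

// Turn the structured request into plain prompt text.
function renderPrompt(task: TaskRequest): string {
  const expected = task.expectedBehavior.trim() === "" ? [] : [task.expectedBehavior];
  return [task.goal, ""]
    .concat(
      section("Target files", task.targetFiles),
      section("Steps to reproduce", task.reproductionSteps),
      section("Expected behavior", expected),
      section("Verification", task.verificationCommands),
      section("Do not change", task.doNotChange),
      section("Report", task.reportEvidence)
    )
    .join("\n")
    .trim();
}

// The project detail page request from this section, as data.
const exampleTask: TaskRequest = {
  goal: "Fix the project detail page in src/app/projects/[slug]/page.tsx.",
  targetFiles: ["src/app/projects/[slug]/page.tsx"],
  reproductionSteps: [],
  expectedBehavior: "Show projects with featured: false on the detail page.",
  verificationCommands: ["npm test -- tests/projects/slug.test.ts"],
  doNotChange: ["The existing ProjectCard component"],
  reportEvidence: ["Anything that remains unverified"],
};

console.log(missingFields(exampleTask)); // ["reproductionSteps"]
console.log(renderPrompt(exampleTask));
```

Here the check points out that the example request has no reproduction steps, which is exactly the kind of gap that otherwise forces Codex to guess.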

Patterns Seen in Public Workflows

Public posts on LinkedIn and X from experienced Codex users tend to emphasize working conditions over long “magic prompts.” The recurring patterns are:

Write requests like GitHub issues.
Keep AGENTS.md as a short table of contents and move detailed rules into separate files.
Keep each task to roughly one hour of work or a few hundred changed lines.
Ask for a final report that includes tests, diffs, and unverified items.

This matches the official guidance: Codex is more reliable with verifiable work, small focused steps, and concrete context.
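The "few hundred lines" guideline can be checked mechanically before review. The sketch below parses the tab-separated output of git diff --numstat (added, deleted, path per line, with "-" for binary files) and compares the total against a budget. The function names and the default budget of 300 lines are assumptions chosen for illustration, not values from the sources above.

```typescript
// One entry per changed file.
interface FileChange {
  path: string;
  added: number;
  deleted: number;
}

// Parse `git diff --numstat` output: "<added>\t<deleted>\t<path>" per line.
// Binary files are reported as "-\t-\t<path>" and count as 0 lines here.
function parseNumstat(output: string): FileChange[] {
  const changes: FileChange[] = [];
  for (const line of output.split("\n")) {
    const parts = line.split("\t");
    if (parts.length < 3) continue; // skip blank or malformed lines
    const added = parseInt(parts[0], 10);
    const deleted = parseInt(parts[1], 10);
    changes.push({
      path: parts.slice(2).join("\t"),
      added: isNaN(added) ? 0 : added,
      deleted: isNaN(deleted) ? 0 : deleted,
    });
  }
  return changes;
}

// Sum of added and deleted lines across all files.
function totalChangedLines(changes: FileChange[]): number {
  return changes.reduce((sum, change) => sum + change.added + change.deleted, 0);
}

// Assumed budget: 300 changed lines. Tune this to your own review capacity.
function isReviewable(changes: FileChange[], maxChangedLines: number = 300): boolean {
  return totalChangedLines(changes) <= maxChangedLines;
}

const sampleNumstat =
  "12\t3\tcomponents/ProjectCard.tsx\n" +
  "40\t0\ttests/project-card.test.tsx\n" +
  "-\t-\tpublic/logo.png\n";

const sampleChanges = parseNumstat(sampleNumstat);
console.log(totalChangedLines(sampleChanges)); // 55
console.log(isReviewable(sampleChanges)); // true
```

A check like this does not measure difficulty, only size, so it works best as a prompt to split a task rather than as a hard gate.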

Level 0-1: Move From Advice to Repository Understanding

At the first stage, do not start by delegating a large implementation. Use Codex to understand the codebase.

Explain where login is implemented in this repository.
List the related files, major functions, and test files in a table.
Do not make changes yet.

This is close to the Ask mode workflow OpenAI recommends for larger changes. Before editing, ask Codex to explain the structure and draft a plan. Once the understanding is correct, move into Code mode or CLI-based edits.

Level 2-3: Delegate a Small Feature With Verification

At the next stage, delegate both implementation and verification. Keep the scope small.

Good request:

Add technology stack badges to the project card.

Scope:
- components/ProjectCard.tsx
- app/projects/page.tsx
- tests/project-card.test.tsx

Completion criteria:
- Display the first three tags from the tags array as badges.
- Do not change the existing card size or layout.
- npm test -- tests/project-card.test.tsx passes.

At this level, do not accept Codex’s report blindly. Read the diff, confirm the commands it ran, and check whether the test actually covers the changed behavior.
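To judge whether a test covers the changed behavior, it helps to write down the cases the completion criteria imply. The sketch below does this for the "first three tags" rule. The visibleBadges helper and the expectEqual function are hypothetical: they stand in for whatever the real ProjectCard component and test runner use.

```typescript
// Hypothetical helper that a ProjectCard component could call.
// Returns at most `limit` tags without modifying the input array.
function visibleBadges(tags: string[], limit: number = 3): string[] {
  return tags.slice(0, limit);
}

// Tiny assertion helper so the sketch runs without a test framework.
function expectEqual(actual: string[], expected: string[], label: string): void {
  if (actual.join("|") !== expected.join("|")) {
    throw new Error(label + ": got [" + actual.join(", ") + "]");
  }
}

// Cases a reviewer would expect the real test file to cover.
expectEqual(
  visibleBadges(["TypeScript", "Next.js", "Prisma", "Tailwind"]),
  ["TypeScript", "Next.js", "Prisma"],
  "more than three tags"
);
expectEqual(visibleBadges(["TypeScript"]), ["TypeScript"], "fewer than three tags");
expectEqual(visibleBadges([]), [], "no tags");

// The original array must stay untouched.
const originalTags = ["a", "b", "c", "d"];
visibleBadges(originalTags);
expectEqual(originalTags, ["a", "b", "c", "d"], "input not mutated");

console.log("all badge cases passed");
```

If the test Codex wrote only checks the happy path with exactly three tags, the passing result says little about the boundary cases listed here.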

Level 4-6: Add AGENTS.md and Harness Rules

If you use Codex repeatedly, writing the same context in every prompt is inefficient. Put the repository working agreement in AGENTS.md.

# AGENTS.md

## Repository Rules

- Inspect related files and existing patterns before implementation.
- Keep changes minimal; avoid unrelated refactors.
- Run npx tsc --noEmit after TypeScript changes.
- After UI changes, run next dev and check the result in the browser.
- Do not write secrets, API keys, or personal data into files.
- In the final report, include changes, verification results, and unverified items.

Longer is not automatically better. A very long instructions file becomes hard for both humans and Codex to use. In practice, the root AGENTS.md works best as a table of contents, with detailed rules split into files for testing, security, UI, and GitHub workflows.

At Level 6, add explicit verification commands, forbidden commands, approval requirements, and CI checks. That is harness design.
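The harness idea can be sketched as a small command policy: verification commands are allowed, forbidden commands are rejected, and everything else waits for approval. This is only an illustration of the rule structure. The HarnessPolicy shape, the classifyCommand function, and the example commands are assumptions, and this is not the configuration format of Codex or any other tool.

```typescript
type Decision = "allow" | "needs-approval" | "forbidden";

// Hypothetical policy shape; not a real Codex configuration format.
interface HarnessPolicy {
  verificationCommands: string[]; // command prefixes that may run freely
  forbiddenCommands: string[]; // command prefixes that must never run
}

// True if `command` equals `prefix` or continues it with more arguments.
function matchesPrefix(command: string, prefix: string): boolean {
  return command === prefix || command.indexOf(prefix + " ") === 0;
}

function classifyCommand(policy: HarnessPolicy, command: string): Decision {
  const normalized = command.trim().replace(/\s+/g, " ");
  if (policy.forbiddenCommands.some((p) => matchesPrefix(normalized, p))) {
    return "forbidden";
  }
  // Chained or redirected commands are never auto-allowed, because a
  // prefix match cannot see what runs after the shell operator.
  if (/[;&|`<>]|\$\(/.test(normalized)) {
    return "needs-approval";
  }
  if (policy.verificationCommands.some((p) => matchesPrefix(normalized, p))) {
    return "allow";
  }
  // Default: anything not listed requires a human decision.
  return "needs-approval";
}

const examplePolicy: HarnessPolicy = {
  verificationCommands: ["npx tsc --noEmit", "npm test"],
  forbiddenCommands: ["git push --force", "rm -rf"],
};

console.log(classifyCommand(examplePolicy, "npm test -- tests/project-card.test.tsx")); // allow
console.log(classifyCommand(examplePolicy, "git push --force origin main")); // forbidden
console.log(classifyCommand(examplePolicy, "npm install left-pad")); // needs-approval
console.log(classifyCommand(examplePolicy, "npm test && rm -rf .")); // needs-approval
```

The design choice worth noting is the default: unlisted commands fall through to approval instead of being allowed, so the policy stays safe when it is incomplete. Prefix matching is deliberately naive, and a real harness would need stricter parsing.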

Level 7-8: Use Tool Integrations and Parallel Work

Codex is available through CLI, IDE, web, and app surfaces. Local threads read and edit your working tree. Cloud threads work in an isolated environment connected to a GitHub repository. When you start parallel work, avoid having two threads edit the same files.

Work that parallelizes well:

Add API tests on branch A.
Update documentation on branch B.
Investigate dependency impact on branch C.

Work that does not parallelize well:

Multiple UI edits to the same component.
Multiple edits to the same migration file.
Implementing several competing designs before the spec is settled.

Codex app and cloud threads are useful for background work. Review and integration still belong to the human workflow.
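The rule about not letting two threads edit the same files can be checked before work starts, provided each task declares the files it expects to touch. The sketch below assumes such a declaration exists. The ParallelTask shape, the findConflicts function, and the file paths are hypothetical.

```typescript
// Hypothetical declaration of what a parallel task intends to edit.
interface ParallelTask {
  branch: string;
  files: string[];
}

interface FileConflict {
  file: string;
  branches: string[];
}

// Return every file claimed by more than one branch.
function findConflicts(tasks: ParallelTask[]): FileConflict[] {
  // Object.create(null) avoids collisions with inherited keys such as "constructor".
  const owners: Record<string, string[]> = Object.create(null);
  for (const task of tasks) {
    for (const file of task.files) {
      if (!owners[file]) owners[file] = [];
      if (owners[file].indexOf(task.branch) === -1) owners[file].push(task.branch);
    }
  }
  return Object.keys(owners)
    .filter((file) => owners[file].length > 1)
    .sort()
    .map((file) => ({ file: file, branches: owners[file] }));
}

// Parallelizes well: each branch owns different files.
const independentTasks: ParallelTask[] = [
  { branch: "branch-a", files: ["tests/api/projects.test.ts"] },
  { branch: "branch-b", files: ["docs/projects.md"] },
];

// Does not parallelize well: two UI edits to the same component.
const overlappingTasks: ParallelTask[] = [
  { branch: "ui-badges", files: ["components/ProjectCard.tsx"] },
  { branch: "ui-spacing", files: ["components/ProjectCard.tsx", "app/projects/page.tsx"] },
];

console.log(findConflicts(independentTasks).length); // 0
console.log(findConflicts(overlappingTasks));
```

Declared file lists are only an estimate of what a task will touch, so a check like this reduces conflicts but does not replace reviewing the merged result.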

Level 9-10: Move Into Operations

At higher levels, Codex becomes part of routine development. Examples include adding tests for low-coverage areas, checking dependency update impact, triaging issues, drafting pull request reviews, and updating documentation.

The more you automate, the more permission design matters.

Area	What to check
Permissions	Repositories Codex can read, branches it can write, commands it can run
Auditability	Which instruction produced which diff and which checks were run
Review	Human review boundaries and files that always require review
Failure handling	CI failures, conflicts, and unverified items
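The auditability and review rows of this table can be made concrete with a record that ties one instruction to its diff and checks. The sketch below is one possible shape under stated assumptions: the AuditRecord fields, the reviewFlags function, and the protected path prefixes are hypothetical and should be adapted to your repository.

```typescript
interface CheckResult {
  command: string;
  passed: boolean;
}

// Hypothetical record linking an instruction to its outcome.
interface AuditRecord {
  instruction: string; // the prompt or issue text given to Codex
  changedFiles: string[]; // files in the resulting diff
  checks: CheckResult[]; // verification commands and their results
  unverified: string[]; // items the final report marked as unverified
}

// List the points a human reviewer should look at first.
function reviewFlags(record: AuditRecord, alwaysReviewPrefixes: string[]): string[] {
  const flags: string[] = [];
  if (record.checks.length === 0) flags.push("no checks were run");
  for (const check of record.checks) {
    if (!check.passed) flags.push("check failed: " + check.command);
  }
  for (const item of record.unverified) {
    flags.push("unverified: " + item);
  }
  for (const file of record.changedFiles) {
    const isProtected = alwaysReviewPrefixes.some((prefix) => file.indexOf(prefix) === 0);
    if (isProtected) flags.push("always-review path: " + file);
  }
  return flags;
}

const exampleRecord: AuditRecord = {
  instruction: "Add tests for low-coverage areas in the projects module.",
  changedFiles: ["tests/projects/slug.test.ts", ".github/workflows/ci.yml"],
  checks: [{ command: "npm test -- tests/projects/slug.test.ts", passed: true }],
  unverified: [],
};

// Assumed list of paths that always require human review.
const protectedPrefixes = [".github/workflows/", "migrations/"];

console.log(reviewFlags(exampleRecord, protectedPrefixes));
// ["always-review path: .github/workflows/ci.yml"]
```

An empty flag list does not mean review can be skipped. It only means nothing in the record needs attention before the normal review begins.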

Level 10 is not “let Codex do everything.” It is the stage where you design the environment, rules, verification, and review process that let Codex work safely.

A Practical Level-Up Sequence

Ask Codex to explain the structure of an existing repository.
Pick one small change and write down its target files and completion criteria.
Specify the verification command to run after implementation.
Ask for a final report with diff summary, test results, and unverified items.
Move repeated rules into AGENTS.md.

This sequence changes Codex from a useful code generator into a verifiable development workflow.

See the references for the external specifications and background sources used on this page.[1][2][3][4]

References

Summary

Codex Levels are a way to expand Codex delegation gradually.
Leveling up is about completion criteria, verification, AGENTS.md, and review design, not longer prompts.
Small tasks, concrete file references, and verification commands make Codex more reliable.
As you move into parallel work and automation, define permissions, auditability, and failure handling first.
