Code RAG is a form of RAG that retrieves repository artifacts rather than general natural-language documents: source code, type definitions, tests, configuration files, issues, pull-request history, and design notes. It then uses that retrieved context for code generation or modification. With the rise of coding agents, RAG has expanded beyond “internal document search” into “context acquisition technology for understanding and modifying a repository.”[1][4]
Why conventional RAG does not transfer directly to code
In natural-language RAG, documents are commonly split by character count or paragraph boundaries. That approach alone is insufficient for code.
Code, and the repository around it, has structure that natural-language text lacks. A retriever has to account for units such as:
- Functions
- Classes
- Types
- Imports and exports
- Call relationships
- Tests
- Configuration files
- Generated files
- Build scripts
- Modification history
When a chunk boundary falls in the middle of a function, the return value, exception handling, types, and dependencies are lost. Conversely, grouping unrelated functions into one chunk causes the LLM to be misled by irrelevant code.
Information that Code RAG handles
The retrieval targets in Code RAG extend well beyond source files.
| Information | Use |
|---|---|
| Source code | Understanding the implementation, identifying where to change |
| Type definitions | Understanding API inputs, outputs, and constraints |
| Tests | Expected behaviour, regression confirmation |
| README / docs | Design intent, usage instructions |
| Configuration files | Build settings, linting, routing, environment differences |
| Issues / PRs | Past discussions, reasons for changes |
| Commit history | Why a particular implementation decision was made |
| Execution logs | Failure causes, environment-specific issues |
| Agent instruction files | Repository-specific work rules |
Coding agents combine these sources to decide what to read, what to change, and how to verify changes.
Basic flow of Code RAG
```mermaid
graph TD
    Task["Natural-language development task"] --> Plan["Investigation plan"]
    Plan --> Search["Code & document search"]
    Search --> Read["Read relevant files"]
    Read --> Edit["Edit code"]
    Edit --> Test["Run tests & linting"]
    Test --> Observe["Review results"]
    Observe -->|Failure| Search
    Observe -->|Success| Summary["Describe changes"]
```

Where conventional RAG ends at generating an answer, Code RAG extends through editing, execution, and verification.
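The loop in the diagram can be sketched in a few lines. This is a minimal illustration, not how any particular agent is implemented: `run_task`, `CheckResult`, `LoopOutcome`, and the injected `search`, `edit`, and `run_checks` callables are all hypothetical names, and the stubs in the usage lines stand in for real tools.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class CheckResult:
    """Outcome of running tests and linting (hypothetical shape)."""
    passed: bool
    log: str = ""


@dataclass
class LoopOutcome:
    success: bool
    iterations: int
    queries: list[str] = field(default_factory=list)


def run_task(
    task: str,
    search: Callable[[str], list[str]],
    edit: Callable[[str, list[str]], None],
    run_checks: Callable[[], CheckResult],
    max_iterations: int = 3,
) -> LoopOutcome:
    """Search, edit, and verify until the checks pass or the budget runs out."""
    query = task
    queries: list[str] = []
    for iteration in range(1, max_iterations + 1):
        queries.append(query)
        context = search(query)      # code and document search
        edit(task, context)          # edit based on the retrieved context
        result = run_checks()        # run tests and linting
        if result.passed:
            return LoopOutcome(True, iteration, queries)
        query = result.log           # on failure, the log drives the next search
    return LoopOutcome(False, max_iterations, queries)


# Usage with stubs: the first check fails, the second passes.
results = iter([
    CheckResult(False, "NameError: name 'total' is not defined"),
    CheckResult(True),
])
outcome = run_task(
    "Fix the invoice total",
    search=lambda query: [f"hit for: {query}"],
    edit=lambda task, context: None,
    run_checks=lambda: next(results),
)
print(outcome)
```

Passing the tools in as callables keeps the control flow separate from any specific search backend or test runner.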
Chunk design: cut by structure, not by line count
The quality of Code RAG depends heavily on chunk design.
A poor approach is simple fixed-length splitting.
```text
Lines 1–80
Lines 81–160
Lines 161–240
```

This risks cutting in the middle of a function or class.
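A small experiment makes the risk concrete. The sketch below splits a made-up two-function module every six lines and checks whether each chunk still parses on its own. `SOURCE`, `split_fixed`, and `is_parseable` are illustrative names, and the chunk size is shrunk so the effect shows up in a short sample.

```python
import ast

# A made-up module used only for this demonstration.
SOURCE = '''\
def load(path):
    with open(path) as f:
        return f.read()

def parse(text):
    rows = []
    for line in text.splitlines():
        rows.append(line.split(","))
    return rows
'''


def split_fixed(source: str, lines_per_chunk: int) -> list[str]:
    """Split source into chunks of a fixed number of lines, ignoring syntax."""
    lines = source.splitlines()
    return [
        "\n".join(lines[i:i + lines_per_chunk])
        for i in range(0, len(lines), lines_per_chunk)
    ]


def is_parseable(chunk: str) -> bool:
    """Return True if the chunk is valid Python when read in isolation."""
    try:
        ast.parse(chunk)
    except SyntaxError:
        return False
    return True


chunks = split_fixed(SOURCE, lines_per_chunk=6)
for number, chunk in enumerate(chunks, start=1):
    print(f"chunk {number}: parseable={is_parseable(chunk)}")
```

The second chunk begins inside the body of `parse`, so it carries the loop and the `return` but not the signature that gives them meaning.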
A better approach uses syntactic structure.
| Chunk unit | Suited for |
|---|---|
| Function | Understanding a specific operation |
| Class | Understanding state and responsibilities |
| Type definition | Understanding an API contract |
| File | Understanding an entire module |
| Directory | Understanding a subsystem |
| Call graph | Analysing the scope of impact |
Research such as cAST (2025) demonstrates the importance of using ASTs to create semantically coherent chunks. In code, what matters is not character count but structural integrity.[2]
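As a rough illustration of structure-aware chunking, the sketch below uses Python's standard `ast` module to emit one chunk per top-level function or class. It is a simplified per-definition splitter, not the cAST algorithm; `chunk_by_definition` and `SAMPLE` are names made up for this example.

```python
import ast

# A made-up module used only for this demonstration.
SAMPLE = '''\
import os

def load(path):
    with open(path) as f:
        return f.read()

class Parser:
    def parse(self, text):
        return text.split(",")
'''


def chunk_by_definition(source: str) -> list[dict]:
    """Return one chunk per top-level function or class, with its line span."""
    lines = source.splitlines()
    chunks = []
    for node in ast.parse(source).body:
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            continue
        # Start at the first decorator, if any, so the chunk stays complete.
        start = min([node.lineno] + [d.lineno for d in node.decorator_list])
        end = node.end_lineno
        chunks.append({
            "name": node.name,
            "kind": type(node).__name__,
            "start_line": start,
            "end_line": end,
            "text": "\n".join(lines[start - 1:end]),
        })
    return chunks


for chunk in chunk_by_definition(SAMPLE):
    print(chunk["kind"], chunk["name"], chunk["start_line"], chunk["end_line"])
```

Keeping the line span next to the text lets a retriever cite the exact location later. Module-level statements such as imports are skipped here, and a fuller splitter would also need a rule for definitions that are too large for one chunk.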
Search methods: vector search alone is not enough
Code RAG combines multiple retrieval methods.
| Search method | Example | Strength |
|---|---|---|
| Text search | rg "functionName" | Exact matches for names, strings, and errors |
| Symbol search | Definition and reference lookup | Tracing relationships between functions and types |
| Vector search | Searching for similar logic | Intent-based and paraphrase-tolerant queries |
| AST search | Syntactic pattern matching | Discovering specific structural forms |
| Execution result search | Test logs, error messages | Identifying failure causes |
| History search | Git logs, PRs | Understanding reasons for changes |
In practice, coding agents typically start with fast string search to locate initial leads, then move to semantic search or file reading as needed.
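That staged behaviour can be sketched as follows. The example is an assumption-laden toy: `CORPUS` is an in-memory stand-in for a repository, and `fuzzy_search` uses token overlap as a cheap placeholder for vector search rather than real embeddings. All function names are hypothetical.

```python
import re

# In-memory stand-in for a repository (path -> file contents).
CORPUS = {
    "billing/invoice.py": "def compute_invoice_total(items):\n    return sum(i.price for i in items)\n",
    "billing/tax.py": "def apply_tax(amount, rate):\n    return amount * (1 + rate)\n",
}

# Splits snake_case and camelCase identifiers into word-like tokens.
TOKEN_RE = re.compile(r"[A-Z]?[a-z0-9]+|[A-Z]+(?![a-z])")


def exact_search(corpus: dict[str, str], needle: str) -> list[tuple[str, int, str]]:
    """Return (path, line number, line) for every line containing the needle."""
    hits = []
    for path, text in sorted(corpus.items()):
        for line_no, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((path, line_no, line.strip()))
    return hits


def tokenize(text: str) -> set[str]:
    return {token.lower() for token in TOKEN_RE.findall(text)}


def fuzzy_search(corpus: dict[str, str], query: str, top_k: int = 3) -> list[str]:
    """Rank files by Jaccard token overlap (a placeholder for vector search)."""
    query_tokens = tokenize(query)
    scored = []
    for path, text in corpus.items():
        doc_tokens = tokenize(text)
        overlap = len(query_tokens & doc_tokens)
        if overlap:
            scored.append((overlap / len(query_tokens | doc_tokens), path))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [path for _score, path in scored[:top_k]]


def staged_search(corpus: dict[str, str], query: str) -> dict:
    """Try exact matching first and fall back to fuzzy ranking only on a miss."""
    exact = exact_search(corpus, query)
    if exact:
        return {"method": "exact", "hits": exact}
    return {"method": "fuzzy", "hits": fuzzy_search(corpus, query)}


print(staged_search(CORPUS, "apply_tax"))
print(staged_search(CORPUS, "total price of invoice items"))
```

An identifier query is answered by the exact stage, while a descriptive query has no literal match and falls through to the ranked fallback.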
The relationship between coding agents and RAG
Coding agents use RAG as an internal component, but its role goes beyond simple retrieval.
| Agent action | Relationship to RAG |
|---|---|
| Understand the task | Read relevant documentation, issues, and existing implementations |
| Locate the change site | Code search, symbol search, dependency search |
| Decide the implementation approach | Reference similar implementations, design patterns, and tests |
| Edit | Modify code based on retrieved context |
| Test | Use execution results as the next piece of context |
| Fix | Search and read error logs to inform re-editing |
| Explain | Cite evidence files and verification results |
In other words, RAG is at the core of a coding agent’s ability to read.
Why repository-level tasks are harder
Generating a small function and modifying a repository are entirely different problems.
Repository-level tasks present the following challenges.
- Changes span multiple files.
- Modifications must conform to the existing design.
- Tests, linting, and type checking are all involved.
- The work requires editing existing code rather than generating new code.
- Misjudging the scope of impact causes regressions.
- Project-specific rules must be followed.
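To make "scope of impact" concrete, the sketch below builds a reverse call graph for a single made-up module and walks it to find every function that depends on a changed one. It only follows direct calls by plain name within one file, which is a strong simplifying assumption; `build_callers`, `impacted_by`, and `MODULE` are illustrative names.

```python
import ast
from collections import defaultdict, deque

# A made-up module used only for this demonstration.
MODULE = '''\
def apply_tax(amount):
    return amount * 1.1

def invoice_total(items):
    return apply_tax(sum(items))

def print_invoice(items):
    print(invoice_total(items))

def unrelated():
    return 42
'''


def build_callers(source: str) -> dict[str, set[str]]:
    """Map each called name to the top-level functions that call it."""
    callers: dict[str, set[str]] = defaultdict(set)
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            for inner in ast.walk(node):
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    callers[inner.func.id].add(node.name)
    return callers


def impacted_by(changed: str, callers: dict[str, set[str]]) -> set[str]:
    """Return every function that directly or transitively calls `changed`."""
    seen: set[str] = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for caller in callers.get(current, ()):
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return seen


print(sorted(impacted_by("apply_tax", build_callers(MODULE))))
```

A change to `apply_tax` reaches `print_invoice` only through `invoice_total`, which is the kind of indirect dependency that is easy to miss when reading a single file.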
Benchmarks such as SWE-PolyBench have emerged because the evaluation of coding agents has shifted from “solve an isolated code problem” to “make a change in a real repository.”[3]
The importance of agent context files
In recent coding agents, context files such as AGENTS.md have become important. These files convey repository-specific work rules, prohibited actions, verification commands, and design principles to the agent.[5][6]
From a Code RAG perspective, such a file is a high-priority document that should always be retrieved.
Typical content includes the following.
- Do not run the production build command without approval.
- Treat Japanese as the source of truth.
- Do not modify the existing UI.
- Which commands to use for verification.
- The boundary between generated files and hand-edited files.
If an agent searches only the code and ignores this file, it may produce changes that are technically correct but violate project conventions.
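One way to guarantee that such files are never skipped is to pin them ahead of the ordinary search results. The sketch below assumes that a context file applies to the directory subtree it lives in; that scoping rule, the `CONTEXT_FILE_NAMES` set, and the function names are assumptions made for illustration, not a documented behaviour of any specific agent.

```python
from pathlib import PurePosixPath

# File names treated as always-retrieved context (assumed for this sketch).
CONTEXT_FILE_NAMES = {"AGENTS.md"}


def applicable_context_files(target: str, repo_files: list[str]) -> set[str]:
    """Return context files located in any ancestor directory of the target."""
    target_dirs = PurePosixPath(target).parents
    return {
        path for path in repo_files
        if PurePosixPath(path).name in CONTEXT_FILE_NAMES
        and PurePosixPath(path).parent in target_dirs
    }


def build_context(retrieved: list[str], repo_files: list[str]) -> list[str]:
    """Place applicable context files first, then the retrieved files."""
    pinned: set[str] = set()
    for target in retrieved:
        pinned |= applicable_context_files(target, repo_files)
    # Shallowest first, so repository-wide rules precede package-level ones.
    ordered = sorted(pinned, key=lambda p: (len(PurePosixPath(p).parts), p))
    for path in retrieved:
        if path not in ordered:
            ordered.append(path)
    return ordered


REPO_FILES = ["AGENTS.md", "src/app.py", "packages/ui/AGENTS.md", "packages/ui/button.tsx"]
print(build_context(["src/app.py"], REPO_FILES))
print(build_context(["packages/ui/button.tsx"], REPO_FILES))
```

Because pinning happens outside the ranking step, a low similarity score can never push the rules out of the context window.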
Evaluating Code RAG
Code RAG cannot be evaluated on answer quality alone. Whether the code actually works is what matters.
| Evaluation dimension | What to check |
|---|---|
| Retrieval Recall | Were the necessary files, functions, and tests found? |
| Context Precision | Did the agent read excessive amounts of unneeded code? |
| Edit Correctness | Does the change satisfy the requirement? |
| Regression Safety | Were any existing behaviours broken? |
| Test Success | Do tests, linting, and type checking pass? |
| Style Consistency | Does the edit match the conventions of the existing codebase? |
| Minimality | Were unrelated changes avoided? |
| Traceability | Can the reason for the change and its supporting evidence be explained? |
In natural-language RAG the goal is a “correct answer.” In Code RAG the goal is a “correct, verifiable change.”
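The first two rows of the table can be measured with simple set arithmetic once a task has a known set of files that needed to be read. The definitions below are plain set-based versions chosen for this sketch; evaluation frameworks may define context precision differently (for example, rank-aware variants). The file names are invented.

```python
def retrieval_recall(retrieved: set[str], needed: set[str]) -> float:
    """Fraction of the needed files that were actually retrieved."""
    if not needed:
        return 1.0
    return len(retrieved & needed) / len(needed)


def context_precision(retrieved: set[str], needed: set[str]) -> float:
    """Fraction of the retrieved files that were actually needed."""
    if not retrieved:
        return 0.0
    return len(retrieved & needed) / len(retrieved)


# Invented example: three files mattered, the agent read four.
needed = {"billing/tax.py", "billing/invoice.py", "tests/test_tax.py"}
retrieved = {"billing/tax.py", "tests/test_tax.py", "README.md", "setup.cfg"}

print(f"recall={retrieval_recall(retrieved, needed):.2f}")
print(f"precision={context_precision(retrieved, needed):.2f}")
```

Here the agent found two of the three needed files and half of what it read was noise. The remaining dimensions in the table, such as regression safety and test success, need the test suite itself rather than set comparisons.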
Practical Code RAG design
1. Strengthen string search first
In code, function names, type names, error messages, and test names are powerful leads. Exact-text search often outperforms semantic search in the first retrieval pass.
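Before searching, those leads have to be pulled out of the task description. The sketch below is one heuristic way to do it with regular expressions: quoted phrases are treated as likely error messages, and snake_case, camelCase, and PascalCase words as likely identifiers. The patterns and the `search_terms` name are assumptions for illustration and will miss some naming styles.

```python
import re

# snake_case, camelCase, or PascalCase words with at least two parts.
IDENT_RE = re.compile(
    r"\b(?:[a-z]+(?:_[a-z0-9]+)+|[a-z]+(?:[A-Z][a-z0-9]+)+|(?:[A-Z][a-z0-9]+){2,})\b"
)
# Text inside double quotes or backticks.
QUOTED_RE = re.compile(r'"([^"]+)"|`([^`]+)`')


def search_terms(task: str) -> list[str]:
    """Extract exact-match search terms from a natural-language task."""
    quoted = [double or tick for double, tick in QUOTED_RE.findall(task)]
    identifiers = IDENT_RE.findall(task)
    terms: list[str] = []
    for term in quoted + identifiers:
        if term not in terms:  # keep first occurrence, preserve order
            terms.append(term)
    return terms


task = 'Fix the crash in parseUserInput when UserProfile is empty; the log says "invalid profile state".'
print(search_terms(task))
```

Each extracted term can then be passed to a text search tool as a literal string, which is cheap enough to run for every term before any semantic search is attempted.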
2. Index structural information
Storing the following metadata alongside file contents improves retrieval precision.
- Symbol names
- Definition locations
- Reference locations
- Import and export relationships
- Test targets
- Module dependencies
- Owning directory
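Several of these fields can be extracted with a single pass over the syntax tree. The sketch below covers symbol names, definition locations, reference locations, and imports for one Python file using the standard `ast` module. `FileIndex` and `index_file` are hypothetical names, and a real index would also need cross-file resolution, which is omitted here.

```python
import ast
from dataclasses import dataclass, field


@dataclass
class FileIndex:
    """Structural metadata for one file (hypothetical record layout)."""
    path: str
    definitions: dict[str, int] = field(default_factory=dict)       # name -> line
    imports: list[str] = field(default_factory=list)                # module names
    references: dict[str, list[int]] = field(default_factory=dict)  # name -> lines


def index_file(path: str, source: str) -> FileIndex:
    index = FileIndex(path=path)
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            index.definitions[node.name] = node.lineno
        elif isinstance(node, ast.Import):
            index.imports.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            index.imports.append(node.module or ".")
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load):
            index.references.setdefault(node.id, []).append(node.lineno)
    for lines in index.references.values():
        lines.sort()
    return index


# A made-up module used only for this demonstration.
SAMPLE = '''\
import json
from pathlib import Path

def load(path):
    return json.loads(Path(path).read_text())

def main():
    print(load("config.json"))
'''

idx = index_file("app/config.py", SAMPLE)
print(idx.definitions, idx.imports, idx.references["load"])
```

Storing line numbers with each symbol lets a retriever jump from a name straight to its definition or its call sites instead of re-scanning the file.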
3. Treat tests as context
Tests are part of the specification. To confirm expected behaviour, retrieve the tests related to the implementation under consideration as well as the implementation itself.
4. Use execution results as the input for the next retrieval
A test failure log becomes the next retrieval query.
```text
Failure log → search error name → read related test → fix → re-run tests
```

This loop pairs naturally with an Agentic RAG design.
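The first step of that loop, turning a log into search queries, can be sketched as below. The example assumes the log is a standard Python traceback; the regular expressions, the `queries_from_failure` name, and the sample log are all illustrative, and other languages or test runners would need their own patterns.

```python
import re

# Matches traceback frames such as: File "pkg/mod.py", line 2, in func
FRAME_RE = re.compile(r'File "([^"]+)", line (\d+), in (\w+)')
# Matches the final "SomeError: message" line of a traceback.
ERROR_RE = re.compile(r"^([A-Za-z_][\w.]*(?:Error|Exception)): (.*)$", re.MULTILINE)


def queries_from_failure(log: str) -> list[str]:
    """Derive search queries from a traceback, most specific leads first."""
    candidates: list[str] = []
    errors = ERROR_RE.findall(log)
    if errors:
        name, message = errors[-1]
        candidates.append(name)
        candidates.extend(re.findall(r"'([^']+)'", message))  # quoted names in the message
    # Innermost frame first: it is usually closest to the fault.
    for path, _line, function in reversed(FRAME_RE.findall(log)):
        candidates.extend([function, path])
    queries: list[str] = []
    for candidate in candidates:
        if candidate not in queries:
            queries.append(candidate)
    return queries


# Invented failure log used only for this demonstration.
LOG = """\
Traceback (most recent call last):
  File "tests/test_tax.py", line 4, in test_apply_tax_adds_rate
    assert apply_tax(100, 0.5) == 150
  File "billing/tax.py", line 2, in apply_tax
    return amount * (1 + rat)
NameError: name 'rat' is not defined
"""

print(queries_from_failure(LOG))
```

Ordering the queries from the error name outward means the cheapest, most specific searches run before broader file reads.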
5. Separate read permissions from write permissions
The scope the agent can read and the scope it can write should be treated as distinct concerns.
| Operation | Example |
|---|---|
| Read | Source, tests, documentation, configuration |
| Write | Designated implementation files, tests |
| Requires approval | Production builds, destructive operations, external communication |
| Prohibited | Secrets, credentials, large-scale reformatting of unrelated code |
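The table can be expressed as a small policy check that runs before every tool call. The sketch below is one possible encoding: the glob patterns, the command patterns, and the `decide` function are all hypothetical examples, and a real deployment would enforce the same rules in the sandbox rather than rely on the agent's own checks.

```python
from fnmatch import fnmatchcase

# Example patterns only; every repository would define its own.
PROHIBITED_PATHS = [".env", "*.pem", "secrets/*"]
WRITABLE_PATHS = ["src/*", "tests/*"]
APPROVAL_COMMANDS = ["npm run build*", "rm *", "curl *"]


def matches_any(value: str, patterns: list[str]) -> bool:
    return any(fnmatchcase(value, pattern) for pattern in patterns)


def decide(operation: str, target: str) -> str:
    """Return 'allow', 'needs_approval', or 'deny' for a requested operation."""
    if operation in ("read", "write") and matches_any(target, PROHIBITED_PATHS):
        return "deny"  # secrets and credentials are off limits either way
    if operation == "read":
        return "allow"
    if operation == "write":
        return "allow" if matches_any(target, WRITABLE_PATHS) else "deny"
    if operation == "run":
        return "needs_approval" if matches_any(target, APPROVAL_COMMANDS) else "allow"
    raise ValueError(f"unknown operation: {operation}")


for request in [
    ("read", "docs/design.md"),
    ("write", "src/app.py"),
    ("write", "docs/design.md"),
    ("read", ".env"),
    ("run", "npm run build"),
]:
    print(request, "->", decide(*request))
```

Checking the prohibited list before anything else keeps a broad read rule from accidentally exposing credentials.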
How coding agents are changing RAG
With the emergence of coding agents, RAG is expanding from “evidence retrieval for answering questions” to “context management for performing work.”
The following directions are becoming important for Code RAG going forward.
- Structural search using ASTs and type information
- Integration with tests, execution logs, and coverage
- Persistent retrieval of repository-specific rules
- Persistent memory built from the history of past modifications
- Agents that evaluate before-and-after diffs
- Workflows that retrieve and incorporate code review comments
Summary
- Code RAG retrieves code, tests, configuration, history, and agent instructions to support development work.
- In code, structured chunking by function, class, AST node, or symbol is more important than fixed-length chunking.
- Coding agents use RAG in a loop of retrieval, reading, editing, testing, and re-fixing.
- Evaluation must cover not only retrieval precision but also whether the resulting change works and whether it breaks existing behaviour.
References
- [1] CodeRAG-Bench: Can Retrieval Augment Code Generation?
- [2] cAST: Enhancing Code Retrieval-Augmented Generation with Structural Chunking via Abstract Syntax Tree
- [3] SWE-PolyBench: A multi-language benchmark for repository level evaluation of coding agents
- [4] Retrieval-Augmented Code Generation: A Survey with Focus on Repository-Level Approaches
- [5] Agent READMEs: An Empirical Study of Context Files for Agentic Coding
- [6] Introducing Codex