Code RAG and Coding Agents

Code RAG is a form of RAG that retrieves source code, type definitions, tests, configuration files, issues, pull-request history, and design notes rather than only natural-language documents, then uses that context to generate or modify code. With the rise of coding agents, RAG has expanded beyond “internal document search” into a “context acquisition technology for understanding and modifying a repository.”[1][4]

Why conventional RAG does not transfer directly to code

In natural-language RAG, documents are commonly split by character count or paragraph boundaries. That approach alone is insufficient for code.

Code has a structure that differs fundamentally from natural language. A repository is organized around the following elements.

Functions
Classes
Types
Imports and exports
Call relationships
Tests
Configuration files
Generated files
Build scripts
Modification history

When a chunk boundary falls in the middle of a function, the chunk loses part of what defines that function: its return value, exception handling, types, or dependencies. Conversely, grouping unrelated functions into one chunk can mislead the LLM with irrelevant code.

Information that Code RAG handles

The retrieval targets in Code RAG extend well beyond source files.

Information	Use
Source code	Understanding the implementation, identifying where to change
Type definitions	Understanding API inputs, outputs, and constraints
Tests	Expected behaviour, regression confirmation
README / docs	Design intent, usage instructions
Configuration files	Build settings, linting, routing, environment differences
Issues / PRs	Past discussions, reasons for changes
Commit history	Why a particular implementation decision was made
Execution logs	Failure causes, environment-specific issues
Agent instruction files	Repository-specific work rules

Coding agents combine these sources to decide what to read, what to change, and how to verify changes.

Basic flow of Code RAG

graph TD
    Task["Natural-language development task"] --> Plan["Investigation plan"]
    Plan --> Search["Code & document search"]
    Search --> Read["Read relevant files"]
    Read --> Edit["Edit code"]
    Edit --> Test["Run tests & linting"]
    Test --> Observe["Review results"]
    Observe -->|Failure| Search
    Observe -->|Success| Summary["Describe changes"]

Where conventional RAG ends at generating an answer, Code RAG extends through editing, execution, and verification.
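
The loop in the diagram can be sketched as follows. This is a minimal illustration under stated assumptions, not a real agent: `run_task` and the three callables (`search`, `edit`, `run_tests`) are hypothetical names, and the stubs below only simulate a test suite that fails once before passing.

```python
from typing import Callable


def run_task(
    task: str,
    search: Callable[[str], list[str]],
    edit: Callable[[str, list[str]], None],
    run_tests: Callable[[], tuple[bool, str]],
    max_rounds: int = 3,
) -> dict:
    """Search, edit, and test in a loop; a failure log becomes the next query."""
    query = task
    history = []
    for round_no in range(1, max_rounds + 1):
        files = search(query)      # retrieve candidate files for the current query
        edit(task, files)          # modify code using the retrieved context
        ok, log = run_tests()      # verify the change
        history.append((round_no, query, ok))
        if ok:
            return {"success": True, "rounds": round_no, "history": history}
        query = log                # feed the failure output back into retrieval
    return {"success": False, "rounds": max_rounds, "history": history}


# Hypothetical stubs: the simulated tests pass only after the second edit.
attempts = {"count": 0}


def fake_search(query: str) -> list[str]:
    return ["app/users.py"]


def fake_edit(task: str, files: list[str]) -> None:
    attempts["count"] += 1


def fake_run_tests() -> tuple[bool, str]:
    if attempts["count"] < 2:
        return False, "ValueError: negative id"
    return True, ""


result = run_task("Reject negative user ids", fake_search, fake_edit, fake_run_tests)
print(result["success"], result["rounds"])
```

Bounding the loop with `max_rounds` is a design choice in this sketch: it keeps a persistent failure from turning into endless re-editing.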

Chunk design: cut by structure, not by line count

The quality of Code RAG depends heavily on chunk design.

A poor approach is simple fixed-length splitting.

Lines 1–80
Lines 81–160
Lines 161–240

This risks cutting in the middle of a function or class.

A better approach uses syntactic structure.

Chunk unit	Suited for
Function	Understanding a specific operation
Class	Understanding state and responsibilities
Type definition	Understanding an API contract
File	Understanding an entire module
Directory	Understanding a subsystem
Call graph	Analysing the scope of impact

Research such as cAST (2025) demonstrates the importance of using ASTs to create semantically coherent chunks. In code, what matters is not character count but structural integrity.[2]
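
As a minimal sketch of structure-based chunking, the following uses Python's standard `ast` module to cut a source file at top-level function and class boundaries. It is not the cAST algorithm, only an illustration of the idea for Python source; the `Chunk` type and the `chunk_by_structure` name are hypothetical.

```python
import ast
from dataclasses import dataclass


@dataclass
class Chunk:
    name: str        # symbol name, e.g. "load_user"
    kind: str        # "function" or "class"
    start_line: int  # 1-indexed, inclusive
    end_line: int    # 1-indexed, inclusive
    text: str


def chunk_by_structure(source: str) -> list[Chunk]:
    """Split Python source into one chunk per top-level function or class."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            kind = "function"
        elif isinstance(node, ast.ClassDef):
            kind = "class"
        else:
            continue  # imports and module-level statements are skipped in this sketch
        # Start at the first decorator, if any, so the chunk stays complete.
        start = min([node.lineno] + [d.lineno for d in node.decorator_list])
        end = node.end_lineno
        text = "\n".join(lines[start - 1:end])
        chunks.append(Chunk(node.name, kind, start, end, text))
    return chunks


SAMPLE = '''import os

def load_user(user_id):
    if user_id < 0:
        raise ValueError("negative id")
    return {"id": user_id}

class UserCache:
    def __init__(self):
        self.items = {}

    def get(self, user_id):
        return self.items.get(user_id)
'''

for chunk in chunk_by_structure(SAMPLE):
    print(chunk.kind, chunk.name, chunk.start_line, chunk.end_line)
```

Each chunk keeps a whole function or class together, so the exception handling and return statement of `load_user` never end up in different chunks.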

Search methods: vector search alone is not enough

Code RAG combines multiple retrieval methods.

Search method	Example	Strength
Text search	`rg "functionName"`	Exact matches for names, strings, and errors
Symbol search	Definition and reference lookup	Tracing relationships between functions and types
Vector search	Searching for similar logic	Intent-based and paraphrase-tolerant queries
AST search	Syntactic pattern matching	Discovering specific structural forms
Execution result search	Test logs, error messages	Identifying failure causes
History search	Git logs, PRs	Understanding reasons for changes

In practice, coding agents typically start with fast string search to locate initial leads, then move to semantic search or file reading as needed.
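
The "exact match first, broader search second" pattern might look like the sketch below. The token-overlap ranking is only a stand-in for a real vector search, and the function names and in-memory `FILES` data are assumptions made for illustration.

```python
import re


def exact_search(files: dict[str, str], needle: str) -> list[tuple[str, int, str]]:
    """Return (path, line_number, line) for every line containing needle."""
    hits = []
    for path, text in files.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((path, lineno, line.strip()))
    return hits


def tokens(text: str) -> set[str]:
    """Lowercased word tokens; snake_case and camelCase names are split apart."""
    return {w.lower() for w in re.findall(r"[A-Za-z][a-z]*|[0-9]+", text)}


def fuzzy_search(files: dict[str, str], query: str, top_k: int = 3) -> list[str]:
    """Rank files by token overlap with the query (stand-in for vector search)."""
    query_tokens = tokens(query)
    scored = []
    for path, text in files.items():
        overlap = len(query_tokens & tokens(text))
        if overlap:
            scored.append((overlap, path))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [path for _, path in scored[:top_k]]


def find_leads(files: dict[str, str], query: str) -> list[str]:
    """Try exact text search first; fall back to fuzzy ranking when it finds nothing."""
    hits = exact_search(files, query)
    if hits:
        return sorted({path for path, _, _ in hits})
    return fuzzy_search(files, query)


# Hypothetical in-memory repository.
FILES = {
    "app/config.py": "def parse_config(path):\n    return open(path).read()\n",
    "app/server.py": "def start_server(port):\n    print('listening', port)\n",
}

print(find_leads(FILES, "parse_config"))          # identifier: exact match
print(find_leads(FILES, "read the config file"))  # paraphrase: fuzzy fallback
```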

The relationship between coding agents and RAG

Coding agents use RAG as an internal component, but its role goes beyond simple retrieval.

Agent action	Relationship to RAG
Understand the task	Read relevant documentation, issues, and existing implementations
Locate the change site	Code search, symbol search, dependency search
Decide the implementation approach	Reference similar implementations, design patterns, and tests
Edit	Modify code based on retrieved context
Test	Use execution results as the next piece of context
Fix	Search and read error logs to inform re-editing
Explain	Cite evidence files and verification results

In other words, RAG is at the core of a coding agent’s ability to read.

Why repository-level tasks are harder

Generating a small function and modifying a repository are entirely different problems.

Repository-level tasks present the following challenges.

Changes span multiple files.
Modifications must conform to the existing design.
Tests, linting, and type checking are all involved.
The work requires editing existing code rather than generating new code.
Misjudging the scope of impact causes regressions.
Project-specific rules must be followed.

Benchmarks such as SWE-PolyBench have emerged because the evaluation of coding agents has shifted from “solve an isolated code problem” to “make a change in a real repository.”[3]

The importance of agent context files

In recent coding agents, context files such as AGENTS.md have become important. These files convey repository-specific work rules, prohibited actions, verification commands, and design principles to the agent.[5][6]

From a Code RAG perspective, such files are high-priority documents that should always be retrieved.

Typical content includes the following.

Do not run the production build command without approval.
Treat Japanese as the source of truth.
Do not modify the existing UI.
Which commands to use for verification.
The boundary between generated files and hand-edited files.

If an agent searches only the code and ignores this file, it may produce changes that are technically correct but violate project conventions.
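
One way to make sure the context file is never dropped is to pin it ahead of whatever retrieval returns. The sketch below assumes a simple character budget; `build_context`, `ALWAYS_INCLUDE`, and the sample file contents are hypothetical.

```python
ALWAYS_INCLUDE = ("AGENTS.md",)  # pinned files, retrieved regardless of the query


def build_context(files: dict[str, str], retrieved: list[str], budget_chars: int = 4000) -> list[str]:
    """Order context so pinned files come first, then retrieved files within budget."""
    ordered = [path for path in ALWAYS_INCLUDE if path in files]
    ordered += [path for path in retrieved if path in files and path not in ordered]
    selected, used = [], 0
    for path in ordered:
        size = len(files[path])
        # Pinned files are kept even if they exceed the budget on their own.
        if path not in ALWAYS_INCLUDE and used + size > budget_chars:
            continue
        selected.append(path)
        used += size
    return selected


# Hypothetical repository contents.
FILES = {
    "AGENTS.md": "Do not modify the existing UI.",
    "src/users.py": "x" * 3000,
    "src/db.py": "y" * 3000,
}

print(build_context(FILES, ["src/users.py", "src/db.py"]))
```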

Evaluating Code RAG

Code RAG cannot be evaluated on answer quality alone. Whether the code actually works is what matters.

Evaluation dimension	What to check
Retrieval Recall	Were the necessary files, functions, and tests found?
Context Precision	Did the agent read excessive amounts of unneeded code?
Edit Correctness	Does the change satisfy the requirement?
Regression Safety	Were any existing behaviours broken?
Test Success	Do tests, linting, and type checking pass?
Style Consistency	Does the edit match the conventions of the existing codebase?
Minimality	Were unrelated changes avoided?
Traceability	Can the reason for the change and its supporting evidence be explained?

In natural-language RAG the goal is a “correct answer.” In Code RAG the goal is a “correct, verifiable change.”
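
The first two rows of the table can be computed from sets of file paths. This is a simplified, set-based reading of "recall" and "precision" assumed here for illustration; evaluation frameworks may define these metrics differently, and the function names are hypothetical.

```python
def retrieval_recall(retrieved: set[str], needed: set[str]) -> float:
    """Fraction of the needed files that the agent actually retrieved."""
    if not needed:
        return 1.0
    return len(retrieved & needed) / len(needed)


def context_precision(retrieved: set[str], needed: set[str]) -> float:
    """Fraction of the retrieved files that were actually needed."""
    if not retrieved:
        return 0.0
    return len(retrieved & needed) / len(retrieved)


# Hypothetical example: the agent found everything it needed,
# but half of what it read was unrelated to the change.
needed = {"app/users.py", "tests/test_users.py"}
retrieved = {"app/users.py", "tests/test_users.py", "app/server.py", "README.md"}

print(retrieval_recall(retrieved, needed))
print(context_precision(retrieved, needed))
```

These two numbers cover only the retrieval side. The remaining rows, such as Test Success and Regression Safety, still require running the tests against the edited code.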

Practical Code RAG design

1. Strengthen string search first

In code, function names, type names, error messages, and test names are powerful leads. Exact-text search often outperforms semantic search in the first retrieval pass.

2. Index structural information

Storing the following metadata alongside file contents improves retrieval precision.

Symbol names
Definition locations
Reference locations
Import and export relationships
Test targets
Module dependencies
Owning directory
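
For Python files, part of this metadata can be collected with the standard `ast` module. The following is a minimal sketch: the `index_module` name and the shape of the returned dictionary are assumptions, and a real index would also resolve references across files.

```python
import ast


def index_module(path: str, source: str) -> dict:
    """Collect symbol definitions, imports, and called names for one file."""
    tree = ast.parse(source)
    definitions = {}  # symbol name -> line where it is defined
    imports = set()   # imported module names
    calls = set()     # names of functions or methods called in the file
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            definitions[node.name] = node.lineno
        elif isinstance(node, ast.Import):
            imports.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            imports.add(node.module)
        elif isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Name):
                calls.add(func.id)
            elif isinstance(func, ast.Attribute):
                calls.add(func.attr)
    return {
        "path": path,
        "directory": path.rsplit("/", 1)[0] if "/" in path else "",
        "definitions": definitions,
        "imports": sorted(imports),
        "calls": sorted(calls),
    }


# Hypothetical source file.
SOURCE = '''import json
from app.db import fetch_row

def load_user(user_id):
    row = fetch_row("users", user_id)
    return json.loads(row)
'''

entry = index_module("app/users.py", SOURCE)
print(entry["definitions"], entry["imports"], entry["calls"])
```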

3. Treat tests as context

Tests are part of the specification. To confirm expected behaviour, retrieve the tests related to the implementation under consideration, not just the implementation itself.
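
A simple way to find related tests is a naming convention. The sketch below assumes the common `test_<module>.py` and `<module>_test.py` patterns; a given repository may follow a different convention, and `related_tests` is a hypothetical helper.

```python
from pathlib import PurePosixPath


def related_tests(impl_path: str, all_paths: list[str]) -> list[str]:
    """Find test files whose name matches the implementation file's module name."""
    stem = PurePosixPath(impl_path).stem
    candidates = {f"test_{stem}.py", f"{stem}_test.py"}
    return sorted(p for p in all_paths if PurePosixPath(p).name in candidates)


# Hypothetical repository layout.
PATHS = [
    "app/users.py",
    "app/server.py",
    "tests/test_users.py",
    "tests/test_server.py",
]

print(related_tests("app/users.py", PATHS))
```

Name matching misses tests that exercise a module indirectly. Import relationships or coverage data would catch more of them, at the cost of a heavier index.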

4. Use execution results as the input for the next retrieval

A test failure log becomes the next retrieval query.

Failure log → search error name → read related test → fix → re-run tests

This loop pairs naturally with an Agentic RAG design.
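
The first step of that loop, turning a failure log into retrieval queries, can be sketched for standard Python tracebacks. The `queries_from_failure` name and the sample log are hypothetical, and other languages or test runners would need different patterns.

```python
import re

# Matches frames in a standard Python traceback: File "path", line N, in func
FRAME_RE = re.compile(r'File "(?P<path>[^"]+)", line (?P<line>\d+), in (?P<func>[\w<>]+)')
# Matches the final "SomeError: message" line of a traceback.
ERROR_RE = re.compile(
    r"^(?P<name>[A-Za-z_][\w.]*(?:Error|Exception)): (?P<msg>.*)$", re.MULTILINE
)


def queries_from_failure(log: str) -> dict:
    """Extract the error name, files, and symbols to use as the next retrieval queries."""
    frames = [(m["path"], int(m["line"]), m["func"]) for m in FRAME_RE.finditer(log)]
    error = ERROR_RE.search(log)
    return {
        "error_name": error["name"] if error else None,
        "error_message": error["msg"] if error else None,
        # dict.fromkeys removes duplicates while keeping first-seen order
        "files_to_read": list(dict.fromkeys(path for path, _, _ in frames)),
        "symbols_to_search": list(dict.fromkeys(func for _, _, func in frames)),
    }


# Hypothetical failure log.
LOG = '''Traceback (most recent call last):
  File "tests/test_users.py", line 12, in test_load_user
    load_user(-1)
  File "app/users.py", line 5, in load_user
    raise ValueError("negative id")
ValueError: negative id
'''

print(queries_from_failure(LOG))
```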

5. Separate read permissions from write permissions

The scope the agent can read and the scope it can write should be treated as distinct concerns.

Operation	Example
Read	Source, tests, documentation, configuration
Write	Designated implementation files, tests
Requires approval	Production builds, destructive operations, external communication
Prohibited	Secrets, credentials, large-scale reformatting of unrelated code
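
A table like this can be expressed as an ordered rule list that is checked before every operation. The sketch below uses glob patterns from the standard `fnmatch` module; the rule set, the path and command patterns, and the `decide` function are all hypothetical examples, not a recommended policy.

```python
from fnmatch import fnmatchcase

# (decision, operation, glob pattern); the first matching rule wins.
# Operation "*" applies the rule to read, write, and run alike.
RULES = [
    ("prohibited", "*", ".env*"),
    ("prohibited", "*", "secrets/*"),
    ("needs_approval", "run", "deploy*"),
    ("allow", "read", "*"),
    ("allow", "write", "src/*"),
    ("allow", "write", "tests/*"),
    ("allow", "run", "pytest*"),
]


def decide(operation: str, target: str) -> str:
    """Return 'allow', 'needs_approval', or 'prohibited' for an operation on a target."""
    for decision, rule_op, pattern in RULES:
        if rule_op in ("*", operation) and fnmatchcase(target, pattern):
            return decision
    # Anything not covered by a rule is escalated to a human in this sketch.
    return "needs_approval"


print(decide("read", "src/users.py"))
print(decide("write", "src/users.py"))
print(decide("write", "README.md"))
print(decide("read", ".env.local"))
print(decide("run", "deploy production"))
```

Putting the prohibition rules first means a broad `allow` rule such as reading `*` can never expose secrets by accident.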

How coding agents are changing RAG

With the emergence of coding agents, RAG is expanding from “evidence retrieval for answering questions” to “context management for performing work.”

The following directions are becoming important for Code RAG going forward.

Structural search using ASTs and type information
Integration with tests, execution logs, and coverage
Persistent retrieval of repository-specific rules
Long-term memory built from past modification records
Agents that evaluate before-and-after diffs
Workflows that retrieve and incorporate code review comments

Summary

Code RAG retrieves code, tests, configuration, history, and agent instructions to support development work.
In code, structured chunking by function, class, AST node, or symbol is more important than fixed-length chunking.
Coding agents use RAG in a loop of retrieval, reading, editing, testing, and re-fixing.
Evaluation must cover not only retrieval recall and precision but also whether the resulting change works and whether it breaks existing behaviour.

References

The Future of RAG

What is Agentic RAG?