The Future of RAG

About 10 minutes

RAG is not disappearing — it is changing shape. The simple “run a vector search and pass results to an LLM” configuration is shrinking in relevance, while RAG is evolving into an information-access foundation that combines long-context models, agents, permission management, code execution, evaluation, and knowledge graphs.

The misconception that “RAG will become unnecessary”

As LLMs capable of handling longer contexts become available, it is tempting to think “just put everything in the context window and RAG is no longer needed.”

But the purpose of RAG is not simply to save context length.

RAG serves the following roles.

Selecting the needed information from a large document collection
Controlling which information is shown based on permissions
Managing document freshness and versioning
Enabling source citation
Preventing unnecessary confidential content from being passed to the LLM
Evaluating retrieval and generation quality
Conducting investigation across multiple tools and data sources

Long-context models provide the ability to read a lot at once. RAG provides the ability to select what should be read and manage it as citable evidence. The two are complementary, not competing.

Directions of evolution

RAG is expanding in six directions.

Direction	What changes
Long-context RAG	Retrieval narrows the candidates; selected documents are read at length
Agentic RAG	An agent performs retrieval planning, re-retrieval, and verification
Graph RAG	Document corpora are structured as entities, relationships, and summaries
Multimodal RAG	PDFs, tables, images, audio, and video are all retrieval targets
Code RAG	Repositories, tests, execution logs, and history are retrieval targets
Secure RAG	Permissions, auditing, and data minimisation are treated as core design

1. Long-context RAG

Future RAG will not merely pass a handful of short chunks to the LLM; instead, retrieval will narrow the candidates, and relevant documents will then be read in full within a long context window.

graph LR
    Q["Question"] --> R["Candidate retrieval"]
    R --> S["Document selection"]
    S --> LC["Deep reading in long context"]
    LC --> A["Answer"]

In this design, RAG’s role shifts from “keeping context short” to “selecting the documents worth reading.”

Suited for use cases such as these.

Reading a specific contract clause together with its surrounding context
Comparing multiple research papers
Reading related sections from a lengthy design document
Reading related files across a repository in one pass

2. Agentic RAG

For complex questions, retrieval cannot be completed in a single pass — it must unfold as an investigation.

In Agentic RAG, the agent proceeds as follows.

Decompose the question.
Select the required sources.
Retrieve.
Read the results.
Change the retrieval strategy if evidence is insufficient.
Check for contradictions.
Answer with citations.

Research such as the 2025 Agentic RAG survey and the 2026 A-RAG paper demonstrates that RAG is moving away from fixed pipelines toward a design that leverages the LLM’s reasoning and tool-use capabilities.[1][2]

3. Graph RAG

When a user asks “what is happening overall?” across a large corpus, chunk retrieval alone is insufficient.

Graph RAG extracts entities, relationships, communities, and summaries from documents, treating the entire corpus as a structured object.[3]

Going forward, it will become important for use cases such as these.

Theme analysis across all customer inquiries
Visualising the knowledge network within an organisation
Mapping relationships among research papers
Organising related clauses in legal and regulatory documents
Analysing the relationship between incident reports and system configuration

Graph RAG carries high construction costs, however, so it is not necessary for every RAG system. It delivers its value in domains where corpus-level and relationship questions dominate.

4. Multimodal RAG

Business documents are not limited to text.

PDFs, tables, diagrams, screenshots, audio, and video all contain information that plain text extraction alone will lose.[4]

Data type	Information easily lost
PDF	Page structure, footnotes, tables, multi-column layout
Table	Row-column relationships, units, formulas
Image	Meaning of figures, spatial arrangement, annotations
Audio	Speaker identity, pauses, emphasis
Video	Timestamps, on-screen actions, scene transitions

Future RAG must not only search OCR-extracted text but also preserve the original structure and present evidence in correspondence with that structure.

5. Code RAG

With the rise of coding agents, RAG is also becoming central to software development.[5]

Code RAG retrieves and reads the following.

Source code
Type definitions
Tests
Configuration
Execution logs
Issues
Pull requests
Commit history
Agent instruction files

The future of Code RAG is not mere code search but “RAG that edits and verifies.”

graph TD
    Search["Retrieve relevant code"] --> Edit["Edit"]
    Edit --> Test["Run tests"]
    Test --> Log["Read logs"]
    Log --> Search

Execution results become the next piece of context, and the agent iterates between retrieval and modification.

6. Secure RAG

The more widely RAG is deployed in enterprises, the more important permissions and auditing become — sometimes surpassing raw retrieval precision in priority.

The following design elements will become standard in future RAG systems.

Design element	Description
Permission inheritance	Enforce the source system’s access controls at retrieval time
Data minimisation	Pass only the necessary evidence to the LLM
Citation audit	Preserve the mapping between answers and their supporting evidence
Index freshness	Reflect updates, deletions, and expirations
Prompt injection mitigation	Never treat retrieved documents as commands
Log control	Avoid over-retaining sensitive information

A design that puts everything into a vector database without addressing permissions and updates creates real risk. Going forward, architectures that query the original data source at retrieval time — preserving source-level permissions — will become more common.

RAG becomes the agent’s memory

For an agent, RAG is external memory.

This is not simply long-term storage, however.

Memory that consults authoritative documents
Memory that consults past work history
Memory controlled by the user’s permissions
Memory that carries verifiable citations
Memory that can be updated and deleted

When a human does their job, they do not memorise everything — they locate the relevant materials, read them, take notes, and cite their sources. RAG provides agents with that same working capability.

Challenges that will remain

Even as RAG evolves, certain challenges persist.

Challenge	Why it is difficult
Retrieval failure	Without the needed evidence, correct answers are impossible
Contradictory evidence	Multiple sources may contain conflicting information
Evaluation difficulty	Real-world questions often have more than one valid answer
Cost	Retrieval, reranking, long-context reading, and verification all add expense
Latency	Agentic designs naturally involve multiple steps
Permissions	Different users should see different information
Freshness	A stale index leads to incorrect answers

Evaluation in particular will remain critical. RAG improvements should be measured not by intuition but by retrieval recall, evidence faithfulness, citation accuracy, and task success rate in actual workflows.

Practical conclusions

For anyone designing RAG systems from this point forward, the following framing is realistic.

For simple FAQ, Advanced RAG is sufficient.
For reading long documents, combine Long-context RAG.
For corpus-level trend questions, consider Graph RAG.
For complex investigation, use Agentic RAG.
For software development, design it as Code RAG.
For enterprise deployment, build in permissions, auditing, and evaluation from the start.

Summary

RAG is not disappearing; it is evolving into a foundation for information access, permission management, verification, and agent action.
Long-context models are not a replacement for RAG but a complement: they read deeply the documents that RAG selects.
Agentic RAG and Code RAG expand RAG from “answer generation” to “task execution.”
What matters in future RAG is not only retrieval precision but also evidence quality, permissions, freshness, evaluation, and auditing.

References

Embeddings & Vector Representations

Code RAG and Coding Agents