Skip to content
LinkedInX

The Future of RAG

About 10 minutes

Prerequisites: What is RAG and What is Agentic RAG?

RAG is not disappearing — it is changing shape. The simple “run a vector search and pass results to an LLM” configuration is shrinking in relevance, while RAG is evolving into an information-access foundation that combines long-context models, agents, permission management, code execution, evaluation, and knowledge graphs.

The misconception that “RAG will become unnecessary”

Section titled “The misconception that “RAG will become unnecessary””

As LLMs capable of handling longer contexts become available, it is tempting to think “just put everything in the context window and RAG is no longer needed.”

But the purpose of RAG is not simply to save context length.

RAG serves the following roles.

  • Selecting the needed information from a large document collection
  • Controlling which information is shown based on permissions
  • Managing document freshness and versioning
  • Enabling source citation
  • Preventing unnecessary confidential content from being passed to the LLM
  • Evaluating retrieval and generation quality
  • Conducting investigation across multiple tools and data sources

Long-context models provide the ability to read a lot at once. RAG provides the ability to select what should be read and manage it as citable evidence. The two are complementary, not competing.

RAG is expanding in six directions.

DirectionWhat changes
Long-context RAGRetrieval narrows the candidates; selected documents are read at length
Agentic RAGAn agent performs retrieval planning, re-retrieval, and verification
Graph RAGDocument corpora are structured as entities, relationships, and summaries
Multimodal RAGPDFs, tables, images, audio, and video are all retrieval targets
Code RAGRepositories, tests, execution logs, and history are retrieval targets
Secure RAGPermissions, auditing, and data minimisation are treated as core design

Future RAG will not merely pass a handful of short chunks to the LLM; instead, retrieval will narrow the candidates, and relevant documents will then be read in full within a long context window.

graph LR
    Q["Question"] --> R["Candidate retrieval"]
    R --> S["Document selection"]
    S --> LC["Deep reading in long context"]
    LC --> A["Answer"]

In this design, RAG’s role shifts from “keeping context short” to “selecting the documents worth reading.”

Suited for use cases such as these.

  • Reading a specific contract clause together with its surrounding context
  • Comparing multiple research papers
  • Reading related sections from a lengthy design document
  • Reading related files across a repository in one pass

For complex questions, retrieval cannot be completed in a single pass — it must unfold as an investigation.

In Agentic RAG, the agent proceeds as follows.

  1. Decompose the question.
  2. Select the required sources.
  3. Retrieve.
  4. Read the results.
  5. Change the retrieval strategy if evidence is insufficient.
  6. Check for contradictions.
  7. Answer with citations.

Research such as the 2025 Agentic RAG survey and the 2026 A-RAG paper demonstrates that RAG is moving away from fixed pipelines toward a design that leverages the LLM’s reasoning and tool-use capabilities.[1][2]

When a user asks “what is happening overall?” across a large corpus, chunk retrieval alone is insufficient.

Graph RAG extracts entities, relationships, communities, and summaries from documents, treating the entire corpus as a structured object.[3]

Going forward, it will become important for use cases such as these.

  • Theme analysis across all customer inquiries
  • Visualising the knowledge network within an organisation
  • Mapping relationships among research papers
  • Organising related clauses in legal and regulatory documents
  • Analysing the relationship between incident reports and system configuration

Graph RAG carries high construction costs, however, so it is not necessary for every RAG system. It delivers its value in domains where corpus-level and relationship questions dominate.

Business documents are not limited to text.

PDFs, tables, diagrams, screenshots, audio, and video all contain information that plain text extraction alone will lose.[4]

Data typeInformation easily lost
PDFPage structure, footnotes, tables, multi-column layout
TableRow-column relationships, units, formulas
ImageMeaning of figures, spatial arrangement, annotations
AudioSpeaker identity, pauses, emphasis
VideoTimestamps, on-screen actions, scene transitions

Future RAG must not only search OCR-extracted text but also preserve the original structure and present evidence in correspondence with that structure.

With the rise of coding agents, RAG is also becoming central to software development.[5]

Code RAG retrieves and reads the following.

  • Source code
  • Type definitions
  • Tests
  • Configuration
  • Execution logs
  • Issues
  • Pull requests
  • Commit history
  • Agent instruction files

The future of Code RAG is not mere code search but “RAG that edits and verifies.”

graph TD
    Search["Retrieve relevant code"] --> Edit["Edit"]
    Edit --> Test["Run tests"]
    Test --> Log["Read logs"]
    Log --> Search

Execution results become the next piece of context, and the agent iterates between retrieval and modification.

The more widely RAG is deployed in enterprises, the more important permissions and auditing become — sometimes surpassing raw retrieval precision in priority.

The following design elements will become standard in future RAG systems.

Design elementDescription
Permission inheritanceEnforce the source system’s access controls at retrieval time
Data minimisationPass only the necessary evidence to the LLM
Citation auditPreserve the mapping between answers and their supporting evidence
Index freshnessReflect updates, deletions, and expirations
Prompt injection mitigationNever treat retrieved documents as commands
Log controlAvoid over-retaining sensitive information

A design that puts everything into a vector database without addressing permissions and updates creates real risk. Going forward, architectures that query the original data source at retrieval time — preserving source-level permissions — will become more common.

For an agent, RAG is external memory.

This is not simply long-term storage, however.

  • Memory that consults authoritative documents
  • Memory that consults past work history
  • Memory controlled by the user’s permissions
  • Memory that carries verifiable citations
  • Memory that can be updated and deleted

When a human does their job, they do not memorise everything — they locate the relevant materials, read them, take notes, and cite their sources. RAG provides agents with that same working capability.

Even as RAG evolves, certain challenges persist.

ChallengeWhy it is difficult
Retrieval failureWithout the needed evidence, correct answers are impossible
Contradictory evidenceMultiple sources may contain conflicting information
Evaluation difficultyReal-world questions often have more than one valid answer
CostRetrieval, reranking, long-context reading, and verification all add expense
LatencyAgentic designs naturally involve multiple steps
PermissionsDifferent users should see different information
FreshnessA stale index leads to incorrect answers

Evaluation in particular will remain critical. RAG improvements should be measured not by intuition but by retrieval recall, evidence faithfulness, citation accuracy, and task success rate in actual workflows.

For anyone designing RAG systems from this point forward, the following framing is realistic.

  1. For simple FAQ, Advanced RAG is sufficient.
  2. For reading long documents, combine Long-context RAG.
  3. For corpus-level trend questions, consider Graph RAG.
  4. For complex investigation, use Agentic RAG.
  5. For software development, design it as Code RAG.
  6. For enterprise deployment, build in permissions, auditing, and evaluation from the start.
  • RAG is not disappearing; it is evolving into a foundation for information access, permission management, verification, and agent action.
  • Long-context models are not a replacement for RAG but a complement: they read deeply the documents that RAG selects.
  • Agentic RAG and Code RAG expand RAG from “answer generation” to “task execution.”
  • What matters in future RAG is not only retrieval precision but also evidence quality, permissions, freshness, evaluation, and auditing.
  1. Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG
  2. A-RAG: Scaling Agentic Retrieval-Augmented Generation via Hierarchical Retrieval Interfaces
  3. From Local to Global: A Graph RAG Approach to Query-Focused Summarization
  4. MMA-RAG: A Survey on Multimodal Agentic Retrieval-Augmented Generation
  5. SWE-PolyBench: A multi-language benchmark for repository level evaluation of coding agents