RAG Architecture Patterns


RAG is not simply “run a vector search and hand the results to an LLM.” The right architecture depends on document volume, question complexity, permission requirements, update frequency, and how strictly sources must be cited.[1]

Overview

RAG architectures can be broadly grouped by complexity and by the kind of content they retrieve.

Pattern	Characteristics	Suited for
Naive RAG	One retrieval pass, answer directly from results	Small-scale FAQ, prototypes
Advanced RAG	Pre- and post-retrieval correction steps	Internal document search, support bots
Modular RAG	Retrieval, compression, re-retrieval, and verification as interchangeable components	Production LLM applications
Graph RAG	Extract entities, relationships, and summaries from documents	Overall trends, relationships, organisational knowledge
Multimodal RAG	Retrieval targets include non-text content	PDFs, diagrams, images, video, audio
Agentic RAG	An agent plans retrieval actions	Complex investigation, code understanding, multi-step workflows

This page covers all patterns except Agentic RAG, which is explained in detail on a separate page.

Naive RAG

Naive RAG is the most basic form of RAG.

graph LR
    Q["Question"] --> E["Embed question"]
    E --> V["Vector DB search"]
    V --> C["Top-k chunks"]
    C --> P["Insert into prompt"]
    P --> L["LLM answer"]
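The flow above can be sketched in a few lines of Python. This is a minimal illustration, not a production retriever. A toy bag-of-words counter stands in for a real embedding model, an in-memory list stands in for the vector DB, and the final LLM call is omitted so the function returns the prompt it would send. The names `embed`, `cosine` and `naive_rag_prompt` are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words term-count vector."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def naive_rag_prompt(question: str, chunks: list[str], k: int = 2) -> str:
    q_vec = embed(question)
    # "Vector DB search": score every chunk and keep the top-k by similarity
    ranked = sorted(chunks, key=lambda c: cosine(q_vec, embed(c)), reverse=True)
    top_k = ranked[:k]
    # "Insert into prompt": chunks are passed on unchecked, with no correction step
    context = "\n".join(f"- {c}" for c in top_k)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

Note that nothing checks whether the top-k chunks are actually relevant; that gap is what the weaknesses below describe.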

Strengths

Simple to implement
Easy to prototype
Works well enough for small, well-structured document sets
Good for learning the fundamentals of RAG

Weaknesses

Poor retrieval queries lead to poor answers
Product codes, proper nouns, and error messages are often missed by embedding-based search
Top-k results can include irrelevant documents
When retrieval returns too little evidence, the results are still passed to the LLM without correction
Document permissions and freshness are difficult to handle

Naive RAG is a valid first step, but its limits become apparent quickly in production.

Advanced RAG

Advanced RAG adds correction steps before and after retrieval.

graph TD
    Q["Question"] --> QR["Query rewriting"]
    QR --> H["Hybrid search"]
    H --> R["Reranking"]
    R --> CC["Context compression"]
    CC --> G["Grounded generation"]
    G --> A["Answer with citations"]

Pre-retrieval improvements

Pre-retrieval steps make the user’s question more retrievable.

Improvement	Description
Query rewriting	Use conversation history to restore omitted subjects
Multi-query generation	Generate multiple search terms from one question
HyDE	Generate a hypothetical answer and use it as the retrieval query
Metadata inference	Infer filter conditions such as language, product, date, or department
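Query rewriting, multi-query generation and HyDE usually require an LLM call, but metadata inference can be illustrated with simple rules. The sketch below assumes a hypothetical product list (`KNOWN_PRODUCTS`) and infers product, year and a rough language filter from the question, then checks a chunk's metadata against those filters. A real system might ask an LLM to produce the same filter dictionary instead.

```python
import re

# Hypothetical product catalogue; a real system would load this from its metadata store.
KNOWN_PRODUCTS = {"atlas", "nimbus"}
YEAR_PATTERN = re.compile(r"\b(19|20)\d{2}\b")
# Hiragana, katakana and CJK ideographs, used as a very rough "Japanese" signal.
JAPANESE_CHARS = re.compile(r"[\u3040-\u30ff\u4e00-\u9fff]")


def infer_filters(question: str) -> dict:
    """Infer metadata filter conditions from a question (rule-based sketch)."""
    filters = {}
    words = set(re.findall(r"\w+", question.lower()))
    products = sorted(words & KNOWN_PRODUCTS)
    if products:
        filters["product"] = products
    year = YEAR_PATTERN.search(question)
    if year:
        filters["year"] = int(year.group(0))
    filters["language"] = "ja" if JAPANESE_CHARS.search(question) else "en"
    return filters


def matches(metadata: dict, filters: dict) -> bool:
    """Return True if a chunk's metadata satisfies every inferred filter."""
    if "product" in filters and metadata.get("product") not in filters["product"]:
        return False
    if "year" in filters and metadata.get("year") != filters["year"]:
        return False
    return metadata.get("language", "en") == filters["language"]
```

For example, "What changed in Nimbus in 2023?" yields `{"product": ["nimbus"], "year": 2023, "language": "en"}`, which narrows the search before any similarity scoring happens.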

In-retrieval improvements

During retrieval, different search methods are combined.

Search method	Strong at	Weak at
Vector search	Semantic similarity, paraphrasing	Exact string matching
Keyword search	Product codes, proper nouns, error messages	Semantic paraphrasing
Hybrid search	Combines the strengths of both	Requires score fusion or calibration
Metadata-filtered search	Permissions, dates, language, category	Requires well-maintained metadata

In practice, combining vector search with a keyword method such as BM25 is common.
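A minimal sketch of hybrid search follows. It assumes the vector-search ranking already exists (in practice it would come from an embedding index). BM25 is implemented directly with the common defaults k1 = 1.5 and b = 0.75. The two rankings are then merged with reciprocal rank fusion (RRF), one widely used fusion method that uses ranks only and therefore avoids calibrating BM25 scores against cosine similarities. The function names are illustrative.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())


def bm25_rank(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[int]:
    """Return document indices ordered by Okapi BM25 score, highest first."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    avgdl = sum(len(t) for t in tokenized) / n
    df = Counter(term for toks in tokenized for term in set(toks))  # document frequency
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = 1 - b + b * len(toks) / avgdl
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return sorted(range(n), key=lambda i: scores[i], reverse=True)


def reciprocal_rank_fusion(rankings: list[list[int]], k: int = 60) -> list[int]:
    """Fuse several rankings by summing 1 / (k + rank); raw scores are never compared."""
    fused = Counter()
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return [doc_id for doc_id, _ in fused.most_common()]
```

Usage: `reciprocal_rank_fusion([bm25_rank(query, docs), vector_ranking])`, where `vector_ranking` is a hypothetical list of document indices from the embedding search. k = 60 is a commonly used default; larger values flatten the difference between top and lower ranks.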

Post-retrieval improvements

Post-retrieval steps refine the evidence passed to the LLM.

Improvement	Description
Reranking	Re-order retrieval candidates by relevance to the question
Deduplication	Remove chunks that contain duplicate content
Context compression	Strip irrelevant sentences, keeping only the evidence
Evidence evaluation	Assess whether the retrieved results can actually answer the question

Passing unfiltered retrieval results exposes the LLM to noise that can distract it from the relevant evidence. In RAG, passing the right evidence matters more than passing more evidence.
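Two of the post-retrieval steps can be sketched without any model. The code below is a simplified illustration, not a reranker. `deduplicate` removes only exact duplicates after normalising case and punctuation (near-duplicate detection is not covered), and `compress` uses a lexical-overlap heuristic as a stand-in for model-based context compression. The small `STOPWORDS` set is a hypothetical placeholder.

```python
import re

# Hypothetical minimal stopword list; real systems use a proper list per language.
STOPWORDS = {"the", "a", "an", "is", "are", "to", "of", "how", "do", "i", "what", "in"}


def normalize(text: str) -> str:
    """Lowercase and strip punctuation so trivially different copies compare equal."""
    return " ".join(re.findall(r"\w+", text.lower()))


def deduplicate(chunks: list[str]) -> list[str]:
    """Drop chunks whose normalized text has already been seen (exact duplicates only)."""
    seen, unique = set(), []
    for chunk in chunks:
        key = normalize(chunk)
        if key not in seen:
            seen.add(key)
            unique.append(chunk)
    return unique


def compress(chunk: str, question: str) -> str:
    """Keep only the sentences that share at least one content word with the question."""
    q_terms = set(normalize(question).split()) - STOPWORDS
    sentences = re.split(r"(?<=[.!?])\s+", chunk.strip())
    kept = [s for s in sentences if q_terms & set(normalize(s).split())]
    return " ".join(kept)
```

For the question "How do refunds work?", `compress("Shipping is free. Refunds need a receipt.", ...)` keeps only "Refunds need a receipt.", so the off-topic shipping sentence never reaches the LLM.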

Modular RAG

Modular RAG treats RAG not as a fixed pipeline but as a set of interchangeable components.

graph TD
    Q["Question"] --> Router["Router"]
    Router --> SearchA["Product docs search"]
    Router --> SearchB["FAQ search"]
    Router --> SearchC["Ticket search"]
    SearchA --> Merge["Candidate merging"]
    SearchB --> Merge
    SearchC --> Merge
    Merge --> Eval["Evidence evaluation"]
    Eval -->|Sufficient| Gen["Answer generation"]
    Eval -->|Insufficient| Retry["Re-retrieve or ask for clarification"]

Modular RAG separates the pipeline into components such as the following.

Query rewriter
Source router
Retriever
Reranker
Context compressor
Evidence evaluator
Generator
Citation formatter
Guardrail

This design is advantageous in production: when quality degrades, it is easier to isolate whether the problem lies in the retriever, the reranker, or the generation prompt.
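A minimal sketch of the modular idea, including the router and evidence-evaluation branch from the diagram, is shown below. Each retriever only has to satisfy a small `typing.Protocol`. The router, evidence evaluator and generator are plain callables, so any of them can be swapped independently. A `trace` list records each stage's output, which is what makes it possible to see which component failed. All class and field names are hypothetical, and `generate` would be an LLM call in practice.

```python
from dataclasses import dataclass, field
from typing import Callable, Protocol


class Retriever(Protocol):
    def retrieve(self, query: str) -> list[str]: ...


@dataclass
class KeywordRetriever:
    """Hypothetical in-memory retriever over one source (product docs, FAQ, tickets)."""
    docs: list[str]

    def retrieve(self, query: str) -> list[str]:
        terms = set(query.lower().split())
        return [d for d in self.docs if terms & set(d.lower().split())]


@dataclass
class ModularRAG:
    retrievers: dict[str, Retriever]
    route: Callable[[str], list[str]]           # router: question -> source names
    is_sufficient: Callable[[list[str]], bool]  # evidence evaluator
    generate: Callable[[str, list[str]], str]   # generator
    trace: list[tuple[str, object]] = field(default_factory=list)

    def answer(self, question: str) -> str:
        sources = self.route(question)
        self.trace.append(("router", sources))
        candidates: list[str] = []
        for name in sources:
            hits = self.retrievers[name].retrieve(question)
            self.trace.append((f"retriever:{name}", hits))
            for hit in hits:  # candidate merging, skipping duplicates
                if hit not in candidates:
                    candidates.append(hit)
        if not self.is_sufficient(candidates):
            self.trace.append(("evaluator", "insufficient"))
            return "I could not find enough evidence. Could you clarify the question?"
        return self.generate(question, candidates)
```

Replacing `KeywordRetriever` with a hybrid retriever, or `is_sufficient` with a model-based check, changes nothing else in the pipeline; inspecting `trace` after a bad answer shows whether routing, retrieval or evaluation went wrong.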

Graph RAG

Graph RAG treats documents not as a flat collection of chunks but as a network of entities and relationships.[3]

graph TD
    D["Document corpus"] --> EX["Entity & relationship extraction"]
    EX --> KG["Knowledge graph"]
    KG --> CS["Community summaries"]
    Q["Question"] --> RET["Relevant node & summary retrieval"]
    CS --> RET
    KG --> RET
    RET --> A["Answer covering the whole corpus"]
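The graph-building half of the diagram can be sketched as follows, under heavy simplifying assumptions. The triples are hard-coded stand-ins for the output of an LLM-based entity and relationship extraction step. Communities are approximated by connected components, whereas systems such as Microsoft's GraphRAG use a community-detection algorithm (Leiden). The "summary" is a template rather than LLM-written text. All names are hypothetical.

```python
from collections import defaultdict

# Hypothetical output of the entity & relationship extraction step.
TRIPLES = [
    ("Finance dept", "owns", "Budget risk"),
    ("Budget risk", "affects", "Project Atlas"),
    ("Support team", "handles", "Login complaints"),
    ("Login complaints", "concern", "Mobile app"),
]


def build_graph(triples):
    """Undirected adjacency sets: the knowledge graph, minus relationship labels."""
    adj = defaultdict(set)
    for head, _, tail in triples:
        adj[head].add(tail)
        adj[tail].add(head)
    return adj


def communities(adj):
    """Group nodes into connected components (a crude stand-in for Leiden clustering)."""
    seen, groups = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(adj[node] - group)
        seen |= group
        groups.append(group)
    return groups


def summarise(group, triples):
    """Template 'community summary'; a real system would have an LLM write this."""
    return "; ".join(f"{h} {r} {t}" for h, r, t in triples if h in group)
```

At query time, a question about risks would retrieve the summary "Finance dept owns Budget risk; Budget risk affects Project Atlas" instead of individual chunks, which is how Graph RAG answers questions that span many documents.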

Graph RAG is well suited to questions such as the following.

What are the main themes in this document corpus?
Which product areas do customer requests concentrate on?
Which departments are associated with which risks?
Across multiple sets of meeting notes, what is the decision-making flow?

Conventional RAG excels at finding the specific passage that contains an answer, while Graph RAG excels at reading the structure of an entire document corpus.[3]

Caveats

Graph RAG is powerful but carries higher upfront construction costs.

Entity extraction quality must be actively managed.
A mechanism for keeping the graph up to date is required.
Incorrect relationship extraction can degrade answer quality.
It is often over-engineered for simple FAQ use cases.

Rather than adopting Graph RAG from the start, confirm that corpus-level and relationship questions genuinely dominate the use case before committing to the investment.

Multimodal RAG

Multimodal RAG extends retrieval to content beyond plain text.

Content type	Examples	Design considerations
PDF	Contracts, papers, specifications	Preserve layout, tables, and footnotes
Tables	CSV, spreadsheets	Preserve row-column relationships and units
Images	Diagrams, screenshots	Combine OCR with image understanding
Audio	Meeting recordings, calls	Handle transcription and speaker attribution
Video	Lectures, screen recordings	Requires timestamp-based and scene-level retrieval

In practice, extracting plain text from a PDF is often insufficient. When table column relationships, figure captions, page numbers, or heading hierarchies are lost, the meaning of retrieved evidence changes.[4]
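One way to keep table structure intact is to turn each row into a self-describing chunk. The sketch below assumes a PDF parser (not shown) has already produced the caption, headers, rows and page number. It repeats the caption and column headers, which carry the units, in every row chunk, and keeps the page number as metadata for citation. `Chunk` and `table_to_chunks` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    metadata: dict


def table_to_chunks(caption: str, headers: list[str], rows: list[list[str]], page: int) -> list[Chunk]:
    """Turn each table row into a chunk that still says which column each value belongs to."""
    chunks = []
    for row in rows:
        # Pair every cell with its header so "500" stays attached to "Storage (GB)".
        cells = "; ".join(f"{h}: {v}" for h, v in zip(headers, row))
        chunks.append(Chunk(text=f"{caption} | {cells}", metadata={"page": page, "type": "table"}))
    return chunks
```

For a hypothetical plan table, the second row becomes "Table 3: Plan limits | Plan: Pro; Storage (GB): 500; Price (USD/month): 30", which remains interpretable even when retrieved on its own.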

Choosing the right pattern

There is no need to start with a complex RAG system. Expanding incrementally in the following order makes it easier to isolate failure causes.

Build the minimum viable system with Naive RAG.
If retrieval failures are frequent, add hybrid search.
If noise is a problem, add a reranker.
If documents are long, add hierarchical chunking or context compression.[2]
If corpus-level questions are common, consider Graph RAG.[3]
If multi-step investigation is needed, consider Agentic RAG.
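Hierarchical chunking, mentioned in the steps above, is often implemented as parent-child retrieval: small chunks are matched precisely, and the larger section containing the match is returned as context. The sketch below assumes sections are separated by blank lines, treats single sentences as child chunks, and uses word overlap instead of embeddings. All function names are hypothetical.

```python
import re


def split_sections(document: str) -> list[str]:
    """Parent chunks: sections separated by blank lines (an assumption about the format)."""
    return [s.strip() for s in document.split("\n\n") if s.strip()]


def build_index(document: str) -> list[tuple[str, int]]:
    """Child chunks (single sentences), each remembering the index of its parent section."""
    index = []
    for parent_id, section in enumerate(split_sections(document)):
        for sentence in re.split(r"(?<=[.!?])\s+", section):
            index.append((sentence, parent_id))
    return index


def retrieve_parent(question: str, document: str) -> str:
    """Match the question against small child chunks, then return the whole parent."""
    parents = split_sections(document)
    q_terms = set(re.findall(r"\w+", question.lower()))
    _, best_parent = max(
        build_index(document),
        key=lambda item: len(q_terms & set(re.findall(r"\w+", item[0].lower()))),
    )
    return parents[best_parent]
```

The child chunk gives a precise match, while the parent section gives the LLM enough surrounding context to answer.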

Minimum recommended production architecture

For a business deployment, it is advisable to include the following elements from the start.

Element	Reason
Metadata	To filter by permission, date, language, and document type
Hybrid search	To handle both proper nouns and semantic queries
Reranking	To improve the quality of evidence passed to the LLM
Citations	To let users verify answers
Refusal on retrieval failure	To avoid generating answers without evidence
Evaluation set	To measure whether improvements actually work
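Three of these elements (permission metadata, refusal on retrieval failure and citations) can be combined in one small gate before generation. The sketch below assumes each retrieved hit carries a relevance score and a set of permitted groups. `Hit`, `MIN_SCORE` and `answer_with_citations` are hypothetical, the threshold would need tuning against the evaluation set, and the LLM call is replaced by a placeholder string.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    doc_id: str
    text: str
    score: float               # relevance from retrieval/reranking, higher is better
    allowed_groups: frozenset  # permission metadata attached at indexing time


MIN_SCORE = 0.5  # hypothetical threshold; tune it against the evaluation set


def answer_with_citations(question: str, hits: list[Hit], user_groups: set) -> str:
    # Metadata: drop anything the user may not see before it can reach the LLM
    visible = [h for h in hits if h.allowed_groups & user_groups]
    evidence = [h for h in visible if h.score >= MIN_SCORE]
    # Refusal on retrieval failure: no qualifying evidence means no generated answer
    if not evidence:
        return "No supporting documents were found, so I cannot answer this question."
    # Citations: number each source so users can verify the answer (LLM call omitted)
    sources = "\n".join(f"[{i}] {h.doc_id}" for i, h in enumerate(evidence, start=1))
    return f"<answer generated from {len(evidence)} source(s)>\n{sources}"
```

Filtering on permissions before generation, rather than after, ensures that restricted text never appears in the prompt at all.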

Summary

Naive RAG is suitable for learning and prototyping but shows weaknesses in production.
Advanced RAG improves production quality through pre-retrieval, in-retrieval, and post-retrieval correction.
Modular RAG makes it easier to isolate failure causes and swap components in production.
Graph RAG is suited to questions about themes and relationships across an entire document corpus.[3]
In Multimodal RAG, preserving the structure of PDFs, tables, and images is critical.[4]

References

What is Agentic RAG?

The History of RAG