RAG is not simply “run a vector search and hand the results to an LLM.” The right architecture depends on document volume, question complexity, permission requirements, update frequency, and how strictly sources must be cited.[1]
Overview
RAG architectures can be broadly grouped by complexity.
| Pattern | Characteristics | Suited for |
|---|---|---|
| Naive RAG | One retrieval pass, answer directly from results | Small-scale FAQ, prototypes |
| Advanced RAG | Pre- and post-retrieval correction steps | Internal document search, support bots |
| Modular RAG | Retrieval, compression, re-retrieval, and verification as interchangeable components | Production LLM applications |
| Graph RAG | Extract entities, relationships, and summaries from documents | Overall trends, relationships, organisational knowledge |
| Multimodal RAG | Retrieval targets include non-text content | PDFs, diagrams, images, video, audio |
| Agentic RAG | An agent plans retrieval actions | Complex investigation, code understanding, multi-step workflows |
This page covers all patterns except Agentic RAG, which is explained in detail on a separate page.
Naive RAG
Naive RAG is the most basic form of RAG.
graph LR
Q["Question"] --> E["Embed question"]
E --> V["Vector DB search"]
V --> C["Top-k chunks"]
C --> P["Insert into prompt"]
P --> L["LLM answer"]
Strengths
- Simple to implement
- Easy to prototype
- Works well enough for small, well-structured document sets
- Good for learning the fundamentals of RAG
Weaknesses
- Poor retrieval queries lead to poor answers
- Product codes, proper nouns, and error messages are often missed
- Top-k results can include irrelevant documents
- Insufficient retrieval results are passed to the LLM without correction
- Document permissions and freshness are difficult to handle
Naive RAG is a valid first step, but its limits become apparent quickly in production.
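The Naive RAG flow described in this section (embed the question, search, take the top-k chunks, insert them into the prompt) can be sketched in a few lines. This is a minimal illustration under loud assumptions: `embed` is a toy bag-of-words stand-in for a real embedding model, the list scan stands in for a vector database, and the final LLM call is left out. All function names are hypothetical.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real system would call an embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """One retrieval pass: rank every chunk against the question, keep the top k."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

def build_prompt(question: str, context: list[str]) -> str:
    """Insert the retrieved chunks into the prompt, with no correction step."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only the context below.\n\nContext:\n{joined}\n\nQuestion: {question}"

chunks = [
    "Password resets are done from the account settings page.",
    "Invoices are issued on the first business day of each month.",
    "The API rate limit is 100 requests per minute.",
]
question = "What is the API rate limit?"
top = retrieve(question, chunks, k=1)
prompt = build_prompt(question, top)  # this string would be sent to the LLM
```

Note that nothing in this flow checks whether the retrieved chunk actually answers the question, which is the weakness the later patterns address.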
Advanced RAG
Advanced RAG adds correction steps before and after retrieval.
graph TD
Q["Question"] --> QR["Query rewriting"]
QR --> H["Hybrid search"]
H --> R["Reranking"]
R --> CC["Context compression"]
CC --> G["Grounded generation"]
G --> A["Answer with citations"]
Pre-retrieval improvements
Pre-retrieval steps rewrite the user’s question into a form that retrieves better.
| Improvement | Description |
|---|---|
| Query rewriting | Use conversation history to restore omitted subjects |
| Multi-query generation | Generate multiple search terms from one question |
| HyDE | Generate a hypothetical answer and use it as the retrieval query |
| Metadata inference | Infer filter conditions such as language, product, date, or department |
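Two rows of the table above, query rewriting and metadata inference, can be illustrated with simple rules. In practice these steps are often delegated to an LLM; the regular expressions here are a deliberately crude stand-in. The product names, the `RetrievalPlan` type, and the function names are all hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical product names; a real system would load these from a catalogue.
KNOWN_PRODUCTS = ("AlphaDesk", "BetaMail")

@dataclass
class RetrievalPlan:
    query: str
    filters: dict[str, str] = field(default_factory=dict)

def rewrite_query(question: str, last_subject: Optional[str]) -> str:
    """Restore an omitted subject: swap a leading pronoun for the subject from history."""
    if last_subject is None:
        return question
    return re.sub(r"^(it|this|that)\b", last_subject, question, count=1, flags=re.IGNORECASE)

def infer_filters(query: str) -> dict[str, str]:
    """Infer metadata filters (product, year) from the query text."""
    filters: dict[str, str] = {}
    for product in KNOWN_PRODUCTS:
        if product.lower() in query.lower():
            filters["product"] = product
    year = re.search(r"\b(20\d{2})\b", query)
    if year:
        filters["year"] = year.group(1)
    return filters

def plan(question: str, last_subject: Optional[str] = None) -> RetrievalPlan:
    """Rewrite first, then infer filters from the rewritten query."""
    query = rewrite_query(question, last_subject)
    return RetrievalPlan(query=query, filters=infer_filters(query))

p = plan("It fails to sync after the 2024 update", last_subject="AlphaDesk")
```

Rewriting before inferring filters matters here: the product filter is only recoverable once the pronoun has been replaced by the subject from the conversation.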
In-retrieval improvements
During retrieval, different search methods are combined.
| Search method | Strong at | Weak at |
|---|---|---|
| Vector search | Semantic similarity, paraphrasing | Exact string matching |
| Keyword search | Product codes, proper nouns, error messages | Semantic paraphrasing |
| Hybrid search | Covering the weaknesses of both methods | Requires score calibration |
| Metadata-filtered search | Permissions, dates, language, category | Requires well-maintained metadata |
In practice, combining vector search with a keyword method such as BM25 is common.
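One common way to merge a vector ranking with a keyword ranking is reciprocal rank fusion (RRF). The text above does not name a fusion method, so RRF is an assumption chosen for illustration: it combines ranks rather than raw scores, which sidesteps part of the score-calibration problem noted in the table. The document IDs are made up, and `k = 60` is a conventional default rather than a tuned value.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document earns 1 / (k + rank) from every list it appears in.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

vector_hits = ["doc_a", "doc_b", "doc_c"]   # hypothetical vector search output
keyword_hits = ["doc_c", "doc_a", "doc_d"]  # hypothetical BM25 output
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

Documents that appear in both lists (`doc_a`, `doc_c`) rise above documents found by only one method.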
Post-retrieval improvements
Post-retrieval steps refine the evidence passed to the LLM.
| Improvement | Description |
|---|---|
| Reranking | Re-order retrieval candidates by relevance to the question |
| Deduplication | Remove chunks that contain duplicate content |
| Context compression | Strip irrelevant sentences, keeping only the evidence |
| Evidence evaluation | Assess whether the retrieved results can actually answer the question |
Passing raw retrieval results to the LLM without filtering lets noise distract it. In RAG, passing the right evidence matters more than passing more evidence.
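The post-retrieval steps can be sketched as a single selection function that orders candidates by relevance, drops near-verbatim duplicates, and discards weak evidence. This assumes each candidate already carries a relevance score in the range 0 to 1 from some reranker; the `Candidate` type, the threshold, and the sample scores are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    text: str
    score: float  # assumed: relevance score from a reranker, in [0, 1]

def normalise(text: str) -> str:
    """Collapse case and whitespace so trivially different copies compare equal."""
    return " ".join(text.lower().split())

def select_evidence(candidates: list[Candidate], min_score: float = 0.5,
                    max_chunks: int = 3) -> list[Candidate]:
    """Keep the best-scoring unique candidates above the relevance threshold."""
    seen: set[str] = set()
    kept: list[Candidate] = []
    for cand in sorted(candidates, key=lambda c: c.score, reverse=True):
        key = normalise(cand.text)
        if cand.score < min_score or key in seen:
            continue  # too weak, or a duplicate of something already kept
        seen.add(key)
        kept.append(cand)
        if len(kept) == max_chunks:
            break
    return kept

candidates = [
    Candidate("Refunds take 5 business days.", 0.91),
    Candidate("refunds take 5  business days.", 0.88),   # duplicate content
    Candidate("Our office is closed on Sundays.", 0.12),  # irrelevant
    Candidate("Refund requests need an order ID.", 0.74),
]
evidence = select_evidence(candidates)
```

An empty result from `select_evidence` is itself a useful signal: it is a simple form of evidence evaluation that tells the caller not to generate an answer.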
Modular RAG
Modular RAG treats RAG not as a fixed pipeline but as a set of interchangeable components.
graph TD
Q["Question"] --> Router["Router"]
Router --> SearchA["Product docs search"]
Router --> SearchB["FAQ search"]
Router --> SearchC["Ticket search"]
SearchA --> Merge["Candidate merging"]
SearchB --> Merge
SearchC --> Merge
Merge --> Eval["Evidence evaluation"]
Eval -->|Sufficient| Gen["Answer generation"]
Eval -->|Insufficient| Retry["Re-retrieve or ask for clarification"]
Modular RAG separates components such as these.
- Query rewriter
- Source router
- Retriever
- Reranker
- Context compressor
- Evidence evaluator
- Generator
- Citation formatter
- Guardrail
This design is advantageous in production because when quality degrades it is easier to isolate whether the problem lies in the retriever, the reranker, or the generation prompt.
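The idea of interchangeable components can be sketched by passing each stage in as a plain callable, so that any one of them can be replaced or tested in isolation. This is a minimal illustration covering only the router, retrievers, evidence evaluator, and generator; the class name, the keyword retriever, and the sample documents are hypothetical, and the lambdas stand in for real models.

```python
from dataclasses import dataclass
from typing import Callable

Retriever = Callable[[str], list[str]]

def keyword_retriever(docs: list[str]) -> Retriever:
    """Build a toy retriever returning docs that share at least one word with the query."""
    def retrieve(query: str) -> list[str]:
        terms = set(query.lower().split())
        return [d for d in docs if terms & set(d.lower().split())]
    return retrieve

@dataclass
class ModularRAG:
    route: Callable[[str], list[str]]           # question -> names of sources to search
    retrievers: dict[str, Retriever]            # source name -> retriever
    evaluate: Callable[[str, list[str]], bool]  # is the evidence sufficient?
    generate: Callable[[str, list[str]], str]   # stand-in for the LLM call

    def answer(self, question: str) -> str:
        # Route, retrieve from each chosen source, then merge the candidates.
        candidates = [c for name in self.route(question)
                      for c in self.retrievers[name](question)]
        if not self.evaluate(question, candidates):
            return "Not enough evidence found. Could you clarify the question?"
        return self.generate(question, candidates)

rag = ModularRAG(
    route=lambda q: ["tickets"] if "ticket" in q.lower() else ["faq"],
    retrievers={
        "faq": keyword_retriever(["reset your password from the settings page"]),
        "tickets": keyword_retriever(["ticket 812: export to csv fails on large files"]),
    },
    evaluate=lambda q, ctx: len(ctx) > 0,
    generate=lambda q, ctx: f"Based on {len(ctx)} source(s): {ctx[0]}",
)
```

Because every stage is a separate argument, swapping the keyword retriever for a vector retriever, or the trivial evaluator for a stricter one, does not touch the rest of the pipeline.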
Graph RAG
Graph RAG treats documents not as a flat collection of chunks but as a network of entities and relationships.[3]
graph TD
D["Document corpus"] --> EX["Entity & relationship extraction"]
EX --> KG["Knowledge graph"]
KG --> CS["Community summaries"]
Q["Question"] --> RET["Relevant node & summary retrieval"]
CS --> RET
KG --> RET
RET --> A["Answer covering the whole corpus"]
Graph RAG is well suited to questions such as these.
- What are the main themes in this document corpus?
- Which product areas do customer requests concentrate on?
- Which departments are associated with which risks?
- Across multiple sets of meeting notes, what is the decision-making flow?
Conventional RAG excels at finding the specific passage that contains an answer, while Graph RAG excels at reading the structure of an entire document corpus.[3]
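To make the contrast concrete, the following sketch builds a tiny graph from already-extracted triples and groups entities into communities. It rests on several assumptions: the triples are invented, the extraction step (usually an LLM) is skipped, and connected components stand in for a real community detection algorithm. The function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical (subject, relation, object) triples, as an extraction step might emit.
triples = [
    ("Finance", "exposed_to", "Currency risk"),
    ("Finance", "owns", "Budget policy"),
    ("Engineering", "exposed_to", "Outage risk"),
    ("Outage risk", "mitigated_by", "Failover runbook"),
]

def build_graph(triples: list[tuple[str, str, str]]) -> dict[str, set[str]]:
    """Undirected adjacency map over entities."""
    graph: dict[str, set[str]] = defaultdict(set)
    for subject, _, obj in triples:
        graph[subject].add(obj)
        graph[obj].add(subject)
    return graph

def communities(graph: dict[str, set[str]]) -> list[set[str]]:
    """Group entities by connected component (a crude stand-in for community detection)."""
    seen: set[str] = set()
    groups: list[set[str]] = []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(graph[node] - group)
        seen |= group
        groups.append(group)
    return groups

def risks_by_department(triples: list[tuple[str, str, str]]) -> dict[str, list[str]]:
    """Answer a relationship question directly from the graph edges."""
    result: dict[str, list[str]] = defaultdict(list)
    for subject, relation, obj in triples:
        if relation == "exposed_to":
            result[subject].append(obj)
    return dict(result)

groups = communities(build_graph(triples))
risk_map = risks_by_department(triples)
```

A question such as "Which departments are associated with which risks?" is answered here by traversing relationships rather than by finding a single passage, and each community is the unit that would be summarised for corpus-level questions.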
Caveats
Graph RAG is powerful but carries higher upfront construction costs.
- Entity extraction quality must be actively managed.
- A mechanism for keeping the graph up to date is required.
- Incorrect relationship extraction can degrade answer quality.
- It is often over-engineered for simple FAQ use cases.
Rather than adopting Graph RAG from the start, first confirm that corpus-level and relationship questions genuinely dominate the use case before committing to the investment.
Multimodal RAG
Multimodal RAG extends retrieval to content beyond plain text.
| Content type | Examples | Design considerations |
|---|---|---|
| PDF | Contracts, papers, specifications | Preserve layout, tables, and footnotes |
| Tables | CSV, spreadsheets | Preserve row-column relationships and units |
| Images | Diagrams, screenshots | Combine OCR with image understanding |
| Audio | Meeting recordings, calls | Handle transcription and speaker attribution |
| Video | Lectures, screen recordings | Timestamp-based and scene-level retrieval is needed |
In practice, extracting plain text from a PDF is often insufficient. When table column relationships, figure captions, page numbers, or heading hierarchies are lost, the meaning of retrieved evidence changes.[4]
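One way to avoid losing this structure is to store it alongside each chunk and render it back into the prompt. The sketch below assumes a parser has already recovered page numbers, heading hierarchy, captions, and table cells; the `Chunk` fields, function names, file name, and sample figures are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Chunk:
    text: str
    source: str
    page: int
    heading_path: list[str] = field(default_factory=list)  # e.g. chapter > section
    caption: Optional[str] = None                          # figure or table caption

def table_to_rows(headers: list[str], rows: list[list[str]]) -> list[str]:
    """Serialise each table row with its column headers so units stay attached to values."""
    return ["; ".join(f"{h}: {v}" for h, v in zip(headers, row)) for row in rows]

def render(chunk: Chunk) -> str:
    """Render a chunk with its structural context for use as evidence in a prompt."""
    lines = [f"[{chunk.source}, p.{chunk.page}] {' > '.join(chunk.heading_path)}"]
    if chunk.caption:
        lines.append(f"Caption: {chunk.caption}")
    lines.append(chunk.text)
    return "\n".join(lines)

row_texts = table_to_rows(
    ["Region", "Revenue (USD m)"],
    [["EMEA", "12.4"], ["APAC", "9.1"]],
)
chunk = Chunk(
    text=row_texts[0],
    source="report.pdf",
    page=14,
    heading_path=["Financials", "Revenue by region"],
    caption="Table 3: Revenue by region",
)
rendered = render(chunk)
```

Without the header and caption, the bare value `12.4` would be retrievable but uninterpretable; with them, the evidence keeps its unit, its table, and a page number that can be cited.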
Choosing the right pattern
There is no need to start with a complex RAG system. Expanding incrementally in the following order makes it easier to isolate failure causes.
- Build the minimum viable system with Naive RAG.
- If retrieval failures are frequent, add hybrid search.
- If noise is a problem, add a reranker.
- If documents are long, add hierarchical chunking or context compression.[2]
- If corpus-level questions are common, consider Graph RAG.[3]
- If multi-step investigation is needed, consider Agentic RAG.
Minimum recommended production architecture
For a business deployment, including the following elements from the start is advisable.
| Element | Reason |
|---|---|
| Metadata | To filter by permission, date, language, and document type |
| Hybrid search | To handle both proper nouns and semantic queries |
| Reranking | To improve the quality of evidence passed to the LLM |
| Citations | To let users verify answers |
| Refusal on retrieval failure | To avoid generating answers without evidence |
| Evaluation set | To measure whether improvements actually work |
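Several of the elements in the table above fit in one small sketch: permission metadata applied before ranking, citations attached to the answer, refusal when nothing is retrieved, and a hit rate measured over an evaluation set. Everything here is an illustrative assumption: the word-overlap scoring stands in for hybrid search and reranking, the returned text stands in for an LLM answer, and the documents, group names, and evaluation pairs are invented.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Doc:
    doc_id: str
    text: str
    allowed_groups: frozenset[str]  # permission metadata

REFUSAL = "I could not find supporting documents for this question."

def search(query: str, docs: list[Doc], user_groups: set[str], k: int = 3) -> list[Doc]:
    """Filter by permission first, then rank by word overlap (toy scoring)."""
    terms = set(query.lower().split())
    visible = [d for d in docs if d.allowed_groups & user_groups]
    scored = [(len(terms & set(d.text.lower().split())), d) for d in visible]
    hits = [d for score, d in sorted(scored, key=lambda pair: -pair[0]) if score > 0]
    return hits[:k]

def answer(query: str, docs: list[Doc], user_groups: set[str]) -> str:
    """Refuse when there is no evidence; otherwise answer with citations."""
    hits = search(query, docs, user_groups)
    if not hits:
        return REFUSAL
    citations = ", ".join(f"[{d.doc_id}]" for d in hits)
    return f"{hits[0].text} {citations}"  # a real system would generate with an LLM

def hit_rate(eval_set: list[tuple[str, str]], docs: list[Doc],
             user_groups: set[str], k: int = 3) -> float:
    """Fraction of evaluation questions whose expected document is retrieved."""
    found = sum(expected in {d.doc_id for d in search(q, docs, user_groups, k)}
                for q, expected in eval_set)
    return found / len(eval_set)

docs = [
    Doc("hr-1", "salary bands are reviewed every april", frozenset({"hr"})),
    Doc("kb-7", "vpn access requires a hardware token", frozenset({"hr", "staff"})),
]
eval_set = [("vpn token", "kb-7"), ("salary bands", "hr-1")]
staff_answer = answer("vpn token", docs, {"staff"})
staff_refusal = answer("salary bands", docs, {"staff"})
```

Applying the permission filter before ranking means a restricted document can never reach the prompt, and re-running `hit_rate` after each change shows whether an adjustment actually improved retrieval.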
Summary
- Naive RAG is suitable for learning and prototyping but shows weaknesses in production.
- Advanced RAG improves production quality through pre-retrieval, in-retrieval, and post-retrieval correction.
- Modular RAG makes it easier to isolate failure causes and swap components in production.
- Graph RAG is suited to questions about themes and relationships across an entire document corpus.[3]
- In Multimodal RAG, preserving the structure of PDFs, tables, and images is critical.[4]