
RAG Architecture Patterns

About 10 minutes

Prerequisites: What is RAG

RAG is not simply “run a vector search and hand the results to an LLM.” The right architecture depends on document volume, question complexity, permission requirements, update frequency, and how strictly sources must be cited.[1]

RAG architectures can be broadly grouped by complexity.

| Pattern | Characteristics | Suited for |
| --- | --- | --- |
| Naive RAG | One retrieval pass, answer directly from results | Small-scale FAQ, prototypes |
| Advanced RAG | Pre- and post-retrieval correction steps | Internal document search, support bots |
| Modular RAG | Retrieval, compression, re-retrieval, and verification as interchangeable components | Production LLM applications |
| Graph RAG | Extract entities, relationships, and summaries from documents | Overall trends, relationships, organisational knowledge |
| Multimodal RAG | Retrieval targets include non-text content | PDFs, diagrams, images, video, audio |
| Agentic RAG | An agent plans retrieval actions | Complex investigation, code understanding, multi-step workflows |

This page covers all patterns except Agentic RAG, which is explained in detail on a separate page.

Naive RAG is the most basic form of RAG.

graph LR
    Q["Question"] --> E["Embed question"]
    E --> V["Vector DB search"]
    V --> C["Top-k chunks"]
    C --> P["Insert into prompt"]
    P --> L["LLM answer"]

Advantages:

  • Simple to implement
  • Easy to prototype
  • Works well enough for small, well-structured document sets
  • Good for learning the fundamentals of RAG

Limitations:

  • Poor retrieval queries lead to poor answers
  • Product codes, proper nouns, and error messages are often missed
  • Top-k results can include irrelevant documents
  • Insufficient retrieval results are passed to the LLM without correction
  • Document permissions and freshness are difficult to handle
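
The Naive RAG flow above (embed, search, take top-k, insert into a prompt) can be sketched in a few lines. This is a minimal illustration, not a reference implementation: `embed` is a toy bag-of-words stand-in for a real embedding model, the in-memory list stands in for a vector database, and the sample chunks are invented for the example. The final LLM call is omitted.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model: a bag-of-words count vector.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    # One retrieval pass: rank every chunk by similarity and keep the top k.
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

def build_prompt(question: str, context: list[str]) -> str:
    # Insert the retrieved chunks into the prompt without any correction step.
    joined = "\n".join(f"- {c}" for c in context)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{joined}\n\nQuestion: {question}"
    )

# Hypothetical sample data.
chunks = [
    "Password resets are handled on the account settings page.",
    "Invoices are emailed on the first day of each month.",
    "The mobile app supports offline mode.",
]
question = "How do I reset my password?"
top = retrieve(question, chunks, k=2)
prompt = build_prompt(question, top)
```

Note that the second chunk returned here has no overlap with the question at all, which illustrates the point that top-k results can include irrelevant documents.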

Naive RAG is a valid first step, but its limits become apparent quickly in production.

Advanced RAG adds correction steps before and after retrieval.

graph TD
    Q["Question"] --> QR["Query rewriting"]
    QR --> H["Hybrid search"]
    H --> R["Reranking"]
    R --> CC["Context compression"]
    CC --> G["Grounded generation"]
    G --> A["Answer with citations"]

Pre-retrieval steps make the user’s question more retrievable.

| Improvement | Description |
| --- | --- |
| Query rewriting | Use conversation history to restore omitted subjects |
| Multi-query generation | Generate multiple search terms from one question |
| HyDE | Generate a hypothetical answer and use it as the retrieval query |
| Metadata inference | Infer filter conditions such as language, product, date, or department |
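
As one illustration of a pre-retrieval step, metadata inference can be approximated with simple rules. The sketch below is an assumption-laden example: the product and department vocabularies are hypothetical, and a production system might use an LLM or a classifier instead of keyword matching.

```python
import re

# Hypothetical vocabularies; a real system would load these from a catalogue.
PRODUCTS = {"atlas", "beacon"}
DEPARTMENTS = {"finance", "legal", "support"}

def infer_filters(question: str) -> dict[str, object]:
    # Derive metadata filter conditions from the wording of the question.
    tokens = set(re.findall(r"[a-z0-9]+", question.lower()))
    filters: dict[str, object] = {}
    products = PRODUCTS & tokens
    if products:
        filters["product"] = sorted(products)[0]
    departments = DEPARTMENTS & tokens
    if departments:
        filters["department"] = sorted(departments)[0]
    year = re.search(r"\b(?:19|20)\d{2}\b", question)
    if year:
        filters["year"] = int(year.group(0))
    return filters

filters = infer_filters("What changed in the Atlas refund policy for finance in 2023?")
```

The resulting dictionary can then be passed to the retriever as filter conditions, narrowing the search before any similarity scoring happens.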

During retrieval, different search methods are combined.

| Search method | Strong at | Weak at |
| --- | --- | --- |
| Vector search | Semantic similarity, paraphrasing | Exact string matching |
| Keyword search | Product codes, proper nouns, error messages | Semantic paraphrasing |
| Hybrid search | Complementing both | Score calibration is required |
| Metadata-filtered search | Permissions, dates, language, category | Requires well-maintained metadata |

In practice, combining vector search with a keyword method such as BM25 is common.
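
One widely used way to merge a vector ranking with a keyword ranking is reciprocal rank fusion (RRF), which works on ranks rather than raw scores and so sidesteps score calibration. The source does not prescribe a fusion method, so treat this as one possible choice. The document IDs are hypothetical, and `k = 60` is a commonly used default rather than a tuned value.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    # Each document earns 1 / (k + rank) from every ranking it appears in.
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

# Hypothetical results from two retrievers for the same question.
vector_ranking = ["doc_a", "doc_b", "doc_c"]
keyword_ranking = ["doc_c", "doc_a", "doc_d"]
fused = reciprocal_rank_fusion([vector_ranking, keyword_ranking])
```

Documents that appear near the top of both lists (`doc_a` and `doc_c` here) rise above documents found by only one method.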

Post-retrieval steps refine the evidence passed to the LLM.

| Improvement | Description |
| --- | --- |
| Reranking | Re-order retrieval candidates by relevance to the question |
| Deduplication | Remove chunks that contain duplicate content |
| Context compression | Strip irrelevant sentences, keeping only the evidence |
| Evidence evaluation | Assess whether the retrieved results can actually answer the question |

Passing raw retrieval results without filtering causes the LLM to be distracted by noise. In RAG, passing the right evidence matters more than passing more evidence.
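
A rough sketch of post-retrieval refinement follows, combining deduplication, reranking, and a minimum-relevance cut-off. The token-overlap score is a deliberately crude stand-in for a real reranker such as a cross-encoder, and the threshold and sample texts are assumptions chosen for illustration.

```python
import re

def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def deduplicate(chunks: list[str]) -> list[str]:
    # Drop chunks whose normalised text has already been seen.
    seen: set[str] = set()
    unique: list[str] = []
    for chunk in chunks:
        key = " ".join(re.findall(r"[a-z0-9]+", chunk.lower()))
        if key not in seen:
            seen.add(key)
            unique.append(chunk)
    return unique

def overlap_score(question: str, chunk: str) -> float:
    # Fraction of question tokens that also occur in the chunk.
    q = tokens(question)
    return len(q & tokens(chunk)) / len(q) if q else 0.0

def refine(question: str, candidates: list[str],
           top_n: int = 2, min_score: float = 0.2) -> list[str]:
    # Deduplicate, rerank by relevance, then keep only strong evidence.
    unique = deduplicate(candidates)
    scored = sorted(((overlap_score(question, c), c) for c in unique),
                    key=lambda pair: pair[0], reverse=True)
    return [chunk for score, chunk in scored[:top_n] if score >= min_score]

# Hypothetical retrieval candidates.
candidates = [
    "The refund window is 30 days.",
    "the refund window is 30 days",
    "Shipping takes five business days.",
    "Refund requests go through the billing portal.",
]
evidence = refine("How long is the refund window?", candidates)
```

The duplicate and the unrelated shipping chunk are dropped, so the LLM receives two focused passages instead of four noisy ones.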

Modular RAG treats RAG not as a fixed pipeline but as a set of interchangeable components.

graph TD
    Q["Question"] --> Router["Router"]
    Router --> SearchA["Product docs search"]
    Router --> SearchB["FAQ search"]
    Router --> SearchC["Ticket search"]
    SearchA --> Merge["Candidate merging"]
    SearchB --> Merge
    SearchC --> Merge
    Merge --> Eval["Evidence evaluation"]
    Eval -->|Sufficient| Gen["Answer generation"]
    Eval -->|Insufficient| Retry["Re-retrieve or ask for clarification"]

Modular RAG separates components such as these.

  • Query rewriter
  • Source router
  • Retriever
  • Reranker
  • Context compressor
  • Evidence evaluator
  • Generator
  • Citation formatter
  • Guardrail

This design is advantageous in production because when quality degrades it is easier to isolate whether the problem lies in the retriever, the reranker, or the generation prompt.
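
The idea of interchangeable components can be sketched by treating each stage as a plain callable that is injected into the pipeline. The class name, field names, and toy components below are all hypothetical; the point is only that the router, retrievers, evidence evaluator, and generator can each be replaced or tested in isolation.

```python
from dataclasses import dataclass
from typing import Callable

Retriever = Callable[[str], list[str]]

@dataclass
class ModularRAG:
    route: Callable[[str], list[str]]                 # question -> source names
    retrievers: dict[str, Retriever]                  # source name -> retriever
    is_sufficient: Callable[[str, list[str]], bool]   # evidence evaluator
    generate: Callable[[str, list[str]], str]         # answer generator

    def answer(self, question: str) -> str:
        # Route, retrieve from each chosen source, and merge the candidates.
        candidates: list[str] = []
        for source in self.route(question):
            candidates.extend(self.retrievers[source](question))
        # Evaluate evidence before generating; otherwise ask for clarification.
        if not self.is_sufficient(question, candidates):
            return "Not enough evidence found. Could you clarify the question?"
        return self.generate(question, candidates)

# Toy components standing in for real search backends and an LLM.
rag = ModularRAG(
    route=lambda q: ["faq"] if q.lower().startswith("how") else ["docs", "tickets"],
    retrievers={
        "docs": lambda q: ["Atlas v2 supports SSO."],
        "faq": lambda q: [],
        "tickets": lambda q: ["SSO login loop was fixed in v2.1."],
    },
    is_sufficient=lambda q, evidence: len(evidence) >= 1,
    generate=lambda q, evidence: f"Based on {len(evidence)} passages: " + " ".join(evidence),
)

answered = rag.answer("Does Atlas support SSO?")
clarify = rag.answer("How do I export data?")
```

Because every stage is passed in, swapping the reranker or the evidence evaluator means changing one argument rather than rewriting the pipeline.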

Graph RAG treats documents not as a flat collection of chunks but as a network of entities and relationships.[3]

graph TD
    D["Document corpus"] --> EX["Entity & relationship extraction"]
    EX --> KG["Knowledge graph"]
    KG --> CS["Community summaries"]
    Q["Question"] --> RET["Relevant node & summary retrieval"]
    CS --> RET
    KG --> RET
    RET --> A["Answer covering the whole corpus"]

Graph RAG is well suited to questions such as these.

  • What are the main themes in this document corpus?
  • Which product areas do customer requests concentrate on?
  • Which departments are associated with which risks?
  • Across multiple sets of meeting notes, what is the decision-making flow?

Conventional RAG excels at finding the specific passage that contains an answer, while Graph RAG excels at reading the structure of an entire document corpus.[3]

Graph RAG is powerful but carries higher upfront construction costs.

  • Entity extraction quality must be actively managed.
  • A mechanism for keeping the graph up to date is required.
  • Incorrect relationship extraction can degrade answer quality.
  • It is often over-engineered for simple FAQ use cases.

Rather than adopting Graph RAG from the start, first confirm that corpus-level and relationship questions genuinely dominate the use case before committing to the investment.
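
To make the indexing side of Graph RAG more concrete, here is a heavily simplified sketch. It assumes a fixed, hypothetical entity vocabulary instead of LLM-based extraction, links entities that co-occur in a document, and uses connected components as a simple stand-in for a proper community detection algorithm. Community summaries are not generated.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical entity vocabulary; real systems extract entities with a model.
ENTITIES = {"billing", "invoices", "refunds", "sso", "login"}

def extract_entities(doc: str) -> set[str]:
    return {word.strip(".,").lower() for word in doc.split()} & ENTITIES

def build_graph(docs: list[str]) -> dict[str, set[str]]:
    # Link every pair of entities that co-occur in the same document.
    graph: dict[str, set[str]] = defaultdict(set)
    for doc in docs:
        entities = extract_entities(doc)
        for entity in entities:
            graph.setdefault(entity, set())
        for a, b in combinations(sorted(entities), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph

def communities(graph: dict[str, set[str]]) -> list[set[str]]:
    # Connected components via depth-first search.
    seen: set[str] = set()
    result: list[set[str]] = []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(graph[node] - group)
        seen |= group
        result.append(group)
    return result

# Hypothetical corpus.
docs = [
    "Refunds are issued through billing.",
    "Billing sends invoices monthly.",
    "SSO login fails after password reset.",
]
groups = communities(build_graph(docs))
```

Each resulting group is a candidate unit for a community summary. The sketch also shows the risk listed above: a wrong extraction would merge or split groups and distort every summary built on them.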

Multimodal RAG extends retrieval to content beyond plain text.

| Content type | Examples | Design considerations |
| --- | --- | --- |
| PDF | Contracts, papers, specifications | Preserve layout, tables, and footnotes |
| Tables | CSV, spreadsheets | Preserve row-column relationships and units |
| Images | Diagrams, screenshots | Combine OCR with image understanding |
| Audio | Meeting recordings, calls | Handle transcription and speaker attribution |
| Video | Lectures, screen recordings | Timestamp-based and scene-level retrieval is needed |

In practice, extracting plain text from a PDF is often insufficient. When table column relationships, figure captions, page numbers, or heading hierarchies are lost, the meaning of retrieved evidence changes.[4]
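
For tabular content, one simple way to keep row-column relationships is to turn each row into a self-describing chunk that repeats the column names. The sketch below uses Python's standard `csv` module; the file name, column names, and values are hypothetical, and units are carried in the column names by assumption.

```python
import csv
import io

def table_to_chunks(csv_text: str, source: str) -> list[dict]:
    # Emit one chunk per row, pairing every value with its column header.
    reader = csv.DictReader(io.StringIO(csv_text))
    chunks = []
    for row_number, row in enumerate(reader, start=1):
        text = "; ".join(f"{column}: {value}" for column, value in row.items())
        chunks.append({"text": text, "source": source, "row": row_number})
    return chunks

# Hypothetical pricing table.
csv_text = "plan,price_usd_per_month,storage_gb\nBasic,10,50\nPro,25,500\n"
chunks = table_to_chunks(csv_text, source="pricing.csv")
```

A chunk such as `plan: Pro; price_usd_per_month: 25; storage_gb: 500` stays interpretable when retrieved on its own, whereas the bare line `Pro,25,500` does not.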

There is no need to start with a complex RAG system. Expanding incrementally in the following order makes it easier to isolate failure causes.

  1. Build the minimum viable system with Naive RAG.
  2. If retrieval failures are frequent, add hybrid search.
  3. If noise is a problem, add a reranker.
  4. If documents are long, add hierarchical chunking or context compression.[2]
  5. If corpus-level questions are common, consider Graph RAG.[3]
  6. If multi-step investigation is needed, consider Agentic RAG.

Minimum recommended production architecture

For a business deployment, including the following elements from the start is advisable.

| Element | Reason |
| --- | --- |
| Metadata | To filter by permission, date, language, and document type |
| Hybrid search | To handle both proper nouns and semantic queries |
| Reranking | To improve the quality of evidence passed to the LLM |
| Citations | To let users verify answers |
| Refusal on retrieval failure | To avoid generating answers without evidence |
| Evaluation set | To measure whether improvements actually work |
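
Three of these elements (permission metadata, citations, and refusal on retrieval failure) can be sketched together. This is a minimal illustration under stated assumptions: the `Chunk` fields, group names, and the 0.5 threshold are hypothetical, `score` is assumed to come from an upstream retriever or reranker, and the LLM call is replaced by a placeholder that concatenates the evidence.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    doc_id: str
    text: str
    allowed_groups: frozenset[str]  # permission metadata
    score: float                    # relevance from an upstream reranker (assumed 0..1)

def answer_with_guardrails(question: str, chunks: list[Chunk],
                           user_groups: set[str], min_score: float = 0.5) -> dict:
    # Filter by permission first so restricted text never reaches the prompt.
    permitted = [c for c in chunks if c.allowed_groups & user_groups]
    evidence = [c for c in permitted if c.score >= min_score]
    if not evidence:
        # Refuse rather than answer without evidence.
        return {"answer": None, "citations": [], "refused": True}
    # Placeholder for the LLM call, which would receive the question and evidence.
    answer = " ".join(c.text for c in evidence)
    return {"answer": answer, "citations": [c.doc_id for c in evidence], "refused": False}

# Hypothetical indexed chunks.
chunks = [
    Chunk("hr-1", "Salary bands are reviewed each April.", frozenset({"hr"}), 0.9),
    Chunk("kb-7", "Expense reports are due within 30 days.", frozenset({"staff", "hr"}), 0.8),
    Chunk("kb-9", "The cafeteria opens at 8am.", frozenset({"staff"}), 0.2),
]
staff_result = answer_with_guardrails("When are expense reports due?", chunks, {"staff"})
contractor_result = answer_with_guardrails("When are expense reports due?", chunks, {"contractor"})
```

The staff user receives an answer citing `kb-7` only, while the contractor, who has no permitted evidence, receives a refusal instead of an unsupported answer.
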
  • Naive RAG is suitable for learning and prototyping but shows weaknesses in production.
  • Advanced RAG improves production quality through pre-retrieval, in-retrieval, and post-retrieval correction.
  • Modular RAG makes it easier to isolate failure causes and swap components in production.
  • Graph RAG is suited to questions about themes and relationships across an entire document corpus.[3]
  • In Multimodal RAG, preserving the structure of PDFs, tables, and images is critical.[4]

References:

  1. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
  2. RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval
  3. From Local to Global: A Graph RAG Approach to Query-Focused Summarization
  4. Retrieval-Augmented Generation for AI-Generated Content: A Survey