Reasoning Models

About 5 minutes

Prerequisites: Must have read Transformer Models and BERT vs. GPT

A reasoning model is an AI model or model mode designed to spend more compute on multi-step problem solving before returning a final answer. Chain-of-Thought research showed that prompting models to produce intermediate reasoning steps can improve performance on some complex reasoning tasks. Later reasoning-oriented models build on related ideas through dedicated training and additional inference-time compute.[1]

A standard LLM probabilistically predicts tokens for a given prompt, outputting them from left to right. This approach is well-suited for fluent text generation and general knowledge answers, but has limitations for complex logical reasoning, mathematical calculation, and multi-step problem solving.

Examples of Problems That Standard LLMs Struggle With

Problem: "When a certain integer is multiplied by 3 and 7 is added, the result is 40. What is the integer?"

Issues with standard LLMs:
- May output the correct answer (11), but can also make arithmetic or setup errors
- The reasoning process behind the answer may be unclear
- Reliability can drop as problems become more complex

LLMs generate tokens probabilistically, so tasks requiring systematic logical reasoning need careful prompting, verification, or a reasoning-oriented model.
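
The verification mentioned above can be as simple as substituting an answer back into the original equation. The following is a minimal sketch for the sample problem (3x + 7 = 40); the function names `solve_linear` and `check_answer` are illustrative assumptions, not part of any library.

```python
def solve_linear(multiplier: int, offset: int, result: int) -> int:
    """Solve multiplier * x + offset = result for an integer x."""
    if multiplier == 0:
        raise ValueError("multiplier must be non-zero")
    remainder = result - offset
    if remainder % multiplier != 0:
        raise ValueError("no integer solution")
    return remainder // multiplier


def check_answer(candidate: int, multiplier: int, offset: int, result: int) -> bool:
    """Verify a candidate by substituting it back into the equation."""
    return multiplier * candidate + offset == result


x = solve_linear(3, 7, 40)
print(x, check_answer(x, 3, 7, 40))  # 11 True
print(check_answer(12, 3, 7, 40))    # False
```

A check like `check_answer` can be applied to a model's answer regardless of how the model produced it, which is why verification is useful with both standard and reasoning-oriented models.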

Chain-of-Thought (CoT) — The Technique of Making Thinking Explicit

Chain-of-Thought (CoT) is a technique for improving LLM reasoning accuracy by having the model describe its problem-solving process step by step before outputting a final answer.[1]

The Difference Between Just Outputting the Answer vs. Showing the Thought Process

[Standard output]
Question: "There are 5 apples. If 3 are eaten, how many are left?"
Answer: "2"

[Using Chain-of-Thought]
Question: "There are 5 apples. If 3 are eaten, how many are left?"
Thought process:
  Step 1: Initial number of apples = 5
  Step 2: Apples eaten = 3
  Step 3: Remaining apples = 5 - 3 = 2
Answer: "2"

The difference is hard to see in simple examples, but the original CoT paper reports gains on several multi-step reasoning benchmarks.[1]
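
As a rough illustration of the two styles above, the sketch below builds a direct prompt and a CoT-style prompt for the same question, then extracts the final answer from a step-by-step response. The prompt wording, the `Answer:` line convention, and all function names are assumptions made for this example; no model is called, and real models may respond in other formats.

```python
from typing import Optional


def direct_prompt(question: str) -> str:
    """Ask for the final result only."""
    return f"Question: {question}\nAnswer with only the final result."


def cot_prompt(question: str) -> str:
    """Ask for numbered intermediate steps before the final result."""
    return (
        f"Question: {question}\n"
        "Work through the problem step by step, numbering each step.\n"
        "Then give the final result on a last line starting with 'Answer:'."
    )


def extract_answer(response: str) -> Optional[str]:
    """Return the text after the last 'Answer:' line, or None if absent."""
    for line in reversed(response.splitlines()):
        stripped = line.strip()
        if stripped.startswith("Answer:"):
            return stripped[len("Answer:"):].strip().strip('"')
    return None


# A hand-written response in the format shown above (not real model output).
sample_response = "\n".join([
    "Step 1: Initial number of apples = 5",
    "Step 2: Apples eaten = 3",
    "Step 3: Remaining apples = 5 - 3 = 2",
    'Answer: "2"',
])

print(cot_prompt("There are 5 apples. If 3 are eaten, how many are left?"))
print(extract_answer(sample_response))  # 2
```

Keeping the final answer on a clearly marked line makes it possible to check the result automatically while still keeping the intermediate steps for inspection.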

CoT can be elicited by prompt design in some models, but behavior varies by model. Reasoning-oriented models may instead expose reasoning controls or use internal reasoning steps without showing the full hidden chain.

Reasoning-oriented models can incorporate reasoning behavior through training, reinforcement learning, and inference-time compute allocation.[2][3]

Optimizing the Thinking Chain via Reinforcement Learning

Some reasoning model training uses reinforcement learning. DeepSeek-R1, for example, reports using large-scale reinforcement learning to incentivize reasoning capability, with additional training stages to improve readability and performance.[3]

graph LR
    Q["Problem Input"]
    Think["Internal thinking steps\n(trial, verification, correction)"]
    Answer["Final answer"]
    Reward["Correct → reward\nIncorrect → penalty"]
    Update["Update model weights\n(toward better thinking patterns)"]

    Q --> Think --> Answer --> Reward --> Update
    Update -.->|"Next problem"| Q
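
The loop in the diagram can be illustrated with a deliberately tiny toy. This is not the DeepSeek-R1 training procedure: a table of two hand-written "strategies" stands in for the model, and a preference score per strategy stands in for the model weights. All names, problems, and the update rule are assumptions chosen for illustration.

```python
import math
import random

# Toy problems of the form a * x + b = c, stored as (a, b, c, expected_x).
PROBLEMS = [(3, 7, 40, 11), (2, 4, 10, 3), (5, 5, 30, 5)]


def direct_guess(a: int, b: int, c: int) -> int:
    """Flawed strategy: divides without removing the offset b first."""
    return c // a


def stepwise(a: int, b: int, c: int) -> int:
    """Subtract the offset, then divide."""
    return (c - b) // a


STRATEGIES = {"direct_guess": direct_guess, "stepwise": stepwise}


def sample_strategy(prefs: dict, rng: random.Random) -> str:
    """Sample a strategy with probability proportional to exp(preference)."""
    names = list(prefs)
    weights = [math.exp(prefs[name]) for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


def train(steps: int = 200, lr: float = 0.1, seed: int = 0) -> dict:
    """Reward correct answers and penalize incorrect ones."""
    rng = random.Random(seed)
    prefs = {name: 0.0 for name in STRATEGIES}
    for step in range(steps):
        a, b, c, expected = PROBLEMS[step % len(PROBLEMS)]
        name = sample_strategy(prefs, rng)            # "thinking" choice
        answer = STRATEGIES[name](a, b, c)            # final answer
        reward = 1.0 if answer == expected else -1.0  # correct -> reward
        prefs[name] += lr * reward                    # update toward reward
    return prefs


prefs = train()
print(prefs["stepwise"] > prefs["direct_guess"])  # True
```

Because the flawed strategy is always penalized and the stepwise one is always rewarded in this toy, the preference shifts toward the stepwise strategy. Real systems update neural network parameters with far more elaborate objectives.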

Reasoning Model Examples and Current Specs

OpenAI, Anthropic, Google, DeepSeek, and other providers may expose reasoning-oriented models or modes. Current model names, limits, pricing, and API behavior change over time, so use official model documentation when choosing a current provider model.[4][5][6]

DeepSeek-R1 is a useful public research example because its paper describes reinforcement learning for reasoning behavior and releases related open-weight models.[3] Treat benchmark claims as task-specific signals rather than universal rankings.
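
One way to treat benchmark claims as task-specific signals is to measure candidates on your own labeled cases. The sketch below assumes each model is wrapped as a plain Python callable from prompt to answer string; the two stub functions are stand-ins for illustration, not real models, and exact-match scoring is a simplifying assumption.

```python
from typing import Callable, Dict, List, Tuple

Model = Callable[[str], str]
Case = Tuple[str, str]  # (prompt, expected answer)


def accuracy(model: Model, cases: List[Case]) -> float:
    """Fraction of cases where the model's answer exactly matches."""
    if not cases:
        raise ValueError("cases must not be empty")
    correct = sum(1 for prompt, expected in cases
                  if model(prompt).strip() == expected)
    return correct / len(cases)


def compare(models: Dict[str, Model], cases: List[Case]) -> Dict[str, float]:
    """Score every candidate model on the same cases."""
    return {name: accuracy(model, cases) for name, model in models.items()}


# Stand-in "models" (hypothetical): replace with real API or local calls.
def stub_fast(prompt: str) -> str:
    return "2"


def stub_careful(prompt: str) -> str:
    answers = {"5 - 3 = ?": "2", "3 * x + 7 = 40, x = ?": "11"}
    return answers.get(prompt, "")


cases = [("5 - 3 = ?", "2"), ("3 * x + 7 = 40, x = ?", "11")]
print(compare({"stub_fast": stub_fast, "stub_careful": stub_careful}, cases))
# {'stub_fast': 0.5, 'stub_careful': 1.0}
```

The numbers printed here only describe the stubs. With real models, the same harness gives a comparison on the tasks you actually care about.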

Processing Flow: Standard LLM vs. Reasoning Model

graph TB
    subgraph Normal["Standard LLM"]
        NI["Input prompt"] --> NO["Output (direct generation)"]
    end

    subgraph Reasoning["Reasoning Model"]
        RI["Input prompt"]
        RT1["Thinking step 1\nDecompose the problem"]
        RT2["Thinking step 2\nForm and test a hypothesis"]
        RT3["Thinking step 3\nFind and correct errors"]
        RN["Thinking step N\n..."]
        RO["Output final answer"]

        RI --> RT1 --> RT2 --> RT3 --> RN --> RO
    end

Comparison: Standard LLM vs. Reasoning Model

| Comparison | Standard LLM | Reasoning-oriented model or mode |
| --- | --- | --- |
| Response speed | Usually optimized for latency | Often slower because more compute is spent |
| Cost | Often lower | Often higher because more compute is spent |
| Simple tasks | Sufficient accuracy | Overpowered |
| Complex logical reasoning | May need prompting and verification | Often designed for these tasks |
| Math/proof problems | May be unreliable without checks | Often better suited, but still needs verification |
| Transparency of thought process | Model-dependent | Model-dependent; hidden reasoning may not be exposed |
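
To see why extra compute tends to raise cost, the sketch below estimates a per-request cost from token counts. The prices are hypothetical placeholders, and the sketch assumes that reasoning tokens are billed at the output-token rate; both need to be confirmed in the provider's official documentation.

```python
def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    reasoning_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Estimate request cost in the same currency unit as the prices.

    Assumption: reasoning tokens are billed like output tokens.
    """
    billed_output = output_tokens + reasoning_tokens
    total = (input_tokens * input_price_per_million
             + billed_output * output_price_per_million)
    return total / 1_000_000


# Hypothetical prices: 1.0 per million input tokens, 4.0 per million output.
without_reasoning = estimate_cost(1000, 500, 0, 1.0, 4.0)
with_reasoning = estimate_cost(1000, 500, 2000, 1.0, 4.0)
print(with_reasoning > without_reasoning)  # True
```

With the same prompt and visible answer length, the request that spends 2,000 additional reasoning tokens costs more under these assumptions, which matches the "Cost" row in the comparison.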

Tasks Reasoning Models Excel and Struggle At

  • Mathematics, statistics, and proofs: Multi-step calculations, mathematical proofs, statistical reasoning
  • Complex programming: Bug identification and fixing, algorithm optimization
  • Logic puzzles and deduction problems: Organizing multiple conditions to reach a consistent answer
  • Scientific analysis: Interpreting experimental data, hypothesis testing

Tasks They’re Not Suited For (Where Standard LLMs Are Better)

  • Real-time dialogue where speed matters: Chatbots, customer support
  • Short summarization and translation: Simple conversion tasks
  • Creative content generation: Poetry, stories, marketing copy
  • Bulk processing where cost efficiency matters: Processing large volumes of documents

graph TD
    Task["What is the nature of the task?"]
    Task -->|"Complex reasoning / calculation required"| R["Consider a reasoning-oriented model or mode"]
    Task -->|"Speed / cost-focused general tasks"| N["Use a general or lightweight model"]
    R --> Check["Check budget and latency"]
    Check -->|"Need current specs"| Docs["Check official model docs"]
    Check -->|"Need proof of quality"| Eval["Run a task-specific evaluation"]
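
The decision flow above can be sketched as a small routing function. The `Task` fields, the returned labels, and the fallback to a general model with extra verification when the budget or latency check fails are assumptions for illustration; they are not tied to any provider's API.

```python
from dataclasses import dataclass


@dataclass
class Task:
    needs_complex_reasoning: bool
    max_latency_s: float          # latency budget for one request
    max_cost_per_request: float   # cost budget for one request


def choose_model(task: Task, reasoning_latency_s: float,
                 reasoning_cost: float) -> str:
    """Pick a model category following the decision flow.

    reasoning_latency_s and reasoning_cost are your own measured or
    documented estimates for the reasoning-oriented option.
    """
    if not task.needs_complex_reasoning:
        return "general-or-lightweight"
    over_budget = (reasoning_latency_s > task.max_latency_s
                   or reasoning_cost > task.max_cost_per_request)
    if over_budget:
        # Assumed fallback: use a general model and verify its output.
        return "general-with-verification"
    return "reasoning"


print(choose_model(Task(True, 30.0, 0.05), 12.0, 0.02))   # reasoning
print(choose_model(Task(False, 2.0, 0.01), 12.0, 0.02))   # general-or-lightweight
print(choose_model(Task(True, 2.0, 0.05), 12.0, 0.02))    # general-with-verification
```

The latency and cost estimates passed in should come from current official documentation and from a task-specific evaluation, as the last two nodes of the diagram suggest.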

When to choose a reasoning model:

  • Solving calculation problems in mathematics, physics, or chemistry
  • Fixing bugs in complex code
  • Planning and optimization problems with multiple constraints
  • When high-accuracy judgment is required and errors are costly (outputs still need verification)

When to choose a standard LLM:

  • Summarizing and classifying large volumes of emails
  • Responding in real-time in a chatbot
  • Generating blog posts or marketing copy
  • Processing large numbers of requests while keeping API costs down

Summary:

  • A reasoning model or mode spends extra compute on multi-step problem solving
  • CoT prompting can help some models, while reasoning-oriented models may use training and inference-time compute
  • Response speed and cost can be higher, so use the right tool for the task
  • Current model names and specs should be checked in official provider documentation

Q: Are reasoning models always better than standard LLMs?

A: No. They can be better for complex reasoning tasks, but simple tasks or speed-focused use cases may be better served by a general or lightweight model.

Q: Is the “thought process” of a reasoning model really thinking like a human?

A: It is different from human thinking. Model reasoning traces are generated text or hidden internal computation, not evidence of conscious understanding. Treat them as a problem-solving mechanism that still needs verification.

Q: Is DeepSeek-R1 always better because it is open-weight?

A: No. Open-weight availability can help with self-hosting and inspection, but quality, safety, latency, and operating cost still need task-specific evaluation.[3]

Q: Can Chain-of-Thought be used with standard LLMs by specifying it in the prompt?

A: Sometimes. CoT prompting can help some models on multi-step reasoning tasks, but results vary and hidden reasoning policies differ by provider.[1]

  1. Jason Wei et al., Chain-of-Thought Prompting Elicits Reasoning in Large Language Models, January 28, 2022
  2. Long Ouyang et al., Training language models to follow instructions with human feedback, March 4, 2022
  3. DeepSeek-AI et al., DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning, January 22, 2025
  4. OpenAI, Models
  5. Anthropic, Claude models overview
  6. Google AI for Developers, Gemini models