
Reasoning Models

A reasoning model is an AI model that executes multiple internal thinking steps before outputting an answer, solving a problem incrementally before returning the final result. Since 2024, these models have demonstrated dramatically higher accuracy than standard LLMs on complex problems in mathematics, programming, and logic puzzles, and their practical deployment is advancing rapidly.

Target audience: Those who understand LLM basics (token prediction, Transformer) and want to know how reasoning models specifically work.

Estimated reading time: 25 minutes

Prerequisites: Must have read Transformer Models and BERT vs. GPT

A standard LLM probabilistically predicts tokens for a given prompt, outputting them from left to right. This approach is well-suited for fluent text generation and general knowledge answers, but has limitations for complex logical reasoning, mathematical calculation, and multi-step problem solving.

Examples of Problems That Standard LLMs Struggle With

Problem: "When a certain integer is multiplied by 3 and 7 is added, the result is 40. What is the integer?"

Issues with standard LLMs:
- May output the correct answer (11), but errors are frequent
- The reasoning process behind the answer is unclear
- Accuracy drops sharply as problems become more complex

LLMs operate through probabilistic pattern matching, so they can handle problems similar to their training data but are unreliable for problems requiring systematic logical reasoning.
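The systematic procedure a standard LLM lacks can be written out explicitly in ordinary code. A minimal sketch that solves the word problem above by inverting each operation step by step (the function name and structure are illustrative, not from any model):

```python
def solve_linear(multiplier: int, addend: int, result: int) -> int:
    """Solve multiplier * x + addend = result for an integer x, step by step."""
    # Step 1: undo the addition (40 - 7 = 33)
    after_multiply = result - addend
    # Step 2: undo the multiplication (33 / 3 = 11)
    x, remainder = divmod(after_multiply, multiplier)
    assert remainder == 0, "no integer solution"
    return x

print(solve_linear(3, 7, 40))  # 11
```

Each step is deterministic and checkable, which is exactly the kind of systematic decomposition that pure next-token prediction does not guarantee.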

Chain-of-Thought (CoT) — The Technique of Making Thinking Explicit


Chain-of-Thought (CoT) is a technique for improving LLM reasoning accuracy by having the model describe its problem-solving process step by step before outputting a final answer.

The Difference Between Just Outputting the Answer vs. Showing the Thought Process

[Standard output]
Question: "There are 5 apples. If 3 are eaten, how many are left?"
Answer: "2"

[Using Chain-of-Thought]
Question: "There are 5 apples. If 3 are eaten, how many are left?"
Thought process:
  Step 1: Initial number of apples = 5
  Step 2: Apples eaten = 3
  Step 3: Remaining apples = 5 - 3 = 2
Answer: "2"
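When a model is prompted to answer in the CoT format shown above, the application still has to pull the final answer out of the step-by-step text. A small parser sketch, assuming the response ends with an `Answer:` line as in the example (the format is an assumption about how you prompted the model, not a standard):

```python
import re

def extract_final_answer(response: str) -> str:
    """Return the value from the last 'Answer:' line of a CoT-style response.

    Assumes the model was instructed to finish with a line of the form
    Answer: "<value>" (as in the example above); raises if none is found.
    """
    matches = re.findall(r'^Answer:\s*"?([^"\n]+?)"?\s*$', response, re.MULTILINE)
    if not matches:
        raise ValueError("no 'Answer:' line found in response")
    return matches[-1]

cot_response = '''Thought process:
  Step 1: Initial number of apples = 5
  Step 2: Apples eaten = 3
  Step 3: Remaining apples = 5 - 3 = 2
Answer: "2"'''

print(extract_final_answer(cot_response))  # 2
```

Taking the *last* match matters: intermediate steps sometimes contain phrases that look like answers, so anchoring on the final line is the safer convention.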

The difference is hard to see in simple examples, but for problems requiring multi-step reasoning (mathematical proofs, program debugging, logic puzzles, etc.), many studies have shown that CoT significantly improves accuracy.

With a standard LLM, CoT can be triggered simply by adding “Let’s think step by step” to the prompt. Reasoning models go further: instead of relying on this prompting trick, they incorporate the step-by-step process into training itself, so the model performs it automatically.

Optimizing the Thinking Chain via Reinforcement Learning


Reasoning models are trained with reinforcement learning: the model receives a reward when the answer it reaches through its thinking steps is correct and a penalty when it is wrong. Over many problems, it learns thinking patterns that reliably solve problems.

graph LR
    Q["Problem Input"]
    Think["Internal thinking steps\n(trial, verification, correction)"]
    Answer["Final answer"]
    Reward["Correct → reward\nIncorrect → penalty"]
    Update["Update model weights\n(toward better thinking patterns)"]

    Q --> Think --> Answer --> Reward --> Update
    Update -.->|"Next problem"| Q
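The reward loop in the diagram can be sketched as a toy simulation. Here the “model” is just a preference weight over two thinking strategies, and correct answers shift probability mass toward the strategy that earns more reward. This is a deliberately minimal stand-in for real policy-gradient training; the strategy names and success rates are invented for illustration:

```python
import random

random.seed(0)

# Two candidate "thinking patterns" with different true success rates (made up).
SUCCESS_RATE = {"decompose_and_verify": 0.9, "guess_directly": 0.4}

# The "policy": unnormalized preference weights over strategies.
weights = {name: 1.0 for name in SUCCESS_RATE}

def sample_strategy() -> str:
    names, w = zip(*weights.items())
    return random.choices(names, weights=w)[0]

for _ in range(2000):
    strategy = sample_strategy()
    correct = random.random() < SUCCESS_RATE[strategy]   # reward signal
    # Multiplicative update: reward strengthens the strategy, penalty weakens it.
    weights[strategy] *= 1.05 if correct else 0.95

best = max(weights, key=weights.get)
print(best)
```

After enough episodes, nearly all of the probability mass sits on the strategy that more often produces correct answers, which is the essence of the loop above: the reward signal sculpts which thinking patterns the model prefers.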

OpenAI o1, announced in September 2024, was the first full-scale reasoning model. It internally generates “thinking tokens” before the answer, then outputs the final response based on that thought process.

  • Dramatically improved accuracy on AIME (math olympiad) problems compared to standard LLMs
  • Thinking chain optimized through reinforcement learning
  • Response time is longer than standard LLMs, but accuracy on complex problems is high

OpenAI o3 is the successor to o1, with further enhanced reasoning capabilities and improved accuracy on complex coding, scientific reasoning, and logic puzzles.

DeepSeek R1 is an open-source reasoning model published in 2025 by Chinese AI company DeepSeek.

  • Achieves reasoning capability comparable to o1 at dramatically lower cost
  • Model weights are openly released, enabling self-hosting
  • Had a major industry impact by sharply lowering the cost of reasoning-grade inference

Claude 3.7 Sonnet — Extended Thinking (Anthropic)


Claude 3.7 Sonnet is Anthropic’s model featuring Extended Thinking.

  • Thinking time (budget_tokens) can be set via API parameters
  • Has an option to disclose the thinking process to the user
  • Demonstrates high accuracy on complex coding and analysis tasks
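The `budget_tokens` parameter mentioned above is set in the request body of Anthropic’s Messages API. A sketch of what such a request looks like; the field names follow Anthropic’s documented Extended Thinking API, but the model name, prompt, and token numbers here are illustrative:

```python
# Sketch of an Extended Thinking request body for Anthropic's Messages API.
# Field names follow Anthropic's documentation for Claude 3.7; the model name
# and token budgets are illustrative values, not recommendations.
request = {
    "model": "claude-3-7-sonnet-latest",
    "max_tokens": 20000,           # total output budget, including thinking tokens
    "thinking": {
        "type": "enabled",
        "budget_tokens": 16000,    # cap on internal thinking tokens
    },
    "messages": [
        {"role": "user",
         "content": "Prove that the sum of two odd integers is even."}
    ],
}

# The API requires the thinking budget to be smaller than max_tokens.
assert request["thinking"]["budget_tokens"] < request["max_tokens"]
```

Raising `budget_tokens` gives the model more room to think on hard problems at the cost of latency and billed tokens; lowering it trades accuracy for speed.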

Processing Flow: Standard LLM vs. Reasoning Model

graph TB
    subgraph Normal["Standard LLM"]
        NI["Input prompt"] --> NO["Output (direct generation)"]
    end

    subgraph Reasoning["Reasoning Model"]
        RI["Input prompt"]
        RT1["Thinking step 1\nDecompose the problem"]
        RT2["Thinking step 2\nForm and test a hypothesis"]
        RT3["Thinking step 3\nFind and correct errors"]
        RN["Thinking step N\n..."]
        RO["Output final answer"]

        RI --> RT1 --> RT2 --> RT3 --> RN --> RO
    end

Comparison: Standard LLM vs. Reasoning Model

| Comparison | Standard LLM | Reasoning Model |
| --- | --- | --- |
| Response speed | Fast (seconds) | Slow (tens of seconds to minutes) |
| Cost | Low | High (thinking tokens are billed) |
| Simple tasks | Sufficient accuracy | Overkill |
| Complex logical reasoning | Low accuracy | High accuracy |
| Math/proof problems | Low reliability | High reliability |
| Long code generation | Variable quality | Consistent quality |
| Transparency of thought process | None | Yes (model-dependent) |
| Representative models | GPT-4o, Claude 3.5 Sonnet | o1, o3, DeepSeek R1, Claude 3.7 |
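The cost row comes down to token counts: a reasoning model bills for its thinking tokens in addition to the visible answer. A back-of-the-envelope comparison; the price and token counts below are placeholders, not real rates:

```python
PRICE_PER_1K_OUTPUT = 0.01  # placeholder price in dollars, not a real rate

def output_cost(answer_tokens: int, thinking_tokens: int = 0) -> float:
    """Output-side cost: thinking tokens are billed like answer tokens."""
    return (answer_tokens + thinking_tokens) / 1000 * PRICE_PER_1K_OUTPUT

# Same 300-token visible answer, with and without 5,000 thinking tokens.
standard = output_cost(answer_tokens=300)
reasoning = output_cost(answer_tokens=300, thinking_tokens=5000)
print(f"standard: ${standard:.4f}, reasoning: ${reasoning:.4f}")
```

With these made-up numbers the identical visible answer costs over seventeen times more, which is why reasoning models are reserved for tasks that actually need the extra thinking.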

Tasks Reasoning Models Excel At and Struggle With

Tasks They Excel At
  • Mathematics, statistics, and proofs: Multi-step calculations, mathematical proofs, statistical reasoning
  • Complex programming: Bug identification and fixing, algorithm optimization
  • Logic puzzles and deduction problems: Organizing multiple conditions to reach a consistent answer
  • Scientific analysis: Interpreting experimental data, hypothesis testing

Tasks They’re Not Suited For (Where Standard LLMs Are Better)

  • Real-time dialogue where speed matters: Chatbots, customer support
  • Short summarization and translation: Simple conversion tasks
  • Creative content generation: Poetry, stories, marketing copy
  • Bulk processing where cost efficiency matters: Processing large volumes of documents
graph TD
    Task["What is the nature of the task?"]
    Task -->|"Complex reasoning / calculation required"| R["Use reasoning model\no1 / o3 / DeepSeek R1"]
    Task -->|"Speed / cost-focused general tasks"| N["Use standard LLM\nGPT-4o / Claude 3.5 Sonnet"]
    R --> Check["Check budget and latency"]
    Check -->|"Cost is top priority"| DS["DeepSeek R1 (OSS)"]
    Check -->|"Accuracy is top priority"| O3["OpenAI o3"]
    Check -->|"Transparency is a priority"| C37["Claude 3.7 Extended Thinking"]
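The decision tree above can be written as a small routing function. The model names match the diagram; the function signature and priority flags are illustrative:

```python
def pick_model(complex_reasoning: bool, priority: str = "accuracy") -> str:
    """Route a task per the decision tree: task nature first, then budget/latency.

    `priority` is one of "cost", "accuracy", "transparency" (illustrative flags).
    """
    if not complex_reasoning:
        # Speed/cost-focused general tasks: a standard LLM is sufficient.
        return "standard LLM (GPT-4o / Claude 3.5 Sonnet)"
    # Complex reasoning: pick a reasoning model by what matters most.
    return {
        "cost": "DeepSeek R1 (OSS)",
        "accuracy": "OpenAI o3",
        "transparency": "Claude 3.7 Extended Thinking",
    }[priority]

print(pick_model(complex_reasoning=True, priority="cost"))  # DeepSeek R1 (OSS)
```

In a real application the `complex_reasoning` flag would itself come from a cheap classifier or a heuristic on the request, so that expensive reasoning models are only invoked when the task warrants them.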

When to choose a reasoning model:

  • Solving calculation problems in mathematics, physics, or chemistry
  • Fixing bugs in complex code
  • Planning and optimization problems with multiple constraints
  • When high-accuracy judgment is required with no tolerance for errors

When to choose a standard LLM:

  • Summarizing and classifying large volumes of emails
  • Responding in real-time in a chatbot
  • Generating blog posts or marketing copy
  • Processing large numbers of requests while keeping API costs down
Summary

  • A reasoning model is a type of LLM that improves accuracy by executing internal thinking steps before producing an answer
  • Because it optimizes CoT at the training level, it’s strong on complex logic problems
  • Response speed and cost are higher than standard LLMs, so using the right tool for the right task is important
  • Key models: OpenAI o1/o3 (high accuracy), DeepSeek R1 (OSS, low cost), Claude 3.7 Extended Thinking (transparency)

FAQ

Q: Are reasoning models always better than standard LLMs?

A: For complex reasoning tasks specifically, yes — but not in every situation. For simple tasks or speed-focused use cases, standard LLMs are more appropriate. Response times can reach tens of seconds, so applying them to real-time dialogue is sometimes impractical.

Q: Is the “thought process” of a reasoning model really thinking like a human?

A: It’s different from human thinking. The thinking steps of a reasoning model are sequences of tokens optimized through reinforcement learning to lead to correct answers. The underlying mechanism is fundamentally different from conscious human thought, but it produces output that functionally resembles problem-solving steps.

Q: Is DeepSeek R1 on par with OpenAI o1?

A: It varies by benchmark. On math and coding tasks, it demonstrates accuracy comparable to o1; on reasoning tasks centered on English, differences can appear. In terms of cost efficiency (open-source, self-hostable), DeepSeek R1 has a significant advantage.

Q: Can Chain-of-Thought be used with standard LLMs by specifying it in the prompt?

A: Yes. Even with standard LLMs, adding “Let’s think step by step” to the prompt can produce an effect similar to CoT. However, since reasoning models optimize CoT at the training level, they deliver more stable, higher accuracy even with the same prompt.


Next step: What Is Generative AI? (Back to overview)