
Reasoning Models

A reasoning model is an AI model that executes multiple internal thinking steps before outputting an answer, solving a problem incrementally before returning the final result. Since 2024, these models have demonstrated dramatically higher accuracy than standard LLMs on complex problems in mathematics, programming, and logic puzzles, and their practical deployment is advancing rapidly.

Target audience: Those who understand LLM basics (token prediction, Transformer) and want to know how reasoning models specifically work.

Estimated reading time: 25 minutes

Prerequisites: Must have read Transformer Models and BERT vs. GPT

A standard LLM probabilistically predicts tokens for a given prompt, outputting them from left to right. This approach is well-suited for fluent text generation and general knowledge answers, but has limitations for complex logical reasoning, mathematical calculation, and multi-step problem solving.

Examples of Problems That Standard LLMs Struggle With

Problem: "When a certain integer is multiplied by 3 and 7 is added, the result is 40. What is the integer?"

Issues with standard LLMs:
- May output the correct answer (11), but errors are frequent
- The reasoning process behind the answer is unclear
- Accuracy drops sharply as problems become more complex

LLMs operate through probabilistic pattern matching, so they can handle problems similar to their training data but are unreliable for problems requiring systematic logical reasoning.
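The systematic procedure a standard LLM lacks can be written out explicitly in ordinary code. A minimal sketch that solves the word problem above by inverting each operation step by step (the function name and structure are illustrative, not from any model):

```python
def solve_linear(multiplier: int, addend: int, result: int) -> int:
    """Solve multiplier * x + addend = result for an integer x, step by step."""
    # Step 1: undo the addition (40 - 7 = 33)
    after_multiply = result - addend
    # Step 2: undo the multiplication (33 / 3 = 11)
    x, remainder = divmod(after_multiply, multiplier)
    assert remainder == 0, "no integer solution"
    return x

print(solve_linear(3, 7, 40))  # 11
```

Each step is deterministic and checkable, which is exactly the kind of systematic decomposition that pure next-token prediction does not guarantee.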

Chain-of-Thought (CoT) — The Technique of Making Thinking Explicit


Chain-of-Thought (CoT) is a technique for improving LLM reasoning accuracy by having the model describe its problem-solving process step by step before outputting a final answer.

The Difference Between Just Outputting the Answer vs. Showing the Thought Process

[Standard output]
Question: "There are 5 apples. If 3 are eaten, how many are left?"
Answer: "2"

[Using Chain-of-Thought]
Question: "There are 5 apples. If 3 are eaten, how many are left?"
Thought process:
  Step 1: Initial number of apples = 5
  Step 2: Apples eaten = 3
  Step 3: Remaining apples = 5 - 3 = 2
Answer: "2"
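When a model is prompted to answer in the CoT format shown above, the application still has to pull the final answer out of the step-by-step text. A small parser sketch, assuming the response ends with an `Answer:` line as in the example (the format is an assumption about how you prompted the model, not a standard):

```python
import re

def extract_final_answer(response: str) -> str:
    """Return the value from the last 'Answer:' line of a CoT-style response.

    Assumes the model was instructed to finish with a line of the form
    Answer: "<value>" (as in the example above); raises if none is found.
    """
    matches = re.findall(r'^Answer:\s*"?([^"\n]+?)"?\s*$', response, re.MULTILINE)
    if not matches:
        raise ValueError("no 'Answer:' line found in response")
    return matches[-1]

cot_response = '''Thought process:
  Step 1: Initial number of apples = 5
  Step 2: Apples eaten = 3
  Step 3: Remaining apples = 5 - 3 = 2
Answer: "2"'''

print(extract_final_answer(cot_response))  # 2
```

Taking the *last* match matters: intermediate steps sometimes contain phrases that look like answers, so anchoring on the final line is the safer convention.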

The difference is hard to see in simple examples, but for problems requiring multi-step reasoning (mathematical proofs, program debugging, logic puzzles, etc.), many studies have shown that CoT significantly improves accuracy.

With a standard LLM, CoT can be triggered simply by adding “Let’s think step by step” to the prompt. Reasoning models go further: instead of relying on this prompting trick, they incorporate the step-by-step process into training itself, so the model performs it automatically.

Optimizing the Thinking Chain via Reinforcement Learning


Reasoning models are trained with reinforcement learning: the model receives a reward when the answer it reaches through its thinking steps is correct and a penalty when it is wrong. Over many problems, it learns thinking patterns that reliably solve problems.

graph LR
    Q["Problem Input"]
    Think["Internal thinking steps\n(trial, verification, correction)"]
    Answer["Final answer"]
    Reward["Correct → reward\nIncorrect → penalty"]
    Update["Update model weights\n(toward better thinking patterns)"]

    Q --> Think --> Answer --> Reward --> Update
    Update -.->|"Next problem"| Q
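The reward loop in the diagram can be sketched as a toy simulation. Here the “model” is just a preference weight over two thinking strategies, and correct answers shift probability mass toward the strategy that earns more reward. This is a deliberately minimal stand-in for real policy-gradient training; the strategy names and success rates are invented for illustration:

```python
import random

random.seed(0)

# Two candidate "thinking patterns" with different true success rates (made up).
SUCCESS_RATE = {"decompose_and_verify": 0.9, "guess_directly": 0.4}

# The "policy": unnormalized preference weights over strategies.
weights = {name: 1.0 for name in SUCCESS_RATE}

def sample_strategy() -> str:
    names, w = zip(*weights.items())
    return random.choices(names, weights=w)[0]

for _ in range(2000):
    strategy = sample_strategy()
    correct = random.random() < SUCCESS_RATE[strategy]   # reward signal
    # Multiplicative update: reward strengthens the strategy, penalty weakens it.
    weights[strategy] *= 1.05 if correct else 0.95

best = max(weights, key=weights.get)
print(best)
```

After enough episodes, nearly all of the probability mass sits on the strategy that more often produces correct answers, which is the essence of the loop above: the reward signal sculpts which thinking patterns the model prefers.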

OpenAI o1, announced in September 2024, was the first full-scale reasoning model. It internally generates “thinking tokens” before the answer, then outputs the final response based on that thought process.

  • Dramatically improved accuracy on AIME (math olympiad) problems compared to standard LLMs
  • Thinking chain optimized through reinforcement learning
  • Response time is longer than standard LLMs, but accuracy on complex problems is high

OpenAI o3 is the successor to o1, with further enhanced reasoning capabilities and improved accuracy on complex coding, scientific reasoning, and logic puzzles.

DeepSeek R1 is an open-source reasoning model published in 2025 by Chinese AI company DeepSeek.

  • Achieves reasoning capability comparable to o1 at dramatically lower cost
  • Model weights are openly released, enabling self-hosting
  • Had a major industry impact by sharply lowering the cost of reasoning-grade inference

Claude 3.7 Sonnet — Extended Thinking (Anthropic)


Claude 3.7 Sonnet is Anthropic’s model featuring Extended Thinking.

  • Thinking time (budget_tokens) can be set via API parameters
  • Has an option to disclose the thinking process to the user
  • Demonstrates high accuracy on complex coding and analysis tasks
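The `budget_tokens` parameter mentioned above is set in the request body of Anthropic’s Messages API. A sketch of what such a request looks like; the field names follow Anthropic’s documented Extended Thinking API, but the model name, prompt, and token numbers here are illustrative:

```python
# Sketch of an Extended Thinking request body for Anthropic's Messages API.
# Field names follow Anthropic's documentation for Claude 3.7; the model name
# and token budgets are illustrative values, not recommendations.
request = {
    "model": "claude-3-7-sonnet-latest",
    "max_tokens": 20000,           # total output budget, including thinking tokens
    "thinking": {
        "type": "enabled",
        "budget_tokens": 16000,    # cap on internal thinking tokens
    },
    "messages": [
        {"role": "user",
         "content": "Prove that the sum of two odd integers is even."}
    ],
}

# The API requires the thinking budget to be smaller than max_tokens.
assert request["thinking"]["budget_tokens"] < request["max_tokens"]
```

Raising `budget_tokens` gives the model more room to think on hard problems at the cost of latency and billed tokens; lowering it trades accuracy for speed.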

Processing Flow: Standard LLM vs. Reasoning Model

graph TB
    subgraph Normal["Standard LLM"]
        NI["Input prompt"] --> NO["Output (direct generation)"]
    end

    subgraph Reasoning["Reasoning Model"]
        RI["Input prompt"]
        RT1["Thinking step 1\nDecompose the problem"]
        RT2["Thinking step 2\nForm and test a hypothesis"]
        RT3["Thinking step 3\nFind and correct errors"]
        RN["Thinking step N\n..."]
        RO["Output final answer"]

        RI --> RT1 --> RT2 --> RT3 --> RN --> RO
    end

Comparison: Standard LLM vs. Reasoning Model

| Comparison | Standard LLM | Reasoning Model |
| --- | --- | --- |
| Response speed | Fast (seconds) | Slow (tens of seconds to minutes) |
| Cost | Low | High (thinking tokens are billed) |
| Simple tasks | Sufficient accuracy | Overkill |
| Complex logical reasoning | Low accuracy | High accuracy |
| Math/proof problems | Low reliability | High reliability |
| Long code generation | Variable quality | Consistent quality |
| Transparency of thought process | None | Yes (model-dependent) |
| Representative models | GPT-4o, Claude 3.5 Sonnet | o1, o3, DeepSeek R1, Claude 3.7 |
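The cost row comes down to token counts: a reasoning model bills for its thinking tokens in addition to the visible answer. A back-of-the-envelope comparison; the price and token counts below are placeholders, not real rates:

```python
PRICE_PER_1K_OUTPUT = 0.01  # placeholder price in dollars, not a real rate

def output_cost(answer_tokens: int, thinking_tokens: int = 0) -> float:
    """Output-side cost: thinking tokens are billed like answer tokens."""
    return (answer_tokens + thinking_tokens) / 1000 * PRICE_PER_1K_OUTPUT

# Same 300-token visible answer, with and without 5,000 thinking tokens.
standard = output_cost(answer_tokens=300)
reasoning = output_cost(answer_tokens=300, thinking_tokens=5000)
print(f"standard: ${standard:.4f}, reasoning: ${reasoning:.4f}")
```

With these made-up numbers the identical visible answer costs over seventeen times more, which is why reasoning models are reserved for tasks that actually need the extra thinking.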

Tasks Reasoning Models Excel At and Struggle With

Tasks They Excel At
  • Mathematics, statistics, and proofs: Multi-step calculations, mathematical proofs, statistical reasoning
  • Complex programming: Bug identification and fixing, algorithm optimization
  • Logic puzzles and deduction problems: Organizing multiple conditions to reach a consistent answer
  • Scientific analysis: Interpreting experimental data, hypothesis testing

Tasks They’re Not Suited For (Where Standard LLMs Are Better)

  • Real-time dialogue where speed matters: Chatbots, customer support
  • Short summarization and translation: Simple conversion tasks
  • Creative content generation: Poetry, stories, marketing copy
  • Bulk processing where cost efficiency matters: Processing large volumes of documents
graph TD
    Task["What is the nature of the task?"]
    Task -->|"Complex reasoning / calculation required"| R["Use reasoning model\no1 / o3 / DeepSeek R1"]
    Task -->|"Speed / cost-focused general tasks"| N["Use standard LLM\nGPT-4o / Claude 3.5 Sonnet"]
    R --> Check["Check budget and latency"]
    Check -->|"Cost is top priority"| DS["DeepSeek R1 (OSS)"]
    Check -->|"Accuracy is top priority"| O3["OpenAI o3"]
    Check -->|"Transparency is a priority"| C37["Claude 3.7 Extended Thinking"]
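The decision tree above can be written as a small routing function. The model names match the diagram; the function signature and priority flags are illustrative:

```python
def pick_model(complex_reasoning: bool, priority: str = "accuracy") -> str:
    """Route a task per the decision tree: task nature first, then budget/latency.

    `priority` is one of "cost", "accuracy", "transparency" (illustrative flags).
    """
    if not complex_reasoning:
        # Speed/cost-focused general tasks: a standard LLM is sufficient.
        return "standard LLM (GPT-4o / Claude 3.5 Sonnet)"
    # Complex reasoning: pick a reasoning model by what matters most.
    return {
        "cost": "DeepSeek R1 (OSS)",
        "accuracy": "OpenAI o3",
        "transparency": "Claude 3.7 Extended Thinking",
    }[priority]

print(pick_model(complex_reasoning=True, priority="cost"))  # DeepSeek R1 (OSS)
```

In a real application the `complex_reasoning` flag would itself come from a cheap classifier or a heuristic on the request, so that expensive reasoning models are only invoked when the task warrants them.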

When to choose a reasoning model:

  • Solving calculation problems in mathematics, physics, or chemistry
  • Fixing bugs in complex code
  • Planning and optimization problems with multiple constraints
  • When high-accuracy judgment is required with no tolerance for errors

When to choose a standard LLM:

  • Summarizing and classifying large volumes of emails
  • Responding in real-time in a chatbot
  • Generating blog posts or marketing copy
  • Processing large numbers of requests while keeping API costs down
Summary

  • A reasoning model is a type of LLM that improves accuracy by executing internal thinking steps before producing an answer
  • Because it optimizes CoT at the training level, it’s strong on complex logic problems
  • Response speed and cost are higher than standard LLMs, so using the right tool for the right task is important
  • Key models: OpenAI o1/o3 (high accuracy), DeepSeek R1 (OSS, low cost), Claude 3.7 Extended Thinking (transparency)

FAQ

Q: Are reasoning models always better than standard LLMs?

A: For complex reasoning tasks specifically, yes — but not in every situation. For simple tasks or speed-focused use cases, standard LLMs are more appropriate. Response times can reach tens of seconds, so applying them to real-time dialogue is sometimes impractical.

Q: Is the “thought process” of a reasoning model really thinking like a human?

A: It’s different from human thinking. The thinking steps of a reasoning model are sequences of tokens optimized through reinforcement learning to lead to correct answers. The underlying mechanism is fundamentally different from conscious human thought, but it produces output that functionally resembles problem-solving steps.

Q: Is DeepSeek R1 on par with OpenAI o1?

A: It varies by benchmark. On math and coding tasks, it demonstrates accuracy comparable to o1; on reasoning tasks centered on English, differences can appear. In terms of cost efficiency (open-source, self-hostable), DeepSeek R1 has a significant advantage.

Q: Can Chain-of-Thought be used with standard LLMs by specifying it in the prompt?

A: Yes. Even with standard LLMs, adding “Let’s think step by step” to the prompt can produce an effect similar to CoT. However, since reasoning models optimize CoT at the training level, they deliver more stable, higher accuracy even with the same prompt.


Next step: What Is Generative AI? (Back to overview)