Reasoning Models
A reasoning model is an AI model that executes multiple internal thinking steps before outputting an answer — solving problems incrementally before returning a final result. Since 2024, these models have demonstrated dramatically higher accuracy than standard LLMs on complex problems in mathematics, programming, and logic puzzles, and practical deployment has been advancing rapidly.
Target audience: Those who understand LLM basics (token prediction, Transformer) and want to know how reasoning models specifically work.
Estimated learning time: 25 minutes to read
Prerequisites: Must have read Transformer Models and BERT vs. GPT
Why Reasoning Models Were Needed
A standard LLM probabilistically predicts tokens for a given prompt, outputting them from left to right. This approach is well-suited for fluent text generation and general knowledge answers, but has limitations for complex logical reasoning, mathematical calculation, and multi-step problem solving.
Examples of Problems That Standard LLMs Struggle With
Problem: "When a certain integer is multiplied by 3 and 7 is added, the result is 40. What is the integer?"
Issues with standard LLMs:
- May output the correct answer (11), but errors are frequent
- The reasoning process behind the answer is unclear
- Accuracy drops sharply as problems become more complex

LLMs operate through probabilistic pattern matching, so they can handle problems similar to their training data but are unreliable for problems requiring systematic logical reasoning.
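The systematic reasoning this problem calls for can be written out explicitly. A minimal Python sketch of the step-by-step solution (function name is illustrative):

```python
# Solve "3x + 7 = 40" the way step-by-step reasoning would: undo each
# operation in reverse order, then verify the result.
def solve_linear(result: int, multiplier: int, addend: int) -> int:
    """Find integer x such that multiplier * x + addend == result."""
    # Step 1: undo the addition: 40 - 7 = 33
    after_subtraction = result - addend
    # Step 2: undo the multiplication: 33 / 3 = 11
    x, remainder = divmod(after_subtraction, multiplier)
    assert remainder == 0, "no integer solution"
    # Step 3: verify the answer: 3 * 11 + 7 == 40
    assert multiplier * x + addend == result
    return x

print(solve_linear(40, 3, 7))  # → 11
```

Each intermediate value is checked before moving on, which is exactly the kind of verification a single left-to-right token prediction pass has no natural place for.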
Chain-of-Thought (CoT) — The Technique of Making Thinking Explicit
Chain-of-Thought (CoT) is a technique for improving LLM reasoning accuracy by having the model describe its problem-solving process step by step before outputting a final answer.
The Difference Between Just Outputting the Answer vs. Showing the Thought Process
[Standard output]
Question: "There are 5 apples. If 3 are eaten, how many are left?"
Answer: "2"
[Using Chain-of-Thought]
Question: "There are 5 apples. If 3 are eaten, how many are left?"
Thought process:
Step 1: Initial number of apples = 5
Step 2: Apples eaten = 3
Step 3: Remaining apples = 5 - 3 = 2
Answer: "2"

The difference is hard to see in simple examples, but for problems requiring multi-step reasoning (mathematical proofs, program debugging, logic puzzles, etc.), many studies have shown that CoT significantly improves accuracy.
Applying CoT in Prompts
CoT takes effect simply by adding "Let's think step by step" to the prompt. Reasoning models are optimized to perform this process automatically within the model.
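As a concrete sketch, a plain question can be turned into a zero-shot CoT prompt by appending the trigger phrase; the helper function below is illustrative and model-agnostic:

```python
# Sketch: turning a plain prompt into a zero-shot Chain-of-Thought prompt.
# The trigger phrase is the classic instruction from Kojima et al. (2022),
# "Large Language Models are Zero-Shot Reasoners".
COT_TRIGGER = "Let's think step by step."

def with_cot(question: str) -> str:
    """Append the zero-shot CoT trigger to a question."""
    return f"{question}\n{COT_TRIGGER}"

plain = "There are 5 apples. If 3 are eaten, how many are left?"
print(with_cot(plain))
```

The augmented string would then be sent to any chat-completion API in place of the original question.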
How Reasoning Models Work
Reasoning models achieve CoT not as a mere prompting technique, but by incorporating it into the model's training process.
Optimizing the Thinking Chain via Reinforcement Learning
Reasoning models are trained with reinforcement learning: the model receives a reward when the answer it reaches through its thinking steps is correct and a penalty when it is wrong, so over many problems it learns thinking patterns that reliably solve them.
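A toy sketch of this reward signal, assuming a black-box `model` callable; all names are illustrative, and real training is far more involved (policy-gradient methods such as PPO or GRPO applied to sampled reasoning traces):

```python
# Toy sketch of the reasoning-RL reward loop: reward correct final
# answers, penalize wrong ones. In a real trainer the reward drives a
# weight update that makes reward-earning thinking patterns more likely.
from typing import Callable

def reward_for(answer: str, correct: str) -> float:
    """+1 if the final answer matches, -1 otherwise."""
    return 1.0 if answer.strip() == correct.strip() else -1.0

def training_step(model: Callable[[str], str],
                  problem: str, correct: str) -> float:
    answer = model(problem)          # model "thinks", then answers
    return reward_for(answer, correct)

# Mock "model" that always answers "11"
print(training_step(lambda p: "11", "3x + 7 = 40, x = ?", "11"))  # → 1.0
```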
```mermaid
graph LR
    Q["Problem Input"]
    Think["Internal thinking steps\n(trial, verification, correction)"]
    Answer["Final answer"]
    Reward["Correct → reward\nIncorrect → penalty"]
    Update["Update model weights\n(toward better thinking patterns)"]
    Q --> Think --> Answer --> Reward --> Update
    Update -.->|"Next problem"| Q
```

Key Reasoning Models
OpenAI o1 (September 2024)
OpenAI o1 is the first full-scale reasoning model, announced by OpenAI in September 2024. It internally generates "thinking tokens" before the answer, then outputs the final response based on that thought process.
- Dramatically improved accuracy on AIME (math olympiad) problems compared to standard LLMs
- Thinking chain optimized through reinforcement learning
- Response time is longer than standard LLMs, but accuracy on complex problems is high
OpenAI o3 (2025)
OpenAI o3 is the successor to o1, with further enhanced reasoning capabilities and improved accuracy on complex coding, scientific reasoning, and logic puzzles.
DeepSeek R1 (2025)
DeepSeek R1 is an open-source reasoning model published in 2025 by Chinese AI company DeepSeek.
- Achieves reasoning capability comparable to o1 at dramatically lower cost
- Model weights are published as open-source, enabling self-hosting
- Had a major industry impact from the perspective of reducing inference costs
Claude 3.7 Sonnet — Extended Thinking (Anthropic)
Claude 3.7 Sonnet is Anthropic's model featuring Extended Thinking.
- Thinking time (budget_tokens) can be set via API parameters
- Has an option to disclose the thinking process to the user
- Demonstrates high accuracy on complex coding and analysis tasks
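A sketch of what such a request looks like through the Anthropic Python SDK (`pip install anthropic`); the parameter shape follows Anthropic's Extended Thinking documentation at the time of writing, and the model ID should be treated as an assumption:

```python
# Sketch of an Extended Thinking request. The `thinking` block enables
# internal reasoning and caps it with `budget_tokens`; `max_tokens`
# must be larger than the thinking budget.
request = {
    "model": "claude-3-7-sonnet-20250219",  # assumed model ID
    "max_tokens": 16000,
    "thinking": {
        "type": "enabled",
        "budget_tokens": 8000,  # upper bound on thinking tokens
    },
    "messages": [
        {"role": "user", "content": "Prove that sqrt(2) is irrational."}
    ],
}

# With an API key configured, the call would be:
# import anthropic
# response = anthropic.Anthropic().messages.create(**request)
print(request["thinking"]["budget_tokens"])  # → 8000
```

Raising `budget_tokens` trades latency and cost for more thorough reasoning on hard problems.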
Processing Flow: Standard LLM vs. Reasoning Model
```mermaid
graph TB
    subgraph Normal["Standard LLM"]
        NI["Input prompt"] --> NO["Output (direct generation)"]
    end
    subgraph Reasoning["Reasoning Model"]
        RI["Input prompt"]
        RT1["Thinking step 1\nDecompose the problem"]
        RT2["Thinking step 2\nForm and test a hypothesis"]
        RT3["Thinking step 3\nFind and correct errors"]
        RN["Thinking step N\n..."]
        RO["Output final answer"]
        RI --> RT1 --> RT2 --> RT3 --> RN --> RO
    end
```

Comparison: Standard LLM vs. Reasoning Model
| Comparison | Standard LLM | Reasoning Model |
|---|---|---|
| Response speed | Fast (seconds) | Slow (tens of seconds to minutes) |
| Cost | Low | High (due to thinking tokens) |
| Simple tasks | Sufficient accuracy | Overkill (unnecessary cost and latency) |
| Complex logical reasoning | Low accuracy | High accuracy |
| Math/proof problems | Low reliability | High reliability |
| Long code generation | Variable quality | Consistent quality |
| Transparency of thought process | None | Yes (model-dependent) |
| Representative models | GPT-4o, Claude 3.5 Sonnet | o1, o3, DeepSeek R1, Claude 3.7 |
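The cost row follows directly from billing: thinking tokens are typically billed as output tokens, so the same answer can cost several times more. The prices below are made-up round numbers for illustration, not any provider's real rates:

```python
# Illustrative cost comparison. Thinking tokens are billed like output
# tokens, so a reasoning model's per-query cost scales with how much
# it "thinks". The rate below is an assumed placeholder.
PRICE_PER_TOKEN = 0.01 / 1000  # assumed: $0.01 per 1K output tokens

def query_cost(answer_tokens: int, thinking_tokens: int = 0) -> float:
    """Cost of one query, counting thinking tokens as output."""
    return (answer_tokens + thinking_tokens) * PRICE_PER_TOKEN

standard = query_cost(answer_tokens=300)                        # $0.003
reasoning = query_cost(answer_tokens=300, thinking_tokens=5000) # $0.053
print(f"standard ${standard:.4f} vs reasoning ${reasoning:.4f}")
```

With these assumed numbers the reasoning query costs roughly 18x the standard one, which is why bulk workloads favor standard LLMs.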
Tasks Reasoning Models Excel At and Struggle With
Tasks They Excel At
- Mathematics, statistics, and proofs: Multi-step calculations, mathematical proofs, statistical reasoning
- Complex programming: Bug identification and fixing, algorithm optimization
- Logic puzzles and deduction problems: Organizing multiple conditions to reach a consistent answer
- Scientific analysis: Interpreting experimental data, hypothesis testing
Tasks They’re Not Suited For (Where Standard LLMs Are Better)
- Real-time dialogue where speed matters: Chatbots, customer support
- Short summarization and translation: Simple conversion tasks
- Creative content generation: Poetry, stories, marketing copy
- Bulk processing where cost efficiency matters: Processing large volumes of documents
Practical Usage Guide
```mermaid
graph TD
    Task["What is the nature of the task?"]
    Task -->|"Complex reasoning / calculation required"| R["Use reasoning model\no1 / o3 / DeepSeek R1"]
    Task -->|"Speed / cost-focused general tasks"| N["Use standard LLM\nGPT-4o / Claude 3.5 Sonnet"]
    R --> Check["Check budget and latency"]
    Check -->|"Cost is top priority"| DS["DeepSeek R1 (OSS)"]
    Check -->|"Accuracy is top priority"| O3["OpenAI o3"]
    Check -->|"Transparency is a priority"| C37["Claude 3.7 Extended Thinking"]
```

When to choose a reasoning model:
- Solving calculation problems in mathematics, physics, or chemistry
- Fixing bugs in complex code
- Planning and optimization problems with multiple constraints
- When high-accuracy judgment is required with no tolerance for errors
When to choose a standard LLM:
- Summarizing and classifying large volumes of emails
- Responding in real-time in a chatbot
- Generating blog posts or marketing copy
- Processing large numbers of requests while keeping API costs down
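The decision rules above can be condensed into a small routing function; the task labels and model names are illustrative:

```python
# Sketch of the model-routing logic: send complex reasoning work to a
# reasoning model unless latency rules it out. Labels are illustrative.
REASONING_TASKS = {"math", "debugging", "constraint_planning", "proof"}

def choose_model(task_type: str, latency_sensitive: bool = False) -> str:
    """Return which model family to use for a task."""
    if task_type in REASONING_TASKS and not latency_sensitive:
        return "reasoning"   # e.g. o3 / DeepSeek R1
    return "standard"        # e.g. GPT-4o / Claude 3.5 Sonnet

print(choose_model("math"))                                # → reasoning
print(choose_model("chat_reply", latency_sensitive=True))  # → standard
```

Note that latency trumps task type here: even a math question goes to a standard LLM when a real-time answer is required.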
Summary
- A reasoning model is a type of LLM that improves accuracy by executing internal thinking steps before producing an answer
- Because it optimizes CoT at the training level, it’s strong on complex logic problems
- Response speed and cost are higher than standard LLMs, so using the right tool for the right task is important
- Key models: OpenAI o1/o3 (high accuracy), DeepSeek R1 (OSS, low cost), Claude 3.7 Extended Thinking (transparency)
Frequently Asked Questions
Q: Are reasoning models always better than standard LLMs?
A: For complex reasoning tasks specifically, yes — but not in every situation. For simple tasks or speed-focused use cases, standard LLMs are more appropriate. Response times can reach tens of seconds, so applying them to real-time dialogue is sometimes impractical.
Q: Is the “thought process” of a reasoning model really thinking like a human?
A: It’s different from human thinking. The thinking steps of a reasoning model are sequences of tokens optimized through reinforcement learning to lead to correct answers. The underlying mechanism is fundamentally different from conscious human thought, but it produces output that functionally resembles problem-solving steps.
Q: Is DeepSeek R1 on par with OpenAI o1?
A: It varies by benchmark. On math and coding tasks it demonstrates accuracy comparable to o1, while on English-centric reasoning tasks some gaps remain. In terms of cost efficiency (open-source and self-hostable), DeepSeek R1 has a significant advantage.
Q: Can Chain-of-Thought be used with standard LLMs by specifying it in the prompt?
A: Yes. Even with standard LLMs, adding “Let’s think step by step” to the prompt can produce an effect similar to CoT. However, since reasoning models optimize CoT at the training level, they deliver more stable, higher accuracy even with the same prompt.
Next step: What Is Generative AI? (Back to overview)