What Is Deep Learning?
Deep learning is a machine learning technique that uses multi-layer neural networks to learn high-level features from data automatically. The book Deep Learning treats it as a subfield of machine learning and gives a systematic account of representation learning and multi-layer neural networks.[1]
The Nested Relationship: AI, ML, and Deep Learning
graph TD
AI["AI (Artificial Intelligence)\nAll techniques that replicate human intelligence"]
ML["Machine Learning\nAutomatic pattern learning from data"]
DL["Deep Learning\nMulti-layer neural networks for automatic feature extraction"]
AI --> ML
ML --> DL

Deep learning is a powerful technique within machine learning, but it is not the best fit for every problem. When data is scarce or interpretability is important, traditional machine learning methods can be easier to use.[1]
The Basics of Neural Networks
A neural network is a mathematical model inspired by the connected structure of neurons (nerve cells) in the human brain. A large number of nodes (units) are arranged in layers, and adjacent layers are connected to process information.[1]
Basic Structure
graph LR
subgraph IL["Input Layer"]
I1["x₁"]
I2["x₂"]
I3["x₃"]
end
subgraph HL["Hidden Layers (multiple)"]
H1["h₁"]
H2["h₂"]
H3["h₃"]
end
subgraph OL["Output Layer"]
O1["y"]
end
I1 --> H1
I1 --> H2
I1 --> H3
I2 --> H1
I2 --> H2
I2 --> H3
I3 --> H1
I3 --> H2
I3 --> H3
H1 --> O1
H2 --> O1
H3 --> O1

| Layer Type | Role |
|---|---|
| Input Layer | Receives the data: pixel values, numbers, and other features are fed in here |
| Hidden Layers | Progressively extract features from the input; this is the “deep” part |
| Output Layer | Produces the final prediction or classification result |
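
The three layer roles in the table can be sketched as a single forward pass. This is a minimal illustration, assuming NumPy, randomly initialized (untrained) weights, a ReLU activation in the hidden layer, and a sigmoid at the output; the names `forward`, `W1`, `b1`, `W2`, and `b2` are hypothetical and chosen only to mirror the 3-input, 3-hidden-unit, 1-output diagram.

```python
import numpy as np

def relu(z):
    # Element-wise max(0, z): a common hidden-layer activation
    return np.maximum(0.0, z)

def sigmoid(z):
    # Squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    """Forward pass: input layer (3) -> hidden layer (3) -> output layer (1)."""
    h = relu(W1 @ x + b1)      # hidden layer extracts intermediate features
    y = sigmoid(W2 @ h + b2)   # output layer produces the final prediction
    return h, y

# Assumption: random weights stand in for weights that training would learn
rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 3))   # connects x1..x3 to h1..h3
b1 = np.zeros(3)
W2 = rng.normal(size=(1, 3))   # connects h1..h3 to y
b2 = np.zeros(1)

x = np.array([0.5, -1.0, 2.0])  # input layer values x1, x2, x3
h, y = forward(x, W1, b1, W2, b2)
print("hidden activations:", h)
print("output:", y)
```

A deeper network repeats the hidden-layer step several times, each layer taking the previous layer's activations as its input.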
Why Is It Called “Deep”?
“Deep” refers to having multiple hidden layers. More layers allow the network to learn increasingly abstract features.[1]
In image recognition, for example, shallow layers learn edges and color patterns, while deeper layers learn higher-level features such as “eyes,” “nose,” and “face.”[2]
Differences Between Traditional ML and Deep Learning
The biggest difference is whether features are designed by humans or learned automatically from data.

| | Traditional Machine Learning | Deep Learning |
|---|---|---|
| Feature design | Designed manually by humans | Learned automatically from data |
| Data requirements | Can work with relatively small amounts | Requires large amounts of data |
| Compute resources | Runs on modest hardware | Requires high compute (e.g., GPUs) |
| Interpretability | High (e.g., decision trees) | Low (black-box problem) |
| Strengths | Structured (tabular) data | Unstructured data: images, audio, text |
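
The "Feature design" row can be made concrete with a small sketch. It assumes NumPy and a synthetic 8x8 grayscale image; the three hand-picked features are hypothetical examples of what a person might choose, not a recommended feature set. A traditional model would be trained on the short hand-built vector, while a deep learning model would typically receive every pixel and learn its own features.

```python
import numpy as np

def handcrafted_features(image):
    """Traditional ML style: a human decides in advance what to measure."""
    mean_brightness = image.mean()
    contrast = image.std()
    # Mean absolute difference between horizontally adjacent pixels,
    # used here as a crude measure of edge strength
    edge_strength = np.abs(np.diff(image, axis=1)).mean()
    return np.array([mean_brightness, contrast, edge_strength])

def raw_input(image):
    """Deep learning style: pass every pixel and let the network learn features."""
    return image.reshape(-1)

# Synthetic image: left half dark (0.0), right half bright (1.0)
image = np.zeros((8, 8))
image[:, 4:] = 1.0

print(handcrafted_features(image).shape)  # (3,)
print(raw_input(image).shape)             # (64,)
```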
Understanding Through Analogy
Think of a cooking recipe.
- Traditional machine learning: A food researcher (a human) pre-selects the important features of a recipe (ingredients used, cooking time, calories, and so on) before training.
- Deep learning: Show the model tens of thousands of food photos and let it discover on its own what features determine whether a dish is delicious.
Tasks Where Deep Learning Excels
Image Recognition (CNN: Convolutional Neural Network)
A CNN (Convolutional Neural Network) is a network architecture designed to process image data. It efficiently learns spatial patterns within images.[1]
- Object detection and classification (a field that advanced through image benchmarks such as ImageNet)
- Medical image diagnosis (detecting diseases from X-ray and MRI images)
- Autonomous driving (recognizing roads, pedestrians, and signs)
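
To illustrate how a convolutional layer picks up spatial patterns, here is a minimal sketch of the sliding-window operation at the core of a CNN. It assumes NumPy, no padding, a stride of 1, and a hand-written kernel; in a real CNN the kernel values are learned during training, and the function name `conv2d` is only illustrative.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid (no padding), stride-1 2D cross-correlation, the operation CNN layers apply."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Multiply the kernel with the patch under it and sum the result
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Synthetic 5x5 image: columns 0-2 are dark, columns 3-4 are bright
image = np.zeros((5, 5))
image[:, 3:] = 1.0

# Hand-written kernel that responds to a dark-to-bright change from left to right
kernel = np.array([[-1.0, 1.0]])

response = conv2d(image, kernel)
print(response)
```

The response is nonzero only where the window straddles the dark-to-bright boundary, which is the sense in which the same small kernel detects a vertical edge wherever it appears in the image.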
Natural Language Processing (Transformer / LLM)
A Transformer is a neural network architecture published in 2017. It uses attention mechanisms to process sequence data.[3] Many modern large language models are based on Transformer-style designs.
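
The attention mechanism can be sketched in a few lines. The example below assumes NumPy and implements scaled dot-product attention, softmax(QKᵀ / √d_k)V, as described in [3]. As a simplification, it uses the same random matrix for queries, keys, and values and leaves out the learned projections, multiple heads, masking, and positional information that a full Transformer adds.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Each output row is a weighted average of the rows of V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # similarity of every query to every key
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V, weights

# Assumption: random vectors stand in for the embeddings of a 4-token sequence
rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
X = rng.normal(size=(seq_len, d_model))

# Self-attention: queries, keys, and values all come from the same sequence
output, weights = scaled_dot_product_attention(X, X, X)
print(output.shape)
print(weights.sum(axis=-1))
```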
An LLM (Large Language Model) is a language model pre-trained on large amounts of text data. Google’s Machine Learning Crash Course (MLCC) explains LLMs in terms of tokens, Transformers, and text prediction.[4]
- GPT series (OpenAI): Text generation and code completion
- Claude (Anthropic): Dialogue, document analysis, and coding assistance
- Gemini (Google): Multimodal processing (text, images, and video combined)
Speech Recognition
Speech recognition converts audio waveforms into text. It is used in smartphone voice assistants and in automatic caption generation.
How LLMs Work (Overview)
LLMs learn and run inference through the following process.
graph LR
PT["Massive text data\n(Pre-training)"] --> LLM["LLM\n(Transformer-based)"]
LLM --> FT["Fine-tuning\nfor specific tasks"]
FT --> APP["Applications\n(dialogue, translation, summarization, etc.)"]

- Pre-training: Train on large amounts of text by learning to “predict the next token (word fragment)”
- Fine-tuning: Additional training for a specific use case, such as conversation
- Inference: Generate text in response to user input
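
The pre-training and inference steps above can be imitated with a toy model. This sketch uses only the Python standard library and replaces the Transformer with simple bigram counts, so it is a stand-in for the idea of next-token prediction rather than how an LLM is actually built. Whole words serve as tokens here (real LLMs usually use subword tokens), fine-tuning is not shown, and the names `pretrain` and `generate` are hypothetical.

```python
from collections import Counter, defaultdict

def pretrain(corpus):
    """Toy pre-training: count which token follows which in the text."""
    counts = defaultdict(Counter)
    tokens = corpus.split()
    for current, nxt in zip(tokens, tokens[1:]):
        counts[current][nxt] += 1
    return counts

def generate(counts, prompt, max_new_tokens=5):
    """Toy inference: repeatedly append the most frequent next token."""
    tokens = prompt.split()
    for _ in range(max_new_tokens):
        followers = counts.get(tokens[-1])
        if not followers:
            break  # the last token was never followed by anything in training
        tokens.append(followers.most_common(1)[0][0])
    return " ".join(tokens)

corpus = "the cat sat on the mat . the cat sat on the sofa ."
model = pretrain(corpus)
print(generate(model, "the cat", max_new_tokens=3))  # the cat sat on the
```

An LLM follows the same loop at inference time (predict a token, append it, repeat), but it scores candidate tokens with a trained neural network conditioned on the whole preceding context instead of a lookup on the last word.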
Q: Is deep learning always better than traditional machine learning?
A: No. Deep learning often requires large amounts of data and high compute resources. When data is scarce, or when working with structured (tabular) data, traditional methods can be easier to use.[1]
Q: Can I do deep learning without a GPU?
A: Small experiments can run on a CPU, although training can be slow. Larger models often require accelerators such as GPUs.
Q: Are LLMs and chatbots the same thing?
A: No. An LLM is a model (engine) for understanding and generating language. A chatbot is an application that uses an LLM to provide a conversational interface. The same LLM can power many applications beyond chatbots: translation, summarization, code generation, and more.
References
1. Ian Goodfellow, Yoshua Bengio, Aaron Courville, Deep Learning, 2016
2. Matthew D. Zeiler, Rob Fergus, Visualizing and Understanding Convolutional Networks, November 12, 2013
3. Ashish Vaswani et al., Attention Is All You Need, June 12, 2017
4. Google, Machine Learning Crash Course