What Is Generative AI?
Generative AI is a collective term for AI technology that learns patterns from large amounts of data and generates new data such as text, images, audio, and video. Since ChatGPT and image generation AI became popular in 2022, it has spread rapidly through society and is now used in many fields, including business, education, and creative work.
Target audience: Those who are just getting interested in AI, or those who want to get an overview of generative AI.
Estimated learning time: 15 minutes to read
Prerequisites: None
The Difference Between Generative AI and Traditional AI (Discriminative AI)
There are two broad types of AI: discriminative AI and generative AI.
Discriminative AI classifies or identifies input data. It makes judgments like “Is this image a cat or a dog?” or “Is this email spam or not?”
Generative AI creates new data based on the patterns it has learned. It generates things like “Create a new image of a cat” or “Automatically write an email.”
| Comparison | Discriminative AI | Generative AI |
|---|---|---|
| Purpose | Classify or identify data | Generate new data |
| Output | Labels, probabilities, scores | Text, images, audio, video |
| Representative examples | Image classification, spam filters, face recognition | ChatGPT, DALL-E, Stable Diffusion |
| Learning method | Learns from labeled data | Learns the distribution of patterns |
| Main applications | Quality control, medical diagnosis, search | Text writing, image generation, code completion |
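The contrast in the table can be sketched with toy code: a rule-based classifier that outputs a label (discriminative) versus a tiny bigram model that samples new text (generative). This is a minimal illustration only; the spam keywords and the training sentence are invented, whereas real systems learn these patterns from large datasets.

```python
import random

# Discriminative AI: maps input data to a label.
# (Toy rule-based "spam filter" standing in for a trained classifier.)
def classify_email(text: str) -> str:
    spam_words = {"prize", "winner", "free"}
    return "spam" if set(text.lower().split()) & spam_words else "not spam"

# Generative AI: produces new data that follows learned patterns.
# (Toy bigram model standing in for a large language model.)
def train_bigrams(corpus: str) -> dict:
    words = corpus.split()
    model = {}
    for a, b in zip(words, words[1:]):
        model.setdefault(a, []).append(b)  # record which word follows which
    return model

def generate(model: dict, start: str, length: int = 5) -> str:
    out = [start]
    for _ in range(length):
        candidates = model.get(out[-1])
        if not candidates:
            break
        out.append(random.choice(candidates))  # sample the next word
    return " ".join(out)

print(classify_email("You are a winner of a free prize"))  # spam
model = train_bigrams("the cat sat on the mat and the cat slept")
print(generate(model, "the"))  # a new word sequence following learned patterns
```

The classifier can only answer a fixed question about its input; the generator produces sequences that never appeared verbatim in its training text, which is the essence of the distinction in the table.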
What Generative AI Can Do
Generative AI is being applied across multiple modalities (types of data).
Text Generation
Text generation AI generates sentences based on an input prompt (instruction).
- Writing, summarizing, and translating text
- Automatic code generation and debugging assistance
- Providing information in conversational format (chatbots)
Examples: ChatGPT (OpenAI), Claude (Anthropic), Gemini (Google)
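Under the hood, a text generation model produces output one token at a time by sampling from a probability distribution it computes over candidate next tokens. The sketch below illustrates just that sampling step; the softmax-with-temperature technique is standard, but the candidate words and their scores here are made up for illustration (a real model scores tens of thousands of tokens with a neural network).

```python
import math
import random

def sample_next_token(scores: dict, temperature: float = 1.0) -> str:
    """Pick one token by sampling from softmax(scores / temperature)."""
    tokens = list(scores)
    weights = [math.exp(scores[t] / temperature) for t in tokens]
    return random.choices(tokens, weights=weights)[0]

# Hypothetical model scores for the word after "The cat sat on the":
scores = {"mat": 2.0, "sofa": 1.0, "moon": -1.0}

print(sample_next_token(scores))  # usually "mat", occasionally "sofa"
```

Lower temperature makes the pick nearly deterministic (the highest-scoring token almost always wins); higher temperature makes the output more varied. Many LLM APIs expose this same dial as a `temperature` parameter.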
Image Generation
Image generation AI creates new images from text descriptions (prompts) or reference images.
- Generating images from text (Text-to-Image)
- Style transfer of existing images
- Image editing and completion (inpainting)
Examples: DALL-E (OpenAI), Stable Diffusion (Stability AI), Midjourney
Music and Audio Generation
Music generation AI creates compositions and audio from text or musical instructions.
- Generating music from text
- Voice cloning and conversion
- Speech synthesis (Text-to-Speech)
Examples: Suno, Udio, ElevenLabs
Video Generation
Video generation AI creates video content from text or images.
- Generating short videos from text (Text-to-Video)
- Converting still images to video
- Video editing and completion
Examples: Sora (OpenAI), Runway, Pika
The History of Generative AI
The development of generative AI accelerated when three elements came together: algorithms, data, and computing power.
```mermaid
timeline
    title Key Milestones in Generative AI
    2014 : GAN introduced (Ian Goodfellow)
    2017 : Transformer paper "Attention Is All You Need"
    2018 : BERT (Google) / GPT-1 (OpenAI)
    2019 : GPT-2
    2020 : GPT-3 (175 billion parameters)
    2021 : DALL-E / Codex
    2022 : ChatGPT / Stable Diffusion / Midjourney
    2023 : GPT-4 / Claude 2 / Llama 2
    2024 : Claude 3 / GPT-4o / Gemini 1.5
    2025 : Claude 3.5/4 / o3 / Rise of reasoning models
    2026 : Practical deployment phase of AI agents
```

Key Milestones
| Year | Event | Significance |
|---|---|---|
| 2014 | GAN (Generative Adversarial Network) — Ian Goodfellow | Two competing networks produce high-quality generation |
| 2017 | Transformer paper “Attention Is All You Need” — Vaswani et al. | A parallelizable architecture emerges, becoming the foundation for large-scale models |
| 2018 | BERT (Google), GPT-1 (OpenAI) | The paradigm of pre-training + fine-tuning is established |
| 2019 | GPT-2 | High-quality text generation ability is first widely recognized |
| 2020 | GPT-3 (175 billion parameters) | Demonstrates versatility to handle diverse tasks with few examples |
| 2021 | DALL-E, Codex | Ability to generate images and code from text is demonstrated |
| 2022 | ChatGPT, Stable Diffusion, Midjourney | Era begins where ordinary users use generative AI daily |
| 2023 | GPT-4, Claude 2, Llama 2 | Dramatic capability improvement and rise of open-source models |
| 2024 | Claude 3, GPT-4o, Gemini 1.5 | Multimodal (integration of text, images, audio) becomes mainstream |
| 2025 | Claude 3.5/4, o3 (OpenAI), rise of reasoning models | "Think before answering" reasoning models used for complex problem solving |
| 2026 | Practical deployment phase of AI agents | Multiple AIs collaborate to autonomously execute complex tasks |
Why Generative AI Is Advancing So Rapidly Now
The rapid advancement of generative AI results from three elements aligning simultaneously.
```mermaid
graph TD
    A["Computing Power\nGPU/TPU advances\nCloud infrastructure development"] --> D["Rapid advancement\nof generative AI"]
    B["Data\nVast amounts of text\nand images on the internet"] --> D
    C["Algorithms\nThe Transformer\nQuality improvement via RLHF"] --> D
```

Computing Power: Advances in GPU/TPU performance and cloud infrastructure development have made it possible to train models with hundreds of billions of parameters.
Data: The vast amounts of text, image, and audio data accumulated on the internet can now be used as training material.
Algorithms: The emergence of the Transformer architecture and the spread of reinforcement learning from human feedback (RLHF) have made practically high-quality generation possible.
Summary
- Generative AI is a collective term for AI technology that learns data patterns and generates new content
- While discriminative AI “classifies and identifies,” generative AI “creates new data”
- Supported modalities are rapidly expanding: text, images, music, and video
- Starting from the 2017 Transformer paper, it has rapidly developed through the combination of computing power, data, and algorithms
Frequently Asked Questions
Q: What’s the difference between generative AI and “regular AI”?
A: What was widely used as “regular AI” was discriminative AI, which classifies and predicts data. Unlike discriminative AI, generative AI generates new data (text, images, etc.) from learned patterns. Both are types of AI, but their purposes and outputs are fundamentally different.
Q: Do I need specialized knowledge to use generative AI?
A: Products like ChatGPT, Claude, and Midjourney can be used from a browser without specialized knowledge. Technical knowledge is needed for development via API or fine-tuning custom models, but no special skills are required just to use them.
Q: How can generative AI “create new things”?
A: Generative AI learns statistical patterns from large amounts of data and probabilistically generates new data that follows those patterns. It doesn’t create something truly “original” — it generates new combinations based on the distribution of its training data.
Q: What’s the difference between a GAN and an LLM?
A: A GAN (Generative Adversarial Network) is a method that produces high-quality images and other outputs through competition between a generator network and a discriminator network. An LLM (Large Language Model) is a Transformer-based model specialized for generating and understanding text. Both are forms of generative AI, but their architectures and strengths differ.
Next step: Transformer Models