JSON Processing in Python
JSON (JavaScript Object Notation) is a lightweight text-based format for representing data. JSON is used everywhere in engineering: API responses, configuration files, logs, and more. Python has a built-in json module for working with JSON, and pydantic enables type-safe data processing with automatic validation.
What is JSON?
JSON represents data in the following format:

{
  "name": "Alice",
  "age": 30,
  "is_active": true,
  "tags": ["python", "ai", "engineer"],
  "address": {
    "city": "Tokyo",
    "zip": "100-0001"
  },
  "notes": null
}

Type correspondence between JSON and Python:
| JSON Type | Python Type | Example (Python) |
|---|---|---|
| object | dict | {"key": "value"} |
| array | list | [1, 2, 3] |
| string | str | "hello" |
| number | int / float | 42, 3.14 |
| true/false | True / False | True |
| null | None | None |
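The mapping in the table can be checked directly. The following is a minimal sketch that parses a small document and reports the Python type produced for each value; the sample document and its key names are made up for illustration.

```python
import json

# Hypothetical sample document covering each JSON type from the table
doc = '{"obj": {"k": "v"}, "arr": [1, 2], "s": "hi", "i": 42, "f": 3.14, "t": true, "n": null}'

parsed = json.loads(doc)

# Map each key to the name of the Python type json.loads() produced
python_types = {key: type(value).__name__ for key, value in parsed.items()}
print(python_types)

# Going the other way, Python values become JSON literals again
print(json.dumps({"flag": True, "nothing": None}))
# {"flag": true, "nothing": null}
```

Note that JSON has a single number type; the json module picks int or float depending on whether the literal has a fractional part or exponent.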
Basic Operations with the json Module
Python’s standard json module converts between JSON strings and Python objects.
json.loads(): JSON string to Python object
import json
# A JSON string (e.g., a response received from an API)
json_string = '{"name": "Alice", "age": 30, "is_active": true}'
# Convert to a Python dictionary
data = json.loads(json_string)
print(type(data)) # <class 'dict'>
print(data["name"]) # Alice
print(data["age"]) # 30
print(data["is_active"])  # True (Python bool)

json.dumps(): Python object to JSON string

import json
data = {
    "model": "claude-opus-4-5",
    "max_tokens": 1024,
    "messages": [
        {"role": "user", "content": "Hello"}
    ]
}
# Convert Python dictionary to JSON string
json_string = json.dumps(data)
print(json_string)
# {"model": "claude-opus-4-5", "max_tokens": 1024, "messages": [{"role": "user", "content": "Hello"}]}
# With indentation for readability
json_string_readable = json.dumps(data, indent=2)
print(json_string_readable)
# {
#   "model": "claude-opus-4-5",
#   "max_tokens": 1024,
#   "messages": [
#     {
#       "role": "user",
#       "content": "Hello"
#     }
#   ]
# }

Reading and Writing Files

import json
# Write to a JSON file
data = {"version": "1.0", "settings": {"theme": "dark", "language": "en"}}
with open("config.json", "w", encoding="utf-8") as f:
    json.dump(data, f, indent=2)
# Read from a JSON file
with open("config.json", "r", encoding="utf-8") as f:
    loaded = json.load(f)
print(loaded["settings"]["theme"])  # dark

Working with Nested JSON

API responses often have nested structures:
import json
# Example modeled on a GitHub API response
response_json = """
{
  "id": 12345,
  "name": "my-project",
  "owner": {
    "login": "alice",
    "type": "User"
  },
  "topics": ["ai", "python", "api"],
  "license": {
    "name": "MIT License",
    "spdx_id": "MIT"
  },
  "private": false,
  "stargazers_count": 128,
  "homepage": null
}
"""
repo = json.loads(response_json)
# Accessing nested data
print(repo["owner"]["login"]) # alice
print(repo["topics"][0]) # ai
print(repo["license"]["name"]) # MIT License
# Use get() to safely access keys that may not exist
homepage = repo.get("homepage")  # None (the key exists; its JSON value is null)
description = repo.get("description", "N/A") # Specify a default value
print(homepage) # None
print(description)  # N/A

Processing a JSON Array

import json
# A JSON list of users
users_json = """
[
  {"id": 1, "name": "Alice", "role": "admin"},
  {"id": 2, "name": "Bob", "role": "user"},
  {"id": 3, "name": "Carol", "role": "user"}
]
"""
users = json.loads(users_json)
# Extract all names
names = [user["name"] for user in users]
print(names) # ['Alice', 'Bob', 'Carol']
# Filter by role
admins = [user for user in users if user["role"] == "admin"]
print(admins)  # [{'id': 1, 'name': 'Alice', 'role': 'admin'}]

Common Pitfalls

1. Confusing a string with an object

import json
# Forgetting json.loads() leaves it as a string
raw = '{"name": "Alice"}'
try:
    print(raw["name"])
except TypeError as e:
    print(f"TypeError: {e}")  # a str cannot be indexed with a string key
data = json.loads(raw)
print(data["name"])  # OK: Alice

2. json.loads() vs. json.load()

import json
# loads() → converts from a string (the "s" stands for string)
data = json.loads('{"key": "value"}')
# load() → converts from a file object (assumes data.json exists)
with open("data.json", encoding="utf-8") as f:
    data = json.load(f)

3. Non-ASCII characters (ensure_ascii)

import json
data = {"city": "Tōkyō", "greeting": "Bonjour"}
# By default, non-ASCII characters are Unicode-escaped
print(json.dumps(data))
# {"city": "T\u014dky\u014d", "greeting": "Bonjour"}
# Use ensure_ascii=False to preserve non-ASCII characters
print(json.dumps(data, ensure_ascii=False))
# {"city": "Tōkyō", "greeting": "Bonjour"}

Type-safe JSON Processing with pydantic

pydantic uses Python type hints to automatically validate and convert data. It is especially useful when working with API responses.

pip install pydantic

Defining Data Models with BaseModel

from pydantic import BaseModel
from typing import Optional
# Define the data model
class Owner(BaseModel):
    login: str
    type: str

class Repository(BaseModel):
    id: int
    name: str
    owner: Owner
    topics: list[str] = []  # Default value
    private: bool = False
    stargazers_count: int = 0
    homepage: Optional[str] = None  # Allow None
    description: Optional[str] = None
# Create a model instance from a dictionary
data = {
    "id": 12345,
    "name": "my-project",
    "owner": {"login": "alice", "type": "User"},
    "topics": ["ai", "python"],
    "stargazers_count": 128,
    "homepage": None,
    "private": False
}
repo = Repository(**data)
# Access fields with type safety
print(repo.name) # my-project
print(repo.owner.login) # alice (nested model auto-converted)
print(repo.topics) # ['ai', 'python']
print(repo.homepage)  # None

Catching Validation Errors

from pydantic import BaseModel, ValidationError
class User(BaseModel):
    id: int
    name: str
    age: int
# Pass invalid data
try:
    user = User(id="not-an-int", name="Alice", age=30)
except ValidationError as e:
    print(e)
    # 1 validation error for User
    # id
    #   Input should be a valid integer, unable to parse string as an integer
    #   [type=int_parsing, input_value='not-an-int', ...]

When a type mismatch occurs, pydantic raises a ValidationError with a descriptive message that names the failing field and the reason.
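Beyond printing the exception, the individual problems can be read programmatically. The following is a minimal sketch, assuming pydantic v2, where ValidationError.errors() returns a list of dictionaries; the Account model and the describe_errors helper are hypothetical names used only for illustration.

```python
from pydantic import BaseModel, ValidationError


class Account(BaseModel):  # hypothetical model for illustration
    id: int
    name: str


def describe_errors(payload: dict) -> list[str]:
    """Return one 'field: message' line per validation problem."""
    try:
        Account(**payload)
    except ValidationError as exc:
        # Each entry includes "loc" (path to the field), "msg", and "type"
        return [
            f"{'.'.join(str(part) for part in err['loc'])}: {err['msg']}"
            for err in exc.errors()
        ]
    return []


# "id" has the wrong type and "name" is missing, so two problems are reported
problems = describe_errors({"id": "not-an-int"})
for line in problems:
    print(line)
```

Returning plain strings keeps the sketch small; an API handler might instead pass the list from errors() straight into its error response.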
Practical Example: Parsing an API Response with pydantic
import requests
from pydantic import BaseModel
from typing import Optional
class GitHubUser(BaseModel):
    login: str
    id: int
    name: Optional[str] = None
    company: Optional[str] = None
    public_repos: int = 0
    followers: int = 0

def get_github_user(username: str) -> GitHubUser | None:
    """Fetch GitHub user info and return it as a pydantic model"""
    try:
        response = requests.get(
            f"https://api.github.com/users/{username}",
            headers={"Accept": "application/vnd.github.v3+json"},
            timeout=10
        )
        response.raise_for_status()
        # Convert the response dictionary to a pydantic model
        # Fields not defined in the model are silently ignored
        return GitHubUser(**response.json())
    except requests.exceptions.RequestException as e:
        print(f"API call error: {e}")
        return None
# Run it
user = get_github_user("torvalds")
if user:
    print(f"Name: {user.name}")
    print(f"Public repos: {user.public_repos}")
    print(f"Followers: {user.followers:,}")

To convert a model back to a dictionary or JSON string:
user_dict = user.model_dump()
user_json = user.model_dump_json()

Summary

- json.loads() converts a JSON string to a dictionary; json.dumps() converts a dictionary to a JSON string
- Use json.load() / json.dump() for file operations
- Use ensure_ascii=False when working with non-ASCII characters
- pydantic’s BaseModel provides type-safe data processing and validation
- Convert an API response dictionary to a pydantic model with Model(**response.json())
As a next step, learn how to combine pydantic with Anthropic API responses in "Python AI SDK in Practice".
Q: How should I choose between the json module and pydantic?
A: The json module is sufficient for simple configuration file reading and writing. pydantic is valuable for processing API responses and when type checking or validation is required. FastAPI uses pydantic internally, making the integration seamless.
Q: How do I handle dates in JSON (e.g., “2026-05-13T09:00:00Z”)?
A: json.loads() does not automatically convert ISO 8601 date strings to datetime objects; they stay plain strings. With pydantic, declaring a field as datetime triggers automatic conversion. With the standard library, use datetime.fromisoformat(); note that it accepts the trailing Z only on Python 3.11 and later.
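The standard-library route might look like the following minimal sketch. The payload and the created_at field name are hypothetical, chosen only to illustrate the conversion.

```python
import json
from datetime import datetime

# Hypothetical payload; "created_at" is an illustrative field name
payload = '{"event": "deploy", "created_at": "2026-05-13T09:00:00Z"}'

record = json.loads(payload)
print(type(record["created_at"]).__name__)  # str: the date is still plain text

# Replace the trailing "Z" with an explicit UTC offset so that
# fromisoformat() also accepts the value on Python versions before 3.11.
created_at = datetime.fromisoformat(record["created_at"].replace("Z", "+00:00"))
print(created_at.isoformat())  # 2026-05-13T09:00:00+00:00
```

The result is a timezone-aware datetime, which avoids accidentally comparing it with naive local times.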
Q: How do I efficiently process large JSON files?
A: For files larger than a few hundred MB, rather than loading the entire file into memory with json.load(), use the ijson library for streaming-based processing.
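A rough sketch of the streaming approach, assuming the file holds one large top-level JSON array. ijson is a third-party package (pip install ijson); the iter_records helper and the events.json file are hypothetical, and the fallback to json.load() is there only so the sketch still runs where ijson is not installed.

```python
import json
import os
import tempfile
from typing import Iterator

try:
    import ijson  # third-party: pip install ijson
except ImportError:
    ijson = None


def iter_records(path: str) -> Iterator[dict]:
    """Yield the elements of a top-level JSON array one at a time."""
    with open(path, "rb") as f:
        if ijson is not None:
            # "item" is ijson's prefix for each element of the top-level array
            yield from ijson.items(f, "item")
        else:
            # Fallback: reads the whole file into memory (not streaming)
            yield from json.load(f)


# Small demo file standing in for a large one
path = os.path.join(tempfile.mkdtemp(), "events.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump([{"id": i, "kind": "click"} for i in range(5)], f)

total = sum(1 for record in iter_records(path) if record["kind"] == "click")
print(total)  # 5
```

Because the helper is a generator, the caller can aggregate or filter records without ever holding the full list in memory when ijson is available.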
Q: What changed between pydantic v1 and v2?
A: pydantic v2 (released in 2023) changed the API. dict() became model_dump() and json() became model_dump_json(). New projects should use v2.
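The v2 method names can be exercised in a short round trip. This is a minimal sketch, assuming pydantic v2 is installed; the Settings model is hypothetical, and model_validate_json() (the v2 counterpart of v1's parse_raw()) is an addition not mentioned in the answer above.

```python
from pydantic import BaseModel


class Settings(BaseModel):  # hypothetical model for illustration
    theme: str
    font_size: int = 12


settings = Settings(theme="dark")

as_dict = settings.model_dump()       # v1 equivalent: settings.dict()
as_json = settings.model_dump_json()  # v1 equivalent: settings.json()
print(as_dict)  # {'theme': 'dark', 'font_size': 12}

# Parse the JSON string back into a model instance
restored = Settings.model_validate_json(as_json)
print(restored == settings)  # True
```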
See the references for the external specifications and background sources used on this page.[1][2][3]