Python AI SDK in Practice
The Anthropic SDK is the official library for calling the Claude API from Python. It lets you use the core Claude API features (sending and receiving messages, streaming responses, and tool use) with short, readable code. This page walks through practical, runnable examples to build a solid understanding of how to use the SDK.
Installing the SDK
pip install anthropic
Setting the API Key
Manage API keys through environment variables; never write them directly in code:
# macOS / Linux
export ANTHROPIC_API_KEY="sk-ant-..."
# Windows (PowerShell)
$env:ANTHROPIC_API_KEY = "sk-ant-..."
For projects that use a .env file, python-dotenv is convenient:
pip install python-dotenv
# .env file (add this file to .gitignore)
ANTHROPIC_API_KEY=sk-ant-...
from dotenv import load_dotenv
load_dotenv()  # Load variables from .env into the environment; call this before creating the client
Reading the API key from an environment variable keeps it out of your source code, so it is not exposed if the code is published to GitHub, as long as the .env file itself is never committed.
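To make the mechanism concrete, the sketch below mimics the core of what load_dotenv() does: it reads KEY=VALUE lines and copies them into the environment, skipping variables that are already set (python-dotenv's default behavior, override=False). It is a simplified, hypothetical stand-in named load_env_text, not python-dotenv's implementation; the real library also handles export prefixes, multiline values, and variable expansion, so use it rather than this sketch in practice.

```python
import os
from typing import MutableMapping, Optional


def load_env_text(text: str, environ: Optional[MutableMapping[str, str]] = None) -> dict:
    """Hypothetical, simplified .env loader that parses KEY=VALUE lines.

    Only keys not already present in `environ` are set, mirroring
    python-dotenv's default of not overriding existing variables.
    Returns the key/value pairs that were actually set.
    """
    if environ is None:
        environ = os.environ
    applied = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without '='
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        if key and key not in environ:
            environ[key] = value
            applied[key] = value
    return applied


# A plain dict stands in for os.environ so the demo has no side effects
demo_env = {"EXISTING": "keep-me"}
load_env_text('# comment\nANTHROPIC_API_KEY="sk-ant-test"\nEXISTING=replaced\n', demo_env)
print(demo_env)  # {'EXISTING': 'keep-me', 'ANTHROPIC_API_KEY': 'sk-ant-test'}
```

Passing the mapping in explicitly keeps the function testable; the real load_dotenv() writes to os.environ.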
Basic Message Sending and Receiving
The simplest way to query Claude:
import anthropic
client = anthropic.Anthropic()
# ANTHROPIC_API_KEY environment variable is loaded automatically
message = client.messages.create(
    model="claude-opus-4-5",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Explain Python in three sentences."}
    ]
)
# Extract the response text
print(message.content[0].text)
Understanding the Response Structure
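message.content is a list of content blocks, not a single string. For a plain question it usually holds one text block, which is why message.content[0].text works, but responses that involve tool use can mix text and tool_use blocks. A defensive helper joins only the text blocks. In the sketch below, SimpleNamespace objects stand in for the SDK's block types so it runs without an API call; the helper name extract_text and the tool_use ID are hypothetical.

```python
from types import SimpleNamespace


def extract_text(message) -> str:
    """Join the text of every block whose type is "text", skipping others (e.g. tool_use)."""
    return "".join(block.text for block in message.content if block.type == "text")


# Stand-in for a response that mixes a text block and a tool_use block (hypothetical ID)
fake_message = SimpleNamespace(
    content=[
        SimpleNamespace(type="text", text="Let me check the weather."),
        SimpleNamespace(type="tool_use", id="toolu_123", name="get_weather", input={"city": "Tokyo"}),
    ]
)
print(extract_text(fake_message))  # Let me check the weather.
```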
import anthropic
client = anthropic.Anthropic()
message = client.messages.create(
    model="claude-opus-4-5",
    max_tokens=1024,
    messages=[{"role": "user", "content": "What is 1+1?"}]
)
# Fields of the response object
print(message.id)  # Unique message ID
print(message.model)  # Model name used
print(message.role)  # "assistant"
print(message.stop_reason)  # "end_turn" for a normal completion ("max_tokens" if cut off, "tool_use" for tool calls)
# Token usage
print(message.usage.input_tokens)  # Number of input tokens
print(message.usage.output_tokens)  # Number of output tokens
# Extracting the text
text = message.content[0].text
print(text)  # e.g. "1 + 1 = 2." (exact wording varies)
Using a System Prompt
import anthropic
client = anthropic.Anthropic()
message = client.messages.create(
    model="claude-opus-4-5",
    max_tokens=1024,
    system="You are a Python expert. Always include working, concrete code examples in your responses.",
    messages=[
        {"role": "user", "content": "How do I remove duplicates from a list?"}
    ]
)
print(message.content[0].text)
Maintaining Conversation History
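The Messages API does not remember earlier requests: to keep context, every call resends the conversation so far in messages. Because that list, and with it the input token count, grows with every turn, long-running chats often cap it. The sketch below keeps only the most recent messages and starts the window on a user turn so each question stays paired with its answer. The function name trim_history and the max_messages limit are illustrative assumptions, not SDK features; the basic accumulate-and-resend pattern comes after it.

```python
def trim_history(history: list[dict], max_messages: int = 20) -> list[dict]:
    """Return the last `max_messages` messages, dropping leading non-user turns.

    Hypothetical helper: the original list is not modified.
    """
    if max_messages <= 0:
        return []
    trimmed = history[-max_messages:]
    # Start the window on a user turn so no answer is left without its question
    while trimmed and trimmed[0]["role"] != "user":
        trimmed = trimmed[1:]
    return trimmed


history = [
    {"role": "user", "content": "Q1"},
    {"role": "assistant", "content": "A1"},
    {"role": "user", "content": "Q2"},
    {"role": "assistant", "content": "A2"},
    {"role": "user", "content": "Q3"},
]
print([m["content"] for m in trim_history(history, max_messages=4)])  # ['Q2', 'A2', 'Q3']
```

Trimming by message count is the simplest policy; trimming by estimated tokens or summarizing older turns are common alternatives.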
import anthropic
client = anthropic.Anthropic()
# A list to accumulate conversation history
conversation = []
def chat(user_message: str) -> str:
    """Send a message to Claude while maintaining conversation history"""
    conversation.append({"role": "user", "content": user_message})
    response = client.messages.create(
        model="claude-opus-4-5",
        max_tokens=1024,
        system="You are a helpful Python learning assistant.",
        messages=conversation
    )
    assistant_message = response.content[0].text
    conversation.append({"role": "assistant", "content": assistant_message})
    return assistant_message
# Carry on a multi-turn conversation
print(chat("Can you explain Python list comprehensions?"))
print(chat("Could you show a more detailed example?"))
print(chat("How are they different from dictionary comprehensions?"))
Streaming Responses
LLM responses take time to generate. Streaming allows generated text to be displayed in real time, improving the user experience.
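Stripped of the SDK, the pattern is simple: iterate over text chunks, print each one immediately (flushing so it is not held in the output buffer), and concatenate them to get the full response. In the sketch below a plain list stands in for stream.text_stream, and the function name print_and_collect is hypothetical; the CLI example at the end of this page uses the same accumulate-while-printing approach.

```python
import sys
from typing import Iterable, TextIO


def print_and_collect(chunks: Iterable[str], out: TextIO = sys.stdout) -> str:
    """Write each chunk as soon as it arrives and return the concatenated text."""
    parts = []
    for chunk in chunks:
        out.write(chunk)
        out.flush()  # Show partial output immediately instead of waiting for a newline
        parts.append(chunk)
    out.write("\n")
    return "".join(parts)


# A list of strings stands in for stream.text_stream here
full_text = print_and_collect(["Cloud ", "architecture ", "is..."])
```

Collecting into a list and joining once avoids repeated string concatenation, although for typical response sizes either approach is fine.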
import anthropic
client = anthropic.Anthropic()
# Use a with block so the stream is closed properly
with client.messages.stream(
    model="claude-opus-4-5",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Explain the basics of cloud architecture."}]
) as stream:
    # Print text as it is generated
    for text in stream.text_stream:
        print(text, end="", flush=True)
    print()  # Final newline
    # Retrieve the complete message object once the text stream is exhausted
    final_message = stream.get_final_message()
print(f"\nTokens used: {final_message.usage.input_tokens} in / {final_message.usage.output_tokens} out")
Fine-grained Control Over Stream Events
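Iterating over the stream object itself, rather than stream.text_stream, yields typed events. Text arrives in content_block_delta events whose delta has type "text_delta", and each of these events carries the index of the content block it belongs to, so a response with several blocks can be reassembled block by block. The sketch below does that with SimpleNamespace objects standing in for the SDK's event classes; only the type, index, and delta fields it reads are assumed, and the function name is hypothetical.

```python
from collections import defaultdict
from types import SimpleNamespace


def collect_text_by_block(events) -> dict[int, str]:
    """Group streamed text deltas by the index of the content block they belong to."""
    blocks = defaultdict(str)
    for event in events:
        # Other event types (and non-text deltas such as tool input JSON) are ignored
        if event.type == "content_block_delta" and event.delta.type == "text_delta":
            blocks[event.index] += event.delta.text
    return dict(blocks)


# Stand-ins for a short event sequence
events = [
    SimpleNamespace(type="message_start"),
    SimpleNamespace(type="content_block_start", index=0),
    SimpleNamespace(type="content_block_delta", index=0, delta=SimpleNamespace(type="text_delta", text="Python, ")),
    SimpleNamespace(type="content_block_delta", index=0, delta=SimpleNamespace(type="text_delta", text="Go, Rust")),
    SimpleNamespace(type="content_block_stop", index=0),
    SimpleNamespace(type="message_stop"),
]
print(collect_text_by_block(events))  # {0: 'Python, Go, Rust'}
```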
import anthropic
client = anthropic.Anthropic()
with client.messages.stream(
    model="claude-opus-4-5",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Compare three programming languages."}]
) as stream:
    for event in stream:
        if event.type == "content_block_start":
            print("[Generation started]")
        elif event.type == "content_block_delta":
            # Text arrives as "text_delta"; tool input arrives as "input_json_delta"
            if event.delta.type == "text_delta":
                print(event.delta.text, end="", flush=True)
        elif event.type == "content_block_stop":
            print("\n[Generation ended]")
        elif event.type == "message_stop":
            print("[Message complete]")
Tool Use (Function Calling)
Tool use lets Claude request calls to external functions or APIs that your code defines and runs, for example to retrieve weather data, search a database, or perform a calculation.
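Claude does not execute anything itself: when it decides to use a tool, the response stops with stop_reason "tool_use" and contains tool_use blocks (each with an id, name, and input), and your code runs the matching function and replies with tool_result blocks that reference the same id. The dispatch step can be sketched independently of the SDK. This version uses a name-to-function dictionary instead of an if chain and reports failures with "is_error": True so Claude can see that the call failed. The add tool, TOOL_REGISTRY, and run_tool are hypothetical, and SimpleNamespace stands in for the SDK's tool_use block.

```python
import json
from types import SimpleNamespace


def add(a: float, b: float) -> float:
    """Example tool implementation (hypothetical)."""
    return a + b


# Map tool names (as declared in the tools list) to Python callables
TOOL_REGISTRY = {"add": add}


def run_tool(block) -> dict:
    """Execute one tool_use block and build the matching tool_result block."""
    func = TOOL_REGISTRY.get(block.name)
    try:
        if func is None:
            raise ValueError(f"Unknown tool: {block.name}")
        output = func(**block.input)
        return {"type": "tool_result", "tool_use_id": block.id, "content": json.dumps(output)}
    except Exception as exc:  # Report any failure back to Claude instead of crashing
        return {"type": "tool_result", "tool_use_id": block.id, "content": str(exc), "is_error": True}


call = SimpleNamespace(type="tool_use", id="toolu_01", name="add", input={"a": 2, "b": 3})
print(run_tool(call))
# {'type': 'tool_result', 'tool_use_id': 'toolu_01', 'content': '5'}
```

Catching every exception is a deliberate choice here: a bad argument from Claude becomes an error result it can react to, rather than an exception that ends the conversation loop.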
Defining and Calling a Simple Tool
import anthropic
import json
client = anthropic.Anthropic()
# Tool definition (tells Claude which functions are available)
tools = [
    {
        "name": "get_weather",
        "description": "Retrieves the current weather for a specified city",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "The city to check weather for (e.g., Tokyo, London)"
                },
                "unit": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "The temperature unit"
                }
            },
            "required": ["city"]
        }
    }
]
def get_weather(city: str, unit: str = "celsius") -> dict:
    """Look up the weather (this example returns dummy data instead of calling a real API)"""
    # In a real implementation, call a weather API such as OpenWeatherMap
    weather_data = {
        "Tokyo": {"temp": 22, "condition": "Sunny", "humidity": 60},
        "London": {"temp": 15, "condition": "Cloudy", "humidity": 75},
    }
    data = weather_data.get(city, {"temp": 20, "condition": "Unknown", "humidity": 50})
    if unit == "fahrenheit":
        data["temp"] = data["temp"] * 9 / 5 + 32
    return {"city": city, "temperature": data["temp"], "unit": unit,
            "condition": data["condition"], "humidity": data["humidity"]}
def process_tool_call(tool_name: str, tool_input: dict) -> str:
    """Call the matching function for a tool name and return its result as a JSON string"""
    if tool_name == "get_weather":
        result = get_weather(**tool_input)
        return json.dumps(result)
    return json.dumps({"error": f"Unknown tool: {tool_name}"})
def chat_with_tools(user_message: str) -> str:
    """Interact with Claude, letting it use tools until it produces a final answer"""
    messages = [{"role": "user", "content": user_message}]
    while True:
        response = client.messages.create(
            model="claude-opus-4-5",
            max_tokens=1024,
            tools=tools,
            messages=messages
        )
        # No (further) tool use needed: return the text of the answer
        if response.stop_reason == "end_turn":
            return "".join(block.text for block in response.content if block.type == "text")
        # Any other stop reason (e.g. max_tokens) would otherwise resend the same request forever
        if response.stop_reason != "tool_use":
            raise RuntimeError(f"Unexpected stop_reason: {response.stop_reason}")
        # Add the assistant response (including tool calls) to the conversation
        messages.append({"role": "assistant", "content": response.content})
        # Run each requested tool and collect the results
        tool_results = []
        for content_block in response.content:
            if content_block.type == "tool_use":
                tool_result = process_tool_call(
                    content_block.name,
                    content_block.input
                )
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": content_block.id,
                    "content": tool_result
                })
        # Send the tool results back and let Claude continue
        messages.append({"role": "user", "content": tool_results})
# Run it
answer = chat_with_tools("Compare the weather in Tokyo and London.")
print(answer)
Error Handling and Retries
Rate limits and network errors can occur when calling AI APIs:
import anthropic
import time
client = anthropic.Anthropic()
def call_with_retry(messages: list, max_retries: int = 3) -> str:
    """Retry logic that handles rate limiting"""
    for attempt in range(max_retries):
        try:
            response = client.messages.create(
                model="claude-opus-4-5",
                max_tokens=1024,
                messages=messages
            )
            return response.content[0].text
        except anthropic.RateLimitError:
            if attempt < max_retries - 1:
                wait_seconds = 2 ** attempt  # Exponential backoff: 1s, then 2s, doubling each attempt
                print(f"Rate limit reached. Retrying in {wait_seconds} second(s)...")
                time.sleep(wait_seconds)
            else:
                raise
        except anthropic.APIConnectionError:
            print("Network connection error.")
            raise
        except anthropic.AuthenticationError:
            print("Invalid API key. Check the ANTHROPIC_API_KEY environment variable.")
            raise
        except anthropic.APIStatusError as e:
            # Catch-all for other HTTP errors; keep it after the more specific subclasses above
            print(f"API error: {e.status_code} - {e.message}")
            raise
    return ""  # Only reached if max_retries <= 0
result = call_with_retry([{"role": "user", "content": "Hello!"}])
print(result)
Best Practices
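Retries are worth getting right before anything else. The SDK client already retries some failures automatically (connection errors and rate limits among them), twice by default; the count can be changed with the max_retries option, for example anthropic.Anthropic(max_retries=5). When you need custom behavior, adding random jitter to exponential backoff keeps many clients from retrying in lockstep. The sketch below is generic and hypothetical: retry_with_backoff and its parameters are illustrative, it wraps any zero-argument callable rather than a specific SDK method, and the injectable sleep lets the demo run instantly.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    retryable: Tuple[Type[BaseException], ...],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func(), retrying errors listed in `retryable` with exponential backoff and full jitter.

    Makes at most max_retries + 1 attempts and re-raises the last error.
    Exceptions not listed in `retryable` propagate immediately.
    """
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    for attempt in range(max_retries + 1):
        try:
            return func()
        except retryable:
            if attempt == max_retries:
                raise
            # Full jitter: sleep a random time in [0, base_delay * 2**attempt]
            sleep(random.uniform(0, base_delay * 2 ** attempt))
    raise AssertionError("unreachable")


# Demo: a stand-in call that fails twice with a transient error, then succeeds
attempts = {"count": 0}


def flaky() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


print(retry_with_backoff(flaky, retryable=(ConnectionError,), sleep=lambda s: None))  # ok
```

With the SDK, the request would be wrapped in a lambda, e.g. retry_with_backoff(lambda: client.messages.create(...), retryable=(anthropic.RateLimitError,)); in that case consider lowering the client's own max_retries so the two retry layers do not multiply.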
API Key Management
import os
import anthropic
# Always read the key from an environment variable
api_key = os.environ.get("ANTHROPIC_API_KEY")
if not api_key:
    raise ValueError("ANTHROPIC_API_KEY environment variable is not set")
client = anthropic.Anthropic(api_key=api_key)
Setting max_tokens Appropriately
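max_tokens is a hard ceiling on output length, not a target. When a response hits it, generation stops mid-answer and stop_reason is "max_tokens" instead of "end_turn", so it is worth checking before trusting the text. A small check is sketched below; SimpleNamespace stands in for the response object, and the function name checked_text and the token count are illustrative.

```python
import warnings
from types import SimpleNamespace


def checked_text(message) -> str:
    """Return the response text, warning if generation was cut off by max_tokens."""
    text = "".join(block.text for block in message.content if block.type == "text")
    if message.stop_reason == "max_tokens":
        warnings.warn(
            f"Response truncated after {message.usage.output_tokens} output tokens; "
            "consider raising max_tokens.",
            RuntimeWarning,
        )
    return text


# Stand-in for a response that hit a low max_tokens ceiling (values are illustrative)
cut_off = SimpleNamespace(
    stop_reason="max_tokens",
    usage=SimpleNamespace(output_tokens=256),
    content=[SimpleNamespace(type="text", text="Async processing in Python lets")],
)
print(checked_text(cut_off))  # Emits a RuntimeWarning, then prints the partial text
```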
import anthropic
client = anthropic.Anthropic()
# For short answers (classification, decisions)
response = client.messages.create(
    model="claude-opus-4-5",
    max_tokens=256,  # A low ceiling caps response length (and worst-case cost)
    messages=[{"role": "user", "content": "Is the sentiment of this text Positive or Negative? Answer in one word."}]
)
# For long answers (document generation, detailed explanations)
response = client.messages.create(
    model="claude-opus-4-5",
    max_tokens=4096,  # Set higher so long answers are not cut off
    messages=[{"role": "user", "content": "Please provide a detailed explanation of async processing in Python."}]
)
Practical: A Simple Q&A CLI Tool
A small, complete CLI application that combines everything covered above:
"""
simple_qa.py - A simple Claude Q&A CLI tool
Usage:
    python simple_qa.py
"""
import anthropic
import os
import sys
def create_client() -> anthropic.Anthropic:
    """Create an Anthropic client, exiting if the API key is missing"""
    api_key = os.environ.get("ANTHROPIC_API_KEY")
    if not api_key:
        print("Error: ANTHROPIC_API_KEY environment variable is not set")
        sys.exit(1)
    return anthropic.Anthropic(api_key=api_key)
def ask_claude(client: anthropic.Anthropic, question: str, history: list) -> str:
    """Stream Claude's answer to stdout and return the full response text"""
    # Send the question with the history, but record it only once the call succeeds
    messages = history + [{"role": "user", "content": question}]
    full_response = ""
    print("\nClaude: ", end="", flush=True)
    with client.messages.stream(
        model="claude-opus-4-5",
        max_tokens=1024,
        system="You are a helpful and knowledgeable assistant.",
        messages=messages
    ) as stream:
        for text in stream.text_stream:
            print(text, end="", flush=True)
            full_response += text
    print()  # Newline
    history.append({"role": "user", "content": question})
    history.append({"role": "assistant", "content": full_response})
    return full_response
def main():
    """Main loop"""
    client = create_client()
    history = []
    print("=== Claude Q&A Tool ===")
    print("Enter a question. Type 'clear' to reset the conversation, or 'quit' / 'exit' to end.\n")
    while True:
        try:
            question = input("You: ").strip()
            if not question:
                continue
            if question.lower() in ("quit", "exit"):
                print("Goodbye.")
                break
            if question.lower() == "clear":
                history.clear()
                print("Conversation history cleared.\n")
                continue
            ask_claude(client, question, history)
            print()
        except (KeyboardInterrupt, EOFError):
            print("\n\nGoodbye.")
            break
        except anthropic.RateLimitError:
            print("Rate limit reached. Please wait before retrying.")
        except anthropic.APIConnectionError:
            print("Network connection error. Please check your connection.")
if __name__ == "__main__":
    main()
How to run it:
export ANTHROPIC_API_KEY="sk-ant-..."
python simple_qa.py
Summary
- Create a client with anthropic.Anthropic() and send messages with client.messages.create()
- Always read API keys from environment variables; never write them in code
- Use client.messages.stream() for streaming responses
- Tool use (function calling) lets Claude request calls to external functions, which your code then executes
- Check stop_reason and the type of each element in content when processing responses
- Handle exceptions like anthropic.RateLimitError appropriately
Q: How do I use models other than claude-opus-4-5?
A: Just change the model parameter. Faster, lower-cost models (e.g., claude-haiku-4-5) can reduce latency and costs. See Anthropic’s model list for options.
Q: Does the cost depend on the max_tokens setting?
A: No. You are billed for the input tokens plus the output tokens actually generated. max_tokens is only a ceiling: if the response is short, you pay only for what was generated.
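As a worked illustration of the simple case: cost is input tokens times the input price plus generated output tokens times the output price, and max_tokens does not appear anywhere in the formula. The prices below are placeholder values for the arithmetic only, not Anthropic's actual rates, and the function name is hypothetical; check the official pricing page for real numbers.

```python
def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      input_price_per_mtok: float, output_price_per_mtok: float) -> float:
    """Estimated cost in USD, with prices given per million tokens."""
    return (input_tokens * input_price_per_mtok + output_tokens * output_price_per_mtok) / 1_000_000


# Placeholder prices (hypothetical): $3 per million input tokens, $15 per million output tokens.
# A call with max_tokens=4096 that generated only 500 tokens is billed for those 500.
cost = estimate_cost_usd(input_tokens=1_000, output_tokens=500,
                         input_price_per_mtok=3.0, output_price_per_mtok=15.0)
print(f"${cost:.4f}")  # $0.0105
```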
Q: Is the cost different with and without streaming?
A: No. Streaming only affects when the output is displayed, not the number of tokens generated.
Q: How do I debug tool use errors?
A: Print response.content to inspect its contents and verify that the tool_use block contains the expected name and input values. If the input_schema definition is incorrect, Claude may not be able to generate the correct JSON arguments.
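Alongside printing response.content, it can help to check each tool_use input against your own input_schema before running the tool. The sketch below handles only the two rules used by the get_weather tool on this page, required properties and enum values; it is not a full JSON Schema validator (a library such as jsonschema covers the whole specification), and the function name schema_problems is hypothetical.

```python
def schema_problems(tool_input: dict, input_schema: dict) -> list[str]:
    """Return human-readable problems with tool_input (an empty list if none were found).

    Checks only 'required' and 'enum'; other JSON Schema keywords are ignored.
    """
    problems = []
    properties = input_schema.get("properties", {})
    for name in input_schema.get("required", []):
        if name not in tool_input:
            problems.append(f"missing required property '{name}'")
    for name, value in tool_input.items():
        spec = properties.get(name, {})
        if "enum" in spec and value not in spec["enum"]:
            problems.append(f"'{name}'={value!r} not in {spec['enum']}")
    return problems


weather_schema = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
    },
    "required": ["city"],
}
print(schema_problems({"city": "Tokyo", "unit": "kelvin"}, weather_schema))
# ["'unit'='kelvin' not in ['celsius', 'fahrenheit']"]
```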
See the references for the external specifications and background sources used on this page.[1][2][3][4]