API Design & Gateway
An API (Application Programming Interface) is a communication contract between different software components. Clients (browsers, mobile apps, other services) learn from the API specification what to send and what to receive in return, without needing to know the server’s internal implementation. In production AI services, multiple microservices collaborate, making API design and a unified API gateway indispensable.
Three API Styles: REST, GraphQL, and gRPC
Three API styles are widely used in modern web services. Each was created to solve different problems.
REST API
REST (Representational State Transfer) is an API design style that leverages HTTP’s built-in mechanisms. It is designed around “resources” and uses HTTP methods to express operations.
| HTTP Method | Meaning | Example |
|---|---|---|
| GET | Retrieve | GET /users/123 → Fetch user 123 |
| POST | Create | POST /conversations → Create a new conversation |
| PUT | Full update | PUT /users/123 → Update all fields of user 123 |
| PATCH | Partial update | PATCH /users/123 → Update specific fields of user 123 |
| DELETE | Delete | DELETE /conversations/456 → Delete conversation 456 |
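As a rough illustration of how the methods in the table map to operations on a resource, the following sketch dispatches requests against an in-memory store. The `users` dictionary and the `handle` function are hypothetical names introduced for this example, and a real service would sit behind a web framework and a database.

```python
from http import HTTPStatus

# Hypothetical in-memory "users" resource standing in for a database.
users = {"123": {"name": "Alice", "email": "alice@example.com"}}


def handle(method, path, body=None):
    """Return (status, payload) for a request to /users/{id}."""
    parts = path.strip("/").split("/")
    if len(parts) != 2 or parts[0] != "users":
        return HTTPStatus.NOT_FOUND, None
    user_id = parts[1]
    if user_id not in users:
        return HTTPStatus.NOT_FOUND, None

    if method == "GET":
        # Retrieve the resource without modifying it.
        return HTTPStatus.OK, users[user_id]
    if method == "PUT":
        # Full update: the stored representation is replaced entirely.
        users[user_id] = dict(body or {})
        return HTTPStatus.OK, users[user_id]
    if method == "PATCH":
        # Partial update: only the supplied fields change.
        users[user_id].update(body or {})
        return HTTPStatus.OK, users[user_id]
    if method == "DELETE":
        del users[user_id]
        return HTTPStatus.NO_CONTENT, None
    return HTTPStatus.METHOD_NOT_ALLOWED, None
```

The difference between PUT and PATCH shows up in the stored data: PUT with `{"name": "Bob"}` drops the `email` field, while PATCH with the same body keeps it.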
The response communicates state via HTTP status codes.
200 OK → Success (retrieval or update)
201 Created → Creation successful
400 Bad Request → Problem with the client's request
401 Unauthorized → Authentication required
403 Forbidden → Authenticated but access not permitted
404 Not Found → Resource does not exist
429 Too Many Requests → Rate limit reached
500 Internal Server Error → Server-side error
GraphQL
GraphQL is a query-language-based API specification developed by Facebook. It uses a single endpoint (/graphql) where the client specifies only the fields it needs.
    # The client specifies only the fields it needs
    query {
      user(id: "123") {
        name
        email
        conversations(last: 5) {
          id
          createdAt
          messages(last: 1) {
            content
          }
        }
      }
    }
Where REST would require multiple endpoint calls to combine data, GraphQL retrieves all necessary data in a single request.
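A GraphQL query is commonly sent as an HTTP POST whose JSON body carries the query text and its variables. The sketch below only builds such a request with the standard library. The endpoint URL, the operation name `RecentConversations`, and the `ID!` variable type are assumptions made for illustration, not part of any real schema.

```python
import json
import urllib.request

GRAPHQL_URL = "https://api.example.com/graphql"  # hypothetical endpoint

# Same shape as the query above, with the user id passed as a variable.
QUERY = """
query RecentConversations($id: ID!) {
  user(id: $id) {
    name
    email
    conversations(last: 5) {
      id
      createdAt
    }
  }
}
"""


def build_graphql_request(query, variables, url=GRAPHQL_URL):
    """Build (but do not send) a POST request carrying a GraphQL query."""
    body = json.dumps({"query": query, "variables": variables}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_graphql_request(QUERY, {"id": "123"})
# urllib.request.urlopen(request) would send it; that call is left out here
# because the endpoint is hypothetical.
```

Passing the id as a variable rather than interpolating it into the query string keeps the query text constant, which tends to make caching and logging on the server side simpler.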
gRPC
gRPC is an open-source RPC (Remote Procedure Call) framework developed by Google that runs over HTTP/2. It defines typed interfaces in Protocol Buffers (.proto files) and exchanges binary-encoded messages rather than JSON.
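    // chat.proto
    syntax = "proto3";
    service ChatService {
      // Unary call: one request, one response
      rpc SendMessage (MessageRequest) returns (MessageResponse);
      // Server streaming: one request, a stream of chunks
      rpc StreamResponse (MessageRequest) returns (stream MessageChunk);
    }
    message MessageRequest {
      string conversation_id = 1;
      string content = 2;
    }
    // The response types need definitions to compile; these fields are illustrative
    message MessageResponse { string content = 1; }
    message MessageChunk { string text = 1; }
Comparison: REST vs GraphQL vs gRPC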
| Item | REST | GraphQL | gRPC |
|---|---|---|---|
| Protocol | HTTP/JSON | HTTP/JSON | HTTP/2 + Protocol Buffers (binary) |
| Endpoints | Multiple (per resource) | Single (/graphql) | Per method |
| Type definition | OpenAPI (optional) | Schema required | .proto file required |
| Primary use | Public APIs, simple CRUD | Complex frontend data fetching | Microservice-to-microservice |
| Learning curve | Low | Medium | Medium–High |
| Browser support | Native | Native | Limited (grpc-web required) |
| Performance | Standard | Standard | Fast (binary communication) |
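To make the "binary communication" entry in the table more concrete, the sketch below hand-encodes a `MessageRequest` (two string fields, numbered 1 and 2) in the Protocol Buffers wire format and compares its size with the equivalent JSON. It is a simplified illustration that handles only string fields. Real projects would use code generated by `protoc` rather than the helper functions here, which are hypothetical.

```python
import json


def encode_varint(n):
    """Encode a non-negative integer as a protobuf varint (7 bits per byte)."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_string_field(field_number, value):
    """Encode one string field: tag, length, then the UTF-8 bytes."""
    if not value:
        return b""  # proto3 omits fields that hold their default value
    payload = value.encode("utf-8")
    tag = (field_number << 3) | 2  # wire type 2 = length-delimited
    return encode_varint(tag) + encode_varint(len(payload)) + payload


def encode_message_request(conversation_id, content):
    return encode_string_field(1, conversation_id) + encode_string_field(2, content)


binary = encode_message_request("conv_42", "Hello")
as_json = json.dumps({"conversation_id": "conv_42", "content": "Hello"}).encode("utf-8")
print(len(binary), "bytes as protobuf vs", len(as_json), "bytes as JSON")
```

Field names never appear on the wire, only field numbers, which is one reason the binary form is smaller and also why the .proto file is required on both sides.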
What Is an API Gateway
An API gateway is a single entry point that receives all client requests. Instead of exposing multiple backend services directly to clients, the gateway handles cross-cutting concerns such as authentication, rate limiting, and routing centrally.
API Gateway Architecture
    graph TD
        Browser["Browser"] --> GW["API Gateway"]
        Mobile["Mobile App"] --> GW
        External["External Service"] --> GW
        GW --> AuthCheck["① Auth Check\nJWT validation / API key check"]
        AuthCheck --> RateLimit["② Rate Limit Check\nCounter managed in Redis"]
        RateLimit --> Route["③ Routing\nDetermine target service by path"]
        Route --> UserSvc["User Service\n:8001"]
        Route --> AISvc["AI Chat Service\n:8002"]
        Route --> FileSvc["File Service\n:8003"]
        GW --> Log["④ Log & Metrics Collection\nDatadog / CloudWatch"]
        UserSvc --> DB["PostgreSQL"]
        AISvc --> LLMAPI["LLM API\nAnthropic / OpenAI"]
        FileSvc --> Storage["Object Storage\nS3 / GCS"]
Six Responsibilities of an API Gateway
1. Single Entry Point
Clients only need to know the API gateway’s URL, regardless of how many microservices sit behind it. The backend service topology can change without affecting client code.
2. Authentication and Authorization
Every request goes through JWT token validation or API key verification. Invalid requests are rejected before reaching any backend service.
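The validation step can be sketched with the standard library alone. The example below checks a bearer token's signature and expiry and returns the claims only when both pass. It uses HS256 (a shared secret) purely so that it runs without dependencies. Asymmetric algorithms such as RS256 are normally verified with a dedicated JWT library. `SECRET`, `sign_token`, and `check_authorization` are hypothetical names introduced for this sketch.

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"demo-secret"  # hypothetical shared secret; load from a secret store in practice


def _b64url_encode(raw):
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def _b64url_decode(part):
    # JWTs strip base64 padding, so restore it before decoding.
    return base64.urlsafe_b64decode(part + "=" * (-len(part) % 4))


def sign_token(claims, secret=SECRET):
    """Create an HS256 JWT (included only so the sketch can be exercised)."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    signature = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(signature)}"


def check_authorization(header_value, secret=SECRET, now=None):
    """Return the token's claims if the Authorization header is valid, else None."""
    if not header_value or not header_value.startswith("Bearer "):
        return None
    parts = header_value[len("Bearer "):].split(".")
    if len(parts) != 3:
        return None
    header_b64, payload_b64, signature_b64 = parts
    try:
        header = json.loads(_b64url_decode(header_b64))
        claims = json.loads(_b64url_decode(payload_b64))
        signature = _b64url_decode(signature_b64)
    except ValueError:
        return None  # malformed base64 or JSON
    if not isinstance(header, dict) or not isinstance(claims, dict):
        return None
    if header.get("alg") != "HS256":
        return None  # never trust an algorithm chosen by the token itself
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        return None
    current = time.time() if now is None else now
    if claims.get("exp", 0) <= current:
        return None  # expired, or no expiry claim at all
    return claims
```

A gateway would call `check_authorization` with the incoming `Authorization` header and answer 401 when it returns `None`, so that backend services only ever see requests with verified claims.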
Request header example:
    Authorization: Bearer eyJhbGciOiJSUzI1NiJ9...
    X-API-Key: sk-proj-xxxxxxxxxxxxx
3. Rate Limiting
The gateway limits the number of requests per user or IP address, protecting backend services from overload and preventing API abuse.
4. Request Routing
The gateway determines the forwarding target based on the request path (e.g., /api/chat/* → AI service, /api/users/* → user service).
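Path-based routing can be sketched as a prefix lookup. In the example below, the ports follow the architecture diagram, while the host names and the `/api/files/` prefix are assumptions added for illustration. Production gateways usually express the same mapping in configuration rather than code.

```python
# Route table: path prefix -> upstream base URL.
# Host names and the /api/files/ prefix are hypothetical.
ROUTES = {
    "/api/users/": "http://user-service:8001",
    "/api/chat/": "http://ai-chat-service:8002",
    "/api/files/": "http://file-service:8003",
}


def resolve_upstream(path):
    """Return the upstream for the longest matching prefix, or None if no route matches."""
    matches = [prefix for prefix in ROUTES if path.startswith(prefix)]
    if not matches:
        return None
    # Prefer the most specific (longest) prefix when several match.
    return ROUTES[max(matches, key=len)]
```

When `resolve_upstream` returns `None`, the gateway would typically answer 404 itself instead of forwarding the request anywhere.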
5. SSL Termination
With SSL termination (TLS termination in current terminology), HTTPS is used between the client and the API gateway, and plain HTTP is used within the internal network between the gateway and backend services. Because the gateway is the component that decrypts traffic, certificate management is centralized there.
6. Logging and Metrics Collection
All API request logs can be collected in one place. Latency, error rate, and throughput are sent to monitoring tools.
Three Rate Limiting Algorithms
Rate limiting is a mechanism that controls the number of requests per unit of time. Its purpose is to protect backend services from overload and prevent API abuse.
| Algorithm | Mechanism | Characteristics |
|---|---|---|
| Fixed Window | Reset counter every minute | Simple to implement. Bursts can occur at window boundaries |
| Sliding Window | Track requests in the last N seconds | Better burst prevention. Slightly more complex |
| Token Bucket | Refill tokens at a fixed rate. Consume a token per request | Allows limited bursts while constraining average rate |
Major gateways such as AWS API Gateway and Kong ship with built-in rate limiting, so you rarely need to implement these algorithms yourself.
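For intuition about the token bucket row in the table above, here is a minimal single-process sketch. The class and parameter names are hypothetical, and the clock is injectable so the behavior can be checked without sleeping. A gateway serving many instances would keep this state in a shared store such as Redis instead of in memory.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity` while limiting the average rate to `refill_rate` per second."""

    def __init__(self, capacity, refill_rate, clock=time.monotonic):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.clock = clock
        self.tokens = float(capacity)  # start with a full bucket
        self.updated_at = clock()

    def allow(self, cost=1.0):
        """Consume `cost` tokens and return True, or return False if too few remain."""
        now = self.clock()
        elapsed = now - self.updated_at
        self.updated_at = now
        # Refill in proportion to the elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

In use, the gateway would keep one bucket per user or IP address and answer 429 Too Many Requests whenever `allow()` returns `False`.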
API Versioning
When changing API specifications, version management is needed to avoid breaking existing clients.
URL Versioning (Most Common)
    https://api.example.com/v1/chat
    https://api.example.com/v2/chat
The version is visible in the URL, easy to understand, and cache-friendly.
Header Versioning
    GET /chat
    Accept: application/vnd.api+json; version=2
This keeps URLs clean but makes implementation and testing more complex.
In practice, URL versioning is by far the most common approach. Announce the retirement of v1 well in advance so that clients have enough time to migrate.
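Both schemes can be handled by one small resolver, sketched below under the assumption that versions are plain integers. The function name `resolve_version` and the default of version 1 for unversioned requests are choices made for this example.

```python
import re

_URL_VERSION = re.compile(r"^/v(\d+)(/.*)$")     # e.g. /v2/chat
_HEADER_VERSION = re.compile(r"version=(\d+)")   # e.g. "...; version=2"


def resolve_version(path, accept=None, default=1):
    """Return (version, path_without_version_prefix) for a request."""
    match = _URL_VERSION.match(path)
    if match:
        # URL versioning takes precedence when both are present.
        return int(match.group(1)), match.group(2)
    if accept:
        match = _HEADER_VERSION.search(accept)
        if match:
            return int(match.group(1)), path
    return default, path
```

The resolved version can then select a handler, for example from a dictionary keyed by `(version, path)`, which keeps v1 and v2 implementations side by side during the migration period.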
AI API-Specific Design
AI services using LLM APIs have design considerations beyond typical REST APIs.
Streaming Responses (SSE)
LLM generation takes several seconds to tens of seconds. Returning the response only after generation completes means a long silent period for users. Server-Sent Events (SSE) allows generated text to be sent to the client progressively.
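On the wire, an SSE stream is plain text: each event consists of one or more `data:` lines and ends with a blank line. The sketch below shows that framing from both sides, independent of any framework. `format_sse` and `parse_sse` are hypothetical helper names, and the parser handles only `data:` fields, ignoring `event:`, `id:`, and `retry:`.

```python
def format_sse(data):
    """Frame one event; multi-line data becomes one `data:` line per line."""
    lines = "".join(f"data: {line}\n" for line in data.split("\n"))
    return lines + "\n"  # a blank line terminates the event


def parse_sse(raw):
    """Split a received SSE text stream back into a list of event payloads."""
    events = []
    data_lines = []
    for line in raw.split("\n"):
        if line == "":
            # Blank line: dispatch the event collected so far, if any.
            if data_lines:
                events.append("\n".join(data_lines))
                data_lines = []
        elif line.startswith("data:"):
            value = line[len("data:"):]
            if value.startswith(" "):
                value = value[1:]  # a single leading space is not part of the data
            data_lines.append(value)
    return events


stream = format_sse("Hello") + format_sse("line one\nline two")
print(parse_sse(stream))
```

The multi-line case matters for LLM output, which often contains newlines: a chunk written naively as a single `data:` line would be cut off at the first newline.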
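    # SSE streaming example with FastAPI
    import json
    from anthropic import AsyncAnthropic
    from fastapi import FastAPI
    from fastapi.responses import StreamingResponse
    app = FastAPI()
    anthropic_client = AsyncAnthropic()  # reads ANTHROPIC_API_KEY from the environment
    @app.get("/chat/stream")  # the route path is illustrative
    async def stream_chat(message: str):
        async def generate():
            async with anthropic_client.messages.stream(
                model="claude-opus-4-5",
                max_tokens=1024,
                messages=[{"role": "user", "content": message}],
            ) as stream:
                async for text in stream.text_stream:
                    # JSON-encode each chunk so newlines in the text cannot break SSE framing
                    payload = json.dumps({"text": text})
                    yield f"data: {payload}\n\n"
        return StreamingResponse(generate(), media_type="text/event-stream")
Cost-Per-Call Tracking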
LLM APIs are metered by token count. For every API request, record the input tokens, output tokens, and cost in the database.
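One way to derive the cost field is to multiply the token counts by per-token prices. The sketch below uses the same field names as the record format on this page. The prices and the model name `example-model` are placeholders rather than real rates, and actual prices should come from the provider's current price list.

```python
from dataclasses import asdict, dataclass

# Hypothetical prices in USD per million tokens; not real rates for any model.
PRICE_PER_MILLION_TOKENS = {
    "example-model": {"input": 3.00, "output": 15.00},
}


@dataclass
class UsageRecord:
    request_id: str
    user_id: str
    model: str
    input_tokens: int
    output_tokens: int
    cost_usd: float
    latency_ms: int


def build_usage_record(request_id, user_id, model, input_tokens, output_tokens, latency_ms):
    """Compute the cost of one call and bundle it with its usage metadata."""
    price = PRICE_PER_MILLION_TOKENS[model]
    cost = (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000
    return UsageRecord(
        request_id, user_id, model, input_tokens, output_tokens, round(cost, 6), latency_ms
    )


record = build_usage_record("req_123", "user_456", "example-model", 250, 1800, 4200)
print(asdict(record))  # a dict ready to be inserted into the database
```

Storing the raw token counts alongside the computed cost makes it possible to recompute costs later if prices change.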
    {
      "request_id": "req_123",
      "user_id": "user_456",
      "model": "claude-opus-4-5",
      "input_tokens": 250,
      "output_tokens": 1800,
      "cost_usd": 0.0234,
      "latency_ms": 4200
    }
Major API Gateway Products
| Product | Type | Characteristics |
|---|---|---|
| AWS API Gateway | Managed | Easy integration with AWS services. Works well with Lambda and ECS |
| Kong | OSS/Managed | Rich plugin ecosystem. Can run on-premises |
| Nginx | OSS | High-performance reverse proxy with high customizability |
| Cloudflare Workers | Edge | Runs at the edge. Good for minimizing global latency |
| Traefik | OSS | High affinity with Kubernetes environments |
Summary
- REST uses HTTP natively, is simple to design, and is optimal for public APIs and CRUD operations
- GraphQL allows fetching only the needed fields and is suited for complex frontend data retrieval
- gRPC uses binary communication for high speed, requires typed schemas, and is suited for microservice-to-microservice communication
- The API gateway centralizes authentication, rate limiting, routing, and log collection at a single entry point
- AI services have additional design points: streaming responses (SSE) and cost tracking
Frequently Asked Questions
Q: Do I need an API gateway for a small project?
A: Not always for a single service. However, authentication, rate limiting, and log collection provide value even at small scale. With AWS, you can start with API Gateway from the beginning, or implement equivalent functionality through FastAPI middleware. Consider a full-featured API gateway when you start splitting into microservices.
Q: When do I need API versioning?
A: I recommend designing versioning from the start for APIs used by external clients (partner companies, mobile apps). For internal service-to-service APIs, versioning may not be necessary if deployments can be coordinated for simultaneous updates.
Q: Should I choose REST or GraphQL?
A: If the frontend needs to flexibly fetch large amounts of different data types, GraphQL is a good fit. For simple CRUD or public APIs, REST keeps implementation costs lower. A common approach is to start with REST and consider GraphQL when frontend data fetching becomes complex.
Q: Can I call gRPC directly from a frontend browser?
A: Direct calls from a browser do not work with standard gRPC. You need to use the grpc-web specification and its proxy, or set up a transcoding layer on the backend that converts gRPC to HTTP. gRPC itself is most effective for internal communication between microservices.
See the references for the external specifications and background sources used on this page.[1][2][3][4][5]