8.4.3 API Design and Serviceization
Learning Goals
Section titled “Learning Goals”- Understand the most basic things an LLM service API should define
- Learn how to design clear request and response structures
- Understand key service concepts such as idempotency, error returns,
trace_id, and version management - Read and understand a minimal API processing loop
Beginner terminology bridge
Section titled “Beginner terminology bridge”API design becomes much easier once these words are no longer mysterious:
| Term | Beginner meaning | In this lesson |
|---|---|---|
API | Application Programming Interface, a stable way for one program to call another program | The service entry point that other code depends on |
endpoint | A specific callable address, such as /api/v1/chat | The URL path where a capability is exposed |
schema | A rule that defines which fields are allowed and required | It keeps request and response structures predictable |
payload | The data body sent with a request | In this lesson, it is usually the user query and related metadata |
trace_id | A unique ID for following one request through the system | It helps connect API logs, retrieval logs, model logs, and errors |
idempotency | Repeating the same request does not create uncontrolled side effects | It matters when retries happen after timeouts or network failures |
Do not memorize these as vocabulary only. In a real system, these terms are the pieces that let frontend, backend, logs, evaluation, and deployment cooperate.
Why API design is not “just wrapping some JSON”
Section titled “Why API design is not “just wrapping some JSON””What does a bad interface look like?
Section titled “What does a bad interface look like?”bad_request = { "msg": "What is the refund policy?"}
bad_response = { "text": "Refunds are available within 7 days"}What’s wrong here?
- What is
msg? User message? System message? - No
trace_id - No error structure
- No version information
- No context field
What is a good API design doing?
Section titled “What is a good API design doing?”At its core, it answers:
- What does the input look like?
- What does the output look like?
- How should errors be represented?
- Can it stay stable when called 10 times or 100,000 times?
In other words, API design is not “just writing an entry point”; it is defining:
The contract between the system and the outside world.
First, design the request structure
Section titled “First, design the request structure”A minimal request structure usually needs at least these fields
Section titled “A minimal request structure usually needs at least these fields”queryuser_id(optional)session_id(for multi-turn scenarios)metadata(optional)
A clearer request object
Section titled “A clearer request object”request = { "query": "What is the refund policy?", "user_id": 1, "session_id": "sess_001", "metadata": { "channel": "web" }}
print(request)Expected output:
{'query': 'What is the refund policy?', 'user_id': 1, 'session_id': 'sess_001', 'metadata': {'channel': 'web'}}Here, you can already feel:
- What the query is
- Who sent it
- Which session it belongs to
- What extra context is included
This is much better than “passing only a string.”
Then, design the response structure
Section titled “Then, design the response structure”Why must the response also be standardized?
Section titled “Why must the response also be standardized?”Because real consumers are often not just people, but also:
- Frontend applications
- Other services
- Logging systems
- Evaluation systems
They all need to consume the result reliably.
A more robust response structure
Section titled “A more robust response structure”response = { "trace_id": "trace_001", "answer": "A refund can be requested within 7 days after purchase, provided the learning progress is below 20%.", "sources": [ {"id": "doc_001", "section": "Refund Policy"} ], "usage": { "prompt_tokens": 120, "completion_tokens": 35 }}
print(response)Expected output:
{'trace_id': 'trace_001', 'answer': 'A refund can be requested within 7 days after purchase, provided the learning progress is below 20%.', 'sources': [{'id': 'doc_001', 'section': 'Refund Policy'}], 'usage': {'prompt_tokens': 120, 'completion_tokens': 35}}Why are these fields valuable?
Section titled “Why are these fields valuable?”trace_id: makes it easy to trace the request pathanswer: the actual business outputsources: helps with citation and verificationusage: helps with cost analysis
Error responses must also be designed
Section titled “Error responses must also be designed”Many systems only design successful responses
Section titled “Many systems only design successful responses”But in real engineering, the more common issues are actually:
- Invalid parameters
- Upstream timeouts
- Insufficient permissions
- Empty knowledge base
A unified error structure
Section titled “A unified error structure”error_response = { "trace_id": "trace_002", "error": { "code": "INVALID_ARGUMENT", "message": "query cannot be empty" }}
print(error_response)Expected output:
{'trace_id': 'trace_002', 'error': {'code': 'INVALID_ARGUMENT', 'message': 'query cannot be empty'}}This step is very important because it makes the caller clearly understand:
- What went wrong
- What category the error belongs to
- Whether it is worth retrying

A minimal runnable service handling function
Section titled “A minimal runnable service handling function”Simulate an API handler with pure Python first
Section titled “Simulate an API handler with pure Python first”def handle_chat(request): trace_id = "trace_demo_001"
if "query" not in request or not request["query"].strip(): return { "trace_id": trace_id, "error": { "code": "INVALID_ARGUMENT", "message": "query cannot be empty" } }
answer = f"System reply: {request['query']}" return { "trace_id": trace_id, "answer": answer, "sources": [], "usage": {"prompt_tokens": 12, "completion_tokens": 8} }
print(handle_chat({"query": "What is the refund policy?"}))print(handle_chat({"query": ""}))Expected output:
{'trace_id': 'trace_demo_001', 'answer': 'System reply: What is the refund policy?', 'sources': [], 'usage': {'prompt_tokens': 12, 'completion_tokens': 8}}{'trace_id': 'trace_demo_001', 'error': {'code': 'INVALID_ARGUMENT', 'message': 'query cannot be empty'}}What is this code actually teaching?
Section titled “What is this code actually teaching?”It teaches you:
- Validate the request first
- Every request should have a
trace_id - Both success and failure need a unified structure
This is already the most important layer of service design.
Why is idempotency important?
Section titled “Why is idempotency important?”What is idempotency?
Section titled “What is idempotency?”Simply put:
Repeated calls with the same request should produce the same or a controlled result.
This is especially important in these scenarios:
- Retries
- Re-sending after a timeout
- Network instability
Which APIs need idempotency more?
Section titled “Which APIs need idempotency more?”Especially:
- Ticket creation
- Payment initiation
- Order changes
A pure question-answering API is usually more like a “read-only operation,” so idempotency is easier to handle.
Why can’t version management be added later?
Section titled “Why can’t version management be added later?”Once others integrate with your API, changing fields casually becomes hard
Section titled “Once others integrate with your API, changing fields casually becomes hard”If today the response returns:
answer
and tomorrow it changes to:
response_text
the caller will break immediately.
A simple versioning strategy
Section titled “A simple versioning strategy”api_info = { "version": "v1", "endpoint": "/api/v1/chat"}
print(api_info)Expected output:
{'version': 'v1', 'endpoint': '/api/v1/chat'}Even for a small project, it is best to build version awareness early.
A FastAPI example closer to a real service
Section titled “A FastAPI example closer to a real service”If you want to see a style closer to a real backend, take a look at this minimal version.
from fastapi import FastAPIfrom pydantic import BaseModel, Field
class ChatRequest(BaseModel): query: str = Field(min_length=1) session_id: str | None = None
app = FastAPI()
@app.post("/api/v1/chat")def chat(payload: ChatRequest): return { "trace_id": "trace_demo_002", "answer": f"System reply: {payload.query}", "session_id": payload.session_id, }Although this code is simple, it is already closer to a real service because ChatRequest is a request schema. FastAPI uses it to validate the payload before your business logic runs. In production, you would usually add authentication, structured errors, logging, and a real trace ID generator.
If your goal is a “knowledge-base-driven SOP document assistant,” what should the minimal API look like?
Section titled “If your goal is a “knowledge-base-driven SOP document assistant,” what should the minimal API look like?”These systems usually need more than just a /chat endpoint.
At minimum, they often have interfaces like these:
| Endpoint | What it is responsible for |
|---|---|
/sop-drafts/generate | Generate a structured SOP draft from policy, case, and checklist evidence |
/sop-drafts/preview | Preview structured SOP sections before export |
/documents/ingest | Upload and parse PDF / Word / PPT |
/retrieval/search | Debug retrieval results |
When building for the first time, a more stable default approach is usually:
- Start with only one
generateendpoint - Return structured results or an export link first
- Then add debugging and batch interfaces
A very small request structure can be defined like this first:
generate_request = { "topic": "Refund escalation SOP", "audience": "frontline support", "doc_format": "word", "case_count": 2, "checklist_required": True,}
print(generate_request)Expected output:
{'topic': 'Refund escalation SOP', 'audience': 'frontline support', 'doc_format': 'word', 'case_count': 2, 'checklist_required': True}The value of this object is:
- It turns the slots collected during multi-turn conversation into actual service API parameters
Hands-on: Simulate an SOP Draft API Contract
Section titled “Hands-on: Simulate an SOP Draft API Contract”Before building a real FastAPI endpoint, first write the request validation and response contract in pure Python. This makes the service boundary clear.
REQUIRED_FIELDS = ["topic", "audience", "doc_format", "case_count", "checklist_required"]
def validate_generate_request(payload): missing = [field for field in REQUIRED_FIELDS if field not in payload or payload.get(field) is None] if missing: return False, { "code": "INVALID_ARGUMENT", "message": f"missing fields: {missing}" } if payload["doc_format"] not in {"word", "ppt"}: return False, { "code": "INVALID_ARGUMENT", "message": "doc_format must be word or ppt" } return True, None
def handle_generate(payload): trace_id = "trace_sop_001" ok, error = validate_generate_request(payload) if not ok: return {"trace_id": trace_id, "error": error}
return { "trace_id": trace_id, "status": "accepted", "sop_draft": { "title": payload["topic"], "audience": payload["audience"], "format": payload["doc_format"], "sections": ["Policy Summary", "Handled Cases", "Frontline Checklist"], } }
generate_request = { "topic": "Refund escalation SOP", "audience": "frontline support", "doc_format": "word", "case_count": 2, "checklist_required": True,}
print(handle_generate(generate_request))print(handle_generate({"topic": "Refund escalation SOP", "doc_format": "pdf"}))Expected output:
{'trace_id': 'trace_sop_001', 'status': 'accepted', 'sop_draft': {'title': 'Refund escalation SOP', 'audience': 'frontline support', 'format': 'word', 'sections': ['Policy Summary', 'Handled Cases', 'Frontline Checklist']}}{'trace_id': 'trace_sop_001', 'error': {'code': 'INVALID_ARGUMENT', 'message': "missing fields: ['audience', 'case_count', 'checklist_required']"}}
This exercise is useful because it forces you to design success and failure together. A service is not ready just because it can return a happy-path answer.
Common mistakes beginners make most often
Section titled “Common mistakes beginners make most often”The request structure is too loose
Section titled “The request structure is too loose”It may feel convenient at first, but it becomes very painful later.
The error structure is inconsistent
Section titled “The error structure is inconsistent”This makes it increasingly difficult for the frontend and other services to integrate.
No trace_id
Section titled “No trace_id”When something goes wrong, it becomes hard to trace the request path.
Binding the API too tightly to a single business logic from the start
Section titled “Binding the API too tightly to a single business logic from the start”This makes future expansion very difficult.
Evidence to Keep
Section titled “Evidence to Keep”Keep this page’s proof of learning as a small evidence card:
- Service Contract
- endpoint, input schema, output schema, error schema
- Run Signal
- latency, throughput, logs, health check, or container status
- Observability
- request id, trace id, structured log, or metric
- Failure Check
- timeout, retry storm, missing log, deployment mismatch
- Ops Action
- backoff, queue, alert, rollout, or rollback
Summary
Section titled “Summary”The most important thing in this section is not getting the API to run, but understanding:
The core of API design is turning input, output, errors, and traceability into a stable system contract.
Once the contract is clear, the service can truly be relied on by others for the long term.
Exercises
Section titled “Exercises”- Add support for a
session_idfield tohandle_chat(). - Design a unified error code enum, such as
INVALID_ARGUMENT,TIMEOUT, andNOT_FOUND. - Think about it: if this were a “ticket creation” API, how would you consider idempotency?
- Explain in your own words: why is API design essentially about defining a system contract?
Reference implementation and walkthrough
session_idshould flow through request parsing, state lookup, logs, and response trace. Validate empty or malformed IDs.- An error enum gives clients stable handling and separates user errors from service errors.
- An idempotency key prevents duplicate tickets when the client retries after a timeout.
- An API contract defines inputs, outputs, errors, permissions, timing expectations, and compatibility.