8.4.5 Containerization and Deployment

  • Understand why LLM applications are especially well-suited to containerization
  • Read the key structure of a minimal Dockerfile
  • Understand the core concepts of images, containers, ports, and environment variables
  • Read a small Docker Compose startup example
  • Understand that containerization is not the end of deployment, but the starting point

Docker becomes much less intimidating once the nouns are separated:

Term | Beginner meaning | Why it matters
--- | --- | ---
image | A packaged runtime template, like a recipe plus ingredients | You build it once and run containers from it
container | A running instance created from an image | This is the actual process serving requests
Dockerfile | The build recipe for an image | It records the base image, dependencies, files, and startup command
port | The doorway where a service listens for requests | -p 8000:8000 maps the host port to the container port
environment variable | Configuration injected from outside the code | API keys, model names, and runtime modes should not be hardcoded
Compose | A tool for starting multiple related containers together | Useful when the app needs a vector database, Redis, or Postgres

The core idea is not “learn Docker commands by heart,” but “make the runtime environment reproducible.”


What is the biggest hidden risk of a local script?

When you can run a project locally, it often depends on many implicit conditions:

  • Python version
  • Package versions
  • System dependencies
  • Environment variables
  • Startup command

As soon as a different person, machine, or server is involved, any of these implicit conditions can break the project.
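To make these implicit conditions visible, here is a minimal sketch that prints a snapshot of the current environment. The function names `installed_packages` and `describe_environment` and the default variable names are illustrative assumptions, not part of the original project.

```python
import importlib.metadata
import os
import platform
import sys


def installed_packages():
    """List installed distributions as name==version strings."""
    names = []
    for dist in importlib.metadata.distributions():
        meta = dist.metadata
        name = meta.get("Name") if meta else None
        if name:  # skip entries with missing or broken metadata
            names.append(f"{name}=={meta.get('Version', 'unknown')}")
    return sorted(names)


def describe_environment(env_names=("MODEL_NAME", "PORT")):
    """Collect the implicit conditions a local run depends on."""
    return {
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "executable": sys.executable,
        "packages": installed_packages(),
        # Report only whether a variable is set, so secrets are never printed.
        "env_set": {name: name in os.environ for name in env_names},
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(key, "=", value)
```

Running this on two machines and comparing the output is a quick way to see why "it works on my machine" is not a deployment strategy.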

What does containerization actually solve?

The core value of containerization is:

Package the application together with the runtime environment it depends on.

This lets you reproduce more reliably:

  • What was installed
  • Which versions were used
  • Which command was used to start it

This is especially important for LLM applications, because they often depend on:

  • Web frameworks
  • Model services
  • Vector databases
  • System tools

A cooking analogy helps separate images from containers:

  • Image: like a recipe + ingredient kit
  • Container: the actual dish made from that recipe

In other words:

  • An image is a static template
  • A container is a running instance

This distinction matters because, during deployment, you usually:

  1. Build the image first
  2. Then start the container

If you do not clearly understand this order, Docker commands will feel confusing for a long time.
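The build-then-run order can be sketched as a small Python wrapper around the Docker CLI. This is only an illustration: the helper names are hypothetical, the tag mini-llm-app is the one used later in this section, and the --rm flag (remove the container when it exits) is an addition not used elsewhere on this page.

```python
import shlex
import subprocess


def build_command(tag, context="."):
    """Step 1: build an image (static template) from the Dockerfile in `context`."""
    return ["docker", "build", "-t", tag, context]


def run_command(tag, host_port=8000, container_port=8000):
    """Step 2: start a container (running instance) from that image."""
    return ["docker", "run", "--rm", "-p", f"{host_port}:{container_port}", tag]


def deploy(tag, dry_run=True):
    """Run both steps in order; with dry_run=True only print the commands."""
    for command in (build_command(tag), run_command(tag)):
        print(shlex.join(command))
        if not dry_run:
            # check=True stops the sequence if the build fails,
            # so a container is never started from a stale image.
            subprocess.run(command, check=True)


deploy("mini-llm-app")  # dry run: prints the two commands without needing Docker
```

Calling deploy("mini-llm-app", dry_run=False) would execute the commands and requires Docker to be installed.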

[Figure: Docker image, container, and Compose deployment diagram]

A minimal Dockerfile looks like this:

Dockerfile
FROM python:3.14-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8000
CMD ["python", "app.py"]

Line by line:

  • FROM: choose the base image
  • WORKDIR: set the working directory inside the image
  • COPY requirements.txt .: copy in the dependency file first, so the install layer can be cached when only the code changes
  • RUN pip install ...: install dependencies
  • COPY . .: copy the rest of the project code
  • EXPOSE 8000: document the port the service listens on; publishing it to the host still requires -p at run time
  • CMD: the default command executed when the container starts

This is the core skeleton of a Dockerfile.


First prepare a small app that can actually run

To make the Docker deployment example more concrete, let’s first write a very simple app.py.

app.py
from http.server import BaseHTTPRequestHandler, HTTPServer
import json


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/health":
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(json.dumps({"status": "ok"}).encode())
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(json.dumps({"message": "hello from llm app"}).encode())


server = HTTPServer(("0.0.0.0", 8000), Handler)
print("serving on 8000")
server.serve_forever()

Run it locally first:

Terminal window
python app.py

In another terminal, test the service:

Terminal window
curl http://localhost:8000/
curl http://localhost:8000/health

Expected output:

Terminal window
{"message": "hello from llm app"}
{"status": "ok"}

We start with a running app because containerization is easier to understand around a real application than by discussing Dockerfiles in the abstract.
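The two curl checks can also be automated. The sketch below is one possible smoke test, assuming the service from app.py is reachable at http://localhost:8000; the names `fetch_json` and `smoke_test` are hypothetical helpers introduced for illustration.

```python
import json
import urllib.request


def fetch_json(url, timeout=3.0):
    """GET a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def smoke_test(base_url="http://localhost:8000", fetch=fetch_json):
    """Return a list of problems; an empty list means both endpoints look right."""
    expected = {
        "/": {"message": "hello from llm app"},
        "/health": {"status": "ok"},
    }
    problems = []
    for path, want in expected.items():
        try:
            got = fetch(base_url + path)
        except OSError as exc:  # connection errors and timeouts are OSError subclasses
            problems.append(f"{path}: request failed ({exc})")
            continue
        if got != want:
            problems.append(f"{path}: expected {want}, got {got}")
    return problems
```

With the server running, print(smoke_test()) should show an empty list. The same script works unchanged against the containerized service later, which is the point: the check does not care where the process runs.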


This minimal service has no third-party dependencies, so requirements.txt can be empty. A project like this could even drop the file, but the Dockerfile below copies it, so we keep an empty one to stay close to the structure of a real project.

requirements.txt
(empty for this example)

Dockerfile
FROM python:3.14-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY app.py .
EXPOSE 8000
CMD ["python", "app.py"]

Build the image, then run a container from it:

Terminal window
docker build -t mini-llm-app .
docker run -p 8000:8000 mini-llm-app

Then visit:

  • http://localhost:8000/
  • http://localhost:8000/health

and you will see the returned results.

You can also verify it from the command line:

Terminal window
curl http://localhost:8000/
curl http://localhost:8000/health

Expected output:

Terminal window
{"message": "hello from llm app"}
{"status": "ok"}

This is the smallest containerization loop.


LLM applications often have configurations like these:

  • API Key
  • Model name
  • Vector database address
  • Runtime mode

These should not be hardcoded; environment variables are a better fit.

import os
model_name = os.getenv("MODEL_NAME", "demo-model")
port = int(os.getenv("PORT", "8000"))
print("MODEL_NAME =", model_name)
print("PORT =", port)

Expected output without extra environment variables:

Terminal window
MODEL_NAME = demo-model
PORT = 8000
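As configuration grows, it can help to read and validate everything in one place at startup. The following is a sketch under assumptions: the `Settings` class, the APP_MODE variable, and its allowed values "dev" and "prod" are illustrative choices, not something the project prescribes.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    model_name: str
    port: int
    app_mode: str

    @classmethod
    def from_env(cls, env=None):
        """Build settings from a mapping (defaults to the process environment)."""
        env = os.environ if env is None else env

        raw_port = env.get("PORT", "8000")
        try:
            port = int(raw_port)
        except ValueError:
            raise ValueError(f"PORT must be an integer, got {raw_port!r}") from None
        if not 1 <= port <= 65535:
            raise ValueError(f"PORT must be in 1..65535, got {port}")

        app_mode = env.get("APP_MODE", "dev")
        if app_mode not in {"dev", "prod"}:  # hypothetical set of modes
            raise ValueError(f"APP_MODE must be 'dev' or 'prod', got {app_mode!r}")

        return cls(
            model_name=env.get("MODEL_NAME", "demo-model"),
            port=port,
            app_mode=app_mode,
        )


print(Settings.from_env({}))
```

Failing fast on a bad value at startup is usually easier to debug than a container that starts and then misbehaves on the first request.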

How do you pass environment variables in Docker?

Terminal window
docker run -p 8000:8000 -e MODEL_NAME=qwen-demo mini-llm-app

This step is very important, because real deployment almost always relies on configuration injection.

To make the running service show configuration, you can read MODEL_NAME in app.py and return it from the root endpoint. The key idea is the same: code stays stable, configuration changes outside the image.
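One way to do this is sketched below as a variant of app.py. The "model" key in the response and the helper names `build_response` and `serve` are assumptions made for illustration; the server is wrapped in a function so that importing the module does not start it.

```python
import json
import os
from http.server import BaseHTTPRequestHandler, HTTPServer


def build_response(path, env=None):
    """Return the JSON payload for a path, reading config from the environment."""
    env = os.environ if env is None else env
    if path == "/health":
        return {"status": "ok"}
    return {
        "message": "hello from llm app",
        "model": env.get("MODEL_NAME", "demo-model"),  # injected at run time
    }


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = json.dumps(build_response(self.path)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve():
    """Start the server; call this from the bottom of app.py."""
    port = int(os.getenv("PORT", "8000"))
    server = HTTPServer(("0.0.0.0", port), Handler)
    print(f"serving on {port}")
    server.serve_forever()
```

After rebuilding the image, running the container with -e MODEL_NAME=qwen-demo should make the root endpoint report qwen-demo without any code change. If PORT is overridden, the -p mapping has to point at the same container port.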


Because real projects usually have more than one service

An LLM application may also need to work with:

  • Web service
  • Vector database
  • Redis
  • Postgres

If you write docker run by hand for each one, things quickly become messy.

A Compose file describes the services in one place:

version: "3.9"
services:
  app:
    build: .
    ports:
      - "8000:8000"
    environment:
      MODEL_NAME: demo-model
Startup command:

Terminal window
docker compose up --build

This is why Compose is very useful for local development and small-scale deployments.
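Once a Compose file contains more than one service, the app typically reaches the others by service name, because Compose puts services on a shared network where each service name resolves as a hostname. The sketch below shows how the app side might read those addresses; the variable names REDIS_URL and DATABASE_URL and the service names redis and postgres are hypothetical and would need to match your own Compose file.

```python
import os
from urllib.parse import urlparse

# Hypothetical defaults: hostnames are Compose service names, not localhost.
DEFAULTS = {
    "REDIS_URL": "redis://redis:6379/0",
    "DATABASE_URL": "postgresql://postgres:5432/app",
}


def service_endpoints(env=None):
    """Map each configured URL to the (host, port) the app would connect to."""
    env = os.environ if env is None else env
    endpoints = {}
    for name, default in DEFAULTS.items():
        parsed = urlparse(env.get(name, default))
        endpoints[name] = (parsed.hostname, parsed.port)
    return endpoints


print(service_endpoints({}))
```

Keeping these addresses in environment variables means the same image can talk to a Compose-managed Redis locally and a managed instance elsewhere.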


Containerization does not mean deployment is finished

This is a very common misunderstanding.

Containerization solves packaging and the runtime environment

But going live still requires considering:

  • Logs
  • Health checks
  • Resource limits
  • Automatic restarts
  • Canary releases
  • Reverse proxies

An endpoint like /health is very valuable, because deployment systems usually need to know:

Is this container alive right now, and can it accept requests?
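What a deployment system does with such an endpoint can be approximated in a few lines: poll it until it answers, and give up after a fixed number of attempts. This is a simplified sketch, not how any particular orchestrator is implemented; `probe` and `wait_until_healthy` are hypothetical names.

```python
import json
import time
import urllib.request


def probe(url, timeout=2.0):
    """Return True if the endpoint answers 200 with a JSON body {"status": "ok"}."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = json.loads(response.read().decode("utf-8"))
            return (
                response.status == 200
                and isinstance(body, dict)
                and body.get("status") == "ok"
            )
    except (OSError, ValueError):  # connection errors, timeouts, HTTP errors, bad JSON
        return False


def wait_until_healthy(url, attempts=10, delay=1.0, check=probe, sleep=time.sleep):
    """Poll `url`; return the attempt number that succeeded, or None if none did."""
    for attempt in range(1, attempts + 1):
        if check(url):
            return attempt
        if attempt < attempts:
            sleep(delay)  # wait before the next try, but not after the last one
    return None
```

For example, wait_until_healthy("http://localhost:8000/health") can be called right after docker run to decide whether the new container is ready for traffic or should be treated as failed.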


Common mistakes tend to show up as symptoms like these:

  • The image becomes bloated.
  • You do not know when the service is broken.
  • Things break easily when you switch environments.

Thinking containerization automatically makes things scalable

It does not. Containerization is only the first step; orchestration, monitoring, and operations come next.

If a build fails with no space left on device, first inspect Docker storage:

Terminal window
docker system df
docker builder prune

Only prune what you no longer need. In team or CI environments, it is safer to clean the build cache first, before deleting images or volumes.


Keep this page’s proof of learning as a small evidence card:

  • Service Contract: endpoint, input schema, output schema, error schema
  • Run Signal: latency, throughput, logs, health check, or container status
  • Observability: request id, trace id, structured log, or metric
  • Failure Check: timeout, retry storm, missing log, deployment mismatch
  • Ops Action: backoff, queue, alert, rollout, or rollback

The most important thing in this section is not memorizing Docker commands, but understanding:

The core value of containerization is standardizing “application + dependencies + startup method” as one unit, so deployment becomes a reproducible process instead of something that depends on one person's machine.

Once you make this step solid, service orchestration and production operations will have a foundation.


  1. Use the app.py and Dockerfile from this section to actually build a minimal image locally.
  2. Add another environment variable to the service, such as APP_MODE=dev.
  3. Think about this: why is the /health endpoint important for deployment systems?
  4. Explain in your own words: why is containerization the starting point of deployment, not the end?

Reference implementation and walkthrough

  1. The build should produce an image that starts reliably and exposes the expected port and health endpoint.
  2. APP_MODE should be read from the environment and reflected in config or logs without code changes.
  3. /health lets deployment systems know whether to route traffic, restart the container, or roll back.
  4. Containers package runtime, but deployment still needs secrets, scaling, logs, monitoring, storage, networking, security, and release processes.