
8 LLM Application Development and RAG


Chapter 7 explained how an LLM produces text. Chapter 8 turns that model into a useful application: connect documents, retrieve evidence, answer with citations, log failures, and improve with an evaluation set.

Think of RAG as "read before answering." The model should not guess from memory when the answer must come from your course notes, company documents, product manuals, or private knowledge base.
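
To make "read before answering" concrete, the generation step can pin the model to the retrieved text inside the prompt itself. A minimal sketch of such a prompt builder; the function name, instruction wording, and chunk fields (source, text) are illustrative choices, not an API this chapter defines, and nothing here calls a real model:

```python
def build_grounded_prompt(question: str, chunks: list[dict]) -> str:
    """Assemble a prompt that restricts the model to the retrieved chunks."""
    context = "\n\n".join(f"[{c['source']}]\n{c['text']}" for c in chunks)
    return (
        "Answer the question using ONLY the sources below, and cite the "
        "source ID in brackets for every claim. If the sources do not "
        'contain the answer, reply "I do not know from the provided sources."\n\n'
        f"Sources:\n{context}\n\nQuestion: {question}"
    )
```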

See the RAG Application Loop

[Figure: RAG application loop]

Use this loop as the chapter map.

| Layer | Job | What you should print or save |
| --- | --- | --- |
| Knowledge | Parse documents, clean text, split chunks, keep metadata | chunks.jsonl, source, section, page, version |
| Retrieval | Find the chunks most relevant to a question | query, top-k chunks, scores, source IDs |
| Generation | Ask the LLM to answer only from retrieved context | final prompt, answer, citations, no-answer reason |
| Application | Wrap the flow as CLI, API, chat UI, or internal tool | request, response, error handling, user feedback |
| Operations | Compare quality, cost, latency, and failures over time | eval set, logs, token cost, latency, failure cases |
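
The chunks.jsonl in the Knowledge row is nothing exotic: one JSON object per line, carrying the chunk text plus the metadata the later layers need for citations and debugging. A minimal sketch; the field values and file layout here are illustrative, not a required schema:

```python
import json

# One chunk per line; source, section, page, and version travel with the
# text so every later answer can cite and every failure can be traced.
chunk = {
    "id": "rag-basics-0001",
    "source": "rag-basics.md",
    "section": "chunking",
    "page": None,  # pages matter for PDFs; markdown sources leave this null
    "version": "2024-06-01",
    "text": "A RAG app splits documents into chunks and keeps source metadata.",
}

with open("chunks.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(chunk, ensure_ascii=False) + "\n")
```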

Learning Order and Task List

Do the workshop after the basics. First make the retrieval chain visible; then replace simple parts with stronger components.

| Step | Read | Do | Evidence to keep |
| --- | --- | --- | --- |
| 8.1 | RAG basics, document processing, retrieval, evaluation | Build a tiny document-to-answer loop | chunks, top-k output, cited answer |
| 8.2 | Deployment and unified APIs | Understand cloud API, local model, and unified calling layer | one calling note or config comparison |
| 8.3 | LLM app development | Wrap the RAG loop with API, tools, dialog, or document parsing | request/response sample and error path |
| 8.4 | Engineering practices | Add async, logging, monitoring, API design, or Docker notes | logs, config, deployment checklist |
| 8.5 | Stage project | Run 8.5.6 Hands-on: Full Chapter 8 RAG App Workshop | workshop output, one added doc, one added eval case |

First Runnable Loop: Tiny RAG Without a Framework

Before reaching for LangChain, LlamaIndex, or a vector database, run the smallest possible chain. The goal is not a powerful retriever; the goal is to see every step.

Create ch08_tiny_rag.py and run it with Python 3.10 or later.

import re

# Three tiny "documents" with the metadata a real chunk store would keep:
# an ID for logging and a source string the answer can cite.
docs = [
    {
        "id": "ragops",
        "source": "study-guide.md#ragops",
        "text": "A RAG app needs an evaluation set with fixed questions, expected sources, ideal answers, and failure labels.",
    },
    {
        "id": "chunking",
        "source": "rag-basics.md#chunking",
        "text": "A RAG app splits documents into chunks and keeps source metadata so answers can cite evidence.",
    },
    {
        "id": "agentops",
        "source": "agent-guide.md#trace",
        "text": "Agent systems record tool calls, observations, permissions, and recovery steps.",
    },
]

question = "Why does a RAG app need an evaluation set?"
STOPWORDS = {"a", "an", "the", "why", "does", "with", "and", "so", "can", "be"}


def tokenize(text: str) -> set[str]:
    # Keep word characters plus CJK ranges; drop stopwords so overlap
    # scores reflect content words, not filler.
    return set(re.findall(r"[\w\u4e00-\u9fff\u3040-\u30ff]+", text.lower())) - STOPWORDS


# Retrieval: score each chunk by keyword overlap with the question.
query_tokens = tokenize(question)
ranked = sorted(
    ((len(query_tokens & tokenize(doc["text"])), doc) for doc in docs),
    key=lambda item: item[0],
    reverse=True,
)

print("question:", question)
print("top chunks:")
for score, doc in ranked[:2]:
    print(f"- {doc['id']} score={score} source={doc['source']}")

# Generation stand-in: a handwritten answer that cites the best chunk's source.
best = ranked[0][1]
answer = (
    "Use a fixed evaluation set so every RAG change can be compared "
    f"against the same questions and expected sources. [{best['source']}]"
)
print("answer:", answer)

Expected output:

question: Why does a RAG app need an evaluation set?
top chunks:
- ragops score=4 source=study-guide.md#ragops
- chunking score=2 source=rag-basics.md#chunking
answer: Use a fixed evaluation set so every RAG change can be compared against the same questions and expected sources. [study-guide.md#ragops]

Operation tip: add one new document, ask one new question, and print the top-k chunks before reading the final answer. If the evidence is wrong, the answer cannot be trusted.
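
A sketch of that exercise, reusing docs and tokenize from ch08_tiny_rag.py; the added document and the new question are placeholders you should swap for your own:

```python
# Add one document, ask one new question, and inspect retrieval first.
docs.append({
    "id": "citations",
    "source": "rag-basics.md#citations",
    "text": "Answers should cite chunk sources so readers can verify every claim.",
})

new_question = "Why should answers cite sources?"
new_tokens = tokenize(new_question)
reranked = sorted(
    ((len(new_tokens & tokenize(d["text"])), d) for d in docs),
    key=lambda item: item[0],
    reverse=True,
)
print("top chunks for:", new_question)
for score, doc in reranked[:2]:
    print(f"- {doc['id']} score={score} source={doc['source']}")
```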

Depth Ladder

| Level | What you can prove |
| --- | --- |
| Minimum pass | You can print chunks, top-k scores, answer, and citation for one question. |
| Project-ready | You can add metadata, handle empty retrieval with a no-answer response (sketched below), and compare changes on a fixed eval set. |
| Deeper check | You can separate document, chunking, retrieval, reranking, generation, citation, latency, and cost failures. |
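
The no-answer response in the Project-ready row can start as a threshold guard on the top retrieval score. A minimal sketch on top of ranked from the tiny script; MIN_SCORE is an illustrative cutoff, not a tuned value:

```python
MIN_SCORE = 1  # illustrative: require at least one overlapping content word

top_score, top_doc = ranked[0]
if top_score < MIN_SCORE:
    # Refusing is better than guessing: with no evidence, any citation is fake.
    print("answer: I do not know from the provided sources.")
else:
    print(f"answer: ... [{top_doc['source']}]")
```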

Debug Bad RAG Answers

[Figure: RAG debugging ladder]

When the answer is bad, locate the failing layer before changing the model.

| Symptom | Print first | Likely fix |
| --- | --- | --- |
| The answer has no source | final prompt and retrieved chunks | keep source IDs in chunks and require citations |
| The source document has the answer but retrieval misses it | original text search and chunk text | adjust chunk size, add keywords, use hybrid search |
| Many chunks are recalled but the best one is not first | top-k scores and manual relevance labels | add reranking or rule-based filtering |
| The answer uses old information | document version and index build time | rebuild index and add regression tests |
| You cannot tell whether quality improved | before/after answers on the same questions | create a fixed evaluation set (sketched after this table) |
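
A fixed evaluation set can start as question/expected-source pairs plus one retrieval metric, run before and after every change. A minimal sketch reusing docs and tokenize from the tiny script; the two cases and the hit@2 metric are illustrative:

```python
# Each case pins a question to the source that must show up in the top-k.
EVAL_SET = [
    {"question": "Why does a RAG app need an evaluation set?",
     "expected_source": "study-guide.md#ragops"},
    {"question": "Why does a RAG app split documents into chunks?",
     "expected_source": "rag-basics.md#chunking"},
]


def hit_at_k(question: str, expected_source: str, k: int = 2) -> bool:
    tokens = tokenize(question)
    top = sorted(docs, key=lambda d: len(tokens & tokenize(d["text"])), reverse=True)[:k]
    return any(d["source"] == expected_source for d in top)


hits = sum(hit_at_k(c["question"], c["expected_source"]) for c in EVAL_SET)
print(f"retrieval hit@2: {hits}/{len(EVAL_SET)}")
```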

Common Failures

  • Treating "connected a vector database" as "RAG is done." RAG quality also depends on document quality, chunking, ranking, prompt design, citations, and evaluation.
  • Adding frameworks before understanding the chain. Frameworks are easier after you can print query, chunks, prompt, answer, and source.
  • Letting the model answer when retrieval is empty. A useful RAG app must say "I do not know from the provided sources."
  • Forgetting metadata. Without source, page, section, and version, citations and debugging become weak.
  • Optimizing by feeling. Use the same evaluation questions every time you change chunking, retrieval, reranking, or prompts.

Pass Check

Before entering Chapter 9, you should be able to:

  • explain why RAG solves private, fresh, and citable knowledge problems;
  • run the tiny RAG script and inspect top-k chunks before the answer;
  • create chunks with source metadata and cite those sources in the answer;
  • separate document, chunking, retrieval, generation, citation, and deployment failures;
  • run the full Chapter 8 workshop, add one document, add one evaluation case, and record the result in a README.

For a printable checklist, use 8.0 Learning Checklist. For the guided project, start with 8.5.6 Hands-on: Full Chapter 8 RAG App Workshop.