7.8.1 Project Roadmap: Choose Prompt, RAG, or Finetuning

This capstone distills Chapter 7 into one engineering decision: is the problem one of task expression, missing knowledge, unstable format, unsafe behavior, or weak evaluation?

See the Project Route First

[Figure: LLM capstone project roadmap]
[Figure: LLM project method-selection loop]
[Figure: Portfolio evidence pack diagram]

Do not start with the strongest model or the most complex framework. Start with a small domain task, a prompt baseline, fixed examples, and a failure log.
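
The failure log can start as a plain list of tagged cases. A minimal sketch in Python follows; the field names and example entries are illustrative assumptions, not a required schema:

# Minimal failure log: one entry per fixed example.
# Field names and entries are illustrative, not a required schema.
failure_log = [
    {"example": "Is HW2 due Friday?", "output": "logistics",
     "passed": True, "failure_type": None},
    {"example": "What did the TA say about recursion?", "output": "concept",
     "passed": False, "failure_type": "missing knowledge"},
    {"example": "Explain gradient descent", "output": "Concept: math",
     "passed": False, "failure_type": "unstable format"},
]

failed = [entry for entry in failure_log if not entry["passed"]]
print("failures:", len(failed), "of", len(failure_log))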

Run an Evidence Pack Check

Use this tiny project log before writing a report. It forces you to show the baseline, improvement, next route, and whether finetuning is actually justified.

# Minimal project log: baseline vs. improved prompt, plus route decisions.
project = {
    "task": "classify course questions",
    "baseline_pass_rate": 0.62,
    "prompt_v2_pass_rate": 0.78,
    "rag_needed": True,
    "finetune_needed": False,
}

# Absolute improvement from prompt iteration alone.
improvement = project["prompt_v2_pass_rate"] - project["baseline_pass_rate"]

print("task:", project["task"])
print("improvement:", round(improvement, 2))
# Route to RAG only when the failure log shows missing knowledge.
print("next_route:", "RAG" if project["rag_needed"] else "Prompt")
print("fine_tune_now:", project["finetune_needed"])

Expected output:

task: classify course questions
improvement: 0.16
next_route: RAG
fine_tune_now: False

If your project cannot fill these fields, keep the project smaller. A clear comparison beats a large but untestable demo.

Learn in This Order

Step | Do | Evidence
1 | Pick one domain task | One-sentence task definition and 10 fixed examples
2 | Build a prompt baseline | Prompt version, outputs, pass/fail notes
3 | Classify failure types | Task wording, missing knowledge, format drift, safety boundary
4 | Choose the next method | Prompt iteration, RAG, or finetuning decision note (see the sketch after this table)
5 | Package the result | README, run command, screenshots, failure case, next steps
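
One way to turn step 3's failure counts into a step 4 decision note is a simple heuristic like the sketch below. The mapping from failure type to route is an assumption for illustration, not a fixed rule; your own note should cite your actual failure log:

from collections import Counter

# Hypothetical failure-type counts taken from a step 3 log.
failure_counts = Counter({
    "task wording": 2,
    "missing knowledge": 5,
    "format drift": 1,
})

# Illustrative mapping from dominant failure type to next method;
# the routes here are assumptions, not the chapter's canonical rule.
route_by_type = {
    "task wording": "prompt iteration",
    "missing knowledge": "RAG",
    "format drift": "prompt iteration (format constraints)",
    "safety boundary": "guardrails before any finetuning",
}

dominant, count = failure_counts.most_common(1)[0]
print("dominant failure:", dominant, f"({count} cases)")
print("suggested next method:", route_by_type.get(dominant, "re-examine evaluation"))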

If you want a guided starter, run 7.8.4 Hands-on: Full Chapter 7 Workshop before designing your own domain project.

Project Deliverable Standards

Deliverable | Minimum Standard | Stronger Portfolio Version
README | Goal, run command, model or API choice, sample input/output | Add method trade-offs, cost notes, evaluation, and retrospective
Examples | At least 10 fixed cases | Compare prompt, RAG, finetuning, or rule-based versions
Evaluation | Clear pass/fail rule (see the sketch after this table) | Add scores, failure-type statistics, and regression notes
Prompt/data record | Save prompt versions or sample format | Add schema validation, data quality checks, and safety notes
Presentation | Screenshot or short GIF proving it runs | Explain why the chosen route beats alternatives
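
For the Evaluation row, a pass/fail rule plus a regression note can be computed directly from the fixed cases. The sketch below uses exact label match as the pass rule and a hard-coded previous pass rate; both are illustrative assumptions:

# Minimal evaluation sketch: exact-match pass/fail over fixed cases.
# The pass rule and case data are illustrative assumptions.
cases = [
    {"input": "When is the midterm?", "expected": "logistics", "got": "logistics"},
    {"input": "Why does my loop not end?", "expected": "debugging", "got": "concept"},
    {"input": "What is Big-O?", "expected": "concept", "got": "concept"},
]

passed = sum(c["got"] == c["expected"] for c in cases)
current_pass_rate = passed / len(cases)
print("pass rate:", round(current_pass_rate, 2))

# Regression note: compare against a saved pass rate from an earlier run.
previous_pass_rate = 0.5  # assumed value from a prior version
print("regression:", current_pass_rate < previous_pass_rate)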

Pass Check

You pass this chapter when you can clearly explain “why not finetune here,” “why RAG is needed here,” or “why this prompt change works,” with a fixed evaluation set rather than a single good answer.

The final project can be basic: compare two prompt versions on one domain task. The stronger version adds RAG or a small finetuning experiment, but only after the baseline and failure log prove the need.
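
For that basic version, comparing two prompt versions can be as small as the sketch below. The outputs are hard-coded stand-ins for real model responses, and the example data is assumed:

# Compare two prompt versions on the same fixed evaluation set.
# Outputs here are stand-ins for real model responses.
fixed_examples = ["q1", "q2", "q3", "q4", "q5"]
expected = ["a", "b", "a", "c", "b"]

outputs_v1 = ["a", "a", "a", "c", "c"]  # baseline prompt
outputs_v2 = ["a", "b", "a", "c", "c"]  # revised prompt

def pass_rate(outputs):
    # Fraction of fixed examples where the output matches the expected label.
    return sum(o == e for o, e in zip(outputs, expected)) / len(expected)

print("v1 pass rate:", pass_rate(outputs_v1))  # 0.6
print("v2 pass rate:", pass_rate(outputs_v2))  # 0.8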