
7 LLM Principles, Prompt, and Fine-tuning

[Figure: Main visual for LLM principles]

This chapter answers one practical question: when a user sends text to an LLM, what path does that text take, and how can you make the result stable enough for an application?

Do not start by memorizing model names. Start with the loop you can operate: text becomes tokens, tokens become vectors, the Transformer predicts the next token from context, and you control the result with Prompt, structured output, RAG, fine-tuning, or tools.
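
If you want that loop in runnable form right away, here is a minimal toy sketch. The bigram counter is a deliberately crude stand-in for the Transformer, and the sentence, tokens, and prediction are all invented for illustration; it uses only the standard library and no real model.

# Toy walk through the loop: text -> tokens -> ids -> next-token prediction.
# The bigram counter is a crude stand-in for a real Transformer.
from collections import Counter, defaultdict

text = "the model predicts the next token from the context"
tokens = text.split()  # text becomes tokens (a toy whitespace split)
vocab = {tok: i for i, tok in enumerate(dict.fromkeys(tokens))}
ids = [vocab[tok] for tok in tokens]  # tokens become numbers the model can read

follows = defaultdict(Counter)  # "training": count which token follows which
for prev, nxt in zip(tokens, tokens[1:]):
    follows[prev][nxt] += 1

context = "the"
prediction = follows[context].most_common(1)[0][0]  # predict the next token from context
print(f"ids={ids} | after '{context}' comes '{prediction}'")

Real tokenizers split subwords rather than whitespace, and real models predict from learned vectors rather than counts, but the shape of the loop is the same.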

See the Whole Loop First

[Figure: Token to answer lifecycle]

Use this picture as the map for the whole chapter.

| Term | Plain meaning | What you check in practice |
| --- | --- | --- |
| Token | A small text unit after splitting input | Does the prompt fit the context window? |
| Embedding | A vector representation of a token or text chunk | Are similar meanings close enough to compare or retrieve? (see the sketch after this table) |
| Transformer | The architecture that mixes token context with attention | Which earlier words or examples affect the answer? |
| Pretraining | Learning general language patterns from large data | What general capability already exists before your task? |
| Prompt | The task instructions and context you send now | Can a clearer instruction solve the problem first? |
| Fine-tuning | Updating model behavior with training examples | Is this a repeated behavior pattern, not just missing knowledge? |
| Alignment | Making outputs safer and closer to human intent | What can still fail even when the answer sounds fluent? |
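
The embedding row is the easiest one to check in code. The sketch below computes cosine similarity between hand-made three-dimensional vectors; real embeddings come from a model and have hundreds of dimensions, so treat the words and numbers here as illustration only.

# Checks the embedding question above: are similar meanings close?
# The 3-d vectors are hand-made for illustration; a real embedding
# model produces them and uses hundreds of dimensions.
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

vectors = {
    "dog": [0.9, 0.1, 0.0],
    "puppy": [0.8, 0.2, 0.1],
    "invoice": [0.0, 0.1, 0.9],
}

print(f"dog vs puppy:   {cosine(vectors['dog'], vectors['puppy']):.2f}")   # close, ~0.98
print(f"dog vs invoice: {cosine(vectors['dog'], vectors['invoice']):.2f}") # far, ~0.01

This is the same comparison a retrieval system runs at scale when it decides which documents are "close enough" to a query.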

Learning Order and Task List

The workshop belongs at the end. First build the mental model, then run the full experiment.

| Step | Read | Do | Evidence to keep |
| --- | --- | --- | --- |
| 7.1 | NLP crash course | Run tokenizer and embedding examples | Notes that explain token, vector, and context |
| 7.2 | LLM overview and history | Mark where scale, data, instruction tuning, and alignment changed model behavior | One timeline or capability map |
| 7.3-7.4 | Transformer and pretraining | Read for intuition, not memorization (see the toy attention sketch after this table) | A diagram that explains attention, context, and training objective |
| 7.5 | Prompt engineering | Compare prompt versions with fixed inputs | Prompt versions, outputs, scores, failures |
| 7.6 | Fine-tuning | Decide whether a task needs Prompt, RAG, or fine-tuning | A short decision table |
| 7.7 | Alignment | Check failure modes and safety boundaries | A safety/evaluation checklist |
| 7.8 | Stage project | Run 7.8.4 Hands-on: Full Chapter 7 Workshop | Terminal output, pass rate, README notes |
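
For step 7.3, intuition comes easier with numbers than with diagrams alone. The sketch below runs one attention step on made-up two-dimensional vectors: similarity between a query and each key becomes a softmax weight, and the output is the weighted mix of the value vectors. A real Transformer learns these vectors and runs many heads in parallel; nothing here comes from a real model.

# One toy attention step: similarity(query, key) -> softmax weight ->
# weighted mix of value vectors. All vectors are made up for illustration.
import math

def softmax(scores: list[float]) -> list[float]:
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

keys = {"the": [1.0, 0.0], "cat": [0.2, 0.9], "sat": [0.1, 0.8]}
values = {"the": [0.1, 0.1], "cat": [0.9, 0.3], "sat": [0.4, 0.7]}
query = [0.1, 1.0]  # stands for the position being predicted from

weights = softmax([sum(q * k for q, k in zip(query, keys[t])) for t in keys])
output = [sum(w * values[t][d] for w, t in zip(weights, values)) for d in range(2)]

print({t: round(w, 2) for t, w in zip(keys, weights)})  # which earlier tokens matter most
print([round(x, 2) for x in output])                    # the context-mixed vector

Here "cat" gets the largest weight, which is exactly the question the term table asks about the Transformer: which earlier words affect the answer.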

First Runnable Loop: Prompt Testing Without an API

[Figure: Prompt experiment loop]

Prompt work should feel like testing software: keep the input cases fixed, change one prompt variable at a time, validate the output, and save failures.

Create ch07_prompt_test.py and run it with Python 3.10 or later. This offline toy runner does not call a real model; it teaches the evaluation loop. When you connect a real LLM SDK later, replace only toy_model().

import json

# Fixed input cases: keep these stable while you vary the prompt.
cases = [
    {"topic": "gradient descent", "level": "beginner"},
    {"topic": "RAG", "level": "intermediate"},
]

# Prompt variants under test: change one at a time.
prompts = {
    "plain": "Explain the topic.",
    "teacher": "You are a patient AI teacher. Explain the topic in 3 short bullets.",
    "json": "Return JSON with keys: topic, level, summary, next_step.",
}


def toy_model(prompt: str, case: dict) -> str:
    """Stand-in for a real LLM call; swap in your SDK call here later."""
    topic = case["topic"]
    level = case["level"]
    if "Return JSON" in prompt:
        return json.dumps(
            {
                "topic": topic,
                "level": level,
                "summary": f"{topic} explained for {level} learners",
                "next_step": "Run one small example and record the result",
            },
            ensure_ascii=False,
        )
    if "patient AI teacher" in prompt:
        return f"- Define {topic}\n- Show one example\n- Ask the learner to retry"
    return f"{topic} is an AI concept."


def validate_json(raw: str) -> bool:
    """Accept only parseable JSON objects that contain all required keys."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict):  # e.g. a bare JSON array has no keys
        return False
    return {"topic", "level", "summary", "next_step"} <= data.keys()


# Score every prompt variant against the same fixed cases.
for prompt_name, prompt in prompts.items():
    passed = 0
    for case in cases:
        output = toy_model(prompt, case)
        ok = validate_json(output) if prompt_name == "json" else bool(output.strip())
        passed += int(ok)
    print(f"{prompt_name}: {passed}/{len(cases)} cases passed")

Expected output:

plain: 2/2 cases passed
teacher: 2/2 cases passed
json: 2/2 cases passed

Operation tip: add one bad case, one longer case, and one output field requirement. If the score changes, write down which prompt change caused it. That habit matters more than a single good-looking answer.
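
One way to act on this tip, reusing the names from ch07_prompt_test.py: the extra case and the stricter validator below are examples of the idea, not required shapes. Append them before the scoring loop and swap the strict validator in for the json run.

# Example extension of ch07_prompt_test.py: one bad case plus a stricter
# validator. The exact case and rules are illustrative choices.
cases.append({"topic": "", "level": "beginner"})  # a deliberately bad case: empty topic

def validate_json_strict(raw: str) -> bool:
    """Like validate_json, but also rejects empty or non-string field values."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict):
        return False
    required = {"topic", "level", "summary", "next_step"}
    return required <= data.keys() and all(
        isinstance(data[k], str) and data[k].strip() for k in required
    )

With the empty-topic case and the strict validator in place, the json score should drop to 2/3; write down that the validator change, not a prompt change, moved the number.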

Depth Ladder

| Level | What you can prove |
| --- | --- |
| Minimum pass | You can explain the token-to-answer path and run a fixed prompt test without a real API. |
| Project-ready | You can keep inputs fixed, change one prompt or schema variable at a time, validate structured output, and save failures. |
| Deeper check | You can justify Prompt vs RAG vs fine-tuning vs tools using evidence, not preference, and name the safety boundary. |

Choose Prompt, RAG, Fine-tuning, or Tools

[Figure: Prompt, RAG, fine-tuning, and tool choice map]

When an LLM result is weak, do not jump straight to fine-tuning.

| Symptom | Try first | Move on when |
| --- | --- | --- |
| The answer style is vague | Improve the Prompt and add examples | Fixed cases still fail after clear instructions |
| The app needs JSON or table output | Add a schema and parser validation | The model repeatedly misses fields or types |
| The answer lacks private or fresh facts | Use RAG with retrieved documents (toy sketch after this table) | Retrieval is accurate but the model still applies the wrong behavior |
| The model must follow a repeated domain behavior | Consider fine-tuning or LoRA | You have enough high-quality examples and evaluation cases |
| The task needs external action | Use tools or an Agent workflow | The model must call APIs, search files, or execute steps |
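
The RAG row deserves a concrete shape. The sketch below is a toy: it retrieves the document that shares the most words with the question and pastes it into the prompt. Real RAG systems rank with embeddings and a vector index, and the documents and question here are invented for illustration.

# Toy RAG shape: retrieve the most-overlapping document, then ground the
# prompt in it. Real systems rank with embeddings, not word overlap.
import re

docs = [
    "Refund policy: digital items can be refunded within 14 days.",
    "Shipping: standard delivery takes 3-5 business days.",
]

def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question: str, documents: list[str]) -> str:
    q = words(question)
    return max(documents, key=lambda d: len(q & words(d)))

question = "How many days do I have to get a refund?"
context = retrieve(question, docs)
print(f"Answer using only this context:\n{context}\n\nQuestion: {question}")

If retrieval picks the right document but the answer is still wrong in style or behavior, that is the "move on" signal in the table: the gap is behavior, not knowledge.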

Common Failures

  • Treating an LLM as a database: fluent text is not proof of truth.
  • Changing the prompt, input cases, and model all at once: you cannot tell what improved the result.
  • Asking for structured output without validating it: a JSON-looking answer can still be invalid.
  • Fine-tuning too early: many problems should start with Prompt, RAG, tools, or product logic.
  • Reading Transformer details before seeing any output loop: the theory becomes hard to anchor.

Pass Check

Before entering Chapter 8, you should be able to:

  • explain token, embedding, attention, context window, pretraining, Prompt, fine-tuning, and alignment in plain language;
  • run the prompt testing loop above and change one prompt variable at a time;
  • save prompt versions, fixed input cases, outputs, scores, and failure samples;
  • decide whether a task should start with Prompt, structured output, RAG, fine-tuning, tools, or an Agent;
  • run the full chapter workshop and record the result in a short README.

For a printable checklist, use 7.0 Learning Checklist. For the guided project, start with 7.8.4 Hands-on: Full Chapter 7 Workshop.