
7.5.5 Prompt Engineering Practice

  • Learn how to judge why a prompt is bad
  • Learn how to improve a prompt from four angles: goal, constraints, examples, and output format
  • Understand the prompt iteration process for several typical tasks
  • Build the habit of debugging Prompts instead of writing them by guesswork

The Most Common Misunderstandings About Prompt Engineering


Misunderstanding: A prompt is just “writing more politely”


In fact, Prompt Engineering really cares about:

  • Whether the task definition is clear
  • Whether the output requirements are explicit
  • Whether the constraints are executable
  • Whether the examples provide enough guidance

Politeness is usually not the key point.

A Prompt is the task interface documentation you write for the model.

If the documentation is vague, the model’s output will naturally be unstable.


Task: Sentiment classification for user reviews


A very poor prompt might look like this:

Help me analyze this comment.

What is wrong with it?

  • It does not say what to analyze
  • It does not specify the output format
  • It does not define the label set
  • It does not say whether an explanation is needed

An improved version:

Please determine the sentiment of the review below. Only output positive or negative. Do not output anything else.
Review: This course is explained very clearly, and there are many examples.

This version is much clearer because it defines:

  • Task: sentiment classification
  • Output set: positive / negative
  • Output constraint: no extra content
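Because the improved prompt fixes the label set, the reply can be checked mechanically. The following is a minimal sketch of such a check; `ALLOWED_LABELS` and `parse_label` are hypothetical names introduced here, and the sample replies are written by hand rather than produced by a real model.

```python
from typing import Optional

# Label set taken from the improved prompt above.
ALLOWED_LABELS = {"positive", "negative"}


def parse_label(raw_output: str) -> Optional[str]:
    """Return the label if the reply is exactly one allowed label, else None."""
    cleaned = raw_output.strip().lower()
    return cleaned if cleaned in ALLOWED_LABELS else None


# Hand-written sample replies, for illustration only.
print(parse_label("positive"))                    # positive
print(parse_label(" Negative\n"))                 # negative
print(parse_label("The sentiment is positive."))  # None
```

A reply that returns `None` here is a sign that the output constraint was not followed, which is easier to detect than to guess by reading replies one at a time.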

The Four Core Dimensions of Prompt Debugging


First, ask what the goal is:

  • Is the model supposed to classify, summarize, extract, or rewrite?

Then ask what the output format is:

  • A sentence?
  • A label?
  • JSON?
  • A table?

Next, ask which constraints apply. For example:

  • Do not hallucinate
  • Do not output extra explanations
  • Answer only based on the given text

Finally, ask whether examples are needed. For some tasks, instructions alone are not enough, and it is better to add few-shot examples.

These four questions (goal, output format, constraints, examples) form the main thread of Prompt practice.


The example below does not call a real large model. Instead, it uses a “task specification object” to help you learn how to break down Prompt requirements.

# A task specification object: one field per prompt requirement
prompt_spec = {
    "task": "sentiment_classification",
    "allowed_labels": ["positive", "negative"],
    "output_format": "single_label",
    "constraints": ["Do not output explanations", "Only output the label"],
}
print(prompt_spec)

Expected output:

{'task': 'sentiment_classification', 'allowed_labels': ['positive', 'negative'], 'output_format': 'single_label', 'constraints': ['Do not output explanations', 'Only output the label']}

This example looks simple, but it teaches you something very important:

Behind a good Prompt, there is usually a clearer task specification.
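One way to see the link between a specification and a Prompt is to render one from the other. The sketch below assumes a specification with the same four keys as the example above; `render_prompt` and the wording of the generated lines are hypothetical choices, not a standard format.

```python
prompt_spec = {
    "task": "sentiment_classification",
    "allowed_labels": ["positive", "negative"],
    "output_format": "single_label",
    "constraints": ["Do not output explanations", "Only output the label"],
}


def render_prompt(spec: dict, review: str) -> str:
    """Turn the specification into prompt text (one possible wording)."""
    labels = " or ".join(spec["allowed_labels"])
    task_name = spec["task"].replace("_", " ")
    lines = [
        f"Task: {task_name}.",
        f"Output exactly one label: {labels}.",
    ]
    # Each constraint becomes its own line so it can be changed independently.
    lines += [f"- {constraint}" for constraint in spec["constraints"]]
    lines.append(f"Review: {review}")
    return "\n".join(lines)


print(render_prompt(prompt_spec, "This course is explained very clearly."))
```

Keeping the specification as data means a later change to the label set or constraints happens in one place instead of in hand-edited prompt text.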


Take text summarization as an example. A first version:

Summarize this paragraph.

Problems:

  • It does not say how long the summary should be
  • It does not say what style to use
  • It does not say whether key points should be preserved

A second version adds the length, language, and structure:

Please summarize the text below into 3 bullet points in Chinese, with no more than 20 characters per point.

This is much better. A third version also adds a factuality constraint:

Please summarize the text below into 3 bullet points in Chinese, with no more than 20 characters per point.
Keep only the facts, and do not add any information that is not in the original text.

At this point, the prompt has moved from “able to respond” to “more stable and controllable.”
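Constraints such as "3 bullet points" and "no more than 20 characters per point" are useful partly because they can be checked automatically. Below is a minimal sketch of such a check; `check_summary` is a hypothetical helper, the sample summary is hand-written, and the factuality constraint is not covered because it cannot be verified by counting.

```python
def check_summary(output: str, expected_points: int = 3, max_chars: int = 20) -> list:
    """Return a list of format problems; an empty list means the format is respected."""
    problems = []
    points = [line.strip() for line in output.splitlines() if line.strip()]
    if len(points) != expected_points:
        problems.append(f"expected {expected_points} points, got {len(points)}")
    for i, point in enumerate(points, start=1):
        # Drop a leading bullet marker before counting characters.
        text = point.lstrip("-•* ").strip()
        if len(text) > max_chars:
            problems.append(f"point {i} has {len(text)} characters (limit {max_chars})")
    return problems


# Hand-written sample reply, for illustration only.
sample = "- 课程讲解清晰\n- 示例很多\n- 适合初学者"
print(check_summary(sample))  # []
```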


When the task definition is not clear enough from language alone


For example, suppose you ask the model to decide whether a sentence is:

  • fact
  • opinion

If you only provide definitions, the model may interpret them inconsistently. In this case, few-shot examples are very helpful.

# Each example pairs an input sentence with the label we want the model to imitate
few_shot_examples = [
    {"input": "Beijing is the capital of China.", "output": "fact"},
    {"input": "This course is very interesting.", "output": "opinion"},
]
for ex in few_shot_examples:
    print(ex)

Expected output:

{'input': 'Beijing is the capital of China.', 'output': 'fact'}
{'input': 'This course is very interesting.', 'output': 'opinion'}

The role of few-shot is not “writing more words,” but:

Showing the model the judgment style you want.
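To show how such examples end up in the Prompt itself, here is a minimal sketch that assembles a few-shot prompt as plain text. The function name `build_few_shot_prompt`, the instruction sentence, and the "Sentence:" / "Label:" layout are assumptions made for illustration; other layouts may work just as well.

```python
few_shot_examples = [
    {"input": "Beijing is the capital of China.", "output": "fact"},
    {"input": "This course is very interesting.", "output": "opinion"},
]


def build_few_shot_prompt(examples: list, new_input: str) -> str:
    """Put the instruction first, then the worked examples, then the new case."""
    parts = ["Classify each sentence as fact or opinion. Output only the label."]
    for ex in examples:
        parts.append(f"Sentence: {ex['input']}\nLabel: {ex['output']}")
    # The new case uses the same layout and leaves the label empty for the model.
    parts.append(f"Sentence: {new_input}\nLabel:")
    return "\n\n".join(parts)


print(build_few_shot_prompt(few_shot_examples, "Water boils at 100 degrees Celsius at sea level."))
```

The new case deliberately repeats the exact layout of the examples, so the model is nudged to answer in the same format.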


How Can You Write Prompts More Stably for Structured Tasks?


A typical scenario: Information extraction


If you only say:

Help me extract resume information.

Then the model may:

  • Miss fields
  • Use inconsistent field names
  • Output extra explanations

A more stable version:

Please extract the information from the resume below and output JSON.
Fields:
- name: string
- school: string
- skills: list[string]
Do not output any extra explanation.

This clearly explains the task, structure, and boundaries.
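Once the fields are declared in the Prompt, the reply can be validated before any post-processing. The sketch below uses Python's standard `json` module; `REQUIRED_FIELDS` and `validate_resume_json` are hypothetical names, and the sample replies are hand-written, not real model output.

```python
import json

# Field names and types mirror the prompt above.
REQUIRED_FIELDS = {"name": str, "school": str, "skills": list}


def validate_resume_json(raw_output: str) -> list:
    """Return a list of problems; an empty list means the reply matches the schema."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError as err:
        return [f"not valid JSON: {err.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not a JSON object"]
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in data:
            problems.append(f"missing field: {field}")
        elif not isinstance(data[field], expected_type):
            problems.append(f"wrong type for field: {field}")
    skills = data.get("skills")
    if isinstance(skills, list) and not all(isinstance(s, str) for s in skills):
        problems.append("skills must contain only strings")
    extra = sorted(set(data) - set(REQUIRED_FIELDS))
    if extra:
        problems.append(f"unexpected fields: {extra}")
    return problems


good = '{"name": "Li Hua", "school": "Example University", "skills": ["Python", "SQL"]}'
print(validate_resume_json(good))                                      # []
print(validate_resume_json('{"name": "Li Hua", "skills": "Python"}'))
```

The three failure modes listed earlier (missing fields, inconsistent field names, extra explanations) each show up here as a distinct problem message.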


The “Minimal Experiment” Habit in Prompt Practice


The biggest trap in Prompt debugging is changing several things at once:

  • The task description changes
  • The examples change
  • The output format changes too

Then you have no idea which change actually mattered.

Change only one variable at a time, for example:

  1. First add output constraints only
  2. Then add few-shot examples only
  3. Then change only the format requirements

This is very similar to tuning model hyperparameters.
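The one-variable-at-a-time rule can be made concrete by generating variants from a baseline. This is a minimal sketch under the assumption that a Prompt is stored as a dictionary with one entry per dimension; the dictionary keys, the sample wording, and `single_variable_variants` are all hypothetical.

```python
baseline = {
    "task": "Determine the sentiment of the review below.",
    "constraints": "",
    "examples": "",
    "output_format": "",
}

# One candidate change per dimension, to be tried separately.
candidate_changes = {
    "constraints": "Do not output explanations.",
    "examples": "Review: The content is a bit messy.\nLabel: negative",
    "output_format": "Only output positive or negative.",
}


def single_variable_variants(base: dict, changes: dict) -> dict:
    """Build one variant per change; each differs from the baseline in one dimension."""
    variants = {}
    for dimension, new_value in changes.items():
        variant = dict(base)  # copy, so the baseline itself is never modified
        variant[dimension] = new_value
        variants[dimension] = variant
    return variants


for name, variant in single_variable_variants(baseline, candidate_changes).items():
    changed = [key for key in baseline if baseline[key] != variant[key]]
    print(name, "-> changed:", changed)
```

If a variant improves the results, the improvement can be attributed to the single dimension that changed.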

[Figure: Prompt debugging loop comic]


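A minimal test set might look like this:

# Each test case pairs an input with the output we expect from the prompt
test_cases = [
    {"input": "This course is explained very clearly.", "expected": "positive"},
    {"input": "The content is a bit messy.", "expected": "negative"},
]
for case in test_cases:
    print(case)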

Expected output:

{'input': 'This course is explained very clearly.', 'expected': 'positive'}
{'input': 'The content is a bit messy.', 'expected': 'negative'}

Test samples matter because Prompt Engineering also needs evaluation. Without them, you can only judge whether a prompt is “good” based on feeling.

A more mature approach is:

  • Have an input set
  • Have expected outputs
  • Check whether the prompt consistently matches expectations
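The three points above can be sketched as a tiny evaluation loop. No real model is called here: `fake_model` is a hypothetical stand-in that applies a crude keyword rule, and `PROMPT_TEMPLATE` and `evaluate` are likewise names invented for this illustration. In practice the stand-in would be replaced by an actual model call.

```python
test_cases = [
    {"input": "This course is explained very clearly.", "expected": "positive"},
    {"input": "The content is a bit messy.", "expected": "negative"},
]

PROMPT_TEMPLATE = (
    "Determine the sentiment of the review below. "
    "Only output positive or negative.\n"
    "Review: {review}"
)


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a model call: a keyword rule, not a real model."""
    review = prompt.rsplit("Review:", 1)[-1].lower()
    return "negative" if "messy" in review else "positive"


def evaluate(template: str, cases: list, model) -> tuple:
    """Run every test case through the prompt and collect the mismatches."""
    failures = []
    for case in cases:
        output = model(template.format(review=case["input"])).strip().lower()
        if output != case["expected"]:
            failures.append({"input": case["input"], "expected": case["expected"], "got": output})
    pass_rate = 1 - len(failures) / len(cases)
    return pass_rate, failures


pass_rate, failures = evaluate(PROMPT_TEMPLATE, test_cases, fake_model)
print(f"pass rate: {pass_rate:.0%}, failures: {failures}")
```

Running the same test set after every prompt change turns "this feels better" into a number that can be compared across versions.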

Not clearly defining the output when writing prompts


This makes post-processing increasingly painful.

Thinking prompt tuning can only rely on inspiration


In fact, it is very similar to ordinary engineering debugging: run small experiments, look at the results, and improve step by step.

Testing on only one example is another common mistake: getting one example right does not mean the prompt is stable.


Keep this page’s proof of learning as a small evidence card:

  • Baseline Prompt: first version and failure
  • Changed Variable: one prompt dimension changed at a time
  • Score: simple pass/fail or rubric result
  • Failure Bucket: instruction, context, format, or ambiguity
  • Next Iteration: one concrete edit to try
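If you prefer to keep the evidence card next to your test cases, it can be stored as a small record. The sketch below is one possible shape, assuming the five card fields map directly to attributes; `EvidenceCard` is a hypothetical name, and the filled-in values are made up for illustration, not measured results.

```python
from dataclasses import dataclass, asdict

# The four failure buckets named on the card.
FAILURE_BUCKETS = {"instruction", "context", "format", "ambiguity"}


@dataclass
class EvidenceCard:
    baseline_prompt: str
    changed_variable: str
    score: str
    failure_bucket: str
    next_iteration: str

    def __post_init__(self):
        # Reject buckets outside the four named on the card.
        if self.failure_bucket not in FAILURE_BUCKETS:
            raise ValueError(f"unknown failure bucket: {self.failure_bucket}")


# Illustrative values only.
card = EvidenceCard(
    baseline_prompt="Help me analyze this comment. (failure: free-form reply, no label)",
    changed_variable="output format",
    score="fail",
    failure_bucket="format",
    next_iteration="Add: Only output positive or negative.",
)
print(asdict(card))
```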

The most important thing in this section is not how many Prompt techniques you can memorize, but building this habit:

Treat Prompt as a task interface to design, and as a system component to debug.

When you start iterating around the task goal, format, constraints, and examples instead of writing one sentence by intuition, Prompt Engineering truly begins to mature.


  1. Choose a task you are familiar with, first write a “bad prompt,” then improve it step by step into a better version.
  2. Add a few-shot version for the “sentiment classification” task.
  3. Rewrite the “text summarization” task into a structured output format, such as JSON.
  4. Explain in your own words: Why is Prompt Engineering not “writing one nice sentence,” but “designing a task interface”?

Reference implementation and walkthrough

  1. A bad prompt is vague, such as “analyze this.” A better version names the task, input, expected output, labels, constraints, and at least one failure boundary.
  2. The few-shot version should include representative positive and negative examples (plus neutral ones, if you extend the label set beyond the two labels used in this section), then require the same label format for the new case.
  3. A structured summary could return {"summary": "...", "key_points": ["..."], "risks": ["..."], "missing_info": ["..."]}.
  4. Prompt engineering is interface design because it defines inputs, outputs, constraints, validation expectations, and failure handling between the model and the surrounding system.