6.8.4 Project: Generative Models in Practice [Optional]
Learning Objectives
- Explain why generative projects need different evaluation from classification.
- Track quality and diversity together.
- Build a small checkpoint review table.
- Identify mode collapse and blurry-output failure modes.
- Package generated samples as project evidence.
See the Evaluation Loop First
train → sample checkpoints → review quality + diversity → keep failures → choose next step
For a practice project, choose a generation target that is:
- visually inspectable;
- small enough to train or simulate;
- easy to compare across checkpoints.
Digits, icons, simple shapes, or tiny grayscale patterns are better first projects than open-ended photorealistic generation.
Lab: Checkpoint Review Dashboard
Create generative_review_dashboard.py:

    # Hand-scored review rows: one per saved checkpoint.
    checkpoints = [
        {"epoch": 1, "quality": 0.20, "diversity": 0.80, "note": "mostly noise"},
        {"epoch": 10, "quality": 0.45, "diversity": 0.72, "note": "outlines appear"},
        {"epoch": 30, "quality": 0.68, "diversity": 0.60, "note": "usable but varied"},
        {"epoch": 60, "quality": 0.75, "diversity": 0.48, "note": "possible collapse"},
    ]

    print("generation_review")
    for row in checkpoints:
        # A checkpoint is a candidate only if it clears both thresholds.
        status = "candidate" if row["quality"] >= 0.6 and row["diversity"] >= 0.55 else "review"
        print(
            f"epoch={row['epoch']:03d} "
            f"quality={row['quality']:.2f} "
            f"diversity={row['diversity']:.2f} "
            f"status={status}"
        )

    # Selection rule: highest quality among rows that keep enough diversity.
    selected = max(
        [row for row in checkpoints if row["diversity"] >= 0.55],
        key=lambda row: row["quality"],
    )
    print("selected_epoch:", selected["epoch"])

Run it:

    python generative_review_dashboard.py

Expected output:

    generation_review
    epoch=001 quality=0.20 diversity=0.80 status=review
    epoch=010 quality=0.45 diversity=0.72 status=review
    epoch=030 quality=0.68 diversity=0.60 status=candidate
    epoch=060 quality=0.75 diversity=0.48 status=review
    selected_epoch: 30
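Why not pick epoch 60? Its quality is higher (0.75 vs. 0.68), but its diversity (0.48) falls below the 0.55 floor, and its note already flags possible collapse. A good generative project does not select a checkpoint on its best-looking samples alone.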
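The selection step from the lab can be wrapped in a small function so that the rule is explicit and reusable. This is a minimal sketch, not part of the lab script: the name `select_checkpoint` and the `min_diversity` parameter are hypothetical, and the `None` return is an added guard, since `max()` raises `ValueError` when no row passes the diversity floor.

```python
# Hypothetical helper: select_checkpoint and min_diversity are illustrative
# names, not part of the lab script.
def select_checkpoint(checkpoints, min_diversity=0.55):
    """Return the highest-quality row whose diversity meets the floor, or None."""
    eligible = [row for row in checkpoints if row["diversity"] >= min_diversity]
    if not eligible:
        # Nothing passes the floor; avoid calling max() on an empty list.
        return None
    return max(eligible, key=lambda row: row["quality"])


# Same scores as the lab table (notes omitted for brevity).
checkpoints = [
    {"epoch": 1, "quality": 0.20, "diversity": 0.80},
    {"epoch": 10, "quality": 0.45, "diversity": 0.72},
    {"epoch": 30, "quality": 0.68, "diversity": 0.60},
    {"epoch": 60, "quality": 0.75, "diversity": 0.48},
]

selected = select_checkpoint(checkpoints)
print("selected_epoch:", selected["epoch"])  # selected_epoch: 30

# A floor that no checkpoint reaches returns None instead of raising.
print("strict:", select_checkpoint(checkpoints, min_diversity=0.90))  # strict: None
```

Keeping the floor as a parameter makes it easy to record the exact rule alongside the selected checkpoint.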
What to Save
| Evidence | Why |
|---|---|
| samples by checkpoint | shows training progression |
| failure samples | reveals limits honestly |
| diversity notes | catches repeated outputs |
| quality notes | explains visual improvements |
| training logs | shows stability or collapse |
| final selection rule | makes the choice reproducible |
Quality, Diversity, Stability
| Dimension | Good sign | Warning sign |
|---|---|---|
| Quality | samples look like target data | noisy, blurry, broken structure |
| Diversity | samples vary meaningfully | repeated outputs or one dominant style |
| Stability | checkpoints improve gradually | sudden collapse or oscillation |
| Interpretability | failures are documented | only best samples are shown |
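The diversity row can be backed by a rough number instead of a visual impression. The sketch below is one possible proxy, not the metric behind the lab's scores: it assumes each sample is a flat list of pixel values in the range 0 to 1, and the function name `mean_pairwise_distance` and both toy batches are made up for illustration.

```python
from itertools import combinations


def mean_pairwise_distance(samples):
    """Average per-pixel absolute difference over all sample pairs.

    0.0 means every sample in the batch is identical; larger is more varied.
    Assumes all samples have the same length.
    """
    pairs = list(combinations(samples, 2))
    if not pairs:
        return 0.0  # fewer than two samples: no pair to compare
    total = 0.0
    for a, b in pairs:
        total += sum(abs(x - y) for x, y in zip(a, b)) / len(a)
    return total / len(pairs)


# Toy 4-pixel "images" (illustrative data, not real model output).
varied = [[0.0, 1.0, 0.0, 1.0], [1.0, 0.0, 1.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
collapsed = [[0.0, 1.0, 0.0, 1.0], [0.0, 1.0, 0.0, 1.0], [0.0, 1.0, 0.0, 1.0]]

print(f"varied={mean_pairwise_distance(varied):.2f}")        # varied=0.67
print(f"collapsed={mean_pairwise_distance(collapsed):.2f}")  # collapsed=0.00
```

A pixel-level distance like this catches exact or near-exact repeats, but it will not notice samples that differ in pixels while sharing one dominant style, so it complements the written diversity notes instead of replacing them.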
The common trade-off:

    best-looking single sample != best project checkpoint

Project Upgrade Path
| Version | What to add |
|---|---|
| basic | one model, fixed sampling seed, checkpoint samples |
| standard | quality/diversity table and failure samples |
| challenge | compare VAE, GAN, or diffusion-style outputs |
| portfolio | clear story: data, model, samples, failures, next step |
Evidence to Keep
A generative project should leave this minimum evidence:
- Checkpoint Samples
  - fixed-seed samples across epochs
- Quality Note
  - what improved visually
- Diversity Note
  - whether outputs repeat
- Failure Sample
  - blurry, broken, collapsed, or unrealistic output
- Selection Rule
  - why this checkpoint was kept
- Next Action
  - data, objective, architecture, or sampling change
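The evidence list above can be kept as one structured record per selected checkpoint, which makes gaps easy to spot. This is a sketch under stated assumptions: the `CheckpointEvidence` class, its field names, the sample path, and the note texts are all hypothetical placeholders, not output from a real run.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class CheckpointEvidence:
    """One record per kept checkpoint; fields mirror the evidence list."""
    epoch: int
    sample_paths: list
    quality_note: str
    diversity_note: str
    failure_sample: str
    selection_rule: str
    next_action: str

    def missing_fields(self):
        """Return the names of fields that were left empty."""
        return [name for name, value in asdict(self).items() if value in ("", [], None)]


# Placeholder values for illustration only.
evidence = CheckpointEvidence(
    epoch=30,
    sample_paths=["samples/epoch_030_seed_0.png"],  # hypothetical path
    quality_note="outlines are closed and less noisy than at epoch 10",
    diversity_note="no repeated outputs in the fixed seed set",
    failure_sample="",  # deliberately left empty to show the check
    selection_rule="highest quality with diversity >= 0.55",
    next_action="review diversity again before training further",
)

print(evidence.missing_fields())  # ['failure_sample']
print(json.dumps(asdict(evidence), indent=2))
```

Writing the JSON next to the sample images keeps the selection rule and the failure sample attached to the checkpoint they describe.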
Common Mistakes
| Mistake | Fix |
|---|---|
| showing only best samples | show average and failure samples too |
| ignoring diversity | track repeated outputs or unique patterns |
| comparing checkpoints by memory | use the same fixed seed set |
| using a dataset too complex at first | start with small visual targets |
| not explaining model choice | state why VAE, GAN, or another method fits the goal |
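The "same fixed seed set" fix can be sketched with the standard library alone. The example assumes the model is sampled from a latent vector; `LATENT_DIM`, `REVIEW_SEEDS`, and both function names are hypothetical, and a real project would pass these vectors to its own generator or decoder.

```python
import random

LATENT_DIM = 4            # assumed latent size, for illustration only
REVIEW_SEEDS = [0, 1, 2]  # hypothetical seed set reused at every checkpoint


def latent_for_seed(seed, dim=LATENT_DIM):
    """Return the same pseudo-random latent vector every time for a given seed."""
    rng = random.Random(seed)  # local generator; global random state is untouched
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def review_inputs():
    """Build the fixed review inputs, keyed by seed."""
    return {seed: latent_for_seed(seed) for seed in REVIEW_SEEDS}


# Inputs built at two different checkpoints are identical, so any change
# in the generated samples comes from the model, not from the sampling.
epoch_10_inputs = review_inputs()
epoch_30_inputs = review_inputs()
print(epoch_10_inputs == epoch_30_inputs)  # True
```

Using a local `random.Random` instance avoids depending on whatever else in the training script has touched the global random state.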
Exercises
- Add an epoch `90` with quality `0.80` and diversity `0.30`. Should it be selected?
- Add a `failure` field to each checkpoint.
- Write a 4-row table for your own generative project idea.
- Explain mode collapse using the checkpoint table.
- Draft a portfolio section titled “Why I selected this checkpoint.”
Project reference and review notes
- Usually no, unless the project values quality far more than diversity. A diversity score of `0.30` is a warning sign for repeated or narrow outputs.
- The `failure` field should name visible problems such as repetition, artifacts, prompt mismatch, unsafe output, or poor diversity.
- A useful table has rows for idea, data/source, evaluation signal, and main risk. The table should help someone judge whether the project can be evaluated.
- Mode collapse means the model produces a small set of similar outputs. In the checkpoint table, it looks like acceptable quality with low diversity.
- The portfolio section should justify the selected checkpoint with evidence: quality, diversity, failure notes, sample outputs, and why rejected checkpoints were weaker.
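The notes on the epoch 90 row and on mode collapse can be checked with a short script. This is a sketch, assuming the same thresholds as the lab's status rule (quality at least 0.6, diversity at least 0.55); the function name `flag_possible_collapse` is hypothetical.

```python
def flag_possible_collapse(rows, min_quality=0.6, min_diversity=0.55):
    """Return epochs where quality is acceptable but diversity is below the floor."""
    return [
        row["epoch"]
        for row in rows
        if row["quality"] >= min_quality and row["diversity"] < min_diversity
    ]


# Lab table plus the epoch 90 row from the first exercise.
checkpoints = [
    {"epoch": 1, "quality": 0.20, "diversity": 0.80},
    {"epoch": 10, "quality": 0.45, "diversity": 0.72},
    {"epoch": 30, "quality": 0.68, "diversity": 0.60},
    {"epoch": 60, "quality": 0.75, "diversity": 0.48},
    {"epoch": 90, "quality": 0.80, "diversity": 0.30},
]

print(flag_possible_collapse(checkpoints))  # [60, 90]

# The lab's selection rule still picks epoch 30 after adding epoch 90.
selected = max(
    [row for row in checkpoints if row["diversity"] >= 0.55],
    key=lambda row: row["quality"],
)
print("selected_epoch:", selected["epoch"])  # selected_epoch: 30
```

A flag like this only says where to look; the fixed-seed samples for the flagged epochs still need a visual check for repeated outputs.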
Key Takeaways
- Generative projects need evaluation stories, not just galleries.
- Quality and diversity must be read together.
- Failure samples make the project more credible.
- A clear checkpoint selection rule is part of the deliverable.