
6.8.4 Project: Generative Models in Practice [Optional]

  • Explain why generative projects need different evaluation from classification.
  • Track quality and diversity together.
  • Build a small checkpoint review table.
  • Identify mode collapse and blurry-output failure modes.
  • Package generated samples as project evidence.

Generative model project evaluation loop:

train → sample checkpoints → review quality + diversity → keep failures → choose next step

For a practice project, choose a generation target that is:

  • visually inspectable;
  • small enough to train or simulate;
  • easy to compare across checkpoints.

Digits, icons, simple shapes, or tiny grayscale patterns are better first projects than open-ended photorealistic generation.

Create generative_review_dashboard.py:

# Illustrative review scores per checkpoint, each on a 0-to-1 scale.
checkpoints = [
    {"epoch": 1, "quality": 0.20, "diversity": 0.80, "note": "mostly noise"},
    {"epoch": 10, "quality": 0.45, "diversity": 0.72, "note": "outlines appear"},
    {"epoch": 30, "quality": 0.68, "diversity": 0.60, "note": "usable and varied"},
    {"epoch": 60, "quality": 0.75, "diversity": 0.48, "note": "possible collapse"},
]

print("generation_review")
for row in checkpoints:
    # A candidate must clear both the quality and the diversity threshold.
    status = "candidate" if row["quality"] >= 0.6 and row["diversity"] >= 0.55 else "review"
    print(
        f"epoch={row['epoch']:03d} "
        f"quality={row['quality']:.2f} "
        f"diversity={row['diversity']:.2f} "
        f"status={status}"
    )

# Selection rule: highest quality among checkpoints with diversity >= 0.55.
selected = max(
    [row for row in checkpoints if row["diversity"] >= 0.55],
    key=lambda row: row["quality"],
)
print("selected_epoch:", selected["epoch"])

Run it:

python generative_review_dashboard.py

Expected output:

generation_review
epoch=001 quality=0.20 diversity=0.80 status=review
epoch=010 quality=0.45 diversity=0.72 status=review
epoch=030 quality=0.68 diversity=0.60 status=candidate
epoch=060 quality=0.75 diversity=0.48 status=review
selected_epoch: 30

[Figure: checkpoint review result map for generative models]

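Why not pick epoch 60? Its quality is higher (0.75 vs. 0.68), but its diversity has dropped to 0.48, below the 0.55 threshold used in the script, which suggests the model is starting to repeat itself. A good generative project selects a checkpoint on quality and diversity together, not on the prettiest sample alone.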
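
The selection rule can also be written as a small function, which makes the threshold explicit and handles the case where no checkpoint is diverse enough. This is a minimal sketch: the function name `select_checkpoint` and the `min_diversity` parameter are assumptions for illustration, and the two rows are copied from the script above.

```python
def select_checkpoint(checkpoints, min_diversity=0.55):
    """Return the highest-quality checkpoint that meets the diversity floor.

    Returns None when no checkpoint is diverse enough, instead of letting
    max() raise ValueError on an empty list.
    """
    eligible = [row for row in checkpoints if row["diversity"] >= min_diversity]
    if not eligible:
        return None
    return max(eligible, key=lambda row: row["quality"])


checkpoints = [
    {"epoch": 30, "quality": 0.68, "diversity": 0.60},
    {"epoch": 60, "quality": 0.75, "diversity": 0.48},
]

selected = select_checkpoint(checkpoints)
print("selected_epoch:", selected["epoch"])  # selected_epoch: 30

# With a stricter floor nothing qualifies, so the function returns None.
print(select_checkpoint(checkpoints, min_diversity=0.9))  # None
```

Returning `None` rather than raising keeps the "no acceptable checkpoint yet" case visible as a review outcome in its own right.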

| Evidence              | Why                            |
|-----------------------|--------------------------------|
| samples by checkpoint | shows training progression     |
| failure samples       | reveals limits honestly        |
| diversity notes       | catches repeated outputs       |
| quality notes         | explains visual improvements   |
| training logs         | shows stability or collapse    |
| final selection rule  | makes the choice reproducible  |

| Dimension        | Good sign                     | Warning sign                           |
|------------------|-------------------------------|----------------------------------------|
| Quality          | samples look like target data | noisy, blurry, broken structure        |
| Diversity        | samples vary meaningfully     | repeated outputs or one dominant style |
| Stability        | checkpoints improve gradually | sudden collapse or oscillation         |
| Interpretability | failures are documented       | only best samples are shown            |
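
The Diversity row's warning sign, repeated outputs, can be turned into rough numbers for tiny grayscale samples. The sketch below is one possible proxy, not a standard metric: it assumes each sample is a flat list of pixel values in [0, 1], and the helper names `mean_pairwise_distance` and `duplicate_fraction` are hypothetical.

```python
from itertools import combinations


def mean_pairwise_distance(samples):
    """Average per-pixel absolute difference over all pairs of samples."""
    pairs = list(combinations(samples, 2))
    if not pairs:
        return 0.0
    total = 0.0
    for a, b in pairs:
        total += sum(abs(x - y) for x, y in zip(a, b)) / len(a)
    return total / len(pairs)


def duplicate_fraction(samples, decimals=1):
    """Fraction of samples that repeat an earlier sample after rounding."""
    seen = set()
    repeats = 0
    for sample in samples:
        key = tuple(round(value, decimals) for value in sample)
        if key in seen:
            repeats += 1
        seen.add(key)
    return repeats / len(samples)


# Made-up 4-pixel samples: one varied batch, one nearly collapsed batch.
varied = [[0.0, 1.0, 0.0, 1.0], [1.0, 0.0, 1.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
collapsed = [[0.5, 0.5, 0.5, 0.5], [0.5, 0.5, 0.5, 0.5], [0.5, 0.5, 0.5, 0.6]]

for name, batch in [("varied", varied), ("collapsed", collapsed)]:
    print(
        f"{name}: distance={mean_pairwise_distance(batch):.3f} "
        f"duplicates={duplicate_fraction(batch):.2f}"
    )
```

A low pairwise distance together with a high duplicate fraction is the numeric counterpart of "one dominant style"; the numbers still need a visual check, since a proxy like this says nothing about quality.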

The common trade-off:

best-looking single sample != best project checkpoint

| Version   | What to add                                           |
|-----------|-------------------------------------------------------|
| basic     | one model, fixed sampling seed, checkpoint samples    |
| standard  | quality/diversity table and failure samples           |
| challenge | compare VAE, GAN, or diffusion-style outputs          |
| portfolio | clear story: data, model, samples, failures, next step |
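
The "fixed sampling seed" in the basic version means every checkpoint is sampled from the same inputs, so differences between epochs come from the model and not from the random draw. The sketch below illustrates the idea with the standard library only. `make_fixed_latents` and `fake_generator` are hypothetical names, and `fake_generator` is a stand-in for a real model's sampling call, not an actual generator.

```python
import random


def make_fixed_latents(num_samples=8, latent_dim=4, seed=0):
    """Draw the same latent vectors every time for a given seed."""
    rng = random.Random(seed)  # private generator, leaves global state alone
    return [
        [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
        for _ in range(num_samples)
    ]


def fake_generator(epoch, z):
    """Hypothetical stand-in for sampling from the checkpoint at `epoch`."""
    scale = min(1.0, epoch / 60)
    return [round(scale * value, 3) for value in z]


latents = make_fixed_latents()

# The same seed reproduces the same latent set on every call.
assert latents == make_fixed_latents()

# Every checkpoint is sampled from the identical latent set.
samples_by_epoch = {
    epoch: [fake_generator(epoch, z) for z in latents]
    for epoch in (1, 10, 30, 60)
}
print(len(samples_by_epoch[30]), "samples per checkpoint")
```

With a real model, the same pattern applies: create the latent set once, save it, and feed it to each checkpoint when producing the review grid.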

A generative project should leave this minimum evidence:

  • Checkpoint samples: fixed-seed samples across epochs
  • Quality note: what improved visually
  • Diversity note: whether outputs repeat
  • Failure sample: blurry, broken, collapsed, or unrealistic output
  • Selection rule: why this checkpoint was kept
  • Next action: data, objective, architecture, or sampling change
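
One way to keep this minimum evidence together is a small record per selected checkpoint that can be saved next to the samples. The sketch below is only an illustration: the class name `CheckpointEvidence`, its field names, the file paths, and all note text are hypothetical placeholders, not results from a real run.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class CheckpointEvidence:
    """One record per selected checkpoint, mirroring the evidence list."""
    epoch: int
    sample_paths: list
    quality_note: str
    diversity_note: str
    failure_sample: str
    selection_rule: str
    next_action: str


# Placeholder values for illustration only.
record = CheckpointEvidence(
    epoch=30,
    sample_paths=["samples/epoch_030_seed_0.png"],  # hypothetical path
    quality_note="outlines are recognizable",
    diversity_note="no obvious repeats in the fixed-seed grid",
    failure_sample="samples/epoch_030_failure.png",  # hypothetical path
    selection_rule="highest quality with diversity >= 0.55",
    next_action="try a sampling change before changing the architecture",
)

# JSON output can be committed alongside the samples it describes.
print(json.dumps(asdict(record), indent=2))
```

Because every field is required, a record cannot be created without a failure sample or a selection rule, which nudges the project toward complete evidence.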

| Mistake                              | Fix                                                   |
|--------------------------------------|-------------------------------------------------------|
| showing only best samples            | show average and failure samples too                  |
| ignoring diversity                   | track repeated outputs or unique patterns             |
| comparing checkpoints by memory      | use the same fixed seed set                           |
| using a dataset too complex at first | start with small visual targets                       |
| not explaining model choice          | state why VAE, GAN, or another method fits the goal   |

  1. Add an epoch 90 with quality 0.80 and diversity 0.30. Should it be selected?
  2. Add a failure field to each checkpoint.
  3. Write a 4-row table for your own generative project idea.
  4. Explain mode collapse using the checkpoint table.
  5. Draft a portfolio section titled “Why I selected this checkpoint.”
Project reference and review notes
  1. Usually no, unless the project values quality far more than diversity. A diversity score of 0.30 is a warning sign for repeated or narrow outputs.
  2. The failure field should name visible problems such as repetition, artifacts, prompt mismatch, unsafe output, or poor diversity.
  3. A useful table has rows for idea, data/source, evaluation signal, and main risk. The table should help someone judge whether the project can be evaluated.
  4. Mode collapse means the model produces a small set of similar outputs. In the checkpoint table, it looks like acceptable quality with low diversity.
  5. The portfolio section should justify the selected checkpoint with evidence: quality, diversity, failure notes, sample outputs, and why rejected checkpoints were weaker.
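
Notes 1 and 2 can be checked directly by extending the checkpoint table. The sketch below reuses the scores from the dashboard script, adds the epoch 90 row from exercise 1, and replaces `note` with a `failure` field. The failure text for each row and the `MIN_DIVERSITY` constant name are assumptions for illustration.

```python
MIN_DIVERSITY = 0.55  # same diversity floor as the dashboard script

checkpoints = [
    {"epoch": 1, "quality": 0.20, "diversity": 0.80, "failure": "mostly noise"},
    {"epoch": 10, "quality": 0.45, "diversity": 0.72, "failure": "broken outlines"},
    {"epoch": 30, "quality": 0.68, "diversity": 0.60, "failure": "occasional artifacts"},
    {"epoch": 60, "quality": 0.75, "diversity": 0.48, "failure": "possible collapse"},
    {"epoch": 90, "quality": 0.80, "diversity": 0.30, "failure": "repeated outputs"},
]

# Same rule as before: best quality among sufficiently diverse checkpoints.
eligible = [row for row in checkpoints if row["diversity"] >= MIN_DIVERSITY]
selected = max(eligible, key=lambda row: row["quality"])
print("selected_epoch:", selected["epoch"])  # selected_epoch: 30

# Rejected checkpoints keep their failure note as evidence.
for row in checkpoints:
    if row["diversity"] < MIN_DIVERSITY:
        print(f"epoch={row['epoch']:03d} rejected: {row['failure']}")
```

Epoch 90 has the highest quality in the table but is filtered out before the `max` call, so the selection stays at epoch 30. The pattern of rising quality with falling diversity across epochs 60 and 90 is how mode collapse shows up in this table.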
  • Generative projects need evaluation stories, not just galleries.
  • Quality and diversity must be read together.
  • Failure samples make the project more credible.
  • A clear checkpoint selection rule is part of the deliverable.