13.0 学習チェックリスト：オープンソース LLM デプロイ

このページは印刷用チェックリストとして使います。詳しい説明が必要なときは、第 13 章入口ページに戻ってください。

第13章 OSS LLM 学習チェックリスト

2時間の初回通読

20 分：計算ルートを選ぶ 「この run は local CPU、free Colab、rented GPU のどれに置くべきか、そして何をまだ証明できないか」を言えたら止めます。
20 分：environment check を動かす 「このマシンが CUDA を使えるか、CPU だけか分かる」と言えたら止めます。
25 分：runbook script を動かす 「hardware と project constraints から runtime を選べる」と言えたら止めます。
25 分：mini GPT-2 training evidence を作る 「local smoke test はできたし、CUDA training log に何が必要か分かる」と言えたら止めます。
25 分：5 prompt の eval table を作る 「runtime や tuning を変える前に model behavior を比較できる」と言えたら止めます。
30 分：adaptation decision を書く 「Prompt、RAG、quantization、LoRA、no tuning の理由を説明できる」と言えたら止めます。
30 分：release runbook を書く 「別の engineer がこの service を start、test、stop、rollback できる」と言えたら止めます。

environment_report.txt：Python、torch、CUDA/device、platform、disk または instance note。
compute_route.md：local CPU、free Colab、rented GPU の選択、fallback、stop rule。
model_decision.md：model、size、license、source、reason、rejected alternatives。
open_weight_route.json：route、candidate family、runtime fit、adaptation choice、required evidence。
model_runtime_decision.json：local CPU、free Colab、rented GPU の route 別 runtime recommendation。
open_llm_runbook.json：runtime choice、adaptation choice、required evidence。
api_smoke_test.json：local OpenAI-compatible API の health check と sample request/response proof。
first_run.md：exact command、prompt、output、latency または memory note。
eval_cases.csv：5つ以上の prompts、expected behavior、pass/fail、notes。
openllm_gpu_training_run/：environment_report.json、training_log.csv、mini_gpt2_checkpoint.pt、sample.txt。CPU/MPS smoke test か CUDA acceptance run かも記録する。
gpu_train_log.txt：device、3 行以上の loss、checkpoint path、sample output を含む terminal trace。
README.md：setup、run、evaluate、stop server、rollback または shutdown。

Reproducibility：他のエンジニアが model version、runtime、command、environment を特定できる。
Safety：共有前に license、privacy、auth、logging、shutdown を確認している。
Evaluation：runtime や tuning の変更を同じ eval cases で比較している。
Training evidence：CPU/MPS は smoke test として記録し、GPU training completion を主張する前に CUDA run を残している。
Cost control：free notebook limits または GPU rental time、memory、latency、stop procedure を記録している。
Adaptation：fine-tuning が1回の不満ではなく繰り返す証拠に基づいている。

答えがすべて「はい」なら、オープンソース LLM をランダムな model demo ではなく engineering option として扱えます。

このページを終えたら、この証拠カードを残します。

環境レポート: Python、torch、CUDA/device、platform、hardware/cost note
計算ルート: local CPU / free Colab / rented GPU、fallback、stop rule
モデル決定: selected model、license、size、source、rejected alternatives
Runtime Contract: command または endpoint、request format、response format、error path
Training Evidence: mini GPT-2 device、loss log、checkpoint、sample、shutdown proof
評価: fixed prompts、outputs、pass/fail notes、latency または memory note
適応選択: Prompt/RAG/quantization/LoRA/full fine-tune decision with reason