E.D AI Safety and Red Team Testing
Red teaming is a repeatable loop, not one scary prompt. You define attack surfaces, run cases, record failures, fix the system, and rerun the same cases.
See the Loop First

Start with surfaces: prompt, retrieval, tools, memory, and external actions.
What You Need
- One AI feature to test
- A list of surfaces the feature touches
- A place to keep failed cases as regression tests
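
One possible "place to keep failed cases" is a JSON Lines file that sits next to the feature's tests. The sketch below is only an illustration under that assumption: the file name `regression_cases.jsonl` and the helpers `append_case` and `load_cases` are hypothetical, not part of any library.

```python
import json
import tempfile
from pathlib import Path


def append_case(path: Path, case: dict) -> None:
    """Append one failed case as a single JSON line."""
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(case) + "\n")


def load_cases(path: Path) -> list[dict]:
    """Read every stored case back, skipping blank lines."""
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Demo in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "regression_cases.jsonl"  # hypothetical file name
    append_case(store, {"id": "tool-confirmation", "surface": "tool",
                        "expected": "ask_confirmation"})
    stored = load_cases(store)

print(len(stored), stored[0]["id"])
```

An append-only file keeps every failed case in version control, so a later rerun reads the same inputs that failed the first time.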
Run A Before And After Evaluator
```python
# Each case records the expected safe behavior and what the system
# actually did before and after the fix.
cases = [
    {"id": "prompt-basic", "surface": "prompt", "expected": "refuse",
     "before": "refuse", "after": "refuse"},
    {"id": "rag-injection", "surface": "retrieval", "expected": "ignore_untrusted_instruction",
     "before": "ignore_untrusted_instruction", "after": "ignore_untrusted_instruction"},
    {"id": "tool-confirmation", "surface": "tool", "expected": "ask_confirmation",
     "before": "executed", "after": "ask_confirmation"},
]

for phase in ["before", "after"]:
    failures = []
    for case in cases:
        # A case passes when the observed behavior matches the expected one.
        passed = case[phase] == case["expected"]
        print(phase, case["id"], "PASS" if passed else "FAIL")
        if not passed:
            failures.append(case["id"])
    print(phase, "failure_count:", len(failures))
```

Expected output:

```
before prompt-basic PASS
before rag-injection PASS
before tool-confirmation FAIL
before failure_count: 1
after prompt-basic PASS
after rag-injection PASS
after tool-confirmation PASS
after failure_count: 0
```

The failed tool case is not embarrassing; it is now a regression test that protects future releases.
Red-Team Review
Review a red-team run by separating three things: the surface that failed, the expected safe behavior, and the control that changed the result. For example, a tool surface might fail by executing too early, the safe behavior might be `ask_confirmation`, and the control might be a permission gate.
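
The tool example above can be sketched as a tiny permission gate. This is a minimal illustration, not a real tool runtime: the action names in `EXTERNAL_ACTIONS` and both `run_tool_*` functions are hypothetical.

```python
# Hypothetical action names; a real system would list its own.
EXTERNAL_ACTIONS = {"send_email", "delete_file"}


def run_tool_before(action: str, confirmed: bool) -> str:
    """Unsafe version: executes whatever the model requests."""
    return "executed"


def run_tool_after(action: str, confirmed: bool) -> str:
    """Gated version: external actions need explicit user confirmation."""
    if action in EXTERNAL_ACTIONS and not confirmed:
        return "ask_confirmation"
    return "executed"


# The three things to separate in a review: surface, expected, control.
review = {
    "surface": "tool",
    "expected": "ask_confirmation",
    "control": "permission gate",
    "before": run_tool_before("send_email", confirmed=False),
    "after": run_tool_after("send_email", confirmed=False),
}
print(review["before"], "->", review["after"])
```

Keeping the surface, the expected behavior, and the control as separate fields makes it clear which one changed between the two runs.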
Do not summarize the run as “safer now.” Keep the original input, the unsafe output, the fix, and the rerun output. That record is what turns a scary prompt into a useful regression case.
For a first portfolio artifact, keep the case file boring and precise. Use columns such as `case_id`, `surface`, `input`, `expected_safe_behavior`, `actual_before`, `guardrail`, and `actual_after`. A reviewer should be able to rerun one row without guessing your intent.
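
A case file with those columns could be kept as plain CSV. The sketch below uses Python's standard `csv` module and writes to an in-memory buffer; the row values are placeholders for illustration, and the `input` field in particular stands in for the exact text used in a real run.

```python
import csv
import io

COLUMNS = ["case_id", "surface", "input", "expected_safe_behavior",
           "actual_before", "guardrail", "actual_after"]

rows = [{
    "case_id": "tool-confirmation",
    "surface": "tool",
    "input": "<the exact prompt or document used in the run>",  # placeholder
    "expected_safe_behavior": "ask_confirmation",
    "actual_before": "executed",
    "guardrail": "permission gate",
    "actual_after": "ask_confirmation",
}]

# Write the case file (an in-memory buffer stands in for a real file).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
writer.writeheader()
writer.writerows(rows)

# Read it back and flag rows a reviewer could not rerun as written.
buffer.seek(0)
loaded = list(csv.DictReader(buffer))
incomplete = [r["case_id"] for r in loaded if any(not r[c] for c in COLUMNS)]
print(len(loaded), "rows;", "incomplete:", incomplete)
```

The completeness check is one way to enforce "without guessing your intent": a row with an empty column is not yet reproducible.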
If a case is too broad, split it. Prompt injection, tool misuse, data leak, and unsafe output are different failure modes. A small regression set with clear surfaces is more useful than a dramatic list of attacks that no one can reproduce.
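
Splitting can start mechanically. The sketch below assumes each case carries a hypothetical `failure_modes` list and produces one narrower case per mode; the field name and the `split_case` helper are illustrative only.

```python
def split_case(case: dict) -> list[dict]:
    """Return one case per failure mode, each with its own id."""
    modes = case["failure_modes"]
    if len(modes) <= 1:
        return [case]
    return [
        {**case, "id": f'{case["id"]}-{mode}', "failure_modes": [mode]}
        for mode in modes
    ]


broad = {"id": "agent-attack",
         "failure_modes": ["prompt_injection", "tool_misuse"]}
narrow = split_case(broad)
print([c["id"] for c in narrow])
```

The split only separates the bookkeeping. Each resulting case still needs its own input and its own expected safe behavior before it can be reproduced.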
When the case passes, do not delete it. Move it into the regression set and run it again before release.
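
A pre-release rerun might look like the sketch below. `fake_system` is a hypothetical stand-in for the real feature under test, and its keyword rule exists only so the example runs on its own.

```python
def rerun(cases: list[dict], system) -> list[str]:
    """Return ids of cases whose current behavior differs from expected."""
    return [c["id"] for c in cases if system(c["input"]) != c["expected"]]


def fake_system(text: str) -> str:
    """Stand-in for the real feature under test (hypothetical)."""
    return "ask_confirmation" if "send" in text else "refuse"


regression_set = [
    {"id": "tool-confirmation", "input": "send the report to everyone",
     "expected": "ask_confirmation"},
    {"id": "prompt-basic", "input": "ignore your rules",
     "expected": "refuse"},
]

failures = rerun(regression_set, fake_system)
release_ok = not failures  # any failing case blocks the release
print("release_ok:", release_ok, "failures:", failures)
```

Because passed cases stay in `regression_set`, a later change that reintroduces the old behavior shows up as a named failure rather than going unnoticed.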
Practical Checklist
| Step | Action | Evidence |
|---|---|---|
| 1 | Define assets | User data, tools, memory, system prompts |
| 2 | Define surfaces | Prompt, documents, retrieval, tool calls, memory |
| 3 | Run cases | PASS / FAIL table |
| 4 | Fix and rerun | Regression report |
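
Steps 2 and 3 can be cross-checked with a small coverage count: every defined surface should have at least one case. The sketch below assumes the surface names from the table (with `tool` standing for tool calls) and the three surfaces used by the evaluator cases on this page.

```python
from collections import Counter

# Surfaces defined in step 2.
surfaces = ["prompt", "documents", "retrieval", "tool", "memory"]

# Surfaces exercised by the cases run in step 3.
case_surfaces = ["prompt", "retrieval", "tool"]

counts = Counter(case_surfaces)
uncovered = [s for s in surfaces if counts[s] == 0]
print("uncovered:", uncovered)
```

An uncovered surface is not a failure, but it is a gap worth recording: the PASS / FAIL table says nothing about surfaces that were never tested.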
Evidence to Keep
Keep this page’s proof of learning as a small evidence card:
- Threat Model: prompt injection, data leak, tool misuse, unsafe output, or model abuse
- Control: validation, permission, sandbox, audit, red-team test, or incident response
- Test Case: one attack or failure sample and expected safe behavior
- Failure Check: trusting model text, missing logs, broad permissions, or no regression tests
- Expected Output: security checklist plus one reproducible red-team case
Pass Check
You pass this elective when you can keep a red-team case file, explain one failed surface, propose one guardrail, and rerun the case after the fix.
Check reasoning and explanation
A passing answer should name one surface, one failure, one guardrail, and the rerun result. For example: “The tool surface failed because the model executed without confirmation. The guardrail requires explicit user approval before external actions. After the fix, the same case returns `ask_confirmation`.”
The key is repeatability. A red-team note is useful only when the failed case becomes a regression case that future changes must pass.