9.10.2 Project: Intelligent Research Assistant

Learning Objectives

Learn how to define a clear scope for a research assistant project
Learn how to connect “retrieval -> reading -> summarization -> citation” into a closed loop
Learn how to define the most important evaluation criteria for this project
Learn how to package it as a convincing portfolio project

First, narrow the project scope

A research assistant project that is good for practice should start as:

Given a topic
Retrieve several documents
Output a structured summary
Attach a source to each summary item

Rather than immediately trying to build:

Automatic paper writing
Automatic literature reviews

Why?

Because for a research assistant, “trustworthy” matters more than “flashy.”

What does the minimal closed loop of a portfolio-grade research assistant look like?

Input a topic or question
Retrieve candidate materials
Select the most relevant materials
Generate a structured summary
Provide a source for each summary item
Perform error analysis and build a regression set

As long as these 6 steps are clear, the project already has strong portfolio value.

Research assistant citation trace map

Recommended progression order

For beginners, a more stable sequence is usually:

Narrow the topic scope first
Build the simplest retrieval baseline
Add structured summarization
Finally add citation validation and failure case demonstrations

This makes it easier to turn a “trustworthy research assistant” into a clear closed loop.

Let’s first look at a minimal research assistant example

This example does three things:

Simulate retrieval with keyword matching
Generate a structured summary
Attach a source to each summary item

docs = [
    {
        "id": "d1",
        "title": "RAG improves factual grounding",
        "text": "RAG can improve factual grounding by retrieving external evidence.",
        "keywords": {"rag", "retrieval", "grounding", "evidence"},
    },
    {
        "id": "d2",
        "title": "Long context still struggles with precision",
        "text": "Long context models may still miss key details without retrieval or re-ranking.",
        "keywords": {"long", "context", "retrieval", "ranking"},
    },
    {
        "id": "d3",
        "title": "Citations increase user trust",
        "text": "Users trust generated summaries more when each claim is tied to an explicit source.",
        "keywords": {"citation", "trust", "summary", "source"},
    },
]


def retrieve(query, top_k=2):
    query_terms = set(query.lower().split())
    scored = []
    for doc in docs:
        score = len(query_terms & doc["keywords"])
        scored.append((score, doc))
    scored.sort(key=lambda x: x[0], reverse=True)
    return [doc for score, doc in scored[:top_k] if score > 0]


def summarize_with_citations(query):
    hits = retrieve(query, top_k=2)
    bullets = []
    for doc in hits:
        bullets.append(
            {
                "claim": doc["text"],
                "source_id": doc["id"],
                "source_title": doc["title"],
            }
        )
    return bullets


query = "rag retrieval citation trust"
result = summarize_with_citations(query)
for item in result:
    print(item)

Expected output:

{'claim': 'RAG can improve factual grounding by retrieving external evidence.', 'source_id': 'd1', 'source_title': 'RAG improves factual grounding'}
{'claim': 'Users trust generated summaries more when each claim is tied to an explicit source.', 'source_id': 'd3', 'source_title': 'Citations increase user trust'}

Why is this example more valuable than a “project skeleton dataclass”?

Because it already reflects the most important product characteristic of a research assistant:

The result is not a black-box summary
Every conclusion can be traced back to a source

Why is citation the make-or-break issue for this kind of project?

Because without sources, users have a hard time telling whether:

The system really read it from the documents
Or the model just made it up

How should this project be evaluated?

Retrieval quality

For example:

Are the retrieved documents actually relevant?

Summary quality

For example:

Does it cover the key points?
Does it over-generalize?

Citation accuracy

This is a particularly important layer for a research assistant:

Does each claim really have support in the cited source?

A minimal evaluation data structure

Continue in the same file or Python session, because this block reuses summarize_with_citations().

eval_cases = [
    {
        "query": "rag retrieval grounding",
        "expected_source_ids": {"d1", "d2"},
    },
    {
        "query": "citation trust summary",
        "expected_source_ids": {"d3"},
    },
]

for case in eval_cases:
    hit_ids = sorted(item["source_id"] for item in summarize_with_citations(case["query"]))
    overlap = sorted(set(hit_ids) & case["expected_source_ids"])
    print({
        "query": case["query"],
        "hit_ids": hit_ids,
        "overlap": overlap,
    })

Expected output:

{'query': 'rag retrieval grounding', 'hit_ids': ['d1', 'd2'], 'overlap': ['d1', 'd2']}
{'query': 'citation trust summary', 'hit_ids': ['d3'], 'overlap': ['d3']}

Research assistant citation and evaluation result map

Use this map to check whether retrieval hits, citations, and final evaluation agree with each other.

The easiest pitfalls to fall into

Retrieval is correct, but the summary loses key points

The summary sounds smooth, but the sources do not match

The project only shows one answer that “looks smart”

What is actually most worth showing in a research assistant is:

Query
Retrieval results
Summary items
Citation sources

That complete trace.

How do you polish it into a portfolio-grade project?

Show four columns on the page

Query
Retrieved sources
Structured summary
Citations

Prepare 5–10 fixed evaluation questions

This lets you consistently show:

before / after
retrieval strategy changes
summary strategy improvements

List failure cases separately

For example:

Retrieved irrelevant documents
Missed the correct document
Summary claim does not match the citation

What to include when delivering the project

A flowchart from query to citation
Side-by-side display of retrieval results and the final summary
Failure cases where citations do not match or the summary misses important points
A short explanation of how you define “trustworthy output”

Portfolio-grade Agent delivery standards

If you use the research assistant as an Agent portfolio project, do not just show the final summary. Instead, show the full closed loop of “goal, tools, execution, citations, evaluation, and safety boundaries.”

Deliverable	Minimum Requirement	Portfolio-Grade Requirement
Goal definition	Can input a research topic	Clearly states scope, data sources, and unsupported tasks
Tool list	At least one retrieval or reading tool	Clearly explains each tool’s purpose, parameters, return values, and permission boundaries
Execution trace	Prints the retrieval and summarization process	Saves each step’s action, arguments, observation, and next decision
Citation checking	Each summary item has a source	Every key claim can be traced to a specific source fragment
Failure recovery	Returns an error when a tool fails	Distinguishes empty results, timeouts, unsupported citations, and missed summary points
Evaluation records	Prepares a small set of test questions	Has a fixed evaluation set, baseline, failed samples, and improvement records
Safety boundaries	Does not automatically execute high-risk actions	Clearly defines read-only tools, human confirmation, max steps, and cost limits

This table upgrades the project from “can summarize materials” to a “trustworthy, traceable, and reviewable Agent system.”

A minimal Agent trace example

goal: Summarize the differences between RAG and long-context models
step 1: action=retrieve, arguments={query: "rag long context retrieval"}
observation: matched d1, d2
step 2: action=read_sources, arguments={source_ids: ["d1", "d2"]}
observation: read content related to grounding, precision, and ranking
step 3: action=summarize_with_citations
observation: generated 3 summary items, each with a source_id
step 4: action=check_citations
observation: 2 passed, 1 had insufficient evidence
final: return 2 trustworthy summaries and flag 1 for human review

The value of this trace is: if the final result has a problem, you can replay exactly which step went wrong, instead of only staring at the final answer and guessing.

Failure case library

The most common failure in a research assistant is not “completely unable to answer,” but “sounds reasonable yet is not trustworthy.” You should record at least the following failure types.

Retrieval miss Key materials never enter the candidate set. Check whether the query is too narrow, keywords mismatch, or top_k is too small. Improve with query rewrite, hybrid retrieval, expanded candidates, and reranking.
Incomplete reading The matched document is correct, but a key paragraph is missed. Check chunk size and context packing. Improve with parent-child retrieval or adjusted context assembly.
Over-generalized summary The summary sounds right but drops limiting conditions. Improve the prompt so each answer preserves claim, condition, and source.
Unsupported citation The claim and source do not match. Add citation checks and verify each claim one by one.
Looping calls The Agent keeps retrieving and never stops. Add maximum steps and stop when no new information appears.

Putting these failure cases into the project will show engineering ability much better than only showing success cases.

Evidence to Keep

Keep this page’s proof of learning as a small evidence card:

Project Goal: what the agent should accomplish and what it must not do
Baseline: single-agent loop before adding advanced features
Trace Pack: goal, plan, tool calls, observations, memory, evaluation
Failure Log: one failed or unsafe run with root cause
Deliverable: README, run command, trace screenshot/log, next step

Summary

The most important idea in this section is to establish a portfolio-grade judgment:

The real highlight of a research assistant project is not “it can summarize,” but “it can organize retrieval, summarization, and citation into output that is trustworthy, traceable, and reviewable.”

Once that is in place, the project will feel much more like a mature Agent project.

Suggested version roadmap

Version	Goal	Delivery Focus
Basic	Get the minimal loop working	Can input, process, and output, while keeping a set of examples
Standard	Turn it into a showcaseable project	Add configuration, logs, error handling, README, and screenshots
Challenge	Approach portfolio quality	Add evaluation, comparison experiments, failure analysis, and a next-step roadmap

It is recommended to finish the basic version first. Do not try to make it large and complete from the beginning. Each time you level up, write into the README what new capability was added, how it was validated, and what problems remain.

Exercises

Add another document to the example so that a query creates “competition among relevant documents.”
Think about why “citation accuracy” is more critical in a research assistant than in ordinary Q&A.
If a summary looks great but the source does not match, would you count it as success? Why?
If you turn this project into a portfolio piece, which 4 sections should the homepage show first?

Project reference and review notes

Add a second document that partially overlaps with the first but disagrees on one detail or covers a different subtopic. The expected result is that retrieval ranks both and the answer cites the right source for each claim.
Citation accuracy is critical because research assistants are judged by traceable evidence, not only fluent summaries. A wrong citation can make a true-sounding answer unusable.
Do not count it as success if the source does not match. The correct result is to flag the mismatch, revise the summary, or say the evidence is insufficient.
A portfolio homepage should first show the problem, the evidence/citation flow, the demo trace, and evaluation results. Those four sections prove it is more than a generic chatbot.

9.10.2 Project: Intelligent Research Assistant

Learning Objectives

First, narrow the project scope

Why?

What does the minimal closed loop of a portfolio-grade research assistant look like?

Recommended progression order

Let’s first look at a minimal research assistant example

Why is this example more valuable than a “project skeleton dataclass”?

Why is citation the make-or-break issue for this kind of project?

How should this project be evaluated?

Retrieval quality

Summary quality

Citation accuracy

A minimal evaluation data structure

The easiest pitfalls to fall into

Retrieval is correct, but the summary loses key points

The summary sounds smooth, but the sources do not match

The project only shows one answer that “looks smart”

How do you polish it into a portfolio-grade project?

Show four columns on the page

Prepare 5–10 fixed evaluation questions

List failure cases separately

What to include when delivering the project

Portfolio-grade Agent delivery standards

Recommended README structure

A minimal Agent trace example

Failure case library

Evidence to Keep

Summary

Suggested version roadmap

Exercises