
9.6.4 LlamaIndex

Section positioning

If LangGraph is more like a “state flow and workflow orchestration framework,” then LlamaIndex is more like:

A framework centered on organizing knowledge and documents.

It is especially well suited not for “multi-role collaboration” itself, but for:

  • how to organize documents after they come in
  • how to split them into chunks
  • how to build indexes
  • how to retrieve information
  • how to turn that into a Q&A entry point

Learning objectives

  • Understand LlamaIndex’s core abstract objects
  • Understand why it is especially suitable for knowledge and document scenarios
  • Understand the chain: Document -> Node -> Index -> Retriever -> Query Engine
  • Build judgment for when to prioritize LlamaIndex

Why are many LLM projects actually “knowledge system projects” first?

Not all systems are solving conversation problems

The core of many real-world LLM applications is not chatting, but:

  • enterprise knowledge base Q&A
  • document retrieval
  • research material integration
  • assisted report generation

What these tasks have in common is:

The way knowledge is organized directly determines system quality.

This is exactly where LlamaIndex is most valuable

It does not just ask, “How do we tune the model?” Instead, it asks:

  • how documents enter the system
  • how information is split
  • how retrieval structures are built
  • how queries are organized

So a very practical way to think about it is:

LlamaIndex is more like a knowledge system framework than a pure workflow framework.


First, distinguish the most important concepts

Document

The raw, original unit of knowledge. For example:

  • an article
  • a PDF
  • a piece of webpage content

Node

A smaller unit after a Document has been split. In many knowledge systems, what is actually used for retrieval is often not the whole document, but a finer-grained node.

Index

The way these nodes are organized into a queryable structure.

Retriever

Responsible for finding the relevant nodes based on the user query.

Query Engine

A higher-level layer that combines “query -> retrieval -> result organization” into a more complete unit.

Remember this one sentence first:

Documents are the raw material, nodes are the cut-up raw material, indexes are the storage structure, retrievers find the items, and query engines present the items to the user.


First, go through this chain with pure Python

Document -> Node

documents = [
    {"id": "doc1", "text": "You can request a refund within 7 days after purchase if your learning progress is below 20%."},
    {"id": "doc2", "text": "You can receive a certificate after completing all projects and passing the test."}
]

nodes = []
for doc in documents:
    nodes.append({
        "doc_id": doc["id"],
        "text": doc["text"]
    })

print(nodes)

Expected output:

[{'doc_id': 'doc1', 'text': 'You can request a refund within 7 days after purchase if your learning progress is below 20%.'}, {'doc_id': 'doc2', 'text': 'You can receive a certificate after completing all projects and passing the test.'}]

Although this example is simple, it already expresses a core idea:

Original documents are usually not used directly for Q&A. They are first turned into knowledge units that are more suitable for indexing and retrieval.
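In practice a single Document is often much longer than one Node, so the splitting step typically produces several nodes per document. A minimal sketch of that idea, assuming a fixed character-based chunk size (the size of 40 and the `chunk` field are illustrative choices; real systems usually split on sentences or tokens, often with overlap):

```python
# Split one long document into several smaller nodes.
# chunk_size=40 is an illustrative assumption, not a recommended value.

def split_into_nodes(doc, chunk_size=40):
    text = doc["text"]
    nodes = []
    for i in range(0, len(text), chunk_size):
        nodes.append({
            "doc_id": doc["id"],
            "chunk": i // chunk_size,   # position of this node within the document
            "text": text[i:i + chunk_size],
        })
    return nodes

long_doc = {
    "id": "doc3",
    "text": "LlamaIndex organizes documents into nodes. Nodes are indexed. Indexes support retrieval."
}
for node in split_into_nodes(long_doc):
    print(node)
```

Note that the nodes keep a `doc_id` pointing back to the source document; this is what later makes it possible to cite sources in answers.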


Why is “document ingestion” the first step in a knowledge system?

Raw documents are usually messy

Real documents may contain:

  • headers and footers
  • repeated paragraphs
  • table noise
  • very long paragraphs

If you do not handle these well first, retrieval later often gets worse too.

So the most common first step in a knowledge system is not “tune the model”

It is:

  • read the documents
  • clean them
  • split them
  • add metadata

That is why frameworks like LlamaIndex put so much emphasis on ingestion.
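The cleaning step above can be sketched in a few lines. This is a toy example that assumes the noise is a repeated header plus blank lines; real ingestion pipelines are format-specific (PDF, HTML, tables) and considerably more involved:

```python
# Minimal cleaning step: drop blank lines and a known repeated header
# before the text is split into nodes. The header string is an
# illustrative assumption.

def clean_document(raw_text, header="CONFIDENTIAL"):
    lines = []
    for line in raw_text.splitlines():
        line = line.strip()
        if not line or line == header:
            continue  # skip blank lines and the repeated header
        lines.append(line)
    return " ".join(lines)

raw = "CONFIDENTIAL\nRefund policy:\n\nRefunds within 7 days.\nCONFIDENTIAL"
print(clean_document(raw))
```

If noise like this survives into the nodes, it competes with real content at retrieval time, which is exactly the failure mode described above.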


Why are indexing and retrieval at its center?

Because knowledge applications fear one thing most: “the documents are there, but the system can’t find them”

If:

  • there are many documents
  • there are many nodes
  • questions are expressed very flexibly

then without a good indexing and retrieval layer, even a very strong model will be dragged down.

A minimal retrieval example

If you want to run this snippet locally, install scikit-learn first.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

node_texts = [node["text"] for node in nodes]
vectorizer = TfidfVectorizer(token_pattern=r"(?u)\b\w+\b")
index_matrix = vectorizer.fit_transform(node_texts)

def retrieve(query):
    query_vec = vectorizer.transform([query])
    scores = cosine_similarity(query_vec, index_matrix)[0]
    best_idx = scores.argmax()
    return nodes[best_idx]

print(retrieve("What is the refund policy?"))

Expected output:

{'doc_id': 'doc1', 'text': 'You can request a refund within 7 days after purchase if your learning progress is below 20%.'}

What abstract ideas does this code really correspond to?

It already corresponds to:

  • node
  • index
  • retriever

In other words, much of LlamaIndex’s value is essentially about organizing this knowledge chain more systematically.


Why is it worth separating out the Query Engine?

Because Q&A is not just “return the most similar paragraph”

In a real system, you often still need to decide:

  • how many results to return
  • whether to summarize them
  • whether to include sources
  • whether to call the model again

At that point, a “query engine” looks more like a system-level abstraction than a single retriever.

A very simple Query Engine example

def query_engine(query):
    node = retrieve(query)
    return {
        "answer": node["text"],
        "source": node["doc_id"]
    }

print(query_engine("What is the refund policy?"))

Expected output:

{'answer': 'You can request a refund within 7 days after purchase if your learning progress is below 20%.', 'source': 'doc1'}

This example is teaching you:

Retrieval is only the middle layer. In the end, you still need a layer that organizes the result into a user-facing query interface.
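Two of the decisions listed above, how many results to return and whether to include sources, can be sketched as a top-k variant. To keep the snippet dependency-free, retrieval here is a simple word-overlap score rather than TF-IDF; the `k` parameter and the output shape are illustrative choices, not LlamaIndex APIs:

```python
# A query engine that returns the top-k nodes together with their sources.
# Scoring is naive word overlap, used only to keep the example self-contained.

nodes = [
    {"doc_id": "doc1", "text": "You can request a refund within 7 days after purchase if your learning progress is below 20%."},
    {"doc_id": "doc2", "text": "You can receive a certificate after completing all projects and passing the test."},
]

def score(query, text):
    # count how many query words appear in the node text
    return len(set(query.lower().split()) & set(text.lower().split()))

def query_engine_topk(query, k=2):
    ranked = sorted(nodes, key=lambda n: score(query, n["text"]), reverse=True)[:k]
    return {
        "answers": [n["text"] for n in ranked],
        "sources": [n["doc_id"] for n in ranked],
    }

print(query_engine_topk("What is the refund policy?", k=1))
```

The point is not the scoring function but the shape of the interface: the query engine decides how many results come back and attaches sources, on top of whatever retriever sits underneath.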


What is the most important difference between LlamaIndex and LangGraph?

If we summarize it very roughly, remember this:

  • LangGraph is more about “how task states flow”
  • LlamaIndex is more about “how knowledge is organized”

Of course, you can mix them in real projects, but their first concerns are indeed different.

So if the essence of your project is:

  • document Q&A
  • knowledge base assistant
  • RAG main pipeline

then abstractions like LlamaIndex will usually feel more natural.


When is LlamaIndex not necessarily the main focus?

If your system is more about:

  • multi-Agent collaboration
  • complex loops
  • explicit state machines

then LlamaIndex may not be the “main framework,” but rather a knowledge-layer component.

So do not think of it as a “universal Agent framework.” Instead, think of it as:

A framework that is especially convenient for knowledge and retrieval problems.


Common mistakes beginners make

Looking only at the model and ignoring document ingestion

Many knowledge system problems actually come from the document entry point.

Thinking that once indexing is done, the Q&A system is complete

An index is only the middle layer, not the end product.

Not understanding its boundary with workflow-oriented frameworks

This makes it easy to expect it to solve problems that are not its strongest area.


Summary

The most important thing in this section is not memorizing LlamaIndex APIs, but understanding:

The value of LlamaIndex lies in organizing document knowledge from raw text into a structure that can be retrieved, cited, and queried.

Once you view it as a “knowledge organization framework” rather than a “universal framework,” many judgments become much clearer.


Exercises

  1. Explain in your own words what Document, Node, Index, Retriever, and Query Engine each represent and how they relate.
  2. Think about why the quality of document ingestion directly affects retrieval results later.
  3. Recreate 3 nodes using your own knowledge base data and run the retrieval example again.
  4. Explain: if the main pipeline of the system is multi-Agent collaboration rather than knowledge retrieval, why might LlamaIndex not be the best choice as the “main framework”?