11.4.2 Sequence Labeling Tasks

[Figure: BIO label to entity recovery diagram]

  • Understand the fundamental difference between sequence labeling and sentence-level classification
  • Understand why label schemes such as BIO / BIOES are commonly used
  • Use a runnable example to understand token-level labeling
  • Build the connection between sequence labeling and information extraction tasks

What Problem Is Sequence Labeling Solving?

It is not just deciding “what kind of sentence this is,” but “which part of the sentence is what”

For example, the sentence:

  • “Zhang San works at Peking University”

If you do text classification, you might only output:

  • This is a sentence about a person and an organization

But sequence labeling cares more about:

  • Zhang San is a person name
  • Peking University is an organization name

This distinction matters because many real-world applications need more than sentence-level understanding. They need to locate specific items such as:

  • person names
  • addresses
  • organization names
  • amounts
  • time expressions

In other words, they need the positions and boundaries of these specific spans.

Text classification is like putting a label on an entire article. Sequence labeling is like using a highlighter to circle important parts in the sentence.


Much of the information we want to extract is not a single word but a continuous span of several tokens. For example:

  • Shanghai Jiao Tong University
  • June 1, 2025

That is why common label schemes do not simply write:

  • PERSON
  • LOCATION

Instead, they write:

  • B-PER
  • I-PER
  • O

Here the prefixes mean:

  • B-: beginning of an entity
  • I-: inside an entity (continuing the entity that the preceding B- tag started)
  • O: not part of any entity

This lets the system distinguish more clearly:

  • where an entity starts
  • where it ends
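
As a minimal sketch of how such tags can be produced, the snippet below converts span annotations into BIO tags. The helper name `spans_to_bio`, the `(start, end, type)` span format with an exclusive end index, the tokenization, and the example sentence are all assumptions made for illustration, not part of any particular library.

```python
def spans_to_bio(num_tokens, spans):
    """Convert (start, end, entity_type) token spans into BIO tags.

    `end` is exclusive. Spans are assumed not to overlap.
    """
    tags = ["O"] * num_tokens
    for start, end, entity_type in spans:
        tags[start] = f"B-{entity_type}"      # first token of the span
        for i in range(start + 1, end):
            tags[i] = f"I-{entity_type}"      # remaining tokens of the span
    return tags


# Hypothetical sentence built from the two example spans above
tokens = ["Shanghai", "Jiao", "Tong", "University", "opened", "on",
          "June", "1", ",", "2025"]
spans = [(0, 4, "ORG"), (6, 10, "TIME")]

tags = spans_to_bio(len(tokens), spans)
for tok, tag in zip(tokens, tags):
    print(tok, tag)
```

Each multi-token span gets exactly one B- tag followed by I- tags, so the start is explicit and the end is wherever the run of matching I- tags stops.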

tokens = ["Zhang San", "works at", "Peking", "University", "today"]
tags = ["B-PER", "O", "B-ORG", "I-ORG", "O"]

for tok, tag in zip(tokens, tags):
    print(tok, tag)

Expected output:

Zhang San B-PER
works at O
Peking B-ORG
University I-ORG
today O

The token list and the tag list have the same length. That one-to-one alignment is the first thing to verify in every sequence-labeling dataset.
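
One possible way to run that check over a whole dataset is sketched below. The function name `check_alignment` and the dataset layout (a list of `(tokens, tags)` pairs) are assumptions for illustration; the second example is deliberately broken.

```python
def check_alignment(dataset):
    """Return the indices of examples whose token and tag lists differ in length."""
    bad = []
    for idx, (tokens, tags) in enumerate(dataset):
        if len(tokens) != len(tags):
            bad.append(idx)
    return bad


dataset = [
    (["Zhang San", "works at", "Peking", "University", "today"],
     ["B-PER", "O", "B-ORG", "I-ORG", "O"]),
    (["Zhang San", "works at", "Peking", "University"],
     ["B-PER", "O", "B-ORG"]),  # deliberately misaligned: 4 tokens, 3 tags
]

misaligned = check_alignment(dataset)
print(misaligned)  # [1]
```

An explicit check is worth having because `zip` stops at the shorter input, so a loop like `for tok, tag in zip(tokens, tags)` silently drops the unmatched tokens instead of raising an error. In Python 3.10 and later, `zip(tokens, tags, strict=True)` raises a `ValueError` on a length mismatch.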

What is the most important thing in this example?

It shows you:

  • a sequence input
  • a corresponding sequence output

This is the most essential form of sequence labeling:

Input a sequence of tokens, output a sequence of labels of the same length.

Why are “Peking” and “University” labeled B-ORG / I-ORG?

Because the goal here is to express:

  • this is one continuous entity

not two separate entities.
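
A small sketch makes the difference concrete. Assuming a well-formed BIO sequence (every I- tag follows a matching B- or I- tag), the number of entities equals the number of B- tags. The helper name `count_entities` is hypothetical.

```python
def count_entities(tags):
    """Count entities in a well-formed BIO sequence by counting B- tags."""
    return sum(1 for tag in tags if tag.startswith("B-"))


one_org = ["B-ORG", "I-ORG"]   # "Peking University" as one organization
two_orgs = ["B-ORG", "B-ORG"]  # two adjacent one-token organizations

print(count_entities(one_org))   # 1
print(count_entities(two_orgs))  # 2
```

With a flat scheme that only wrote ORG on both tokens, these two cases would produce the same label sequence and could not be told apart.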


The following example recovers entity spans from tokens and their BIO labels.

tokens = ["Zhang San", "works at", "Peking", "University", "today"]
tags = ["B-PER", "O", "B-ORG", "I-ORG", "O"]


def decode_entities(tokens, tags):
    """Recover (entity_text, entity_type) pairs from tokens and BIO tags."""
    entities = []
    current_tokens = []
    current_type = None

    for token, tag in zip(tokens, tags):
        if tag == "O":
            # An "O" tag closes any entity that is still open
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
                current_tokens = []
                current_type = None
            continue

        prefix, entity_type = tag.split("-", 1)
        if prefix == "B":
            # A "B-" tag closes the open entity, then starts a new one
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens = [token]
            current_type = entity_type
        elif prefix == "I" and current_type == entity_type:
            # A matching "I-" tag extends the open entity
            current_tokens.append(token)
        else:
            # If the label is invalid, simply cut off and restart
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens = [token]
            current_type = entity_type

    # Flush an entity that runs to the end of the sentence
    if current_tokens:
        entities.append((" ".join(current_tokens), current_type))
    return entities


print(decode_entities(tokens, tags))

Expected output:

[('Zhang San', 'PER'), ('Peking University', 'ORG')]

This is the step that turns token-level labels into the project output people actually use: entity text plus entity type.

This step matters because it connects the “labeling task” with the “extraction result.” In real systems, what we usually care about is not the labels themselves, but:

  • entity spans
  • entity types

What Is the Relationship Between Sequence Labeling and Information Extraction?

The most classic sequence labeling task is:

  • named entity recognition

Sequence labeling can also be used for:

  • slot filling
  • keyword extraction
  • event trigger identification

So it is a “foundational skill” for information extraction

Many extraction systems become more complex later, but the most basic first step is often still:

  • mark the key spans first
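
To illustrate how the same tagging idea carries over to slot filling, here is a minimal sketch. The sentence, the slot names `FROM` and `TO`, and the helper `extract_slots` are all made up for this example, and the code assumes each slot appears at most once per sentence.

```python
def extract_slots(tokens, tags):
    """Collect slot values from BIO-tagged tokens into a {slot_name: text} dict."""
    slots = {}
    current_type = None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            current_type = tag[2:]
            slots[current_type] = token            # start a new slot value
        elif tag.startswith("I-") and current_type == tag[2:]:
            slots[current_type] += " " + token     # extend the open slot value
        else:
            current_type = None                    # "O" or a stray I- tag closes the slot
    return slots


tokens = ["book", "a", "flight", "from", "Boston", "to", "New", "York"]
tags = ["O", "O", "O", "O", "B-FROM", "O", "B-TO", "I-TO"]

slots = extract_slots(tokens, tags)
print(slots)  # {'FROM': 'Boston', 'TO': 'New York'}
```

Only the label inventory changes between named entity recognition and slot filling here; marking spans first and reading them back stays the same.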

As proof of learning for this page, record a small evidence card:

  • Schema: entity types, BIO tags, or sequence-label rules
  • Prediction: token-level labels and extracted spans
  • Metric: entity precision/recall/F1 and boundary cases
  • Failure Check: span boundary, nested entity, unknown word, or inconsistent annotation
  • Expected Output: gold-vs-predicted span table with at least one miss
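
The Metric and Expected Output rows can be sketched as follows. This assumes exact-match scoring, a common convention in which a predicted entity counts as correct only if its start, end, and type all match a gold entity. The span format `(start, end, type)` with an exclusive end, the helper name `entity_prf`, and the predicted spans are hypothetical.

```python
def entity_prf(gold, predicted):
    """Exact-match entity precision, recall, and F1 over sets of spans."""
    gold_set, pred_set = set(gold), set(predicted)
    tp = len(gold_set & pred_set)  # spans that match in start, end, and type
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Token indices refer to ["Zhang San", "works at", "Peking", "University", "today"]
gold = [(0, 1, "PER"), (2, 4, "ORG")]
predicted = [(0, 1, "PER"), (2, 3, "ORG")]  # made-up prediction with a boundary error

# Gold-vs-predicted span table
for span in sorted(set(gold) | set(predicted)):
    if span in gold and span in predicted:
        status = "hit"
    elif span in gold:
        status = "miss"
    else:
        status = "spurious"
    print(span, status)

precision, recall, f1 = entity_prf(gold, predicted)
print(precision, recall, f1)  # 0.5 0.5 0.5
```

The boundary error costs twice under exact matching: the gold span (2, 4, "ORG") is a miss and the predicted span (2, 3, "ORG") is spurious, even though three of the five token-level labels around it are right.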

Mistake 1: Treating sequence labeling like ordinary classification

The biggest difference from sentence-level classification is:

  • the output is sequence-aligned: one label per token, not one label per sentence

Mistake 2: Only looking at labels and ignoring boundary recovery

Real systems care about the final extracted entity spans more than about the label table itself.

Mistake 3: Designing the label scheme casually

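If the label scheme is inconsistent (for example, the same entity type spelled in different ways, or unclear rules for where a span begins), both model training and evaluation become unreliable.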
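
One way to keep a scheme disciplined is to define it once in code and check every annotated tag against it. The sketch below is an illustration under assumptions: the entity type list `ENTITY_TYPES` and the helpers `build_label_set` and `unknown_tags` are hypothetical names, and the sample tags contain deliberate spelling errors.

```python
ENTITY_TYPES = ["PER", "ORG", "TIME"]  # assumed inventory for this example


def build_label_set(entity_types):
    """Build the full BIO label list: O plus B-/I- for every entity type."""
    labels = ["O"]
    for entity_type in entity_types:
        labels += [f"B-{entity_type}", f"I-{entity_type}"]
    return labels


def unknown_tags(tags, labels):
    """Return (position, tag) pairs for tags that are not in the label set."""
    allowed = set(labels)
    return [(i, tag) for i, tag in enumerate(tags) if tag not in allowed]


labels = build_label_set(ENTITY_TYPES)
print(labels)

# "B-Org" and "o" are deliberate annotation typos
problems = unknown_tags(["B-PER", "O", "B-Org", "I-ORG", "o"], labels)
print(problems)  # [(2, 'B-Org'), (4, 'o')]
```

This only checks the tag vocabulary. Whether the order of tags is valid (for example, an I- tag with no preceding B- tag) is a separate check.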


The most important takeaway from this lesson is to build one core intuition:

The core of sequence labeling is to assign labels to each token in the input sequence, so that the key spans and boundaries inside the sentence can be recovered.

Once this intuition is solid, learning NER, BiLSTM+CRF, and information extraction projects later will go much more smoothly.


  1. Add another time entity to the example, such as 2025, and write a BIO label sequence yourself.
  2. Why is the key role of the BIO label scheme to express entity boundaries?
  3. Explain in your own words: what is the biggest difference between sequence labeling and text classification?
  4. Think about this: if an invalid I-XXX appears in the label sequence, how should the system handle it more robustly?

Reference implementation and walkthrough

  1. For 2025, use B-TIME if it is a one-token time entity; use B-TIME I-TIME ... only when the entity spans multiple tokens.
  2. BIO expresses boundaries by marking where an entity begins and which following tokens continue the same entity.
  3. Sequence labeling outputs one label per token, while text classification outputs one label for the whole text.
  4. An invalid I-XXX should be repaired or rejected by post-processing, logged as an error, and traced back to training labels or decoding rules.
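
For the fourth answer, one possible repair policy is sketched below: an I- tag that does not continue an entity of the same type is promoted to a B- tag and logged. This is only one reasonable choice, not the required one; the function name `repair_bio` and the logger name are assumptions. A stricter system might reject the sequence instead.

```python
import logging

logger = logging.getLogger("bio_repair")  # hypothetical logger name


def repair_bio(tags):
    """Promote orphan I- tags to B- tags and log each repair."""
    repaired = []
    prev_type = None  # entity type of the previous tag, or None after "O"
    for i, tag in enumerate(tags):
        if tag.startswith("I-"):
            entity_type = tag[2:]
            if prev_type != entity_type:
                # Orphan I- tag: no open entity of this type to continue
                logger.warning("position %d: %s promoted to B-%s",
                               i, tag, entity_type)
                tag = f"B-{entity_type}"
            prev_type = entity_type
        elif tag.startswith("B-"):
            prev_type = tag[2:]
        else:
            prev_type = None  # "O" (or any other tag) closes the open entity
        repaired.append(tag)
    return repaired


repaired = repair_bio(["O", "I-ORG", "I-ORG", "O", "B-PER", "I-ORG"])
print(repaired)  # ['O', 'B-ORG', 'I-ORG', 'O', 'B-PER', 'B-ORG']
```

The logged positions are the part to trace back: frequent repairs at the same kind of position usually point to inconsistent training labels or to a decoder that does not enforce valid transitions.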