
6.8.3 Project: Text Sentiment Analysis

Section Overview

Sentiment analysis is a good first NLP project because the hard parts are visible: label boundaries, tokenization, negation, sarcasm, mixed sentiment, and error analysis.

Learning Objectives

  • Define sentiment labels before choosing a model.
  • Build an interpretable keyword baseline.
  • Improve one known error type with a simple negation rule.
  • Turn wrong predictions into error buckets.
  • Package a small NLP project as a reproducible deliverable.

See the Project Loop First

The sentiment analysis project is a closed loop:

label boundary -> baseline -> predictions -> error buckets -> targeted upgrade

Start with binary labels:

  • positive: clearly recommends, praises, or expresses satisfaction.
  • negative: clearly complains, rejects, or expresses dissatisfaction.

Do not add extra labels such as neutral, mixed, irony, or unclear until the basic binary loop is stable.
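The label boundary can be written down as data rather than prose, so annotators and code share one source of truth. A minimal sketch (the names LABEL_RULES, DEFERRED, and is_in_scope are illustrative, not part of the lab script):

```python
# Label rules written as data: one rule string per label in scope.
LABEL_RULES = {
    "positive": "clearly recommends, praises, or expresses satisfaction",
    "negative": "clearly complains, rejects, or expresses dissatisfaction",
}

# Labels deliberately deferred until the binary loop is stable.
DEFERRED = ["neutral", "mixed", "irony", "unclear"]

def is_in_scope(label):
    """Accept only labels covered by the current rules."""
    return label in LABEL_RULES

print(is_in_scope("positive"))  # True
print(is_in_scope("neutral"))   # False
```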

Lab: Keyword Baseline and Negation Fix

Create sentiment_project_baseline.py:

from collections import Counter


def tokenize(text):
    """Lowercase, strip basic punctuation, split on whitespace."""
    text = text.lower()
    for ch in ",.!?":
        text = text.replace(ch, "")
    return text.split()


train = [
    ("clear examples and practical pace", "positive"),
    ("recommended and systematic course", "positive"),
    ("messy confusing and too fast", "negative"),
    ("unclear examples and weak structure", "negative"),
]

val = [
    ("clear and practical course", "positive"),
    ("messy and confusing pace", "negative"),
    ("not recommended", "negative"),
]

positive_words = Counter()
negative_words = Counter()

for text, label in train:
    if label == "positive":
        positive_words.update(tokenize(text))
    else:
        negative_words.update(tokenize(text))

# Hand-boost two strong cue words so they outweigh neutral tokens.
positive_words.update(["recommended"] * 2)
negative_words.update(["messy"] * 2)


def predict(text):
    """Score = sum over tokens of (positive count - negative count)."""
    score = sum(positive_words[t] - negative_words[t] for t in tokenize(text))
    return ("positive" if score >= 0 else "negative"), score


def predict_with_negation(text):
    """Same scoring, but a negator flips the next sentiment-bearing token."""
    score = 0
    flip = False

    for token in tokenize(text):
        if token in {"not", "no", "never"}:
            flip = True
            continue

        token_score = positive_words[token] - negative_words[token]
        if flip and token_score != 0:
            token_score *= -1
            flip = False

        score += token_score

    return ("positive" if score >= 0 else "negative"), score


print("sentiment_baseline")
for text, gold in val:
    pred, score = predict(text)
    print({"gold": gold, "pred": pred, "score": score, "text": text})

print("with_negation")
for text, gold in val:
    pred, score = predict_with_negation(text)
    print({"gold": gold, "pred": pred, "score": score, "text": text})

Run it:

python sentiment_project_baseline.py

Expected output:

sentiment_baseline
{'gold': 'positive', 'pred': 'positive', 'score': 3, 'text': 'clear and practical course'}
{'gold': 'negative', 'pred': 'negative', 'score': -3, 'text': 'messy and confusing pace'}
{'gold': 'negative', 'pred': 'positive', 'score': 3, 'text': 'not recommended'}
with_negation
{'gold': 'positive', 'pred': 'positive', 'score': 3, 'text': 'clear and practical course'}
{'gold': 'negative', 'pred': 'negative', 'score': -3, 'text': 'messy and confusing pace'}
{'gold': 'negative', 'pred': 'negative', 'score': -3, 'text': 'not recommended'}

What this teaches:

  • the baseline is explainable because every token changes the score;
  • "not recommended" fails before the negation rule is added;
  • a targeted rule fixes one error type without pretending to solve all language understanding.
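The before/after comparison can be summarized with a small accuracy helper. The two predictors below are simplified stand-ins for the lab script's predict and predict_with_negation, just to keep the sketch self-contained:

```python
def accuracy(predict_fn, examples):
    """Fraction of (text, gold) pairs where the prediction matches gold."""
    correct = sum(1 for text, gold in examples if predict_fn(text) == gold)
    return correct / len(examples)

# Stand-in predictors: the plain baseline misreads "not recommended";
# the negation-aware one flips the label when a negator is present.
def baseline(text):
    return "positive" if "recommended" in text or "clear" in text else "negative"

def with_negation(text):
    pred = baseline(text)
    if "not" in text.split() or "never" in text.split():
        return "negative" if pred == "positive" else "positive"
    return pred

val = [
    ("clear and practical course", "positive"),
    ("messy and confusing pace", "negative"),
    ("not recommended", "negative"),
]

print(accuracy(baseline, val))       # 2 of 3 correct
print(accuracy(with_negation, val))  # 3 of 3 correct
```

One number per predictor makes the targeted upgrade measurable instead of anecdotal.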

Error Buckets

Wrong cases should be grouped by type, not hidden.

error_buckets = {
    "negation": [],
    "sarcasm": [],
    "mixed_sentiment": [],
    "other": [],
}

examples = [
    ("Not recommended for this course", "negative", "positive"),
    ("Great, it got stuck again", "negative", "positive"),
    ("The content is great, but the pace is too fast", "negative", "positive"),
]

for text, gold, pred in examples:
    lower = text.lower()
    if "not" in lower:
        error_buckets["negation"].append(text)
    elif "great" in lower and "again" in lower:
        error_buckets["sarcasm"].append(text)
    elif "but" in lower:
        error_buckets["mixed_sentiment"].append(text)
    else:
        error_buckets["other"].append(text)

for name, rows in error_buckets.items():
    print(name, len(rows), rows)

This is project evidence. It shows what the model fails at and what you would improve next.
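Buckets also tell you what to fix next: sort them by size and take the largest as the upgrade target. A sketch (the bucket contents here are placeholders):

```python
error_buckets = {
    "negation": ["Not recommended for this course"],
    "sarcasm": ["Great, it got stuck again"],
    "mixed_sentiment": ["The content is great, but the pace is too fast"],
    "other": [],
}

# Largest bucket first: that is where one targeted rule buys the most.
# Python's sort is stable, so ties keep their original order.
ranked = sorted(error_buckets.items(), key=lambda kv: len(kv[1]), reverse=True)
for name, rows in ranked:
    print(name, len(rows))

next_target = ranked[0][0]
```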

Upgrade Path

Version            What to add                                     Why
rule baseline      keyword counts and negation rule                explainable starting point
traditional ML     TF-IDF + LogisticRegression                     stronger baseline with low cost
neural baseline    embedding + pooling or small Transformer        learned representations
portfolio version  error buckets, comparison table, demo command   shows engineering judgment
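The "traditional ML" step typically means scikit-learn's TfidfVectorizer plus LogisticRegression. The idea behind TF-IDF can be sketched in plain Python; this is a minimal version with no smoothing, not scikit-learn's exact formula:

```python
import math
from collections import Counter

docs = [
    "clear examples and practical pace",
    "messy confusing and too fast",
]

tokenized = [d.split() for d in docs]
n_docs = len(tokenized)

# Document frequency: in how many documents each token appears.
df = Counter(tok for doc in tokenized for tok in set(doc))

def tfidf(doc_tokens):
    """Term frequency times inverse document frequency for one document."""
    tf = Counter(doc_tokens)
    return {
        tok: (count / len(doc_tokens)) * math.log(n_docs / df[tok])
        for tok, count in tf.items()
    }

weights = tfidf(tokenized[0])
# "and" appears in every document, so idf = log(2/2) = 0 and its weight vanishes;
# distinctive tokens like "clear" keep a positive weight.
print(weights["and"])        # 0.0
print(weights["clear"] > 0)  # True
```

This is why TF-IDF beats raw keyword counts: uninformative tokens shared by all documents are weighted toward zero automatically, instead of needing hand-tuned boosts.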

What to Show in the README

Keep the README concrete:

  • label definitions;
  • dataset source and split;
  • run command;
  • baseline comparison table;
  • error buckets;
  • examples the model gets right and wrong;
  • next-step plan.

Common Mistakes

Mistake                                  Fix
labels are vague                         write label rules before training
only reporting accuracy                  include error buckets and examples
ignoring negation                        test "not", "never", and "no" cases
adding a deep model too early            keep a rule or TF-IDF baseline
hiding sarcasm/mixed sentiment errors    document them as known limitations

Exercises

  1. Add "not clear" and "never useful" to validation examples.
  2. Add an example that your bucket rules cannot classify, so it lands in the other bucket.
  3. Replace keyword counts with TF-IDF in your project plan.
  4. Write a label rule for neutral, but do not add it to the model yet.
  5. Create a README outline for this project.

Key Takeaways

  • Sentiment projects live or die by label boundaries and error analysis.
  • Simple baselines are useful because they are explainable.
  • Negation is a classic first failure type.
  • Error buckets make the project stronger than a single accuracy score.