Understanding BM25 Scores: Why Negative Numbers Are Totally Normal¶

Opening Intuition: What Is BM25 Trying to Do?¶

BM25 is a relevance scoring function. Its job is simple:

Given a query, how relevant is each document?

But BM25 doesn’t care about absolute numbers.

It only cares about relative ranking.

Think of BM25 like a judge in a talent show:

It doesn’t matter whether the judge scores contestants 1–10 or -10–0
What matters is who ranks higher

This is why negative scores are not a problem — they’re just part of the math.

Negative Scores Are Normal¶

BM25 scores can be positive, negative, or zero.

The only rule that matters:

Higher scores = better matches (even if they’re negative)

This can be confusing at first.

Interpreting BM25 Scores¶

Positive Scores¶

These happen when the query terms are rare across your collection.

Rare terms → high IDF → positive BM25
Example: Searching for “quantum” in a blog collection

Negative Scores¶

These happen when the query terms are very common.

Common terms → negative IDF → negative BM25
Example: Searching for “the” or “programming”

Zero Scores¶

These happen when:

The document contains none of the query terms
Or a matching term appears in exactly half of the documents, which makes its IDF exactly zero

Why Negative Scores Happen (The Math Intuition)¶

BM25 uses the IDF formula:

IDF = ln((N - n + 0.5) / (n + 0.5))

Where:

N = total number of documents
n = number of documents containing the term

If a term appears in more than half of your documents, the fraction becomes < 1, and:

ln(<1) = negative number
→ negative IDF
→ negative BM25

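This is expected and correct for this form of IDF. Some implementations, Lucene's BM25 among them, use ln(1 + (N - n + 0.5) / (n + 0.5)) instead, which never goes negative, so whether you actually see negative scores depends on the library.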
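To watch the sign flip happen, here is a minimal Python sketch of the IDF formula above. The collection size N = 100 and the sample values of n are made up for illustration.

```python
import math

def idf(N: int, n: int) -> float:
    """IDF as written above: ln((N - n + 0.5) / (n + 0.5))."""
    return math.log((N - n + 0.5) / (n + 0.5))

N = 100  # hypothetical collection size
for n in (1, 25, 50, 75, 100):
    # n < N/2 gives a positive value, n == N/2 gives zero, n > N/2 gives a negative value
    print(f"n={n:3d}  idf={idf(N, n):+.3f}")
```

The crossover sits at exactly half the collection: a term found in 50 of 100 documents makes the fraction 1, and ln(1) = 0.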

Mental Model: Think of IDF Like “Uniqueness Points”¶

Rare terms earn bonus points
Common terms earn penalty points
BM25 adds up all the bonuses and penalties

If all your query terms are common, you get a negative total, but the ranking still works.
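The "add up the bonuses and penalties" idea can be sketched in a few lines. This is an illustration rather than any particular library's code: the function name bm25_contributions, the tiny corpus, and the parameter values k1 = 1.5 and b = 0.75 are all assumptions chosen for the example, and the term-frequency weighting is the standard Okapi BM25 form.

```python
import math
from collections import Counter

def bm25_contributions(query, doc, corpus, k1=1.5, b=0.75):
    """Return each query term's contribution to the BM25 score of `doc`."""
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N  # average document length
    tf = Counter(doc)
    parts = {}
    for term in query:
        n = sum(1 for d in corpus if term in d)  # documents containing the term
        idf = math.log((N - n + 0.5) / (n + 0.5))
        f = tf[term]
        # Term-frequency weight with saturation (k1) and length normalization (b)
        weight = f * (k1 + 1) / (f + k1 * (1 - b + b * len(doc) / avgdl))
        parts[term] = idf * weight
    return parts

# Hypothetical, pre-tokenized corpus: "the" is everywhere, "quantum" is rare.
corpus = [
    ["the", "quantum", "computer"],
    ["the", "fast", "computer"],
    ["the", "old", "computer"],
    ["the", "new", "phone"],
]
parts = bm25_contributions(["the", "quantum"], corpus[0], corpus)
total = sum(parts.values())
print(parts)   # "quantum" earns a bonus, "the" earns a penalty
print(total)   # the document's score is simply the sum
```

Here the penalty for "the" outweighs the bonus for "quantum", so the total is negative even though the document is a perfectly good match.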

Example Breakdown¶

Document	"the"	"programming"	"language"	BM25
Doc 1	Yes	Yes	Yes	-5.2
Doc 2	Yes	Yes	No	-4.8
Doc 3	No	No	No	0.0

Ranking:

Doc 2 (best)
Doc 1
Doc 3 (worst)

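Even though Doc 2 has a negative score, it’s still the best match: -4.8 is higher than -5.2. In this example Doc 1 scores lower because it also contains “language”, another common term that adds its own penalty.

Doc 3 scores 0.0, which is numerically higher than both negative scores, but it matches none of the query terms, so it is filtered out of the results instead of being ranked first.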
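The ranking above can be reproduced with a small sketch: drop the documents that match nothing, then sort what is left by score. The scores and matched terms are copied from the table, and the data layout and variable names are hypothetical.

```python
# (name, query terms found in the document, BM25 score) taken from the table above
docs = [
    ("Doc 1", {"the", "programming", "language"}, -5.2),
    ("Doc 2", {"the", "programming"}, -4.8),
    ("Doc 3", set(), 0.0),
]

# Keep only documents that contain at least one query term
matched = [(name, score) for name, terms, score in docs if terms]
unmatched = [name for name, terms, _ in docs if not terms]

# Among matches, the higher score wins, even when every score is negative
ranked = [name for name, _ in sorted(matched, key=lambda item: item[1], reverse=True)]

print(ranked)     # ['Doc 2', 'Doc 1']
print(unmatched)  # ['Doc 3']
```

Filtering on matched terms, not on the score itself, is what keeps Doc 3's 0.0 from outranking the real matches.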

When You’ll See Each Score Type¶

Positive Scores¶

Rare terms
Technical vocabulary
Small document sets in which the term still appears in fewer than half of the documents

Negative Scores¶

Common words
Terms appearing in most documents
Broad/general vocabulary

Zero Scores¶

No matching terms
Completely unrelated content

Practical Experiments¶

1. Positive Score Experiment¶

Documents:

“Python is great”
“Java is fast”
“C++ is powerful”

Query: python → positive score for the first document (the term appears in only 1 of 3 documents); the other two score zero

2. Negative Score Experiment¶

Documents:

“Programming in Python”
“Programming in Java”
“Programming in C++”

Query: programming → negative scores for all three documents (the term appears in 3 of 3 documents)

3. Zero Score Experiment¶

Documents:

“Python programming”
“Java coding”
“C++ development”

Query: cooking recipes → zero for every document (no query term appears anywhere)
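The three experiments above can be run with a minimal BM25 sketch. It is a teaching implementation under stated assumptions, not a library API: the helper names tokenize and bm25_scores are hypothetical, the tokenizer just lowercases and splits on whitespace, and k1 = 1.5 and b = 0.75 are assumed defaults. A real library may tokenize or define IDF differently and report different numbers.

```python
import math
from collections import Counter

def tokenize(text):
    # Naive tokenizer (an assumption): lowercase, split on whitespace
    return text.lower().split()

def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Score every document against the query using the IDF formula from this page."""
    docs = [tokenize(d) for d in documents]
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in tokenize(query):
            n = sum(1 for d in docs if term in d)
            idf = math.log((N - n + 0.5) / (n + 0.5))
            f = tf[term]
            score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(doc) / avgdl))
        scores.append(score)
    return scores

exp1 = bm25_scores("python", ["Python is great", "Java is fast", "C++ is powerful"])
exp2 = bm25_scores("programming", ["Programming in Python", "Programming in Java", "Programming in C++"])
exp3 = bm25_scores("cooking recipes", ["Python programming", "Java coding", "C++ development"])

print([round(s, 3) for s in exp1])  # [0.511, 0.0, 0.0]
print([round(s, 3) for s in exp2])  # [-1.946, -1.946, -1.946]
print([round(s, 3) for s in exp3])  # [0.0, 0.0, 0.0]
```

In the second experiment all three documents tie, which is what you would expect: each one contains the query term once and all have the same length.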

BM25 vs TF‑IDF¶

Developers often ask how the two compare:

Feature	TF‑IDF	BM25
Term frequency	Linear	Saturates (diminishing returns)
Document length	Not handled well	Normalized
Ranking quality	Good	Usually better
Negative scores	No (with the usual log(N/n) IDF)	Yes (with the IDF formula above)
Search engines	Rarely	Very common

BM25 is essentially a smarter, more realistic TF‑IDF.
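The first two rows of the table can be made concrete with a short sketch of BM25's term-frequency factor. The function name bm25_tf, the parameter values k1 = 1.5 and b = 0.75, and the document lengths are assumptions for illustration.

```python
def bm25_tf(f, k1=1.5, b=0.75, dl=100, avgdl=100):
    """BM25 term-frequency factor: grows with f but levels off below k1 + 1."""
    return f * (k1 + 1) / (f + k1 * (1 - b + b * dl / avgdl))

# Linear TF keeps growing with the raw count; BM25's factor flattens out.
for f in (1, 2, 5, 10, 50):
    print(f"count={f:2d}  linear_tf={f:2d}  bm25_tf={bm25_tf(f):.3f}")

# Length normalization: the same count is worth less in a longer document.
average_doc = bm25_tf(2, dl=100)
long_doc = bm25_tf(2, dl=200)
print(average_doc > long_doc)  # True
```

Going from 1 to 2 occurrences moves the BM25 factor noticeably, while going from 10 to 50 barely changes it. That is the "diminishing returns" behavior in the table.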

Why BM25 Still Matters in 2026¶

Even with vector search, embeddings, and RAG:

BM25 is still a strong first‑stage retriever
It’s fast, cheap, and interpretable
It handles exact keyword matching better than embeddings
Hybrid search = BM25 + vectors → best of both worlds

You’ll encounter BM25 in many places:
Elasticsearch, OpenSearch, Vespa, Solr, and even WordPress plugins.

Final Takeaways¶

Negative BM25 scores are normal
Zero usually means no match
Among matching documents, higher scores always win, even if negative
BM25 is ranking-focused, not absolute-value-focused
Understanding IDF is the key to understanding everything else

BM25 doesn’t care whether scores are positive or negative — it only cares about which document is the best match.