Understanding BM25 Scores: Why Negative Numbers Are Totally Normal¶
Opening Intuition: What Is BM25 Trying to Do?¶
BM25 is a relevance scoring function. Its job is simple:
Given a query, how relevant is each document?
But BM25 doesn’t care about absolute numbers.
It only cares about relative ranking.
Think of BM25 like a judge in a talent show:
- It doesn’t matter whether the judge scores contestants 1–10 or -10–0
- What matters is who ranks higher
This is why negative scores are not a problem — they’re just part of the math.
Negative Scores Are Normal¶
BM25 scores can be positive, negative, or zero.
The only rule that matters:
Higher scores = better matches (even if they’re negative)
This can be confusing at first.
Interpreting BM25 Scores¶
Positive Scores¶
These happen when the query terms are rare across your collection.
- Rare terms → high IDF → positive BM25
- Example: Searching for “quantum” in a blog collection
Negative Scores¶
These happen when the query terms are very common.
- Common terms → negative IDF → negative BM25
- Example: Searching for “the” or “programming”
Zero Scores¶
These happen when:
- The document contains none of the query terms
- Or the contributions cancel out, e.g. a term that appears in exactly half of the documents has an IDF of ln(1) = 0
Why Negative Scores Happen (The Math Intuition)¶
BM25 uses the IDF formula:
$$
\text{IDF} = \ln\left(\frac{N - n + 0.5}{n + 0.5}\right)
$$
Where:
- N = total number of documents
- n = number of documents containing the term
If a term appears in more than half of your documents, the fraction becomes < 1, and:
- ln(<1) = negative number
- → negative IDF
- → negative BM25
This is expected and correct.
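To see the sign flip for yourself, here is a minimal sketch that assumes the classic BM25 IDF, ln((N - n + 0.5) / (n + 0.5)). The function name `classic_idf` and the collection size of 100 are made up for illustration.

```python
import math

def classic_idf(N: int, n: int) -> float:
    """Classic BM25 IDF (assumed form): ln((N - n + 0.5) / (n + 0.5))."""
    return math.log((N - n + 0.5) / (n + 0.5))

N = 100  # hypothetical collection size
for n in (1, 10, 50, 51, 90, 100):
    idf = classic_idf(N, n)
    if idf > 0:
        sign = "positive"
    elif idf < 0:
        sign = "negative"
    else:
        sign = "zero"
    # n = number of documents containing the term
    print(f"n={n:3d}  idf={idf:+.3f}  ({sign})")
```

The IDF is positive while the term appears in fewer than half of the documents, hits zero at exactly half, and turns negative beyond that.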
Mental Model: Think of IDF Like “Uniqueness Points”¶
- Rare terms earn bonus points
- Common terms earn penalty points
- BM25 adds up all the bonuses and penalties
If all your query terms are common, you get a negative total, but the ranking still works.
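Written out, the "adding up" looks roughly like the standard BM25 sum below. The symbols are the usual textbook ones rather than anything defined in this article: f(t, D) is how often term t occurs in document D, |D| is the document length, avgdl is the average document length, and k_1 and b are tuning parameters (values around 1.2 to 2.0 and 0.75 are common defaults).

$$
\text{score}(D, Q) = \sum_{t \in Q} \text{IDF}(t) \cdot \frac{f(t, D)\,(k_1 + 1)}{f(t, D) + k_1\left(1 - b + b\,\frac{|D|}{\text{avgdl}}\right)}
$$

The fraction is never negative, so the sign of each term's contribution comes entirely from its IDF. That is why a query made only of common terms ends up with a negative total.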
Example Breakdown¶
| Document | "the" | "programming" | "language" | BM25 |
|---|---|---|---|---|
| Doc 1 | Yes | Yes | Yes | -5.2 |
| Doc 2 | Yes | Yes | No | -4.8 |
| Doc 3 | No | No | No | 0.0 |
Ranking:
- Doc 2 (best)
- Doc 1
- Doc 3 (worst)
Even though Doc 2 has a negative score, it’s still the best match.
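Doc 3 scores 0.0 only because it contains none of the query terms. Non-matching documents like this are filtered out of the results, even though 0 is numerically greater than the negative scores.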
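One way to express this ranking rule in code is sketched below. The scores and term matches are copied from the table; the `rank` helper is a hypothetical name, and real search engines usually do this filtering for you by only scoring documents that match.

```python
# Scores and matched-term counts taken from the example table.
scores = {"Doc 1": -5.2, "Doc 2": -4.8, "Doc 3": 0.0}
matched_terms = {"Doc 1": 3, "Doc 2": 2, "Doc 3": 0}

def rank(scores, matched_terms):
    """Drop documents with no matching terms, then sort by score, highest first."""
    hits = [doc for doc in scores if matched_terms[doc] > 0]
    return sorted(hits, key=lambda doc: scores[doc], reverse=True)

ranking = rank(scores, matched_terms)
print(ranking)
```

Filtering on "did any term match" rather than on the score itself is the safer choice here, because a score of zero does not always mean no match.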
When You’ll See Each Score Type¶
Positive Scores¶
- Rare terms
- Technical vocabulary
- Small document sets
Negative Scores¶
- Common words
- Terms appearing in most documents
- Broad/general vocabulary
Zero Scores¶
- No matching terms
- Completely unrelated content
Practical Experiments¶
1. Positive Score Experiment¶
Documents:
- “Python is great”
- “Java is fast”
- “C++ is powerful”
Query: python → positive score
2. Negative Score Experiment¶
Documents:
- “Programming in Python”
- “Programming in Java”
- “Programming in C++”
Query: programming → negative scores
3. Zero Score Experiment¶
Documents:
- “Python programming”
- “Java coding”
- “C++ development”
Query: cooking recipes → zero
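The three experiments can be reproduced with a small hand-rolled scorer. This is a minimal sketch, assuming the classic IDF ln((N - n + 0.5) / (n + 0.5)), whitespace tokenization, and k1 = 1.5, b = 0.75. The names `tokenize` and `bm25_scores` are illustrative, and library implementations may differ (some clamp or shift negative IDF values).

```python
import math
from collections import Counter

def tokenize(text):
    # Naive tokenizer: lowercase and split on whitespace.
    return text.lower().split()

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one BM25 score per document, using the classic (unclamped) IDF."""
    tokenized = [tokenize(d) for d in docs]
    N = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / N
    # Document frequency: in how many documents each term appears.
    df = Counter(term for d in tokenized for term in set(d))
    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in tokenize(query):
            if tf[term] == 0:
                continue  # term absent from this document: contributes nothing
            idf = math.log((N - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

exp1 = bm25_scores("python", ["Python is great", "Java is fast", "C++ is powerful"])
exp2 = bm25_scores("programming", ["Programming in Python", "Programming in Java", "Programming in C++"])
exp3 = bm25_scores("cooking recipes", ["Python programming", "Java coding", "C++ development"])

print("1. positive:", exp1)
print("2. negative:", exp2)
print("3. zero:    ", exp3)
```

In experiment 1 only the first document gets a positive score, in experiment 2 every document gets the same negative score, and in experiment 3 nothing matches, so every score is zero.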
BM25 vs TF‑IDF¶
Developers often ask how the two compare:
| Feature | TF‑IDF | BM25 |
|---|---|---|
| Term frequency | Linear | Saturates (diminishing returns) |
| Document length | Not handled well | Normalized |
| Ranking quality | Good | Better |
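| Negative scores | Not with the standard log(N/n) IDF | Yes with the classic IDF; some implementations (e.g. Lucene) add 1 inside the log to avoid them |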
| Search engines | Rarely | Very common |
BM25 is essentially a smarter, more realistic TF‑IDF.
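The "saturation" and "length normalization" rows are easier to see with numbers. The sketch below compares a raw, linear term-frequency weight with the BM25 term-frequency component. The function names, k1 = 1.5, b = 0.75, and the document lengths are assumptions for illustration, and many TF-IDF variants dampen the term frequency (for example with a logarithm) rather than using it raw.

```python
def linear_tf(tf):
    # Raw TF-IDF style weight: grows without bound as tf grows.
    return float(tf)

def bm25_tf(tf, k1=1.5, b=0.75, doc_len=100, avgdl=100):
    # BM25 term-frequency component: approaches k1 + 1 as tf grows.
    return tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avgdl))

for tf in (1, 2, 5, 10, 100):
    print(f"tf={tf:3d}  linear={linear_tf(tf):6.1f}  bm25={bm25_tf(tf):.3f}")

# Same term count in a document twice as long as average: lower weight.
print("average-length doc:", round(bm25_tf(10, doc_len=100), 3))
print("double-length doc: ", round(bm25_tf(10, doc_len=200), 3))
```

With these settings the BM25 weight can never exceed k1 + 1 = 2.5, no matter how often the term repeats, while the linear weight keeps growing.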
Why BM25 Still Matters in 2026¶
Even with vector search, embeddings, and RAG:
- BM25 is still a strong, widely used first‑stage retriever
- It’s fast, cheap, and interpretable
- It handles exact keyword matching better than embeddings
- Hybrid search = BM25 + vectors → best of both worlds
Your students will encounter BM25 everywhere:
Elasticsearch, OpenSearch, Vespa, Solr, and even WordPress plugins.
Final Takeaways¶
- Negative BM25 scores are normal
- Zero means no match
- Among documents that match, higher scores always win, even if negative
- BM25 is ranking-focused, not absolute-value-focused
- Understanding IDF is the key to understanding everything else
BM25 doesn’t care whether scores are positive or negative — it only cares about which document is the best match.