Understanding BM25 Scores: Why Negative Numbers Are Totally Normal
Opening Intuition: What Is BM25 Trying to Do?
BM25 is a relevance scoring function. Its job is simple:
Given a query, how relevant is each document?
But BM25 doesn’t care about absolute numbers.
It only cares about relative ranking.
Think of BM25 like a judge in a talent show:
- It doesn’t matter whether the judge scores contestants 1–10 or -10–0
- What matters is who ranks higher
This is why negative scores are not a problem — they’re just part of the math.
Negative Scores Are Normal
BM25 scores can be positive, negative, or zero.
The only rule that matters:
Higher scores = better matches (even if they’re negative)
This is the part that confuses beginners, so emphasize it early and often.
Interpreting BM25 Scores
Positive Scores
These happen when the matched query terms are rare across your collection, meaning each appears in fewer than half of the documents.
- Rare terms → high IDF → positive BM25
- Example: Searching for “quantum” in a blog collection
Negative Scores
These happen when the matched query terms are very common, appearing in more than half of the documents.
- Common terms → negative IDF → negative BM25
- Example: Searching for “the” or “programming”
Zero Scores
These happen when:
- The document contains none of the query terms
- Or a matched term appears in exactly half of the documents (its IDF is ln(1) = 0), or positive and negative term contributions happen to cancel out
Why Negative Scores Happen (The Math Intuition)
BM25 uses the IDF formula:
$$\mathrm{IDF} = \ln\left(\frac{N - n + 0.5}{n + 0.5}\right)$$
Where:
- N = total number of documents
- n = number of documents containing the term
If a term appears in more than half of your documents, the fraction becomes < 1, and:
- ln(<1) = negative number
- → negative IDF
- → negative BM25
This is expected and correct.
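The sign flip is easy to check numerically. The sketch below assumes the classic BM25 IDF with 0.5 smoothing, ln((N - n + 0.5) / (n + 0.5)); the function name `bm25_idf` and the collection size of 100 are made up for illustration.

```python
import math

def bm25_idf(N: int, n: int) -> float:
    """Classic BM25 IDF (assumed form): ln((N - n + 0.5) / (n + 0.5))."""
    return math.log((N - n + 0.5) / (n + 0.5))

N = 100  # hypothetical collection size
for n in (1, 10, 50, 51, 90, 100):
    # IDF is positive below N/2, zero at exactly N/2, negative above it
    print(f"n={n:3d}  idf={bm25_idf(N, n):+.3f}")
```

Some engines use a different IDF variant, so scores from a real search engine may not match these numbers.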
Mental Model: Think of IDF Like “Uniqueness Points”
- Rare terms earn bonus points
- Common terms earn penalty points
- BM25 adds up all the bonuses and penalties
If all your query terms are common, you get a negative total, but the ranking still works.
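The "adds up" step corresponds to the standard BM25 scoring formula. It is written here with symbols that do not appear in the text above: f(q, D) is how often term q occurs in document D, |D| is the document length, avgdl is the average document length, and k_1 and b are tuning parameters (values around 1.2 to 2.0 and 0.75 are common defaults, stated here as an assumption).

```latex
\mathrm{score}(D, Q) = \sum_{q \in Q} \mathrm{IDF}(q) \cdot
  \frac{f(q, D)\,(k_1 + 1)}
       {f(q, D) + k_1 \left(1 - b + b \cdot \frac{|D|}{\mathrm{avgdl}}\right)}
```

The fraction is never negative, so the sign of each term's contribution comes from its IDF alone: a "bonus" when IDF is positive, a "penalty" when it is negative.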
Example Breakdown
| Document | "the" | "programming" | "language" | BM25 |
|---|---|---|---|---|
| Doc 1 | Yes | Yes | Yes | -5.2 |
| Doc 2 | Yes | Yes | No | -4.8 |
| Doc 3 | No | No | No | 0.0 |
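Ranking among the documents that match at least one query term:
- Doc 2 (-4.8, best)
- Doc 1 (-5.2)
Doc 3 matches none of the query terms, so it is not a result at all. Its 0.0 is numerically higher than both negative scores, so drop non-matching documents before sorting by score.
Even though Doc 2 has a negative score, it’s still the best match.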
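One way to reproduce this ranking in code is to drop documents that match no query term before sorting, since a raw 0.0 would otherwise sort above every negative score. This is a minimal sketch: the scores are copied from the table, while the `matched_terms` field and the `rank` helper are hypothetical names.

```python
# Scores copied from the table; matched_terms counts the "Yes" columns per row.
results = [
    {"doc": "Doc 1", "matched_terms": 3, "score": -5.2},
    {"doc": "Doc 2", "matched_terms": 2, "score": -4.8},
    {"doc": "Doc 3", "matched_terms": 0, "score": 0.0},
]

def rank(results):
    """Keep only documents that matched a query term, then sort best-first."""
    matching = [r for r in results if r["matched_terms"] > 0]
    return sorted(matching, key=lambda r: r["score"], reverse=True)

ranked = rank(results)
print([r["doc"] for r in ranked])  # ['Doc 2', 'Doc 1']

# Sorting everything by raw score would put the non-matching Doc 3 first.
naive = sorted(results, key=lambda r: r["score"], reverse=True)
print(naive[0]["doc"])  # Doc 3
```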
When You’ll See Each Score Type
Positive Scores
- Rare terms
- Technical vocabulary
- Small document sets where each query term appears in only a few documents
Negative Scores
- Common words
- Terms appearing in most documents
- Broad/general vocabulary
Zero Scores
- No matching terms
- Completely unrelated content
Practical Experiments
1. Positive Score Experiment
Documents:
- “Python is great”
- “Java is fast”
- “C++ is powerful”
Query: python → positive score
2. Negative Score Experiment
Documents:
- “Programming in Python”
- “Programming in Java”
- “Programming in C++”
Query: programming → negative scores
3. Zero Score Experiment
Documents:
- “Python programming”
- “Java coding”
- “C++ development”
Query: cooking recipes → zero
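The three experiments can be run with a small from-scratch scorer. This is a sketch under stated assumptions, not any particular library's implementation: it uses the classic IDF ln((N - n + 0.5) / (n + 0.5)), lowercase whitespace tokenization, and the illustrative defaults k1 = 1.5 and b = 0.75. The names `tokenize` and `bm25_scores` are hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    """Naive tokenizer (assumption): lowercase and split on whitespace."""
    return text.lower().split()

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one BM25 score per document, using the classic (signed) IDF."""
    tokenized = [tokenize(d) for d in docs]
    N = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / N
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for term in tokenize(query):
            f = tf[term]
            if f == 0:
                continue  # terms absent from the document contribute nothing
            n = sum(1 for d in tokenized if term in d)
            idf = math.log((N - n + 0.5) / (n + 0.5))
            score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(doc) / avgdl))
        scores.append(score)
    return scores

# Experiment 1: "python" is in 1 of 3 documents, so its IDF is positive.
s1 = bm25_scores("python", ["Python is great", "Java is fast", "C++ is powerful"])
# Experiment 2: "programming" is in all 3 documents, so its IDF is negative.
s2 = bm25_scores("programming", ["Programming in Python", "Programming in Java", "Programming in C++"])
# Experiment 3: no query term appears anywhere, so every score is zero.
s3 = bm25_scores("cooking recipes", ["Python programming", "Java coding", "C++ development"])

for name, s in (("positive", s1), ("negative", s2), ("zero", s3)):
    print(name, [round(x, 3) for x in s])
```

Off-the-shelf libraries and search engines may clamp or shift negative IDF values, so their output for the second experiment can differ from this sketch.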
BM25 vs TF‑IDF
Students often ask how the two compare:
| Feature | TF‑IDF | BM25 |
|---|---|---|
| Term frequency | Linear | Saturates (diminishing returns) |
| Document length | Not handled well | Normalized |
| Ranking quality | Good | Better |
| Negative scores | Not with the standard log(N/n) IDF | Yes with the classic IDF; no in Lucene-style variants |
| Search engines | Rarely | Very common |
BM25 is essentially a smarter, more realistic TF‑IDF.
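The "saturates" row can be made concrete by looking at BM25's term-frequency component on its own. The sketch below switches length normalization off (b = 0) so only the saturation effect is visible; k1 = 1.5 is an assumed value and `bm25_tf` is a hypothetical helper name.

```python
def bm25_tf(f, k1=1.5):
    """BM25 term-frequency component with b = 0: f * (k1 + 1) / (f + k1)."""
    return f * (k1 + 1) / (f + k1)

for f in (1, 2, 5, 10, 100):
    # Raw term frequency grows linearly; the BM25 component approaches k1 + 1.
    print(f"raw tf={f:3d}  bm25 tf component={bm25_tf(f):.3f}")
```

With these settings the component never exceeds k1 + 1 = 2.5, so the hundredth occurrence of a word adds almost nothing compared with the first.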
Why BM25 Still Matters in 2026
Even with vector search, embeddings, and RAG:
- BM25 remains a strong, widely used first‑stage retriever
- It’s fast, cheap, and interpretable
- It handles exact keyword matching better than embeddings
- Hybrid search = BM25 + vectors → best of both worlds
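One common way to combine the two result lists is reciprocal rank fusion (RRF). It is shown here as an illustrative choice, not as the method any specific engine uses. RRF works on ranks rather than raw scores, so negative BM25 values need no special handling. The document IDs are made up, and k = 60 is a conventional constant assumed for this sketch.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several best-first rankings; each hit adds 1 / (k + rank)."""
    fused = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)

# Hypothetical best-first results from each retriever.
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
vector_ranking = ["doc_b", "doc_d", "doc_a"]

hybrid = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
print(hybrid)  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Documents that appear near the top of both lists rise to the top of the fused list, which is the "best of both worlds" effect.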
Your students will encounter BM25 everywhere:
Elasticsearch, OpenSearch, Vespa, Solr, and even WordPress plugins.
Final Takeaways
- Negative BM25 scores are normal
- Zero almost always means no match
- Among matching documents, higher scores always win, even if negative
- BM25 is ranking-focused, not absolute-value-focused
- Understanding IDF is the key to understanding everything else
BM25 doesn’t care whether scores are positive or negative — it only cares about which document is the best match.