Compare your content against BM25-ranked competitors for your search query... because why not?
Currently supports content in English and German. Language detection determines which stop words are filtered, which affects the scoring functions as follows:
Impact: Scores for unsupported languages will be inflated by 20-40% because stop words are counted as significant terms. BM25L and BM25-adpt are most affected because of their length normalization components.
Example: When processing Russian content (unsupported), common words like "в", "и", "на" remain in the text, leading to different term frequencies and document length calculations than intended.
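The Russian example can be made concrete with a minimal sketch. It assumes whitespace tokenization, two deliberately tiny stop word lists, and the classic BM25 term frequency component with the common defaults k1 = 1.2 and b = 0.75. The names `filter_tokens` and `bm25_tf_component`, the sample sentence, and the average document length are hypothetical and are not taken from the tool itself.

```python
# Hypothetical, deliberately tiny stop word lists (the tool's real lists are not shown here).
ENGLISH_STOP_WORDS = {"the", "and", "in", "on", "of", "a", "to"}
RUSSIAN_STOP_WORDS = {"в", "и", "на"}


def filter_tokens(text, stop_words):
    """Lowercase, split on whitespace, and drop tokens found in stop_words."""
    return [token for token in text.lower().split() if token not in stop_words]


def bm25_tf_component(tf, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Length-normalized term frequency part of the classic BM25 formula."""
    length_norm = 1 - b + b * (doc_len / avg_doc_len)
    return tf * (k1 + 1) / (tf + k1 * length_norm)


text = "кот и собака в доме и на улице"

# The English list matches none of the Russian function words, so nothing is removed.
with_english_list = filter_tokens(text, ENGLISH_STOP_WORDS)
# A Russian list would have removed "и", "в", "на".
with_russian_list = filter_tokens(text, RUSSIAN_STOP_WORDS)

AVG_DOC_LEN = 6  # assumed corpus average, for illustration only

# Same term, same raw frequency, but a different document length changes the score.
score_unfiltered = bm25_tf_component(1, len(with_english_list), AVG_DOC_LEN)
score_filtered = bm25_tf_component(1, len(with_russian_list), AVG_DOC_LEN)

print(len(with_english_list), len(with_russian_list))
print(round(score_unfiltered, 3), round(score_filtered, 3))
```

The point of the sketch is only that the two pipelines disagree on document length, and therefore on the length-normalized score of the same term. The size and direction of the effect in the real tool depend on its corpus and scoring variant.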
Impact: Scores typically show 30-50% higher values for unsupported languages, as common function words artificially increase both TF and IDF components. WDF*IDF is slightly more robust, showing 20-35% inflation.
Example: For Chinese content (unsupported), function words that should be filtered remain in the calculations, potentially inflating frequency scores of non-meaningful terms.
Impact: Scores can be 40-60% higher for unsupported languages because unfiltered stop words create artificial term frequency patterns that the Poisson model interprets as significant.
Example: For Arabic content (unsupported), common particles and articles remain unfiltered, which affects the term frequency normalization and probability calculations.
Note: If language detection fails or detects an unsupported language, English stop words are used as a fallback. This affects all scoring functions and may reduce ranking accuracy.
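The fallback rule in the note can be sketched as a small lookup. This is an illustration under stated assumptions, not the tool's actual code: the language codes, the abbreviated stop word lists, and the function name `select_stop_words` are all hypothetical.

```python
# Hypothetical, abbreviated stop word lists keyed by language code.
SUPPORTED_STOP_WORDS = {
    "en": {"the", "and", "in", "on", "of"},
    "de": {"der", "die", "das", "und", "in"},
}
FALLBACK_LANGUAGE = "en"


def select_stop_words(detected_language):
    """Pick a stop word list for a detected language code.

    detected_language is a code such as "en" or "ru", or None when
    detection failed. Returns (language_used, stop_words, used_fallback).
    """
    if detected_language in SUPPORTED_STOP_WORDS:
        return detected_language, SUPPORTED_STOP_WORDS[detected_language], False
    # Detection failed or the language is unsupported: fall back to English.
    return FALLBACK_LANGUAGE, SUPPORTED_STOP_WORDS[FALLBACK_LANGUAGE], True


for code in ("de", "ru", None):
    language_used, _, used_fallback = select_stop_words(code)
    print(code, "->", language_used, "(fallback)" if used_fallback else "")
```

Returning a `used_fallback` flag is a design choice of this sketch. It lets a caller warn that scores for that document may be less reliable, rather than applying the English list silently.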