What is the BM25 Algorithm?
BM25 is a ranking algorithm that helps retrieve relevant documents for a search query. It is used by search engines in one form or another. It improves upon the TF-IDF (Term Frequency-Inverse Document Frequency) algorithm by making a few adjustments to the formula.
Let's First Learn About the TF-IDF Algorithm
Let's say you have these three documents. For a given search query, you want the most relevant document.
The documents could just as well be three different websites, three different recipes, three different text files on your computer, three different pages of a book, three verses in the Bible, etc.
The Impact of Climate Change on Coastal Cities
Rising sea levels and more frequent natural disasters are putting coastal cities around the world at risk. According to scientists, climate change is causing these issues due to the melting of polar ice caps and glaciers. As a result, cities like Miami and New York are seeing increased flooding and erosion, posing significant threats to infrastructure and public health.
The Benefits of Meditation
Meditation has been shown to have numerous benefits for both physical and mental health. Studies have found that regular meditation practice can reduce stress and anxiety, improve sleep quality, and even boost the immune system. Additionally, meditation has been linked to increased focus and attention, making it a valuable tool for students and professionals alike.
The Evolution of Artificial Intelligence
Artificial intelligence (AI) has come a long way since its inception in the mid-20th century. Today, AI is being used in a wide range of applications, from voice assistants like Siri and Alexa to self-driving cars and medical diagnosis tools. As AI technology continues to advance, it's expected that we'll see even more innovative applications and potential benefits, such as increased efficiency and improved decision-making.
Term Frequency:
Each word in your search query is taken in turn. Term frequency is calculated at the document level: we count how many times the word appears in a document and divide by the total number of words in that document.
The higher this score for a document, the more relevant that document is to the user's query word.
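As a rough illustration of the term frequency idea, here is a minimal Python sketch. It assumes the common convention of dividing the word's count by the document's total word count, and the `tokenize` and `term_frequency` helpers are hypothetical names invented for this example, not part of any library.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (an assumed, naive tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def term_frequency(term, document):
    """Return how often `term` occurs in `document`, divided by the document's total word count."""
    tokens = tokenize(document)
    if not tokens:
        return 0.0  # an empty document cannot match anything
    return Counter(tokens)[term.lower()] / len(tokens)


doc = "Meditation can reduce stress, and regular meditation can improve sleep."
print(term_frequency("meditation", doc))  # 2 occurrences out of 10 words
```

Because the count is normalized by document length, a long document does not win simply by repeating the word more often than a short one.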
Inverse Document Frequency:
If a word is common across all documents, it is given a lower score. This is a corpus-level metric. It down-weights commonly used words like 'a', 'and', and 'the', since they carry little information about which document is relevant.
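The inverse document frequency idea can be sketched in a few lines of Python. This example assumes the plain log(N / df) form, where N is the number of documents and df is the number of documents containing the word; real search libraries often use smoothed variants. The helper names and the tiny toy corpus are made up for illustration.

```python
import math
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (an assumed, naive tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def inverse_document_frequency(term, documents):
    """Return log(N / df), where df is the number of documents that contain `term`."""
    term = term.lower()
    doc_freq = sum(1 for doc in documents if term in set(tokenize(doc)))
    if doc_freq == 0:
        return 0.0  # assumption: a word absent from the whole corpus contributes nothing
    return math.log(len(documents) / doc_freq)


# Toy corpus for illustration only
toy_corpus = [
    "the sea level is rising",
    "the benefits of meditation",
    "the evolution of artificial intelligence",
]
print(inverse_document_frequency("the", toy_corpus))         # in every document, so log(3/3) = 0.0
print(inverse_document_frequency("meditation", toy_corpus))  # in one document, so log(3/1)
```

A word that appears in every document scores zero, while a word found in only one document gets the highest score, which is the behavior described above.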
Let's implement this for the three documents introduced earlier.