Stefano Fiorucci’s Post

Contributing to Haystack, the LLM Framework 🏗️ | NLP Engineer, Craftsman and Explorer 🧭

🚧 𝐔𝐩𝐝𝐚𝐭𝐞: Recent evaluations have raised questions about the validity of BM42. Future developments may address these concerns. Please keep this in mind when reading the post.

---

👋 Say hello to BM42 🔎

Qdrant introduced BM42, an algorithm that aims to replace BM25 in hybrid RAG pipelines (dense + sparse retrieval). They found that BM25, while relevant for a long time, has some limitations in common RAG scenarios.

To understand the motivation and inspiration for BM42, let's first examine BM25 and SPLADE. 👇

𝐁𝐌25
BM25 is an evolution of TF-IDF and has two components:
- Inverse Document Frequency (IDF) = term importance within a collection
- a component incorporating Term Frequency (TF) = term importance within a document

❌ The Qdrant folks observed that the TF component relies heavily on document statistics, which only make sense for longer texts. This is not the case in common RAG pipelines, where documents are short.

𝐒𝐏𝐋𝐀𝐃𝐄
SPLADE takes a different approach, using a BERT-based model to create a bag-of-words representation of the text. While it generally performs better than BM25, it has some drawbacks:
⚠️ tokenization issues with out-of-vocabulary words
⚠️ adaptation to new domains requires fine-tuning
⚠️ computationally heavy

𝐁𝐌42
Taking inspiration from SPLADE, the Qdrant team developed BM42 to improve on BM25. IDF works well, so they kept it. ✅
But how do you quantify term importance within a document? ❓
💡 The solution lies in the attention matrix of Transformer models: we can use the attention row for the [CLS] token!
To fix tokenization issues, BM42 merges subwords and sums their attention weights.
In their implementation, Qdrant used the all-MiniLM-L6-v2 model, but this technique can work with any Transformer, with no fine-tuning needed.

We've already integrated BM42 into the #haystack LLM orchestration framework. ⚡

𝐁𝐌42 𝐢𝐧 𝐚𝐜𝐭𝐢𝐨𝐧: Haystack + Qdrant + FastEmbed 📓 https://lnkd.in/dSMigDWf
🗂️ Resources in the comments 💬

#rag #informationretrieval #transformers #nlp
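The scoring idea described above can be sketched in a few lines of plain Python. This is a toy illustration, not Qdrant's actual implementation: the tokens, attention weights, and corpus statistics below are made-up stand-ins for what a real model such as all-MiniLM-L6-v2 and a real collection would provide.

```python
import math

def merge_subwords(tokens, weights):
    """Merge WordPiece subwords (marked with '##') back into whole words,
    summing their [CLS]-attention weights, as the post describes."""
    words, scores = [], []
    for tok, w in zip(tokens, weights):
        if tok.startswith("##") and words:
            words[-1] += tok[2:]   # glue the subword onto the previous word
            scores[-1] += w        # and accumulate its attention weight
        else:
            words.append(tok)
            scores.append(w)
    return dict(zip(words, scores))

def idf(doc_freq, n_docs):
    """Standard BM25 IDF: the component BM42 keeps unchanged."""
    return math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))

# Hypothetical [CLS]-attention row over one short document's tokens.
tokens = ["retrieval", "aug", "##ment", "##ed", "generation"]
weights = [0.30, 0.10, 0.15, 0.05, 0.40]
importance = merge_subwords(tokens, weights)
# "aug" + "##ment" + "##ed" collapse into "augmented" with weight 0.10+0.15+0.05

# Made-up corpus statistics: 1000 documents, per-term document frequencies.
doc_freq = {"retrieval": 50, "augmented": 120, "generation": 300}

# BM42-style sparse score of this document for the query "retrieval generation":
# per-term attention weight (in-document importance) times IDF (corpus importance).
score = sum(idf(doc_freq[t], 1000) * importance[t]
            for t in ["retrieval", "generation"])
print(importance)
print(round(score, 3))
```

In a real pipeline the weights would come from one forward pass of the Transformer per document at indexing time, while IDF is maintained by the search engine as usual.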

Stefano Fiorucci

2w

Does this require a transformer inference pass on every document indexed?

Junte Zhang

Making new things possible with search engines

2w

Any comparisons of this metric with BM25 on existing test collections? How much difference is there in relative rank?

Francesco Saverio Zuppichini

OPINIONS ARE MY OWN - Machine Learning Engineer 📸🦾 | Computer Vision | Generative AI | NLP | Web Dev | Open Source

2w

Yeah, but it's way slower, right? You need to run a model that has the [CLS] token, so you need to keep one more model online.

John Ryan

Director of Engineering | AI | Search Relevance

1w

Qdrant's implementation of BM42 is certainly a step in the right direction. The importance of a term relative to the corpus is where BM25 gets its true power, and I don't think BM42 is quite there yet as a replacement. In the Expert Network industry, a main concern is the ever-changing nature of the data: hot topics and the constant evolution of expert profiles mean we are constantly updating our search data. Having to perform inference on every document at each update is a big hit, since ingestion and query speed are crucial to achieving high-quality, relevant searches. Still, I love the way this space is going.


Transformer models significantly influenced BM42's development by providing a novel way to quantify term importance within documents. Stefano Fiorucci's post highlights how BM42 utilizes the attention matrix from Transformer models, specifically the attention row for the [CLS] token, to improve upon BM25's limitations. This approach, inspired by SPLADE but avoiding its drawbacks, allows for effective document relevance scoring without the need for fine-tuning, making it a versatile solution for various document lengths.

Alex G.

AI Research YouTube Videos 0day News

2w

https://github.com/vtempest/Wiki-BM25-search calculates term specificity for a single doc with the BM25 formula by using Wikipedia IDF; the problem with BM25 and TF-IDF is that a large set of documents is needed. In https://github.com/vtempest/Wiki-BM25-search/blob/master/test/bm42.js I have benchmarked and written bm42.js, implemented in transformers.js with a BERT model. It does not seem fully accurate yet in setting weights / NER.

Andre Zayarni

Co-founder at Qdrant, Vector Database.

2w

Superfast integration! Qdrant deepset 🚀

Vladimir Blagojevic

Senior Software Engineer @ deepset

2w

Wow, cool Stefano Fiorucci - moving super fast 🚀
