
Using _to_llama_similarities() normalization technique may lead to low-quality retrieved documents / chunks #12567

Closed
1 task done
Subham0793 opened this issue Apr 4, 2024 · 4 comments
Labels
question Further information is requested

Comments

@Subham0793

Question Validation

  • I have searched both the documentation and discord for an answer.

Question

Because of the _to_llama_similarities function, the scores are normalized and the maximum score is always 1.0.

What is the reason for this normalization technique, and why is it called by default?

The problem arises when a query unrelated to a particular document is used to retrieve chunks: the results will still contain scores starting at 1.0, even when the chunks bear no relation to the query.

Code snippet:

return VectorStoreQueryResult(
    nodes=top_k_nodes,
    ids=top_k_ids,
    similarities=_to_llama_similarities(top_k_scores),
)

Line no. 623 in

https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/vector_stores/llama-index-vector-stores-elasticsearch/llama_index/vector_stores/elasticsearch/base.py

Isn't this a bug? Why was this specifically designed to keep the maximum value at 1.0? Say we are retrieving the top 5 docs for a query: the doc at k=1 need not have a score of 1.0; it could have a score of 0.85. The problem is that we end up assigning high scores to documents which are not actually that similar to the query. A small sketch of the effect is below.
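To make the effect concrete, here is a minimal sketch (not the exact library code) of the kind of max-based normalization this issue describes, assuming it follows an exp(score - max) form; the top-scoring chunk always maps to exp(0) == 1.0, no matter how weak the best match actually is:

```python
# Illustrative sketch only; the exact implementation in the Elasticsearch
# vector store may differ in detail.
import numpy as np

def to_llama_similarities_sketch(scores: list[float]) -> list[float]:
    if not scores:
        return []
    arr = np.array(scores)
    # Subtracting the max means the best-scoring chunk becomes exp(0) = 1.0.
    return np.exp(arr - np.max(arr)).tolist()

# Even for a query unrelated to the corpus, the "best" chunk still reads as 1.0:
print(to_llama_similarities_sketch([0.12, 0.10, 0.07]))
# -> [1.0, ~0.980, ~0.951]
```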

I had posted a similar question earlier but did not get a satisfactory reply.

@Subham0793 Subham0793 added the question Further information is requested label Apr 4, 2024

dosubot bot commented Apr 4, 2024

Hey @Subham0793, welcome back! 👋 It's always interesting to dive into these kinds of challenges with you. I'm rolling up my sleeves to take a closer look at the _to_llama_similarities() function and the normalization technique. Hang tight, I'll get back to you with a full analysis shortly.

@logan-markewich
Collaborator

Please make a PR if you find this to be a bug. I agree, but don't have elastic to test with


dosubot bot commented Apr 4, 2024

Hey @Subham0793,

I've taken a deep dive into your query regarding the _to_llama_similarities() function and its normalization technique. It's a great question and touches on a fundamental aspect of how similarity scores are interpreted and utilized within the context of document retrieval systems.

The normalization to a maximum score of 1.0 is a common practice in many retrieval systems, not just in LlamaIndex. The primary reason for this approach is to standardize output scores across different queries, making them easier to interpret and compare. Scaling the highest score to 1.0 keeps the scores within a predictable range, which is particularly useful when integrating with systems that expect similarity scores to fall within a specific range.

However, your concern about the potential for high scores for documents that are not significantly similar to the query is valid. The normalization process does indeed mean that the highest-scoring document will always receive a score of 1.0, regardless of its absolute similarity to the query. This can be misleading in scenarios where the most relevant document is not highly similar to the query, as it might give the impression of a high degree of relevance that isn't actually there.

The decision to implement this normalization by default likely stems from a balance between standardization and practical utility. For many applications, having a consistent scoring range is beneficial and simplifies downstream processing. However, for more nuanced applications where the absolute similarity scores are critical, this approach might not be ideal.

Given the feedback from the maintainer, Logan Markewich, suggesting to make a PR if you find this to be a bug, it seems there is openness to revisiting this design choice. It might be worth exploring alternative approaches that retain the benefits of score normalization while providing more granularity or context about the absolute similarity levels. For instance, including additional metrics or flags that indicate the confidence level of the similarity score could be one way to address this concern.
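As a purely illustrative sketch of what such an alternative could look like (these helper names are hypothetical and not part of llama_index), one option is to pass the backend's scores through unchanged, or merely clamp them into [0, 1], so an unrelated query never surfaces a chunk with an artificial 1.0 similarity:

```python
# Hypothetical alternatives, not existing llama_index APIs.
from typing import List, Optional

def passthrough_similarities(scores: Optional[List[float]]) -> List[float]:
    """Return the backend's scores as-is (ordering is preserved upstream)."""
    return list(scores) if scores else []

def clipped_similarities(scores: Optional[List[float]]) -> List[float]:
    """Clamp scores into [0, 1] without rescaling the best hit to exactly 1.0."""
    return [min(max(s, 0.0), 1.0) for s in scores] if scores else []
```

Either variant keeps the relative ordering of results intact, which is what downstream retrieval typically relies on, while leaving the absolute scores meaningful.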

In summary, the normalization technique used in _to_llama_similarities() is a design choice aimed at standardizing score outputs, but your observation highlights an important limitation in scenarios where absolute similarity levels are crucial. It might be an opportune time to contribute to the project by suggesting enhancements or modifications that can accommodate a wider range of use cases.

If you have any more questions or need further clarification, feel free to reach out!

Best,
Dosu

About Dosu

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

@logan-markewich
Collaborator

as long as they are sorted most to least similar, I don't have a preference

@dosubot dosubot bot added the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label Jul 4, 2024
@dosubot dosubot bot closed this as not planned Won't fix, can't repro, duplicate, stale Jul 11, 2024
@dosubot dosubot bot removed the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label Jul 11, 2024