
Information Retrieval

Chapter 5:
Retrieval Evaluation
IR Evaluation

• Measuring and evaluating the performance and accuracy of an IR system is very important once the system has been designed.

• According to Singhal (2001), there are two main things to measure in an IR system: its effectiveness and its efficiency.
Cont..
• Effectiveness: the power to be effective; the quality of being able to bring about an effect
– How capable is the system of retrieving relevant documents from the collection?
– It is about user satisfaction

• Efficiency: the ratio of the output to the input of any system
– Skill in avoiding wasted time and effort
– It is about time and space


…cont

 To measure ad hoc information retrieval effectiveness in the standard way, we need a test collection consisting of three things:
1. A document collection
2. A test suite (set) of information needs, expressible as queries
3. A set of relevance judgments, standardly a binary assessment of either relevant or non-relevant for each query–document pair
Document collection

• Specific questions that might be considered when gathering documents include:
1. How many items should be gathered?
2. What items should be sampled to create the document collection?
3. What about copyright constraints?
Example (N=128)
….cont

The standard approach to information retrieval system evaluation revolves around the notion of relevant and non-relevant documents.
With respect to a user's information need, a document in the test collection is given a binary classification as either relevant or non-relevant.
This decision is referred to as the gold standard or ground truth judgment of relevance.
Mind Break
A document is relevant if it addresses the stated
information need, not because it just happens to
contain all the words in the query.

How?
Types of Evaluation Strategies

• System-centered studies
– Given documents, queries, and relevance judgments
• Try several variations of the system
• Measure which system returns the “best” hit list

• User-centered studies
– Given several users, and at least two retrieval systems
• Have each user try the same task on both systems
• Measure which system works the “best” for the users’ information needs
Performance measures (Recall, Precision, etc.)

• The two most frequent and basic measures of information retrieval effectiveness are:
1. Precision and
2. Recall.
Precision

Precision (P) is the fraction of retrieved documents that are relevant.
The ability to retrieve top-ranked documents that are mostly relevant.
Precision is the percentage of retrieved documents that are relevant to the query (i.e. the number of retrieved documents that are relevant, divided by the total number retrieved).
Precision Formula

Precision = |relevant documents ∩ retrieved documents| / |retrieved documents|
Recall
Recall (R) is the fraction of relevant documents that are retrieved.

– The ability of the search to find all of the relevant items in the corpus.
– Recall is the percentage of relevant documents retrieved from the database in response to a user's query.
Recall Formula

Recall = |relevant documents ∩ retrieved documents| / |relevant documents|
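As a concrete illustration (not part of the original slides), the two measures can be computed from sets of document IDs; the documents below are hypothetical:

```python
# A minimal sketch, assuming retrieved and relevant are collections of doc IDs.

def precision(retrieved, relevant):
    """Fraction of retrieved documents that are relevant."""
    if not retrieved:
        return 0.0  # precision is undefined when nothing is retrieved; 0.0 used here
    return len(set(retrieved) & set(relevant)) / len(set(retrieved))

def recall(retrieved, relevant):
    """Fraction of relevant documents that are retrieved."""
    if not relevant:
        return 0.0  # recall is undefined when there are no relevant documents
    return len(set(retrieved) & set(relevant)) / len(set(relevant))

# Hypothetical document IDs, for illustration only
retrieved = ["d1", "d2", "d3", "d4"]
relevant  = ["d1", "d3", "d5", "d7", "d9"]
print(precision(retrieved, relevant))   # 2/4 = 0.5
print(recall(retrieved, relevant))      # 2/5 = 0.4
```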
Question

• When do precision and recall take the value 100%? Sometimes we do get a precision or recall of 100% (i.e. 1.0). How can we justify such a value?
Example
An IR system returns 8 relevant documents and 10 non-relevant documents. There are a total of 20 relevant documents in the collection.
a. What is the precision of the system on this search?
b. What is its recall?
c. What is the F-measure?
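A sketch of the arithmetic for this example (the F-measure used here is the harmonic mean F1, defined formally a few slides below):

```python
# Counts taken from the example statement
relevant_retrieved = 8
retrieved_total = 8 + 10            # 18 documents returned in total
relevant_total = 20                 # relevant documents in the collection

P = relevant_retrieved / retrieved_total    # a. precision = 8/18 ≈ 0.444
R = relevant_retrieved / relevant_total     # b. recall    = 8/20 = 0.400
F1 = 2 * P * R / (P + R)                    # c. F-measure ≈ 0.421
print(P, R, F1)
```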
R-Precision

Precision at the R-th position in the ranking of results for a query, where R is the total number of relevant documents.

It requires having a set of known relevant documents, from which we calculate the precision of the top R documents returned
– Calculate precision after R documents are seen
– Can be averaged over all queries
Example
Example 2:
Exercise
• Given a query q, for which the relevant documents are d1,
d6, d10, d15, d22, d26, an IR system retrieves the following
ranking: d6, d2, d11, d3, d10, d1, d14, d15, d7, d23.
• Compute the precision and recall for this ranking at each retrieved document.
Cont..
• The average precision over positions 1, 5, 6, and 8, where relevant documents were found, is (1.0+0.40+0.50+0.50)/6 = 0.40; the sum is divided by 6 because there are 6 relevant documents in total, including those never retrieved. The R-precision is the precision at position 6, which is 3/6 = 0.50.
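The same numbers can be reproduced programmatically; a minimal sketch (not part of the slides):

```python
relevant = {"d1", "d6", "d10", "d15", "d22", "d26"}   # 6 relevant documents
ranking  = ["d6", "d2", "d11", "d3", "d10", "d1", "d14", "d15", "d7", "d23"]

# Average precision: mean of the precision at each relevant hit,
# divided by the total number of relevant documents (6), even if
# some of them were never retrieved.
hits, precisions = 0, []
for i, doc in enumerate(ranking, start=1):
    if doc in relevant:
        hits += 1
        precisions.append(hits / i)
avg_precision = sum(precisions) / len(relevant)       # (1.0+0.40+0.50+0.50)/6 = 0.40

# R-precision: precision after R = 6 documents are seen
R = len(relevant)
r_precision = sum(1 for d in ranking[:R] if d in relevant) / R   # 3/6 = 0.50

print(avg_precision, r_precision)
```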
Problems with both precision and recall
 The number of irrelevant documents in the collection is not taken into account.
 Recall is undefined when there is no relevant document in the collection.
 Precision is undefined when no document is retrieved.
Other measures
 Noise = retrieved irrelevant docs / retrieved docs
 Silence/Miss = non-retrieved relevant docs / relevant
docs

Noise = 1 – Precision; Silence = 1 – Recall


F-measure

• A single measure that trades off precision versus recall is the F-measure, the weighted harmonic mean of precision and recall:

F_β = (β² + 1) · P · R / (β² · P + R)

• It is one measure of performance that takes into account both recall and precision. With β = 1 it is the (balanced) harmonic mean of recall and precision:

F1 = 2 · P · R / (P + R)
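A sketch of the weighted harmonic mean as a function (β is the weighting parameter; β = 1 gives the balanced F1):

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall.
    beta > 1 favours recall, beta < 1 favours precision."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

print(f_measure(0.5, 0.4))          # F1 = 2*0.5*0.4/(0.5+0.4) ≈ 0.444
print(f_measure(0.5, 0.4, beta=2))  # recall-weighted F2 ≈ 0.417
```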
Exercise
• The following list of Rs and Ns represents relevant (R) and
non-relevant (N) returned documents in a ranked list of 20
documents retrieved in response to a query from a collection
of 10,000 documents. The top of the ranked list (the document
the system thinks is most likely to be relevant) is on the left of
the list. This list shows 6 relevant documents. Assume that
there are 8 relevant documents in total in the collection.
RRNNNNNNRNRNNNRNNNNR
Questions
• Calculate the following:

a) What is the precision of the system on the top 20?


b) What is recall?
c) What is p@10?
d) What is the F-measure on the top 20?
e) Assume that these 20 documents are the complete result set
of the system. What is the MAP for the query?
f) Noise
g) Silence
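A sketch of how quantities (a)–(g) could be computed directly from the judgment string above (not part of the slides):

```python
judgments = "RRNNNNNNRNRNNNRNNNNR"      # left-most character = top of the ranking
total_relevant = 8                       # from the exercise statement

retrieved = len(judgments)                       # 20 documents returned
rel_retrieved = judgments.count("R")             # 6 relevant among them

p_at_20 = rel_retrieved / retrieved              # (a) precision on the top 20
recall = rel_retrieved / total_relevant          # (b) recall
p_at_10 = judgments[:10].count("R") / 10         # (c) precision at 10
f1 = 2 * p_at_20 * recall / (p_at_20 + recall)   # (d) F-measure on the top 20

# (e) With these 20 documents as the complete result set, MAP for this single
# query is the average precision over all 8 relevant documents (the two
# relevant documents never retrieved contribute a precision of 0).
precisions = [judgments[:i].count("R") / i
              for i in range(1, retrieved + 1) if judgments[i - 1] == "R"]
map_score = sum(precisions) / total_relevant

noise = 1 - p_at_20                              # (f) noise = 1 - precision
silence = 1 - recall                             # (g) silence = 1 - recall
print(p_at_20, recall, p_at_10, f1, map_score, noise, silence)
```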
Difficulties in Evaluating IR System

 IR systems essentially facilitate communication between a


user and document collections
 Relevance is a measure of the effectiveness of
communication
– Effectiveness is related to the relevance of retrieved items.
– Relevance relates to the problem, the information need, the query, and a document or surrogate
……..cont

 Relevance judgments are made by
– The user who posed the retrieval problem
– An external judge
– Are the relevance judgments made by the user and by an external judge the same?

 Relevance judgments are usually:

……….cont

– Subjective: Depends upon a specific user’s judgment.


– Situational: Relates to user’s current needs.
– Cognitive: Depends on human perception and
behavior.
– Dynamic: Changes over time.
Information Retrieval

Chapter 6:
Query Languages and Operations
Introduction
• Information is the main asset of the information society.

• Depending on the particular application scenario and on the


type of information that has to be managed and searched,
different techniques need to be devised.

• The dictionary definition of a query is a set of instructions passed to a database to retrieve particular data.
Cont….

• A query is the formulation of a user's information need.

• A query is composed of keywords, and the documents containing such keywords are searched for. Keyword queries are popular and intuitive, easy to express, and allow fast ranking.
Cont…
Query language (QL) refers to any computer programming language that requests and retrieves data from databases and information systems by sending queries.

• Query Languages: A source language consisting of procedural


operators that invoke functions to be executed.
Keyword-based queries

 Queries are combinations of words.

 The document collection is searched for documents that


contain these words.

 Word queries are intuitive, easy to express and provide fast


ranking.
Popular keyword-based queries are:
1. Single-word queries:
 A query is a single word
 Simplest form of query.
 All documents that include this word are retrieved.

 Documents may be ranked by the frequency of this word in the


document.
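A minimal sketch of a single-word query over a toy inverted index; the documents and the index structure are made up for illustration:

```python
# Retrieve all documents containing a single query word and rank them by the
# word's frequency in each document.
from collections import Counter

docs = {
    "d1": "nuclear cleanup near the nuclear plant",
    "d2": "the cleanup crew arrived",
    "d3": "nuclear policy debate",
}

# Inverted index: word -> {doc_id: term frequency}
index = {}
for doc_id, text in docs.items():
    for word, tf in Counter(text.lower().split()).items():
        index.setdefault(word, {})[doc_id] = tf

def single_word_query(word):
    postings = index.get(word.lower(), {})
    # rank by the frequency of the word in each document, highest first
    return sorted(postings, key=postings.get, reverse=True)

print(single_word_query("nuclear"))   # ['d1', 'd3']
```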
Cont…
2. Phrase queries:
A query is a sequence of words treated as a single unit. Also called a “literal string” or “exact phrase” query; the phrase is usually surrounded by quotation marks.

All documents that include this phrase are retrieved. Usually, separators (commas, colons, etc.) and “trivial words” (e.g., “a”, “the”, or “of”) in the phrase are ignored.
Cont…
In effect, this query is for a set of words that must appear in sequence. It allows users to specify a context and thus gain precision.

Example: “United States of America”.
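A minimal sketch of exact-phrase matching over tokenised text (the documents are invented; a real system would also strip the separators and trivial words mentioned above):

```python
def contains_phrase(text, phrase):
    """True if the phrase's words appear consecutively in the text."""
    tokens = text.lower().split()
    words = phrase.lower().split()
    n = len(words)
    return any(tokens[i:i + n] == words for i in range(len(tokens) - n + 1))

docs = {
    "d1": "the united states of america signed the treaty",
    "d2": "america and the united kingdom",
}
hits = [d for d, text in docs.items() if contains_phrase(text, "United States of America")]
print(hits)   # ['d1']
```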


Cont…
3. Multiple-word queries:

A query is a set of words (or phrases).


Two interpretations:
• A document is retrieved if it includes any of the query words.
• A document is retrieved if it includes each of the query
words.
Cont..
Documents may be ranked by the number of query words they
contain: A document containing n query words is ranked higher
than a document containing n-1 query words.

Documents containing all the query words are ranked at the top.
Documents containing only one query word are ranked at the bottom.
Frequency counts may still be used to break ties among documents
that contain the same query words.
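A sketch of this ranking rule (more query words first, total term frequency as the tie-breaker); the documents and the query are invented for illustration:

```python
from collections import Counter

docs = {
    "d1": "information retrieval evaluation measures precision and recall",
    "d2": "retrieval of documents and retrieval of images",
    "d3": "precision engineering",
}
query = ["retrieval", "precision", "recall"]

def score(text):
    tf = Counter(text.lower().split())
    matched = sum(1 for w in query if tf[w] > 0)     # how many query words appear
    total_tf = sum(tf[w] for w in query)             # frequency count breaks ties
    return (matched, total_tf)

ranking = sorted((d for d, t in docs.items() if score(t)[0] > 0),
                 key=lambda d: score(docs[d]), reverse=True)
print(ranking)   # ['d1', 'd2', 'd3']
```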
Cont…
4. Proximity queries:
Restrict the distance within a document between two search
terms.
Important for large documents in which the two search words
may appear in different contexts.

Proximity specifications limit the acceptable occurrences and


hence increase the precision of the search.
Cont….
General Format: Word1 within m units of Word2. Unit may be
character, word, paragraph, etc.
Example:
• nuclear within 0 paragraphs of cleanup

Finds documents that discuss “nuclear” and “cleanup” in the


same paragraph.

• united within 5 words of american
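A minimal sketch of a word-level proximity test (“word1 within m words of word2”); the document is invented, and paragraph-level proximity would split on paragraph boundaries instead of words:

```python
def within_m_words(text, word1, word2, m):
    """True if word1 occurs within m words of word2 somewhere in the text."""
    tokens = text.lower().split()
    pos1 = [i for i, t in enumerate(tokens) if t == word1.lower()]
    pos2 = [i for i, t in enumerate(tokens) if t == word2.lower()]
    return any(abs(i - j) <= m for i in pos1 for j in pos2)

doc = "united and american carriers announced a joint venture"
print(within_m_words(doc, "united", "american", 5))   # True (positions 0 and 2)
```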


Structural queries

 So far, we assumed documents that are entirely free of


structure.
 Structured documents would allow more powerful queries.

 Queries could combine text queries with structural queries:


queries that relate to the structure of the document.

 Mixing contents and structure in queries:


Cont…
• Contents words, phrases, or patterns and

• Structural constraints containment, proximity, or other


restrictions on structural elements
Example
• Example: Retrieve documents that contain a page in which the
phrase “terrorist attack” appears in the text and a photo whose
caption contains the phrase “World Trade Center”.

• The corresponding query could be: same page (“terrorist


attack”, photo (caption (“World Trade Center”))).
Types

 Three main structures

 Fixed structure
Hypertext structure
Hierarchical structure
Fixed structure
 Document is divided to a fixed set of fields, much like a filled
form.

 Fields may be associated with types, such as date.

 Each field has text and fields cannot nest or overlap.

 Queries (multiple-words, Boolean, proximity, patterns, etc.) are


targeted at particular fields.
Hypertext structure
Hierarchical structure

 Intermediate model between fixed structure and hypertext.


 The “anarchic” hypertext network is restricted to a hierarchical
structure.
 The model allows recursive decomposition of documents.

 Queries may combine regular text queries, which are targeted at particular areas (the target area is defined by a “path expression”), and queries on the structure itself; for example, “retrieve documents with at least 5 sections”.
Cont…..
Relevance feedback

 After initial retrieval results are presented, allow the user to provide feedback on the relevance of one or more of the retrieved documents.
 The system uses this feedback information to reformulate the query and produce new results based on the reformulated query.
 This allows a more interactive, multi-pass process.
RF
 The idea of relevance feedback (RF) is to involve the user in the retrieval process so as to improve the final result set.

 In particular, the user gives feedback on the relevance of documents in an initial set of results.
The basic procedure is:
 The user issues a (short, simple) query.
 The system returns an initial set of retrieval results.

 The user marks some returned documents as relevant or non-relevant.

 The system computes a better representation of the


information need based on the user feedback.

 The system displays a revised set of retrieval results.
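The slides do not prescribe a particular reformulation method; one common choice is the Rocchio algorithm, sketched below for term-weight vectors (the example weights and the α, β, γ constants are illustrative assumptions):

```python
# Sketch of Rocchio query reformulation: queries and documents are dictionaries
# mapping terms to weights; alpha, beta, gamma are tuning constants.
def rocchio(query, relevant_docs, nonrelevant_docs, alpha=1.0, beta=0.75, gamma=0.15):
    terms = set(query) | {t for d in relevant_docs + nonrelevant_docs for t in d}
    new_query = {}
    for t in terms:
        w = alpha * query.get(t, 0.0)
        if relevant_docs:
            w += beta * sum(d.get(t, 0.0) for d in relevant_docs) / len(relevant_docs)
        if nonrelevant_docs:
            w -= gamma * sum(d.get(t, 0.0) for d in nonrelevant_docs) / len(nonrelevant_docs)
        new_query[t] = max(w, 0.0)        # negative weights are usually clipped to zero
    return new_query

# Hypothetical feedback: the user marked one document relevant, one non-relevant
q  = {"nuclear": 1.0, "cleanup": 1.0}
r  = [{"nuclear": 0.8, "cleanup": 0.5, "waste": 0.6}]
nr = [{"nuclear": 0.7, "energy": 0.9}]
print(rocchio(q, r, nr))
```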


Architecture
THE END OF:

Chapter 5 and 6
