Project Proposal

Sjanta: Natural Language Question Answering System

Md. Arafat Rahman


Dept. of Computer Science & Engineering,
Begum Rokeya University, Rangpur, Bangladesh



Abstract

As users struggle to navigate the wealth of on-line information now available, the need for automated question
answering systems becomes more urgent. We need systems that allow a user to ask a question in everyday language
and receive an answer quickly and succinctly, with sufficient context to validate the answer.
Current search engines can return ranked lists of documents, but they do not deliver answers to the user. Question
answering systems address this problem by trying to find the exact, precise answer to a natural language
question. Sjanta is a question answering system that addresses the above-mentioned problem.


Introduction

There is a large amount of textual data on a variety of digital media such as digital archives,
the Web and the hard drives of our personal computers. Efficiently locating information on these
digital media has become one of the most important challenges of the last decade.

Search engines are used to locate documents that are related to a user's information need.
Natural language questions are the most natural way of expressing that need, but search engines
cannot use them directly. Instead, a natural language question is transformed into a query,
which is a set of keywords that describe the information need. After the query is entered, the
search engine retrieves a set of documents ranked according to their relevance to the query. To
find the desired information, the user then has to read through the returned document set.

However, in many situations a user wants a particular piece of information rather than a
document set. Question Answering (QA), which is a kind of Information Retrieval, addresses
this problem. The benefit of Question Answering systems is two-fold:
1) They take natural language questions rather than queries
2) They return explicit answers rather than a set of documents

Question Answering is the task of returning a particular piece of information in response to a
natural language question. The aim of a question answering system is to present the needed
information directly, instead of documents containing potentially relevant information.

Motivation

Inspiration for this research project came from the fact that much research has been put into QA
over the last decade, along with a trend towards the open advancement of question answering.
Being both an interesting interdisciplinary research area and having practical applications, question
answering has gained some public attention in the past years. The best known example of a QA
system is probably IBM Watson, which won a Jeopardy! competition live on television. Other well
known examples are Apple's Siri and Google Now. But no existing QA system is able
to address the problem fully. In particular, context-aware question answering, i.e. answering a
question with respect to the previous context, is not addressed properly. Building an efficient QA
system is therefore still a challenging research project. We would like to investigate the problem and
propose an open domain question answering system that takes advantage of Web data to answer
both factoid and non-factoid questions.

Proposed System: Sjanta

We would like to build a system named Sjanta which will take natural language (or natural
language-like) questions and answer them accordingly.

System Description

Like a typical QA system, Sjanta consists of several modules in its framework. Sjanta has four
modules, as shown in the figure.

Figure: Different modules of Sjanta QAS (Question -> Question Analysis -> Document Retrieval ->
Extracting Answer Candidates -> Answer Generation -> Answer), shown for the question "Who is the
president of Bangladesh?" and the answer "Abdul Hamid".

Each module is briefly described below using an example natural language question: "Who is the
president of Bangladesh?"
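
The way the four modules hand data to one another can be sketched in Java as follows. This is only an illustration under stated assumptions: the class name QaPipeline, the use of plain keyword lists and strings as the data passed between stages, and the four function-typed fields are all hypothetical and are not taken from the Sjanta design itself.

```java
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Function;

/** Hypothetical wiring of the four modules; each stage is supplied as a function. */
final class QaPipeline {
    private final Function<String, List<String>> analyze;        // question -> keywords
    private final Function<List<String>, List<String>> retrieve; // keywords -> documents
    // (documents, keywords) -> ranked answer candidates
    private final BiFunction<List<String>, List<String>, List<String>> extract;
    private final Function<List<String>, String> generate;       // candidates -> answer

    QaPipeline(Function<String, List<String>> analyze,
               Function<List<String>, List<String>> retrieve,
               BiFunction<List<String>, List<String>, List<String>> extract,
               Function<List<String>, String> generate) {
        this.analyze = analyze;
        this.retrieve = retrieve;
        this.extract = extract;
        this.generate = generate;
    }

    /** Runs the stages in order: analysis, retrieval, candidate extraction, generation. */
    String answer(String question) {
        List<String> keywords = analyze.apply(question);
        List<String> documents = retrieve.apply(keywords);
        List<String> candidates = extract.apply(documents, keywords);
        return generate.apply(candidates);
    }
}
```

Passing each stage in as a function keeps the modules replaceable, so a keyword-based stage could later be swapped for a different approach without touching the others.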

A. Question Analysis:

This module extracts the focus of the question and analyzes what is being asked, i.e. the answer
type of the question. A question may be categorized into two general types:

1. Factoid questions: The answer to these questions is fixed and consists of a single phrase.
For example, the example question ("Who is the president of Bangladesh?") is a
factoid question and requires a named entity as its answer.

2. Non-factoid questions: These are descriptive questions whose answers may vary both
in content and size. For example, "Describe the BFS graph traversal technique." is a
non-factoid question.

The question analysis module also extracts keywords from the question for further processing. For
example, the words "president" and "Bangladesh" will be extracted as keywords for the above
mentioned example question.
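
A minimal sketch of this step, assuming the answer type is guessed from the leading question word and keywords are obtained by removing stop words. The class name QuestionAnalyzer, the type labels, and the short stop-word list are illustrative assumptions, not part of the proposal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class QuestionAnalyzer {
    // Small illustrative stop-word list (an assumption; a real system would use a fuller one).
    private static final Set<String> STOP_WORDS = Set.of(
        "who", "what", "when", "where", "which", "why", "how",
        "is", "are", "was", "were", "the", "a", "an", "of", "in", "on", "to", "for");

    /** Guesses a coarse expected answer type from the leading question word. */
    static String answerType(String question) {
        String q = question.trim().toLowerCase(Locale.ROOT);
        if (q.startsWith("who")) return "PERSON";
        if (q.startsWith("where")) return "LOCATION";
        if (q.startsWith("when")) return "DATE";
        return "DESCRIPTION"; // everything else is treated as non-factoid in this sketch
    }

    /** Lower-cases the question, splits it into words, and drops stop words. */
    static List<String> keywords(String question) {
        List<String> result = new ArrayList<>();
        for (String token : question.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                result.add(token);
            }
        }
        return result;
    }
}
```

For the example question, answerType returns PERSON and keywords returns the two words president and bangladesh (lower-cased).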

B. Document Retrieval:

The document retrieval module searches for related documents (or passages) using the extracted
focus and keywords of the question. For example, the Wikipedia pages that contain information
about the president of Bangladesh may be retrieved in this phase.
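
The idea of keyword-based retrieval can be illustrated with a small in-memory stand-in. It scores each document by how many query keywords it contains and returns the matching documents, best first. This is a simplified assumption for illustration only and does not use the Lucene API; the class name KeywordRetriever and the scoring rule are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

final class KeywordRetriever {
    /** Counts how many of the (lower-case) keywords occur as whole words in the text. */
    static int score(String text, List<String> keywords) {
        List<String> tokens = List.of(text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+"));
        int hits = 0;
        for (String keyword : keywords) {
            if (tokens.contains(keyword)) {
                hits++;
            }
        }
        return hits;
    }

    /** Returns the documents with at least one keyword hit, highest score first. */
    static List<String> retrieve(List<String> documents, List<String> keywords) {
        List<String> matches = new ArrayList<>();
        for (String document : documents) {
            if (score(document, keywords) > 0) {
                matches.add(document);
            }
        }
        matches.sort(Comparator.comparingInt((String d) -> score(d, keywords)).reversed());
        return matches;
    }
}
```

A real index would avoid rescoring every document for every query; the linear scan here only makes the ranking idea visible.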

C. Extracting Answer Candidates (EAC):

The EAC module searches the retrieved documents for sentences or phrases that may be the answer
to the question. It also ranks the candidate answers.
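
One possible way to extract and rank candidates for a question that expects a named entity is sketched below. It assumes a crude heuristic in place of real named-entity recognition: runs of two or more capitalized words are taken as candidates, and they are ranked by how often they occur in sentences that mention a keyword. The class name CandidateExtractor, the regular expression, and the frequency ranking are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CandidateExtractor {
    // Two or more consecutive capitalized words; a crude stand-in for named-entity recognition.
    private static final Pattern NAME =
        Pattern.compile("\\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\\b");

    /** Returns candidate phrases, most frequent in keyword-bearing sentences first. */
    static List<String> rankedCandidates(List<String> documents, List<String> keywords) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String document : documents) {
            // Split after sentence-ending punctuation followed by whitespace.
            for (String sentence : document.split("(?<=[.!?])\\s+")) {
                String lower = sentence.toLowerCase(Locale.ROOT);
                if (keywords.stream().noneMatch(lower::contains)) {
                    continue; // sentence mentions no keyword, so skip it
                }
                Matcher matcher = NAME.matcher(sentence);
                while (matcher.find()) {
                    counts.merge(matcher.group(), 1, Integer::sum);
                }
            }
        }
        List<String> ranked = new ArrayList<>(counts.keySet());
        ranked.sort((a, b) -> counts.get(b) - counts.get(a)); // stable: ties keep first-seen order
        return ranked;
    }
}
```

Frequency is only one ranking signal; the distance between a candidate and the keywords, or agreement with the expected answer type, could be added to the score in the same place.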

D. Answer Generation:

The answer generation module generates the answer based on the ranking of the answer candidates.
For example, the phrase "Abdul Hamid" will be generated as the answer to the question "Who is the
president of Bangladesh?"
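
In its simplest form, this step selects the top-ranked candidate. The sketch below assumes the candidates arrive as a map from phrase to score; the class name AnswerGenerator and that input shape are hypothetical choices for illustration.

```java
import java.util.Map;
import java.util.Optional;

final class AnswerGenerator {
    /** Returns the highest-scoring candidate, or an empty Optional if there is none. */
    static Optional<String> bestAnswer(Map<String, Double> scoredCandidates) {
        return scoredCandidates.entrySet().stream()
            .max(Map.Entry.comparingByValue())
            .map(Map.Entry::getKey);
    }
}
```

Returning an Optional makes the "no answer found" case explicit, so the caller can decide how to report it to the user.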

Implementation plan

There are many approaches to building a QA system; the machine learning approach, the complex
pattern matching approach and the keyword based approach are some well known ones. Since the
keyword based approach is the most intuitive, we would like to investigate this approach here.

We will use different open source natural language processing and information retrieval tools,
such as the Stanford NLP parser and Lucene, together with tools of our own for performing the tasks
of the different modules. Java will be the primary language for building the Sjanta QA system.

Conclusion

Since the amount of data on the Web is growing very fast, it is becoming difficult for users to
locate the information they need within a short time. It has therefore become necessary to build
an autonomous system that makes finding information easier. A Question Answering system is such a
system: it makes the job of finding precise and exact information easier.


