Better Information Retrieval System
Note from Jati
“In this paper, we will describe the following points:” Describing these points alone is not enough; your essay will have no character. See the notes from our class discussion of this.
EXPLORE the problem -- not the topic
1. Who is your reader?
2. What is your purpose?
3. Who are you, the writer? (What image or persona do you want to project?)
• Does the author provide necessary background knowledge as a basis for the article?
• Is the problem/ topic clearly defined?
• Did the writer bring a new perspective to the topic?
• Did the article sustain the reader’s interest?
• What additional suggestions would you propose?
Defining the reader
• Who are my readers?
• What do they already know and what do I need to tell them about the topic?
• Why would the reader want to read this article?
• How would the reader be able to use the article?
• What possible objections might cause the reader to dismiss my ideas?
Defining the author
• Why do I want to write this article?
• What special knowledge, experience, and perspectives do I bring to the topic?
• How does my article relate to other publications on the topic?
- An Information Retrieval System implements string matching together with term-document and query weighting
- To build a better Information Retrieval System, we also need to consider the human factor in how users type queries
- Natural Language Processing improves an Information Retrieval System by detecting typos in a query and rewriting it into something meaningful
The growth of the Internet has led researchers to build systems that retrieve relevant information based on what the user needs. The user gives the system a query; the system then uses that query to retrieve relevant documents and ranks them using a term-query weighting concept. Such a system is known as an Information Retrieval System.
In the early days, such systems simply used string matching algorithms to retrieve documents for a given query. This approach is now considered weak and unreliable: the result sets were usually overloaded, and most of the documents retrieved were not relevant. For example, if a user wanted documents about George W. Bush and typed the query “Bush”, the system would return every document containing the string “Bush”, whether it referred to George W. Bush or to a bush in the garden.
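To make the “Bush” example concrete, here is a minimal Python sketch of plain string-matching retrieval. It assumes a hypothetical three-document corpus and case-insensitive substring search; the function name `string_match_retrieve` is illustrative and not taken from any particular system.

```python
# A minimal sketch of early-style retrieval: a document "matches" if it
# contains the query string anywhere. The tiny corpus below is hypothetical.


def string_match_retrieve(query, documents):
    """Return the ids of all documents whose text contains the query string."""
    needle = query.lower()
    return [doc_id for doc_id, text in documents.items() if needle in text.lower()]


documents = {
    "d1": "George W. Bush gave a speech in Washington.",
    "d2": "Trim the bush in the garden before spring.",
    "d3": "The stock market closed higher today.",
}

print(string_match_retrieve("Bush", documents))
# → ['d1', 'd2']
```

Both the political document and the gardening document come back, because substring matching has no notion of which “Bush” the user meant, and nothing ranks one above the other.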
Researchers then added a weighting concept to the system. After scanning every term in each candidate document, the system builds two inverted tables, one for the document terms and one for the query, each holding the weight of every term. From these two tables, the system computes each document's similarity by multiplying the query weight by the document-term weight for each shared term and summing the products. The higher a document's similarity, the more relevant it is considered and the higher it is ranked.
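The weighting step above can be sketched in Python as follows. This is a minimal illustration under simplifying assumptions: whitespace tokenization, raw term counts as the weights for both documents and query, and a hypothetical corpus. The helper names (`tokenize`, `build_inverted_index`, `rank`) are ours, not from the paper.

```python
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase, split on whitespace, and strip simple punctuation."""
    tokens = (tok.strip(".,!?").lower() for tok in text.split())
    return [tok for tok in tokens if tok]


def build_inverted_index(documents):
    """Map each term to {doc_id: weight}; here the weight is the raw term count."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for term, count in Counter(tokenize(text)).items():
            index[term][doc_id] = count
    return index


def rank(query, index):
    """Similarity = sum over query terms of (query weight x document-term weight)."""
    query_weights = Counter(tokenize(query))
    scores = defaultdict(float)
    for term, q_weight in query_weights.items():
        for doc_id, d_weight in index.get(term, {}).items():
            scores[doc_id] += q_weight * d_weight
    # Highest similarity first; ties broken by document id for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


documents = {
    "d1": "George Bush met the press. Bush spoke about the economy.",
    "d2": "Plant a bush in the garden.",
    "d3": "The economy grew this year.",
}
index = build_inverted_index(documents)
print(rank("bush economy", index))
# → [('d1', 3.0), ('d2', 1.0), ('d3', 1.0)]
```

Document `d1` ranks first because it matches both query terms and mentions “bush” twice. Raw counts are only a placeholder here; the tf, idf, and normalization choices discussed next are what make the weights meaningful.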
Generally, an information retrieval system implements two weighting schemes: term-document weighting and query weighting. Each scheme combines one of four term frequency (tf) calculations (raw, logarithmic, binary, or augmented) with the inverse document frequency (idf) and normalization. The result is an inverted index that is used in the similarity calculation.
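The four tf variants, idf, and normalization can be sketched in Python as below. The formulas are the standard textbook forms (logarithmic tf as 1 + log10 tf, augmented tf as 0.5 + 0.5 · tf / max tf, idf as log10(N / df)). The base-10 logarithm and cosine (unit-length) normalization are assumptions, since the text does not specify them, and the pre-tokenized corpus is hypothetical.

```python
import math
from collections import Counter


def tf_weight(tf, max_tf, scheme):
    """Apply one of the four term-frequency variants to a raw count."""
    if tf == 0:
        return 0.0
    if scheme == "raw":
        return float(tf)
    if scheme == "logarithmic":
        return 1.0 + math.log10(tf)
    if scheme == "binary":
        return 1.0
    if scheme == "augmented":
        return 0.5 + 0.5 * tf / max_tf
    raise ValueError(f"unknown tf scheme: {scheme}")


def idf(term, docs):
    """Inverse document frequency log10(N / df); 0 for terms in no document."""
    df = sum(1 for counts in docs.values() if term in counts)
    return math.log10(len(docs) / df) if df else 0.0


def weight_vector(counts, docs, scheme):
    """tf x idf for each term, then cosine (unit-length) normalization."""
    max_tf = max(counts.values())
    raw = {t: tf_weight(c, max_tf, scheme) * idf(t, docs) for t, c in counts.items()}
    length = math.sqrt(sum(w * w for w in raw.values()))
    return {t: w / length for t, w in raw.items()} if length else raw


def similarity(q_vec, d_vec):
    """Sum of query weight x document weight over the query's terms."""
    return sum(w * d_vec.get(t, 0.0) for t, w in q_vec.items())


# Hypothetical, already-tokenized corpus.
docs = {
    "d1": Counter(["bush", "bush", "president", "speech"]),
    "d2": Counter(["bush", "garden", "plant"]),
    "d3": Counter(["president", "election"]),
}
query = Counter(["bush", "president"])

q_vec = weight_vector(query, docs, "logarithmic")
scores = {d: similarity(q_vec, weight_vector(c, docs, "logarithmic")) for d, c in docs.items()}
print(sorted(scores, key=scores.get, reverse=True))
# → ['d1', 'd3', 'd2']
```

The same `weight_vector` function builds both the query and document weights, mirroring the two weighting schemes above. Because both vectors have unit length, the summed product is the cosine of the angle between them.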
To build a better information retrieval system, we must also consider the case where the user mistypes the query. Here, Natural Language Processing is needed to rewrite the query into something meaningful. For example, when a user types “a beter information retieval”, the system should recognize the typos and correct the query to “a better information retrieval”. This concept has been implemented by Google, currently the biggest search engine in the world.
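One simple way to approximate the typo correction described above is to replace each unknown query word with the closest word in a known vocabulary, measured by Levenshtein edit distance. This is a hedged Python sketch, not a description of how Google does it. The vocabulary, the `max_distance` threshold of 2, and the function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, substitutions a -> b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def correct_query(query, vocabulary, max_distance=2):
    """Replace each out-of-vocabulary word with its nearest vocabulary word."""
    corrected = []
    for word in query.lower().split():
        if word in vocabulary:
            corrected.append(word)
            continue
        # Closest candidate; ties broken alphabetically for determinism.
        best = min(vocabulary, key=lambda v: (edit_distance(word, v), v))
        # Keep the original word if even the best candidate is too far away.
        corrected.append(best if edit_distance(word, best) <= max_distance else word)
    return " ".join(corrected)


VOCABULARY = {"a", "better", "information", "retrieval", "system", "search"}
print(correct_query("a beter information retieval", VOCABULARY))
# → a better information retrieval
```

A real system would also weigh how common each candidate word is and the surrounding context. Plain edit distance alone cannot tell which of several equally close words the user meant.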
By:
Anton Rifco Susilo (13504046)
Chandra Gondowasito (13504100)