Rohini U
Master's Thesis
Personalized Search (January 2005 - June 2007)
Advisor: Dr. Vasudeva Varma, Associate Professor, IIIT, Hyd.

Most existing retrieval systems, including the web
search engines, suffer from the problem of "one size fits all": the decision of
which documents to retrieve is made based only on the query posed, without
consideration of a particular user's preferences and search context. When a
query (e.g. "jaguar") is ambiguous, the search results are inevitably mixed in
content (e.g. containing documents on
the jaguar cat and on the jaguar car), which is certainly non-optimal for the
user, who is burdened by having to sift through the mixed results. In order to
optimize retrieval accuracy, we clearly need to model the user appropriately
and personalize search according to each individual user. The major goal of
personalized search is to accurately model a user's information need and store
it in the user profile, and then to rerank the results to suit the user's
interests using that profile. While there is
some existing work on personalized search, it is far from a solved problem.
In this thesis, we studied the problem of personalized search. There are two
main challenges in personalized search. The first is to accurately capture and
model the user context and the second is to use this information to improve the
web search results. We focus on long term learning of user profiles based on
implicit context. We propose three approaches for learning user profiles and
using them to improve the search results. The first approach uses statistical
language modeling techniques, and the second uses machine learning:
we used support vector machines for learning the user profile. In the third
approach, we try to learn the user
profile from just the past queries of the user.

Undergraduate Thesis
Prepositional Phrase Attachment (January 2003 - January 2004)
Advisors: Dr. Bendre, Visiting Professor, and Dr. Sangal, Director, Professor at IIIT Hyd.

The task here is to attach a
prepositional phrase in a given sentence to a noun phrase or a verb phrase in
the sentence. The approach uses a combination of supervised and unsupervised
methods for learning. Unsupervised
learning is done using the Expectation Maximization algorithm. The
attachment of the prepositional phrase is decided using scores calculated with WordNet,
a lexical database, and back-off smoothing. The approach showed an improvement
in performance over
the state-of-the-art approaches. This was done as part of my undergraduate thesis, also called the Final Year Project (FYP). The approach was later extended to multiple PP attachment. Details can be found in "A Hybrid Approach to Single and Multiple PP Attachment using WordNet" (2005).

Creating Simulated Feedback (July 2005 - November 2006)
Advisor: Dr. Vasudeva Varma, Associate Professor, IIIT, Hyd.

Relevance feedback has been widely used in both
explicit and implicit forms. Explicit feedback methods are expensive to
conduct, as they require a lot of manual effort, and it is difficult to scale up
these methods for gathering large volumes of data. Implicit feedback methods, on
the other hand, are easy to conduct. Most search engines already have logs of
user interactions over the web, which are a very valuable source for research
related to user feedback. However, such feedback is usually not available to
the public, or even to the research community at large, for various reasons, such as
the threat it poses to the privacy of individual web users. Motivated by these
problems, we propose Simulated Feedback, which is based on insights from query log
analysis and uses artificial methods to generate feedback. It is an
area where the outcome of research can directly benefit
research communities working on web search engines and personalization. Also, given the
non-availability of implicit feedback data, it is difficult to evaluate personalization algorithms.
Simulated Feedback can help here and benefit the web search
community.

Creation of simulated feedback is done in two steps. In the first step, a simulated user is created, and in the second step, a web search process is simulated, mimicking a typical web search session. We have evaluated simulated feedback by comparing it with implicit feedback available from query logs and with explicit feedback from judges, and achieved significant results.

The benefit of simulated feedback is twofold. First, it is easy to obtain, and the process of obtaining the feedback data is repeatable. Given a document set and a search engine deployed on this document set, one can generate simulated feedback in much larger volumes than can be obtained by either explicit or implicit feedback methods. Second, it is customizable: a researcher can customize the creation of the feedback for their purposes, be it testing search engines for specific domains, testing research algorithms in personalization research, or testing query modification techniques.

Collaborative Web Search
Advisor: Dr. Vasudeva Varma, Associate Professor, IIIT, Hyd.
Collaborative Web Search, motivated by
collaborative filtering and recommendation systems, makes use of like-minded
users. Like-minded users are a group
of users who would be interested in similar results for similar queries. The search history of the like-minded users
is used to improve search results.

Search in Presence of Errors
Worked with: Mr. Vamshi Ambati, Technical Director, Digital Library of

As digital libraries become widely accepted and created,
retrieval in these collections becomes necessary. However, one problem
that the content poses is the presence of errors created by Optical Character
Recognizers (

Digital Library Reading Assistant
Worked with: Mr. Vamshi Ambati, Technical Director, Digital Library of

The development of technologies that enable access to information
regardless of geographic or language barriers is a key factor for the success
of digital libraries. This project deals with providing multilingual access to
digital libraries. A multilingual information retrieval system is provided, along with
a multilingual reading assistant that helps users read
a book in a different language. The two primary subcomponents of such a tool are
the one that performs the translation of the requested word, phrase, sentence,
or paragraph, and the interface that lets a user provide feedback whenever
unsatisfied with the translation.

Cross Lingual Information Retrieval (CLIR) (January 2005 - April 2005)
Advisor: Dr. Vasudeva Varma, Associate Professor, IIIT, Hyd.

The objective of the project is to build a Cross Lingual Information
Retrieval system. Given a query in one language (the source language), documents in
another language (the target language) are retrieved. The focus of the current
project is retrieving English documents for Hindi queries. It involves the
following important steps: preprocessing of the queries, query translation,
document retrieval, and ranking of the results.
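
The four steps listed above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the tiny romanized dictionary, the sample documents, and the term-count ranking stand in for the real dictionaries, collection, and ranking function, which are not described here.

```python
from collections import Counter

# Toy bilingual dictionary (hypothetical entries, romanized Hindi -> English).
HINDI_ENGLISH = {
    "kitab": ["book"],
    "pustakalaya": ["library"],
    "khel": ["game", "sport"],
}

# Hypothetical target-language (English) document collection.
DOCS = {
    "d1": "the library has a new book and another book",
    "d2": "a cricket game was played",
    "d3": "weather report for today",
}


def preprocess(query):
    """Step 1: lowercase the source-language query and split it into tokens."""
    return query.lower().split()


def translate(tokens, dictionary):
    """Step 2: replace each token by all of its dictionary translations.

    Tokens without a dictionary entry are kept unchanged (e.g. names).
    """
    translated = []
    for token in tokens:
        translated.extend(dictionary.get(token, [token]))
    return translated


def retrieve_and_rank(terms, documents):
    """Steps 3 and 4: keep documents that contain a query term and rank
    them by the total number of query-term occurrences (assumed scoring)."""
    scored = []
    for doc_id, text in documents.items():
        counts = Counter(text.lower().split())
        score = sum(counts[term] for term in terms)
        if score > 0:
            scored.append((score, doc_id))
    # Highest score first; ties are broken by document id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]


def clir_search(query, dictionary, documents):
    """Run the whole pipeline for one source-language query."""
    terms = translate(preprocess(query), dictionary)
    return retrieve_and_rank(terms, documents)


print(clir_search("khel kitab", HINDI_ENGLISH, DOCS))
```

Keeping translation as a separate function that takes the dictionary as a parameter mirrors the stated design goal: supporting another language pair would only require a different dictionary.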
Shabdanjali, a manually built Hindi-English dictionary, is used in the query translation process. Another dictionary was automatically built
from a couple of Hindi-English parallel corpora, such as India Today and Emilee.
The system is designed so that cross-lingual or multilingual
retrieval involving other languages can be easily incorporated, for example
Telugu-English CLIR or Telugu-Hindi CLIR.

User profile learning for Personalized Web Search using Machine Learning
Course: Machine Learning (Spring 2004)
Instructor: Dr. C. V. Jawahar, Associate Professor, IIIT, Hyd.

The objective of the project was
to learn user profiles using machine learning algorithms. A user profile
captures the user's interests in terms of the documents liked by the user.
User profile learning is done as a binary classification task. The two classes
considered are the relevant class, which describes the documents liked by the
user, and the irrelevant class, which describes the documents not liked by the
user.

Other Major Projects

Extracting Patterns and Relations using NLP
Course: Web Data and Knowledge Management (Fall 2004)
Instructor: Dr. P. K. Reddy, Associate Professor, IIIT, Hyd.

The project was
based on the paper titled "Extracting Patterns and Relations from the World Wide Web" by Sergey Brin (1998). The
approach described in the paper was used to automatically extract player information,
consisting of player name, sport, and
country, from a
newspaper corpus collected from various sources such as The Hindu and The Times of India.
Natural Language Processing (NLP) techniques were used, modifying the approach appropriately to suit the
requirement. The use of NLP techniques showed an improvement in performance.

Text Classification using NLP
Course: Pattern Recognition (Spring 2004)
Instructor: Dr. P. J. Narayanan, Dean (R&D), Professor, IIIT, Hyd.

The project
involved developing a text classification system. A variety of feature extraction methods based on NLP were
investigated and their effectiveness was studied. A Naive Bayes classifier
was used for classification. The Reuters corpus was used for experiments.
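
As a rough illustration of the classification step, here is a minimal multinomial Naive Bayes sketch in Python. It assumes plain bag-of-words features with add-one (Laplace) smoothing rather than the NLP-derived features studied in the project, and the four training sentences and their labels are invented for the example, not taken from the Reuters corpus.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes over whitespace tokens (illustrative only)."""

    def __init__(self):
        self.class_doc_counts = Counter()        # documents seen per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocabulary = set()

    def fit(self, documents, labels):
        for text, label in zip(documents, labels):
            tokens = text.lower().split()
            self.class_doc_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, text):
        tokens = text.lower().split()
        total_docs = sum(self.class_doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_doc_counts):
            # Log prior: fraction of training documents in this class.
            score = math.log(self.class_doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokens:
                if token not in self.vocabulary:
                    continue  # ignore words never seen during training
                count = self.word_counts[label][token]
                # Add-one smoothing avoids zero probabilities.
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy training data (hypothetical, not from the Reuters corpus).
train_docs = [
    "oil prices rise",
    "crude oil exports fall",
    "wheat harvest grows",
    "corn and wheat exports",
]
train_labels = ["crude", "crude", "grain", "grain"]

classifier = NaiveBayesTextClassifier().fit(train_docs, train_labels)
print(classifier.predict("oil exports rise"))
```

Different feature extraction methods could be compared by swapping the `text.lower().split()` tokenization for another function while keeping the classifier unchanged.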

Pronoun Co-reference Resolution using the Centering Approach
Course: Introduction to Natural Language Processing (Fall 2003)
Instructor: Prof. Rajeev Sangal, Director, Professor, IIIT Hyd.

The project was based on the paper by Susan Brennan et al., "A
Centering Approach to Pronouns". The paper presents an algorithm to track
the discourse context and bind pronouns to the corresponding entities
introduced in the discourse. The project involved implementing the algorithm
and evaluating its effectiveness in diverse discourses.

Virus Detection System
Semester Project
Advisor: Dr. Ram Murthy, Associate Professor, IIIT, Hyd.

This project was done as a semester project in Fall 2002 during my
undergraduate studies. The objective of this project is to simulate a system that checks for the existence of
a virus in an incoming packet and informs
the user about it. The system has a pre-fed table of viruses and their
definitions. Every incoming packet is checked against this existing set of
definitions, and if a virus is found, a message is displayed informing the
user about the source of the virus.
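
The check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the signature table, the virus names, the byte patterns, and the packet representation (a source address plus a payload of bytes) are all hypothetical, and matching is a plain substring search.

```python
# Hypothetical pre-fed table: virus name -> byte pattern (its "definition").
VIRUS_DEFINITIONS = {
    "DemoVirusA": b"\xde\xad\xbe\xef",
    "DemoVirusB": b"EVIL_PAYLOAD",
}


def scan_packet(source, payload, definitions=VIRUS_DEFINITIONS):
    """Check one incoming packet against every known definition.

    Returns one warning message per matching definition, naming the
    source of the packet; an empty list means nothing was found.
    """
    messages = []
    for name, signature in definitions.items():
        if signature in payload:  # substring search over the raw bytes
            messages.append(f"Virus {name} detected in packet from {source}")
    return messages


# Example: one infected packet and one clean packet (invented data).
for message in scan_packet("10.0.0.5", b"hello EVIL_PAYLOAD world"):
    print(message)
print(scan_packet("10.0.0.6", b"just some clean data"))
```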