Rohini's Home Page

Projects

Masters Thesis

Personalized Search (January 2005 - June 2007)

Advisor: Dr. Vasudeva Varma, Associate Professor, IIIT, Hyd.

Most existing retrieval systems, including web search engines, suffer from the "one size fits all" problem: the decision of which documents to retrieve is based only on the query posed, without considering the particular user's preferences and search context. When a query (e.g. "jaguar") is ambiguous, the search results are inevitably mixed in content (e.g. documents on the jaguar cat and on the Jaguar car), which is non-optimal for the user, who must sift through the mixed results. To optimize retrieval accuracy, we need to model the user appropriately and personalize search for each individual user. The major goal of personalized search is to model a user's information need accurately, store it in a user profile, and then rerank the results to suit the user's interests using that profile.

While there is some existing work on personalized search, it is far from a solved problem. This thesis studies two main challenges in personalized search: the first is to accurately capture and model the user context, and the second is to use this information to improve web search results. We focus on long-term learning of user profiles based on implicit context. We propose three approaches for learning user profiles and using them to improve the results. The first uses statistical language modeling techniques. The second uses machine learning, with support vector machines for learning the user profile. The third learns the user profile from just the user's past queries.
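As a rough illustration of the language-model idea, the sketch below builds a smoothed unigram profile from text the user viewed earlier and reorders results by how likely each snippet is under that profile. The function names, the add-one smoothing, and the toy data are assumptions made for illustration; they are not the thesis implementation.

```python
import math
from collections import Counter


def build_profile(viewed_texts):
    """Count words in text the user viewed earlier (a hypothetical unigram profile)."""
    counts = Counter()
    for text in viewed_texts:
        counts.update(text.lower().split())
    return counts


def profile_score(profile, snippet, vocab_size):
    """Average log-probability of the snippet's words under the add-one smoothed profile."""
    words = snippet.lower().split()
    if not words:
        return float("-inf")
    total = sum(profile.values())
    log_prob = sum(math.log((profile[w] + 1) / (total + vocab_size)) for w in words)
    return log_prob / len(words)


def rerank(profile, snippets):
    """Reorder snippets so that the most profile-like ones come first."""
    vocab = set(profile)
    for s in snippets:
        vocab.update(s.lower().split())
    return sorted(snippets, key=lambda s: profile_score(profile, s, len(vocab)), reverse=True)


# Toy data (assumed): a user whose history is about cars searches for "jaguar".
history = ["jaguar car dealer price", "new car engine review", "sports car price list"]
results = ["jaguar cat habitat in the rainforest", "jaguar car price and engine review"]
reranked = rerank(build_profile(history), results)
```

With this toy history, the car-related snippet moves ahead of the one about the animal, which is the kind of reordering the user profile is meant to produce.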

Undergraduate Thesis

Prepositional Phrase Attachment (January 2003 - January 2004)

Advisors: Dr. Bendre, Visiting Professor, and Dr. Sangal, Director, Professor, IIIT Hyd.

The task is to attach a prepositional phrase in a given sentence to a noun phrase or a verb phrase in that sentence. The approach combines supervised and unsupervised learning. Unsupervised learning is done using the Expectation Maximization algorithm. The attachment of the prepositional phrase is decided using scores calculated with WordNet, a lexical database, and back-off smoothing. The approach showed an improvement in performance over the state-of-the-art approaches.
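To make the back-off idea concrete, here is a minimal sketch in the style of the backed-off model of Collins and Brooks (1995): counts for the full (verb, noun1, preposition, noun2) tuple are consulted first, and progressively smaller tuples are used when nothing has been seen. This is a swapped-in, well-known technique shown only for illustration; the WordNet scores and the EM step of the project are not reproduced, and the training tuples are invented.

```python
from collections import Counter


def backoff_levels(v, n1, p, n2):
    """Tuples to consult, from most to least specific; every tuple keeps the preposition."""
    return [
        [(v, n1, p, n2)],
        [(v, n1, p, None), (v, None, p, n2), (None, n1, p, n2)],
        [(v, None, p, None), (None, n1, p, None), (None, None, p, n2)],
        [(None, None, p, None)],
    ]


def train(examples):
    """Count how often each tuple occurs, and how often it occurs with noun attachment."""
    total, noun = Counter(), Counter()
    for v, n1, p, n2, label in examples:
        for level in backoff_levels(v, n1, p, n2):
            for key in level:
                total[key] += 1
                if label == "N":
                    noun[key] += 1
    return total, noun


def attach(model, v, n1, p, n2):
    """Return "N" (noun attachment) or "V" (verb attachment), backing off as needed."""
    total, noun = model
    for level in backoff_levels(v, n1, p, n2):
        seen = sum(total[k] for k in level)
        if seen > 0:
            return "N" if sum(noun[k] for k in level) / seen >= 0.5 else "V"
    return "N"  # default when even the preposition is unseen


# Invented training tuples: (verb, noun1, preposition, noun2, attachment).
model = train([
    ("ate", "pizza", "with", "fork", "V"),
    ("ate", "pizza", "with", "anchovies", "N"),
    ("saw", "man", "with", "telescope", "V"),
    ("bought", "book", "of", "poems", "N"),
])
decision = attach(model, "ate", "salad", "with", "fork")
```

Here the full tuple is unseen, so the decision falls back to the triples, where "ate ... with fork" has only been observed with verb attachment.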

This was done as part of my undergraduate thesis, also called the Final Year Project (FYP). The approach was later extended to multiple PP attachment. Details can be found in A Hybrid Approach to Single and Multiple PP Attachment using WordNet (2005).

Creating Simulated Feedback (July 2005 - November 2006)

Advisor: Dr. Vasudeva Varma, Associate Professor, IIIT, Hyd.

Relevance feedback has been widely used in both explicit and implicit forms. Explicit feedback methods are expensive to conduct because they require a lot of manual effort, and they are difficult to scale up for gathering large volumes of data. Implicit feedback methods, on the other hand, are easy to conduct. Most search engines already have logs of user interactions over the web, which are a very valuable source for research on user feedback. However, such feedback is usually not available to the public, or even to the research community at large, for reasons such as the threat it poses to the privacy of individual web users. Motivated by these problems, we propose Simulated Feedback, which draws on insights from query log analysis and uses artificial methods to generate feedback. It is an area where the outcome of research can directly benefit research communities working on web search engines and personalization. Also, since implicit feedback data is not available, it is difficult to evaluate personalization algorithms. Simulated Feedback can help here and benefit the web search community.

Simulated feedback is created in two steps. In the first step, a simulated user is created; in the second, a web search process is simulated, mimicking a typical web search. We evaluated simulated feedback by comparing it with implicit feedback available from query logs and with explicit feedback from judges, and achieved significant results. The benefit of simulated feedback is twofold. First, it is easy to obtain, and the process of obtaining the feedback data is repeatable: given a document set and a search engine deployed on it, one can generate simulated feedback in much larger volumes than can be obtained by either explicit or implicit feedback methods. Second, it is customizable: a researcher can tailor the creation of the feedback to their purposes, be it testing search engines for specific domains, testing personalization algorithms, or testing query modification techniques.
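The two steps can be sketched roughly as follows. The class name, the click model (click probability grows with topical overlap and shrinks with rank), and the toy result list are all assumptions for illustration, not the model used in the project; the fixed random seed only demonstrates the repeatability mentioned above.

```python
import random


class SimulatedUser:
    """Step 1: a hypothetical simulated user described by a set of interest terms."""

    def __init__(self, interests, seed=0):
        self.interests = set(interests)
        self.rng = random.Random(seed)  # fixed seed keeps the generated feedback repeatable

    def click_probability(self, title, rank):
        """Assumed click model: higher for on-topic titles, lower further down the list."""
        words = set(title.lower().split())
        overlap = len(words & self.interests) / max(len(words), 1)
        return overlap / rank

    def search_session(self, query, ranked_titles):
        """Step 2: scan the result list top-down and record simulated clicks."""
        clicks = []
        for rank, title in enumerate(ranked_titles, start=1):
            if self.rng.random() < self.click_probability(title, rank):
                clicks.append((query, rank, title))
        return clicks


# Toy result list (assumed); in practice it would come from a deployed search engine.
results = ["jaguar car review", "jaguar cat facts", "used car prices"]
log_a = SimulatedUser({"car", "review", "prices"}, seed=42).search_session("jaguar", results)
log_b = SimulatedUser({"car", "review", "prices"}, seed=42).search_session("jaguar", results)
```

Because both users share the same seed and interests, `log_a` and `log_b` are identical, and changing the interest set or the click model is how a researcher would customize the generated feedback.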

Collaborative Web Search

Advisor: Dr. Vasudeva Varma, Associate Professor, IIIT, Hyd.

Collaborative Web Search, motivated by collaborative filtering and recommendation systems, makes use of like-minded users. Like-minded users are a group of users who would be interested in similar results for similar queries. The search history of the like-minded users is used to improve search results.
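A minimal sketch of this idea, under stated assumptions: like-mindedness is approximated by Jaccard overlap between sets of clicked URLs, and results are promoted by the number of like-minded users who clicked them for the same query. The similarity measure, the threshold, the user names, and the URLs are all hypothetical.

```python
from collections import Counter


def jaccard(a, b):
    """Overlap between two sets of clicked URLs."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def clicked_urls(by_query):
    """All URLs a user clicked, across every query in their history."""
    return {url for urls in by_query.values() for url in urls}


def like_minded(user, histories, threshold=0.3):
    """Other users whose click history overlaps with `user` above an assumed threshold."""
    mine = clicked_urls(histories[user])
    return [
        other for other, by_query in histories.items()
        if other != user and jaccard(mine, clicked_urls(by_query)) >= threshold
    ]


def rerank(user, query, results, histories):
    """Promote results that like-minded users clicked for the same query."""
    votes = Counter()
    for other in like_minded(user, histories):
        votes.update(histories[other].get(query, []))
    return sorted(results, key=lambda url: votes[url], reverse=True)


# Toy click histories (assumed): user -> query -> clicked URLs.
histories = {
    "asha": {"engine": ["cars.example/engines"], "tyres": ["cars.example/tyres"]},
    "ravi": {"engine": ["cars.example/engines"], "tyres": ["cars.example/tyres"],
             "jaguar": ["cars.example/jaguar"]},
    "meena": {"engine": ["trains.example/engines"], "jaguar": ["zoo.example/jaguar"]},
}
ranked = rerank("asha", "jaguar", ["zoo.example/jaguar", "cars.example/jaguar"], histories)
```

In this toy data only "ravi" shares clicks with "asha", so the result he clicked for "jaguar" is moved to the top.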

Search in Presence of Errors

Worked with: Mr. Vamshi Ambati, Technical Director, Digital Library of India

As Digital Libraries become widely accepted and created, retrieval in these collections becomes necessary. However, one problem the content poses is the presence of errors introduced by Optical Character Recognizers (OCR). Related approaches have so far treated this problem as correction of the documents followed by retrieval. In this project we are exploring probabilistic models based on statistical language modeling that do not assume an explicit error-correction phase for the text.

Digital Library Reading Assistant

Worked with: Mr. Vamshi Ambati, Technical Director, Digital Library of India.

The development of technologies that enable access to information regardless of geographic or language barriers is a key factor for the success of digital libraries. This project deals with providing multilingual access to digital libraries. It provides a multilingual information retrieval system and a multilingual reading assistant that helps users read a book written in a different language. The two primary subcomponents of such a tool are one that translates the requested word, phrase, sentence or paragraph, and an interface that lets a user provide feedback whenever unsatisfied with the translation.

Cross Lingual Information Retrieval (CLIR) (January 2005 - April 2005)

Advisor: Dr. Vasudeva Varma, Associate Professor, IIIT, Hyd.

The objective of the project is to build a Cross Lingual Information Retrieval system. Given a query in one language (the source language), documents in another language (the target language) are retrieved. The focus of the current project is retrieving English documents for Hindi queries.

It involves the following important steps: preprocessing of the queries, query translation, document retrieval, and ranking of the results. Shabdanjali, a manually built Hindi-English dictionary, is used in the query translation process. Another dictionary was built automatically from a couple of Hindi-English parallel corpora such as India Today and Emilee. The system is designed so that cross-lingual or multilingual retrieval involving other languages, for example Telugu-English CLIR or Telugu-Hindi CLIR, can be easily incorporated.
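The steps above can be sketched as a small pipeline. Everything here is an assumption for illustration: the romanized toy dictionary stands in for a real Hindi-English dictionary (it is not taken from Shabdanjali), and ranking by counting matched terms stands in for whatever ranking function the system used.

```python
def preprocess(query):
    """Step 1: normalise the query into lowercase tokens."""
    return query.lower().split()


def translate(words, dictionary):
    """Step 2: replace each source word with all its translations; keep unknown words as-is."""
    translated = []
    for w in words:
        translated.extend(dictionary.get(w, [w]))
    return translated


def retrieve_and_rank(terms, documents):
    """Steps 3 and 4: keep documents that contain a query term, ordered by match count."""
    scored = []
    for doc_id, text in documents.items():
        tokens = text.lower().split()
        score = sum(tokens.count(t) for t in terms)
        if score > 0:
            scored.append((score, doc_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]


# Toy romanized Hindi-English dictionary and English documents (assumed).
toy_dictionary = {"kisan": ["farmer"], "kheti": ["farming", "agriculture"]}
documents = {
    "d1": "the farmer adopted new farming methods",
    "d2": "cricket results from the weekend",
    "d3": "agriculture policy announced",
}
hits = retrieve_and_rank(translate(preprocess("Kisan kheti"), toy_dictionary), documents)
```

Keeping translation behind a plain dictionary lookup is one way to make the pipeline reusable for another language pair: only the dictionary passed to `translate` would change.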

User profile learning for Personalized Web Search using Machine Learning

Course: Machine Learning (Spring 2004)

Instructor: Dr. C. V. Jawahar, Associate Professor, IIIT, Hyd.

The objective of the project was to learn user profiles using machine learning algorithms. A user profile captures the user's interests in terms of the documents the user liked. User profile learning is treated as a binary classification task with two classes: the relevant class, which describes the documents liked by the user, and the irrelevant class, which describes the documents not liked by the user. SVM (Support Vector Machine), the state-of-the-art machine learning algorithm for text classification, was used for user profile learning, and its effectiveness was studied.
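As a hedged illustration of the binary classification setup, the sketch below trains a small linear SVM by subgradient descent on the L2-regularised hinge loss, using binary bag-of-words features. It is a dependency-free stand-in, not the SVM package or feature set used in the project; the hyperparameters and the toy documents are assumptions.

```python
def featurize(text):
    """Binary bag-of-words features, represented as a set of tokens."""
    return set(text.lower().split())


def train_linear_svm(examples, epochs=20, lr=0.1, lam=0.01):
    """Minimise the L2-regularised hinge loss by subgradient descent.

    `examples` is a list of (text, label) pairs, label +1 (relevant) or -1 (irrelevant).
    """
    weights = {}
    for _ in range(epochs):
        for text, label in examples:
            feats = featurize(text)
            margin = label * sum(weights.get(f, 0.0) for f in feats)
            # Regularisation step: shrink every weight slightly.
            for f in weights:
                weights[f] *= 1.0 - lr * lam
            # Hinge-loss step: update only when the margin is violated.
            if margin < 1.0:
                for f in feats:
                    weights[f] = weights.get(f, 0.0) + lr * label
    return weights


def is_relevant(weights, text):
    """Classify a document as relevant when its score is positive."""
    return sum(weights.get(f, 0.0) for f in featurize(text)) > 0.0


# Toy user history (assumed): documents the user liked (+1) and did not like (-1).
profile = train_linear_svm([
    ("python svm tutorial", +1),
    ("machine learning with svm", +1),
    ("celebrity gossip news", -1),
    ("football match gossip", -1),
])
```

The learned weight vector plays the role of the user profile: `is_relevant(profile, "svm learning notes")` scores a new document against it.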

Other Major Projects

Extracting Patterns and Relations using NLP

Course: Web data and Knowledge Management (Fall 2004)

Instructor: Dr. P. K. Reddy, Associate Professor, IIIT, Hyd.

The project was based on the paper Extracting Patterns and Relations from the World Wide Web by Sergey Brin (1998). The approach described in the paper was used to automatically extract player information (player name, sport, and country) from a newspaper corpus collected from various sources such as The Hindu and Times of India. The approach was modified with Natural Language Processing (NLP) techniques to suit the requirement. Use of NLP techniques showed an improvement in performance.

Text Classification using NLP

Course: Pattern Recognition (Spring 2004)

Instructor: Dr. P. J. Narayanan, Dean (R&D) Professor, IIIT, Hyd.

The project involved developing a text classification system. A variety of NLP-based feature extraction methods were investigated and their effectiveness was studied. A Naive Bayes classifier was used for classification, and the Reuters corpus was used for experiments.
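For illustration, a minimal multinomial Naive Bayes classifier with add-one smoothing over plain word features might look like the sketch below. The training sentences and labels are invented stand-ins, not Reuters data, and the NLP-based features studied in the project are not reproduced.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labelled_docs):
    """Collect per-class word counts, per-class document counts, and the vocabulary."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    vocab = set()
    for text, label in labelled_docs:
        words = text.lower().split()
        word_counts[label].update(words)
        doc_counts[label] += 1
        vocab.update(words)
    return word_counts, doc_counts, vocab


def classify(model, text):
    """Return the class with the highest log-posterior under add-one smoothing."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, float("-inf")
    for label in sorted(doc_counts):
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)
        for w in text.lower().split():
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][w] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented training documents with two topic labels.
model = train_naive_bayes([
    ("oil prices rise", "energy"),
    ("crude oil exports fall", "energy"),
    ("wheat harvest grows", "grain"),
    ("corn and wheat exports", "grain"),
])
label = classify(model, "oil exports")
```

Swapping the `text.lower().split()` tokenisation for a different feature extractor is where alternative feature sets would plug in.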

Pronoun Co-reference resolution using the centering approach

Course: Introduction to Natural Language Processing (Fall 2003)

Instructor: Prof. Rajeev Sangal, Director, Professor, IIIT Hyd.

The project was based on the paper 'A Centering Approach to Pronouns' by Susan Brennan et al. The paper presents an algorithm to track the discourse context and bind pronouns to the corresponding entities introduced in the discourse. The project involved implementing the algorithm and evaluating its effectiveness in diverse discourses.

Virus Detection System

Semester Project

Advisor: Dr. Ram Murthy, Associate Professor, IIIT, Hyd.

This was done as a semester project in Fall 2002 during my undergraduate studies. The objective of the project is to simulate a system that checks for the existence of a virus in an incoming packet and informs the user about it. The system has a pre-fed table of viruses and their definitions. Every incoming packet is checked against this set of definitions, and if a virus is found, a message is displayed informing the user about the source of the virus.
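The check described above can be sketched as a lookup of each packet payload against a table of byte signatures. The signature names, byte patterns, and addresses below are invented for illustration, and plain substring matching is an assumption about how definitions are compared.

```python
# Hypothetical pre-fed table: virus name -> byte pattern that identifies it.
SIGNATURES = {
    "demo-virus-a": b"\xde\xad\xbe\xef",
    "demo-virus-b": b"EVIL_PAYLOAD",
}


def scan_packet(source, payload, signatures=SIGNATURES):
    """Return one warning message per signature found in the payload."""
    alerts = []
    for name, pattern in signatures.items():
        if pattern in payload:
            alerts.append(f"Virus '{name}' detected in packet from {source}")
    return alerts


# Simulated incoming packets as (source address, payload) pairs.
packets = [
    ("10.0.0.5", b"hello EVIL_PAYLOAD world"),
    ("10.0.0.6", b"a clean packet"),
]
messages = [msg for source, payload in packets for msg in scan_packet(source, payload)]
```

Only the first packet matches a definition, so `messages` holds a single warning that names its source.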

Rohini U