Mihir Shekhar

M.S. by Research Student

Center for Data Engineering

IIIT - Hyderabad

Telangana, India

E-mail : [firstname]dot[lastname]@ research.iiit.ac.in

Phone: 0091 9581 826 727

About Me

I am an M.S. by Research student working with Prof. Kamalakar Karlapalem at the International Institute of Information Technology, Hyderabad (IIIT-H). I received my Bachelor's degree in Computer Science and Engineering from Jalpaiguri Government Engineering College. Before starting my Master's, I worked at Tata Consultancy Services, Kolkata, as a software engineer. My desire to pursue higher studies then brought me to IIIT-H, where I work in the Center for Data Engineering (CDE).

My research focuses on clustering. I am currently working on efficient, automatic methods for online clustering using graph-based representations that can handle diverse datasets. I am also working on generating efficient representatives of a dataset. As a research associate in CDE, I also work on various projects that use text mining, and I am currently building a statistical parser for tagging medical terms with their corresponding diseases.

My research interests include the application of machine learning and graph-based techniques to problems in data mining, information retrieval, and information extraction. In my free time I love to listen to music, play chess or badminton, and cook innovative dishes.


Research Interests:

  • Data Mining
  • Information Extraction and Retrieval
  • Graph Theory
  • Natural Language Processing
  • Machine Learning

Publications:

  • K Santosh, Romil Bansal, Mihir Shekhar, Vasudeva Varma, Author Profiling: Predicting Age and Gender from Blogs, Notebook for PAN at CLEF 2013, Valencia, Spain.

Current & Past Projects:

  • Medical Document Analysis (current)
  • This project involves creating a statistical parser for medical documents that assigns medical terms such as drugs and symptoms to their corresponding diseases. This project is funded by Hitachi R&D Labs.
  • Twitter Data Analysis
  • This project involves retrieving semantic knowledge from tweets. The semantics of interest are event/episode detection, sentiment analysis, and concept extraction. This project is funded by Hitachi R&D Labs.
  • Author Profiling on blogs
  • This project involves predicting the age and gender of authors from the blogs they write. SVMs and decision trees were used for classification.
  • Web Content Filtering
  • This project involves creating an automatic system for categorizing web pages into classes based on their content. A web content filter built on top of it blocks undesired categories dynamically.
  • Stack Overflow Tag Prediction
  • This project involves predicting tags for Stack Overflow data. A k-nearest-neighbor approach and HMMs built on the tag graph were used to predict results.
  • Finding the Most Influential Entities on the Web
  • This project involves finding and ranking the most influential people among a group of Baidu users while differentiating between fake and genuine users. GraphChi was used to implement the algorithms for scalability and speed. It can process a billion entities in 15 minutes.
  • Data Mining on Accidents Dataset
  • This project involved pre-processing a large dataset, followed by clustering with the K-Means and DBSCAN algorithms and frequent itemset generation to analyse trends in the occurrence of traffic casualties under several conditions.
  • Wikipedia Search Engine
  • Created a fully functional offline search engine over a 42 GB Wikipedia corpus. Multilevel indices were built on page title, infobox, text, and outlinks to support multi-field queries. Everything was built from scratch, without existing tools such as Lucene, Lemur, or the wikixmlj parser.


Work Experience