Text-To-Speech synthesis through Voice conversion using minimal data     Thesis, 2010-present
Advisor: Dr. S P Kishore
The thesis aims at building high-quality statistical parametric speech synthesizers using only a very small training database. This requires transforming spectral and excitation features, as well as non-linear duration modeling, to convert a synthetic voice into that of a specific target speaker. As part of this work, I performed phone-level prosody modeling in voice conversion, both with and without transcriptions of the target speech. When only speech is available, I used the Sphinx recognizer together with segmentation and clustering methods to obtain a symbolic representation; when transcriptions are available, Viterbi forced alignment of the speech against the transcription yields the corresponding phone sequence.

Extraction of cues for Language Identification and Accent Modelling
Advisors: Dr. Kishore S Prahallad, Prof. Peri Bhaskara Rao
Automatic language identification can serve as a pre-processing step for either machines or human listeners. For machines, it can front a multilingual voice-controlled information retrieval system; for human listeners, it can route an incoming telephone call to a switchboard operator who is fluent in the caller's language. In this thesis, I analysed different Indian languages based on prosodic and spectral information. Unlike many international languages, most Indian languages share a similar phoneme set, which makes discriminating between them at the word level a challenging problem. I therefore extracted cues based on phoneme durations, intonation, intensity, rhythm and stress, and later used these differences when modelling speech to change its accent.
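
Duration-based rhythm cues of this kind are often summarized with the normalized Pairwise Variability Index (nPVI), computed over successive vowel or syllable durations. The sketch below computes it from a list of positive durations, assumed to come from a prior segmentation step; nPVI is one widely used rhythm measure and is not necessarily the cue used in this thesis.

```python
def npvi(durations: list[float]) -> float:
    """Normalized Pairwise Variability Index of successive (positive) durations."""
    if len(durations) < 2:
        raise ValueError("need at least two durations")
    # Each term is the absolute difference of a neighbouring pair,
    # divided by the pair's mean duration.
    pairs = zip(durations[:-1], durations[1:])
    total = sum(abs(a - b) / ((a + b) / 2) for a, b in pairs)
    return 100.0 * total / (len(durations) - 1)
```

Perfectly even timing gives 0, and larger values indicate stronger alternation between long and short intervals, which is one way such a cue could separate languages that share a phoneme set.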

Duration Modelling In Voice Conversion Using ANN         Semester Project, 2011
Advisor: Dr. Kishore S Prahallad
Voice conversion aims at transforming the characteristics of a speech signal uttered by a source speaker so that the transformed speech sounds like a target speaker. Such a conversion requires transformation of both spectral and prosodic features. In this project, we propose a technique for transforming the durations of the source speaker to those of the target speaker, within the framework of artificial neural network (ANN) based voice conversion. Subjective and objective evaluations confirm that incorporating duration modification into voice transformation improves voice quality and produces speech with the characteristics of the target speaker.

A Comparison of Prosody Modification Using Two Methods         Semester Project, 2010
Advisors: Dr. Biksha Raj, Dr. Kishore Prahallad and Prof. B Yegnanarayana
In this project, we compare two methods for prosody (duration and pitch) modification: prosody modification using instants of significant excitation, and a Mel-cepstral vocoder. We show that duration modifications are better with the Mel-cepstral vocoder at higher modification factors, while pitch modifications are better using instants of significant excitation. Finally, we show that the Mel-cepstral vocoder provides flexibility for non-uniform prosody manipulation.

Hand-written Digits Recognition         Course Project 2010
Advisor: Dr. Anoop
In this project, we addressed the problem of classifying handwritten numeric characters (0-9). We used the LNKnet classification tool to train an ANN classifier on features extracted from each image, so our main aim was to derive the most discriminative features for classification. Several feature sets were extracted and the classifier was first trained on them; a test image is then classified using the same features extracted from it.
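
The project description does not list its feature sets, so the sketch below shows zoning (mean ink density per grid cell) only as an example of the kind of per-image feature vector that could be fed to an ANN classifier. It assumes a grayscale digit image with ink near 1 and background near 0; the function name and grid size are hypothetical.

```python
import numpy as np

def zoning_features(image: np.ndarray, grid: int = 4) -> np.ndarray:
    """Mean ink density in each cell of a grid x grid partition of a digit image."""
    h, w = image.shape
    # array_split tolerates image sizes that are not multiples of grid.
    rows = np.array_split(np.arange(h), grid)
    cols = np.array_split(np.arange(w), grid)
    feats = [image[np.ix_(r, c)].mean() for r in rows for c in cols]
    return np.asarray(feats, dtype=float)
```

Each image yields a fixed-length vector (16 values for the default 4 x 4 grid) regardless of its pixel size, which is what a classifier's input layer needs.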

Pronunciation Checker         Winter School Project, 2009-2010
Advisor: Dr. Biksha Raj
This project involved implementing a pronunciation checker. A correctly pronounced word was taken as the reference and compared with the input utterance using the Dynamic Time Warping (DTW) algorithm and a Vector Quantization (VQ) codebook approach.
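
A minimal sketch of the DTW comparison, assuming both utterances have already been converted to frame-level feature arrays (for example MFCCs) of shape (frames, dims). The length normalization and the acceptance threshold are hypothetical choices; a real checker would tune the threshold on labelled data, and the VQ codebook approach is not shown.

```python
import numpy as np

def dtw_distance(ref: np.ndarray, test: np.ndarray) -> float:
    """Length-normalized DTW distance between two (frames, dims) feature arrays."""
    n, m = len(ref), len(test)
    # Euclidean distance between every reference frame and every test frame.
    local = np.linalg.norm(ref[:, None, :] - test[None, :, :], axis=-1)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i, j] = local[i - 1, j - 1] + min(
                cost[i - 1, j],      # advance in the reference only
                cost[i, j - 1],      # advance in the test only
                cost[i - 1, j - 1],  # advance in both
            )
    return float(cost[n, m] / (n + m))

def is_pronounced_correctly(ref: np.ndarray, test: np.ndarray,
                            threshold: float = 1.0) -> bool:
    # Hypothetical decision rule: accept if the warped distance is small enough.
    return dtw_distance(ref, test) <= threshold
```

Because the warping path may repeat frames, a word spoken more slowly but otherwise identically still scores zero, which is why DTW suits comparing utterances of different lengths.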

Phoneme boundaries based Automatic Speech Segmentation         Summer 2009
Advisor: Dr. Kishore Prahallad
In this project, we proposed an algorithm for automatic segmentation and labelling of an English speech database. In the proposed method, a dissimilarity measure (the difference between two consecutive frames) is first computed on the speech features to obtain a more robust feature, and a threshold is then applied to decide the segmentation points. Experimental results show that the method detects up to 70% of the phoneme boundaries. Accuracy was further improved by combining additional segmentation techniques.
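
The dissimilarity-plus-threshold step can be sketched as below, assuming a (frames, dims) feature array and using the Euclidean distance between consecutive frames as the dissimilarity. Keeping only local maxima above the threshold is an added assumption (it stops one transition from producing several adjacent boundaries), and the threshold value is illustrative.

```python
import numpy as np

def segment_boundaries(features: np.ndarray, threshold: float) -> list[int]:
    """Frame indices where a new segment starts, from consecutive-frame dissimilarity."""
    # d[t] is the distance between frame t and frame t + 1.
    d = np.linalg.norm(np.diff(features, axis=0), axis=1)
    boundaries = []
    for t in range(len(d)):
        left = d[t - 1] if t > 0 else -np.inf
        right = d[t + 1] if t + 1 < len(d) else -np.inf
        # Keep peaks of the dissimilarity curve that exceed the threshold.
        if d[t] > threshold and d[t] >= left and d[t] > right:
            boundaries.append(t + 1)  # boundary at the first frame of the new segment
    return boundaries
```

On features that are constant within each segment, the dissimilarity is zero everywhere except at the transitions, so the returned indices are exactly the segment starts.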

Morphological Background Detection and Enhancement         Course Project 2009
Advisor: Dr. Jayanthi Sivaswamy
In this project, we first estimated the background of the image using various morphological methods. By comparing the detected background with the original image, we applied a non-linear mapping to the pixel intensities to obtain an enhanced image.
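
One possible version of this pipeline is sketched below: a grey-level opening with a flat square structuring element estimates the background, the difference from the original (a white top-hat) isolates foreground detail, and a gamma curve serves as the non-linear mapping. The choice of opening, the odd window size and the gamma value are assumptions, not the project's documented settings; only plain NumPy is used.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def _window_reduce(img: np.ndarray, size: int, reduce) -> np.ndarray:
    """Apply np.min or np.max over a size x size window around each pixel."""
    pad = size // 2
    windows = sliding_window_view(np.pad(img, pad, mode="edge"), (size, size))
    return reduce(windows, axis=(-2, -1))

def enhance(image: np.ndarray, size: int = 15, gamma: float = 0.5) -> np.ndarray:
    if size % 2 == 0:
        raise ValueError("size must be odd so the output keeps the input shape")
    img = np.asarray(image, dtype=float)
    # Grey-level opening (erosion, then dilation) removes bright details smaller
    # than the structuring element, leaving an estimate of the background.
    background = _window_reduce(_window_reduce(img, size, np.min), size, np.max)
    detail = img - background  # white top-hat: what the background lacks
    span = detail.max() - detail.min()
    if span == 0:
        return np.zeros_like(img)
    # Normalize to [0, 1], then apply a non-linear (gamma) intensity mapping.
    return ((detail - detail.min()) / span) ** gamma
```

With gamma below 1, faint foreground detail is brightened more than strong detail, while a slowly varying illumination gradient is largely removed by the background subtraction.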

Time Scale Modification         Monsoon 2009
Time Scale Modification (TSM) is the process of speeding up or slowing down a sound without affecting its pitch. Time-scaled speech should sound as if the same speaker were speaking at a slower or faster rate. I implemented two algorithms for this task: SOLA (Synchronized Overlap-Add) and PSOLA (Pitch-Synchronous Overlap-Add).
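
SOLA can be sketched as follows, assuming a mono signal in a NumPy array. Fixed-length frames are read at an analysis hop and placed at a synthesis hop scaled by `alpha`; each frame's placement is shifted within a small search range to maximize normalized cross-correlation with the output so far, and a linear crossfade joins them. The function name and the default frame, hop and search sizes are illustrative choices, not values from the original project, and PSOLA is not shown.

```python
import numpy as np

def sola(x: np.ndarray, alpha: float, frame: int = 1024,
         hop: int = 256, search: int = 128) -> np.ndarray:
    """Time-scale a mono signal by alpha (alpha > 1 slows it down) using SOLA."""
    syn_hop = int(round(alpha * hop))  # spacing of frames in the output
    if not 0 < syn_hop < frame:
        raise ValueError("alpha * hop must lie between 1 and frame - 1 samples")
    x = np.asarray(x, dtype=float)
    out = x[:frame].copy()  # the first frame is copied unchanged
    m = 1
    while m * hop + frame <= len(x):
        seg = x[m * hop : m * hop + frame]  # next analysis frame
        nominal = m * syn_hop                # placement without alignment
        best_start, best_score = None, -np.inf
        for k in range(-search, search + 1):
            start = nominal + k
            overlap = len(out) - start
            if start < 0 or not 0 < overlap <= frame:
                continue
            a, b = out[start:], seg[:overlap]
            # Normalized cross-correlation between the output tail and the new frame.
            score = np.dot(a, b) / (np.sqrt(np.dot(a, a) * np.dot(b, b)) + 1e-12)
            if score > best_score:
                best_start, best_score = start, score
        overlap = len(out) - best_start
        fade = np.linspace(0.0, 1.0, overlap)  # linear crossfade weights
        mixed = out[best_start:] * (1.0 - fade) + seg[:overlap] * fade
        out = np.concatenate([out[:best_start], mixed, seg[overlap:]])
        m += 1
    return out
```

Because frames are copied rather than resampled, the local waveform period (and hence the pitch) is preserved while the output length scales by roughly `alpha`; the correlation search keeps overlapping frames in phase so the crossfade does not cancel the signal.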

Other Projects:

  • Design of GUI in Python for audio recording and manipulation
  • Design of 3D Building using OpenGL
  • Design of Asteroid Shooter Game using OpenGL
  • Design of Inter-College Football Tournament Website
  • Design of Courier Portal for IIIT-H
  • Design of Traffic Controller Using Ultrasonic Sensors