Projects
Below are some of the projects I have worked on in the past or am currently working on. They are grouped as 'NCSU - Grad School' and 'JIIT - Undergrad' according to the phase of my education in which I completed them.
NCSU - Grad School
Sentiment Analysis of Twitter Data
using Python, NLTK and scikit-learn.
Developed multi-classifier models to perform sentiment analysis of Twitter data from the Sentiment140 dataset and compared the performance of Naive Bayes, Maximum Entropy and SVM classifiers across different performance metrics.
Achieved results (~75% accuracy) comparable to existing approaches despite using a general corpus and default parameters.
Abstract:
Opinion mining, or sentiment analysis, is the technique of determining the attitude of a speaker with respect to a topic or to a document as a whole. Twitter is a popular platform where people express their opinions and sentiments. Sentiment analysis of Twitter data can help in searching for products, companies, movies, books, etc. based on the overall sentiment associated with them in tweets.
We present a sentiment analysis classification engine for Twitter data. We apply several preprocessing approaches to cleanse the data, extract useful features from it and then build a training model on it. We present a comparative model using Naive Bayes, Maximum Entropy and SVM classifiers, and we also collected results using the Random Forest and AdaBoost ensemble classifiers. We achieved an average accuracy of ~75% across experiments with different classifiers and preprocessing steps.
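As a rough illustration of the preprocess, extract features and train pipeline described above, here is a minimal sketch using only the Python standard library rather than NLTK or scikit-learn: a hand-written multinomial Naive Bayes with Laplace smoothing over bag-of-words features. The placeholder tokens `_url_` and `_user_`, the names `preprocess` and `NaiveBayes`, and the four-tweet training set are illustrative assumptions, not the project's actual code or data.

```python
import math
import re
from collections import Counter, defaultdict

def preprocess(tweet):
    """Lowercase, replace URLs and @mentions with placeholder tokens, drop '#', tokenize."""
    tweet = tweet.lower()
    tweet = re.sub(r"https?://\S+|www\.\S+", " _url_ ", tweet)
    tweet = re.sub(r"@\w+", " _user_ ", tweet)
    tweet = tweet.replace("#", "")
    return re.findall(r"[a-z_']+", tweet)

class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, tokens):
        n_docs = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for c, count in self.class_counts.items():
            score = math.log(count / n_docs)  # log prior
            denom = self.total_words[c] + len(self.vocab)
            for t in tokens:
                if t in self.vocab:  # words never seen in training are ignored
                    score += math.log((self.word_counts[c][t] + 1) / denom)
            if score > best_score:
                best, best_score = c, score
        return best

# Hypothetical toy training data.
train = [("I love this phone!", "pos"), ("What a great day :)", "pos"),
         ("I hate waiting @airline", "neg"), ("worst service ever http://t.co/x", "neg")]
model = NaiveBayes().fit([preprocess(t) for t, _ in train], [y for _, y in train])
print(model.predict(preprocess("great phone, love it")))
```

Swapping `NaiveBayes` for a Maximum Entropy or SVM model would leave `preprocess` unchanged, which is what makes a side-by-side comparison of classifiers straightforward.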
VoiceTravel - A Dialogue Assistant for Flight Search
using Python, NLTK, Django and Android SDK.
Developed a dialogue-based system that provides the user with a spoken interface for searching flights. Implemented the system as an Android application that communicates with a remote Django server, which performs language processing on user utterances through techniques such as named entity recognition and part-of-speech tagging.
Abstract:
VoiceTravel is an interactive flight search dialogue system designed for mobile devices running the Android operating system. The app accepts search criteria through voice input from the user and returns the flights that match them. In the current version of the application, the search criteria include origin city, destination city, date of travel and an optional time parameter. The criteria can be stated in regular English and in any order, and multiple criteria can be specified in a single utterance.
VoiceTravel leverages advances in computational linguistics and in the smartphone space to disrupt how air travel search is done.
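To show what "criteria in any order, several in one utterance" means for the parser, here is a deliberately simplified slot-filling sketch. It swaps the NER and part-of-speech tagging used in the project for plain regular expressions over a small hypothetical city list; `CITIES`, `parse_utterance` and the slot names are illustrative assumptions, not the VoiceTravel code.

```python
import re

# Hypothetical city list; the real system used NLTK-based NER instead of a fixed list.
CITIES = ["new york", "raleigh", "san francisco", "chicago", "boston"]
CITY = "|".join(CITIES)
MONTHS = ("january|february|march|april|may|june|july|august|"
          "september|october|november|december")

def parse_utterance(text):
    """Fill flight-search slots from one utterance; slots may appear in any order."""
    text = text.lower()
    slots = {}
    m = re.search(rf"\bfrom ({CITY})\b", text)
    if m:
        slots["origin"] = m.group(1)
    m = re.search(rf"\bto ({CITY})\b", text)
    if m:
        slots["destination"] = m.group(1)
    m = re.search(rf"\b({MONTHS}) (\d{{1,2}})(?:st|nd|rd|th)?\b", text)
    if m:
        slots["date"] = f"{m.group(1)} {int(m.group(2))}"
    # Optional time parameter, normalized to 24-hour HH:MM.
    m = re.search(r"\b(?:at|after|before|around) (\d{1,2})(?::(\d{2}))? ?(am|pm)\b", text)
    if m:
        hour = int(m.group(1)) % 12 + (12 if m.group(3) == "pm" else 0)
        slots["time"] = f"{hour:02d}:{int(m.group(2) or 0):02d}"
    return slots

print(parse_utterance("On December 5th I need a flight to Chicago from New York after 6 pm"))
```

Because each slot is searched independently, the order in which the user states the criteria does not matter; a missing slot simply stays absent so the dialogue manager could ask for it.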
Tag Based Data De-duplication over a Network
using C++, Pthreads and Socket Programming.
Developed an in-line data de-duplication scheme for a networked file system using fixed-size blocks. The system supports tag-based file retrieval, file locking and client-side caching for bandwidth optimization. It achieved a de-duplication factor of 0.7 across different test sets and was built on a robust underlying peer-to-peer network.
Abstract:
The motivation of our project is to build a data de-duplication scheme that enables an object repository in a network. A network is characterized by several hosts, all requesting concurrent access to files and adding new files (objects) to the network. To meet the clients' needs, we therefore need a computational solution that enables a central server to service these requests and store data using minimum storage space.
We came up with a client-server model in which the server breaks files into fixed-size blocks, creates a mapping between blocks and files, and calculates hash values for all of these blocks, which drive the de-duplication process. The difficult part is implementing this system over a network, with one server handling concurrent requests for the same file from different clients; this project aims to solve that problem. The project is somewhat analogous to the Amazon S3 repository, which stores de-duplicated objects and provides access to users on an on-demand basis.
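The server-side idea in the abstract (fixed-size blocks, a hash per block, a block-to-file mapping) can be sketched in a few lines. This is an in-memory Python illustration, not the original C++/Pthreads implementation: networking is omitted, SHA-256 and the 4096-byte default block size are assumptions, a `threading.Lock` stands in for the server's handling of concurrent requests, and `DedupStore` is a hypothetical name.

```python
import hashlib
import threading

class DedupStore:
    """In-memory server-side store: each unique block is kept once, keyed by its hash,
    and every file is recorded as an ordered list of block hashes."""

    def __init__(self, block_size=4096):  # assumed block size in bytes
        self.block_size = block_size
        self.blocks = {}   # block hash -> block bytes
        self.files = {}    # file name -> ordered list of block hashes
        self.lock = threading.Lock()  # serializes concurrent client requests

    def put(self, name, data):
        # Split into fixed-size blocks and hash each one outside the lock.
        chunks = [data[i:i + self.block_size] for i in range(0, len(data), self.block_size)]
        hashes = [hashlib.sha256(c).hexdigest() for c in chunks]
        with self.lock:
            for h, c in zip(hashes, chunks):
                self.blocks.setdefault(h, c)  # store only blocks not seen before
            self.files[name] = hashes

    def get(self, name):
        # Reassemble a file from its block hashes.
        with self.lock:
            return b"".join(self.blocks[h] for h in self.files[name])

    def logical_bytes(self):
        # Bytes the clients think they stored.
        with self.lock:
            return sum(len(self.blocks[h]) for hs in self.files.values() for h in hs)

    def stored_bytes(self):
        # Bytes actually kept after de-duplication.
        with self.lock:
            return sum(len(b) for b in self.blocks.values())

store = DedupStore(block_size=4)
store.put("a.txt", b"AAAABBBBCCCC")
store.put("b.txt", b"AAAABBBBDDDD")
print(store.logical_bytes(), store.stored_bytes())
```

The two files share their first two blocks, so 24 logical bytes are kept in 16 stored bytes. The sketch never frees blocks when a file is overwritten; a real store would need reference counts for that.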
Reasonable Care Database System
using Java, JDBC, Swing and Oracle.
Developed a database application for a university health center, with functionality such as making appointments, managing schedules, handling billing with credit card and insurance details, and recording medical history. All of this functionality was implemented using SQL behind an appropriate UI.
Abstract:
The ReasonableCare database system maintains medical reports, appointment information, etc. for the students and doctors who are part of the university. The system allows a student to book and cancel appointments with a doctor based on availability or the type of health problem. At any time, a student or a doctor can access their past and upcoming appointments, and doctors and nurses can read and update a student’s medical records. Every new student must get 3 vaccinations; if a student has not completed them by the end of the first semester, privileges on the student’s records are suspended. The system also maintains each student’s health insurance information and the co-payment the student must make, which depends on whether the appointment is with a general practitioner or a specialist, or on the reason for the visit. If the student has paid the deductible for the current year in full, the health center takes the health insurance information from the student’s account and bills the insurance company for the remaining balance. Students can consult nurses over the phone; these consultations are recorded in the system and are accessible to both the nurse and the student for future reference. Nurses can access the list of doctors a student has been seeing, or a specific doctor in case of a particular requirement.
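Two of the business rules above can be stated precisely as code: the vaccination deadline and the split of a visit's cost between student and insurer. The project implemented these in SQL behind a Java UI; this is only a Python sketch of the logic. The co-pay amounts in `COPAY` are hypothetical, the reason-of-visit adjustment is omitted, and the assumption that the student owes the full cost until the deductible is met is mine, not the source's.

```python
from dataclasses import dataclass
from datetime import date

REQUIRED_VACCINATIONS = 3
# Hypothetical co-pay amounts; the real fee schedule is not given in the project.
COPAY = {"general": 20.0, "specialist": 40.0}

def privileges_suspended(vaccinations_received, today, first_semester_end):
    """Suspend record privileges once the first semester ends without 3 vaccinations."""
    return today > first_semester_end and vaccinations_received < REQUIRED_VACCINATIONS

@dataclass
class Bill:
    student_pays: float
    insurer_pays: float

def bill_visit(cost, doctor_type, deductible_met):
    """Split a visit's cost into the student's share and the amount billed to the insurer."""
    copay = min(COPAY[doctor_type], cost)
    if deductible_met:
        # Deductible paid in full: the insurer is billed for the remaining balance.
        return Bill(student_pays=copay, insurer_pays=cost - copay)
    # Assumption: until the deductible is met, the student pays the whole cost.
    return Bill(student_pays=cost, insurer_pays=0.0)

print(bill_visit(100.0, "specialist", deductible_met=True))
print(privileges_suspended(2, date(2014, 1, 10), date(2013, 12, 15)))
```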
FIFA World Cup 2014 Prediction System
using Python, PyBrain and pandas.
Implemented a multi-layer perceptron neural network model to predict the outcome of the FIFA World Cup 2014. The model uses a single hidden node, and its features are derived from the participating teams' past performance at previous World Cups. Overall, the model achieved 98.5% correct classifications. I am currently working on extending the model with player-specific features and on also considering performance in non-World Cup matches between the participating teams.
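To make "a multi-layer perceptron with a single hidden node" concrete, here is a plain-Python sketch of that architecture trained with full-batch gradient descent and hand-derived backpropagation. It does not use PyBrain, and the two input features, their scaling and the six training rows are made-up placeholders for the real World Cup history features; `TinyMLP` is a hypothetical name.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class TinyMLP:
    """Inputs -> one sigmoid hidden node -> one sigmoid output (P(first team wins))."""

    def __init__(self, n_inputs, seed=0):
        rng = random.Random(seed)  # fixed seed for reproducible initial weights
        self.w1 = [rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
        self.b1 = 0.0
        self.w2 = rng.uniform(-0.5, 0.5)
        self.b2 = 0.0

    def forward(self, x):
        h = sigmoid(sum(w * xi for w, xi in zip(self.w1, x)) + self.b1)
        return h, sigmoid(self.w2 * h + self.b2)

    def loss(self, X, Y):
        """Mean binary cross-entropy, with the output clamped away from 0 and 1."""
        total = 0.0
        for x, t in zip(X, Y):
            y = min(max(self.forward(x)[1], 1e-12), 1 - 1e-12)
            total -= t * math.log(y) + (1 - t) * math.log(1 - y)
        return total / len(X)

    def train(self, X, Y, lr=0.5, epochs=2000):
        """Full-batch gradient descent with backpropagation through both sigmoids."""
        n = len(X)
        for _ in range(epochs):
            g_w1 = [0.0] * len(self.w1)
            g_b1 = g_w2 = g_b2 = 0.0
            for x, t in zip(X, Y):
                h, y = self.forward(x)
                d_out = y - t                          # dLoss/dz at the output (sigmoid + BCE)
                g_w2 += d_out * h
                g_b2 += d_out
                d_hid = d_out * self.w2 * h * (1 - h)  # chain rule through the hidden sigmoid
                for i, xi in enumerate(x):
                    g_w1[i] += d_hid * xi
                g_b1 += d_hid
            self.w1 = [w - lr * g / n for w, g in zip(self.w1, g_w1)]
            self.b1 -= lr * g_b1 / n
            self.w2 -= lr * g_w2 / n
            self.b2 -= lr * g_b2 / n

# Made-up rows: (difference in past World Cup wins, difference in past World Cup goals),
# both scaled to [-1, 1]; label 1 means the first team won.
X = [[0.8, 0.6], [0.5, 0.9], [0.9, 0.2], [-0.7, -0.4], [-0.3, -0.8], [-0.9, 0.1]]
Y = [1, 1, 1, 0, 0, 0]
model = TinyMLP(n_inputs=2)
before = model.loss(X, Y)
model.train(X, Y)
after = model.loss(X, Y)
print(round(before, 3), round(after, 3))
```

With only one hidden node, the network's decision boundary is a single line in feature space, so this architecture behaves much like logistic regression with a learned output squashing.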
Comparative Analysis of Graph Search Algorithms
using C++
Presented a comparative analysis of graph search algorithms such as A*, Greedy Search, Uniform Cost Search, Breadth-First Search and Depth-First Search by running each algorithm on a map of the United States. Reported results in terms of the distance travelled and the number of intermediate cities needed to reach the destination from the source. All algorithms were implemented from scratch.
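Uniform Cost Search and A* differ only in the priority used to order the frontier, which is why they compare cleanly on the same map. The sketch below shows both with one function (in Python rather than the project's C++); the five-node graph, its coordinates and the straight-line heuristic are hypothetical stand-ins for the US map, and the other three algorithms are not shown.

```python
import heapq
import math

def best_first(graph, start, goal, h=lambda n: 0.0):
    """Uniform Cost Search when h is 0 everywhere; A* when h is an admissible heuristic.
    Returns (path, total_distance), or (None, inf) if goal is unreachable."""
    frontier = [(h(start), 0.0, start, [start])]  # (priority, cost so far, node, path)
    best_g = {start: 0.0}
    while frontier:
        _, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, g
        if g > best_g.get(node, math.inf):
            continue  # stale queue entry; a cheaper route to node was already found
        for nbr, cost in graph[node].items():
            g2 = g + cost
            if g2 < best_g.get(nbr, math.inf):
                best_g[nbr] = g2
                heapq.heappush(frontier, (g2 + h(nbr), g2, nbr, path + [nbr]))
    return None, math.inf

# Hypothetical map: edge lengths are straight-line distances, so the heuristic is admissible.
COORDS = {"A": (0, 0), "B": (2, 0), "C": (2, 2), "D": (4, 1), "E": (5, 3)}
EDGES = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("C", "E"), ("D", "E")]

def euclid(a, b):
    return math.dist(COORDS[a], COORDS[b])

graph = {n: {} for n in COORDS}
for u, v in EDGES:
    graph[u][v] = graph[v][u] = euclid(u, v)

for name, heuristic in [("UCS", lambda n: 0.0), ("A*", lambda n: euclid(n, "E"))]:
    path, dist = best_first(graph, "A", "E", heuristic)
    print(name, path, round(dist, 3), "intermediate cities:", len(path) - 2)
```

Both searches return the same optimal route here; the difference shows up in how many nodes each expands before reaching the goal, which A* keeps smaller by steering toward the destination.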
JIIT - Undergrad
Sign Language Gesture Recognition using Microsoft Kinect
using C++, Python, OpenCV and LIBSVM.
Developed a framework for sign language gesture recognition using machine learning and computer vision algorithms, building a characteristic depth and motion profile for each gesture from depth images alone, captured with a Microsoft Kinect. Published in IEEE.
Abstract:
In the last decade, a lot of effort has been made by the research community to create sign language recognition systems, which provide a medium of communication for differently-abled people, and whose machine translations help others who have trouble understanding such sign languages. Computer vision and machine learning can be applied together to create such systems. In this paper, we present a sign language recognition system that uses depth images captured with a Microsoft Kinect® camera. Using computer vision algorithms, we develop a characteristic depth and motion profile for each sign language gesture. The resulting feature matrix was used to train a multi-class SVM classifier, and the final results were compared with existing techniques. The dataset consists of sign language gestures for the digits 0-9.
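The abstract does not spell out how the depth and motion profile is computed, so the following is only one plausible construction, labeled as an assumption: average the depth frames of a clip, accumulate per-pixel absolute frame-to-frame differences as a motion map, pool both maps over a coarse grid and concatenate them into one feature vector that a multi-class SVM (LIBSVM in the project) could be trained on. Frames are plain nested lists of depth values, and `grid_means`, `gesture_features` and the grid size are hypothetical.

```python
def grid_means(frame, cells):
    """Average a 2-D map over a cells x cells grid (dimensions assumed divisible by cells)."""
    rows, cols = len(frame), len(frame[0])
    ch, cw = rows // cells, cols // cells
    out = []
    for gi in range(cells):
        for gj in range(cells):
            block = [frame[r][c]
                     for r in range(gi * ch, (gi + 1) * ch)
                     for c in range(gj * cw, (gj + 1) * cw)]
            out.append(sum(block) / len(block))
    return out

def gesture_features(frames, cells=2):
    """Hypothetical depth + motion profile for one gesture clip (list of depth frames)."""
    rows, cols, n = len(frames[0]), len(frames[0][0]), len(frames)
    # Depth profile: mean depth of each pixel over the clip.
    mean_depth = [[sum(f[r][c] for f in frames) / n for c in range(cols)]
                  for r in range(rows)]
    # Motion profile: total absolute depth change of each pixel between consecutive frames.
    motion = [[sum(abs(frames[k + 1][r][c] - frames[k][r][c]) for k in range(n - 1))
               for c in range(cols)] for r in range(rows)]
    return grid_means(mean_depth, cells) + grid_means(motion, cells)

clip = [[[10, 10], [10, 10]],
        [[10, 20], [10, 10]]]
print(gesture_features(clip, cells=2))
```

Pooling over a grid keeps the vector length fixed regardless of clip length, which is what a standard SVM needs; the grid size trades spatial detail against feature dimensionality.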
Network Traffic Analyzer
using Python, OTcl and MATLAB.
A classifier that uses machine learning techniques to classify incoming network traffic by its source type, based on features such as throughput, packet length and packet inter-arrival time, without using port number information. Results were obtained by performing experiments on self-generated traffic flow datasets.
Abstract:
Traffic classification plays a vital role in tasks as wide-ranging as trend analysis, adaptive network-based QoS marking of traffic, dynamic access control and lawful interception. Traditionally, it has been performed using port- and payload-based analysis, but recent years have seen increased interest in the development of machine learning techniques for classification.
Much of the existing research focuses on the achievable classification accuracy of different machine learning algorithms. These experiments have used different (and thus not comparable) datasets and features. The process of defining appropriate features and performing feature selection, and the influence of this on classification and computational performance, has not been studied.
With this project we recognize that real-time traffic classifiers operate under constraints that limit the number and type of features that can be calculated. On this basis we define 9 flow features that are simple to compute and well understood within the networking community:
Flow duration
Packet length
Inter-arrival time of packets
Number of packets
Throughput
Packet size (minimum, maximum, average, standard deviation)
We evaluate the classification accuracy and computational performance of the K-Nearest Neighbours (K-NN), Naïve Bayes and multi-class Logistic Regression algorithms using the 9 features, divided into three reduced feature vectors.
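The flow features listed above are cheap to compute from per-packet timestamps and sizes, which is the point of the real-time constraint. Here is a minimal sketch that turns one flow into a 9-feature dictionary a K-NN, Naïve Bayes or logistic regression model could consume. Two readings are assumptions: "packet length" is taken as the total bytes in the flow, and "inter-arrival time" as the mean gap between packets; throughput is reported in bits per second. `flow_features` is a hypothetical name.

```python
import statistics

def flow_features(packets):
    """packets: list of (timestamp_seconds, size_bytes) for one flow, sorted by time.
    Returns the 9 flow features as a dict."""
    times = [t for t, _ in packets]
    sizes = [s for _, s in packets]
    duration = times[-1] - times[0]
    gaps = [b - a for a, b in zip(times, times[1:])]
    total = sum(sizes)
    return {
        "flow_duration": duration,
        "total_bytes": total,  # assumed reading of "packet length"
        "mean_inter_arrival": statistics.mean(gaps) if gaps else 0.0,  # assumed: mean gap
        "num_packets": len(packets),
        "throughput_bps": 8 * total / duration if duration > 0 else 0.0,
        "size_min": min(sizes),
        "size_max": max(sizes),
        "size_mean": statistics.mean(sizes),
        "size_std": statistics.pstdev(sizes),  # population standard deviation
    }

print(flow_features([(0.0, 100), (0.5, 300), (1.0, 200)]))
```

Every feature needs only a single pass over the flow's packets and no payload inspection or port numbers, matching the constraints the project sets for a real-time classifier.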