nlp

Cable-cord cutter sentiment analysis using Reddit data

Understanding customer sentiment about products and services is key to any business. In this project, we scraped comments from Reddit and performed named entity recognition, topic modelling, and sentiment analysis on them to understand public views about moving from cable channels to streaming services.

Salary prediction based on job description using XGBoost

During my job search, I often wondered how companies such as Glassdoor, LinkedIn, and others identify the pay scale of a particular job. Applying the text processing and predictive analytics skills we learnt in our course, we classified jobs as high or low paying with ~80% accuracy, which is 30 percentage points above the 50% baseline.

Author attribution using TF-IDF, PCA and Random Forest

Author attribution is a well-known NLP technique for identifying the author of an unattributed article, or for determining the genuine author of a publication when there are multiple claims. In this mini project, I attributed articles to their respective authors and compared multiple classification techniques to find the most suitable model for the analysis.
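To make the pipeline named in the title concrete, here is a minimal sketch that chains TF-IDF features, PCA, and a random forest with scikit-learn. The toy texts, author labels, and every parameter value are hypothetical and chosen only so the example runs; this is not the project's actual code or data.

```python
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer


def to_dense(matrix):
    """Convert the sparse TF-IDF matrix to a dense array before PCA."""
    return matrix.toarray()


# Hypothetical toy corpus: three short "articles" per author.
texts = [
    "the market rallied as investors bought shares",
    "shares fell while the market digested earnings",
    "investors expect earnings to lift the market",
    "the striker scored twice in the final match",
    "the match ended after the keeper saved a penalty",
    "a late goal from the striker won the final",
]
authors = ["alice", "alice", "alice", "bob", "bob", "bob"]

# TF-IDF -> dense array -> PCA -> random forest, as one pipeline.
model = make_pipeline(
    TfidfVectorizer(),
    FunctionTransformer(to_dense, accept_sparse=True),
    PCA(n_components=2, random_state=0),  # assumed value, tune on real data
    RandomForestClassifier(n_estimators=50, random_state=0),
)
model.fit(texts, authors)

# Attribute an unseen (hypothetical) article to one of the known authors.
predictions = model.predict(["the keeper saved the match with a late stop"])
```

On a real corpus, the number of PCA components and the classifier itself would be picked by cross-validation, which is where comparing several classification techniques comes in.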

Car brand association analysis using web scraped data

Understanding how customers associate a brand with their own sentiments is crucial for growth in any industry. In this mini project, we found associations between luxury car brands discussed in the Edmunds forum and generated insights into which attributes customers talk about when it comes to these brands.
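One common way to quantify an association between two brands is lift: how much more often they are mentioned together than would be expected if mentions were independent. The sketch below is an illustration under that assumption and may differ from the measure the project used; the function name `brand_lift`, the posts, and the brand list are all hypothetical.

```python
from itertools import combinations


def brand_lift(posts, brands):
    """Return lift for every brand pair: P(A and B) / (P(A) * P(B))."""
    total = len(posts)
    # Set of brands mentioned in each post (simple whitespace tokenisation).
    mentions = [
        {brand for brand in brands if brand in post.lower().split()}
        for post in posts
    ]
    count = {brand: sum(brand in m for m in mentions) for brand in brands}

    lifts = {}
    for a, b in combinations(sorted(brands), 2):
        both = sum(a in m and b in m for m in mentions)
        if count[a] and count[b]:  # skip brands that are never mentioned
            lifts[(a, b)] = total * both / (count[a] * count[b])
    return lifts


# Hypothetical forum posts, for illustration only.
posts = [
    "the bmw handles better than the audi",
    "audi interior beats bmw",
    "lexus is reliable",
    "bmw service is expensive",
]
lifts = brand_lift(posts, ["bmw", "audi", "lexus"])
```

A lift above 1 suggests two brands are discussed together more often than chance would predict, and a lift below 1 suggests they rarely appear in the same post.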