Data Science: Uncovering Customer Sentiments of Brands/Products Using Sentiment Analysis (Part 1)

Generate customer sentiment analysis of products/brands using VADER in Python

David-kyn
6 min read · Sep 5, 2021

Context: This article is part of Heicoders Academy’s continual effort to enhance our students’ conceptual understanding of Data Science concepts and tools.

Introduction

Text mining is the process of transforming unstructured text into a structured format (using NLP techniques) and then analysing the structured text data to identify meaningful patterns and generate new insights. Tasks to which text mining is typically applied include Sentiment Analysis, Named Entity Recognition, Speech Recognition and Auto Suggest.

Figure 1: Text Mining end-to-end Process

In this article, we will go into more detail about Sentiment Analysis:

  1. What is Sentiment Analysis?
  2. What is VADER?
  3. Implementation of VADER
  4. Use cases of Sentiment Analysis

1. What is Sentiment Analysis?

Sentiment analysis is the process of deriving (statistically) whether a piece of text is positive, negative or neutral. The majority of sentiment analysis approaches take one of two forms:

  • polarity-based, where pieces of texts are classified as either positive or negative, or
  • valence-based, where the intensity of the sentiment is taken into account.

Figure 2: Polarity-based vs Valence-based Approach

For example, the words ‘good’ and ‘excellent’ would be treated the same in a polarity-based approach, whereas ‘excellent’ would be treated as more positive than ‘good’ in a valence-based approach. Generally, valence-based systems are preferred by researchers and analysts because they provide more information. For instance, they help us recognise changes in sentiment intensity over time, so that we can detect when rhetoric on a subject or product is heating up or cooling down.
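To make the distinction concrete, here is a minimal sketch that scores the same words both ways. The tiny lexicon and its ratings are assumptions chosen purely for illustration; they are not meant to reproduce any particular library.

```python
# Toy lexicon: word -> signed intensity rating (illustrative values only).
TOY_LEXICON = {"good": 1.9, "excellent": 2.7, "bad": -2.5, "terrible": -3.1}


def polarity_label(word):
    """Polarity-based: keep only the sign of the rating."""
    rating = TOY_LEXICON.get(word.lower(), 0.0)
    if rating > 0:
        return "positive"
    if rating < 0:
        return "negative"
    return "neutral"


def valence_score(word):
    """Valence-based: keep both the sign and the intensity."""
    return TOY_LEXICON.get(word.lower(), 0.0)


for word in ("good", "excellent"):
    # Both words get the same polarity label but different valence scores.
    print(word, polarity_label(word), valence_score(word))
```

Both words come out as "positive" under the polarity view, while the valence view still separates them by intensity.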

2. What is VADER?

Typically, to build a sentiment analysis model on our own, we would need to go through (1) pre-processing, (2) feature extraction, and (3) training a machine learning model. This end-to-end process can be complex, and the results often leave much to be desired when one is not a domain expert in linguistics. Implementing such a model from scratch only adds to the difficulty.

Figure 3: Vader Library

Fortunately, researchers from the Georgia Institute of Technology have developed VADER (Valence Aware Dictionary and sEntiment Reasoner), a powerful and easy-to-use library which handles the entire end-to-end process of sentiment analysis. This greatly reduces the complexity of developing a sentiment analysis model to just a few lines of code.

VADER is a rule-based model built around a large lexicon (a fancy word for dictionary), where each word is rated as to whether it is positive or negative and, in many cases, how positive or negative. Below is a snippet of VADER’s lexicon. You can observe that words with stronger positive intensity have higher positive ratings, and words with stronger negative intensity have more strongly negative ratings.

Figure 4: Sentiment Rating in VADER’s Lexicon

Check out the research paper (the first entry in the References below) for more details on how the researchers developed the VADER model.

Essentially, VADER:

  • Analyses a piece of text by checking whether any of the words in the text are present in the lexicon.
  • For example, the sentence “The food is good and the atmosphere is nice” has two words in the lexicon (good and nice) with ratings of 1.9 and 1.8 respectively.
  • After analysing a piece of text, VADER will output four sentiment metrics as shown in the table.

Figure 5: Sentiment Metric Produced by VADER

The first three, positive, neutral and negative, represent the proportion of the text that falls into each category. As you can see, our example sentence was rated as 45% positive, 55% neutral and 0% negative. The final metric, the compound score, is the sum of all of the lexicon ratings (1.9 and 1.8 in this case), normalised to range between -1 and 1. In this case, our example sentence has a compound score of 0.69, which is fairly strongly positive.
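The arithmetic behind these numbers can be reproduced by hand. The sketch below is a simplified approximation and not VADER's actual code. It assumes whitespace tokenisation, a lexicon holding only the two ratings quoted above, and a normalisation of the form x / sqrt(x² + α) with α = 15, which to our knowledge is the form used in the reference vaderSentiment implementation. It ignores VADER's rules for negation, intensifiers, capitalisation and punctuation.

```python
import math

# Only the two ratings quoted in the article; the real lexicon is far larger.
LEXICON = {"good": 1.9, "nice": 1.8}
ALPHA = 15  # assumed normalisation constant


def word_ratings(sentence):
    """Look up every whitespace-separated token; unknown words rate 0."""
    return [LEXICON.get(token, 0.0) for token in sentence.lower().split()]


def compound_score(ratings):
    """Squash the sum of the ratings into the open range (-1, 1)."""
    total = sum(ratings)
    return total / math.sqrt(total * total + ALPHA)


def proportions(ratings):
    """Approximate pos/neu/neg shares: each rated word counts as
    |rating| + 1, and each unrated word counts as 1."""
    pos = sum(r + 1 for r in ratings if r > 0)
    neg = sum(-r + 1 for r in ratings if r < 0)
    neu = sum(1 for r in ratings if r == 0)
    total = pos + neg + neu
    if total == 0:  # empty input
        return {"pos": 0.0, "neu": 0.0, "neg": 0.0}
    return {"pos": pos / total, "neu": neu / total, "neg": neg / total}


ratings = word_ratings("The food is good and the atmosphere is nice")
print(round(compound_score(ratings), 2))  # 0.69
print({k: round(v, 2) for k, v in proportions(ratings).items()})
# {'pos': 0.45, 'neu': 0.55, 'neg': 0.0}
```

Under these assumptions, the sum 1.9 + 1.8 = 3.7 becomes 3.7 / sqrt(3.7² + 15) ≈ 0.69, and the two rated words out of nine account for roughly 45% of the weighted total, matching the figures quoted above.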

There are alternative lexicons to VADER, such as Harvard’s General Inquirer, Loughran-McDonald, and Hu & Liu. That said, VADER really shines when handling text data from social media, given that it was specially tuned to interpret elements that are common in social media, such as abbreviations, punctuation and even emojis.

For example, in the tweet below, you can observe that the cues showing that the writer is unhappy (in the blue boxes) are all features of informal writing: multiple punctuation marks, acronyms and an emoji. If we did not take this information into account, the tweet would look neutral to a sentiment analysis model.

Figure 6: Customer Complaints

3. Implementation of VADER

As mentioned, the implementation of VADER is as simple as a few lines of code. Here we will use a list of tweets obtained from Kaggle to illustrate the implementation. Students who have taken Heicoders Academy’s AI200: Applied Machine Learning course should find the code fairly easy to understand.

To use VADER, you simply have to install the vaderSentiment library and import its SentimentIntensityAnalyzer class.

Thereafter, pass any sentence you want to analyse to the polarity_scores() method of a SentimentIntensityAnalyzer object, which returns the four sentiment metrics described earlier.

We have prepared a full Jupyter notebook illustrating how to use the VADER library to run sentiment analysis on tweets from airline customers. Here is a snippet of the output generated by our VADER model.

Figure 7: Output of Vader Model

The notebook will give you a clear idea of the end-to-end workflow when implementing sentiment analysis using the VADER library. You can access it via our Heicoders Telegram group: https://t.me/heicoders_professionals

Figure 8: Screenshot of Provided Sample Code
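For readers without the notebook at hand, the overall workflow can be sketched roughly as below. This is not the notebook's code: the function names, the sample tweets and the stand-in scorer are all hypothetical. The ±0.05 cut-offs on the compound score are the thresholds conventionally suggested in the vaderSentiment documentation, not something stated in this article.

```python
def label_sentiment(compound, threshold=0.05):
    """Map a compound score to a coarse label (threshold is an assumed convention)."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def score_tweets(tweets, score_fn):
    """Score each tweet with score_fn, which must return a dict
    containing at least a 'compound' key."""
    rows = []
    for text in tweets:
        scores = score_fn(text)
        label = label_sentiment(scores["compound"])
        rows.append({"text": text, **scores, "label": label})
    return rows


def fake_score_fn(text):
    """Hypothetical stand-in for VADER so the sketch runs without extra packages."""
    lowered = text.lower()
    if "thanks" in lowered:
        return {"compound": 0.5}
    if "delayed" in lowered:
        return {"compound": -0.5}
    return {"compound": 0.0}


# Made-up tweets, for illustration only.
tweets = [
    "Thanks for the smooth flight!",
    "My flight was delayed again",
    "Boarding at gate 12",
]
for row in score_tweets(tweets, fake_score_fn):
    print(row["label"], "|", row["text"])
```

In real use you would replace fake_score_fn with SentimentIntensityAnalyzer().polarity_scores, which returns a dict of the same shape with the additional 'neg', 'neu' and 'pos' keys.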

4. Use Cases of Sentiment Analysis

There are a multitude of benefits and use cases for sentiment analysis: from algorithmic trading to improving brand reputation to formulating customer acquisition strategy. For instance, one can quickly scrape social media posts about your brand and about your competitors, and thereafter use a sentiment analysis model to compare consumer sentiment towards your brand vis-à-vis your competitors.
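As a rough sketch of the brand comparison described above, the snippet below averages compound scores per brand. The brand names and scores are hypothetical placeholders; in practice each score would come from running a sentiment model such as VADER over the scraped posts.

```python
from collections import defaultdict
from statistics import mean

# (brand, compound score) pairs; hypothetical values for illustration only.
scored_posts = [
    ("our_brand", 0.62),
    ("our_brand", -0.10),
    ("competitor", 0.05),
    ("competitor", -0.48),
]


def mean_sentiment_by_brand(posts):
    """Group compound scores by brand and return the average for each brand."""
    grouped = defaultdict(list)
    for brand, compound in posts:
        grouped[brand].append(compound)
    return {brand: mean(values) for brand, values in grouped.items()}


for brand, average in mean_sentiment_by_brand(scored_posts).items():
    print(f"{brand}: {average:+.2f}")
```

A simple mean is only one possible summary; depending on the use case, the share of negative posts or a trend over time may be more informative.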

With a little creativity, you could build endless applications with the VADER library. To help get you started, here are some articles that further elaborate on the applications and power of sentiment analysis:

Conclusion

The VADER library abstracts away a lot of the difficulty in implementing sentiment analysis models. In fact, as we have shown in this article, sentiment analysis takes just a few lines of code with the VADER library. This is remarkable given that VADER is a powerful sentiment analysis model that is widely used in industry across a variety of applications.

This is also the reason why we wrote this article: to motivate our students to grab hold of low-hanging fruit like this and start applying sentiment analysis to the data in their workplaces and projects.

Now, while VADER is a really excellent library, there have been breakthroughs in recent years in the field of text mining that have paved the way for even more powerful sentiment analysis models. In our next article, we will share how to use transfer learning to build even more accurate sentiment analysis models.

We hope you enjoyed reading this article as much as we enjoyed writing it! =)

References:

  1. http://comp.social.gatech.edu/papers/icwsm14.vader.hutto.pdf
  2. https://t-redactyl.io/blog/2017/04/using-vader-to-handle-sentiment-analysis-with-social-media-text.html
