
COVID-19 Vaccine Tweets Sentiment Analysis

Overview

During the height of the COVID-19 pandemic in the summer of 2021, I analyzed sentiment toward COVID-19 vaccines in Canada and worldwide by applying machine learning and natural language processing (NLP) techniques to COVID-19-related tweets. This post documents the project.

Summary

  • I used scikit-learn to develop classification models that sort COVID-19 vaccine-related tweets into three sentiment classes: positive, neutral, and negative. On the test set, the models classified the sentiments of the tweets with 69% accuracy for Canada and 84% accuracy for the world.
  • I used NLTK to apply NLP techniques such as lemmatization, which let me preprocess the tweets and extract the vaccine-related ones.
  • I extracted and analyzed vaccine-related n-grams from these tweets.
  • In both the Canadian and worldwide data, positive and neutral sentiments toward COVID-19 vaccines outnumber negative ones.

Problem Description

More and more people have been vaccinated and protected from COVID-19. Although life is gradually returning to normal, some hesitancy around COVID-19 vaccines persists. Understanding both reluctance toward and acceptance of COVID-19 vaccines is key to preventing and mitigating a resurgence of the pandemic during the post-pandemic recovery.

Project Goal

During the pandemic, people have posted many tweets about COVID-19 vaccines, both positive and negative.

Through machine learning and NLP techniques, I aim to:

  • Predict whether the tweets are positive, neutral, or negative
  • Understand public opinion and sentiment about COVID-19 vaccines
  • Provide insights into the hesitancy and acceptance of COVID-19 vaccines

Code

My code can be found here:

Dataset

The following is the dataset that I used:

Dataset: COVID-19 Geo-Tagged Tweets Dataset at IEEE DataPort
Author: Rabindra Lamsal
Data: COVID-19-related tweet IDs without tweet messages (I hydrated the IDs to get the complete data.)
Dataset Overview: 402,970 tweets; 478 CSV files; global coverage; English; March 20, 2020 to July 11, 2021

Hydrating tweet IDs means using the Twitter API to retrieve the complete details of each tweet, such as its text and metadata, from its ID alone.
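Hydration is mostly a batching problem. Below is a minimal sketch that assumes the Twitter API v2 tweet-lookup endpoint, which accepts up to 100 IDs per request. `batch_tweet_ids` and `hydrate` are hypothetical helpers. `fetch_batch` is a hypothetical stand-in for whatever API client actually sends each request, since the post does not name the tool used, so the sketch itself makes no network calls.

```python
from typing import Callable, Iterable, Iterator, List

# Assumption: the Twitter API v2 tweet-lookup endpoint accepts at most 100 IDs per request.
MAX_IDS_PER_REQUEST = 100


def batch_tweet_ids(tweet_ids: Iterable[str], batch_size: int = MAX_IDS_PER_REQUEST) -> Iterator[List[str]]:
    """Yield de-duplicated, non-blank tweet IDs in batches of at most `batch_size`."""
    seen = set()
    batch: List[str] = []
    for raw_id in tweet_ids:
        tweet_id = str(raw_id).strip()
        if not tweet_id or tweet_id in seen:
            continue  # skip blank IDs and IDs already queued
        seen.add(tweet_id)
        batch.append(tweet_id)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly smaller, batch


def hydrate(tweet_ids: Iterable[str], fetch_batch: Callable[[List[str]], List[dict]]) -> List[dict]:
    """Call the hypothetical `fetch_batch` once per batch and collect the returned tweets."""
    tweets: List[dict] = []
    for batch in batch_tweet_ids(tweet_ids):
        tweets.extend(fetch_batch(batch))
    return tweets
```

Passing the fetch function in keeps the batching logic separate from authentication and rate limiting, which depend on the client library.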

Machine Learning Workflow

The following diagram gives an overview of the machine learning workflow for the project:

Figure: Machine Learning Workflow
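One way to sketch the preprocessing step that extracts vaccine-related tweets is lemmatize-then-match: reduce each token to its lemma and check it against a keyword set. This illustrates the idea under assumptions and is not the project's code. `VACCINE_KEYWORDS`, the regex tokenizer, and the helper names are hypothetical. The lemmatizer is passed in as a function so that the sketch runs without NLTK's data files.

```python
import re
from typing import Callable, Iterable, List

# Hypothetical keyword set; the terms used in the original project are not given.
VACCINE_KEYWORDS = {"vaccine", "vaccination", "vaccinate", "dose", "pfizer", "moderna", "astrazeneca"}

# Simple lowercase alphabetic tokenizer (an assumption, not NLTK's tokenizer).
TOKEN_RE = re.compile(r"[a-z]+")


def is_vaccine_related(text: str, lemmatize: Callable[[str], str] = lambda w: w) -> bool:
    """Return True if any lemmatized token of `text` is a vaccine keyword."""
    tokens = TOKEN_RE.findall(text.lower())
    return any(lemmatize(token) in VACCINE_KEYWORDS for token in tokens)


def filter_vaccine_tweets(texts: Iterable[str], lemmatize: Callable[[str], str] = lambda w: w) -> List[str]:
    """Keep only the tweets that mention a vaccine keyword after lemmatization."""
    return [text for text in texts if is_vaccine_related(text, lemmatize)]
```

With NLTK, you could pass `WordNetLemmatizer().lemmatize` from `nltk.stem` as `lemmatize`, after running `nltk.download("wordnet")`. Its default part of speech is noun, so plurals such as "vaccines" and "doses" reduce to keywords in the set.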

Exploratory Data Analysis

I plotted a word cloud to see the most prominent words in the processed vaccine-related tweets. The following word cloud for Canada shows that words such as COVID-19, vaccine, dose, pandemic, and AstraZeneca appear prominently in these tweets.

Figure: Word Cloud for COVID-19 Vaccine-Related Tweets in Canada
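A word cloud sizes each word by how often it occurs, so the underlying computation is a word-frequency count. Here is a minimal sketch with `collections.Counter`. It assumes a simple lowercase tokenizer that keeps hyphenated tokens such as "covid-19" and a small hand-picked stop-word list; both are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, Set

# Small illustrative stop-word list (an assumption); NLTK's stopwords corpus is a fuller option.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "my", "is", "in", "for", "i", "it", "on", "at"}

# Lowercase alphanumeric tokens, keeping internal hyphens (e.g. "covid-19").
TOKEN_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def word_frequencies(texts: Iterable[str], stop_words: Set[str] = STOP_WORDS) -> Counter:
    """Count non-stop-word tokens across all texts."""
    counts: Counter = Counter()
    for text in texts:
        tokens = TOKEN_RE.findall(text.lower())
        counts.update(token for token in tokens if token not in stop_words)
    return counts
```

If the `wordcloud` package is used for plotting, `WordCloud().generate_from_frequencies(freqs)` accepts such a mapping directly. The post does not say whether the original plot was made this way.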

I visualized n-grams to see the words and phrases associated with COVID-19 vaccine-related tweets in Canada and worldwide.
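An n-gram is a run of n consecutive tokens. For example, the bigrams of "second dose booked" are ("second", "dose") and ("dose", "booked"). Below is a minimal standard-library sketch that counts the most common n-grams across tokenized tweets. The helper names are hypothetical, and NLTK's `nltk.util.ngrams` yields the same tuples.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return the contiguous n-grams of a token list, in order."""
    # zip over n shifted copies of the list: tokens[0:], tokens[1:], ..., tokens[n-1:]
    return list(zip(*(tokens[i:] for i in range(n))))


def top_ngrams(token_lists: Iterable[List[str]], n: int, k: int = 10) -> List[Tuple[Tuple[str, ...], int]]:
    """Return the k most frequent n-grams across all token lists."""
    counts: Counter = Counter()
    for tokens in token_lists:
        counts.update(ngrams(tokens, n))
    return counts.most_common(k)
```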

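Figure: N-grams for COVID-19 Vaccine-Related Tweets (Canada)

Figure: N-grams for COVID-19 Vaccine-Related Tweets (Worldwide)
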
My summary of the analysis is as follows:

  • Positive Sentiments
    • People feel optimistic about their first and second doses of the vaccine.
    • People feel happy and safe about the vaccine.
    • People show excitement about making appointments for vaccines.
    • People are optimistic about the availability of vaccines.
    • People accept COVID-19 vaccines and are not hesitant about getting their doses.
  • Negative Sentiments
    • A small number of people are pessimistic about COVID-19 vaccines.
    • A small number of people are concerned about side effects.
    • A small number of people are pessimistic about herd immunity, but this is not a negative sentiment toward vaccines.
    • A small number of people worry that COVID-19 mutations could render vaccines ineffective.

Modeling

I divided the processed data into a training set and a test set. I trained and validated the models on the training set using k-fold cross-validation, and then evaluated them on the test set.
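The two splitting steps can be sketched with the standard library. This sketch assumes a shuffled 70%-30% split (the random seed is arbitrary) and contiguous, unshuffled folds. Rounding the test size up, as scikit-learn's `train_test_split` does, yields 333 test tweets for Canada's 1,108 and 7,108 for the worldwide 23,693. Both figures match the test-set sizes reported below. In practice, `train_test_split` and `cross_val_score` handle this. The helpers here are hypothetical and only show which indices each fold holds out.

```python
import math
import random
from typing import Iterator, List, Tuple


def train_test_indices(n: int, test_fraction: float = 0.3, seed: int = 42) -> Tuple[List[int], List[int]]:
    """Shuffle indices 0..n-1 and hold out `test_fraction` of them as the test set."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = math.ceil(n * test_fraction)  # round the test size up, as train_test_split does
    return indices[n_test:], indices[:n_test]  # (train, test)


def k_fold_indices(n: int, k: int = 10) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, validation) positions for k contiguous folds.

    The first n % k folds get one extra item, so every position is validated exactly once.
    """
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = list(range(start)) + list(range(start + size, n))
        yield train, validation
        start += size
```

In each fold, a model is fit on `train` and scored on `validation`. The mean of the ten fold scores is the kind of value that a cross-validation table reports.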

Model Training

Processed Data Size: 1,108 tweets (Canada); 23,693 tweets (Worldwide)
Train-Test Split: 70%-30% (both)
X (Features): Vectorized tweets (both)
y (Target Values): Sentiment labels: positive, neutral, negative (both)
Models (Canada): Logistic Regression, Naive Bayes, Linear SVC (Support Vector Classifier), Decision Tree
Models (Worldwide): Linear SVC

Model Testing

Cross-Validation

I performed 10-fold cross-validation on the classification models and computed the mean accuracy, precision, and recall scores for Canada and the world, as shown below:

Canada
Model                 Mean Accuracy   Mean Precision   Mean Recall
Logistic Regression   0.7084          0.7084           0.6803
Naive Bayes           0.6217          0.6217           0.6054
Linear SVC            0.7149          0.7149           0.7129
Decision Tree         0.6968          0.7006           0.7001

Worldwide
Model                 Mean Accuracy   Mean Precision   Mean Recall
Linear SVC            0.8473          0.8473           0.8466

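One way to produce a cross-validation table like these with scikit-learn is `cross_validate` with several scorers. This sketch rests on several assumptions. The post says only "vectorized tweets", so `TfidfVectorizer` is a stand-in (a `CountVectorizer` would slot in the same way). The model hyperparameters are defaults. Macro-averaged precision and recall are a guess, because the post does not state the averaging scheme. Placing the vectorizer inside the pipeline refits it on each training fold, so vocabulary from the validation fold does not leak into training.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

# The four model families named in the post, with default-ish settings (an assumption).
MODELS = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Naive Bayes": MultinomialNB(),
    "Linear SVC": LinearSVC(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}

# Assumed averaging scheme; the post does not say which one was used.
SCORING = {"accuracy": "accuracy", "precision": "precision_macro", "recall": "recall_macro"}


def compare_models(texts, labels, cv=10):
    """Return the mean cross-validated accuracy, precision, and recall for each model."""
    results = {}
    for name, model in MODELS.items():
        # Vectorizer inside the pipeline, so it is refit on each training fold.
        pipeline = make_pipeline(TfidfVectorizer(), model)
        scores = cross_validate(pipeline, texts, labels, cv=cv, scoring=SCORING)
        results[name] = {metric: scores[f"test_{metric}"].mean() for metric in SCORING}
    return results
```

`compare_models(texts, labels)` returns a nested dictionary. For example, `results["Linear SVC"]["accuracy"]` is the mean accuracy over the 10 folds.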
Testing the Models on the Test Set

I evaluated the Linear SVC models, which had the highest mean cross-validation scores for Canada, on the test set. The models predict the sentiments of vaccine-related tweets with 69% accuracy for Canada and 84% accuracy for the world, as shown below:

Canada
               Precision   Recall   F1 Score   # of Samples
Negative       0.23        0.10     0.14       30
Neutral        0.60        0.77     0.67       118
Positive       0.81        0.74     0.77       185

Accuracy                            0.69       333
Macro AVG      0.55        0.54     0.53       333
Weighted AVG   0.68        0.69     0.68       333

Figure: Confusion Matrix (Canada)

Worldwide
               Precision   Recall   F1 Score   # of Samples
Negative       0.77        0.67     0.71       995
Neutral        0.81        0.88     0.84       2474
Positive       0.89        0.87     0.88       3639

Accuracy                            0.84       7108
Macro AVG      0.82        0.81     0.81       7108
Weighted AVG   0.84        0.84     0.84       7108

Figure: Confusion Matrix (Worldwide)
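The test-set tables follow the layout of scikit-learn's `classification_report`. To make the averages concrete, here is a standard-library sketch that derives every number in such a report from a confusion matrix, where rows are true classes and columns are predicted classes. `classification_metrics` is a hypothetical helper.

```python
from typing import List


def classification_metrics(cm: List[List[int]], labels: List[str]) -> dict:
    """Per-class precision/recall/F1/support, macro and weighted averages, and accuracy.

    cm[i][j] counts samples whose true class is labels[i] and predicted class is labels[j].
    """
    n = len(labels)
    support = [sum(cm[i]) for i in range(n)]                          # row sums: true counts
    predicted = [sum(cm[i][j] for i in range(n)) for j in range(n)]   # column sums: predicted counts
    total = sum(support)
    report = {}
    for j, label in enumerate(labels):
        tp = cm[j][j]
        precision = tp / predicted[j] if predicted[j] else 0.0
        recall = tp / support[j] if support[j] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1, "support": support[j]}
    # Macro: every class counts equally. Weighted: classes count by their support.
    macro_weights = [1 / n] * n
    support_weights = [s / total for s in support]
    for name, weights in (("macro avg", macro_weights), ("weighted avg", support_weights)):
        report[name] = {
            metric: sum(w * report[label][metric] for w, label in zip(weights, labels))
            for metric in ("precision", "recall", "f1")
        }
    report["accuracy"] = sum(cm[i][i] for i in range(n)) / total
    return report
```

Each class's recall is weighted by its support, so the weighted-average recall always equals accuracy. This is why both tables show the same value for the two (0.69 for Canada and 0.84 worldwide).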

Results and Insights Gained

My Linear SVC models for Canada and the world classify COVID-19 vaccine-related tweets into three sentiment classes with 69% and 84% accuracy, respectively. The model for Canada has low precision (23%) and recall (10%) for the Negative sentiment class. This is likely due to the small volume of data left after processing (1,108 tweets), of which only a small share is negative: the Canadian test set contains just 30 negative tweets out of 333.
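The F1 score is the harmonic mean of precision and recall, so it stays close to the smaller of the two. Plugging in the Negative-class values from the Canada test table:

```latex
F_1 = \frac{2PR}{P + R},
\qquad
F_1^{\text{Negative}} = \frac{2(0.23)(0.10)}{0.23 + 0.10} = \frac{0.046}{0.33} \approx 0.14
```

This matches the reported F1 score of 0.14. Averaged equally with Neutral (0.67) and Positive (0.77), it gives the macro F1 of (0.14 + 0.67 + 0.77) / 3 ≈ 0.53.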

Based on my analysis, both the Canadian and worldwide data show mostly positive and neutral sentiment toward COVID-19 vaccines. Most people are not skeptical about COVID-19 vaccines and are not hesitant to get their doses. A few people have expressed negativity about the vaccines and their side effects.

This post is licensed under CC BY 4.0 by the author.