COVID-19 Vaccine Tweets Sentiment Analysis

Overview

During the height of the COVID-19 pandemic in the summer of 2021, I analyzed sentiment toward COVID-19 vaccines in Canada by applying machine learning and natural language processing (NLP) techniques to tweets about COVID-19. This post documents that project.

Summary

- Using scikit-learn, I built models that classify COVID-19 vaccine-related tweets into three sentiment classes: positive, neutral, and negative. On the test set, the models reached 69% accuracy for Canada and 84% accuracy for the world.
- Using NLTK, I applied NLP techniques such as lemmatization to extract vaccine-related tweets.
- I extracted and analyzed n-grams related to vaccines from vaccine-related tweets.
- Both the Canadian and worldwide data show more positive and neutral sentiment than negative sentiment toward COVID-19 vaccines.

Problem Description

More and more people have been vaccinated and protected from COVID-19. Although normalcy is gradually returning, some hesitancy about COVID-19 vaccines remains. Understanding both the reluctance toward and the acceptance of COVID-19 vaccines is key to preventing and mitigating a resurgence of the pandemic during the post-pandemic recovery.

Project Goal

During the pandemic, people have posted tweets about COVID-19 vaccines, including positive and negative messages.

Through machine learning and NLP techniques, I aim to:

- Predict whether the tweets are positive, neutral, or negative
- Understand public opinion and sentiment on COVID-19 vaccines
- Provide insights into the hesitancy toward and acceptance of COVID-19 vaccines

Code

My code can be found here:

https://github.com/masaki9/COVID-19-Tweets

Dataset

The following is the dataset that I used:

Dataset	COVID-19 Geo-Tagged Tweets Dataset at IEEE DataPort
Author	Rabindra Lamsal
Data	COVID-19-related tweet IDs without tweet messages (I hydrated the IDs to get the complete data.)
Dataset Overview	402,970 tweets; 478 CSV files; global coverage; English; March 20, 2020 to July 11, 2021

Hydrating tweet IDs is the process of getting complete details of tweets using the Twitter API.
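A minimal sketch of how hydration can be organized, assuming the lookup endpoint accepts at most 100 IDs per request. The `lookup` callable is hypothetical: it stands in for whichever Twitter API client is used and is not taken from the project code.

```python
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List

# Assumption: the lookup endpoint accepts at most 100 tweet IDs per request.
MAX_IDS_PER_REQUEST = 100


def batched(ids: Iterable[int], size: int) -> Iterator[List[int]]:
    """Yield consecutive lists of at most `size` IDs."""
    iterator = iter(ids)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def hydrate(
    tweet_ids: Iterable[int],
    lookup: Callable[[List[int]], List[Dict]],
    batch_size: int = MAX_IDS_PER_REQUEST,
) -> Iterator[Dict]:
    """Yield full tweet records for the IDs that the API still returns.

    `lookup` is a hypothetical callable wrapping the API client: it takes
    one batch of IDs and returns the matching tweet records.
    """
    for batch in batched(tweet_ids, batch_size):
        yield from lookup(batch)
```

Deleted or protected tweets are not returned by a lookup, so the hydrated data can be smaller than the original ID list.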

Machine Learning Workflow

The following gives an overview of my machine learning workflow for the project:

Machine Learning Workflow
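One step of the workflow, extracting vaccine-related tweets with the help of lemmatization, might look like the sketch below. The keyword set and the `toy_lemmatize` function are assumptions made for illustration; with NLTK installed and its WordNet data downloaded, `WordNetLemmatizer().lemmatize` could be passed as the `lemmatize` argument instead.

```python
import re
from typing import Callable, Iterable, List

# Hypothetical keyword lemmas; the list used in the project is not given here.
VACCINE_LEMMAS = {"vaccine", "vaccination", "dose", "pfizer", "moderna", "astrazeneca"}


def toy_lemmatize(word: str) -> str:
    """Stand-in lemmatizer that only strips a plural "s"."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def is_vaccine_related(text: str, lemmatize: Callable[[str], str] = toy_lemmatize) -> bool:
    """Return True if any lemmatized token of the tweet is a vaccine keyword."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return any(lemmatize(token) in VACCINE_LEMMAS for token in tokens)


def filter_vaccine_tweets(
    texts: Iterable[str], lemmatize: Callable[[str], str] = toy_lemmatize
) -> List[str]:
    """Keep only the tweets that mention a vaccine keyword."""
    return [text for text in texts if is_vaccine_related(text, lemmatize)]
```

Matching on lemmas rather than raw tokens means that "vaccines" and "doses" are caught by the single keywords "vaccine" and "dose".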

Exploratory Data Analysis

I plotted a word cloud to see the most prominent words in the processed data containing vaccine-related tweets. The word cloud shows that words such as COVID-19, vaccine, dose, pandemic, and AstraZeneca appear prominently in vaccine-related tweets.

Word Cloud for COVID-19 Vaccine-Related Tweets in Canada

I visualized n-grams to see words and phrases associated with COVID-19 vaccine-related tweets.

Canada

Worldwide
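For readers unfamiliar with n-grams, the counting behind such charts can be sketched in a few lines of plain Python. The tokenizer and function names here are illustrative assumptions, not the project's code.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return every run of n consecutive tokens as a tuple."""
    return list(zip(*(tokens[i:] for i in range(n))))


def top_ngrams(texts: Iterable[str], n: int, k: int) -> List[Tuple[Tuple[str, ...], int]]:
    """Count n-grams over all texts and return the k most frequent."""
    counts: Counter = Counter()
    for text in texts:
        counts.update(ngrams(tokenize(text), n))
    return counts.most_common(k)
```

For example, over the made-up tweets "Got my first dose today", "First dose done", and "Booked my second dose", the most frequent bigram is ("first", "dose") with a count of 2.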

My summary of the analysis is as follows:

Positive Sentiments
- People feel optimistic about their first and second doses of the vaccine.
- People feel happy and safe about the vaccine.
- People show excitement about making appointments for vaccines.
- People are optimistic about the availability of vaccines.
- People accept COVID-19 vaccines and are not hesitant about getting their doses.
Negative Sentiments
- A small number of people are pessimistic about COVID-19 vaccines.
- A small number of people are concerned about side effects.
- A small number of people are pessimistic about herd immunity, but this is not a negative sentiment toward vaccines.
- A small number of people feel COVID-19 mutations could render vaccines ineffective.

Modelling

I divided the processed data into a train set and a test set. I trained and validated my models on the train set using k-fold cross-validation. Then I evaluated the models on the test set.

Model Training

	Canada	Worldwide
Processed Data Size	1,108	23,693
Train-Test Split	70%-30%
X (Features)	Vectorized Tweets
y (Target Values)	Sentiment Labels (Positive, Neutral, Negative)
Models	Logistic Regression, Naive Bayes, Linear SVC (Support Vector Classifier), Decision Tree	Linear SVC
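The setup in this table can be sketched with scikit-learn as follows. This is not the project's code: the TF-IDF vectorizer, the stratified split, and the toy tweets are all assumptions made so that the example is self-contained.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Made-up tweets for illustration only; six per sentiment class.
TOY_TEXTS = [
    "got my first dose feeling great", "second dose done so happy",
    "vaccine appointment booked excited", "fully vaccinated and feeling safe",
    "grateful for the vaccine today", "so happy vaccines are available",
    "vaccine clinic opens at nine", "second dose scheduled for july",
    "new vaccine shipment arrives monday", "clinic offers first dose appointments",
    "health unit posts vaccine schedule", "pharmacy lists vaccine booking times",
    "worried about vaccine side effects", "sore arm and fever after dose",
    "vaccine rollout is too slow", "frustrated by cancelled vaccine appointment",
    "afraid the variant beats the vaccine", "side effects made me feel awful",
]
TOY_LABELS = ["positive"] * 6 + ["neutral"] * 6 + ["negative"] * 6


def train_and_evaluate(texts, labels, n_folds=10, seed=0):
    """Split 70%-30%, cross-validate on the train set, then score the test set."""
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=0.3, stratify=labels, random_state=seed
    )
    # Assumption: TF-IDF features; the table only says "Vectorized Tweets".
    model = make_pipeline(TfidfVectorizer(), LinearSVC())
    cv_scores = cross_val_score(model, X_train, y_train, cv=n_folds, scoring="accuracy")
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    return cv_scores, accuracy_score(y_test, y_pred), y_pred
```

With only 18 toy tweets, the call needs a small fold count, for example `train_and_evaluate(TOY_TEXTS, TOY_LABELS, n_folds=2)`. Putting the vectorizer inside the pipeline means it is refitted on each training fold, so no vocabulary leaks from the validation fold.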

Model Testing

Cross-Validation

I performed 10-fold cross-validation on the classification models and computed the mean accuracy, precision, and recall scores for Canada and the world, as shown below:

Canada

Model	Mean Accuracy	Mean Precision	Mean Recall
Logistic Regression	0.7084	0.7084	0.6803
Naive Bayes	0.6217	0.6217	0.6054
Linear SVC	0.7149	0.7149	0.7129
Decision Tree	0.6968	0.7006	0.7001

Worldwide

Model	Mean Accuracy	Mean Precision	Mean Recall
Linear SVC	0.8473	0.8473	0.8466
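The mechanics behind these mean scores can be illustrated without any library. The sketch below is a simplified, unshuffled k-fold split, and `fit_and_score` is a hypothetical callback that trains on one set of indices and returns a score on the other; scikit-learn's own splitters differ in details such as stratification.

```python
from statistics import mean
from typing import Callable, Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, validation) index lists; fold sizes differ by at most one."""
    indices = list(range(n_samples))
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


def mean_cv_score(
    n_samples: int, k: int, fit_and_score: Callable[[List[int], List[int]], float]
) -> float:
    """Average the score that `fit_and_score` returns over the k folds."""
    return mean(fit_and_score(train, val) for train, val in k_fold_indices(n_samples, k))
```

Each sample is used for validation exactly once and for training in the other k - 1 folds, which is why the mean is a steadier estimate than a single split.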

Testing Model on Test Set

I evaluated my Linear SVC models on the test set. The models predict the sentiments of vaccine-related tweets with 69% accuracy for Canada and 84% accuracy for the world, as shown below:

Canada

	Precision	Recall	F1 Score	# of Samples
Negative	0.23	0.10	0.14	30
Neutral	0.60	0.77	0.67	118
Positive	0.81	0.74	0.77	185

Accuracy			0.69	333
Macro AVG	0.55	0.54	0.53	333
Weighted AVG	0.68	0.69	0.68	333

Confusion Matrix (Canada)
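The macro and weighted averages in the Canada report follow directly from the per-class rows. The short sketch below recomputes them from the rounded table values; the helper names are made up for this example.

```python
# Per-class test results for Canada, copied from the report above:
# (precision, recall, F1 score, number of samples)
CANADA = {
    "Negative": (0.23, 0.10, 0.14, 30),
    "Neutral": (0.60, 0.77, 0.67, 118),
    "Positive": (0.81, 0.74, 0.77, 185),
}
PRECISION, RECALL, F1, SUPPORT = range(4)


def macro_average(report, column):
    """Unweighted mean over classes: every class counts equally."""
    return sum(row[column] for row in report.values()) / len(report)


def weighted_average(report, column):
    """Mean over classes weighted by the number of samples in each class."""
    total = sum(row[SUPPORT] for row in report.values())
    return sum(row[column] * row[SUPPORT] for row in report.values()) / total
```

Rounded to two decimals, the macro averages come out as 0.55, 0.54, and 0.53 and the weighted averages as 0.68, 0.69, and 0.68, matching the report. The macro figures are much lower because the Negative class, with only 30 samples, counts as much as the other two, and the weighted recall equals the overall accuracy.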

Worldwide

	Precision	Recall	F1 Score	# of Samples
Negative	0.77	0.67	0.71	995
Neutral	0.81	0.88	0.84	2474
Positive	0.89	0.87	0.88	3639

Accuracy			0.84	7108
Macro AVG	0.82	0.81	0.81	7108
Weighted AVG	0.84	0.84	0.84	7108

Confusion Matrix (Worldwide)

Results and Insights Gained

My Linear SVC models for Canada and the world classify COVID-19 vaccine-related tweets into three sentiment classes with 69% and 84% accuracy, respectively. The model for Canada has low precision (23%) and recall (10%) for the Negative sentiment class. This is likely due to the small volume of data after processing (1,108 tweets), of which only 30 of the 333 test tweets are negative.

Based on my analysis, both the Canadian and worldwide data show more positive and neutral sentiment than negative sentiment toward COVID-19 vaccines. Most people are not skeptical about COVID-19 vaccines and are not hesitant about getting their doses. A few people have expressed negativity about COVID-19 vaccines and their side effects.

Further Reading

Tokyo Airbnb Data Analytics

COVID-19 Transmission Analysis with Topic Modelling

Mental Health Sentiment Analysis API in ML.NET and ASP.NET Core