Sentiment Analysis with Logistic Regression
In previous posts, we explored sentiment analysis using a simple bag-of-words model and Naive Bayes. In this post, we will delve into sentiment analysis with logistic regression. Unlike Naive Bayes, which is a generative model, logistic regression is a discriminative model. The key difference lies in their approach: a generative model estimates the joint probability of the features and the target variable, whereas a discriminative model estimates the conditional probability of the target variable given the features.
Components of a Machine Learning Classifier
- Feature Representation: A method to represent the input data.
- Classification Function: A function that estimates the class given the input features by modeling $P(y \mid x)$.
- Loss Function: A measure of the discrepancy between the predicted class and the true class.
- Optimization Algorithm: A method to minimize the loss function.
After acquiring the data, the text must be tokenized and vectorized. Various vectorization techniques can be employed, such as bag-of-words, TF-IDF, or word embeddings.
In this instance, we will use subword tokenization along with basic encoding, utilizing the Hugging Face library for text tokenization.
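As a minimal sketch of what that looks like (assuming the Hugging Face `transformers` package and the `bert-base-uncased` checkpoint, which the post does not name explicitly):

```python
from transformers import AutoTokenizer

# Load a pretrained subword tokenizer (WordPiece for this checkpoint).
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

text = "This movie was absolutely wonderful!"

# Split the text into subword tokens.
tokens = tokenizer.tokenize(text)
print(tokens)  # ['this', 'movie', 'was', 'absolutely', 'wonderful', '!']

# Encode the text as integer ids (including special tokens like [CLS] and [SEP]).
ids = tokenizer.encode(text)
print(ids)
```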
For the classification function, we will implement logistic regression with a sigmoid activation function.
Let's first understand the sigmoid function:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

The goal of logistic regression is to train a classifier that can distinguish between two classes. The sigmoid function squashes any real-valued input into the interval between 0 and 1, which can be interpreted as the probability of the input belonging to a particular class.
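A quick NumPy sketch (not from the original post) makes the squashing behavior concrete:

```python
import numpy as np

def sigmoid(z):
    """Squash a real-valued input into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Large negative inputs approach 0, large positive inputs approach 1,
# and an input of 0 maps exactly to 0.5.
for z in [-5.0, -1.0, 0.0, 1.0, 5.0]:
    print(f"sigmoid({z:+.1f}) = {sigmoid(z):.4f}")
```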
Consider a single input observation $x = [x_1, x_2, \ldots, x_n]$, where each $x_i$ is a feature of the input. Here we want to know the probability $P(y = 1 \mid x)$ that the input belongs to class 1. The decision is positive sentiment ($y = 1$) vs. negative sentiment ($y = 0$), and the features represent the words in the text.

Logistic regression solves this task by learning a vector of weights $w = [w_1, w_2, \ldots, w_n]$ and a bias term $b$, where each weight $w_i$ captures how strongly feature $x_i$ is associated with the positive class.
The binary cross-entropy loss is defined as:

$$L(\hat{y}, y) = -\left[\, y \log \hat{y} + (1 - y) \log (1 - \hat{y}) \,\right]$$

where $y \in \{0, 1\}$ is the true label and $\hat{y}$ is the predicted probability of class 1.
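To make the formula concrete, here is a small sketch computing the loss for a couple of hypothetical predictions:

```python
import numpy as np

def binary_cross_entropy(y_hat, y):
    """Binary cross-entropy for predicted probability y_hat and true label y."""
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# A confident correct prediction incurs a small loss...
print(binary_cross_entropy(0.9, 1))  # -log(0.9) ~ 0.105
# ...while a confident wrong prediction is penalized heavily.
print(binary_cross_entropy(0.1, 1))  # -log(0.1) ~ 2.303
```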
To make a prediction on a test instance after training the classifier, first multiply the input features by the learned weights and add the bias. The resulting value

$$z = w \cdot x + b$$

is the weighted sum of the input features. But here $z$ can be any real number from $-\infty$ to $+\infty$, so it is not yet a probability. Passing $z$ through the sigmoid squashes it into the interval $(0, 1)$: the output of the sigmoid function is the predicted probability of the input belonging to class 1. To make the output of the sigmoid a proper probability, we need to make sure the two cases sum to 1:

$$P(y = 1 \mid x) = \sigma(w \cdot x + b)$$
$$P(y = 0 \mid x) = 1 - \sigma(w \cdot x + b)$$

Once we have the predicted probability, we can make a decision by setting a threshold. If the predicted probability is greater than the threshold, we predict class 1; otherwise, we predict class 0. The threshold is also called the decision boundary.
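Putting the prediction step together (the weights, bias, and test instance below are made up purely for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(x, w, b, threshold=0.5):
    """Predict class 1 if the sigmoid of the weighted sum exceeds the threshold."""
    z = np.dot(w, x) + b   # weighted sum of the input features
    y_hat = sigmoid(z)     # predicted probability of class 1
    return int(y_hat > threshold), y_hat

# Hypothetical learned parameters and a test instance.
w = np.array([0.8, -0.4])
b = 0.1
x = np.array([1.5, 0.5])

label, prob = predict(x, w, b)
print(label, round(prob, 3))  # z = 0.8*1.5 - 0.4*0.5 + 0.1 = 1.1 -> sigmoid ~ 0.750
```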
Let's walk through a simple example to understand the logistic regression model. Consider a dataset with two features, $x_1$ and $x_2$, and a binary label $y$:

| $x_1$ | $x_2$ | $y$ |
|-------|-------|-----|
| 1 | 2 | 0 |
| 2 | 3 | 0 |
| 3 | 4 | 1 |
| 4 | 5 | 1 |
The logistic regression model is defined as:

$$P(y = 1 \mid x) = \sigma(w_1 x_1 + w_2 x_2 + b)$$

where $w_1$ and $w_2$ are the learned weights and $b$ is the bias. Let's assume the model parameters are $w_1 = 0.5$, $w_2 = 0.5$, and $b = -1.9$ (illustrative values), and that we want to classify the test instance $x = (3, 4)$. The weighted sum is $z = 0.5 \cdot 3 + 0.5 \cdot 4 - 1.9 = 1.6$, and $\sigma(1.6) \approx 0.832$. The predicted probability is 0.832, which is greater than the threshold of 0.5. Therefore, we predict class 1 for the test instance.
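We can verify this arithmetic in a few lines, using the illustrative parameters assumed above:

```python
import numpy as np

w = np.array([0.5, 0.5])  # illustrative weights assumed above
b = -1.9                  # illustrative bias assumed above
x = np.array([3.0, 4.0])  # test instance

z = np.dot(w, x) + b      # 0.5*3 + 0.5*4 - 1.9 = 1.6
prob = 1.0 / (1.0 + np.exp(-z))
print(round(prob, 3))     # 0.832 -> greater than 0.5, so predict class 1
```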
Naive Bayes vs Logistic Regression
Naive Bayes makes a strong assumption of feature independence, which often does not hold in practice. Logistic regression does not make this assumption: when features are correlated, it can distribute the weight among them rather than double-counting the evidence.

Consider two features, $f_1$ and $f_2$, that are perfectly correlated (say, $f_2$ is an exact duplicate of $f_1$). Naive Bayes treats them as independent and multiplies in the same evidence twice, overestimating its confidence. Logistic regression can simply split the weight between the two features, for example giving each half of it, which yields better-calibrated probabilities.
Multinomial Logistic Regression
In the case of multi-class classification, we can extend logistic regression to the multinomial logistic regression model.
The softmax function is used to compute the probability of each class given the input features.
The softmax function is defined as:

$$\text{softmax}(z_i) = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}} \quad\quad 1 \leq i \leq K$$

where $z = [z_1, z_2, \ldots, z_K]$ is the vector of scores for the $K$ classes, and the denominator normalizes the exponentiated scores so that they sum to 1.
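A short NumPy sketch of the softmax, with the usual max-subtraction trick for numerical stability:

```python
import numpy as np

def softmax(z):
    """Convert a vector of class scores into a probability distribution."""
    z = z - np.max(z)   # subtract the max to avoid overflow in exp
    exp_z = np.exp(z)
    return exp_z / exp_z.sum()

scores = np.array([2.0, 1.0, 0.1])  # hypothetical scores for 3 classes
probs = softmax(scores)
print(probs)        # [0.659 0.242 0.099]
print(probs.sum())  # 1.0
```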
The loss function used in multinomial logistic regression is the cross-entropy loss, which measures the difference between the predicted probabilities and the true labels.
Cross-Entropy Loss
We need a loss function that expresses, for an observation $x$, how close the classifier's predicted distribution over classes is to the true distribution.

The cross-entropy loss is defined as:

$$L_{CE}(\hat{y}, y) = -\sum_{k=1}^{K} y_k \log \hat{y}_k$$

where $y_k$ is 1 if $k$ is the true class and 0 otherwise (a one-hot vector), and $\hat{y}_k$ is the predicted probability of class $k$. Because $y$ is one-hot, the sum reduces to the negative log probability the model assigns to the correct class.
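A sketch of the multi-class cross-entropy with a one-hot label (the values are illustrative, reusing the softmax output from above):

```python
import numpy as np

def cross_entropy(y_hat, y):
    """Cross-entropy between predicted probabilities y_hat and one-hot label y."""
    return -np.sum(y * np.log(y_hat))

y_hat = np.array([0.659, 0.242, 0.099])  # predicted distribution (e.g. from softmax)
y = np.array([1, 0, 0])                  # true class is class 0, one-hot encoded

print(cross_entropy(y_hat, y))           # -log(0.659) ~ 0.417
```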
For optimization, we can use gradient descent to minimize the loss function and learn the model parameters.
Gradient descent is an iterative optimization algorithm that updates the model parameters in the direction of the steepest decrease in the loss function.
Model parameters are updated as follows:

$$\theta_{t+1} = \theta_t - \eta \, \nabla_\theta L(f(x; \theta), y)$$

where $\theta$ denotes the model parameters (the weights and bias), $\eta$ is the learning rate, and $\nabla_\theta L$ is the gradient of the loss with respect to the parameters.
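To tie everything together, here is a minimal batch gradient descent loop for binary logistic regression on the toy dataset from the worked example above (the learning rate and iteration count are arbitrary choices for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy dataset from the worked example above.
X = np.array([[1, 2], [2, 3], [3, 4], [4, 5]], dtype=float)
y = np.array([0, 0, 1, 1], dtype=float)

w = np.zeros(X.shape[1])  # initialize weights
b = 0.0                   # initialize bias
eta = 0.1                 # learning rate (arbitrary choice)

for step in range(1000):
    y_hat = sigmoid(X @ w + b)         # predicted probabilities
    error = y_hat - y                  # gradient of the BCE loss w.r.t. z
    w -= eta * (X.T @ error) / len(y)  # gradient step for the weights
    b -= eta * error.mean()            # gradient step for the bias

print(np.round(sigmoid(X @ w + b), 2))  # probabilities move toward [0, 0, 1, 1]
```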
Conclusion
In this post, we explored sentiment analysis with logistic regression. We discussed the components of a machine learning classifier, the sigmoid function, the binary cross-entropy loss, and the softmax function. We also compared Naive Bayes and logistic regression and introduced multinomial logistic regression for multi-class classification tasks. Finally, we discussed the cross-entropy loss and gradient descent for optimization.