In this Python notebook, we will show you how to use a pre-trained model to predict the sentiment of a sentence. This notebook accompanies our state-of-the-art paper on Transformer models for eBISS 2023.
First, we need to install the dependencies. We will use the transformers library from HuggingFace, which provides several pre-trained models that we will fine-tune for our task. From this library we will also use the BERT tokenizer. A tokenizer is a function that splits a sentence into tokens, which are the basic units of a language. For example, the sentence "I love transformers" can be tokenized into the following tokens: ["I", "love", "transformers"]. Tokens are then mapped to vectors in a continuous space, called embeddings, which are used as input to the model.
We will also use the datasets library from HuggingFace, which provides a convenient way to load the IMDB dataset.
!pip install torch transformers datasets
!pip install accelerate -U
Defaulting to user installation because normal site-packages is not writeable Requirement already satisfied: torch in /home/jose/.local/lib/python3.10/site-packages (2.0.1) Requirement already satisfied: transformers in /home/jose/.local/lib/python3.10/site-packages (4.30.2) Requirement already satisfied: datasets in /home/jose/.local/lib/python3.10/site-packages (2.13.1) Requirement already satisfied: accelerate in /home/jose/.local/lib/python3.10/site-packages (0.20.3) (output truncated)
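Before loading any data, here is a quick illustration of the tokenization step described above. This is a minimal sketch (the variable names are only for illustration); the exact splits depend on the BERT vocabulary, which lower-cases the input and may break rare words into sub-word pieces.
from transformers import BertTokenizer

# Minimal sketch: tokenize the example sentence and map the tokens to integer ids.
sketch_tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
tokens = sketch_tokenizer.tokenize("I love transformers")
print(tokens)                                          # lower-cased (sub-)word tokens
print(sketch_tokenizer.convert_tokens_to_ids(tokens))  # ids that the model turns into embeddings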
We will use the IMDB dataset, which contains 50,000 movie reviews from the Internet Movie Database. Each review is labeled as either positive or negative.
Our objective is to train a model that can predict the sentiment of a movie review.
First, we will use a pre-trained model to predict the sentiment of a movie review. Then, we will fine-tune the model on the IMDB dataset and compare the results.
from datasets import load_dataset
dataset = load_dataset('imdb')
/home/jose/.local/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html from .autonotebook import tqdm as notebook_tqdm Found cached dataset imdb (/home/jose/.cache/huggingface/datasets/imdb/plain_text/1.0.0/d613c88cf8fa3bab83b4ded3713f1f74830d1100e171db75bbddb80b3345c9c0) 100%|██████████| 3/3 [00:00<00:00, 886.25it/s]
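Before looking at individual reviews, we can inspect the splits returned by load_dataset. As a quick sketch (the split and feature names come from the Hugging Face imdb dataset), the labeled data consists of a train and a test split of 25,000 reviews each:
# Quick sketch: inspect the available splits and their features.
print(dataset)
print(dataset['train'].features)  # 'text' plus a binary 'label' (0 = negative, 1 = positive)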
Let's also take a quick look at the reviews themselves. We will shuffle the dataset and print five random examples together with their labels.
# Select 5 random samples
dataset = dataset.shuffle(seed=42)  # shuffle with a fixed seed for reproducibility
for i in range(5):
    print('---')
    print(dataset['train'][i]['text'])
    print('negative' if dataset['train'][i]['label'] == 0 else 'positive')
--- There is no relation at all between Fortier and Profiler but the fact that both are police series about violent crimes. Profiler looks crispy, Fortier looks classic. Profiler plots are quite simple. Fortier's plot are far more complicated... Fortier looks more like Prime Suspect, if we have to spot similarities... The main character is weak and weirdo, but have "clairvoyance". People like to compare, to judge, to evaluate. How about just enjoying? Funny thing too, people writing Fortier looks American but, on the other hand, arguing they prefer American series (!!!). Maybe it's the language, or the spirit, but I think this series is more English than American. By the way, the actors are really good and funny. The acting is not superficial at all... positive --- This movie is a great. The plot is very true to the book which is a classic written by Mark Twain. The movie starts of with a scene where Hank sings a song with a bunch of kids called "when you stub your toe on the moon" It reminds me of Sinatra's song High Hopes, it is fun and inspirational. The Music is great throughout and my favorite song is sung by the King, Hank (bing Crosby) and Sir "Saggy" Sagamore. OVerall a great family movie or even a great Date movie. This is a movie you can watch over and over again. The princess played by Rhonda Fleming is gorgeous. I love this movie!! If you liked Danny Kaye in the Court Jester then you will definitely like this movie. positive --- George P. Cosmatos' "Rambo: First Blood Part II" is pure wish-fulfillment. The United States clearly didn't win the war in Vietnam. They caused damage to this country beyond the imaginable and this movie continues the fairy story of the oh-so innocent soldiers. The only bad guys were the leaders of the nation, who made this war happen. The character of Rambo is perfect to notice this. He is extremely patriotic, bemoans that US-Americans didn't appreciate and celebrate the achievements of the single soldier, but has nothing but distrust for leading officers and politicians. Like every film that defends the war (e.g. "We Were Soldiers") also this one avoids the need to give a comprehensible reason for the engagement in South Asia. And for that matter also the reason for every single US-American soldier that was there. Instead, Rambo gets to take revenge for the wounds of a whole nation. It would have been better to work on how to deal with the memories, rather than suppressing them. "Do we get to win this time?" Yes, you do. negative --- In the process of trying to establish the audiences' empathy with Jake Roedel (Tobey Maguire) the filmmakers slander the North and the Jayhawkers. Missouri never withdrew from the Union and the Union Army was not an invading force. The Southerners fought for State's Rights: the right to own slaves, elect crooked legislatures and judges, and employ a political spoils system. There's nothing noble in that. The Missourians could have easily traveled east and joined the Confederate Army.<br /><br />It seems to me that the story has nothing to do with ambiguity. When Jake leaves the Bushwhackers, it's not because he saw error in his way, he certainly doesn't give himself over to the virtue of the cause of abolition. positive --- Yeh, I know -- you're quivering with excitement. Well, *The Secret Lives of Dentists* will not upset your expectations: it's solidly made but essentially unimaginative, truthful but dull. 
It concerns the story of a married couple who happen to be dentists and who share the same practice (already a recipe for trouble: if it wasn't for our separate work-lives, we'd all ditch our spouses out of sheer irritation). Campbell Scott, whose mustache and demeanor don't recall Everyman so much as Ned Flanders from *The Simpsons*, is the mild-mannered, uber-Dad husband, and Hope Davis is the bored-stiff housewife who channels her frustrations into amateur opera. One night, as Dad & the daughters attend one of Davis' performances, he discovers that his wife is channeling her frustrations into more than just singing: he witnesses his wife kissing and flirting with the director of opera. (One nice touch: we never see the opera-director's face.) Dreading the prospect of instituting the proceedings for separation, divorce, and custody hearings -- profitable only to the lawyers -- Scott chooses to pretend ignorance of his wife's indiscretions.<br /><br />Already, the literate among you are starting to yawn: ho-hum, another story about the Pathetic, Sniveling Little Cuckold. But Rudolph, who took the story from a Jane Smiley novella, hopes that the wellworn-ness of the material will be compensated for by a series of flashy, postmodern touches. For instance, one of Scott's belligerent patients (Denis Leary, kept relatively -- and blessedly -- in check) will later become a sort of construction of the dentist's imagination, emerging as a Devil-on-the-shoulder advocate for the old-fashioned masculine virtues ("Dump the b---h!", etc.). When not egged-on by his imaginary new buddy, Scott is otherwise tormented by fantasies that include his wife engaged in a three-way with two of the male dental-assistants who work in their practice. It's not going too far to say that this movie is *Eyes Wide Shut* for Real People (or Grown-Ups, at least). Along those lines, Campbell Scott and Hope Davis are certainly recognizable human beings as compared to the glamourpuss pair of Cruise and Kidman. Further, the script for *Secret Lives* is clearly more relevant than Kubrick's. As proof, I offer the depiction of the dentists' children, particularly the youngest one who is about 3 or 4 years old, and whose main utterance is "Dad! Dad! Dad! Dad! Dad! DAD!!!" This is Family Life, all right, with all its charms.<br /><br />The movie would make an interesting double-bill with *Kramer vs. Kramer*, as well. One can easily trace the Feminization of the American Male from 1979 to 2003. In this movie, Dad is the housewife as in *Kramer*, but he is in no way flustered by the domestic role, unlike Dustin Hoffman, who was too manly to make toast. Here, Scott gets all the plumb chores, such as wiping up the children's vomit, cooking, cleaning, taking the kids to whatever inane after-school activity is on the docket. And all without complaint. (And without directorial commentary. It's just taken for granted.)<br /><br />The film has virtues, mostly having to do with verisimilitude. However, it's dragged down from greatness by its insistence on trendy distractions, which culminate in a long scene where a horrible five-day stomach flu makes the rounds in the household. We must endure pointless fantasy sequences, initiated by the imaginary ringleader Leary. Whose existence, by the way, is finally reminiscent of the Brad Pitt character in *Fight Club*. And this finally drives home the film's other big flaw: lack of originality. In this review, I realize it's been far too easy to reference many other films. 
Granted, this film is an improvement on most of them, but still. *The Secret Lives of Dentists* is worth seeing, but don't get too excited about it. (Not that you were all that excited, anyway. I guess.) negative
These examples show the nature of the dataset: each one is a movie or series review, labeled as either positive or negative. The reviews are quite long and contain a lot of information. Classifying them is easy for us, but not so easy for a machine. We will now see how to train a model to perform this task.
Now we need to tokenize the reviews so that our model can process them.
from transformers import BertTokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
def tokenize_function(examples):
    '''
    Tokenize the text.

    Input:
    - examples: a dictionary with key 'text' whose value is the text (or batch of texts) to tokenize.

    Tokenizer arguments:
    - padding='max_length': pads every sequence with padding tokens so that all
      sequences have the same length, equal to max_length.
    - truncation=True: inputs longer than the model can handle (or longer than the
      specified maximum length) are truncated.
    - max_length: the maximum sequence length; longer inputs are truncated and
      shorter inputs are padded to this length.

    Output: the tokenized text.
    '''
    return tokenizer(examples['text'], padding='max_length', truncation=True, max_length=128)
tokenized_datasets = dataset.map(tokenize_function, batched=True)
Loading cached processed dataset at /home/jose/.cache/huggingface/datasets/imdb/plain_text/1.0.0/d613c88cf8fa3bab83b4ded3713f1f74830d1100e171db75bbddb80b3345c9c0/cache-2f019c60187261be.arrow Loading cached processed dataset at /home/jose/.cache/huggingface/datasets/imdb/plain_text/1.0.0/d613c88cf8fa3bab83b4ded3713f1f74830d1100e171db75bbddb80b3345c9c0/cache-271ea7d24566f69c.arrow Loading cached processed dataset at /home/jose/.cache/huggingface/datasets/imdb/plain_text/1.0.0/d613c88cf8fa3bab83b4ded3713f1f74830d1100e171db75bbddb80b3345c9c0/cache-76898d4e76125c65.arrow
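As a quick check (a sketch; the field names come from the BERT tokenizer), each example now also carries input_ids, token_type_ids, and attention_mask columns, all of length 128 because of the padding and truncation settings above:
# Quick sketch: inspect the new columns added by the tokenizer.
example = tokenized_datasets['train'][0]
print(list(example.keys()))
print(len(example['input_ids']), len(example['attention_mask']))  # both 128 (max_length)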
Now, we are going to load a pre-trained model. We will use the BERT model, which is a transformer model that was pre-trained on a large corpus of text.
Of course, there are alternatives, such as RoBERTa, a variant of BERT trained on a larger corpus with an improved training procedure, or DistilBERT, a smaller and faster distilled version of BERT. Both can be loaded through the same interface, as sketched below.
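The following is a minimal sketch (not used in the rest of this notebook) showing how an alternative checkpoint such as the public 'distilbert-base-uncased' model could be loaded with the Auto classes:
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Sketch only: swap in an alternative checkpoint by changing its name.
alt_tokenizer = AutoTokenizer.from_pretrained('distilbert-base-uncased')
alt_model = AutoModelForSequenceClassification.from_pretrained('distilbert-base-uncased', num_labels=2)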
from transformers import BertForSequenceClassification
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)
2023-06-27 12:13:45.777585: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags. 2023-06-27 12:13:46.550359: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.bias', 'cls.predictions.transform.dense.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias'] - This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model). - This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model). Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight'] You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
import torch
from sklearn.metrics import classification_report

def evaluate_model(model, dataset, tokenizer, label_names):
    model.eval()  # evaluation mode (disables dropout)
    true_labels = []
    predictions = []
    with torch.no_grad():  # no gradients are needed for inference
        for example in dataset:  # iterate over the examples one at a time
            inputs = tokenizer(example['text'], return_tensors="pt", padding=True, truncation=True, max_length=128)
            outputs = model(**inputs)
            predictions.extend(outputs.logits.argmax(dim=1).tolist())
            true_labels.append(example['label'])
    print(classification_report(true_labels, predictions, target_names=label_names))
# Evaluate the vanilla model
print("\nPerformance of the Vanilla Model:")
evaluate_model(model, tokenized_datasets['test'], tokenizer, ['negative', 'positive'])
Performance of the Vanilla Model:
              precision    recall  f1-score   support

    negative       0.54      0.04      0.07     12500
    positive       0.50      0.97      0.66     12500

    accuracy                           0.50     25000
   macro avg       0.52      0.50      0.37     25000
weighted avg       0.52      0.50      0.37     25000
As we can see, the recall for the negative class is just 0.04, which means that the model almost never detects negative reviews.
As for the accuracy, we obtain 0.50, which is no better than random guessing on this balanced dataset, so we cannot rely on the pre-trained model alone for this task.
Let's see if we can improve these results by fine-tuning the model.
from transformers import TrainingArguments, Trainer
# Training arguments for fine-tuning
training_args = TrainingArguments(
    output_dir='./results',          # where checkpoints are written
    num_train_epochs=3,              # number of passes over the training set
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    warmup_steps=500,                # learning-rate warmup steps
    weight_decay=0.01,               # weight decay regularization
    logging_dir='./logs',
)

# Fine-tune the model (this will take a while)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets['train'],
    eval_dataset=tokenized_datasets['test'],
    tokenizer=tokenizer,
)
trainer.train()
/home/jose/.local/lib/python3.10/site-packages/transformers/optimization.py:411: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning warnings.warn( 5%|▌ | 500/9375 [17:34<5:13:51, 2.12s/it]
{'loss': 0.5194, 'learning_rate': 5e-05, 'epoch': 0.16}
11%|█ | 1000/9375 [35:24<4:49:17, 2.07s/it]
{'loss': 0.4712, 'learning_rate': 4.71830985915493e-05, 'epoch': 0.32}
16%|█▌ | 1500/9375 [52:38<4:30:12, 2.06s/it]
{'loss': 0.4143, 'learning_rate': 4.436619718309859e-05, 'epoch': 0.48}
21%|██▏ | 2000/9375 [1:09:40<4:15:00, 2.07s/it]
{'loss': 0.3892, 'learning_rate': 4.154929577464789e-05, 'epoch': 0.64}
27%|██▋ | 2500/9375 [1:26:44<3:52:57, 2.03s/it]
{'loss': 0.366, 'learning_rate': 3.8732394366197184e-05, 'epoch': 0.8}
32%|███▏ | 3000/9375 [1:43:42<3:34:07, 2.02s/it]
{'loss': 0.3652, 'learning_rate': 3.5915492957746486e-05, 'epoch': 0.96}
37%|███▋ | 3500/9375 [2:00:52<3:23:14, 2.08s/it]
{'loss': 0.3024, 'learning_rate': 3.3098591549295775e-05, 'epoch': 1.12}
43%|████▎ | 4000/9375 [2:18:09<3:06:48, 2.09s/it]
{'loss': 0.2816, 'learning_rate': 3.028169014084507e-05, 'epoch': 1.28}
48%|████▊ | 4500/9375 [2:35:34<2:53:49, 2.14s/it]
{'loss': 0.2937, 'learning_rate': 2.746478873239437e-05, 'epoch': 1.44}
53%|█████▎ | 5000/9375 [2:52:36<2:30:23, 2.06s/it]
{'loss': 0.2793, 'learning_rate': 2.4647887323943664e-05, 'epoch': 1.6}
59%|█████▊ | 5500/9375 [3:09:54<2:16:06, 2.11s/it]
{'loss': 0.2936, 'learning_rate': 2.1830985915492956e-05, 'epoch': 1.76}
64%|██████▍ | 6000/9375 [3:27:01<1:55:10, 2.05s/it]
{'loss': 0.2666, 'learning_rate': 1.9014084507042255e-05, 'epoch': 1.92}
69%|██████▉ | 6500/9375 [3:44:11<1:39:51, 2.08s/it]
{'loss': 0.2068, 'learning_rate': 1.619718309859155e-05, 'epoch': 2.08}
75%|███████▍ | 7000/9375 [4:01:30<1:21:01, 2.05s/it]
{'loss': 0.129, 'learning_rate': 1.3380281690140845e-05, 'epoch': 2.24}
80%|████████ | 7500/9375 [4:18:41<1:04:45, 2.07s/it]
{'loss': 0.1335, 'learning_rate': 1.056338028169014e-05, 'epoch': 2.4}
85%|████████▌ | 8000/9375 [4:36:03<47:29, 2.07s/it]
{'loss': 0.1369, 'learning_rate': 7.746478873239436e-06, 'epoch': 2.56}
91%|█████████ | 8500/9375 [4:53:22<29:53, 2.05s/it]
{'loss': 0.1447, 'learning_rate': 4.929577464788732e-06, 'epoch': 2.72}
96%|█████████▌| 9000/9375 [5:10:50<12:56, 2.07s/it]
{'loss': 0.1279, 'learning_rate': 2.112676056338028e-06, 'epoch': 2.88}
100%|██████████| 9375/9375 [5:24:40<00:00, 2.08s/it]
{'train_runtime': 19480.7727, 'train_samples_per_second': 3.85, 'train_steps_per_second': 0.481, 'train_loss': 0.27903265055338544, 'epoch': 3.0}
TrainOutput(global_step=9375, training_loss=0.27903265055338544, metrics={'train_runtime': 19480.7727, 'train_samples_per_second': 3.85, 'train_steps_per_second': 0.481, 'train_loss': 0.27903265055338544, 'epoch': 3.0})
We can save the fine-tuned model to disk so that we can load it later and use it to make predictions. This is not strictly necessary, but training takes a long time, so saving the model lets us reuse it later without retraining.
model.save_pretrained('./fine-tuned_model')
tokenizer.save_pretrained('./tokenizer')
('./tokenizer/tokenizer_config.json', './tokenizer/special_tokens_map.json', './tokenizer/vocab.txt', './tokenizer/added_tokens.json')
Finally, we can evaluate the fine-tuned model on the test set and compare the results with those of the vanilla model.
from transformers import AutoModelForSequenceClassification, AutoTokenizer
# Load the model and tokenizer
loaded_model = AutoModelForSequenceClassification.from_pretrained('./fine-tuned_model')
loaded_tokenizer = AutoTokenizer.from_pretrained('./tokenizer')
# Evaluate the fine-tuned model
print("\nPerformance of the Fine-tuned Model:")
evaluate_model(loaded_model, tokenized_datasets['test'], loaded_tokenizer, ['negative', 'positive'])
Performance of the Fine-tuned Model:
              precision    recall  f1-score   support

    negative       0.88      0.89      0.88     12500
    positive       0.89      0.88      0.88     12500

    accuracy                           0.88     25000
   macro avg       0.88      0.88      0.88     25000
weighted avg       0.88      0.88      0.88     25000
Observe how the precision for the negative class has increased from 0.54 to 0.88, and the recall from 0.04 to 0.89. This means that the fine-tuned model is much better at detecting negative reviews. The f1-score for the negative class has also increased from 0.07 to 0.88.
As for the positive class, the precision has gone from 0.50 to 0.89 and the recall from 0.97 to 0.88. The f1-score has also increased from 0.66 to 0.88. This means that the fine-tuned model is also better at detecting positive reviews, although the improvement is not as big as for the negative class.
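As a quick sanity check, the f1-score is the harmonic mean of precision and recall; for the negative class, this reproduces the scores reported above:
# f1 = 2 * precision * recall / (precision + recall)
for name, p, r in [('vanilla negative', 0.54, 0.04), ('fine-tuned negative', 0.88, 0.89)]:
    print(name, round(2 * p * r / (p + r), 2))  # ~0.07 and ~0.88, matching the reports above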
Also, the overall accuracy has increased from 0.5 to 0.88, which means that the fine-tuned model is much better at predicting the sentiment of a movie review.
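To close the loop, the fine-tuned model can be used directly to classify a new review. The sentence below is made up for illustration; this is a minimal sketch that reuses the loaded_model and loaded_tokenizer from above.
import torch

# Classify a single, made-up review with the fine-tuned model.
review = "A wonderful film with a clever script and great performances."
inputs = loaded_tokenizer(review, return_tensors="pt", truncation=True, max_length=128)
with torch.no_grad():
    logits = loaded_model(**inputs).logits
print('positive' if logits.argmax(dim=1).item() == 1 else 'negative')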
In this notebook, we have shown how to use a pre-trained model to predict the sentiment of a sentence. We have also shown how to fine-tune the model on a dataset and how to evaluate the fine-tuned model on a test set.
We have observed that the fine-tuned model is much better at predicting the sentiment of a movie review than the vanilla model. As we explained in our paper, this approach is very useful, since generic models that take a long time to train can be fine-tuned on a specific task and achieve state-of-the-art results, reducing the time and resources required to train a model from scratch.