Example: Classifying Movie Reviews#
This example is taken from Section 4.1 of the book “Deep Learning with Python” by François Chollet.
The IMDB dataset contains a set of 50,000 highly polarized reviews from the Internet Movie Database. They are split into 25,000 reviews for training and 25,000 reviews for testing, each set consisting of 50% negative and 50% positive reviews.
We will train a model to classify movie reviews as positive or negative, based on the text content of the reviews.
Step 1: Load the data#
from tensorflow.keras.datasets import imdb
(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=10000)
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb.npz
train_data and test_data: lists of reviews, each review being a list of word indices (encoding a sequence of words).
train_labels and test_labels: lists of 0s and 1s, where 0 stands for “negative” and 1 stands for “positive”.
review_idx = 100
print(train_data[review_idx])
print(train_labels[review_idx])
[1, 13, 244, 6, 87, 337, 7, 628, 2219, 5, 28, 285, 15, 240, 93, 23, 288, 549, 18, 1455, 673, 4, 241, 534, 3635, 8448, 20, 38, 54, 13, 258, 46, 44, 14, 13, 1241, 7258, 12, 5, 5, 51, 9, 14, 45, 6, 762, 7, 2, 1309, 328, 5, 428, 2473, 15, 26, 1292, 5, 3939, 6728, 5, 1960, 279, 13, 92, 124, 803, 52, 21, 279, 14, 9, 43, 6, 762, 7, 595, 15, 16, 2, 23, 4, 1071, 467, 4, 403, 7, 628, 2219, 8, 97, 6, 171, 3596, 99, 387, 72, 97, 12, 788, 15, 13, 161, 459, 44, 4, 3939, 1101, 173, 21, 69, 8, 401, 2, 4, 481, 88, 61, 4731, 238, 28, 32, 11, 32, 14, 9, 6, 545, 1332, 766, 5, 203, 73, 28, 43, 77, 317, 11, 4, 2, 953, 270, 17, 6, 3616, 13, 545, 386, 25, 92, 1142, 129, 278, 23, 14, 241, 46, 7, 158]
0
word_index = imdb.get_word_index()
print(list(word_index.items())[0:10])
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb_word_index.json
[('fawn', 34701), ('tsukino', 52006), ('nunnery', 52007), ('sonja', 16816), ('vani', 63951), ('woods', 1408), ('spiders', 16115), ('hanging', 2345), ('woody', 2289), ('trawling', 52008)]
The imdb module provides a function get_word_index that returns a dictionary mapping words to integer indices.
review_idx = 200
word_index = imdb.get_word_index()
reverse_word_index = dict([(value, key) for (key, value) in word_index.items()])
# Note that the indices are offset by 3 because 0, 1, and 2 are
# reserved indices for “padding,” “start of sequence,” and “unknown.”
decoded_review = " ".join([reverse_word_index.get(i - 3, "?")
                           for i in train_data[review_idx]])
print(f"Review index: {review_idx}")
print(f"Review data:\n{train_data[review_idx]}")
print(f"Review:\n{decoded_review}")
print(f"Label: {train_labels[review_idx]}")
Review index: 200
Review data:
[1, 14, 9, 6, 227, 196, 241, 634, 891, 234, 21, 12, 69, 6, 6, 176, 7, 4, 804, 4658, 2999, 667, 11, 12, 11, 85, 715, 6, 176, 7, 1565, 8, 1108, 10, 10, 12, 16, 1844, 2, 33, 211, 21, 69, 49, 2009, 905, 388, 99, 2, 125, 34, 6, 2, 1274, 33, 4, 130, 7, 4, 22, 15, 16, 6424, 8, 650, 1069, 14, 22, 9, 44, 4609, 153, 154, 4, 318, 302, 1051, 23, 14, 22, 122, 6, 2093, 292, 10, 10, 723, 8721, 5, 2, 9728, 71, 1344, 1576, 156, 11, 68, 251, 5, 36, 92, 4363, 133, 199, 743, 976, 354, 4, 64, 439, 9, 3059, 17, 32, 4, 2, 26, 256, 34, 2, 5, 49, 7, 98, 40, 2345, 9844, 43, 92, 168, 147, 474, 40, 8, 67, 6, 796, 97, 7, 14, 20, 19, 32, 2188, 156, 24, 18, 6090, 1007, 21, 8, 331, 97, 4, 65, 168, 5, 481, 53, 3084]
Review:
? this is a bit long 2 hours 20 minutes but it had a a lot of the famous pearl buck novel in it in other words a lot of ground to cover br br it was soap ? at times but had some visually dramatic moments too ? off by a ? attack at the end of the film that was astounding to view considering this film is about 70 years old the special effects crew on this film did a spectacular job br br paul muni and ? rainer were award winning actors in their day and they don't disappoint here both giving powerful performances the only problem is credibility as all the ? are played by ? and some of them like walter connolly just don't look real i'd like to see a re make of this movie with all asian actors not for pc reasons but to simply make the story look and sound more credible
Label: 1
# the word indices range from 1 to 9999
print(min(min(seq) for seq in train_data))
print(max(max(seq) for seq in train_data))
print(reverse_word_index[1])
print(word_index["the"])
print(max([len(seq) for seq in train_data]))
1
9999
the
1
2494
Step 2: Preprocess the data#
In this step, we will convert the lists of integers into tensors that our neural network can process.
We will implement multi-hot-encoding - a binary representation commonly used in NLP - to transform our lists into vectors of 0s and 1s. Each resulting tensor will be a 10,000-element vector where:
1 indicates the word appears in the review
0 indicates the word is absent
This representation creates a standardized format that our model can efficiently process while preserving the essential information about word presence in each review.
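As a sketch of the idea before the full implementation, here is multi-hot encoding with a toy vocabulary of size 5 instead of the real 10,000 (the numbers here are illustrative only):

```python
import numpy as np

# Toy example: encode the index list [0, 2, 2] into a 5-dimensional
# multi-hot vector. NumPy fancy indexing sets positions 0 and 2 to 1;
# index 2 appears twice but still produces a single 1.
sequence = [0, 2, 2]
vector = np.zeros(5)
vector[sequence] = 1.0
print(vector)  # → [1. 0. 1. 0. 0.]
```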
x = ['a', 'b', 'c']
for idx, val in enumerate(x):
    print(idx, val)
0 a
1 b
2 c
import numpy as np
def vectorize_sequences(sequences, dimension=10000):
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        for j in sequence:
            results[i, j] = 1.
    return results
x_train = vectorize_sequences(train_data)
x_test = vectorize_sequences(test_data)
y_train = np.asarray(train_labels).astype("float32")
y_test = np.asarray(test_labels).astype("float32")
# check that the vectorized data is correct
for i in train_data[0]:
    if x_train[0, i] != 1.0:
        print(f"i={i} x_train[0, {i}]={x_train[0, i]}")
# test your understanding: why are these values not the same?
print(sum(x_train[0]))
print(len(train_data[0]))
120.0
218
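One way to see why the two numbers differ: repeated word indices collapse to a single 1 in the multi-hot vector, so the vector's sum equals the number of distinct indices in the review, not its length. A toy illustration (made-up indices, not from the IMDB data):

```python
import numpy as np

sequence = [3, 1, 3, 7, 1]   # 5 tokens, but only 3 distinct indices
vector = np.zeros(10)
vector[sequence] = 1.0       # duplicates still set a single 1

print(len(sequence))         # 5 — review length
print(int(vector.sum()))     # 3 — number of distinct indices
print(len(set(sequence)))    # 3 — the same count, computed directly
```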
Step 3: Build the model#
from tensorflow import keras
from tensorflow.keras.layers import Dense
model = keras.Sequential([
    Dense(16, activation="relu"),
    Dense(16, activation="relu"),
    Dense(1, activation="sigmoid")
])
model.compile(optimizer="rmsprop",
              loss="binary_crossentropy",
              metrics=["accuracy"])
Step 4: Train the model#
To train the model, we will use the rmsprop optimizer and the binary_crossentropy loss function. We will also monitor accuracy during training. In addition, we will create a validation set by setting aside 10,000 samples from the original training data.
Here is an outline of the training loop:
Given the model, learning_rate, batch_size, epochs, train_data, train_labels, test_data, and test_labels:
    Initialize the optimizer with learning_rate
    For each epoch from 1 to epochs:
        Initialize an empty list of training losses
        For each batch of training data of size batch_size:
            Extract the inputs and targets
            Do a forward pass through the model
            Calculate the loss
            Do a backward pass and update the weights
            Append the loss to the list of training losses
        Initialize an empty list of validation losses
        For each batch of validation data:
            Extract the inputs and targets
            Do a forward pass through the model
            Calculate the loss
            Append the loss to the list of validation losses
        Calculate and print the average training loss
        Calculate and print the average validation loss
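The outline above can be sketched without Keras. Here is a minimal framework-free version using a toy logistic-regression model with hand-derived gradients; all data, names, and hyperparameters below are illustrative, not part of the Keras API:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 100 samples, 4 features, binary labels from a linear rule.
X = rng.normal(size=(100, 4))
y = (X @ np.array([1.0, -2.0, 0.5, 0.0]) > 0).astype(float)
X_train, y_train = X[:80], y[:80]
X_val, y_val = X[80:], y[80:]

w = np.zeros(4)                                  # model weights
learning_rate, batch_size, epochs = 0.5, 16, 5

def forward(X, w):
    """Sigmoid of a linear model: the forward pass."""
    return 1.0 / (1.0 + np.exp(-(X @ w)))

def bce(p, y):
    """Binary cross-entropy loss."""
    eps = 1e-7
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

history_train, history_val = [], []
for epoch in range(1, epochs + 1):
    train_losses = []
    for start in range(0, len(X_train), batch_size):
        xb = X_train[start:start + batch_size]   # extract inputs
        yb = y_train[start:start + batch_size]   # extract targets
        p = forward(xb, w)                       # forward pass
        train_losses.append(bce(p, yb))          # calculate the loss
        grad = xb.T @ (p - yb) / len(xb)         # backward pass...
        w -= learning_rate * grad                # ...update the weights
    val_losses = []
    for start in range(0, len(X_val), batch_size):
        xb = X_val[start:start + batch_size]
        yb = y_val[start:start + batch_size]
        val_losses.append(bce(forward(xb, w), yb))   # no weight update here
    history_train.append(np.mean(train_losses))
    history_val.append(np.mean(val_losses))
    print(f"epoch {epoch}: train loss {history_train[-1]:.4f}, "
          f"val loss {history_val[-1]:.4f}")
```

The structure is the same one Keras runs internally when you call model.fit: the only substantive difference is that Keras computes the gradients automatically.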
Batch size#
The batch_size parameter is a crucial hyperparameter that determines how many examples are processed together in a single forward/backward pass:
Instead of updating model weights after each individual example (inefficient) or after the entire dataset (memory-intensive), we update after each batch
Each batch contains exactly batch_size examples (except possibly the last batch, which might be smaller)
The model weights are updated once per batch, not once per example
Smaller batch sizes mean more frequent weight updates but with noisier gradients
Larger batch sizes mean fewer weight updates per epoch but with more stable gradients
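As a concrete check: with the 15,000 training samples used below and batch_size=512, each epoch has ceil(15000/512) = 30 batches, which matches the 30/30 shown in the training logs:

```python
import math

n_samples, batch_size = 15000, 512
n_batches = math.ceil(n_samples / batch_size)        # 30 batches per epoch
last_batch = n_samples - (n_batches - 1) * batch_size
print(n_batches)    # → 30
print(last_batch)   # → 152 (the final, smaller batch)
```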
Validation Process#
For validation, the entire validation set is evaluated, but still processed in batches for memory efficiency:
The validation data is processed in batches of size batch_size, just like training data
All validation samples are evaluated and their metrics averaged together
Unlike training, no weight updates occur during validation
The final validation metric represents performance across the entire validation set
Validation typically happens once per epoch, not after every training batch
This approach enables evaluation on large validation sets that might not fit into memory all at once, while still getting a complete measure of model performance on the entire validation dataset.
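One subtlety in averaging per-batch metrics: if the last batch is smaller, a plain mean of batch means slightly overweights it, while weighting each batch by its size recovers the exact full-set average. A small sketch with toy numbers (not from the IMDB run):

```python
import numpy as np

losses = np.array([0.30, 0.40, 0.50, 0.80])   # per-sample losses, 4 samples
batches = [losses[:3], losses[3:]]            # batch sizes 3 and 1

batch_means = [b.mean() for b in batches]
plain_mean = np.mean(batch_means)             # overweights the small batch
weighted = sum(b.mean() * len(b) for b in batches) / len(losses)
print(plain_mean, weighted, losses.mean())    # weighted matches the true mean
```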
x_val = x_train[:10000]
partial_x_train = x_train[10000:]
y_val = y_train[:10000]
partial_y_train = y_train[10000:]
We will train the model for 20 epochs in mini-batches of 512 samples.
history = model.fit(partial_x_train,
partial_y_train,
epochs=20,
batch_size=512,
validation_data=(x_val, y_val),
verbose=0)
Epoch 1/20
30/30 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - accuracy: 0.6869 - loss: 0.5969 - val_accuracy: 0.8643 - val_loss: 0.3931
Epoch 2/20
30/30 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.8926 - loss: 0.3361 - val_accuracy: 0.8864 - val_loss: 0.3090
Epoch 3/20
30/30 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.9213 - loss: 0.2432 - val_accuracy: 0.8867 - val_loss: 0.2839
Epoch 4/20
30/30 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9356 - loss: 0.1948 - val_accuracy: 0.8899 - val_loss: 0.2743
Epoch 5/20
30/30 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.9522 - loss: 0.1552 - val_accuracy: 0.8867 - val_loss: 0.2814
Epoch 6/20
30/30 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.9607 - loss: 0.1301 - val_accuracy: 0.8863 - val_loss: 0.2887
Epoch 7/20
30/30 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.9694 - loss: 0.1076 - val_accuracy: 0.8806 - val_loss: 0.3029
Epoch 8/20
30/30 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.9770 - loss: 0.0896 - val_accuracy: 0.8820 - val_loss: 0.3161
Epoch 9/20
30/30 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.9828 - loss: 0.0755 - val_accuracy: 0.8822 - val_loss: 0.3304
Epoch 10/20
30/30 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.9838 - loss: 0.0638 - val_accuracy: 0.8717 - val_loss: 0.3741
Epoch 11/20
30/30 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.9862 - loss: 0.0593 - val_accuracy: 0.8727 - val_loss: 0.3833
Epoch 12/20
30/30 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - accuracy: 0.9922 - loss: 0.0435 - val_accuracy: 0.8773 - val_loss: 0.3902
Epoch 13/20
30/30 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.9940 - loss: 0.0359 - val_accuracy: 0.8760 - val_loss: 0.4147
Epoch 14/20
30/30 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.9951 - loss: 0.0307 - val_accuracy: 0.8780 - val_loss: 0.4320
Epoch 15/20
30/30 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.9960 - loss: 0.0246 - val_accuracy: 0.8751 - val_loss: 0.4561
Epoch 16/20
30/30 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.9969 - loss: 0.0228 - val_accuracy: 0.8724 - val_loss: 0.4916
Epoch 17/20
30/30 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.9976 - loss: 0.0181 - val_accuracy: 0.8739 - val_loss: 0.4990
Epoch 18/20
30/30 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.9993 - loss: 0.0122 - val_accuracy: 0.8725 - val_loss: 0.5202
Epoch 19/20
30/30 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - accuracy: 0.9995 - loss: 0.0105 - val_accuracy: 0.8736 - val_loss: 0.5407
Epoch 20/20
30/30 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.9997 - loss: 0.0082 - val_accuracy: 0.8539 - val_loss: 0.6420
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style("whitegrid")
history_dict = history.history
loss_values = history_dict["loss"]
val_loss_values = history_dict["val_loss"]
epochs = range(1, len(loss_values) + 1)
plt.plot(epochs, loss_values, "bo", label="Training loss")
plt.plot(epochs, val_loss_values, "b", label="Validation loss")
plt.title("Training and validation loss")
plt.xticks(epochs)
plt.xlabel("Epochs")
plt.ylabel("Loss")
plt.legend()
sns.despine()
plt.grid(False)
plt.show()
acc = history_dict["accuracy"]
val_acc = history_dict["val_accuracy"]
epochs = range(1, len(loss_values) + 1)
plt.plot(epochs, acc, "bo", label="Training acc")
plt.plot(epochs, val_acc, "b", label="Validation acc")
plt.title("Training and validation accuracy")
plt.xticks(epochs)
plt.xlabel("Epochs")
plt.ylabel("Accuracy")
plt.legend()
sns.despine()
plt.grid(False)
plt.show()
This shows that the model quickly starts overfitting the training data. Overfitting occurs when a model learns the training data too well, capturing not just the underlying patterns but also the random noise and peculiarities specific to the training set. We can identify overfitting by observing a characteristic divergence between training and validation metrics: while training loss continues to decrease, validation loss begins to increase or plateau.
Several indicators of overfitting in this case include:
Decreasing training loss alongside increasing validation loss
Growing gap between training and validation accuracy
This behavior suggests the model is becoming too specialized to the training examples rather than learning generalizable patterns. There are several strategies to mitigate overfitting, which we will explore later in the course. In this case, we will stop training after 4 epochs to prevent overfitting.
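The choice of 4 epochs can also be read off programmatically: pick the epoch with the lowest validation loss. In practice you would take the values from history.history["val_loss"]; here they are copied from the training log above for illustration:

```python
import numpy as np

# Validation losses for epochs 1..20, copied from the training log above.
val_loss = [0.3931, 0.3090, 0.2839, 0.2743, 0.2814, 0.2887, 0.3029,
            0.3161, 0.3304, 0.3741, 0.3833, 0.3902, 0.4147, 0.4320,
            0.4561, 0.4916, 0.4990, 0.5202, 0.5407, 0.6420]
best_epoch = int(np.argmin(val_loss)) + 1   # epochs are 1-indexed
print(best_epoch)  # → 4
```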
model = keras.Sequential([
    Dense(16, activation="relu"),
    Dense(16, activation="relu"),
    Dense(1, activation="sigmoid")
])
model.compile(
    optimizer="rmsprop",
    loss="binary_crossentropy",
    metrics=["accuracy"])
model.fit(x_train, y_train, epochs=4, batch_size=512, verbose=0)
results = model.evaluate(x_test, y_test, verbose=0)
print(f"The test loss is {results[0]}")
print(f"The test accuracy is {results[1]}")
print("The predictions are:")
print(model.predict(x_test))
Epoch 1/4
49/49 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.7283 - loss: 0.5583
Epoch 2/4
49/49 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.9013 - loss: 0.2886
Epoch 3/4
49/49 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.9252 - loss: 0.2123
Epoch 4/4
49/49 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - accuracy: 0.9375 - loss: 0.1774
782/782 ━━━━━━━━━━━━━━━━━━━━ 0s 467us/step - accuracy: 0.8821 - loss: 0.2909
The test loss is 0.29100528359413147
The test accuracy is 0.8838000297546387
The predictions are:
782/782 ━━━━━━━━━━━━━━━━━━━━ 0s 381us/step
[[0.17098129]
[0.99931735]
[0.5988198 ]
...
[0.0967738 ]
[0.05774209]
[0.46419108]]
Some things to try:
Try using different optimizers: adam or sgd.
Try using one or three representation layers, and see how doing so affects validation and test accuracy.
Try using layers with more units or fewer units: 32 units, 64 units, and so on.
Try using the mse loss function instead of binary_crossentropy.
Try using the tanh activation instead of relu.
model = keras.Sequential([
    Dense(16, activation="relu"),
    Dense(16, activation="relu"),
    Dense(1, activation="sigmoid")
])
model.compile(
    optimizer="rmsprop",
    loss="binary_crossentropy",
    metrics=["accuracy"])
model.fit(x_train, y_train, epochs=4, batch_size=5, verbose=0)
results = model.evaluate(x_test, y_test)
print(f"The test loss is {results[0]}")
print(f"The test accuracy is {results[1]}")
print("The predictions are:")
print(model.predict(x_test))
Epoch 1/4
5000/5000 ━━━━━━━━━━━━━━━━━━━━ 5s 860us/step - accuracy: 0.8367 - loss: 0.3737
Epoch 2/4
5000/5000 ━━━━━━━━━━━━━━━━━━━━ 4s 861us/step - accuracy: 0.9137 - loss: 0.2294
Epoch 3/4
5000/5000 ━━━━━━━━━━━━━━━━━━━━ 4s 878us/step - accuracy: 0.9236 - loss: 0.2105
Epoch 4/4
5000/5000 ━━━━━━━━━━━━━━━━━━━━ 4s 863us/step - accuracy: 0.9327 - loss: 0.1933
782/782 ━━━━━━━━━━━━━━━━━━━━ 0s 453us/step - accuracy: 0.8801 - loss: 0.3087
The test loss is 0.30427753925323486
The test accuracy is 0.8822399973869324
The predictions are:
782/782 ━━━━━━━━━━━━━━━━━━━━ 0s 333us/step
[[0.15679161]
[0.99998003]
[0.9494275 ]
...
[0.15605684]
[0.09611651]
[0.56805176]]