Handwritten Digit Recognition using Convolutional Neural Network (CNN) with Tensorflow
A Deep Learning Analysis with Real World Data
Table of Contents
1 | Abstract
2 | Import the dependencies and load the dataset
3 | Data Overview
- Dimension of train and test data
- Visualizing the data using TSNE
- Splitting data into training and validation dataset
- Dimension of training and validation data
- Converting training, testing, and validation data into an array
- Dimension of training, testing, and validation data after reshaping
4 | Explore the data
- Visualise how the digits were written
- Reshaping train, test, and validation data
- Normalize train, test, and validation data
5 | Build the CNN model to Classify Handwritten Digits
- Summary of the training model
- Visualization of the model
- Compile the model using keras.optimizers.Adam
- Train the model
6 | Model evaluation
- Loss plot curve for training and validation dataset
- Accuracy plot curve for training and validation dataset
- Evaluation of the model accuracy
  - Performance of training dataset
  - Performance of validation dataset
  - Save and load the model
  - Visualise validation predicted data on how the digits were written
  - Confusion matrix of validation dataset
7 | Submission
So, let’s get started 🧑👈🙏💪
1 | Abstract
Introduction
Handwritten digit recognition is the process of converting images of handwritten digits into digital values. It is a difficult task for a machine because handwritten digits are imperfect and vary widely in style from one writer to the next. To address this, we built a handwritten digit recognition (HDR) system that takes an image of a digit and identifies which digit it shows.
In this project, we developed a Convolutional Neural Network (CNN) model using the TensorFlow framework to recognize handwritten digits.
A convolutional neural network (CNN, or ConvNet) is a deep learning algorithm that takes an input image, assigns learnable weights and biases to various objects in the image, and learns to distinguish one from another.
It is used to analyze visual imagery. Object detection, face recognition, robotics, video analysis, segmentation, pattern recognition, natural language processing, spam detection, topic categorization, regression analysis, speech recognition, and image classification are some of the tasks that convolutional neural networks can be applied to.
Approach
We used a Keras Sequential model with two pairs of Convolution2D and MaxPooling2D layers. Each MaxPooling2D layer downsamples its input by taking the maximum value in each region instead of the average. After that, a Flatten layer converts the multidimensional feature maps into a vector.
The last layer is a Dense layer with 10 softmax outputs. The outputs represent the network’s guess: the 0th output is the probability that the input digit is 0, the 1st output is the probability that the input digit is 1, and so on.
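To make the idea of "10 softmax outputs" concrete, here is a minimal NumPy sketch of how raw scores become probabilities and how the guess is read off. The scores are made up for illustration and the `softmax` helper is written by hand here; it is not the Keras implementation.

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum()

# Hypothetical raw scores for the ten digit classes (made up for illustration).
logits = np.array([0.1, 4.0, 0.3, 0.2, 0.0, 0.5, 0.1, 1.2, 0.4, 0.3])
probabilities = softmax(logits)

# The index of the largest probability is the network's guess.
predicted_digit = int(np.argmax(probabilities))
print(predicted_digit)
```

Because the output index and the digit coincide, taking the argmax is all that is needed to turn the probability vector into a label.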
Result
The CNN performed well, reaching a validation accuracy of 98.9% and a validation loss of 4.5%.
Conclusion
A convolutional neural network (CNN, or ConvNet) can recognize handwritten digits reliably. We successfully built a handwritten digit recognizer with Python, TensorFlow, and supporting machine learning libraries, and it reached a validation accuracy of 98.9%.
2 | Import the dependencies and load the dataset
We will import all of the modules that we will require to train our model.
Data Source: Kaggle
import tensorflow as tf
import matplotlib.pyplot as plt
import seaborn as sn
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import math
import datetime
import platform
import os

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# Load dataset
train = pd.read_csv('/kaggle/input/digit-recognizer/train.csv')
test = pd.read_csv('/kaggle/input/digit-recognizer/test.csv')
3 | Data Overview
The MNIST data provided by Kaggle contains 42,000 training images of handwritten digits from zero to nine (10 different classes) and 28,000 test images without labels (for submission). Each image is grayscale and is represented as a 28×28 matrix of pixel values, stored in the CSV files as a flat row of 784 values.
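The relationship between a flat row of 784 pixel values and the 28×28 image can be sketched as follows. The pixel values are a made-up stand-in for one CSV row, and the sketch assumes the row-major layout described on the Kaggle data page (pixel k sits at row k // 28, column k % 28).

```python
import numpy as np

# A made-up stand-in for one CSV row: 784 pixel values in row-major order.
flat_pixels = np.arange(784) % 256

# Fold the flat row back into a 28x28 image.
image = flat_pixels.reshape(28, 28)

# Pixel k of the flat row lands at (k // 28, k % 28) in the matrix.
k = 100
row, col = divmod(k, 28)
print(image.shape, image[row, col] == flat_pixels[k])
```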
3.1 | Dimension of train and test data
print('train:', train.shape)
print('test:', test.shape)
X = train.iloc[:, 1:785]
y = train.iloc[:, 0]
X_test = test.iloc[:, 0:784]
3.2 | Visualizing the data using TSNE
TSNE stands for t-distributed Stochastic Neighbor Embedding. It is a dimensionality reduction algorithm designed to preserve local structure in a high-dimensional data set while caring less about global structure. Here, we use it to go from the 784 pixel dimensions of the images down to two dimensions, which makes plotting easier. Each point is colored by its original MNIST label, and the labels form visibly separate clusters.
# WARNING: running t-SNE on the full data set takes a while.
X_tsn = X/255
from sklearn.manifold import TSNE
tsne = TSNE()
tsne_res = tsne.fit_transform(X_tsn)
plt.figure(figsize=(14, 12))
plt.scatter(tsne_res[:,0], tsne_res[:,1], c=y, s=2)
plt.xticks([])
plt.yticks([])
plt.colorbar()
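Since t-SNE on all rows is slow, one common workaround is to embed a random subset instead. The sketch below only shows how such a subset could be drawn; the subset size of 5000 and the seed are arbitrary assumptions, not values from this post.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily

n_rows = 42000       # number of labelled rows described in the data overview
sample_size = 5000   # hypothetical subset size; tune to taste

# Draw distinct row positions, sorted so the subset keeps the original order.
sample_idx = np.sort(rng.choice(n_rows, size=sample_size, replace=False))

# With the DataFrames from this post, the subset would then be taken as
# X.iloc[sample_idx] and y.iloc[sample_idx] before calling TSNE().
print(sample_idx[:5], len(sample_idx))
```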
3.3 | Splitting data into training and validation dataset
We divide our dataset (X) into two parts:
- The training dataset (80%) is used to fit our models
- The validation dataset (20%) is used to evaluate our models
The train_test_split() method returns the training data and its labels, along with the validation data and its labels.
from sklearn.model_selection import train_test_split
X_train, X_validation, y_train, y_validation = train_test_split(X, y, test_size = 0.2,random_state = 1212)
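For intuition about what an 80/20 split does, here is a rough NumPy sketch that shuffles row indices and cuts them in two. It is not how scikit-learn implements train_test_split, and even with the same seed value the shuffle will differ; it only illustrates where the 33,600 and 8,400 row counts used later come from.

```python
import numpy as np

n_samples = 42000
validation_percent = 20  # mirrors test_size = 0.2

# Seed value borrowed from the post; NumPy's shuffle will not match sklearn's.
rng = np.random.default_rng(1212)
shuffled = rng.permutation(n_samples)

# Integer arithmetic avoids floating-point rounding in the split size.
n_validation = n_samples * validation_percent // 100
validation_idx = shuffled[:n_validation]
train_idx = shuffled[n_validation:]
print(len(train_idx), len(validation_idx))
```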
3.4 | Dimension of training and validation data
print('X_train:', X_train.shape)
print('y_train:', y_train.shape)
print('X_validation:', X_validation.shape)
print('y_validation:', y_validation.shape)
3.5 | Converting training, testing, and validation data into an array
x_train_re = X_train.to_numpy().reshape(33600, 28, 28)
y_train_re = y_train.values
x_validation_re = X_validation.to_numpy().reshape(8400, 28, 28)
y_validation_re = y_validation.values
x_test_re = test.to_numpy().reshape(28000, 28, 28)
3.6 | Dimension of training, testing, and validation data after reshaping
print('x_train:', x_train_re.shape)
print('y_train:', y_train_re.shape)
print('x_validation:', x_validation_re.shape)
print('y_validation:', y_validation_re.shape)
print('x_test:', x_test_re.shape)

# Save image parameters to constants that we will use later for data reshaping and for model training.
(_, IMAGE_WIDTH, IMAGE_HEIGHT) = x_train_re.shape
IMAGE_CHANNELS = 1
print('IMAGE_WIDTH:', IMAGE_WIDTH)
print('IMAGE_HEIGHT:', IMAGE_HEIGHT)
print('IMAGE_CHANNELS:', IMAGE_CHANNELS)
4 | Explore the data
Here is what each image in the dataset looks like. It is a 28x28 matrix of integers (from 0 to 255), and each integer represents the grayscale intensity of one pixel.
pd.DataFrame(x_train_re[0])
4.1 | Visualise how the digits were written
plt.imshow(x_train_re[0], cmap=plt.cm.binary)
plt.show()

# Let's print some more training examples to get a feeling for how the digits were written.
numbers_to_display = 25
num_cells = math.ceil(math.sqrt(numbers_to_display))
plt.figure(figsize=(10,10))
for i in range(numbers_to_display):
    plt.subplot(num_cells, num_cells, i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(x_train_re[i], cmap=plt.cm.binary)
    plt.xlabel(y_train_re[i])
plt.show()
4.2 | Reshaping train, test, and validation data
To use convolution layers, we need to reshape our data and add a color channel to it. As you may have noticed, every digit currently has a shape of (28, 28), which means that it is a 28x28 matrix of grayscale values from 0 to 255. We need to reshape it to (28, 28, 1) because convolution layers expect an explicit channel dimension: a grayscale image has one channel, whereas a color image would have three (red, green, and blue).
x_train_with_chanels = x_train_re.reshape(
    x_train_re.shape[0],
    IMAGE_WIDTH,
    IMAGE_HEIGHT,
    IMAGE_CHANNELS
)

x_validation_with_chanels = x_validation_re.reshape(
    x_validation_re.shape[0],
    IMAGE_WIDTH,
    IMAGE_HEIGHT,
    IMAGE_CHANNELS
)

x_test_with_chanels = x_test_re.reshape(
    x_test_re.shape[0],
    IMAGE_WIDTH,
    IMAGE_HEIGHT,
    IMAGE_CHANNELS
)

print('x_train_with_chanels:', x_train_with_chanels.shape)
print('x_validation_with_chanels:', x_validation_with_chanels.shape)
print('x_test_with_chanels:', x_test_with_chanels.shape)
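Adding the channel axis does not change any pixel values; it only appends a dimension of size 1. The sketch below, on a made-up batch of blank images, shows that the explicit reshape used above gives the same result as two NumPy shortcuts that avoid spelling out the sizes.

```python
import numpy as np

# Made-up batch of 5 blank grayscale images standing in for x_train_re.
batch = np.zeros((5, 28, 28), dtype=np.uint8)

# Explicit reshape, as in the post.
with_channel_reshape = batch.reshape(batch.shape[0], 28, 28, 1)

# Equivalent alternatives that do not hard-code the image size.
with_channel_newaxis = batch[..., np.newaxis]
with_channel_expand = np.expand_dims(batch, axis=-1)

print(with_channel_reshape.shape)
```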
4.3 | Normalize train, test, and validation data
x_train_normalized = x_train_with_chanels / 255
x_validation_normalized = x_validation_with_chanels / 255
x_test_normalized = x_test_with_chanels / 255
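Dividing by 255 maps the integer pixel range 0 to 255 onto floats between 0 and 1. A small sketch on made-up pixel values shows the effect; the float32 variant is an optional suggestion for saving memory, not something the post does.

```python
import numpy as np

# Three made-up pixel values covering the full 8-bit range.
pixels = np.array([[0, 128, 255]], dtype=np.uint8)

# True division promotes the integers to float64.
normalized = pixels / 255

# Casting first keeps the result in float32, which halves the memory footprint.
normalized32 = pixels.astype(np.float32) / 255

print(normalized.min(), normalized.max(), normalized32.dtype)
```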
5 | Build the CNN model to Classify Handwritten Digits
A Convolutional Neural Network model generally consists of convolutional and pooling layers.
We are using a Keras Sequential model with two pairs of Convolution2D and MaxPooling2D layers. Each MaxPooling2D layer downsamples its input by taking the maximum value in each region instead of the average.
After that, a Flatten layer converts the multidimensional feature maps into a vector, which passes through a 128-unit Dense layer followed by Dropout with a rate of 0.2.
The last layer is a Dense layer with 10 softmax outputs. The outputs represent the network’s guess: the 0th output is the probability that the input digit is 0, the 1st output is the probability that the input digit is 1, and so on.
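The downsampling performed by max pooling can be sketched in plain NumPy on a tiny made-up 4x4 feature map. This is an illustration of the idea with a 2x2 window and stride 2, not the Keras implementation, and the average is shown only for comparison.

```python
import numpy as np

# A made-up 4x4 feature map.
feature_map = np.array([
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 9, 7],
    [1, 1, 8, 5],
])

# Split the map into non-overlapping 2x2 blocks. The axes are
# (block_row, row_in_block, block_col, col_in_block).
blocks = feature_map.reshape(2, 2, 2, 2)

# Reduce over the two within-block axes.
max_pooled = blocks.max(axis=(1, 3))
avg_pooled = blocks.mean(axis=(1, 3))

print(max_pooled)
print(avg_pooled)
```

Max pooling keeps the strongest response in each window, while averaging would dilute it with the weaker neighbours.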
model = tf.keras.models.Sequential()

# First convolution + pooling pair
model.add(tf.keras.layers.Convolution2D(
    input_shape=(IMAGE_WIDTH, IMAGE_HEIGHT, IMAGE_CHANNELS),
    kernel_size=5,
    filters=8,
    strides=1,
    activation=tf.keras.activations.relu,
    kernel_initializer=tf.keras.initializers.VarianceScaling()
))

model.add(tf.keras.layers.MaxPooling2D(
    pool_size=(2, 2),
    strides=(2, 2)
))

# Second convolution + pooling pair
model.add(tf.keras.layers.Convolution2D(
    kernel_size=5,
    filters=16,
    strides=1,
    activation=tf.keras.activations.relu,
    kernel_initializer=tf.keras.initializers.VarianceScaling()
))

model.add(tf.keras.layers.MaxPooling2D(
    pool_size=(2, 2),
    strides=(2, 2)
))

# Flatten the feature maps and classify
model.add(tf.keras.layers.Flatten())

model.add(tf.keras.layers.Dense(
    units=128,
    activation=tf.keras.activations.relu
))

model.add(tf.keras.layers.Dropout(0.2))

model.add(tf.keras.layers.Dense(
    units=10,
    activation=tf.keras.activations.softmax,
    kernel_initializer=tf.keras.initializers.VarianceScaling()
))
5.1 | Summary of the training model
Here is our model summary so far.
model.summary()
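The shapes and parameter counts in the summary can be cross-checked by hand. The arithmetic below is a sketch that assumes the Keras default of no padding (padding='valid') for both the convolution and pooling layers, which is what the model definition above relies on since it does not set padding.

```python
def conv_out(size, kernel, stride=1):
    """Output width/height of a convolution or pooling window without padding."""
    return (size - kernel) // stride + 1

size = 28
size = conv_out(size, kernel=5)            # Conv2D, 8 filters: 28 -> 24
size = conv_out(size, kernel=2, stride=2)  # MaxPooling2D:      24 -> 12
size = conv_out(size, kernel=5)            # Conv2D, 16 filters: 12 -> 8
size = conv_out(size, kernel=2, stride=2)  # MaxPooling2D:       8 -> 4
flattened = size * size * 16               # Flatten: 4 * 4 * 16 values

# Trainable parameters: weights plus one bias per filter or unit.
params = {
    "conv_1": 5 * 5 * 1 * 8 + 8,
    "conv_2": 5 * 5 * 8 * 16 + 16,
    "dense_128": flattened * 128 + 128,
    "dense_10": 128 * 10 + 10,
}
total_params = sum(params.values())
print(flattened, total_params)
```

The pooling, Flatten, and Dropout layers contribute no trainable parameters, so only the four entries above add to the total.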
5.2 | Visualization of the model
Plotting the model requires Graphviz (and the pydot package) to be installed. The plot complements the model summary by showing each layer defined in the model together with its input and output shapes.
tf.keras.utils.plot_model(
model,
show_shapes=True,
show_layer_names=True,
)
5.3 | Compile the model using keras.optimizers.Adam
adam_optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)

model.compile(
    optimizer=adam_optimizer,
    loss=tf.keras.losses.sparse_categorical_crossentropy,
    metrics=['accuracy']
)
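The "sparse" in sparse categorical cross-entropy means the labels are plain integers (0 to 9) rather than one-hot vectors, which matches the label column used in this post. As a rough sketch of what the loss measures, here it is computed by hand in NumPy on made-up probabilities for three classes; this is an illustration, not the Keras implementation.

```python
import numpy as np

# Made-up softmax outputs for two images over three classes (the real model has ten).
probabilities = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.1, 0.8],
])
labels = np.array([0, 2])  # integer class ids, not one-hot vectors

# Pick the probability the model assigned to the true class of each sample.
true_class_prob = probabilities[np.arange(len(labels)), labels]

# The loss is the mean negative log of those probabilities.
loss = float(np.mean(-np.log(true_class_prob)))
print(round(loss, 4))
```

The more probability the model puts on the correct class, the closer the loss gets to zero.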
5.4 | Train the model
log_dir=".logs/fit/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1)

training_history = model.fit(
    x_train_normalized,
    y_train_re,
    epochs=10,
    validation_data=(x_validation_normalized, y_validation_re),
    callbacks=[tensorboard_callback]
)

print("The model has successfully trained")
6 | Model evaluation
6.1 | Loss plot curve for training and validation
Let’s see how the loss changed during training. We expect it to decrease with each epoch.
plt.xlabel('Epoch Number')
plt.ylabel('Loss')
plt.plot(training_history.history['loss'], label='training set')
plt.plot(training_history.history['val_loss'], label='validation set')
plt.legend()
6.2 | Accuracy plot curve for training and validation
plt.xlabel('Epoch Number')
plt.ylabel('Accuracy')
plt.plot(training_history.history['accuracy'], label='training set')
plt.plot(training_history.history['val_accuracy'], label='validation set')
plt.legend()
6.3 | Evaluation of the model accuracy
We need to compare the accuracy of our model on the training set and on the validation set. We expect the model to perform similarly on both. If performance on the validation set is much worse than on the training set, that indicates the model is overfitting, i.e. we have a “high variance” issue.
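The same comparison can be made per epoch from the training history. The sketch below uses a hypothetical dictionary shaped like training_history.history; the numbers are invented purely to show the pattern and are not results from this model.

```python
# Hypothetical history values, shaped like training_history.history; not real results.
history = {
    "loss": [0.30, 0.10, 0.07, 0.05, 0.04],
    "val_loss": [0.12, 0.08, 0.06, 0.07, 0.08],
}

# Epoch (0-based) at which the validation loss was lowest.
val_loss = history["val_loss"]
best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__)

# A growing gap between validation and training loss hints at overfitting.
gap = history["val_loss"][-1] - history["loss"][-1]
print(best_epoch + 1, round(gap, 2))
```

In this invented example the validation loss bottoms out at the third epoch and then rises while the training loss keeps falling, which is the typical signature of overfitting.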
6.3.1 | Performance of training dataset
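# verbose=0 silences the progress bar; it replaces the notebook-only %%capture magic, which would also hide the prints below.
train_loss, train_accuracy = model.evaluate(x_train_normalized, y_train_re, verbose=0)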
print('Train loss: ', train_loss)
print('Train accuracy: ', train_accuracy)
6.3.2 | Performance of validation dataset
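# verbose=0 silences the progress bar; it replaces the notebook-only %%capture magic, which would also hide the prints below.
validation_loss, validation_accuracy = model.evaluate(x_validation_normalized, y_validation_re, verbose=0)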
print('Validation loss: ', validation_loss)
print('Validation accuracy: ', validation_accuracy)
With a validation score of close to 99%, we proceed to use this model to predict for the test set.
6.3.3 | Save and load the model
We will save the entire model to an HDF5 file. The .h5 extension indicates that the model should be saved in the Keras HDF5 format.
model_name = 'digits_recognition_cnn.h5'
model.save(model_name, save_format='h5')
loaded_model = tf.keras.models.load_model(model_name)
6.3.4 | Visualise validation predicted data on how the digits were written
To use the model that we’ve just trained for digit recognition, we need to call its predict() method.
predictions_one_hot = loaded_model.predict([x_validation_normalized])

# Let's extract the predictions with the highest probabilities to see which digits were actually recognized.
predictions = np.argmax(predictions_one_hot, axis=1)
pd.DataFrame(predictions)

# Show the first predicted image
plt.imshow(x_validation_normalized[0].reshape((IMAGE_WIDTH, IMAGE_HEIGHT)), cmap=plt.cm.binary)
plt.show()
We see that our model made a correct prediction and successfully recognized the digit 1. Let’s plot some more validation examples with their corresponding predictions to see how the model performs and where it makes mistakes.
numbers_to_display = 196
num_cells = math.ceil(math.sqrt(numbers_to_display))
plt.figure(figsize=(15, 15))

for plot_index in range(numbers_to_display):
    predicted_label = predictions[plot_index]
    # Create the subplot first so the tick and grid settings apply to it.
    plt.subplot(num_cells, num_cells, plot_index + 1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    # Green for a correct prediction, red for a wrong one.
    color_map = 'Greens' if predicted_label == y_validation_re[plot_index] else 'Reds'
    plt.imshow(x_validation_normalized[plot_index].reshape((IMAGE_WIDTH, IMAGE_HEIGHT)), cmap=color_map)
    plt.xlabel(predicted_label)

plt.subplots_adjust(hspace=1, wspace=0.5)
plt.show()
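Rather than scanning the grid for red tiles, the misclassified examples can also be located directly. This is a small sketch on made-up arrays that stand in for y_validation_re and predictions.

```python
import numpy as np

# Made-up labels and predictions standing in for y_validation_re and predictions.
true_labels = np.array([1, 7, 3, 9, 4, 4])
predicted = np.array([1, 7, 8, 9, 9, 4])

# Positions where the prediction disagrees with the label.
wrong_idx = np.flatnonzero(predicted != true_labels)

# Fraction of predictions that match the label.
accuracy = float(np.mean(predicted == true_labels))

print(wrong_idx, round(accuracy, 3))
```

The resulting indices could then be used to plot only the mistakes, which is often more informative than a grid dominated by correct predictions.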
6.3.5 | Confusion matrix of a validation dataset
confusion_matrix = tf.math.confusion_matrix(y_validation_re, predictions)
f, ax = plt.subplots(figsize=(9, 7))
sn.heatmap(
confusion_matrix,
annot=True,
linewidths=.5,
fmt="d",
square=True,
ax=ax
)
plt.show()
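To read the heatmap, it helps to know how the matrix is built: rows are the true labels and columns are the predictions, so the diagonal holds the correct classifications. The NumPy sketch below reconstructs that counting on made-up labels for three classes; it is an illustration and not the TensorFlow implementation.

```python
import numpy as np

# Made-up labels and predictions for three classes.
true_labels = np.array([0, 1, 2, 2, 1, 0, 2])
predicted = np.array([0, 2, 2, 2, 1, 0, 1])
num_classes = 3

# Count each (true label, prediction) pair: rows are true labels, columns are predictions.
matrix = np.zeros((num_classes, num_classes), dtype=int)
np.add.at(matrix, (true_labels, predicted), 1)

# Per-class recall: correct predictions divided by the number of true examples.
recall_per_class = matrix.diagonal() / matrix.sum(axis=1)

print(matrix)
print(recall_per_class)
```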
7 | Submission
# Predict class probabilities for the test set and keep the most likely digit per image.
test_pred = pd.DataFrame(loaded_model.predict([x_test_normalized]))
test_pred = pd.DataFrame(test_pred.idxmax(axis=1))
test_pred.index.name = 'ImageId'
test_pred = test_pred.rename(columns={0: 'Label'}).reset_index()
# ImageId starts at 1 rather than 0.
test_pred['ImageId'] = test_pred['ImageId'] + 1
test_pred.head()
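The table built above has the two columns a submission needs, ImageId and Label, but it still has to be written to a CSV file. The sketch below shows that layout using only the standard library and made-up labels; it writes to an in-memory buffer so it can run anywhere.

```python
import csv
import io

# Made-up predicted labels standing in for the model's test-set output.
predicted_labels = [2, 0, 9, 9, 3]

# In-memory buffer; swap for open("submission.csv", "w", newline="") to write a file.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["ImageId", "Label"])

# Image ids start at 1, matching the +1 shift applied above.
for image_id, label in enumerate(predicted_labels, start=1):
    writer.writerow([image_id, label])

submission_text = buffer.getvalue()
print(submission_text)
```

With the pandas DataFrame from this post, calling test_pred.to_csv('submission.csv', index=False) should produce the same layout; the file name here is an assumption.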
Many thanks for reading my post!🙏