Handwritten Digit Recognition using Convolutional Neural Network (CNN) with Tensorflow

Learner CARES
9 min read · Jul 25, 2022

A Deep Learning Analysis with Real World Data

Credit : Photo by author

Table of Contents

1 | Abstract

2 | Import the dependencies and load the dataset

3 | Data overview

  • Dimension of train and test data
  • Visualizing the data using TSNE
  • Splitting data into training and validation dataset
  • Dimension of training and validation data
  • Converting training, testing, and validation data into an array
  • Dimension of training, testing, and validation data after reshaping

4 | Explore the data

  • Visualize how the digits were written
  • Reshaping train, test, and validation data
  • Normalize train, test, and validation data

5 | Build the CNN model to Classify Handwritten Digits

  • Summary of the training model
  • Visualization of the model
  • Compile the model using keras.optimizers.Adam
  • Train the model

6 | Model evaluation

  • Loss plot curve for training and validation dataset
  • Accuracy plot curve for training and validation dataset
  • Evaluation of the model accuracy

— Performance of training dataset

— Performance of validation dataset

— Save and load the model

— Visualise validation predicted data on how the digits were written

— Confusion matrix of validation dataset

7 | Submission

So, let’s get started 🧑👈🙏💪

1 | Abstract

Back to Table of Contents

Introduction

Handwritten digit recognition (HDR) is the process of converting images of handwritten digits into digital form. It is a difficult task for a machine because handwritten digits are imperfect and vary widely in style. To address this, we built an HDR model that takes the image of a digit and identifies which digit it shows.

In this project, we developed a Convolutional Neural Network (CNN) model using the TensorFlow framework to recognize handwritten digits.

A convolutional neural network (CNN, or ConvNet) is a deep learning algorithm that takes an input image, assigns learnable weights and biases to various objects in the image, and learns to distinguish one from another.

CNNs are mainly used to analyze visual imagery. Applications include object detection, face recognition, robotics, video analysis, segmentation, pattern recognition, natural language processing, spam detection, topic categorization, regression analysis, speech recognition, and image classification.

Approach

We used a Keras Sequential model with two pairs of Convolution2D and MaxPooling2D layers. The MaxPooling layer downsamples its input by keeping the maximum value in each region instead of averaging. After that, a Flatten layer converts the multidimensional feature maps into a vector.
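To make the downsampling step concrete, here is a minimal NumPy sketch of 2×2 max pooling with stride 2, shown next to average pooling. The helper name `pool2d` and the toy 4×4 input are illustrative assumptions and are not part of the model code; Keras' MaxPooling2D applies the same idea to every channel.

```python
import numpy as np

def pool2d(image, size=2, reduce=np.max):
    """Non-overlapping pooling over size x size windows (stride == size)."""
    h, w = image.shape
    # Trim edges that do not fill a full window, then group pixels into windows.
    trimmed = image[:h - h % size, :w - w % size]
    windows = trimmed.reshape(h // size, size, w // size, size)
    return reduce(windows, axis=(1, 3))

# Toy 4x4 "image" (hypothetical values, chosen for readability).
toy = np.array([
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 0, 5, 6],
    [1, 2, 7, 8],
], dtype=float)

max_pooled = pool2d(toy, reduce=np.max)    # keeps the largest value per window
avg_pooled = pool2d(toy, reduce=np.mean)   # averages each window instead
print(max_pooled)
print(avg_pooled)
```

Each 2×2 window collapses to a single number, so the 4×4 input becomes 2×2. Max pooling keeps the strongest response in each window, while average pooling smooths it.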

The last layer is a Dense layer with 10 softmax outputs. The output represents the network's guess: the 0th output is the probability that the input digit is 0, the 1st output is the probability that the input digit is 1, and so on.
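As a rough illustration of how 10 softmax outputs become a single guess, the sketch below applies a softmax to made-up raw scores and picks the most probable digit. The scores and the `softmax` helper are assumptions for illustration; in the real model the Dense layer computes this internally.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

# Hypothetical raw scores for the 10 digit classes (index i <-> digit i).
logits = np.array([0.1, 0.3, 4.0, 0.2, 0.0, 0.5, 0.1, 1.2, 0.3, 0.2])
probabilities = softmax(logits)
predicted_digit = int(np.argmax(probabilities))

print(probabilities.round(3))
print("sum:", probabilities.sum(), "predicted digit:", predicted_digit)
```

The probabilities sum to 1, and the index of the largest one is the predicted digit.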

Result

The CNN performed well, reaching a validation accuracy of 98.9% and a validation loss of 0.045.

Conclusion

A convolutional neural network (CNN, or ConvNet) can recognize handwritten digits reliably. We successfully built a handwritten digit recognizer with Python, TensorFlow, and machine learning libraries, and it recognized handwritten digits with about 98.9% validation accuracy.

2 | Import the dependencies and load the dataset

We will import all of the modules that we will require to train our model.

Data Source: Kaggle

import tensorflow as tf
import matplotlib.pyplot as plt
import seaborn as sn
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import math
import datetime
import platform
# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))
# Load dataset
train = pd.read_csv('/kaggle/input/digit-recognizer/train.csv')
test = pd.read_csv('/kaggle/input/digit-recognizer/test.csv')

3 | Data Overview

The MNIST dataset used here contains 42,000 labeled training images of handwritten digits from zero to nine (10 classes) and 28,000 unlabeled test images (for submission). Each image is grayscale and is stored as 784 pixel values, which represent a 28×28 matrix.

3.1 | Dimension of train and test data

print('train:', train.shape)
print('test:', test.shape)
X = train.iloc[:, 1:785]
y = train.iloc[:, 0]
X_test = test.iloc[:, 0:784]

3.2 | Visualizing the data using TSNE

TSNE stands for t-distributed Stochastic Neighbor Embedding. It is a dimensionality reduction algorithm designed to preserve local structure in a high-dimensional data set while caring less about global structure. Here, we use it to go from the 784 pixel dimensions of the images down to two, which makes plotting easier. The color scale is the original MNIST label, and the separation between labels is clearly visible.

# WARNING: running t-SNE on the full data set takes a while.
X_tsn = X/255
from sklearn.manifold import TSNE
tsne = TSNE()
tsne_res = tsne.fit_transform(X_tsn)
plt.figure(figsize=(14, 12))
plt.scatter(tsne_res[:,0], tsne_res[:,1], c=y, s=2)
plt.xticks([])
plt.yticks([])
plt.colorbar()

3.3 | Splitting data into training and validation dataset

We are dividing our dataset (X) into two parts.

  1. The training dataset (80%) is used to fit our models
  2. The Validation dataset (20%) is used to evaluate our models

The train_test_split() function returns the training and validation data along with their labels.

from sklearn.model_selection import train_test_split
X_train, X_validation, y_train, y_validation = train_test_split(X, y, test_size = 0.2,random_state = 1212)
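To show what an 80/20 shuffled split does to the 42,000 rows, here is a NumPy-only sketch. It is not how scikit-learn implements `train_test_split`, and the same seed will not reproduce the same rows. The `shuffled_split` helper is a hypothetical name used only here.

```python
import numpy as np

def shuffled_split(n_rows, validation_fraction=0.2, seed=1212):
    """Shuffle row indices and cut them into train and validation parts."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_rows)
    n_validation = int(round(n_rows * validation_fraction))
    return indices[n_validation:], indices[:n_validation]

train_idx, validation_idx = shuffled_split(42000)
print(len(train_idx), len(validation_idx))

# No row ends up in both parts.
overlap = np.intersect1d(train_idx, validation_idx)
print("overlapping rows:", overlap.size)
```

The split yields 33,600 training rows and 8,400 validation rows, which are the sizes used in the reshape calls later on.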

3.4 | Dimension of training and validation data

print('X_train:', X_train.shape)
print('y_train:', y_train.shape)
print('X_validation:', X_validation.shape)
print('y_validation:', y_validation.shape)
Credit : Photo by author

3.5 | Converting training, testing, and validation data into an array

x_train_re = X_train.to_numpy().reshape(33600, 28, 28)
y_train_re = y_train.values
x_validation_re = X_validation.to_numpy().reshape(8400, 28, 28)
y_validation_re = y_validation.values
x_test_re = test.to_numpy().reshape(28000, 28, 28)

3.6 | Dimension of training, testing, and validation data after reshaping

print('x_train:', x_train_re.shape)
print('y_train:', y_train_re.shape)
print('x_validation:', x_validation_re.shape)
print('y_validation:', y_validation_re.shape)
print('x_test:', x_test_re.shape)
# Save image parameters to the constants that we will use later for data re-shaping and for model training.
(_, IMAGE_WIDTH, IMAGE_HEIGHT) = x_train_re.shape
IMAGE_CHANNELS = 1
print('IMAGE_WIDTH:', IMAGE_WIDTH)
print('IMAGE_HEIGHT:', IMAGE_HEIGHT)
print('IMAGE_CHANNELS:', IMAGE_CHANNELS)
Credit : Photo by author

4 | Explore the data

Here is what each image in the dataset looks like. It is a 28x28 matrix of integers (from 0 to 255), and each integer represents the color of one pixel.

pd.DataFrame(x_train_re[0])

4.1 | Visualise how the digits were written

plt.imshow(x_train_re[0], cmap=plt.cm.binary)
plt.show()
# Let's print some more training examples to get a feeling for how the digits were written.
numbers_to_display = 25
num_cells = math.ceil(math.sqrt(numbers_to_display))
plt.figure(figsize=(10,10))
for i in range(numbers_to_display):
    plt.subplot(num_cells, num_cells, i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(x_train_re[i], cmap=plt.cm.binary)
    plt.xlabel(y_train_re[i])
plt.show()
Credit: Photo by author

4.2 | Reshaping train, test, and validation data

To use convolution layers, we need to reshape our data and add a color channel to it. Currently every digit has a shape of (28, 28), meaning it is a 28x28 matrix of color values from 0 to 255. We need to reshape it to (28, 28, 1) so that each pixel has an explicit channel dimension (a color image would have three channels: red, green, and blue).

x_train_with_chanels = x_train_re.reshape(
    x_train_re.shape[0],
    IMAGE_WIDTH,
    IMAGE_HEIGHT,
    IMAGE_CHANNELS
)
x_validation_with_chanels = x_validation_re.reshape(
    x_validation_re.shape[0],
    IMAGE_WIDTH,
    IMAGE_HEIGHT,
    IMAGE_CHANNELS
)
x_test_with_chanels = x_test_re.reshape(
    x_test_re.shape[0],
    IMAGE_WIDTH,
    IMAGE_HEIGHT,
    IMAGE_CHANNELS
)
print('x_train_with_chanels:', x_train_with_chanels.shape)
print('x_validation_with_chanels:', x_validation_with_chanels.shape)
print('x_test_with_chanels:', x_test_with_chanels.shape)
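The reshape above only appends a trailing axis of length 1, and the pixel values themselves are unchanged. The sketch below uses a small synthetic batch (an assumption standing in for x_train_re) to show three equivalent ways of adding that channel axis in NumPy.

```python
import numpy as np

# Synthetic batch standing in for x_train_re: 5 grayscale images of 28x28.
batch = np.arange(5 * 28 * 28, dtype=np.int64).reshape(5, 28, 28)

with_channel_reshape = batch.reshape(batch.shape[0], 28, 28, 1)
with_channel_expand = np.expand_dims(batch, axis=-1)   # same result
with_channel_newaxis = batch[..., np.newaxis]           # same result

print(with_channel_reshape.shape)
```

All three arrays have shape (5, 28, 28, 1) and hold identical values, so the choice between them is a matter of readability.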

4.3 | Normalize train, test, and validation data

x_train_normalized = x_train_with_chanels / 255
x_validation_normalized = x_validation_with_chanels / 255
x_test_normalized = x_test_with_chanels / 255
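Dividing by 255 maps the integer pixel range 0 to 255 onto floating-point values between 0 and 1. The sketch below checks this on a few synthetic pixel values (an assumption, not the real data). It also shows an optional cast to float32, which halves memory use compared with NumPy's default float64.

```python
import numpy as np

# Synthetic pixel values covering the full 0..255 range.
pixels = np.array([[0, 64], [128, 255]], dtype=np.uint8)

normalized = pixels / 255                          # floating-point result
normalized_f32 = pixels.astype("float32") / 255    # explicit float32 variant

print(normalized)
print(normalized.min(), normalized.max(), normalized_f32.dtype)
```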

5 | Build the CNN model to Classify Handwritten Digits

A Convolutional Neural Network model generally consists of convolutional and pooling layers.

We are using a Keras Sequential model with two pairs of Convolution2D and MaxPooling2D layers. The MaxPooling layer downsamples its input by keeping the maximum value in each region instead of averaging.

After that, we use a Flatten layer to convert the multidimensional feature maps into a vector.

The last layer is a Dense layer with 10 softmax outputs. The output represents the network's guess: the 0th output is the probability that the input digit is 0, the 1st output is the probability that the input digit is 1, and so on.

model = tf.keras.models.Sequential()
# First convolution + pooling pair
model.add(tf.keras.layers.Convolution2D(
    input_shape=(IMAGE_WIDTH, IMAGE_HEIGHT, IMAGE_CHANNELS),
    kernel_size=5,
    filters=8,
    strides=1,
    activation=tf.keras.activations.relu,
    kernel_initializer=tf.keras.initializers.VarianceScaling()
))
model.add(tf.keras.layers.MaxPooling2D(
    pool_size=(2, 2),
    strides=(2, 2)
))
# Second convolution + pooling pair
model.add(tf.keras.layers.Convolution2D(
    kernel_size=5,
    filters=16,
    strides=1,
    activation=tf.keras.activations.relu,
    kernel_initializer=tf.keras.initializers.VarianceScaling()
))
model.add(tf.keras.layers.MaxPooling2D(
    pool_size=(2, 2),
    strides=(2, 2)
))
# Flatten the feature maps and classify with dense layers
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(
    units=128,
    activation=tf.keras.activations.relu
))
model.add(tf.keras.layers.Dropout(0.2))
model.add(tf.keras.layers.Dense(
    units=10,
    activation=tf.keras.activations.softmax,
    kernel_initializer=tf.keras.initializers.VarianceScaling()
))

5.1 | Summary of the training model

Here is our model summary so far.

model.summary()
Credit: Photo by author
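The shapes and parameter counts reported by the summary can be cross-checked by hand. The sketch below redoes the arithmetic for the layers defined above, assuming Keras' default 'valid' (no padding) behaviour for both convolution and pooling. The helper names are hypothetical.

```python
def conv_output_size(size, kernel, stride=1):
    """Output width/height of an unpadded convolution or pooling layer."""
    return (size - kernel) // stride + 1

def conv_params(kernel, in_channels, filters):
    """Kernel weights plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters

def dense_params(inputs, units):
    """Weight matrix plus one bias per unit."""
    return inputs * units + units

size = 28
size = conv_output_size(size, kernel=5)             # Conv2D, 8 filters  -> 24
size = conv_output_size(size, kernel=2, stride=2)   # MaxPooling2D       -> 12
size = conv_output_size(size, kernel=5)             # Conv2D, 16 filters -> 8
size = conv_output_size(size, kernel=2, stride=2)   # MaxPooling2D       -> 4
flattened = size * size * 16                         # Flatten            -> 256

total_params = (
    conv_params(5, 1, 8)              # 208
    + conv_params(5, 8, 16)           # 3216
    + dense_params(flattened, 128)    # 32896
    + dense_params(128, 10)           # 1290
)
print(flattened, total_params)
```

Pooling, Flatten, and Dropout layers have no trainable parameters, so only the two convolutions and the two Dense layers contribute to the total.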

5.2 | Visualization of the model

Plotting the model requires Graphviz (and the pydot package) to be installed. The plot shows the layers defined in the model together with their input and output shapes.

tf.keras.utils.plot_model(
    model,
    show_shapes=True,
    show_layer_names=True,
)
Credit: Photo by author

5.3 | Compile the model using keras.optimizers.Adam

adam_optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
# Labels are integer class ids (0-9), so we use the sparse variant of cross-entropy.
model.compile(
    optimizer=adam_optimizer,
    loss=tf.keras.losses.sparse_categorical_crossentropy,
    metrics=['accuracy']
)
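For intuition about the loss used in the compile step, sparse categorical cross-entropy is the average of -log(p) over the batch, where p is the probability the model assigned to the true class. The NumPy sketch below is a simplified stand-in rather than Keras' implementation. The example probabilities and the clipping constant are assumptions.

```python
import numpy as np

def sparse_categorical_crossentropy(labels, probabilities, eps=1e-7):
    """Mean of -log(p[true class]) over the batch; labels are integer class ids."""
    probabilities = np.clip(probabilities, eps, 1.0)
    picked = probabilities[np.arange(len(labels)), labels]
    return float(np.mean(-np.log(picked)))

# Two hypothetical predictions over 3 classes (kept small for readability).
probs = np.array([
    [0.7, 0.2, 0.1],   # true class 0: confident and correct -> small loss
    [0.1, 0.3, 0.6],   # true class 1: wrong guess           -> larger loss
])
labels = np.array([0, 1])

loss = sparse_categorical_crossentropy(labels, probs)
print(round(loss, 4))
```

Because the labels are plain integers, there is no need to one-hot encode y_train_re before training.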

5.4 | Train the model

log_dir = ".logs/fit/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1)
training_history = model.fit(
    x_train_normalized,
    y_train_re,
    epochs=10,
    validation_data=(x_validation_normalized, y_validation_re),
    callbacks=[tensorboard_callback]
)
print("The model has successfully trained")

6 | Model evaluation

6.1 | Loss plot curve for training and validation

Let’s see how the loss changed during training. We expect it to decrease with every epoch.

plt.xlabel('Epoch Number')
plt.ylabel('Loss')
plt.plot(training_history.history['loss'], label='training set')
plt.plot(training_history.history['val_loss'], label='validation set')
plt.legend()
Credit: Photo by author

6.2 | Accuracy plot curve for training and validation

plt.xlabel('Epoch Number')
plt.ylabel('Accuracy')
plt.plot(training_history.history['accuracy'], label='training set')
plt.plot(training_history.history['val_accuracy'], label='validation set')
plt.legend()
Credit: Photo by author

6.3 | Evaluation of the model accuracy

We need to compare the accuracy of our model on the training set and on the validation set. We expect the model to perform similarly on both. If performance on the validation set is much worse than on the training set, that indicates the model is overfitted and we have a “high variance” issue.

6.3.1 | Performance of training dataset

# verbose=0 hides the progress bar while keeping the printed results visible.
train_loss, train_accuracy = model.evaluate(x_train_normalized, y_train_re, verbose=0)
print('Train loss: ', train_loss)
print('Train accuracy: ', train_accuracy)

6.3.2 | Performance of validation dataset

# verbose=0 hides the progress bar while keeping the printed results visible.
validation_loss, validation_accuracy = model.evaluate(x_validation_normalized, y_validation_re, verbose=0)
print('Validation loss: ', validation_loss)
print('Validation accuracy: ', validation_accuracy)

With a validation accuracy close to 99%, we proceed to use this model to make predictions on the test set.

6.3.3 | Save and load the model

We will save the entire model to an HDF5 file. The .h5 extension indicates that the model should be saved in the Keras HDF5 format.

model_name = 'digits_recognition_cnn.h5'
model.save(model_name, save_format='h5')
loaded_model = tf.keras.models.load_model(model_name)

6.3.4 | Visualise validation predicted data on how the digits were written

To use the model that we’ve just trained for digit recognition, we need to call the predict() method.

# Each row holds 10 probabilities, one per digit.
predictions_one_hot = loaded_model.predict(x_validation_normalized)
# Let's extract the predictions with the highest probabilities to see which digits were actually recognized.
predictions = np.argmax(predictions_one_hot, axis=1)
pd.DataFrame(predictions)
# Show the predicted image
plt.imshow(x_validation_normalized[0].reshape((IMAGE_WIDTH, IMAGE_HEIGHT)), cmap=plt.cm.binary)
plt.show()

We see that our model made a correct prediction and successfully recognized the digit 1. Let’s plot more validation examples and the corresponding predictions to see how the model performs and where it makes mistakes.

numbers_to_display = 196
num_cells = math.ceil(math.sqrt(numbers_to_display))
plt.figure(figsize=(15, 15))
for plot_index in range(numbers_to_display):
    predicted_label = predictions[plot_index]
    # Select the subplot first so the tick and grid settings apply to it.
    plt.subplot(num_cells, num_cells, plot_index + 1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    # Green for a correct prediction, red for a mistake.
    color_map = 'Greens' if predicted_label == y_validation_re[plot_index] else 'Reds'
    plt.imshow(x_validation_normalized[plot_index].reshape((IMAGE_WIDTH, IMAGE_HEIGHT)), cmap=color_map)
    plt.xlabel(predicted_label)
plt.subplots_adjust(hspace=1, wspace=0.5)
plt.show()
Credit: Photo by author
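Besides scanning the grid for red cells, the mistakes can be listed directly by comparing predictions with labels. The sketch below uses short made-up arrays in place of `predictions` and `y_validation_re`. The variable names are assumptions for illustration.

```python
import numpy as np

# Hypothetical labels and predictions standing in for the validation arrays.
true_labels = np.array([1, 7, 3, 9, 4, 4])
predicted = np.array([1, 7, 5, 9, 9, 4])

# Positions where the prediction disagrees with the label.
wrong_idx = np.flatnonzero(predicted != true_labels)
accuracy = float(np.mean(predicted == true_labels))

for i in wrong_idx:
    print(f"index {i}: predicted {predicted[i]}, actual {true_labels[i]}")
print("accuracy:", round(accuracy, 3))
```

The indices in `wrong_idx` can then be used to plot only the misclassified images.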

6.3.5 | Confusion matrix of a validation dataset

confusion_matrix = tf.math.confusion_matrix(y_validation_re, predictions)
f, ax = plt.subplots(figsize=(9, 7))
sn.heatmap(
    confusion_matrix,
    annot=True,
    linewidths=.5,
    fmt="d",
    square=True,
    ax=ax
)
plt.show()
Credit: Photo by author
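To read the heatmap, it helps to know how the matrix is built: the row is the true digit, the column is the predicted digit, and the diagonal counts correct predictions. Below is a small NumPy sketch of the same counting on made-up 3-class data, plus per-class recall. The helper name and the toy arrays are assumptions.

```python
import numpy as np

def confusion_matrix_np(true_labels, predicted, num_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = np.zeros((num_classes, num_classes), dtype=int)
    # add.at accumulates correctly even when the same (row, col) pair repeats.
    np.add.at(matrix, (true_labels, predicted), 1)
    return matrix

true_labels = np.array([0, 0, 1, 1, 1, 2, 2, 2, 2])
predicted = np.array([0, 1, 1, 1, 1, 2, 2, 0, 2])

cm = confusion_matrix_np(true_labels, predicted, num_classes=3)
# Recall per class: correct predictions divided by the number of true samples.
per_class_recall = cm.diagonal() / cm.sum(axis=1)

print(cm)
print(per_class_recall)
```

Off-diagonal cells with large counts point to pairs of digits that the model tends to confuse.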

7 | Submission

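# Predict class probabilities for the test set: one row per image, one column per digit.
test_pred = pd.DataFrame(loaded_model.predict(x_test_normalized))
# The column index with the highest probability is the predicted digit.
test_pred = pd.DataFrame(test_pred.idxmax(axis=1))
test_pred.index.name = 'ImageId'
test_pred = test_pred.rename(columns={0: 'Label'}).reset_index()
# Image ids in the submission start at 1, not 0.
test_pred['ImageId'] = test_pred['ImageId'] + 1
test_pred.head()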
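The frame above still has to be written out as a CSV with the ImageId and Label columns. Here is a minimal sketch of that last step, using made-up probabilities for three images and an in-memory buffer instead of a real file. The variable names and the buffer are assumptions for illustration.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical softmax outputs for 3 test images (rows) over 10 digits (columns).
fake_probabilities = np.eye(10)[[2, 0, 9]]

submission = pd.DataFrame({
    "ImageId": np.arange(1, len(fake_probabilities) + 1),  # ids start at 1
    "Label": fake_probabilities.argmax(axis=1),
})

buffer = io.StringIO()                  # stand-in for a real file path
submission.to_csv(buffer, index=False)  # index=False keeps only the two columns
csv_text = buffer.getvalue()
print(csv_text)
```

With the real predictions, calling `to_csv` with a file name such as 'submission.csv' (an assumed name) instead of the buffer would produce the file to upload.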

Many thanks for reading my post!🙏

Learner CARES

Data Scientist, Kaggle Expert (https://www.kaggle.com/itsmohammadshahid/code?scroll=true). Focusing on only one thing — To help people learn📚 🌱🎯️🏆