
Deep Learning - Digit recognizer for MNIST - part 1


In this blog, I want to test several fully connected neural networks to see how each performs on this dataset:

https://www.kaggle.com/c/digit-recognizer


this is one of the first and most famous classical datasets in the computer vision field. we have images like this:

and we want to identify each image with the fewest possible errors. we have 42,000 labeled images for training
and 28,000 unlabeled images for the Kaggle evaluation.


In this blog I'll use Keras to build a few deep neural network models and evaluate them. each model will differ in either the number of layers or the number of nodes per layer, to get a sense of what increases the accuracy of a neural network on this kind of problem, and whether more layers, more units, and larger models mean better performance.

I'll assume you are familiar with Python and the basics of neural networks.
as for Keras, if you are not familiar with it, I recommend googling things along the way, as it's not very complicated.


the link for the notebook is here:
https://github.com/blabadi/ml_mnist_numbers/blob/blog-branch-part1/Digit%20Recognizer.ipynb

1- imports and libs


regular libraries to build the Keras model, simple layers for a fully connected NN, and some utils to show images.
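as a sketch, the imports for a notebook like this usually look something like the following (the exact set in my notebook may differ a bit; I'm assuming the `tensorflow.keras` packaging here):

```python
# numeric / data handling
import numpy as np
import pandas as pd

# plotting utils to show the digit images
import matplotlib.pyplot as plt

# Keras pieces for a fully connected model
from tensorflow.keras.models import Sequential, load_model
from tensorflow.keras.layers import Input, Dense, Dropout
from tensorflow.keras.utils import to_categorical
```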

2- define some parameters

we have 10 digits (0-9) to classify an image as
our images are of size 28x28 pixels
the directory is where the models get saved

the batch size tells the model to train on 32 images per iteration.
A training iteration has two steps, a forward and a backward pass over an image batch:
1- forward: the model starts with random weights and predicts the images' labels based on these weights
2- backward: we then calculate the error, and our optimization algorithm updates and adjusts the parameters to minimize that error (using partial derivatives), with a learning rate parameter that decides how big the change is.

an epoch is one full pass over the whole training set. we run several epochs, keeping the parameters learned in previous epochs, so we can get the error as low as possible for the current hyper-parameter setup.
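a quick sketch of these parameters (the epoch count and directory name are my guesses here, not necessarily the notebook's exact values):

```python
NUM_CLASSES = 10                   # digits 0-9
IMG_ROWS, IMG_COLS = 28, 28        # image dimensions
INPUT_SIZE = IMG_ROWS * IMG_COLS   # 784 pixels per flattened image

BATCH_SIZE = 32                    # images per training iteration
EPOCHS = 10                        # full passes over the training set (assumed value)
MODELS_DIR = 'models/'             # hypothetical directory to save trained models in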

3-  Loading the data set



we load our labeled data (X) and split it into two sets:
1- training: used by the model optimizer
2- test (validation): used by us to check the accuracy of the model on data it hasn't seen before, but which we have labels for, so we can compare against a ground truth.

my split was 32,000 train, 10,000 validation (roughly 76% train to 24% validation)

we also split out the labels (Y) and converted them to one-hot encoded vectors of 10 classes.
so if you look at the last cell output, you'll find the number 3 is represented by a vector with a 1 in position 3 of that vector; this lets the model output a probability for each class.
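a minimal sketch of this step, assuming the Kaggle `train.csv` layout (a `label` column plus 784 pixel columns); the helper name and the pixel scaling to [0, 1] are my additions:

```python
import numpy as np
import pandas as pd

def load_and_split(df, num_train=32000, num_classes=10):
    # one-hot encode the labels: digit 3 -> [0,0,0,1,0,0,0,0,0,0]
    y = np.eye(num_classes, dtype='float32')[df['label'].values]
    # the remaining columns are the 784 pixel values; scale them to [0, 1]
    x = df.drop(columns=['label']).values.astype('float32') / 255.0
    return (x[:num_train], y[:num_train]), (x[num_train:], y[num_train:])

# usage:
# df = pd.read_csv('train.csv')
# (x_train, y_train), (x_val, y_val) = load_and_split(df)
```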

4-Define the models 


I created different models but here is the biggest one:


it has an input layer of size 28 x 28 = 784 inputs, each representing a single pixel in the image. since it's a grayscale image, each pixel has a single value (0-255) in one channel, no RGB.

it has the following layers:

- Dropout is a regularization layer used to avoid over-fitting (memorizing the dataset instead of learning the pattern). it randomly drops some nodes during training to prevent the neural network from relying on (memorizing) any particular ones.

- Dense is a layer of nodes fully connected to the previous and next layers. each node has an activation function (a non-linear function applied to the previous layer's output), and each node learns something about the data (a feature, for example the curves in a digit, straight lines, etc).
the deeper the node, the more complex the features it learns. that's why neural networks work: through composition, they build connections that can learn and map complex functions, with the ability to generalize to new data.
example:



image taken from this paper: http://www.cs.cmu.edu/~aharley/vis/harley_vis_isvc15.pdf

the last layer has 10 nodes (each node outputs the probability of one digit)
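here is a sketch of a model in this spirit; the specific layer widths and dropout rate below are illustrative guesses, not the exact ones from my notebook:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Dense, Dropout

def build_model():
    model = Sequential([
        Input(shape=(784,)),              # 28 x 28 grayscale pixels, flattened
        Dense(512, activation='relu'),
        Dropout(0.2),                     # regularization against over-fitting
        Dense(256, activation='relu'),
        Dropout(0.2),
        Dense(10, activation='softmax'),  # one probability per digit 0-9
    ])
    model.compile(optimizer='adam',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model
```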


5- Train the model

the commented code above will train the model to fit the training data; this is the most time-consuming step


each epoch, the optimizer iterates over all the images and prints the accuracy it got.
with this model we achieved a fairly good result quickly, but as you can see the gains slow down very fast.
in my case it took a few minutes to finish these epochs since I'm using a GPU.
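the training call itself is short; here is a sketch, assuming the splits and hyper-parameters from the earlier steps (the helper name and save path are hypothetical):

```python
def train_and_save(model, x_train, y_train, x_val, y_val,
                   path='models/big_dense.h5', batch_size=32, epochs=10):
    # one forward + backward pass per batch, repeated for `epochs`
    # full passes over the training set
    history = model.fit(x_train, y_train,
                        batch_size=batch_size,
                        epochs=epochs,
                        validation_data=(x_val, y_val))
    model.save(path)  # save weights + architecture to reuse later
    return history
```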

6- Loading And Evaluating the models



here you can see all the models I tried and trained. they are all saved to files so they can be reused later if needed.

here is the evaluation result on the validation set:

we let each model predict the digits and compare the predictions to the ground truth; Keras does that for us.

our model achieved 99.15% training accuracy, while here it got 97.74% validation accuracy, which is expected to be lower since these images are completely new to it. what we want to watch for here is whether our model is failing to generalize: if your model gets 99% in training but, say, 80% in test accuracy, that gap is big, and it can be an indication that your model is over-fitting the training set and not generalizing well to new data (that's why I added regularization to my biggest model: the more parameters it has, the more easily it can overfit).
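evaluation is basically a one-liner per model; a sketch, with hypothetical file names for the saved models:

```python
from tensorflow.keras.models import load_model

def evaluate_models(paths, x_val, y_val):
    # reload each saved model and report its accuracy on the validation set
    results = {}
    for path in paths:
        model = load_model(path)
        loss, acc = model.evaluate(x_val, y_val, verbose=0)
        results[path] = acc
    return results

# usage (file names are hypothetical):
# evaluate_models(['models/small.h5', 'models/big_dense.h5'], x_val, y_val)
```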


notes based on the results:


7-  Sample predictions from the non-labeled set


each image has index:prediction over it. you can see some not-so-clear images, like indexes 399, 366, and 445, but our model got them right.
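getting a digit out of the model is just an argmax over the 10 output probabilities; a sketch (the helper name is mine):

```python
import numpy as np

def predict_digits(model, x):
    # the model outputs an (n, 10) array of probabilities per image;
    # pick the most likely digit for each one
    probs = model.predict(x, verbose=0)
    return np.argmax(probs, axis=1)
```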


8- testing with my image


I created a digit image myself to see how the models would handle it.

only 3 models were able to predict the correct digit.
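for a hand-made image to be comparable to the training data, it needs the same shape and color convention; here is a sketch using Pillow (the inversion step assumes the digit was drawn dark-on-white, the opposite of MNIST's light-on-dark convention):

```python
import numpy as np
from PIL import Image

def preprocess_digit(path):
    img = Image.open(path).convert('L').resize((28, 28))  # grayscale, 28x28
    arr = np.asarray(img, dtype='float32') / 255.0        # scale to [0, 1]
    arr = 1.0 - arr  # invert: MNIST digits are light strokes on a dark background
    return arr.reshape(1, 784)  # one flattened sample, ready for model.predict
```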

the model achieved 97.7% on the Kaggle test set after submission. for a simple fully connected neural network, that seems good to me!

In the next parts I'll try a more complex network using convolutions and a residual network architecture to see how much further we can reduce the error, knowing that people have already achieved 99.7% on this, if not higher.

