
Deep Learning - Digit recognizer for MNIST - part 1


In this blog, I want to talk a little bit about testing different fully connected neural networks to see how each performs on this dataset:

https://www.kaggle.com/c/digit-recognizer


This is one of the most famous classical datasets in computer vision. Each sample is a 28x28 grayscale image of a handwritten digit, and we want to identify each image with the least amount of errors.
We have 42,000 labeled images for training
and 28,000 unlabeled images for the Kaggle evaluation.


In this blog I'll use a few deep neural networks, built with Keras, and evaluate them. Each model differs in the number of layers or nodes per layer, to get a sense of what increases the accuracy of a neural network on this kind of problem, and whether more layers and units (larger models) mean better performance.

I'll assume you are familiar with Python and the basics of neural networks.
As for Keras basics, if you are not familiar with it, I recommend googling things along the way, as it's not very complicated.


The link to the notebook is here:
https://github.com/blabadi/ml_mnist_numbers/blob/blog-branch-part1/Digit%20Recognizer.ipynb

1- Imports and libs


Regular libraries to build the Keras model, simple layers for a fully connected NN, and some utilities to display images.
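A typical set of imports for a fully connected Keras model might look like this (a sketch; the exact module paths depend on your Keras/TensorFlow version, and the notebook's actual imports may differ):

```python
import numpy as np
# import matplotlib.pyplot as plt  # optional, to display sample images

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.utils import to_categorical
```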

2- Define some parameters

We have 10 digits to classify an image as (0-9).
Our images are 28x28 pixels.
The directory is where trained models are saved.

The batch size tells the model to train on 32 images per iteration.
A training iteration has two steps, a forward and a backward pass over an image batch:
1- forward: the model (starting from random weights) predicts labels for the batch using its current weights.
2- backward: we calculate the error, and the optimization algorithm updates and adjusts the parameters to minimize that error (using partial derivatives), with a learning rate parameter that decides how big each change is.

An epoch is one full pass over the whole training set; the learned parameters carry over from one epoch to the next, so we can get the error as low as possible for the current hyperparameter setup.
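As a sketch, the parameters above might be defined like this (the variable names and epoch count are my own; the arithmetic shows how batch size and training-set size determine the number of iterations per epoch):

```python
import math

num_classes = 10                    # digits 0-9
img_rows, img_cols = 28, 28
input_size = img_rows * img_cols    # 784 pixels per flattened image
batch_size = 32                     # images per forward/backward pass
epochs = 10                         # assumed value; full passes over the training set
train_samples = 32_000

# with 32,000 training images and batches of 32,
# one epoch takes 1,000 iterations
iterations_per_epoch = math.ceil(train_samples / batch_size)
print(iterations_per_epoch)  # → 1000
```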

3-  Loading the data set



We load our labeled data (X) and split it into two sets:
1- training: used by the model optimizer
2- test (validation): used by us to check the model's accuracy on data it hasn't seen before, but that we have labels for, so we can compare against a ground truth.

My split was 32,000 train, 10,000 validation (roughly 76% train to 24% validation).

We also split out the labels (Y) and converted them to one-hot vectors of 10 classes.
If you look at the last cell output, the digit 3 is represented by a vector with a 1 in position 3; this is so the model can output a probability for each class.
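A minimal sketch of the split and one-hot encoding, using NumPy only (loading the Kaggle `train.csv` is assumed to have happened already; random data stands in for it here):

```python
import numpy as np

rng = np.random.default_rng(0)
# stand-in for the 42,000 labeled Kaggle rows: pixel values in X, digit labels in y
X = rng.integers(0, 256, size=(42_000, 784)).astype("float32") / 255.0
y = rng.integers(0, 10, size=42_000)

# split: 32,000 for training, 10,000 held out for validation
X_train, X_val = X[:32_000], X[32_000:]
y_train, y_val = y[:32_000], y[32_000:]

# one-hot encode the labels: digit 3 becomes [0,0,0,1,0,0,0,0,0,0]
Y_train = np.eye(10)[y_train]
Y_val = np.eye(10)[y_val]

print(Y_train.shape)  # (32000, 10)
```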

4- Define the models


I created different models but here is the biggest one:


It has an input layer of size 28 x 28 = 784 inputs, each representing a single pixel in the image; since these are grayscale images (a pixel has a value 0-255), there is only one channel, no RGB.

it has the following layers:

- Dropout is a regularization layer used to avoid over-fitting (memorizing the dataset instead of learning the pattern); it randomly drops some nodes during training so the neural network can't come to rely on them.

- Dense is a layer of nodes fully connected to the previous and next layers. Each node applies an activation function (a non-linear function applied to the previous layer's output) and learns something about the data (a feature, e.g. the curves in a digit, straight lines, etc.).
The deeper the layer, the more complex the features it learns. That's why neural networks work: through composition of these connections they can learn and approximate complex functions while still generalizing to new data.
example:



image taken from this paper: http://www.cs.cmu.edu/~aharley/vis/harley_vis_isvc15.pdf

The last layer has 10 nodes; each node outputs the probability of one digit.
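The model described above can be sketched in Keras like this (the hidden-layer sizes and dropout rate are my assumptions, since the source doesn't list them exactly):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

def build_model():
    model = Sequential([
        Dense(512, activation="relu", input_shape=(784,)),  # first hidden layer
        Dropout(0.2),                     # regularization: randomly drop 20% of nodes
        Dense(256, activation="relu"),
        Dropout(0.2),
        Dense(10, activation="softmax"),  # one probability per digit 0-9
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_model()
model.summary()
```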


5- Train the model

The commented code above trains the model to fit the training data; this is the most time-consuming step.


Each epoch, the optimizer iterates over all the images and prints the accuracy it reached.
With this model we achieved a fairly quick result, but as you can see the gains slow down very quickly.
In my case it took a few minutes to finish these epochs, since I'm using a GPU.
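The training call itself is a single `fit`; here is a sketch on tiny random stand-in data so it runs in seconds (the real run uses the 32,000 labeled images and takes minutes even on a GPU):

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# tiny stand-in data so the sketch runs quickly
rng = np.random.default_rng(0)
X_train = rng.random((64, 784)).astype("float32")
Y_train = np.eye(10)[rng.integers(0, 10, 64)]

model = Sequential([Dense(32, activation="relu", input_shape=(784,)),
                    Dense(10, activation="softmax")])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])

# one forward+backward pass per batch of 32, repeated once per epoch
history = model.fit(X_train, Y_train, batch_size=32, epochs=2, verbose=0)
print(history.history["accuracy"])
```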

6- Loading And Evaluating the models



Here you can see all the models I tried and trained. They are all saved to files so they can be reused later if needed.

Here is the evaluation result on the validation set:

We let each model predict the digits and compare the predictions to the ground truth; Keras does that for us.

Our model achieved 99.15% training accuracy but 97.74% validation accuracy; the drop is expected, since these images are completely new to it. What we want to watch for is whether the model fails to generalize: if your model gets 99% in training but, say, 80% in test accuracy, that gap is big, and it can be an indication that your model is over-fitting the training set rather than generalizing to new data. (That's why I added regularization to my biggest model: the more parameters it has, the more easily it can overfit.)
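Saving, reloading, and evaluating a model is only a few lines in Keras; a sketch on stand-in data, with a hypothetical file name:

```python
import numpy as np
from tensorflow.keras.models import Sequential, load_model
from tensorflow.keras.layers import Dense

rng = np.random.default_rng(0)
X_val = rng.random((16, 784)).astype("float32")
Y_val = np.eye(10)[rng.integers(0, 10, 16)]

model = Sequential([Dense(16, activation="relu", input_shape=(784,)),
                    Dense(10, activation="softmax")])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])

model.save("digit_model.h5")           # hypothetical path in the models directory
restored = load_model("digit_model.h5")

# evaluate returns the loss and the metrics requested at compile time
loss, acc = restored.evaluate(X_val, Y_val, verbose=0)
print(loss, acc)
```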


notes based on the results:


7-  Sample predictions from the non-labeled set


Each image has its index:prediction above it. You can see some not-so-clear images, like indices 399, 366, and 445, but our model got them right.
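Turning the model's 10-probability output rows into digit predictions is just an `argmax` over each row; a NumPy-only sketch with made-up probability vectors standing in for `model.predict(...)` output:

```python
import numpy as np

# stand-in for model.predict(...) output: one probability row per image
probs = np.array([
    [0.01, 0.02, 0.05, 0.80, 0.02, 0.03, 0.02, 0.02, 0.02, 0.01],  # looks like a 3
    [0.70, 0.05, 0.05, 0.02, 0.03, 0.05, 0.03, 0.03, 0.02, 0.02],  # looks like a 0
])

# the predicted digit is the index with the highest probability
predictions = probs.argmax(axis=1)
print(predictions)  # [3 0]
```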


8- testing with my image


I created a digit image myself to see how the models handle it.

only 3 models were able to predict the correct digit.
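To feed a hand-made image into the model, it has to match the training format: 28x28 grayscale, flattened to 784 values and scaled to [0, 1]. A sketch with a random array standing in for the loaded image:

```python
import numpy as np

# stand-in for the loaded 28x28 grayscale image (pixel values 0-255)
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(28, 28)).astype("float32")

# MNIST digits are white-on-black; a drawing or photo is usually black-on-white,
# so inverting the intensities is often needed before predicting
img = 255.0 - img

# scale to [0, 1] and flatten to the (1, 784) shape the model expects
x = (img / 255.0).reshape(1, 784)
print(x.shape)  # (1, 784)
```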

The model achieved 97.7% on the Kaggle test set after submission. For a simple fully connected neural network, that seems good to me!

In the next parts I'll try a more complex network using convolutions and a residual network architecture, to see how much further we can reduce the error, knowing that people have already achieved 99.7% on this dataset, if not higher.

