Image Classification using Convolutional Neural Network
- charithayrs1997
- Oct 21, 2022
- 4 min read
The Convolutional Neural Network (CNN), whose "neural network" name comes from its inspiration in the biological human brain, is one of the best deep learning algorithms, and image classification is its best-known application. In the image classification problem, we are given a collection of images belonging to different classes and need to predict the class of each image using a trained model. The first challenge is the representation of images. Each image is a 2-dimensional grid of pixels, and each pixel carries three RGB channel values, which makes the raw representation large and complex. This representation is taken as the input, and convolutional filters are used to reduce the complexity of the spatial representation by exploiting the spatial connectivity of neurons. Each convolutional layer slides a small filter over the pixel representation, multiplies the filter weights element-wise with the pixels under the filter and sums the products into a single output value. This way, convolution is performed across the whole spatial pixel representation. A convolutional layer is followed by a pooling layer that condenses the image representation further. The most commonly used pooling is MaxPooling, which keeps the maximum value of each small spatial window.
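To make the convolution and pooling steps concrete, here is a minimal sketch in plain Python. It is not the Keras implementation; the function names `convolve2d` and `max_pool2d`, the 6x6 single-channel image and the vertical-edge filter are all made up for illustration, and the image is assumed to be square with no padding.

```python
def convolve2d(image, kernel, stride=1):
    """Slide `kernel` over a square `image` (no padding) and return the feature map."""
    k = len(kernel)
    out_size = (len(image) - k) // stride + 1
    output = []
    for row in range(0, out_size * stride, stride):
        out_row = []
        for col in range(0, out_size * stride, stride):
            # Multiply the filter with the pixels under it and sum the products.
            total = sum(
                image[row + i][col + j] * kernel[i][j]
                for i in range(k)
                for j in range(k)
            )
            out_row.append(total)
        output.append(out_row)
    return output


def max_pool2d(feature_map, size=2):
    """Keep the maximum of each non-overlapping size x size window."""
    rows = len(feature_map) // size
    cols = len(feature_map[0]) // size
    return [
        [
            max(
                feature_map[r * size + i][c * size + j]
                for i in range(size)
                for j in range(size)
            )
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# Hypothetical 6x6 image: bright on the left half, dark on the right half.
image = [[1, 1, 1, 0, 0, 0] for _ in range(6)]
# Hypothetical 3x3 filter that responds to vertical edges.
kernel = [[1, 0, -1] for _ in range(3)]

feature_map = convolve2d(image, kernel)  # 4x4, every row is [0, 3, 3, 0]
pooled = max_pool2d(feature_map)         # 2x2, [[3, 3], [3, 3]]
```

The 6x6 image shrinks to 4x4 after the 3x3 filter and to 2x2 after pooling, which is the condensing effect described above.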

Implementation :
Fetching Dataset from Kaggle
First, the dataset is fetched from Kaggle by providing the Kaggle username, the API key and the API command copied from the dataset's page on Kaggle.

The downloaded zip file is then unzipped to obtain the directory with three class subdirectories.
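The unzip step can be sketched with the standard library as below. The Kaggle download itself needs credentials and is not shown. The function name `extract_dataset` is hypothetical, and the sketch assumes the class subdirectories sit at the top level of the archive.

```python
import zipfile
from pathlib import Path


def extract_dataset(zip_path, target_dir):
    """Unzip the archive and return the names of the class subdirectories."""
    target = Path(target_dir)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    # Assumption: each top-level folder in the archive is one class.
    return sorted(p.name for p in target.iterdir() if p.is_dir())
```

Calling `extract_dataset("dataset.zip", "data")` on an archive laid out this way would return a list with one name per class folder.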

Pre-Processing of Data
Pre-processing starts by verifying that all the images are in an expected format. Here the 'imghdr' utility is used to detect the format of each image and OpenCV is used to work with the image at each path. Any image that is not in jpeg, jpg or png format is deleted.
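A rough sketch of this clean-up step is shown below. It does not use 'imghdr', because that module was deprecated in Python 3.11 and removed in Python 3.13; instead it checks the leading bytes of each file directly, which is a swapped-in technique and not the code from this post. The names `detect_format` and `remove_unexpected_images` are hypothetical.

```python
from pathlib import Path

# Leading bytes ("magic numbers") that identify the accepted formats.
SIGNATURES = {
    "jpeg": b"\xff\xd8\xff",
    "png": b"\x89PNG\r\n\x1a\n",
}


def detect_format(path):
    """Return 'jpeg' or 'png' based on the file header, or None otherwise."""
    with open(path, "rb") as handle:
        header = handle.read(8)
    for name, signature in SIGNATURES.items():
        if header.startswith(signature):
            return name
    return None


def remove_unexpected_images(data_dir):
    """Delete files that are neither JPEG nor PNG and return the removed paths."""
    removed = []
    for path in sorted(Path(data_dir).rglob("*")):
        if path.is_file() and detect_format(path) is None:
            path.unlink()
            removed.append(path)
    return removed
```

Checking the content instead of the file extension catches files that are named .jpg but actually hold another format.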

Secondly, the data provided in the dataset is split. For this, a directory named 'splitdir' is created with 3 subdirectories: 'train', 'validation' and 'test'.
All the images of the dataset are shuffled, and 60% of the images go into training, 20% into validation and 20% into testing.
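One possible way to do the 60/20/20 split is sketched below. The function name `split_dataset`, the fixed seed and the choice to shuffle within each class folder (so that every class keeps the same proportions) are assumptions for illustration, not necessarily what the original code does.

```python
import random
import shutil
from pathlib import Path


def split_dataset(source_dir, split_dir, percents=(60, 20, 20), seed=42):
    """Copy each class's images into train/validation/test folders."""
    names = ("train", "validation", "test")
    rng = random.Random(seed)
    counts = {name: 0 for name in names}
    for class_dir in sorted(Path(source_dir).iterdir()):
        if not class_dir.is_dir():
            continue
        images = sorted(p for p in class_dir.iterdir() if p.is_file())
        rng.shuffle(images)
        # Integer arithmetic avoids floating point rounding surprises.
        train_end = len(images) * percents[0] // 100
        val_end = train_end + len(images) * percents[1] // 100
        parts = (images[:train_end], images[train_end:val_end], images[val_end:])
        for name, part in zip(names, parts):
            destination = Path(split_dir) / name / class_dir.name
            destination.mkdir(parents=True, exist_ok=True)
            for image in part:
                shutil.copy2(image, destination / image.name)
            counts[name] += len(part)
    return counts
```

The returned dictionary gives the number of images copied into each split, which is a quick way to confirm the proportions.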
Images are scaled so that every pixel value lies between 0 and 1, by dividing each pixel value by 255 using ImageDataGenerator. This way train_generator, validation_generator and test_generator are created.
Build, Train the Model
TensorFlow provides two model building APIs, namely Sequential and Functional. At a high level, the Sequential model is used for a plain stack of layers where a single input leads to a single output, and the Functional model is used when the model has multiple inputs or multiple outputs.
The Conv2D layer performs the convolution with the provided filters. For example, in Conv2D(16,(3,3),1), 16 is the number of filters, (3,3) is the size of each filter in pixels and 1 is the stride, meaning the filter moves one pixel at a time. The 'relu' activation is used here, which means the output of the Conv2D layer is passed through the ReLU function, which sets negative values to zero and keeps positive values unchanged.

MaxPooling reduces the image data by keeping only the maximum value of each 2*2 pixel window, which is the default window size.
After flattening, dense layers condense the image data further until three values are obtained, one per class. To turn them into probabilities, the softmax function is applied.
While compiling the model, the Adam optimizer is used. Categorical Cross Entropy is used as the loss because this is multi-class data; if there were only two classes, Binary Cross Entropy could be used instead.
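The softmax and Categorical Cross Entropy steps can be illustrated in a few lines of plain Python. This is only a sketch of the math, not the Keras implementation, and the three input scores are made-up numbers.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(scores)
    # Subtracting the maximum keeps exp() numerically stable.
    exps = [math.exp(value - largest) for value in scores]
    total = sum(exps)
    return [value / total for value in exps]


def categorical_cross_entropy(one_hot_target, probabilities):
    """Negative log of the probability assigned to the true class."""
    return -sum(
        t * math.log(p)
        for t, p in zip(one_hot_target, probabilities)
        if t > 0
    )


# Hypothetical output of the last dense layer for one image.
probabilities = softmax([2.0, 1.0, 0.1])
# Loss when the first class is the correct one.
loss = categorical_cross_entropy([1, 0, 0], probabilities)
```

The loss is small when the model gives a high probability to the correct class and grows as that probability drops.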
The model summary is printed in order to check how the image is condensed layer by layer and the output size of each layer, along with the total number of parameters.
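The numbers in a model summary can be reproduced by hand. The sketch below does this for a hypothetical architecture: three Conv2D layers with 16, 32 and 16 filters of size 3*3 and stride 1, each followed by 2*2 MaxPooling, then a dense layer of 256 units and a 3-unit output. These layer sizes are assumptions for illustration and may differ from the model trained in this post. No padding is assumed.

```python
def conv2d(shape, filters, kernel=3, stride=1):
    """Output shape and parameter count of a Conv2D layer without padding."""
    height, width, channels = shape
    out_h = (height - kernel) // stride + 1
    out_w = (width - kernel) // stride + 1
    # One weight per filter cell per input channel, plus one bias per filter.
    params = (kernel * kernel * channels + 1) * filters
    return (out_h, out_w, filters), params


def max_pool(shape, size=2):
    """Output shape of MaxPooling; pooling has no trainable parameters."""
    height, width, channels = shape
    return (height // size, width // size, channels), 0


def dense(units_in, units_out):
    """Output size and parameter count (weights plus biases) of a Dense layer."""
    return units_out, (units_in + 1) * units_out


def summarize(input_shape=(256, 256, 3), conv_filters=(16, 32, 16),
              hidden_units=256, classes=3):
    """Return (layer name, output shape, parameter count) for each layer."""
    rows = []
    shape = input_shape
    for filters in conv_filters:
        shape, params = conv2d(shape, filters)
        rows.append(("Conv2D", shape, params))
        shape, params = max_pool(shape)
        rows.append(("MaxPooling2D", shape, params))
    flat = shape[0] * shape[1] * shape[2]
    rows.append(("Flatten", (flat,), 0))
    units, params = dense(flat, hidden_units)
    rows.append(("Dense", (units,), params))
    units, params = dense(units, classes)
    rows.append(("Dense", (units,), params))
    return rows


for name, shape, params in summarize():
    print(f"{name:<14}{str(shape):<18}{params}")
```

For this assumed architecture the (256,256,3) input is condensed to (30,30,16) before flattening, and almost all of the parameters sit in the first dense layer.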

In order to log the model training, a log directory is created.
The model provides two main functionalities: model.fit for training and model.predict for prediction.
Initially, the model is trained and validated against the validation split images. Here, the number of epochs is the number of complete passes over the training data. The training and validation accuracies and losses are recorded for each epoch.
Using the history, graphs of the training and validation accuracies and losses are plotted against the epochs.


The graphs for all the experiments carried out with different hyperparameters are also shown.
Test the Model :
This step uses the model.predict functionality. For the test split images, the classes are predicted and printed in the output.
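The prediction for each image is a row of three probabilities, and the predicted class is the one with the highest probability. A minimal sketch of that last step is given below; the function name `predict_classes`, the class names and the probability values are placeholders, not the actual output of the model.

```python
def predict_classes(probability_rows, class_names):
    """Pick the class with the highest probability for each image."""
    predictions = []
    for row in probability_rows:
        best_index = max(range(len(row)), key=row.__getitem__)
        predictions.append(class_names[best_index])
    return predictions


# Placeholder class names and softmax outputs for two test images.
class_names = ["class_a", "class_b", "class_c"]
outputs = [
    [0.10, 0.85, 0.05],
    [0.70, 0.20, 0.10],
]
print(predict_classes(outputs, class_names))  # ['class_b', 'class_a']
```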

Please find the source code here.
Contribution :
In the reference provided, the image classification model is built using a CNN with Keras in TensorFlow, and the obtained training accuracy is 67.67 and validation accuracy is 73.89.
I've built the image classification model using Keras in TensorFlow with the above reference as a starting point, experimented with different input image sizes, batch sizes and numbers of CNN layers, and optimized the model to obtain a training accuracy of 98.49 and a validation accuracy of 99.09. I've also predicted the classes of the images in the test split and displayed them.
Experiments-Explanation :
I've experimented on the initial model by altering the batch size in the image data generator, the input size of the images and the number of CNN layers. The following are the observations.
With an input image size of (256,256) and 5 CNN layers, the validation accuracy is 48.05 with a batch size of 16 and 92.79 with a batch size of 32.
With 5 CNN layers and a batch size of 32, the validation accuracy is 98.80 with an input image size of (224,224) and 92.79 with (256,256).
With an input image size of (256,256), a batch size of 32 and an 80/20 split for training and validation, the validation accuracy is 92.79 with 5 CNN layers and 99.10 with 3 CNN layers.
With an input image size of (256,256), a batch size of 32 and a 60/20/20 split for training, validation and testing, the validation accuracy is 98.19 with 5 CNN layers and 99.09 with 3 CNN layers.




Challenges Faced :
Initially, while working with Google Colab, I faced issues with fetching the dataset. At first, I used Python code to display an upload button, uploaded the zip file from my local machine and then unzipped it.

Later I found that downloading the dataset directly from Kaggle using the API command is a much better way of doing it.
Having only worked on binary image classification before, I knew how to use the sigmoid activation function in the final dense layer. For the three-class image classification here, I tried different approaches and found that using the softmax activation function is the simplest.
References :
The references provided above were used to get the idea of image classification using CNN.