Recognizing your dog’s mood with the power of Machine Learning (Part 2)

In the previous post, we discussed how to build an image classifier that can recognize and classify a dog’s mood. Specifically, we were classifying “happy” and “sad”.

We are now going to add more classes to the original model in order to see if we can classify more moods.

Since I last worked with TensorFlow’s image classification tools, I found that Google had integrated the Keras API into TensorFlow, which meant that the original image classification tools had been refactored. After some searching, I found the refactored Python script here. This script is basically a rewrite of their old transfer-learning tool, with added support for training models from TensorFlow Hub. For those who don’t know, TensorFlow Hub is a platform that hosts pre-built models, prepped for direct use or transfer learning, covering purposes ranging from image classification and object detection to text embedding.
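To give a feel for what that script does under the hood, here is a minimal transfer-learning sketch in the same spirit: a pre-trained feature extractor pulled from TensorFlow Hub, frozen, with a small classification head stacked on top. The Hub module URL, image size, and layer choices are illustrative, not the exact ones the script uses.

```python
# A minimal transfer-learning sketch: load a pre-trained feature extractor
# from TensorFlow Hub, freeze it, and add a small classification head.
import tensorflow as tf
import tensorflow_hub as hub

NUM_CLASSES = 4  # e.g. happy, sad, angry, scared

feature_extractor = hub.KerasLayer(
    "https://tfhub.dev/google/imagenet/mobilenet_v2_100_224/feature_vector/4",
    input_shape=(224, 224, 3),
    trainable=False,  # keep the pre-trained weights frozen
)

model = tf.keras.Sequential([
    feature_extractor,
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
```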

Before running the script, we need to gather images. I initially wanted to classify happy, sad, angry, surprised, and scared. To do this, I scraped the internet for images and ran into some challenges. Because of the way search engines work, indexing images based on the page they appear on as well as the image’s alt text, many results didn’t quite match the dog’s mood. I also realized that a lot of the results were based on a human’s personification of the dog at that moment, which led to images that could overlap across classifications.

Adding more classes also made it harder for the new layers added at the end of the model to reach good accuracy. To address accuracy levels below 60%, I took two approaches. First, I changed parameters like the number of epochs and the learning rate. This helped, but not significantly. I then took a closer look at my data and determined that there was too much similarity between my images in “surprised” and “scared”. I decided to remove “surprised”, which resulted in significantly higher accuracy, a jump of nearly 10%.
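For context on the knobs I was turning, here is a rough sketch (continuing the one above) of training the new head on images organized into one folder per mood, with the epoch count and learning rate exposed as the things to experiment with. The folder name and hyperparameter values are placeholders, not the settings I actually ended up with.

```python
# Continuing the sketch above: load labelled images from per-class folders
# and experiment with the number of epochs and the learning rate.
import tensorflow as tf

train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    "dog_moods/",            # one sub-folder per mood class (placeholder path)
    image_size=(224, 224),
    batch_size=32,
    label_mode="categorical",
)
train_ds = train_ds.map(lambda x, y: (x / 255.0, y))  # scale pixels to [0, 1]

# `model` is the Keras model built in the earlier sketch.
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.005),  # illustrative value
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
history = model.fit(train_ds, epochs=15)  # illustrative epoch count
```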

This image classification tool also didn’t make it easy to visualize how accuracy progressed with the different changes I made, so I installed TensorBoard on my machine. I then discovered that the tool doesn’t have a feature to produce log files that TensorBoard can consume! I had to go into the tool itself and add a flag for enabling logging and specifying an output directory for the log files. Once I had done this, I was able to visually interpret how my changes to the parameters affected accuracy over the number of epochs.
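The change I made amounts to wiring a TensorBoard callback into the training step, roughly like the following. The log directory path and the idea of passing it in through a flag are my own illustration, not the script’s actual interface.

```python
# Rough sketch of the logging hook: a TensorBoard callback that writes
# per-epoch metrics to a directory, which `tensorboard --logdir ...`
# can then visualize.
import tensorflow as tf

log_dir = "logs/dog_moods_run_1"  # illustrative path, e.g. taken from a --logdir flag
tensorboard_cb = tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1)

# `model` and `train_ds` come from the earlier sketches.
history = model.fit(train_ds, epochs=15, callbacks=[tensorboard_cb])
```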

I also discovered that after Google integrated the Keras API into the platform, users trying to train models on their GPU were running into issues where the GPU ran out of memory. This is because, by default, the library allocates ALL of your GPU’s memory for your session’s operations. To get around this, the TensorFlow team implemented a mechanism that gradually grows GPU memory allocation based on the model’s needs, so I went ahead and enabled this feature in the tool as well.
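Enabling it boils down to a couple of lines before training starts; this is the standard TensorFlow memory-growth setting rather than anything specific to the tool.

```python
# Instead of grabbing all GPU memory up front, tell TensorFlow to
# allocate it gradually as the model needs it.
import tensorflow as tf

for gpu in tf.config.list_physical_devices("GPU"):
    tf.config.experimental.set_memory_growth(gpu, True)
```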

Once I had generated the model, I converted it for web usage using the following tool: https://github.com/tensorflow/tfjs/tree/master/tfjs-converter

This lets you take the model you generated and convert it into a format that TensorFlow.js can load and run in the browser.
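For reference, here is roughly what the conversion step looks like from Python (the converter also ships a tensorflowjs_converter command-line tool); the output path is just a placeholder.

```python
# Save the trained Keras model in the layered format that TensorFlow.js
# can load in the browser.
import tensorflowjs as tfjs

# `model` is the Keras model from the earlier sketches.
tfjs.converters.save_keras_model(model, "web_model/")  # placeholder output directory
```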

I also realized that my React application had been put together quickly and was disorganized, so I refactored it and broke it down into smaller components. A Camera component handles the camera view: it captures every frame and pushes the image to a canvas element and to a property on the component. A separate Predictions component takes the image data from the Camera component along with a model property, runs an encode/decode pass to produce classification predictions for the current frame, and displays the percentages over the picture. This lets me swap out the model in real time as I see fit and switch between the old model and the new one.

I also realized that the application was quickly eating up memory because of the way the React component lifecycle works: I was essentially creating multiple canvas contexts for accessing and updating the component. I switched over to using effects and limited which variables trigger an effect when the image changes, which improved the framerate and reduced memory consumption. Splitting the Predictions component and the Camera component apart also made things more efficient and let the model run less often, so that not every frame has to go through the classifier.

You can find the latest iteration of this application on https://dog-mood.web.app/

