My First Foray Into Machine Learning

Like most tech companies these days, Red Hat is encouraging everyone to brush up on their AI knowledge. For my part I have been doing a number of online training courses lately and thought I would write up my experience for posterity.

I will start by saying that as I learn more about modern AI ("modern" because what is being hyped right now has been around in some form for many decades), it has only increased my concern about the ethics of AI and the way we're using training data. However, that is a huge topic in and of itself and I'm not going to get into it here. Maybe as a standalone post in the future. For now, just rest assured that I have concerns and will be watching this space very carefully in the coming years.

With that out of the way, let's get into the technical side of things. Most of this is going to be based on the first fast.ai lesson, which was the first training I did that got into the technical side of AI in a meaningful way. However, one takeaway from my experience this week is that I probably should have gone through a few more lessons before trying to do my own thing. The first lesson provides a very high-level overview of using the tools, but I did not come out of it with enough understanding to do anything different or unique.

What unique thing was I trying to do, you ask? Well, I am a regular user of GasBuddy, a crowd-sourced app for reporting gas prices, because I am a cheapskate and don't like spending more on gas than I have to. One feature I've always wanted to have is a way to snap a photo of a gas price sign and have image recognition pull the prices from it and report them automatically. Since the first lesson dealt with training a model to recognize photos, I thought I might be able to extend that to implement this feature I've always wanted. It still might be possible, but I'll spoil the ending for you and report that I failed quite spectacularly. :-)

A Comedy of Errors

There were several reasons for that:

  • Silly mistakes on my part
  • Lack of understanding of how the framework functions beyond the superficial use case covered in lesson 1
  • Lack(?) of documentation of how fast.ai works

I've been working in the same general area for quite a few years now, so it's been a while since I tried to learn something completely new. I've run into this before but had kind of forgotten it in the intervening years: when you're brand new to something, it's very easy to get stuck on bugs of your own making simply because you don't know enough about what you're doing to recognize whether the problem is the tool, your use of the tool, or something basic that is unrelated to the tool.

If you're familiar with software, you'll probably recognize that I listed those in order of decreasing likelihood. New programmers like to blame the tool ("Oh, this code won't compile because the compiler is broken"), but 99.9% of the time it's not the tool. Next up is your use of the tool, which is more common. In the compiler example, this might be caused by using a new syntax you're not familiar with and getting it wrong. Perhaps the most common of all is the silly mistake, like forgetting to include a semicolon or comma somewhere one is needed.

Once you get comfortable with a given development environment, you can often recognize which of these categories an error falls into almost immediately. When you're learning something new, though (say, a new programming language or library), you may think you made a mistake with the new thing when you really just forgot a semicolon, simply because you don't have an instinctive understanding of how it works. In my case, I did both.

Here are a couple of examples of problems I ran into and a brief discussion of why I think I struggled with them:

  1. A simple logic mistake in a loop. After I ran through the example code from the lesson, I naturally tried extending it a bit. The simplest case was to verify that the trained model would correctly identify more than just one image. To do this, I collected a few of my own vacation photos and (tried to) pass them into the vision model. Every one of them was being miscategorized with 100% certainty. Well, not quite. See, when I converted the single image verification to a loop, I looped over the filenames of the additional files. However, I forgot to update the verification call to actually use the new filenames. I was looping through the new files but always passing the first one to the actual function call (see the sketch just after this list). Oops.
  2. While the first example was a very silly mistake and should have been easily caught, it was not the only unexpected result I had gotten. I think that contributed to my barking up the wrong tree while debugging. The other thing I discovered when I started playing with the image categorization example was that if I attempted to predict the content of the second type of image (in the bird and forest example, I was trying to predict a forest photo instead of a bird), it would categorize it correctly but with what appeared to be a very low confidence. The probabilities I was getting back were in the range of .002, whereas a bird photo returned something close to 1. Initially I thought that meant the model was struggling to identify a forest photo, but I no longer believe that is the case.
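
To make that first mistake concrete, here is roughly what my buggy loop looked like next to the fixed version. This is just a sketch: `learn` stands in for the vision learner trained as in lesson 1, and `vacation_photos` is a made-up name for my list of image paths.

    from fastai.vision.all import *

    # Placeholder: a list of paths to my own test images.
    vacation_photos = ['photo1.jpg', 'photo2.jpg', 'photo3.jpg']

    # What I actually wrote: the loop variable is never used, so every
    # iteration predicts on the first photo again.
    for fname in vacation_photos:
        pred, _, probs = learn.predict(PILImage.create(vacation_photos[0]))  # oops
        print(fname, pred, probs)

    # What I meant to write: predict on the file the loop is currently on.
    for fname in vacation_photos:
        pred, _, probs = learn.predict(PILImage.create(fname))
        print(fname, pred, probs)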

This was a bit of a perfect storm of problems that misled me into thinking I had incorrectly trained or called the vision learning model. Almost everything I did beyond what was in the original example was returning unexpected results. Usually that means I'm missing something fundamental, and I think that was the case here.

Analysis of My Mistakes

When I first started running into these problems I wasn't using the original bird and forest example. I had switched to forests and sunsets, thinking that those might be more difficult to classify and wanting to see how the model handled it. At first I thought it was just bad at recognizing sunsets, but after trying a number of different things to improve that, I went back to the bird and forest example and found the exact same thing. The model returned 1 for its confidence in identifying the bird, and near 0 for a picture of a forest.

My belief is that the model is returning near 1 when it is confident something is a bird, and near 0 for forests (or whatever image types you're using), but I could not find any discussion of what these numbers meant in the fast.ai documentation. I suspect the assumption is that you understand the underlying data model well enough to know what those numbers mean, but as a complete newbie I just don't. I thought it would return near 1 for any prediction it was confident in. This is also why I said earlier that I probably should have gone through a few more lessons before attempting to strike out on my own with ML development. Even with the simplified interface fast.ai provides, you still need some understanding of what's going on under the covers to use it properly.
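
If my theory is right, the numbers line up with `predict()` returning one probability per category, in the fixed order of the learner's vocabulary, with the lesson's example only ever printing the first entry ("how likely is this a bird?"). Here's a sketch of how I now think it fits together; the filename and the exact numbers are invented for illustration, and `learn` is assumed to be the bird-and-forest learner from the lesson.

    from fastai.vision.all import *

    # predict() gives back the decoded label, the index of that label in the
    # vocabulary, and a tensor with one probability per category.
    pred, idx, probs = learn.predict(PILImage.create('forest.jpg'))  # made-up filename

    print(learn.dls.vocab)  # e.g. ['bird', 'forest'] -- the categories in a fixed order
    print(pred)             # 'forest' -- the model did pick the right category
    print(probs)            # e.g. tensor([0.0020, 0.9980]) -- probs[0] is the bird probability
    print(probs[idx])       # probability of the *predicted* category, near 1

Under that reading, my "low confidence" forest results were actually confident forest predictions all along; I was just looking at the probability of the wrong category.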

We're now (finally) nearing the end of my first journey into machine learning. Once I realized that the probabilities seemed to be complementary between the two image categories (near 1 for one, near 0 for the other), I did try adding a third just to see what would happen. In that case the first category had results near 1, the second was near 0, and the third was also near 0, but slightly further from 0 than the second category. I honestly have no idea how to interpret those numbers, and I was running short of time for this experiment, so that's where I left off and started writing this novel.
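
My best guess at those three numbers, continuing the theory above: if there really is one probability per category and they sum to 1, then with three categories the two non-predicted ones simply split whatever probability the predicted category doesn't take, which would explain why the third was near 0 but not quite as near as the second. A sketch of what I'd expect to see (again, `learn` is assumed to now be trained on three categories, and the numbers are invented):

    from fastai.vision.all import *

    pred, idx, probs = learn.predict(PILImage.create('bird.jpg'))  # made-up filename

    print(learn.dls.vocab)  # e.g. ['bird', 'forest', 'sunset']
    print(probs)            # e.g. tensor([0.9800, 0.0050, 0.0150])
    print(probs.sum())      # tensor(1.) -- the categories share the total probability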

Conclusion

So did I get anything practical out of this exercise? I can't really say I did. I wrote some toy programs whose output I can't even properly understand, and even if I could understand them I have no real use for this at the moment. It's not even clear to me that I could extend this functionality to do the thing with GasBuddy that I was hoping to, since I need to be able to do more than just recognize a gas sign (although that's probably a good first step, so maybe not totally useless?).

That said, I did get my feet wet in the machine learning space and now I have a better idea what my knowledge gaps are. I also learned some completely new technology and there is always value in stretching your mind that way. Since it doesn't seem likely that AI is going away anytime soon, I expect I'll be back to build on this experience at some point in the not-too-distant future.