Number Recognition

This is an exploration of the Kaggle Digit Recognizer competition, which uses the MNIST data. The goal is to accurately classify a single handwritten digit in a 28 x 28 grayscale image. Instead of focusing only on the end goal of performing well in the competition, I'm going to compare the canned classifier in the Scikit-Learn package with a classifier built using TensorFlow.

Scikit-Learn (Sklearn) is a machine learning library with many simple, easy-to-implement models, including a neural network in the form of a Multi-layer Perceptron (MLP). TensorFlow requires more hands-on work from the user, who constructs the network by hand. The strength of TensorFlow is the freedom to add calculations and layers that are unavailable in Sklearn. On the other hand, Sklearn makes it incredibly easy to run cross-validation and tune hyperparameters.
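To give a feel for how little code the Sklearn route needs, here is a minimal sketch of cross-validated hyperparameter tuning for an MLP. It is not the model used later in this analysis: it assumes Scikit-Learn's small bundled 8 x 8 digits dataset as a stand-in for the Kaggle files, and the grid values are purely illustrative.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Stand-in data (assumption): scikit-learn's bundled 8 x 8 digits,
# not the 28 x 28 Kaggle images.
X, y = load_digits(return_X_y=True)
X = X / 16.0  # pixel values in this dataset range from 0 to 16

# Hypothetical search grid; these values are illustrative, not tuned.
param_grid = {
    "hidden_layer_sizes": [(32,), (64,)],
    "alpha": [1e-4, 1e-2],  # L2 regularization strength
}

search = GridSearchCV(
    MLPClassifier(max_iter=300, random_state=0),
    param_grid,
    cv=3,                # 3-fold cross-validation
    scoring="accuracy",
)
search.fit(X, y)

print(search.best_params_)           # best combination found in the grid
print(round(search.best_score_, 3))  # its mean cross-validated accuracy
```

The whole search, including the cross-validation loop, is handled by `GridSearchCV`; an equivalent TensorFlow setup would typically need that loop written out by hand.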

For our models we will show confusion matrices, which compare the predicted and true classifications, as well as accuracy scores. We will take the best performing models, upload their predictions to the Kaggle competition, and report the final scores in the Final Model section. Kaggle provides a test set for which the true labels are not given, and a score is reported only after uploading.
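As a rough illustration of what a confusion matrix holds, the sketch below builds one in plain Python. The labels are made up for the example, and the layout (rows are true digits, columns are predicted digits) is an assumption chosen to match the usual Sklearn convention.

```python
def confusion_matrix(y_true, y_pred, n_classes=10):
    """Return counts[t][p]: how often true digit t was predicted as digit p."""
    counts = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        counts[t][p] += 1
    return counts

# Made-up labels for illustration only; not results from any model.
y_true = [3, 8, 8, 1, 5, 3]
y_pred = [3, 3, 8, 1, 5, 3]

cm = confusion_matrix(y_true, y_pred)
print(cm[8])  # row for true digit 8: one predicted as 3, one predicted as 8
```

Correct predictions land on the diagonal, so off-diagonal entries such as `cm[8][3]` show which digits get mistaken for which.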

Please note that when we discuss accuracies, we are referring to the number of correct predictions relative to the total number of predictions.
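That definition can be written out in a few lines. This is only a sketch with made-up labels; the function name `accuracy` is chosen here for illustration.

```python
def accuracy(y_true, y_pred):
    """Number of correct predictions divided by the total number of predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("need at least one prediction")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Made-up labels for illustration only.
y_true = [7, 2, 1, 0, 4]
y_pred = [7, 2, 1, 0, 9]

score = accuracy(y_true, y_pred)
print(score)  # 4 of the 5 predictions match
```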

The left-hand navigation bar can be used to jump to different steps of the analysis, or you can start right here with Data Exploration.