Image classification of the MNIST and CIFAR-10 data using KernelKnn and HOG (histogram of oriented gradients)

Lampros Mouselimis


In this vignette I’ll illustrate how to increase the accuracy on the MNIST data (to approx. 98.4%) and on the CIFAR-10 data (to approx. 58.3%) using the KernelKnn package and HOG (histogram of oriented gradients) features.


MNIST data set

The MNIST data set of handwritten digits contains 70,000 examples (the original 60,000 training and 10,000 test images combined). Each row of the matrix corresponds to a flattened 28 x 28 image, i.e. 784 pixel values. The unique values of the response variable y range from 0 to 9. More information about the data can be found in the DataSets repository (the folder also includes an Rmarkdown file).

# using system('wget..') on a linux OS 


mnist <- read.table(unz("", "mnist.csv"), nrows = 70000, header = TRUE, 
                    quote = "\"", sep = ",")

# predictors: every column except the last one (the label)
X = mnist[, -ncol(mnist)]
dim(X)

## [1] 70000   784

# for classification the KernelKnn function requires numeric labels that start from 1 (not 0)

y = mnist[, ncol(mnist)] + 1          
table(y)

## y
##    1    2    3    4    5    6    7    8    9   10 
## 6903 7877 6990 7141 6824 6313 6876 7293 6825 6958

K-nearest-neighbors methods do not perform well in high dimensions because of the curse of dimensionality: the k observations that are nearest to a given test observation x1 may be very far away from x1 in p-dimensional space when p is large, which leads to a very poor k-nearest-neighbors fit [An Introduction to Statistical Learning, James/Witten/Hastie/Tibshirani, pages 108-109]. One option to overcome this problem is to use truncated SVD (irlba package) to reduce the dimensions of the data. A second option, which is appropriate for images, is to use image descriptors. In this vignette, I’ll compare the two approaches.
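The effect is easy to reproduce with simulated data. The following is a minimal sketch in base R, not part of the original analysis: it draws uniformly distributed points (the sample size of 500 and the chosen dimensions are arbitrary assumptions) and measures how far the single nearest neighbor of a test point is as the number of dimensions p grows.

```r
set.seed(1)

# Euclidean distance from a test point to its nearest neighbour among
# n uniformly distributed training points in the p-dimensional unit cube.
nearest_neighbour_distance <- function(p, n = 500) {
  X  <- matrix(runif(n * p), nrow = n)      # n simulated training points
  x1 <- rep(0.5, p)                         # test point at the centre of the cube
  min(sqrt(rowSums(sweep(X, 2, x1)^2)))     # distance to the closest training point
}

dims <- c(2, 10, 100, 784)                  # 784 = number of MNIST pixels
dist_by_dim <- sapply(dims, nearest_neighbour_distance)
names(dist_by_dim) <- dims
print(round(dist_by_dim, 3))
```

With the same number of observations, the nearest neighbor moves further away as p increases, so "nearest" says less and less about similarity.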

KernelKnnCV using truncated svd

I experimented with different settings and the following parameters gave the best results:

irlba_singular_vectors   k   method       kernel
40                       8   braycurtis   biweight_tricube_MULT

Using truncated SVD, a 4-fold cross-validation KernelKnn model gives an accuracy of 97.48%.
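A pipeline of this kind could be sketched as follows. This is a hedged reconstruction, not the exact code behind the reported number: the helper names svd_knn_cv and fold_accuracy are hypothetical, the settings come from the table above, and the accuracy helper assumes that KernelKnnCV returns one matrix of class probabilities per fold whose column j corresponds to label j.

```r
# Mean accuracy over the cross-validation folds.
# ASSUMPTION: each element of `preds` is a matrix of class probabilities
# in which column j belongs to label j (labels start from 1).
fold_accuracy <- function(y, preds, folds) {
  per_fold <- mapply(function(p, idx) {
    mean(max.col(p, ties.method = "first") == y[idx])
  }, preds, folds)
  mean(per_fold)
}

# Hypothetical wrapper: truncated SVD followed by cross-validated kernel k-NN.
# Requires the irlba and KernelKnn packages when it is called.
svd_knn_cv <- function(X, y, n_vectors = 40, k = 8, folds = 4, threads = 1) {
  X <- as.matrix(X)
  # keep only the first `n_vectors` right singular vectors
  svd_fit   <- irlba::irlba(X, nv = n_vectors)
  X_reduced <- X %*% svd_fit$v              # project the data onto those vectors
  fit <- KernelKnn::KernelKnnCV(X_reduced, y, k = k, folds = folds,
                                method = "braycurtis",
                                weights_function = "biweight_tricube_MULT",
                                regression = FALSE, threads = threads,
                                Levels = sort(unique(y)))
  fold_accuracy(y, fit$preds, fit$folds)
}
```

With X and y defined as above, a call such as svd_knn_cv(X, y) would run the whole procedure. Expect it to take a while on all 70,000 rows.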

KernelKnnCV and HOG (histogram of oriented gradients)

In this section, besides KernelKnnCV, I’ll also use HOG. The histogram of oriented gradients (HOG) is a feature descriptor used in computer vision and image processing for the purpose of object detection. The technique counts occurrences of gradient orientation in localized portions of an image. This method is similar to that of edge orientation histograms, scale-invariant feature transform descriptors, and shape contexts, but differs in that it is computed on a dense grid of uniformly spaced cells and uses overlapping local contrast normalization for improved accuracy (Wikipedia).
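To make the idea concrete, here is a deliberately simplified sketch in base R. It is not the implementation used for the results in this vignette: the function name simple_hog, the 4 x 4 cell grid and the 9 orientation bins are assumptions chosen for illustration, and each cell is normalized on its own instead of using overlapping blocks.

```r
# Simplified HOG-like descriptor for a single grayscale image (a numeric matrix).
simple_hog <- function(img, cells = 4, orientations = 9) {
  nr <- nrow(img); nc <- ncol(img)
  stopifnot(is.matrix(img), nr %% cells == 0, nc %% cells == 0)

  # horizontal and vertical gradients by central differences (borders stay 0)
  gx <- matrix(0, nr, nc)
  gy <- matrix(0, nr, nc)
  gx[, 2:(nc - 1)] <- img[, 3:nc] - img[, 1:(nc - 2)]
  gy[2:(nr - 1), ] <- img[3:nr, ] - img[1:(nr - 2), ]

  magnitude <- sqrt(gx^2 + gy^2)
  angle <- (atan2(gy, gx) * 180 / pi) %% 180     # unsigned orientation in degrees
  bin <- pmin(floor(angle / (180 / orientations)) + 1, orientations)

  # index of the cell each pixel falls into (cells are numbered row by row)
  cell_row <- (row(img) - 1) %/% (nr / cells)
  cell_col <- (col(img) - 1) %/% (nc / cells)
  cell_id  <- cell_row * cells + cell_col + 1

  # magnitude-weighted histogram of orientations for every cell
  cell_hist <- tapply(as.vector(magnitude),
                      list(factor(as.vector(cell_id), levels = seq_len(cells^2)),
                           factor(as.vector(bin), levels = seq_len(orientations))),
                      sum)
  cell_hist[is.na(cell_hist)] <- 0

  # L2-normalise each cell separately (full HOG normalises overlapping blocks)
  cell_hist <- cell_hist / sqrt(rowSums(cell_hist^2) + 1e-8)
  as.vector(t(cell_hist))                        # one row of features per image
}

# toy 28 x 28 image with a single vertical edge
img <- matrix(0, 28, 28)
img[, 15:28] <- 1
descriptor <- simple_hog(img)
length(descriptor)                               # cells^2 * orientations values
```

For the toy image all gradient energy lands in the first orientation bin of the cells that touch the edge, which is the kind of local shape information a k-nearest-neighbors model can exploit better than raw pixels.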

By changing from the simple SVD features to HOG features, the accuracy of a 4-fold cross-validation model increased from 97.48% to 98.4% (a gain of roughly 0.9 percentage points).

CIFAR-10 data set

CIFAR-10 is an established computer-vision dataset used for object recognition. The data I’ll use in this example is a subset of the 80 million tiny images dataset and consists of 60,000 32x32 color images, each containing one of 10 object classes (6,000 images per class). Furthermore, the data were converted from RGB to gray, normalized and rounded to 2 decimal places (to reduce the storage size). More information about the data can be found in my DataSets repository (I included an Rmarkdown file).
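The preprocessing described above could look roughly like the following base R sketch. The exact conversion is not documented here, so both the luma weights (0.299, 0.587, 0.114) and the normalization (dividing 8-bit values by 255) are assumptions, and rgb_to_gray is a hypothetical helper name.

```r
# Convert 8-bit RGB channels to a normalised gray value, rounded to 2 decimals.
rgb_to_gray <- function(r, g, b, digits = 2) {
  gray <- 0.299 * r + 0.587 * g + 0.114 * b   # ASSUMED luma weights, not given in the text
  round(gray / 255, digits)                   # ASSUMED normalisation: scale 0..255 to 0..1
}

# three example pixels: white, black and pure red
pixels <- data.frame(r = c(255, 0, 255), g = c(255, 0, 0), b = c(255, 0, 0))
gray <- with(pixels, rgb_to_gray(r, g, b))
print(gray)
```

Rounding to 2 decimals leaves at most 101 distinct gray levels per pixel, which is what keeps the stored file small.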

I’ll build the kernel k-nearest-neighbors models in the same way as for the MNIST data set and then compare the results.

# using system('wget..') on a linux OS 


cifar_10 <- read.table(unz("", "cifar_10.csv"), nrows = 60000, header = TRUE, 
                       quote = "\"", sep = ",")

KernelKnnCV using truncated svd

The parameter settings are the same as those used for the MNIST data:

irlba_singular_vectors   k   method       kernel
40                       8   braycurtis   biweight_tricube_MULT

The accuracy of a 4-fold cross-validation model using truncated svd is 40.8%.

KernelKnnCV using HOG (histogram of oriented gradients)

Next, I’ll run KernelKnnCV on the HOG descriptors.
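A sketch of this step is shown below. It assumes that the descriptors are computed with HOG_apply from the OpenImageR package, which this text does not state, and the wrapper name hog_knn_cv as well as the default values for cells, orientations and k are hypothetical placeholders, not the settings behind the reported accuracy.

```r
# Hypothetical wrapper: HOG descriptors followed by cross-validated kernel k-NN.
# Requires the OpenImageR and KernelKnn packages when it is called.
hog_knn_cv <- function(X, y, image_rows, image_cols,
                       cells = 6, orientations = 9,   # ASSUMED HOG settings
                       k = 8, folds = 4, threads = 1) {
  # every row of X is one flattened image of size image_rows x image_cols
  hog_features <- OpenImageR::HOG_apply(as.matrix(X), cells = cells,
                                        orientations = orientations,
                                        rows = image_rows, columns = image_cols,
                                        threads = threads)
  # returns a list with the per-fold predictions and the fold indices
  KernelKnn::KernelKnnCV(hog_features, y, k = k, folds = folds,
                         method = "braycurtis",
                         weights_function = "biweight_tricube_MULT",
                         regression = FALSE, threads = threads,
                         Levels = sort(unique(y)))
}
```

For CIFAR-10 the call would be along the lines of hog_knn_cv(X, y, image_rows = 32, image_cols = 32), with X and y split from cifar_10 as was done for MNIST (labels shifted to start from 1). For MNIST the image size would be 28 x 28.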

By using HOG descriptors in a 4-fold cross-validation model, the accuracy on the CIFAR-10 data increases from 40.8% to 58.3% (a gain of 17.5 percentage points).