I have hundreds of digital images of dogs and cats, I need to make an algorithm to recognize when it is the dog and when it is the cat. What steps should I take?
First of all, it's worth saying that this is a famous machine-learning problem. It is available as a Kaggle Challenge, from which you can also download the dataset. In fact, that's where I downloaded the data used to write this answer.
I will show a very simple methodology for training a classifier for this problem. The answer is pretty much a hello world of this field, but it might help. This article describes a much more advanced methodology for prediction (it correctly classifies 82% of the images).
Note also that this is an R solution to this problem.
In R you can read the images using the imager package.
library(imager)
library(dplyr)
library(tidyr)
library(stringr)
img <- imager::load.image("train/cat.0.jpg")
First, I'll shrink the image and standardize it to 100 x 100. This is not a mandatory step, although it is recommended to keep the problem manageable. I will also work with grayscale images instead of coloured ones, to reduce the dimensionality even further.
img <- imager::grayscale(img)
img <- imager::resize(img, 100, 100)
Now we have a 100 x 100 matrix in which each element represents the gray tone of a pixel.
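Just as a quick optional check: an imager image is stored as a 4-dimensional array (width, height, depth, colour channels), so after the two steps above the dimensions should look like this:
dim(img)
# 100 100 1 1  -> 100 x 100 pixels, 1 depth slice, 1 (gray) channel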
I prefer to represent the image as a data.frame in R, because it is easier to manipulate. So I use the following code.
img_df <- as.matrix(img) %>%
  data.frame() %>%                # columns X1 ... X100, one per pixel column
  mutate(x = 1:nrow(.)) %>%       # pixel row index
  gather(y, t, -x) %>%            # long format: one row per pixel
  mutate(y = extract_numeric(y))  # "X12" -> 12, the pixel column index
Here the image is represented in 3 columns of a data.frame. The first two, x and y, identify the position of the pixel. The third, t, represents the pixel's gray tone.
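If you want to peek at this structure, head(img_df) shows the first pixels:
head(img_df)  # columns: x, y (pixel position) and t (gray tone, in imager usually between 0 and 1)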
To feed a statistical model / machine-learning algorithm, we need a dataset in which each row is an observation (an individual, a sample unit) and each column is a characteristic observed for that individual.
So, to classify images of cats and dogs, we need a dataset in which each image is represented by one row and each pixel of the image is one column (the pixels are the information observed for each image). In addition, we will need a column indicating whether the image is of a cat or a dog, to train the algorithm / estimate its parameters. To convert the image to a single row I use the following command:
img_line <- img_df %>%
  mutate(colname = sprintf("x%03dy%03d", x, y)) %>%  # one column name per pixel
  select(-x, -y) %>%
  spread(colname, t)                                 # wide format: 1 row, 100 x 100 = 10,000 columns
If you wanted to consider the colour of the image in your model, at this step you would need to create a column for each pixel and each colour channel, i.e. 3 x 100 x 100 = 30,000, so each image would end up represented by a row with 30,000 columns.
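Just to sketch what this colour version could look like (I did not run it in the rest of the answer; it assumes imager::channel() to pull out one colour channel at a time and otherwise mirrors the code above):
# Sketch only: keep the 3 colour channels instead of converting to grayscale
img_cor <- imager::load.image("train/cat.0.jpg") %>%
  imager::resize(100, 100)
img_cor_line <- lapply(1:3, function(k){
  imager::channel(img_cor, k) %>%                            # channel k as a 100 x 100 image
    as.matrix() %>%
    data.frame() %>%
    mutate(x = 1:nrow(.)) %>%
    gather(y, t, -x) %>%
    mutate(y = extract_numeric(y),
           colname = sprintf("x%03dy%03dc%d", x, y, k)) %>%  # pixel position + channel in the name
    select(-x, -y) %>%
    spread(colname, t)
}) %>%
  dplyr::bind_cols()                                         # 1 row, 3 x 100 x 100 = 30,000 columns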
So far I have explained how to process a single image, but to train the algorithm several images are required. I will encapsulate the previous code in a function and use it to process a series of images.
processar <- function(path){
  img <- imager::load.image(path)   # read the image at the given path (instead of a fixed file)
  img <- imager::grayscale(img)
  img <- imager::resize(img, 100, 100)
  img_df <- as.matrix(img) %>%
    data.frame() %>%
    mutate(x = 1:nrow(.)) %>%
    gather(y, t, -x) %>%
    mutate(y = extract_numeric(y))
  img_line <- img_df %>%
    mutate(colname = sprintf("x%03dy%03d", x, y)) %>%
    select(-x, -y) %>%
    spread(colname, t)
  return(img_line)
}
For demonstration purposes, I'll get a sample of 100 dog images and 100 cat images for model training. In practice, many more images are needed.
arqs <- list.files("train", full.names = TRUE)
amostra_gato <- arqs[str_detect(arqs, "cat")] %>% sample(100)     # 100 cat images
amostra_cachorro <- arqs[str_detect(arqs, "dog")] %>% sample(100) # 100 dog images
amostra <- c(amostra_gato, amostra_cachorro)
bd <- plyr::ldply(amostra, processar)                             # one processed row per image
Y <- as.factor(rep(c("gato", "cachorro"), each = 100))            # response vector (cat, dog)
This step takes a long time and is computationally intensive: there is a lot of processing, and image files are heavy.
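If this processing step becomes a bottleneck, one possible workaround (just a sketch; mc.cores = 4 is an assumption, adjust it to your machine, and on Windows mclapply falls back to a single core) is to process the images in parallel instead of with plyr::ldply:
lista <- parallel::mclapply(amostra, processar, mc.cores = 4)  # process images in parallel
bd <- dplyr::bind_rows(lista)                                  # stack the 1-row data.frames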
From here on, any machine-learning algorithm could be used, since you have already transformed your images into a conventional dataset. Note that this also usually takes quite a while: on my computer, training with 200 images of 10,000 columns each took about 30 minutes.
I'm going to use a random forest to do the classification, but you could in fact fit any model.
m <- randomForest::randomForest(bd, Y, ntree = 100)
I will not go into the details of how the modelling should be done. The right thing would be to separate a training set and a test set, check that you are not overfitting, tune parameters using cross-validation, and so on. But that would make the answer very long, so I trained a random forest using the defaults of the R function (changing only the number of trees).
I also checked the error only on the training data (which is statistically wrong, but let's move on).
tabela <- table(predict(m, type = "class"), Y)  # confusion matrix
acerto <- sum(diag(tabela))/sum(tabela)         # proportion of correct classifications
acerto
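For reference, a minimal sketch of what a held-out evaluation could look like (the 150/50 split below is only illustrative):
set.seed(1)
idx_treino <- sample(seq_len(nrow(bd)), size = 150)             # 150 images for training
m2 <- randomForest::randomForest(bd[idx_treino, ], Y[idx_treino], ntree = 100)
pred <- predict(m2, newdata = bd[-idx_treino, ])                # predict the remaining 50
tabela_teste <- table(pred, Y[-idx_treino])                     # confusion matrix on held-out images
sum(diag(tabela_teste)) / sum(tabela_teste)                     # held-out accuracy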
With the trained model and a new processed image, use the following command to predict the category:
predict(m, newdata = img_line)
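For a brand-new image you can chain the two steps (the file name below is just an example):
nova_imagem <- processar("train/dog.1.jpg")  # example file, replace with your own image
predict(m, newdata = nova_imagem)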