I'll walk through the analysis step by step, so that you (or anyone else hitting the same problem) can follow how to work through it.
First, I'm going to generate two vectors, target and predicted, which will simulate the result of your classification. These vectors were built from the numbers you posted. The classification_report says you have 56000 samples of class 0 and 119341 of class 1 in your data, so I'm going to generate a vector with 56000 zeros followed by 119341 ones.
import numpy as np
class0 = 56000
class1 = 119341
total = class0 + class1
target = np.zeros(total, dtype=int)
target[class0:] = np.ones(class1, dtype=int)
# to prove that the values are right
sum(target == 0) == class0, sum(target == 1) == class1
With this, you have the target vector, containing the labels your classifier should have predicted. Now let's generate predicted, which holds what your classifier actually reported. These numbers were taken from your confusion matrix.
class0_hit = 52624   # how many of class 0 it got right
class0_miss = 3376   # how many of class 0 it got wrong
class1_miss = 45307  # how many of class 1 it got wrong
class1_hit = 74034   # how many of class 1 it got right
predicted = np.zeros(total, dtype=int)
predicted[class0_hit:class0_hit + class0_miss + class1_hit] = np.ones(class0_miss + class1_hit, dtype=int)
# to prove that the values are right
sum(predicted == 0) == class0_hit + class1_miss, sum(predicted == 1) == class0_miss + class1_hit
Now we can look at sklearn's classification report and see what it tells us about these values:
from sklearn.metrics import classification_report
print(classification_report(target, predicted))
             precision    recall  f1-score   support

          0       0.54      0.94      0.68     56000
          1       0.96      0.62      0.75    119341

avg / total       0.82      0.72      0.73    175341
This is exactly the classification report you pasted, so we have reached the same point as you.
Looking at the confusion matrix:
from sklearn.metrics import confusion_matrix
print(confusion_matrix(target, predicted))
[[52624  3376]
 [45307 74034]]
Still the same. Let's look at what the accuracy says:
from sklearn.metrics import accuracy_score
accuracy_score(target, predicted)
> 0.7223524446649672
It returns 72%, the same as the classification report. So why does your calculation give 51% accuracy? In your calculation you have this:
(TP + TN)/total
(74034 + 52624)/(52624 + 74034 + 45307 + 74034)*100 = 51%
If you look closely, the value 74034 appears twice: the second occurrence in the denominator should be 3376, so the denominator does not add up to the total number of samples. Doing the math with the values defined in the code, it looks like this:
acc = (class0_hit + class1_hit) / total
> 0.7223524446649672
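The same number can be obtained straight from the confusion matrix itself, which is a less error-prone way of doing the calculation by hand. A minimal sketch, using the tn, fp, fn, tp unpacking that sklearn documents for binary confusion matrices:
from sklearn.metrics import confusion_matrix
tn, fp, fn, tp = confusion_matrix(target, predicted).ravel()
# each count appears exactly once in the denominator
acc = (tp + tn) / (tp + tn + fp + fn)
print(acc)
> 0.7223524446649672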
That matches the value of accuracy_score. The calculations of precision and recall are right:
from sklearn.metrics import precision_score
precision_score(target, predicted)
> 0.9563880635576799
from sklearn.metrics import recall_score
recall_score(target, predicted)
> 0.6203567927200208
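For reference, both numbers can also be reproduced by hand from the counts defined earlier. This is just a sanity-check sketch, with the F1 formula added to close the loop:
precision_by_hand = class1_hit / (class1_hit + class0_miss)  # 74034 / 77410
recall_by_hand = class1_hit / (class1_hit + class1_miss)     # 74034 / 119341
f1_by_hand = 2 * precision_by_hand * recall_by_hand / (precision_by_hand + recall_by_hand)
# roughly (0.9564, 0.6204, 0.7526): the same precision and recall as above,
# and the f1-score shown for class 1 in the report
print(precision_by_hand, recall_by_hand, f1_by_hand)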
But why, then, is classification_report returning those odd values at the end? The answer is simple, and it is in its documentation:
The reported averages are the prevalence-weighted macro-average
across classes (equivalent to precision_recall_fscore_support
with average = 'weighted').
That is, it does not do a simple average; it takes the number of samples of each class into account when computing the average.
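To make that concrete, here is a small sketch reproducing the 0.82 precision of the avg / total row by weighting each class's precision by its support (the class counts defined at the top):
prec_class0 = class0_hit / (class0_hit + class1_miss)  # ~0.54, precision of class 0
prec_class1 = class1_hit / (class1_hit + class0_miss)  # ~0.96, precision of class 1
weighted_prec = (prec_class0 * class0 + prec_class1 * class1) / total
print(weighted_prec)  # ~0.8226, the weighted precision reported above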
Let's take a look at this precision_recall_fscore_support method. It has a parameter called average, which controls how the per-class metrics are aggregated. Running it with the same setting classification_report uses, we get the same result:
from sklearn.metrics import precision_recall_fscore_support
precision_recall_fscore_support(target, predicted, average='weighted')
> (0.8225591977440773, 0.7223524446649672, 0.7305824989909749, None)
Now, since your classification has only two classes, the right thing is to ask for the binary average. Changing the average parameter to 'binary', we have:
precision_recall_fscore_support(target, predicted, average='binary')
> (0.9563880635576799, 0.6203567927200208, 0.75256542533456, None)
Which is exactly the result we get using sklearn's own functions or doing the calculation by hand.
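If you want to see the per-class numbers that feed the report, the same function also accepts average=None and then returns one value per class. This is just another way to double-check the table above, using the same data:
precision_recall_fscore_support(target, predicted, average=None)
# returns four arrays (precision, recall, f1-score, support), one entry per class:
# roughly (0.54, 0.96), (0.94, 0.62), (0.68, 0.75) and (56000, 119341),
# i.e. exactly the rows of classification_report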