ValueError in KFold of Scikit-learn: my dataset has two classes! What is going on?


I tried cross-validating a logistic regression using Scikit-learn. Here is the code:

    import numpy as np
    from sklearn.metrics import confusion_matrix
    from sklearn.model_selection import KFold

    kf = KFold(n_splits=5, random_state=None, shuffle=False)
    kf.get_n_splits(previsores)
    for train_index, test_index in kf.split(previsores):
        X_train, X_test = previsores[train_index], previsores[test_index]
        y_train, y_test = classe[train_index], classe[test_index]

        logmodel.fit(X_train, y_train)
        print(confusion_matrix(y_test, logmodel.predict(X_test)))

        lista_matrizes.append(confusion_matrix(y_test, logmodel.predict(X_test)))
    # print(f" Matriz de Confusão Média \n{np.mean(lista_matrizes, axis=0)}")
    print("Matriz de Confusão Média")
    print(np.mean(lista_matrizes, axis=0))

I'm getting the following error:

ValueError: This solver needs samples of at least 2 classes in the data, but the data contains only one class: 1

My dataset has two classes (0 and 1), but I still get the error above. What should I do?

    
asked by anonymous 10.10.2018 / 17:52

1 answer


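This can happen because the training portion of one of the k-fold splits contains samples of only one class. With shuffle=False, KFold cuts the data into consecutive blocks, so if your rows are ordered by class, or one class is rare, a split can end up with all the samples of one class in the test block and none in the training block. Take a look at the size of your dataset and the size of the folds, and check whether it's possible for a fold to grab only one class.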
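As a rough illustration of the point above, the sketch below uses a made-up toy label array (it is not the asker's data) that is sorted by class, and lists which classes land in the training part of each split. It also tries StratifiedKFold, which is one common way to keep both classes in every fold; whether that fits the original dataset is an assumption. The helper classes_per_training_fold is hypothetical and exists only for this example.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Hypothetical toy labels, sorted by class: 2 zeros followed by 8 ones.
y = np.array([0, 0] + [1] * 8)
X = np.arange(len(y)).reshape(-1, 1)


def classes_per_training_fold(splitter, X, y):
    """Return the set of classes present in the training part of each split."""
    return [set(y[train_idx].tolist()) for train_idx, _ in splitter.split(X, y)]


# Unshuffled KFold: the first test block takes both zeros,
# so the matching training block contains only class 1.
plain = classes_per_training_fold(KFold(n_splits=5, shuffle=False), X, y)

# StratifiedKFold preserves the class proportions in every fold.
# n_splits=2 because the toy data has only two samples of class 0.
stratified = classes_per_training_fold(
    StratifiedKFold(n_splits=2, shuffle=True, random_state=0), X, y
)

print("KFold training classes:          ", plain)
print("StratifiedKFold training classes:", stratified)
```

In the question's loop, this would mean passing the labels to the splitter as well, for example skf.split(previsores, classe), since StratifiedKFold needs them to balance the folds. Setting shuffle=True on KFold also makes a single-class fold less likely, but does not rule it out when one class is very rare.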

    
05.11.2018 / 23:43