N-fold cross-validation

         W1        W2        W3        W4  A/N
0  0.543405  0.278369  0.424518  0.844776    1
1  0.121569  0.670749  0.825853  0.136707    1
2  0.891322  0.209202  0.185328  0.108377    1
3  0.978624  0.811683  0.171941  0.816225    0
4  0.431704  0.940030  0.817649  0.336112    0
5  0.372832  0.005689  0.252426  0.795663    0
6  0.598843  0.603805  0.105148  0.381943    1
7  0.890412  0.980921  0.059942  0.890546    1
8  0.742480  0.630184  0.581842  0.020439    1
9  0.544685  0.769115  0.250695  0.285896    1

I'm trying to split this data with k-fold:

    kf = KFold(len(df), n_folds=10)

Now I'm trying to store the samples of each fold:

for train,test in kf:
    xtr = X[col][train]   # where col = W1, W2, W3, W4
    ytr = X['A/N'][train]
    xtest = X[col][test]
    ytest = X['A/N'][test]

The problem: I can only store one column at a time. When I try to store W1 and W2 together, an index error is raised; that is, only X[col[i]][train] works.

    
asked by anonymous 31.10.2017 / 10:40

1 answer


You are not accessing the DataFrame elements correctly. Chained indexing of the form X[][] is not recommended, because pandas evaluates it as (X[])[].

In your example you are doing X[['W1', 'W2']][2], which is understood as "create a new DataFrame with the columns W1 and W2 of X, then select column 2 of that new DataFrame". See also Indexing.
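To make the difference concrete, here is a small sketch on a made-up frame. The values and the `rows` array are illustrative assumptions, not data from the question. The chained form treats the integers as column labels and fails, while a single `.loc` call selects rows and columns in one step.

```python
import numpy as np
import pandas as pd

# Toy frame with the question's column names; the values are made up.
X = pd.DataFrame({
    'W1': [0.1, 0.2, 0.3, 0.4],
    'W2': [0.5, 0.6, 0.7, 0.8],
    'A/N': [1, 1, 0, 0],
})
cols = ['W1', 'W2']
rows = np.array([0, 2])  # stand-in for the train indices from KFold

# Chained indexing: X[cols] builds a new frame, then [rows] is read as a
# list of column labels, and no columns are named 0 or 2.
try:
    X[cols][rows]
    chained_ok = True
except (KeyError, IndexError):  # the exception type depends on the pandas version
    chained_ok = False

# One-step selection: row labels first, column labels second.
selected = X.loc[rows, cols]
print(chained_ok)
print(selected)
```

With the default integer index, row labels and row positions coincide, which is why `.loc` works here; on a frame with any other index, `X.iloc[rows][cols]` would be the positional equivalent.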

I also think it is better to split X and y outside the loop, because doing it inside the loop creates copies of the DataFrame on every iteration. I also recommend reading up on the difference between a view and a copy (View vs Copy).

I don't know which version of KFold you are using, but with sklearn.model_selection.KFold the code below performs the split correctly:

import pandas as pd
from sklearn.model_selection import KFold

df = pd.read_csv('kfold.csv')

X = df[['W1', 'W2', 'W3', 'W4']]
y = df['A/N']

kf = KFold(n_splits=10)
for train, test in kf.split(X):
    # split() yields positional indices, so select rows with .iloc
    xtr = X.iloc[train]
    ytr = y.iloc[train]
    xtest = X.iloc[test]
    ytest = y.iloc[test]
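Since the question was about storing the data for every fold, here is one possible way to do it, as a sketch only. It inlines the first four rows of the table from the question so that it runs without kfold.csv. The `folds` list and the choice of 4 splits are my own assumptions for illustration.

```python
import pandas as pd
from sklearn.model_selection import KFold

# First four rows of the question's table, inlined for a self-contained run.
df = pd.DataFrame({
    'W1': [0.543405, 0.121569, 0.891322, 0.978624],
    'W2': [0.278369, 0.670749, 0.209202, 0.811683],
    'W3': [0.424518, 0.825853, 0.185328, 0.171941],
    'W4': [0.844776, 0.136707, 0.108377, 0.816225],
    'A/N': [1, 1, 1, 0],
})

X = df[['W1', 'W2', 'W3', 'W4']]
y = df['A/N']

folds = []  # hypothetical container: one (xtr, ytr, xtest, ytest) tuple per fold
kf = KFold(n_splits=4)  # 4 splits only because this sample has 4 rows
for train, test in kf.split(X):
    # split() yields positional indices, so rows are selected with .iloc
    folds.append((X.iloc[train], y.iloc[train], X.iloc[test], y.iloc[test]))

for xtr, ytr, xtest, ytest in folds:
    # each fold here holds 3 training rows and 1 test row
    print(xtr.shape, xtest.shape)
```

Every entry in `folds` keeps all four feature columns together, so there is no need to loop over the columns one at a time.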
    
31.10.2017 / 16:31