Question about creating a machine learning model in Python

I have a question about creating my machine learning model. I want to create a model that predicts PSS_Stress:

import pandas as pd
from sklearn import model_selection
from sklearn.tree import DecisionTreeClassifier

columns = "ExamID;FinalGrade;PSS_Stress;StudyID;TotalQuestions;avg_durationperquestion;avg_tbd;decision_time_efficiency;good_decision_time_efficiency;maxduration;median_tbd;minduration;num_decisions_made;question_enter_count;ratio_decisions;ratio_good_decisions;totalduration;variance_tbd".split(";")
data = pd.read_csv("dataset.csv")
df = pd.DataFrame(data, columns=columns)
dfimp = df.fillna(df.mean())  # fill missing values with each column's mean

X = dfimp.drop(['PSS_Stress'], axis=1)  # features: every column except the target
Y = dfimp['PSS_Stress']  # target
validation_size = 0.20
seed = 7
# pass the seed so the train/validation split is reproducible
X_train, X_validation, Y_train, Y_validation = model_selection.train_test_split(X, Y, test_size=validation_size, random_state=seed)

cart = DecisionTreeClassifier()
cart.fit(X_train, Y_train)
score = cart.score(X_validation, Y_validation)  # mean accuracy on the validation split
print(score)

My question is about the variable X. Should it contain all the features of my dataset, or all the features except my target variable (in this case PSS_Stress), as I did in the code above?

    
asked by anonymous 30.11.2018 / 16:41

1 answer

X_train will have all the variables needed to predict the value of Y_train. If Y_train has only the PSS_Stress column, then X_train will have all the other columns of your dataset, except PSS_Stress.

After all, it does not make sense to use PSS_Stress to predict itself.
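
To make the split concrete, here is a minimal sketch on a tiny made-up DataFrame. The helper name split_features_target and the toy values are my own assumptions for illustration, not part of your dataset or of any library; only DataFrame.drop and column selection are real pandas calls.

```python
import pandas as pd


def split_features_target(df, target):
    """Hypothetical helper: X gets every column except `target`, y gets only `target`."""
    X = df.drop(columns=[target])  # features: all columns but the target
    y = df[target]                 # target: the single column to predict
    return X, y


# Toy data with made-up values, reusing three column names from the question.
toy = pd.DataFrame({
    "FinalGrade": [12, 15, 9, 17],
    "totalduration": [310.0, 295.5, 402.1, 288.0],
    "PSS_Stress": [21, 14, 30, 11],
})

X, y = split_features_target(toy, "PSS_Stress")
print(list(X.columns))  # ['FinalGrade', 'totalduration']
print(y.name)           # PSS_Stress
```

The same X and y can then be passed to train_test_split exactly as in the question; the target never appears among the features, so the model cannot simply read the answer from its input.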

    
30.11.2018 / 18:36