Questions tagged 'data-science'

2 answers

Pandas DataFrame - Calculate a column based on others

I have a dataframe in the following format: colunas = [ 'COMEDY', 'CRIME', 'Classe Prevista' ] precisao_df = pd.DataFrame(columns=colunas) precisao_df['COMEDY'] = y_pred_proba[:,0] precisao_df['CRIME'] = y_pred_proba[:,1] precisao...
asked by 31.10.2018 / 05:31
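The excerpt is cut off, so the intended formula for 'Classe Prevista' isn't visible. A common case with `predict_proba` output is filling that column with the name of the most probable class. A minimal sketch under that assumption, with made-up probabilities standing in for the real `y_pred_proba`:

```python
import numpy as np
import pandas as pd

# Made-up stand-in for a classifier's predict_proba() output:
# column 0 = probability of COMEDY, column 1 = probability of CRIME.
y_pred_proba = np.array([[0.8, 0.2],
                         [0.3, 0.7],
                         [0.55, 0.45]])

precisao_df = pd.DataFrame({
    'COMEDY': y_pred_proba[:, 0],
    'CRIME': y_pred_proba[:, 1],
})

# Derive the new column from the others: for each row, idxmax(axis=1)
# returns the label of the column holding the largest value.
precisao_df['Classe Prevista'] = precisao_df[['COMEDY', 'CRIME']].idxmax(axis=1)

print(precisao_df)
```

With only two classes, `np.where(precisao_df['COMEDY'] >= precisao_df['CRIME'], 'COMEDY', 'CRIME')` gives the same labels; `idxmax` has the advantage of scaling to any number of class columns.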
0 answers

What is the difference between a data warehouse and a data lake?

Given the concepts of a data warehouse and a data lake, what notable differences can we cite between them?
asked by 04.12.2018 / 23:42
0 answers

Random forest with very high accuracy

I'm working with this dataset and applied a random forest to build a pricing model, but the model's accuracy is suspiciously high, so I suspect something is wrong. The training and test sets appear to be different, so it shouldn't give such a high a...
asked by 14.11.2018 / 14:08
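The excerpt doesn't show the data or code, so nothing here says what is wrong in that particular model. One common cause of a suspiciously high test score is target leakage, where a feature is effectively derived from the value being predicted. A minimal sketch, using synthetic data from `make_regression` as a stand-in for the pricing dataset, compares train and test scores with and without a leaked feature:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic regression data standing in for the (unseen) pricing dataset.
X, y = make_regression(n_samples=500, n_features=5, noise=20.0, random_state=0)

def train_test_r2(X, y):
    """Fit a random forest and return (train R^2, test R^2)."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(X_tr, y_tr)
    # For regressors, .score() returns R^2, not classification accuracy.
    return model.score(X_tr, y_tr), model.score(X_te, y_te)

clean_train, clean_test = train_test_r2(X, y)

# Simulated leakage: an extra column that is almost a copy of the target.
rng = np.random.default_rng(0)
leak = y + rng.normal(scale=1.0, size=y.shape)
X_leaky = np.column_stack([X, leak])
leaky_train, leaky_test = train_test_r2(X_leaky, y)

print(f"clean: train={clean_train:.3f} test={clean_test:.3f}")
print(f"leaky: train={leaky_train:.3f} test={leaky_test:.3f}")
```

A near-perfect test score that drops sharply once a single column is removed is a typical sign of this kind of leakage; a large gap between train and test scores points instead to overfitting.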
0 answers

What is the difference between Train Test Split and Holdout?

From what I've researched, both split the dataset into two subsets, one for training and one for testing. Is there any difference between them?
asked by 14.11.2018 / 15:23
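In common usage, holdout names the evaluation method (set part of the data aside once, train on the rest, evaluate on the held-out part), while scikit-learn's `train_test_split` is a function that performs such a split. A minimal sketch with made-up arrays does the split with `train_test_split` and by hand with a shuffled index permutation; `manual_holdout` is a hypothetical helper, and the two approaches pick different rows because they shuffle differently:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 20 samples with 2 features each.
X = np.arange(40).reshape(20, 2)
y = np.arange(20)

# Holdout via scikit-learn: 25% of the rows are held out for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

def manual_holdout(n_samples, test_fraction, seed=0):
    """Hypothetical helper: shuffle the row indices once and split them."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_test = int(round(test_fraction * n_samples))
    return idx[n_test:], idx[:n_test]  # (train indices, test indices)

train_idx, test_idx = manual_holdout(len(X), 0.25)
print(len(X_train), len(X_test))      # 15 5
print(len(train_idx), len(test_idx))  # 15 5
```

Both produce a single split; methods such as k-fold cross-validation differ in that they repeat the split several times and average the scores.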
0 answers

Bag of words in Python

I have a news dataset and want to split the articles into two classes. For this I thought of using bag of words, but I can't get it working with sklearn. I've tried the following: #Bag of words from sklearn.feature_extraction.text import CountVectorize...
asked by 31.10.2018 / 02:01
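The excerpt stops at the import, so the failing code isn't visible. As a minimal sketch of the usual approach, assuming two labeled classes and using a tiny made-up corpus, `CountVectorizer` builds the bag of words and a `MultinomialNB` classifier (a swapped-in choice; other sklearn classifiers that accept sparse input would also work) is trained on it:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny made-up corpus standing in for the news dataset
# (labels: 0 = sports, 1 = economy).
texts = [
    "team wins the championship final",
    "striker scores twice in derby",
    "coach praises defense after match",
    "central bank raises interest rates",
    "stock market falls as inflation rises",
    "government announces new tax policy",
]
labels = [0, 0, 0, 1, 1, 1]

# Bag of words on its own: one row per text, one column per vocabulary word,
# each cell holding how many times that word occurs in that text.
X = CountVectorizer().fit_transform(texts)
print(X.shape)  # (6, 31)

# Vectorizer + classifier in one pipeline, so new texts are vectorized
# with the vocabulary learned during fit.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)
preds = model.predict(["bank cuts interest rates", "team scores in final match"])
print(preds)  # [1 0]
```

Wrapping both steps in a pipeline avoids a frequent mistake: calling `fit_transform` again on the test texts, which builds a different vocabulary and yields feature columns that no longer match the trained model.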