I'm running a regression where I have 3 parameters and a column with categories.
Since sklearn does not handle categorical columns directly, I convert them into dummies: I create one column per category and fill it with 1 if the row belongs to that category and 0 otherwise.
from sklearn import preprocessing
myEncoder = preprocessing.OneHotEncoder()
myEncoder.fit(df_c_f[['segment_id']])
dummies = myEncoder.transform(df_c_f[['segment_id']]).toarray()
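As a toy illustration of what the dummies look like, here is a small sketch in plain NumPy. It is not the sklearn call above; the labels are made up and `one_hot` is a hypothetical name used only for this example:

```python
import numpy as np

# Hypothetical segment labels for four rows (made-up data).
segment_id = np.array(["a", "b", "a", "c"])

# categories: sorted unique labels; codes: position of each row's label in categories.
categories, codes = np.unique(segment_id, return_inverse=True)

# Row k of the identity matrix is the dummy vector for category k,
# so indexing with codes yields one dummy row per input row.
one_hot = np.eye(len(categories))[codes]
print(one_hot.shape)  # (4, 3): 4 rows, one column per category
```

Each row has a single 1 in the column of its own category and 0 everywhere else.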
So my array, which initially had n rows and 4 columns, now has 3 parameter columns plus c dummy columns, one per category.
My question is how to multiply each of my first 3 columns by every dummy column, so that I end up with n rows and 3 * c columns.
I ran the following code to do this, but it only works for small arrays; with anything slightly larger the code hangs:
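import numpy as np

def itera_parametros_e_dummies(matrix1, matrix2):
    # Build the result locally so repeated calls do not keep appending to a global list.
    matrix = []
    print(len(matrix1))
    if len(matrix1) != len(matrix2):
        print("matrices have different sizes")
    else:
        for i in range(len(matrix1)):
            # Multiply row i of matrix1 by row i of matrix2 and keep the resulting row.
            matrix.append(np.dot(matrix1[i:i+1], matrix2[i:i+1])[0])
    return matrix

itera_parametros_e_dummies(log_orgc_traf, df_dummies)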
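For reference, the target shape described above (n rows and 3 * c columns) might be reachable without a Python loop by using NumPy broadcasting. This is only a minimal sketch, assuming both inputs are dense arrays with the same number of rows; `interact_with_dummies` and the toy arrays are hypothetical names, not part of my code:

```python
import numpy as np

def interact_with_dummies(params, dummies):
    """Multiply every parameter column by every dummy column, row by row."""
    params = np.asarray(params, dtype=float)
    dummies = np.asarray(dummies, dtype=float)
    if params.ndim == 1:
        # Treat a single parameter column as an (n, 1) array.
        params = params[:, None]
    if params.shape[0] != dummies.shape[0]:
        raise ValueError("params and dummies must have the same number of rows")
    n, p = params.shape
    c = dummies.shape[1]
    # Broadcasting: (n, p, 1) * (n, 1, c) -> (n, p, c)
    products = params[:, :, None] * dummies[:, None, :]
    # Flatten the last two axes: columns are grouped by parameter, then by category.
    return products.reshape(n, p * c)

# Toy data (made up): 2 rows, 3 parameters, 2 categories.
toy_params = np.array([[1.0, 2.0, 3.0],
                       [4.0, 5.0, 6.0]])
toy_dummies = np.array([[1.0, 0.0],
                        [0.0, 1.0]])
result = interact_with_dummies(toy_params, toy_dummies)
print(result.shape)  # (2, 6)
```

Note that the dense result has n * 3 * c entries, so for a large number of categories it may still be heavy on memory even though the per-row loop is gone.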