How to normalize data? Which sklearn library?


I need to normalize data that I have to be between -1 and 1.

I used StandardScaler, but the range got larger.

What other sklearn class could I use? There are several in sklearn that should make life easier, but I don't think I know how to use them.

What I tried was:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_fwf('traco_treino.txt', header=None)
plt.plot(df)

The data is in the range -4 to 4.

After attempting to normalize:

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaler.fit(df)
dftrans = scaler.transform(df)
plt.plot(dftrans)

The data is between -10 and 10.

asked by anonymous 16.05.2018 / 00:03

1 answer


StandardScaler standardizes the data to zero mean and unit variance (var = 1), not to a fixed range, so the results differ from what you expected.
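To illustrate the point (this example is not from the original answer, and the data is made up): standardized values are unbounded, and an outlier easily pushes them past 1.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# A single column with one outlier
dados = np.array([[0.0], [1.0], [2.0], [100.0]])

z = StandardScaler().fit_transform(dados)

# The result has variance 1, but it is NOT confined to [-1, 1]:
# the outlier maps to a z-score greater than 1
print(z.std())  # 1.0
print(z.max())  # greater than 1
```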

To scale the data into the range (-1, 1), use MaxAbsScaler:

import numpy as np
from sklearn.preprocessing import MaxAbsScaler

# Define the data
dados = np.array([[0, 0], [300, -4], [400, 3.8], [1000, 0.5], [3000, 0]], dtype=np.float64)

dados
=> array([[  0.00000000e+00,   0.00000000e+00],
       [  3.00000000e+02,  -4.00000000e+00],
       [  4.00000000e+02,   3.80000000e+00],
       [  1.00000000e+03,   5.00000000e-01],
       [  3.00000000e+03,   0.00000000e+00]])

# Instantiate the MaxAbsScaler
p = MaxAbsScaler()

# Analyze the data and fit the scaler
p.fit(dados)
=> MaxAbsScaler(copy=True)

# Transform the data
print(p.transform(dados))
=> [[ 0.          0.        ]
 [ 0.1        -1.        ]
 [ 0.13333333  0.95      ]
 [ 0.33333333  0.125     ]
 [ 1.          0.        ]]

More information in the sklearn documentation or on Wikipedia: Feature scaling.

16.05.2018 / 02:10