How to make a frequency distribution table in Python? [closed]


Good afternoon,

One question: could anyone explain how I can build a frequency distribution table with classes, absolute and relative frequencies, cumulative frequencies, and the average value of each class?

    
asked by anonymous 19.01.2018 / 20:31

1 answer


Hi, I managed to build one. Here is an example using pandas; maybe it helps you.

Calculations needed to generate the table: the class width (h) is given by h = AT / k, where AT = max(x) - min(x) is the total range of the data and k = sqrt(n) is an estimated number of classes for a data set with n observations (k can also be computed by other rules, such as Sturges' rule).
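As a rough illustration of the formulas above, the sketch below computes k with the square-root rule and with Sturges' rule (k = 1 + log2(n)), then the class width h = AT / k. The helper names and the sample values are hypothetical, and rounding k up to a whole number of classes is an assumption made here, not something the formulas require.

```python
import math

def class_count_sqrt(n):
    """Square-root rule: k = sqrt(n), rounded up to a whole number (assumption)."""
    return math.ceil(math.sqrt(n))

def class_count_sturges(n):
    """Sturges' rule: k = 1 + log2(n), rounded up to a whole number (assumption)."""
    return math.ceil(1 + math.log2(n))

def class_width(values, k):
    """Class width h = AT / k, where AT = max(x) - min(x)."""
    at = max(values) - min(values)
    return at / k

# Hypothetical sample, for illustration only
x = [4.6, 5.0, 5.2, 6.1, 6.3, 7.0, 7.4, 7.8,
     8.1, 8.8, 9.0, 9.9, 10.2, 11.5, 12.0, 15.9]
n = len(x)

k_sqrt = class_count_sqrt(n)
k_sturges = class_count_sturges(n)
h = class_width(x, k_sqrt)
print(k_sqrt, k_sturges, h)
```

The two rules usually give different class counts, so it is worth trying both and keeping whichever table is easier to read.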

Creating the table, assuming you are also using a pandas DataFrame

1 - Sort the values

# Select the column (a pandas Series) and keep the sorted result
df = data['fixed acidity'].sort_values(ascending=True)

2 - Calculate the total range of the data

# Range of the data = largest value - smallest value
at = df.max() - df.min()

3 - Calculate the class width

import math

# k = square root of the total number of records/samples
k = math.sqrt(len(df))
# The class width can be rounded up to an integer, usually to make the table easier to read.
h = at / k
h = math.ceil(h)

4 - Generate the classes of the frequency table

frequencias = []

# Smallest value in the series
menor = round(df.min(), 1)

# Build one 'lower - upper' label per class, stepping by the class width h
valor = menor
while valor < df.max():
    frequencias.append('{} - {}'.format(round(valor, 1), round(valor + h, 1)))
    valor += h

5 - Frequency distribution:

# Bin the values into the fixed-width classes, labelled with the list created above
limites = [menor + i * h for i in range(len(frequencias) + 1)]
freq_abs = pd.cut(df, bins=limites, labels=frequencias, include_lowest=True)
print(freq_abs.value_counts(sort=False))
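The question also asks for relative and cumulative frequencies and a per-class value, which the steps above do not produce. A minimal sketch of the full table follows. It uses `numpy.histogram` for the equal-width classes rather than the pandas binning shown above, and it reads "average value of each class" as the class midpoint. The function name `frequency_table`, the column names, and the sample data are all hypothetical.

```python
import numpy as np
import pandas as pd

def frequency_table(values, k):
    """Frequency table with k equal-width classes between min and max."""
    values = np.asarray(values, dtype=float)
    # counts per class and the k + 1 class limits; the last class includes the maximum
    counts, edges = np.histogram(values, bins=k)
    lower, upper = edges[:-1], edges[1:]

    table = pd.DataFrame({
        "class": [f"{a:.2f} - {b:.2f}" for a, b in zip(lower, upper)],
        "midpoint": (lower + upper) / 2,   # assumed meaning of "average value of each class"
        "abs_freq": counts,
    })
    table["rel_freq"] = table["abs_freq"] / table["abs_freq"].sum()
    table["cum_abs_freq"] = table["abs_freq"].cumsum()
    table["cum_rel_freq"] = table["rel_freq"].cumsum()
    return table

# Hypothetical sample, for illustration only
x = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]
table = frequency_table(x, k=4)
print(table)
```

With a real column you would pass something like `data['fixed acidity']` and the k chosen earlier. Note that every class here is closed on the left and open on the right, except the last one, which also includes its upper limit.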

Reference for the calculations and some of the examples used: link

    
04.01.2019 / 03:44