Good afternoon,
One question: could anyone explain how I can build a frequency distribution table with: classes; absolute and relative frequencies; cumulative frequencies; and the average value of each class?
Hi, I managed to build one. Here is my example using pandas; hopefully it helps!
Calculations needed to build the table: the class width h is given by h = AT / k, where AT = max(x) - min(x) is the total range of the data and k = √n is an estimate of the number of classes for a data set with n observations (k can also be computed with other rules, such as Sturges' rule).
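A quick way to compare the square-root rule with Sturges' rule (k = 1 + log2(n)) is sketched below in plain Python. The function name `class_width` and the sample values are made up for illustration. k is left unrounded here so that h = AT / k matches the formula above; in practice k (especially Sturges' k) is often rounded up to a whole number.

```python
import math

def class_width(values, rule="sqrt"):
    """Return (k, h): estimated number of classes and class width.

    rule="sqrt"    -> k = sqrt(n)
    rule="sturges" -> k = 1 + log2(n)  (Sturges' rule)
    """
    n = len(values)
    at = max(values) - min(values)  # total range AT = max(x) - min(x)
    if rule == "sqrt":
        k = math.sqrt(n)
    elif rule == "sturges":
        k = 1 + math.log2(n)
    else:
        raise ValueError(f"unknown rule: {rule}")
    h = at / k
    return k, h

# Hypothetical sample: n = 16, AT = 25 - 2 = 23
sample = [2, 4, 4, 5, 7, 8, 9, 10, 12, 13, 15, 16, 18, 20, 21, 25]
print(class_width(sample, "sqrt"))     # k = 4.0, h = 23 / 4 = 5.75
print(class_width(sample, "sturges"))  # k = 5.0, h = 23 / 5 = 4.6
```

With few observations the two rules give similar k; for large n the square-root rule produces many more classes than Sturges' rule.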
Creating the table, assuming you are also working with a pandas DataFrame:
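1 - Sort the values
import math
import pandas as pd
# `data` is your DataFrame; df is its 'fixed acidity' column (a pandas Series)
# sort_values returns a new sorted Series, so the result must be assigned
df = data['fixed acidity'].sort_values(ascending=True)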
2 - Calculate the total range of the data
# Data range = largest value - smallest value
at = df.max() - df.min()
3 - Calculate the class width
# Remember that k = square root of the total number of records/samples
k = math.sqrt(len(df))
# The class width can be rounded up to an integer, usually to make the table easier to read.
h = at/k
h = math.ceil(h)
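4 - Generate the class intervals (labels)
frequencias = []
# Smallest value in the series
menor = round(df.min(), 1)
# Smallest value plus the class width
menor_amp = round(menor + h, 1)
valor = menor
while valor < df.max():
    frequencias.append('{} - {}'.format(round(valor, 1), round(valor + h, 1)))
    valor += h
5 - Frequency distribution:
# Discretize the values into classes of width h, labeled with the list created above.
# pd.cut is used because pd.qcut builds equal-count (quantile) bins that do not match these labels.
limites = [menor + i * h for i in range(len(frequencias) + 1)]
limites[0] = min(limites[0], df.min())  # rounding `menor` up could otherwise leave the minimum out
freq_abs = pd.cut(df, bins=limites, labels=frequencias, include_lowest=True)
print(freq_abs.value_counts(sort=False))  # counts in class order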
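It is worth keeping in mind that pd.cut and pd.qcut discretize differently: pd.cut splits by value (equal-width intervals or the edges you pass), while pd.qcut splits by quantiles so each bin holds roughly the same number of observations. For a class table with a fixed width h, the edges belong in pd.cut. A small illustration with made-up, skewed values:

```python
import pandas as pd

# Made-up, skewed data purely for illustration
s = pd.Series([1, 1, 2, 2, 2, 3, 3, 4, 9, 10])

# Equal-width classes of width 3 starting at 1: [1, 4], (4, 7], (7, 10]
edges = [1, 4, 7, 10]
by_value = pd.cut(s, bins=edges, include_lowest=True)
print(by_value.value_counts(sort=False).tolist())  # [8, 0, 2]

# Three quantile bins: edges are chosen so counts are roughly balanced
by_quantile = pd.qcut(s, q=3)
print(by_quantile.value_counts(sort=False).tolist())
```

The quantile version produces intervals of different widths, which is why its counts cannot be labeled with equal-width class ranges.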
Reference for the calculations and some of the examples used: link
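The question also asks for relative and cumulative frequencies and the average of each class, which the steps above stop short of. Below is a minimal sketch of the full table, assuming the same h = ceil(AT / √n) rule; the function `frequency_table` and its column names are hypothetical, chosen only for this example.

```python
import math
import pandas as pd

def frequency_table(values, h=None):
    """Frequency distribution table with equal-width classes.

    If h is None, the class width is ceil(AT / sqrt(n)), as in step 3 above.
    """
    s = pd.Series(values, dtype=float).sort_values()
    at = s.max() - s.min()  # total range AT
    if h is None:
        # "or 1" avoids a zero width when all values are equal
        h = math.ceil(at / math.sqrt(len(s))) or 1
    n_classes = max(1, math.ceil(at / h))
    edges = [s.min() + i * h for i in range(n_classes + 1)]
    edges[-1] = max(edges[-1], s.max())  # guard against float round-off at the top

    # Equal-width classes; include_lowest keeps the minimum in the first class
    classes = pd.cut(s, bins=edges, include_lowest=True)
    grouped = s.groupby(classes, observed=False)  # keep empty classes too

    table = pd.DataFrame({
        "abs_freq": grouped.size(),
        "class_mean": grouped.mean(),  # mean of the observations in each class
    })
    table["rel_freq"] = table["abs_freq"] / len(s)
    table["cum_abs_freq"] = table["abs_freq"].cumsum()
    table["cum_rel_freq"] = table["rel_freq"].cumsum()
    # Class midpoint computed from the edges themselves
    table["midpoint"] = [(edges[i] + edges[i + 1]) / 2 for i in range(n_classes)]
    return table

# Hypothetical sample: n = 16, AT = 23, so h = ceil(23 / 4) = 6
values = [2, 4, 4, 5, 7, 8, 9, 10, 12, 13, 15, 16, 18, 20, 21, 25]
table = frequency_table(values)
print(table)
```

Textbooks differ on whether "average value of each class" means the class midpoint or the mean of the observations that fall in it, so both columns are included. With the real data you would call it as `frequency_table(data['fixed acidity'])`.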