I'm starting to learn R and I came across a situation I do not understand. I downloaded ENEM 2014 data (CSV file) and read it using:
dados_enem <- read.csv(file="MICRODADOS_ENEM_2014.csv", header = TRUE, sep = ",")
When I ask R to calculate the maximum, minimum, or mean of a given numeric field, it works as expected. For example, for the field NU_NOTA_REDACAO:
max(dados_enem$NU_NOTA_REDACAO)
min(dados_enem$NU_NOTA_REDACAO)
mean(dados_enem$NU_NOTA_REDACAO)
> max(dados_enem$NU_NOTA_REDACAO)
[1] 1000
> min(dados_enem$NU_NOTA_REDACAO)
[1] 0
> mean(dados_enem$NU_NOTA_REDACAO)
[1] 323.4219
However, when I do the same for the fields NOTA_CN or NOTA_CH, which have the same format as NU_NOTA_REDACAO, I get NA:
max(dados_enem$NOTA_CN)
min(dados_enem$NOTA_CN)
mean(dados_enem$NOTA_CN)
> max(dados_enem$NOTA_CN)
[1] NA
> min(dados_enem$NOTA_CN)
[1] NA
> mean(dados_enem$NOTA_CN)
[1] NA
I tried to force the conversion to numeric, but the result was the same:
dados_enem$NOTA_CN <- as.numeric(as.character(dados_enem$NOTA_CN))
> max(dados_enem$NOTA_CN)
[1] NA
The file is quite large (almost 9 million records and 166 columns), but here is a sample of the data in this column:
[4513] NA NA 462.1 483.1 541.7 NA 527.8 NA NA 456.9 639.5 527.9 535.1 NA NA NA
[4529] 505.7 389.3 391.7 764.9 527.5 459.3 481.1 NA 438.7 609.3 591.8 438.3 538.2 NA 493.5 NA
[4545] NA 396.8 NA 486.3 566.1 NA NA NA 529.8 620.5 477.0 404.4 446.2 547.4 NA 460.5
[4561] NA NA 541.8 NA NA 544.2 605.2 584.5 NA NA 523.2 541.7 NA 523.1 528.7 NA
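In case it helps, the behaviour seems to be reproducible without the full file. The following is only a minimal sketch: `notas` is a hypothetical name I made up, and the values are the first few copied from the sample above, so it may not capture everything about the real column.

```r
# Hypothetical small vector; values copied from the start of the NOTA_CN sample above
notas <- c(NA, NA, 462.1, 483.1, 541.7, NA, 527.8)

is.numeric(notas)   # TRUE, so the column type does not seem to be the issue

max(notas)          # NA, same as with the full column
min(notas)          # NA
mean(notas)         # NA

# Count how many entries in the vector are missing
sum(is.na(notas))   # 3
```

So the result is NA even though the vector is numeric, which makes me think the missing entries are involved rather than the conversion.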
What am I doing wrong?
Thank you all!