Max of a numeric field returning NA

5

I'm starting to learn R and I came across a situation I do not understand. I downloaded ENEM 2014 data (CSV file) and read it using:

dados_enem <- read.csv(file="MICRODADOS_ENEM_2014.csv", header = TRUE, sep = ",")

When I calculate the maximum, minimum, or mean of a given numeric field, it works perfectly. For example, with the field NU_NOTA_REDACAO:

max(dados_enem$NU_NOTA_REDACAO)  
min(dados_enem$NU_NOTA_REDACAO)  
mean(dados_enem$NU_NOTA_REDACAO)

    > max(dados_enem$NU_NOTA_REDACAO)  
    [1] 1000  
    > min(dados_enem$NU_NOTA_REDACAO)  
    [1] 0  
    > mean(dados_enem$NU_NOTA_REDACAO)  
    [1] 323.4219 

However, when I do the same for the fields NOTA_CN or NOTA_CH, both in the same format as NU_NOTA_REDACAO, I get NA:

max(dados_enem$NOTA_CN)  
min(dados_enem$NOTA_CN)  
mean(dados_enem$NOTA_CN) 
  

    > max(dados_enem$NOTA_CN)
    [1] NA
    > min(dados_enem$NOTA_CN)
    [1] NA
    > mean(dados_enem$NOTA_CN)
    [1] NA

I tried to force the conversion to numeric, but the result was the same:

  

    > dados_enem$NOTA_CN <- as.numeric(as.character(dados_enem$NOTA_CN))
    > max(dados_enem$NOTA_CN)
    [1] NA

The file is quite large (almost 9 million records and 166 columns), but here is a sample of the data in this column:

[4513]    NA    NA 462.1 483.1 541.7    NA 527.8    NA    NA 456.9 639.5 527.9 535.1    NA    NA    NA  
 [4529] 505.7 389.3 391.7 764.9 527.5 459.3 481.1    NA 438.7 609.3 591.8 438.3 538.2    NA 493.5    NA  
 [4545]    NA 396.8    NA 486.3 566.1    NA    NA    NA 529.8 620.5 477.0 404.4 446.2 547.4    NA 460.5  
 [4561]    NA    NA 541.8    NA    NA 544.2 605.2 584.5    NA    NA 523.2 541.7    NA 523.1 528.7    NA  

What am I doing wrong?

Thank you all!

    
asked by anonymous 25.02.2016 / 14:56

1 answer

3

Try the following:

max(dados_enem$NOTA_CN, na.rm = TRUE)  
min(dados_enem$NOTA_CN, na.rm = TRUE)  
mean(dados_enem$NOTA_CN, na.rm = TRUE)

By default, these functions return NA when the vector contains any NA values. You need to state explicitly, with na.rm = TRUE, that you want the NA values removed before the result is computed.
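As a minimal sketch of this behaviour, the snippet below uses a small made-up vector instead of the full ENEM file. The name `notas` and the other variable names are hypothetical, and the values are just a few taken from the sample shown in the question:

```r
# Hypothetical small vector standing in for dados_enem$NOTA_CN
notas <- c(NA, 462.1, 483.1, 541.7, NA, 527.8)

# Without na.rm, a single NA is enough to make the result NA
max(notas)

# With na.rm = TRUE, the NA values are dropped before computing
max_nota  <- max(notas, na.rm = TRUE)
min_nota  <- min(notas, na.rm = TRUE)
mean_nota <- mean(notas, na.rm = TRUE)

# Counting the missing values shows how many records are being dropped
n_faltando <- sum(is.na(notas))
```

Checking `sum(is.na(...))` on the real column before using na.rm = TRUE is probably worthwhile, since the sample suggests a large share of the records have no score.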

This confuses many people who are starting out in R, since the functions do not treat NA consistently. For example, by default the summary function computes its statistics without the NA values (reporting their count separately), and the table function leaves them out of its counts.
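The difference can be seen with a short sketch on a made-up vector (the names `x`, `resumo`, `contagem` and `contagem_com_na` are hypothetical and used only for illustration):

```r
# Hypothetical vector with two missing values
x <- c(1, 2, 2, NA, 3, NA)

# summary() computes its statistics on the non-missing values
# and reports the number of NA's as a separate entry
resumo <- summary(x)

# table() leaves NA out of the counts by default...
contagem <- table(x)

# ...unless useNA is set explicitly
contagem_com_na <- table(x, useNA = "ifany")
```

Here `contagem` counts only the four non-missing values, while `contagem_com_na` also includes a category for the two NA entries.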

    
25.02.2016 / 15:56