Inconsistent numeric format

3

I am an experienced programmer in SAS, but a beginner in R. I am working with RStudio Version 0.99.903 - © 2009-2016 RStudio, Inc. and Windows 8. I have the following question:

  • The file "a_us" has 4 numeric and 2 alphanumeric fields as follows:

      

    str(a_us)  # command to show the data frame structure

    'data.frame': 1039992 obs. of  7 variables:
    $ 'dsSisOriginario'             : chr  "Construcard" "Construcard" "Construcard" "Construcard" ...  
    $ 'nrContrato'                  : chr "000002160000023630," "000002160000116565," "000002160000225267," ...  
    $ 'vlCredInadimplenciaLancadoCa': num  9570 4455 6791 2678 4483 ...  
    $ 'dtCredInadimplenciaEntradaCa': chr "03/11/2002" "17/10/2004" "25/03/2007" "15/12/2006" ...  
    $ 'vlCredFcvsCessao'            : num  271 216 329 130 217 ...  
    $ PercentPagoCarteira           : num  0.0283 0.0484 0.0484 0.0484 ...  
    $ QtdCredDiasAtraso             : int  5110 4396 3507 3607 2768 2407 2640 ...
    
  • Using summary (a_us), the result comes out as expected, that is, the statistics for the numeric variables are perfect.

  • However, when I try to compute, for example, the mean (mean) or apply any other quantitative procedure, such as hist(), to these same numeric variables ('vlCredInadimplenciaLancadoCa', 'vlCredFcvsCessao', PercentPagoCarteira, QtdCredDiasAtraso), it only works for PercentPagoCarteira and QtdCredDiasAtraso. For the others ('vlCredInadimplenciaLancadoCa', 'vlCredFcvsCessao'), I get the message:

    > mean(a_us$'vlCredFcvsCessao')
    [1] NA
    Warning message:
    In mean.default(a_us$vlCredFcvsCessao) :
      argumento não é numérico nem lógico: retornando NA
    

    Although the variable is numeric, I get this error message!

    Can anyone give me a hint of what's going on and how to solve it?

        
    asked by anonymous 11.10.2016 / 16:56

    2 answers

    1

    Because of the way your data was imported, some column names ended up with literal quotation marks in them. This prevents the $ operator from working the way you expect. The best fix is to re-import the data. But it is also possible to refer to the column like this:

    mean(a_us$`'vlCredFcvsCessao'`)
    

    Note the grave accent (backtick) that surrounds the quoted column name.

    See this simple example:

    > df <- dplyr::data_frame("'colunacomaspas'" = 1, colunasemaspas = 1)
    > str(df)
    Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   1 obs. of  2 variables:
     $ 'colunacomaspas': num 1
     $ colunasemaspas  : num 1
    > mean(df$`'colunacomaspas'`)
    [1] 1
    > mean(df$'colunacomaspas')
    [1] NA
    Warning messages:
    1: Unknown column 'colunacomaspas' 
    2: In mean.default(df$colunacomaspas) :
      argument is not numeric or logical: returning NA
    

    Note that str shows which column names contain quotation marks and which do not, both in this example and in the output you posted.
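A quick way to confirm this diagnosis is to print the column names directly. The sketch below uses a small made-up data frame (`demo`, a hypothetical stand-in for `a_us`) whose first column name contains literal single quotes; `check.names = FALSE` is there only so that `data.frame()` keeps the odd name.

```r
# Hypothetical stand-in for a_us: the first column name contains literal quotes
demo <- data.frame("'vlCredFcvsCessao'" = c(271, 216, 329),
                   PercentPagoCarteira = c(0.0283, 0.0484, 0.0484),
                   check.names = FALSE)

# names() shows the stray quote characters as part of the name itself
print(names(demo))

# No column is called vlCredFcvsCessao (without quotes), so $ returns NULL
missing_col <- demo$vlCredFcvsCessao
is.null(missing_col)

# [[ ]] with the exact name, quotes included, finds the column
found_col <- demo[["'vlCredFcvsCessao'"]]
mean(found_col)
```

Since `mean(NULL)` returns NA with the "argument is not numeric or logical" warning, this would explain the message in the question even though the column itself is numeric.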

    Another way to fix it would be to rename the columns by removing those quotes. Example:

    > names(df) <- gsub("'", "", names(df))
    > mean(df$colunacomaspas)
    [1] 1
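For the re-import route mentioned earlier in this answer, here is a minimal sketch. It assumes the source is a CSV file whose header fields are wrapped in single quotes and that base `read.csv` is used; the question does not say how the data was actually read, so the inline `csv_text` is purely illustrative.

```r
# Assumed file contents: a CSV whose first header field is wrapped in single quotes
csv_text <- "'vlCredFcvsCessao',PercentPagoCarteira\n271,0.0283\n216,0.0484"

# The default quoting character is the double quote, so the single quotes
# stay in the name; check.names = FALSE stops R from rewriting that name
kept <- read.csv(text = csv_text, check.names = FALSE)
names(kept)

# Declaring the single quote as the quoting character strips it on import
clean <- read.csv(text = csv_text, quote = "'")
names(clean)
mean(clean$vlCredFcvsCessao)
```

With a real file, `text = csv_text` would be replaced by the file path.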
    
        
    11.10.2016 / 19:41
    0

    I will echo a complaint often made by contributors on the English version of the site :). For us to reproduce the problem, it is crucial that you post an excerpt of your R code that can be easily copied and pasted into another environment, so the people trying to help can compare results. It is also important to state the version of R, whether or not you are using RStudio, and the version of the operating system.

    Since there is no example of the data frame, here is a small example showing the different ways to reference the columns of a data frame, and how you can provide more details about your problem. This is not an answer yet. "Apparently" everything is fine.

    df <- data.frame(a = 1:10, b = 11:20)
    summary(df)
    
    # check the class of a column
    class(df$a)
    
    mean(df$a)
    mean(df[,'a'])
    mean(df$'a')
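As one possible way to share the details requested above, `dput()` on a few rows prints R code that rebuilds the object, column names included. The sketch below uses a small stand-in data frame; with the real data, something like `head(a_us, 3)` would take its place.

```r
# Small stand-in data frame; replace with the real object, e.g. head(a_us, 3)
df <- data.frame(a = 1:3, b = 11:13)

# dput() prints R code that rebuilds the object, names included
dput(df)

# Column names and classes in one view
sapply(df, class)

# R version string to quote in the question
R.version.string
```

Pasting the `dput()` output into the question lets others recreate the data frame with exactly the same names, including any stray quote characters.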
    
        
    11.10.2016 / 18:41