.XLSX file reading in R

5

What is the problem and / or difference between reading a file in .txt and .xlsx in R ?

Reading in .xlsx Can I have more problems than in .txt during an analysis?

A friend asked me to do everything in .txt because it's better but I did not understand why.

Another detail is that I have a file in .xlsx with 4 tabs and when I rename the tab in the script it continues reading the previous one. Is this due to be Excel?

    
asked by anonymous 20.03.2017 / 14:54

2 answers

5
  

What is the problem and / or difference between reading a file in .txt and .xlsx in R?

Strictly speaking, none. Both are valid ways of storing data for analysis, as well as .csv , .sav and .dat are also. The only way to use .xlsx is almost necessarily to require a spreadsheet editor to view the files, whereas the .txt format can be read for almost any program installed on the computer.

  

Reading in .xlsx Can I have more problems than in .txt during an analysis?

If you read both files correctly, you should not experience problems during data analysis.

  

A friend asked me to do everything in .txt because it's better but I did not understand why.

See the first answer I gave. Also, it may just be his personal preference. In particular, I prefer .txt and .csv files because I can read them directly on the terminal, without needing additional programs. Besides, of course, the disk space occupied by files .txt is smaller than the space occupied by files .xlsx (although in the present day this is not so relevant).

  

Another detail is that I have a .xlsx file with 4 tabs and when I rename the tab in the script it continues reading the previous one. Is this due to being Excel?

I can not answer this question because I do not have your code available. So, I can not figure out what could be wrong with it or even the .xlsx file to read. What I can say is that I use something similar to the code below when working with people who use Excel and this code, when adapted to the needs of each analysis, works very well, even in .xlsx files with more than one sheet. I just change the parameter sheet=1 to sheet=2 in order to read a different sheet. I do not call them by name, but by the position inside the file .xlsx .

library(readxl)
read_excel("arquivo.xlsx", sheet=1, col_names=TRUE)
read_excel("arquivo.xlsx", sheet=2, col_names=TRUE)

Note that it is necessary to install the readxl package before running the above commands.

    
20.03.2017 / 15:35
4

Well there are still some differences that are important in relation to both formats and in general I would say that your friend is right.

1) txt reading is faster and you do not depend heavily on external packages or other languages.

Until recently we did not have the readxl package, which is a non-working hand for reading from excel. You had to opt for packages like xlsx or XLconnect or openxlsx or (several other packages) ... and each one has a different external dependency (Java, C ++, etc.). That way, it was very common to have compatibility issues for reading excel files.

In addition, it is still much faster to read txt files.

2) To save excel files you probably can experience compatibility issues

We do not yet have a good, reliable package like readxl to save excel files. So you're going to have to use one of these that I mentioned and even if everything works correctly on your computer, your friend's computer or other people can be a problem.

3) excel has line limiting

In excel you have a limitation on the number of rows. Also the files become unnecessarily heavy. If you're going to work with large databases, forget it.

That way, you have to think about what you want to save in excel, otherwise work with txt.

There are situations where saving in excel is useful, usually when you want to present the final result in an excel spreadsheet. But notice that this usually occurs in the final stage of analysis. While you are manipulating and exchanging databases, it is interesting to avoid this format.

    
22.03.2017 / 04:18