R script to extract data from multiple .h5 files

1

I'm using R to extract data from files in .h5 format, and that part works. However, the data are monthly and cover 200 years.

At the moment I can only load one monthly file at a time, but I need a script that extracts the data from every file for every year, without having to do it one by one. My script looks like this:

library(hdf5)
mydata = hdf5load ("teste_200-Q-2000-01-00-000000-g01.h5",load=FALSE)
mydata$AGB_PY
names(mydata)

I tried to concatenate and it looked like this:

library(hdf5)
ED<-c ( "01.h5","02.h5","03.h5","04.h5","05.h5","06.h5", "07.h5", "08.h5", "09.h5","10.h5","11.h5","12.h5")
ED
names(mydata)
for(i in 01:12)sum(mydata$AGB_PY)
resu<-sum(mydata$AGB_PY)
resu
agbyear = rep(NA,times=12)
for (i in 1:12){
  mydata = hdf5load(ED[i],load=FALSE)
  agbyear[i] = sum(mydata$AGB_PY)}
agbyear
mydata$AGB_PY<- edit(data.frame(agbyear))
write.table(agbyear,"agbyear.csv", row.names=FALSE  , sep  = ",")

But I would like to know how to make the script load all the .h5 files and distinguish months and years.

Edit:

To answer the questions, the file names follow this format:

teste_200-Q-2001-01-00-000000-g01.h5
teste_200-Q-2001-02-00-000000-g01.h5
teste_200-Q-2001-03-00-000000-g01.h5

... and so on, for 200 years from 2000 to 2200.

Since the month sits in the "middle" of the file name, how can I load the file for each month?

I tried "_.h5", but it did not work. I also tried "* .h5", which did not work either.

    
asked by anonymous 15.07.2015 / 22:21

2 answers

2

Your code is almost there; you just need to build the full name of each file. You can do this with paste, or with paste0 so that you do not have to set the sep = "" argument.
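As a quick illustration of the difference, the two calls below build the same file name. The prefix and suffix come from the first file in the question; where the name is split is only an assumption for the example.

```r
# Hypothetical split of one file name from the question into two pieces.
prefix <- "teste_200-Q-2000-01-00-000000-g"
suffix <- "01.h5"

# paste() inserts a space between pieces by default, so sep = "" is needed.
with_paste <- paste(prefix, suffix, sep = "")

# paste0() concatenates with no separator.
with_paste0 <- paste0(prefix, suffix)

# Both approaches give the same string.
identical(with_paste, with_paste0)
```

Both functions are vectorised, so passing a vector of suffixes returns one full name per suffix.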

Handling only the months, as you had already started to do, and removing the lines that were there just to inspect the data, it would look like this:

library(hdf5)

# File-name endings, assumed here to be the part that changes between files
ED <- c("01.h5","02.h5","03.h5","04.h5","05.h5","06.h5", "07.h5", "08.h5", "09.h5","10.h5","11.h5","12.h5")

# Pre-allocate the result vector and set the common start of the file names
agbyear <- rep(NA, times=12)
basename <- "teste_200-Q-2000-01-00-000000-g"

for (i in 1:12) {
  # Build the full file name, load the file and sum AGB_PY
  mydata <- hdf5load(paste0(basename, ED[i]), load=FALSE)
  agbyear[i] <- sum(mydata$AGB_PY)
}

write.table(agbyear, "agbyear.csv", row.names=FALSE, sep = ",")

Some points that can be highlighted:

  • It is probably best not to type the file-name endings by hand into a vector (or to create all the names manually). One alternative is something like list.files(pattern = "\\.h5$"), which returns a vector with the names of the .h5 files in the working directory.

  • Avoid using edit(). If you change the data by hand, the code is no longer reproducible. Look for ways to make the changes you want in code.

  • Instead of for, we could use sapply, which is a more idiomatic way of doing the same thing and does not require creating the agbyear vector beforehand.

    For example:

    allfiles <- paste0(basename, ED)
    # Load each file and return the sum of AGB_PY for it
    agbyear <- sapply(allfiles, function(i) {
      mydata <- hdf5load(i, load=FALSE)
      sum(mydata$AGB_PY)
    })
    

    Despite the first point above, we can build on the sapply and allfiles idea to generate the file names for several years and months. It was not clear from the question whether the months really are the part stored in ED (I assume they are), or what the range of years is. But for the 2000 - 2015 range, we could do this:

    # Years to cover (the 2000 - 2015 range mentioned above)
    anos <- 2000:2015
    allfiles <- as.vector(sapply(anos, function(a) {
      paste0("teste_200-Q-", a, "-01-00-000000-g", ED)
    }))
    

    That gives 16 * 12 = 192 file names. But if the files are well organized, the best option is probably still list.files().
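    If the month is in fact the field right after the year, as the file names added to the question suggest, the same idea can be sketched with sprintf and expand.grid. This is only a sketch under that assumption: the names meses and grid are illustrative, and the 2000 to 2015 range is simply the one used above.

```r
anos  <- 2000:2015
meses <- 1:12

# One row per (month, year) pair; the first factor (month) varies fastest,
# so the generated names come out in chronological order.
grid <- expand.grid(mes = meses, ano = anos)

# %d prints the year, %02d zero-pads the month to two digits.
allfiles <- sprintf("teste_200-Q-%d-%02d-00-000000-g01.h5", grid$ano, grid$mes)

length(allfiles)
head(allfiles, 3)
```

    Names generated this way can be checked with file.exists(allfiles) before loading, so that a missing month is noticed early.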

        
    16.07.2015 / 02:34
    1

    I'll try to help, but I need some more information. How do you identify the month and the year? What is the file naming pattern? What data does the AGB_PY component contain?

    You can do something like this:

        # Function to read the h5 files, taking a directory as its argument.
    
        lerH5 <- function( diretorio ){
        # always load the hdf5 library when the function is called
        require(hdf5)
    
        listaArquivos <- list.files( diretorio )
    
          # seq_along() avoids iterating over 1:0 when the directory is empty
          for( i in seq_along(listaArquivos)){
             # extract the data you want here, for each file
          }
        }
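    One way to fill in that skeleton, sketched under the assumption that the file names follow the teste_200-Q-YYYY-MM-... pattern shown in the question: list the .h5 files, pull the year and month out of each name with a regular expression, and apply a reader function to each file. The function name resumirH5 and its leitor argument are hypothetical. leitor stands in for whatever extracts one number from a file, for example function(f) sum(hdf5load(f, load = FALSE)$AGB_PY) as in the question. The demonstration uses empty placeholder files and a stand-in reader so that it runs without the hdf5 package.

```r
resumirH5 <- function(diretorio, leitor) {
  # Full paths of every file whose name ends in ".h5".
  arquivos <- list.files(diretorio, pattern = "\\.h5$", full.names = TRUE)
  nomes <- basename(arquivos)

  # Capture the year and month, assuming names like "...-Q-2001-03-...".
  padrao <- "^.*-Q-([0-9]{4})-([0-9]{2})-.*$"
  ano <- as.integer(sub(padrao, "\\1", nomes))
  mes <- as.integer(sub(padrao, "\\2", nomes))

  # Apply the reader to each file; it must return a single number.
  valor <- vapply(arquivos, leitor, numeric(1), USE.NAMES = FALSE)

  # One row per file, sorted by year and then month.
  resultado <- data.frame(ano = ano, mes = mes, valor = valor)
  resultado[order(resultado$ano, resultado$mes), ]
}

# Demonstration on empty placeholder files in a temporary directory,
# with a stand-in reader that just returns the file size (0 here).
dir_demo <- file.path(tempdir(), "h5_demo")
dir.create(dir_demo, showWarnings = FALSE)
nomes_demo <- sprintf("teste_200-Q-%d-%02d-00-000000-g01.h5",
                      c(2001L, 2000L, 2000L), c(1L, 2L, 1L))
invisible(file.create(file.path(dir_demo, nomes_demo)))

tabela <- resumirH5(dir_demo, leitor = function(f) as.numeric(file.size(f)))
tabela
```

    Keeping the year and month as separate columns makes it straightforward to aggregate afterwards, for instance with aggregate(valor ~ ano, data = tabela, FUN = sum).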
    

    I hope I have contributed in some way.

        
    16.07.2015 / 02:36