Handle values from a data.frame

5

In the link you can find a dataset in csv about solar irradiation in Brazil. The column the_geom_4326 represents the longitudes and latitudes of a polygon, which represents a grid of solar irradiation. That is, within this grid, the values of solar irradiation arranged in the other columns occur. I would like to extract these values and put them in separate columns, forming the total of 8 new columns, with the names

  • Grid1_Long
  • Grid1_Lat
  • Grid2_Long
  • Grid2_Lat
  • Grid3_Long
  • Grid3_Lat
  • Grid4_Long
  • Grid4_Lat

Any tips on how to handle these values? The problem is that these values do not always have a pattern of character numbers.

    
asked by anonymous 29.09.2016 / 22:08

2 answers

1

I could not solve this problem in R. However, it can be solved easily on the Linux or Mac terminal, using tools that are already installed by default:

cat swera_ghi_inpe_hi_res.csv | awk 'BEGIN {FS=","}; {print $3" "$4" "$5" "$6" "$7}' | 
sed s/"\"POLYGON (("//g | sed s/"))\""//g | sed s/"  "/" "/g | tail -n +2 > 
LongLat.dat

The commands I used were:

  • cat to read the contents of the csv file on the computer screen

  • awk to print only the contents of the polygon columns

  • multiple sed to remove unwanted content, such as strings "POLYGON (( , ))\" , and two spaces in sequence

  • tail to not consider the original file header

  • > to save the result to the LongLat.dat file

Now just read the LongLat.dat file inside the R and combine it with the previous data:

dados <- read.csv(file="swera_ghi_inpe_hi_res.csv")

LongLat <- read.table(file="LongLat.dat", sep=" ", header=FALSE)
names(LongLat) <- c("Grid1_Long", "Grid1_Lat", "Grid2_Long", "Grid2_Lat", 
"Grid3_Long", "Grid3_Lat", "Grid4_Long", "Grid4_Lat", "Grid5_Long", "Grid5_Lat")

dados <- cbind(dados, LongLat)

Notice that I ended up finding polygons on 5 sides, not 4 as in the original question. You have to see why.

    
29.09.2016 / 23:20
1

A solution is only available in R.

It's not as short as the one that uses the terminal tools, but it works. I did the separate steps too to get clearer.

# Lendo os dados
dat <- read.csv("swera_ghi_inpe_hi_res.csv", stringsAsFactors=FALSE)

# Removendo da coluna tudo que não é informação útil: texto e parêntesis
dat$the_geom_4326 <- gsub("[A-Z]|(\s\(\()|(\)\))", "", dat$the_geom_4326)

#Separando os valores pela ocorrência de vírgula e espaço
poly <- strsplit(x = dat$the_geom_4326, ",?\s")

#Transformando texto em números
poly <- lapply(poly, as.numeric)

#Transformando a lista em matriz depois em data.frame
mat <- as.data.frame(do.call(rbind, poly))

#Colocando os nomes das colunas
colnames(mat) <- paste0(paste0("Grid", rep(1:5, each = 2)), c("_Long", "_Lat"))

#Juntando com os dados iniciais.
dat2 <- cbind(dat, mat)

Just for the sake of curiosity, a way in pipe to do the same thing would be as follows:

library(magrittr)
dat <- read.csv("swera_ghi_inpe_hi_res.csv", stringsAsFactors=FALSE) 

dat2 <- dat %>%
  .$the_geom_4326 %>%
  gsub("[A-Z]|(\s\(\()|(\)\))", "", .) %>%
  strsplit(",?\s") %>% 
  lapply(as.numeric) %>%
  do.call(rbind, .) %>% 
  as.data.frame %>%
  set_colnames(paste0(paste0("Grid", rep(1:5, each = 2)), c("_Long", "_Lat"))) %>%
  cbind(dat, .)
    
05.10.2016 / 05:44