Handle values from a data.frame

Question

Handle values from a data.frame

Navigation

#1 by (1 votes)
#2 by (1 votes)

5

In the link you can find a dataset in csv about solar irradiation in Brazil. The column the_geom_4326 represents the longitudes and latitudes of a polygon, which represents a grid of solar irradiation. That is, within this grid, the values of solar irradiation arranged in the other columns occur. I would like to extract these values and put them in separate columns, forming the total of 8 new columns, with the names

Grid1_Long
Grid1_Lat
Grid2_Long
Grid2_Lat
Grid3_Long
Grid3_Lat
Grid4_Long
Grid4_Lat

Any tips on how to handle these values? The problem is that these values do not always have a pattern of character numbers.

r

asked by anonymous 29.09.2016 / 22:08

2 answers

Put emoticons in the chat of my site [closed] color gradient R

score 1 · Answer 1

I could not solve this problem in R. However, it can be solved easily on the Linux or Mac terminal, using tools that are already installed by default:

cat swera_ghi_inpe_hi_res.csv | awk 'BEGIN {FS=","}; {print $3" "$4" "$5" "$6" "$7}' | 
sed s/"\"POLYGON (("//g | sed s/"))\""//g | sed s/"  "/" "/g | tail -n +2 > 
LongLat.dat

The commands I used were:

cat to read the contents of the csv file on the computer screen
awk to print only the contents of the polygon columns
multiple sed to remove unwanted content, such as strings "POLYGON (( , ))\" , and two spaces in sequence
tail to not consider the original file header
> to save the result to the LongLat.dat file

Now just read the LongLat.dat file inside the R and combine it with the previous data:

dados <- read.csv(file="swera_ghi_inpe_hi_res.csv")

LongLat <- read.table(file="LongLat.dat", sep=" ", header=FALSE)
names(LongLat) <- c("Grid1_Long", "Grid1_Lat", "Grid2_Long", "Grid2_Lat", 
"Grid3_Long", "Grid3_Lat", "Grid4_Long", "Grid4_Lat", "Grid5_Long", "Grid5_Lat")

dados <- cbind(dados, LongLat)

Notice that I ended up finding polygons on 5 sides, not 4 as in the original question. You have to see why.

score 1 · Answer 2

A solution is only available in R.

It's not as short as the one that uses the terminal tools, but it works. I did the separate steps too to get clearer.

# Lendo os dados
dat <- read.csv("swera_ghi_inpe_hi_res.csv", stringsAsFactors=FALSE)

# Removendo da coluna tudo que não é informação útil: texto e parêntesis
dat$the_geom_4326 <- gsub("[A-Z]|(\s\(\()|(\)\))", "", dat$the_geom_4326)

#Separando os valores pela ocorrência de vírgula e espaço
poly <- strsplit(x = dat$the_geom_4326, ",?\s")

#Transformando texto em números
poly <- lapply(poly, as.numeric)

#Transformando a lista em matriz depois em data.frame
mat <- as.data.frame(do.call(rbind, poly))

#Colocando os nomes das colunas
colnames(mat) <- paste0(paste0("Grid", rep(1:5, each = 2)), c("_Long", "_Lat"))

#Juntando com os dados iniciais.
dat2 <- cbind(dat, mat)

Just for the sake of curiosity, a way in pipe to do the same thing would be as follows:

library(magrittr)
dat <- read.csv("swera_ghi_inpe_hi_res.csv", stringsAsFactors=FALSE) 

dat2 <- dat %>%
  .$the_geom_4326 %>%
  gsub("[A-Z]|(\s\(\()|(\)\))", "", .) %>%
  strsplit(",?\s") %>% 
  lapply(as.numeric) %>%
  do.call(rbind, .) %>% 
  as.data.frame %>%
  set_colnames(paste0(paste0("Grid", rep(1:5, each = 2)), c("_Long", "_Lat"))) %>%
  cbind(dat, .)