Web scraping the Placar da Previdência

2

I need to extract the information from this site into an Excel file: which MPs are voting for, against, abstaining, and so on. It's a web scraping exercise, but since I don't understand HTML well, I'm having trouble understanding the nodes. I've tried read_html, readHTMLTable, and readLines, but none of them worked as intended.

Does anyone have any suggestions?

link

    
asked by anonymous 07.04.2017 / 22:24

2 answers

3

Using the stringr and rvest packages, the question can be solved like this:

library(rvest)
library(stringr)
url <- 'http://infograficos.estadao.com.br/especiais/placar/votacao/economia/?id=GLwN7vXR3W'

resp <- read_html(url)

Since we are going to get texts several times, it is convenient to write a function:

pega_texto <- function (css) {
  resp %>% html_nodes(css) %>% html_text()
}

posicoes <- pega_texto('h3') %>% str_extract('[A-Z].+')

quantidades <- pega_texto('h3') %>% str_extract('[0-9]+') %>% as.numeric()

posicao <- mapply(rep, x =  posicoes, each = quantidades) %>% 
  unlist()

partido <- pega_texto('.p-org')
nome <- pega_texto('.p-name') %>% 
  .[. != "Placar da Previdência (intenção do voto)"]
regiao <- pega_texto('.p-region')

dados <- data.frame(partido, nome, regiao, posicao)

head(dados)

  partido           nome regiao posicao
1      PP Adail Carneiro     CE A favor
2    PMDB  Alberto Filho     MA A favor
3     PPS   Alex Manente     SP A favor
4    PMDB Altineu Côrtes     RJ A favor
5      PP    André Abdon     AP A favor
6     PSD André de Paula     PE A favor
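
To make the posicao step more concrete, here is a minimal sketch on made-up data. The heading strings and counts below are invented for illustration (the real h3 text on the page may be formatted differently); only the str_extract and mapply(rep, ...) calls mirror the code above.

```r
library(stringr)

# Hypothetical heading texts, standing in for the page's <h3> elements
titulos <- c("2 A favor", "3 Contra", "0 Indecisos")

# The label starts at the first capital letter; the count is the first run of digits
posicoes    <- str_extract(titulos, "[A-Z].+")
quantidades <- as.numeric(str_extract(titulos, "[0-9]+"))

# Repeat each label once per deputy listed under that heading
posicao <- unlist(mapply(rep, x = posicoes, each = quantidades))

print(unname(posicao))
```

A heading with a count of zero contributes no rows. As far as I can tell, rep(posicoes, times = quantidades) yields the same values in a single call, without the names that mapply attaches.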

openxlsx::write.xlsx(dados, "arquivo.xlsx")

EDITED

I had forgotten to comment on exporting to Excel. I recommend the openxlsx package because it is built on C++ (via Rcpp) and does not depend on Java. The xlsx package uses Java, and it is common to run into Java incompatibility issues (32-bit vs. 64-bit).
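
If you want to check the export without touching the site, a small round trip like the one below should do. It is only a sketch: dados_exemplo is a two-row stand-in for dados (the rows are copied from the head() output above), and the file goes to a temporary path rather than "arquivo.xlsx".

```r
library(openxlsx)

# Two-row stand-in for the scraped data frame `dados`
dados_exemplo <- data.frame(
  partido = c("PP", "PMDB"),
  nome    = c("Adail Carneiro", "Alberto Filho"),
  regiao  = c("CE", "MA"),
  posicao = c("A favor", "A favor"),
  stringsAsFactors = FALSE
)

# Write to a temporary .xlsx file, then read it back to confirm the contents
arquivo <- tempfile(fileext = ".xlsx")
write.xlsx(dados_exemplo, arquivo)

lido <- read.xlsx(arquivo)
print(lido)
```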

    
08.04.2017 / 18:12
3

To import the data from the Placar da Previdência infographic on the Estadão website and export it to Excel, use the code below.

If you have not installed the packages 'XML', 'xlsx' and 'stringr', execute the first line.

install.packages(c('XML', 'xlsx', 'stringr'))


library(XML)
library(stringr)
library(xlsx)

url <- 'http://infograficos.estadao.com.br/especiais/placar/votacao/economia/?id=GLwN7vXR3W'
paginavoto <- htmlParse(url)

tipo <- xpathSApply(paginavoto, "//section//h3", fun = xmlValue)
deputados <- data.frame(nome = character(), 
                    partido = character(), 
                    voto = character())

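# Each <section> holds one vote type; its <h3> carries the count and the label
for(i in seq_along(tipo)){
  # Skip sections whose count is zero (backslashes must be doubled in R strings)
  if(as.numeric(str_extract(tipo[i], '\\d+')) != 0){

    # XPath expressions for the names and parties inside the i-th section
    pDep <- paste0("//section[",i ,"]//span[@class='p-name']")
    pPart <- paste0("//section[",i ,"]//span[@class='p-org']")
    deputado <- data.frame(nome = xpathSApply(paginavoto, pDep, fun = xmlValue),
                   partido = xpathSApply(paginavoto, pPart, fun = xmlValue),
                   voto = trimws(str_extract(tipo[i], '\\D+')))
    deputados <- rbind(deputados, deputado)
  }
}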

write.xlsx(deputados, "deputados.xlsx")
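
To see how the section-indexed XPath expressions behave, you can try them on a small inline document first. The HTML below is invented for illustration (the names and parties are placeholders, and the real page's markup is certainly more involved); it only assumes the same section / h3 / span class layout that the code above relies on.

```r
library(XML)

# Made-up page fragment: one <section> per vote type
html <- "<html><body>
  <section><h3>2 A favor</h3>
    <span class='p-name'>Deputado A</span><span class='p-org'>PX</span>
    <span class='p-name'>Deputado B</span><span class='p-org'>PY</span>
  </section>
  <section><h3>1 Contra</h3>
    <span class='p-name'>Deputado C</span><span class='p-org'>PZ</span>
  </section>
</body></html>"

# asText = TRUE parses the string itself instead of treating it as a URL or file
doc <- htmlParse(html, asText = TRUE)

# Headings of every section, then names and parties restricted by section index
tipo            <- xpathSApply(doc, "//section//h3", fun = xmlValue)
nomes_secao1    <- xpathSApply(doc, "//section[1]//span[@class='p-name']", fun = xmlValue)
partidos_secao2 <- xpathSApply(doc, "//section[2]//span[@class='p-org']", fun = xmlValue)

print(tipo)
print(nomes_secao1)
print(partidos_secao2)
```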
    
08.04.2017 / 10:52