Generate links and download content programmatically

5

I would like to know how I would collect data from a website.

The site is link. I need to download all the data under operation history, from power generation to Natural Inflow Energy (Energia Natural Afluente). The problem is that each data series sends you to a page where you select the subsystem (SE/CO, S, NE and N), unit, year, and so on. When the options are selected, the page URL does not change, so the results cannot be told apart by URL and scraped automatically.

I want to build a database with all this information. Since I use R quite a lot, I would like to know how to do this in R.

    
asked by anonymous 14.12.2015 / 14:56

1 answer

7

You can do this using the rvest package. The following code will help you:

library(rvest)
# note: rvest >= 1.0 renames html_session(), set_values() and submit_form()
# to session(), html_form_set() and session_submit()
# create the browsing session
sessao <- html_session("http://www.ons.org.br/historico/energia_natural_afluente.aspx")
# pick the form you want to submit (POST)
form <- sessao %>% html_form()
form <- form[[4]]
# assign values to the form parameters
values <- set_values(form = form,
                     passo1="SE",
                     passo2a="-1",
                     passo2b="MWmed",
                     passo3a="-1",
                     passo3b="2015",
                     tipo="regiao",
                     passo2="MWmed",
                     passo3="2015",
                     passo4="-1",
                     passo1text="SE",
                     passo2text="MWmed",
                     passo3text="2015",
                     passo4text="-1"
                     )
# submit the form
resposta <- submit_form(sessao, values)
# extract the tables from the form response
tabelas <- resposta %>% html_table(fill = TRUE, header = TRUE)
# pick the table you want
tabela <- tabelas[[2]]

The tabela object should then contain the values you are looking for:

> tabela
        2015
1  Jan 21466
2  Fev 34907
3  Mar 43126
4  Abr 37029
5  Mai 30293
6  Jun 23248
7  Jul 28362
8  Ago 16195
9  Set 21010
10 Out 19459
11 Nov 32269

Now you only need to work out which option values correspond to the data you want and pass them to the set_values function.
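As a rough sketch of that last step, the form submission can be wrapped in a function and looped over every subsystem and year. The helpers ena_params and fetch_ena are hypothetical names made up for this illustration, not part of rvest. The field names are copied from the code above; only "SE" is a confirmed subsystem value, so "S", "NE" and "N" (taken from the question) and the 2014:2015 year range are assumptions you should check against the page's form. It also assumes the result table has two columns (month and value), as in the output above.

```r
# Hypothetical helper: form values for one subsystem/year combination.
# Field names come from the answer's code; values other than "SE" are assumed.
ena_params <- function(subsystem, year, unit = "MWmed") {
  year <- as.character(year)
  list(passo1 = subsystem, passo2a = "-1", passo2b = unit,
       passo3a = "-1", passo3b = year, tipo = "regiao",
       passo2 = unit, passo3 = year, passo4 = "-1",
       passo1text = subsystem, passo2text = unit,
       passo3text = year, passo4text = "-1")
}

# Hypothetical helper: submit the form once and return a tidy data frame.
# form[[4]] and the second table are taken from the answer and may change
# if the site layout changes.
fetch_ena <- function(subsystem, year,
                      url = "http://www.ons.org.br/historico/energia_natural_afluente.aspx") {
  sessao <- rvest::html_session(url)
  form <- rvest::html_form(sessao)[[4]]
  values <- do.call(rvest::set_values,
                    c(list(form = form), ena_params(subsystem, year)))
  resposta <- rvest::submit_form(sessao, values)
  tabela <- rvest::html_table(resposta, fill = TRUE, header = TRUE)[[2]]
  data.frame(subsystem = subsystem, year = year,
             month = tabela[[1]], value = tabela[[2]],
             stringsAsFactors = FALSE)
}

# Every subsystem/year combination to request
combos <- expand.grid(subsystem = c("SE", "S", "NE", "N"),
                      year = 2014:2015,
                      stringsAsFactors = FALSE)

run_scrape <- FALSE  # set to TRUE to actually query the site
if (run_scrape) {
  parts <- lapply(seq_len(nrow(combos)), function(i) {
    Sys.sleep(1)  # short pause between requests
    fetch_ena(combos$subsystem[i], combos$year[i])
  })
  # stack all results into a single data frame
  database <- do.call(rbind, parts)
}
```

With run_scrape set to TRUE, database ends up with one row per subsystem, year and month, which you can then write to a file or load into a real database.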

    
14.12.2015 / 19:57