Differences between RCurl, httr (R) and requests (Python) when making a POST request

I want to access the page you get by clicking "View all documents above" at this link. The company I picked is just an example; I have no particular interest in it.

I tried to do this with a POST request, and I got the result I wanted using the Python requests library. Python code below:

import requests
link = "http://siteempresas.bovespa.com.br/consbov/ExibeTodosDocumentosCVM.asp?CNPJ=02.541.982/0001-54&CCVM=22551&TipoDoc=C&QtLinks=10"
r = requests.get(link)  # initial GET to obtain the session cookies
dados = {'hdnCategoria': '0', 'hdnPagina': '', 'FechaI': '', 'FechaV': ''}
r1 = requests.post(link, data=dados, cookies=r.cookies)  # send the form together with those cookies
print(r1.text)

I then tried the following code in R, first using RCurl:

library(RCurl)
link <- "http://siteempresas.bovespa.com.br/consbov/ExibeTodosDocumentosCVM.asp?CNPJ=02.541.982/0001-54&CCVM=22551&TipoDoc=C&QtLinks=10"
curl <- getCurlHandle()
r <- getURL(link, curl=curl)
r1 <- postForm(link, hdnCategoria='0', hdnPagina='', FechaI='', FechaV='', .encoding='UTF-8', curl=curl)
cat(r1)

and then using httr (which I understand is just a wrapper around RCurl):

library(httr)
link <- "http://siteempresas.bovespa.com.br/consbov/ExibeTodosDocumentosCVM.asp?CNPJ=02.541.982/0001-54&CCVM=22551&TipoDoc=C&QtLinks=10"
h <- handle(link)
dados=list(hdnCategoria='0', hdnPagina='', FechaI='', FechaV='')
r1 <- POST(handle=h, body=dados, encoding='UTF-8')
cat(content(r1, 'text'))

a) Why do the two R alternatives return the original page rather than the result of clicking "View all documents above"?

b) What does the Python library do differently that makes this work so simply?

PS: For this question, I would rather not use mechanize, selenium, other Python libraries, etc. I would like to solve it in R, preferably with httr or, failing that, with RCurl. There is also a newer alternative, rvest, but I do not know it well and am not sure it makes sense for this particular case.

    
asked by anonymous 10.10.2014 / 21:13

1 answer

Strangely enough, I was able to solve the problem by simplifying the httr code. It looks like the package has been updated: POST now accepts an encode parameter, which can be multipart (the default), form (what I want here) or json.
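To make the form option concrete, here is a minimal sketch of what a form-encoded POST body looks like, written in Python with only the standard library. It is an illustration and not what httr or requests run internally; form_encode is a hypothetical helper name. When requests is given a dict through data=, it sends this same kind of body, which is presumably why the Python version worked without extra options.

```python
from urllib.parse import urlencode


def form_encode(fields):
    """Build headers and body for an application/x-www-form-urlencoded POST.

    Hypothetical helper, for illustration only.
    """
    body = urlencode(fields)  # key=value pairs joined with "&"
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return headers, body


# Same fields as in the question
dados = {"hdnCategoria": "0", "hdnPagina": "", "FechaI": "", "FechaV": ""}
headers, body = form_encode(dados)
print(headers["Content-Type"])
print(body)  # hdnCategoria=0&hdnPagina=&FechaI=&FechaV=
```

A multipart body, by contrast, wraps each field in its own boundary-delimited part, and a server-side form handler that expects the URL-encoded format may not read the fields from it.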

In addition, httr already keeps cookies across requests by default. The code below worked:

library(httr)
link <- "http://siteempresas.bovespa.com.br/consbov/ExibeTodosDocumentosCVM.asp?CNPJ=02.541.982/0001-54&CCVM=22551&TipoDoc=C&QtLinks=10"
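# Initial GET so the server can set its cookies; httr keeps them for the next request
aux <- GET(link)
dados <- list(hdnCategoria='0', hdnPagina='', FechaI='', FechaV='')
# encode='form' sends the fields as a URL-encoded form instead of multipart
r1 <- POST(link, body=dados, encode='form')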
cat(content(r1, 'text'))
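The cookie point can be illustrated the same way. The sketch below is Python, standard library only, and runs against a toy local server rather than the real site. DemoHandler, the cookie name and the reply strings are all made up for the example, and it is only an assumption that the real page behaves like this. It shows that a POST made through a client with a cookie jar carries the cookie set by the earlier GET, while a client without one does not.

```python
import threading
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a site that expects a session cookie."""

    def do_GET(self):
        # First visit: hand out a session cookie.
        self.send_response(200)
        self.send_header("Set-Cookie", "session=abc123")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_POST(self):
        # Read the form body, then report whether the cookie came back.
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)
        has_cookie = "session=abc123" in self.headers.get("Cookie", "")
        reply = b"all documents" if has_cookie else b"original page"
        self.send_response(200)
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)

    def log_message(self, *args):
        pass  # keep the demo quiet


def get_then_post(opener, url, fields):
    """GET the page first, then POST the form through the same opener."""
    opener.open(url).close()
    body = urllib.parse.urlencode(fields).encode("ascii")
    with opener.open(url, data=body) as resp:
        return resp.read().decode("ascii")


server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_port
dados = {"hdnCategoria": "0", "hdnPagina": "", "FechaI": "", "FechaV": ""}

no_proxy = urllib.request.ProxyHandler({})  # talk to the local server directly

# Client that remembers cookies between requests
with_jar = urllib.request.build_opener(
    no_proxy, urllib.request.HTTPCookieProcessor(CookieJar()))
with_cookie = get_then_post(with_jar, url, dados)

# Client that forgets everything between requests
without_jar = urllib.request.build_opener(no_proxy)
without_cookie = get_then_post(without_jar, url, dados)

server.shutdown()
server.server_close()
print(with_cookie, "/", without_cookie)
```

In the question's Python code the same effect is obtained by passing cookies=r.cookies explicitly to requests.post.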
    
13.10.2014 / 12:48