I wanted to access the page you get by clicking "View all documents above" at that link. The company I chose is just an example; I have no interest in it.
I tried to solve this with a POST request, and I got the result I wanted using the requests Python library. Python code below:
import requests

link = "http://siteempresas.bovespa.com.br/consbov/ExibeTodosDocumentosCVM.asp?CNPJ=02.541.982/0001-54&CCVM=22551&TipoDoc=C&QtLinks=10"
# Initial GET to pick up the cookies set by the server
r = requests.get(link)
# Form fields submitted with the POST
dados = {'hdnCategoria': '0', 'hdnPagina': '', 'FechaI': '', 'FechaV': ''}
r1 = requests.post(link, data=dados, cookies=r.cookies)
print(r1.text)
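For reference, when a dict is passed through data=, requests sends it form-encoded (application/x-www-form-urlencoded). Below is a minimal sketch of the body this should produce for the fields above. It assumes Python 3 and uses only the standard library rather than requests itself; the name form_body is purely illustrative.

```python
from urllib.parse import urlencode

# Same form fields as in the POST above
dados = {'hdnCategoria': '0', 'hdnPagina': '', 'FechaI': '', 'FechaV': ''}

# form_body is an illustrative name; urlencode builds the
# application/x-www-form-urlencoded string for these fields
form_body = urlencode(dados)
print(form_body)
```

Comparing this string with what the R calls actually put on the wire may help narrow down where the two approaches differ.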
I then tried the following code in R, first using RCurl:
library(RCurl)
link <- "http://siteempresas.bovespa.com.br/consbov/ExibeTodosDocumentosCVM.asp?CNPJ=02.541.982/0001-54&CCVM=22551&TipoDoc=C&QtLinks=10"
curl <- getCurlHandle()
r <- getURL(link, curl=curl)
r1 <- postForm(link, hdnCategoria='0', hdnPagina='', FechaI='', FechaV='', .encoding='UTF-8', curl=curl)
cat(r1)
and then using httr (which I know is only a wrapper around RCurl):
library(httr)
link <- "http://siteempresas.bovespa.com.br/consbov/ExibeTodosDocumentosCVM.asp?CNPJ=02.541.982/0001-54&CCVM=22551&TipoDoc=C&QtLinks=10"
h <- handle(link)
dados=list(hdnCategoria='0', hdnPagina='', FechaI='', FechaV='')
r1 <- POST(handle=h, body=dados, encoding='UTF-8')
cat(content(r1, 'text'))
a) Why do the two R alternatives return the original page rather than the result of clicking "View all documents above"?
b) What does the Python library do differently that makes this work so simply?
PS: For this question, I would rather not use mechanize, selenium, other Python libraries, etc. I would like to solve it in R, preferably with httr or, failing that, with RCurl. There is also a newer alternative, rvest, but I do not know it well and am not sure whether it makes sense to use it in this particular case.