Scraping the MTE's Mediador system

I'm trying to scrape the Ministry of Labor's Mediador system. Basically, I want the list of collective agreements and conventions:

url1<-"http://www3.mte.gov.br/sistemas/mediador/ConsultarInstColetivo"

This page contains a search form. The only options I set are validity (vigência): "All" and registration state (UF): "SE".

When I submit the search, the browser sends an XHR request to:

url2<-"http://www3.mte.gov.br/sistemas/mediador/ConsultarInstColetivo/getConsultaAvancada"

with this body:

str(body)
List of 27
 $ nrCnpj                             : chr ""
 $ nrCei                              : chr ""
 $ noRazaoSocial                      : chr ""
 $ dsCategoria                        : chr ""
 $ tpRequerimento                     : chr "acordo"
 $ tpRequerimento                     : chr "acordoColetivoEspecificoPPE"
 $ tpRequerimento                     : chr "acordoColetivoEspecificoDomingosFeriados"
 $ tpRequerimento                     : chr "convencao"
 $ tpRequerimento                     : chr "termoAditivoAcordo"
 $ tpRequerimento                     : chr "termoAditivoConvecao"
 $ tpRequerimento                     : chr "termoAditivoAcordoEspecificoPPE"
 $ tpRequerimento                     : chr "termoAditivoAcordoEspecificoDomingoFeriado"
 $ tpVigencia                         : chr "2"
 $ sgUfDeRegistro                     : chr "SE"
 $ dtInicioRegistro                   : chr ""
 $ dtFimRegistro                      : chr ""
 $ dtInicioVigenciaInstrumentoColetivo: chr ""
 $ dtFimVigenciaInstrumentoColetivo   : chr ""
 $ tpAbrangencia                      : chr "Todos os tipos"
 $ ufsAbrangidasTotalmente            : chr "SE"
 $ cdMunicipiosAbrangidos             : chr ""
 $ cdGrupo                            : chr ""
 $ cdSubGrupo                         : chr ""
 $ noTituloClausula                   : chr ""
 $ utilizarSiracc                     : chr ""
 $ pagina                             : chr "2"
 $ qtdTotalRegistro                   : chr "1740"
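The `str()` output lists `tpRequerimento` eight times, presumably because the form has one checkbox per instrument type and the browser sends one `key=value` pair per checked box. A minimal base-R sketch of how such a body looks once URL-encoded, assuming the XHR uses the standard `application/x-www-form-urlencoded` format (not confirmed from the page itself); `encode_form` is a hypothetical helper, not part of any package:

```r
# Hypothetical helper: URL-encode a form body, keeping repeated keys
# (one "key=value" pair per element, in order). Uses only base R (utils).
encode_form <- function(fields) {
  keys <- vapply(names(fields), utils::URLencode, character(1),
                 reserved = TRUE, USE.NAMES = FALSE)
  vals <- vapply(unname(fields), utils::URLencode, character(1),
                 reserved = TRUE, USE.NAMES = FALSE)
  paste(keys, vals, sep = "=", collapse = "&")
}

# A few of the fields from the str() output above; note the repeated name
fields <- c(tpRequerimento = "acordo",
            tpRequerimento = "convencao",
            tpVigencia     = "2",
            sgUfDeRegistro = "SE",
            tpAbrangencia  = "Todos os tipos")
encode_form(fields)
# → "tpRequerimento=acordo&tpRequerimento=convencao&tpVigencia=2&sgUfDeRegistro=SE&tpAbrangencia=Todos%20os%20tipos"
```

Browsers usually encode spaces in forms as `+` rather than `%20`; standard form decoding treats both as a space.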

Then I did the following to access the results:

library(httr)
a<-GET(url1)
b<-POST(url2,body=body,set_cookies(unlist(a$cookies)))

Unfortunately, the response does not contain the expected results.

    
asked by anonymous 26.04.2017 / 14:35

1 answer

The question is how to do this particular scrape in R. Note that the form field tpRequerimento takes a list of values, which we can implement as a character vector.

In R, it would look like this:

body <- list(
  nrCnpj="",
  nrCei="",
  noRazaoSocial="",
  dsCategoria="",
  tpRequerimento=c("acordo",
                   "acordoColetivoEspecificoPPE",
                   "acordoColetivoEspecificoDomingosFeriados",
                   "convencao",
                   "termoAditivoAcordo",
                   "termoAditivoConvecao",
                   "termoAditivoAcordoEspecificoPPE",
                   "termoAditivoAcordoEspecificoDomingoFeriado"),
  tpVigencia="2",
  sgUfDeRegistro="SE",
  dtInicioRegistro="",
  dtFimRegistro="",
  dtInicioVigenciaInstrumentoColetivo="",
  dtFimVigenciaInstrumentoColetivo="",
  tpAbrangencia="Todos os tipos",
  ufsAbrangidasTotalmente="SE",
  cdMunicipiosAbrangidos="",
  cdGrupo="",
  cdSubGrupo="",
  noTituloClausula="",
  utilizarSiracc="",
  pagina="2",
  qtdTotalRegistro="1740")


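library(httr)
url1 <- "http://www3.mte.gov.br/sistemas/mediador/ConsultarInstColetivo"
url2 <- "http://www3.mte.gov.br/sistemas/mediador/ConsultarInstColetivo/getConsultaAvancada"

# Open the search page first so the server creates a session
a <- GET(url1)

# a$cookies is a data frame; send its name/value pairs back with the search
b <- POST(url2, body = body,
          set_cookies(.cookies = setNames(a$cookies$value, a$cookies$name)))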
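The captured XHR body carries `tpRequerimento` once per value, while `body` above holds all values in a single vector. If the server does not accept the vector form, one way to reproduce the browser's layout is to expand vector-valued fields into repeated names before sending. `flatten_form` is a hypothetical helper, not part of httr:

```r
# Hypothetical helper (not part of httr): expand every vector-valued field into
# one list element per value, repeating the field name, so that
# tpRequerimento = c("acordo", "convencao") becomes two "tpRequerimento" entries.
flatten_form <- function(body) {
  pieces <- lapply(seq_along(body), function(i) {
    values <- as.character(body[[i]])
    setNames(as.list(values), rep(names(body)[i], length(values)))
  })
  # Concatenate the per-field lists into one flat, named list
  do.call(c, pieces)
}

# Usage with a shortened version of the body from the answer
flat <- flatten_form(list(nrCnpj = "",
                          tpRequerimento = c("acordo", "convencao"),
                          pagina = "2"))
str(flat)
```

Assuming the endpoint expects a URL-encoded form, as the browser's XHR suggests, the flattened list could then be sent with `POST(url2, body = flatten_form(body), encode = "form")`, where each repeated name should go out as its own `key=value` pair.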
    
26.04.2017 / 18:24