I have been tasked with building an interface optimized for touch monitors that takes its data from a website (link).
The site lists bus lines and lets you look up their schedules through an Ajax autocomplete field.
Because it is a government agency, the possibility of obtaining the data in another way is almost nil.
I thought of writing a crawler in Java or Node.js that would call the request URL, pass the site's parameters (the form inputs), and filter what I need out of the response. Easy! But only in theory :(
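As an illustration of the "pass the inputs" step, here is a minimal Node.js sketch of reading a hidden form input from a page's HTML. JSF pages normally carry their view state in a hidden input named javax.faces.ViewState; that this page renders it in exactly this shape is an assumption. The helper name extractViewState and the sample markup are hypothetical, and the regular expression does not decode HTML entities.

```javascript
// Sketch only: pulls the value of the hidden javax.faces.ViewState input
// out of an HTML string. extractViewState is a hypothetical helper name.
function extractViewState(html) {
  // Find the <input> tag whose name attribute is javax.faces.ViewState.
  const tag = html.match(/<input\b[^>]*name="javax\.faces\.ViewState"[^>]*>/i);
  if (!tag) return null;
  // Read the value attribute from that tag, wherever it appears in it.
  const value = tag[0].match(/\bvalue="([^"]*)"/i);
  return value ? value[1] : null;
}

// Hypothetical markup, for illustration only (not copied from the site).
const sampleHtml =
  '<form id="form"><input type="hidden" name="javax.faces.ViewState" ' +
  'id="javax.faces.ViewState" value="-123:456" autocomplete="off" /></form>';

console.log(extractViewState(sampleHtml)); // prints "-123:456"
```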
I made a request to this URL:
http://www.consultas.der.mg.gov.br/grgx/sgtm/consulta_linha.xhtml;jsessionid=1820D695BDE4B916EC808F84BD1B335D
I sent these HTTP headers, using the webcrawler module for Node.js:
Accept: application/xml, text/xml, */*; q=0.01
Accept-Encoding: gzip, deflate
Accept-Language: pt-BR,pt;q=0.8,en-US;q=0.6,en;q=0.4
Connection: keep-alive
Content-Length: 457
Content-Type: application/x-www-form-urlencoded; charset=UTF-8
Cookie: JSESSIONID=1820D695BDE4B916EC808F84BD1B335D
Faces-Request: partial/ajax
Host: www.consultas.der.mg.gov.br
Origin: http://www.consultas.der.mg.gov.br
Referer: http://www.consultas.der.mg.gov.br/grgx/sgtm/consulta_linha.xhtml
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36
X-Requested-With: XMLHttpRequest
And I sent the form data below, using the number 6 as the autocomplete query, which on the site brings up a listing:
javax.faces.partial.ajax: true
javax.faces.source: form:tabview:campoBusca
javax.faces.partial.execute: form:tabview:campoBusca
javax.faces.partial.render: form:tabview:campoBusca
form:tabview:campoBusca: form:tabview:campoBusca
form:tabview:campoBusca_query: 6
form: form
form:tabview:campoBusca_input: 6
form:tabview:campoBusca_hinput: 6
form:tabview_activeIndex: 0
javax.faces.ViewState: -6275073363975845032:-2043218073946595619
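For reference, a minimal Node.js sketch of how form data like this could be encoded as an application/x-www-form-urlencoded body. The helper name buildAutocompleteBody is hypothetical. The view state is taken as a parameter on the assumption that, like the JSESSIONID, the value shown here was captured from one browser session and may not be accepted in another.

```javascript
// Sketch only: encodes the autocomplete form fields as an
// application/x-www-form-urlencoded string.
// buildAutocompleteBody is a hypothetical helper name, not a library call.
function buildAutocompleteBody(query, viewState) {
  const field = 'form:tabview:campoBusca';
  const params = new URLSearchParams();
  params.append('javax.faces.partial.ajax', 'true');
  params.append('javax.faces.source', field);
  params.append('javax.faces.partial.execute', field);
  params.append('javax.faces.partial.render', field);
  params.append(field, field);
  params.append(field + '_query', query);
  params.append('form', 'form');
  params.append(field + '_input', query);
  params.append(field + '_hinput', query);
  params.append('form:tabview_activeIndex', '0');
  // Assumption: the caller supplies a view state that matches the session
  // cookie sent with the request.
  params.append('javax.faces.ViewState', viewState);
  // toString() percent-encodes reserved characters such as ":" as "%3A".
  return params.toString();
}

const body = buildAutocompleteBody('6', '-6275073363975845032:-2043218073946595619');
console.log(body);
```

URLSearchParams handles the percent-encoding of the colons in the field names, so the body does not have to be assembled by hand.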
But this was the response:
I also tried in Java, using JSoup, but that was worse: it returned a lifecycle exception.
I'm stuck. How can I build a working web crawler in this scenario?