How to automatically download from a site that generates its links with JavaScript?


I need to write a program that downloads PDFs from several sites, daily and automatically. This is very easy to do with the C# WebClient class; however, on certain sites the download URL cannot be found anywhere. When the download button is clicked, the site's code calls a JavaScript function, and at no point is a link generated. I have already tried making a WebRequest carrying the session cookies, in an attempt to download the PDF from the server response (I used Fiddler to identify the request), but I did not succeed.
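For reference, the straightforward case (when the PDF URL is known) looks like the sketch below. The URL and the cookie name/value are hypothetical placeholders; on the problematic sites this link is never exposed in the page source, which is exactly the problem:

```csharp
using System.Net;

class DirectDownload
{
    static void Main()
    {
        // Hypothetical URL -- replace with the real PDF address when it is known.
        const string url = "https://example.com/journals/issue.pdf";

        using (var client = new WebClient())
        {
            // Reattach the session cookie captured with Fiddler
            // (hypothetical name and value).
            client.Headers.Add(HttpRequestHeader.Cookie, "ASP.NET_SessionId=abc123");
            client.DownloadFile(url, "issue.pdf");
        }
    }
}
```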

Click "search journals" in the left-hand corner.

Using the WatiN DLL, which simulates a web browser, I can simulate the button click in the browser, but it is not possible to handle Internet Explorer's "save or open file" dialog.

Is there any method of downloading from sites like this?

asked by anonymous 30.05.2014 / 16:33

1 answer


Although the form is submitted by JavaScript, at some point an HTTP request is sent with the data the server needs to return the right file. I suggest you analyze the headers of that HTTP request and see which parameters are sent in the form. I did a test here, and the Form Data comes with some parameters containing random numbers; based on that, I think you can identify which parameter actually drives the PDF request.
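A minimal sketch of replaying that request, assuming an endpoint and field names observed in the capture. The URL and the `docId`/`token` fields are hypothetical; copy the real ones from Fiddler's Form Data pane:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

class FormPostDownload
{
    // Rebuild the form body exactly as it appears in Fiddler's
    // "Form Data" pane. The field names here are hypothetical examples.
    public static Dictionary<string, string> BuildForm(string docId, string token)
    {
        return new Dictionary<string, string>
        {
            { "docId", docId },   // hypothetical document identifier
            { "token", token }    // hypothetical per-session random value
        };
    }

    static async Task Main()
    {
        using (var http = new HttpClient())
        {
            var form = new FormUrlEncodedContent(BuildForm("12345", "abc"));
            // Hypothetical endpoint; use the POST URL shown in the capture.
            var response = await http.PostAsync("https://example.com/getpdf", form);
            response.EnsureSuccessStatusCode();
            File.WriteAllBytes("out.pdf",
                await response.Content.ReadAsByteArrayAsync());
        }
    }
}
```

If one of the random-looking parameters turns out to be a session token, it will have to be scraped from the page or a previous response before each POST.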

For example, in this case the form POST is submitted at line 5287 of the file Common1_2_13.js.

This case is very complicated for a crawler, since much of the code is generated automatically and the site requires a session that expires in 30 minutes.

As for your alternative of simulating a web browser: I have used Selenium, which, if I'm not mistaken, has drivers for several browsers ( link ).
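A minimal sketch with Selenium's C# bindings, assuming Chrome. Setting a default download directory in the profile preferences makes the browser save files silently, which sidesteps the "save or open" dialog problem you hit with WatiN. The page URL and button id are hypothetical:

```csharp
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

class SeleniumDownload
{
    static void Main()
    {
        var options = new ChromeOptions();
        // Send downloads straight to a folder instead of prompting.
        options.AddUserProfilePreference("download.default_directory", @"C:\pdfs");
        options.AddUserProfilePreference("download.prompt_for_download", false);

        using (IWebDriver driver = new ChromeDriver(options))
        {
            // Hypothetical page and button id; adapt to the real site.
            driver.Navigate().GoToUrl("https://example.com/journals");
            driver.FindElement(By.Id("downloadPdf")).Click();
        }
    }
}
```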

31.05.2014 / 03:26