Discovering the real download link in Python

1

I am doing web scraping in Python, and sometimes I come across links and/or buttons that do not contain the actual URL you are redirected to when you click them.

In this case, clicking downloads a PDF file, but I just want to get the URL of the file.

In the case of links, sometimes a javascript: call appears instead of a URL.

In my current case, it is a button that is not inside a form.

  

When I download the file and check what its URL is, I cannot access it directly (by copying and pasting it into the browser's address bar).

     

I'm using selenium and requests!

Does anyone have any idea what this is and how to solve it?

    
asked by anonymous 27.09.2016 / 21:44

1 answer

1

I recommend using Beautiful Soup to parse the HTML.
You can also try using JSON.

As far as I understand, you need a basic grounding in HTTP.
You can install the Firebug add-on for Firefox and analyze what happens when you click the button that downloads the PDF.
For example, I click a download button with Firebug open and see that a POST was made. I take that POST to understand what I need to reproduce when making the GET.

Page: www.meusite.com/lista-pdf
Clicked the download button.
In Firebug:
POST download.do?id25/pdf1/arquivo.pdf
Then just use:
www.meusite.com/download.do?id25/pdf1/arquivo.pdf
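
Since you mentioned requests and selenium, here is a minimal sketch of the idea above, not a definitive solution. It resolves the relative path seen in Firebug against the page URL and then repeats the request from Python. The http:// scheme, the function names build_download_url and fetch_pdf, and the cookie handling are my assumptions. Copying the browser cookies is only a guess at why pasting the URL into the address bar fails (the server may expect your session, or may only accept a POST).

```python
from urllib.parse import urljoin

# Values taken from the example above; the http:// scheme is an assumption.
PAGE_URL = "http://www.meusite.com/lista-pdf"
OBSERVED_PATH = "download.do?id25/pdf1/arquivo.pdf"


def build_download_url(page_url, observed_path):
    """Resolve the relative path seen in the network panel against the page URL."""
    return urljoin(page_url, observed_path)


def fetch_pdf(url, cookies=None, method="GET"):
    """Repeat the observed request and return the response body as bytes.

    `cookies` is an optional list of dicts with "name" and "value" keys,
    which is the shape returned by Selenium's driver.get_cookies().
    Pass method="POST" if the server rejects a GET.
    """
    # Imported here so the URL helper above works without requests installed.
    import requests

    with requests.Session() as session:
        # Reuse the browser session, in case the server checks it.
        for cookie in cookies or []:
            session.cookies.set(cookie["name"], cookie["value"])
        response = session.request(method, url, timeout=30)
        response.raise_for_status()
        return response.content
```

To use it, call build_download_url(PAGE_URL, OBSERVED_PATH) to get the full URL, then pass it to fetch_pdf together with driver.get_cookies() from your Selenium session and write the returned bytes to a file opened in "wb" mode.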

    
29.09.2016 / 21:34