Scrapy for login

Question

I got this code from the internet and changed it a little to log in to the CPFL website, but when I run the command scrapy crawl myproject nothing happens, and the command scrapy runspider items.py gives the error:

No <form> element found in

python web-crawler login scrapy

asked by anonymous 08.08.2018 / 15:44

1 answer

score 1 · Answer 1

The problem is that the username and password form is not on the page you are loading: that page contains only JavaScript code, and the form is built dynamically by that code.

Since Scrapy does not execute JavaScript, it cannot be used this way on this site. That leaves you with two alternatives:

The first alternative is to analyze the page's JavaScript code, find out what it does, and "simulate" it with hand-written Python code. This solution is usually more efficient, but much more complex to implement.

In the specific case of the CPFL website, it seems that submitting the login form triggers an AJAX HTTP POST to https://servicosonline.cpfl.com.br/agencia-webapi/api/token with the following parameters:
```
{
    'client_id': 'agencia-virtual-cpfl-web',
    'grant_type': 'password',
    'username': USER_NAME,
    'password': PASSWORD,
}
```
To find this out, I opened Firefox's developer tools (press F12) and tried to log in; the Network tab then shows every request the page makes.
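To make the shape of that request concrete, here is a minimal sketch that builds (but does not send) the same POST using only the Python standard library. The URL and field names come from the parameters above; the form-encoded body, the helper name `build_login_request`, and the sample credentials are assumptions made for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request

TOKEN_URL = "https://servicosonline.cpfl.com.br/agencia-webapi/api/token"


def build_login_request(username: str, password: str) -> Request:
    """Build (but do not send) the login POST observed in the Network tab."""
    payload = {
        "client_id": "agencia-virtual-cpfl-web",
        "grant_type": "password",
        "username": username,
        "password": password,
    }
    # ASSUMPTION: the body is form-encoded, not JSON.
    body = urlencode(payload).encode("ascii")
    return Request(
        TOKEN_URL,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Placeholder credentials, not real ones.
request = build_login_request("someone@example.com", "not-a-real-password")
print(request.get_method(), request.full_url)
print(request.data.decode("ascii"))
```

Nothing goes over the network here. Printing the encoded body is just a convenient way to compare it against what the browser's Network tab shows for the real login.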
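In Scrapy, this POST can be sent with a FormRequest yielded from a spider callback:
```
# Goes inside a spider method such as start_requests(); needs `import scrapy`.
# USER_NAME and PASSWORD are your own credentials, as strings.
yield scrapy.FormRequest(
    url='https://servicosonline.cpfl.com.br/agencia-webapi/api/token',
    formdata={
        'client_id': 'agencia-virtual-cpfl-web',
        'grant_type': 'password',
        'username': USER_NAME,
        'password': PASSWORD,
    },
    callback=self.after_login,  # called with the login response
)
```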
The code above should probably log you in, but the response will not be a page, just something like 'OK'. You will have to keep inspecting the site in the browser to figure out what to do with that response to get the data you want; logging in is only the beginning of the problem.
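As a rough sketch of what handling that response might look like: a `grant_type` of `password` suggests an OAuth2-style endpoint, which would typically answer with JSON containing an `access_token`. This is a guess, not something confirmed for this site, and both the helper name `extract_access_token` and the sample bodies below are made up for illustration.

```python
import json
from typing import Optional


def extract_access_token(body: str) -> Optional[str]:
    """Return the token from a JSON login response, or None if absent.

    ASSUMPTION: the endpoint replies with OAuth2-style JSON that has an
    "access_token" field. Check the real response in the Network tab.
    """
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        return None  # body was not JSON at all
    if not isinstance(data, dict):
        return None
    token = data.get("access_token")
    return token if isinstance(token, str) else None


# Made-up sample bodies, for illustration only.
sample_json = '{"access_token": "abc123", "token_type": "bearer"}'
sample_plain = "OK"

print(extract_access_token(sample_json))
print(extract_access_token(sample_plain))
```

Inside `after_login`, a helper like this could be called with `response.text`. If a token does come back, later requests would probably need to send it in a header, but which header and format the site expects has to be read from the browser's Network tab.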
The second alternative, and the simplest to implement, is to use Selenium: a library that lets you control a browser such as Chrome or Firefox from Python, so the page's JavaScript actually runs. It is much less efficient, though, because you are running an entire browser.
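A minimal sketch of the Selenium route, assuming Selenium and a Firefox driver are installed. The function name `fetch_rendered_html` and all of its parameters are hypothetical: the login page URL and the CSS selectors for the form fields are not given here and must be found by inspecting the page in the browser.

```python
def fetch_rendered_html(login_url, username, password,
                        user_selector, password_selector,
                        logged_in_selector, timeout=15):
    """Log in through a real browser and return the rendered HTML.

    All selectors are supplied by the caller; none are known for this site.
    """
    # Imported here so the module still loads where Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.keys import Keys
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()
    try:
        driver.get(login_url)
        wait = WebDriverWait(driver, timeout)
        # Wait until the page's JavaScript has built the login form.
        user_field = wait.until(
            EC.presence_of_element_located((By.CSS_SELECTOR, user_selector))
        )
        password_field = driver.find_element(By.CSS_SELECTOR, password_selector)
        user_field.send_keys(username)
        password_field.send_keys(password)
        password_field.send_keys(Keys.RETURN)  # submit the form
        # Wait for an element that only exists after a successful login.
        wait.until(
            EC.presence_of_element_located((By.CSS_SELECTOR, logged_in_selector))
        )
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even on errors
```

The returned HTML can then be parsed with any HTML parser, for example `scrapy.Selector(text=html)`.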

I hope I have put you in the right direction.