How to protect my Scrapyd server from unauthenticated calls?

2

Suppose I have the following Scrapyd deploy configuration in scrapy.cfg:

[deploy]
url = http://example.com/api/scrapyd/
username = user
password = secret
project = projectX

The Scrapyd documentation mentions the username and password options, but apparently the spiders can still be run without authentication.

So the question remains: how do I protect my Scrapyd server from unwanted or unauthenticated calls?

    
asked by anonymous 09.01.2015 / 21:15

2 answers

2

The username / password setting is a client-side configuration for HTTP Basic authentication, which scrapyd does not currently implement.

To set this up on the server, have scrapyd listen only for local connections (127.0.0.1) and put Nginx (or another HTTP proxy) with HTTP authentication in front of it, forwarding the requests to scrapyd.
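To see what the client-side credentials actually amount to, here is a minimal Python sketch using only the standard library. It builds a request with and without an HTTP Basic `Authorization` header and reports the status code the server returns. The helper names `build_request` and `status_of` are hypothetical, and the sketch assumes the API is reachable at the URL from the question. If the call without credentials does not come back as 401, nothing in front of scrapyd is enforcing authentication.

```python
import base64
import urllib.error
import urllib.request


def build_request(url, username=None, password=None):
    """Build a GET request, optionally adding an HTTP Basic Authorization header."""
    request = urllib.request.Request(url)
    if username is not None and password is not None:
        # Basic auth is just base64("username:password") sent in a header.
        raw = f"{username}:{password}".encode("utf-8")
        token = base64.b64encode(raw).decode("ascii")
        request.add_header("Authorization", f"Basic {token}")
    return request


def status_of(request, timeout=10):
    """Send the request and return the HTTP status code.

    A 401 means the server (or a proxy in front of it) demanded credentials.
    """
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code
```

For example, `status_of(build_request("http://example.com/api/scrapyd/listprojects.json"))` probes the server anonymously, while passing `"user"` and `"secret"` as the extra arguments sends the credentials from the question.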

Here's a ready-to-use configuration in a Docker container:

If using Docker is not an option, you can take inspiration from the Dockerfile provided in the repository and do the process manually.

    
10.01.2015 / 23:37

0

You can edit the scrapyd settings and put the following configuration in the ~/.scrapyd.conf file:

[scrapyd]
bind_address = 127.0.0.1

This makes the server reachable only by processes running on the same machine.

If you want password protection, you can also use Apache as a proxy and add Basic authentication. Below is an example of a virtual host:
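As a quick sanity check on that setting, the following Python sketch parses the text of a scrapyd configuration file and reports whether `bind_address` is a loopback address. The function name `bind_address_is_loopback` is hypothetical, and the sketch assumes the option lives in a `[scrapyd]` section and holds an IP literal rather than a hostname.

```python
import configparser
import ipaddress


def bind_address_is_loopback(conf_text):
    """Return True or False if [scrapyd] bind_address is set, or None if it is absent.

    Assumes the value is an IP literal; a hostname such as "localhost"
    would raise ValueError from ipaddress.ip_address.
    """
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    address = parser.get("scrapyd", "bind_address", fallback=None)
    if address is None:
        return None
    return ipaddress.ip_address(address.strip()).is_loopback
```

Calling it with the contents of `~/.scrapyd.conf` gives `True` for `127.0.0.1` and `False` for an address such as `0.0.0.0`, which accepts connections on every interface.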

<VirtualHost *:80>
    ServerName yourserver
    DocumentRoot /var/www/service-status
    <Directory /var/www/service-status/>
        Require valid-user
        Order allow,deny
        Allow from all
        AuthType Basic
        AuthName "Protected"
        AuthUserFile /var/www/service-status/.htpasswd
    </Directory>
    <Location /api/>
        ProxyPass  http://127.0.0.1:40500/
        ProxyPassReverse  http://127.0.0.1:40500/        
    </Location>
    <Proxy *>
        Require valid-user
        AuthType Basic
        AuthName "Protected"
        AuthUserFile /var/www/service-status/.htpasswd
    </Proxy>    
    RewriteEngine  on
    RewriteRule ^/?api$         /api/ [QSA,L,R]
    RewriteRule ^/?jobs(.*)     /api/jobs$1 [QSA,L,R]
    RewriteRule ^/?logs(.*)     /api/logs$1 [QSA,L,R]
    RewriteRule ^/?items(.*)    /api/items$1 [QSA,L,R]
</VirtualHost>

In the example above, the API commands would be available under the /api/ path of the virtual host.

The .htpasswd file needs to be created for the virtual host to work.
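The usual way to create that file is the `htpasswd` utility shipped with Apache. Where it is not available, a line in a format Apache accepts can be produced by hand. The Python sketch below emits an entry in the `{SHA}` scheme (base64 of the SHA-1 digest of the password). The function name `htpasswd_sha1_line` is hypothetical, and this scheme is unsalted and weak, so treat it only as an illustration of the file format; a bcrypt entry generated by `htpasswd` is preferable in practice.

```python
import base64
import hashlib


def htpasswd_sha1_line(username, password):
    """Return one .htpasswd line using Apache's {SHA} scheme (unsalted SHA-1)."""
    digest = hashlib.sha1(password.encode("utf-8")).digest()
    encoded = base64.b64encode(digest).decode("ascii")
    return f"{username}:{{SHA}}{encoded}"
```

Writing the returned line, followed by a newline, to the path named in `AuthUserFile` gives Apache one user to authenticate against.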

    
02.04.2016 / 10:13