I have some PHP crawlers that access pages requiring credentials. The approach varies case by case, since each site has its own form of authentication. In my case, I know the forms involved. For example, take a site whose login page contains the following form:
<form class="onclick-submit card grid-3" accept-charset="utf-8" method="post" action="https://painel2.oculto.net/conectorPainel.php" id="frmLogin" >
<input class="hidden" type="text" name="email" id="txtUserName" value="[email protected]" />
<input class="hidden" type="password" name="senha" id="txtPassword" value="senha" />
<input class="hidden" type="checkbox" name="permanecerlogado" tabindex="6" id="chkRemember" checked="checked" />
<input class="hidden" type="hidden" value="login" name="acao" />
...
</form>
In this case, my PHP crawler authenticates to the site before processing the content:
$curl = new cURL();
$curl->post('https://painel2.oculto.net/conectorPainel.php', '[email protected]&senha=senha&permanecerlogado=1&acao=login');
The site will create a session for my subsequent requests, and the program will have privileged access. I do not even check the site's response, since the chances of a login failure are minimal; and if access is denied for some other reason (connection failure, server down, etc.), the program stops execution and tries again later.
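The snippet above relies on a cURL wrapper class. For reference, here is a minimal sketch of the same POST using only PHP's built-in cURL extension, with a cookie jar so the session survives across requests. The cookie file path and the commented-out credentials are placeholders, not values from the real site:

```php
<?php
// Sketch: log in via POST with PHP's native cURL extension.
// Field names match the form shown above; credentials and paths are placeholders.

function postLogin(string $url, array $fields, string $cookieFile)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => http_build_query($fields), // builds "email=...&senha=..."
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow the post-login redirect
        CURLOPT_COOKIEJAR      => $cookieFile, // persist the session cookie here
        CURLOPT_COOKIEFILE     => $cookieFile, // and send it back on later requests
    ]);
    $body = curl_exec($ch); // false on failure
    curl_close($ch);
    return $body;
}

// Usage (placeholder credentials):
// postLogin('https://painel2.oculto.net/conectorPainel.php', [
//     'email'            => 'user@example.com',
//     'senha'            => 'senha',
//     'permanecerlogado' => 1,
//     'acao'             => 'login',
// ], '/tmp/crawler-cookies.txt');
```

Reusing the same cookie file on every subsequent request is what lets the crawler keep its authenticated session without re-posting the form.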
Most sites therefore require only three basic pieces of information: the form's action URL, the field names, and the values to submit.
But this certainly does not work for every site, since some create a token for each session (e.g. icloud.com) or use some algorithm that makes automation difficult. Those cases require manual, site-specific programming.
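For the simpler token case, the usual workaround is to fetch the login page first and pull the hidden token out of the form before posting. A hedged sketch, assuming the token lives in a hidden `<input>` (the field name `csrf_token` and the sample HTML below are made up for illustration):

```php
<?php
// Sketch: extract a hypothetical per-session token from a login page's HTML
// so it can be included in the login POST alongside the credentials.

function extractHiddenField(string $html, string $fieldName): ?string
{
    $doc = new DOMDocument();
    @$doc->loadHTML($html); // suppress warnings from imperfect real-world markup
    foreach ($doc->getElementsByTagName('input') as $input) {
        if ($input->getAttribute('name') === $fieldName) {
            return $input->getAttribute('value');
        }
    }
    return null; // field not found
}

// Example with a made-up token field:
$html = '<form><input type="hidden" name="csrf_token" value="abc123" /></form>';
echo extractHiddenField($html, 'csrf_token'); // abc123
```

Using `DOMDocument` instead of a regex keeps the extraction robust when the site reorders attributes or changes whitespace; sites that generate the token in JavaScript, however, still fall into the "manual programming" category.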