How to create a robot with PHP? [closed]

2

What is the best way to create a Robot in PHP?

The robot should access a URL with a login and password, fill in certain fields, submit the data, and interpret the result.

The goal is to update an internal database according to the result returned by the query.

The data to be submitted will come from another database.

Any suggestions?

    
asked by anonymous 01.04.2014 / 18:21

1 answer

4

Robots that fetch and interpret information from other pages are also called web crawlers or spiders.

These are scripts that perform the following process:

  1. Request a URL.
  2. Store the response in a variable.
  3. Interpret the response, that is, parse the HTML.
  4. Search for the relevant information.
  5. Process the information obtained.

Steps 1 through 3 are easily handled as follows:

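    // The URL needs its scheme; without it the value is treated as a local path.
    // Loading a remote URL this way requires allow_url_fopen to be enabled.
    $url = 'http://www.exemplo.com';
    $dom = new DOMDocument('1.0');
    $dom->loadHTMLFile($url);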
    

This gives you an object that lets you navigate the HTML as needed.

For example, to get all the links on a page and display their addresses:

    $anchors = $dom->getElementsByTagName('a');
    foreach ($anchors as $element) {
        $href = $element->getAttribute('href');
        echo $href . '<br>';
    }
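
For step 4, searching for the relevant information, DOMXPath can pick out specific elements rather than every tag of one kind. The following is only a sketch: the markup, the ids and the class name are hypothetical and stand in for whatever the target page actually returns.

```php
<?php
// Hypothetical result page, standing in for the HTML returned by a request.
$html = '<html><body>
<div id="result"><span class="status">Approved</span></div>
<table id="items">
  <tr><td>A-100</td><td>12.50</td></tr>
  <tr><td>B-200</td><td>7.00</td></tr>
</table>
</body></html>';

$dom = new DOMDocument('1.0');
// Real pages are often invalid HTML; collect parser warnings instead of printing them.
$previous = libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($dom);

// A single value: the text of the status element inside the result div.
$statusNode = $xpath->query('//div[@id="result"]/span[@class="status"]')->item(0);
$status = $statusNode !== null ? trim($statusNode->textContent) : null;

// Several rows: collect each table row as an array of cell texts.
$rows = array();
foreach ($xpath->query('//table[@id="items"]//tr') as $tr) {
    $cells = array();
    // A relative query, evaluated with the current row as context node.
    foreach ($xpath->query('td', $tr) as $td) {
        $cells[] = trim($td->textContent);
    }
    $rows[] = $cells;
}

echo $status . "\n";
print_r($rows);
```

The values in $status and $rows are what a robot would then write to its own database.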
    

A useful class that helps with handling HTML and saves many lines of code is Simple HTML DOM; a tutorial on how to use it can be found on Make Use Of.

To fill in a form, it is enough to request the URL the form points to using the expected request method, that is, request the URL in the form's action attribute using the request method in its method attribute.
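
Both attributes, along with the field names the form expects, can be read from the page itself. This is a minimal sketch under stated assumptions: the form markup is hypothetical, and describeFirstForm is an illustrative helper, not part of PHP or any library.

```php
<?php
// Hypothetical login form markup, standing in for a fetched page.
$html = '<html><body>
<form action="/login.php" method="post">
  <input type="text" name="user" value="">
  <input type="password" name="pass" value="">
  <input type="hidden" name="token" value="abc123">
  <input type="submit" value="Enter">
</form>
</body></html>';

// Returns the action, method and named input fields of the first form,
// or null when the page has no form. (Illustrative helper.)
function describeFirstForm($html)
{
    $dom = new DOMDocument('1.0');
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $form = $dom->getElementsByTagName('form')->item(0);
    if ($form === null) {
        return null;
    }

    $fields = array();
    foreach ($form->getElementsByTagName('input') as $input) {
        $name = $input->getAttribute('name');
        if ($name !== '') {
            // Keep preset values, such as hidden tokens.
            $fields[$name] = $input->getAttribute('value');
        }
    }

    // HTML forms default to GET when the method attribute is absent.
    $method = strtoupper($form->getAttribute('method'));
    return array(
        'action' => $form->getAttribute('action'),
        'method' => $method !== '' ? $method : 'GET',
        'fields' => $fields,
    );
}

$form = describeFirstForm($html);
echo $form['method'] . ' ' . $form['action'] . "\n";
echo implode(', ', array_keys($form['fields'])) . "\n";
```

Inputs without a name attribute, such as the submit button here, are skipped because browsers do not send them either.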

To simulate this, we change the earlier loadHTMLFile request to:

    $curl = curl_init();
    // Set some options; a user agent is passed here as well
    curl_setopt_array($curl, array(
        // Return the content as a string
        CURLOPT_RETURNTRANSFER => 1,
        CURLOPT_URL => 'http://www.exemplo.com',
        // Name that identifies your robot
        CURLOPT_USERAGENT => 'Name of your crawler',
        // Indicates that the request uses the POST method
        CURLOPT_POST => 1,
        // Parameters sent via POST (array keys must be quoted strings)
        CURLOPT_POSTFIELDS => array(
            'item1' => 'value',
            'item2' => 'value2'
        )
    ));
    
    // Make the request and store the result in $response
    $response = curl_exec($curl);
    
    // Close the request handle
    curl_close($curl);
    
    $dom = new DOMDocument('1.0');
    
    // Parse the string returned by the request
    // Note that the method changed from loadHTMLFile to loadHTML
    $dom->loadHTML($response);
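
The question also mentions a login step. Here is a minimal sketch of one way to handle it, assuming the site uses a cookie-based session and ordinary form-encoded POSTs. The URLs, field names and the helpers buildPostOptions and postForm are hypothetical; only the CURLOPT_* constants come from PHP's cURL extension.

```php
<?php
// Builds cURL options for a form-encoded POST that shares one cookie file.
function buildPostOptions($url, array $fields, $cookieFile)
{
    return array(
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_POST           => true,
        // A query string is sent as application/x-www-form-urlencoded,
        // the default encoding of an HTML form.
        CURLOPT_POSTFIELDS     => http_build_query($fields),
        // Load cookies from, and save cookies to, the same file so the
        // login session carries over to later requests.
        CURLOPT_COOKIEFILE     => $cookieFile,
        CURLOPT_COOKIEJAR      => $cookieFile,
        // Follow the redirect that many sites issue after a login.
        CURLOPT_FOLLOWLOCATION => true,
    );
}

// Sends the POST and returns the response body, or throws on failure.
function postForm($url, array $fields, $cookieFile)
{
    $curl = curl_init();
    curl_setopt_array($curl, buildPostOptions($url, $fields, $cookieFile));
    $response = curl_exec($curl);
    $error = curl_error($curl);
    // The cookie jar is written when the handle is released: here on older
    // PHP versions, or when $curl goes out of scope on PHP 8 and later.
    curl_close($curl);
    if ($response === false) {
        throw new RuntimeException('Request failed: ' . $error);
    }
    return $response;
}

// Usage (hypothetical URLs and field names; left commented out so that
// nothing is requested when the file is loaded):
// $cookies = tempnam(sys_get_temp_dir(), 'robot');
// postForm('http://www.exemplo.com/login', array('user' => 'me', 'pass' => 'secret'), $cookies);
// $response = postForm('http://www.exemplo.com/form', array('item1' => 'value'), $cookies);
```

The first call logs in and stores the session cookie; the second reuses it to submit the data, and its response can be parsed with loadHTML as above.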
    

Learn more about cURL

        
    01.04.2014 / 20:56