Capturing e-mail from pages


I have an email system with the address:

 http://www.site.com.br/123/123/123.php?p=1&codelist=1

that goes up to:

 http://www.site.com.br/123/123/123.php?p=460&codelist=1

(p is the page number, so p = 1 is the first page)

That is, 460 pages. In this system I have all the data of my clients.

I need code that visits these 460 URLs, captures only the e-mail addresses listed on each page, ignores all the other data, and saves the addresses to a TXT file.

Can anyone help me? I have no idea how to do this.

    
asked by anonymous 24.07.2017 / 18:51

1 answer


Example (on ideone): capturing an e-mail address

$url="http://www.site.com.br/123/123/123.php?p=1&codelist=1";
$text=file_get_contents($url);

$res = preg_match_all(
"/[a-z0-9]+[_a-z0-9\.-]*[a-z0-9]+@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4})/i",
$text,
$matches
);

// if the page contains only one e-mail address
if ($res > 0) {
    $email = reset($matches[0])."\n";
    file_put_contents('emails.txt', $email, FILE_APPEND);
}
// end of the single-address case

/* if the page contains more than one e-mail address
foreach($matches[0] as $email){
    file_put_contents('emails.txt', $email."\n", FILE_APPEND);
}
*/

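file_get_contents() reads the entire contents of a file into a string. It also accepts a URL, as in this script, provided allow_url_fopen is enabled in the PHP configuration.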

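preg_match_all() returns an integer with the number of matches found by the regular expression (possibly zero), or false on failure. The matched strings themselves are stored in $matches, with the full matches in $matches[0].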
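As a rough illustration of that return value, the sketch below runs the same pattern on a made-up string. The sample text and both addresses are assumptions for the example, not data from the site.

```php
<?php
// Hypothetical sample text; the addresses are invented for illustration.
$text = 'Contact: ana@example.com, phone 555-0100, backup joao.silva@example.com.br';

// Same pattern as in the script above.
$pattern = "/[a-z0-9]+[_a-z0-9\.-]*[a-z0-9]+@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4})/i";

// $count receives the number of full matches;
// $matches[0] holds the matched strings themselves.
$count = preg_match_all($pattern, $text, $matches);

echo $count, "\n";      // number of addresses found in the sample
print_r($matches[0]);   // the addresses, in the order they appear
```

Note that $matches[1] and $matches[2] also exist, because the pattern contains two parenthesized groups; only $matches[0] is of interest here.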

reset() rewinds the internal array pointer to the first element and returns the value of that element, or false if the array is empty.
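One detail worth checking is what reset() gives back when a page has no address at all. A minimal sketch, using two invented arrays in place of $matches[0]:

```php
<?php
// Hypothetical match lists: one with addresses, one empty (a page without e-mail).
$found = ['ana@example.com', 'joao@example.com'];
$none  = [];

$first   = reset($found); // value of the first element
$nothing = reset($none);  // false, because the array is empty

var_dump($first, $nothing);

// Guarding against the empty case before building the line to write:
$line = ($nothing === false) ? '' : $nothing . "\n";
```

Without such a guard, false concatenated with "\n" becomes a bare line break, so an empty line would end up in the output file for every page without an address.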

file_put_contents() writes a string to a file; if the file does not exist yet, it is created.

FILE_APPEND makes file_put_contents() add the data to the end of an existing file instead of overwriting it.
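The difference between writing with and without FILE_APPEND can be seen in a small experiment. This is only a sketch: it uses a scratch file in the system temp directory (an assumption, so that no real emails.txt is touched) and invented addresses.

```php
<?php
// Hypothetical scratch file, so nothing real is overwritten.
$file = tempnam(sys_get_temp_dir(), 'emails');

file_put_contents($file, "first@example.com\n");               // plain write
file_put_contents($file, "second@example.com\n", FILE_APPEND); // added after the first line
$afterAppend = file_get_contents($file);

file_put_contents($file, "third@example.com\n");               // no flag: replaces everything
$afterOverwrite = file_get_contents($file);

unlink($file); // clean up the scratch file

echo $afterAppend, "---\n", $afterOverwrite;
```

This is why the scripts here pass FILE_APPEND on every call: without it, each page would wipe out the addresses saved from the previous one.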

  

The script above handles a single page, but nothing prevents you from looping and changing the value of p in $url:

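for ($x = 1; $x <= 460; $x++) {
    $url = "http://www.site.com.br/123/123/123.php?p=".$x."&codelist=1";
    $text = file_get_contents($url);
    if ($text === false) {
        continue; // page could not be read; go on to the next one
    }

    $res = preg_match_all(
        "/[a-z0-9]+[_a-z0-9\.-]*[a-z0-9]+@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4})/i",
        $text,
        $matches
    );

    // if the page contains only one e-mail address
    // (for pages with several addresses, use the foreach variant shown earlier)
    if ($res > 0) {
        $email = reset($matches[0])."\n";
        file_put_contents('emails.txt', $email, FILE_APPEND);
    }
    // end of the single-address case
}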
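If the pages list several clients each, a variant that keeps every address and drops duplicates may be more useful. The following is only a sketch under that assumption: extractEmails() and collectEmails() are hypothetical helper names, and the page download is passed in as a callable so the loop can be tried without touching the real site.

```php
<?php
// Same pattern as in the answer above.
const EMAIL_PATTERN = "/[a-z0-9]+[_a-z0-9\.-]*[a-z0-9]+@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4})/i";

/** Returns every address found in $text (hypothetical helper). */
function extractEmails(string $text): array
{
    $count = preg_match_all(EMAIL_PATTERN, $text, $matches);
    return $count ? $matches[0] : []; // 0 matches or false both give an empty list
}

/**
 * Walks pages 1..$lastPage and returns the unique addresses found.
 * $fetchPage is any callable that takes a URL and returns the page
 * text, or false on failure (hypothetical helper).
 */
function collectEmails(callable $fetchPage, int $lastPage): array
{
    $all = [];
    for ($page = 1; $page <= $lastPage; $page++) {
        $url  = "http://www.site.com.br/123/123/123.php?p=" . $page . "&codelist=1";
        $text = $fetchPage($url);
        if ($text === false) {
            continue; // page could not be downloaded; skip it
        }
        foreach (extractEmails($text) as $email) {
            $all[strtolower($email)] = true; // array keys remove duplicates
        }
    }
    return array_keys($all);
}
```

For the real site, the call would presumably be collectEmails('file_get_contents', 460), followed by a single file_put_contents('emails.txt', implode("\n", $emails)."\n") on the returned list. Collecting first and writing once avoids opening the file 460 times, at the cost of losing everything if the script is interrupted midway.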
    
24.07.2017 / 21:45