I use this code to capture links from a particular page:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
$resultado = curl_exec($ch);
preg_match_all('/<a href="(.*)"/i', $resultado, $outros);
However, this regular expression leaves out links such as:
<a name="exemplo" href="link.php"></a>
And if I remove the <a and leave only href, for example:
preg_match_all('/href="(.*)"/i', $resultado, $outros);
Then I also get unwanted matches, such as CSS links, for example:
<link href="link.css">
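To make the problem reproducible without the cURL request, here is a minimal sketch that runs both patterns on a hard-coded string. The $html sample is an assumption: it combines the two snippets above with one hypothetical plain link (first.php), and $strict and $loose are just illustrative variable names.

```php
<?php
// Hypothetical sample markup, assembled from the examples in this question.
$html = '<a href="first.php">one</a>' . "\n"
      . '<a name="exemplo" href="link.php"></a>' . "\n"
      . '<link href="link.css">';

// Pattern 1: only matches when href comes immediately after "<a ".
preg_match_all('/<a href="(.*)"/i', $html, $strict);

// Pattern 2: matches any href attribute, whatever element it belongs to.
preg_match_all('/href="(.*)"/i', $html, $loose);

print_r($strict[1]); // misses link.php
print_r($loose[1]);  // includes link.css
```

With this sample, the first pattern finds only first.php, while the second finds first.php, link.php and also link.css, which is the match I want to avoid.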
What is the ideal regular expression for capturing the href of every a element, without the risk of capturing the href of elements that are not a, such as CSS link elements?