I recommend using the PHP Simple HTML DOM Parser: it is great and very easy to use, and I use it in several scripts to parse HTML from other sites.
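If installing a third-party library is not an option, PHP's built-in DOM extension can do a similar job. The sketch below uses `DOMDocument`, not the Simple HTML DOM Parser API, and assumes the `dom` extension is enabled. The function name `link_hrefs_dom` is hypothetical. Unlike a regular expression, the parser handles tags split across lines, self-closing tags and single-quoted attributes without any extra work.

```php
<?php
// Sketch using PHP's built-in DOMDocument (not the Simple HTML DOM Parser).

/**
 * Hypothetical helper: returns the href of every <link> element in $html.
 */
function link_hrefs_dom(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of printing them,
    // since real-world HTML is often not well-formed.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $hrefs = [];
    foreach ($doc->getElementsByTagName('link') as $link) {
        if ($link->hasAttribute('href')) {
            $hrefs[] = $link->getAttribute('href');
        }
    }
    return $hrefs;
}

// A tag broken across lines and a tag using single quotes.
$html = <<<EOF
<link
rel="shortcut icon"
href="http://localhost/teste/icon.png"
/>
<link href='style.css' rel="stylesheet">
EOF;

print_r(link_hrefs_dom($html));
```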
Bruno Augusto's answer is very good; I just want to complement it with a few details that I think are important to keep in mind. When I need to parse HTML content with a regular expression, I try to write a more complete pattern, because HTML is very irregular: attributes have no defined order, and a tag may be broken across several lines. In your case, I would use this regular expression:
`/<link.*?href=\"([^\"]*?)\".*?\/?>/si`
Basically, the improvements are 3 substitutions:
1 - from `(.*?)` to `([^\"]*?)`, because that is the correct way to match the value: when the attribute delimiter is `"`, the value cannot contain a `"` character (the same goes for the `'` character).
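2 - from `>` to `\/?>`, because there may or may not be a `/` character before the closing `>`. (Strictly speaking, the preceding `.*?` already absorbs that slash, so this change mainly makes the self-closing form explicit.)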
3 - from `/i` to `/si`, because there may be line breaks between attributes, values, etc., and the `s` modifier makes `.` also match newlines. HTML tags on real sites are not always written on a single line; one part of a tag may be on one line and the rest on the next.
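To see which of these substitutions matters for a tag split across lines, both patterns can be run on the same tag, written once on a single line and once broken across lines. This is a minimal sketch; the variable names are illustrative only.

```php
<?php
// Bruno Augusto's original pattern and the more complete one suggested here.
$original = '/<link.*?href="(.*?)".*?>/i';
$improved = '/<link.*?href=\"([^\"]*?)\".*?\/?>/si';

// The same tag written on one line, and broken across several lines.
$oneLine   = '<link rel="shortcut icon" href="http://localhost/teste/icon.png">';
$multiLine = "<link\nrel=\"shortcut icon\"\nhref=\"http://localhost/teste/icon.png\"\n>";

var_dump(preg_match($original, $oneLine));         // int(1)
var_dump(preg_match($original, $multiLine));       // int(0): "." cannot match "\n"
var_dump(preg_match($original . 's', $multiLine)); // int(1): /s lets "." match "\n"
var_dump(preg_match($improved, $multiLine, $m));   // int(1)
echo $m[1], "\n";                                  // http://localhost/teste/icon.png
```

The third call shows that the `s` modifier is what lets the pattern cross the line breaks; for this input, the other two substitutions do not change the result.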
If you use the original regular expression suggested by Bruno Augusto, it may fail to find LINK tags that are broken across lines, for example:
```php
<?php
$string = <<<EOF
<link
rel="shortcut icon"
href="http://localhost/teste/icon.png"
>
EOF;

if ( preg_match_all( '/<link.*?href="(.*?)".*?>/i', $string, $matches, PREG_SET_ORDER ) ) {
    var_dump( $matches );
    die();
} else {
    echo 'No tags found.';
    /* This branch runs because no tags are found: without the "s" modifier, "." does not match the line breaks inside the LINK tag */
}
```
Now, using the same sample code with the more complete regular expression I suggested, the tag is found successfully:
```php
<?php
$string = <<<EOF
<link
rel="shortcut icon"
href="http://localhost/teste/icon.png"
>
EOF;

if ( preg_match_all( '/<link.*?href=\"([^\"]*?)\".*?\/?>/si', $string, $matches, PREG_SET_ORDER ) ) {
    /* Tags found successfully */
    var_dump( $matches );
    die();
}
```
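To reuse the suggested pattern, it can be wrapped in a small function. This is only a sketch of one way to package it: `extract_link_hrefs` is a hypothetical name, and `PREG_PATTERN_ORDER` is used instead of `PREG_SET_ORDER` so that all captured `href` values end up together in `$matches[1]`.

```php
<?php
/**
 * Hypothetical helper: returns the href value of every <link> tag matched
 * by the more complete pattern. Only double-quoted href values are handled.
 */
function extract_link_hrefs(string $html): array
{
    $pattern = '/<link.*?href=\"([^\"]*?)\".*?\/?>/si';
    // With PREG_PATTERN_ORDER, $matches[1] holds every first-group capture.
    if (preg_match_all($pattern, $html, $matches, PREG_PATTERN_ORDER) > 0) {
        return $matches[1];
    }
    return [];
}

// One tag broken across lines and one self-closing tag on a single line.
$html = <<<EOF
<link
rel="shortcut icon"
href="http://localhost/teste/icon.png"
>
<link rel="stylesheet" href="style.css" />
EOF;

print_r(extract_link_hrefs($html));
```

Like the pattern itself, this only handles double-quoted `href` values. Also, because `.*?` with the `s` modifier can run past the end of a `<link>` tag that has no `href`, it may pick up the `href` of a later tag; a DOM parser avoids both problems.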