You can use PHP's native DOMDocument API, combined with curl or file_get_contents to fetch the page, and then apply preg_match to the contents of each script tag. A simple example to illustrate:
<?php
$meuhtml = '
<script type="text/javascript">
var src = "https:www.site.com";
</script>
<script type="text/javascript">
var src = \'https:www.site2.com\';
</script>
';
$doc = new DOMDocument;
$doc->loadHTML($meuhtml);
$tags = $doc->getElementsByTagName('script');
$urls = array();
foreach ($tags as $tag) {
    if (preg_match('#var\s+src(\s+|)=(\s+|)(".*";|\'.*\';)#', $tag->nodeValue, $match)) {
        // Strip the surrounding quotes and the trailing semicolon
        $result = preg_replace('#^["\']|["\'];$#', '', $match[3]);
        $urls[] = $result; // Add to the array
    }
}
// Show all URLs
print_r($urls);
The regex #var\s+src(\s+|)=(\s+|)(".*";|\'.*\';)# is what extracts the value from the text returned by $tag->nodeValue.
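To make the capture groups easier to follow, here is a minimal sketch that runs the same two patterns on a single string. The sample input and the variable names $script, $quoted and $url are assumptions for illustration; in the real loop the input is $tag->nodeValue.

```php
<?php
// Hypothetical sample input, standing in for $tag->nodeValue.
$script = "\nvar src = \"https://www.example.com/a.js\";\n";

$pattern = '#var\s+src(\s+|)=(\s+|)(".*";|\'.*\';)#';

$quoted = null;
$url = null;

if (preg_match($pattern, $script, $match)) {
    // $match[0] is the whole match, $match[1] and $match[2] hold the optional
    // whitespace around "=", and $match[3] is the quoted value plus the ";".
    $quoted = $match[3];

    // Remove the opening quote and the closing quote followed by ";".
    $url = preg_replace('#^["\']|["\'];$#', '', $quoted);
}

echo $quoted, "\n";
echo $url, "\n";
```

Note that .* is greedy, so if one line contains several statements ending in "; the match runs to the last one on that line.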
Of course, the hard-coded HTML above is only there to illustrate the code. To download the data from another site you can use curl, or file_get_contents if allow_url_fopen is enabled in your php.ini. An example with curl:
<?php
$url = 'http://site.com';
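$ch = curl_init($url);
// Return the response as a string instead of printing it directly
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)');
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
$data = curl_exec($ch);
// curl_exec returns false on failure; an empty body is also treated as an error
if (!$data) {
    die('Error');
}
$httpcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
if ($httpcode !== 200) {
    die('Request error');
}
curl_close($ch);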
$doc = new DOMDocument;
$doc->loadHTML($data);
$tags = $doc->getElementsByTagName('script');
$urls = array();
foreach ($tags as $tag) {
    if (preg_match('#var\s+src(\s+|)=(\s+|)(".*";|\'.*\';)#', $tag->nodeValue, $match)) {
        // Strip the surrounding quotes and the trailing semicolon
        $result = preg_replace('#^["\']|["\'];$#', '', $match[3]);
        $urls[] = $result; // Add to the array
    }
}
// Show all URLs
print_r($urls);
Or, if you only want the first URL, change the loop to:
$url = '';
foreach ($tags as $tag) {
    if (preg_match('#var\s+src(\s+|)=(\s+|)(".*";|\'.*\';)#', $tag->nodeValue, $match)) {
        $result = preg_replace('#^["\']|["\'];$#', '', $match[3]);
        $url = $result;
        break; // Leave the foreach as soon as a URL is found
    }
}
echo $url;
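The file_get_contents route mentioned earlier is not shown above, so here is a rough sketch of how it might look. The helper names extractSrcUrls and fetchHtml are hypothetical, and the use of libxml_use_internal_errors to silence parser warnings on malformed pages is an addition that is not part of the original examples.

```php
<?php
// Hypothetical helper: collect every `var src = "...";` value found in <script> tags.
function extractSrcUrls(string $html): array
{
    // Assumption: real pages are rarely valid HTML, so keep libxml warnings quiet.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument;
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $urls = array();
    foreach ($doc->getElementsByTagName('script') as $tag) {
        if (preg_match('#var\s+src(\s+|)=(\s+|)(".*";|\'.*\';)#', $tag->nodeValue, $match)) {
            $urls[] = preg_replace('#^["\']|["\'];$#', '', $match[3]);
        }
    }
    return $urls;
}

// Hypothetical helper: download a page with file_get_contents instead of curl.
function fetchHtml(string $url): string
{
    if (!ini_get('allow_url_fopen')) {
        throw new RuntimeException('allow_url_fopen is disabled in php.ini');
    }
    $context = stream_context_create(array(
        'http' => array(
            'user_agent' => 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)',
            'follow_location' => 1,
        ),
    ));
    // Returns false on network errors and on HTTP error statuses.
    $data = @file_get_contents($url, false, $context);
    if ($data === false) {
        throw new RuntimeException('Request failed');
    }
    return $data;
}

// Usage (needs network access):
// print_r(extractSrcUrls(fetchHtml('http://site.com')));
```

Keeping the download and the extraction in separate functions means the parsing part can be tried on a literal HTML string without touching the network.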