Simple HTML DOM: how to get a link inside a "text/javascript" script tag?

-1

How do I get the URL inside:

<script type="text/javascript">
    var src = "https:www.site.com";
</script>

I have searched, but I could not adapt the examples I found to what I need.

The code looks like this:

include('simple_html_dom.php');
$page = 'www.site.com';
$html = new simple_html_dom();
$html->load_file($page);

$links = array(); 
foreach($html->find('script') as $element) {
   $links[] = $element;
   echo $element;
}

reset($links);

What I want is to get the link inside this tag:

<script type="text/javascript">
  var src = "https:www.site.com";
</script>

Returning only this: https:www.site.com

    
asked by anonymous 14.05.2017 / 03:43

2 answers

1

You can use PHP's native DOMDocument API, combined with curl or file_get_contents to download the page, and then use preg_match to extract the value. A simple example to illustrate:

<?php
$meuhtml = '
<script type="text/javascript">
    var src = "https:www.site.com";
</script>
<script type="text/javascript">
    var    src    = \'https:www.site2.com\';
</script>
';

$doc = new DOMDocument;
$doc->loadHTML($meuhtml);

$tags = $doc->getElementsByTagName('script');

$urls = array();

foreach ($tags as $tag) {
    if (preg_match('#var\s+src(\s+|)=(\s+|)(".*";|\'.*\';)#', $tag->nodeValue, $match)) {
        $result = preg_replace('#^["\']|["\'];$#', '', $match[3]);
        $urls[] = $result; // Add to the array
    }
}

// Show all URLs
print_r($urls);

The pattern #var\s+src(\s+|)=(\s+|)(".*";|\'.*\';)# is what extracts the value from the text returned by $tag->nodeValue; preg_replace then strips the surrounding quotes and the trailing semicolon.
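To make the pattern easier to follow, here is a minimal sketch that applies the same idea to a plain string, without DOMDocument. The function name extractSrcUrl and the backreference-based pattern are illustrative assumptions, not part of the code above; the pattern is a slightly stricter variant that captures the value without its quotes, so no preg_replace step is needed.

```php
<?php
// Hypothetical helper (not from the original answer): returns the value
// assigned to "var src" in a piece of JavaScript, or null if it is absent.
function extractSrcUrl(string $js): ?string
{
    // var\s+src   the declaration, with any amount of whitespace in between
    // \s*=\s*     the equals sign, optionally surrounded by whitespace
    // (["\'])     group 1: the opening quote, single or double
    // (.*?)       group 2: the value itself, matched lazily
    // \1\s*;      the same quote that opened the string, then the semicolon
    $pattern = '#var\s+src\s*=\s*(["\'])(.*?)\1\s*;#';

    if (preg_match($pattern, $js, $match)) {
        return $match[2];
    }

    return null;
}

var_dump(extractSrcUrl('var src = "https:www.site.com";'));
var_dump(extractSrcUrl("var    src    = 'https:www.site2.com';"));
var_dump(extractSrcUrl('var other = 1;'));
```

The lazy (.*?) stops at the first matching closing quote, which matters if a script contains more than one quoted string on the same line.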

Of course, the example above uses a hard-coded HTML string just to explain the code. To download the data from another site you can use curl, or file_get_contents if allow_url_fopen is enabled in your php.ini. An example with curl:

<?php
$url = 'http://site.com';

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)');
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
// Return the response as a string instead of printing it
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$data = curl_exec($ch);

if (!$data) {
    die('Error');
}

$httpcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);

if ($httpcode !== 200) {
    die('Request error');
}

curl_close($ch);

$doc = new DOMDocument;
$doc->loadHTML($data);

$tags = $doc->getElementsByTagName('script');

$urls = array();

foreach ($tags as $tag) {
    if (preg_match('#var\s+src(\s+|)=(\s+|)(".*";|\'.*\';)#', $tag->nodeValue, $match)) {
        $result = preg_replace('#^["\']|["\'];$#', '', $match[3]);
        $urls[] = $result; // Add to the array
    }
}

// Show all URLs
print_r($urls);

Or, if you only want the first URL, change the foreach loop to:

$url = '';

foreach ($tags as $tag) {
    if (preg_match('#var\s+src(\s+|)=(\s+|)(".*";|\'.*\';)#', $tag->nodeValue, $match)) {
        $result = preg_replace('#^["\']|["\'];$#', '', $match[3]);
        $url = $result;

        break; // Stop the foreach as soon as the URL is found
    }
}

echo $url;
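For the file_get_contents route mentioned earlier in this answer, a rough sketch could look like the following. It assumes allow_url_fopen is enabled for remote URLs; the function name fetchHtml and the user agent string are placeholders, not part of the original code.

```php
<?php
// Hypothetical helper: downloads a page with file_get_contents instead of curl.
// Returns the body as a string, or null when the download fails.
function fetchHtml(string $url): ?string
{
    // Send a User-Agent header and follow redirects, as the curl example does.
    $context = stream_context_create(array(
        'http' => array(
            'user_agent'      => 'Mozilla/5.0 (compatible; ExampleBot/1.0)',
            'follow_location' => 1,
        ),
    ));

    // @ suppresses the warning emitted on failure; false signals the error.
    $data = @file_get_contents($url, false, $context);

    return $data === false ? null : $data;
}

// Remote URLs only work when allow_url_fopen is enabled in php.ini.
if (!ini_get('allow_url_fopen')) {
    echo "allow_url_fopen is disabled; use curl instead\n";
}
```

The returned string can then be passed to $doc->loadHTML() in place of the curl result.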
    
14.05.2017 / 05:04

0

You can simply use PHP's DOMXPath, basically like this:

$html = "your HTML obtained via file_get_contents or cURL...";

$DOM = new DOMDocument;
$DOM->loadHTML($html);

$XPath = new DomXPath($DOM);

$TagScriptJavascript = $XPath->query('//script[@type="text/javascript"]');

foreach($TagScriptJavascript as $item){

    if(preg_match('/var src = "(.*)";/', $item->nodeValue, $url)){

        echo $url[1];

    }

}

Explanations:

  • First, load your HTML into a DOMDocument, however you obtained it.

  • The XPath query //script[@type="text/javascript"] selects every script element whose type is text/javascript, and $TagScriptJavascript holds the resulting list.

  • The foreach then runs preg_match on each selected script and, if it finds the var src assignment, prints the URL.
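The steps above can be sketched as a single function. This is only an illustration under a few assumptions: the name findSrcUrls, the extra contains() predicate, and the use of libxml_use_internal_errors to silence parser warnings are additions that are not in the answer's code, and the pattern is loosened to accept single or double quotes.

```php
<?php
// Hypothetical helper: collects every "var src" value found in
// <script type="text/javascript"> elements of an HTML string.
function findSrcUrls(string $html): array
{
    // Real-world pages often contain markup that libxml complains about;
    // collect those warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument;
    $dom->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);

    // Coarse pre-filter: scripts of the right type whose text mentions "src".
    $scripts = $xpath->query('//script[@type="text/javascript"][contains(., "src")]');

    $urls = array();
    foreach ($scripts as $script) {
        // The regex does the exact matching and accepts either quote style.
        if (preg_match('/var\s+src\s*=\s*(["\'])(.*?)\1\s*;/', $script->nodeValue, $match)) {
            $urls[] = $match[2];
        }
    }

    return $urls;
}

$html = '<html><head>
<script type="text/javascript">var src = "https:www.site.com";</script>
<script type="text/javascript">var other = 1;</script>
</head><body></body></html>';

print_r(findSrcUrls($html));
```

Returning an array instead of echoing inside the loop lets the caller decide whether to use all matches or only the first one.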

        
    14.05.2017 / 05:10