Check if there is a tag within the site [closed]

2

My system generates a specific tag that my clients must add inside the body of their index page. My problem is how to check whether this tag is present on the site given only its URL, without being told the file name. For example:

The client provides their URL, www.teste.com.br, and my system accesses it and verifies that the tag is there.

Currently I generate an .html file, the client uploads it to their hosting, and my system checks whether that file contains a certain string:

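    $file_download = "1__".md5('1').".html";
    $url = "http://www.site.com.br/".$file_download;

    $arquivo = $url;
    // fopen returns false if the file cannot be opened
    $handle = @fopen($arquivo, "rb");
    // read the first 100 bytes only when the handle is valid
    $cont = $handle ? @fread($handle, 100) : false;
    if ($handle) {
        fclose($handle);
    }

    if ($cont === "verifica-licenca") {
        echo 1;
    } else {
        echo 0;
    }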

Thanks

    
asked by anonymous 21.01.2017 / 14:15

3 answers

4

Well, I think the code is quite simple. First you need the site's source code, which you can get as follows:

$codigofonte=file_get_contents("http://www.google.com");

With the source code in hand, finding your tag is simple. I used a regular expression to search for it. I only care whether it matched, because a match means your tag exists.

$result=preg_match_all($tag,$codigofonte,$valorsubstituido);

In this line, the variable $result holds the number of matches found (0 if there are none), or false if an error occurred. Any non-zero value therefore means the tag was found.
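To make the return value concrete, here is a small sketch that runs the same search on inline strings instead of a live site. The sample markup is a placeholder, and the use of preg_quote is an addition not present in the answer: it is assumed here so that special characters in a real tag are matched literally.

```php
<?php
// Hypothetical sample pages, used only for illustration.
$withTag    = '<body><minhatag></minhatag><minhatag></minhatag></body>';
$withoutTag = '<body><p>nothing here</p></body>';

// preg_quote escapes regex metacharacters; '/' is passed as the delimiter.
$pattern = '/' . preg_quote('<minhatag>', '/') . '/i';

// preg_match_all returns the number of full matches, or false on error.
$found    = preg_match_all($pattern, $withTag, $matches);
$notFound = preg_match_all($pattern, $withoutTag, $matches);

var_dump($found);    // int(2)
var_dump($notFound); // int(0)
```

Because 0 is falsy in PHP, a check such as if(!$result) treats "no match" and "error" the same way, which is usually what is wanted for a presence test.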

Here is the complete code:

<?php

// fetch the site's source code
$codigofonte = file_get_contents("http://www.google.com");

// your tag, written as a regular expression
$tag = '/<minhatag>/';

// search for your tag in the source code
$result = preg_match_all($tag, $codigofonte, $valorsubstituido);

if (!$result) {
    echo "not found";
} else {
    echo "found";
}

?>
    
21.01.2017 / 15:19
5

Another way would be to use PHP's DOMDocument, especially for those who do not like regex. :P

Get the source code:

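$ch = curl_init('http://seu_site_alvo.com');
curl_setopt_array($ch, [

    CURLOPT_FOLLOWLOCATION => true,         // follow redirects
    CURLOPT_RETURNTRANSFER => true,         // return the body instead of printing it
    CURLOPT_IPRESOLVE => CURL_IPRESOLVE_V4, // resolve to IPv4 addresses only
    CURLOPT_DNS_USE_GLOBAL_CACHE => true,
    CURLOPT_DNS_CACHE_TIMEOUT => 120,       // keep DNS entries for 120 seconds
    // Note: the two options below disable TLS certificate checks
    CURLOPT_SSL_VERIFYHOST => false,
    CURLOPT_SSL_VERIFYPEER => false

]);
$html = curl_exec($ch); // false on failure
curl_close($ch);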
  

You can use file_get_contents or fopen as well.
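As a rough sketch of the file_get_contents alternative, the helper below fetches a page through a stream context with a timeout. The function name fetchHtml, the default timeout, and the user agent string are all hypothetical choices made for this example, not part of the original answer.

```php
<?php
// Hypothetical helper: returns the page body, or null if the request fails.
function fetchHtml(string $url, int $timeoutSeconds = 10): ?string
{
    // Options under the 'http' key apply to http:// and https:// URLs.
    $context = stream_context_create([
        'http' => [
            'timeout'         => $timeoutSeconds,   // give up on slow hosts
            'follow_location' => 1,                 // follow redirects
            'user_agent'      => 'tag-checker/1.0', // hypothetical agent name
        ],
    ]);

    // @ suppresses the warning; failure is reported through the return value.
    $html = @file_get_contents($url, false, $context);

    return $html === false ? null : $html;
}

// Usage, assuming the target site from the snippet above:
// $html = fetchHtml('http://seu_site_alvo.com');
```

Returning null on failure lets the caller distinguish "site unreachable" from "site reachable but tag missing" before handing the HTML to the DOM.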

Then use the DOM:

// Start the DOM and XPath:
libxml_use_internal_errors(true); // suppress warnings from malformed real-world HTML
$DOM = new DOMDocument;
$DOM->loadHTML($html);
$XPath = new DOMXPath($DOM);

// Count every '<a>' whose 'href' is 'http://www.site.com.br' and whose 'class' contains 'authority'.
if($XPath->evaluate("//a[contains(@class, 'authority') and @href='http://www.site.com.br']")->length >= 1){

        echo 'Found';

}

This way you can tell, without using regex, whether the page contains the HTML you are looking for. The code above also treats this as valid:

<a href="http://www.site.com.br" class="authority outra_coisa"></a>

If you do not want that, use @class='authority' instead of contains(@class, 'authority'), for example.
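A middle ground between the two forms is to match authority as a whole class token, so that extra classes are allowed but names such as authority-old are not. The sketch below is one common XPath 1.0 idiom for this. The sample markup, including the authority-old class, is invented for the example.

```php
<?php
// Hypothetical sample markup: only the first link has the exact class token.
$html = '<html><body>'
      . '<a href="http://www.site.com.br" class="authority outra_coisa"></a>'
      . '<a href="http://www.site.com.br" class="authority-old"></a>'
      . '</body></html>';

libxml_use_internal_errors(true); // keep malformed-HTML warnings quiet
$DOM = new DOMDocument;
$DOM->loadHTML($html);
$XPath = new DOMXPath($DOM);

// Substring test: also matches "authority-old".
$loose = "//a[contains(@class, 'authority')]";

// Token test: pad the class list with spaces and look for " authority ".
$token = "//a[contains(concat(' ', normalize-space(@class), ' '), ' authority ')]";

echo $XPath->query($loose)->length, "\n"; // 2
echo $XPath->query($token)->length, "\n"; // 1
```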

    
21.01.2017 / 23:17
1

I solved it as follows:

    // fetch the site's source code
    $codigofonte = file_get_contents('http://www.google.com.br');

    // your tag
    $tag = '<a href="http://www.site.com.br" id="authority"></a>';

    // search for your tag in the source code
    if (stripos(strtolower($codigofonte), $tag) !== false) {
        // found
        echo 1;
    } else {
        // not found
        echo 0;
    }
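
The same check can be wrapped in small helpers, sketched below on an inline sample page. The function names normalizeUrl and containsTag are hypothetical, and the scheme handling is an assumption based on the question, where the client supplies a URL such as www.teste.com.br without http://. Since stripos is already case-insensitive, the sketch omits strtolower.

```php
<?php
// Hypothetical helper: prepend a scheme when the client omits it.
function normalizeUrl(string $url): string
{
    $url = trim($url);
    return preg_match('#^https?://#i', $url) ? $url : 'http://' . $url;
}

// Hypothetical helper: case-insensitive substring check for the tag.
function containsTag(string $html, string $tag): bool
{
    return stripos($html, $tag) !== false;
}

$tag  = '<a href="http://www.site.com.br" id="authority"></a>';
// Sample page content, invented for this example.
$page = '<BODY><A HREF="http://www.site.com.br" ID="authority"></A></BODY>';

echo normalizeUrl('www.teste.com.br'), "\n"; // http://www.teste.com.br
var_dump(containsTag($page, $tag));          // bool(true)
```

Note that a plain substring comparison only succeeds when the client pastes the tag exactly as issued: different attribute order or extra whitespace will make it fail.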
    
21.01.2017 / 18:29