Detect if link is current or external domain with PHP


I'm currently using this expression with preg_replace to detect links in content sent via POST:

$conteudo = $_POST["conteudo"];

$conteudo = preg_replace('!(\s|^)((https?://|www\.)+[a-z0-9_./?=&-]+)!i', ' <a class="link_externo" href="$2" target="_blank">$2</a>', $conteudo);

echo $conteudo;

I would like to know whether a link sent via POST points to my own site or is an external link. If it is from my site, it should get the class link_interno; if it is external, it should get the class link_externo and target="_blank".

I would also like the expression to accept some symbols, because if I send a link such as www.site.com.br/teste!apenasumteste or www.site.com.br/teste#apenasumteste, it only detects the link up to the ! or #.

How can I do this?

    
asked by anonymous 10.05.2015 / 05:03

1 answer


One way to get the current domain is to read the SERVER_NAME key of the $_SERVER array.

  

SERVER_NAME: The host name of the server where the current script is running. If the script is running on a virtual host, this will be the value set for that virtual host.

     

Note: $_SERVER is an array containing information such as headers, paths, and script ... There is no guarantee that every web server will provide any of these; servers may omit some, or provide others [...].
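Because SERVER_NAME is not guaranteed to exist, a minimal sketch could fall back to a fixed domain when it is missing. The function name obterDominio and the fallback value www.site.com.br are hypothetical, chosen only for illustration:

```php
<?php
// Hypothetical helper: returns the current domain, or a fixed fallback
// when the web server does not provide SERVER_NAME.
function obterDominio($padrao = 'www.site.com.br') {
    if (isset($_SERVER['SERVER_NAME']) && $_SERVER['SERVER_NAME'] !== '') {
        return $_SERVER['SERVER_NAME'];
    }
    // Assumed fallback; replace with your own domain
    return $padrao;
}

echo obterDominio(), "\n";
```

A fixed fallback is used here rather than HTTP_HOST, because HTTP_HOST comes from the request headers sent by the client.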

To extract parts of a link, such as the host, use the parse_url function; then, inside a function, check whether the extracted host matches your site's domain:

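function verificarLink($link, $dominio) {
  // parse_url() only finds a host when a scheme is present, so add one to "www..." links
  if (!preg_match('~^[a-z][a-z0-9+.-]*://~i', $link)) $link = 'http://' . $link;
  $info = parse_url($link);
  $host = isset($info['host']) ? $info['host'] : "";
  // Internal only if a host was found and it matches the domain (case-insensitive)
  return $host !== "" && strcasecmp($host, $dominio) == 0;
}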
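One detail worth checking: parse_url() only reports a host when the link includes a scheme. Links written as www.site.com.br/... (as in the question) come back with everything in path, so a host check based only on parse_url() would treat them as external unless a scheme is added first. A quick check of both cases:

```php
<?php
// With a scheme, the host and the fragment are recognized
$comEsquema = parse_url('http://www.site.com.br/teste#apenasumteste');
echo $comEsquema['host'], "\n";     // www.site.com.br
echo $comEsquema['fragment'], "\n"; // apenasumteste

// Without a scheme, the whole string is treated as a path
$semEsquema = parse_url('www.site.com.br/teste');
var_dump(isset($semEsquema['host'])); // bool(false)
echo $semEsquema['path'], "\n";       // www.site.com.br/teste
```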

To use it:

$link = "http://www.site.com.br/teste!apenasumteste";
$dominio = $_SERVER['SERVER_NAME'];

if (verificarLink($link, $dominio)) {
    echo "Domínio interno!";
} else {
    echo "Domínio externo!";
}
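SERVER_NAME may be configured as site.com.br while links are written as www.site.com.br (or the other way around); a strict comparison then reports an internal link as external. If both forms should count as the same site, one possible approach (an assumption, not part of the original answer) is to strip a leading www. from both sides before comparing. The name mesmoDominio is hypothetical:

```php
<?php
// Hypothetical helper: compares two host names, ignoring case and a leading "www."
function mesmoDominio($host, $dominio) {
    $normalizar = function ($h) {
        return preg_replace('/^www\./i', '', strtolower($h));
    };
    return $host !== '' && $normalizar($host) === $normalizar($dominio);
}

var_dump(mesmoDominio('www.site.com.br', 'site.com.br')); // bool(true)
var_dump(mesmoDominio('outro.com', 'site.com.br'));       // bool(false)
```

It could take the place of the strcasecmp() comparison inside verificarLink().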

Update

Following this answer on SOEn (Stack Overflow in English), you can use the regular expression below to extract links:

(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s'!()\[\]{};:'\".,<>?«»“”‘’]))
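This is why the pattern solves the ! and # problem from the question: the original character class [a-z0-9_./?=&-] simply does not contain ! or #, so the match stops there, whereas this pattern accepts any non-space character inside the URL and only restricts the last character (so trailing punctuation such as . or , is left out). A small side-by-side comparison with preg_match():

```php
<?php
$texto = 'Veja www.site.com.br/teste!apenasumteste agora';

// Pattern from the question: stops at the first character outside its class
preg_match('!(\s|^)((https?://|www\.)+[a-z0-9_./?=&-]+)!i', $texto, $antigo);
echo $antigo[2], "\n"; // www.site.com.br/teste

// Pattern from this answer: keeps "!" because it is not the last character
$expressao = "%(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s'!()\[\]{};:'\".,<>?«»“”‘’]))%";
preg_match($expressao, $texto, $novo);
echo $novo[0], "\n"; // www.site.com.br/teste!apenasumteste
```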

This regular expression can be used with the preg_match_all function to extract all links from a string:

function extrairLinks($conteudo){
    $expressao = "%(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s'!()\[\]{};:'\".,<>?«»“”‘’]))%";
    preg_match_all($expressao, $conteudo, $resultados);
    // $resultados[0] holds the full matches (an empty array when no link is found)
    return $resultados[0];
}
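With the default PREG_PATTERN_ORDER, preg_match_all() fills $resultados with one array per capture group; index 0 holds the full matches and is simply an empty array when nothing is found, so it can be returned as is. A short check of both cases:

```php
<?php
$expressao = "%(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s'!()\[\]{};:'\".,<>?«»“”‘’]))%";

// Two links; the trailing "," and "." are not part of the URLs
$texto = 'Links: www.site.com.br/teste#apenasumteste, http://exemplo.com/pagina?x=1. Fim.';
$total = preg_match_all($expressao, $texto, $resultados);
echo $total, "\n"; // 2
echo implode("\n", $resultados[0]), "\n";

// No links: the count is 0 and index 0 is an empty array
$vazio = preg_match_all($expressao, 'Nenhum link aqui', $nada);
echo $vazio, "\n"; // 0
```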

And to use it:

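$conteudo = $_POST["conteudo"];
$dominio = $_SERVER['SERVER_NAME'];
$links = extrairLinks($conteudo);

foreach ($links as $link) {
    // "www..." links need a scheme, otherwise the browser treats href as a relative path
    $url = preg_match('~^https?://~i', $link) ? $link : 'http://' . $link;
    // Escape before writing into HTML
    $href = htmlspecialchars($url, ENT_QUOTES);
    $texto = htmlspecialchars($link, ENT_QUOTES);
    if (verificarLink($url, $dominio)) {
        // Internal link: link_interno class, opens in the same tab
        echo '<a class="link_interno" href="' . $href . '">' . $texto . '</a>' . "<br>";
    } else {
        // External link: link_externo class, opens in a new tab
        echo '<a class="link_externo" href="' . $href . '" target="_blank">' . $texto . '</a>' . "<br>";
    }
}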
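The loop above prints only the links. If, as in the question's preg_replace call, the links should be replaced inside the original text, a possible sketch uses preg_replace_callback() with the same pattern. The function name converterLinks is hypothetical; it assumes scheme-less links should be treated as http://, and note that only the links are escaped here, not the surrounding text:

```php
<?php
// Hypothetical helper: replaces every link in $conteudo with an <a> tag, keeping
// the surrounding text. Internal links get link_interno; external ones get
// link_externo and target="_blank".
function converterLinks($conteudo, $dominio) {
    $expressao = "%(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s'!()\[\]{};:'\".,<>?«»“”‘’]))%";
    return preg_replace_callback($expressao, function ($m) use ($dominio) {
        $link = $m[0];
        // Assumption: links without a scheme ("www...") are treated as http://
        $url = preg_match('~^https?://~i', $link) ? $link : 'http://' . $link;
        $host = parse_url($url, PHP_URL_HOST);
        $interno = is_string($host) && strcasecmp($host, $dominio) === 0;
        $href = htmlspecialchars($url, ENT_QUOTES);
        $texto = htmlspecialchars($link, ENT_QUOTES);
        if ($interno) {
            return '<a class="link_interno" href="' . $href . '">' . $texto . '</a>';
        }
        return '<a class="link_externo" href="' . $href . '" target="_blank">' . $texto . '</a>';
    }, $conteudo);
}

echo converterLinks('Veja www.site.com.br/teste!apenasumteste e http://exemplo.com', 'www.site.com.br'), "\n";
```

In a real page, $dominio would come from $_SERVER['SERVER_NAME'] and $conteudo from $_POST["conteudo"], as in the loop above.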
    
10.05.2015 / 06:12