How to extract a word from a URL in PHP


For the examples below:

  + bbbbbbb2.virtua.com.br - take virtua
  + 000-74-4-000.paemt702.dsl.brasiltelecom.net.br - take brasiltelecom
  + 111.222.22.222.dynamic.adsl.gvt.net.br - take gvt

I've tried:

$texto = "189-72-5-240.paemt702.dsl.brasiltelecom.net.br";
echo substr($texto,-10); // prints "com.net.br": always the last 10 characters

But this cuts a fixed number of characters, so the result is wrong depending on the length of the hostname that gets written to the DB.

    
asked by anonymous 29.10.2014 / 20:44

3 answers


If you want the antepenultimate piece of the URL:

This solution works with all the examples given in the question:

$pedacos = explode('.', $texto);      // split the host on each dot
echo $pedacos[count($pedacos) - 3];   // antepenultimate piece (third from the end)

Inputs:

$texto = "189-72-5-240.paemt702.dsl.brasiltelecom.net.br";  
$texto = "+bbbbbbb2.virtua.com.br";
$texto = "+111.222.22.222.dynamic.adsl.gvt.net.br";

Outputs:

brasiltelecom
virtua
gvt
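As a minimal sketch, the same logic can be wrapped in a function with a length check. The name antepenultimo() and the choice to return null for hosts with fewer than three pieces are assumptions for illustration, not part of the answer above:

```php
<?php
// Hypothetical helper: returns the antepenultimate dot-separated piece,
// or null when the host has fewer than three pieces.
function antepenultimo(string $host): ?string
{
    $pedacos = explode('.', $host);
    $n = count($pedacos);
    return $n >= 3 ? $pedacos[$n - 3] : null;
}

foreach ([
    "189-72-5-240.paemt702.dsl.brasiltelecom.net.br", // brasiltelecom
    "bbbbbbb2.virtua.com.br",                         // virtua
    "111.222.22.222.dynamic.adsl.gvt.net.br",         // gvt
] as $texto) {
    echo antepenultimo($texto), PHP_EOL;
}
```

The guard avoids an "undefined offset" notice when a short host such as "usp.br" shows up in the list.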


The problem with using a fixed position:

If the list contains addresses with different suffixes, a predetermined position can be a problem, as in the following examples:

$texto = "bbbbbbb2.virtua.com.br";
$texto = "www.usp.br";  
$texto = "66-97-12-89.datalink.net";  

Outputs:

virtua       So far so good...
www          ... but here it should be "usp"...
66-97-12-89  ... and here it should be "datalink"!

To solve the problem, follow the ...:


Solution for addresses with multiple suffixes:

To determine which part is the suffix and which part is the domain name itself, you need a list of "official" suffixes, so you know what can and cannot be stripped from the end of the address.

Mozilla maintains a list of suffixes at this link.

This function solves the problem well, as long as you fill in the suffixes you care about:

function NomeDoDominio( $dominio ) {
    // the array must be sorted from longest to shortest suffix,
    // otherwise '.br' would match before '.com.br'
    $sufixos = array( '.com.br', '.net.br', '.org.br', '.com', '.net', '.br' );
    foreach( $sufixos as $sufixo ) {
       // strip the first suffix that matches the end of the address
       if( $sufixo === substr( $dominio , -strlen( $sufixo ) ) ) {
          $dominio = substr( $dominio , 0, -strlen( $sufixo ) );
          break;
       }
    }
    // return whatever comes after the last remaining dot
    return substr( strrchr( '.'.$dominio, '.'), 1);
}

See it working on Ideone.

Note: in Brazil, for example, an address can be www.jose.silva.nom.br, which complicates things further.
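If you would rather not depend on the order of the suffix array, one option is to test the candidate suffixes label by label, from longest to shortest. This is only a sketch: dominioPorSufixo() and the sample suffix list are hypothetical, suffixes are written without the leading dot, and it ignores the wildcard and exception rules that Mozilla's full list also contains:

```php
<?php
// Sketch: order-independent suffix lookup. The suffix set is a small
// hypothetical sample; a real setup would load Mozilla's full list.
function dominioPorSufixo(string $host, array $sufixos): ?string
{
    $conjunto = array_flip($sufixos); // suffix => index, for isset() lookups
    $labels = explode('.', $host);
    $n = count($labels);
    // Start at $i = 1 so at least one label stays in front of the suffix.
    // The first match found is the longest known suffix.
    for ($i = 1; $i < $n; $i++) {
        $candidato = implode('.', array_slice($labels, $i));
        if (isset($conjunto[$candidato])) {
            return $labels[$i - 1]; // the label right before the suffix
        }
    }
    return null; // no known suffix at the end of the host
}

$sufixos = ['br', 'com.br', 'net.br', 'org.br', 'nom.br', 'com', 'net'];

echo dominioPorSufixo('bbbbbbb2.virtua.com.br', $sufixos), PHP_EOL;   // virtua
echo dominioPorSufixo('www.usp.br', $sufixos), PHP_EOL;               // usp
echo dominioPorSufixo('66-97-12-89.datalink.net', $sufixos), PHP_EOL; // datalink
```

Since the longest candidate is always tried first, the array can be in any order; with 'nom.br' in the set, www.jose.silva.nom.br returns silva.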

    
29.10.2014 / 20:49

Going by your question, whatever the URL, one thing is constant: the value you want is always the third word counting from the end:

┌─────────────────────────────────────────────────┬───────────────┐
│ URL address                                     │ Value to get  │
├─────────────────────────────────────────────────┼───────────────┤
│ +bbbbbbb2.virtua.com.br                         │ virtua        │
├─────────────────────────────────────────────────┼───────────────┤
│ +000-74-4-000.paemt702.dsl.brasiltelecom.net.br │ brasiltelecom │
├─────────────────────────────────────────────────┼───────────────┤
│ +111.222.22.222.dynamic.adsl.gvt.net.br         │ gvt           │
└─────────────────────────────────────────────────┴───────────────┘

Solution

For this specific case you can use:

$valor = array_reverse(explode(".", $url))[2];
  • We convert the string $url into an array by splitting it on the . character with explode().
  • The result is passed to array_reverse(), which reverses the order of the elements.
  • Finally, we take index 2, the third position of the reversed array, i.e. the third piece counting from the end.

Example

In this example, for the three URLs (also available on Ideone), we wrapped the code above in a function that receives the string and the position to return:

<?php
function recolher($url="", $pos=2) {
    // reverse the dot-separated pieces and pick the one at $pos
    return array_reverse(explode(".", $url))[$pos];
}

echo recolher("+bbbbbbb2.virtua.com.br").PHP_EOL;                         // virtua
echo recolher("+000-74-4-000.paemt702.dsl.brasiltelecom.net.br").PHP_EOL; // brasiltelecom
echo recolher("+111.222.22.222.dynamic.adsl.gvt.net.br").PHP_EOL;         // gvt
?>
    

To make it even more flexible, we could also pass the separator character as a function parameter.
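A possible sketch of that idea follows. recolherSep() is a hypothetical name, and returning null for a position that does not exist (instead of triggering an "undefined offset" notice) is an added assumption:

```php
<?php
// Hypothetical extension of recolher(): the separator is a parameter too,
// and a missing position yields null via the ?? operator.
function recolherSep(string $url, int $pos = 2, string $sep = '.'): ?string
{
    $partes = array_reverse(explode($sep, $url)); // last piece becomes index 0
    return $partes[$pos] ?? null;
}

echo recolherSep('bbbbbbb2.virtua.com.br'), PHP_EOL; // virtua
echo recolherSep('a/b/c/d', 1, '/'), PHP_EOL;        // c
```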

        
29.10.2014 / 22:01

Here is a generic function to solve the problem.

$texto is the text you want to remove the word from;

$palavra is the word to be removed;

$pattern is the separator to use (in the examples above, '.').

function remove($texto, $palavra, $pattern){
    $txt = explode($pattern, $texto);   // turn the string into an array
    $id = array_search($palavra, $txt); // find the index holding that word
    if ($id !== false) {                // array_search() returns false if the word is absent
        unset($txt[$id]);               // remove that index
    }
    $texto = implode($pattern, $txt);   // turn the array back into a string
    return $texto;
}
    

Note that this function only removes the FIRST occurrence of the word.

If you have something like 111.222.22.222.dynamic.gvt.adsl.gvt.net.br, it only removes the first gvt.
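If every occurrence should go, one option is array_diff() instead of array_search() + unset(). This is a sketch; removeTodas() is a hypothetical name, not part of the answer:

```php
<?php
// Sketch: remove EVERY occurrence of $palavra instead of only the first one.
// array_diff() keeps only the elements not equal to $palavra; implode()
// ignores the (now non-contiguous) keys, so no re-indexing is needed.
function removeTodas(string $texto, string $palavra, string $pattern): string
{
    return implode($pattern, array_diff(explode($pattern, $texto), [$palavra]));
}

echo removeTodas('111.222.22.222.dynamic.gvt.adsl.gvt.net.br', 'gvt', '.'), PHP_EOL;
// 111.222.22.222.dynamic.adsl.net.br
```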

Links to the documentation:

link link link

        
29.10.2014 / 20:59