Get the last "word" of a PATH with different URL formats

1

I'm writing a PHP function that returns the last "word" of a requested URL, ignoring query parameters and fragments and treating the root as index.

Examples:

URL                                      EXPECTED
http://www.teste.com.br/                 index
www.teste.com.br/                        index
teste.com                                index
teste.com/                               index
www.teste.com.br/teste                   teste
http://www.teste.com.br/teste            teste
http://teste.com/teste                   teste
https://www.teste.com/teste              teste
https://teste.com/teste                  teste
teste.com/teste/dois                     dois
teste.com/teste/dois/                    dois
teste.com/teste/dois/?variavel=teste     dois
teste.com/teste/dois?variavel=teste      dois
teste.com/teste/dois/?variavel=teste     dois
teste.com/teste?var1=t&var2=t            teste
teste.com/teste/tres#ola                 tres
teste.com/teste?var1=t&var2=t#ola        teste

Using the basename function together with substr and preg_match, I get most of the cases right:

$arr = array(
  array("name"=>"http://www.teste.com.br/","possibleValues"=>array("index")),
  array("name"=>"www.teste.com.br/","possibleValues"=>array("index")),
  array("name"=>"teste.com","possibleValues"=>array("index")),
  array("name"=>"teste.com/","possibleValues"=>array("index")),
  array("name"=>"www.teste.com.br/teste","possibleValues"=>array("teste")),
  array("name"=>"http://www.teste.com.br/teste","possibleValues"=>array("teste")),
  array("name"=>"http://teste.com/teste","possibleValues"=>array("teste")),
  array("name"=>"https://www.teste.com/teste","possibleValues"=>array("teste")),
  array("name"=>"https://teste.com/teste","possibleValues"=>array("teste")),
  array("name"=>"teste.com/teste/dois","possibleValues"=>array("dois")),
  array("name"=>"teste.com/teste/dois/","possibleValues"=>array("dois")),
  array("name"=>"teste.com/teste/dois/?variavel=teste","possibleValues"=>array("dois")),
  array("name"=>"teste.com/teste/dois?variavel=teste","possibleValues"=>array("dois")),
  array("name"=>"teste.com/teste/dois/?variavel=teste","possibleValues"=>array("dois")),
  array("name"=>"teste.com/teste?var1=t&var2=t","possibleValues"=>array("teste")),
  array("name"=>"teste.com/teste/tres#ola","possibleValues"=>array("tres")),
  array("name"=>"teste.com/teste?var1=t&var2=t#ola","possibleValues"=>array("teste"))
);

foreach($arr as $value){
  echo "URL ".$value["name"]."\n";
  echo ( array_search( basename( returnLastWord( $value["name"] ) ), $value["possibleValues"] ) === false ? "FALHOU" : "PASSOU" )." -> expected: ".json_encode( $value["possibleValues"] )." get '".basename( returnLastWord( $value["name"] ) )."'\n\n";
}

function returnLastWord($var){
  preg_match('/[?#]/', $var, $matches, PREG_OFFSET_CAPTURE);
  $after = ( empty( $matches[0][1] ) ? NULL : $matches[0][1] );
  if($after){
    return substr($var, 0, $after);
  }else{
    // echo "aqui\n";
    return $var;
  }
}

Output:

URL http://www.teste.com.br/
FALHOU -> expected: ["index"] get 'www.teste.com.br'

URL www.teste.com.br/
FALHOU -> expected: ["index"] get 'www.teste.com.br'

URL teste.com
FALHOU -> expected: ["index"] get 'teste.com'

URL teste.com/
FALHOU -> expected: ["index"] get 'teste.com'

URL www.teste.com.br/teste
PASSOU -> expected: ["teste"] get 'teste'

URL http://www.teste.com.br/teste
PASSOU -> expected: ["teste"] get 'teste'

URL http://teste.com/teste
PASSOU -> expected: ["teste"] get 'teste'

URL https://www.teste.com/teste
PASSOU -> expected: ["teste"] get 'teste'

URL https://teste.com/teste
PASSOU -> expected: ["teste"] get 'teste'

URL teste.com/teste/dois
PASSOU -> expected: ["dois"] get 'dois'

URL teste.com/teste/dois/
PASSOU -> expected: ["dois"] get 'dois'

URL teste.com/teste/dois/?variavel=teste
PASSOU -> expected: ["dois"] get 'dois'

URL teste.com/teste/dois?variavel=teste
PASSOU -> expected: ["dois"] get 'dois'

URL teste.com/teste/dois/?variavel=teste
PASSOU -> expected: ["dois"] get 'dois'

URL teste.com/teste?var1=t&var2=t
PASSOU -> expected: ["teste"] get 'teste'

URL teste.com/teste/tres#ola
PASSOU -> expected: ["tres"] get 'tres'

URL teste.com/teste?var1=t&var2=t#ola
PASSOU -> expected: ["teste"] get 'teste'

The problem is mainly with the first 4 examples, which point to the root of the project, so I should get index.

    
asked by anonymous 30.06.2017 / 23:23

3 answers

1

I did not find a simple way to do it, especially given the variety of input URL formats. What I came up with was this:

function returnLastWord($var){

    //Remove the protocol
    $var = preg_replace('~^[^:]+[:][/]{2,}~', '', $var);

    /*
    Grab anything that is a PATH in the URL;
    it captures what is between the parentheses in this example:
    'site.com/(foo/bar/baz)?querystring=ignored#hashignored'
    */
    if (preg_match('~/([^#?]{1,})~', $var, $matches)) {

        //Remove the trailing / in URLs such as 'foo/bar/', to avoid capturing an empty string
        $result = rtrim($matches[1], '/');

        //Grab whatever is at the end
        if (preg_match('~[^/]+$~', $result, $matches)) {
          return $matches[0];
        }
    }

    //If anything above failed, it is probably "index"
    return 'index';
}

Example on Ideone

  

I'll probably revise this to make it faster or simpler.

    
30.06.2017 / 23:31
1

PHP already has a native function for working with URLs, parse_url, but not all of the URLs here are in the format defined in RFC 3986, so the function parses the non-standard ones incorrectly. Nothing critical: for a URL without a scheme, the function treats what should be the host as part of the path. A check for the character . in the last path segment is therefore needed. If it is present, the element in question is the host, not the path, and the function returns index.

function returnLastWord ($url) {

    // Parse the URL; a missing path (e.g. "http://teste.com") becomes an empty string:
    $parsed = parse_url($url);
    $path = isset($parsed["path"]) ? $parsed["path"] : "";

    // Split the path at each occurrence of /:
    $parts = explode('/', trim($path, '/'));

    // Get the last element:
    $last = end($parts);

    // If it is not empty and does not contain the character ., return it; otherwise return index:
    return $last && false === strpos($last, '.') ? $last : "index";

}
  

See it working on Ideone.
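
To see why the dot check is needed, it can help to print what parse_url returns for the same address with and without a scheme. This is only an illustrative sketch; the sample URLs are taken from the question, and the "(none)" placeholder is my own choice for a missing component.

```php
<?php
// Inspect what parse_url() returns for inputs with and without a scheme.
$samples = array(
    'http://www.teste.com.br/teste',
    'www.teste.com.br/teste',
    'teste.com',
);

foreach ($samples as $sample) {
    $parts = parse_url($sample);
    // "(none)" is just a placeholder for a component parse_url() did not find.
    $host  = isset($parts['host']) ? $parts['host'] : '(none)';
    $path  = isset($parts['path']) ? $parts['path'] : '(none)';
    echo $sample . ' => host: ' . $host . ', path: ' . $path . "\n";
}
```

Without a scheme the host name ends up at the start of the path, which is why a last segment containing a dot is treated as the host. One side effect worth keeping in mind is that a real last segment containing a dot would also be reported as index.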

    
01.07.2017 / 00:08
1

Using the parse_url() function can make the job easier.

<?php

$arr = array(
    'http://localhost/',
    'https://localhost/',
    'http://localhost',
    'https://localhost',
    'http://sub.localhost/',
    'http://sub.localhost',
    'http://localhost/foo',
    'http://localhost/foo/bar',
    'http://localhost/?p=1&b=1',
    'localhost/foo',
    'localhost'
);

echo '<table border=1>
<tr><td>URL</td><td>Word</td></tr>';
foreach ($arr as $v) {
    $url_original = null;
    // Normalizing the given URL
    if (preg_match('#^https?://#i', $v) !== 1) {
        $url_original = $v;
        $v = 'http://'.$v;
    }

    $url = parse_url($v);
    echo PHP_EOL.'<tr><td>'.$v.(!empty($url_original)? '<br>('.$url_original.')': '').'</td><td>';
    // Trailing slashes are dropped so that a path like "/foo/bar/" yields "bar"
    $path = isset($url['path']) ? rtrim($url['path'], '/') : '';
    if ($path !== '') {
        // Path found: take everything after the last slash
        $p = strrpos($path, '/');
        echo $p !== false ? substr($path, $p + 1) : $path;
    } else {
        // Empty, no path
        echo 'index';
    }
    echo '</td></tr>';
}
echo '</table>';

Normalization
To handle URLs without a scheme (http or https) as well, the string is normalized by prefixing "http://" before it is passed to the parse_url() function.
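
As a rough sketch, the same normalization can be wrapped in a function and checked against a few of the URLs from the question. The name lastPathSegment is hypothetical, and trimming trailing slashes is an assumption based on the question's examples (such as teste.com/teste/dois/), not something the script above states.

```php
<?php
// Hypothetical helper: returns the last path segment, or "index" for the root.
function lastPathSegment($url)
{
    // Prefix a scheme when none is present so parse_url() can find the host.
    if (preg_match('#^https?://#i', $url) !== 1) {
        $url = 'http://' . $url;
    }

    $parts = parse_url($url);

    // Assumption: drop trailing slashes so "/teste/dois/" behaves like "/teste/dois".
    $path = isset($parts['path']) ? rtrim($parts['path'], '/') : '';
    if ($path === '') {
        return 'index';
    }

    // Everything after the last slash is the final segment.
    return substr($path, strrpos($path, '/') + 1);
}

// A few URLs from the question with the value each one should produce.
$cases = array(
    'http://www.teste.com.br/'             => 'index',
    'teste.com'                            => 'index',
    'www.teste.com.br/teste'               => 'teste',
    'teste.com/teste/dois/?variavel=teste' => 'dois',
    'teste.com/teste?var1=t&var2=t#ola'    => 'teste',
);

foreach ($cases as $url => $expected) {
    $got = lastPathSegment($url);
    echo $url . ' -> ' . $got . ($got === $expected ? ' (ok)' : ' (unexpected)') . "\n";
}
```

Because the host is always separated out once a scheme is present, this variant does not need to look for a dot in the last segment.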

Multibyte Characters
The snippet that invokes the substr() function may fail when the URL has multibyte characters. If you want to support multibyte characters, see mbstring functions.

    
01.07.2017 / 00:32