How does the "positive lookahead" (?=X) combined with the "positive lookbehind" (?<=X) work?


After answering this question, and even though I managed to figure out what is happening in the regular expression in my answer, I was curious to know how the ((?<=;)|(?=;)) part works.

I read in this answer what each one does, and in several other sources too, but I confess I did not understand how the quoted expression works.

If possible, I would like an explanation, as I am trying to learn regular expressions, but the explanations out there are complicated and use terms I am often not familiar with.

    
asked by anonymous 16.10.2017 / 12:21



You have probably heard the saying:

Just because you reached your goal doesn't mean you did it right

Well, that is what happened with your REGEX. I'll explain it using a different one to make it clearer.

/((?<=t)|(?=a)).+/

Explanation

  • ((?<=t)|(?=a)) - Group in which one of the alternatives must match; the engine tries the first one, (?<=t), first.
  • .+ - Any character except a line break, as many as possible, at least one.
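To see that the lookaround group only tests a position and consumes nothing, here is a small Java sketch (Java because the original problem involved Java's split; the class name ZeroWidthGroup and the helper describe are just illustrative). It reports where the match starts, what group 1 captured, and the whole match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ZeroWidthGroup {
    // Same pattern as in the answer, written as a Java string literal.
    static final Pattern P = Pattern.compile("((?<=t)|(?=a)).+");

    // Returns "start:group1:wholeMatch" for the first match, or null if none.
    static String describe(String input) {
        Matcher m = P.matcher(input);
        if (!m.find()) {
            return null;
        }
        // group(1) is the lookaround group: it always captures an empty string,
        // because lookarounds test a position without consuming characters.
        return m.start() + ":" + m.group(1) + ":" + m.group();
    }

    public static void main(String[] args) {
        System.out.println(describe("tania"));     // 1::ania  (position right after the t)
        System.out.println(describe("ana"));       // 0::ana   (position right before the a)
        System.out.println(describe("guilherme")); // null     (no t and no a)
    }
}
```

The empty text between the two colons is group 1: it matched, but it contributes no characters to the result.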

So we could break it up into two REGEX:

  • /(?<=t).+/ - Anything that comes after a t
  • /(?=a).+/ - Anything that starts at an a
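A Java sketch comparing the combined REGEX with the two split-up versions on the same test words (the class SplitUpRegex and the helper first are hypothetical names). Because .+ runs to the end of the word, the combined match is whichever of the two separate matches starts first, that is, the longer one.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SplitUpRegex {
    static final Pattern COMBINED = Pattern.compile("((?<=t)|(?=a)).+");
    static final Pattern AFTER_T = Pattern.compile("(?<=t).+");
    static final Pattern FROM_A = Pattern.compile("(?=a).+");

    // Returns the first match of p in s, or "" when there is none
    // (mirroring the empty output line in the tests below).
    static String first(Pattern p, String s) {
        Matcher m = p.matcher(s);
        return m.find() ? m.group() : "";
    }

    public static void main(String[] args) {
        String[] words = {"ana", "tania", "anastasia", "etilico", "aguilherme", "guilherme"};
        for (String w : words) {
            System.out.printf("%-11s combined=%-11s afterT=%-11s fromA=%s%n",
                    w, first(COMBINED, w), first(AFTER_T, w), first(FROM_A, w));
        }
    }
}
```

For example, in "anastasia" the lookbehind version alone gives "asia" and the lookahead version alone gives "anastasia"; the combined REGEX returns "anastasia" because the (?=a) alternative already succeeds at position 0.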

Tests

$regex = '~((?<=t)|(?=a)).+~';

$testes = array(
    'ana',
    'tania',
    'anastasia',
    'etilico',
    'aguilherme',
    'guilherme'
);

foreach ($testes as $value) {
    // When there is no match, preg_match() leaves $matches empty, so print ''
    preg_match($regex, $value, $matches);
    echo '[0] => ', $matches[0] ?? '', PHP_EOL;
}

Output

[0] => ana
[0] => ania
[0] => anastasia
[0] => ilico
[0] => aguilherme
[0] => 

Back to your REGEX: ((?<=;)|(?=;))

It is redundant because both alternatives check for the same ;: wherever (?=;) matches (just before a ;), (?<=;) also matches one position later (just after that ;). However, … depends on what follows, so that if it is …, but the semicolon comes at the end of the … sentence, the second part (?<=...) won't be completed, so it falls into the second .+.
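A small Java sketch listing the positions where each lookaround matches in a hypothetical input "ab;cd;ef" (the class LookaroundPositions and the helper positions are illustrative names). Both lookarounds test the same ;, but (?=;) matches just before it and (?<=;) just after it, one index later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaroundPositions {
    // Collects every index where the (zero-width) pattern matches.
    static List<Integer> positions(String regex, String input) {
        List<Integer> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            result.add(m.start());
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "ab;cd;ef"; // the ; characters are at indexes 2 and 5
        System.out.println(positions("(?=;)", s));          // [2, 5]  just before each ;
        System.out.println(positions("(?<=;)", s));         // [3, 6]  just after each ;
        System.out.println(positions("((?<=;)|(?=;))", s)); // [2, 3, 5, 6]  both sides
    }
}
```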

Questions

This last explanation can be a bit confusing; if you have any questions, ask.

Addendum, regarding your question

  

The problem I had was that when I used the lookahead, Java's split did not isolate the semicolon into its own index when it was preceded by another string, and the reverse happened with the lookbehind; hence my confusion in understanding each one.

What happens is as follows:

Both REGEX "explode" the string by joining the ; to the neighboring characters (in this example, group 2), but ; is more specific than .+. Recall from above:

  • (?<=;) - Whatever comes after ;
  • (?=;) - Text that starts at a ;

So (?=;) splits at the junction just before each ;, so the ; is joined to what comes after it, forming the words you saw that begin with ;.

But the ; could be joined either to what comes before it or to what comes after it; however, split consumes the character after using it, so only the split for the first case (joining what comes before) happens, generating the other words you saw: pontoevirgula;, delinha;
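A Java sketch of the three splits discussed, using a hypothetical input built from the words mentioned above (the leading "texto" and the exact string are assumptions, not the asker's actual data; the class name SemicolonSplits is illustrative). The expected results are shown as comments.

```java
import java.util.Arrays;

public class SemicolonSplits {
    // Hypothetical input; the first word is an assumption.
    static final String INPUT = "texto;pontoevirgula;delinha";

    public static void main(String[] args) {
        // (?=;) cuts just before each ';' -> the ';' starts the next piece.
        System.out.println(Arrays.toString(INPUT.split("(?=;)")));
        // [texto, ;pontoevirgula, ;delinha]

        // (?<=;) cuts just after each ';' -> the ';' ends the previous piece.
        System.out.println(Arrays.toString(INPUT.split("(?<=;)")));
        // [texto;, pontoevirgula;, delinha]

        // Both alternatives cut before AND after each ';' -> the ';' is isolated.
        System.out.println(Arrays.toString(INPUT.split("((?<=;)|(?=;))")));
        // [texto, ;, pontoevirgula, ;, delinha]
    }
}
```

Each alternative alone leaves the ; attached to one neighbor; only the combination cuts on both sides of it.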

    
16.10.2017 / 13:57