Negation operator returns value that should be discarded


When using the negation operator, I want to get only the part of the text that is not preceded by the negated group.

Using the expression ( ?<br\/?> ?)(Unit.), I get the following result:

When I add the negation operator ?!, the negation seems to be ignored and all results are returned:

The expected result is only the first tag, the one with nothing before the word Unit.
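The original result screenshots are not available, so here is a minimal Python sketch that reproduces the behavior on a hypothetical two-line input: one line with nothing before Unit. (the one we want) and one with <br/> before it (the one that should be discarded). Python's re module is used only for illustration; the question does not say which regex engine it uses.

```python
import re

# Hypothetical sample: keep the first line, discard the second.
text = "<td>Unit.</td>\n<td>Preço<br/>Unit.</td>"

plain = re.compile(r"( ?<br\/?> ?)(Unit.)")      # expression from the question
negated = re.compile(r"(?! ?<br\/?> ?)(Unit.)")  # same expression with ?!

# Collect the full text of every match, not just the groups.
plain_hits = [m.group(0) for m in plain.finditer(text)]
negated_hits = [m.group(0) for m in negated.finditer(text)]

print(plain_hits)    # ['<br/>Unit.']
print(negated_hits)  # ['Unit.', 'Unit.']
```

With the negated pattern both lines yield Unit., which is the behavior described in the question.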

    
asked by anonymous 16.08.2016 / 20:06


Your logic is almost right. I say almost because it misses one small point of interpretation.

With regex, keep in mind that a match can start and end at any position in the text, unless you explicitly define where it must start and end (for example, with the anchors ^ and $).
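A minimal Python sketch of this point: re.search tries every starting position in the text, while re.match only tries position 0, and an explicit ^ anchor has the same restricting effect on re.search.

```python
import re

cell = "<td>Unit.</td>"

found = re.search(r"Unit\.", cell)      # scans: the match may start anywhere
anchored = re.match(r"Unit\.", cell)    # only tries a match at the start of the string
explicit = re.search(r"^Unit\.", cell)  # ^ pins the match to the start as well

print(found.start())  # 4
print(anchored)       # None
print(explicit)       # None
```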

Analyzing what happens

Legend

  • ^ Start of text to be interpreted
  • $ End of text to be interpreted

Analysis 1

<td>Preço<br/>Unit.</td>
^
$

Note that in this attempt the interpreted text is only <, so the regex does not match.

Analysis 2

<td>Preço<br/>Unit.</td>
^      $

Note that in this attempt the interpreted text is <td>Preç, so the regex does not match.
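The windows in these analyses can be emulated in Python with Pattern.match(string, pos, endpos), which only tries a match starting exactly at pos and treats endpos as the end of the text, playing the role of the ^ and $ in the legend. The helper try_window is hypothetical, written just for this sketch.

```python
import re

text = "<td>Preço<br/>Unit.</td>"
plain = re.compile(r"( ?<br\/?> ?)(Unit.)")      # 1st regex
negated = re.compile(r"(?! ?<br\/?> ?)(Unit.)")  # 2nd regex

def try_window(pattern, pos, endpos):
    """Hypothetical helper: try a match that starts exactly at pos
    and treats endpos as the end of the text."""
    m = pattern.match(text, pos, endpos)
    return m.group(0) if m else None

# Analysis 1 (window "<") and Analysis 2 (window "<td>Preç")
for pos, endpos in [(0, 1), (0, 8)]:
    print(repr(text[pos:endpos]), try_window(plain, pos, endpos), try_window(negated, pos, endpos))
# '<' None None
# '<td>Preç' None None
```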

Analysis 3

<td>Preço<br/>Unit.</td>
         ^        $

Note that in this attempt the interpreted text is <br/>Unit.. With the 1st regex, ( ?<br\/?> ?)(Unit.), it matches perfectly and finds the result; with the 2nd, (?! ?<br\/?> ?)(Unit.), the negative lookahead (?!...) prevents the match, because the text at this position starts with <br/>.
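The same check for this window, as a minimal Python sketch: Pattern.match(string, pos, endpos) only tries a match starting at pos (here 9) and treats endpos (here 19) as the end of the text.

```python
import re

text = "<td>Preço<br/>Unit.</td>"
plain = re.compile(r"( ?<br\/?> ?)(Unit.)")      # 1st regex
negated = re.compile(r"(?! ?<br\/?> ?)(Unit.)")  # 2nd regex

# Analysis 3: window text[9:19] == "<br/>Unit."
pos, endpos = 9, 19
hit_plain = plain.match(text, pos, endpos)
hit_negated = negated.match(text, pos, endpos)

print(hit_plain.group(0))  # <br/>Unit.
print(hit_negated)         # None: the lookahead sees "<br/>" right at position 9
```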

Analysis 4

<td>Preço<br/>Unit.</td>
              ^   $

Note that in this attempt the interpreted text is Unit.. With the 1st regex, ( ?<br\/?> ?)(Unit.), nothing is found, since ?<br\/?> ? is missing at the beginning. With the 2nd, (?! ?<br\/?> ?)(Unit.), it matches perfectly, because the negative lookahead only requires that the text at the current position does not start with ?<br\/?> ?. Here the text starts directly with Unit., so having nothing there is valid, and Unit. is returned as a result.
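A minimal Python sketch of this last window, again using Pattern.match(string, pos, endpos) to start exactly at position 14. It also shows that an unanchored search over the whole text ends up reporting its match at that same position.

```python
import re

text = "<td>Preço<br/>Unit.</td>"
plain = re.compile(r"( ?<br\/?> ?)(Unit.)")      # 1st regex
negated = re.compile(r"(?! ?<br\/?> ?)(Unit.)")  # 2nd regex

# Analysis 4: window text[14:19] == "Unit."
pos, endpos = 14, 19
hit_plain = plain.match(text, pos, endpos)
hit_negated = negated.match(text, pos, endpos)

print(hit_plain)             # None: no <br/> at position 14
print(hit_negated.group(0))  # Unit.

# Scanning the whole text, the 2nd regex first succeeds at position 14.
print(negated.search(text).start())  # 14
```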

Possible solution

Use the m (multiline) flag so that each line, delimited by \n, is treated as a separate text to be interpreted; that is, ^ and $ match at the start and end of every line. You can change the regex to:

/^(?!.* ?<br\/?> ?Unit\..*)(.*Unit\..*)$/gm

See it on regex101.
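The same regex can be tried in Python as a minimal sketch on a hypothetical two-line input. Here re.MULTILINE plays the role of the m flag; the g flag has no direct equivalent, because findall already returns every match.

```python
import re

# Hypothetical sample: only the first line should be returned.
text = "<td>Unit.</td>\n<td>Preço<br/>Unit.</td>"

solution = re.compile(r"^(?!.* ?<br\/?> ?Unit\..*)(.*Unit\..*)$", re.MULTILINE)

# With one capturing group, findall returns the group's text for each match.
print(solution.findall(text))  # ['<td>Unit.</td>']
```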

Explanation

  • ^...$ - the whole line being parsed must match, from beginning to end.
  • (?!.* ?<br\/?> ?Unit\..*) - if the line contains .* ?<br\/?> ?Unit\..* (a <br> tag followed by Unit.), it must not be captured.
  • (.*Unit\..*) - Content to be captured.
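To see why the m flag matters for ^...$, a minimal Python sketch on the same hypothetical two-line input: without re.MULTILINE, ^ and $ only match at the very start and end of the whole text, so the first line can no longer end at $ and nothing is captured.

```python
import re

text = "<td>Unit.</td>\n<td>Preço<br/>Unit.</td>"
pattern = r"^(?!.* ?<br\/?> ?Unit\..*)(.*Unit\..*)$"

with_m = re.findall(pattern, text, re.MULTILINE)  # ^ and $ match at each line
without_m = re.findall(pattern, text)             # ^ and $ match only at the text's edges

print(with_m)     # ['<td>Unit.</td>']
print(without_m)  # []
```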

Addendum

  • The best way to think of a negative lookahead (as I see it) is to imagine the exact, complete sentence it should (or should not) capture.
  • You used Unit.; if you want to match a literal ., you must escape it as Unit\., otherwise the pattern also accepts UnitG, Unit#, Unit followed by a space, and so on.
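A minimal Python sketch of the escaping point, checking a few candidate strings with re.fullmatch against the unescaped and escaped versions:

```python
import re

candidates = ["Unit.", "UnitG", "Unit#", "Unit "]

# "." matches any character except a newline; "\." matches only a literal dot.
unescaped = [s for s in candidates if re.fullmatch(r"Unit.", s)]
escaped = [s for s in candidates if re.fullmatch(r"Unit\.", s)]

print(unescaped)  # ['Unit.', 'UnitG', 'Unit#', 'Unit ']
print(escaped)    # ['Unit.']
```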
17.08.2016 / 17:36