Your logic is almost right, I say almost, because it lacks a small interpretation.
In REGEX you should consider that it can start / end where you want, unless you explicitly define how it should behave.
Analyzing what happens
Legend
-
^
Start of text to be interpreted
-
$
End of text to be interpreted
Analyze 1
<td>Preço<br/>Unit.</td>
^
$
Note that in this hunt the interpreted text only has <
, so REGEX does not hit
Analyze 2
<td>Preço<br/>Unit.</td>
^ $
Note that in this hunt the interpreted text is <td>Preç
, so REGEX does not hit
Analyze 3
<td>Preço<br/>Unit.</td>
^ $
Note that in this hunt the interpreted text is <br/>Unit.
, if REGEX is the 1st% ( ?<br\/?> ?)(Unit.)
, hits perfectly finding the result, but since it is the 2nd% (?! ?<br\/?> ?)(Unit.)
lookback inhibits the result .
Analyze 4
<td>Preço<br/>Unit.</td>
^ $
Note that in this hunt the interpreted text is Unit.
, if REGEX is the 1st% ( ?<br\/?> ?)(Unit.)
, the result is not found, since ?<br\/?> ?
is missing
in the beginning, but as the 2nd (?! ?<br\/?> ?)(Unit.)
, hits perfectly,
because lookback says it should not contain ?<br\/?> ?
before (Unit.)
,
and having nothing is valid. So returning as a valid result.
Possible solution
Using the m
flag to consider each new line \n
as a new text to be interpreted. You can change REGEX to:
/^(?!.* ?<br\/?> ?Unit\..*)(.*Unit\..*)$/gm
See on REGEX101
Explanation
-
^...$
- I'm saying that the sentence to be parsed is from beginning to end.
-
(?!.* ?<br\/?> ?Unit\..*)
- I'm saying that if he finds .* ?<br\/?> ?Unit\..*
he should not capture.
-
(.*Unit\..*)
- Content to be captured.
Addendum
- The best way to think of denial lookback (as I see it) is, imagine the exact sentence of what it should capture.
- You have used
Unit.
in that if you want to capture .
literal you should escape it, otherwise capture would accept UnitG
, Unit#
, Unit
.