How to select text that does not have a certain term in the middle?


I'm trying to select part of an HTML document with a regular expression, but I can't get the expression right. Could anyone help?

I need to select each group of <li> elements separately, i.e. a run of consecutive <li> elements with no <br> tag in the middle.

For example, I'm trying the expression below:

/<li.*(?!<br).*\/li>/gi

It should select the following text as a separate match:

<li>Teste 1</li><li>Teste 2</li><li>Teste 3</li>

In this test, I created two occurrences of this list, but the expression selects everything from the first occurrence to the last.

How do I select the two lists separately?

    
asked by anonymous 31.05.2018 / 22:34

1 answer


The problem with the quantifiers * and + is that they are "greedy": they try to consume as many characters as possible while still letting the expression match.

To turn off this greedy behavior, put ? right after the *. The quantifier then takes the minimum number of characters needed (which is why *? is also called a lazy quantifier). The regex would then look like this:

/<li.*?(?!<br).*?\/li>/

You can see it running here.
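As a rough sketch of the difference, the greedy and lazy versions can be compared in JavaScript. The `html` string below is a hypothetical stand-in for the test input (two three-item lists separated by a <br> tag), since the original test page is not reproduced here; the g and i flags are taken from the question.

```javascript
// Hypothetical input: two lists of three <li> items separated by a <br>.
const html =
  '<li>Teste 1</li><li>Teste 2</li><li>Teste 3</li><br>' +
  '<li>Teste 1</li><li>Teste 2</li><li>Teste 3</li>';

// Greedy: .* runs to the end of the line, then backtracks to the last "/li>".
const greedyMatches = html.match(/<li.*(?!<br).*\/li>/gi) || [];

// Lazy: .*? stops at the first "/li>" it can reach.
const lazyMatches = html.match(/<li.*?(?!<br).*?\/li>/gi) || [];

console.log(greedyMatches.length); // 1 (the whole string)
console.log(lazyMatches.length);   // 6 (one per <li>)
console.log(lazyMatches[0]);       // <li>Teste 1</li>
```

With this input, the greedy pattern returns the whole string as a single match, while the lazy pattern returns each <li> on its own.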

The regex above produces 6 separate matches, one for each li tag. To match a run of consecutive li tags with no br between them as a single unit, search for one or more occurrences of the entire previous regex (using the + quantifier):

(<li.*?(?!<br).*?\/li>)+

You can see this regex running here.
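A minimal sketch of the grouped version in JavaScript, again assuming a hypothetical input of two three-item lists separated by a <br> tag (the names `input`, `listPattern` and `lists` are illustrative only):

```javascript
// Hypothetical input: two lists of three <li> items separated by a <br>.
const input =
  '<li>Teste 1</li><li>Teste 2</li><li>Teste 3</li><br>' +
  '<li>Teste 1</li><li>Teste 2</li><li>Teste 3</li>';

// One or more consecutive <li>...</li> items form a single match.
const listPattern = /(<li.*?(?!<br).*?\/li>)+/gi;
const lists = input.match(listPattern) || [];

console.log(lists.length); // 2
console.log(lists[0]);     // <li>Teste 1</li><li>Teste 2</li><li>Teste 3</li>
```

In this sketch each run ends at the <br> because every repetition of the group has to start with <li. Note that an <li> whose own content contains a <br> would still be matched, since the lookahead only checks a single position.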

    
31.05.2018 / 23:02