Regular expressions: the lazy quantifier "?"


I've been learning about regular expressions and have read some explanations of the ? symbol (called lazy), such as this one:

  

*?: Matches the previous element zero or more times, but as few times as possible.
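To make the definition concrete, here is a small sketch comparing each greedy quantifier with its lazy counterpart on a short input. The input strings and the helper name `firstMatch` are made up for illustration.

```javascript
// Hypothetical helper: return the text of the first match, or null if there is none.
function firstMatch(str, re) {
  const m = str.match(re);
  return m ? m[0] : null;
}

// Greedy vs. lazy on the same input, with nothing after the quantifier.
console.log(firstMatch("aaa", /a*/));   // match is "aaa": as many as possible
console.log(firstMatch("aaa", /a*?/));  // match is "": zero repetitions is allowed and is the fewest
console.log(firstMatch("aaa", /a+/));   // match is "aaa"
console.log(firstMatch("aaa", /a+?/));  // match is "a": + needs at least one repetition

// When something follows, the lazy quantifier still takes as many
// repetitions as needed for the rest of the pattern to match.
console.log(firstMatch("aaab", /a*?b/)); // match is "aaab"
```

So "as few as possible" means "as few as the rest of the pattern allows", not "always zero".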

RegExr shows me the following explanation:

If I use either the expression ^(.*)$, or ^(.*?)$, or ^(.+?)$, I get the same result:

var string = "abcdef";
var re1 = /^(.*)$/;
var re2 = /^(.*?)$/;
var re3 = /^(.+?)$/;
console.log(string.match(re1));
console.log(string.match(re2));
console.log(string.match(re3));

What I didn't understand was the part "... but as few as possible."

What does "as few as possible" mean?

In which situations should I use ? after * or +, and when should I not, given that in the examples above the result is the same either way?

asked by anonymous 19.06.2018 / 00:24



You get the same result because $ forces the end of the capturing group to be at the end of the string. So even the fewest possible repetitions still have to extend all the way to the end of the string.
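A quick way to check this is to run the three patterns from the question with and without the trailing $ on the same string. This is a sketch; the `patterns` table is just an illustrative arrangement.

```javascript
const input = "abcdef";

// Each pair: the question's pattern (anchored at both ends) and the same pattern without $.
const patterns = [
  [/^(.*)$/, /^(.*)/],
  [/^(.*?)$/, /^(.*?)/],
  [/^(.+?)$/, /^(.+?)/],
];

for (const [withEnd, withoutEnd] of patterns) {
  // match()[1] is the text captured by the group.
  const a = input.match(withEnd)[1];
  const b = input.match(withoutEnd)[1];
  console.log(`${withEnd} -> "${a}"   ${withoutEnd} -> "${b}"`);
}
// With $, all three capture "abcdef"; without $, the lazy versions capture "" and "a".
```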

The ? makes * or + repeat as few times as possible, in contrast to the regex default, which is to be "greedy", that is, to match as many characters as possible. Consider the following:

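var string = 'Eu quero pegar uma palavra que comece com "q" seguida por um espaço.';
// (Portuguese for: I want to pick a word that starts with "q" followed by a space.)
var re1 = /(q.*) /;
console.log(string.match(re1)[1]); // [1] is the capture group
// "quero pegar uma palavra que comece com \"q\" seguida por um"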

Notice that instead of capturing just "quero", the capture group started at "quero" (which begins with "q") and took everything after it up to the last space found. That is why regex is called greedy: it takes everything it can.

Now let's do the same thing with ?, which takes the greed out of the regex:

var string = 'Eu quero pegar uma palavra que comece com "q" seguida por um espaço.';
var re2 = /(q.*?) /; // lazy: stop at the first space that lets the match succeed
console.log(string.match(re2)[1]); // [1] is the capture group
// "quero"

This is closer to what we wanted: the capture group got the word that starts with "q" and stopped at the first space. In other words, the regex matched the fewest characters that still satisfy the regular expression.
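One way to picture the difference: for (q.*) the engine has to choose how many characters .* consumes after the "q". The greedy version tries the longest length first and gives characters back until a space follows; the lazy version tries the shortest length first and adds characters until a space follows. The function `captureQThenSpace` below is a hypothetical, simplified model of that search (it only considers the first "q" and ignores line breaks); it is not how a real regex engine is implemented.

```javascript
// Simplified model of matching /(q.*) / or /(q.*?) / (illustration only).
// Returns the text the capture group would hold, or null.
function captureQThenSpace(str, lazy) {
  const start = str.indexOf("q");        // only the first "q" is considered here
  if (start === -1) return null;
  const maxLen = str.length - start - 1; // characters available after the "q"

  // Candidate lengths for what ".*" consumes: 0, 1, ..., maxLen.
  const lengths = Array.from({ length: maxLen + 1 }, (_, n) => n);
  // Greedy tries the longest length first; lazy tries the shortest first.
  if (!lazy) lengths.reverse();

  for (const n of lengths) {
    const end = start + 1 + n;           // index right after what ".*" consumed
    if (str[end] === " ") {
      return str.slice(start, end);      // the required space follows: success
    }
  }
  return null;
}

const text = 'Eu quero pegar uma palavra que comece com "q" seguida por um espaço.';
console.log(captureQThenSpace(text, false)); // same as text.match(/(q.*) /)[1]
console.log(captureQThenSpace(text, true));  // "quero", same as text.match(/(q.*?) /)[1]
```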

In your original example, the smallest possible number of characters was still the complete string because, with ^ and $, you forced the regex to start at the first character of the string and end at the last one. So the smallest string that satisfies the complete expression is the whole string.

    
19.06.2018 / 00:37

Imagine that you have a giant HTML page in a string s and you search it with the regex /<[^>]+>.*<\/[^>]+>/. What will you get? The opening html tag, the rest of the document, and the closing html tag, because the normal quantifiers are greedy (assuming the page is on a single line, since . does not match line breaks).

Now, if you apply the regex /<[^>]+>.*?<\/[^>]+>/ to the same string s, what will you get? A match that starts at the first tag and ends at the first closing tag after it, that is, only a small piece near the beginning of the document.

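$(function () {

  // Inner HTML of the #conteudo div (the table below), as a single-line string
  var s = $('#conteudo').html();
  
  alert('Original: ' + s);
  // Greedy: .* runs as far as possible, so the match ends at the last closing tag
  alert('Greedy: ' + s.match(/<[^>]+>.*<\/[^>]+>/gm)[0]);
  // Lazy: .*? stops at the first closing tag that lets the match succeed
  alert('Lazy: ' + s.match(/<[^>]+>.*?<\/[^>]+>/gm)[0]);

});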
* { font-family: Consolas; }
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<div id='conteudo'><table><thead><tr><th>Produto</th><th>Preço</th><th>Quantidade</th></tr></thead><tbody><tr><td>Feijão</td><td>R$ 8,75</td><td>1</td></tr><tr><td>Arroz</td><td>R$ 4,99</td><td>2</td></tr></tbody><tfoot><tr><td>Total</td><td></td><td>R$ 18,73</td></tr></tfoot></table></div>
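The snippet above needs a browser and jQuery. The sketch below runs the same two regexes in plain JavaScript (for example, in Node.js) on the table markup copied into a string literal, so the greedy and lazy results can be compared without a page. It also shows one caveat: . does not match line breaks (the m flag does not change that; only the s flag does), so on a multi-line document the greedy version stops at the end of a line. The variable names are arbitrary.

```javascript
// The table markup from the snippet, as a single-line string.
const s = '<table><thead><tr><th>Produto</th><th>Preço</th><th>Quantidade</th></tr></thead>' +
  '<tbody><tr><td>Feijão</td><td>R$ 8,75</td><td>1</td></tr>' +
  '<tr><td>Arroz</td><td>R$ 4,99</td><td>2</td></tr></tbody>' +
  '<tfoot><tr><td>Total</td><td></td><td>R$ 18,73</td></tr></tfoot></table>';

const greedy = /<[^>]+>.*<\/[^>]+>/;  // .*  : as much as possible
const lazy = /<[^>]+>.*?<\/[^>]+>/;   // .*? : as little as possible

console.log('Greedy: ' + s.match(greedy)[0]); // the whole table, up to </table>
console.log('Lazy:   ' + s.match(lazy)[0]);   // <table><thead><tr><th>Produto</th>

// Caveat: "." does not match "\n", so greedy matching stops at a line break
// unless the s (dotAll) flag is used.
const multiLine = '<p>a</p>\n<p>b</p>';
console.log(multiLine.match(greedy)[0]);                              // <p>a</p>
console.log(multiLine.match(/<[^>]+>.*<\/[^>]+>/s)[0] === multiLine); // true
```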
    
19.06.2018 / 00:55