What does the regular expression /(?=(?:...)*$)/ do?

19

I needed a way to insert dots into a number so that its digits are separated into groups of three, counting from the end.

For example:

1000 => 1.000
100000 => 100.000
10000000 => 10.000.000

In an answer I found on the English-language Stack Overflow, the last solution used this regular expression:

function separarDeTresEmTres(numero)
{
  return String(numero).split(/(?=(?:...)*$)/ ).join('.');
}


console.log(separarDeTresEmTres(1000));  
console.log(separarDeTresEmTres(1000000));
console.log(separarDeTresEmTres(10000000));

But I did not quite understand the magic behind /(?=(?:...)*$)/.

What makes this regular expression separate the digits into groups of three, counting from the end? What is the explanation?

NOTE: I do not want answers explaining how to separate a number into groups of three, since the regular expression I'm using already does that. The question is specifically about how each part of that regular expression works. I do not want the problem solved without an explanation of what is happening.

    
asked by anonymous 06.02.2017 / 12:01

4 answers

18

Well, let's build this regular expression:

  • . - Recognizes any character.

  • ... - Recognizes any three characters.

  • (?:...) - Non-capturing group of any three characters. Non-capturing groups start with (?: and end with ) .

  • (?:...)* - Repeat. This * indicates 0 or more repetitions. So this is several groups of three characters.

  • $ - End of string. Requiring the end of the string here ensures that no character is left over after the groups.

  • (?:...)*$ - Groups of three characters followed by the end of the string. This ensures that the recognized groups must be at the end of the string, not at the beginning.

  • (?=(?:...)*$) - Lookahead. It checks that the text ahead matches the inner expression without consuming any of it, so the regex matches at every position that is followed by something the inner expression recognizes.

To understand this last point, suppose the expression were (?=a(?:...)*$) and the input string were 1234a567890 . The expression inside (?= and ) would recognize a567890 , since that is an a followed by a number of characters that is a multiple of 3. The lookahead as a whole, however, matches only the position just before the a : the text that follows must fit the inner expression, but it does not become part of the match. That is what a positive lookahead does. In the original expression the match also occurs several times, because the expression inside (?= and ) is satisfied at several different places: every position that is followed, up to the end of the string, by a number of characters that is a multiple of three (including 0).
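The difference between the inner expression and the lookahead can be checked directly. This is a small sketch using the hypothetical regex from the paragraph above; the variable names are only illustrative.

```javascript
// Hypothetical input and regex taken from the explanation above.
const input = "1234a567890";

// Without the lookahead, the match consumes the text "a567890".
const inner = /a(?:...)*$/.exec(input);
console.log(inner.index, JSON.stringify(inner[0])); // 4 "a567890"

// Wrapped in (?= ), the match succeeds at the same position but is empty.
const lookahead = /(?=a(?:...)*$)/.exec(input);
console.log(lookahead.index, JSON.stringify(lookahead[0])); // 4 ""
```

Both matches start at index 4, but only the first one has any length.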

  • The / before and after the regex is what JavaScript uses to delimit a regex literal.

  • String(numero) converts the number to a string.

  • The split method cuts the string at every place where the regex matches. Since matches occur at several positions, it slices the string every three characters counting from the end and produces an array of strings.

  • join is an array method that joins all the pieces into a single string, placing a separator between each pair of pieces. That result is what return returns.
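The steps listed above can be followed one at a time. This is just the function from the question unrolled into intermediate variables (the names texto, partes and resultado are illustrative).

```javascript
const numero = 10000000;

// Step 1: convert the number to a string.
const texto = String(numero); // "10000000"

// Step 2: cut at every position followed by a multiple of 3 characters.
const partes = texto.split(/(?=(?:...)*$)/); // ["10", "000", "000"]

// Step 3: glue the pieces back together with a dot between them.
const resultado = partes.join("."); // "10.000.000"

console.log(texto, partes, resultado);
```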

06.02.2017 / 12:49
18

Let's go through it piece by piece:

/(?=(?:...)*$)/
  • ?= Matches the (empty) position that is followed by the expression after the =.
  • ?: Makes the group inside the parentheses non-capturing.
  • ... Any character, 3 times.
  • *$ Repeated zero or more times, up to the end of the string.

Explaining in practice:

What happens is this: the expression groups 3 characters with (?:...) and, thanks to ?= , matches the position just before them. The *$ guarantees that the groups are counted back from the end of the string, as many times as needed. Applying split , the number 1000000 is divided like this: 1 | 000 | 000. After that, all that remains is to join the pieces with a dot using .join('.') , and the magic is done.

NOTE: The group is made non-capturing with ?: so that it does not interfere with what really matters, which is the position before the 3 characters.
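One way to see why the non-capturing group matters: in JavaScript, split also inserts the text of any capturing group into the resulting array. The sketch below compares the two forms on a small input; the variant with (...) is an experiment for illustration, not something from the original answer.

```javascript
const s = "1000";

// Non-capturing group: only the pieces between the split points are returned.
const nonCapturing = s.split(/(?=(?:...)*$)/);
console.log(nonCapturing); // ["1", "000"]

// Capturing group: the captured "000" is inserted into the result as well.
const capturing = s.split(/(?=(...)*$)/);
console.log(capturing); // ["1", "000", "000"]

// Joining the second array would give the wrong number.
console.log(nonCapturing.join("."), capturing.join(".")); // 1.000 1.000.000
```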

    
06.02.2017 / 12:31
9

This regex is very interesting because it combines a few factors.

Factors

  • We know that a regex is used to capture certain content from a text / string.
  • We also know that split divides a string at each occurrence of a separator.

Examples

var test = 'Teste de captura';
var r = /c.p/
console.log(test.match(r));

var test = 'Teste de divisão';
var div = 'e';
console.log(test.split(div));

Note that in split the separator character is lost.

What's happening?

This function combines these two features. The big question is: "What is it using to capture and divide?"
The answer is: "nothing".

Now you must be wondering, "What do you mean, nothing?"

What is "nothing"

In this answer I discuss a little of what "nothing" is:

In compilers, it would be the same as a direct transition to the next stage.

How it does it

Through the (?= ) part of the regex. This creates a match that must not go into the result.

General explanation of REGEX

  • $ - Since an end has been set, the match is anchored to the end of the string, so the groups are effectively counted from back to front.
  • (?: )* - Only says that this is a group that can repeat indefinitely, but that it should not be captured.
  • (?= ) - A match that must occur but does not go into the result.
  • ... - Any sequence of 3 characters.

And where is nothing in all this?

In fact there is no captured text at all: there is a sequence of 3-character groups counted from the end toward the beginning, and split divides at the transition from one group of 3 to the next. The "nothing" is the gap between the two.
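The "nothing" can be made visible by listing the matches themselves. In this sketch (variable names are illustrative), matchAll reports where the regex matches and how long each match is.

```javascript
const digits = "1000000";

// The g flag is required by matchAll; the regex is otherwise unchanged.
const matches = [...digits.matchAll(/(?=(?:...)*$)/g)];

// Where each match starts, and how many characters it consumed.
const positions = matches.map((m) => m.index);
const lengths = matches.map((m) => m[0].length);
console.log(positions); // [1, 4, 7]
console.log(lengths);   // [0, 0, 0]

// split cuts at those gaps, so no character is lost.
console.log(digits.split(/(?=(?:...)*$)/)); // ["1", "000", "000"]
```

Every match has length zero. The match at index 7 is the very end of the string, where split has nothing left to cut.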

    
06.02.2017 / 12:40
-2

Here is a variant of the same idea, using the same philosophy: this time a simple substitution in Perl (to emphasize that this question is orthogonal to the programming language).

$ echo "1000 e mais 100000 10000000" |
      perl -pe 's/\d(?=(\d\d\d)+\b)/$&./g'
1.000 e mais 100.000 10.000.000

That is: a digit "d" followed by groups of 3 digits is replaced by "d." As usual:

  • \b : word boundary
  • \d : digit
  • (?= regexp) : zero-width lookahead (right context)
  • s/regexp/string/g : global find and replace
  • $& : the matched string

function separarDeTresEmTres(numero)
{
  return String(numero).replace(/\d(?=(...)+$)/g,"$&.");
}
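
For comparison, the Perl substitution can be written almost unchanged in JavaScript, since $& also means "the matched text" in a replace string. This is a sketch; the sample sentence is the one from the Perl example and the variable names are illustrative.

```javascript
// The \b-based regex handles numbers embedded in a longer text.
const frase = "1000 e mais 100000 10000000";
const formatada = frase.replace(/\d(?=(\d\d\d)+\b)/g, "$&.");
console.log(formatada); // 1.000 e mais 100.000 10.000.000

// The $-anchored regex from the function above expects a bare number.
const numeroFormatado = String(10000000).replace(/\d(?=(...)+$)/g, "$&.");
console.log(numeroFormatado); // 10.000.000
```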
    
06.02.2017 / 13:47