Second backslash in meta-characters when expression is enclosed in quotation marks

When a pattern is written in quotation marks, that is, as a string, every meta-character that contains a backslash ( \ ) needs a second backslash, as in this example:

/\d+/ -> "\\d+"

Code samples:

var str = "Hello 123!";

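// using new RegExp()
var re = new RegExp("\\d+"); // in quotation marks: the backslash is doubled
var re2 = new RegExp(/\d+/); // without quotation marks

// without new RegExp()
var re3 = "\\d+";            // in quotation marks (match() converts the string to a RegExp)
var re4 = /\d+/;             // without quotation marks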

console.log(str.match(re));
console.log(str.match(re2));
console.log(str.match(re3));
console.log(str.match(re4));
  

As stated in this documentation, the dot ( . ) is the only meta-character that is written without a backslash, unlike \d , \w , \s , etc.

What is the logical explanation for needing the second backslash in meta-characters (all except . ) when the pattern is written in quotation marks?

    
asked by anonymous 26.07.2018 / 03:22

1 answer

Within strings, \ introduces escape sequences, that is, it encodes things that would be difficult to put inside the string in any other way. Examples are \n for a line break, \t for a tab, the \u1234 sequences for specific Unicode characters, and a few other cases. All of these are resolved at compile time (although JavaScript is interpreted, it is compiled just-in-time before the code starts running). Thus, for the string "Bom\ndia" the compiler builds a string made of Bom , a line break, and dia .
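To make this concrete, here is a minimal sketch (the variable name greeting is only illustrative) showing that the string in memory holds the real characters and not the backslash sequences:

```javascript
// The escape sequences are resolved when the string literal is processed,
// so the resulting string contains the actual characters.
const greeting = "Bom\ndia";

console.log(greeting.length);       // 7: "Bom" (3) + line break (1) + "dia" (3)
console.log(greeting.split("\n"));  // [ 'Bom', 'dia' ]
console.log("\t".length);           // 1: a single tab character
console.log("\u0041");              // A: U+0041 is the letter "A"
```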

However, since the \ character is used to introduce escape sequences, how do you put the \ character itself in a string? The answer is the \\ escape sequence. This is why, within strings, a \ character has to be doubled whenever you want to write it.

Something similar happens with ' and " , which are string terminators and can therefore be written as \' and \" , respectively, when they appear inside a string.
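A short sketch of these string-level escapes, with purely illustrative variable names and sample values:

```javascript
const backslash = "\\";       // written as two characters, stored as one
const path = "C:\\temp";      // stored as C:\temp
const quoted = "say \"hi\"";  // stored as say "hi"
const apostrophe = 'it\'s';   // stored as it's

console.log(backslash.length);  // 1
console.log(path);              // C:\temp
console.log(path.length);       // 7
console.log(quoted);            // say "hi"
console.log(apostrophe);        // it's
```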

So far we have said nothing about regular expressions, and this is where things get complicated: regular expressions also use the \ character for escaping, with escape sequences that are largely different from those of strings (although to escape \ itself, regular expressions also use \\ ). So if you have a string that appears in the code as "\\d+" , the JavaScript just-in-time compiler will build in memory a string containing \d+ , and then the regular expression compiler will turn that into an object that accepts one or more characters in the range '0' to '9' .
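The difference between one and two backslashes can be checked by printing the strings before they reach the regular expression engine. This is a small sketch reusing the sample text from the question; the names single and doubled are only illustrative:

```javascript
const single = "\d+";    // \d is not a string escape, so the backslash is simply dropped
const doubled = "\\d+";  // \\ becomes one backslash, leaving \d+ for the regex engine

console.log(single);   // d+
console.log(doubled);  // \d+

const str = "Hello 123!";
console.log(str.match(new RegExp(single)));       // null: the pattern looks for one or more "d"
console.log(str.match(new RegExp(doubled))[0]);   // 123
console.log("add".match(new RegExp(single))[0]);  // dd
```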

The dot ( . ) is a character that has no special meaning in strings, so it does not need an escape sequence there. In regular expressions, however, it is special because it can stand for any character. Thus, new RegExp("abc.def") becomes a regular expression that recognizes 7 characters: the first three are abc , the last three are def , and the middle one can be anything.

And how, then, do you represent a literal . character in a regular expression? In that case, you use the regular expression \. . However, if it is written as a string, you have to use "\\." .
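The behaviour of the dot can be verified with a quick sketch (anyChar and literalDot are illustrative names):

```javascript
const anyChar = new RegExp("abc.def");       // . matches any single character
const literalDot = new RegExp("abc\\.def");  // string abc\.def, so the regex requires a real dot

console.log(anyChar.test("abcXdef"));     // true
console.log(anyChar.test("abc.def"));     // true
console.log(literalDot.test("abcXdef"));  // false
console.log(literalDot.test("abc.def"));  // true

// With a single backslash, the string escape "\." is just ".", so the
// regex engine never sees the backslash and the dot stays a wildcard.
console.log(new RegExp("abc\.def").test("abcXdef"));  // true
```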

What happens is that when you write regular expressions as strings, two compilation steps are involved. The first puts the string into memory, applying the string escape sequences; the second converts that string into a regular expression, applying the regular expression escape sequences. The programmer therefore has to keep track of what is being built at each step, which can be quite confusing given that both steps use the same \ character for escape sequences.

However, when you use regular expressions delimited by / , such as /\d+/ or /\./ , you are not constructing a string; you are instructing the compiler to build the regular expression directly, without the intermediate step of representing it as a string. So in this case \d and \. must not be written as \\d or \\. .
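One way to compare the two forms is to inspect the source property of each RegExp object. The following sketch uses illustrative variable names:

```javascript
const fromLiteral = /\d+/;               // one step: the regex is built directly
const fromString = new RegExp("\\d+");   // two steps: string first, then regex

console.log(fromLiteral.source);  // \d+
console.log(fromString.source);   // \d+
console.log(fromLiteral.source === fromString.source);  // true

// Doubling the backslash inside a literal changes the pattern:
// /\\d+/ matches a backslash followed by one or more "d".
console.log(/\\d+/.test("123"));    // false
console.log(/\\d+/.test("\\ddd"));  // true
```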

Oh, and that is why new RegExp("\\\\") is the regular expression that recognizes a single \ . The string that gets built is \\ , which, interpreted as a regular expression, matches just \ .
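As a sketch of the backslash case (names are illustrative): matching one literal backslash takes four backslashes in a string literal, while a string holding a lone backslash is rejected by the regular expression engine.

```javascript
const oneBackslash = new RegExp("\\\\");  // string: \\  ->  regex: one literal backslash

console.log(oneBackslash.source);            // \\
console.log(oneBackslash.test("a\\b"));      // true: the string a\b contains a backslash
console.log(oneBackslash.test("ab"));        // false
console.log("a\\b\\c".split(oneBackslash));  // [ 'a', 'b', 'c' ]

// With only two backslashes the string is a lone \, which is an incomplete
// escape for the regex engine, so the constructor throws.
let error = null;
try {
  new RegExp("\\");
} catch (e) {
  error = e;
}
console.log(error instanceof SyntaxError);  // true
```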

    
26.07.2018 / 04:20