In several regexes I have noticed some symbols that do not seem to be part of what is captured, but rather some kind of functionality. I would like to know the name or term for each of these symbols and what each one does.
?:
?=
?!
?<=
?<!
If you are referring to .NET regexes, using the Regex class, these symbols can be used when starting a group with parentheses:
( + symbols + ... + )
What they mean:
?:
Non-capturing group: indicates a group that will not appear in the list of captured groups. Note that the text it matches is still part of the overall match; it just is not stored as a group. For example:
String parsed: abc. 123 xpto<fim>
Regex: \w+(?:\.|<fim>)
Matches: abc., xpto<fim>
The others are assertions: they do not capture anything, nor do they advance the reading position.
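To make the non-capturing group concrete before moving on to the assertions, here is a minimal sketch. It assumes Python's re module rather than the .NET Regex class the answer refers to; the (?:...) syntax is the same in both, and the variable names are only illustrative. The second pattern, with an ordinary capturing group, is added purely for contrast.

```python
import re

text = "abc. 123 xpto<fim>"

# Non-capturing group: the suffix is part of the match but is not stored as a group.
non_capturing = re.compile(r"\w+(?:\.|<fim>)")
matches = [m.group(0) for m in non_capturing.finditer(text)]
group_count = non_capturing.groups  # number of capturing groups in the pattern

# Same pattern with an ordinary (capturing) group, for contrast.
capturing = re.compile(r"\w+(\.|<fim>)")
captured = [m.group(1) for m in capturing.finditer(text)]

print(matches)      # ['abc.', 'xpto<fim>']
print(group_count)  # 0
print(captured)     # ['.', '<fim>']
```

In both cases the full match includes the suffix; the only difference is whether the suffix can be retrieved afterwards as group 1.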
?=
Positive lookahead: an assertion that checks that the group can be matched starting at the current position, without capturing anything or advancing through the string being parsed. For example:
Read words that come right before a period (.)
String parsed: 123. xpto.
Regex: \b\w+\b(?=\.)
Matches: 123, xpto
?!
Negative lookahead: an assertion that checks that the group cannot be matched starting at the current position, without capturing anything or advancing through the string being parsed. For example:
Read words that do not come right before a period (.)
String parsed: 123 xpto abc.
Regex: \b\w+\b(?!\.)
Matches: 123, xpto
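The two lookahead examples can be checked with a short sketch. It assumes Python's re module instead of .NET; the lookahead syntax and the results are the same for these patterns, and the variable names are illustrative only.

```python
import re

# Positive lookahead: words immediately followed by a period.
followed_by_period = re.findall(r"\b\w+\b(?=\.)", "123. xpto.")

# Negative lookahead: words NOT immediately followed by a period.
not_followed_by_period = re.findall(r"\b\w+\b(?!\.)", "123 xpto abc.")

# The period is only checked, never consumed, so it is absent from the results.
print(followed_by_period)      # ['123', 'xpto']
print(not_followed_by_period)  # ['123', 'xpto']
```

In the second string, abc is skipped because it is followed by a period, and the \b anchors prevent the engine from settling for a shorter piece such as ab.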
?<=
Positive lookbehind: an assertion that checks that the group can be matched ending at the current position, without capturing anything or advancing through the string being parsed. For example:
Read words that come after a period (.)
String parsed: abc. 123 xpto.
Regex: (?<=\.\s*)\b\w+\b
Matches: 123
?<!
Negative lookbehind: an assertion that checks that the group cannot be matched ending at the current position, without capturing anything or advancing through the string being parsed. For example:
Read words that do not come after a period (.)
String parsed: abc. 123 xpto.
Regex: (?<!\.\s*)\b\w+\b
Matches: abc, xpto
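The lookbehind examples can be approximated with the sketch below. It assumes Python's re module, which, unlike .NET, only accepts fixed-width lookbehind and rejects \s* inside it. For that reason the patterns here are a deliberate substitution: they use \s (exactly one whitespace character) instead of the \s* in the .NET patterns. For this particular input the results are the same.

```python
import re

text = "abc. 123 xpto."

# Python's re rejects variable-width lookbehind such as (?<=\.\s*), so this
# sketch assumes exactly one whitespace character after the period.
after_period = re.findall(r"(?<=\.\s)\b\w+\b", text)
not_after_period = re.findall(r"(?<!\.\s)\b\w+\b", text)

print(after_period)      # ['123']
print(not_after_period)  # ['abc', 'xpto']
```

With input where the number of spaces after the period varies, the fixed-width version would behave differently from the .NET patterns, so treat it only as an illustration of the lookbehind idea.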
If you are interested, I usually use this tool to work with regexes in C#:
Lookahead is a way to look for strings that are, or are not, followed by a particular ending. Use (?=..) for the positive form, that is, followed by; and (?!..) for the negative form, that is, not followed by.
Lookbehind does the same thing as lookahead but, as the name itself says, it looks backwards: it checks what comes before the current position. Use (?<=..) for the positive form and (?<!..) for the negative form.
For example, consider the sequence foobarbarfoo.
bar(?=bar) finds the first bar.
bar(?!bar) finds the second bar.
(?<=foo)bar finds the first bar.
(?<!foo)bar finds the second bar.
You can also combine them:
(?<=foo)bar(?=bar) finds the first bar.
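The claims about foobarbarfoo can be verified by looking at where each match starts. This is a minimal sketch assuming Python's re module; match_starts is a hypothetical helper written only for this illustration.

```python
import re

text = "foobarbarfoo"

def match_starts(pattern):
    """Return the start index of every match of pattern in text."""
    return [m.start() for m in re.finditer(pattern, text)]

# The first "bar" starts at index 3 and the second at index 6.
print(match_starts(r"bar(?=bar)"))          # [3]  followed by "bar"
print(match_starts(r"bar(?!bar)"))          # [6]  not followed by "bar"
print(match_starts(r"(?<=foo)bar"))         # [3]  preceded by "foo"
print(match_starts(r"(?<!foo)bar"))         # [6]  not preceded by "foo"
print(match_starts(r"(?<=foo)bar(?=bar)"))  # [3]  both conditions at once
```

Every match has length 3: the text checked by the lookahead or lookbehind is never included in the match itself.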
See this online tool (RegExr), which helps you build expressions, for instance by identifying the type of each part; it also includes examples.
This page explains the subject in more detail.
I will soon update the answer with more information.