Forcing a backreference in a regex


A few days ago I asked here about a regex that validates dates and how to force the separators; that regex validated the format dd/mm/yyyy. Based on it, I tried to force the separators using a backreference in a regex that validates the format yyyy/mm/dd, but I can't get it to work. Could anyone explain how to find the group numbers for the backreference and how to do it? The regex in which I need to force the backreferences, which validates yyyy/mm/dd, is this:

R"(^(?:\d{4}([-/.])(?:(?:(?:(?:0?[13578]|1[02]|j(?:an|ul)|[Mm]a[ry]|[Aa]ug|[Oo]ct|[Dd]ec)([-/.])(?:0?[1-9]|[1-2][0-9]|3[01]))|(?:(?:0?[469]|11|[Aa]pr|[Jj]un|[Ss]ep|[Nn]ov)([-/.])(?:0?[1-9]|[1-2][0-9]|30))|(?:(0?2|[Ff]eb)([-/.])(?:0?[1-9]|1[0-9]|2[0-8]))))|(?:(?:\d{2}(?:0[48]|[2468][048]|[13579][26]))|(?:(?:[02468][048])|[13579][26])00)([-/.])(0?2|[Ff]eb)([-/.])29)$)"
    
asked by anonymous 22.02.2018 / 22:04

2 answers

  

My answer complements Paulo RF Amorim's answer, showing how what he said applies to the regex in the question.

In a regex you create groups with parentheses: whatever is inside the parentheses is captured by the group, and each group has a numeric id so we can reference it. Id 0 refers to the entire match; the ids of the other groups can be read off the regex by looking at the order, from left to right, in which their opening parentheses appear:

In the regex `a(b|c)(apartamento|ca(sa|rro))`:

  • group 1 is `(b|c)`, which will return `b` or `c`;
  • group 2 is `(apartamento|ca(sa|rro))`, which will return `apartamento`, `casa` or `carro`;
  • group 3 is `(sa|rro)`, which will return `sa` or `rro` (or will be undefined if group 2 contains `apartamento`).

If you want a group not to have a reference, that is, an id, you can use `(?:...)`, which creates a non-capturing group, as explained in this answer on SOen.

Your regex starts like this: `(^(?:\d{4}([-/.])...`. Note that the group `([-/.])` has id 2, because before it there is a non-capturing group `(?:\d{4}...`, which gets no id, and group 1, opened by the first parenthesis `(^...`.

The date separator can be obtained from group 2, `([-/.])`, which will return `-`, `/` or `.`. To refer to this group you just have to use `\2`, as @Paulo explained.

Currently your regex repeats groups identical to `([-/.])`; we can simply keep the first of them (which has id 2, as explained above) and replace the others with its reference, `\2`. I decided to keep each reference inside a group, `(\2)`, so that the ids of the groups that follow do not change, which gives this Before/After:

    Before: (^(?:\d{4}([-/.])...([-/.])...([-/.])...([-/.])...
    After:  (^(?:\d{4}([-/.])...(\2)...(\2)...(\2)...

With this, the regex will only match if all the separators are the same as the one captured by group 2 (`\2`), so only dates that do not mix different separators are validated. There will be a match on a date like `2000-02-28` or `2000/02/28`, but there will be no match on a date like `2000/02-28`.

In the end, your regex looks like this:

    (^(?:\d{4}([-/.])(?:(?:(?:(?:0?[13578]|1[02]|j(?:an|ul)|[Mm]a[ry]|[Aa]ug|[Oo]ct|[Dd]ec)(\2)(?:0?[1-9]|[1-2][0-9]|3[01]))|(?:(?:0?[469]|11|[Aa]pr|[Jj]un|[Ss]ep|[Nn]ov)(\2)(?:0?[1-9]|[1-2][0-9]|30))|(?:(0?2|[Ff]eb)(\2)(?:0?[1-9]|1[0-9]|2[0-8]))))|(?:(?:\d{2}(?:0[48]|[2468][048]|[13579][26]))|(?:(?:[02468][048])|[13579][26])00)([-/.])(0?2|[Ff]eb)(\7)29)$)
    

Note that at the end of the expression I used a reference to group 7 (`\7`) instead of group 2 (`\2`): `...([-/.])(0?2|[Ff]eb)(\7)29)$)`. This was necessary because of a specific feature of your regex: February 29 is handled separately, in its own alternative that checks for leap years. This is clearer in Debuggex:

See in the image that it would not work to make group 9 reference group 2 (`...(\2)29)$)`), because group 2 is undefined when the date is February 29; instead, we make group 9 reference group 7, which is defined on February 29 dates.

        
05.03.2018 / 04:00

The idea of a backreference in regex is that you can reuse what a group matched by referring to that group by its number. For example:

In the regex `([a-c])x\1x\1`, the block `([a-c])` is group 1, which captures the letter "a", "b" or "c", and each `\1` must match that same letter again.

So the words "axaxa", "bxbxb" and "cxcxc" are valid, as explained in this link. The linked site is a useful tool for identifying groups and validating regexes.

        
28.02.2018 / 18:26