Regex to validate a particular date format

2

I am modifying a regex for a C++ program so that it validates dates entered in the form 29/feb/2000. Currently it only accepts dates such as 29/02/2000 or 30/03/2017.

I tried to add the names of the other months but could not get it to work. How can I make it accept 30/mar/2017 or 20/dec/2018?

Here is the regex:

"^(?:(?:0[1-9]|1[0-9]|2[0-8])(?:/|.|-)(?:0[1-9]|1[0-2])|(?:(?:29|30)(?:/|.|-)(?:0[13456789]|1[0-2]))|(?:31(?:/|.|-)(?:0[13578]|1[02])))(?:/|.|-)(?:[2-9][0-9]{3}|1[6-9][0-9]{2}|159[0-9]|158[3-9])|29(?:/|.|-)(?:02|feb|Feb)(?:/|.|-)(?:(?:[2-9](?:04|08|[2468][048]|[13579][26])|1[6-9](?:(?:04|08|[2468][048]|[13579][26])00)|159(?:2|6)|158(?:4|8))|(?:16|[2468][048]|[3579][26])00)$"
    
asked by anonymous 29.01.2018 / 11:47

3 answers

5
  

Regex is definitely not the right tool for this problem. However, I read in the comments that you are studying regex, so, just for fun:


Regex

^(?:(?:(0?[1-9]|1\d|2[0-8])([-/.])(0?[1-9]|1[0-2]|j(?:an|u[nl])|ma[ry]|a(?:pr|ug)|sep|oct|nov|dec|feb)|(29|30)([-/.])(0?[13-9]|1[0-2]|j(?:an|u[nl])|ma[ry]|a(?:pr|ug)|sep|oct|nov|dec)|(31)([-/.])(0?[13578]|1[02]|jan|ma[ry]|jul|aug|oct|dec))(?:\2|\5|\8)(0{2,3}[1-9]|0{1,2}[1-9]\d|0?[1-9]\d{2}|[1-9]\d{3})|(29)([-/.])(0?2|feb)\12(\d{1,2}(?:0[48]|[2468][048]|[13579][26])|(?:0?[48]|[13579][26]|[2468][048])00))$


Debuggex can be used to visualize it.

Or, explained with variables:

// Builds the pattern piece by piece (dia = day, mes = month, ano = year).
// Here the separator is fixed to "/" to keep the composition readable.
std::string regexData() {
    std::string
        sep = "/",
        dia1a28 = "(0?[1-9]|1\\d|2[0-8])",
        dia29 = "(29)",
        dia29ou30 = "(29|30)",
        dia31 = "(31)",
        mesFev = "(0?2|feb)",
        mes31diasNum = "0?[13578]|1[02]",
        mes31diasNome = "jan|ma[ry]|jul|aug|oct|dec",
        mes31dias = "("+mes31diasNum+"|"+mes31diasNome+")",
        mesNaoFevNum = "0?[13-9]|1[0-2]",
        mesNaoFevNome = "j(?:an|u[nl])|ma[ry]|a(?:pr|ug)|sep|oct|nov|dec",
        mesNaoFev = "("+mesNaoFevNum+"|"+mesNaoFevNome+")",
        mesTudoNum = "0?[1-9]|1[0-2]",
        mesTudoNome = mesNaoFevNome+"|feb",
        mesTudo = "("+mesTudoNum+"|"+mesTudoNome+")",
        // day/month combinations
        diames29Fev = dia29+sep+mesFev,
        diames1a28 = dia1a28+sep+mesTudo,
        diames29ou30naoFev = dia29ou30+sep+mesNaoFev,
        diames31 = dia31+sep+mes31dias,
        diamesNao29Feb = "(?:"+diames1a28+"|"+diames29ou30naoFev+"|"+diames31+")",
        // years: 0001 to 9999, and leap years (bissexto) for 29 February
        ano001a9999 = "(0{2,3}[1-9]|0{1,2}[1-9]\\d|0?[1-9]\\d{2}|[1-9]\\d{3})",
        anoX4nao100 = "\\d{1,2}(?:0[48]|[2468][048]|[13579][26])",
        anoX400 = "(?:0?[48]|[13579][26]|[2468][048])00",
        anoBissexto = "("+anoX4nao100+"|"+anoX400+")",
        // full dates
        dataNao29Fev = diamesNao29Feb+sep+ano001a9999,
        data29Fev = diames29Fev+sep+anoBissexto,
        dataFinal = "(?:"+dataNao29Fev+"|"+data29Fev+")";
    return dataFinal;
}


Using different date separators

You can use something like:

^(dia)[-/.](mês)[-/.](ano)$
dia = match[1]; mes = match[2]; ano = match[3];

But this would also accept a date like 1.2/2000.

To force both separators to be the same, capture the first one in a group and, in place of the second one, use a backreference to match the text captured by that group:

^(dia)([-/.])(mês)\2(ano)$
dia = match[1]; mes = match[3]; ano = match[4];
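
The same-separator idea can be tried in isolation. Below is a minimal sketch in Python, chosen only for brevity; the default ECMAScript grammar of std::regex accepts the same \2 syntax. The pattern is a deliberately simplified stand-in (plain digits for day and year, digits or three lowercase letters for month), and the names SAME_SEP and split_date are made up for this illustration.

```python
import re

# Hypothetical simplified pattern: group 2 captures the first separator,
# and \2 requires the second separator to be the very same character.
SAME_SEP = re.compile(r"(\d{1,2})([-/.])(\d{1,2}|[a-z]{3})\2(\d{4})")

def split_date(text):
    """Return (day, month, year) as strings, or None if the text does not match."""
    m = SAME_SEP.fullmatch(text)
    if m is None:
        return None
    # Group 2 is the separator, so month is group 3 and year is group 4.
    return m.group(1), m.group(3), m.group(4)

print(split_date("1.2.2000"))     # ('1', '2', '2000')
print(split_date("1.2/2000"))     # None, because the separators differ
print(split_date("29-feb-2000"))  # ('29', 'feb', '2000')
```

Note how the capture group for the separator shifts the indices of the following groups, which is why the month moves from match[2] to match[3].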


Code

#include <iostream>
#include <regex>

int main() {
    constexpr char text[]{"29/feb/2020"};
    // Backreferences \2, \5, \8 and \12 force the second separator to equal the first.
    std::regex re(R"((?:(?:(0?[1-9]|1\d|2[0-8])([-/.])(0?[1-9]|1[0-2]|j(?:an|u[nl])|ma[ry]|a(?:pr|ug)|sep|oct|nov|dec|feb)|(29|30)([-/.])(0?[13-9]|1[0-2]|j(?:an|u[nl])|ma[ry]|a(?:pr|ug)|sep|oct|nov|dec)|(31)([-/.])(0?[13578]|1[02]|jan|ma[ry]|jul|aug|oct|dec))(?:\2|\5|\8)(0{2,3}[1-9]|0{1,2}[1-9]\d|0?[1-9]\d{2}|[1-9]\d{3})|(29)([-/.])(0?2|feb)\12(\d{1,2}(?:0[48]|[2468][048]|[13579][26])|(?:0?[48]|[13579][26]|[2468][048])00)))");
    std::cmatch match;
    bool valid = std::regex_match(text, match, re);

    if (valid) {
        std::cout << "Data válida: " << match[0] << std::endl
                  << "Dia: " << match[1]  << match[4]  << match[7]  << match[11] << std::endl
                  << "Mês: " << match[3]  << match[6]  << match[9]  << match[13] << std::endl
                  << "Ano: " << match[10] << match[14] << std::endl;
    } else {
        std::cout << "Data inválida!!";
    }
    return 0;
}

Result

Data válida: 29/feb/2020
Dia: 29
Mês: feb
Ano: 2020

Example on Ideone

    
02.02.2018 / 04:29
2
  

"[...] validate dates entered in the form 29/feb/2000 [...]"

If you only want to validate the input format, try this regex:

\d{2}\/[a-zA-Z]{3}\/\d{4}|\d{2}\/\d{2}\/\d{4}

But if you want validation that only accepts real months of the year, I suggest not using regex for that part. Instead, extract the day and month and compare them against the allowed values. (If that is what you need, leave a comment and I can update the answer.)

Explanation

The regex checks whether the string consists of:

  • 2 Digits
  • 1 /
  • 3 Letters from a to z (upper or lower case)
  • 1 /
  • 4 Digits
    Or
  • 2 Digits
  • 1 /
  • 2 Digits
  • 1 /
  • 4 Digits

You can also see an example of this regex running here.
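
To illustrate the split between "format only" and "real month", here is a rough sketch in Python. It uses the pattern from this answer for the format check and then compares the month against a set of allowed values. The set of English abbreviations and the function names has_valid_format and has_known_month are assumptions made for the example, not part of the original question.

```python
import re

# The format-only pattern from this answer; fullmatch anchors it at both ends.
FORMAT = re.compile(r"\d{2}\/[a-zA-Z]{3}\/\d{4}|\d{2}\/\d{2}\/\d{4}")

# Assumption: English abbreviations, as in the question's examples.
MONTH_NAMES = {"jan", "feb", "mar", "apr", "may", "jun",
               "jul", "aug", "sep", "oct", "nov", "dec"}

def has_valid_format(text):
    """True if the text looks like dd/mmm/yyyy or dd/mm/yyyy."""
    return FORMAT.fullmatch(text) is not None

def has_known_month(text):
    """True if the format is valid and the month is 1 to 12 or a known name."""
    if not has_valid_format(text):
        return False
    month = text.split("/")[1]
    if month.isdigit():
        return 1 <= int(month) <= 12
    return month.lower() in MONTH_NAMES

print(has_valid_format("29/xyz/2000"))  # True: the format alone cannot reject it
print(has_known_month("29/xyz/2000"))   # False: the comparison step rejects it
```

Day ranges are still unchecked in this sketch; it only shows where the month comparison would go.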

    
29.01.2018 / 12:25
0

Although it is possible to do this with regular expressions, I do not believe it is the best way in any programming language. (For some reason I answered the question thinking it was also about Python, but most of the answer, apart from the exact code in the example, still applies.)

Month names are much easier to check, verify and, above all, convert to the month number (so that you end up with a real date object) if you handle them outside the regular expression.

Also, consider what happens if your application has to work in a language other than Portuguese: there are frameworks for turning programs into multi-language programs, and in general they depend on you wrapping every string of your program in a function call (often with a name meant to be almost transparent, such as _). This function then looks up your string for the desired language in the translation base. If the month names are hardcoded inside the regular expression, you would have to pass the entire regexp to the translation engine.

Of course it would be possible to build a regular expression template, with the names of the months in external variables, and merge everything using string interpolation before calling the regular expression function. This is one of the advantages of regular expressions in Python being used through normal function calls, without a special syntax.
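
A rough sketch of that template idea follows. The month list, the function name build_date_regex and the simplified day and year parts are all assumptions made for illustration; the point is only that the names live outside the pattern.

```python
import re

# Assumption: the names come from outside the pattern, e.g. a translation table.
month_names = ["jan", "feb", "mar", "apr", "may", "jun",
               "jul", "aug", "sep", "oct", "nov", "dec"]

def build_date_regex(names):
    """Compile a day/month/year pattern whose month accepts digits or the given names."""
    # re.escape guards against names that contain regex metacharacters.
    alternatives = "|".join(re.escape(name) for name in names)
    # Doubled braces are literal braces inside an f-string.
    pattern = rf"(\d{{1,2}})/(\d{{1,2}}|{alternatives})/(\d{{4}})"
    return re.compile(pattern, re.IGNORECASE)

date_re = build_date_regex(month_names)
print(date_re.fullmatch("20/dec/2018").groups())  # ('20', 'dec', '2018')
print(date_re.fullmatch("20/xyz/2018"))           # None
```

Swapping in another language is then a matter of passing a different list, for example build_date_regex(["jan", "fev", "mar"]).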

But regular expressions are already quite difficult to read and maintain in code. Regular expressions assembled at runtime would be even harder to read.

My suggestion, as mentioned above, is to use the regular expression only to capture the groups with day, month and year, and then use a simpler mechanism with dictionaries and ifs to extract the "real month". Take the opportunity to validate the day of the month, the year and so on outside the regular expression as well. I will give an example in Python, which serves as decent pseudo-code for C++ and should give you an idea of the approach:

So, instead of:

def validate_date(text):
    if re.search(super_complicated_auto_validating_regexp, text):
        return True
    return False

You can write something like:

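import datetime
import re

# Portuguese month abbreviations
short_months = {"jan": 1, "fev": 2, "mar": 3, "abr": 4, "mai": 5, "jun": 6,
                "jul": 7, "ago": 8, "set": 9, "out": 10, "nov": 11, "dez": 12}

def days_per_month(month, year):
    data = {1: 31, 2: 28, 3: 31, 4: 30, 5: 31, 6: 30,
            7: 31, 8: 31, 9: 30, 10: 31, 11: 30, 12: 31}
    # Leap year: divisible by 4, and not by 100 unless also by 400
    if month == 2 and year % 4 == 0 and (year % 100 != 0 or year % 400 == 0):
        return 29
    return data[month]

def parse_date(text):
    # fullmatch rejects trailing garbage that search would let through
    match = re.fullmatch(r"(\d{1,2})/(.{1,3})/(\d{2,4})", text)
    if not match:
        raise ValueError("Invalid date format")
    day, month, year = [match.group(i) for i in (1, 2, 3)]
    day = int(day)
    if month.isdigit():
        month = int(month)
    else:
        month = short_months.get(month.lower())  # None for unknown names
    if month is None or not 1 <= month <= 12:
        raise ValueError("Invalid month")
    year = int(year)
    if year < 50:  # assume 2-digit years < 50 are in the 21st century
        year += 2000
    elif year <= 99:
        year += 1900
    if not 1 <= day <= days_per_month(month, year):
        raise ValueError(f"Invalid day {day} for month {month}")
    return datetime.date(year=year, month=month, day=day)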

Note that you need more or less 20 lines of programmatic code to parse and validate the date. With your regular expression approach, you want to compress all the logic of these 20 lines into a single 'line', which is actually a mini-program in a language that is not user friendly.

All that said, the usual way of parsing and validating "real" dates in the various crazy formats that users can type, or that appear in files, is to use a library specialized in this. Several people have already spent hundreds of hours thinking about how to make it more user-friendly and more error-proof, and you would have to duplicate that work in your code, with a chance of getting it wrong. See, for example, the subtlety of calculating leap years correctly, which even Microsoft got wrong in the first versions of Excel.
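
For the narrow job of checking that a day, month and year form a real calendar date, even Python's standard library already encodes the leap-year rules. A minimal sketch, assuming the three numbers have already been extracted from the text (is_real_date is a name made up for this example):

```python
import datetime

def is_real_date(year, month, day):
    """True if the calendar date exists; datetime.date raises ValueError otherwise."""
    try:
        datetime.date(year, month, day)
    except ValueError:
        return False
    return True

print(is_real_date(2000, 2, 29))  # True: 2000 is divisible by 400
print(is_real_date(1900, 2, 29))  # False: divisible by 100 but not by 400
print(is_real_date(2018, 4, 31))  # False: April has 30 days
```

The year 1900 is the kind of corner case that a hand-written "divisible by 4" check gets wrong.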

In Python, we have the excellent dateparser , which allows you to simply:

>>> import dateparser
>>> dateparser.parse("25/fev/2018", languages=["pt"])

datetime.datetime(2018, 2, 25, 0, 0)

It accepts many more date formats than those separated by /, including dates written out in full in more than 20 languages, and it is not prone to errors caused by corner cases.

In C++ I would look for the date modules of some framework you may already be using to add functionality to the language. There should be "natural date parsers" built on Qt or Boost, for example.

    
29.01.2018 / 12:25