Although it is possible to do this with regular expressions, I do not believe it is the best way in any programming language.
(I don't know why on earth I answered the question thinking it was about Python too, but most of the answer, except the exact code in the example, still applies.)
Month names will be much easier to check, validate, and above all convert to a month number, so that you end up with a real date object, if you handle them outside the regular expression.
Also, if your application needs to work in a language other than Portuguese: there are frameworks for turning programs into multi-language programs, and in general they depend on you wrapping every string of your program in a function call (often with a name meant to be almost transparent, such as "_"). This function then looks up your string in the translation database for the desired language. If the month names are hardcoded inside the regular expression, you would have to pass the entire regexp to the translation engine.
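Python's standard gettext module is one example of such a framework, and by convention its lookup function is bound to the name _. A minimal sketch of keeping month names as individual translatable strings; the domain "myapp" and the "locale" directory are hypothetical, and with no catalogs installed the strings simply come back unchanged:

```python
import gettext

# Assumption: no compiled catalogs (.mo files) exist for this hypothetical
# domain, so fallback=True yields a NullTranslations object that returns
# every string unchanged instead of raising an error.
translation = gettext.translation(
    "myapp",             # hypothetical translation domain
    localedir="locale",  # hypothetical catalog directory
    languages=["en"],
    fallback=True,
)
_ = translation.gettext

# Each abbreviation is its own translatable string, so a translator only
# ever sees "jan", "fev", ... and never a regular expression.
SHORT_MONTH_NAMES = [
    _("jan"), _("fev"), _("mar"), _("abr"), _("mai"), _("jun"),
    _("jul"), _("ago"), _("set"), _("out"), _("nov"), _("dez"),
]

# Map each (possibly translated) abbreviation to its month number.
short_months = {
    name: number for number, name in enumerate(SHORT_MONTH_NAMES, start=1)
}
```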
Of course, it would be possible to build a regular expression template, keep the month names in external variables, and merge everything with string interpolation before calling the regular expression functions. This is one of the advantages of Python, where regular expressions are used through ordinary function calls, with no special syntax.
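A minimal sketch of such a template, assuming the Portuguese abbreviations from the question; the names month_alternatives and date_pattern are only illustrative:

```python
import re

# Month abbreviations live outside the pattern (Portuguese, as an assumption).
short_months = {
    "jan": 1, "fev": 2, "mar": 3, "abr": 4, "mai": 5, "jun": 6,
    "jul": 7, "ago": 8, "set": 9, "out": 10, "nov": 11, "dez": 12,
}

# Build the alternation "jan|fev|...|dez"; re.escape guards against a
# name that happens to contain regex metacharacters.
month_alternatives = "|".join(re.escape(name) for name in short_months)

# Interpolate the alternation into the template. Doubled braces are
# literal braces inside an f-string, so {{1,2}} becomes {1,2}.
date_pattern = re.compile(
    rf"(\d{{1,2}})/({month_alternatives}|\d{{1,2}})/(\d{{2,4}})",
    re.IGNORECASE,
)

match = date_pattern.search("25/Fev/2018")
day, month, year = match.groups()
```

Even in this small form, the doubled braces already hint at how quickly an interpolated pattern gets harder to read.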
But regular expressions are already quite difficult to read and maintain in code. Regular expressions assembled at runtime would be even harder to read.
My tip, as in the first paragraph, is to use the regular expression only to capture the groups for day, month and year, and then use a calmer mechanism, with dictionaries and ifs, to extract the "real" month.
And take the opportunity to validate the day of the month, the year, and so on outside the regular expression as well. I'm going to give an example in Python, which makes great pseudo-code for C++ and should be enough to give you an idea of the approach:
So, instead of:
def validate_date(text):
    if re.search(super_complicated_auto_validating_regexp, text):
        return True
    return False
You can write something like:
import datetime
import re

short_months = {"jan": 1, "fev": 2, ..., "dez": 12}  # remaining months elided

def days_per_month(month, year):
    data = {1: 31, 2: 28, 3: 31, 4: 30, ...}  # remaining months elided
    # Leap years: divisible by 4, except centuries not divisible by 400
    if month == 2 and year % 4 == 0 and (year % 100 != 0 or year % 400 == 0):
        return 29
    return data[month]

def parse_date(text):
    match = re.search(r"(\d{1,2})/(.{1,3})/(\d{2,4})", text)
    if not match:
        raise ValueError("Invalid date format")
    day, month, year = match.groups()
    day = int(day)  # int() already accepts leading zeros
    if month.isdigit():
        month = int(month)
    else:
        try:
            month = short_months[month.lower()]
        except KeyError:
            raise ValueError(f"Unknown month {month!r}") from None
    if not 1 <= month <= 12:
        raise ValueError(f"Invalid month {month}")
    year = int(year)
    if year < 50:  # assume 2-digit years below 50 are in the 21st century
        year += 2000
    elif year <= 99:
        year += 1900
    if not 1 <= day <= days_per_month(month, year):
        raise ValueError(f"Invalid day {day} for month {month}")
    return datetime.date(year=year, month=month, day=day)
Note that it takes a couple dozen lines of program code to parse and validate the date. With the regular expression approach, you are trying to compress all the logic of those lines into a single "line", which is actually a mini-program in a language that is not user-friendly.
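As a side note, the hand-written table in days_per_month can also be delegated to Python's standard library. A minimal variant, assuming the same (month, year) signature as above; calendar.monthrange returns a tuple whose second item is the number of days in the month, with leap years already taken into account:

```python
import calendar

def days_per_month(month, year):
    # monthrange returns (weekday of the first day, number of days in the month)
    _first_weekday, number_of_days = calendar.monthrange(year, month)
    return number_of_days

print(days_per_month(2, 2000))  # 2000 is divisible by 400, so it is a leap year
print(days_per_month(2, 1900))  # 1900 is a century not divisible by 400
```

datetime.date itself also raises ValueError for an impossible day, so the explicit check mostly buys a clearer error message.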
All that said, the usual way of parsing and validating "real" dates, in the various crazy formats that users can type or that show up in files, is to use a library specialized for this. In such a library, several people have already spent hundreds of hours thinking about how to make the thing more user-friendly and more error-proof. Otherwise you would have to duplicate that work in your own code, with a chance of getting it wrong: see the subtlety of calculating leap years correctly, which even Microsoft got wrong in the first versions of Excel, for example.
In Python, we have the excellent dateparser, which lets you simply write:
>>> import dateparser
>>> dateparser.parse("25/fev/2018", languages=["pt"])
datetime.datetime(2018, 2, 25, 0, 0)
It accepts many more date formats than those separated by /, including dates written out in full in more than 20 languages, and it is not prone to errors caused by "corner cases".
In C++, I would look for date modules in whatever framework you might already be using to add functionality to the language - there should be "natural date parsers" built on Qt or Boost, for example.