Regex to capture a year outside the valid range

4

I'm working on a text file in Sublime Text and need to replace some strings.
I have several strings like this:

EMISSAO="2016-04-18 00:00:00"

I need a regex that finds the entries where the year is invalid. For example, in some records it looks like this:

EMISSAO="65321-04-18 00:00:00"

Here the number 65321 is the year, and that year does not exist (it is invalid). I need to find every entry where the year is above 2017 so I can correct it.

    
asked by anonymous 02.03.2017 / 18:21

7 answers

5

Since this is a one-off search in Sublime, I believe the problem is only telling a valid year from an invalid one; from what you said, the rest of the date after the year will be correct.

So, to check only for a valid year, you can use this regex:

^EMISSAO="((19\d{2})|20(0[0-9]|1[0-7]))-\d{2}-\d{2}\s(\d{2}:){2}\d{2}"$

Explaining:

(19\d{2})|20(0[0-9]|1[0-7]) - Matches the years 1900 through 1999 and 2000 through 2017.

\d{2}-\d{2}\s - The remaining parts of the date: month and day. They are not validated, because apparently only the year comes in wrong. The \s matches the whitespace that follows.

(\d{2}:){2}\d{2} - The hours and minutes, each followed by a colon, and then finally the last two remaining digits.

I have tested here with these cases:

EMISSAO="2017-04-18 00:00:00" // matches
EMISSAO="1990-04-18 00:00:00" // matches
EMISSAO="2016-04-18 00:00:00" // matches
EMISSAO="2015-04-18 00:00:00" // matches
EMISSAO="2017-04-18 00:00:00" // matches
EMISSAO="2018-04-18 00:00:00" // does not match
EMISSAO="22000 00:00:00" // does not match
EMISSAO="2016-04-18 00:00:00" // matches
EMISSAO="65321-04-18 00:00:00" // does not match
EMISSAO="5069 00:00:00" // does not match
EMISSAO="2018 00:00:00" // does not match
EMISSAO="2019 00:00:00" // does not match
EMISSAO="2020 00:00:00" // does not match
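
For anyone who wants to reproduce these checks outside Sublime, here is a minimal Python sketch. It assumes the pattern with its parentheses balanced and `\d` used for every digit class; the helper name `has_valid_year` and the sample list are only illustrative and not part of the answer.

```python
import re

# Valid years: 1900-1999 or 2000-2017, followed by -MM-DD HH:MM:SS.
# Month, day and time are only checked for shape, not for range.
VALID = re.compile(
    r'^EMISSAO="(?:19\d{2}|20(?:0\d|1[0-7]))-\d{2}-\d{2}\s(?:\d{2}:){2}\d{2}"$'
)

def has_valid_year(line: str) -> bool:
    """Return True when the line is an EMISSAO attribute with a year in 1900-2017."""
    return VALID.match(line) is not None

samples = [
    'EMISSAO="2017-04-18 00:00:00"',
    'EMISSAO="1990-04-18 00:00:00"',
    'EMISSAO="2018-04-18 00:00:00"',
    'EMISSAO="65321-04-18 00:00:00"',
    'EMISSAO="5069 00:00:00"',
]

# The lines that need manual correction are the ones that do not match.
to_fix = [s for s in samples if not has_valid_year(s)]
for s in to_fix:
    print(s)
```

Because the pattern describes valid entries, the entries to correct are found by inverting the test, as the list comprehension does.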
    
02.03.2017 / 18:52
3

If invalid years start at 3000, you can use the regex [3-9]\d{3,} . It matches a digit from 3 to 9 followed by at least three more digits.

"414-10-12 17:04:29" // does not match
"6014-10-12 17:04:29" // matches
"8014-10-12 17:04:29" // matches
"85014-10-12 17:04:29" // matches
"2019" // does not match
"3000" // matches
    
02.03.2017 / 18:42
3

This is not the most optimized approach for your situation, since there are some cases you did not cover in your question, but here is a possible regex:

/(1[0-9]{3}|20(0[0-9]|1[0-7]))-(0[1-9]|1[0-2])-(0[1-9]|1[0-9]|2[0-9]|3[0-1])\s+00:00:00/g
    
02.03.2017 / 18:50
3

Mark the suspicious years with the prefix "###", then open the file and correct them manually:

perl -i.bak -pe 's/(?<=EMISSAO=")(\d{4,})/$1 < 2018 ? $1 : "###$1"/ge' ex.xml
sublime ex.xml
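
The same marking idea can be sketched in Python for those without Perl at hand. This is only an approximation of the one-liner: the function name `mark_suspicious_years` and the `max_year` parameter are made up for illustration, and the sketch works on a string, leaving reading and writing the file to the caller.

```python
import re

# The year right after EMISSAO=": four or more digits, as in the Perl version.
YEAR = re.compile(r'(?<=EMISSAO=")\d{4,}')

def mark_suspicious_years(text, max_year=2017):
    """Prefix every year greater than max_year with ### and leave the rest untouched."""
    def repl(match):
        year = match.group(0)
        # Compare numerically, which avoids encoding the range in the regex itself.
        return year if int(year) <= max_year else "###" + year
    return YEAR.sub(repl, text)

print(mark_suspicious_years('EMISSAO="65321-04-18 00:00:00"'))
print(mark_suspicious_years('EMISSAO="2016-04-18 00:00:00"'))
```

Searching for `###` in the editor afterwards jumps straight to the entries that need attention.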
    
02.03.2017 / 20:07
3

Building on @Marlysson's answer and adjusting it to the logic you need:

  

The number 65321 represents the year and is a year that does not even exist (invalid), I need to see where the years are above 2017 to correct.

In other words, you want to match the invalid entries, not the valid ones.

The regex then becomes:

^EMISSAO="(?(?!(19\d{2}|20(0\d|1[0-7]))-\d{2}-\d{2}).*|)$

  • Note that I'm disregarding what comes after the date, that is, the part where @Marlysson used \s(\d{2}:){2}\d{2}" to check the whole string.

Logic

  • The logic used is inversion: I need to know what the valid matches look like so that I can avoid matching them. For this I used (?!...)
  • To capture what is not valid, I used the regex conditional (ternary) construct (?(condition)then|else) .

Explanation

  • ^EMISSAO=" - Literal text at the start of the line.
  • (?!(19\d{2}|20(0\d|1[0-7]))-\d{2}-\d{2}) - If the text matches this expression (a valid date), the condition is false.
  • (?(condition).*|) - The conditional: when the condition is true it captures everything, when false it captures nothing.
  • $ - End of the capture.

See it on regex101
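
The inversion idea can also be tried in Python, with one caveat: Python's `re` module only accepts a capture-group reference as the condition of a conditional, not a lookahead. The sketch below therefore swaps the conditional for a plain negative lookahead, which should select the same lines. The sample text is made up for illustration.

```python
import re

# Match whole EMISSAO lines whose date does NOT start with a year in 1900-2017.
# A plain negative lookahead replaces the (?(condition)then|else) construct here.
INVALID = re.compile(
    r'^EMISSAO="(?!(?:19\d{2}|20(?:0\d|1[0-7]))-\d{2}-\d{2}).*$',
    re.MULTILINE,
)

text = '''EMISSAO="2016-04-18 00:00:00"
EMISSAO="65321-04-18 00:00:00"
EMISSAO="2018-04-18 00:00:00"
EMISSAO="1990-04-18 00:00:00"'''

# With no capturing groups, findall returns the full text of each match.
invalid_lines = INVALID.findall(text)
for line in invalid_lines:
    print(line)
```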

    
07.03.2017 / 15:14
2

Try the following regular expression, which matches dates whose year is out of range:

EMISSAO=\"([0-9]{4,}(?<=0*2(0(1[8-9]|[2-9][0-9])|[1-9][0-9]{2})|[3-9][0-9]{3}))-(0?[1-9]|1[0-2])-(0?[1-9]|[1-2][0-9]|3[0-1]) (0?[0-9]|1[0-9]|2[0-3]):(0?[0-9]|[1-5][0-9]):(0?[0-9]|[1-5][0-9])\"

([0-9]{4,}(?<=0*2(0(1[8-9]|[2-9][0-9])|[1-9][0-9]{2})|[3-9][0-9]{3})) => Any year that is not between 0 and 2017

(0?[1-9]|1[0-2]) => Any month from 1 to 12

(0?[1-9]|[1-2][0-9]|3[0-1]) => Any day up to 31

(0?[0-9]|1[0-9]|2[0-3]) => Any hour from 0 to 23

(0?[0-9]|[1-5][0-9]) => Any minute from 0 to 59

(0?[0-9]|[1-5][0-9]) => Any second from 0 to 59

    
02.03.2017 / 19:08
0
  

I need to see where the years are above 2017

Regular expression

\b0*(?:(?:[12]\d|[3-9])\d{3,}|2(?:0(?:1[89]|[2-9]\d)|[1-9]\d{2}))(?=-\d{2}-\d{2} \d{2}:\d{2}:\d{2})


Test here: link
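
This pattern uses only constructs that Python's `re` module also understands, so it can be tried locally with a short sketch like the one below. The function name `years_above_2017` is hypothetical and the sample strings come from the question.

```python
import re

# Years above 2017, optionally zero-padded, that are followed by -MM-DD HH:MM:SS.
ABOVE_2017 = re.compile(
    r'\b0*(?:(?:[12]\d|[3-9])\d{3,}|2(?:0(?:1[89]|[2-9]\d)|[1-9]\d{2}))'
    r'(?=-\d{2}-\d{2} \d{2}:\d{2}:\d{2})'
)

def years_above_2017(text):
    """Return every year in text that the pattern flags as above 2017."""
    # The pattern has no capturing groups, so findall returns the matched years.
    return ABOVE_2017.findall(text)

print(years_above_2017('EMISSAO="65321-04-18 00:00:00"'))
print(years_above_2017('EMISSAO="2016-04-18 00:00:00"'))
```

The lookahead keeps the match limited to the year itself, so a replacement in the editor would touch only those digits.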

    
26.03.2017 / 13:09