Regex to reject repeated digits in a CNPJ


I have the following regular expression:

regex:/^\d{2}\.\d{3}\.\d{3}\/\d{4}\-\d{2}$/

It validates the format, but it still accepts values in which all digits are repeated.

I want this regex to also reject values in which every digit is the same, for example: 11.111.111/1111-11, 22.222.222/2222-22, and so on.

I'm using this Regex within a Laravel Request.

public function rules()
    {
        return [
            'name'                  =>  'required|unique:companies',
            'email'                 =>  'required|email|unique:companies',
            'cnpj'                  =>  'required|unique:companies|regex:/^\d{2}\.\d{3}\.\d{3}\/\d{4}\-\d{2}$/',
            'display_name'          =>  'required',
            'description'           =>  'string',
            'address'               =>  'required|string',
            'address_number'        =>  'required|numeric',
            'district'              =>  'required',
            'zip_code'              =>  'required|min:9',
            'city_id'               =>  'required',
            'site_url'              =>  'required',
            'photo_url'             =>  'required|image',
            'phone_number'          =>  'required|min:10',
        ];
    }
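A quick check in plain PHP confirms the problem: the pattern from the `cnpj` rule (without the `regex:` prefix) accepts the repeated-digit examples just like any other well-formed value. This is a minimal sketch using only `preg_match`; the variable name `$original` and the sample `12.345.678/0001-95` are illustrative, and only the format is checked, not the CNPJ check digits.

```php
<?php
// Pattern from the question's 'cnpj' rule, without the "regex:" prefix.
$original = '/^\d{2}\.\d{3}\.\d{3}\/\d{4}\-\d{2}$/';

// preg_match returns 1 on a match and 0 on no match.
foreach (['12.345.678/0001-95', '11.111.111/1111-11', '22.222.222/2222-22'] as $cnpj) {
    printf("%s => %d\n", $cnpj, preg_match($original, $cnpj));
}
// All three lines print 1: the format check alone does not catch repeated digits.
```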

How would you do that?

    
asked by anonymous 01.11.2018 / 14:11



Short answer

^(?!(\d)\1\.\1{3}\.\1{3}\/\1{4}-\1{2}$)\d{2}\.\d{3}\.\d{3}\/\d{4}-\d{2}$

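Outside a character class, the hyphen does not need to be escaped with \, but escaping it, as you did, is harmless in PCRE (the regex engine PHP uses). If you prefer to keep the escape, the regex becomes:

^(?!(\d)\1\.\1{3}\.\1{3}\/\1{4}\-\1{2}$)\d{2}\.\d{3}\.\d{3}\/\d{4}\-\d{2}$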
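To see that both versions behave the same, here is a small sketch that runs each pattern through `preg_match` against a few sample strings. The variable names and samples are illustrative assumptions; `12.345.678/0001-95` is only checked for format, not for valid check digits.

```php
<?php
// The answer's pattern, with an unescaped hyphen and with \- (both valid PCRE).
$unescaped = '/^(?!(\d)\1\.\1{3}\.\1{3}\/\1{4}-\1{2}$)\d{2}\.\d{3}\.\d{3}\/\d{4}-\d{2}$/';
$escaped   = '/^(?!(\d)\1\.\1{3}\.\1{3}\/\1{4}\-\1{2}$)\d{2}\.\d{3}\.\d{3}\/\d{4}\-\d{2}$/';

$samples = [
    '12.345.678/0001-95', // well formed, digits differ: accepted
    '11.111.111/1111-11', // all digits equal: rejected by the lookahead
    '22.222.222/2222-22', // all digits equal: rejected by the lookahead
    '12345678000195',     // no punctuation: rejected by the format part
];

foreach ($samples as $cnpj) {
    printf("%-20s unescaped=%d escaped=%d\n", $cnpj,
        preg_match($unescaped, $cnpj), preg_match($escaped, $cnpj));
}
```

A value such as `11.111.111/1111-12` is still accepted, since only strings where every digit is equal are rejected.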

Long answer

First we have the anchors ^ and $, which mark, respectively, the beginning and the end of the string. They guarantee that the entire string contains only what the regex describes.
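The effect of the anchors can be seen with a sketch comparing the format pattern with and without `^` and `$`. The input string is made up for illustration. It also shows one PCRE detail worth knowing: by default `$` also matches just before a final newline, and the `D` modifier restricts it to the very end of the string.

```php
<?php
// Without anchors, preg_match succeeds if the pattern occurs anywhere in the string.
$unanchored = '/\d{2}\.\d{3}\.\d{3}\/\d{4}-\d{2}/';
// With ^ and $, the whole string must match.
$anchored = '/^\d{2}\.\d{3}\.\d{3}\/\d{4}-\d{2}$/';

$input = 'CNPJ: 12.345.678/0001-95 (extra text)';
var_dump(preg_match($unanchored, $input)); // int(1)
var_dump(preg_match($anchored, $input));   // int(0)

// Caveat: by default PCRE's $ also matches right before a trailing "\n";
// appending the D modifier makes $ match only at the very end.
var_dump(preg_match($anchored, "12.345.678/0001-95\n"));       // int(1)
var_dump(preg_match($anchored . 'D', "12.345.678/0001-95\n")); // int(0)
```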

After ^ (beginning of string), the regex has 2 main parts. Let's see how each one works separately.

The first part, in parentheses (?!...), is a negative lookahead. It checks that the string, from the current position, does not match the expression inside the parentheses.
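A negative lookahead on its own can be tried with a tiny pattern. This sketch uses a made-up rule, "two digits, but not `00`", just to isolate the behavior of `(?!...)`.

```php
<?php
// (?!00) succeeds only when the next two characters are NOT "00".
$pattern = '/^(?!00)\d{2}$/';

var_dump(preg_match($pattern, '01')); // int(1): lookahead passes, \d{2} matches
var_dump(preg_match($pattern, '00')); // int(0): lookahead fails, so no match
```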

The first thing inside the lookahead is (\d). The shortcut \d matches a digit, and the parentheses form a capturing group. This means that if the first character is a digit, it will be "captured" by the regex. Since it is the first pair of capturing parentheses, it is referenced as group 1 (the lookahead's own parentheses do not count, because a lookahead is not a capturing group).

Then I use \1, which is a backreference to group 1. It matches exactly the value that was captured in group 1, so (\d)\1 checks for two consecutive digits that are the same digit.
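Capturing and backreferencing can be observed directly with `preg_match`'s third argument, which receives the captured groups. The pattern below is a reduced example (just two characters) chosen for illustration.

```php
<?php
// (\d) captures one digit as group 1; \1 must repeat exactly that digit.
$pattern = '/^(\d)\1$/';

var_dump(preg_match($pattern, '77', $m)); // int(1)
var_dump($m[1]);                          // string(1) "7" (what group 1 captured)
var_dump(preg_match($pattern, '78'));     // int(0): \1 requires another "7"
```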

Then we have \., which matches a literal period (.), followed by \1{3}, which means "exactly 3 occurrences ({3}) of what was captured in group 1 (\1), that is, the digit captured by (\d)".
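A quantifier applied to a backreference repeats the captured value, not "any digit". This small sketch (an illustrative four-character pattern) shows that `\1{3}` requires three more copies of the same digit.

```php
<?php
// \1{3}: the digit captured by (\d), exactly three more times.
$pattern = '/^(\d)\1{3}$/';

var_dump(preg_match($pattern, '5555')); // int(1)
var_dump(preg_match($pattern, '5556')); // int(0): last digit differs
var_dump(preg_match($pattern, '555'));  // int(0): only two repetitions after the first 5
```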

The rest of the expression (\.\1{3}\/\1{4}-\1{2}$) checks for another period, 3 more occurrences of the same digit, a slash, 4 occurrences of the same digit, a hyphen, 2 occurrences of the same digit, and finally the end of the string ($).
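Taken out of the lookahead and given its own `^`, the inner expression can be tested by itself. This sketch (the name `$allSame` is illustrative) shows that it matches only the repeated-digit strings, which is exactly what the negative lookahead then rejects.

```php
<?php
// The expression inside the lookahead, used on its own:
// it matches only when every digit equals the first one.
$allSame = '/^(\d)\1\.\1{3}\.\1{3}\/\1{4}-\1{2}$/';

var_dump(preg_match($allSame, '22.222.222/2222-22')); // int(1)
var_dump(preg_match($allSame, '12.345.678/0001-95')); // int(0)
var_dump(preg_match($allSame, '22.222.222/2222-23')); // int(0): last digit differs
```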

In other words, the expression inside the lookahead matches only when every digit is the same (cases like 11.111.111/1111-11 and 22.222.222/2222-22). The negative lookahead ((?!...)) ensures that the string does not have that format: if all digits are equal, the inner expression matches, the lookahead fails, and the regex finds no match.

The key point about a lookahead is that it does not consume characters: it checks the string and, if the check passes, it goes back to where it was and continues evaluating the rest of the expression. Since the lookahead comes right after ^ (beginning of string), it returns to the beginning of the string and the remainder of the regex is evaluated from there. If the lookahead fails, the whole regex fails and no match is found.
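The "goes back to where it was" behavior (lookarounds are zero-width) can be demonstrated with short illustrative patterns: a lookahead and the token after it examine the same characters, and the overall match still starts at the beginning of the string.

```php
<?php
// (?=\d{2}) and the following \d{2} examine the SAME two characters,
// because the lookahead does not consume anything.
var_dump(preg_match('/^(?=\d{2})\d{2}$/', '12')); // int(1)

// Without the lookahead, two consecutive \d{2} need four digits.
var_dump(preg_match('/^\d{2}\d{2}$/', '12'));     // int(0)

// Same with a negative lookahead: the full match still starts at offset 0.
preg_match('/^(?!(\d)\1)\d{2}$/', '12', $m);
var_dump($m[0]);                                   // string(2) "12"
```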

The second part is the regex you were already using (2 digits, period, 3 digits, period, 3 digits, slash, 4 digits, hyphen, 2 digits, and end of string).

The combination of lookahead with your expression ensures that you have what you need:

  • The negative lookahead ensures that the digits are not all the same.
  • If the lookahead check passes (that is, the string is not one where all digits are equal), the engine returns to where it was (in this case, the beginning of the string) and checks the rest of the expression.
  • The rest checks that the string is in the format you specified.
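In the question's `rules()`, only the `regex:` part of the `cnpj` entry changes. The sketch below wraps that string in a hypothetical helper, `cnpjRule()`, so it can be exercised without Laravel; the `explode('|', ...)` step is only a rough stand-in for Laravel's rule parsing, not its actual code, and is there to show the pattern survives intact because it contains no `|`.

```php
<?php
// Hypothetical helper: the question's 'cnpj' rule with the answer's pattern.
// In the real FormRequest, this string goes in the 'cnpj' entry of rules().
function cnpjRule(): string
{
    return 'required|unique:companies|regex:/^(?!(\d)\1\.\1{3}\.\1{3}\/\1{4}-\1{2}$)\d{2}\.\d{3}\.\d{3}\/\d{4}-\d{2}$/';
}

// Rough approximation of pipe-splitting (NOT Laravel's parser).
$parts   = explode('|', cnpjRule());
$pattern = substr($parts[2], strlen('regex:'));

var_dump(preg_match($pattern, '12.345.678/0001-95')); // int(1)
var_dump(preg_match($pattern, '11.111.111/1111-11')); // int(0)
```

If a pattern ever needs a `|`, Laravel's documentation recommends writing the rules as an array instead of a pipe-delimited string.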
01.11.2018 / 14:42