Should a CPF or CNPJ field in the database be VARCHAR or INT?

Question

47

Which type should a CPF or CNPJ field have in the database: VARCHAR or INT?

I've seen some posts suggesting INT to improve performance in JOINs and filters.

If the number has leading zeros, you would just pad it back to the length of the stored field (CPF or CNPJ) before validating it.
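A minimal sketch of the padding idea described in the question, written in Python. The function name `restore_digits` and the sample number are made up for illustration; 11 and 14 are the unformatted digit counts of a CPF and a CNPJ.

```python
CPF_LENGTH = 11
CNPJ_LENGTH = 14

def restore_digits(stored: int, length: int) -> str:
    """Left-pad an integer with zeros back to the identifier's fixed length."""
    text = str(stored)
    if len(text) > length:
        raise ValueError(f"{stored} has more than {length} digits")
    return text.zfill(length)

# Illustrative (not a real) CPF with two leading zeros.
original = "00123456789"
stored = int(original)                      # an integer column keeps only 123456789
print(stored)                               # 123456789
print(restore_digits(stored, CPF_LENGTH))   # 00123456789
```

Note that the padding only works when the reader already knows whether the value is a CPF or a CNPJ, because the integer by itself no longer carries its original length.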

So what would be the best approach?

sql validation typing database modeling

asked by anonymous 22.01.2015 / 14:51

4 answers

17

I have always advocated VARCHAR, because these values are strings that just happen to be sequences of digits. An INT, by contrast, exists so that you can perform mathematical operations on it, which is meaningless for a CPF or CNPJ.

The same goes for the RG, where each state uses a different format; storing it as INT would strip the leading zeros and also cause problems with the check digit.

22.01.2015 / 14:56

5

Use VARCHAR. You will have problems with INT if the value uses . (dot), / or - to separate the digits. If you validate the input and strip these and any other characters, then you can use INT. Still, I recommend VARCHAR. Use INT only for fields that you know will hold nothing but digits.

23.01.2015 / 13:23

0

The range of INT UNSIGNED is 0 to 4294967295, so it supports at most 10 digits. An unformatted CPF has 11 digits, one more than the type can hold. On insertion, a CPF such as "500.000.000-60", or even the value 4294967296, would be forced by the DB to become 4294967295 (this clamping is what MySQL does outside strict mode; in strict mode the insert fails with an out-of-range error).

The same applies to the CNPJ, which has 14 unformatted digits. The ideal would be a BIGINT UNSIGNED, which ranges from 0 to 18446744073709551615. For query speed, an index can be added on this field, along with other rules that vary from DBA to DBA.
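The digit-count argument can be checked with a few lines of Python. This is only a sketch of the arithmetic: the limits are the unsigned 32-bit and 64-bit maxima quoted above, and the helper `fits` is a hypothetical name.

```python
INT_UNSIGNED_MAX = 2**32 - 1       # 4294967295
BIGINT_UNSIGNED_MAX = 2**64 - 1    # 18446744073709551615

def fits(digits: str, max_value: int) -> bool:
    """Return True if the digit string, read as a number, is within the limit."""
    return int(digits) <= max_value

largest_cpf = "9" * 11    # largest 11-digit value
largest_cnpj = "9" * 14   # largest 14-digit value

print(fits("50000000060", INT_UNSIGNED_MAX))     # False: the CPF from the example overflows
print(fits(largest_cpf, INT_UNSIGNED_MAX))       # False
print(fits(largest_cpf, BIGINT_UNSIGNED_MAX))    # True
print(fits(largest_cnpj, BIGINT_UNSIGNED_MAX))   # True
```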

As for leading zeros, in MySQL for example the field can be defined as ZEROFILL.

As for formatting, there is no need to let the application send the DB a CPF or CNPJ formatted with ".", "-" or "/"; that formatting can be applied in a view.

27.10.2017 / 13:16


score 70 · Accepted Answer

Optimization should only be done if it is really necessary. I doubt this optimization will pay off in any scenario.

The basic rule for choosing between types that can be confused is what you will use the data for. Numbers represent quantities. Are CPFs or CNPJs quantities? No, they are identifiers, which can even change and may one day contain letters. The semantically correct type for this data is varchar, since it is descriptive.

The leading-zero issue alone is reason enough to use varchar. Why create a problem for yourself by picking the wrong type?

You should always opt for varchar for any data until you find a reason to choose another type. A general ID, for example, has a good reason to be int: it needs to be incremented, so you do arithmetic with it. The same goes for salary, price and quantity. But it does not apply to a zip code, a telephone number, or even the street number of an address when that field is stored separately. A due date has to be of yet another type: you need to sort the information chronologically, and as text this would not work correctly. These are just a few examples.
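A small Python illustration of the due-date remark, assuming dates written as day.month.year text (the three sample dates are invented). Sorting the strings compares characters from the left, so the result is not chronological; sorting by a real date value is.

```python
from datetime import datetime

# Hypothetical due dates stored as "DD.MM.YYYY" text.
raw = ["22.01.2015", "03.02.2014", "15.12.2013"]

as_text = sorted(raw)
as_dates = sorted(raw, key=lambda s: datetime.strptime(s, "%d.%m.%Y"))

print(as_text)   # ['03.02.2014', '15.12.2013', '22.01.2015']
print(as_dates)  # ['15.12.2013', '03.02.2014', '22.01.2015']
```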

Some even say that varchar is necessary because the value contains punctuation in addition to digits. But this is not a good argument, since the punctuation should not even be stored, at least not in most situations. It should be handled in the presentation only; the punctuation is not part of the data.

In some cases a numeric type cannot be used at all, because the number of digits used to identify something is larger than the numeric type can hold.

Some will say that varchar should not be used either: since the size is fixed, a plain char solves the problem. But that is debatable. Depending on the DB system, one type may indeed be more advantageous than the other, but in most cases not enough to justify choosing on performance grounds. Semantics should be the primary criterion.
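One way to keep punctuation out of the stored data is to strip it on input and add it back only for display. The Python sketch below assumes the usual 000.000.000-00 display mask for a CPF; the function names and the sample number are hypothetical.

```python
import re

def only_digits(value: str) -> str:
    """Drop '.', '-', '/' and any other character that is not a digit."""
    return re.sub(r"\D", "", value)

def format_cpf(digits: str) -> str:
    """Render 11 stored digits as 000.000.000-00 for presentation."""
    if len(digits) != 11 or not digits.isdigit():
        raise ValueError("expected exactly 11 digits")
    return f"{digits[:3]}.{digits[3:6]}.{digits[6:9]}-{digits[9:]}"

stored = only_digits("001.234.567-89")   # what would go into the varchar column
print(stored)                            # 00123456789
print(format_cpf(stored))                # 001.234.567-89
```

Because the stored value is text, the leading zeros survive the round trip without any padding step.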

Unfortunately there is a lot of wrong information on the internet about it (or any subject :)).