Surrogate Key and Natural Key

3

I am studying about data warehouse , but I was confused about the concepts of SK and NK.

As I understand in a DW, a natural key would be a foregin key column in the dimension table of the column that is PK referring to the same table in the database of data. That is, a copy of that column.

The surrogate key would be a PK of the dimension table in DW.

Exemplifying:
BD

tabela_usuario  
id_usuario (PK):  
U100  
U101  
U102...  

DW

dimensao_usuario  
sk_usuario (PK), nk_usuario (FK)  
1, U100  
2, U101  
3, U102  

Is my understanding correct?

    
asked by anonymous 18.02.2018 / 18:30

2 answers

3

The surrogate key ( surrogate ) or artificial is a data that is created for purposes of database control, it does not exist outside of the software solution being developed. It is usually a unique, stable (does not change), sequential, and self-incrementing numeric ID. You have some control over it. It's just that.

The natural key is one that exists outside the application and is being used as a key. A CPF or CNPJ is a natural key, it exists independent of the database. It can even be a name, although unusual, it can be an email, phone, or document number, it can be an EAN or other standardized by some industry, anyway it can be any data that serves as a key and you do not have control over it or has limited control, is an inherent property of the actual object being registered in the database.

Whether the key is primary or not is an issue that does not matter there. If it is primary it must be unique, ideally short, predictable, that you have control and be stable, so it is very common to use a substitute key.

Nothing prevents you from using a natural key as a primary, it can only cause problems, people think they will never have a problem using it, and one day they find out that there is a problem.

The impression you give me is that id_ usuario uses a substitute key, and dimensao_usuario uses it as a secondary (foreign) key. Why the same key that was substitute to be natural when used elsewhere? For me, in this example, only substitute keys exist, unless%% is an existing data outside the database that was transported to it, then the data in the two tables is natural and U100 is the substitute .

    
18.02.2018 / 20:00
2

There is a great article on natural keys, primary keys, surrogate keys etc. in Primary Key Primer for SQL Server . Note that this conceptualization is independent of the DW context.

Here is an excerpt about choosing a natural or substitute key:

Should I use a natural or surrogate key?
A key is just a combination of column (attribute) values that provides an obvious way of distinguishing rows, or 'domain' key is one that has meant outside the database environment, such as Longitude / Latitude. Many people argue for a general rule that keys must be natural (or perversely that they should not be) or that they must be immutable. I've even heard it said that all joins must be on key values. As always, it depends. On this topic, there are very few hard and fast rules, because there are conflicting requirements between the knowledge level and the implementation requirements.
(...)
Surrogate keys are the normal way of getting round the complexities of trying to handle natural keys that are ungainly or do not conform to your business rules. These are fine if they are kept private within the database: otherwise, they are a form of technical debt. They make the coding of databases easier but are disliked by book-keepers, accountants, retailers or anyone else who has to handle them. They are not human-friendly. Sometimes, surrogate keys 'escape' into the world if exposed to 'reference numbers' and take on a permanent and intended meaning that prevents any refactoring or renumbering within the database.

I suggest reading this article.

In the case of data warehouse (DW) the focus changes a little, because of temporality. For example, a given product may have its coding modified at any given time; this becomes transparent if DW has chosen to use a substitute key to identify the products. Even though in the database of the OLTP the coding has been changed, in the DW database it remains unchanged, allowing to follow the product throughout its existence.

There are also other reasons for using substitute keys in DW. One is that information about the same object may have different encodings, depending on the source, when the DW is fed from several sources (including from outside the company). In this way, the substitute key chosen for the object starts to function as standardization.

This can be confirmed by the definition of Surrogate Keys on the Kimball Group website, where it is mentioned that Actually, the surrogate key in the data warehouse is more than just a substitute for a natural key. In the data warehouse, the surrogate key is a necessary generalization of the natural production key and is one of the basic elements of data warehouse design .

In fact, this entry reinforces that Every join between dimension tables and fact tables in the data warehouse environment should be based on surrogate keys, not natural key .

TRANSLATION OF ARGUMENTS ABOVE ORIGINALLY IN ENGLISH
A key is just the combination of column values (attribute) that provides an obvious way to differentiate rows, and a natural (or domain) key is one that has meaning outside the database environment, such as latitude / longitude. Several people maintain by a general rule that the keys must be natural (or stubbornly do not owe) or should be immutable. I myself have heard that all joins must be through key values. As always, it depends. There are very few hard and fast rules on this subject, because there are sometimes conflicting demands knowledge and implementation requirements.
(...)
Substitute keys are the usual form of circumventing the complexities of trying to deal with natural keys that are clumsy or that do not conform to business rules. This is correct if they are kept restricted to the database: otherwise, they are a type of technical debt. They make it easier to encrypt databases but are rejected by bookkeepers, accountants, resellers, or anyone else who has to use them. They are not friendly. Substitute keys sometimes escape the world if exposed as 'reference numbers' and assume a permanent sense that prevents any internal modification or renumbering in the database.

    
20.02.2018 / 00:55