I have a database that will receive a lot of data. I am unsure whether it is better to organize the data into a few tables with many columns each, or to split it further across multiple tables. In which case will queries be faster? I'm using MySQL.
According to the database normalization rules:
It is better to have more tables with fewer attributes each, and to associate them via foreign keys.
1st Normal Form: all attributes of a table must be atomic, i.e., the table must not contain repeating groups or attributes with more than one value. To bring a table into this normal form, identify its primary key, identify the column(s) that hold repeating data, remove them from the table, create a new table (referencing the original primary key) to store the repeated values, and create a relationship between the main table and the secondary table. For example, consider the PESSOAS (people) table below.
PESSOAS = { ID + NOME + ENDERECO + TELEFONES }
It has ID as its primary key, and TELEFONES is a multivalued attribute, so the table is not in 1NF. To bring it into 1NF, we create a new table called TELEFONES, containing PESSOA_ID as a foreign key to PESSOAS and TELEFONE holding each stored phone number.
PESSOAS = { ID + NOME + ENDERECO }
TELEFONES = { PESSOA_ID + TELEFONE }
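The split above could be written as MySQL DDL roughly like this (the column types are illustrative assumptions, not part of the original model):

```sql
-- Main table: one row per person, only atomic attributes
CREATE TABLE PESSOAS (
    ID       INT PRIMARY KEY,
    NOME     VARCHAR(100),
    ENDERECO VARCHAR(200)
);

-- One row per phone number, linked back to its owner
CREATE TABLE TELEFONES (
    PESSOA_ID INT NOT NULL,
    TELEFONE  VARCHAR(20),
    FOREIGN KEY (PESSOA_ID) REFERENCES PESSOAS (ID)
);
```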
2nd Normal Form: first of all, to be in 2NF a table must be in 1NF. In addition, every non-key attribute must depend on the entire primary key, not on just part of it. To reach second normal form, identify the columns that depend on only part of the primary key, remove them from the main table, and create a new table for that data. For example, consider the ALUNOS_CURSOS table below.
ALUNOS_CURSOS = { ID_ALUNO + ID_CURSO + NOTA + DESCRICAO_CURSO }
In this table, the DESCRICAO_CURSO attribute depends only on ID_CURSO, which is just part of the primary key, so the table is not in 2NF. To fix this, we create a new table called CURSOS, with ID_CURSO as its primary key and the DESCRICAO attribute, and remove DESCRICAO_CURSO from ALUNOS_CURSOS.
ALUNOS_CURSOS = { ID_ALUNO + ID_CURSO + NOTA }
CURSOS = { ID_CURSO + DESCRICAO }
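In MySQL DDL, the 2NF result could look like this (types are again illustrative assumptions):

```sql
-- Enrollment table: NOTA depends on the whole key (ID_ALUNO, ID_CURSO)
CREATE TABLE ALUNOS_CURSOS (
    ID_ALUNO INT NOT NULL,
    ID_CURSO INT NOT NULL,
    NOTA     DECIMAL(4,2),
    PRIMARY KEY (ID_ALUNO, ID_CURSO)
);

-- The course description moved out, since it depends only on ID_CURSO
CREATE TABLE CURSOS (
    ID_CURSO  INT PRIMARY KEY,
    DESCRICAO VARCHAR(200)
);
```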
3rd Normal Form: to be in 3NF, a table must be in 2NF. In addition, the non-key attributes must be mutually independent and depend solely on the primary key (an attribute B is functionally dependent on A if and only if each value of A determines exactly one value of B). To achieve this normal form, identify the columns that are functionally dependent on other non-key columns and extract them to another table. Consider, for example, the following FUNCIONARIOS table.
FUNCIONARIOS = { ID + NOME + ID_CARGO + DESCRICAO_CARGO }
The DESCRICAO_CARGO attribute depends exclusively on ID_CARGO (a non-key attribute), so we must move these attributes to a new table. We end up with the following tables:
FUNCIONARIOS = { ID + NOME + ID_CARGO }
CARGOS = { ID_CARGO + DESCRICAO }
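A possible MySQL DDL sketch of the 3NF result (types are assumptions):

```sql
-- Job titles extracted to their own table
CREATE TABLE CARGOS (
    ID_CARGO  INT PRIMARY KEY,
    DESCRICAO VARCHAR(100)
);

-- Employees now reference CARGOS instead of repeating the description
CREATE TABLE FUNCIONARIOS (
    ID       INT PRIMARY KEY,
    NOME     VARCHAR(100),
    ID_CARGO INT,
    FOREIGN KEY (ID_CARGO) REFERENCES CARGOS (ID_CARGO)
);
```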
Source: dsc
As with everything in IT... it depends!
In a relational database, keeping information normalized is a good option, and using indexes properly is key. Others here have covered third normal form, and I agree with them... but it is worth knowing that there are more: six in all, plus some further refinements... It is up to you to decide how far to go!
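As a sketch of the indexing point: if queries frequently filter or join on a column, an index on that column is usually the first optimization to try (the table and column names here just reuse the earlier examples):

```sql
-- Speed up lookups of a person's phones by indexing the join column
CREATE INDEX idx_telefones_pessoa ON TELEFONES (PESSOA_ID);
```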
When it comes to DML (Data Manipulation Language: INSERT, UPDATE, DELETE, and MERGE), the picture changes a bit... Inserting records into many different tables at high volume can be slow, and I have seen the practice of loading everything into a single input table first and then fragmenting that data into its proper tables.
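A minimal sketch of that staging-table practice, with hypothetical names (ENTRADA_STAGING is invented here, and it assumes the target PESSOAS table generates its own IDs, e.g. via AUTO_INCREMENT):

```sql
-- Staging table: no indexes or foreign keys, so bulk INSERTs are fast
CREATE TABLE ENTRADA_STAGING (
    NOME     VARCHAR(100),
    ENDERECO VARCHAR(200),
    TELEFONE VARCHAR(20)
);

-- A later batch job fragments the staged rows into the normalized tables
INSERT INTO PESSOAS (NOME, ENDERECO)
SELECT DISTINCT NOME, ENDERECO
FROM ENTRADA_STAGING;
```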
Since you are working with a relational model, I recommend building an entity-relationship model, following the three normal forms, and not adding more tables than the model calls for.
As data grows, scalability techniques can be applied.
A well-known technique is to cache database tables to make queries faster.
Another is to apply the concept of sharding: breaking a large table into smaller tables, distributing the load and speeding up searches. Roughly speaking, it is the load-balancer concept applied to the database.
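MySQL does not shard across servers on its own, but its table partitioning is a single-server relative of the same idea. A hedged sketch, reusing the earlier TELEFONES example (note that partitioned MySQL tables cannot have foreign keys, so the FK is omitted here):

```sql
-- Split the rows into 4 physical partitions by hashing the key;
-- queries filtering on PESSOA_ID can then touch a single partition
CREATE TABLE TELEFONES (
    PESSOA_ID INT NOT NULL,
    TELEFONE  VARCHAR(20)
)
PARTITION BY HASH (PESSOA_ID)
PARTITIONS 4;
```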