Database modeling, when it is possible to denormalize

2

I am modeling a database (MSSQL) and a question has arisen. In every customer business rule data is viewed vertically ex:

  • Product,
  • Original Price,
  • Competitor Price01,
  • Competitor Price02,
  • Competitor Price03,
  • Competitor Price04,
  • Competitor Price05
  • ReadMe

that is, you would define the 5 competitors and at worst would have to add a new column to a new competitor.

But my SQL knowledge says that I should normalize and put the data horizontally, even though I know the odds of a new competitor are small.

ex:

  • Product
  • Original Price
  • Competitor Price
  • Competitor Id; // 1 = Competitor 01, 2 = Competitor 02, etc.
  • ReadMe

By doing the traditional (normalized) form, I have realized that I will have a lot of work to generate the reports as the client desires, since several lines will actually represent only one line for the report. In SQL would have to query with PIVOT

Doubt:

  

What is the correct way in this modeling? Following the business rule of the   customer and then denormalize to facilitate or follow the rigid norms of standardization? This is a scenario that for me would be an advantage to denormalize.

    
asked by anonymous 11.01.2017 / 12:05

3 answers

-1

One conclusion I reached:

  

When to denormalize or not a modeling?

By default standardization is always a path we should follow, in my project it was clear that it seemed one of the rare cases that denormalization would even be worth, for some specific rules of the client.

So my approach was to denormalize. But I had a parameter that I had not commented on in the question and that I put here to help anyone who ventures into a denormalized database.

I had a lot of trouble using EntityFramework , it's definitely not meant for that!

There is even a way to get around a lot of problems you were having, but the solutions were more monumental than the difficulties you would have in the case of standardization.

Summary : I will normalize the database (will generate difficulties in reports and some other points) however EntityFramework + assync processes are not prepared for nonstandard databases . It may be that other languages or even with Ado.net in synchronous, work.

    
13.01.2017 / 14:07
2

In my opinion, data modeling should be performed independent of the sgbd . After data modeling is finished, the physical implementation comes, when it may be necessary to adapt the data model to the characteristics of sgbd.

(1)
Considering 3FN, we could have the following physical implementation in sgbd SQL Server:

-- código #1
CREATE TABLE Produto (
  ID_produto int primary key,
  Nome_produto varchar(200) not null,
  Preço money not null
);

CREATE TABLE Concorrente (
  ID_concorrente smallint primary key,
  Nome_concorrente varchar(80) not null
);

and

-- código #2
CREATE TABLE ColetaPreço (
  ID_produto int not null references Produto,
  ID_concorrente smallint not null references Concorrente,
  Data_coleta date not null,
  Preço money not null
);

You can create a view ( view ) that returns product prices on a single line per product, making the view independent of the physical implementation.

There are advantages and disadvantages to this implementation.

(2)
Sometimes, for performance reasons, denormalize yourself entity for physical implementation. If you realize the need to implement in 2FN, you can then use 5 columns (one for each competitor). Something like this:

-- código #3 v2
CREATE TABLE ColetaPreço (
  ID_produto int not null references Produto,
  Data_coleta date not null,
  Preço_concorrente1 money null,
  Preço_concorrente2 money null,
  Preço_concorrente3 money null,
  Preço_concorrente4 money null,
  Preço_concorrente5 money null
);

The disadvantage of this option is that if you decide to add a sixth competitor in the price collection, you need to change the database structure as well as the existing schedule.

(3)
Another more radical way would be to store the competitor's price list in a single column:

-- código #4 v2
CREATE TABLE ColetaPreços (
  ID_produto int not null references Produto,
  Data_coleta date not null,
  Lista_preço varchar(200) not null
);

Each of these options should come with the corresponding set of manipulation objects (view, function, etc.) to make the data look simple.

    
11.01.2017 / 13:03
2

In general the ideal is to normalize. Until you have a reason not to. I I have answered this in general terms .

There will never be more than 5 competitors' prices? I believe there will not be, right?

Can you afford the cost of having a small, unused space when there are less than 5 prices? I do not think that's a problem. I do not think that's your concern, and we should not always worry about it. Even if you have this concern, you have a solution, although it may not simplify reporting and other operations.

From the comments, it seems that not only will 5 but always be the same. It seems to me a clear case that normalization is only being made to follow the rule. You are disregarding the concept.

When you model you have to think about the concept first. There in the answer that I linkei above says that there are cases that are even doing denormalization, the model should be like this. I do not say that this is the case, but the question is always to ask yourself why you are doing it one way or another. Do not follow rules, do what's right for that case.

Do you think you could have problems in the future with a change because you did the current way? I think not. This part of the database is used throughout the system and a change would be a nightmare? Maybe it's something very specific used in something timely.

Never think of the best template for the report because that's the easy part and that's no problem. But never disregard other factors that may indicate the need for the "rigid" rule. I can not answer for sure because it may have a hidden requirement there, but it seems a case to do as you please.

    
12.01.2017 / 12:47