Save a JSON in the bank vs. Relationship?

4

When I did not know much about the question of database structure and relationship between tables, I used JSON or comma-separated data in a particular field in any table in the database.

In fact, I remember a specific case where I needed to relate multiple images to a particular post. These images saved a list in JSON to a certain field called imagens .

The structure was more or less like this

-----------------------------------------------
tabela
-----------------------------------------------
id | imagens                      | usuario_id
-----------------------------------------------
1  | ["image1.png", "image2.png"] |   55
2  | NULL                         |   56

But at a certain point in time when changes began to emerge in this system, I was harshly criticized by a programmer who knew more about database than I (and was kidding too) for doing so.

He explained that the data should look something like this:

--------------------------
tabela
--------------------------
id | usuario_id
--------------------------
1  |  55
2  |  56


----------------------------
imagens
----------------------------
post_id | url_imagem
 1      | image1.png
 1      | image2.png

Considering these two cases (VS serialization relationship saved in a field), can we say that, structurally speaking, using JSON is bad practice?

I was criticized for using data in JSON in a table field instead of relationship, but there is some case where I should / can use a serialization and save it in a database field, without this being considered a bad programming practice?

    
asked by anonymous 19.08.2015 / 19:15

1 answer

10

A normalized relational database assumes that each attribute (value of a column in a table) is atomic , that is, it can not be broken into smaller pieces from the point of view of the relational model . From a relational model point of view , it's good to note: no one would suggest that you store each digit of a number in a separate column, or each character in a string. If a data is not used to form relationships between tables, this data can be considered for all atomic effects.

In your example above this is not necessarily true: you have a list of images, ok, but can that list of images be considered atomic? Each image belongs to a single table row, there is no image overlay (ie the same image belongs to several distinct posts), you will always be interested in the complete list and nothing else (ie not possible that you decide, for example, to take only the first image from the list, to make a thumbnail or cover image for example)?

If the answer is a definite yes, you do not have the slightest chance that you want parts of this list of images for anything, or it's the whole list or it's nothing, it is not bad practice to store the list this way, either using JSON, comma-separated strings, Blobs , etc. Just as your image is in an external file, and that file is viewed by the relational model as an atomic thing (you do not reference "chunks" of that image in your model, only the entire image), the list of images - or even some something more complex and structured - can be seen as an atomic entity, and therefore saved in a single attribute.

However, most of the time the answer is not a "yes": it is perfectly possible to design use cases where an image is shared by more than one post , or you want to recover just a subset of the post images for anything, or even if you want to associate with each image additional metadata (like type, size, creation date, etc.). In this case, a normalized representation is often ideal, and this alternative suggestion would correspond to a normal form.

As a general rule, I would say, in doubt, to adopt the normalized form: it is easier to denormalize a table later (in case of performance problems, for example) than to normalize one that started denormalized. Unless you have good reason to save a JSON to a table avoid doing this. There are cases where a JSON is rather the best representation for a data set, but in this case a non-relational database - commonly referred to as NoSQL - may be a preferable option.

    
19.08.2015 / 19:33