The question has a slightly more conceptual bias, so I'm going to use a simple generic example.
There are cases where some column may be in special situation, this is because an information:
- is not available,
- is invalid at some point,
- is unknown,
- is undetermined,
- has no relevance,
- may be in a special situation that only makes sense for the specific domain .
One way to indicate the special situation is to use of null
, which is already questionable whether it is good or bad, although everything has its merit under certain circumstances. The use of null
is a clear case of choosing unnatural special values in the column domain to indicate a special situation.
But you may need to indicate more precisely what the special situation is, the reason for the current state. At ask about null
there is a definition of how to solve a special situation. But there may be more than one motive, and worse, new motives may be included in the future.
For example, in an academic control there is a column with students' grades in subjects it may not be enough to have a grade of zero. A zero note is not the same as no note. And a null
may not be sufficient since it only indicates that there is no note. You need to know why there is no note, since each circumstance may require different treatment. You need to know if the note:
- not yet given,
- does not exist because the student is exempt from that assessment,
- he did not do the evaluation (which in this case is different from him having zero rating).
I know that in some cases it is possible to avoid certain reasons for improving standardization. That's not the point here.
There are basically two solutions to this:
-1
would indicate the reason a
%% of% motif -2
and b
to motif -3
). In a domain with a special situation should we use values or auxiliary columns to indicate the situation? That is, is there more advantages in one than another? I particularly prefer the second.
How to deal with these cases? After all, queries can give wrong results without considering these cases. And this goes for either option.
A typical example of the problem is the calculation of the average values of selected rows. But all problems involving c
apply equally in this case. Database systems often have a solution to deal with null
.
Bonus point: Is there a way to abstract (make transparent) this? Or the user (perhaps a developer) will have to know this need and properly handle each query?