The great advantage lies in how memory is organized and reused.
The variables in a struct are laid out at sequential addresses, so the members that make up the struct sit side by side in memory.
Your example is not a good fit for a union, so I will not use it.
Imagine that we have a grocery item. This item has a name, a price, and a size. The size can be given either as a volume (1 liter) or as a weight (1 kg). So we could create the following struct:
struct item {
    char nome[50];
    float preco;
    float volume;  // in liters.
    unsigned peso; // in grams.
};
In this struct item, memory would be laid out as follows (I'm assuming byte alignment with no padding, for simplicity):
0-------------49-50-------53-54--------57-58-------61
      nome          preco       volume       peso
Note, however, that we buy milk by volume, not by weight. The peso field of this struct item would therefore never hold a valid value for milk, yet it would always occupy memory.
The same goes for cheese: it is sold in grams, not in liters, so its volume field would be wasted.
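The diagram above ignores padding. If you want to see what your compiler really does, a small sketch like the one below prints the actual offsets with offsetof. The function name print_item_layout is just something I made up for illustration, and the numbers it prints depend on the compiler and platform.

```c
#include <stddef.h>
#include <stdio.h>

struct item {
    char nome[50];
    float preco;
    float volume;  /* in liters */
    unsigned peso; /* in grams */
};

/* Hypothetical helper: prints where each field really starts.
   The compiler may insert padding, so the values can differ
   from the simplified diagram. */
void print_item_layout(void) {
    printf("nome   starts at byte %zu\n", offsetof(struct item, nome));
    printf("preco  starts at byte %zu\n", offsetof(struct item, preco));
    printf("volume starts at byte %zu\n", offsetof(struct item, volume));
    printf("peso   starts at byte %zu\n", offsetof(struct item, peso));
    printf("whole struct: %zu bytes\n", sizeof(struct item));
}
```

Whatever the exact offsets are, volume and peso each get their own bytes here, and both are always present.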
How can we reduce the memory used? We can declare the volume and peso fields inside a union:
struct item {
    char nome[50];
    float preco;
    union { // anonymous union (C11).
        float volume;
        unsigned peso;
    };
};
Now our memory layout will be:
0-------------49-50-------53-54-------------57
      nome          preco      volume/peso
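To check the saving on a real compiler, here is a minimal sketch that compares the two layouts. The type names item_separate and item_shared and the function bytes_saved are made up for this comparison, and the anonymous union assumes a C11 compiler.

```c
#include <stddef.h>
#include <stdio.h>

/* Both fields stored separately. */
struct item_separate {
    char nome[50];
    float preco;
    float volume;
    unsigned peso;
};

/* Both fields sharing the same bytes. */
struct item_shared {
    char nome[50];
    float preco;
    union {
        float volume;
        unsigned peso;
    };
};

/* Hypothetical helper: how many bytes the union saves per item. */
size_t bytes_saved(void) {
    return sizeof(struct item_separate) - sizeof(struct item_shared);
}

/* Prints both sizes; exact values depend on compiler padding. */
void print_sizes(void) {
    printf("separate: %zu bytes\n", sizeof(struct item_separate));
    printf("shared:   %zu bytes\n", sizeof(struct item_shared));
}
```

In the shared version, offsetof reports the same offset for volume and peso, which is exactly the overlap shown in the diagram.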
This way, when we access the volume field of a struct item, the compiler knows that we are treating that memory region as a float and handles it accordingly. Likewise, when we access the peso field, it knows that the region holds an unsigned and applies the unsigned rules.
But what if we do this:
struct item it;
it.peso = 2;
it.volume = 0.0f;
printf("%u", it.peso);
The output will not be 2, the value we stored in peso, but the bit pattern of 0.0 in IEEE 754 reinterpreted as an unsigned. Coincidentally, that bit pattern is all zeros, so the output will be 0.
Why?
Remember that the volume and peso fields occupy the same memory region, so both assignments wrote to the same address.
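Since 0.0 happens to be all zero bits, the effect is easier to see with another value. The sketch below wraps the same two assignments in a function (peso_after_volume_write is a name I invented) so you can try different floats. It assumes a 32-bit IEEE 754 float and a 32-bit unsigned, which is the common case but not guaranteed by the language.

```c
#include <stdio.h>

struct item {
    char nome[50];
    float preco;
    union {
        float volume;
        unsigned peso;
    };
};

/* Hypothetical helper: stores 2 in peso, then overwrites the same
   bytes through volume, and reads peso back. */
unsigned peso_after_volume_write(float v) {
    struct item it;
    it.peso = 2;
    it.volume = v;  /* replaces the bytes that held the 2 */
    return it.peso; /* the float's bits, read as an unsigned */
}
```

Under those assumptions, peso_after_volume_write(0.0f) returns 0, while peso_after_volume_write(1.0f) returns 1065353216 (0x3F800000), the bit pattern of 1.0f. Neither is the 2 we stored.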
So if we read the value through the "wrong" field, we can get results that are absurd for our problem domain. So how do we know which field to use?
We can add a flag indicating this:
#include <stdbool.h> // for bool.
struct item {
    char nome[50];
    float preco;
    bool porVolume; // true: use volume; false: use peso.
    union {
        float volume;
        unsigned peso;
    };
};
So, to print the contents of an item, we could use:
if ( it.porVolume ) {
    printf("%s\t%.2f\t%.3f", it.nome, it.preco, it.volume);
} else {
    printf("%s\t%.2f\t%u", it.nome, it.preco, it.peso);
}
This pattern repeats every time we access the fields of the union.
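One way to keep the flag and the value from drifting apart is to set them together in one place. The sketch below is just one possible arrangement: item_by_volume, item_by_weight and item_format are helper names I made up, not part of any library, and the anonymous union assumes C11.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

struct item {
    char nome[50];
    float preco;
    bool porVolume;    /* true: volume is valid; false: peso is valid */
    union {
        float volume;  /* in liters */
        unsigned peso; /* in grams */
    };
};

/* Hypothetical helper: item sold by volume; sets flag and value together. */
struct item item_by_volume(const char *nome, float preco, float litros) {
    struct item it = {0};
    strncpy(it.nome, nome, sizeof it.nome - 1); /* last byte stays '\0' */
    it.preco = preco;
    it.porVolume = true;
    it.volume = litros;
    return it;
}

/* Hypothetical helper: item sold by weight; sets flag and value together. */
struct item item_by_weight(const char *nome, float preco, unsigned gramas) {
    struct item it = {0};
    strncpy(it.nome, nome, sizeof it.nome - 1);
    it.preco = preco;
    it.porVolume = false;
    it.peso = gramas;
    return it;
}

/* Writes the item into buf, reading only the field the flag says is valid. */
int item_format(const struct item *it, char *buf, size_t n) {
    if (it->porVolume)
        return snprintf(buf, n, "%s\t%.2f\t%.3f", it->nome, it->preco, it->volume);
    return snprintf(buf, n, "%s\t%.2f\t%u", it->nome, it->preco, it->peso);
}
```

For example, item_by_volume("milk", 4.5f, 1.0f) formats as milk, 4.50 and 1.000 separated by tabs, and item_by_weight("cheese", 12.0f, 250) formats as cheese, 12.00 and 250.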
Besides using it inside a struct, we can also use a union as a type in its own right:
union pesoVolume {
    float volume;
    unsigned peso;
};
union pesoVolume pv;
pv.volume = 0.0f;
It works exactly the same way, except that the union is no longer inside a struct.
In C, the members of a union can even have different sizes, and the compiler reserves enough memory for the largest member (plus any padding needed for alignment). That is:
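union u1 {
    float f1;    // 4 bytes.
    unsigned f2; // 4 bytes.
};
printf("%zu", sizeof(union u1)); // 4.
union u2 {
    float f1;    // 4 bytes.
    long int f2; // 8 bytes on typical 64-bit Linux/macOS.
    char f3[20]; // 20 bytes.
};
printf("%zu", sizeof(union u2)); // At least 20; typically 24 where long int is 8-byte aligned.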
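The exact size depends on the platform, because the compiler may round the union up to a multiple of its strictest member alignment. A minimal sketch to check this on your own machine, assuming a C11 compiler for _Alignof (print_u2_size is a made-up name):

```c
#include <stddef.h>
#include <stdio.h>

union u2 {
    float f1;
    long int f2;
    char f3[20];
};

/* Hypothetical helper: prints the size and alignment the compiler chose. */
void print_u2_size(void) {
    printf("sizeof = %zu, alignment = %zu\n",
           sizeof(union u2), (size_t)_Alignof(union u2));
}
```

The size is never smaller than the largest member (20 bytes here) and is always a multiple of the alignment.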
When to use?
Today it no longer makes much sense, I think. In the past, memory was a scarce resource, which justified these savings. Today, a standard PC has 4 GB, and it is not uncommon to find machines with 8 GB or more.
In some cases, however, a union makes it easier to pass parameters through an API, and it can be used if the programmer sees an advantage. This happens in some Win32 functions. I honestly do not recommend it, since some languages may not support this kind of layout, which causes interoperability issues.
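To give an idea of what that looks like, here is a small sketch of one function receiving different kinds of parameters through a single struct. Everything in it (event_kind, event, describe_event) is invented for illustration and is not taken from Win32 or any other real API.

```c
#include <stdio.h>

/* Hypothetical tag saying which union member is valid. */
enum event_kind { EVENT_KEY, EVENT_MOUSE };

/* Hypothetical parameter block: one struct, two possible payloads. */
struct event {
    enum event_kind kind;
    union {
        struct { int code; } key;
        struct { int x, y; } mouse;
    } data;
};

/* A single entry point handles both kinds, reading only the
   payload that matches the tag. Returns snprintf's result, or -1
   for an unknown kind. */
int describe_event(const struct event *ev, char *buf, size_t n) {
    switch (ev->kind) {
    case EVENT_KEY:
        return snprintf(buf, n, "key %d", ev->data.key.code);
    case EVENT_MOUSE:
        return snprintf(buf, n, "mouse %d,%d",
                        ev->data.mouse.x, ev->data.mouse.y);
    }
    return -1;
}
```

The caller fills in kind and the matching member of data, then passes the same struct type regardless of which kind of event it is.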
Why didn't I use your example?
Because we would probably want to keep both the idade and the peso of a person at the same time.