How to assemble a list of generic objects in C?


In many high-level languages it is possible to have a structure or a collection of data of varying types, often using type Object for this.

How can I do the same in C? That is, I do not know in advance which types will go into the structure or array; it can be any valid type.

    
asked by anonymous 30.01.2017 / 11:47



The most likely approach is to use void *. Any object is then stored in the structure or collection as a reference, that is, as a pointer. This even makes the element type uniform: it always has the size of a pointer.

The downside is that all data will be accessed through pointers. This can be bad for types that are normally used by value (int, char, double, etc.), since there will be an indirection to access the data (read the pointer, then go to where the value is). Besides the extra space for the pointer, you will very likely have to allocate the value on the heap. In most cases it is a mistake to simply take a pointer to a value on the stack. That only works if the pointer does not escape the current function (functions called from there can use it, but the caller cannot; if this is confusing, take a look at How does the stack work?).

There is another possibility that can optimize this, although it may add some complexity to the code: use a union of the scalar types, probably with a void * for the others. The union always occupies the space of its largest member; value (scalar) types are stored directly, and only reference types have the indirection, which they would have anyway.

In fact, to be more flexible, you will probably need some extra space to indicate which type is stored there. This is called a tagged union.

Note that what you are doing is turning C into a dynamically typed language. In fact, this is how dynamically typed languages usually work internally.

In the two runs linked below, on different machines, note that the size of the structure is different (mostly because the pointer member does not have the same size on both platforms).

#include <stdio.h>

typedef struct {
    enum { is_int, is_float, is_char, is_pointer } type;
    union {
        int i;
        float f;
        char c;
        void *p;
    } value;
} Tipo;

int main(void) {
    int x = 10;
    float y = 5.5f;
    char c = 'h';
    char a[] = "teste";
    Tipo var1 = { .type = is_int, .value.i = x };
    Tipo var2 = { .type = is_float, .value.f = y };
    Tipo var3 = { .type = is_char, .value.c = c };
    Tipo var4 = { .type = is_pointer, .value.p = a };
    printf("%d\n", var1.value.i);
    printf("%f\n", var2.value.f);
    printf("%c\n", var3.value.c);
    printf("%s\n", (char *)var4.value.p);
    printf("%d\n", (int)var2.type);
    printf("%zu\n", sizeof(var2));    /* sizeof yields size_t: use %zu */
    printf("%d\n", (int)var3.type);
    printf("%zu\n", sizeof(var3));
    return 0;
}

See it running on ideone and on Coding Ground. I also put it on GitHub for future reference.

    
30.01.2017 / 11:47

Well, C is a statically typed language with no native support for objects. On the other hand, it gives you control over almost everything you want to do, and it provides a "generic" data type, which is simply the void pointer (void *). When you declare a variable of type void *, all the compiler "knows" is that it holds a memory address, and your program is solely responsible for interpreting the data in that memory region.

So you cannot dynamically build a "generic" object out of structs predefined in C: you cannot pass a struct type (as opposed to a struct value) as a parameter to a function. This means you will have to design your object types so that they have some fixed fields at the beginning of the data structure that describe the layout of the data that follows (including its size).

For example, you could define, in plain words, that for your objects the first two bytes are a 16-bit integer giving the length of a definition string, where each byte is an ASCII character describing one field, such as "B: unsigned char", "b: signed char", "I: 32-bit unsigned integer", "L: 64-bit unsigned integer", "Z: string prefixed with a 16-bit size". Then you write functions that handle data in this format as described. Note that this works whether you define this object header as a struct or simply use pointer arithmetic inside your functions to allocate the required memory and manipulate your objects' attributes dynamically.

For the type of object I described above, we could have this function to create new objects, allocating the required memory at runtime:

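#include <stdlib.h>
#include <string.h>

/* Layout: [2-byte field count][one type char per field][field data] */
void *create_object(const char *definition) {
   unsigned short def_len = 0;    /* assumed to be 16 bits wide */
   size_t size = 2;               /* room for the count header */
   unsigned char *new_obj = NULL;

   for (int i = 0; definition[i]; i++) {
      def_len++;
      size += 1;                  /* one byte for the type character */
      switch (definition[i]) {
         case 'B': size += 1; break;
         case 'I': size += 4; break;
         /* ... other field types */
      }
   }
   new_obj = calloc(1, size);     /* fields start zeroed */
   if (!new_obj) {return NULL;}
   memcpy(new_obj, &def_len, sizeof def_len);   /* header: field count */
   memcpy(new_obj + 2, definition, def_len);    /* type characters after it */
   return new_obj;
}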

(To access a field by its numeric index, a function that manipulates the fields inside this reserved memory has to walk through the characters of the definition string to find each field's position):

#include <string.h>  /* memcpy */

int get_field_offset(void *obj, int field_num, char *type) {
    unsigned char *bytes = obj;
    unsigned short def_len;
    memcpy(&def_len, bytes, sizeof def_len);     /* header: number of fields */
    if (field_num < 0 || field_num >= def_len) {return 0;}  /* no such field */
    int field_offset = 2 + def_len;              /* data starts after the definition */
    for (int i = 0; i < field_num; i++) {
         switch (bytes[i + 2]) {                 /* skip each preceding field */
             case 'B': field_offset += 1; break;
             case 'I': field_offset += 4; break;
             /* ... other field types */
         }
    }
    type[0] = (char)bytes[field_num + 2];        /* type of the requested field */
    return field_offset;
}

void set_field_value(void *obj, int field_num, const void *value) {
    char type;
    int offset = get_field_offset(obj, field_num, &type);
    if (!offset) return;  // field does not exist
    unsigned char *field = (unsigned char *)obj + offset;
    switch (type) {                              /* copy as many bytes as the field holds */
        case 'B': memcpy(field, value, 1); break;
        case 'I': memcpy(field, value, 4); break;
    }
}

void * get_field_value(void *obj, int field_num, char *type) {
    int offset = get_field_offset(obj, field_num, type);
    if (!offset) return NULL;
    // Return the address of the exact field, and its type indication on "type"
    return (unsigned char *)obj + offset;
}
Note that this lets you manipulate data structures whose layout is decided at runtime, without even using C's struct keyword. The object definition can even come from input data, whether typed by the user or read from a text file.

float lat = 23.0f, lon = 45.23f;
void *coordenadas = create_object("ff");
set_field_value(coordenadas, 0, &lat);   /* pass the address of the value */
set_field_value(coordenadas, 1, &lon);
...

(For that, just add an f case for float, or even double, to the switch statements above), and you can store latitude and longitude in these objects.

This is a very crude approach, and accommodating variable-length data types in it would be hard work. But you can refine it as much as you want, for example by adding a field that counts how many references there are to the object (whenever a piece of code no longer needs the object, it decrements the reference counter; if the counter reaches zero, the object can be deallocated immediately, releasing its memory). Another interesting refinement is to include a string table that would, for example, let you give textual names to the fields. Of course, the C code gets proportionately more complex.

Various object systems and generic data protocols are written in C (or C++), and all of them rely more or less on these principles (fixed fields at the beginning of the data that determine the layout of the whole object): the GObject framework, for example, Google's protobuf, Cap'n Proto, and the Python language itself, whose objects all have an in-memory representation, built along these lines, that can be used from C. (In general, these leading fields that define an object's layout are not visible when you access the object from Python code, but they are there when you access it from C.) The definition of Python objects has to be included in any C extension that manipulates Python objects, for example, and to handle generic objects it uses the types (typedefs) defined in the file object.h; see that file, near line 112.

    
31.01.2017 / 05:09