Monday, March 7, 2011

What is the purpose and return type of the __builtin_offsetof operator?

What is the purpose of the __builtin_offsetof operator (or _FOFF operator in Symbian) in C++?

In addition what does it return? Pointer? Number of bytes?

From stackoverflow
  • It's a builtin provided by the GCC compiler to implement the offsetof macro that is specified by the C and C++ standards:

    GCC - offsetof

    It returns the offset, in bytes, of a member within a POD struct/union.

    Sample:

    struct abc1 { int a, b, c; };
    union abc2 { int a, b, c; };
    struct abc3 { abc3() { } int a, b, c; }; // non-POD
    union abc4 { abc4() { } int a, b, c; };  // non-POD
    
    assert(offsetof(abc1, a) == 0); // always, because there's no padding before a.
    assert(offsetof(abc1, b) == 4); // here, on my system
    assert(offsetof(abc2, a) == offsetof(abc2, b)); // (members overlap)
    assert(offsetof(abc3, c) == 8); // undefined behavior. GCC outputs warnings
    assert(offsetof(abc4, a) == 0); // undefined behavior. GCC outputs warnings
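
A note on the POD restriction above: since C++11 the standard's requirement is phrased in terms of standard-layout classes rather than POD, so a type shaped like abc3 qualifies under the newer rules. The following is only a sketch of how one might guard an offsetof use at compile time; the type names and the helper offsetof_is_portable are made up for illustration and are not part of the original answer.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

struct Plain    { int a, b, c; };                      // shaped like abc1
struct WithCtor { WithCtor() {} int a, b, c; };        // shaped like abc3
struct Virt     { virtual ~Virt() {} int a; };         // hypothetical: has a vtable

// A user-provided constructor makes WithCtor non-trivial (hence non-POD),
// but it is still standard-layout. A virtual function breaks standard layout.
static_assert(std::is_standard_layout<Plain>::value, "Plain is standard-layout");
static_assert(std::is_standard_layout<WithCtor>::value, "WithCtor is standard-layout");
static_assert(!std::is_trivial<WithCtor>::value, "WithCtor is not trivial");
static_assert(!std::is_standard_layout<Virt>::value, "Virt is not standard-layout");

// Hypothetical helper: true when offsetof on T is covered by the C++11 rule.
template <typename T>
constexpr bool offsetof_is_portable() {
    return std::is_standard_layout<T>::value;
}

// Refuses to compile if WithCtor ever stops being standard-layout.
inline std::size_t offset_of_c() {
    static_assert(offsetof_is_portable<WithCtor>(), "offsetof needs standard layout");
    return offsetof(WithCtor, c);
}
```

With a guard like this, adding a virtual function to the type later turns a silent problem into a compile error.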
    

    @Jonathan provides a nice example of where you can use it. I remember having seen it used to implement intrusive lists (lists whose data items themselves include next and prev pointers), but I can't remember where it was helpful in implementing them, sadly.

    Steve Jessop : I'd guess where it was useful was that the intruded nodes contain pointers to the node in the "next" object. When using the list, you need to get from the node to the base of the object, so you subtract offsetof(something) bytes from the pointer value and reinterpret_cast.
    Steve Jessop : All very non-portable in C++, of course, but does the job in C.
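
The pointer adjustment Steve Jessop describes could look roughly like the sketch below. ListNode, Widget, and widget_from_link are invented names for illustration; the cast-and-subtract step is the part he calls non-portable in C++, so treat this as an illustration of the idea rather than a recommended pattern.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical intrusive link embedded in every list element.
struct ListNode {
    ListNode* next;
    ListNode* prev;
};

// Hypothetical element type. It is standard-layout, so offsetof applies.
struct Widget {
    int      id;
    ListNode link;   // the list points at this member, not at the Widget
};

// Walk back from the embedded link to the start of the enclosing Widget
// by subtracting the member's byte offset from the link's address.
inline Widget* widget_from_link(ListNode* node) {
    char* raw = reinterpret_cast<char*>(node) - offsetof(Widget, link);
    return reinterpret_cast<Widget*>(raw);
}
```

List traversal code only ever sees ListNode pointers; a helper like this is what turns one of them back into the object that owns it.
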
  • As @litb said: the offset in bytes of a struct/class member. In C++ there are cases where it is undefined, in which case the compiler will complain. IIRC, one way to implement it (in C, at least) is to do

    #define offsetof(type, member) (int)(&((type *)0)->member)

    But I'm sure there are problems with this, which I'll leave to the interested reader to point out...

    MSalters : Undefined behavior, even in C. Multiple reasons, even: redefining a std macro and deref of NULL. Common in stdlib, though, since that's bound by different rules.
    Robert S. Barnes : @MSalters - JesperE is correct. See the definition in the stddef.h in the Linux Kernel source code: http://lxr.linux.no/#linux+v2.6.31/include/linux/stddef.h#L24
  • As @litb points out and @JesperE shows, offsetof() provides an integer offset in bytes (as a size_t value).
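
To make the return type concrete, here is a small sketch that checks it at compile time. Record and value_offset are hypothetical names; the only claim being illustrated is that offsetof yields a std::size_t byte count, not a pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical struct; the compiler will usually pad between tag and value.
struct Record {
    char   tag;
    double value;
};

// offsetof is an integral constant expression of type std::size_t.
static_assert(std::is_same<decltype(offsetof(Record, value)), std::size_t>::value,
              "offsetof yields std::size_t");
static_assert(offsetof(Record, tag) == 0, "the first member sits at offset 0");

// The exact number depends on the platform's padding rules, so it is
// returned here rather than hard-coded.
inline std::size_t value_offset() {
    return offsetof(Record, value);
}
```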

    When might you use it?

    One case where it might be relevant is a table-driven operation for reading an enormous number of diverse configuration parameters from a file and stuffing the values into an equally enormous data structure. Reducing enormous down to SO trivial (and ignoring a wide variety of necessary real-world practices, such as defining structure types in headers), I mean that some parameters could be integers and others strings, and the code might look faintly like:

    #include <stddef.h>
    #include <stdlib.h>   /* atoi */
    #include <string.h>   /* strdup (POSIX) */
    
    typedef struct config_info config_info;
    struct config_info
    {
       int parameter1;
       int parameter2;
       int parameter3;
       char *string1;
       char *string2;
       char *string3;
       int parameter4;
    } main_configuration;
    
    typedef struct config_desc config_desc;
    static const struct config_desc
    {
       char *name;
       enum paramtype { PT_INT, PT_STR } type;
       size_t offset;
       int   min_val;
       int   max_val;
       int   max_len;
    } desc_configuration[] =
    {
        { "GIZMOTRON_RATING", PT_INT, offsetof(config_info, parameter1), 0, 100, 0 },
        { "NECROSIS_FACTOR",  PT_INT, offsetof(config_info, parameter2), -20, +20, 0 },
        { "GILLYWEED_LEAVES", PT_INT, offsetof(config_info, parameter3), 1, 3, 0 },
        { "INFLATION_FACTOR", PT_INT, offsetof(config_info, parameter4), 1000, 10000, 0 },
        { "EXTRA_CONFIG",     PT_STR, offsetof(config_info, string1), 0, 0, 64 },
        { "USER_NAME",        PT_STR, offsetof(config_info, string2), 0, 0, 16 },
        { "GIZMOTRON_LABEL",  PT_STR, offsetof(config_info, string3), 0, 0, 32 },
    };
    

    You can now write a general function that reads lines from the config file, discarding comments and blank lines. It then isolates the parameter name, and looks that up in the desc_configuration table (which you might sort so that you can do a binary search - multiple SO questions address that). When it finds the correct config_desc record, it can pass the value it found and the config_desc entry to one of two routines - one for processing strings, the other for processing integers.

    The key part of those functions is:

    static int validate_set_int_config(const config_desc *desc, char *value)
    {
        int *data = (int *)((char *)&main_configuration + desc->offset);
        ...
        *data = atoi(value);
        ...
    }
    
    static int validate_set_str_config(const config_desc *desc, char *value)
    {
        char **data = (char **)((char *)&main_configuration + desc->offset);
        ...
        *data = strdup(value);
        ...
    }
    

    This avoids having to write a separate function for each separate member of the structure.
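
For readers who want to see the whole idea end to end, here is a cut-down sketch of the same table-driven approach. It is not the answer's actual code: Settings, ParamDesc, kParams, and set_param are invented for illustration, strings go into a fixed buffer instead of being strdup'd, and a linear search stands in for the binary search mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical, much smaller stand-in for config_info.
struct Settings {
    int  rating;
    char label[32];
    int  factor;
};

enum ParamType { PT_INT, PT_STR };

struct ParamDesc {
    const char* name;
    ParamType   type;
    std::size_t offset;   // byte offset of the member inside Settings
    int         min_val;  // PT_INT only
    int         max_val;  // PT_INT only
    std::size_t max_len;  // PT_STR only: buffer size including the terminator
};

static const ParamDesc kParams[] = {
    { "RATING", PT_INT, offsetof(Settings, rating), 0,   100, 0 },
    { "LABEL",  PT_STR, offsetof(Settings, label),  0,   0,   sizeof(Settings::label) },
    { "FACTOR", PT_INT, offsetof(Settings, factor), -20, 20,  0 },
};

// Looks up `name`, validates `value`, and stores it in the matching member.
// Returns false for unknown names or values that fail validation.
inline bool set_param(Settings& cfg, const char* name, const char* value) {
    for (const ParamDesc& d : kParams) {
        if (std::strcmp(d.name, name) != 0) continue;
        // One generic address computation replaces a setter per member.
        char* field = reinterpret_cast<char*>(&cfg) + d.offset;
        if (d.type == PT_INT) {
            char* end = nullptr;
            long v = std::strtol(value, &end, 10);
            if (end == value || *end != '\0') return false;   // not a number
            if (v < d.min_val || v > d.max_val) return false; // out of range
            *reinterpret_cast<int*>(field) = static_cast<int>(v);
            return true;
        }
        std::size_t len = std::strlen(value);
        if (len >= d.max_len) return false;                   // would not fit
        std::memcpy(field, value, len + 1);
        return true;
    }
    return false;
}
```

Adding a new parameter then means adding one member and one table row; the parsing and validation code stays untouched.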

    Robert S. Barnes : If you wanted to get really evil you could use a hash table containing the parameter names and indexes into `desc_configuration`. Really amazing example by the way.
    Jonathan Leffler : @Robert: this example is closely based on reading data from a configuration file into a big data structure, and reversing the process. I won't bother to explain how it is currently done: suffice to say, there are 300 parameters, about 4500 lines of code in the function that handles it all, and a lot of repetition. I am not in charge of the code - sadly.
    Jonathan Leffler : See also: http://stackoverflow.com/questions/1445762/need-way-to-alter-common-fields-in-different-structs
  • The purpose of a built-in __offsetof operator is that the compiler vendor can continue to #define an offsetof() macro, yet have it work with classes that define unary operator&. The typical C macro definition of offsetof() only works when (&lvalue) returns the address of that lvalue. For example:

    #define offsetof(type, member) (int)(&((type *)0)->member) // C definition, not C++
    struct CFoo {
        struct Evil {
            int operator&() { return 42; }
        };
        Evil foo;
    };
    ptrdiff_t t = offsetof(CFoo, foo); // Would call Evil::operator& and return 42
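
The effect can be checked with a sketch along these lines, assuming a compiler whose offsetof is backed by a builtin (as GCC's is). The payload and first members and the two helper functions are additions for illustration; std::addressof is used in the hand-rolled version because it bypasses an overloaded operator&.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

struct Evil {
    int operator&() const { return 42; }  // overloaded address-of, as above
    int payload;                          // hypothetical data member
};

struct CFoo {
    int  first;                           // hypothetical, so foo is not at offset 0
    Evil foo;
};

// The compiler-backed macro computes the offset without calling operator&.
inline std::size_t foo_offset_builtin() {
    return offsetof(CFoo, foo);
}

// Hand-rolled equivalent on a real object: std::addressof yields the true
// address even though Evil overloads unary &.
inline std::size_t foo_offset_manual(const CFoo& obj) {
    const char* base   = reinterpret_cast<const char*>(std::addressof(obj));
    const char* member = reinterpret_cast<const char*>(std::addressof(obj.foo));
    return static_cast<std::size_t>(member - base);
}
```

Writing &obj.foo directly would still call Evil::operator& and produce 42, which is exactly the trap the naive macro falls into.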
    
