Encoding tables in constants 2021-04-06

A common pattern you see in programs are predicates such as is_action_allowed(), or should_perform_action(), taking a constant value, often from an enumerated type, and returning a boolean value. A straightforward way to implement such predicates is to write a function like so:

enum http_methods {
    HTTP_METHOD_POST,
    HTTP_METHOD_PUT,
    HTTP_METHOD_DELETE,
    HTTP_METHOD_GET,
};

bool is_method_allowed(enum http_methods method)
{
    if (action == HTTP_METHOD_POST)
        return true;
    if (action == HTTP_METHOD_PUT)
        return true;
    return false;
}

While this works just fine and is indeed quite efficient – even if you consider something like its big-O complexity, N is so small here that it really doesn't make a difference, especially considering how cheap an integer comparison is – it's still a function that has to be called every time (as it's an implementation detail and would probably not be inlined in a header somewhere), or patched every time a new method is added to the enum http_methods enum.

Another way that works just as well is defining a table – where index is the enum and the value is a boolean. A bitmap could be even used as a more efficient way to encode this, but declaring such table in a language like C would probably either require repetitive and error-prone code, or hard to read macros that are also tricky to get right.

enum http_methods {
    HTTP_METHOD_POST,
    HTTP_METHOD_PUT,
    HTTP_METHOD_DELETE,
    HTTP_METHOD_GET,
    _HTTP_METHODS_MAX,
};

bool is_method_allowed(enum http_methods method)
{
    static const bool allowed_methods[_HTTP_METHODS_MAX] = {
            [HTTP_METHOD_POST] = true,
            [HTTP_METHOD_PUT] = true,
    };
    assert(method != _HTTP_METHODS_MAX);
    return allowed_methods[method];
}

Tables are amazing to implement predicate functions like that: sometimes code that's inherently branchy can be reduced to a simple array lookup, which is cheap and readable. Not many optimization techniques offer both at the same time.

What I don't see explored that much, though, is the fact that we can encode these tables in the constant themselves. If we're dilligent about the values we use in an enumeration and, instead of using the default values assigned by the compiler, we can use individual bits to signify values that can be used for these predicate functions.

For instance, if we consider that the lower 16-bit are enough for all of our enumerated values, and 16-bit is enough for all of our flags, we could do something like so instead:

enum http_methods {
    _HTTP_METHOD_FLAGS_MASK = 0xffff0000,
    _HTTP_METHOD_ALLOWED = 1<<16,

    HTTP_METHOD_POST = _HTTP_METHOD_ALLOWED | 0,
    HTTP_METHOD_PUT = _HTTP_METHOD_ALLOWED | 1,
    HTTP_METHOD_DELETE = 2,
    HTTP_METHOD_GET = 3,
};

static inline bool is_method_allowed(enum http_methods method)
{
    return (method & _HTTP_METHOD_FLAGS_MASK) & _HTTP_METHOD_ALLOWED;
}

This isn't that much harder to read, but we transform a memory read into a mask and make it a short static inline function. This opens up a few optimization possibilities, especially if two or more predicates need to be used in succession to determine if something should be performed at that point or not.

It's very unlikely that something like this will ever be a bottleneck of a program, but if you're trying to squeeze the most out of something, this is one of those low-hanging fruits that you just gotta give a try. Especially when you're on an embedded system where size is at a premium.

A drawback for this technique is that, if this enumeration is not just an implementation detail and ends up being part of a public API and thus the ABI, making changes can be tricky if the table needs to be extended beyond the alotted 16-bit value, so plan with care.

If you liked this post, consider getting me a coffee!