Optimizing a JSON serializer (part 1)

A few years back, I wrote a tiny JSON library for the Zephyr OS; it focused mostly on things that are important for an embedded real-time operating system: code size, type safety, and predictability (both for memory and CPU time usage). It was never tuned to ace performance benchmarks: the other constraints were far more important drivers for its design decisions.

The Lwan entry in the TechEmpower Web Framework Benchmarks (TWFB) used the JSON library from CCAN, which was partly responsible for Lwan “taking the crown” for that round, according to their blog post. It’s certainly not a bad library, but considering I can reason about every line of code in the JSON library I wrote for Zephyr, I decided to use mine instead, moving forward.

I could just treat the JSON library as a black box and not worry about its guts, but I wanted to optimize it — just for fun — and being able to understand every design decision in the new library contributed a lot to the decision to switch libraries.

In the end, though, there’s not much point in spending time optimizing a JSON encoder that’s good enough already; in a web service, most of the time is spent in other areas (e.g. the network transport), and it’s very unlikely that changes in something like this will actually influence benchmarks. Lwan is, however, more of a testbed of ideas, where I’m free to try out different techniques or play with things I rarely have the opportunity to play with in the “real world”, so this seemed like a good candidate to start.

For those unfamiliar with the JSON benchmark in TWFB, it essentially requires that, for a particular endpoint, an object be created and serialized on the fly (without caching). The object must have the content {"message": "Hello, World!"}, so unless the JSON library is pretty bad to begin with, there isn’t much that can be done to speed this up.

The CCAN library requires a JSON object tree to be built before it can serialize, only for that tree to be torn down immediately afterwards; that’s wasteful for this purpose. It also serializes the whole JSON buffer into a chunk of memory allocated by the encoder itself, potentially requiring copies to send the response.

The new JSON library, however, takes a different approach.

Instead of requiring an object tree to be built, the serializer traverses a descriptor array while obtaining values from a user-supplied struct (the descriptor contains the type and offsets into the user-supplied pointer where the values can be obtained); it performs a kind of “manual reflection”, if such a thing makes sense.

The struct can be allocated on the stack, avoiding expensive round trips to malloc() (and the occasional heap fragmentation). My previous blog post about the JSON library has more details on how this works, from the parsing side.

/* Declare the struct and its descriptor */
struct hello_world_json {
    const char *message;
};
static const struct json_obj_descr hello_world_json_desc[] = {
    JSON_OBJ_DESCR_PRIM(struct hello_world_json, message, JSON_TOK_STRING),
};

LWAN_HANDLER(json)
{
    struct hello_world_json j = {.message = "Hello, World!"};
    int ret;

    response->mime_type = "application/json";

    ret = json_obj_encode_full(hello_world_json_desc, 1,
                               &j,
                               append_to_strbuf, response->buffer,
                               /* escape_keys */ false);

    return ret ? HTTP_INTERNAL_ERROR : HTTP_OK;
}

In addition, the encoder takes a function pointer that’s called to append bytes to a buffer; this interface, inspired by Go’s io.Writer interface, completely decouples the JSON encoder from the buffer-manipulation facilities.

static int append_to_strbuf(const char *bytes, size_t len, void *data)
{
    struct lwan_strbuf *strbuf = data;

    return !lwan_strbuf_append_str(strbuf, bytes, len);
}

For the benchmark, since each response in Lwan has a struct lwan_strbuf, that’s what the supplied callback ends up using.

This function pointer does not necessarily need to be used to write bytes to a buffer; for instance, here’s how one can figure out how much space they’ll need to encode a particular struct in JSON:

static int
measure_bytes(const char *bytes __attribute__((unused)), size_t len, void *data)
{
    ssize_t *total = data;

    *total += (ssize_t)len;

    return 0;
}

ssize_t json_calc_encoded_len(const struct json_obj_descr *descr,
                              size_t descr_len,
                              const void *val)
{
    ssize_t total = 0;
    int ret;

    ret = json_obj_encode(descr, descr_len, val, measure_bytes, &total);
    if (UNLIKELY(ret < 0)) {
        return ret;
    }

    return total;
}

Miscellaneous little improvements

Alignment: bitfields ➡ full-width values. Zephyr is built for embedded devices, where memory is severely constrained, so it’s usually fine to pay the price of packing multiple integer values into a single uint32_t. Lwan doesn’t have that limitation, so the bitfields were removed from its version of the library.
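
As an illustration of the trade-off (these struct layouts are hypothetical, not the actual Zephyr descriptor fields):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical packed layout in the Zephyr style: three small values share
 * a single uint32_t, so every access pays for shifting and masking. */
struct descr_packed {
    uint32_t type : 4;
    uint32_t offset : 14;
    uint32_t field_name_len : 14;
};

/* The same values at full width, in the Lwan style: three times the size
 * on a typical ABI, but loads and stores are plain moves. */
struct descr_full {
    uint32_t type;
    uint32_t offset;
    uint32_t field_name_len;
};
```

On a typical ABI the packed variant occupies 4 bytes against 12 for the full-width one; with only a handful of descriptors per program, that’s a fair price for an embedded target, and an unnecessary one for Lwan.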

Getting rid of branches while encoding commas between elements: JSON uses commas as the item delimiter in its collection types and, as such, doesn’t allow trailing commas. The encoder originally checked whether every descriptor entry was the last item; it was modified to never perform that check, but rather:

  • For objects, it starts looping from the second descriptor element, always appending a comma inside the loop; when that’s done, the first descriptor is serialized.
  • For arrays, the loop runs (N - 1) times, always appending the comma, and finally adds the last element.
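
The array case can be sketched like this (using plain C strings rather than the library’s descriptors, purely for illustration); note that no iteration ever asks “am I the last one?”:

```c
#include <assert.h>
#include <string.h>

/* Append the NUL-terminated string s at out + pos; returns the new length. */
static size_t append(char *out, size_t pos, const char *s)
{
    size_t len = strlen(s);

    memcpy(out + pos, s, len);
    return pos + len;
}

/* Serialize n pre-encoded items as a JSON array. The loop handles the
 * first n - 1 items, unconditionally appending a comma after each; the
 * last item is appended outside the loop, so no last-item check is
 * needed anywhere. */
static size_t encode_array(const char *const *items, size_t n, char *out)
{
    size_t pos = append(out, 0, "[");

    if (n) {
        for (size_t i = 0; i < n - 1; i++) {
            pos = append(out, pos, items[i]);
            pos = append(out, pos, ",");
        }
        pos = append(out, pos, items[n - 1]);
    }

    pos = append(out, pos, "]");
    out[pos] = '\0';

    return pos;
}
```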

Using int_to_string() to serialize integers: the library originally used snprintf() to serialize integers. This is inefficient, so another method to convert integers to strings was used instead.
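
Lwan’s actual int_to_string() is more elaborate, but the core idea can be sketched as: emit digits least-significant first into a scratch buffer, then copy them out in order, with no format string to parse:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Convert value to decimal in buf (at least 12 bytes for a 32-bit int,
 * counting sign and NUL); returns the number of characters written.
 * Digits are produced least-significant first into a scratch buffer and
 * then reversed into buf: unlike snprintf("%d", ...), no format-string
 * parsing happens at run time. */
static size_t int_to_string(int value, char *buf)
{
    char scratch[12];
    size_t n = 0, out = 0;
    /* Negate as unsigned so INT_MIN doesn't overflow. */
    unsigned int mag =
        value < 0 ? 0u - (unsigned int)value : (unsigned int)value;

    do {
        scratch[n++] = (char)('0' + mag % 10);
        mag /= 10;
    } while (mag);

    if (value < 0)
        buf[out++] = '-';
    while (n)
        buf[out++] = scratch[--n];
    buf[out] = '\0';

    return out;
}
```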

Keys don’t need to be escaped most of the time. When that’s the case, branches can be removed from the fast path and append_bytes() can be called with the key name directly, saving a lot of indirect function calls. For keys that do need to be escaped, or for string values, the library previously called append_bytes() once for each byte of the string being escaped; that’s now batched, and the output is only split into multiple calls when there’s a character that needs to be escaped.
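
A sketch of the batching, with a deliberately simplified notion of which characters need escaping (the real library handles control characters and more; the sink here is just for demonstration): scan for the longest clean run, flush it with one append_bytes() call, then emit the escaped character.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*append_bytes_t)(const char *bytes, size_t len, void *data);

/* Simplified for illustration: only quotes and backslashes are escaped. */
static int needs_escaping(char c)
{
    return c == '"' || c == '\\';
}

static int escape_string(const char *str, append_bytes_t append_bytes,
                         void *data)
{
    int ret = 0;

    while (*str) {
        /* Measure the run of characters that can be emitted verbatim... */
        size_t clean = 0;

        while (str[clean] && !needs_escaping(str[clean]))
            clean++;

        /* ...and flush it with a single call, not one call per byte. */
        if (clean)
            ret |= append_bytes(str, clean, data);
        str += clean;

        if (*str) {
            const char escaped[2] = {'\\', *str};

            ret |= append_bytes(escaped, 2, data);
            str++;
        }
    }

    return ret;
}

/* Tiny sink used only to demonstrate the function. */
struct sink {
    char buf[128];
    size_t len;
};

static int append_to_sink(const char *bytes, size_t len, void *data)
{
    struct sink *s = data;

    memcpy(s->buf + s->len, bytes, len);
    s->len += len;
    return 0;
}
```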

Deferred error checking: append_bytes() isn’t supposed to fail most of the time, so it’s fine not to return early and to keep trying to serialize, as long as the original caller knows that an encoding error happened. So, instead of many err = append_bytes(...); if (err < 0) { return err; } lines, the encoder essentially does err |= append_bytes(...) and returns err at the end. This code is not equivalent, especially since append_bytes() can return any negative error code, but it does remove a lot of branches from the fast path. If you know your error codes (e.g. they’re all powers of two), this can actually be a good solution, and better than relying on error codes from <errno.h>.

Contrast the new code, where all the error-checking branches are gone and the code is much shorter:

int json_obj_encode_full(const struct json_obj_descr *descr,
                         size_t descr_len,
                         const void *val,
                         json_append_bytes_t append_bytes,
                         void *data,
                         bool escape_key)
{
    int ret = append_bytes("{", 1, data);

    if (LIKELY(descr_len)) {
        for (size_t i = 1; i < descr_len; i++) {
            ret |= encode_key_value(&descr[i], val, append_bytes, data,
                                    escape_key);
            ret |= append_bytes(",", 1, data);
        }

        ret |= encode_key_value(&descr[0], val, append_bytes, data, escape_key);
    }

    return ret | append_bytes("}", 1, data);
}

With the previous code:

int json_obj_encode_full(const struct json_obj_descr *descr,
                         size_t descr_len,
                         const void *val,
                         json_append_bytes_t append_bytes,
                         void *data,
                         bool escape_key)
{
    int ret;

    ret = append_bytes("{", 1, data);
    if (UNLIKELY(ret < 0)) {
        return ret;
    }

    if (LIKELY(descr_len)) {
        for (size_t i = 1; i < descr_len; i++) {
            ret = encode_key_value(&descr[i], val, append_bytes, data,
                                   escape_key);
            if (UNLIKELY(ret < 0)) {
                return ret;
            }

            ret = append_bytes(",", 1, data);
            if (UNLIKELY(ret < 0)) {
                return ret;
            }
        }

        ret = encode_key_value(&descr[0], val, append_bytes, data, escape_key);
        if (UNLIKELY(ret < 0)) {
            return ret;
        }
    }

    return append_bytes("}", 1, data);
}

Pre-encode keys with quotes and colon: for every key/value pair in an object, the encoder had to call append_bytes() at least five times: one for each quote around the key, one for the key itself, one for the colon, and at least one for the value.

static int
str_encode(const char **str, json_append_bytes_t append_bytes, void *data, bool escape)
{
    int ret;

    ret = append_bytes("\"", 1, data);
    if (UNLIKELY(ret < 0)) {
        return ret;
    }

    if (escape) {
        ret = json_escape_internal(*str, append_bytes, data);
    } else {
        ret = append_bytes(*str, strlen(*str), data);
    }
    if (!ret) {
        return append_bytes("\"", 1, data);
    }

    return ret;
}

static int encode_key_value(const struct json_obj_descr *descr,
                            const void *val,
                            json_append_bytes_t append_bytes,
                            void *data,
                            bool escape_key)
{
    int ret;

    ret = str_encode((const char **)&descr->field_name, append_bytes, data,
                     escape_key);
    if (UNLIKELY(ret < 0)) {
        return ret;
    }

    ret = append_bytes(":", 1, data);
    if (UNLIKELY(ret < 0)) {
        return ret;
    }

    return encode(descr, val, append_bytes, data, escape_key);
}

By pre-calculating the key+colon at compile time, each key/value pair makes at most 2 calls to append_bytes(): one for the key, and one for the value. This can easily be done in the macros that define a JSON object descriptor, with the pre-encoded key stored right after the unencoded key (because each descriptor also carries the key length with it).

static int encode_key_value(const struct json_obj_descr *descr,
                            const void *val,
                            json_append_bytes_t append_bytes,
                            void *data,
                            bool escape_key)
{
    int ret;

    if (!escape_key) {
        ret = append_bytes(descr->field_name + descr->field_name_len,
                           descr->field_name_len + 3 /* 3=len('"":') */, data);
    } else {
        ret = str_encode((const char **)&descr->field_name, append_bytes,
                         data, true);
        ret |= append_bytes(":", 1, data);
    }

    return ret | encode(descr, val, append_bytes, data, escape_key);
}
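
One way to express that pre-encoding (a sketch, not the library’s actual macros — the real struct json_obj_descr carries more fields) is plain string-literal concatenation, so the compiler lays out the bare key immediately followed by its quoted, colon-terminated form. A caveat of this layout is that the bare key is not NUL-terminated on its own, so the escaped path must use field_name_len rather than strlen():

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical two-field descriptor, for illustration only. */
struct descr {
    const char *field_name;
    size_t field_name_len;
};

/* The stringified field appears twice in one literal: the bare key,
 * immediately followed by the pre-encoded "key": form. The fast path can
 * then emit field_name + field_name_len with length field_name_len + 3
 * (two quotes plus the colon) in a single append_bytes() call. */
#define DESCR_ENTRY(field) \
    { .field_name = #field "\"" #field "\":", \
      .field_name_len = sizeof(#field) - 1 }
```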

Next time

I’ll probably continue working on this JSON library before the next round of TWFB takes place. I’m working on another improvement (which will take some time to finish) that, if it works out, will be described here on the blog. (And, of course, next time performance numbers will accompany the article.)