Implementing TLS in Lwan 2022-03-23

The road to HTTP/2 (and consequently, TLS)

One of the long-standing tasks in Lwan is implementing HTTP/2. I've been postponing it since before it was fully standardized, and followed many of the public discussions that turned SPDY into HTTP/2. I had a few false starts, but never really started the work. One of the first obstacles wasn't even HTTP/2 itself -- which requires substantially more commitment than HTTP/1.1 for a proof-of-concept implementation -- but the fact that most HTTP/2 clients require an encrypted connection, and Lwan never supported HTTPS.

I've mulled over this for quite a while, and even considered implementing HTTP/2 without encryption -- making it available only if the PROXY protocol setting was enabled. This would make it clear that you would need a TLS terminator in front of Lwan. While this would work just fine, it felt clunky and I decided against it.

I also considered using something like OpenSSL and changing all standard socket calls to the equivalent OpenSSL BIO function calls. However, Lwan has few abstraction layers as a design decision, and I wasn't going to add one. In addition, if encryption is performed in user mode, session keys have to be kept in memory accessible to the Lwan process for as long as the connection is open; should a bug be found in Lwan or in user-written request handlers, it would be possible to siphon out all the session keys, and I wasn't willing to pay that price.

In both FreeBSD and Linux, it's possible to transparently encrypt and decrypt sockets in the kernel -- optionally offloading the work to the NIC, which frees up some CPU cycles by running AES in a custom ASIC. This looked promising after watching some talks from the people who implemented this: if anything, it reduces some of the copies between kernel and user mode, which sounded like a good thing.

[Figure: three diagrams, labeled "TLS in user mode", "Software kTLS", and "Offloaded kTLS", showing where buffers are copied and encrypted across user mode, kernel, and NIC.]

Comparison of where and when buffers are encrypted with three different methods: no kTLS, software kTLS, and offloaded kTLS.

The TLS handshake has to be performed in user mode, however; the kernel is merely informed of the keys necessary to do the work to encrypt/decrypt stuff. The server is free to throw away the TLS session handle and doesn't need to keep the key in memory.

It took a while before this was actually stable enough and widely available, and working for both transmission and reception, but the mechanism, at least in Linux, seems pretty solid now. Some TLS libraries, like OpenSSL, will enable this behind the scenes if available, but it's an abstraction layer I'd like to avoid using.

After summarizing the pros:

And the cons:

I decided that the pros outweighed the cons, bit the bullet, and implemented TLS in Lwan using Linux kTLS. It was surprisingly easy. Here's how I did it.

How it works

It uses mbedTLS for the handshake. I was already familiar with mbedTLS, having implemented quite a bit of the Zephyr RTOS crypto subsystem. Once I found out that all the information I would need wasn't in opaque data structures, choosing this library was a no-brainer. (Some libraries, such as OpenSSL and WolfSSL, do not provide ways to obtain the necessary information without needing to perform some questionable hackery on opaque types.)

Note

Unfortunately, mbedTLS doesn't yet support TLSv1.3, so we're stuck with TLSv1.2, which is the minimum TLS version supported by kTLS on Linux. You shouldn't be using older TLS versions anyway, so this is fine.

To make things a bit better, Lwan only enables ciphersuites that are not deprecated/unavailable in TLSv1.3. It also only uses ephemeral key exchange, so that forward secrecy is possible. This should buy us some time until TLSv1.3 is implemented.

Library initialization

Without kernel support, kTLS would only fail late, when Lwan tries to set the keys with calls to setsockopt(2) after a handshake. So, during initialization, Lwan first checks whether the kernel offers the tls Upper Layer Protocol (ULP):

static bool is_tls_ulp_supported(void)
{
    FILE *available_ulp = fopen("/proc/sys/net/ipv4/tcp_available_ulp", "re");
    char buffer[512];
    bool available = false;

    if (!available_ulp)
        return false;

    if (fgets(buffer, 512, available_ulp)) {
        if (strstr(buffer, "tls"))
            available = true;
    }

    fclose(available_ulp);
    return available;
}
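The strstr() call above also matches any ULP whose name merely contains "tls". A stricter variant could compare whole whitespace-separated tokens instead. The sketch below assumes the file holds a space-separated list of names; ulp_list_contains() is a hypothetical helper, not something that exists in Lwan.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper (not part of Lwan): returns true if `name` appears
 * as a whole whitespace-separated token in `list`, which is expected to
 * hold the contents of /proc/sys/net/ipv4/tcp_available_ulp. */
static bool ulp_list_contains(const char *list, const char *name)
{
    assert(list && name);

    const size_t name_len = strlen(name);
    const char *p = list;

    while (*p) {
        /* Skip separators between tokens. */
        p += strspn(p, " \t\r\n");

        /* Measure the current token. */
        size_t token_len = strcspn(p, " \t\r\n");

        if (token_len && token_len == name_len && !strncmp(p, name, name_len))
            return true;

        p += token_len;
    }

    return false;
}
```

With this, the body of the fgets() branch would become a call such as ulp_list_contains(buffer, "tls").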

We then proceed to allocate space for a struct containing every context object we might need: the TLS configuration itself, the server certificate, the server private key, and the random number generator. All of this is pretty much standard API usage for mbedTLS, completely unrelated to kTLS, and shouldn't be surprising to anybody who has used this library before:

l->tls = calloc(1, sizeof(*l->tls));
if (!l->tls)
    lwan_status_critical("Could not allocate memory for SSL context");

lwan_status_debug("Initializing mbedTLS");

mbedtls_ssl_config_init(&l->tls->config);
mbedtls_x509_crt_init(&l->tls->server_cert);
mbedtls_pk_init(&l->tls->server_key);
mbedtls_entropy_init(&l->tls->entropy);
mbedtls_ctr_drbg_init(&l->tls->ctr_drbg);

/* Load the server certificate from the path specified in the config file. */
r = mbedtls_x509_crt_parse_file(&l->tls->server_cert, l->config.ssl.cert);
if (r) {
    lwan_status_mbedtls_error(r, "Could not parse certificate at %s",
                              l->config.ssl.cert);
    abort();
}

/* Load the server private key from the path specified in the config file. */
r = mbedtls_pk_parse_keyfile(&l->tls->server_key, l->config.ssl.key, NULL);
if (r) {
    lwan_status_mbedtls_error(r, "Could not parse key file at %s",
                              l->config.ssl.key);
    abort();
}

/* Even though this points to files that will probably be outside
 * the reach of the server (if straightjackets are used), wipe this
 * struct to get rid of the paths to these files. */
lwan_always_bzero(l->config.ssl.cert, strlen(l->config.ssl.cert));
free(l->config.ssl.cert);
lwan_always_bzero(l->config.ssl.key, strlen(l->config.ssl.key));
free(l->config.ssl.key);
lwan_always_bzero(&l->config.ssl, sizeof(l->config.ssl));

mbedtls_ssl_conf_ca_chain(&l->tls->config, l->tls->server_cert.next, NULL);
r = mbedtls_ssl_conf_own_cert(&l->tls->config, &l->tls->server_cert,
                              &l->tls->server_key);
if (r) {
    lwan_status_mbedtls_error(r, "Could not set cert/key");
    abort();
}

/* Initialize a CTR DRBG PRNG using the standard entropy function provided by
 * mbedTLS. */
r = mbedtls_ctr_drbg_seed(&l->tls->ctr_drbg, mbedtls_entropy_func,
                          &l->tls->entropy, NULL, 0);
if (r) {
    lwan_status_mbedtls_error(r, "Could not seed ctr_drbg");
    abort();
}

/* Use a standard configuration for a TLS server. */
r = mbedtls_ssl_config_defaults(&l->tls->config, MBEDTLS_SSL_IS_SERVER,
                                MBEDTLS_SSL_TRANSPORT_STREAM,
                                MBEDTLS_SSL_PRESET_DEFAULT);
if (r) {
    lwan_status_mbedtls_error(r, "Could not set mbedTLS default config");
    abort();
}

/* After our configuration has been initialized, tell it to use the PRNG
 * we just initialized above. */
mbedtls_ssl_conf_rng(&l->tls->config, mbedtls_ctr_drbg_random,
                     &l->tls->ctr_drbg);
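The code above wipes the certificate and key paths with lwan_always_bzero(), whose implementation isn't shown in this post. One common way to get a wipe that the compiler can't optimize away is to call memset() through a volatile function pointer. This is only a sketch of that technique; always_bzero() is a hypothetical stand-in and the real Lwan function may work differently (explicit_bzero(3) is another option where available).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for lwan_always_bzero(); the real implementation
 * may differ.  Calling memset() through a volatile function pointer keeps
 * the compiler from proving that the call has no observable effect, so it
 * is not removed as a dead store. */
static void always_bzero(void *ptr, size_t len)
{
    static void *(*volatile memset_ptr)(void *, int, size_t) = memset;

    assert(ptr || len == 0);

    if (len)
        memset_ptr(ptr, 0, len);
}
```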

We then set up all the supported ciphers at this point:

static const int aes128_ciphers[] = {
    /* Only allow Ephemeral Diffie-Hellman key exchange, so Perfect
     * Forward Secrecy is possible.  */
    MBEDTLS_TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,
    MBEDTLS_TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,
    MBEDTLS_TLS_DHE_RSA_WITH_AES_128_GCM_SHA256,
    MBEDTLS_TLS_DHE_PSK_WITH_AES_128_GCM_SHA256,

    /* FIXME: Other ciphers are supported by kTLS, notably AES256 and
     * ChaCha20-Poly1305.  Add those here and patch
     * lwan_setup_tls_keys() to match.  */

    /* FIXME: Maybe allow this to be user-tunable like other servers do?  */
    0,
};

mbedtls_ssl_conf_ciphersuites(&l->tls->config, aes128_ciphers);

To finalize, we also disable renegotiation, both because we don't handle anything like that later on, and because there have been security problems with it in the past. Even though we do this, however, openssl s_client tells me that renegotiation is supported by the server and I don't really know why.

mbedtls_ssl_conf_renegotiation(&l->tls->config,
                               MBEDTLS_SSL_RENEGOTIATION_DISABLED);
mbedtls_ssl_conf_legacy_renegotiation(&l->tls->config,
                                      MBEDTLS_SSL_LEGACY_NO_RENEGOTIATION);

And last, but not least, we allow the negotiation of the HTTP version using ALPN. Of course, we only support HTTP/1.1 for the moment, but this seemed like a good idea anyway:

static const char *alpn_protos[] = {"http/1.1", NULL};
mbedtls_ssl_conf_alpn_protocols(&l->tls->config, alpn_protos);
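Conceptually, ALPN negotiation comes down to intersecting the list the client offers with the list the server is configured with. The sketch below illustrates that idea, assuming the server's order of preference wins; alpn_select() is a hypothetical helper written for this post, not an mbedTLS function, and the library handles this internally during the handshake.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not an mbedTLS function.  Both arrays are
 * NULL-terminated, like the alpn_protos array passed to
 * mbedtls_ssl_conf_alpn_protocols().  Returns the first server protocol
 * that the client also offered, or NULL if there is no overlap. */
static const char *alpn_select(const char *const *server_protos,
                               const char *const *client_protos)
{
    assert(server_protos && client_protos);

    for (const char *const *s = server_protos; *s; s++) {
        for (const char *const *c = client_protos; *c; c++) {
            if (!strcmp(*s, *c))
                return *s;
        }
    }

    return NULL;
}
```

With only "http/1.1" configured, a client offering both "h2" and "http/1.1" ends up on HTTP/1.1, while a client that insists on "h2" alone gets no match.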

This is pretty much it to initialize the mbedTLS library to act as a TLSv1.2 server. There are some details that I omitted here, but this is otherwise a literal copy from the Lwan source code (in lwan-thread.c).

Accepting connections

The configuration file format had to be slightly rearranged to support two listeners: previously, all instances of handlers and modules would be under a listener section; with two listeners (one for HTTP and another for HTTPS), this would be impractical. So I changed it to have the instances of handlers and modules under a site section, and to have both listener and tls_listener sections. This allows us to do a few things:

Three new lwan_connection flags have been added: CONN_LISTENER_HTTP, CONN_LISTENER_HTTPS, and CONN_TLS. The first two are used to denote if a listener is either for HTTP, or for HTTPS; the third denotes if a client connection has been accepted through a listener with the CONN_LISTENER_HTTPS flag. (Previously, the lack of data in a struct epoll_event would signal the I/O thread main loop that it was time to accept connections; it now checks for either flag instead.)
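A minimal sketch of how these three flags relate to each other follows. The bit values and helper names are made up for illustration; the real lwan_connection flags enum has many more members and different values.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values only: the real enum in Lwan has many more members
 * and different bit positions. */
enum conn_flags {
    CONN_LISTENER_HTTP = 1 << 0,  /* listening socket for plain HTTP */
    CONN_LISTENER_HTTPS = 1 << 1, /* listening socket for HTTPS */
    CONN_TLS = 1 << 2,            /* client accepted via an HTTPS listener */
};

/* The I/O loop accepts connections when the event belongs to a listener
 * (hypothetical helper). */
static bool is_listener(enum conn_flags flags)
{
    return (flags & (CONN_LISTENER_HTTP | CONN_LISTENER_HTTPS)) != 0;
}

/* Flags for a freshly accepted client, derived from the flags of the
 * listener that accepted it (hypothetical helper). */
static enum conn_flags flags_for_accepted_client(enum conn_flags listener)
{
    assert(is_listener(listener));

    return (listener & CONN_LISTENER_HTTPS) ? CONN_TLS : (enum conn_flags)0;
}
```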

Everything else is the same, until the connection coroutine starts execution and checks if the CONN_TLS flag is set; if so, it then performs the TLS handshake. Any failure to perform the handshake at this point will abort the coroutine. (There isn't much that can be done at this point if the client is expecting a TLS connection.)

Hand-shaking

The handshake is performed inside the coroutine because the mbedTLS library function that drives the handshake (mbedtls_ssl_handshake()) may request to be called again later whenever data is available in the socket, when it's possible to write to the socket, and so on. This tied in really well with the coroutines in Lwan; check out the commented code that follows.

static bool lwan_setup_tls(const struct lwan *l, struct lwan_connection *conn)
{
    /* The SSL context can be kept in the stack, because we're going to
     * throw it away after the kernel learns the transmission and reception
     * keys. */
    mbedtls_ssl_context ssl;
    bool retval = false;
    int r;

    mbedtls_ssl_init(&ssl);

    r = mbedtls_ssl_setup(&ssl, &l->tls->config);
    if (UNLIKELY(r != 0)) {
        lwan_status_mbedtls_error(r, "Could not setup TLS context");
        return false;
    }

    /* Yielding the coroutine during the handshake enables the I/O loop to
     * destroy this coro (e.g.  on connection hangup) before we have the
     * opportunity to free the SSL context.  Defer this call for these
     * cases. */
    struct coro_defer *defer =
        coro_defer(conn->coro, lwan_setup_tls_free_ssl_context, &ssl);
    if (UNLIKELY(!defer)) {
        lwan_status_error("Could not defer cleanup of the TLS context");
        return false;
    }

    int fd = lwan_connection_get_fd(l, conn);

    /* We could pass a pointer to ``fd`` here, but this is an optimization
     * that we'll discuss later in this blog post. */
    struct lwan_mbedtls_handshake_ctx ctx = { .fd = fd };
    /* This is the current version of this function call; it used to use
     * the callback functions provided in the mbedTLS library, but those
     * could be optimized slightly. */
    mbedtls_ssl_set_bio(&ssl, &ctx, lwan_mbedtls_send,
                        lwan_mbedtls_recv, NULL);

    while (true) {
        /* This is the reason the handshake is performed within the coroutine:
         * we can yield here.  Amusingly enough, both mbedTLS and Lwan agree
         * on the terminology here. */
        switch (mbedtls_ssl_handshake(&ssl)) {
        case 0:
            /* This is also part of the optimization that we'll discuss below. */
            flush_pending_output(fd);
            goto enable_tls_ulp;
        case MBEDTLS_ERR_SSL_ASYNC_IN_PROGRESS:
        case MBEDTLS_ERR_SSL_CRYPTO_IN_PROGRESS:
        case MBEDTLS_ERR_SSL_WANT_READ:
            coro_yield(conn->coro, CONN_CORO_WANT_READ);
            break;
        case MBEDTLS_ERR_SSL_WANT_WRITE:
            coro_yield(conn->coro, CONN_CORO_WANT_WRITE);
            break;
        default:
            goto fail;
        }
    }

enable_tls_ulp:
    /* In order to set both the TLS_RX and TLS_TX keys for a socket, we must
     * enable the TLS ULP that we tested for during initialization. */
    if (UNLIKELY(setsockopt(fd, SOL_TCP, TCP_ULP, "tls", sizeof("tls")) < 0))
        goto fail;
    /* Call our helper function to set keys, for reception and transmission. */
    if (UNLIKELY(!lwan_setup_tls_keys(fd, &ssl, TLS_RX)))
        goto fail;
    if (UNLIKELY(!lwan_setup_tls_keys(fd, &ssl, TLS_TX)))
        goto fail;

    /* If we succeed here, all syscalls that perform I/O with the socket
     * will be transparently encrypted/decrypted. */
    retval = true;

fail:
    /* And get rid of the context, including our copies of the keys. */
    coro_defer_disarm(conn->coro, defer);
    mbedtls_ssl_free(&ssl);
    return retval;
}

So far, there's not a lot here that's really surprising; the meat and potatoes of this thing is in the lwan_setup_tls_keys() function, which is where we poke inside the ssl struct to find things like the initialization vector, keys, salt, this kind of stuff:

static bool
lwan_setup_tls_keys(int fd, const mbedtls_ssl_context *ssl, int rx_or_tx)
{
    /* This struct is provided by the Linux kernel as part of its
     * kTLS API. */
    struct tls12_crypto_info_aes_gcm_128 info = {
        .info = {.version = TLS_1_2_VERSION,
                 .cipher_type = TLS_CIPHER_AES_GCM_128},
    };
    const unsigned char *salt, *iv, *rec_seq;
    const mbedtls_gcm_context *gcm_ctx;
    const mbedtls_aes_context *aes_ctx;

    /* Poke inside the ``ssl`` struct to find things we're going to need,
     * which are different if we're either transmitting or receiving: */
    switch (rx_or_tx) {
    case TLS_RX:
        salt = ssl->transform->iv_dec;
        rec_seq = ssl->in_ctr;
        gcm_ctx = ssl->transform->cipher_ctx_dec.cipher_ctx;
        break;
    case TLS_TX:
        salt = ssl->transform->iv_enc;
        rec_seq = ssl->cur_out_ctr;
        gcm_ctx = ssl->transform->cipher_ctx_enc.cipher_ctx;
        break;
    default:
        __builtin_unreachable();
    }

    iv = salt + 4;
    aes_ctx = gcm_ctx->cipher_ctx.cipher_ctx;

    /* Copy those to the struct that the kernel will consume */
    memcpy(info.iv, iv, TLS_CIPHER_AES_GCM_128_IV_SIZE);
    memcpy(info.rec_seq, rec_seq, TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE);
    memcpy(info.key, aes_ctx->rk, TLS_CIPHER_AES_GCM_128_KEY_SIZE);
    memcpy(info.salt, salt, TLS_CIPHER_AES_GCM_128_SALT_SIZE);

    /* And finally set the keys. */
    if (UNLIKELY(setsockopt(fd, SOL_TLS, rx_or_tx, &info, sizeof(info)) < 0)) {
        lwan_status_perror("Could not set %s kTLS keys for fd %d",
                           rx_or_tx == TLS_TX ? "transmission" : "reception",
                           fd);
        lwan_always_bzero(&info, sizeof(info));
        return false;
    }

    /* Regardless of success or failure, we also zero out the ``info`` struct
     * as it contains copies of the keys.  We use ``lwan_always_bzero()``, that
     * ensures that the compiler won't elide the ``memset()`` call. */
    lwan_always_bzero(&info, sizeof(info));
    return true;
}
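The iv = salt + 4 line relies on how the function above treats the AES-GCM IV block for TLSv1.2: the first 4 bytes are the implicit salt, and the following 8 bytes are the explicit part. The sketch below spells out that split; the sizes mirror TLS_CIPHER_AES_GCM_128_SALT_SIZE and TLS_CIPHER_AES_GCM_128_IV_SIZE from the kernel headers, while the struct and function names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Sizes mirror TLS_CIPHER_AES_GCM_128_SALT_SIZE and
 * TLS_CIPHER_AES_GCM_128_IV_SIZE from <linux/tls.h>. */
#define GCM_SALT_SIZE 4
#define GCM_IV_SIZE 8

/* Hypothetical container, used only for this illustration. */
struct gcm_nonce_parts {
    uint8_t salt[GCM_SALT_SIZE];
    uint8_t iv[GCM_IV_SIZE];
};

/* Splits a 12-byte block into its salt and IV parts. */
static struct gcm_nonce_parts split_gcm_nonce(const uint8_t *block)
{
    struct gcm_nonce_parts parts;

    assert(block);

    memcpy(parts.salt, block, GCM_SALT_SIZE);
    memcpy(parts.iv, block + GCM_SALT_SIZE, GCM_IV_SIZE);

    return parts;
}
```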

And that's pretty much it! All things considered, this wasn't a lot of work this side of the kernel. There was only one thing that required an immediate fix, though.

Optimizing the handshake

Looking at the strace output when using the standard mbedTLS I/O callback functions, one could see that every step of the TLS handshake was sent as a separate TCP segment, which Wireshark confirmed.

https://tia.mat.br/temp/img/tls-lwan-before.png

Wireshark screenshot showing 24 TCP segments from connection being accepted to response being sent

I wrote my own functions that use MSG_MORE to buffer everything transmitted during the handshake, and this worked like a charm. The whole exchange now takes 15 TCP segments instead of 24, which is pretty remarkable considering how simple the optimization was.

https://tia.mat.br/temp/img/tls-lwan-after.png

15 TCP segments from connection being accepted to response being sent.
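To give an idea of what a MSG_MORE-based write looks like, here is a minimal sketch. It is not the actual lwan_mbedtls_send() callback: send_with_more() is a hypothetical helper, and a real mbedTLS send callback would also have to translate EAGAIN into MBEDTLS_ERR_SSL_WANT_WRITE.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper, not the actual Lwan callback.  MSG_MORE (Linux)
 * hints that more data will follow, so the kernel may coalesce this
 * write with later ones instead of sending it right away. */
static ssize_t send_with_more(int fd, const void *buf, size_t len)
{
    ssize_t r;

    assert(fd >= 0 && buf);

    /* Retry if interrupted by a signal before any data was queued. */
    do {
        r = send(fd, buf, len, MSG_MORE);
    } while (r < 0 && errno == EINTR);

    return r;
}
```

Since every write carries MSG_MORE, something has to push out whatever is still pending once the handshake is done; in the handshake code shown earlier, that is the job of flush_pending_output().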

In theory, I could also cork the TCP connection and uncork before handing it over to the client, but this didn't seem to work the way I expected it would. Maybe I need to experiment further because this would simplify things a bit.

What's next

There are a few things to sort out still:

