Using MAP_FIXED_NOREPLACE for JITs 2022-05-23

It's been a while since I've blogged about the JSON library optimization work I did for Lwan's entry in the TechEmpower Web Framework Benchmarks. I've been taking a break from a lot of my personal projects, so the work to generate machine code from the JSON descriptors to serialize data has been put on hold indefinitely. However, there's one thing that I'd like to share, that might be useful for other people.

If you're not familiar with the encoding of x86 call instructions: there is no direct call instruction that takes a 64-bit immediate. You either load the 64-bit address into a register and perform an indirect call, or you make a direct call using a signed 32-bit offset relative to the next instruction. The latter is preferable due to its (slightly) lower cost, but things get a bit difficult when you don't really know where your page with the JITted code will land due to things like ASLR.
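
To make the encoding concrete, here's a minimal sketch of emitting a direct call. The helper name `emit_call_rel32` and its parameters are made up for illustration and aren't part of the code in this post; it assumes a 64-bit, little-endian host (which x86-64 is). The displacement is measured from the end of the 5-byte instruction, and the helper refuses to write anything when the target is out of reach.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the original post): writes a 5-byte
 * `call rel32` (opcode 0xE8) into `buf`.  `insn_addr` is the address
 * the instruction will execute from, `target` is the function to call.
 * Returns false, writing nothing, if the displacement doesn't fit in a
 * signed 32-bit immediate. */
static bool emit_call_rel32(uint8_t *buf, uintptr_t insn_addr, uintptr_t target)
{
    /* The displacement is relative to the address of the *next*
     * instruction, i.e. insn_addr + 5.  The subtraction is modular;
     * the cast reinterprets it as a signed distance. */
    int64_t disp = (int64_t)(target - (insn_addr + 5));

    if (disp < INT32_MIN || disp > INT32_MAX)
        return false;

    int32_t rel32 = (int32_t)disp;

    buf[0] = 0xE8;
    /* Assumes a little-endian host, matching x86's immediate encoding. */
    memcpy(&buf[1], &rel32, sizeof(rel32));
    return true;
}
```

When this returns false, a JIT would have to fall back to loading the absolute address into a register and doing an indirect call instead.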

After discussing this on Twitter a good while back, I got a really good suggestion: use the MAP_FIXED_NOREPLACE flag for mmap(2). This flag extends the meaning of the MAP_FIXED flag, allowing the memory map to be placed at a known address, while also ensuring it's not replacing any other mapping. By finding a page that's close to the symbol we're trying to call, we can at last make a direct call with a 32-bit immediate. Two gigabytes either way oughta be good enough. This is a neat trick that I just had to try.
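
As a minimal sketch of what the flag buys you (the helper name `map_exactly_at` is hypothetical, and the fallback `#define` uses the value from the Linux generic headers in case the libc headers are too old to provide it): the mapping either lands exactly where requested or the call fails, and nothing that already lives at that address gets replaced. Kernels that don't know the flag treat the address as a plain hint, which is why the returned pointer is compared against the requested one.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

#ifndef MAP_FIXED_NOREPLACE
/* Assumption: older libc headers lack the flag; this is the value used
 * by the Linux generic mman headers (x86 included). */
#define MAP_FIXED_NOREPLACE 0x100000
#endif

/* Hypothetical helper: map `size` anonymous bytes exactly at `addr`.
 * Returns NULL with errno set if that isn't possible; never replaces an
 * existing mapping and never returns a mapping somewhere else. */
static void *map_exactly_at(void *addr, size_t size, int prot)
{
    int flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED_NOREPLACE;
    void *ptr = mmap(addr, size, prot, flags, -1, 0);

    if (ptr == MAP_FAILED)
        return NULL; /* errno is EEXIST when the range is occupied */

    if (ptr != addr) {
        /* A kernel that doesn't recognize the flag treated `addr` as a
         * hint and placed the mapping elsewhere; undo that. */
        munmap(ptr, size);
        errno = EEXIST;
        return NULL;
    }

    return ptr;
}
```

The `ptr != addr` check is what keeps this safe on kernels that predate the flag; the mmap(2) manual page recommends the same comparison for backward-compatible software.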

So I wrote the code below: it reads the page mappings by parsing /proc/self/maps, and if it finds the mapping that contains the address of the function we want to call in the JITted code, it tries to allocate a big enough region right before that mapping, retrying until it succeeds. While this is extremely non-portable for anything beyond Linux on x86 (I'm not aware of other systems implementing MAP_FIXED_NOREPLACE or anything equivalent, for instance), I think this is a neat hack that might be useful for some people out there. It is also insecure if the fallback code ends up being executed and you later mprotect(PROT_EXEC) the mapping; for my use case, this was perfectly acceptable.

Update: I was made aware that MAP_FIXED_NOREPLACE was buggy in Linux until 2018 or so, clobbering adjacent mappings.

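#define _GNU_SOURCE /* for MAP_ANONYMOUS and MAP_FIXED_NOREPLACE */
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Auxiliary function for try_map_near_symbol(): a very inefficient way of
 * searching for a free region of size `want_size` (which will be rounded up
 * to the page size) located before the mapping that contains `symbol`. */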
static void *find_free_page_near_symbol(void *symbol, size_t want_size)
{
    /* This is quite inefficient, but it's fine for my toy JIT. */
    FILE *maps;
    char buffer[512];
    char *ptr;
    uintptr_t max_want_addr = 0;
    uintptr_t symbol_addr = (uintptr_t)symbol;
    long page_size = sysconf(_SC_PAGE_SIZE);

    maps = fopen("/proc/self/maps", "re");
    if (!maps)
        return NULL;

    want_size = (want_size + (page_size - 1)) & ~(page_size - 1);

    while ((ptr = fgets(buffer, 512, maps))) {
        uintptr_t start, end;

        if (sscanf(ptr, "%lx-%lx", &start, &end) != 2)
            continue;

        /* The end address in /proc/self/maps is exclusive. */
        if (symbol_addr >= start && symbol_addr < end) {
            max_want_addr = start - want_size;
            rewind(maps);
            break;
        }
    }

    if (max_want_addr) {
        while ((ptr = fgets(buffer, 512, maps))) {
            uintptr_t start, end;

            if (sscanf(ptr, "%lx-%lx", &start, &end) != 2)
                continue;

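            /* Any mapping that overlaps [max_want_addr, max_want_addr +
             * want_size) means the candidate isn't free: move the
             * candidate to just below that mapping and scan again. */
            if (start < max_want_addr + want_size && end > max_want_addr) {
                max_want_addr = start - want_size;
                rewind(maps);
            }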
        }
    }

    fclose(maps);

    return (void *)max_want_addr;
}

/* This tries creating a private anonymous map with ``prot`` and size
 * ``want_size``, at a location that is close enough to ``symbol``
 * that a direct call instruction with a 32-bit offset can be emitted
 * in JITted code.
 * NOTE: This function does not ensure that the actual offset is indeed
 *       a 32-bit offset in the end!  A lot of things may go wrong here,
 *       including find_free_page_near_symbol() going too far backwards
 *       that it wraps.  This has been written for a toy, thus error
 *       handling is minimal. */
static void *try_map_near_symbol(void *symbol, size_t want_size, int prot)
{
    int flags = MAP_ANONYMOUS | MAP_FIXED_NOREPLACE | MAP_PRIVATE;
    void *addr = symbol;

    for (int try = 0; try < 32; try ++) {
        addr = find_free_page_near_symbol(addr, want_size);
        if (!addr)
            break;

        void *ptr = mmap(addr, want_size, prot, flags, -1, 0);
        if (ptr == addr)
            return ptr;

        if (ptr != MAP_FAILED) {
            munmap(ptr, want_size);
        } else {
            if (errno == EEXIST && (flags & MAP_FIXED_NOREPLACE)) {
                /* If we get EEXIST here, then we know that this kernel
                 * has NOREPLACE support -- we just got unlucky with a
                 * race condition and lost.  Try finding another address
                 * again. */
                continue;
            }

            /* Any other error condition means we might want to try
             * just passing a hint to mmap() and seeing if the address
             * it returns us is the same that we want. */
        }

        /* This kernel probably has no NOREPLACE support, so disable it
         * for the next tries. */
        flags &= ~MAP_FIXED_NOREPLACE;
        /* We don't set MAP_FIXED here because that might replace some
         * already existing mapping -- so let's give the kernel some
         * opportunity to map memory at the address we want, because
         * it prefers regions that are currently unmapped otherwise. */
    }

    return NULL;
}
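
Since the NOTE above says the function doesn't verify that the result is actually within reach, here's a minimal sketch of such a check that a caller could run on the returned pointer. The name `mapping_reaches_symbol` is hypothetical and not part of the code above, and it assumes a 64-bit address space. Because the displacement changes monotonically with the instruction's address, checking a call placed at the very start and one placed at the very end of the mapping is enough.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical check (not from the original post): returns true if a
 * 5-byte `call rel32` placed anywhere inside [map, map + size) can
 * reach `symbol`. */
static bool mapping_reaches_symbol(const void *map, size_t size,
                                   const void *symbol)
{
    uintptr_t lo = (uintptr_t)map;
    uintptr_t hi = lo + size; /* one past the end of the mapping */
    uintptr_t sym = (uintptr_t)symbol;

    /* Too small to hold a call, or the range wraps around. */
    if (size < 5 || hi < lo)
        return false;

    /* Displacements are relative to the end of the instruction: the
     * earliest call ends at lo + 5, the latest one ends at hi. */
    int64_t first = (int64_t)(sym - (lo + 5));
    int64_t last = (int64_t)(sym - hi);

    return first >= INT32_MIN && first <= INT32_MAX &&
           last >= INT32_MIN && last <= INT32_MAX;
}
```

If the check fails, the mapping is still usable, but calls into `symbol` would need the indirect form with the address loaded into a register.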

I hope this is useful to somebody, so I'm licensing it under the MIT license.

Thanks to Ole André V. Ravnås for the idea to use the MAP_FIXED_NOREPLACE flag, and to Paul Khuong for helping make it more robust on systems where that flag might not exist.
