When I first wrote lwan, file serving was not a primary goal. I added this capability later, never giving much thought to the number of system calls required to serve one file. As a result, static file serving was quite slow compared to “hello world” requests. Bored one day, I decided to speed it up as much as I could.
Before optimizing, serving a file would look like this:
<... epoll_wait resumed> {{EPOLLIN, {u32=8, u64=8}}}, 16383,
4294967295) = 1
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
read(8, "GET / HTTP/1.0\r\n", 4096) = 16
getcwd("/home/l/git/lwan/build", 4096) = 29
lstat("/home/l/git/lwan/build/files_root",
{st_mode=S_IFLNK|0777, st_size=33, ...}) = 0
readlink("/home/l/git/lwan/build/files_root",
"/home/l/git/blotest/output/", 4095) = 33
lstat("/home", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat("/home/l", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat("/home/l/git", {st_mode=S_IFDIR|0775, st_size=4096, ...})
= 0
lstat("/home/l/git/blotest", {st_mode=S_IFDIR|0755,
st_size=4096, ...}) = 0
lstat("/home/l/git/blotest/output", {st_mode=S_IFDIR|0775,
st_size=4096, ...}) = 0
getcwd("/home/l/git/lwan/build", 4096) = 29
lstat("/home/l/git/lwan/build/files_root",
{st_mode=S_IFLNK|0777, st_size=33, ...}) = 0
readlink("/home/l/git/lwan/build/files_root",
"/home/l/git/blotest/output/", 4095) = 33
lstat("/home", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat("/home/l", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat("/home/l/git", {st_mode=S_IFDIR|0775, st_size=4096, ...})
= 0
lstat("/home/l/git/blotest", {st_mode=S_IFDIR|0755,
st_size=4096, ...}) = 0
lstat("/home/l/git/blotest/output", {st_mode=S_IFDIR|0775,
st_size=4096, ...}) = 0
open("/home/l/git/blotest/output", O_RDONLY|O_NOATIME) = 9
fstat(9, {st_mode=S_IFDIR|0775, st_size=4096, ...}) = 0
close(9) = 0
open("/home/l/git/blotest/output/index.html",
O_RDONLY|O_NOATIME) = 9
fstat(9, {st_mode=S_IFREG|0664, st_size=13200, ...}) = 0
setsockopt(8, SOL_TCP, TCP_CORK, [1], 4) = 0
write(8, "HTTP/1.0 200 OK\r\nContent-Length:"..., 100) = 100
fadvise64(9, 0, 13200, POSIX_FADV_SEQUENTIAL) = 0
sendfile(8, 9, NULL, 1400) = 1400
epoll_ctl(6, EPOLL_CTL_MOD, 8, {EPOLLOUT|EPOLLERR|0x2000, {u32=8,
u64=8}}) = 0
epoll_wait(6, {{EPOLLOUT, {u32=8, u64=8}}}, 16383, 1000) = 1
sendfile(8, 9, NULL, 1400) = 1400
epoll_wait(6, {{EPOLLOUT, {u32=8, u64=8}}}, 16383, 1000) = 1
sendfile(8, 9, NULL, 1400) = 1400
epoll_wait(6, {{EPOLLOUT, {u32=8, u64=8}}}, 16383, 1000) = 1
sendfile(8, 9, NULL, 1400) = 1400
epoll_wait(6, {{EPOLLOUT, {u32=8, u64=8}}}, 16383, 1000) = 1
sendfile(8, 9, NULL, 1400) = 1400
epoll_wait(6, {{EPOLLOUT, {u32=8, u64=8}}}, 16383, 1000) = 1
sendfile(8, 9, NULL, 1400) = 1400
epoll_wait(6, {{EPOLLOUT, {u32=8, u64=8}}}, 16383, 1000) = 1
sendfile(8, 9, NULL, 1400) = 1400
epoll_wait(6, {{EPOLLOUT, {u32=8, u64=8}}}, 16383, 1000) = 1
sendfile(8, 9, NULL, 1400) = 1400
epoll_wait(6, {{EPOLLOUT, {u32=8, u64=8}}}, 16383, 1000) = 1
sendfile(8, 9, NULL, 1400) = 1400
epoll_wait(6, {{EPOLLOUT, {u32=8, u64=8}}}, 16383, 1000) = 1
sendfile(8, 9, NULL, 1400) = 600
close(9) = 0
epoll_ctl(6, EPOLL_CTL_MOD, 8, {EPOLLIN|EPOLLERR|EPOLLET|0x2000,
{u32=8, u64=8}}) = 0
close(8) = 0
Yes, that many system calls. I was not kidding when I said that file serving was added as an afterthought. After some experiments, I managed to turn that mess into this:
<... epoll_wait resumed> {{EPOLLIN, {u32=9, u64=9}}}, 16383,
4294967295) = 1
read(9, "GET / HTTP/1.0\r\n", 4096) = 16
newfstatat(7, "index.html", {st_mode=S_IFREG|0664, st_size=13200,
...}, 0) = 0
openat(7, "index.html", O_RDONLY|O_NOATIME) = 10
sendto(9, "HTTP/1.0 200 OK\r\nContent-Length:"..., 223, MSG_MORE,
NULL, 0) = 223
fadvise64(10, 0, 13200, POSIX_FADV_SEQUENTIAL) = 0
sendfile(9, 10, [0], 13200) = 13200
close(10) = 0
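Ah, much better! The steps that got there show up when comparing the two traces: the served path is no longer canonicalized on every request (the getcwd, lstat and readlink cascade, which ran twice), because the file is now looked up relative to an already-open directory descriptor with newfstatat and openat; the open, fstat and close of the directory itself before trying index.html is gone; the TCP_CORK setsockopt plus write for the headers became a single sendto with MSG_MORE; and sendfile is asked for the whole 13200-byte file at once instead of 1400 bytes at a time, which also removes the epoll_ctl and epoll_wait round trips between chunks.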
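To make the second trace easier to follow, here is a minimal sketch in C of the same sequence of calls: stat and open relative to a directory descriptor, headers sent with MSG_MORE, then a single sendfile for the body. This is not lwan's actual code. The names `open_root` and `serve_file` are hypothetical, the socket is assumed to be blocking, and O_NOATIME (which the trace uses) is left out because it fails with EPERM unless the caller owns the file.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: resolve and open the document root once, at
 * startup, so no per-request path canonicalization is needed. */
int open_root(const char *path)
{
    return open(path, O_RDONLY | O_DIRECTORY);
}

/* Hypothetical helper: serve `name`, looked up relative to `root_fd`,
 * to the connected (blocking) socket `sock`.
 * Returns 0 on success, -1 on error. */
int serve_file(int sock, int root_fd, const char *name)
{
    struct stat st;
    char header[128];
    off_t offset = 0;
    int file_fd, len;

    /* One stat relative to the root directory (newfstatat in the trace). */
    if (fstatat(root_fd, name, &st, 0) < 0 || !S_ISREG(st.st_mode))
        return -1;

    file_fd = openat(root_fd, name, O_RDONLY);
    if (file_fd < 0)
        return -1;

    len = snprintf(header, sizeof(header),
                   "HTTP/1.0 200 OK\r\nContent-Length: %lld\r\n\r\n",
                   (long long)st.st_size);
    assert(len > 0 && (size_t)len < sizeof(header));

    /* MSG_MORE hints that more data follows, so the headers can be
     * coalesced with the body without toggling TCP_CORK. */
    if (send(sock, header, (size_t)len, MSG_MORE) != len) {
        close(file_fd);
        return -1;
    }

    /* Advisory only; the return value is deliberately ignored. */
    posix_fadvise(file_fd, 0, st.st_size, POSIX_FADV_SEQUENTIAL);

    /* Ask for the whole file at once; loop only on short writes. */
    while (offset < st.st_size) {
        ssize_t n = sendfile(sock, file_fd, &offset,
                             (size_t)(st.st_size - offset));
        if (n <= 0) {
            close(file_fd);
            return -1;
        }
    }

    close(file_fd);
    return 0;
}
```

The sketch does not validate `name`, so a real server would still have to reject paths that escape the root (for example through `..`). With a non-blocking socket, a short or EAGAIN result from sendfile would mean going back to epoll instead of looping.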
These improvements resulted in very low overhead while serving files. In fact, comparing a simple hello world handler with file serving, both without keep-alive, the performance drop is only about 5%, even with the I/O involved.
Copyright © 2023 L. A. F. Pereira