HTTP Server Benchmark
This is a comparative performance benchmark of several well-known web servers. Web servers these days are fast enough that few people pick one purely for speed, but I was still curious which approach performs best, so I made a simple benchmark to find out.
This test only covers basic "Hello World" (11 characters) response performance over HTTP/1.1, across different languages and HTTP server implementations.
Hardware and software used
macOS Sierra 10.12.4 (darwin 16.5.0), Core i5 (2C4T), 16 GB RAM
Server and client ran on the same machine, so they shared the same cores and memory, but that should not be a huge problem here. Important note: this benchmark was done primarily for fun, not science.
I used the simple HTTP benchmarking tool wrk — some of you may say it is not the ideal tool, but it is good enough here. The command used to run the benchmarks was wrk -t 4 -c 1000 -d 10s.
The compiler/interpreter used were:
- Python 3.6.1
- Ruby 2.4.1p111
- Node 7.8.0
- Erlang/OTP 19 (erts-8.3)
- Go 1.8.1
- Rustc 1.16.0 with the --release flag for code optimization
- Clang (Apple LLVM) 8.1.0 (clang-802.0.38) with the -O3 flag
I focused on general usage: except for languages that are multicore by default, I did not specify how many CPU cores to use. Performance would probably improve with explicit multicore tuning, but not as much as you might think.
Python 3.6 (gunicorn + gevent)
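The server code for this case wasn't shown; a minimal WSGI "Hello World" app of the kind gunicorn would serve looks roughly like this (the module/app names and the exact gunicorn flags are assumptions, not the original setup):

```python
# Minimal WSGI "Hello World" app (a sketch; the original app was not shown).
# Assumed launch command:  gunicorn -k gevent -b 127.0.0.1:10000 app:app
def app(environ, start_response):
    body = b"Hello World"  # the 11-byte body used throughout this benchmark
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

The gevent worker class runs each connection on a greenlet rather than an OS thread, which is what lets a single worker cope with wrk's 1000 concurrent connections at all.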
Running 10s test @ http://localhost:10000
4 threads and 1000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 13.54ms 120.03ms 1.99s 98.66%
Req/Sec 1.53k 843.45 2.22k 75.19%
21382 requests in 10.10s, 3.30MB read
Socket errors: connect 0, read 937, write 42, timeout 366
Requests/sec: 2117.23
Transfer/sec: 334.98KB
Python 3.6 (sanic)
https://github.com/channelcat/sanic/blob/master/examples/simple_server.py
Running 10s test @ http://localhost:10000
4 threads and 1000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 34.45ms 15.76ms 174.99ms 67.78%
Req/Sec 4.10k 806.43 6.41k 65.15%
163069 requests in 10.05s, 20.37MB read
Socket errors: connect 0, read 969, write 24, timeout 0
Requests/sec: 16228.36
Transfer/sec: 2.03MB
Ruby 2.4 (sinatra + thin)
Running 10s test @ http://localhost:10000
4 threads and 1000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 77.28ms 46.12ms 350.39ms 73.45%
Req/Sec 695.04 309.34 1.60k 70.78%
27662 requests in 10.11s, 5.86MB read
Socket errors: connect 0, read 8722, write 9, timeout 0
Requests/sec: 2736.72
Transfer/sec: 593.31KB
Even though Sinatra ships with a built-in web server that is often treated as the de facto default, it is not meant for production, which is why Thin is used here.
NodeJS 7 (http)
Running 10s test @ http://localhost:10000
4 threads and 1000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 52.15ms 15.23ms 214.22ms 68.56%
Req/Sec 2.97k 1.15k 6.73k 72.75%
118772 requests in 10.10s, 17.56MB read
Socket errors: connect 0, read 995, write 9, timeout 0
Requests/sec: 11759.86
Transfer/sec: 1.74MB
NodeJS 7 (express)
Running 10s test @ http://localhost:10000
4 threads and 1000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 77.00ms 30.91ms 287.93ms 69.08%
Req/Sec 1.66k 306.47 2.56k 74.00%
66359 requests in 10.06s, 13.61MB read
Socket errors: connect 0, read 1108, write 16, timeout 0
Requests/sec: 6594.02
Transfer/sec: 1.35MB
Erlang (gen_tcp)
-module(hello).
-compile(export_all).

start() ->
    start(12345).

start(Port) ->
    N = erlang:system_info(schedulers),
    listen(Port, N),
    io:format("ehttpd ready with ~b schedulers on port ~b~n", [N, Port]),
    register(?MODULE, self()),
    receive Any -> io:format("~p~n", [Any]) end. %% to stop: hello ! stop.

listen(Port, N) ->
    Opts = [{active, false},
            binary,
            {backlog, 256},
            {packet, http_bin},
            {raw, 6, 9, <<1:32/native>>}, % defer accept (TCP_DEFER_ACCEPT)
            %%{delay_send, true},
            %%{nodelay, true},
            {reuseaddr, true}],
    {ok, S} = gen_tcp:listen(Port, Opts),
    Spawn = fun(I) ->
                register(list_to_atom("acceptor_" ++ integer_to_list(I)),
                         spawn_opt(?MODULE, accept, [S, I], [link, {scheduler, I}]))
            end,
    lists:foreach(Spawn, lists:seq(1, N)).

accept(S, I) ->
    case gen_tcp:accept(S) of
        {ok, Socket} -> spawn_opt(?MODULE, loop, [Socket], [{scheduler, I}]);
        Error -> erlang:error(Error)
    end,
    accept(S, I).

loop(S) ->
    case gen_tcp:recv(S, 0) of
        {ok, http_eoh} ->
            Response = <<"HTTP/1.1 200 OK\r\nContent-Length: 11\r\n\r\nHello World">>,
            gen_tcp:send(S, Response),
            gen_tcp:close(S),
            ok;
        {ok, _Data} ->
            loop(S);
        Error ->
            Error
    end.
Running 10s test @ http://localhost:10000
4 threads and 1000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 38.79ms 22.90ms 238.24ms 92.78%
Req/Sec 529.45 232.85 1.46k 73.78%
15548 requests in 10.03s, 759.18KB read
Socket errors: connect 0, read 15786, write 10, timeout 0
Requests/sec: 1549.60
Transfer/sec: 75.66KB
Erlang (cowboy)
Running 10s test @ http://localhost:10000
4 threads and 1000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 44.28ms 40.37ms 415.08ms 80.28%
Req/Sec 2.87k 1.29k 8.84k 80.21%
110509 requests in 10.07s, 13.61MB read
Socket errors: connect 0, read 1002, write 0, timeout 0
Requests/sec: 10978.57
Transfer/sec: 1.35MB
Go (net/http)
package main

import (
	"log"
	"net/http"
)

func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, _ *http.Request) {
		w.Write([]byte("Hello World"))
	})
	log.Fatal(http.ListenAndServe(":10000", nil))
}
Running 10s test @ http://localhost:10000
4 threads and 1000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 13.35ms 6.18ms 182.26ms 77.42%
Req/Sec 13.54k 1.44k 18.39k 79.50%
540583 requests in 10.07s, 65.99MB read
Socket errors: connect 0, read 919, write 0, timeout 0
Requests/sec: 53691.63
Transfer/sec: 6.55MB
Go (fasthttp)
package main

import (
	"log"

	"github.com/valyala/fasthttp"
)

func handle(ctx *fasthttp.RequestCtx) {
	ctx.WriteString("Hello World")
}

func main() {
	log.Fatal(fasthttp.ListenAndServe(":10000", handle))
}
Running 10s test @ http://localhost:10000
4 threads and 1000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 11.08ms 3.38ms 48.80ms 73.45%
Req/Sec 17.63k 3.67k 30.17k 80.75%
702773 requests in 10.05s, 97.85MB read
Socket errors: connect 0, read 753, write 17, timeout 0
Requests/sec: 69909.98
Transfer/sec: 9.73MB
Rust (Rocket)
#![feature(plugin)]
#![plugin(rocket_codegen)]

extern crate rocket;

#[get("/")]
fn index() -> &'static str {
    "Hello World"
}

fn main() {
    rocket::ignite().mount("/", routes![index]).launch();
}
Running 10s test @ http://localhost:10000
4 threads and 1000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 93.50us 161.65us 50.10ms 99.70%
Req/Sec 21.07k 10.11k 36.77k 49.71%
367721 requests in 10.07s, 50.50MB read
Socket errors: connect 0, read 914, write 0, timeout 1
Requests/sec: 36508.45
Transfer/sec: 5.01MB
Rust (tokio-minihttp)
Running 10s test @ http://localhost:10000
4 threads and 1000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 13.09ms 2.11ms 45.22ms 84.63%
Req/Sec 17.36k 1.22k 19.71k 77.00%
690673 requests in 10.08s, 66.53MB read
Socket errors: connect 0, read 383, write 41, timeout 0
Requests/sec: 68505.57
Transfer/sec: 6.60MB
Note that tokio-minihttp is an alpha-quality implementation and is not built for production.
C (microhttpd)
Running 10s test @ http://localhost:10000
4 threads and 1000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 14.89ms 17.29ms 201.62ms 81.50%
Req/Sec 8.39k 3.98k 34.11k 79.03%
324327 requests in 10.10s, 34.33MB read
Socket errors: connect 0, read 1003, write 0, timeout 0
Requests/sec: 32101.87
Transfer/sec: 3.40MB
C (h2o)
https://github.com/h2o/h2o/blob/e577f9bd091788582e378307fc7a52d018399e94/examples/libh2o/simple.c
Running 10s test @ http://localhost:10000
4 threads and 1000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 7.29ms 7.38ms 327.74ms 98.12%
Req/Sec 16.11k 2.92k 22.48k 84.74%
610802 requests in 10.02s, 86.21MB read
Socket errors: connect 0, read 989, write 0, timeout 0
Requests/sec: 60950.54
Transfer/sec: 8.60MB
Result
| | Python (gunicorn + gevent) | Python (sanic) | Ruby (thin) | NodeJS (http) | NodeJS (express) | Erlang (gen_tcp) | Erlang (cowboy) | Go (net/http) | Go (fasthttp) | Rust (rocket) | C (microhttpd) | C (h2o) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| avg. Latency | 14ms | 34ms | 77ms | 52ms | 77ms | 39ms | 44ms | 13ms | 11ms | 93.50us | 15ms | 7ms |
| stdev. Latency | 120ms | 16ms | 46ms | 15ms | 31ms | 23ms | 40ms | 6ms | 3ms | 161.65us | 17ms | 7ms |
| avg. RPS | 1.53k | 4.10k | 695.04 | 2.97k | 1.66k | 529.45 | 2.87k | 13.54k | 17.63k | 21.07k | 8.39k | 16.11k |
| stdev. RPS | 843.45 | 806.43 | 309.34 | 1.15k | 306.47 | 232.85 | 1.29k | 1.44k | 3.67k | 10.11k | 3.98k | 2.92k |
| RPS | 2117.23 | 16228.36 | 2736.72 | 11759.86 | 6594.02 | 1549.60 | 10978.57 | 53691.63 | 69909.98 | 36508.45 | 32101.87 | 60950.54 |
| Transfer/sec | 334.98KB | 2.03MB | 593.31KB | 1.74MB | 1.35MB | 75.66KB | 1.35MB | 6.55MB | 9.73MB | 5.01MB | 3.40MB | 8.60MB |
| Rank | 11 | 6 | 10 | 7 | 9 | 12 | 8 | 3 | 1 | 4 | 5 | 2 |
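As a sanity check, the Rank row follows directly from sorting the Requests/sec figures measured above. A small Python sketch (tokio-minihttp is left out to match the table):

```python
# Requests/sec as measured above (tokio-minihttp is excluded, as in the table).
rps = {
    "Python (gunicorn + gevent)": 2117.23,
    "Python (sanic)": 16228.36,
    "Ruby (thin)": 2736.72,
    "NodeJS (http)": 11759.86,
    "NodeJS (express)": 6594.02,
    "Erlang (gen_tcp)": 1549.60,
    "Erlang (cowboy)": 10978.57,
    "Go (net/http)": 53691.63,
    "Go (fasthttp)": 69909.98,
    "Rust (rocket)": 36508.45,
    "C (microhttpd)": 32101.87,
    "C (h2o)": 60950.54,
}

# Rank 1 = highest Requests/sec.
ranking = {name: rank
           for rank, name in enumerate(sorted(rps, key=rps.get, reverse=True),
                                       start=1)}
```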
The notable results are shown in the graph below:
Conclusion
I was quite surprised by the results and reran the benchmark a few times, thinking something might be wrong. I never expected C to lose or Go to do so well. I also didn't expect Python to get such a good result, even with asyncio + uvloop.
There are several other remarkable things about the benchmarks:
- Node.js is the most famous event-driven runtime, but in this benchmark it is slower than Python with asyncio and uvloop. This, however, does not mean that Python is better than Node.js.
- The out-of-the-box performance of Go is quite impressive and promising; it comes very close to C.
Please keep in mind that this benchmark does not represent real-world performance. I hope you enjoyed it.