HTTP Server Benchmark
This is a comparative performance benchmark of several well-known web servers. Web servers these days are fast enough that few people pick one purely for speed, but I was still curious which approach performs best, so I made a simple benchmark to find out.
This test only covers basic "Hello World" (11 characters) response performance over HTTP/1.1, across different languages and HTTP server implementations.
Hardware and software used
macOS Sierra 10.12.4 (darwin 16.5.0), Core i5 (2C4T), 16 GB RAM
Server and client ran on the same machine, so they shared the same cores and memory, but that should not be a huge problem here. Important note: this benchmark was done primarily for fun, not science.
I used the simple HTTP benchmarking tool wrk — some of you may say it is not the ideal tool, but it is good enough here. The command used to run the benchmarks was wrk -t 4 -c 1000 -d 10s.
The compiler/interpreter used were:
- Python 3.6.1
- Ruby 2.4.1p111
- Node 7.8.0
- Erlang/OTP 19 (erts-8.3)
- Go 1.8.1
- Rustc 1.16.0 with the --release flag for code optimization
- Clang (Apple LLVM) 8.1.0 (clang-802.0.38) with the -O3 flag
I focused on general usage: except for languages that are multicore by default, I did not specify how many CPU cores to use. Performance would probably improve with explicit multicore tuning, but not as much as you might think.
Python 3.6 (gunicorn + gevent)
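The server code for this case wasn't shown; a minimal WSGI "Hello World" app of the kind gunicorn would serve looks roughly like this (the module/app names and the exact gunicorn flags are assumptions, not the original setup):

```python
# Minimal WSGI "Hello World" app (a sketch; the original app was not shown).
# Assumed launch command:  gunicorn -k gevent -b 127.0.0.1:10000 app:app
def app(environ, start_response):
    body = b"Hello World"  # the 11-byte body used throughout this benchmark
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

The gevent worker class runs each connection on a greenlet rather than an OS thread, which is what lets a single worker cope with wrk's 1000 concurrent connections at all.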
Running 10s test @ http://localhost:10000
4 threads and 1000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 13.54ms 120.03ms 1.99s 98.66%
Req/Sec 1.53k 843.45 2.22k 75.19%
21382 requests in 10.10s, 3.30MB read
Socket errors: connect 0, read 937, write 42, timeout 366
Requests/sec: 2117.23
Transfer/sec: 334.98KB
Python 3.6 (sanic)
https://github.com/channelcat/sanic/blob/master/examples/simple_server.py
Running 10s test @ http://localhost:10000
4 threads and 1000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 34.45ms 15.76ms 174.99ms 67.78%
Req/Sec 4.10k 806.43 6.41k 65.15%
163069 requests in 10.05s, 20.37MB read
Socket errors: connect 0, read 969, write 24, timeout 0
Requests/sec: 16228.36
Transfer/sec: 2.03MB
Ruby 2.4 (sinatra + thin)
Running 10s test @ http://localhost:10000
4 threads and 1000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 77.28ms 46.12ms 350.39ms 73.45%
Req/Sec 695.04 309.34 1.60k 70.78%
27662 requests in 10.11s, 5.86MB read
Socket errors: connect 0, read 8722, write 9, timeout 0
Requests/sec: 2736.72
Transfer/sec: 593.31KB
Even though Sinatra ships with a built-in web server that is often treated as the de facto default, it is not meant for production, which is why Thin is used here.
NodeJS 7 (http)
Running 10s test @ http://localhost:10000
4 threads and 1000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 52.15ms 15.23ms 214.22ms 68.56%
Req/Sec 2.97k 1.15k 6.73k 72.75%
118772 requests in 10.10s, 17.56MB read
Socket errors: connect 0, read 995, write 9, timeout 0
Requests/sec: 11759.86
Transfer/sec: 1.74MB
NodeJS 7 (express)
Running 10s test @ http://localhost:10000
4 threads and 1000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 77.00ms 30.91ms 287.93ms 69.08%
Req/Sec 1.66k 306.47 2.56k 74.00%
66359 requests in 10.06s, 13.61MB read
Socket errors: connect 0, read 1108, write 16, timeout 0
Requests/sec: 6594.02
Transfer/sec: 1.35MB
Erlang (gen_tcp)
-module(hello).
-compile(export_all).

start() ->
    start(12345).

start(Port) ->
    N = erlang:system_info(schedulers),
    listen(Port, N),
    io:format("ehttpd ready with ~b schedulers on port ~b~n", [N, Port]),
    register(?MODULE, self()),
    receive Any -> io:format("~p~n", [Any]) end. %% to stop: hello ! stop.

listen(Port, N) ->
    Opts = [{active, false},
            binary,
            {backlog, 256},
            {packet, http_bin},
            {raw, 6, 9, <<1:32/native>>}, % defer accept (TCP_DEFER_ACCEPT)
            %%{delay_send, true},
            %%{nodelay, true},
            {reuseaddr, true}],
    {ok, S} = gen_tcp:listen(Port, Opts),
    Spawn = fun(I) ->
                register(list_to_atom("acceptor_" ++ integer_to_list(I)),
                         spawn_opt(?MODULE, accept, [S, I], [link, {scheduler, I}]))
            end,
    lists:foreach(Spawn, lists:seq(1, N)).

accept(S, I) ->
    case gen_tcp:accept(S) of
        {ok, Socket} -> spawn_opt(?MODULE, loop, [Socket], [{scheduler, I}]);
        Error -> erlang:error(Error)
    end,
    accept(S, I).

loop(S) ->
    case gen_tcp:recv(S, 0) of
        {ok, http_eoh} ->
            Response = <<"HTTP/1.1 200 OK\r\nContent-Length: 11\r\n\r\nHello World">>,
            gen_tcp:send(S, Response),
            gen_tcp:close(S),
            ok;
        {ok, _Data} ->
            loop(S);
        Error ->
            Error
    end.
Running 10s test @ http://localhost:10000
4 threads and 1000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 38.79ms 22.90ms 238.24ms 92.78%
Req/Sec 529.45 232.85 1.46k 73.78%
15548 requests in 10.03s, 759.18KB read
Socket errors: connect 0, read 15786, write 10, timeout 0
Requests/sec: 1549.60
Transfer/sec: 75.66KB
Erlang (cowboy)
Running 10s test @ http://localhost:10000
4 threads and 1000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 44.28ms 40.37ms 415.08ms 80.28%
Req/Sec 2.87k 1.29k 8.84k 80.21%
110509 requests in 10.07s, 13.61MB read
Socket errors: connect 0, read 1002, write 0, timeout 0
Requests/sec: 10978.57
Transfer/sec: 1.35MB
Go (net/http)
package main

import (
	"log"
	"net/http"
)

func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, _ *http.Request) {
		w.Write([]byte("Hello World"))
	})
	log.Fatal(http.ListenAndServe(":10000", nil))
}
Running 10s test @ http://localhost:10000
4 threads and 1000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 13.35ms 6.18ms 182.26ms 77.42%
Req/Sec 13.54k 1.44k 18.39k 79.50%
540583 requests in 10.07s, 65.99MB read
Socket errors: connect 0, read 919, write 0, timeout 0
Requests/sec: 53691.63
Transfer/sec: 6.55MB
Go (fasthttp)
package main

import (
	"log"

	"github.com/valyala/fasthttp"
)

func handle(ctx *fasthttp.RequestCtx) {
	ctx.WriteString("Hello World")
}

func main() {
	log.Fatal(fasthttp.ListenAndServe(":10000", handle))
}
Running 10s test @ http://localhost:10000
4 threads and 1000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 11.08ms 3.38ms 48.80ms 73.45%
Req/Sec 17.63k 3.67k 30.17k 80.75%
702773 requests in 10.05s, 97.85MB read
Socket errors: connect 0, read 753, write 17, timeout 0
Requests/sec: 69909.98
Transfer/sec: 9.73MB
Rust (Rocket)
#![feature(plugin)]
#![plugin(rocket_codegen)]

extern crate rocket;

#[get("/")]
fn index() -> &'static str {
    "Hello World"
}

fn main() {
    rocket::ignite().mount("/", routes![index]).launch();
}
Running 10s test @ http://localhost:10000
4 threads and 1000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 93.50us 161.65us 50.10ms 99.70%
Req/Sec 21.07k 10.11k 36.77k 49.71%
367721 requests in 10.07s, 50.50MB read
Socket errors: connect 0, read 914, write 0, timeout 1
Requests/sec: 36508.45
Transfer/sec: 5.01MB
Rust (tokio-minihttp)
Running 10s test @ http://localhost:10000
4 threads and 1000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 13.09ms 2.11ms 45.22ms 84.63%
Req/Sec 17.36k 1.22k 19.71k 77.00%
690673 requests in 10.08s, 66.53MB read
Socket errors: connect 0, read 383, write 41, timeout 0
Requests/sec: 68505.57
Transfer/sec: 6.60MB
Note that tokio-minihttp is an alpha-quality implementation and is not built for production.
C (microhttpd)
Running 10s test @ http://localhost:10000
4 threads and 1000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 14.89ms 17.29ms 201.62ms 81.50%
Req/Sec 8.39k 3.98k 34.11k 79.03%
324327 requests in 10.10s, 34.33MB read
Socket errors: connect 0, read 1003, write 0, timeout 0
Requests/sec: 32101.87
Transfer/sec: 3.40MB
C (h2o)
https://github.com/h2o/h2o/blob/e577f9bd091788582e378307fc7a52d018399e94/examples/libh2o/simple.c
Running 10s test @ http://localhost:10000
4 threads and 1000 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 7.29ms 7.38ms 327.74ms 98.12%
Req/Sec 16.11k 2.92k 22.48k 84.74%
610802 requests in 10.02s, 86.21MB read
Socket errors: connect 0, read 989, write 0, timeout 0
Requests/sec: 60950.54
Transfer/sec: 8.60MB
Result
| | Python (gunicorn + gevent) | Python (sanic) | Ruby (thin) | NodeJS (http) | NodeJS (express) | Erlang (gen_tcp) | Erlang (cowboy) | Go (net/http) | Go (fasthttp) | Rust (rocket) | C (microhttpd) | C (h2o) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| avg. Latency | 14ms | 34ms | 77ms | 52ms | 77ms | 39ms | 44ms | 13ms | 11ms | 93.50us | 15ms | 7ms |
| stdev. Latency | 120ms | 16ms | 46ms | 15ms | 31ms | 23ms | 40ms | 6ms | 3ms | 161.65us | 17ms | 7ms |
| avg. RPS | 1.53k | 4.10k | 695.04 | 2.97k | 1.66k | 529.45 | 2.87k | 13.54k | 17.63k | 21.07k | 8.39k | 16.11k |
| stdev. RPS | 843.45 | 806.43 | 309.34 | 1.15k | 306.47 | 232.85 | 1.29k | 1.44k | 3.67k | 10.11k | 3.98k | 2.92k |
| RPS | 2117.23 | 16228.36 | 2736.72 | 11759.86 | 6594.02 | 1549.60 | 10978.57 | 53691.63 | 69909.98 | 36508.45 | 32101.87 | 60950.54 |
| Transfer/sec | 334.98KB | 2.03MB | 593.31KB | 1.74MB | 1.35MB | 75.66KB | 1.35MB | 6.55MB | 9.73MB | 5.01MB | 3.40MB | 8.60MB |
| Rank | 11 | 6 | 10 | 7 | 9 | 12 | 8 | 3 | 1 | 4 | 5 | 2 |
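As a sanity check, the Rank row follows directly from sorting the Requests/sec figures measured above. A small Python sketch (tokio-minihttp is left out to match the table):

```python
# Requests/sec as measured above (tokio-minihttp is excluded, as in the table).
rps = {
    "Python (gunicorn + gevent)": 2117.23,
    "Python (sanic)": 16228.36,
    "Ruby (thin)": 2736.72,
    "NodeJS (http)": 11759.86,
    "NodeJS (express)": 6594.02,
    "Erlang (gen_tcp)": 1549.60,
    "Erlang (cowboy)": 10978.57,
    "Go (net/http)": 53691.63,
    "Go (fasthttp)": 69909.98,
    "Rust (rocket)": 36508.45,
    "C (microhttpd)": 32101.87,
    "C (h2o)": 60950.54,
}

# Rank 1 = highest Requests/sec.
ranking = {name: rank
           for rank, name in enumerate(sorted(rps, key=rps.get, reverse=True),
                                       start=1)}
```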
The notable results are shown in the graph below:
Conclusion
I was quite surprised by the results and reran the benchmark a few times, thinking something might be wrong. I never expected C to lose or Go to do so well. I also didn't expect Python to get such a good result, even with asyncio + uvloop.
There are several other remarkable things about the benchmarks:
- Node.js is the most famous event-driven runtime, but in this benchmark it is slower than Python with asyncio and uvloop. This, however, does not mean that Python is better than Node.js.
- The out-of-the-box performance of Go is quite impressive and promising; it comes very close to C.
Please keep in mind that this benchmark does not represent real-world performance. I hope you enjoyed it.