Hosting a Shadowsocks proxy server is usually an easy task: you install it and forget about it. And because it is a lightweight proxy, Shadowsocks is usually I/O-bound: to achieve the highest throughput, you need a faster network link, not a faster CPU.

However, this is not always the case. Some VPS providers are known to focus only on premium connectivity and bandwidth, and to skimp on CPU and RAM to save money (cough bandwagonhost cough). As a result, their VPSes often have 1 Gbps+ connectivity paired with 1 CPU core and 512 MB of RAM.

This makes Shadowsocks CPU-bound. Your proxy server might be slow because its CPU is so outdated that it cannot keep up with the cryptography. I mean, what is the point of 1 Gbps connectivity when the CPU cannot handle it?

That makes it important to figure out which implementation is the fastest, without compromising security. There are 3 major implementations:

shadowsocks-libev is the most popular implementation. It runs smoothly from MIPS routers to large-scale servers.

go-shadowsocks2 is written in pure Golang. It is also used in V2ray and other proxy software.

shadowsocks-rust is the new player here. It is written in Rust, and ensures both memory safety and speed.

Other implementations exist (such as the Node.js one), but most of them are not kept up to date. Running Shadowsocks on an outdated implementation is generally regarded as unsafe.

The biggest factor influencing Shadowsocks' speed is the performance of the encryption method in use. On newer machines with AES-NI, aes-256-gcm is regarded as the fastest. On older machines with AES-NI, chacha20-ietf-poly1305 is roughly as fast as aes-256-gcm. On machines without AES-NI, chacha20-ietf-poly1305 is much faster. Thus we will use chacha20-ietf-poly1305 in the benchmark to eliminate the uncertainty introduced by AES-NI.
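As a rough illustration of the rule of thumb above, the following sketch picks a cipher from the CPU flags. It assumes a Linux x86 machine where /proc/cpuinfo has a "flags" line. The function name suggest_cipher and its optional file argument are made up for this example. It is a simplification: on older CPUs with AES-NI the two ciphers are roughly equal, so treat the result as a starting point for your own benchmark.

```shell
#!/bin/sh
# suggest_cipher [CPUINFO_FILE]
# Hypothetical helper: prints aes-256-gcm if the x86 "aes" CPU flag is
# present, otherwise chacha20-ietf-poly1305.
# CPUINFO_FILE defaults to /proc/cpuinfo (Linux only); the argument exists
# so the function can be tried against a saved copy of that file.
suggest_cipher() {
    cpuinfo="${1:-/proc/cpuinfo}"
    # Match "aes" as a whole, space-delimited word on the "flags" line,
    # so that flags such as "vaes" do not count.
    if grep -Eq '^flags[[:space:]]*:.*[[:space:]]aes([[:space:]]|$)' "$cpuinfo" 2>/dev/null; then
        echo aes-256-gcm
    else
        echo chacha20-ietf-poly1305
    fi
}

suggest_cipher
```

If the file cannot be read (for example on a non-Linux system), the function falls back to chacha20-ietf-poly1305, which is the safer default according to the reasoning above.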

Secondly, an implementation's performance depends on the efficiency of the cryptography library it uses. shadowsocks-libev uses libsodium, which is mostly fine but lacks AES acceleration on ARM64. go-shadowsocks2 uses Go's official crypto library. shadowsocks-rust uses ring, which is often regarded as one of the fastest cryptography libraries, reportedly even faster than OpenSSL.

Other factors can also influence speed, such as compiler optimizations, link-time optimization (LTO), and language design.

Controlling variables

Since we only care about server-side performance, I will run each Shadowsocks server implementation in turn, connect the same client to it, and see how much throughput can be achieved.

To control variables, I used shadowsocks-libev as the test client in every run. shadowsocks-libev is the fastest client implementation, fast enough to keep up even on a slow router.

Test host: Vultr HFC 2C/2GB RAM

Environment: Arch Linux, linux-zen 5.12.10-zen1-1-zen

CPU: Skylake, aes, avx2, x86_64-v3

Next time I will run it on a bare-metal machine with a fixed CPU frequency.

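Set up the test client

We will use standard iperf3 to test throughput, and shadowsocks-libev as the test client.

Launch the iperf3 server and the shadowsocks-libev client. ss-tunnel listens on local port 1090 and forwards traffic through the Shadowsocks server on port 8488 to the iperf3 server on port 5201.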

iperf3 -s
ss-tunnel -m chacha20-ietf-poly1305 -s 127.0.0.1 -p 8488 -k passwd -l 1090 -L 127.0.0.1:5201

Test shadowsocks-libev

Start the Shadowsocks server.

ss-server -m chacha20-ietf-poly1305 -s 127.0.0.1 -p 8488 -k passwd

Then, launch iperf3.

iperf3 -c 127.0.0.1 -p 1090

Output:

[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  5.03 GBytes  4.32 Gbits/sec    0             sender
[  5]   0.00-10.00  sec  5.02 GBytes  4.31 Gbits/sec                  receiver
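When repeating runs, it helps to pull out only the receiver bitrate instead of reading the whole table. The sketch below does this with awk. The function name receiver_bitrate is made up for this example, and it assumes the text layout shown in the output above, where the summary line ends with the word "receiver".

```shell
#!/bin/sh
# receiver_bitrate: read iperf3's text output on stdin and print the
# bitrate and its unit from the summary line that ends in "receiver".
receiver_bitrate() {
    awk '$NF == "receiver" { print $(NF - 2), $(NF - 1) }'
}

# Example using the summary lines shown above. In a live run one would
# pipe `iperf3 -c 127.0.0.1 -p 1090` into the function instead.
receiver_bitrate <<'EOF'
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  5.03 GBytes  4.32 Gbits/sec    0             sender
[  5]   0.00-10.00  sec  5.02 GBytes  4.31 Gbits/sec                  receiver
EOF
```

For the sample above this prints 4.31 Gbits/sec. The sender line is skipped on purpose, since the receiver figure is the amount of data that actually made it through the proxy.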

Test go-shadowsocks2

Start the Shadowsocks server.

ss-go -s "ss://AEAD_CHACHA20_POLY1305:passwd@:8488"

Then, launch iperf3.

iperf3 -c 127.0.0.1 -p 1090

Output:

[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  5.15 GBytes  4.43 Gbits/sec    2             sender
[  5]   0.00-10.00  sec  5.14 GBytes  4.41 Gbits/sec                  receiver

It is only slightly faster, about 2% on the receiver side (4.41 vs. 4.31 Gbits/sec). There is really no reason to prefer it, as shadowsocks-libev is much more stable.

Test shadowsocks-rust

Start the Shadowsocks server.

ssserver-rust --single-threaded -m chacha20-ietf-poly1305 -s "[::]:8488" -k passwd

Then, launch iperf3.

iperf3 -c 127.0.0.1 -p 1090

Output:

[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  6.12 GBytes  5.26 Gbits/sec    2             sender
[  5]   0.00-10.03  sec  6.12 GBytes  5.25 Gbits/sec                  receiver

This is over 20%(!!!) faster than shadowsocks-libev (5.25 vs. 4.31 Gbits/sec), which completely exceeds my expectations. Keep in mind that this test server has a Skylake CPU with AES-NI, AVX2, and all the other modern instructions.

On this test server, 5.25 Gbps easily saturates the Ethernet link, so the 20% does not make much difference. On bandwagonhost, the slower CPUs are similar to Sandy Bridge models, which makes this 20% uplift much more important: it can be the difference between 500 Mbps and 600 Mbps.
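The arithmetic behind these figures can be checked with a few lines of awk. This is only a sketch: the function names speedup_pct and project are made up for this example, the inputs are the receiver bitrates measured above, and carrying the same ratio over to a slower CPU is an assumption, not a measurement.

```shell
#!/bin/sh
# speedup_pct BASE NEW: percentage by which NEW exceeds BASE, one decimal.
speedup_pct() {
    awk -v base="$1" -v new="$2" 'BEGIN { printf "%.1f\n", (new - base) / base * 100 }'
}

# project BASE PCT: BASE scaled up by PCT percent, rounded to an integer.
project() {
    awk -v base="$1" -v pct="$2" 'BEGIN { printf "%.0f\n", base * (1 + pct / 100) }'
}

speedup_pct 4.31 5.25   # shadowsocks-rust vs. shadowsocks-libev, prints 21.8
speedup_pct 4.31 4.41   # go-shadowsocks2 vs. shadowsocks-libev, prints 2.3
project 500 20          # a baseline of 500 with a 20% uplift, prints 600
```

The units cancel out in speedup_pct, so it works for any pair of bitrates as long as both are expressed in the same unit.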

Conclusion

shadowsocks-libev’s server implementation is actually the slowest, and I have no idea why. Maybe shadowsocks-libev is aggressively optimized for client use at the expense of server use? Given how popular it is on servers, that is really bad. Moreover, it lacks AES acceleration on ARM64. All of this makes shadowsocks-rust an attractive choice.

go-shadowsocks2 is a bit faster than shadowsocks-libev. I guess that is fine, since it is used in a lot of Go proxy software.

shadowsocks-rust is the fastest, which is no surprise. Rust is fast, the crypto library ring is fast, and Rust benefits from LLVM's optimizations.

So to maximize your proxy server’s throughput, shadowsocks-rust should be the first choice. It is updated frequently, performs better, and should have fewer segfaults.

Next time, I will test whether optimizing for the target CPU instead of generic x86_64 yields better performance.
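For reference, one common way to build a Rust project for the build machine's own CPU is rustc's target-cpu=native codegen option, passed through the RUSTFLAGS environment variable. The sketch below assumes a local checkout of the shadowsocks-rust source and a working cargo toolchain. The function name build_native is made up for this example, and this is not necessarily how the follow-up test will be done.

```shell
#!/bin/sh
# build_native: hypothetical helper, meant to be run from inside a checkout
# of the shadowsocks-rust source tree (assumed to exist locally).
# RUSTFLAGS is passed to every rustc invocation cargo makes, and
# "-C target-cpu=native" lets rustc use all instructions of the build host.
build_native() {
    RUSTFLAGS="-C target-cpu=native" cargo build --release
}

# Usage, from the source directory:
#   build_native
```

A binary built this way may crash with an illegal-instruction error on a CPU older than the build machine's, so it should be built on the server it will run on, or on a machine with the same CPU model.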