On Mon, 2021-06-28 at 12:23 +0100, David Woodhouse wrote:
>
> To be clear: from the point of view of my *application* I don't care
> about any of this; my only motivation here is to clean up the kernel
> behaviour and make life easier for potential future users. I have found
> a setup that works in today's kernels (even though I have to disable
> XDP, and have to use a virtio header that I don't want), and will stick
> with that for now, if I actually commit it to my master branch at all:
> https://gitlab.com/openconnect/openconnect/-/commit/0da4fe43b886403e6
>
> I might yet abandon it because I haven't *yet* seen it go any faster
> than the code which just does read()/write() on the tun device from
> userspace. And without XDP or zerocopy it's not clear that it could
> ever give me any benefit that I couldn't achieve purely in userspace by
> having a separate thread to do tun device I/O. But we'll see...

I managed to do some proper testing, between EC2 c5 (Skylake) virtual
instances.

The kernel on a c5.metal can transmit (AES128-SHA1) ESP at about
1.2Gb/s from iperf, as it seems to be doing it all from the iperf
thread.

Before I started messing with OpenConnect, it could transmit 1.6Gb/s.

When I pull in the 'stitched' AES+SHA code from OpenSSL instead of
doing the encryption and the HMAC in separate passes, I get to
2.1Gb/s.

Adding vhost support on top of that takes me to 2.46Gb/s, which is a
decent enough win. That's with OpenConnect taking 100% CPU, iperf3
taking 50% of another one, and the vhost kernel thread taking ~20%.
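
For anyone who wants to play with the 'stitched' mode: OpenSSL exposes
it through the EVP layer as the AES-128-CBC-HMAC-SHA1 "cipher". The
sketch below is the generic TLS-record-style usage of that interface,
not necessarily how it ends up wired into OpenConnect's ESP path, and
all of the key/buffer names are made up for illustration:

	#include <stddef.h>
	#include <openssl/evp.h>

	/*
	 * Minimal sketch of OpenSSL's stitched AES-128-CBC + HMAC-SHA1
	 * cipher, driven through the EVP interface in the generic
	 * TLS-record style. Key, IV and buffer names are illustrative.
	 */
	static int stitched_encrypt(const unsigned char aes_key[16],
				    const unsigned char mac_key[20],
				    const unsigned char iv[16],
				    unsigned char aad[13], /* seq(8) | type | ver(2) | len(2) */
				    unsigned char *buf,    /* plaintext in, ciphertext out */
				    size_t plen)           /* plaintext length */
	{
		/* Returns NULL unless the AES-NI stitched code is usable */
		const EVP_CIPHER *ciph = EVP_aes_128_cbc_hmac_sha1();
		EVP_CIPHER_CTX *ctx;
		int pad, ret = -1;

		if (!ciph || !(ctx = EVP_CIPHER_CTX_new()))
			return -1;

		if (!EVP_EncryptInit_ex(ctx, ciph, NULL, aes_key, iv) ||
		    !EVP_CIPHER_CTX_ctrl(ctx, EVP_CTRL_AEAD_SET_MAC_KEY, 20,
					 (void *)mac_key))
			goto out;

		/*
		 * Describe the record. The control returns how many bytes
		 * of MAC and CBC padding will be appended, so buf needs
		 * plen + pad bytes of space.
		 */
		pad = EVP_CIPHER_CTX_ctrl(ctx, EVP_CTRL_AEAD_TLS1_AAD, 13, aad);
		if (pad < 0)
			goto out;

		/* One call: AES-CBC and SHA-1 interleaved over the data */
		if (EVP_Cipher(ctx, buf, buf, (unsigned int)(plen + pad)) <= 0)
			goto out;

		ret = (int)(plen + pad);
	out:
		EVP_CIPHER_CTX_free(ctx);
		return ret;
	}

The win comes from the interleaved assembly doing the AES-CBC rounds
and the SHA-1 block function in a single pass over the buffer, rather
than the two separate passes of EVP_EncryptUpdate() followed by
HMAC_Update().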