Hi there, my name is Thomas Gschwantner and this is my report on the work I did
on WireGuard for GSoC 2018.

Before GSoC had properly started, Jason asked us to work on a small fix regarding
the endianness of the trie used to store IPs in `allowedips.c`.  Previously,
they were stored in network byte order (big endian), meaning that on
little-endian systems there were unnecessary conversions.  The main point of
this was probably to get everyone up and running in terms of setting up and
working with the WireGuard codebase.
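
To illustrate why the byte order matters: a trie lookup repeatedly asks how many
leading bits two keys share.  The sketch below is not the code from
`allowedips.c`; the function names are made up, and it assumes GCC or Clang (for
`__builtin_clz`) and a 32-bit `unsigned int`.  With keys kept in host order the
comparison is a plain XOR, while keys kept in network order need a conversion on
every comparison.

```c
#include <assert.h>
#include <stdint.h>
#include <arpa/inet.h> /* ntohl, htonl */

/* Assumption of this sketch: __builtin_clz operates on a 32-bit unsigned int. */
static_assert(sizeof(unsigned int) == 4, "sketch assumes 32-bit unsigned int");

/* Number of leading bits two IPv4 keys share; both keys are in host order. */
static unsigned common_bits_host(uint32_t a, uint32_t b)
{
	uint32_t diff = a ^ b;

	/* __builtin_clz(0) is undefined, so identical keys are handled here. */
	return diff ? (unsigned)__builtin_clz(diff) : 32;
}

/* Same question for keys in network order: convert on every call. */
static unsigned common_bits_net(uint32_t a_be, uint32_t b_be)
{
	return common_bits_host(ntohl(a_be), ntohl(b_be));
}
```

Converting a key once, when it enters the trie or when a lookup starts, means
every node visited afterwards can use the first function.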

The next task was developing and testing a replacement for the ring buffer
implementation in the kernel ([ptr_ring.h]) that is lock-free.  For that we
consulted a userspace implementation called [Concurrency Kit].  Since the task
was rather hard but would result in relatively little code, Jason suggested
Jonathan Neuschäfer and I work on it separately at the start, merging the code
later, which we did.  My own version, before any merging happened, can be found
[here][mpmc_premerge].

The task proved challenging, because the Concurrency Kit implementation makes
heavy use of macros and uses different semantics for atomic operations and
memory barriers than the Linux kernel.  Another problem we ran into was kernel
deadlocks that only happened in specific situations while testing.  We
eventually determined that the cause was unfortunate scheduling of the kthreads
by the kernel, which left concurrent producers stuck waiting for each other for
a long time.  This was solved by disabling [preemption].
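
The stall described above is easier to see in code.  The following is a
simplified sketch of a multi-producer ring in which producers first reserve a
slot and then publish in order; it is assumed here to resemble the design the
work started from, and it is not the code from the linked branches.  It uses C11
atomics with the default sequentially consistent ordering instead of kernel
primitives, and `RING_SIZE` and all function names are made up for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 8u /* hypothetical capacity, must be a power of two */
#define RING_MASK (RING_SIZE - 1u)

static_assert((RING_SIZE & RING_MASK) == 0, "RING_SIZE must be a power of two");

struct ring {
	atomic_uint p_head; /* next position a producer may reserve */
	atomic_uint p_tail; /* positions below this are visible to consumers */
	atomic_uint c_head; /* next position a consumer may take */
	_Atomic(void *) slots[RING_SIZE];
};

static void ring_init(struct ring *r)
{
	atomic_init(&r->p_head, 0);
	atomic_init(&r->p_tail, 0);
	atomic_init(&r->c_head, 0);
	for (size_t i = 0; i < RING_SIZE; i++)
		atomic_init(&r->slots[i], NULL);
}

static bool ring_produce(struct ring *r, void *item)
{
	unsigned pos = atomic_load(&r->p_head);

	for (;;) {
		unsigned cons = atomic_load(&r->c_head);

		if (pos - cons < RING_SIZE) {
			/* Reserve position pos; on failure pos is reloaded. */
			if (atomic_compare_exchange_weak(&r->p_head, &pos, pos + 1))
				break;
		} else {
			/* Looks full; only trust that if p_head did not move. */
			unsigned now = atomic_load(&r->p_head);

			if (now == pos)
				return false;
			pos = now;
		}
	}

	atomic_store(&r->slots[pos & RING_MASK], item);

	/*
	 * Publish in order: wait until every earlier producer has published.
	 * If one of them is scheduled out between reserving and publishing,
	 * all later producers spin here until it runs again.
	 */
	while (atomic_load(&r->p_tail) != pos)
		;
	atomic_store(&r->p_tail, pos + 1);
	return true;
}

static bool ring_consume(struct ring *r, void **out)
{
	unsigned pos = atomic_load(&r->c_head);

	for (;;) {
		if (pos == atomic_load(&r->p_tail))
			return false; /* nothing published at this position */

		/* Read first, then claim; a failed claim discards the read. */
		void *item = atomic_load(&r->slots[pos & RING_MASK]);

		if (atomic_compare_exchange_weak(&r->c_head, &pos, pos + 1)) {
			*out = item;
			return true;
		}
	}
}
```

The spin loop in `ring_produce` is the spot where a producer that loses the CPU
holds up everyone behind it, which is why keeping producers from being preempted
between the reservation and the publish step shortens the wait.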

While the final implementation beat the existing ring buffer on raw
produce/consume performance, when benchmarking WireGuard as a whole it performed
the same or slightly worse.  The reason for this is likely the way WireGuard
processes packets, which consumes items from the queue one-by-one.  As a result,
the [final version] has not been merged yet.  This may change in the future,
however, if the consumers can be made to run multithreaded.

Next we worked on making WireGuard take advantage of [NAPI], a kernel-internal
API designed to reduce the overhead of receiving packets.  I worked on this for
some time, but Jonathan ended up being faster than me with his
[solution][jonathan_napi].  My (naturally unfinished) version can be found
[here][tharre_napi].  This task involved **a lot** of reading kernel code to
figure out how NAPI works, since all available documentation is either outdated
or simply nonexistent.  I also ended up doing a lot of benchmarking, since many
of the solutions we came up with had unfavourable performance characteristics.
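
As a rough illustration of the contract a NAPI poll function follows, here is a
userspace model.  `struct fake_napi` and `fake_poll` are hypothetical stand-ins
and not kernel code: the real callback receives a `struct napi_struct *` and a
budget, returns the amount of work done, and calls `napi_complete_done()` when
it finishes with less work than the budget allowed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a NAPI context and its packet queue. */
struct fake_napi {
	int pending;    /* packets waiting to be processed */
	bool scheduled; /* true while polling is still requested */
};

/* Process at most budget packets and report how many were handled. */
static int fake_poll(struct fake_napi *n, int budget)
{
	int work = 0;

	assert(budget > 0);
	while (work < budget && n->pending > 0) {
		n->pending--; /* stands in for handling one packet */
		work++;
	}

	/* Used less than the budget: the queue is drained, stop polling. */
	if (work < budget)
		n->scheduled = false;
	return work;
}
```

With ten pending packets and a budget of four, this model is polled three times:
two full rounds, then a final round that handles the last two packets and stops.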

While working on this we also bumped into an interesting problem with NAPI
that involved [napi_hash], a hashtable used by [busy polling].  Because we used
one napi_struct per WireGuard peer, there was concern that this hashtable would
blow up in larger deployments.  I ended up solving this problem, after much
research, in a [deceptively simple commit][napi_busy_fix].

Lastly, I worked on implementing a new socket option, `SO_ZERO_ON_FREE`, that
would cause all sk_buffs to be securely zeroed out when freed.  While I got a
simple version of this working very quickly, I could only make it work on
`AF_UNIX` and `AF_INET`, but not on `AF_ALG` and other netlink sockets, which I
assumed would be the primary use case for this option.  The reason for that was,
as I found out after much debugging and digging in crypto/, that the
corresponding kernel code wasn't even using sk_buffs.  Instead, the code would
lock the socket directly, and then use [memcpy_to_msg] to copy the data to
userspace.

The code for this can be found [here][skb_memzero]; however, it is not quite
finished, as I haven't yet been able to find a more general way of setting the
`zero_on_free` bit on sk_buffs for all socket types.
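
The idea behind a zero-on-free flag can be shown with a small userspace
analogue.  This is only a sketch: `struct buf`, `secure_zero` and `buf_free` are
made-up names, and the kernel patch works on sk_buffs rather than on heap
buffers like these.  The write through a `volatile` pointer is there to keep the
compiler from dropping the clearing as a dead store just before `free()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical buffer carrying a per-buffer "zero on free" flag. */
struct buf {
	unsigned char *data;
	size_t len;
	bool zero_on_free;
};

/* Clear memory through a volatile pointer so the stores are not elided. */
static void secure_zero(void *p, size_t n)
{
	volatile unsigned char *v = p;

	while (n--)
		*v++ = 0;
}

/* Release the buffer, wiping its contents first if the flag is set. */
static void buf_free(struct buf *b)
{
	assert(b != NULL);
	if (b->zero_on_free && b->data)
		secure_zero(b->data, b->len);
	free(b->data);
	b->data = NULL;
	b->len = 0;
}
```

The flag only helps if every path that carries the sensitive data goes through
such a buffer, which is the difficulty described above for sockets that do not
use sk_buffs at all.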

# Conclusion

Working on WireGuard was definitely a ton of fun and I also learned **a lot**
about kernel development.  Thanks a lot to Jason A. Donenfeld for providing
mentorship, and for his work on WireGuard in general, which made all of this
possible.

Also check out the reports of my fellow GSoC students, Gauvain
[Roussel-Tarbouriech] and [Jonathan Neuschäfer].

[ptr_ring.h]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/ptr_ring.h
[Concurrency Kit]: http://concurrencykit.org/
[mpmc_premerge]: https://git.zx2c4.com/WireGuard/log/?h=tg/mpmc_ring
[preemption]: https://git.zx2c4.com/WireGuard/commit/?h=tg/mpmc-benchmark&id=9ced23773059b0e96daba64dd12ac60327d0d14e
[final version]: https://git.zx2c4.com/WireGuard/log/?h=tg/mpmc-benchmark
[napi]: https://wiki.linuxfoundation.org/networking/napi
[jonathan_napi]: https://git.zx2c4.com/WireGuard/commit/?id=6008eacbf2c7a5f31b0c9d5d0a629cbdfbb8f222
[tharre_napi]: https://gist.githubusercontent.com/Tharre/11e11afb99ddb6095d4de8269fe8ea72/raw/0da04f14634cfb3e7f7520307363d0a5dc6f2bfe/wireguard_gsoc_napi.diff
[napi_hash]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/core/dev.c?id=7effaf06c3cdef6855e127886c7405b9ab62f90d#n197
[busy polling]: https://netdevconf.org/2.1/slides/apr6/dumazet-BUSY-POLLING-Netdev-2.1.pdf
[napi_busy_fix]: https://git.zx2c4.com/WireGuard/commit/?id=fe5f0f661797b41648eac64b40e5038b25175047
[memcpy_to_msg]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/crypto/algif_hash.c#n226
[skb_memzero]: https://git.zx2c4.com/linux-dev/commit/?h=tg/skb_memzero
[Roussel-Tarbouriech]: https://govanify.com/post/gsoc-wireguard/
[Jonathan Neuschäfer]: https://gist.github.com/neuschaefer/e752af1bbdd057704e02e282e3e082c5

-- 
PGP fingerprint: 42CE 7698 D6A0 6129 AA16  EF5C 5431 BDE2 C8F0 B2F4