From: Raoul Bhatia <raoul.bhatia@radarcs.com>
To: "wireguard@lists.zx2c4.com" <wireguard@lists.zx2c4.com>
Cc: Velimir Iveljic <velimir.iveljic@radarcs.com>
Subject: Wireguard connection lost between peers
Date: Thu, 29 Apr 2021 10:30:04 +0000
Message-ID: <655D0D0D-4650-4A45-864E-5BDC6C5A76AE@radarcs.com>
Dear List,
We are experiencing an unusual issue where WireGuard connectivity between peers suddenly stops working.
The connection itself seems to be up, but the peers cannot communicate with each other (more details below).
Any insight would be greatly appreciated.
Software versions:
- Debian Stretch 4.9.0-11-amd64 (4.9.189-3+deb9u2)
- LXC version: 2.0.7
- Wireguard: 1.0.20210124 (from buster-backports)
Environment:
A Debian host serves as the LXC hypervisor for unprivileged containers.
WireGuard is used as the network layer for the containers, which means that on the host we create a physical WG interface for each container.
Inside the containers, we run a distributed cluster that spans multiple containers on multiple physical servers interconnected via 10G links (i.e. 6 physical servers w/ 8-10 containers each).
So the network load can get comparatively high, on average 500Mbit/s with peaks of ~3Gbit/s.
Problem description:
The outlined setup works fine in most cases.
Occasionally, however, one container completely loses connectivity, and is not reachable _even_ from the underlying host.
We cannot determine what triggers this, but we have observed it happening when network traffic is high.
NOTE: We also had a similar (same?) issue between two physical hosts.
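For spotting an affected peer from the host, a minimal sketch follows. It assumes (we have not confirmed this) that the failure shows up as a stale or missing handshake in `wg show wg0 latest-handshakes`, which prints one `public-key<TAB>unix-timestamp` line per peer. The helper name `stale_peers` and the 180-second threshold are illustrative choices, not part of wireguard-tools.

```shell
#!/bin/sh
# stale_peers: hypothetical helper, not part of wireguard-tools.
# Reads "public-key<TAB>unix-timestamp" lines on stdin (the format printed by
# `wg show <iface> latest-handshakes`) and prints the public key of every peer
# whose last handshake is older than max_age seconds, or that never completed
# a handshake (timestamp 0).
stale_peers() {
    max_age=${1:-180}          # assumed threshold in seconds
    now=${2:-$(date +%s)}      # overridable so the check is reproducible
    awk -v now="$now" -v max="$max_age" \
        '$2 == 0 || now - $2 > max { print $1 }'
}
```

Possible usage on the host: `wg show wg0 latest-handshakes | stale_peers 180`.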
So far we identified two ways to restore the service:
1. Restart the wg-quick@wg0 service on the host, which is _not_ sustainable because this resets the connectivity of all containers, impacting the cluster.
2. Dump the WG conf, manually remove the unreachable peer public key from the interface, and then re-sync the dumped conf.
--- SNIP ---
$ wg-quick strip wg0 > wg0_peers
$ wg show wg0 dump
$ wg set wg0 peer $PEER_PUB_KEY remove
$ wg syncconf wg0 wg0_peers
--- SNIP ---
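The second workaround could be wrapped in a small helper along these lines. This is only a sketch of the same commands: `reset_peer` is a hypothetical name, and the `WG` / `WG_QUICK` variables are an assumption added so the commands can be substituted when trying it outside production.

```shell
#!/bin/sh
# reset_peer: hypothetical helper wrapping the manual workaround.
# Removes a single peer from the interface and re-adds it from the stripped
# wg-quick configuration, leaving all other peers untouched.
# Usage: reset_peer <interface> <peer-public-key>
reset_peer() {
    iface=$1
    peer_pub_key=$2
    wg_cmd=${WG:-wg}                    # assumption: overridable for dry runs
    wg_quick_cmd=${WG_QUICK:-wg-quick}  # assumption: overridable for dry runs

    conf=$(mktemp) || return 1

    # Save the wg-quick configuration in the plain format that wg(8) accepts.
    "$wg_quick_cmd" strip "$iface" > "$conf" || { rm -f "$conf"; return 1; }

    # Drop only the unreachable peer, then restore it from the saved config.
    "$wg_cmd" set "$iface" peer "$peer_pub_key" remove &&
        "$wg_cmd" syncconf "$iface" "$conf"
    status=$?

    rm -f "$conf"
    return "$status"
}
```

It would be called as `reset_peer wg0 "$PEER_PUB_KEY"`, with root privileges on the host.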
Additional notes:
1. So far, we have not managed to reproduce the issue in a test environment.
2. We cannot easily upgrade the versions that we run in production.
3. We suspected that the time settings on the host could be the cause, so we made sure timesyncd is configured properly. Since then, we observe the issue less frequently, but it is not fully gone.
4. The kernel log with dynamic debug enabled doesn't provide much information beyond handshake requests being sent and received, keepalive packets, and keypairs being re-created.
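For reference, dynamic debug for the module can be switched on roughly as sketched below. This assumes a kernel built with CONFIG_DYNAMIC_DEBUG and debugfs mounted at /sys/kernel/debug; the helper name `enable_wg_debug` and its optional path argument are illustrative only.

```shell
#!/bin/sh
# enable_wg_debug: hypothetical helper that turns on dynamic debug messages
# for the wireguard module by writing to the dynamic_debug control file.
# The optional argument overrides the control file path (assumed default:
# debugfs mounted at /sys/kernel/debug).
enable_wg_debug() {
    control=${1:-/sys/kernel/debug/dynamic_debug/control}
    # "+p" enables printing for every debug call site in the module.
    echo 'module wireguard +p' > "$control"
}
```

After running `enable_wg_debug` as root, the messages appear in the kernel log (e.g. via `dmesg`); writing `module wireguard -p` to the same file turns them off again.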
WireGuard module version
---
filename: /lib/modules/4.9.0-11-amd64/updates/dkms/wireguard.ko
intree: Y
alias: net-pf-16-proto-16-family-wireguard
alias: rtnl-link-wireguard
version: 1.0.20210124
author: Jason A. Donenfeld <Jason@zx2c4.com>
description: WireGuard secure network tunnel
license: GPL v2
srcversion: 507BE23A7368F016AEBAF94
depends: udp_tunnel,ip6_udp_tunnel
retpoline: Y
vermagic: 4.9.0-11-amd64 SMP mod_unload modversions
---
** The issue was also observed on a (now decommissioned) Debian host running 4.9.0-14-amd64 (4.9.246-2) with the same WireGuard module version (1.0.20210124).
Any insight would be appreciated. Happy to share more debug information as requested / offline.
Thanks,
Raoul
PS. Please reply to me / reply all, as I am currently not subscribed to the mailing list.