Re: Wireguard connection lost between peers

* Re: Wireguard connection lost between peers
@ 2021-05-12  5:19 Raoul Bhatia
  2021-05-30 13:20 ` Raoul Bhatia
  0 siblings, 1 reply; 4+ messages in thread
From: Raoul Bhatia @ 2021-05-12  5:19 UTC
  To: Jason A. Donenfeld; +Cc: wireguard


Hi Jason

Apologies for taking some time to get back to you.
We tried to verify a few things and to see whether we could spot anything
unusual, and waited for a few more instances to occur so that we had
sufficient data.

> That's surprising behavior. Thanks for debugging it. Can you see if
> you can reproduce with dynamic logging enabled? That'll give some
> useful information in dmesg:
>
>     # modprobe wireguard && echo module wireguard +p > /sys/kernel/debug/dynamic_debug/control

I enabled the debug control and also set
  sysctl -w net.core.message_cost=0
to disable rate limiting of the kernel's network log messages,
and have extracted a sample of the issue.
Please find it here: https://nem3d.net/wireguard_20210512a.txt

From my observation, the symptoms are always the following:
1. Everything is WORKING:
LXC container d1-h sends handshake initiation.
Host wg0 receives, re-creates keypair, answers
d1-h receives, re-creates keypair, sends keepalive
wg0 receives keepalive
etc.

2. At some point it BREAKS:
d1-h stops hearing back after 15 seconds.
The handshake initiation loop runs again as described above.
d1-h stops hearing back after 15 seconds.
etc.
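
One way to spot the broken state without reading dmesg might be to watch
the age of the latest handshake. The following is only a sketch: it assumes
"wg show <interface> latest-handshakes" prints one tab-separated
"public-key timestamp" pair per peer, and both the helper name stale_peers
and the 180-second threshold are illustrative choices, not values taken
from the log above.

```shell
# Hypothetical helper: reads "pubkey<TAB>unix-timestamp" lines on stdin and
# prints the keys of peers whose last handshake is older than max_age seconds.
# A timestamp of 0 means no handshake has completed yet.
# $1 = current time as a Unix timestamp, $2 = max_age in seconds (assumption)
stale_peers() {
    awk -v now="$1" -v max="$2" '$2 == 0 || now - $2 > max { print $1 }'
}

# Example usage (interface name is a placeholder):
#   wg show wg0 latest-handshakes | stale_peers "$(date +%s)" 180
```

Running this inside the container's network namespace should list the host
peer once the loop from step 2 has started.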

As mentioned, the resolution is to dump the config,
remove the peer, and run syncconf to restore it.
This time, I used "nsenter -n" to apply this procedure to the
unprivileged container interface d1-h.
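
For reference, the workaround can be sketched roughly as below. The helper
name, the container PID, the peer key and the config path are all
placeholders, not values from our setup. The function only prints the
commands, so they can be reviewed before being run as root.

```shell
# Hypothetical helper: prints the reset sequence instead of running it.
# $1 = PID of a process inside the container (placeholder)
# $2 = WireGuard interface
# $3 = public key of the peer to drop and restore
# $4 = file that receives the dumped config (it will contain the private key)
wg_reset_peer_cmds() {
    ns="nsenter -n -t $1"
    printf '%s\n' \
        "$ns wg showconf $2 > $4" \
        "$ns wg set $2 peer $3 remove" \
        "$ns wg syncconf $2 $4"
}

# Example with placeholder values; append "| sh" as root to execute:
wg_reset_peer_cmds 4242 wg0 'PEER_PUBLIC_KEY_BASE64=' /root/wg0.conf
```

Since "nsenter -n" only switches the network namespace, the dumped file
stays on the host filesystem and the host's wg binary is used.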

Lastly, we also saw similar behavior between two physical hosts.
I will try to gather the same kind of debug information there.

Please let me know if further information is needed to
better understand the problem.

Raoul
