* Should I expect faster recovery after one side goes down
@ 2017-11-27  9:49 Bruno Wolff III
  2017-11-27 11:04 ` Jason A. Donenfeld
  0 siblings, 1 reply; 11+ messages in thread
From: Bruno Wolff III @ 2017-11-27  9:49 UTC (permalink / raw)
  To: WireGuard mailing list

I'm not sure what is really going on but I have seen some very long delays 
after one side of the link goes down, while the other keeps sending 
packets. The workaround is to restart the local side once the remote side 
is back up.

When I do some testing and, say, reboot the router that a wg tunnel 
terminates at while continuing to use the laptop at the other end, then 
after the router is back up very little traffic seems to get through, or 
there is very high latency. Restarting the iptables service with systemd 
also hangs. I don't know if that is forever or just a very long time. If I 
restart wireguard on the laptop (which deletes and recreates the device), 
things start working normally again.

Is there some information I can collect that will illuminate what is going 
on here?


* Re: Should I expect faster recovery after one side goes down
  2017-11-27  9:49 Should I expect faster recovery after one side goes down Bruno Wolff III
@ 2017-11-27 11:04 ` Jason A. Donenfeld
  2017-11-27 13:49   ` Bruno Wolff III
  0 siblings, 1 reply; 11+ messages in thread
From: Jason A. Donenfeld @ 2017-11-27 11:04 UTC (permalink / raw)
  To: Bruno Wolff III; +Cc: WireGuard mailing list

Hi Bruno,

The first question is - how long?

Jason


* Re: Should I expect faster recovery after one side goes down
  2017-11-27 11:04 ` Jason A. Donenfeld
@ 2017-11-27 13:49   ` Bruno Wolff III
  2017-11-27 17:33     ` Bruno Wolff III
  0 siblings, 1 reply; 11+ messages in thread
From: Bruno Wolff III @ 2017-11-27 13:49 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: WireGuard mailing list

On Mon, Nov 27, 2017 at 12:04:06 +0100,
  "Jason A. Donenfeld" <Jason@zx2c4.com> wrote:
>Hi Bruno,
>
>The first question is - how long?

For "systemctl stop iptables" I have waited around a minute before 
pressing Ctrl-C. After running "systemctl stop wireguard" or 
"systemctl restart wireguard" (either of which deletes wg0), "systemctl stop 
iptables" runs with no noticeable delay.

For network traffic, I waited around 10 minutes and things were still not 
working. Web page loads would still time out after a minute or two. But 
I did have a few DNS lookups succeed. I'm not sure if I did something 
that allowed a value to get cached (there is a local caching resolver 
on the affected machines) or if a response eventually made it through. 
After "systemctl restart wireguard" things start working normally right 
away. So I don't know the delay for specific traffic, but it looks to 
be at least a minute for most traffic. The problem does not seem to 
resolve for at least 10 minutes, though I don't think I have ever seen it 
resolve on its own.


* Re: Should I expect faster recovery after one side goes down
  2017-11-27 13:49   ` Bruno Wolff III
@ 2017-11-27 17:33     ` Bruno Wolff III
  2017-11-27 17:36       ` Jason A. Donenfeld
  0 siblings, 1 reply; 11+ messages in thread
From: Bruno Wolff III @ 2017-11-27 17:33 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: WireGuard mailing list

On Mon, Nov 27, 2017 at 07:49:14 -0600,
  Bruno Wolff III <bruno@wolff.to> wrote:
>On Mon, Nov 27, 2017 at 12:04:06 +0100,
> "Jason A. Donenfeld" <Jason@zx2c4.com> wrote:
>>Hi Bruno,
>>
>>The first question is - how long?

This might be related to the amount or type of traffic backed up. The two 
machines where this was very noticeable in testing had all of their traffic 
routed through the tunnel other than the encapsulating packets. (DNS traffic 
gets tunnelled.) Playing with this on my work machine where only traffic 
destined for a few specific hosts was tunnelled, I am finding it hard to 
duplicate the problem.


* Re: Should I expect faster recovery after one side goes down
  2017-11-27 17:33     ` Bruno Wolff III
@ 2017-11-27 17:36       ` Jason A. Donenfeld
  2017-11-27 18:25         ` Bruno Wolff III
  0 siblings, 1 reply; 11+ messages in thread
From: Jason A. Donenfeld @ 2017-11-27 17:36 UTC (permalink / raw)
  To: Bruno Wolff III; +Cc: WireGuard mailing list

Hello Bruno,

That's some pretty weird behavior, and it sounds like whatever the
cause is is being obscured under layers of systemd. Perhaps come on
into #wireguard on Freenode and we can debug this in real time? I've
got a few ideas.

Jason


* Re: Should I expect faster recovery after one side goes down
  2017-11-27 17:36       ` Jason A. Donenfeld
@ 2017-11-27 18:25         ` Bruno Wolff III
  2017-11-28  6:13           ` Bruno Wolff III
  0 siblings, 1 reply; 11+ messages in thread
From: Bruno Wolff III @ 2017-11-27 18:25 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: WireGuard mailing list

On Mon, Nov 27, 2017 at 18:36:23 +0100,
  "Jason A. Donenfeld" <Jason@zx2c4.com> wrote:
>Hello Bruno,
>
>That's some pretty weird behavior, and it sounds like whatever the
>cause is is being obscured under layers of systemd. Perhaps come on
>into #wireguard on Freenode and we can debug this in real time? I've
>got a few ideas.

I don't have my laptop with me at work and breaking the wireguard tunnel 
to it will break my access to it from here. I could check configuration 
from work. I can bring it with me tomorrow, otherwise I'll probably get 
home too late for you tonight and I probably won't be up late enough 
to catch you early tomorrow. 

Probably I'll be able to reproduce the issue from work.


* Re: Should I expect faster recovery after one side goes down
  2017-11-27 18:25         ` Bruno Wolff III
@ 2017-11-28  6:13           ` Bruno Wolff III
  2017-11-28  6:44             ` Bruno Wolff III
  0 siblings, 1 reply; 11+ messages in thread
From: Bruno Wolff III @ 2017-11-28  6:13 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: WireGuard mailing list

I'm pretty sure I'm being bitten by firewall rules on the router. It seems 
to be rejecting all of the tunnel packets, and since it has no reason to 
try to connect to the laptop, the handshake never occurs again. I suspect 
that normally a connection established/related rule lets things through. I 
just need to figure out how the startup packet is different so that it gets 
through. The systemd iptables service does eventually seem to stop. Probably 
there is a DNS request that needs to time out.

I do some source address rewriting and it may be that the initial addresses 
used for the encapsulating packets are different than the ones later.
So most likely this is all on my end and not wireguard related.
Thanks for the tcpdump suggestion. I should have tried that sooner.
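
For readers following along, a capture of roughly this shape is one way to
see whether the encapsulating UDP packets reach the router and which
addresses and ports they carry. It is only a sketch: the interface name and
the peer address are placeholders, not values from this thread, and the
script prints the command instead of running it unless APPLY=1 is set.

```shell
#!/bin/sh
# Assumptions (not from the thread): capture interface and peer address.
IFACE="${IFACE:-eth0}"
PEER_ADDR="${PEER_ADDR:-192.0.2.10}"

# Print the command by default; execute it only when APPLY=1.
run() {
    if [ "${APPLY:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi
}

capture_cmd() {
    # -n disables name resolution, so addresses and ports stay numeric
    # and the capture itself does not wait on DNS.
    run tcpdump -n -i "$IFACE" udp and host "$PEER_ADDR"
}

capture_cmd
```

Running the same capture on both ends and comparing the source and
destination ports shows where packets stop arriving.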


* Re: Should I expect faster recovery after one side goes down
  2017-11-28  6:13           ` Bruno Wolff III
@ 2017-11-28  6:44             ` Bruno Wolff III
  2017-11-28  8:42               ` Bruno Wolff III
  0 siblings, 1 reply; 11+ messages in thread
From: Bruno Wolff III @ 2017-11-28  6:44 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: WireGuard mailing list

On Tue, Nov 28, 2017 at 00:13:06 -0600,
  Bruno Wolff III <bruno@wolff.to> wrote:
>I do some source address rewriting and it may be that the initial 
>addresses used for the encapsulating packets are different than the 
>ones later.

When I'm on the local network, 192.168.6.1 gets used for the initial 
source address and gets rewritten to 98.103.208.26 in order to make 
the source consistent for the laptop whether or not it is on the 
local network. (That way I don't need to allow connections from 
192.168.6.1 somewhere else where it wouldn't be my router.) When this 
happens the source port seems to normally get changed. Wireguard on the 
laptop remembers the new source port and tries to keep using it after 
the router is rebooted. But during the reboot the router forgets about 
the port mapping so it ends up dropping the packets. It has no reason 
to send packets on its own to the laptop (and wouldn't know where to 
send them) so the port doesn't get corrected.
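
The remembered endpoint described above can be inspected on the laptop with
the wg tool. This is a sketch, assuming the device is named wg0 as mentioned
earlier in the thread; it prints the commands unless APPLY=1 is set.

```shell
#!/bin/sh
# wg0 is the device name mentioned earlier in the thread; override if needed.
WG_IF="${WG_IF:-wg0}"

# Print the command by default; execute it only when APPLY=1.
run() {
    if [ "${APPLY:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi
}

show_peer_state() {
    # Address and port the local side currently sends to, per peer.
    run wg show "$WG_IF" endpoints
    # Time of the last completed handshake per peer (seconds since the epoch).
    run wg show "$WG_IF" latest-handshakes
}

show_peer_state
```

If the endpoint port differs from the port the router actually listens on
and the handshake timestamp stops advancing, that matches the stale-port
situation described here.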

I think the correct fix is to know if I reboot the router for testing 
something, I need to also restart wireguard to make sure it is sending 
data to the expected port. This isn't going to be an issue in normal 
operation.


* Re: Should I expect faster recovery after one side goes down
  2017-11-28  6:44             ` Bruno Wolff III
@ 2017-11-28  8:42               ` Bruno Wolff III
  2017-12-01  8:43                 ` Baptiste Jonglez
  0 siblings, 1 reply; 11+ messages in thread
From: Bruno Wolff III @ 2017-11-28  8:42 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: WireGuard mailing list

On Tue, Nov 28, 2017 at 00:44:13 -0600,
  Bruno Wolff III <bruno@wolff.to> wrote:
>
>I think the correct fix is to know if I reboot the router for testing 
>something, I need to also restart wireguard to make sure it is sending 
>data to the expected port. This isn't going to be an issue in normal 
>operation.

I found a way to make it work more automatically. The reason the port 
was getting reassigned was because the original connection packet was 
being tracked and was conflicting with the source nat mapping even though 
in reality the connection was the same. By putting in CT --notrack rules 
I was able to block that tracking and without the conflict the port doesn't 
get remapped. I don't need tracking or the original connection for my 
firewall rules so this should be OK. On testing it seems to work as 
expected. Now when I reboot my router, my laptop reconnects and the wireguard 
tunnel works without having to restart it.
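
The actual rules are not shown in this thread. As a rough illustration
only, a rule of the following shape keeps conntrack from tracking incoming
tunnel packets. The interface name and port are hypothetical, and matching
only the inbound direction is an assumption made so that source NAT on the
outgoing packets keeps working. The CT target is used in the raw table. The
script prints the commands unless APPLY=1 is set.

```shell
#!/bin/sh
# Hypothetical values, not taken from the thread.
LAN_IF="${LAN_IF:-eth1}"      # LAN-facing interface on the router
WG_PORT="${WG_PORT:-51820}"   # UDP port WireGuard listens on

# Print the command by default; execute it only when APPLY=1 (needs root).
run() {
    if [ "${APPLY:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi
}

add_notrack_rule() {
    # Exempt inbound tunnel packets from connection tracking.
    run iptables -t raw -A PREROUTING -i "$LAN_IF" -p udp \
        --dport "$WG_PORT" -j CT --notrack
}

list_raw_rules() {
    # Show the PREROUTING rules of the raw table for verification.
    run iptables -t raw -S PREROUTING
}

add_notrack_rule
list_raw_rules
```

Any filter rule that relies on connection state for these packets would
need an explicit accept rule instead, since untracked packets do not match
established/related.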


* Re: Should I expect faster recovery after one side goes down
  2017-11-28  8:42               ` Bruno Wolff III
@ 2017-12-01  8:43                 ` Baptiste Jonglez
  2017-12-01 17:02                   ` Bruno Wolff III
  0 siblings, 1 reply; 11+ messages in thread
From: Baptiste Jonglez @ 2017-12-01  8:43 UTC (permalink / raw)
  To: wireguard


Hi,

On 28-11-17, Bruno Wolff III wrote:
> On Tue, Nov 28, 2017 at 00:44:13 -0600,
>  Bruno Wolff III <bruno@wolff.to> wrote:
> >
> >I think the correct fix is to know if I reboot the router for testing
> >something, I need to also restart wireguard to make sure it is sending
> >data to the expected port. This isn't going to be an issue in normal
> >operation.

It sounds like one of these situations where persistent keepalives would
be useful, doesn't it?

This way the laptop would create a new binding in your firewall.
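
For reference, a persistent keepalive can be set on the laptop side roughly
as follows. This is a sketch: the peer key is a placeholder, wg0 is the
device name mentioned earlier in the thread, and 25 seconds is the interval
commonly suggested in the WireGuard documentation rather than a value from
this thread. The script prints the command unless APPLY=1 is set.

```shell
#!/bin/sh
WG_IF="${WG_IF:-wg0}"
# Placeholder: replace with the router's real public key.
PEER_PUBKEY="${PEER_PUBKEY:-<router-public-key>}"
KEEPALIVE="${KEEPALIVE:-25}"   # seconds between keepalive packets

# Print the command by default; execute it only when APPLY=1 (needs root).
run() {
    if [ "${APPLY:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi
}

set_keepalive() {
    run wg set "$WG_IF" peer "$PEER_PUBKEY" persistent-keepalive "$KEEPALIVE"
}

set_keepalive
```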

> I found a way to make it work more automatically. The reason the port was
> getting reassigned was because the original connection packet was being
> tracked and was conflicting with the source nat mapping even though in
> reality the connection was the same. By putting in CT --notrack rules I was
> able to block that tracking and without the conflict the port doesn't get
> remapped. I don't need tracking or the original connection for my firewall
> rules so this should be OK. On testing it seems to work as expected. Now
> when I reboot my router, my laptop reconnects and the wireguard tunnel works
> without having to restart it.



* Re: Should I expect faster recovery after one side goes down
  2017-12-01  8:43                 ` Baptiste Jonglez
@ 2017-12-01 17:02                   ` Bruno Wolff III
  0 siblings, 0 replies; 11+ messages in thread
From: Bruno Wolff III @ 2017-12-01 17:02 UTC (permalink / raw)
  To: Baptiste Jonglez; +Cc: wireguard

On Fri, Dec 01, 2017 at 09:43:19 +0100,
  Baptiste Jonglez <baptiste@bitsofnetworks.org> wrote:
>
>It sounds like one of these situations where persistent keepalives would
>be useful, doesn't it?

It is definitely useful, as the laptop is expected to be behind NAT, but it 
doesn't help when rebooting the router breaks the earlier source NAT 
mapping (on my local network) while the other end remembers where it last 
got a packet from. The solution below dealt with that.

>
>> I found a way to make it work more automatically. The reason the port was
>> getting reassigned was because the original connection packet was being
>> tracked and was conflicting with the source nat mapping even though in
>> reality the connection was the same. By putting in CT --notrack rules I was
>> able to block that tracking and without the conflict the port doesn't get
>> remapped. I don't need tracking or the original connection for my firewall
>> rules so this should be OK. On testing it seems to work as expected. Now
>> when I reboot my router, my laptop reconnects and the wireguard tunnel works
>> without having to restart it.

