* Another roaming problem
@ 2018-03-08 14:29 Toke Høiland-Jørgensen
  2018-03-08 14:49 ` Matthias Urlichs
  2018-03-08 16:18 ` Jason A. Donenfeld
  0 siblings, 2 replies; 18+ messages in thread
From: Toke Høiland-Jørgensen @ 2018-03-08 14:29 UTC (permalink / raw)
  To: wireguard

So I ran into another roaming problem, which I thought I'd open a
separate thread for.

Basically, the problem is this: I run wireguard on my gateway router,
which I then connect to from road warriors (laptop, phone) and tunnel
all my traffic through it.

This works well, except that when the client is connected to the local
network (behind the gateway router), it'll start talking to the internal
interface of the gateway device, and so the client will change its idea
of the endpoint address to the internal (private) address. And so, when
I leave the local network, it can no longer reach the server, and I have
to restart the wireguard interface on the client to get connectivity.

So is there a way to either tell the client not to change its idea of
the endpoint, or to tell the server to always use a certain source
address for outgoing packets?

-Toke

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Another roaming problem
  2018-03-08 14:29 Another roaming problem Toke Høiland-Jørgensen
@ 2018-03-08 14:49 ` Matthias Urlichs
  2018-03-08 16:18 ` Jason A. Donenfeld
  1 sibling, 0 replies; 18+ messages in thread
From: Matthias Urlichs @ 2018-03-08 14:49 UTC (permalink / raw)
  To: wireguard

On 08.03.2018 15:29, Toke Høiland-Jørgensen wrote:
> tell the server to always use a certain source
> address for outgoing packets

You can do that with a SNAT firewall rule.

-- 
-- Matthias Urlichs

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Another roaming problem
  2018-03-08 14:29 Another roaming problem Toke Høiland-Jørgensen
  2018-03-08 14:49 ` Matthias Urlichs
@ 2018-03-08 16:18 ` Jason A. Donenfeld
  2018-03-08 16:59   ` Toke Høiland-Jørgensen
  1 sibling, 1 reply; 18+ messages in thread
From: Jason A. Donenfeld @ 2018-03-08 16:18 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen; +Cc: WireGuard mailing list

Hi Toke,

On Thu, Mar 8, 2018 at 3:29 PM, Toke Høiland-Jørgensen <toke@toke.dk> wrote:
> So is there a way to either tell the client not to change its idea of
> the endpoint, or to tell the server to always use a certain source
> address for outgoing packets?

There have been some discussions on adding another [gasp] knob to clamp
an endpoint, for this reason and some other related ones. But the
source address caching is supposed to be sticky. That is -- it's
supposed to be that WireGuard will use the correct source address
based on the prior incoming packet. I can try to reproduce to see
if perhaps you're uncovering some incorrect behavior here. More
generally speaking, it seems like this problem is occurring for you
because of NAT and so I wonder if a simpler solution would also
involve NAT -- namely, configuring "hair pin" NAT?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Another roaming problem
  2018-03-08 16:18 ` Jason A. Donenfeld
@ 2018-03-08 16:59   ` Toke Høiland-Jørgensen
  2018-03-08 17:02     ` Jason A. Donenfeld
  0 siblings, 1 reply; 18+ messages in thread
From: Toke Høiland-Jørgensen @ 2018-03-08 16:59 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: WireGuard mailing list



On 8 March 2018 17:18:47 CET, "Jason A. Donenfeld" <Jason@zx2c4.com> wrote:
>Hi Toke,
>
>On Thu, Mar 8, 2018 at 3:29 PM, Toke Høiland-Jørgensen <toke@toke.dk>
>wrote:
>> So is there a way to either tell the client not to change its idea of
>> the endpoint, or to tell the server to always use a certain source
>> address for outgoing packets?
>
>There have been some discussions on adding another [gasp] knob to clamp
>an endpoint, for this reason and some other related ones. But the
>source address caching is supposed to be sticky. That is -- it's
>supposed to be that WireGuard will use the correct source address
>based on the prior incoming packet. I can try to reproduce to see
>if perhaps you're uncovering some incorrect behavior here. More
>generally speaking, it seems like this problem is occurring for you
>because of NAT

Well, in the sense that this wouldn't be a problem if there was no NAT on
the internet, sure...

But other than that, how is it related to NAT?

> and so I wonder if a simpler solution would also
>involve NAT -- namely, configuring "hair pin" NAT?

What's that?

-Toke

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Another roaming problem
  2018-03-08 16:59   ` Toke Høiland-Jørgensen
@ 2018-03-08 17:02     ` Jason A. Donenfeld
  2018-03-08 17:23       ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 18+ messages in thread
From: Jason A. Donenfeld @ 2018-03-08 17:02 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen; +Cc: WireGuard mailing list

On Thu, Mar 8, 2018 at 5:59 PM, Toke Høiland-Jørgensen <toke@toke.dk> wrote:
>> and so I wonder if a simpler solution would also
>>involve NAT -- namely, configuring "hair pin" NAT?
>
> What's that?

It's the terrible vendor term for hitting the gateway through one of
its IPs (say, the public one) and having it forward packets for you to
another machine on the same LAN. The idea here is that you'd get to
keep using the same IP address for communicating, even when you're
behind NAT in the private network. (This seems to work well for me at
my house.)

Wikipedia describes it in terms of the p2p discovery issue, which is
slightly different, but still the same underlying concept:
https://en.wikipedia.org/wiki/Hairpinning
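
As a rough illustration of the hairpin idea described above, here is a minimal Python sketch of the address rewriting a gateway might do. Everything in it is an assumption made for illustration: the `Packet` type, the `hairpin` function, the port-forward table and the example addresses are hypothetical and do not come from any real router implementation.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src: str    # source IP address
    dst: str    # destination IP address
    dport: int  # destination port


def hairpin(pkt, gw_public, gw_lan, forwards):
    """Rewrite a LAN packet that targets the gateway's public IP (toy model)."""
    target = forwards.get(pkt.dport)
    if pkt.dst != gw_public or target is None:
        return pkt  # not a forwarded port on the public IP: leave untouched
    # Forward to the internal host, and use the gateway's LAN address as the
    # source so that replies come back through the gateway.
    return replace(pkt, src=gw_lan, dst=target)
```

A LAN client can then keep addressing the public IP, and the gateway loops the packet back to the internal machine that owns the forwarded port.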

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Another roaming problem
  2018-03-08 17:02     ` Jason A. Donenfeld
@ 2018-03-08 17:23       ` Toke Høiland-Jørgensen
  2018-03-08 17:39         ` Jason A. Donenfeld
  0 siblings, 1 reply; 18+ messages in thread
From: Toke Høiland-Jørgensen @ 2018-03-08 17:23 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: WireGuard mailing list

"Jason A. Donenfeld" <Jason@zx2c4.com> writes:

> On Thu, Mar 8, 2018 at 5:59 PM, Toke Høiland-Jørgensen <toke@toke.dk> wrote:
>>> and so I wonder if a simpler solution would also
>>>involve NAT -- namely, configuring "hair pin" NAT?
>>
>> What's that?
>
> It's the terrible vendor term for hitting the gateway through one of
> its IPs (say, the public one) and having it forward packets for you to
> another machine on the same LAN. The idea here is that you'd get to
> keep using the same IP address for communicating, even when you're
> behind NAT in the private network. (This seems to work well for me at
> my house.)
>
> Wikipedia describes it in terms of the p2p discovery issue, which is
> slightly different, but still the same underlying concept:
> https://en.wikipedia.org/wiki/Hairpinning

Ah, right. In that case I think I probably didn't explain my setup well
enough. Let me try again:


I have a gateway device with two interfaces, one public and one private.
This device performs NAT, and is also the one running wireguard (as the
'server'). The client roams. So I have two cases:


C (public IP) --- (public IP) GW (private IP) -- [LAN]

In this case, C talks to GW on GW's public IP; everything works fine.

Second case:

[internet] --- (public IP) GW (private IP) -- [LAN] -- C (private IP)

Here, C talks to GW; it still tries to send packets to the public IP of
GW (because that is what it's configured to do), but because GW sees
that the source IP is on its internal subnet, it replies with a source
address in the private subnet. This works fine as long as the client is
on the LAN; but once it roams outside, it now thinks that the wireguard
server lives on the private IP of the GW, which it obviously can't reach
from its shiny new public IP.

So what I'd want to happen is that GW should keep using its public
IP as the source of the wireguard packets, even when talking to a client
on a directly-connected internal subnet. Or, alternatively, that C
should ignore the source address change of the packets coming from GW
and keep sending its packets to the public IP it was first configured to
use...

This is all orthogonal to NAT, as far as I can tell :)
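
The two cases above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not WireGuard's code: the `Peer` class, its `receive` method and the 203.0.113.1 / 192.168.1.1 addresses are invented for illustration, and the only behaviour modelled is the roaming rule that the endpoint follows the source address of the latest valid packet.

```python
class Peer:
    """Toy model of a roaming client: the endpoint follows the last valid packet."""

    def __init__(self, configured_endpoint):
        self.endpoint = configured_endpoint

    def receive(self, src_addr):
        # Roaming rule: remember where the latest valid packet came from.
        self.endpoint = src_addr


GW_PUBLIC = "203.0.113.1"   # assumed example address for GW's public side
GW_PRIVATE = "192.168.1.1"  # assumed example address for GW's LAN side

client = Peer(GW_PUBLIC)

# Case 1: client on the internet, GW replies from its public IP.
client.receive(GW_PUBLIC)
endpoint_on_internet = client.endpoint

# Case 2: client on the LAN, GW replies from its private IP.
client.receive(GW_PRIVATE)
endpoint_after_lan = client.endpoint
```

After the second case the client's endpoint is the private address, which is exactly what it cannot reach once it leaves the LAN.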

-Toke

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Another roaming problem
  2018-03-08 17:23       ` Toke Høiland-Jørgensen
@ 2018-03-08 17:39         ` Jason A. Donenfeld
  2018-03-08 17:50           ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 18+ messages in thread
From: Jason A. Donenfeld @ 2018-03-08 17:39 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen; +Cc: WireGuard mailing list

Hi Toke,

On Thu, Mar 8, 2018 at 6:23 PM, Toke Høiland-Jørgensen <toke@toke.dk> wrote:
>
> I have a gateway device with two interfaces, one public and one private.
> This device performs NAT, and is also the one running wireguard (as the
> 'server'). The client roams. So I have two cases:
>
>
> C (public IP) --- (public IP) GW (private IP) -- [LAN]
>
> In this case, C talks to GW on GW's public IP; everything works fine.
>
> Second case:
>
> [internet] --- (public IP) GW (private IP) -- [LAN] -- C (private IP)
>
> Here, C talks to GW; it still tries to send packets to the public IP of
> GW (because that is what it's configured to do), but because GW sees
> that the source IP is on its internal subnet, it replies with a source
> address in the private subnet. This works fine as long as the client is
> on the LAN; but once it roams outside, it now thinks that the wireguard
> server lives on the private IP of the GW, which it obviously can't reach
> from its shiny new public IP.
>
> So what I'd want to happen is that GW should keep using its public
> IP as the source of the wireguard packets, even when talking to a client
> on a directly-connected internal subnet. Or, alternatively, that C
> should ignore the source address change of the packets coming from GW
> and keep sending its packets to the public IP it was first configured to
> use...
>

In this case, WireGuard is indeed supposed to make the right decision.
Namely, it should continue replying using the correct source address.
It's not supposed to switch to the internal one. I have the exact same
setup at home, so I just tried things out again to verify, and from my
end it seems to be working fine:

zx2c4@thinkpad ~ $ wg
interface: martino
  public key: 4HUj8boJyeZI70WVxmKhHfGAohtoyFQpWk96OpuFcVY=
  private key: (hidden)
  listening port: 53249
  fwmark: 0xca6c

peer: GMvmorUa9WzHAkOVOxQKSrw3F1JruA4bTN1NkWN0T3E=
  preshared key: (hidden)
  endpoint: 129.228.12.33:10000
  allowed ips: 0.0.0.0/0, ::/0
  latest handshake: 48 seconds ago
  transfer: 1.06 KiB received, 19.50 KiB sent
zx2c4@thinkpad ~ $ ip link set wwan0 down
zx2c4@thinkpad ~ $ ip link set wlan0 up
zx2c4@thinkpad ~ $ pingg
PING google.com (172.217.19.142) 56(84) bytes of data.
64 bytes from mrs08s04-in-f14.1e100.net (172.217.19.142): icmp_seq=1 ttl=53 time=20.1 ms
64 bytes from mrs08s04-in-f14.1e100.net (172.217.19.142): icmp_seq=2 ttl=53 time=19.1 ms
^C
--- google.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 19.181/19.666/20.151/0.485 ms
zx2c4@thinkpad ~ $ wg
interface: martino
  public key: 4HUj8boJyeZI70WVxmKhHfGAohtoyFQpWk96OpuFcVY=
  private key: (hidden)
  listening port: 53249
  fwmark: 0xca6c

peer: GMvmorUa9WzHAkOVOxQKSrw3F1JruA4bTN1NkWN0T3E=
  preshared key: (hidden)
  endpoint: 129.228.12.33:10000
  allowed ips: 0.0.0.0/0, ::/0
  latest handshake: 5 seconds ago
  transfer: 113.70 KiB received, 85.43 KiB sent

I wonder what might be different about your configuration...

Jason

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Another roaming problem
  2018-03-08 17:39         ` Jason A. Donenfeld
@ 2018-03-08 17:50           ` Toke Høiland-Jørgensen
  2018-03-08 18:03             ` Jason A. Donenfeld
  0 siblings, 1 reply; 18+ messages in thread
From: Toke Høiland-Jørgensen @ 2018-03-08 17:50 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: WireGuard mailing list



On 8 March 2018 18:39:15 CET, "Jason A. Donenfeld" <Jason@zx2c4.com> wrote:
>Hi Toke,
>
>On Thu, Mar 8, 2018 at 6:23 PM, Toke Høiland-Jørgensen <toke@toke.dk>
>wrote:
>>
>> I have a gateway device with two interfaces, one public and one private.
>> This device performs NAT, and is also the one running wireguard (as the
>> 'server'). The client roams. So I have two cases:
>>
>>
>> C (public IP) --- (public IP) GW (private IP) -- [LAN]
>>
>> In this case, C talks to GW on GW's public IP; everything works fine.
>>
>> Second case:
>>
>> [internet] --- (public IP) GW (private IP) -- [LAN] -- C (private IP)
>>
>> Here, C talks to GW; it still tries to send packets to the public IP of
>> GW (because that is what it's configured to do), but because GW sees
>> that the source IP is on its internal subnet, it replies with a source
>> address in the private subnet. This works fine as long as the client is
>> on the LAN; but once it roams outside, it now thinks that the wireguard
>> server lives on the private IP of the GW, which it obviously can't reach
>> from its shiny new public IP.
>>
>> So what I'd want to happen is that GW should keep using its public
>> IP as the source of the wireguard packets, even when talking to a client
>> on a directly-connected internal subnet. Or, alternatively, that C
>> should ignore the source address change of the packets coming from GW
>> and keep sending its packets to the public IP it was first configured to
>> use...
>>
>
>In this case, WireGuard is indeed supposed to make the right decision.
>Namely, it should continue replying using the correct source address.
>It's not supposed to switch to the internal one. I have the exact same
>setup at home, so I just tried things out again to verify, and from my
>end it seems to be working fine:
>
>zx2c4@thinkpad ~ $ wg
>interface: martino
>  public key: 4HUj8boJyeZI70WVxmKhHfGAohtoyFQpWk96OpuFcVY=
>  private key: (hidden)
>  listening port: 53249
>  fwmark: 0xca6c
>
>peer: GMvmorUa9WzHAkOVOxQKSrw3F1JruA4bTN1NkWN0T3E=
>  preshared key: (hidden)
>  endpoint: 129.228.12.33:10000
>  allowed ips: 0.0.0.0/0, ::/0
>  latest handshake: 48 seconds ago
>  transfer: 1.06 KiB received, 19.50 KiB sent
>zx2c4@thinkpad ~ $ ip link set wwan0 down
>zx2c4@thinkpad ~ $ ip link set wlan0 up
>zx2c4@thinkpad ~ $ pingg
>PING google.com (172.217.19.142) 56(84) bytes of data.
>64 bytes from mrs08s04-in-f14.1e100.net (172.217.19.142): icmp_seq=1 ttl=53 time=20.1 ms
>64 bytes from mrs08s04-in-f14.1e100.net (172.217.19.142): icmp_seq=2 ttl=53 time=19.1 ms
>^C
>--- google.com ping statistics ---
>2 packets transmitted, 2 received, 0% packet loss, time 1001ms
>rtt min/avg/max/mdev = 19.181/19.666/20.151/0.485 ms
>zx2c4@thinkpad ~ $ wg
>interface: martino
>  public key: 4HUj8boJyeZI70WVxmKhHfGAohtoyFQpWk96OpuFcVY=
>  private key: (hidden)
>  listening port: 53249
>  fwmark: 0xca6c
>
>peer: GMvmorUa9WzHAkOVOxQKSrw3F1JruA4bTN1NkWN0T3E=
>  preshared key: (hidden)
>  endpoint: 129.228.12.33:10000
>  allowed ips: 0.0.0.0/0, ::/0
>  latest handshake: 5 seconds ago
>  transfer: 113.70 KiB received, 85.43 KiB sent
>
>I wonder what might be different about your configuration...

Well, I do generally set up routing in a somewhat unusual manner.

I can try to capture some packet dumps tomorrow to poke into it a bit more.
Anything in particular I should look for?

-Toke

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Another roaming problem
  2018-03-08 17:50           ` Toke Høiland-Jørgensen
@ 2018-03-08 18:03             ` Jason A. Donenfeld
  2018-03-09 10:08               ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 18+ messages in thread
From: Jason A. Donenfeld @ 2018-03-08 18:03 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen; +Cc: WireGuard mailing list

On Thu, Mar 8, 2018 at 6:50 PM, Toke Høiland-Jørgensen <toke@toke.dk> wrote:
> Well, I do generally set up routing in a somewhat unusual manner.
>
> I can try to capture some packet dumps tomorrow to poke into it a bit more.
> Anything in particular I should look for?

One thing to examine is when WireGuard calls
`socket_clear_peer_endpoint_src'. This makes wireguard forget the
source address that it should be using and fall back to the default.
You could add a pr_info(...) call in this function. I have an inkling
that I make calls to this function too zealously and in potentially
unneeded places, such as on handshake transmission retries.
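
The sticky-source behaviour and the suggested logging can be modelled roughly as follows. This is a Python sketch under loud assumptions, not the kernel module's C code: `PeerEndpoint`, its methods and the logger name are hypothetical, `clear_src` merely stands in for `socket_clear_peer_endpoint_src`, and the log call plays the role of the suggested pr_info(...).

```python
import logging

log = logging.getLogger("wg-sketch")  # hypothetical logger name


class PeerEndpoint:
    """Toy model of the server's per-peer source address cache."""

    def __init__(self, default_src):
        self.default_src = default_src  # address routing would pick on its own
        self.cached_src = None          # sticky source learned from the peer

    def on_receive(self, local_addr):
        # Remember which local address the peer's packet was sent to.
        self.cached_src = local_addr

    def clear_src(self, caller):
        # Stand-in for socket_clear_peer_endpoint_src, with logging added.
        log.info("clearing cached source, called from %s", caller)
        self.cached_src = None

    def reply_src(self):
        # Fall back to the default when nothing is cached.
        return self.cached_src or self.default_src
```

In this model, a clear that fires while the peer sits on the LAN makes the next reply leave from the default (private) address, which is the symptom reported in this thread.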

I'm headed out of town super soon, so likely debugging this will have
to wait until I'm back, but do let me know what you find, and we'll
get this fixed up upon return.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Another roaming problem
  2018-03-08 18:03             ` Jason A. Donenfeld
@ 2018-03-09 10:08               ` Toke Høiland-Jørgensen
  2018-03-09 14:32                 ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 18+ messages in thread
From: Toke Høiland-Jørgensen @ 2018-03-09 10:08 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: WireGuard mailing list

"Jason A. Donenfeld" <Jason@zx2c4.com> writes:

> On Thu, Mar 8, 2018 at 6:50 PM, Toke Høiland-Jørgensen <toke@toke.dk> wrote:
>> Well, I do generally set up routing in a somewhat unusual manner.
>>
>> I can try to capture some packet dumps tomorrow to poke into it a bit more.
>> Anything in particular I should look for?
>
> One thing to examine is when WireGuard calls
> `socket_clear_peer_endpoint_src'. This makes wireguard forget the
> source address that it should be using and fall back to the default.
> You could add a pr_info(...) call in this function. I have an inkling
> that I make calls to this function too zealously and in potentially
> unneeded places, such as on handshake transmission retries.
>
> I'm headed out of town super soon, so likely debugging this will have
> to wait until I'm back, but do let me know what you find, and we'll
> get this fixed up upon return.

Well, completely failed to reproduce it; everything works as it's
supposed to now (wireguard correctly picks the public IP as its source
address when replying to the client).

Not sure if I have changed something in my setup or what is going on;
but at least I can roam now, so I'm happy ;)

-Toke

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Another roaming problem
  2018-03-09 10:08               ` Toke Høiland-Jørgensen
@ 2018-03-09 14:32                 ` Toke Høiland-Jørgensen
  2018-03-09 14:35                   ` Jason A. Donenfeld
  2018-03-09 14:39                   ` Toke Høiland-Jørgensen
  0 siblings, 2 replies; 18+ messages in thread
From: Toke Høiland-Jørgensen @ 2018-03-09 14:32 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: WireGuard mailing list

Toke Høiland-Jørgensen <toke@toke.dk> writes:

> "Jason A. Donenfeld" <Jason@zx2c4.com> writes:
>
>> On Thu, Mar 8, 2018 at 6:50 PM, Toke Høiland-Jørgensen <toke@toke.dk> wrote:
>>> Well, I do generally set up routing in a somewhat unusual manner.
>>>
>>> I can try to capture some packet dumps tomorrow to poke into it a bit more.
>>> Anything in particular I should look for?
>>
>> One thing to examine is when WireGuard calls
>> `socket_clear_peer_endpoint_src'. This makes wireguard forget the
>> source address that it should be using and fall back to the default.
>> You could add a pr_info(...) call in this function. I have an inkling
>> that I make calls to this function too zealously and in potentially
>> unneeded places, such as on handshake transmission retries.
>>
>> I'm headed out of town super soon, so likely debugging this will have
>> to wait until I'm back, but do let me know what you find, and we'll
>> get this fixed up upon return.
>
> Well, completely failed to reproduce it; everything works as it's
> supposed to now (wireguard correctly picks the public IP as its source
> address when replying to the client).
>
> Not sure if I have changed something in my setup or what is going on;
> but at least I can roam now, so I'm happy ;)

Scratch that, it's still happening; just not straight away upon roaming.
It is definitely a timeout thing; installed a kprobe on the function you
mentioned and got this stack trace when it switches IP:

TIME(s)            FUNCTION
104.999884129      socket_clear_peer_endpoint_src
	socket_clear_peer_endpoint_src
	expired_new_handshake
	call_timer_fn
	run_timer_softirq
	__do_softirq
	irq_exit
	smp_apic_timer_interrupt
	__irqentry_text_start
	cpuidle_enter_state
	do_idle
	cpu_startup_entry
	start_secondary
	secondary_startup_64


Think it may be related to powersave on the phone or something? Doesn't
seem to happen with my laptop at least...

-Toke

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Another roaming problem
  2018-03-09 14:32                 ` Toke Høiland-Jørgensen
@ 2018-03-09 14:35                   ` Jason A. Donenfeld
  2018-03-09 14:42                     ` Toke Høiland-Jørgensen
  2018-03-09 14:39                   ` Toke Høiland-Jørgensen
  1 sibling, 1 reply; 18+ messages in thread
From: Jason A. Donenfeld @ 2018-03-09 14:35 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen; +Cc: WireGuard mailing list

Hi Toke,

That all makes sense. I'm going out of town extremely soon, but I'll
fix this when I've returned. I have a pretty good idea of what's
required. If you're curious to try it yourself, just try removing
invocations of socket_clear_peer_endpoint_src inside timers.c.

Jason

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Another roaming problem
  2018-03-09 14:32                 ` Toke Høiland-Jørgensen
  2018-03-09 14:35                   ` Jason A. Donenfeld
@ 2018-03-09 14:39                   ` Toke Høiland-Jørgensen
  2018-03-09 14:41                     ` Jason A. Donenfeld
  1 sibling, 1 reply; 18+ messages in thread
From: Toke Høiland-Jørgensen @ 2018-03-09 14:39 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: WireGuard mailing list

Toke Høiland-Jørgensen <toke@toke.dk> writes:

> Toke Høiland-Jørgensen <toke@toke.dk> writes:
>
>> "Jason A. Donenfeld" <Jason@zx2c4.com> writes:
>>
>>> On Thu, Mar 8, 2018 at 6:50 PM, Toke Høiland-Jørgensen <toke@toke.dk> wrote:
>>>> Well, I do generally set up routing in a somewhat unusual manner.
>>>>
>>>> I can try to capture some packet dumps tomorrow to poke into it a bit more.
>>>> Anything in particular I should look for?
>>>
>>> One thing to examine is when WireGuard calls
>>> `socket_clear_peer_endpoint_src'. This makes wireguard forget the
>>> source address that it should be using and fall back to the default.
>>> You could add a pr_info(...) call in this function. I have an inkling
>>> that I make calls to this function too zealously and in potentially
>>> unneeded places, such as on handshake transmission retries.
>>>
>>> I'm headed out of town super soon, so likely debugging this will have
>>> to wait until I'm back, but do let me know what you find, and we'll
>>> get this fixed up upon return.
>>
>> Well, completely failed to reproduce it; everything works as it's
>> supposed to now (wireguard correctly picks the public IP as its source
>> address when replying to the client).
>>
>> Not sure if I have changed something in my setup or what is going on;
>> but at least I can roam now, so I'm happy ;)
>
> Scratch that, it's still happening; just not straight away upon roaming.
> It is definitely a timeout thing; installed a kprobe on the function you
> mentioned and got this stack trace when it switches IP:
>
> TIME(s)            FUNCTION
> 104.999884129      socket_clear_peer_endpoint_src
> 	socket_clear_peer_endpoint_src
> 	expired_new_handshake
> 	call_timer_fn
> 	run_timer_softirq
> 	__do_softirq
> 	irq_exit
> 	smp_apic_timer_interrupt
> 	__irqentry_text_start
> 	cpuidle_enter_state
> 	do_idle
> 	cpu_startup_entry
> 	start_secondary
> 	secondary_startup_64

And leaving it running a bit more, there is also a call from
expired_retransmit_handshake:

449.079751015      socket_clear_peer_endpoint_src
	socket_clear_peer_endpoint_src
	expired_retransmit_handshake
	call_timer_fn
	run_timer_softirq
	__do_softirq
	irq_exit
	smp_apic_timer_interrupt
	__irqentry_text_start
	cpuidle_enter_state
	do_idle
	cpu_startup_entry
	start_secondary
	secondary_startup_64


-Toke

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Another roaming problem
  2018-03-09 14:39                   ` Toke Høiland-Jørgensen
@ 2018-03-09 14:41                     ` Jason A. Donenfeld
  2018-03-09 14:46                       ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 18+ messages in thread
From: Jason A. Donenfeld @ 2018-03-09 14:41 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen; +Cc: WireGuard mailing list

On Fri, Mar 9, 2018 at 3:39 PM, Toke Høiland-Jørgensen <toke@toke.dk> wrote:
> And leaving it running a bit more, there is also a call from
> expired_retransmit_handshake:

Yep! These are the two calls in timers.c.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Another roaming problem
  2018-03-09 14:35                   ` Jason A. Donenfeld
@ 2018-03-09 14:42                     ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 18+ messages in thread
From: Toke Høiland-Jørgensen @ 2018-03-09 14:42 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: WireGuard mailing list

"Jason A. Donenfeld" <Jason@zx2c4.com> writes:

> Hi Toke,
>
> That all makes sense. I'm going out of town extremely soon, but I'll
> fix this when I've returned. I have a pretty good idea of what's
> required. If you're curious to try it yourself, just try removing
> invocations of socket_clear_peer_endpoint_src inside timers.c.

Cool! I'll be travelling for a few weeks myself, so no great rush. Not
sure I'll have the time to try out any changes before I leave...

-Toke

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Another roaming problem
  2018-03-09 14:41                     ` Jason A. Donenfeld
@ 2018-03-09 14:46                       ` Toke Høiland-Jørgensen
  2018-03-09 14:48                         ` Jason A. Donenfeld
  0 siblings, 1 reply; 18+ messages in thread
From: Toke Høiland-Jørgensen @ 2018-03-09 14:46 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: WireGuard mailing list

"Jason A. Donenfeld" <Jason@zx2c4.com> writes:

> On Fri, Mar 9, 2018 at 3:39 PM, Toke Høiland-Jørgensen <toke@toke.dk> wrote:
>> And leaving it running a bit more, there is also a call from
>> expired_retransmit_handshake:
>
> Yep! These are the two calls in timers.c.

Right, cool. Kprobes are awesome, BTW (this was my first time trying this):

https://github.com/iovisor/bcc/blob/master/examples/tracing/stacksnoop.py

./stacksnoop.py socket_clear_peer_endpoint_src

and presto; nice stack traces every time they are called :D
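
Once there are more than a couple of traces, it can help to tally which function called the probed one. Below is a small Python sketch that does this for text shaped like the traces posted earlier in this thread. It assumes that format (an unindented timestamp line followed by indented frames, with the probed function as the first frame); `tally_callers` is a hypothetical helper and not part of bcc.

```python
from collections import Counter


def tally_callers(text, target="socket_clear_peer_endpoint_src"):
    """Count the immediate caller of `target` across stacksnoop-style traces."""
    callers = Counter()
    frames = []
    # The trailing sentinel line forces the last trace to be flushed.
    for line in text.splitlines() + ["END"]:
        if line.startswith(("\t", " ")) and line.strip():
            frames.append(line.strip())  # indented line: one stack frame
            continue
        # Any unindented or blank line ends the current trace.
        if len(frames) >= 2 and frames[0] == target:
            callers[frames[1]] += 1
        frames = []
    return callers
```

Fed the two traces from this thread, it should count one call each from `expired_new_handshake` and `expired_retransmit_handshake`.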

-Toke

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Another roaming problem
  2018-03-09 14:46                       ` Toke Høiland-Jørgensen
@ 2018-03-09 14:48                         ` Jason A. Donenfeld
  2018-03-09 14:53                           ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 18+ messages in thread
From: Jason A. Donenfeld @ 2018-03-09 14:48 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen; +Cc: WireGuard mailing list

Neat script, looks pretty easy to use. The wg repo has a kprobes
script too for extracting ephemeral keys from the kernel:

https://git.zx2c4.com/WireGuard/tree/contrib/examples/extract-handshakes

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Another roaming problem
  2018-03-09 14:48                         ` Jason A. Donenfeld
@ 2018-03-09 14:53                           ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 18+ messages in thread
From: Toke Høiland-Jørgensen @ 2018-03-09 14:53 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: WireGuard mailing list

"Jason A. Donenfeld" <Jason@zx2c4.com> writes:

> Neat script, looks pretty easy to use. The wg repo has a kprobes
> script too for extracting ephemeral keys from the kernel:
>
> https://git.zx2c4.com/WireGuard/tree/contrib/examples/extract-handshakes

Neat! Brave new world of debugging ;)

/me goes to write some more printk's


-Toke

^ permalink raw reply	[flat|nested] 18+ messages in thread
