wireguard.lists.zx2c4.com archive mirror
* Mixed MTU hosts on a network
@ 2018-03-16  9:25 Roman Mamedov
  2018-03-16  9:35 ` Matthias Ordner
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Roman Mamedov @ 2018-03-16  9:25 UTC (permalink / raw)
  To: wireguard

Hello,

I have a host which is on PPPoE and has 1492 as underlying MTU.

By default, WireGuard sets the MTU of its interface to 1420 when it starts. All
TCP connections that try to send a stream of data over the WG interface to that
host hang (I test with iperf3).

My first idea was to override the MTU for this specific host via adding a
route:

# ip -6 route add fd39:30::250/128 dev wg0 mtu 1412 metric 1

# ip -6 route | grep ^fd39:30
fd39:30::250 dev wg0  metric 1  mtu 1412
fd39:30::/64 dev wg0  proto kernel  metric 256

# ip route get fd39:30::250
fd39:30::250 from :: dev wg0  src fd39:30::2  metric 1  mtu 1412

However, this does not help at all, even after adding the corresponding route
on the other side, and even when using the "mtu lock" keyword instead of just
"mtu". I am still puzzled why. Any ideas?

=========================================
# iperf3 -c fd39:30::250
Connecting to host fd39:30::250, port 5201
[  4] local fd39:30::2 port 44902 connected to fd39:30::250 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec   474 KBytes  3.88 Mbits/sec    1   1.31 KBytes       
[  4]   1.00-2.00   sec  0.00 Bytes  0.00 bits/sec    1   1.31 KBytes       
[  4]   2.00-3.00   sec  0.00 Bytes  0.00 bits/sec    1   1.31 KBytes       
[  4]   3.00-4.00   sec  0.00 Bytes  0.00 bits/sec    0   1.31 KBytes       
[  4]   4.00-5.00   sec  0.00 Bytes  0.00 bits/sec    1   1.31 KBytes       
[  4]   5.00-6.00   sec  0.00 Bytes  0.00 bits/sec    0   1.31 KBytes       
[  4]   6.00-7.00   sec  0.00 Bytes  0.00 bits/sec    0   1.31 KBytes       
[  4]   7.00-8.00   sec  0.00 Bytes  0.00 bits/sec    0   1.31 KBytes       
[  4]   8.00-9.00   sec  0.00 Bytes  0.00 bits/sec    0   1.31 KBytes       
[  4]   9.00-10.00  sec  0.00 Bytes  0.00 bits/sec    1   1.31 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec   474 KBytes   388 Kbits/sec    5             sender
[  4]   0.00-10.00  sec  0.00 Bytes  0.00 bits/sec                  receiver

iperf Done.
=========================================

The only thing that helps is reducing the MTU of the entire wg0 interface to
1412. Then everything works fine. But it doesn't feel optimal to reduce the MTU
of the entire network just because of 1 or 2 hosts. I would rather use a couple
of those mtu-override routes, if they worked.
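
The 1412 figure comes from subtracting WireGuard's per-packet overhead from
the PPPoE MTU. A minimal sketch of that arithmetic follows. The header sizes
are assumptions not stated in this thread (worst case of an IPv6 outer
header), and the function name is hypothetical:

```python
# Assumed per-packet overhead that WireGuard adds around an inner packet.
OUTER_IP_HEADER = {"ipv4": 20, "ipv6": 40}
UDP_HEADER = 8
WG_DATA_HEADER = 16  # type (4) + receiver index (4) + counter (8)
AUTH_TAG = 16        # Poly1305 authentication tag


def max_tunnel_mtu(underlying_mtu, outer="ipv6"):
    """Largest inner MTU that fits into underlying_mtu after encapsulation."""
    overhead = OUTER_IP_HEADER[outer] + UDP_HEADER + WG_DATA_HEADER + AUTH_TAG
    return underlying_mtu - overhead


print(max_tunnel_mtu(1500))  # Ethernet: matches the 1420 default
print(max_tunnel_mtu(1492))  # PPPoE: the 1412 used above
```

Under these assumptions the result is only an upper bound; it does not
account for any padding WireGuard applies before encryption.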

-- 
With respect,
Roman


* Re: Mixed MTU hosts on a network
  2018-03-16  9:25 Mixed MTU hosts on a network Roman Mamedov
@ 2018-03-16  9:35 ` Matthias Ordner
  2018-03-16 10:53   ` Roman Mamedov
  2018-03-16 10:01 ` Kalin KOZHUHAROV
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 15+ messages in thread
From: Matthias Ordner @ 2018-03-16  9:35 UTC (permalink / raw)
  To: wireguard


Hi Roman,

> When WireGuard starts by default, it sets MTU of its interface to 1420. All
> TCP connections trying to send a stream of data over the WG interface to that
> host, hang up (I test with iperf3).

If you only care about TCP connections you could set a different TCP-MSS 
with an iptables rule.

Best regards

Matthias



* Re: Mixed MTU hosts on a network
  2018-03-16  9:25 Mixed MTU hosts on a network Roman Mamedov
  2018-03-16  9:35 ` Matthias Ordner
@ 2018-03-16 10:01 ` Kalin KOZHUHAROV
  2018-03-26 19:12 ` Luis Ressel
  2018-04-14  1:38 ` Jason A. Donenfeld
  3 siblings, 0 replies; 15+ messages in thread
From: Kalin KOZHUHAROV @ 2018-03-16 10:01 UTC (permalink / raw)
  To: Roman Mamedov; +Cc: WireGuard mailing list

On Fri, Mar 16, 2018 at 10:25 AM, Roman Mamedov <rm.wg@romanrm.net> wrote:
> Hello,
>
> I have a host which is on PPPoE and has 1492 as underlying MTU.
>
> When WireGuard starts by default, it sets MTU of its interface to 1420. All
> TCP connections trying to send a stream of data over the WG interface to that
> host, hang up (I test with iperf3).
>
> My first idea was to override the MTU for this specific host via adding a
> route:
>
> # ip -6 route add fd39:30::250/128 dev wg0 mtu 1412 metric 1
>
> # ip -6 route | grep ^fd39:30
> fd39:30::250 dev wg0  metric 1  mtu 1412
> fd39:30::/64 dev wg0  proto kernel  metric 256
>
> # ip route get fd39:30::250
> fd39:30::250 from :: dev wg0  src fd39:30::2  metric 1  mtu 1412
>
> However, this does not help at all. Even adding the corresponding route on the
> other side. Even using the "mtu lock" keyword instead of just "mtu". I am still
> puzzled why. Any ideas?
>
Isn't it because routing is done by WG itself, based on AllowedIPs, so
that the routing table is not considered at all after the packet is
given to WG?

Those are assumptions of how things work, I haven't looked at the code.

> What helps, is only reducing MTU of the entire wg0 interface to 1412. Then
> everything works fine. But it doesn't feel optimal to reduce MTU of the entire
> network just because of 1 or 2 hosts. I would rather use a couple of those
> mtu-override routes, if they worked.
>
You may need to pre-shape the packets for the "offenders", e.g.

ip6tables -t mangle -A POSTROUTING -o wg0 -d WHATEVERHOST -p tcp -m tcp --tcp-flags SYN,RST SYN -j TCPMSS --set-mss 1352

https://www.netfilter.org/documentation/HOWTO/netfilter-extensions-HOWTO-4.html#ss4.7

O, wait! You talk IPv6...

ip6tables -t mangle -A POSTROUTING -o wg0 -d fd39:30::250/128 -p tcp -m tcp --tcp-flags SYN,RST SYN -j TCPMSS --set-mss 1372

You can also try setting the route MTU as above and then use "... -j
TCPMSS --clamp-mss-to-pmtu", although it may be more work and/or might
not work.
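
For reference, the MSS that corresponds to a given MTU can be worked out by
subtracting the IP and TCP headers. A small sketch, assuming fixed header
sizes and no IPv6 extension headers (the function name is hypothetical):

```python
IPV4_HEADER = 20
IPV6_HEADER = 40
TCP_HEADER = 20  # fixed TCP header, without options


def mss_for_mtu(mtu, ipv6=True):
    """MSS a host would advertise for an interface or route MTU."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    return mtu - ip_header - TCP_HEADER


for mtu in (1420, 1412, 1408):
    print(mtu, mss_for_mtu(mtu), mss_for_mtu(mtu, ipv6=False))
```

For IPv6 this gives 1360, 1352 and 1348, the same MSS values that show up in
the tcpdump output later in this thread. For IPv4 the same MTUs give 1380,
1372 and 1368.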

Cheers,
Kalin.


* Re: Mixed MTU hosts on a network
  2018-03-16  9:35 ` Matthias Ordner
@ 2018-03-16 10:53   ` Roman Mamedov
  2018-03-16 16:20     ` Roman Mamedov
  0 siblings, 1 reply; 15+ messages in thread
From: Roman Mamedov @ 2018-03-16 10:53 UTC (permalink / raw)
  To: Matthias Ordner, Kalin KOZHUHAROV; +Cc: wireguard

[-- Attachment #1: Type: text/plain, Size: 1837 bytes --]

On Fri, 16 Mar 2018 10:35:18 +0100
Matthias Ordner <matthias.ordner@noris.net> wrote:

> If you only care about TCP connections you could set a different TCP-MSS 
> with an iptables rule.

On Fri, 16 Mar 2018 11:01:51 +0100
Kalin KOZHUHAROV <me.kalin@gmail.com> wrote:

> You may need to pre-shape the packets for the "offenders", e.g.
> 
> ip6tables -t mangle -A POSTROUTING -o wg0 -d WHATEVERHOST -p tcp -m
> tcp --tcp-flags SYN,RST SYN -j TCPMSS --set-mss 1352
> 
> https://www.netfilter.org/documentation/HOWTO/netfilter-extensions-HOWTO-4.html#ss4.7
> 
> O, wait! You talk IPv6...
> 
> ip6tables -t mangle -A POSTROUTING -o wg0 -d fd39:30::250/128 -p tcp
> -m tcp --tcp-flags SYN,RST SYN -j TCPMSS --set-mss 1372

I knew about this option, but wanted to avoid it because it would incur more
overhead (going to iptables for this) and a bit more complexity.

But guess what, it turns out that didn't work either. I tried both the OUTPUT
and POSTROUTING chains on the "mangle" table, and set-mss all the way down to
1220; no matter what, the iperf3 output looked the same as before. At this
point I thought I was going crazy or something. :)

It's not just iperf either, trying to send a file with "netcat6" into a
running listener on the other side also failed to transfer data.

Then, almost by accident, I discovered something else that helps: reducing the
interface MTU only on the receiver, but by a bit more, to 1408.

So what makes it work is EITHER:

a) set MTU 1412 on wg0 at sender;

OR

b) set MTU 1408 on wg0 at receiver.

...doing both at the same time is not even necessary. Some tcpdumps from the
receiver host are attached to demonstrate (if anyone else thinks I am crazy :).

Now, I can live with just the impacted (PPPoE) hosts having a lower MTU on wg0.

But still the whole thing seems rather weird.

-- 
With respect,
Roman

[-- Attachment #2: mtu-tcpdump.txt --]
[-- Type: text/plain, Size: 5715 bytes --]

Receiver mtu 1420, sender mtu 1412, successful transfer:

# tcpdump -i wg0
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on wg0, link-type RAW (Raw IP), capture size 262144 bytes
15:42:35.027995 IP6 fd39:30::2.42414 > fd39:30::250.5001: Flags [S], seq 4148302601, win 27040, options [mss 1352,sackOK,TS val 2239613851 ecr 0,nop,wscale 9], length 0
15:42:35.028026 IP6 fd39:30::250.5001 > fd39:30::2.42414: Flags [S.], seq 505975510, ack 4148302602, win 26960, options [mss 1360,sackOK,TS val 1473426057 ecr 2239613851,nop,wscale 9], length 0
15:42:35.102517 IP6 fd39:30::2.42414 > fd39:30::250.5001: Flags [.], ack 1, win 53, options [nop,nop,TS val 2239613925 ecr 1473426057], length 0
15:42:35.102772 IP6 fd39:30::2.42414 > fd39:30::250.5001: Flags [.], seq 1:1341, ack 1, win 53, options [nop,nop,TS val 2239613925 ecr 1473426057], length 1340
15:42:35.102785 IP6 fd39:30::250.5001 > fd39:30::2.42414: Flags [.], ack 1341, win 58, options [nop,nop,TS val 1473426131 ecr 2239613925], length 0
15:42:35.102810 IP6 fd39:30::2.42414 > fd39:30::250.5001: Flags [P.], seq 1341:2145, ack 1, win 53, options [nop,nop,TS val 2239613925 ecr 1473426057], length 804
15:42:35.102818 IP6 fd39:30::250.5001 > fd39:30::2.42414: Flags [.], ack 2145, win 64, options [nop,nop,TS val 1473426131 ecr 2239613925], length 0
15:42:35.729846 IP6 fd39:30::250.5001 > fd39:30::2.42162: Flags [F.], seq 1811803733, ack 3749581328, win 56, options [nop,nop,TS val 1473426758 ecr 2239251660,nop,nop,sack 1 {1341:2145}], length 0
15:42:35.804023 IP6 fd39:30::2.42162 > fd39:30::250.5001: Flags [.], ack 1, win 54, options [nop,nop,TS val 2239614627 ecr 1473426758,nop,nop,sack 1 {0:1}], length 0
15:42:36.939584 IP6 fd39:30::2.42414 > fd39:30::250.5001: Flags [F.], seq 2145, ack 1, win 53, options [nop,nop,TS val 2239615763 ecr 1473426131], length 0
15:42:36.939723 IP6 fd39:30::250.5001 > fd39:30::2.42414: Flags [F.], seq 1, ack 2146, win 64, options [nop,nop,TS val 1473427968 ecr 2239615763], length 0
15:42:37.014143 IP6 fd39:30::2.42414 > fd39:30::250.5001: Flags [.], ack 2, win 53, options [nop,nop,TS val 2239615837 ecr 1473427968], length 0
^C
12 packets captured
12 packets received by filter
0 packets dropped by kernel

=======================================================

Receiver mtu 1408, sender mtu 1420, successful transfer:

# tcpdump -i wg0
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on wg0, link-type RAW (Raw IP), capture size 262144 bytes
15:43:23.935508 IP6 fd39:30::2.42442 > fd39:30::250.5001: Flags [S], seq 1011924297, win 27200, options [mss 1360,sackOK,TS val 2239662759 ecr 0,nop,wscale 9], length 0
15:43:23.935541 IP6 fd39:30::250.5001 > fd39:30::2.42442: Flags [S.], seq 1735470303, ack 1011924298, win 26720, options [mss 1348,sackOK,TS val 1473474964 ecr 2239662759,nop,wscale 9], length 0
15:43:24.009867 IP6 fd39:30::2.42442 > fd39:30::250.5001: Flags [.], ack 1, win 54, options [nop,nop,TS val 2239662834 ecr 1473474964], length 0
15:43:24.010192 IP6 fd39:30::2.42442 > fd39:30::250.5001: Flags [.], seq 1:1337, ack 1, win 54, options [nop,nop,TS val 2239662834 ecr 1473474964], length 1336
15:43:24.010203 IP6 fd39:30::250.5001 > fd39:30::2.42442: Flags [.], ack 1337, win 58, options [nop,nop,TS val 1473475039 ecr 2239662834], length 0
15:43:24.010206 IP6 fd39:30::2.42442 > fd39:30::250.5001: Flags [P.], seq 1337:2145, ack 1, win 54, options [nop,nop,TS val 2239662834 ecr 1473474964], length 808
15:43:24.010213 IP6 fd39:30::250.5001 > fd39:30::2.42442: Flags [.], ack 2145, win 63, options [nop,nop,TS val 1473475039 ecr 2239662834], length 0
15:43:26.669491 IP6 fd39:30::2.42442 > fd39:30::250.5001: Flags [F.], seq 2145, ack 1, win 54, options [nop,nop,TS val 2239665493 ecr 1473475039], length 0
15:43:26.669531 IP6 fd39:30::250.5001 > fd39:30::2.42442: Flags [F.], seq 1, ack 2146, win 63, options [nop,nop,TS val 1473477698 ecr 2239665493], length 0
15:43:26.744246 IP6 fd39:30::2.42442 > fd39:30::250.5001: Flags [.], ack 2, win 54, options [nop,nop,TS val 2239665568 ecr 1473477698], length 0
^C
10 packets captured
10 packets received by filter
0 packets dropped by kernel

=======================================================

Receiver mtu 1412, sender mtu 1420, locked-up transfer:

# tcpdump -i wg0
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on wg0, link-type RAW (Raw IP), capture size 262144 bytes
15:44:47.350436 IP6 fd39:30::2.42536 > fd39:30::250.5001: Flags [S], seq 1322713189, win 27200, options [mss 1360,sackOK,TS val 2239746176 ecr 0,nop,wscale 9], length 0
15:44:47.350510 IP6 fd39:30::250.5001 > fd39:30::2.42536: Flags [S.], seq 167242985, ack 1322713190, win 26800, options [mss 1352,sackOK,TS val 1473558379 ecr 2239746176,nop,wscale 9], length 0
15:44:47.424806 IP6 fd39:30::2.42536 > fd39:30::250.5001: Flags [.], ack 1, win 54, options [nop,nop,TS val 2239746250 ecr 1473558379], length 0
15:44:47.425210 IP6 fd39:30::2.42536 > fd39:30::250.5001: Flags [P.], seq 1341:2145, ack 1, win 54, options [nop,nop,TS val 2239746250 ecr 1473558379], length 804
15:44:47.425231 IP6 fd39:30::250.5001 > fd39:30::2.42536: Flags [.], ack 1, win 56, options [nop,nop,TS val 1473558454 ecr 2239746250,nop,nop,sack 1 {1341:2145}], length 0
15:44:51.199602 IP6 fd39:30::2.42536 > fd39:30::250.5001: Flags [F.], seq 2145, ack 1, win 54, options [nop,nop,TS val 2239750025 ecr 1473558454], length 0
15:44:51.199627 IP6 fd39:30::250.5001 > fd39:30::2.42536: Flags [.], ack 1, win 56, options [nop,nop,TS val 1473562228 ecr 2239746250,nop,nop,sack 1 {1341:2146}], length 0
^C
7 packets captured
7 packets received by filter
0 packets dropped by kernel



* Re: Mixed MTU hosts on a network
  2018-03-16 10:53   ` Roman Mamedov
@ 2018-03-16 16:20     ` Roman Mamedov
  0 siblings, 0 replies; 15+ messages in thread
From: Roman Mamedov @ 2018-03-16 16:20 UTC (permalink / raw)
  To: wireguard

On Fri, 16 Mar 2018 15:53:43 +0500
Roman Mamedov <rm@romanrm.net> wrote:

> But guess what, turns out that didn't work either. Tried both OUTPUT and
> POSTROUTING chains on the "mangle" table, and set-mss all the way down to
> 1220, no matter what, the iperf3 output looked the same as before.

Actually the iptables bit is easy to explain. Even if the initial MSS is forced
to a low value on the sender, it gets negotiated back up to the maximum value
according to the MTU on the receiver (I changed both IPs since then):

21:13:38.641531 IP6 fd39:30::f5a8:e923:f8cd:24b5.40052 > fd39:30::e84f:942d:7f93:ddc1.5001: Flags [S], seq 2397878391, win 27200, options [mss 1220,sackOK,TS val 566161815 ecr 0,nop,wscale 9], length 0
21:13:38.641574 IP6 fd39:30::e84f:942d:7f93:ddc1.5001 > fd39:30::f5a8:e923:f8cd:24b5.40052: Flags [S.], seq 1221117548, ack 2397878392, win 26800, options [mss 1352,sackOK,TS val 2726162536 ecr 566161815,nop,wscale 9], length 0
21:13:38.716047 IP6 fd39:30::f5a8:e923:f8cd:24b5.40052 > fd39:30::e84f:942d:7f93:ddc1.5001: Flags [.], ack 1, win 54, options [nop,nop,TS val 566161889 ecr 2726162536], length 0
21:13:38.716444 IP6 fd39:30::f5a8:e923:f8cd:24b5.40052 > fd39:30::e84f:942d:7f93:ddc1.5001: Flags [P.], seq 1341:1605, ack 1, win 54, options [nop,nop,TS val 566161889 ecr 2726162536], length 264
21:13:38.716458 IP6 fd39:30::e84f:942d:7f93:ddc1.5001 > fd39:30::f5a8:e923:f8cd:24b5.40052: Flags [.], ack 1, win 55, options [nop,nop,TS val 2726162611 ecr 566161889,nop,nop,sack 1 {1341:1605}], length 0

So the other side really needs to have a proper MTU set. And the highest working
wg0 MTU on PPPoE turned out to be 1408, not 1412 as I assumed. As for why 1412
also works but only if set on the sender side, I've no explanation for that yet.
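
The sizes in these traces can be reproduced with a short sketch. It assumes
the sender limits its segments by the MSS the receiver advertised in its
SYN-ACK and by its own MTU, and that the TCP timestamp option takes 12 bytes
of every segment. The function names are hypothetical:

```python
IPV6_HEADER = 40
TCP_HEADER = 20
TIMESTAMP_OPTION = 12  # 10 bytes of TCP timestamps plus 2 bytes of padding


def sender_payload(peer_advertised_mss, local_mtu):
    """Largest data payload the sender puts into one segment."""
    local_mss = local_mtu - IPV6_HEADER - TCP_HEADER
    return min(peer_advertised_mss, local_mss) - TIMESTAMP_OPTION


def inner_packet_size(payload):
    """Size of the IPv6 packet handed to wg0 for that payload."""
    return payload + TIMESTAMP_OPTION + TCP_HEADER + IPV6_HEADER


# Receiver advertises 1352 (wg0 MTU 1412), sender has wg0 MTU 1420.
payload = sender_payload(1352, 1420)
print(payload, inner_packet_size(payload))

# Receiver advertises 1348 (wg0 MTU 1408), sender has wg0 MTU 1420.
payload = sender_payload(1348, 1420)
print(payload, inner_packet_size(payload))
```

The first case gives a 1340-byte payload in a 1412-byte packet, the second a
1336-byte payload in a 1408-byte packet, matching the "length 1340" and
"length 1336" segments in the attached dumps. The MSS clamped in the sender's
own SYN does not enter this calculation, which is consistent with the
iptables rule having no effect in this direction.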

-- 
With respect,
Roman


* Re: Mixed MTU hosts on a network
  2018-03-16  9:25 Mixed MTU hosts on a network Roman Mamedov
  2018-03-16  9:35 ` Matthias Ordner
  2018-03-16 10:01 ` Kalin KOZHUHAROV
@ 2018-03-26 19:12 ` Luis Ressel
  2018-04-14  1:38 ` Jason A. Donenfeld
  3 siblings, 0 replies; 15+ messages in thread
From: Luis Ressel @ 2018-03-26 19:12 UTC (permalink / raw)
  To: Roman Mamedov; +Cc: wireguard

On Fri, 16 Mar 2018 14:25:47 +0500
Roman Mamedov <rm.wg@romanrm.net> wrote:

> What helps, is only reducing MTU of the entire wg0 interface to 1412.
> Then everything works fine. But it doesn't feel optimal to reduce MTU
> of the entire network just because of 1 or 2 hosts. I would rather
> use a couple of those mtu-override routes, if they worked.

Unfortunately, lowering the MTU of the whole tunnel interface is the
only reliable solution right now. Per-peer configurability of MTUs has
been on project TODO for a while, so there will be a better solution
some day. I even started to work on this a few months back, but got
sidetracked.

Cheers,
Luis


* Re: Mixed MTU hosts on a network
  2018-03-16  9:25 Mixed MTU hosts on a network Roman Mamedov
                   ` (2 preceding siblings ...)
  2018-03-26 19:12 ` Luis Ressel
@ 2018-04-14  1:38 ` Jason A. Donenfeld
  2018-04-14  2:40   ` Jason A. Donenfeld
  3 siblings, 1 reply; 15+ messages in thread
From: Jason A. Donenfeld @ 2018-04-14  1:38 UTC (permalink / raw)
  To: Roman Mamedov; +Cc: Luis Ressel, WireGuard mailing list

Hi Roman,

I think that your idea of setting a route-based MTU _should_ work, and
it seems like a bug if it isn't working. There are two places in
WireGuard which directly touch the MTU:

1) When we split GSO superpackets up into normal sized packets. This
code is supposed to be aware of the per-route MTU you've set, so it
shouldn't be a problem. This is the call to skb_gso_segment in
device.c.

2) When we pad the packet payload. In this case, we pad it to the
nearest multiple of 16, but we don't let it exceed the device MTU.
This is skb_padding in send.c. This behavior seems like the bug in
your particular case, since what matters here is the route's MTU, not
the device MTU. For full 1412 size packets, the payload is presumably
being padded to 1424, since that's still less than the device MTU. In
order to test this theory, try setting your route MTU, as you've
described in your first email, to 1408 (which is a multiple of 16). If
this works, let me know, as it will be good motivation for fixing
skb_padding. If not, then it means there's a problem elsewhere to
investigate too.
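
The padding step in point 2 can be modelled in a few lines. This is a Python
sketch of the arithmetic only, not the kernel code: it assumes a padding
multiple of 16 and takes the packet length and device MTU as plain integers.

```python
MESSAGE_PADDING_MULTIPLE = 16  # assumed from "nearest multiple of 16"


def skb_padding(length, dev_mtu):
    """Bytes of padding added to a packet, capped by the device MTU."""
    last_unit = length % dev_mtu
    padded_size = (last_unit + MESSAGE_PADDING_MULTIPLE - 1) & ~(
        MESSAGE_PADDING_MULTIPLE - 1
    )
    if padded_size > dev_mtu:
        padded_size = dev_mtu
    return padded_size - last_unit


for length, dev_mtu in ((1412, 1420), (1408, 1420), (1412, 1412)):
    pad = skb_padding(length, dev_mtu)
    print(length, dev_mtu, pad, length + pad)
```

In this sketch a 1412-byte packet on a 1420-MTU device is rounded up to 1424
and then capped at 1420, so it ends up at 1420 rather than 1424. Either way
it is larger than the 1412-byte route MTU, while a 1408-byte packet gets no
padding at all.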

I'm CC'ing Luis on this email, as he was working on the MTU code a while back.

Regards,
Jason


* Re: Mixed MTU hosts on a network
  2018-04-14  1:38 ` Jason A. Donenfeld
@ 2018-04-14  2:40   ` Jason A. Donenfeld
  2018-04-14 13:16     ` Jason A. Donenfeld
  0 siblings, 1 reply; 15+ messages in thread
From: Jason A. Donenfeld @ 2018-04-14  2:40 UTC (permalink / raw)
  To: Roman Mamedov; +Cc: Luis Ressel, WireGuard mailing list

On Sat, Apr 14, 2018 at 03:38:46AM +0200, Jason A. Donenfeld wrote:
> 2) When we pad the packet payload. In this case, we pad it to the
> nearest multiple of 16, but we don't let it exceed the device MTU.
> This is skb_padding in send.c. This behavior seems like the bug in
> your particular case, since what matters here is the route's MTU, not
> the device MTU. For full 1412 size packets, the payload is presumably
> being padded to 1424, since that's still less than the device MTU. In
> order to test this theory, try setting your route MTU, as you've
> described in your first email, to 1408 (which is a multiple of 16). If
> this works, let me know, as it will be good motivation for fixing
> skb_padding. If not, then it means there's a problem elsewhere to
> investigate too.
> 
> I'm CC'ing Luis on this email, as he was working on the MTU code a while back.

I'm still playing with this, but something like the following might fix
the issue, if you're interested in playing a bit.

=~=~=~=~=~=~=

diff --git a/src/device.c b/src/device.c
index 1614d61..3d18368 100644
--- a/src/device.c
+++ b/src/device.c
@@ -120,6 +120,7 @@ static netdev_tx_t xmit(struct sk_buff *skb, struct net_device *dev)
 	struct sk_buff *next;
 	struct sk_buff_head packets;
 	sa_family_t family;
+	u32 mtu;
 	int ret;

 	if (unlikely(skb_examine_untrusted_ip_hdr(skb) != skb->protocol)) {
@@ -142,6 +143,8 @@ static netdev_tx_t xmit(struct sk_buff *skb, struct net_device *dev)
 		goto err_peer;
 	}

+	mtu = dst_mtu(skb_dst(skb)) ?: skb->dev->mtu;
+
 	__skb_queue_head_init(&packets);
 	if (!skb_is_gso(skb))
 		skb->next = NULL;
@@ -168,6 +171,8 @@ static netdev_tx_t xmit(struct sk_buff *skb, struct net_device *dev)
 		 */
 		skb_dst_drop(skb);

+		PACKET_CB(skb)->mtu = mtu;
+
 		__skb_queue_tail(&packets, skb);
 	} while ((skb = next) != NULL);

diff --git a/src/queueing.h b/src/queueing.h
index d5948f3..c507536 100644
--- a/src/queueing.h
+++ b/src/queueing.h
@@ -46,6 +46,7 @@ struct packet_cb {
 	u64 nonce;
 	struct noise_keypair *keypair;
 	atomic_t state;
+	u32 mtu;
 	u8 ds;
 };
 #define PACKET_PEER(skb) (((struct packet_cb *)skb->cb)->keypair->entry.peer)
diff --git a/src/send.c b/src/send.c
index dddcc0b..e3b1ffd 100644
--- a/src/send.c
+++ b/src/send.c
@@ -116,11 +116,11 @@ static inline unsigned int skb_padding(struct sk_buff *skb)
 	 * isn't strictly neccessary, but it's better to be cautious here, especially
 	 * if that code ever changes.
 	 */
-	unsigned int last_unit = skb->len % skb->dev->mtu;
+	unsigned int last_unit = skb->len % PACKET_CB(skb)->mtu;
 	unsigned int padded_size = (last_unit + MESSAGE_PADDING_MULTIPLE - 1) & ~(MESSAGE_PADDING_MULTIPLE - 1);

-	if (padded_size > skb->dev->mtu)
-		padded_size = skb->dev->mtu;
+	if (padded_size > PACKET_CB(skb)->mtu)
+		padded_size = PACKET_CB(skb)->mtu;
 	return padded_size - last_unit;
 }
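
The effect of the patch above can be sketched outside the kernel. The
following Python model is an assumption-laden illustration, not the kernel
code: it uses a padding multiple of 16, and it treats a missing route MTU as
0 so that the device MTU is used instead, mirroring the `?:` fallback in the
diff.

```python
MESSAGE_PADDING_MULTIPLE = 16  # assumed padding multiple


def effective_mtu(route_mtu, dev_mtu):
    """Route MTU if one is set (non-zero), otherwise the device MTU."""
    return route_mtu or dev_mtu


def skb_padding(length, mtu):
    """Bytes of padding added to a packet, capped by the given MTU."""
    last_unit = length % mtu
    padded_size = (last_unit + MESSAGE_PADDING_MULTIPLE - 1) & ~(
        MESSAGE_PADDING_MULTIPLE - 1
    )
    if padded_size > mtu:
        padded_size = mtu
    return padded_size - last_unit


# Before the patch: only the 1420-byte device MTU caps the padding.
print(skb_padding(1412, effective_mtu(0, 1420)))
# After the patch: the 1412-byte route MTU caps the padding.
print(skb_padding(1412, effective_mtu(1412, 1420)))
```

With the device MTU alone the 1412-byte packet gains 8 bytes of padding; with
the route MTU taken into account it gains none.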


* Re: Mixed MTU hosts on a network
  2018-04-14  2:40   ` Jason A. Donenfeld
@ 2018-04-14 13:16     ` Jason A. Donenfeld
  2018-04-14 13:40       ` Roman Mamedov
  0 siblings, 1 reply; 15+ messages in thread
From: Jason A. Donenfeld @ 2018-04-14 13:16 UTC (permalink / raw)
  To: Roman Mamedov; +Cc: Luis Ressel, WireGuard mailing list

Hi Roman,

This commit should fix it. It now has a unit test too so that we don't
hit this issue again. Thanks for reporting it in such detail.

https://git.zx2c4.com/WireGuard/commit/?id=a88a067d5477f877003d3703bb3b95cb4e94bc46

Let me know if that fixes it on your end.

Jason


* Re: Mixed MTU hosts on a network
  2018-04-14 13:16     ` Jason A. Donenfeld
@ 2018-04-14 13:40       ` Roman Mamedov
  2018-04-14 14:15         ` Jason A. Donenfeld
  0 siblings, 1 reply; 15+ messages in thread
From: Roman Mamedov @ 2018-04-14 13:40 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: Luis Ressel, WireGuard mailing list, Roman Mamedov

On Sat, 14 Apr 2018 15:16:56 +0200
"Jason A. Donenfeld" <Jason@zx2c4.com> wrote:

> Hi Roman,
> 
> This commit should fix it. It now has a unit test too so that we don't
> hit this issue again. Thanks for reporting it in such detail.
> 
> https://git.zx2c4.com/WireGuard/commit/?id=a88a067d5477f877003d3703bb3b95cb4e94bc46
> 
> Let me know if that fixes it on your end.
> 
> Jason

Thanks! I didn't get a chance to test it yet.

Leaving route MTUs aside, did you look into why the interface MTU of 1412
behaves erratically (while by all calculations it should just fit into 1492
underlying PPPoE MTU), with only 1408 working reliably? Is it also because of
the padding?

-- 
With respect,
Roman


* Re: Mixed MTU hosts on a network
  2018-04-14 13:40       ` Roman Mamedov
@ 2018-04-14 14:15         ` Jason A. Donenfeld
  2018-04-14 14:38           ` Roman Mamedov
  0 siblings, 1 reply; 15+ messages in thread
From: Jason A. Donenfeld @ 2018-04-14 14:15 UTC (permalink / raw)
  To: Roman Mamedov; +Cc: Luis Ressel, WireGuard mailing list, Roman Mamedov

Hi Roman,

I answered this in my first email to you, which perhaps got lost in
the mix of emails, so I'll quote the relevant part:

> 2) When we pad the packet payload. In this case, we pad it to the
> nearest multiple of 16, but we don't let it exceed the device MTU.
> This is skb_padding in send.c. This behavior seems like the bug in
> your particular case, since what matters here is the route's MTU, not
> the device MTU. For full 1412 size packets, the payload is presumably
> being padded to 1424, since that's still less than the device MTU. In
> order to test this theory, try setting your route MTU, as you've
> described in your first email, to 1408 (which is a multiple of 16). If
> this works, let me know, as it will be good motivation for fixing
> skb_padding. If not, then it means there's a problem elsewhere to
> investigate too.

In short, because 1408 is a multiple of 16 so it didn't get rounded
up, whereas 1412 got rounded up to 1424.

Jason


* Re: Mixed MTU hosts on a network
  2018-04-14 14:15         ` Jason A. Donenfeld
@ 2018-04-14 14:38           ` Roman Mamedov
  2018-04-14 14:45             ` Jason A. Donenfeld
  0 siblings, 1 reply; 15+ messages in thread
From: Roman Mamedov @ 2018-04-14 14:38 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: Luis Ressel, WireGuard mailing list, Roman Mamedov

On Sat, 14 Apr 2018 16:15:07 +0200
"Jason A. Donenfeld" <Jason@zx2c4.com> wrote:

> Hi Roman,
> 
> I answered this in my first email to you, which perhaps got lost in
> the mix of emails, so I'll quote the relevant part:
> 
> > 2) When we pad the packet payload. In this case, we pad it to the
> > nearest multiple of 16, but we don't let it exceed the device MTU.
> > This is skb_padding in send.c. This behavior seems like the bug in
> > your particular case, since what matters here is the route's MTU, not
> > the device MTU. For full 1412 size packets, the payload is presumably
> > being padded to 1424, since that's still less than the device MTU. In
> > order to test this theory, try setting your route MTU, as you've
> > described in your first email, to 1408 (which is a multiple of 16). If
> > this works, let me know, as it will be good motivation for fixing
> > skb_padding. If not, then it means there's a problem elsewhere to
> > investigate too.
> 
> In short, because 1408 is a multiple of 16 so it didn't get rounded
> up, whereas 1412 got rounded up to 1424.

I got that, but that still seemed to be talking about the problem with route
MTUs.

But what if I don't touch any route MTUs at all and only set the WG device
MTU to 1412? In my further experiments that didn't work well either, causing
weird one-directional issues, and only 1408 worked.

So, is it possible to fix the padding so that 1412 can be used as the WG device
MTU on an underlying MTU of 1492? Otherwise, shouldn't there be a warning
somewhere in the docs not to simply choose the largest fitting MTU according
to [1], but also to round the result down to the nearest multiple of 16?

[1] https://www.mail-archive.com/wireguard@lists.zx2c4.com/msg01856.html
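
The rounding suggested here is a one-line calculation. A sketch, assuming the
padding multiple of 16 mentioned earlier in the thread (the function name is
hypothetical, and this is a workaround idea rather than a documented
requirement):

```python
PADDING_MULTIPLE = 16  # assumed from the discussion of skb_padding


def round_down_mtu(mtu):
    """Largest multiple of PADDING_MULTIPLE that does not exceed mtu."""
    return mtu & ~(PADDING_MULTIPLE - 1)


print(round_down_mtu(1412))  # the value that fits into PPPoE on paper
print(round_down_mtu(1408))  # already a multiple of 16, unchanged
```

Both calls return 1408, the value that worked in the tests described above.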

-- 
With respect,
Roman


* Re: Mixed MTU hosts on a network
  2018-04-14 14:38           ` Roman Mamedov
@ 2018-04-14 14:45             ` Jason A. Donenfeld
  2018-04-14 15:20               ` Roman Mamedov
  0 siblings, 1 reply; 15+ messages in thread
From: Jason A. Donenfeld @ 2018-04-14 14:45 UTC (permalink / raw)
  To: Roman Mamedov; +Cc: Luis Ressel, WireGuard mailing list, Roman Mamedov

Hi Roman,

That's strange; I'm unable to reproduce what you've described:

[+] NS1: ip link set wg0 mtu 1412
[+] NS2: ip link set wg0 mtu 1412
[+] NS1: wg set wg0 peer QXloTaPOwUTzqFElVLSD0vBc4sxjyoKtPBSaTkZHokY= endpoint 127.0.0.1:2
[+] NS2: wg set wg0 peer X0p7+UWc4wjaAmT73xAEuXLY80I6Gv8vTg6KwFHCPGs= endpoint 127.0.0.1:1
[+] NS0: iptables -A INPUT -m length --length 1473 -j DROP
[+] NS2: ping -c 1 -W 1 -s 1384 192.168.241.1
PING 192.168.241.1 (192.168.241.1) 1384(1412) bytes of data.
1392 bytes from 192.168.241.1: icmp_seq=1 ttl=64 time=0.752 ms

--- 192.168.241.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.752/0.752/0.752/0.000 ms

In this case, WireGuard seems to be doing the right thing. Think you
could come up with some minimal test that exhibits the behavior you're
seeing?

Jason


* Re: Mixed MTU hosts on a network
  2018-04-14 14:45             ` Jason A. Donenfeld
@ 2018-04-14 15:20               ` Roman Mamedov
  2018-04-14 23:08                 ` Jason A. Donenfeld
  0 siblings, 1 reply; 15+ messages in thread
From: Roman Mamedov @ 2018-04-14 15:20 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: Luis Ressel, WireGuard mailing list, Roman Mamedov

On Sat, 14 Apr 2018 16:45:32 +0200
"Jason A. Donenfeld" <Jason@zx2c4.com> wrote:

> In this case, WireGuard seems to be doing the right thing. Think you
> could come up with some minimal test that exhibits the behavior you're
> seeing?

I now remember in more detail what the problem was. It was not with MTU 1412
on both sides; it occurred when mixing WG MTU 1412 on the PPPoE-connected
machine with WG MTU 1420 on the other side (which uses the full 1500
underlying MTU).

Here I posted about it with some tcpdumps included:
https://lists.zx2c4.com/pipermail/wireguard/2018-March/002537.html

With 1420 on the "full MTU" side, the "PPPoE" side had to set 1408 WG MTU for
things to work properly, not 1412 as would theoretically fit into its PPPoE.

I'll post an update if I come up with a short and simple reproducer sequence.

Setting 1412 on both sides seems to work fine from more testing just now.

-- 
With respect,
Roman


* Re: Mixed MTU hosts on a network
  2018-04-14 15:20               ` Roman Mamedov
@ 2018-04-14 23:08                 ` Jason A. Donenfeld
  0 siblings, 0 replies; 15+ messages in thread
From: Jason A. Donenfeld @ 2018-04-14 23:08 UTC (permalink / raw)
  To: Roman Mamedov; +Cc: Luis Ressel, WireGuard mailing list, Roman Mamedov

Hey Roman,

I've just tried a few ways of replicating your setup, and I can't seem
to reproduce the bug, either with the new code or old. The results you
mention are surprising too, since WireGuard or not, TCP is supposed to
negotiate the lowest common MSS. I wonder if some strange iptables
rules are getting in the way and confusing things?

Jason



Thread overview: 15+ messages
2018-03-16  9:25 Mixed MTU hosts on a network Roman Mamedov
2018-03-16  9:35 ` Matthias Ordner
2018-03-16 10:53   ` Roman Mamedov
2018-03-16 16:20     ` Roman Mamedov
2018-03-16 10:01 ` Kalin KOZHUHAROV
2018-03-26 19:12 ` Luis Ressel
2018-04-14  1:38 ` Jason A. Donenfeld
2018-04-14  2:40   ` Jason A. Donenfeld
2018-04-14 13:16     ` Jason A. Donenfeld
2018-04-14 13:40       ` Roman Mamedov
2018-04-14 14:15         ` Jason A. Donenfeld
2018-04-14 14:38           ` Roman Mamedov
2018-04-14 14:45             ` Jason A. Donenfeld
2018-04-14 15:20               ` Roman Mamedov
2018-04-14 23:08                 ` Jason A. Donenfeld
