* TCP performance problems - GSO/TSO, MSS, 8139cp related
@ 2016-11-11 21:05 Russell King - ARM Linux
  2016-11-11 21:23   ` [Qemu-devel] " David Woodhouse
  0 siblings, 1 reply; 12+ messages in thread
From: Russell King - ARM Linux @ 2016-11-11 21:05 UTC (permalink / raw)
  To: netdev, dwmw2

Hi,

I seem to have found a severe performance issue somewhere in the
networking code.

This involves ZenIV.linux.org.uk, a qemu-kvm guest instance on ZenV.
The host is configured to use macvtap for ZenIV's network access, and
ZenIV itself uses the 8139cp driver.

My initial testing was from my laptop (running 4.5.7), through a
router box (also running 4.5.7) and out my FTTC link, across the
Internet to ZenV (4.4.8-300.fc23.x86_64) and then onto the ZenIV
(also 4.4.8-300.fc23.x86_64) guest.  Thinking that it might be an
issue with my crappy FTTC, I switched the routing at my end over to
the ADSL line, which showed the same issues.

Eventually, what fixed it was disabling both TSO and GSO in the
ZenIV guest.

Now, both my FTTC and ADSL links have a reduced MTU, and I'm having
to use TCPMSS on the router box to clamp the MSS - which gets
clamped to 1452, 8 bytes lower than the usual 1460 for standard
ethernet.

With TSO on, I see the guest sending TCP packets with a 2880 byte
payload:

17:36:07.006009 IP (tos 0x0, ttl 52, id 17517, offset 0, flags [DF], proto TCP (6), length 60)
    84.xx.xxx.196.60846 > 195.92.253.2.http: Flags [S], cksum 0x2c25 (correct), seq 356291023, win 29200, options [mss 1452,sackOK,TS val 1372902818 ecr 0,nop,wscale 7], length 0
17:36:07.006122 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    195.92.253.2.http > 84.xx.xxx.196.60846: Flags [S.], cksum 0xed7f (incorrect -> 0x674a), seq 2784716623, ack 356291024, win 28960, options [mss 1460,sackOK,TS val 3358126141 ecr 1372902818,nop,wscale 7], length 0
17:36:07.035531 IP (tos 0x0, ttl 52, id 17518, offset 0, flags [DF], proto TCP (6), length 52)
    84.xx.xxx.196.60846 > 195.92.253.2.http: Flags [.], cksum 0x0634 (correct), ack 1, win 229, options [nop,nop,TS val 1372902848 ecr 3358126141], length 0
17:36:07.038233 IP (tos 0x0, ttl 52, id 17519, offset 0, flags [DF], proto TCP (6), length 205)
    84.xx.xxx.196.60846 > 195.92.253.2.http: Flags [P.], cksum 0x3a1e (correct), seq 1:154, ack 1, win 229, options [nop,nop,TS val 1372902848 ecr 3358126141], length 153: HTTP, length: 153
17:36:07.038356 IP (tos 0x0, ttl 64, id 38669, offset 0, flags [DF], proto TCP (6), length 52)
    195.92.253.2.http > 84.xx.xxx.196.60846: Flags [.], cksum 0xed77 (incorrect -> 0x0575), ack 154, win 235, options [nop,nop,TS val 3358126173 ecr 1372902848], length 0
17:36:07.039255 IP (tos 0x0, ttl 64, id 38670, offset 0, flags [DF], proto TCP (6), length 2932)
    195.92.253.2.http > 84.xx.xxx.196.60846: Flags [.], seq 1:2881, ack 154, win 235, options [nop,nop,TS val 3358126174 ecr 1372902848], length 2880: HTTP, length: 2880
17:36:07.039442 IP (tos 0x0, ttl 64, id 38672, offset 0, flags [DF], proto TCP (6), length 2932)
    195.92.253.2.http > 84.xx.xxx.196.60846: Flags [.], seq 2881:5761, ack 154, win 235, options [nop,nop,TS val 3358126174 ecr 1372902848], length 2880: HTTP
17:36:07.039579 IP (tos 0x0, ttl 64, id 38674, offset 0, flags [DF], proto TCP (6), length 2932)
    195.92.253.2.http > 84.xx.xxx.196.60846: Flags [.], seq 5761:8641, ack 154, win 235, options [nop,nop,TS val 3358126174 ecr 1372902848], length 2880: HTTP
...etc...

On the macvtap side, however, which is post-segmentation by the
virtualised 8139cp hardware (this capture taken at a later time):

18:59:38.782818 IP (tos 0x0, ttl 52, id 35619, offset 0, flags [DF], proto TCP (6), length 60)
    84.xx.xxx.196.61236 > 195.92.253.2.http: Flags [S], cksum 0x88db (correct), seq 158975430, win 29200, options [mss 1452,sackOK,TS val 1377914597 ecr 0,nop,wscale 7], length 0
18:59:38.783270 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    195.92.253.2.http > 84.xx.xxx.196.61236: Flags [S.], cksum 0x575d (correct), seq 4091022471, ack 158975431, win 28960, options [mss 1460,sackOK,TS val 3363137919 ecr 1377914597,nop,wscale 7], length 0
18:59:38.812089 IP (tos 0x0, ttl 52, id 35620, offset 0, flags [DF], proto TCP (6), length 52)
    84.xx.xxx.196.61236 > 195.92.253.2.http: Flags [.], cksum 0xf646 (correct), ack 1, win 229, options [nop,nop,TS val 1377914627 ecr 3363137919], length 0
18:59:38.814623 IP (tos 0x0, ttl 52, id 35621, offset 0, flags [DF], proto TCP (6), length 205)
    84.xx.xxx.196.61236 > 195.92.253.2.http: Flags [P.], cksum 0x2a31 (correct), seq 1:154, ack 1, win 229, options [nop,nop,TS val 1377914627 ecr 3363137919], length 153: HTTP, length: 153
18:59:38.815025 IP (tos 0x0, ttl 64, id 25878, offset 0, flags [DF], proto TCP (6), length 52)
    195.92.253.2.http > 84.xx.xxx.196.61236: Flags [.], cksum 0xf588 (correct), ack 154, win 235, options [nop,nop,TS val 3363137950 ecr 1377914627], length 0
18:59:38.816371 IP (tos 0x0, ttl 64, id 25879, offset 0, flags [DF], proto TCP (6), length 1500)
    195.92.253.2.http > 84.xx.xxx.196.61236: Flags [.], seq 1:1449, ack 154, win 235, options [nop,nop,TS val 3363137952 ecr 1377914627], length 1448: HTTP, length: 1448
18:59:38.816393 IP (tos 0x0, ttl 64, id 25880, offset 0, flags [DF], proto TCP (6), length 1484)
    195.92.253.2.http > 84.xx.xxx.196.61236: Flags [.], seq 1449:2881, ack 154, win 235, options [nop,nop,TS val 3363137952 ecr 1377914627], length 1432: HTTP
18:59:38.816471 IP (tos 0x0, ttl 64, id 25881, offset 0, flags [DF], proto TCP (6), length 1500)
    195.92.253.2.http > 84.xx.xxx.196.61236: Flags [.], seq 2881:4329, ack 154, win 235, options [nop,nop,TS val 3363137952 ecr 1377914627], length 1448: HTTP
18:59:38.816501 IP (tos 0x0, ttl 64, id 25882, offset 0, flags [DF], proto TCP (6), length 1484)
    195.92.253.2.http > 84.xx.xxx.196.61236: Flags [.], seq 4329:5761, ack 154, win 235, options [nop,nop,TS val 3363137952 ecr 1377914627], length 1432: HTTP
18:59:38.816660 IP (tos 0x0, ttl 64, id 25883, offset 0, flags [DF], proto TCP (6), length 1500)
    195.92.253.2.http > 84.xx.xxx.196.61236: Flags [.], seq 5761:7209, ack 154, win 235, options [nop,nop,TS val 3363137952 ecr 1377914627], length 1448: HTTP

Now, every packet which has 1448 bytes of payload is 1514 bytes in
length, and these get dropped on their way to me at the ISP end of the
link, because the PPPoE link seems unable to handle packets of this
size (annoyingly).

The result is that the oversized "200 OK" packet gets lost and has to be
re-transmitted - here it is on the guest side:

17:36:07.176351 IP (tos 0x0, ttl 64, id 38681, offset 0, flags [DF], proto TCP (6), length 1492)
    195.92.253.2.http > 84.xx.xxx.196.60846: Flags [.], seq 1:1441, ack 154, win 235, options [nop,nop,TS val 3358126311 ecr 1372902989], length 1440: HTTP, length: 1440

Notice that it is 1440 bytes in size now... and of course it comes
through on the macvtap side correctly:

18:59:38.950513 IP (tos 0x0, ttl 64, id 25890, offset 0, flags [DF], proto TCP (6), length 1492)
    195.92.253.2.http > 84.xx.xxx.196.61236: Flags [.], seq 1:1441, ack 154, win 235, options [nop,nop,TS val 3363138086 ecr 1377914764], length 1440: HTTP, length: 1440

This kind of thing goes on throughout the transfer - whenever the guest
sends a GSO/TSO packet, it is incorrectly segmented, resulting in the
over-sized segments being dropped, and causing lots of retransmissions.

The result is that with TSO/GSO on, I get around 70-80KB/s, but with
TSO/GSO off, I get 723KB/s - around a factor of 10 faster.

Doing some local testing between the 4.5.7 laptop and a Marvell board
running 4.9-rc, and using TCPMSS to clamp the MSS to 1452 between these
(on both the SYN and SYNACK packets) shows that the laptop's E1000e
driver and the 4.5.7 net stack correctly segment - I end up with TCP
packets with 1440 byte payloads being spat out of the E1000e NIC.

So, my guess is there's something wrong either with the 8139cp driver
(and dwmw2's commit says to scream at him if it breaks!) or with the
qemu 8139cp hardware emulation.

I've suggested to bryce (who set up the VM and knows it better than I)
to try switching ZenIV to E1000e to see whether that makes any
difference - that would point towards either the 8139cp driver or the
qemu 8139 hardware emulation being broken, rather than something in
the network stack.

However, it may be worth someone testing TSO/GSO with real 8139cp
hardware - the MSS can be clamped with:

# iptables -t mangle -I INPUT -p tcp --tcp-flags SYN,RST SYN \
	-j TCPMSS --set-mss 1452
# iptables -t mangle -I OUTPUT -p tcp --tcp-flags SYN,RST SYN \
	-j TCPMSS --set-mss 1452

and testing with something like wget/iperf.  You'll need to ensure
that GRO is disabled on the box receiving the TCP packets from the
8139cp machine to see the raw packets in tcpdump, otherwise you'll
get much larger packets reassembled by the GRO code.  You should
see the TCP packets with a data size of 1440 bytes, not alternating
between 1448 and 1432 bytes.
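
For completeness, the offload and GRO toggles mentioned above can be
set with ethtool (a sketch - eth0 is a placeholder for the actual
interface names on each box):

```shell
# On the machine with the (real or emulated) 8139cp NIC:
# toggle TSO/GSO to compare the two behaviours
ethtool -K eth0 tso off gso off

# On the box receiving and capturing with tcpdump:
# disable GRO so the raw on-the-wire segments are visible
ethtool -K eth0 gro off
```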

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: TCP performance problems - GSO/TSO, MSS, 8139cp related
  2016-11-11 21:05 TCP performance problems - GSO/TSO, MSS, 8139cp related Russell King - ARM Linux
@ 2016-11-11 21:23   ` David Woodhouse
  0 siblings, 0 replies; 12+ messages in thread
From: David Woodhouse @ 2016-11-11 21:23 UTC (permalink / raw)
  To: Russell King - ARM Linux, netdev; +Cc: qemu-devel

On Fri, 2016-11-11 at 21:05 +0000, Russell King - ARM Linux wrote:
> 
> 18:59:38.782818 IP (tos 0x0, ttl 52, id 35619, offset 0, flags [DF], proto TCP (6), length 60)
>     84.xx.xxx.196.61236 > 195.92.253.2.http: Flags [S], cksum 0x88db (correct), seq 158975430, win 29200, options [mss 1452,sackOK,TS val 1377914597 ecr 0,nop,wscale 7], length 0

... (MSS 1452)

> 18:59:38.816371 IP (tos 0x0, ttl 64, id 25879, offset 0, flags [DF], proto TCP (6), length 1500)
>     195.92.253.2.http > 84.xx.xxx.196.61236: Flags [.], seq 1:1449, ack 154, win 235, options [nop,nop,TS val 3363137952 ecr 1377914627], length 1448: HTTP, length: 1448
> 18:59:38.816393 IP (tos 0x0, ttl 64, id 25880, offset 0, flags [DF], proto TCP (6), length 1484)
>     195.92.253.2.http > 84.xx.xxx.196.61236: Flags [.], seq 1449:2881, ack 154, win 235, options [nop,nop,TS val 3363137952 ecr 1377914627], length 1432: HTTP

Can you instrument cp_start_xmit() in 8139cp.c and get it to print the
value of 'mss' when this happens?

All we do is take that value from skb_shinfo(skb)->gso_size, shift it a
bit, and shove it in the descriptor ring. There's not much scope for a
driver-specific bug.

It's also *fairly* unlikely that the kernel in the guest has developed
a bug and isn't setting gso_size sanely. I'm more inclined to suspect
that qemu isn't properly emulating those bits. But at first glance at
the code, it looks like *that's* been there for the last decade too...

-- 
dwmw2

* Re: TCP performance problems - GSO/TSO, MSS, 8139cp related
  2016-11-11 21:23   ` [Qemu-devel] " David Woodhouse
@ 2016-11-11 22:33     ` Russell King - ARM Linux
  -1 siblings, 0 replies; 12+ messages in thread
From: Russell King - ARM Linux @ 2016-11-11 22:33 UTC (permalink / raw)
  To: David Woodhouse; +Cc: netdev, qemu-devel

On Fri, Nov 11, 2016 at 09:23:43PM +0000, David Woodhouse wrote:
> On Fri, 2016-11-11 at 21:05 +0000, Russell King - ARM Linux wrote:
> > 
> > 18:59:38.782818 IP (tos 0x0, ttl 52, id 35619, offset 0, flags [DF], proto TCP (6), length 60)
> >     84.xx.xxx.196.61236 > 195.92.253.2.http: Flags [S], cksum 0x88db (correct), seq 158975430, win 29200, options [mss 1452,sackOK,TS val 1377914597 ecr 0,nop,wscale 7], length 0
> 
> ... (MSS 1452)
> 
> > 18:59:38.816371 IP (tos 0x0, ttl 64, id 25879, offset 0, flags [DF], proto TCP (6), length 1500)
> >     195.92.253.2.http > 84.xx.xxx.196.61236: Flags [.], seq 1:1449, ack 154, win 235, options [nop,nop,TS val 3363137952 ecr 1377914627], length 1448: HTTP, length: 1448
> > 18:59:38.816393 IP (tos 0x0, ttl 64, id 25880, offset 0, flags [DF], proto TCP (6), length 1484)
> >     195.92.253.2.http > 84.xx.xxx.196.61236: Flags [.], seq 1449:2881, ack 154, win 235, options [nop,nop,TS val 3363137952 ecr 1377914627], length 1432: HTTP
> 
> Can you instrument cp_start_xmit() in 8139cp.c and get it to print the
> value of 'mss' when this happens?

Well, I'm not going to fiddle in such a way with a public box... that
would be utter madness.  I'll fiddle with mvneta locally on 4.9-rc
instead - and yes, I know that's not the F23 4.4 kernel, so doesn't
really tell us very much.

I _could_ ask bryce to set up another VM on ZenV for me to play with,
but we'll have to wait for bryce to be around for that... I don't
want to break zenv or zeniv. :)

> All we do is take that value from skb_shinfo(skb)->gso_size, shift it a
> bit, and shove it in the descriptor ring. There's not much scope for a
> driver-specific bug.

Unless there's a different interpretation of what the MSS field in the
driver means...

Looking at mvneta, which works correctly:

- On mvneta (192.168.1.59):

21:39:38.535549 IP (tos 0x0, ttl 64, id 27668, offset 0, flags [DF], proto TCP (6), length 7252)
    192.168.1.59.55170 > 192.168.1.18.5001: Flags [.], seq 25:7225, ack 1, win 229, options [nop,nop,TS val 62231754 ecr 1387514367], length 7200

- On laptop (192.168.1.18):

21:39:38.537442 IP (tos 0x0, ttl 64, id 27668, offset 0, flags [DF], proto TCP (6), length 1492)
    192.168.1.59.55170 > 192.168.1.18.commplex-link: Flags [.], seq 25:1465, ack 1, win 229, options [nop,nop,TS val 62231754 ecr 1387514367], length 1440
21:39:38.537453 IP (tos 0x0, ttl 64, id 27669, offset 0, flags [DF], proto TCP (6), length 1492)
    192.168.1.59.55170 > 192.168.1.18.commplex-link: Flags [.], seq 1465:2905, ack 1, win 229, options [nop,nop,TS val 62231754 ecr 1387514367], length 1440
21:39:38.537461 IP (tos 0x0, ttl 64, id 27670, offset 0, flags [DF], proto TCP (6), length 1492)
    192.168.1.59.55170 > 192.168.1.18.commplex-link: Flags [.], seq 2905:4345, ack 1, win 229, options [nop,nop,TS val 62231754 ecr 1387514367], length 1440
21:39:38.537464 IP (tos 0x0, ttl 64, id 9968, offset 0, flags [DF], proto TCP (6), length 52)
    192.168.1.18.commplex-link > 192.168.1.59.55170: Flags [.], cksum 0x83c4 (incorrect -> 0xa338), ack 1465, win 249, options [nop,nop,TS val 1387514368 ecr 62231754], length 0
21:39:38.537465 IP (tos 0x0, ttl 64, id 27671, offset 0, flags [DF], proto TCP (6), length 1492)
    192.168.1.59.55170 > 192.168.1.18.commplex-link: Flags [.], seq 4345:5785, ack 1, win 229, options [nop,nop,TS val 62231754 ecr 1387514367], length 1440
21:39:38.537469 IP (tos 0x0, ttl 64, id 27672, offset 0, flags [DF], proto TCP (6), length 1492)
    192.168.1.59.55170 > 192.168.1.18.commplex-link: Flags [.], seq 5785:7225, ack 1, win 229, options [nop,nop,TS val 62231754 ecr 1387514367], length 1440

which is all correct.  Now, these packets have a larger TCP header
due to the options:

        0x0000:  0022 6815 37dd 0050 4321 0201 0800 4500  ."h.7..PC!....E.
                 ^mac                               ^iphdr
        0x0010:  05d4 6c14 4000 4006 4572 c0a8 013b c0a8  ..l.@.@.Er...;..
        0x0020:  0112 d782 1389 4cb4 f8f4 7454 ef10 8010  ......L...tT....
                      ^tcphdr
        0x0030:  00e5 2a80 0000 0101 080a 03b5 94ca 52b3  ..*...........R.
                                ^tcpopts
        0x0040:  c9ff 0000 0000 0000 0001 0000 1389 0000  ................
                      ^start of data
        0x0050:  0000 0000 0000 ffff fc18 3435 3637 3839  ..........456789
        0x0060:  3031 3233 3435 3637 3839 3031 3233 3435  0123456789012345

So the data starts at 66 (0x42) into this packet, followed by 1440 bytes
of data.  Looking at drivers/net/ethernet/marvell/mvneta.c, the only
way this can happen is if skb_shinfo(skb)->gso_size is 1440.  I'll
instrument mvneta to dump this value...

While waiting for the kernel to build, I've been reading the TCP code,
and found this:

/* Compute the current effective MSS, taking SACKs and IP options,
 * and even PMTU discovery events into account.
 */
unsigned int tcp_current_mss(struct sock *sk)
...
        /* The mss_cache is sized based on tp->tcp_header_len, which assumes
         * some common options. If this is an odd packet (because we have SACK
         * blocks etc) then our calculated header_len will be different, and
         * we have to adjust mss_now correspondingly */

mss_now is what becomes gso_size, which means that gso_size will be
adjusted for the TCP options - which makes sense.  So, because there
are 12 bytes of options in the above hex packet dump, negotiated MSS
- 12 gives the data payload size of 1440, and so gso_size will be
1440.

And now, going back to that kernel that finished compiling...

[   53.468319] skb len=7266 hdr len=66 gso_size=1440
...
[   53.728752] skb len=64866 hdr len=66 gso_size=1440

So my guesses were right, at least for 4.9-rc4.  Whether that holds
for the Fedora F23 4.4 kernel is another matter.  For the record,
removing the TCPMSS clamp gives the expected result:

[  231.244018] skb len=7306 hdr len=66 gso_size=1448

So, gso_size is the size of the TCP data after the TCP header plus
TCP options.

The other thing to notice is that the SKB length minus header
length is divisible by the gso size for these full-sized packets.
There is no "one packet larger next packet smaller" here.

> It's also *fairly* unlikely that the kernel in the guest has developed
> a bug and isn't setting gso_size sanely. I'm more inclined to suspect
> that qemu isn't properly emulating those bits. But at first glance at
> the code, it looks like *that's* been there for the last decade too...

Whether or not it's been there for a decade is kind of irrelevant -
bugs can be around for a decade and remain undiscovered.

Looking at the 8139C information, it says:

"The new buffer management algorithm provides capabilities of Microsoft
Large-Send offload" and as yet I haven't found anything that describes
what this is or how it works.  How certain are we that the LSO MSS
value is the same as our gso_size, iow the size of the data after
the TCP header and options?

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

* Re: TCP performance problems - GSO/TSO, MSS, 8139cp related
  2016-11-11 21:23   ` [Qemu-devel] " David Woodhouse
@ 2016-11-11 22:44     ` Russell King - ARM Linux
  -1 siblings, 0 replies; 12+ messages in thread
From: Russell King - ARM Linux @ 2016-11-11 22:44 UTC (permalink / raw)
  To: David Woodhouse; +Cc: netdev, qemu-devel

On Fri, Nov 11, 2016 at 09:23:43PM +0000, David Woodhouse wrote:
> It's also *fairly* unlikely that the kernel in the guest has developed
> a bug and isn't setting gso_size sanely. I'm more inclined to suspect
> that qemu isn't properly emulating those bits. But at first glance at
> the code, it looks like *that's* been there for the last decade too...

I take issue with that, having looked at the qemu rtl8139 code:

                if ((txdw0 & CP_TX_LGSEN) && ip_protocol == IP_PROTO_TCP)
                {
                    int large_send_mss = (txdw0 >> 16) & CP_TC_LGSEN_MSS_MASK;

                    DPRINTF("+++ C+ mode offloaded task TSO MTU=%d IP data %d "
                        "frame data %d specified MSS=%d\n", ETH_MTU,
                        ip_data_len, saved_size - ETH_HLEN, large_send_mss);

That's the only reference to "large_send_mss" there; other than that,
the MSS value that gets stuck into the field by 8139cp.c is completely
unused.  Instead, qemu does this:

                eth_payload_data = saved_buffer + ETH_HLEN;
                eth_payload_len  = saved_size   - ETH_HLEN;

                ip = (ip_header*)eth_payload_data;

                    hlen = IP_HEADER_LENGTH(ip);
                    ip_data_len = be16_to_cpu(ip->ip_len) - hlen;

                    tcp_header *p_tcp_hdr = (tcp_header*)(eth_payload_data + hlen);
                    int tcp_hlen = TCP_HEADER_DATA_OFFSET(p_tcp_hdr);

                    /* ETH_MTU = ip header len + tcp header len + payload */
                    int tcp_data_len = ip_data_len - tcp_hlen;
                    int tcp_chunk_size = ETH_MTU - hlen - tcp_hlen;

                    for (tcp_send_offset = 0; tcp_send_offset < tcp_data_len; tcp_send_offset += tcp_chunk_size)
                    {

It uses a fixed value of ETH_MTU to calculate the size of the TCP
data chunks, and this is, unsurprisingly, the well-known:

#define ETH_MTU     1500

Qemu seems to be buggy - it ignores the MSS value, and always tries to
send 1500 byte frames.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

* Re: [Qemu-devel] TCP performance problems - GSO/TSO, MSS, 8139cp related
@ 2016-11-11 22:44     ` Russell King - ARM Linux
  0 siblings, 0 replies; 12+ messages in thread
From: Russell King - ARM Linux @ 2016-11-11 22:44 UTC (permalink / raw)
  To: David Woodhouse; +Cc: netdev, qemu-devel

On Fri, Nov 11, 2016 at 09:23:43PM +0000, David Woodhouse wrote:
> It's also *fairly* unlikely that the kernel in the guest has developed
> a bug and isn't setting gso_size sanely. I'm more inclined to suspect
> that qemu isn't properly emulating those bits. But at first glance at
> the code, it looks like *that's* been there for the last decade too...

I take issue with that, having looked at the qemu rtl8139 code:

                if ((txdw0 & CP_TX_LGSEN) && ip_protocol == IP_PROTO_TCP)
                {
                    int large_send_mss = (txdw0 >> 16) & CP_TC_LGSEN_MSS_MASK;

                    DPRINTF("+++ C+ mode offloaded task TSO MTU=%d IP data %d "
                        "frame data %d specified MSS=%d\n", ETH_MTU,
                        ip_data_len, saved_size - ETH_HLEN, large_send_mss);

That's the only reference to "large_send_mss" there.  Other than that,
the MSS value that 8139cp.c sticks into the descriptor field is
completely unused.  Instead, qemu does this:

                eth_payload_data = saved_buffer + ETH_HLEN;
                eth_payload_len  = saved_size   - ETH_HLEN;

                ip = (ip_header*)eth_payload_data;

                    hlen = IP_HEADER_LENGTH(ip);
                    ip_data_len = be16_to_cpu(ip->ip_len) - hlen;

                    tcp_header *p_tcp_hdr = (tcp_header*)(eth_payload_data + hlen);
                    int tcp_hlen = TCP_HEADER_DATA_OFFSET(p_tcp_hdr);

                    /* ETH_MTU = ip header len + tcp header len + payload */
                    int tcp_data_len = ip_data_len - tcp_hlen;
                    int tcp_chunk_size = ETH_MTU - hlen - tcp_hlen;

                    for (tcp_send_offset = 0; tcp_send_offset < tcp_data_len; tcp_send_offset += tcp_chunk_size)
                    {

It uses a fixed ETH_MTU value to calculate the size of the TCP
data chunks and, not surprisingly, that is the well known:

#define ETH_MTU     1500

Qemu seems to be buggy - it ignores the MSS value, and always tries to
send 1500 byte frames.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: TCP performance problems - GSO/TSO, MSS, 8139cp related
  2016-11-11 22:33     ` [Qemu-devel] " Russell King - ARM Linux
@ 2016-11-12  2:52       ` David Miller
  -1 siblings, 0 replies; 12+ messages in thread
From: David Miller @ 2016-11-12  2:52 UTC (permalink / raw)
  To: linux; +Cc: dwmw2, netdev, qemu-devel

From: Russell King - ARM Linux <linux@armlinux.org.uk>
Date: Fri, 11 Nov 2016 22:33:08 +0000

> "The new buffer management algorithm provides capabilities of Microsoft
> Large-Send offload" and as yet I haven't found anything that describes
> what this is or how it works.

For once I will give Microsoft a big shout out here.

This, and everything a Microsoft networking driver interfaces to, is
_very_ much documented in extreme detail in the Microsoft NDIS
(Network Driver Interface Specification).

Microsoft's networking driver interfaces and expectations are
documented 1,000 times better than those of Linux.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] TCP performance problems - GSO/TSO, MSS, 8139cp related
  2016-11-11 22:44     ` [Qemu-devel] " Russell King - ARM Linux
@ 2016-11-14  9:29       ` Dr. David Alan Gilbert
  -1 siblings, 0 replies; 12+ messages in thread
From: Dr. David Alan Gilbert @ 2016-11-14  9:29 UTC (permalink / raw)
  To: Russell King - ARM Linux, jasowang, vyasevic, stefanha
  Cc: David Woodhouse, netdev, qemu-devel

* Russell King - ARM Linux (linux@armlinux.org.uk) wrote:
> On Fri, Nov 11, 2016 at 09:23:43PM +0000, David Woodhouse wrote:
> > It's also *fairly* unlikely that the kernel in the guest has developed
> > a bug and isn't setting gso_size sanely. I'm more inclined to suspect
> > that qemu isn't properly emulating those bits. But at first glance at
> > the code, it looks like *that's* been there for the last decade too...
> 
> I take issue with that, having looked at the qemu rtl8139 code:
> 
>                 if ((txdw0 & CP_TX_LGSEN) && ip_protocol == IP_PROTO_TCP)
>                 {
>                     int large_send_mss = (txdw0 >> 16) & CP_TC_LGSEN_MSS_MASK;
> 
>                     DPRINTF("+++ C+ mode offloaded task TSO MTU=%d IP data %d "
>                         "frame data %d specified MSS=%d\n", ETH_MTU,
>                         ip_data_len, saved_size - ETH_HLEN, large_send_mss);
> 
> That's the only reference to "large_send_mss" there, other than that,
> the MSS value that gets stuck into the field by 8139cp.c is completely
> unused.  Instead, qemu does this:
> 
>                 eth_payload_data = saved_buffer + ETH_HLEN;
>                 eth_payload_len  = saved_size   - ETH_HLEN;
> 
>                 ip = (ip_header*)eth_payload_data;
> 
>                     hlen = IP_HEADER_LENGTH(ip);
>                     ip_data_len = be16_to_cpu(ip->ip_len) - hlen;
> 
>                     tcp_header *p_tcp_hdr = (tcp_header*)(eth_payload_data + hlen);
>                     int tcp_hlen = TCP_HEADER_DATA_OFFSET(p_tcp_hdr);
> 
>                     /* ETH_MTU = ip header len + tcp header len + payload */
>                     int tcp_data_len = ip_data_len - tcp_hlen;
>                     int tcp_chunk_size = ETH_MTU - hlen - tcp_hlen;
> 
>                     for (tcp_send_offset = 0; tcp_send_offset < tcp_data_len; tcp_send_offset += tcp_chunk_size)
>                     {
> 
> It uses a fixed value of ETH_MTU to calculate the size of the TCP
> data chunks, and this is not surprisingly the well known:
> 
> #define ETH_MTU     1500
> 
> Qemu seems to be buggy - it ignores the MSS value, and always tries to
> send 1500 byte frames.

cc'ing in Stefan who last touched that code and Jason and Vlad who
know the net code.

Dave

> -- 
> RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
> FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
> according to speedtest.net.
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] TCP performance problems - GSO/TSO, MSS, 8139cp related
  2016-11-14  9:29       ` Dr. David Alan Gilbert
  (?)
@ 2016-11-14 16:25       ` Stefan Hajnoczi
  -1 siblings, 0 replies; 12+ messages in thread
From: Stefan Hajnoczi @ 2016-11-14 16:25 UTC (permalink / raw)
  To: Russell King - ARM Linux, David Woodhouse
  Cc: jasowang, vyasevic, stefanha, netdev, qemu-devel,
	igor.v.kovalenko, dgilbert


On Mon, Nov 14, 2016 at 09:29:48AM +0000, Dr. David Alan Gilbert wrote:
> * Russell King - ARM Linux (linux@armlinux.org.uk) wrote:
> > On Fri, Nov 11, 2016 at 09:23:43PM +0000, David Woodhouse wrote:
> > > It's also *fairly* unlikely that the kernel in the guest has developed
> > > a bug and isn't setting gso_size sanely. I'm more inclined to suspect
> > > that qemu isn't properly emulating those bits. But at first glance at
> > > the code, it looks like *that's* been there for the last decade too...
> > 
> > I take issue with that, having looked at the qemu rtl8139 code:
> > 
> >                 if ((txdw0 & CP_TX_LGSEN) && ip_protocol == IP_PROTO_TCP)
> >                 {
> >                     int large_send_mss = (txdw0 >> 16) & CP_TC_LGSEN_MSS_MASK;
> > 
> >                     DPRINTF("+++ C+ mode offloaded task TSO MTU=%d IP data %d "
> >                         "frame data %d specified MSS=%d\n", ETH_MTU,
> >                         ip_data_len, saved_size - ETH_HLEN, large_send_mss);
> > 
> > That's the only reference to "large_send_mss" there, other than that,
> > the MSS value that gets stuck into the field by 8139cp.c is completely
> > unused.  Instead, qemu does this:
> > 
> >                 eth_payload_data = saved_buffer + ETH_HLEN;
> >                 eth_payload_len  = saved_size   - ETH_HLEN;
> > 
> >                 ip = (ip_header*)eth_payload_data;
> > 
> >                     hlen = IP_HEADER_LENGTH(ip);
> >                     ip_data_len = be16_to_cpu(ip->ip_len) - hlen;
> > 
> >                     tcp_header *p_tcp_hdr = (tcp_header*)(eth_payload_data + hlen);
> >                     int tcp_hlen = TCP_HEADER_DATA_OFFSET(p_tcp_hdr);
> > 
> >                     /* ETH_MTU = ip header len + tcp header len + payload */
> >                     int tcp_data_len = ip_data_len - tcp_hlen;
> >                     int tcp_chunk_size = ETH_MTU - hlen - tcp_hlen;
> > 
> >                     for (tcp_send_offset = 0; tcp_send_offset < tcp_data_len; tcp_send_offset += tcp_chunk_size)
> >                     {
> > 
> > It uses a fixed value of ETH_MTU to calculate the size of the TCP
> > data chunks, and this is not surprisingly the well known:
> > 
> > #define ETH_MTU     1500
> > 
> > Qemu seems to be buggy - it ignores the MSS value, and always tries to
> > send 1500 byte frames.
> 
> cc'ing in Stefan who last touched that code and Jason and Vlad who
> know the net code.

CCing Igor Kovalenko who implemented "fixed for TCP segmentation
offloading - removed dependency on slirp.h" in 2006.  I don't actually
expect him to remember this from 10 years ago though :).

Looking at the history, the large_send_mss variable was never used for
anything beyond the debug printf.

The datasheet for this NIC is here:
http://realtek.info/pdf/rtl8139cp.pdf.  See 9.2.1 Transmit.

Does this untested patch work for you?

diff --git a/hw/net/rtl8139.c b/hw/net/rtl8139.c
index f05e59c..a3f1af5 100644
--- a/hw/net/rtl8139.c
+++ b/hw/net/rtl8139.c
@@ -2167,9 +2167,13 @@ static int rtl8139_cplus_transmit_one(RTL8139State *s)
                     goto skip_offload;
                 }
 
-                /* ETH_MTU = ip header len + tcp header len + payload */
+                /* MSS too small */
+                if (tcp_hlen + hlen >= large_send_mss) {
+                    goto skip_offload;
+                }
+
                 int tcp_data_len = ip_data_len - tcp_hlen;
-                int tcp_chunk_size = ETH_MTU - hlen - tcp_hlen;
+                int tcp_chunk_size = large_send_mss - hlen - tcp_hlen;
 
                 DPRINTF("+++ C+ mode TSO IP data len %d TCP hlen %d TCP "
                     "data len %d TCP chunk size %d\n", ip_data_len,


^ permalink raw reply related	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2016-11-14 16:25 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-11-11 21:05 TCP performance problems - GSO/TSO, MSS, 8139cp related Russell King - ARM Linux
2016-11-11 21:23 ` David Woodhouse
2016-11-11 21:23   ` [Qemu-devel] " David Woodhouse
2016-11-11 22:33   ` Russell King - ARM Linux
2016-11-11 22:33     ` [Qemu-devel] " Russell King - ARM Linux
2016-11-12  2:52     ` David Miller
2016-11-12  2:52       ` [Qemu-devel] " David Miller
2016-11-11 22:44   ` Russell King - ARM Linux
2016-11-11 22:44     ` [Qemu-devel] " Russell King - ARM Linux
2016-11-14  9:29     ` Dr. David Alan Gilbert
2016-11-14  9:29       ` Dr. David Alan Gilbert
2016-11-14 16:25       ` Stefan Hajnoczi
