netdev.vger.kernel.org archive mirror
* xen-netfront possibly rides the rocket too often
@ 2014-05-13 18:21 Stefan Bader
  2014-05-14 19:49 ` Zoltan Kiss
  0 siblings, 1 reply; 19+ messages in thread
From: Stefan Bader @ 2014-05-13 18:21 UTC (permalink / raw)
  To: xen-devel, netdev

[-- Attachment #1: Type: text/plain, Size: 2263 bytes --]

We had reports about this message being seen on EC2 for a while, but finally a
reporter noticed some details about the guests and was able to provide a
simple way to reproduce it[1].

For my local experiments I use a Xen-4.2.2 based host (though I would say the
host versions are not important). The host has one NIC which is used as the
outgoing port of a Linux-based (not openvswitch) bridge, and the PV guests use
that bridge. I set the mtu to 9001 (which was seen on affected instance types)
and also inside the guests. As described in the report, one guest runs
redis-server and the other nodejs through two scripts (for me I had to run the
two sub.js calls in separate shells). After a bit the error messages appear on
the guest running the redis-server.

I added some debug printk's to show a bit more detail about the skb and got the
following (<length>@<offset (after masking off complete pages)>):

[ 698.108119] xen_netfront: xennet: skb rides the rocket: 19 slots
[ 698.108134] header 1490@238 -> 1 slots
[ 698.108139] frag #0 1614@2164 -> + 1 pages
[ 698.108143] frag #1 3038@1296 -> + 2 pages
[ 698.108147] frag #2 6076@1852 -> + 2 pages
[ 698.108151] frag #3 6076@292 -> + 2 pages
[ 698.108156] frag #4 6076@2828 -> + 3 pages
[ 698.108160] frag #5 3038@1268 -> + 2 pages
[ 698.108164] frag #6 2272@1824 -> + 1 pages
[ 698.108168] frag #7 3804@0 -> + 1 pages
[ 698.108172] frag #8 6076@264 -> + 2 pages
[ 698.108177] frag #9 3946@2800 -> + 2 pages
[ 698.108180] frags adding 18 slots
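
The instrumentation is nothing fancy, roughly along these lines (a simplified
sketch of what I used, not the exact patch; the helper name is made up, and it
would be called from xennet_start_xmit() next to the "rides the rocket" check):

        /* Debug sketch: print length@offset and the number of 4K slots each
         * part of the skb needs, mirroring the counting netfront itself does. */
        static void xennet_dump_skb_slots(struct sk_buff *skb)
        {
                unsigned long offset = offset_in_page(skb->data);
                unsigned long len = skb_headlen(skb);
                int i;

                pr_info("header %lu@%lu -> %lu slots\n", len, offset,
                        DIV_ROUND_UP(offset + len, PAGE_SIZE));

                for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
                        const skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
                        unsigned long size = skb_frag_size(frag);
                        /* mask off complete pages, keep the offset inside the 4K page */
                        unsigned long off = frag->page_offset & ~PAGE_MASK;

                        pr_info("frag #%d %lu@%lu -> + %lu pages\n", i, size, off,
                                DIV_ROUND_UP(off + size, PAGE_SIZE));
                }
        }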

Since I am not deeply familiar with the networking code, I wonder about two things:
- is there something that should limit the skb data length across all frags
  to stay below the 64K that the definition of MAX_SKB_FRAGS hints at?
- are multiple frags with offsets expected?

The latter is the problem here. If I did the maths right, the overall data size
is around 41K. But since frags 1, 4, 5, and 9 have an offset big enough to require
an additional page, the overall slot count goes up to 19.
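
Spelling that out with the driver's own formula (slots = DIV_ROUND_UP(offset +
len, 4096)), using the log above:

  header  1490@238  -> ceil(1728/4096) = 1
  frag #1 3038@1296 -> ceil(4334/4096) = 2  (would fit in one page without the offset)
  frag #4 6076@2828 -> ceil(8904/4096) = 3  (needs two pages, the offset adds a third)
  frag #5 3038@1268 -> ceil(4306/4096) = 2
  frag #9 3946@2800 -> ceil(6746/4096) = 2

The ten frags sum to 18 slots, plus one for the header makes 19, even though
the payload itself is well below 64K.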

If such a layout is valid, maybe the xen-netfront driver needs to reduce its
XEN_NETIF_MAX_TX_SIZE which currently is set to 64K? Or something else...

-Stefan

[1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1317811


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 901 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: xen-netfront possibly rides the rocket too often
  2014-05-13 18:21 xen-netfront possibly rides the rocket too often Stefan Bader
@ 2014-05-14 19:49 ` Zoltan Kiss
  2014-05-14 20:06   ` Zoltan Kiss
  2014-05-15  8:46   ` Ian Campbell
  0 siblings, 2 replies; 19+ messages in thread
From: Zoltan Kiss @ 2014-05-14 19:49 UTC (permalink / raw)
  To: Stefan Bader, xen-devel, netdev

On 13/05/14 19:21, Stefan Bader wrote:
> We had reports about this message being seen on EC2 for a while but finally a
> reporter did notice some details about the guests and was able to provide a
> simple way to reproduce[1].
>
> For my local experiments I use a Xen-4.2.2 based host (though I would say the
> host versions are not important). The host has one NIC which is used as the
> outgoing port of a Linux based (not openvswitch) bridge. And the PV guests use
> that bridge. I set the mtu to 9001 (which was seen on affected instance types)
> and also inside the guests. As described in the report one guests runs
> redis-server and the other nodejs through two scripts (for me I had to do the
> two sub.js calls in separate shells). After a bit the error messages appear on
> the guest running the redis-server.
>
> I added some debug printk's to show a bit more detail about the skb and got the
> following (<length>@<offset (after masking off complete pages)>):
>
> [ 698.108119] xen_netfront: xennet: skb rides the rocket: 19 slots
> [ 698.108134] header 1490@238 -> 1 slots
> [ 698.108139] frag #0 1614@2164 -> + 1 pages
> [ 698.108143] frag #1 3038@1296 -> + 2 pages
> [ 698.108147] frag #2 6076@1852 -> + 2 pages
> [ 698.108151] frag #3 6076@292 -> + 2 pages
> [ 698.108156] frag #4 6076@2828 -> + 3 pages
> [ 698.108160] frag #5 3038@1268 -> + 2 pages
> [ 698.108164] frag #6 2272@1824 -> + 1 pages
> [ 698.108168] frag #7 3804@0 -> + 1 pages
> [ 698.108172] frag #8 6076@264 -> + 2 pages
> [ 698.108177] frag #9 3946@2800 -> + 2 pages
> [ 698.108180] frags adding 18 slots
>
> Since I am not deeply familiar with the networking code, I wonder about two things:
> - is there something that should limit the skb data length from all frags
>    to stay below the 64K which the definition of MAX_SKB_FRAGS hints?
I think netfront should be able to handle 64K packets at most.
> - is multiple frags having offsets expected?
Yes, since compound pages were introduced a frag can span a 4K page boundary.
The problem is that in the netback/netfront protocol the assumption is that
every slot is a single page, because grant operations can only be done on a
4K page. And every slot ends up as a frag (except maybe the first; it can
happen that it is grant copied straight into the linear buffer), therefore
the frontend cannot send an skb which occupies more than MAX_SKB_FRAGS
individual 4K pages.
The problem has been known for a while; the solution, unfortunately, is not.
>
> The latter is the problem here. If I did the maths right, the overall data size
> is around 41K. But since frags 1,4,5, and 9 have an offset big enough to require
> an additional page, the overall slot count goes up to 19.
>
> If such a layout is valid, maybe the xen-netfront driver needs to reduce its
> XEN_NETIF_MAX_TX_SIZE which currently is set to 64K? Or something else...
>
> -Stefan
>
> [1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1317811
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: xen-netfront possibly rides the rocket too often
  2014-05-14 19:49 ` Zoltan Kiss
@ 2014-05-14 20:06   ` Zoltan Kiss
  2014-05-15  8:38     ` [Xen-devel] " Sander Eikelenboom
  2014-05-15  8:46   ` Ian Campbell
  1 sibling, 1 reply; 19+ messages in thread
From: Zoltan Kiss @ 2014-05-14 20:06 UTC (permalink / raw)
  To: Stefan Bader, xen-devel, netdev

On 14/05/14 20:49, Zoltan Kiss wrote:
> On 13/05/14 19:21, Stefan Bader wrote:
>> Since I am not deeply familiar with the networking code, I wonder
>> about two things:
>> - is there something that should limit the skb data length from all frags
>>    to stay below the 64K which the definition of MAX_SKB_FRAGS hints?
> I think netfront should be able to handle 64K packets at most.
>> - is multiple frags having offsets expected?
> Yes, since compound pages a frag could over the 4K page boundary. The
> problem is, that in the netback/front protocol the assumption is that
> every slot is a single page, because grant operations could be done only
> on a 4K page. And every slot ends up as a frag (expect maybe the first,
> it can happen it is grant copied straight to the linear buffer),
> therefore the frontend cannot send an skb which occupies more than
> MAX_SKB_FRAGS individual 4k page.
> The problem is known for a while, the solution is not, unfortunately.

I think the worst case scenario is when every frag and the linear buffer
contains 2 bytes, each overlapping a page boundary (that's (17+1)*2=36 pages
so far), plus 15 of them have a full 4K page in the middle, so a 1+4096+1
byte buffer can span 3 pages. That's 51 individual pages.

With the previous grant copy implementation there would be the option to
modify the backend and coalesce everything into a well formed skb. That
would be a minor change there. But with grant mapping it's harder.
Slots of compound pages could be mapped to adjacent pages in Dom0; maybe
somehow you could present them as compound pages in Dom0 as well. But in
MFN space they wouldn't be contiguous, so you would need SWIOTLB or an
IOMMU to hide that from the devices. Plus, what happens when you can't
find adjacent pending slots?

I think we would be better off at the moment trying to compact these
skbs a bit. Usually they overflow the limit by one or two slots, which
means we should reallocate one or two frags, or the linear buffer, to
decrease the number of 4K pages used.
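
To spell out the worst-case arithmetic above: MAX_SKB_FRAGS (17) frags plus
the linear buffer makes 18 buffers; with each holding 2 bytes that straddle a
page boundary, that is already 18 * 2 = 36 pages. A 1+4096+1 byte buffer laid
across page boundaries touches 3 pages instead of 2, and at most 15 of the 18
buffers can be that large before the 65535-byte wire limit is reached, so
36 + 15 = 51 individual 4K pages for well under 64K of useful data.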

Zoli

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Xen-devel] xen-netfront possibly rides the rocket too often
  2014-05-14 20:06   ` Zoltan Kiss
@ 2014-05-15  8:38     ` Sander Eikelenboom
  2014-05-15  9:03       ` Stefan Bader
  0 siblings, 1 reply; 19+ messages in thread
From: Sander Eikelenboom @ 2014-05-15  8:38 UTC (permalink / raw)
  To: Zoltan Kiss; +Cc: Stefan Bader, xen-devel, netdev


Wednesday, May 14, 2014, 10:06:41 PM, you wrote:

> On 14/05/14 20:49, Zoltan Kiss wrote:
>> On 13/05/14 19:21, Stefan Bader wrote:
>>> Since I am not deeply familiar with the networking code, I wonder
>>> about two things:
>>> - is there something that should limit the skb data length from all frags
>>>    to stay below the 64K which the definition of MAX_SKB_FRAGS hints?
>> I think netfront should be able to handle 64K packets at most.
>>> - is multiple frags having offsets expected?
>> Yes, since compound pages a frag could over the 4K page boundary. The
>> problem is, that in the netback/front protocol the assumption is that
>> every slot is a single page, because grant operations could be done only
>> on a 4K page. And every slot ends up as a frag (expect maybe the first,
>> it can happen it is grant copied straight to the linear buffer),
>> therefore the frontend cannot send an skb which occupies more than
>> MAX_SKB_FRAGS individual 4k page.
>> The problem is known for a while, the solution is not, unfortunately.

> I think the worst case scenario is when every frag and the linear buffer 
> contains 2 bytes, which are overlapping a page boundary (that's 
> (17+1)*2=36 so far), plus 15 of them have a 4k page in the middle of 
> them, so, a 1+4096+1 byte buffer can span over 3 page. That's 51 
> individual pages.
> With the previous grant copy implementation there would be the option to 
> modify backend and coalesce everything into a well formed skb. That 
> would be a minor change there. But with grant mapping it's harder.
> Slots of compound pages could be mapped to adjacent pages to Dom0, maybe 
> somehow you can present them as compound pages in Dom0 as well. But in 
> MFN space they wouldn't be contiguous, you need SWIOTLB or use IOMMU to 
> hide that from the devices. Plus, what happens when you can't find 
> adjacent pending slots?
> I think we would be better off at the moment with trying to compact 
> these skbs a bit. Usually they overflow the limit by one or two, which 
> means we should reallocate one or two frag, or the linear buffer to 
> decrease the number of 4K pages used.

How does virtio-net handle this? They would probably have run into the same problems.

--
Sander

> Zoli

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Xen-devel] xen-netfront possibly rides the rocket too often
  2014-05-14 19:49 ` Zoltan Kiss
  2014-05-14 20:06   ` Zoltan Kiss
@ 2014-05-15  8:46   ` Ian Campbell
  2014-05-15  8:58     ` Stefan Bader
  2014-05-15 11:04     ` Wei Liu
  1 sibling, 2 replies; 19+ messages in thread
From: Ian Campbell @ 2014-05-15  8:46 UTC (permalink / raw)
  To: Zoltan Kiss; +Cc: Stefan Bader, xen-devel, netdev, Wei Liu

On Wed, 2014-05-14 at 20:49 +0100, Zoltan Kiss wrote:
> On 13/05/14 19:21, Stefan Bader wrote:
> > We had reports about this message being seen on EC2 for a while but finally a
> > reporter did notice some details about the guests and was able to provide a
> > simple way to reproduce[1].
> >
> > For my local experiments I use a Xen-4.2.2 based host (though I would say the
> > host versions are not important). The host has one NIC which is used as the
> > outgoing port of a Linux based (not openvswitch) bridge. And the PV guests use
> > that bridge. I set the mtu to 9001 (which was seen on affected instance types)
> > and also inside the guests. As described in the report one guests runs
> > redis-server and the other nodejs through two scripts (for me I had to do the
> > two sub.js calls in separate shells). After a bit the error messages appear on
> > the guest running the redis-server.
> >
> > I added some debug printk's to show a bit more detail about the skb and got the
> > following (<length>@<offset (after masking off complete pages)>):
> >
> > [ 698.108119] xen_netfront: xennet: skb rides the rocket: 19 slots
> > [ 698.108134] header 1490@238 -> 1 slots
> > [ 698.108139] frag #0 1614@2164 -> + 1 pages
> > [ 698.108143] frag #1 3038@1296 -> + 2 pages
> > [ 698.108147] frag #2 6076@1852 -> + 2 pages
> > [ 698.108151] frag #3 6076@292 -> + 2 pages
> > [ 698.108156] frag #4 6076@2828 -> + 3 pages
> > [ 698.108160] frag #5 3038@1268 -> + 2 pages
> > [ 698.108164] frag #6 2272@1824 -> + 1 pages
> > [ 698.108168] frag #7 3804@0 -> + 1 pages
> > [ 698.108172] frag #8 6076@264 -> + 2 pages
> > [ 698.108177] frag #9 3946@2800 -> + 2 pages
> > [ 698.108180] frags adding 18 slots
> >
> > Since I am not deeply familiar with the networking code, I wonder about two things:
> > - is there something that should limit the skb data length from all frags
> >    to stay below the 64K which the definition of MAX_SKB_FRAGS hints?
> I think netfront should be able to handle 64K packets at most.

Ah, maybe this relates to this fix from Wei?

commit 9ecd1a75d977e2e8c48139c7d3efed183f898d94
Author: Wei Liu <wei.liu2@citrix.com>
Date:   Mon Apr 22 02:20:41 2013 +0000

    xen-netfront: reduce gso_max_size to account for max TCP header
    
    The maximum packet including header that can be handled by netfront / netback
    wire format is 65535. Reduce gso_max_size accordingly.
    
    Drop skb and print warning when skb->len > 65535. This can 1) save the effort
    to send malformed packet to netback, 2) help spotting misconfiguration of
    netfront in the future.
    
    Signed-off-by: Wei Liu <wei.liu2@citrix.com>
    Acked-by: Ian Campbell <ian.campbell@citrix.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 1bb2e20..1db10141 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -36,7 +36,7 @@
 #include <linux/skbuff.h>
 #include <linux/ethtool.h>
 #include <linux/if_ether.h>
-#include <linux/tcp.h>
+#include <net/tcp.h>
 #include <linux/udp.h>
 #include <linux/moduleparam.h>
 #include <linux/mm.h>
@@ -547,6 +547,16 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	unsigned int len = skb_headlen(skb);
 	unsigned long flags;
 
+	/* If skb->len is too big for wire format, drop skb and alert
+	 * user about misconfiguration.
+	 */
+	if (unlikely(skb->len > XEN_NETIF_MAX_TX_SIZE)) {
+		net_alert_ratelimited(
+			"xennet: skb->len = %u, too big for wire format\n",
+			skb->len);
+		goto drop;
+	}
+
 	slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
 		xennet_count_skb_frag_slots(skb);
 	if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
@@ -1058,7 +1068,8 @@ err:
 
 static int xennet_change_mtu(struct net_device *dev, int mtu)
 {
-	int max = xennet_can_sg(dev) ? 65535 - ETH_HLEN : ETH_DATA_LEN;
+	int max = xennet_can_sg(dev) ?
+		XEN_NETIF_MAX_TX_SIZE - MAX_TCP_HEADER : ETH_DATA_LEN;
 
 	if (mtu > max)
 		return -EINVAL;
@@ -1362,6 +1373,8 @@ static struct net_device *xennet_create_dev(struct xenbus_device *dev)
 	SET_ETHTOOL_OPS(netdev, &xennet_ethtool_ops);
 	SET_NETDEV_DEV(netdev, &dev->dev);
 
+	netif_set_gso_max_size(netdev, XEN_NETIF_MAX_TX_SIZE - MAX_TCP_HEADER);
+
 	np->netdev = netdev;
 
 	netif_carrier_off(netdev);
diff --git a/include/xen/interface/io/netif.h b/include/xen/interface/io/netif.h
index 9dfc120..58fadca 100644
--- a/include/xen/interface/io/netif.h
+++ b/include/xen/interface/io/netif.h
@@ -47,6 +47,7 @@
 #define _XEN_NETTXF_extra_info		(3)
 #define  XEN_NETTXF_extra_info		(1U<<_XEN_NETTXF_extra_info)
 
+#define XEN_NETIF_MAX_TX_SIZE 0xFFFF
 struct xen_netif_tx_request {
     grant_ref_t gref;      /* Reference to buffer page */
     uint16_t offset;       /* Offset within buffer page */

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [Xen-devel] xen-netfront possibly rides the rocket too often
  2014-05-15  8:46   ` Ian Campbell
@ 2014-05-15  8:58     ` Stefan Bader
  2014-05-15  9:38       ` Sander Eikelenboom
  2014-05-15 11:04     ` Wei Liu
  1 sibling, 1 reply; 19+ messages in thread
From: Stefan Bader @ 2014-05-15  8:58 UTC (permalink / raw)
  To: Ian Campbell, Zoltan Kiss; +Cc: xen-devel, Wei Liu, netdev

[-- Attachment #1: Type: text/plain, Size: 6275 bytes --]

On 15.05.2014 10:46, Ian Campbell wrote:
> On Wed, 2014-05-14 at 20:49 +0100, Zoltan Kiss wrote:
>> On 13/05/14 19:21, Stefan Bader wrote:
>>> We had reports about this message being seen on EC2 for a while but finally a
>>> reporter did notice some details about the guests and was able to provide a
>>> simple way to reproduce[1].
>>>
>>> For my local experiments I use a Xen-4.2.2 based host (though I would say the
>>> host versions are not important). The host has one NIC which is used as the
>>> outgoing port of a Linux based (not openvswitch) bridge. And the PV guests use
>>> that bridge. I set the mtu to 9001 (which was seen on affected instance types)
>>> and also inside the guests. As described in the report one guests runs
>>> redis-server and the other nodejs through two scripts (for me I had to do the
>>> two sub.js calls in separate shells). After a bit the error messages appear on
>>> the guest running the redis-server.
>>>
>>> I added some debug printk's to show a bit more detail about the skb and got the
>>> following (<length>@<offset (after masking off complete pages)>):
>>>
>>> [ 698.108119] xen_netfront: xennet: skb rides the rocket: 19 slots
>>> [ 698.108134] header 1490@238 -> 1 slots
>>> [ 698.108139] frag #0 1614@2164 -> + 1 pages
>>> [ 698.108143] frag #1 3038@1296 -> + 2 pages
>>> [ 698.108147] frag #2 6076@1852 -> + 2 pages
>>> [ 698.108151] frag #3 6076@292 -> + 2 pages
>>> [ 698.108156] frag #4 6076@2828 -> + 3 pages
>>> [ 698.108160] frag #5 3038@1268 -> + 2 pages
>>> [ 698.108164] frag #6 2272@1824 -> + 1 pages
>>> [ 698.108168] frag #7 3804@0 -> + 1 pages
>>> [ 698.108172] frag #8 6076@264 -> + 2 pages
>>> [ 698.108177] frag #9 3946@2800 -> + 2 pages
>>> [ 698.108180] frags adding 18 slots
>>>
>>> Since I am not deeply familiar with the networking code, I wonder about two things:
>>> - is there something that should limit the skb data length from all frags
>>>    to stay below the 64K which the definition of MAX_SKB_FRAGS hints?
>> I think netfront should be able to handle 64K packets at most.
> 
> Ah, maybe this relates to this fix from Wei?

That indeed should (imo) limit the overall size. And if that still happened,
we would see a different message. The problem I see is not the overall size
but the layout of the frags. Since multiple frags start at a high enough
offset, the slot count goes too high.

Now I have to read that code in more detail, but there also have been some
changes which introduce a make-frags function. I wonder whether there is
already some kind of shifting (or copying) going on and whether it's just the
check that needs to improve. But right now that is speculation.

For the fun of it I actually found an even simpler way to trigger it (I still
need to confirm it without mucking around with the mtu first). It looked like
the mtu actually did not make a difference. One only needs the redis-server on
one guest and to run redis-benchmark from the other with (I think) -d 1000, or
whatever it is that sets the data size. Then on the range tests this happens
quite often.

-Stefan
> 
> commit 9ecd1a75d977e2e8c48139c7d3efed183f898d94
> Author: Wei Liu <wei.liu2@citrix.com>
> Date:   Mon Apr 22 02:20:41 2013 +0000
> 
>     xen-netfront: reduce gso_max_size to account for max TCP header
>     
>     The maximum packet including header that can be handled by netfront / netback
>     wire format is 65535. Reduce gso_max_size accordingly.
>     
>     Drop skb and print warning when skb->len > 65535. This can 1) save the effort
>     to send malformed packet to netback, 2) help spotting misconfiguration of
>     netfront in the future.
>     
>     Signed-off-by: Wei Liu <wei.liu2@citrix.com>
>     Acked-by: Ian Campbell <ian.campbell@citrix.com>
>     Signed-off-by: David S. Miller <davem@davemloft.net>
> 
> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> index 1bb2e20..1db10141 100644
> --- a/drivers/net/xen-netfront.c
> +++ b/drivers/net/xen-netfront.c
> @@ -36,7 +36,7 @@
>  #include <linux/skbuff.h>
>  #include <linux/ethtool.h>
>  #include <linux/if_ether.h>
> -#include <linux/tcp.h>
> +#include <net/tcp.h>
>  #include <linux/udp.h>
>  #include <linux/moduleparam.h>
>  #include <linux/mm.h>
> @@ -547,6 +547,16 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
>  	unsigned int len = skb_headlen(skb);
>  	unsigned long flags;
>  
> +	/* If skb->len is too big for wire format, drop skb and alert
> +	 * user about misconfiguration.
> +	 */
> +	if (unlikely(skb->len > XEN_NETIF_MAX_TX_SIZE)) {
> +		net_alert_ratelimited(
> +			"xennet: skb->len = %u, too big for wire format\n",
> +			skb->len);
> +		goto drop;
> +	}
> +
>  	slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
>  		xennet_count_skb_frag_slots(skb);
>  	if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
> @@ -1058,7 +1068,8 @@ err:
>  
>  static int xennet_change_mtu(struct net_device *dev, int mtu)
>  {
> -	int max = xennet_can_sg(dev) ? 65535 - ETH_HLEN : ETH_DATA_LEN;
> +	int max = xennet_can_sg(dev) ?
> +		XEN_NETIF_MAX_TX_SIZE - MAX_TCP_HEADER : ETH_DATA_LEN;
>  
>  	if (mtu > max)
>  		return -EINVAL;
> @@ -1362,6 +1373,8 @@ static struct net_device *xennet_create_dev(struct xenbus_device *dev)
>  	SET_ETHTOOL_OPS(netdev, &xennet_ethtool_ops);
>  	SET_NETDEV_DEV(netdev, &dev->dev);
>  
> +	netif_set_gso_max_size(netdev, XEN_NETIF_MAX_TX_SIZE - MAX_TCP_HEADER);
> +
>  	np->netdev = netdev;
>  
>  	netif_carrier_off(netdev);
> diff --git a/include/xen/interface/io/netif.h b/include/xen/interface/io/netif.h
> index 9dfc120..58fadca 100644
> --- a/include/xen/interface/io/netif.h
> +++ b/include/xen/interface/io/netif.h
> @@ -47,6 +47,7 @@
>  #define _XEN_NETTXF_extra_info		(3)
>  #define  XEN_NETTXF_extra_info		(1U<<_XEN_NETTXF_extra_info)
>  
> +#define XEN_NETIF_MAX_TX_SIZE 0xFFFF
>  struct xen_netif_tx_request {
>      grant_ref_t gref;      /* Reference to buffer page */
>      uint16_t offset;       /* Offset within buffer page */
> 
> 
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 901 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Xen-devel] xen-netfront possibly rides the rocket too often
  2014-05-15  8:38     ` [Xen-devel] " Sander Eikelenboom
@ 2014-05-15  9:03       ` Stefan Bader
  0 siblings, 0 replies; 19+ messages in thread
From: Stefan Bader @ 2014-05-15  9:03 UTC (permalink / raw)
  To: Sander Eikelenboom, Zoltan Kiss; +Cc: xen-devel, netdev

[-- Attachment #1: Type: text/plain, Size: 2784 bytes --]

On 15.05.2014 10:38, Sander Eikelenboom wrote:
> 
> Wednesday, May 14, 2014, 10:06:41 PM, you wrote:
> 
>> On 14/05/14 20:49, Zoltan Kiss wrote:
>>> On 13/05/14 19:21, Stefan Bader wrote:
>>>> Since I am not deeply familiar with the networking code, I wonder
>>>> about two things:
>>>> - is there something that should limit the skb data length from all frags
>>>>    to stay below the 64K which the definition of MAX_SKB_FRAGS hints?
>>> I think netfront should be able to handle 64K packets at most.
>>>> - is multiple frags having offsets expected?
>>> Yes, since compound pages a frag could over the 4K page boundary. The
>>> problem is, that in the netback/front protocol the assumption is that
>>> every slot is a single page, because grant operations could be done only
>>> on a 4K page. And every slot ends up as a frag (expect maybe the first,
>>> it can happen it is grant copied straight to the linear buffer),
>>> therefore the frontend cannot send an skb which occupies more than
>>> MAX_SKB_FRAGS individual 4k page.
>>> The problem is known for a while, the solution is not, unfortunately.
> 
>> I think the worst case scenario is when every frag and the linear buffer 
>> contains 2 bytes, which are overlapping a page boundary (that's 
>> (17+1)*2=36 so far), plus 15 of them have a 4k page in the middle of 
>> them, so, a 1+4096+1 byte buffer can span over 3 page. That's 51 
>> individual pages.
>> With the previous grant copy implementation there would be the option to 
>> modify backend and coalesce everything into a well formed skb. That 
>> would be a minor change there. But with grant mapping it's harder.
>> Slots of compound pages could be mapped to adjacent pages to Dom0, maybe 
>> somehow you can present them as compound pages in Dom0 as well. But in 
>> MFN space they wouldn't be contiguous, you need SWIOTLB or use IOMMU to 
>> hide that from the devices. Plus, what happens when you can't find 
>> adjacent pending slots?
>> I think we would be better off at the moment with trying to compact 
>> these skbs a bit. Usually they overflow the limit by one or two, which 
>> means we should reallocate one or two frag, or the linear buffer to 
>> decrease the number of 4K pages used.
> 
> How does virtio-net handle this, the would probably have ran into the same problems ?

Maybe, though it seems not many things cause this kind of traffic, and I
could not say whether they have to handle a limited set of ring pages, too. But
it's something to keep in mind when digging deeper into it.

-Stefan
> 
> --
> Sander
> 
>> Zoli
> 
> 
> 
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 901 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Xen-devel] xen-netfront possibly rides the rocket too often
  2014-05-15  8:58     ` Stefan Bader
@ 2014-05-15  9:38       ` Sander Eikelenboom
  0 siblings, 0 replies; 19+ messages in thread
From: Sander Eikelenboom @ 2014-05-15  9:38 UTC (permalink / raw)
  To: Stefan Bader; +Cc: Ian Campbell, Zoltan Kiss, xen-devel, Wei Liu, netdev



Thursday, May 15, 2014, 10:58:38 AM, you wrote:

> On 15.05.2014 10:46, Ian Campbell wrote:
>> On Wed, 2014-05-14 at 20:49 +0100, Zoltan Kiss wrote:
>>> On 13/05/14 19:21, Stefan Bader wrote:
>>>> We had reports about this message being seen on EC2 for a while but finally a
>>>> reporter did notice some details about the guests and was able to provide a
>>>> simple way to reproduce[1].
>>>>
>>>> For my local experiments I use a Xen-4.2.2 based host (though I would say the
>>>> host versions are not important). The host has one NIC which is used as the
>>>> outgoing port of a Linux based (not openvswitch) bridge. And the PV guests use
>>>> that bridge. I set the mtu to 9001 (which was seen on affected instance types)
>>>> and also inside the guests. As described in the report one guests runs
>>>> redis-server and the other nodejs through two scripts (for me I had to do the
>>>> two sub.js calls in separate shells). After a bit the error messages appear on
>>>> the guest running the redis-server.
>>>>
>>>> I added some debug printk's to show a bit more detail about the skb and got the
>>>> following (<length>@<offset (after masking off complete pages)>):
>>>>
>>>> [ 698.108119] xen_netfront: xennet: skb rides the rocket: 19 slots
>>>> [ 698.108134] header 1490@238 -> 1 slots
>>>> [ 698.108139] frag #0 1614@2164 -> + 1 pages
>>>> [ 698.108143] frag #1 3038@1296 -> + 2 pages
>>>> [ 698.108147] frag #2 6076@1852 -> + 2 pages
>>>> [ 698.108151] frag #3 6076@292 -> + 2 pages
>>>> [ 698.108156] frag #4 6076@2828 -> + 3 pages
>>>> [ 698.108160] frag #5 3038@1268 -> + 2 pages
>>>> [ 698.108164] frag #6 2272@1824 -> + 1 pages
>>>> [ 698.108168] frag #7 3804@0 -> + 1 pages
>>>> [ 698.108172] frag #8 6076@264 -> + 2 pages
>>>> [ 698.108177] frag #9 3946@2800 -> + 2 pages
>>>> [ 698.108180] frags adding 18 slots
>>>>
>>>> Since I am not deeply familiar with the networking code, I wonder about two things:
>>>> - is there something that should limit the skb data length from all frags
>>>>    to stay below the 64K which the definition of MAX_SKB_FRAGS hints?
>>> I think netfront should be able to handle 64K packets at most.
>> 
>> Ah, maybe this relates to this fix from Wei?

> That indeed should (imo) limit the overall size. And if that would happen still,
> we would see the different message. The problem I see is not the overall size
> but the layout of the frags. Since multiple start at an offset high enough, the
> count of slots goes too high.

> Now I have to read that code in more detail, but there also has been some
> changes which introduce a make frags function. I wonder whether maybe there is
> already some kind of shifting (or copying) going on and whether its just the
> check that needs to improve. But right now that is speculation.

Netback seems to be able to cope with it, but properly calculating and checking 
the needed slots proved difficult (see discussion with Paul Durrant for 3.14).

From my tests there it seemed that the maximum slot count was capped at
MAX_SKB_FRAGS. I assume that could be explained by packets having a maximum
size: if there are frags whose offsets require more pages per frag, that is
counterbalanced, since the total packet size is already limited, so the packet
would have to have fewer frags.

That's at least what the netback code now uses as a calculation method for
the required slots: pessimistic, with a cap at MAX_SKB_FRAGS.

But I'm not 100% sure the assumption is theoretically sound.

> For the fun I actually found an even simpler way to trigger it and (need to
> confirm it yet without mucking around with mtu before). It looked like mtu
> actually did not make a difference. One only needed the redis server on one
> guest and run redis-benchmark from the other with (I think -d 1000, or whatever
> it is that sets the datasize). Then on the range tests this happens quite often.

> -Stefan
>> 
>> commit 9ecd1a75d977e2e8c48139c7d3efed183f898d94
>> Author: Wei Liu <wei.liu2@citrix.com>
>> Date:   Mon Apr 22 02:20:41 2013 +0000
>> 
>>     xen-netfront: reduce gso_max_size to account for max TCP header
>>     
>>     The maximum packet including header that can be handled by netfront / netback
>>     wire format is 65535. Reduce gso_max_size accordingly.
>>     
>>     Drop skb and print warning when skb->len > 65535. This can 1) save the effort
>>     to send malformed packet to netback, 2) help spotting misconfiguration of
>>     netfront in the future.
>>     
>>     Signed-off-by: Wei Liu <wei.liu2@citrix.com>
>>     Acked-by: Ian Campbell <ian.campbell@citrix.com>
>>     Signed-off-by: David S. Miller <davem@davemloft.net>
>> 
>> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
>> index 1bb2e20..1db10141 100644
>> --- a/drivers/net/xen-netfront.c
>> +++ b/drivers/net/xen-netfront.c
>> @@ -36,7 +36,7 @@
>>  #include <linux/skbuff.h>
>>  #include <linux/ethtool.h>
>>  #include <linux/if_ether.h>
>> -#include <linux/tcp.h>
>> +#include <net/tcp.h>
>>  #include <linux/udp.h>
>>  #include <linux/moduleparam.h>
>>  #include <linux/mm.h>
>> @@ -547,6 +547,16 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
>>       unsigned int len = skb_headlen(skb);
>>       unsigned long flags;
>>  
>> +     /* If skb->len is too big for wire format, drop skb and alert
>> +      * user about misconfiguration.
>> +      */
>> +     if (unlikely(skb->len > XEN_NETIF_MAX_TX_SIZE)) {
>> +             net_alert_ratelimited(
>> +                     "xennet: skb->len = %u, too big for wire format\n",
>> +                     skb->len);
>> +             goto drop;
>> +     }
>> +
>>       slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
>>               xennet_count_skb_frag_slots(skb);
>>       if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
>> @@ -1058,7 +1068,8 @@ err:
>>  
>>  static int xennet_change_mtu(struct net_device *dev, int mtu)
>>  {
>> -     int max = xennet_can_sg(dev) ? 65535 - ETH_HLEN : ETH_DATA_LEN;
>> +     int max = xennet_can_sg(dev) ?
>> +             XEN_NETIF_MAX_TX_SIZE - MAX_TCP_HEADER : ETH_DATA_LEN;
>>  
>>       if (mtu > max)
>>               return -EINVAL;
>> @@ -1362,6 +1373,8 @@ static struct net_device *xennet_create_dev(struct xenbus_device *dev)
>>       SET_ETHTOOL_OPS(netdev, &xennet_ethtool_ops);
>>       SET_NETDEV_DEV(netdev, &dev->dev);
>>  
>> +     netif_set_gso_max_size(netdev, XEN_NETIF_MAX_TX_SIZE - MAX_TCP_HEADER);
>> +
>>       np->netdev = netdev;
>>  
>>       netif_carrier_off(netdev);
>> diff --git a/include/xen/interface/io/netif.h b/include/xen/interface/io/netif.h
>> index 9dfc120..58fadca 100644
>> --- a/include/xen/interface/io/netif.h
>> +++ b/include/xen/interface/io/netif.h
>> @@ -47,6 +47,7 @@
>>  #define _XEN_NETTXF_extra_info               (3)
>>  #define  XEN_NETTXF_extra_info               (1U<<_XEN_NETTXF_extra_info)
>>  
>> +#define XEN_NETIF_MAX_TX_SIZE 0xFFFF
>>  struct xen_netif_tx_request {
>>      grant_ref_t gref;      /* Reference to buffer page */
>>      uint16_t offset;       /* Offset within buffer page */
>> 
>> 
>> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Xen-devel] xen-netfront possibly rides the rocket too often
  2014-05-15  8:46   ` Ian Campbell
  2014-05-15  8:58     ` Stefan Bader
@ 2014-05-15 11:04     ` Wei Liu
  2014-05-15 11:14       ` David Laight
                         ` (2 more replies)
  1 sibling, 3 replies; 19+ messages in thread
From: Wei Liu @ 2014-05-15 11:04 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Zoltan Kiss, Stefan Bader, xen-devel, netdev, Wei Liu

On Thu, May 15, 2014 at 09:46:45AM +0100, Ian Campbell wrote:
> On Wed, 2014-05-14 at 20:49 +0100, Zoltan Kiss wrote:
> > On 13/05/14 19:21, Stefan Bader wrote:
> > > We had reports about this message being seen on EC2 for a while but finally a
> > > reporter did notice some details about the guests and was able to provide a
> > > simple way to reproduce[1].
> > >
> > > For my local experiments I use a Xen-4.2.2 based host (though I would say the
> > > host versions are not important). The host has one NIC which is used as the
> > > outgoing port of a Linux based (not openvswitch) bridge. And the PV guests use
> > > that bridge. I set the mtu to 9001 (which was seen on affected instance types)
> > > and also inside the guests. As described in the report one guests runs
> > > redis-server and the other nodejs through two scripts (for me I had to do the
> > > two sub.js calls in separate shells). After a bit the error messages appear on
> > > the guest running the redis-server.
> > >
> > > I added some debug printk's to show a bit more detail about the skb and got the
> > > following (<length>@<offset (after masking off complete pages)>):
> > >
> > > [ 698.108119] xen_netfront: xennet: skb rides the rocket: 19 slots
> > > [ 698.108134] header 1490@238 -> 1 slots
> > > [ 698.108139] frag #0 1614@2164 -> + 1 pages
> > > [ 698.108143] frag #1 3038@1296 -> + 2 pages
> > > [ 698.108147] frag #2 6076@1852 -> + 2 pages
> > > [ 698.108151] frag #3 6076@292 -> + 2 pages
> > > [ 698.108156] frag #4 6076@2828 -> + 3 pages
> > > [ 698.108160] frag #5 3038@1268 -> + 2 pages
> > > [ 698.108164] frag #6 2272@1824 -> + 1 pages
> > > [ 698.108168] frag #7 3804@0 -> + 1 pages
> > > [ 698.108172] frag #8 6076@264 -> + 2 pages
> > > [ 698.108177] frag #9 3946@2800 -> + 2 pages
> > > [ 698.108180] frags adding 18 slots
> > >
> > > Since I am not deeply familiar with the networking code, I wonder about two things:
> > > - is there something that should limit the skb data length from all frags
> > >    to stay below the 64K which the definition of MAX_SKB_FRAGS hints?
> > I think netfront should be able to handle 64K packets at most.
> 
> Ah, maybe this relates to this fix from Wei?
> 

Yes, the patch below limits the SKB size to 64KB. However the problem here is
not the SKB exceeding 64KB. The said SKB is actually 43KB in size. The
problem is that the guest kernel is using compound pages, so a frag which
could fit into one 4K page spans two 4K pages. The fix seems to be
coalescing the SKB in the frontend, but it will degrade performance.

Wei.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [Xen-devel] xen-netfront possibly rides the rocket too often
  2014-05-15 11:04     ` Wei Liu
@ 2014-05-15 11:14       ` David Laight
  2014-05-15 11:47       ` Ian Campbell
  2014-05-15 12:14       ` Stefan Bader
  2 siblings, 0 replies; 19+ messages in thread
From: David Laight @ 2014-05-15 11:14 UTC (permalink / raw)
  To: 'Wei Liu', Ian Campbell
  Cc: Zoltan Kiss, Stefan Bader, xen-devel, netdev

From: Wei Liu
> On Thu, May 15, 2014 at 09:46:45AM +0100, Ian Campbell wrote:
> > On Wed, 2014-05-14 at 20:49 +0100, Zoltan Kiss wrote:
> > > On 13/05/14 19:21, Stefan Bader wrote:
> > > > We had reports about this message being seen on EC2 for a while but finally a
> > > > reporter did notice some details about the guests and was able to provide a
> > > > simple way to reproduce[1].
> > > >
> > > > For my local experiments I use a Xen-4.2.2 based host (though I would say the
> > > > host versions are not important). The host has one NIC which is used as the
> > > > outgoing port of a Linux based (not openvswitch) bridge. And the PV guests use
> > > > that bridge. I set the mtu to 9001 (which was seen on affected instance types)
> > > > and also inside the guests. As described in the report one guests runs
> > > > redis-server and the other nodejs through two scripts (for me I had to do the
> > > > two sub.js calls in separate shells). After a bit the error messages appear on
> > > > the guest running the redis-server.
> > > >
> > > > I added some debug printk's to show a bit more detail about the skb and got the
> > > > following (<length>@<offset (after masking off complete pages)>):
> > > >
> > > > [ 698.108119] xen_netfront: xennet: skb rides the rocket: 19 slots
> > > > [ 698.108134] header 1490@238 -> 1 slots
> > > > [ 698.108139] frag #0 1614@2164 -> + 1 pages
> > > > [ 698.108143] frag #1 3038@1296 -> + 2 pages
> > > > [ 698.108147] frag #2 6076@1852 -> + 2 pages
> > > > [ 698.108151] frag #3 6076@292 -> + 2 pages
> > > > [ 698.108156] frag #4 6076@2828 -> + 3 pages
> > > > [ 698.108160] frag #5 3038@1268 -> + 2 pages
> > > > [ 698.108164] frag #6 2272@1824 -> + 1 pages
> > > > [ 698.108168] frag #7 3804@0 -> + 1 pages
> > > > [ 698.108172] frag #8 6076@264 -> + 2 pages
> > > > [ 698.108177] frag #9 3946@2800 -> + 2 pages
> > > > [ 698.108180] frags adding 18 slots
> > > >
> > > > Since I am not deeply familiar with the networking code, I wonder about two things:
> > > > - is there something that should limit the skb data length from all frags
> > > >    to stay below the 64K which the definition of MAX_SKB_FRAGS hints?
> > > I think netfront should be able to handle 64K packets at most.
> >
> > Ah, maybe this relates to this fix from Wei?
> >
> 
> Yes, below patch limits SKB size to 64KB.  However the problem here is
> not SKB exceeding 64KB. The said SKB is acutally 43KB in size. The
> problem is that guest kernel is  using compound page so a frag which can
> be fit into one 4K page spans two 4K pages.  The fix seems to be
> coalescing SKB in frontend, but it will degrade performance.

Presumably the thing to do is copy the frag list into a locally allocated
list of 4k fragments.
Unless you can support having some fragments mapped and some local,
there probably isn't much point in trying to maintain the fragment
boundaries.
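
Something along these lines, perhaps (untested sketch only; the helper name is
made up, and error unwinding, truesize accounting and zero-copy corner cases
are ignored):

        /* Replace the skb's frag list with freshly allocated, page-aligned 4K
         * fragments, so every resulting frag maps to exactly one grant slot.
         * With skb->len capped at 64K this needs at most 16 pages, which fits
         * within MAX_SKB_FRAGS. */
        static int xennet_repack_frags(struct sk_buff *skb)
        {
                unsigned int copied = skb_headlen(skb);
                unsigned int left = skb->len - copied;  /* bytes held in frags */
                skb_frag_t frags[MAX_SKB_FRAGS];
                int i, n = 0;

                while (left) {
                        unsigned int chunk = min_t(unsigned int, left, PAGE_SIZE);
                        struct page *page = alloc_page(GFP_ATOMIC);

                        if (!page)
                                return -ENOMEM;
                        /* gather the next chunk out of the old, misaligned frags */
                        skb_copy_bits(skb, copied, page_address(page), chunk);
                        __skb_frag_set_page(&frags[n], page);
                        frags[n].page_offset = 0;
                        skb_frag_size_set(&frags[n], chunk);
                        copied += chunk;
                        left -= chunk;
                        n++;
                }

                /* drop the old fragments and install the page-aligned ones */
                for (i = 0; i < skb_shinfo(skb)->nr_frags; i++)
                        skb_frag_unref(skb, i);
                memcpy(skb_shinfo(skb)->frags, frags, n * sizeof(frags[0]));
                skb_shinfo(skb)->nr_frags = n;

                return 0;
        }

That would keep the linear part untouched and copy the fragment data only once.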

	David

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Xen-devel] xen-netfront possibly rides the rocket too often
  2014-05-15 11:04     ` Wei Liu
  2014-05-15 11:14       ` David Laight
@ 2014-05-15 11:47       ` Ian Campbell
  2014-05-15 12:14       ` Stefan Bader
  2 siblings, 0 replies; 19+ messages in thread
From: Ian Campbell @ 2014-05-15 11:47 UTC (permalink / raw)
  To: Wei Liu; +Cc: Zoltan Kiss, Stefan Bader, xen-devel, netdev

On Thu, 2014-05-15 at 12:04 +0100, Wei Liu wrote:
> On Thu, May 15, 2014 at 09:46:45AM +0100, Ian Campbell wrote:
> > On Wed, 2014-05-14 at 20:49 +0100, Zoltan Kiss wrote:
> > > On 13/05/14 19:21, Stefan Bader wrote:
> > > > We had reports about this message being seen on EC2 for a while but finally a
> > > > reporter did notice some details about the guests and was able to provide a
> > > > simple way to reproduce[1].
> > > >
> > > > For my local experiments I use a Xen-4.2.2 based host (though I would say the
> > > > host versions are not important). The host has one NIC which is used as the
> > > > outgoing port of a Linux based (not openvswitch) bridge. And the PV guests use
> > > > that bridge. I set the mtu to 9001 (which was seen on affected instance types)
> > > > and also inside the guests. As described in the report one guests runs
> > > > redis-server and the other nodejs through two scripts (for me I had to do the
> > > > two sub.js calls in separate shells). After a bit the error messages appear on
> > > > the guest running the redis-server.
> > > >
> > > > I added some debug printk's to show a bit more detail about the skb and got the
> > > > following (<length>@<offset (after masking off complete pages)>):
> > > >
> > > > [ 698.108119] xen_netfront: xennet: skb rides the rocket: 19 slots
> > > > [ 698.108134] header 1490@238 -> 1 slots
> > > > [ 698.108139] frag #0 1614@2164 -> + 1 pages
> > > > [ 698.108143] frag #1 3038@1296 -> + 2 pages
> > > > [ 698.108147] frag #2 6076@1852 -> + 2 pages
> > > > [ 698.108151] frag #3 6076@292 -> + 2 pages
> > > > [ 698.108156] frag #4 6076@2828 -> + 3 pages
> > > > [ 698.108160] frag #5 3038@1268 -> + 2 pages
> > > > [ 698.108164] frag #6 2272@1824 -> + 1 pages
> > > > [ 698.108168] frag #7 3804@0 -> + 1 pages
> > > > [ 698.108172] frag #8 6076@264 -> + 2 pages
> > > > [ 698.108177] frag #9 3946@2800 -> + 2 pages
> > > > [ 698.108180] frags adding 18 slots
> > > >
> > > > Since I am not deeply familiar with the networking code, I wonder about two things:
> > > > - is there something that should limit the skb data length from all frags
> > > >    to stay below the 64K which the definition of MAX_SKB_FRAGS hints?
> > > I think netfront should be able to handle 64K packets at most.
> > 
> > Ah, maybe this relates to this fix from Wei?
> > 
> 
> Yes, below patch limits SKB size to 64KB.  However the problem here is
> not SKB exceeding 64KB. The said SKB is acutally 43KB in size. The
> problem is that guest kernel is  using compound page so a frag which can
> be fit into one 4K page spans two 4K pages.  The fix seems to be
> coalescing SKB in frontend, but it will degrade performance.

So long as it only happens when this scenario occurs, a performance
degradation would seem preferable to dropping the skb altogether.

Ian.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Xen-devel] xen-netfront possibly rides the rocket too often
  2014-05-15 11:04     ` Wei Liu
  2014-05-15 11:14       ` David Laight
  2014-05-15 11:47       ` Ian Campbell
@ 2014-05-15 12:14       ` Stefan Bader
  2014-05-16  9:48         ` Wei Liu
  2 siblings, 1 reply; 19+ messages in thread
From: Stefan Bader @ 2014-05-15 12:14 UTC (permalink / raw)
  To: Wei Liu, Ian Campbell; +Cc: Zoltan Kiss, xen-devel, netdev

[-- Attachment #1: Type: text/plain, Size: 3716 bytes --]

On 15.05.2014 13:04, Wei Liu wrote:
> On Thu, May 15, 2014 at 09:46:45AM +0100, Ian Campbell wrote:
>> On Wed, 2014-05-14 at 20:49 +0100, Zoltan Kiss wrote:
>>> On 13/05/14 19:21, Stefan Bader wrote:
>>>> We had reports about this message being seen on EC2 for a while but finally a
>>>> reporter did notice some details about the guests and was able to provide a
>>>> simple way to reproduce[1].
>>>>
>>>> For my local experiments I use a Xen-4.2.2 based host (though I would say the
>>>> host versions are not important). The host has one NIC which is used as the
>>>> outgoing port of a Linux based (not openvswitch) bridge. And the PV guests use
>>>> that bridge. I set the mtu to 9001 (which was seen on affected instance types)
>>>> and also inside the guests. As described in the report one guests runs
>>>> redis-server and the other nodejs through two scripts (for me I had to do the
>>>> two sub.js calls in separate shells). After a bit the error messages appear on
>>>> the guest running the redis-server.
>>>>
>>>> I added some debug printk's to show a bit more detail about the skb and got the
>>>> following (<length>@<offset (after masking off complete pages)>):
>>>>
>>>> [ 698.108119] xen_netfront: xennet: skb rides the rocket: 19 slots
>>>> [ 698.108134] header 1490@238 -> 1 slots
>>>> [ 698.108139] frag #0 1614@2164 -> + 1 pages
>>>> [ 698.108143] frag #1 3038@1296 -> + 2 pages
>>>> [ 698.108147] frag #2 6076@1852 -> + 2 pages
>>>> [ 698.108151] frag #3 6076@292 -> + 2 pages
>>>> [ 698.108156] frag #4 6076@2828 -> + 3 pages
>>>> [ 698.108160] frag #5 3038@1268 -> + 2 pages
>>>> [ 698.108164] frag #6 2272@1824 -> + 1 pages
>>>> [ 698.108168] frag #7 3804@0 -> + 1 pages
>>>> [ 698.108172] frag #8 6076@264 -> + 2 pages
>>>> [ 698.108177] frag #9 3946@2800 -> + 2 pages
>>>> [ 698.108180] frags adding 18 slots
>>>>
>>>> Since I am not deeply familiar with the networking code, I wonder about two things:
>>>> - is there something that should limit the skb data length from all frags
>>>>    to stay below the 64K which the definition of MAX_SKB_FRAGS hints?
>>> I think netfront should be able to handle 64K packets at most.
>>
>> Ah, maybe this relates to this fix from Wei?
>>
> 
> Yes, below patch limits SKB size to 64KB.  However the problem here is
> not SKB exceeding 64KB. The said SKB is acutally 43KB in size. The
> problem is that guest kernel is  using compound page so a frag which can
> be fit into one 4K page spans two 4K pages.  The fix seems to be
> coalescing SKB in frontend, but it will degrade performance.
> 
> Wei.
> 
Reading more of the code I would agree. The definition of MAX_SKB_FRAGS (at
least now with compound pages) cannot be used in any way to derive the number of
4k slots a transfer will require.

Zoltan already commented on worst cases. Not sure it would get as bad as that,
or "just" 16 4K frags, each starting in the middle of a compound page. That
would then end up at around 33 or 34 slots, depending on the header.

Zoltan wrote:
> I think the worst case scenario is when every frag and the linear buffer contains 2 bytes,
> which are overlapping a page boundary (that's (17+1)*2=36 so far), plus 15 of
> them have a 4k page in the middle of them, so, a 1+4096+1 byte buffer can span over 3 page.
> That's 51 individual pages.

I cannot claim to really know what to expect worst case. Somewhat I was thinking
of a worst case of (16+1)*2, which would be inconvenient enough.
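
For reference, the counting behind those figures: 16 4K frags, each starting in
the middle of a page, cost 2 slots apiece, i.e. 32 slots, plus 1 or 2 for the
header gives the 33 or 34 above; and (16+1)*2 = 34 is simply every frag plus
the linear buffer costing two slots each.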

So without knowing exactly how to do it: as Ian said, it sounds best to come up
with some sort of exceptional coalescing for cases where the slot count goes
over 18 and we know the data size is below 64K.

-Stefan



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 901 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Xen-devel] xen-netfront possibly rides the rocket too often
  2014-05-15 12:14       ` Stefan Bader
@ 2014-05-16  9:48         ` Wei Liu
  2014-05-16  9:57           ` Wei Liu
                             ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Wei Liu @ 2014-05-16  9:48 UTC (permalink / raw)
  To: Stefan Bader; +Cc: Wei Liu, Ian Campbell, Zoltan Kiss, xen-devel, netdev

On Thu, May 15, 2014 at 02:14:00PM +0200, Stefan Bader wrote:
[...]
> > Wei.
> > 
> Reading more of the code I would agree. The definition of MAX_SKB_FRAGS (at
> least now with compound pages) cannot be used in any way to derive the number of
> 4k slots a transfer will require.
> 
> Zoltan already commented on worst cases. Not sure it would get as bad as that or
> "just" 16*4k frags all in the middle of compound pages. That would then end in
> around 33 or 34 slots, depending on the header.
> 
> Zoltan wrote:
> > I think the worst case scenario is when every frag and the linear buffer contains 2 bytes,
> > which are overlapping a page boundary (that's (17+1)*2=36 so far), plus 15 of
> them have a 4k
> > page in the middle of them, so, a 1+4096+1 byte buffer can span over 3 page.
> > That's 51 individual pages.
> 
> I cannot claim to really know what to expect worst case. Somewhat I was thinking
> of a
> worst case of (16+1)*2, which would be inconvenient enough.
> 
> So without knowing exactly how to do it, but as Ian said it sounds best to come
> up with some sort of exception coalescing in cases the slot count goes over 18
> and we know the data size is below 64K.
> 

I took a stab at it this morning and came up with this patch. I ran
redis-benchmark, and it seemed to fix the issue for me -- I only saw one "failed to
linearize skb" during

  redis-benchmark -h XXX -d 1000 -t lrange

And before this change, a lot of "rides the rocket" messages were triggered.

Thoughts?

---8<---
From 743495a2b2d338fc6cfe9bfd4b6e840392b87f4a Mon Sep 17 00:00:00 2001
From: Wei Liu <wei.liu2@citrix.com>
Date: Fri, 16 May 2014 10:39:01 +0100
Subject: [PATCH] xen-netfront: linearize SKB if it occupies too many slots

Some workloads, such as Redis, can generate SKBs which make use of compound
pages. Netfront doesn't quite like that, because it doesn't want to send
excessive slots to the backend, as the backend might deem it malicious. On the
flip side these packets are actually legit; the size check at the
beginning of xennet_start_xmit ensures that the packet size is below 64K.

So we linearize the SKB if it occupies too many slots. If the linearization
fails then the SKB is dropped.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 drivers/net/xen-netfront.c |   18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 895355d..0361fc5 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -573,9 +573,21 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
 		xennet_count_skb_frag_slots(skb);
 	if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
-		net_alert_ratelimited(
-			"xennet: skb rides the rocket: %d slots\n", slots);
-		goto drop;
+		if (skb_linearize(skb)) {
+			net_alert_ratelimited(
+				"xennet: failed to linearize skb, skb dropped\n");
+			goto drop;
+		}
+		data = skb->data;
+		offset = offset_in_page(data);
+		len = skb_headlen(skb);
+		slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
+			xennet_count_skb_frag_slots(skb);
+		if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
+			net_alert_ratelimited(
+				"xennet: still too many slots after linearization: %d\n", slots);
+			goto drop;
+		}
 	}
 
 	spin_lock_irqsave(&np->tx_lock, flags);
-- 
1.7.10.4


> -Stefan
> 
> 

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [Xen-devel] xen-netfront possibly rides the rocket too often
  2014-05-16  9:48         ` Wei Liu
@ 2014-05-16  9:57           ` Wei Liu
  2014-05-16 10:05           ` David Laight
  2014-05-16 10:09           ` Stefan Bader
  2 siblings, 0 replies; 19+ messages in thread
From: Wei Liu @ 2014-05-16  9:57 UTC (permalink / raw)
  To: Stefan Bader; +Cc: Wei Liu, Ian Campbell, Zoltan Kiss, xen-devel, netdev

On Fri, May 16, 2014 at 10:48:42AM +0100, Wei Liu wrote:
> On Thu, May 15, 2014 at 02:14:00PM +0200, Stefan Bader wrote:
> [...]
> > > Wei.
> > > 
> > Reading more of the code I would agree. The definition of MAX_SKB_FRAGS (at
> > least now with compound pages) cannot be used in any way to derive the number of
> > 4k slots a transfer will require.
> > 
> > Zoltan already commented on worst cases. Not sure it would get as bad as that or
> > "just" 16*4k frags all in the middle of compound pages. That would then end in
> > around 33 or 34 slots, depending on the header.
> > 
> > Zoltan wrote:
> > > I think the worst case scenario is when every frag and the linear buffer contains 2 bytes,
> > > which are overlapping a page boundary (that's (17+1)*2=36 so far), plus 15 of
> > them have a 4k
> > > page in the middle of them, so, a 1+4096+1 byte buffer can span over 3 page.
> > > That's 51 individual pages.
> > 
> > I cannot claim to really know what to expect worst case. Somewhat I was thinking
> > of a
> > worst case of (16+1)*2, which would be inconvenient enough.
> > 
> > So without knowing exactly how to do it, but as Ian said it sounds best to come
> > up with some sort of exception coalescing in cases the slot count goes over 18
> > and we know the data size is below 64K.
> > 
> 
> I took a stab at it this morning and came up with this patch. Ran
> redis-benchmark, it seemed to fix that for me -- only saw one "failed to
> linearize skb" during
> 
>   redis-benchmark -h XXX -d 1000 -t lrange
> 
> And before this change, a lot of "rides rocket" were triggered.
> 
> Thought?
> 

And don't take this as a formal patch because I haven't tested it
thoroughly. It's just an idea to coalesce SKBs. We can certainly code
up a loop to do so by ourselves.

Wei.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [Xen-devel] xen-netfront possibly rides the rocket too often
  2014-05-16  9:48         ` Wei Liu
  2014-05-16  9:57           ` Wei Liu
@ 2014-05-16 10:05           ` David Laight
  2014-05-16 10:22             ` Wei Liu
  2014-05-16 10:09           ` Stefan Bader
  2 siblings, 1 reply; 19+ messages in thread
From: David Laight @ 2014-05-16 10:05 UTC (permalink / raw)
  To: 'Wei Liu', Stefan Bader
  Cc: Ian Campbell, Zoltan Kiss, xen-devel, netdev

From: Wei Liu
> On Thu, May 15, 2014 at 02:14:00PM +0200, Stefan Bader wrote:
> [...]
> > > Wei.
> > >
> > Reading more of the code I would agree. The definition of MAX_SKB_FRAGS (at
> > least now with compound pages) cannot be used in any way to derive the number of
> > 4k slots a transfer will require.
> >
> > Zoltan already commented on worst cases. Not sure it would get as bad as that or
> > "just" 16*4k frags all in the middle of compound pages. That would then end in
> > around 33 or 34 slots, depending on the header.
> >
> > Zoltan wrote:
> > > I think the worst case scenario is when every frag and the linear buffer contains 2 bytes,
> > > which are overlapping a page boundary (that's (17+1)*2=36 so far), plus 15 of
> > them have a 4k
> > > page in the middle of them, so, a 1+4096+1 byte buffer can span over 3 page.
> > > That's 51 individual pages.
> >
> > I cannot claim to really know what to expect worst case. Somewhat I was thinking
> > of a worst case of (16+1)*2, which would be inconvenient enough.
> >
> > So without knowing exactly how to do it, but as Ian said it sounds best to come
> > up with some sort of exception coalescing in cases the slot count goes over 18
> > and we know the data size is below 64K.
> >
> 
> I took a stab at it this morning and came up with this patch. Ran
> redis-benchmark, it seemed to fix that for me -- only saw one "failed to
> linearize skb" during
> 
>   redis-benchmark -h XXX -d 1000 -t lrange
> 
> And before this change, a lot of "rides rocket" were triggered.
> 
> Thought?
> 
> ---8<---
> From 743495a2b2d338fc6cfe9bfd4b6e840392b87f4a Mon Sep 17 00:00:00 2001
> From: Wei Liu <wei.liu2@citrix.com>
> Date: Fri, 16 May 2014 10:39:01 +0100
> Subject: [PATCH] xen-netfront: linearize SKB if it occupies too many slots
> 
> Some workloads, such as Redis, can generate SKBs which make use of compound
> pages. Netfront doesn't quite like that because it doesn't want to send
> excessive slots to the backend, as the backend might deem it malicious. On
> the flip side these packets are actually legit; the size check at the
> beginning of xennet_start_xmit ensures that the packet size is below 64K.
> 
> So we linearize the SKB if it occupies too many slots. If the linearization
> fails then the SKB is dropped.
> 
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> ---
>  drivers/net/xen-netfront.c |   18 +++++++++++++++---
>  1 file changed, 15 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> index 895355d..0361fc5 100644
> --- a/drivers/net/xen-netfront.c
> +++ b/drivers/net/xen-netfront.c
> @@ -573,9 +573,21 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
>  	slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
>  		xennet_count_skb_frag_slots(skb);
>  	if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
> -		net_alert_ratelimited(
> -			"xennet: skb rides the rocket: %d slots\n", slots);
> -		goto drop;
> +		if (skb_linearize(skb)) {

You don't need to actually linearize the skb here.
One with multiple fragments is fine.
I'm not sure there is a standard function to 'copy and refragment'
the skb data though.
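
Something along these lines might be what such a helper would look like --
a rough, untested sketch only; xennet_refragment_skb is a made-up name, and
cloned skbs, the frag_list, truesize accounting and error unwinding are all
ignored:

/* Copy the paged data of an skb into freshly allocated, page-aligned
 * pages so that every resulting frag costs exactly one slot. */
static int xennet_refragment_skb(struct sk_buff *skb)
{
	struct skb_shared_info *shinfo = skb_shinfo(skb);
	skb_frag_t frags[MAX_SKB_FRAGS];
	unsigned int copied = 0, total = skb->data_len;
	int i, n = 0;

	while (copied < total) {
		unsigned int len = min_t(unsigned int, total - copied, PAGE_SIZE);
		struct page *page = alloc_page(GFP_ATOMIC);

		if (!page)
			return -ENOMEM;
		/* paged data starts right after the linear header */
		skb_copy_bits(skb, skb_headlen(skb) + copied,
			      page_address(page), len);
		__skb_frag_set_page(&frags[n], page);
		frags[n].page_offset = 0;
		skb_frag_size_set(&frags[n], len);
		copied += len;
		n++;
	}

	/* drop the old (possibly compound-page) frags and install the
	 * page-aligned replacements; skb->len and data_len are unchanged */
	for (i = 0; i < shinfo->nr_frags; i++)
		skb_frag_unref(skb, i);
	memcpy(shinfo->frags, frags, n * sizeof(*frags));
	shinfo->nr_frags = n;

	return 0;
}

With the offsets forced to zero, each frag maps to exactly one slot again,
so the 18-slot budget only depends on the data size.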

> +			net_alert_ratelimited(
> +				"xennet: failed to linearize skb, skb dropped\n");
> +			goto drop;
> +		}
> +		data = skb->data;
> +		offset = offset_in_page(data);
> +		len = skb_headlen(skb);
> +		slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
> +			xennet_count_skb_frag_slots(skb);

IIRC, if you have called skb_linearize() then there shouldn't be any fragments.

> +		if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
> +			net_alert_ratelimited(
> +				"xennet: still too many slots after linearization: %d", slots);
> +			goto drop;
> +		}
>  	}
> 
>  	spin_lock_irqsave(&np->tx_lock, flags);
> --
> 1.7.10.4

	David

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Xen-devel] xen-netfront possibly rides the rocket too often
  2014-05-16  9:48         ` Wei Liu
  2014-05-16  9:57           ` Wei Liu
  2014-05-16 10:05           ` David Laight
@ 2014-05-16 10:09           ` Stefan Bader
  2014-05-16 10:17             ` Stefan Bader
  2 siblings, 1 reply; 19+ messages in thread
From: Stefan Bader @ 2014-05-16 10:09 UTC (permalink / raw)
  To: Wei Liu; +Cc: xen-devel, Ian Campbell, Zoltan Kiss, netdev

[-- Attachment #1: Type: text/plain, Size: 3863 bytes --]

On 16.05.2014 11:48, Wei Liu wrote:
> On Thu, May 15, 2014 at 02:14:00PM +0200, Stefan Bader wrote:
> [...]
>>> Wei.
>>>
>> Reading more of the code I would agree. The definition of MAX_SKB_FRAGS (at
>> least now with compound pages) cannot be used in any way to derive the number of
>> 4k slots a transfer will require.
>>
>> Zoltan already commented on worst cases. Not sure it would get as bad as that or
>> "just" 16*4k frags all in the middle of compound pages. That would then end in
>> around 33 or 34 slots, depending on the header.
>>
>> Zoltan wrote:
>>> I think the worst case scenario is when every frag and the linear buffer contains 2 bytes,
>>> which are overlapping a page boundary (that's (17+1)*2=36 so far), plus 15 of
>>> them have a 4k page in the middle of them, so, a 1+4096+1 byte buffer can
>>> span over 3 pages.
>>> That's 51 individual pages.
>>
>> I cannot claim to really know what to expect worst case. Somewhat I was thinking
>> of a
>> worst case of (16+1)*2, which would be inconvenient enough.
>>
>> So without knowing exactly how to do it, but as Ian said it sounds best to come
>> up with some sort of exception coalescing in cases the slot count goes over 18
>> and we know the data size is below 64K.
>>
> 
> I took a stab at it this morning and came up with this patch. Ran
> redis-benchmark, it seemed to fix that for me -- only saw one "failed to
> linearize skb" during
> 
>   redis-benchmark -h XXX -d 1000 -t lrange
> 
> And before this change, a lot of "rides rocket" were triggered.
> 
> Thoughts?

It appears, at least to me, to be something that nicely makes use of existing code. I
was wondering about what could or could not be used; trying to get one's head
around the whole thing is kind of a lot to look at.

The change at least looks straightforward enough.

-Stefan
> 
> ---8<---
> From 743495a2b2d338fc6cfe9bfd4b6e840392b87f4a Mon Sep 17 00:00:00 2001
> From: Wei Liu <wei.liu2@citrix.com>
> Date: Fri, 16 May 2014 10:39:01 +0100
> Subject: [PATCH] xen-netfront: linearize SKB if it occupies too many slots
> 
> Some workloads, such as Redis, can generate SKBs which make use of compound
> pages. Netfront doesn't quite like that because it doesn't want to send
> excessive slots to the backend, as the backend might deem it malicious. On
> the flip side these packets are actually legit; the size check at the
> beginning of xennet_start_xmit ensures that the packet size is below 64K.
> 
> So we linearize the SKB if it occupies too many slots. If the linearization
> fails then the SKB is dropped.
> 
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> ---
>  drivers/net/xen-netfront.c |   18 +++++++++++++++---
>  1 file changed, 15 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> index 895355d..0361fc5 100644
> --- a/drivers/net/xen-netfront.c
> +++ b/drivers/net/xen-netfront.c
> @@ -573,9 +573,21 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
>  	slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
>  		xennet_count_skb_frag_slots(skb);
>  	if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
> -		net_alert_ratelimited(
> -			"xennet: skb rides the rocket: %d slots\n", slots);
> -		goto drop;
> +		if (skb_linearize(skb)) {
> +			net_alert_ratelimited(
> +				"xennet: failed to linearize skb, skb dropped\n");
> +			goto drop;
> +		}
> +		data = skb->data;
> +		offset = offset_in_page(data);
> +		len = skb_headlen(skb);
> +		slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
> +			xennet_count_skb_frag_slots(skb);
> +		if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
> +			net_alert_ratelimited(
> +				"xennet: still too many slots after linearization: %d", slots);
> +			goto drop;
> +		}
>  	}
>  
>  	spin_lock_irqsave(&np->tx_lock, flags);
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 901 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Xen-devel] xen-netfront possibly rides the rocket too often
  2014-05-16 10:09           ` Stefan Bader
@ 2014-05-16 10:17             ` Stefan Bader
  2014-05-16 10:32               ` Wei Liu
  0 siblings, 1 reply; 19+ messages in thread
From: Stefan Bader @ 2014-05-16 10:17 UTC (permalink / raw)
  To: Wei Liu; +Cc: xen-devel, Ian Campbell, Zoltan Kiss, netdev

[-- Attachment #1: Type: text/plain, Size: 2246 bytes --]

On 16.05.2014 12:09, Stefan Bader wrote:
> On 16.05.2014 11:48, Wei Liu wrote:
>> On Thu, May 15, 2014 at 02:14:00PM +0200, Stefan Bader wrote:
>> [...]
>>>> Wei.
>>>>
>>> Reading more of the code I would agree. The definition of MAX_SKB_FRAGS (at
>>> least now with compound pages) cannot be used in any way to derive the number of
>>> 4k slots a transfer will require.
>>>
>>> Zoltan already commented on worst cases. Not sure it would get as bad as that or
>>> "just" 16*4k frags all in the middle of compound pages. That would then end in
>>> around 33 or 34 slots, depending on the header.
>>>
>>> Zoltan wrote:
>>>> I think the worst case scenario is when every frag and the linear buffer contains 2 bytes,
>>>> which are overlapping a page boundary (that's (17+1)*2=36 so far), plus 15 of
>>>> them have a 4k page in the middle of them, so, a 1+4096+1 byte buffer can
>>>> span over 3 pages.
>>>> That's 51 individual pages.
>>>
>>> I cannot claim to really know what to expect worst case. Somewhat I was thinking
>>> of a
>>> worst case of (16+1)*2, which would be inconvenient enough.
>>>
>>> So without knowing exactly how to do it, but as Ian said it sounds best to come
>>> up with some sort of exception coalescing in cases the slot count goes over 18
>>> and we know the data size is below 64K.
>>>
>>
>> I took a stab at it this morning and came up with this patch. Ran
>> redis-benchmark, it seemed to fix that for me -- only saw one "failed to
>> linearize skb" during
>>
>>   redis-benchmark -h XXX -d 1000 -t lrange
>>
>> And before this change, a lot of "rides rocket" were triggered.
>>
>> Thoughts?
> 
> It appears, at least to me, to be something that nicely makes use of existing code. I
> was wondering about what could or could not be used; trying to get one's head
> around the whole thing is kind of a lot to look at.
> 
> The change at least looks straightforward enough.

The only woe for me is that I am somewhat puzzled by the implementation of
skb_linearize(). Somehow the data_len element decides whether an skb can be
linearized, and basically how much it tries to pull from the tail. It probably
makes sense ... just not to me, without deep experience here.

-Stefan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 901 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Xen-devel] xen-netfront possibly rides the rocket too often
  2014-05-16 10:05           ` David Laight
@ 2014-05-16 10:22             ` Wei Liu
  0 siblings, 0 replies; 19+ messages in thread
From: Wei Liu @ 2014-05-16 10:22 UTC (permalink / raw)
  To: David Laight
  Cc: 'Wei Liu',
	Stefan Bader, Ian Campbell, Zoltan Kiss, xen-devel, netdev

On Fri, May 16, 2014 at 10:05:46AM +0000, David Laight wrote:
[...]
> > ---8<---
> > From 743495a2b2d338fc6cfe9bfd4b6e840392b87f4a Mon Sep 17 00:00:00 2001
> > From: Wei Liu <wei.liu2@citrix.com>
> > Date: Fri, 16 May 2014 10:39:01 +0100
> > Subject: [PATCH] xen-netfront: linearize SKB if it occupies too many slots
> > 
> > Some workloads, such as Redis, can generate SKBs which make use of compound
> > pages. Netfront doesn't quite like that because it doesn't want to send
> > excessive slots to the backend, as the backend might deem it malicious. On
> > the flip side these packets are actually legit; the size check at the
> > beginning of xennet_start_xmit ensures that the packet size is below 64K.
> > 
> > So we linearize the SKB if it occupies too many slots. If the linearization
> > fails then the SKB is dropped.
> > 
> > Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> > ---
> >  drivers/net/xen-netfront.c |   18 +++++++++++++++---
> >  1 file changed, 15 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> > index 895355d..0361fc5 100644
> > --- a/drivers/net/xen-netfront.c
> > +++ b/drivers/net/xen-netfront.c
> > @@ -573,9 +573,21 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
> >  	slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
> >  		xennet_count_skb_frag_slots(skb);
> >  	if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
> > -		net_alert_ratelimited(
> > -			"xennet: skb rides the rocket: %d slots\n", slots);
> > -		goto drop;
> > +		if (skb_linearize(skb)) {
> 
> You don't need to actually linearize the skb here.
> One with multiple fragments is fine.

Yes I know; but my idea is to get a SKB that doesn't use up too many
slots -- a linearized SKB will not use too many slots.
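
The arithmetic behind that claim, as a quick userspace back-of-the-envelope
check (assuming 4K pages, MAX_SKB_FRAGS == 17 and a packet of at most 65535
bytes, which is what the size check at the start of xennet_start_xmit
enforces):

#include <stdio.h>

#define PAGE_SIZE	4096u
#define MAX_SKB_FRAGS	17u
#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))

int main(void)
{
	unsigned int worst_offset = PAGE_SIZE - 1;	/* head starts at the very end of a page */
	unsigned int worst_len = 65535;			/* largest packet netfront lets through */
	unsigned int slots = DIV_ROUND_UP(worst_offset + worst_len, PAGE_SIZE);

	/* prints 17 against a budget of 18 */
	printf("worst-case linear slots = %u, budget = %u\n",
	       slots, MAX_SKB_FRAGS + 1);
	return 0;
}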

> I'm not sure there is a standard function to 'copy and refragment'
> the skb data though.
> 

That can only help if we can control how the fragment page is allocated.

> > +			net_alert_ratelimited(
> > +				"xennet: failed to linearize skb, skb dropped\n");
> > +			goto drop;
> > +		}
> > +		data = skb->data;
> > +		offset = offset_in_page(data);
> > +		len = skb_headlen(skb);
> > +		slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
> > +			xennet_count_skb_frag_slots(skb);
> 
> IIRC, if you have called skb_linearize() then there shouldn't be any fragments.
> 

Plain copy-and-paste from the code above. I will pay more attention when it
becomes a formal patch. ;-)

Wei.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Xen-devel] xen-netfront possibly rides the rocket too often
  2014-05-16 10:17             ` Stefan Bader
@ 2014-05-16 10:32               ` Wei Liu
  0 siblings, 0 replies; 19+ messages in thread
From: Wei Liu @ 2014-05-16 10:32 UTC (permalink / raw)
  To: Stefan Bader; +Cc: Wei Liu, xen-devel, Ian Campbell, Zoltan Kiss, netdev

On Fri, May 16, 2014 at 12:17:48PM +0200, Stefan Bader wrote:
> On 16.05.2014 12:09, Stefan Bader wrote:
> > On 16.05.2014 11:48, Wei Liu wrote:
> >> On Thu, May 15, 2014 at 02:14:00PM +0200, Stefan Bader wrote:
> >> [...]
> >>>> Wei.
> >>>>
> >>> Reading more of the code I would agree. The definition of MAX_SKB_FRAGS (at
> >>> least now with compound pages) cannot be used in any way to derive the number of
> >>> 4k slots a transfer will require.
> >>>
> >>> Zoltan already commented on worst cases. Not sure it would get as bad as that or
> >>> "just" 16*4k frags all in the middle of compound pages. That would then end in
> >>> around 33 or 34 slots, depending on the header.
> >>>
> >>> Zoltan wrote:
> >>>> I think the worst case scenario is when every frag and the linear buffer contains 2 bytes,
> >>>> which are overlapping a page boundary (that's (17+1)*2=36 so far), plus 15 of
> >>>> them have a 4k page in the middle of them, so, a 1+4096+1 byte buffer can
> >>>> span over 3 pages.
> >>>> That's 51 individual pages.
> >>>
> >>> I cannot claim to really know what to expect worst case. Somewhat I was thinking
> >>> of a
> >>> worst case of (16+1)*2, which would be inconvenient enough.
> >>>
> >>> So without knowing exactly how to do it, but as Ian said it sounds best to come
> >>> up with some sort of exception coalescing in cases the slot count goes over 18
> >>> and we know the data size is below 64K.
> >>>
> >>
> >> I took a stab at it this morning and came up with this patch. Ran
> >> redis-benchmark, it seemed to fix that for me -- only saw one "failed to
> >> linearize skb" during
> >>
> >>   redis-benchmark -h XXX -d 1000 -t lrange
> >>
> >> And before this change, a lot of "rides rocket" were triggered.
> >>
> >> Thoughts?
> > 
> > It appears, at least to me, to be something that nicely makes use of existing code. I
> > was wondering about what could or could not be used; trying to get one's head
> > around the whole thing is kind of a lot to look at.
> > 
> > The change at least looks straightforward enough.
> 
> The only woe for me is that I am somewhat puzzled by the implementation of
> skb_linearize(). Somehow the data_len element decides whether an skb can be
> linearized, and basically how much it tries to pull from the tail. It probably
> makes sense ... just not to me, without deep experience here.
> 

data_len is the size of the paged data (that is, the size of the data held in
the frags). If everything is pulled from the frags into the linear area, the
SKB is linearized.
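
For reference, this is roughly what skb_linearize() boils down to in
include/linux/skbuff.h (paraphrased, so double-check against the tree you
are on):

static inline int __skb_linearize(struct sk_buff *skb)
{
	/* __pskb_pull_tail() grows the linear head as needed and copies the
	 * requested number of bytes out of the frag pages (and frag_list) */
	return __pskb_pull_tail(skb, skb->data_len) ? 0 : -ENOMEM;
}

static inline int skb_linearize(struct sk_buff *skb)
{
	/* nonlinear simply means data_len != 0 */
	return skb_is_nonlinear(skb) ? __skb_linearize(skb) : 0;
}

Since data_len is exactly the number of bytes held in the frags, pulling
data_len bytes into the linear area leaves no paged data behind.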

Wei.

> -Stefan
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2014-05-16 10:32 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-05-13 18:21 xen-netfront possibly rides the rocket too often Stefan Bader
2014-05-14 19:49 ` Zoltan Kiss
2014-05-14 20:06   ` Zoltan Kiss
2014-05-15  8:38     ` [Xen-devel] " Sander Eikelenboom
2014-05-15  9:03       ` Stefan Bader
2014-05-15  8:46   ` Ian Campbell
2014-05-15  8:58     ` Stefan Bader
2014-05-15  9:38       ` Sander Eikelenboom
2014-05-15 11:04     ` Wei Liu
2014-05-15 11:14       ` David Laight
2014-05-15 11:47       ` Ian Campbell
2014-05-15 12:14       ` Stefan Bader
2014-05-16  9:48         ` Wei Liu
2014-05-16  9:57           ` Wei Liu
2014-05-16 10:05           ` David Laight
2014-05-16 10:22             ` Wei Liu
2014-05-16 10:09           ` Stefan Bader
2014-05-16 10:17             ` Stefan Bader
2014-05-16 10:32               ` Wei Liu
