netdev.vger.kernel.org archive mirror
* xen-netfront possibly rides the rocket too often
@ 2014-05-13 18:21 Stefan Bader
  2014-05-14 19:49 ` Zoltan Kiss
  0 siblings, 1 reply; 19+ messages in thread
From: Stefan Bader @ 2014-05-13 18:21 UTC (permalink / raw)
  To: xen-devel, netdev

[-- Attachment #1: Type: text/plain, Size: 2263 bytes --]

We had reports about this message being seen on EC2 for a while, but finally a
reporter noticed some details about the guests and was able to provide a
simple way to reproduce it[1].

For my local experiments I use a Xen-4.2.2 based host (though I would say the
host versions are not important). The host has one NIC which is used as the
outgoing port of a Linux-based (not openvswitch) bridge, and the PV guests use
that bridge. I set the mtu to 9001 (which was seen on affected instance types)
and also inside the guests. As described in the report, one guest runs
redis-server and the other nodejs through two scripts (for me I had to run the
two sub.js calls in separate shells). After a bit the error messages appear on
the guest running the redis-server.

I added some debug printk's to show a bit more detail about the skb and got the
following (<length>@<offset (after masking off complete pages)>):

[ 698.108119] xen_netfront: xennet: skb rides the rocket: 19 slots
[ 698.108134] header 1490@238 -> 1 slots
[ 698.108139] frag #0 1614@2164 -> + 1 pages
[ 698.108143] frag #1 3038@1296 -> + 2 pages
[ 698.108147] frag #2 6076@1852 -> + 2 pages
[ 698.108151] frag #3 6076@292 -> + 2 pages
[ 698.108156] frag #4 6076@2828 -> + 3 pages
[ 698.108160] frag #5 3038@1268 -> + 2 pages
[ 698.108164] frag #6 2272@1824 -> + 1 pages
[ 698.108168] frag #7 3804@0 -> + 1 pages
[ 698.108172] frag #8 6076@264 -> + 2 pages
[ 698.108177] frag #9 3946@2800 -> + 2 pages
[ 698.108180] frags adding 18 slots
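
The instrumentation is nothing fancy, roughly along these lines (a simplified
sketch of what I used, not the exact patch; the helper name is made up, and it
would be called from xennet_start_xmit() next to the "rides the rocket" check):

        /* Debug sketch: print length@offset and the number of 4K slots each
         * part of the skb needs, mirroring the counting netfront itself does. */
        static void xennet_dump_skb_slots(struct sk_buff *skb)
        {
                unsigned long offset = offset_in_page(skb->data);
                unsigned long len = skb_headlen(skb);
                int i;

                pr_info("header %lu@%lu -> %lu slots\n", len, offset,
                        DIV_ROUND_UP(offset + len, PAGE_SIZE));

                for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
                        const skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
                        unsigned long size = skb_frag_size(frag);
                        /* mask off complete pages, keep the offset inside the 4K page */
                        unsigned long off = frag->page_offset & ~PAGE_MASK;

                        pr_info("frag #%d %lu@%lu -> + %lu pages\n", i, size, off,
                                DIV_ROUND_UP(off + size, PAGE_SIZE));
                }
        }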

Since I am not deeply familiar with the networking code, I wonder about two things:
- is there something that should limit the skb data length across all frags
  to stay below the 64K that the definition of MAX_SKB_FRAGS hints at?
- are multiple frags with offsets expected?

The latter is the problem here. If I did the maths right, the overall data size
is around 41K. But since frags 1, 4, 5, and 9 have an offset big enough to require
an additional page, the overall slot count goes up to 19.
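
Spelling that out with the driver's own formula (slots = DIV_ROUND_UP(offset +
len, 4096)), using the log above:

  header  1490@238  -> ceil(1728/4096) = 1
  frag #1 3038@1296 -> ceil(4334/4096) = 2  (would fit in one page without the offset)
  frag #4 6076@2828 -> ceil(8904/4096) = 3  (needs two pages, the offset adds a third)
  frag #5 3038@1268 -> ceil(4306/4096) = 2
  frag #9 3946@2800 -> ceil(6746/4096) = 2

The ten frags sum to 18 slots, plus one for the header makes 19, even though
the payload itself is well below 64K.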

If such a layout is valid, maybe the xen-netfront driver needs to reduce its
XEN_NETIF_MAX_TX_SIZE which currently is set to 64K? Or something else...

-Stefan

[1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1317811


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 901 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: xen-netfront possibly rides the rocket too often
  2014-05-13 18:21 xen-netfront possibly rides the rocket too often Stefan Bader
@ 2014-05-14 19:49 ` Zoltan Kiss
  2014-05-14 20:06   ` Zoltan Kiss
  2014-05-15  8:46   ` Ian Campbell
  0 siblings, 2 replies; 19+ messages in thread
From: Zoltan Kiss @ 2014-05-14 19:49 UTC (permalink / raw)
  To: Stefan Bader, xen-devel, netdev

On 13/05/14 19:21, Stefan Bader wrote:
> We had reports about this message being seen on EC2 for a while but finally a
> reporter did notice some details about the guests and was able to provide a
> simple way to reproduce[1].
>
> For my local experiments I use a Xen-4.2.2 based host (though I would say the
> host versions are not important). The host has one NIC which is used as the
> outgoing port of a Linux based (not openvswitch) bridge. And the PV guests use
> that bridge. I set the mtu to 9001 (which was seen on affected instance types)
> and also inside the guests. As described in the report one guests runs
> redis-server and the other nodejs through two scripts (for me I had to do the
> two sub.js calls in separate shells). After a bit the error messages appear on
> the guest running the redis-server.
>
> I added some debug printk's to show a bit more detail about the skb and got the
> following (<length>@<offset (after masking off complete pages)>):
>
> [ 698.108119] xen_netfront: xennet: skb rides the rocket: 19 slots
> [ 698.108134] header 1490@238 -> 1 slots
> [ 698.108139] frag #0 1614@2164 -> + 1 pages
> [ 698.108143] frag #1 3038@1296 -> + 2 pages
> [ 698.108147] frag #2 6076@1852 -> + 2 pages
> [ 698.108151] frag #3 6076@292 -> + 2 pages
> [ 698.108156] frag #4 6076@2828 -> + 3 pages
> [ 698.108160] frag #5 3038@1268 -> + 2 pages
> [ 698.108164] frag #6 2272@1824 -> + 1 pages
> [ 698.108168] frag #7 3804@0 -> + 1 pages
> [ 698.108172] frag #8 6076@264 -> + 2 pages
> [ 698.108177] frag #9 3946@2800 -> + 2 pages
> [ 698.108180] frags adding 18 slots
>
> Since I am not deeply familiar with the networking code, I wonder about two things:
> - is there something that should limit the skb data length from all frags
>    to stay below the 64K which the definition of MAX_SKB_FRAGS hints?
I think netfront should be able to handle 64K packets at most.
> - is multiple frags having offsets expected?
Yes, since compound pages were introduced a frag can span a 4K page boundary.
The problem is that in the netback/netfront protocol the assumption is that
every slot is a single page, because grant operations can only be done on a
4K page. And every slot ends up as a frag (except maybe the first; it can
happen that it is grant copied straight into the linear buffer), therefore
the frontend cannot send an skb which occupies more than MAX_SKB_FRAGS
individual 4K pages.
The problem has been known for a while; the solution, unfortunately, is not.
>
> The latter is the problem here. If I did the maths right, the overall data size
> is around 41K. But since frags 1,4,5, and 9 have an offset big enough to require
> an additional page, the overall slot count goes up to 19.
>
> If such a layout is valid, maybe the xen-netfront driver needs to reduce its
> XEN_NETIF_MAX_TX_SIZE which currently is set to 64K? Or something else...
>
> -Stefan
>
> [1] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1317811
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: xen-netfront possibly rides the rocket too often
  2014-05-14 19:49 ` Zoltan Kiss
@ 2014-05-14 20:06   ` Zoltan Kiss
  2014-05-15  8:38     ` [Xen-devel] " Sander Eikelenboom
  2014-05-15  8:46   ` Ian Campbell
  1 sibling, 1 reply; 19+ messages in thread
From: Zoltan Kiss @ 2014-05-14 20:06 UTC (permalink / raw)
  To: Stefan Bader, xen-devel, netdev

On 14/05/14 20:49, Zoltan Kiss wrote:
> On 13/05/14 19:21, Stefan Bader wrote:
>> Since I am not deeply familiar with the networking code, I wonder
>> about two things:
>> - is there something that should limit the skb data length from all frags
>>    to stay below the 64K which the definition of MAX_SKB_FRAGS hints?
> I think netfront should be able to handle 64K packets at most.
>> - is multiple frags having offsets expected?
> Yes, since compound pages a frag could over the 4K page boundary. The
> problem is, that in the netback/front protocol the assumption is that
> every slot is a single page, because grant operations could be done only
> on a 4K page. And every slot ends up as a frag (expect maybe the first,
> it can happen it is grant copied straight to the linear buffer),
> therefore the frontend cannot send an skb which occupies more than
> MAX_SKB_FRAGS individual 4k page.
> The problem is known for a while, the solution is not, unfortunately.

I think the worst case scenario is when every frag and the linear buffer
contains 2 bytes, each overlapping a page boundary (that's (17+1)*2=36 pages
so far), plus 15 of them have a full 4K page in the middle, so a 1+4096+1
byte buffer can span 3 pages. That's 51 individual pages.

With the previous grant copy implementation there would be the option to
modify the backend and coalesce everything into a well formed skb. That
would be a minor change there. But with grant mapping it's harder.
Slots of compound pages could be mapped to adjacent pages in Dom0; maybe
somehow you could present them as compound pages in Dom0 as well. But in
MFN space they wouldn't be contiguous, so you would need SWIOTLB or an
IOMMU to hide that from the devices. Plus, what happens when you can't
find adjacent pending slots?

I think we would be better off at the moment trying to compact these
skbs a bit. Usually they overflow the limit by one or two slots, which
means we should reallocate one or two frags, or the linear buffer, to
decrease the number of 4K pages used.
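
To spell out the worst-case arithmetic above: MAX_SKB_FRAGS (17) frags plus
the linear buffer makes 18 buffers; with each holding 2 bytes that straddle a
page boundary, that is already 18 * 2 = 36 pages. A 1+4096+1 byte buffer laid
across page boundaries touches 3 pages instead of 2, and at most 15 of the 18
buffers can be that large before the 65535-byte wire limit is reached, so
36 + 15 = 51 individual 4K pages for well under 64K of useful data.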

Zoli

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Xen-devel] xen-netfront possibly rides the rocket too often
  2014-05-14 20:06   ` Zoltan Kiss
@ 2014-05-15  8:38     ` Sander Eikelenboom
  2014-05-15  9:03       ` Stefan Bader
  0 siblings, 1 reply; 19+ messages in thread
From: Sander Eikelenboom @ 2014-05-15  8:38 UTC (permalink / raw)
  To: Zoltan Kiss; +Cc: Stefan Bader, xen-devel, netdev


Wednesday, May 14, 2014, 10:06:41 PM, you wrote:

> On 14/05/14 20:49, Zoltan Kiss wrote:
>> On 13/05/14 19:21, Stefan Bader wrote:
>>> Since I am not deeply familiar with the networking code, I wonder
>>> about two things:
>>> - is there something that should limit the skb data length from all frags
>>>    to stay below the 64K which the definition of MAX_SKB_FRAGS hints?
>> I think netfront should be able to handle 64K packets at most.
>>> - is multiple frags having offsets expected?
>> Yes, since compound pages a frag could over the 4K page boundary. The
>> problem is, that in the netback/front protocol the assumption is that
>> every slot is a single page, because grant operations could be done only
>> on a 4K page. And every slot ends up as a frag (expect maybe the first,
>> it can happen it is grant copied straight to the linear buffer),
>> therefore the frontend cannot send an skb which occupies more than
>> MAX_SKB_FRAGS individual 4k page.
>> The problem is known for a while, the solution is not, unfortunately.

> I think the worst case scenario is when every frag and the linear buffer 
> contains 2 bytes, which are overlapping a page boundary (that's 
> (17+1)*2=36 so far), plus 15 of them have a 4k page in the middle of 
> them, so, a 1+4096+1 byte buffer can span over 3 page. That's 51 
> individual pages.
> With the previous grant copy implementation there would be the option to 
> modify backend and coalesce everything into a well formed skb. That 
> would be a minor change there. But with grant mapping it's harder.
> Slots of compound pages could be mapped to adjacent pages to Dom0, maybe 
> somehow you can present them as compound pages in Dom0 as well. But in 
> MFN space they wouldn't be contiguous, you need SWIOTLB or use IOMMU to 
> hide that from the devices. Plus, what happens when you can't find 
> adjacent pending slots?
> I think we would be better off at the moment with trying to compact 
> these skbs a bit. Usually they overflow the limit by one or two, which 
> means we should reallocate one or two frag, or the linear buffer to 
> decrease the number of 4K pages used.

How does virtio-net handle this? They would probably have run into the same problems.

--
Sander

> Zoli

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Xen-devel] xen-netfront possibly rides the rocket too often
  2014-05-14 19:49 ` Zoltan Kiss
  2014-05-14 20:06   ` Zoltan Kiss
@ 2014-05-15  8:46   ` Ian Campbell
  2014-05-15  8:58     ` Stefan Bader
  2014-05-15 11:04     ` Wei Liu
  1 sibling, 2 replies; 19+ messages in thread
From: Ian Campbell @ 2014-05-15  8:46 UTC (permalink / raw)
  To: Zoltan Kiss; +Cc: Stefan Bader, xen-devel, netdev, Wei Liu

On Wed, 2014-05-14 at 20:49 +0100, Zoltan Kiss wrote:
> On 13/05/14 19:21, Stefan Bader wrote:
> > We had reports about this message being seen on EC2 for a while but finally a
> > reporter did notice some details about the guests and was able to provide a
> > simple way to reproduce[1].
> >
> > For my local experiments I use a Xen-4.2.2 based host (though I would say the
> > host versions are not important). The host has one NIC which is used as the
> > outgoing port of a Linux based (not openvswitch) bridge. And the PV guests use
> > that bridge. I set the mtu to 9001 (which was seen on affected instance types)
> > and also inside the guests. As described in the report one guests runs
> > redis-server and the other nodejs through two scripts (for me I had to do the
> > two sub.js calls in separate shells). After a bit the error messages appear on
> > the guest running the redis-server.
> >
> > I added some debug printk's to show a bit more detail about the skb and got the
> > following (<length>@<offset (after masking off complete pages)>):
> >
> > [ 698.108119] xen_netfront: xennet: skb rides the rocket: 19 slots
> > [ 698.108134] header 1490@238 -> 1 slots
> > [ 698.108139] frag #0 1614@2164 -> + 1 pages
> > [ 698.108143] frag #1 3038@1296 -> + 2 pages
> > [ 698.108147] frag #2 6076@1852 -> + 2 pages
> > [ 698.108151] frag #3 6076@292 -> + 2 pages
> > [ 698.108156] frag #4 6076@2828 -> + 3 pages
> > [ 698.108160] frag #5 3038@1268 -> + 2 pages
> > [ 698.108164] frag #6 2272@1824 -> + 1 pages
> > [ 698.108168] frag #7 3804@0 -> + 1 pages
> > [ 698.108172] frag #8 6076@264 -> + 2 pages
> > [ 698.108177] frag #9 3946@2800 -> + 2 pages
> > [ 698.108180] frags adding 18 slots
> >
> > Since I am not deeply familiar with the networking code, I wonder about two things:
> > - is there something that should limit the skb data length from all frags
> >    to stay below the 64K which the definition of MAX_SKB_FRAGS hints?
> I think netfront should be able to handle 64K packets at most.

Ah, maybe this relates to this fix from Wei?

commit 9ecd1a75d977e2e8c48139c7d3efed183f898d94
Author: Wei Liu <wei.liu2@citrix.com>
Date:   Mon Apr 22 02:20:41 2013 +0000

    xen-netfront: reduce gso_max_size to account for max TCP header
    
    The maximum packet including header that can be handled by netfront / netback
    wire format is 65535. Reduce gso_max_size accordingly.
    
    Drop skb and print warning when skb->len > 65535. This can 1) save the effort
    to send malformed packet to netback, 2) help spotting misconfiguration of
    netfront in the future.
    
    Signed-off-by: Wei Liu <wei.liu2@citrix.com>
    Acked-by: Ian Campbell <ian.campbell@citrix.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 1bb2e20..1db10141 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -36,7 +36,7 @@
 #include <linux/skbuff.h>
 #include <linux/ethtool.h>
 #include <linux/if_ether.h>
-#include <linux/tcp.h>
+#include <net/tcp.h>
 #include <linux/udp.h>
 #include <linux/moduleparam.h>
 #include <linux/mm.h>
@@ -547,6 +547,16 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	unsigned int len = skb_headlen(skb);
 	unsigned long flags;
 
+	/* If skb->len is too big for wire format, drop skb and alert
+	 * user about misconfiguration.
+	 */
+	if (unlikely(skb->len > XEN_NETIF_MAX_TX_SIZE)) {
+		net_alert_ratelimited(
+			"xennet: skb->len = %u, too big for wire format\n",
+			skb->len);
+		goto drop;
+	}
+
 	slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
 		xennet_count_skb_frag_slots(skb);
 	if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
@@ -1058,7 +1068,8 @@ err:
 
 static int xennet_change_mtu(struct net_device *dev, int mtu)
 {
-	int max = xennet_can_sg(dev) ? 65535 - ETH_HLEN : ETH_DATA_LEN;
+	int max = xennet_can_sg(dev) ?
+		XEN_NETIF_MAX_TX_SIZE - MAX_TCP_HEADER : ETH_DATA_LEN;
 
 	if (mtu > max)
 		return -EINVAL;
@@ -1362,6 +1373,8 @@ static struct net_device *xennet_create_dev(struct xenbus_device *dev)
 	SET_ETHTOOL_OPS(netdev, &xennet_ethtool_ops);
 	SET_NETDEV_DEV(netdev, &dev->dev);
 
+	netif_set_gso_max_size(netdev, XEN_NETIF_MAX_TX_SIZE - MAX_TCP_HEADER);
+
 	np->netdev = netdev;
 
 	netif_carrier_off(netdev);
diff --git a/include/xen/interface/io/netif.h b/include/xen/interface/io/netif.h
index 9dfc120..58fadca 100644
--- a/include/xen/interface/io/netif.h
+++ b/include/xen/interface/io/netif.h
@@ -47,6 +47,7 @@
 #define _XEN_NETTXF_extra_info		(3)
 #define  XEN_NETTXF_extra_info		(1U<<_XEN_NETTXF_extra_info)
 
+#define XEN_NETIF_MAX_TX_SIZE 0xFFFF
 struct xen_netif_tx_request {
     grant_ref_t gref;      /* Reference to buffer page */
     uint16_t offset;       /* Offset within buffer page */

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [Xen-devel] xen-netfront possibly rides the rocket too often
  2014-05-15  8:46   ` Ian Campbell
@ 2014-05-15  8:58     ` Stefan Bader
  2014-05-15  9:38       ` Sander Eikelenboom
  2014-05-15 11:04     ` Wei Liu
  1 sibling, 1 reply; 19+ messages in thread
From: Stefan Bader @ 2014-05-15  8:58 UTC (permalink / raw)
  To: Ian Campbell, Zoltan Kiss; +Cc: xen-devel, Wei Liu, netdev

[-- Attachment #1: Type: text/plain, Size: 6275 bytes --]

On 15.05.2014 10:46, Ian Campbell wrote:
> On Wed, 2014-05-14 at 20:49 +0100, Zoltan Kiss wrote:
>> On 13/05/14 19:21, Stefan Bader wrote:
>>> We had reports about this message being seen on EC2 for a while but finally a
>>> reporter did notice some details about the guests and was able to provide a
>>> simple way to reproduce[1].
>>>
>>> For my local experiments I use a Xen-4.2.2 based host (though I would say the
>>> host versions are not important). The host has one NIC which is used as the
>>> outgoing port of a Linux based (not openvswitch) bridge. And the PV guests use
>>> that bridge. I set the mtu to 9001 (which was seen on affected instance types)
>>> and also inside the guests. As described in the report one guests runs
>>> redis-server and the other nodejs through two scripts (for me I had to do the
>>> two sub.js calls in separate shells). After a bit the error messages appear on
>>> the guest running the redis-server.
>>>
>>> I added some debug printk's to show a bit more detail about the skb and got the
>>> following (<length>@<offset (after masking off complete pages)>):
>>>
>>> [ 698.108119] xen_netfront: xennet: skb rides the rocket: 19 slots
>>> [ 698.108134] header 1490@238 -> 1 slots
>>> [ 698.108139] frag #0 1614@2164 -> + 1 pages
>>> [ 698.108143] frag #1 3038@1296 -> + 2 pages
>>> [ 698.108147] frag #2 6076@1852 -> + 2 pages
>>> [ 698.108151] frag #3 6076@292 -> + 2 pages
>>> [ 698.108156] frag #4 6076@2828 -> + 3 pages
>>> [ 698.108160] frag #5 3038@1268 -> + 2 pages
>>> [ 698.108164] frag #6 2272@1824 -> + 1 pages
>>> [ 698.108168] frag #7 3804@0 -> + 1 pages
>>> [ 698.108172] frag #8 6076@264 -> + 2 pages
>>> [ 698.108177] frag #9 3946@2800 -> + 2 pages
>>> [ 698.108180] frags adding 18 slots
>>>
>>> Since I am not deeply familiar with the networking code, I wonder about two things:
>>> - is there something that should limit the skb data length from all frags
>>>    to stay below the 64K which the definition of MAX_SKB_FRAGS hints?
>> I think netfront should be able to handle 64K packets at most.
> 
> Ah, maybe this relates to this fix from Wei?

That indeed should (imo) limit the overall size. And if that still happened,
we would see a different message. The problem I see is not the overall size
but the layout of the frags. Since multiple frags start at a high enough
offset, the slot count goes too high.

Now I have to read that code in more detail, but there also have been some
changes which introduce a make-frags function. I wonder whether there is
already some kind of shifting (or copying) going on and whether it's just the
check that needs to improve. But right now that is speculation.

For the fun of it I actually found an even simpler way to trigger it (I still
need to confirm it without mucking around with the mtu first). It looked like
the mtu actually did not make a difference. One only needs the redis-server on
one guest and to run redis-benchmark from the other with (I think) -d 1000, or
whatever it is that sets the data size. Then on the range tests this happens
quite often.

-Stefan
> 
> commit 9ecd1a75d977e2e8c48139c7d3efed183f898d94
> Author: Wei Liu <wei.liu2@citrix.com>
> Date:   Mon Apr 22 02:20:41 2013 +0000
> 
>     xen-netfront: reduce gso_max_size to account for max TCP header
>     
>     The maximum packet including header that can be handled by netfront / netback
>     wire format is 65535. Reduce gso_max_size accordingly.
>     
>     Drop skb and print warning when skb->len > 65535. This can 1) save the effort
>     to send malformed packet to netback, 2) help spotting misconfiguration of
>     netfront in the future.
>     
>     Signed-off-by: Wei Liu <wei.liu2@citrix.com>
>     Acked-by: Ian Campbell <ian.campbell@citrix.com>
>     Signed-off-by: David S. Miller <davem@davemloft.net>
> 
> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> index 1bb2e20..1db10141 100644
> --- a/drivers/net/xen-netfront.c
> +++ b/drivers/net/xen-netfront.c
> @@ -36,7 +36,7 @@
>  #include <linux/skbuff.h>
>  #include <linux/ethtool.h>
>  #include <linux/if_ether.h>
> -#include <linux/tcp.h>
> +#include <net/tcp.h>
>  #include <linux/udp.h>
>  #include <linux/moduleparam.h>
>  #include <linux/mm.h>
> @@ -547,6 +547,16 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
>  	unsigned int len = skb_headlen(skb);
>  	unsigned long flags;
>  
> +	/* If skb->len is too big for wire format, drop skb and alert
> +	 * user about misconfiguration.
> +	 */
> +	if (unlikely(skb->len > XEN_NETIF_MAX_TX_SIZE)) {
> +		net_alert_ratelimited(
> +			"xennet: skb->len = %u, too big for wire format\n",
> +			skb->len);
> +		goto drop;
> +	}
> +
>  	slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
>  		xennet_count_skb_frag_slots(skb);
>  	if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
> @@ -1058,7 +1068,8 @@ err:
>  
>  static int xennet_change_mtu(struct net_device *dev, int mtu)
>  {
> -	int max = xennet_can_sg(dev) ? 65535 - ETH_HLEN : ETH_DATA_LEN;
> +	int max = xennet_can_sg(dev) ?
> +		XEN_NETIF_MAX_TX_SIZE - MAX_TCP_HEADER : ETH_DATA_LEN;
>  
>  	if (mtu > max)
>  		return -EINVAL;
> @@ -1362,6 +1373,8 @@ static struct net_device *xennet_create_dev(struct xenbus_device *dev)
>  	SET_ETHTOOL_OPS(netdev, &xennet_ethtool_ops);
>  	SET_NETDEV_DEV(netdev, &dev->dev);
>  
> +	netif_set_gso_max_size(netdev, XEN_NETIF_MAX_TX_SIZE - MAX_TCP_HEADER);
> +
>  	np->netdev = netdev;
>  
>  	netif_carrier_off(netdev);
> diff --git a/include/xen/interface/io/netif.h b/include/xen/interface/io/netif.h
> index 9dfc120..58fadca 100644
> --- a/include/xen/interface/io/netif.h
> +++ b/include/xen/interface/io/netif.h
> @@ -47,6 +47,7 @@
>  #define _XEN_NETTXF_extra_info		(3)
>  #define  XEN_NETTXF_extra_info		(1U<<_XEN_NETTXF_extra_info)
>  
> +#define XEN_NETIF_MAX_TX_SIZE 0xFFFF
>  struct xen_netif_tx_request {
>      grant_ref_t gref;      /* Reference to buffer page */
>      uint16_t offset;       /* Offset within buffer page */
> 
> 
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 901 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Xen-devel] xen-netfront possibly rides the rocket too often
  2014-05-15  8:38     ` [Xen-devel] " Sander Eikelenboom
@ 2014-05-15  9:03       ` Stefan Bader
  0 siblings, 0 replies; 19+ messages in thread
From: Stefan Bader @ 2014-05-15  9:03 UTC (permalink / raw)
  To: Sander Eikelenboom, Zoltan Kiss; +Cc: xen-devel, netdev

[-- Attachment #1: Type: text/plain, Size: 2784 bytes --]

On 15.05.2014 10:38, Sander Eikelenboom wrote:
> 
> Wednesday, May 14, 2014, 10:06:41 PM, you wrote:
> 
>> On 14/05/14 20:49, Zoltan Kiss wrote:
>>> On 13/05/14 19:21, Stefan Bader wrote:
>>>> Since I am not deeply familiar with the networking code, I wonder
>>>> about two things:
>>>> - is there something that should limit the skb data length from all frags
>>>>    to stay below the 64K which the definition of MAX_SKB_FRAGS hints?
>>> I think netfront should be able to handle 64K packets at most.
>>>> - is multiple frags having offsets expected?
>>> Yes, since compound pages a frag could over the 4K page boundary. The
>>> problem is, that in the netback/front protocol the assumption is that
>>> every slot is a single page, because grant operations could be done only
>>> on a 4K page. And every slot ends up as a frag (expect maybe the first,
>>> it can happen it is grant copied straight to the linear buffer),
>>> therefore the frontend cannot send an skb which occupies more than
>>> MAX_SKB_FRAGS individual 4k page.
>>> The problem is known for a while, the solution is not, unfortunately.
> 
>> I think the worst case scenario is when every frag and the linear buffer 
>> contains 2 bytes, which are overlapping a page boundary (that's 
>> (17+1)*2=36 so far), plus 15 of them have a 4k page in the middle of 
>> them, so, a 1+4096+1 byte buffer can span over 3 page. That's 51 
>> individual pages.
>> With the previous grant copy implementation there would be the option to 
>> modify backend and coalesce everything into a well formed skb. That 
>> would be a minor change there. But with grant mapping it's harder.
>> Slots of compound pages could be mapped to adjacent pages to Dom0, maybe 
>> somehow you can present them as compound pages in Dom0 as well. But in 
>> MFN space they wouldn't be contiguous, you need SWIOTLB or use IOMMU to 
>> hide that from the devices. Plus, what happens when you can't find 
>> adjacent pending slots?
>> I think we would be better off at the moment with trying to compact 
>> these skbs a bit. Usually they overflow the limit by one or two, which 
>> means we should reallocate one or two frag, or the linear buffer to 
>> decrease the number of 4K pages used.
> 
> How does virtio-net handle this, the would probably have ran into the same problems ?

Maybe, though it seems not many things cause this kind of traffic, and I
could not say whether they have to handle a limited set of ring pages, too. But
it's something to keep in mind when digging deeper into it.

-Stefan
> 
> --
> Sander
> 
>> Zoli
> 
> 
> 
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 901 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Xen-devel] xen-netfront possibly rides the rocket too often
  2014-05-15  8:58     ` Stefan Bader
@ 2014-05-15  9:38       ` Sander Eikelenboom
  0 siblings, 0 replies; 19+ messages in thread
From: Sander Eikelenboom @ 2014-05-15  9:38 UTC (permalink / raw)
  To: Stefan Bader; +Cc: Ian Campbell, Zoltan Kiss, xen-devel, Wei Liu, netdev



Thursday, May 15, 2014, 10:58:38 AM, you wrote:

> On 15.05.2014 10:46, Ian Campbell wrote:
>> On Wed, 2014-05-14 at 20:49 +0100, Zoltan Kiss wrote:
>>> On 13/05/14 19:21, Stefan Bader wrote:
>>>> We had reports about this message being seen on EC2 for a while but finally a
>>>> reporter did notice some details about the guests and was able to provide a
>>>> simple way to reproduce[1].
>>>>
>>>> For my local experiments I use a Xen-4.2.2 based host (though I would say the
>>>> host versions are not important). The host has one NIC which is used as the
>>>> outgoing port of a Linux based (not openvswitch) bridge. And the PV guests use
>>>> that bridge. I set the mtu to 9001 (which was seen on affected instance types)
>>>> and also inside the guests. As described in the report one guests runs
>>>> redis-server and the other nodejs through two scripts (for me I had to do the
>>>> two sub.js calls in separate shells). After a bit the error messages appear on
>>>> the guest running the redis-server.
>>>>
>>>> I added some debug printk's to show a bit more detail about the skb and got the
>>>> following (<length>@<offset (after masking off complete pages)>):
>>>>
>>>> [ 698.108119] xen_netfront: xennet: skb rides the rocket: 19 slots
>>>> [ 698.108134] header 1490@238 -> 1 slots
>>>> [ 698.108139] frag #0 1614@2164 -> + 1 pages
>>>> [ 698.108143] frag #1 3038@1296 -> + 2 pages
>>>> [ 698.108147] frag #2 6076@1852 -> + 2 pages
>>>> [ 698.108151] frag #3 6076@292 -> + 2 pages
>>>> [ 698.108156] frag #4 6076@2828 -> + 3 pages
>>>> [ 698.108160] frag #5 3038@1268 -> + 2 pages
>>>> [ 698.108164] frag #6 2272@1824 -> + 1 pages
>>>> [ 698.108168] frag #7 3804@0 -> + 1 pages
>>>> [ 698.108172] frag #8 6076@264 -> + 2 pages
>>>> [ 698.108177] frag #9 3946@2800 -> + 2 pages
>>>> [ 698.108180] frags adding 18 slots
>>>>
>>>> Since I am not deeply familiar with the networking code, I wonder about two things:
>>>> - is there something that should limit the skb data length from all frags
>>>>    to stay below the 64K which the definition of MAX_SKB_FRAGS hints?
>>> I think netfront should be able to handle 64K packets at most.
>> 
>> Ah, maybe this relates to this fix from Wei?

> That indeed should (imo) limit the overall size. And if that would happen still,
> we would see the different message. The problem I see is not the overall size
> but the layout of the frags. Since multiple start at an offset high enough, the
> count of slots goes too high.

> Now I have to read that code in more detail, but there also has been some
> changes which introduce a make frags function. I wonder whether maybe there is
> already some kind of shifting (or copying) going on and whether its just the
> check that needs to improve. But right now that is speculation.

Netback seems to be able to cope with it, but properly calculating and checking 
the needed slots proved difficult (see discussion with Paul Durrant for 3.14).

From my tests there it seemed that the maximum slot count was capped at
MAX_SKB_FRAGS. I assume that could be explained by packets having a maximum
size: if there are frags whose offsets require more pages per frag, that is
counterbalanced, since the total packet size is already limited, so the packet
would have to have fewer frags.

That's at least what the netback code now uses as a calculation method for
the required slots: pessimistic, with a cap at MAX_SKB_FRAGS.

But I'm not 100% sure the assumption is theoretically sound.

> For the fun I actually found an even simpler way to trigger it and (need to
> confirm it yet without mucking around with mtu before). It looked like mtu
> actually did not make a difference. One only needed the redis server on one
> guest and run redis-benchmark from the other with (I think -d 1000, or whatever
> it is that sets the datasize). Then on the range tests this happens quite often.

> -Stefan
>> 
>> commit 9ecd1a75d977e2e8c48139c7d3efed183f898d94
>> Author: Wei Liu <wei.liu2@citrix.com>
>> Date:   Mon Apr 22 02:20:41 2013 +0000
>> 
>>     xen-netfront: reduce gso_max_size to account for max TCP header
>>     
>>     The maximum packet including header that can be handled by netfront / netback
>>     wire format is 65535. Reduce gso_max_size accordingly.
>>     
>>     Drop skb and print warning when skb->len > 65535. This can 1) save the effort
>>     to send malformed packet to netback, 2) help spotting misconfiguration of
>>     netfront in the future.
>>     
>>     Signed-off-by: Wei Liu <wei.liu2@citrix.com>
>>     Acked-by: Ian Campbell <ian.campbell@citrix.com>
>>     Signed-off-by: David S. Miller <davem@davemloft.net>
>> 
>> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
>> index 1bb2e20..1db10141 100644
>> --- a/drivers/net/xen-netfront.c
>> +++ b/drivers/net/xen-netfront.c
>> @@ -36,7 +36,7 @@
>>  #include <linux/skbuff.h>
>>  #include <linux/ethtool.h>
>>  #include <linux/if_ether.h>
>> -#include <linux/tcp.h>
>> +#include <net/tcp.h>
>>  #include <linux/udp.h>
>>  #include <linux/moduleparam.h>
>>  #include <linux/mm.h>
>> @@ -547,6 +547,16 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
>>       unsigned int len = skb_headlen(skb);
>>       unsigned long flags;
>>  
>> +     /* If skb->len is too big for wire format, drop skb and alert
>> +      * user about misconfiguration.
>> +      */
>> +     if (unlikely(skb->len > XEN_NETIF_MAX_TX_SIZE)) {
>> +             net_alert_ratelimited(
>> +                     "xennet: skb->len = %u, too big for wire format\n",
>> +                     skb->len);
>> +             goto drop;
>> +     }
>> +
>>       slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
>>               xennet_count_skb_frag_slots(skb);
>>       if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
>> @@ -1058,7 +1068,8 @@ err:
>>  
>>  static int xennet_change_mtu(struct net_device *dev, int mtu)
>>  {
>> -     int max = xennet_can_sg(dev) ? 65535 - ETH_HLEN : ETH_DATA_LEN;
>> +     int max = xennet_can_sg(dev) ?
>> +             XEN_NETIF_MAX_TX_SIZE - MAX_TCP_HEADER : ETH_DATA_LEN;
>>  
>>       if (mtu > max)
>>               return -EINVAL;
>> @@ -1362,6 +1373,8 @@ static struct net_device *xennet_create_dev(struct xenbus_device *dev)
>>       SET_ETHTOOL_OPS(netdev, &xennet_ethtool_ops);
>>       SET_NETDEV_DEV(netdev, &dev->dev);
>>  
>> +     netif_set_gso_max_size(netdev, XEN_NETIF_MAX_TX_SIZE - MAX_TCP_HEADER);
>> +
>>       np->netdev = netdev;
>>  
>>       netif_carrier_off(netdev);
>> diff --git a/include/xen/interface/io/netif.h b/include/xen/interface/io/netif.h
>> index 9dfc120..58fadca 100644
>> --- a/include/xen/interface/io/netif.h
>> +++ b/include/xen/interface/io/netif.h
>> @@ -47,6 +47,7 @@
>>  #define _XEN_NETTXF_extra_info               (3)
>>  #define  XEN_NETTXF_extra_info               (1U<<_XEN_NETTXF_extra_info)
>>  
>> +#define XEN_NETIF_MAX_TX_SIZE 0xFFFF
>>  struct xen_netif_tx_request {
>>      grant_ref_t gref;      /* Reference to buffer page */
>>      uint16_t offset;       /* Offset within buffer page */
>> 
>> 
>> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Xen-devel] xen-netfront possibly rides the rocket too often
  2014-05-15  8:46   ` Ian Campbell
  2014-05-15  8:58     ` Stefan Bader
@ 2014-05-15 11:04     ` Wei Liu
  2014-05-15 11:14       ` David Laight
                         ` (2 more replies)
  1 sibling, 3 replies; 19+ messages in thread
From: Wei Liu @ 2014-05-15 11:04 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Zoltan Kiss, Stefan Bader, xen-devel, netdev, Wei Liu

On Thu, May 15, 2014 at 09:46:45AM +0100, Ian Campbell wrote:
> On Wed, 2014-05-14 at 20:49 +0100, Zoltan Kiss wrote:
> > On 13/05/14 19:21, Stefan Bader wrote:
> > > We had reports about this message being seen on EC2 for a while but finally a
> > > reporter did notice some details about the guests and was able to provide a
> > > simple way to reproduce[1].
> > >
> > > For my local experiments I use a Xen-4.2.2 based host (though I would say the
> > > host versions are not important). The host has one NIC which is used as the
> > > outgoing port of a Linux based (not openvswitch) bridge. And the PV guests use
> > > that bridge. I set the mtu to 9001 (which was seen on affected instance types)
> > > and also inside the guests. As described in the report one guests runs
> > > redis-server and the other nodejs through two scripts (for me I had to do the
> > > two sub.js calls in separate shells). After a bit the error messages appear on
> > > the guest running the redis-server.
> > >
> > > I added some debug printk's to show a bit more detail about the skb and got the
> > > following (<length>@<offset (after masking off complete pages)>):
> > >
> > > [ 698.108119] xen_netfront: xennet: skb rides the rocket: 19 slots
> > > [ 698.108134] header 1490@238 -> 1 slots
> > > [ 698.108139] frag #0 1614@2164 -> + 1 pages
> > > [ 698.108143] frag #1 3038@1296 -> + 2 pages
> > > [ 698.108147] frag #2 6076@1852 -> + 2 pages
> > > [ 698.108151] frag #3 6076@292 -> + 2 pages
> > > [ 698.108156] frag #4 6076@2828 -> + 3 pages
> > > [ 698.108160] frag #5 3038@1268 -> + 2 pages
> > > [ 698.108164] frag #6 2272@1824 -> + 1 pages
> > > [ 698.108168] frag #7 3804@0 -> + 1 pages
> > > [ 698.108172] frag #8 6076@264 -> + 2 pages
> > > [ 698.108177] frag #9 3946@2800 -> + 2 pages
> > > [ 698.108180] frags adding 18 slots
> > >
> > > Since I am not deeply familiar with the networking code, I wonder about two things:
> > > - is there something that should limit the skb data length from all frags
> > >    to stay below the 64K which the definition of MAX_SKB_FRAGS hints?
> > I think netfront should be able to handle 64K packets at most.
> 
> Ah, maybe this relates to this fix from Wei?
> 

Yes, the patch below limits the SKB size to 64KB. However the problem here is
not the SKB exceeding 64KB. The said SKB is actually 43KB in size. The
problem is that the guest kernel is using compound pages, so a frag which
could fit into one 4K page spans two 4K pages. The fix seems to be
coalescing the SKB in the frontend, but it will degrade performance.

Wei.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [Xen-devel] xen-netfront possibly rides the rocket too often
  2014-05-15 11:04     ` Wei Liu
@ 2014-05-15 11:14       ` David Laight
  2014-05-15 11:47       ` Ian Campbell
  2014-05-15 12:14       ` Stefan Bader
  2 siblings, 0 replies; 19+ messages in thread
From: David Laight @ 2014-05-15 11:14 UTC (permalink / raw)
  To: 'Wei Liu', Ian Campbell
  Cc: Zoltan Kiss, Stefan Bader, xen-devel, netdev

From: Wei Liu
> On Thu, May 15, 2014 at 09:46:45AM +0100, Ian Campbell wrote:
> > On Wed, 2014-05-14 at 20:49 +0100, Zoltan Kiss wrote:
> > > On 13/05/14 19:21, Stefan Bader wrote:
> > > > We had reports about this message being seen on EC2 for a while but finally a
> > > > reporter did notice some details about the guests and was able to provide a
> > > > simple way to reproduce[1].
> > > >
> > > > For my local experiments I use a Xen-4.2.2 based host (though I would say the
> > > > host versions are not important). The host has one NIC which is used as the
> > > > outgoing port of a Linux based (not openvswitch) bridge. And the PV guests use
> > > > that bridge. I set the mtu to 9001 (which was seen on affected instance types)
> > > > and also inside the guests. As described in the report one guests runs
> > > > redis-server and the other nodejs through two scripts (for me I had to do the
> > > > two sub.js calls in separate shells). After a bit the error messages appear on
> > > > the guest running the redis-server.
> > > >
> > > > I added some debug printk's to show a bit more detail about the skb and got the
> > > > following (<length>@<offset (after masking off complete pages)>):
> > > >
> > > > [ 698.108119] xen_netfront: xennet: skb rides the rocket: 19 slots
> > > > [ 698.108134] header 1490@238 -> 1 slots
> > > > [ 698.108139] frag #0 1614@2164 -> + 1 pages
> > > > [ 698.108143] frag #1 3038@1296 -> + 2 pages
> > > > [ 698.108147] frag #2 6076@1852 -> + 2 pages
> > > > [ 698.108151] frag #3 6076@292 -> + 2 pages
> > > > [ 698.108156] frag #4 6076@2828 -> + 3 pages
> > > > [ 698.108160] frag #5 3038@1268 -> + 2 pages
> > > > [ 698.108164] frag #6 2272@1824 -> + 1 pages
> > > > [ 698.108168] frag #7 3804@0 -> + 1 pages
> > > > [ 698.108172] frag #8 6076@264 -> + 2 pages
> > > > [ 698.108177] frag #9 3946@2800 -> + 2 pages
> > > > [ 698.108180] frags adding 18 slots
> > > >
> > > > Since I am not deeply familiar with the networking code, I wonder about two things:
> > > > - is there something that should limit the skb data length from all frags
> > > >    to stay below the 64K which the definition of MAX_SKB_FRAGS hints?
> > > I think netfront should be able to handle 64K packets at most.
> >
> > Ah, maybe this relates to this fix from Wei?
> >
> 
> Yes, below patch limits SKB size to 64KB.  However the problem here is
> not SKB exceeding 64KB. The said SKB is acutally 43KB in size. The
> problem is that guest kernel is  using compound page so a frag which can
> be fit into one 4K page spans two 4K pages.  The fix seems to be
> coalescing SKB in frontend, but it will degrade performance.

Presumably the thing to do is copy the frag list into a locally allocated
list of 4k fragments.
Unless you can support having some fragments mapped and some local,
there probably isn't much point in trying to maintain the fragment
boundaries.
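
Something along these lines, perhaps (untested sketch only; the helper name is
made up, and error unwinding, truesize accounting and zero-copy corner cases
are ignored):

        /* Replace the skb's frag list with freshly allocated, page-aligned 4K
         * fragments, so every resulting frag maps to exactly one grant slot.
         * With skb->len capped at 64K this needs at most 16 pages, which fits
         * within MAX_SKB_FRAGS. */
        static int xennet_repack_frags(struct sk_buff *skb)
        {
                unsigned int copied = skb_headlen(skb);
                unsigned int left = skb->len - copied;  /* bytes held in frags */
                skb_frag_t frags[MAX_SKB_FRAGS];
                int i, n = 0;

                while (left) {
                        unsigned int chunk = min_t(unsigned int, left, PAGE_SIZE);
                        struct page *page = alloc_page(GFP_ATOMIC);

                        if (!page)
                                return -ENOMEM;
                        /* gather the next chunk out of the old, misaligned frags */
                        skb_copy_bits(skb, copied, page_address(page), chunk);
                        __skb_frag_set_page(&frags[n], page);
                        frags[n].page_offset = 0;
                        skb_frag_size_set(&frags[n], chunk);
                        copied += chunk;
                        left -= chunk;
                        n++;
                }

                /* drop the old fragments and install the page-aligned ones */
                for (i = 0; i < skb_shinfo(skb)->nr_frags; i++)
                        skb_frag_unref(skb, i);
                memcpy(skb_shinfo(skb)->frags, frags, n * sizeof(frags[0]));
                skb_shinfo(skb)->nr_frags = n;

                return 0;
        }

That would keep the linear part untouched and copy the fragment data only once.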

	David

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Xen-devel] xen-netfront possibly rides the rocket too often
  2014-05-15 11:04     ` Wei Liu
  2014-05-15 11:14       ` David Laight
@ 2014-05-15 11:47       ` Ian Campbell
  2014-05-15 12:14       ` Stefan Bader
  2 siblings, 0 replies; 19+ messages in thread
From: Ian Campbell @ 2014-05-15 11:47 UTC (permalink / raw)
  To: Wei Liu; +Cc: Zoltan Kiss, Stefan Bader, xen-devel, netdev

On Thu, 2014-05-15 at 12:04 +0100, Wei Liu wrote:
> On Thu, May 15, 2014 at 09:46:45AM +0100, Ian Campbell wrote:
> > On Wed, 2014-05-14 at 20:49 +0100, Zoltan Kiss wrote:
> > > On 13/05/14 19:21, Stefan Bader wrote:
> > > > We had reports about this message being seen on EC2 for a while but finally a
> > > > reporter did notice some details about the guests and was able to provide a
> > > > simple way to reproduce[1].
> > > >
> > > > For my local experiments I use a Xen-4.2.2 based host (though I would say the
> > > > host versions are not important). The host has one NIC which is used as the
> > > > outgoing port of a Linux based (not openvswitch) bridge. And the PV guests use
> > > > that bridge. I set the mtu to 9001 (which was seen on affected instance types)
> > > > and also inside the guests. As described in the report one guests runs
> > > > redis-server and the other nodejs through two scripts (for me I had to do the
> > > > two sub.js calls in separate shells). After a bit the error messages appear on
> > > > the guest running the redis-server.
> > > >
> > > > I added some debug printk's to show a bit more detail about the skb and got the
> > > > following (<length>@<offset (after masking off complete pages)>):
> > > >
> > > > [ 698.108119] xen_netfront: xennet: skb rides the rocket: 19 slots
> > > > [ 698.108134] header 1490@238 -> 1 slots
> > > > [ 698.108139] frag #0 1614@2164 -> + 1 pages
> > > > [ 698.108143] frag #1 3038@1296 -> + 2 pages
> > > > [ 698.108147] frag #2 6076@1852 -> + 2 pages
> > > > [ 698.108151] frag #3 6076@292 -> + 2 pages
> > > > [ 698.108156] frag #4 6076@2828 -> + 3 pages
> > > > [ 698.108160] frag #5 3038@1268 -> + 2 pages
> > > > [ 698.108164] frag #6 2272@1824 -> + 1 pages
> > > > [ 698.108168] frag #7 3804@0 -> + 1 pages
> > > > [ 698.108172] frag #8 6076@264 -> + 2 pages
> > > > [ 698.108177] frag #9 3946@2800 -> + 2 pages
> > > > [ 698.108180] frags adding 18 slots
> > > >
> > > > Since I am not deeply familiar with the networking code, I wonder about two things:
> > > > - is there something that should limit the skb data length from all frags
> > > >    to stay below the 64K which the definition of MAX_SKB_FRAGS hints?
> > > I think netfront should be able to handle 64K packets at most.
> > 
> > Ah, maybe this relates to this fix from Wei?
> > 
> 
> Yes, below patch limits SKB size to 64KB.  However the problem here is
> not SKB exceeding 64KB. The said SKB is acutally 43KB in size. The
> problem is that guest kernel is  using compound page so a frag which can
> be fit into one 4K page spans two 4K pages.  The fix seems to be
> coalescing SKB in frontend, but it will degrade performance.

So long as it only happens when this scenario occurs, a performance
degradation would seem preferable to dropping the skb altogether.

Ian.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Xen-devel] xen-netfront possibly rides the rocket too often
  2014-05-15 11:04     ` Wei Liu
  2014-05-15 11:14       ` David Laight
  2014-05-15 11:47       ` Ian Campbell
@ 2014-05-15 12:14       ` Stefan Bader
  2014-05-16  9:48         ` Wei Liu
  2 siblings, 1 reply; 19+ messages in thread
From: Stefan Bader @ 2014-05-15 12:14 UTC (permalink / raw)
  To: Wei Liu, Ian Campbell; +Cc: Zoltan Kiss, xen-devel, netdev

[-- Attachment #1: Type: text/plain, Size: 3716 bytes --]

On 15.05.2014 13:04, Wei Liu wrote:
> On Thu, May 15, 2014 at 09:46:45AM +0100, Ian Campbell wrote:
>> On Wed, 2014-05-14 at 20:49 +0100, Zoltan Kiss wrote:
>>> On 13/05/14 19:21, Stefan Bader wrote:
>>>> We had reports about this message being seen on EC2 for a while but finally a
>>>> reporter did notice some details about the guests and was able to provide a
>>>> simple way to reproduce[1].
>>>>
>>>> For my local experiments I use a Xen-4.2.2 based host (though I would say the
>>>> host versions are not important). The host has one NIC which is used as the
>>>> outgoing port of a Linux based (not openvswitch) bridge. And the PV guests use
>>>> that bridge. I set the mtu to 9001 (which was seen on affected instance types)
>>>> and also inside the guests. As described in the report one guests runs
>>>> redis-server and the other nodejs through two scripts (for me I had to do the
>>>> two sub.js calls in separate shells). After a bit the error messages appear on
>>>> the guest running the redis-server.
>>>>
>>>> I added some debug printk's to show a bit more detail about the skb and got the
>>>> following (<length>@<offset (after masking off complete pages)>):
>>>>
>>>> [ 698.108119] xen_netfront: xennet: skb rides the rocket: 19 slots
>>>> [ 698.108134] header 1490@238 -> 1 slots
>>>> [ 698.108139] frag #0 1614@2164 -> + 1 pages
>>>> [ 698.108143] frag #1 3038@1296 -> + 2 pages
>>>> [ 698.108147] frag #2 6076@1852 -> + 2 pages
>>>> [ 698.108151] frag #3 6076@292 -> + 2 pages
>>>> [ 698.108156] frag #4 6076@2828 -> + 3 pages
>>>> [ 698.108160] frag #5 3038@1268 -> + 2 pages
>>>> [ 698.108164] frag #6 2272@1824 -> + 1 pages
>>>> [ 698.108168] frag #7 3804@0 -> + 1 pages
>>>> [ 698.108172] frag #8 6076@264 -> + 2 pages
>>>> [ 698.108177] frag #9 3946@2800 -> + 2 pages
>>>> [ 698.108180] frags adding 18 slots
>>>>
>>>> Since I am not deeply familiar with the networking code, I wonder about two things:
>>>> - is there something that should limit the skb data length from all frags
>>>>    to stay below the 64K which the definition of MAX_SKB_FRAGS hints?
>>> I think netfront should be able to handle 64K packets at most.
>>
>> Ah, maybe this relates to this fix from Wei?
>>
> 
> Yes, below patch limits SKB size to 64KB.  However the problem here is
> not SKB exceeding 64KB. The said SKB is acutally 43KB in size. The
> problem is that guest kernel is  using compound page so a frag which can
> be fit into one 4K page spans two 4K pages.  The fix seems to be
> coalescing SKB in frontend, but it will degrade performance.
> 
> Wei.
> 
Reading more of the code I would agree. The definition of MAX_SKB_FRAGS (at
least now with compound pages) cannot be used in any way to derive the number of
4k slots a transfer will require.

Zoltan already commented on worst cases. Not sure it would get as bad as that,
or "just" 16 4K frags, each starting in the middle of a compound page. That
would then end up at around 33 or 34 slots, depending on the header.

Zoltan wrote:
> I think the worst case scenario is when every frag and the linear buffer contains 2 bytes,
> which are overlapping a page boundary (that's (17+1)*2=36 so far), plus 15 of
> them have a 4k page in the middle of them, so, a 1+4096+1 byte buffer can span over 3 page.
> That's 51 individual pages.

I cannot claim to really know what to expect worst case. Somewhat I was thinking
of a worst case of (16+1)*2, which would be inconvenient enough.
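
For reference, the counting behind those figures: 16 4K frags, each starting in
the middle of a page, cost 2 slots apiece, i.e. 32 slots, plus 1 or 2 for the
header gives the 33 or 34 above; and (16+1)*2 = 34 is simply every frag plus
the linear buffer costing two slots each.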

So without knowing exactly how to do it: as Ian said, it sounds best to come up
with some sort of exceptional coalescing for cases where the slot count goes
over 18 and we know the data size is below 64K.

-Stefan



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 901 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Xen-devel] xen-netfront possibly rides the rocket too often
  2014-05-15 12:14       ` Stefan Bader
@ 2014-05-16  9:48         ` Wei Liu
  2014-05-16  9:57           ` Wei Liu
                             ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Wei Liu @ 2014-05-16  9:48 UTC (permalink / raw)
  To: Stefan Bader; +Cc: Wei Liu, Ian Campbell, Zoltan Kiss, xen-devel, netdev

On Thu, May 15, 2014 at 02:14:00PM +0200, Stefan Bader wrote:
[...]
> > Wei.
> > 
> Reading more of the code I would agree. The definition of MAX_SKB_FRAGS (at
> least now with compound pages) cannot be used in any way to derive the number of
> 4k slots a transfer will require.
> 
> Zoltan already commented on worst cases. Not sure it would get as bad as that or
> "just" 16*4k frags all in the middle of compound pages. That would then end in
> around 33 or 34 slots, depending on the header.
> 
> Zoltan wrote:
> > I think the worst case scenario is when every frag and the linear buffer contains 2 bytes,
> > which are overlapping a page boundary (that's (17+1)*2=36 so far), plus 15 of
> them have a 4k
> > page in the middle of them, so, a 1+4096+1 byte buffer can span over 3 page.
> > That's 51 individual pages.
> 
> I cannot claim to really know what to expect worst case. Somewhat I was thinking
> of a
> worst case of (16+1)*2, which would be inconvenient enough.
> 
> So without knowing exactly how to do it, but as Ian said it sounds best to come
> up with some sort of exception coalescing in cases the slot count goes over 18
> and we know the data size is below 64K.
> 

I took a stab at it this morning and came up with this patch. I ran
redis-benchmark, and it seemed to fix the issue for me -- I only saw one "failed to
linearize skb" during

  redis-benchmark -h XXX -d 1000 -t lrange

And before this change, a lot of "rides the rocket" messages were triggered.

Thoughts?

---8<---
From 743495a2b2d338fc6cfe9bfd4b6e840392b87f4a Mon Sep 17 00:00:00 2001
From: Wei Liu <wei.liu2@citrix.com>
Date: Fri, 16 May 2014 10:39:01 +0100
Subject: [PATCH] xen-netfront: linearize SKB if it occupies too many slots

Some workloads, such as Redis, can generate SKBs which make use of compound
pages. Netfront doesn't quite like that, because it doesn't want to send
excessive slots to the backend, as the backend might deem it malicious. On the
flip side these packets are actually legit; the size check at the
beginning of xennet_start_xmit ensures that the packet size is below 64K.

So we linearize the SKB if it occupies too many slots. If the linearization
fails then the SKB is dropped.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 drivers/net/xen-netfront.c |   18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 895355d..0361fc5 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -573,9 +573,21 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
 		xennet_count_skb_frag_slots(skb);
 	if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
-		net_alert_ratelimited(
-			"xennet: skb rides the rocket: %d slots\n", slots);
-		goto drop;
+		if (skb_linearize(skb)) {
+			net_alert_ratelimited(
+				"xennet: failed to linearize skb, skb dropped\n");
+			goto drop;
+		}
+		data = skb->data;
+		offset = offset_in_page(data);
+		len = skb_headlen(skb);
+		slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
+			xennet_count_skb_frag_slots(skb);
+		if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
+			net_alert_ratelimited(
+				"xennet: still too many slots after linearization: %d\n", slots);
+			goto drop;
+		}
 	}
 
 	spin_lock_irqsave(&np->tx_lock, flags);
-- 
1.7.10.4


> -Stefan
> 
> 

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [Xen-devel] xen-netfront possibly rides the rocket too often
  2014-05-16  9:48         ` Wei Liu
@ 2014-05-16  9:57           ` Wei Liu
  2014-05-16 10:05           ` David Laight
  2014-05-16 10:09           ` Stefan Bader
  2 siblings, 0 replies; 19+ messages in thread
From: Wei Liu @ 2014-05-16  9:57 UTC (permalink / raw)
  To: Stefan Bader; +Cc: Wei Liu, Ian Campbell, Zoltan Kiss, xen-devel, netdev

On Fri, May 16, 2014 at 10:48:42AM +0100, Wei Liu wrote:
> On Thu, May 15, 2014 at 02:14:00PM +0200, Stefan Bader wrote:
> [...]
> > > Wei.
> > > 
> > Reading more of the code I would agree. The definition of MAX_SKB_FRAGS (at
> > least now with compound pages) cannot be used in any way to derive the number of
> > 4k slots a transfer will require.
> > 
> > Zoltan already commented on worst cases. Not sure it would get as bad as that or
> > "just" 16*4k frags all in the middle of compound pages. That would then end in
> > around 33 or 34 slots, depending on the header.
> > 
> > Zoltan wrote:
> > > I think the worst case scenario is when every frag and the linear buffer contains 2 bytes,
> > > which are overlapping a page boundary (that's (17+1)*2=36 so far), plus 15 of
> > them have a 4k
> > > page in the middle of them, so, a 1+4096+1 byte buffer can span over 3 page.
> > > That's 51 individual pages.
> > 
> > I cannot claim to really know what to expect worst case. Somewhat I was thinking
> > of a
> > worst case of (16+1)*2, which would be inconvenient enough.
> > 
> > So without knowing exactly how to do it, but as Ian said it sounds best to come
> > up with some sort of exception coalescing in cases the slot count goes over 18
> > and we know the data size is below 64K.
> > 
> 
> I took a stab at it this morning and came up with this patch. Ran
> redis-benchmark, it seemed to fix that for me -- only saw one "failed to
> linearize skb" during
> 
>   redis-benchmark -h XXX -d 1000 -t lrange
> 
> And before this change, a lot of "rides rocket" were triggered.
> 
> Thought?
> 

And don't take this as a formal patch because I haven't tested it
thoroughly. It's just an idea to coalesce SKBs. We can certainly code
up a loop to do so by ourselves.

Wei.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [Xen-devel] xen-netfront possibly rides the rocket too often
  2014-05-16  9:48         ` Wei Liu
  2014-05-16  9:57           ` Wei Liu
@ 2014-05-16 10:05           ` David Laight
  2014-05-16 10:22             ` Wei Liu
  2014-05-16 10:09           ` Stefan Bader
  2 siblings, 1 reply; 19+ messages in thread
From: David Laight @ 2014-05-16 10:05 UTC (permalink / raw)
  To: 'Wei Liu', Stefan Bader
  Cc: Ian Campbell, Zoltan Kiss, xen-devel, netdev

From: Wei Liu
> On Thu, May 15, 2014 at 02:14:00PM +0200, Stefan Bader wrote:
> [...]
> > > Wei.
> > >
> > Reading more of the code I would agree. The definition of MAX_SKB_FRAGS (at
> > least now with compound pages) cannot be used in any way to derive the number of
> > 4k slots a transfer will require.
> >
> > Zoltan already commented on worst cases. Not sure it would get as bad as that or
> > "just" 16*4k frags all in the middle of compound pages. That would then end in
> > around 33 or 34 slots, depending on the header.
> >
> > Zoltan wrote:
> > > I think the worst case scenario is when every frag and the linear buffer contains 2 bytes,
> > > which are overlapping a page boundary (that's (17+1)*2=36 so far), plus 15 of
> > them have a 4k
> > > page in the middle of them, so, a 1+4096+1 byte buffer can span over 3 page.
> > > That's 51 individual pages.
> >
> > I cannot claim to really know what to expect worst case. Somewhat I was thinking
> > of a worst case of (16+1)*2, which would be inconvenient enough.
> >
> > So without knowing exactly how to do it, but as Ian said it sounds best to come
> > up with some sort of exception coalescing in cases the slot count goes over 18
> > and we know the data size is below 64K.
> >
> 
> I took a stab at it this morning and came up with this patch. Ran
> redis-benchmark, it seemed to fix that for me -- only saw one "failed to
> linearize skb" during
> 
>   redis-benchmark -h XXX -d 1000 -t lrange
> 
> And before this change, a lot of "rides rocket" were triggered.
> 
> Thought?
> 
> ---8<---
> From 743495a2b2d338fc6cfe9bfd4b6e840392b87f4a Mon Sep 17 00:00:00 2001
> From: Wei Liu <wei.liu2@citrix.com>
> Date: Fri, 16 May 2014 10:39:01 +0100
> Subject: [PATCH] xen-netfront: linearize SKB if it occupies too many slots
> 
> Some workloads, such as Redis, can generate SKBs which make use of compound
> pages. Netfront doesn't quite like that because it doesn't want to send
> excessive slots to the backend, as the backend might deem it malicious. On
> the flip side these packets are actually legit; the size check at the
> beginning of xennet_start_xmit ensures that the packet size is below 64K.
> 
> So we linearize the SKB if it occupies too many slots. If the linearization
> fails then the SKB is dropped.
> 
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> ---
>  drivers/net/xen-netfront.c |   18 +++++++++++++++---
>  1 file changed, 15 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> index 895355d..0361fc5 100644
> --- a/drivers/net/xen-netfront.c
> +++ b/drivers/net/xen-netfront.c
> @@ -573,9 +573,21 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
>  	slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
>  		xennet_count_skb_frag_slots(skb);
>  	if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
> -		net_alert_ratelimited(
> -			"xennet: skb rides the rocket: %d slots\n", slots);
> -		goto drop;
> +		if (skb_linearize(skb)) {

You don't need to actually linearize the skb here.
One with multiple fragments is fine.
I'm not sure there is a standard function to 'copy and refragment'
the skb data though.
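
Something along these lines might be what such a helper would look like --
a rough, untested sketch only; xennet_refragment_skb is a made-up name, and
cloned skbs, the frag_list, truesize accounting and error unwinding are all
ignored:

/* Copy the paged data of an skb into freshly allocated, page-aligned
 * pages so that every resulting frag costs exactly one slot. */
static int xennet_refragment_skb(struct sk_buff *skb)
{
	struct skb_shared_info *shinfo = skb_shinfo(skb);
	skb_frag_t frags[MAX_SKB_FRAGS];
	unsigned int copied = 0, total = skb->data_len;
	int i, n = 0;

	while (copied < total) {
		unsigned int len = min_t(unsigned int, total - copied, PAGE_SIZE);
		struct page *page = alloc_page(GFP_ATOMIC);

		if (!page)
			return -ENOMEM;
		/* paged data starts right after the linear header */
		skb_copy_bits(skb, skb_headlen(skb) + copied,
			      page_address(page), len);
		__skb_frag_set_page(&frags[n], page);
		frags[n].page_offset = 0;
		skb_frag_size_set(&frags[n], len);
		copied += len;
		n++;
	}

	/* drop the old (possibly compound-page) frags and install the
	 * page-aligned replacements; skb->len and data_len are unchanged */
	for (i = 0; i < shinfo->nr_frags; i++)
		skb_frag_unref(skb, i);
	memcpy(shinfo->frags, frags, n * sizeof(*frags));
	shinfo->nr_frags = n;

	return 0;
}

With the offsets forced to zero, each frag maps to exactly one slot again,
so the 18-slot budget only depends on the data size.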

> +			net_alert_ratelimited(
> +				"xennet: failed to linearize skb, skb dropped\n");
> +			goto drop;
> +		}
> +		data = skb->data;
> +		offset = offset_in_page(data);
> +		len = skb_headlen(skb);
> +		slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
> +			xennet_count_skb_frag_slots(skb);

IIRC, if you have called skb_linearize() then there shouldn't be any fragments.

> +		if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
> +			net_alert_ratelimited(
> +				"xennet: still too many slots after linearization: %d", slots);
> +			goto drop;
> +		}
>  	}
> 
>  	spin_lock_irqsave(&np->tx_lock, flags);
> --
> 1.7.10.4

	David

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Xen-devel] xen-netfront possibly rides the rocket too often
  2014-05-16  9:48         ` Wei Liu
  2014-05-16  9:57           ` Wei Liu
  2014-05-16 10:05           ` David Laight
@ 2014-05-16 10:09           ` Stefan Bader
  2014-05-16 10:17             ` Stefan Bader
  2 siblings, 1 reply; 19+ messages in thread
From: Stefan Bader @ 2014-05-16 10:09 UTC (permalink / raw)
  To: Wei Liu; +Cc: xen-devel, Ian Campbell, Zoltan Kiss, netdev

[-- Attachment #1: Type: text/plain, Size: 3863 bytes --]

On 16.05.2014 11:48, Wei Liu wrote:
> On Thu, May 15, 2014 at 02:14:00PM +0200, Stefan Bader wrote:
> [...]
>>> Wei.
>>>
>> Reading more of the code I would agree. The definition of MAX_SKB_FRAGS (at
>> least now with compound pages) cannot be used in any way to derive the number of
>> 4k slots a transfer will require.
>>
>> Zoltan already commented on worst cases. Not sure it would get as bad as that or
>> "just" 16*4k frags all in the middle of compound pages. That would then end in
>> around 33 or 34 slots, depending on the header.
>>
>> Zoltan wrote:
>>> I think the worst case scenario is when every frag and the linear buffer contains 2 bytes,
>>> which are overlapping a page boundary (that's (17+1)*2=36 so far), plus 15 of
>>> them have a 4k page in the middle of them, so, a 1+4096+1 byte buffer can
>>> span over 3 pages.
>>> That's 51 individual pages.
>>
>> I cannot claim to really know what to expect worst case. Somewhat I was thinking
>> of a
>> worst case of (16+1)*2, which would be inconvenient enough.
>>
>> So without knowing exactly how to do it, but as Ian said it sounds best to come
>> up with some sort of exception coalescing in cases the slot count goes over 18
>> and we know the data size is below 64K.
>>
> 
> I took a stab at it this morning and came up with this patch. Ran
> redis-benchmark, it seemed to fix that for me -- only saw one "failed to
> linearize skb" during
> 
>   redis-benchmark -h XXX -d 1000 -t lrange
> 
> And before this change, a lot of "rides rocket" were triggered.
> 
> Thoughts?

It appears, at least to me, to be something that nicely makes use of existing code. I
was wondering about what could or could not be used; trying to get one's head
around the whole thing is kind of a lot to look at.

The change at least looks straightforward enough.

-Stefan
> 
> ---8<---
> From 743495a2b2d338fc6cfe9bfd4b6e840392b87f4a Mon Sep 17 00:00:00 2001
> From: Wei Liu <wei.liu2@citrix.com>
> Date: Fri, 16 May 2014 10:39:01 +0100
> Subject: [PATCH] xen-netfront: linearize SKB if it occupies too many slots
> 
> Some workloads, such as Redis, can generate SKBs which make use of compound
> pages. Netfront doesn't quite like that because it doesn't want to send
> excessive slots to the backend, as the backend might deem it malicious. On
> the flip side these packets are actually legit; the size check at the
> beginning of xennet_start_xmit ensures that the packet size is below 64K.
> 
> So we linearize the SKB if it occupies too many slots. If the linearization
> fails then the SKB is dropped.
> 
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> ---
>  drivers/net/xen-netfront.c |   18 +++++++++++++++---
>  1 file changed, 15 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> index 895355d..0361fc5 100644
> --- a/drivers/net/xen-netfront.c
> +++ b/drivers/net/xen-netfront.c
> @@ -573,9 +573,21 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
>  	slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
>  		xennet_count_skb_frag_slots(skb);
>  	if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
> -		net_alert_ratelimited(
> -			"xennet: skb rides the rocket: %d slots\n", slots);
> -		goto drop;
> +		if (skb_linearize(skb)) {
> +			net_alert_ratelimited(
> +				"xennet: failed to linearize skb, skb dropped\n");
> +			goto drop;
> +		}
> +		data = skb->data;
> +		offset = offset_in_page(data);
> +		len = skb_headlen(skb);
> +		slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
> +			xennet_count_skb_frag_slots(skb);
> +		if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
> +			net_alert_ratelimited(
> +				"xennet: still too many slots after linearization: %d", slots);
> +			goto drop;
> +		}
>  	}
>  
>  	spin_lock_irqsave(&np->tx_lock, flags);
> 



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 901 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Xen-devel] xen-netfront possibly rides the rocket too often
  2014-05-16 10:09           ` Stefan Bader
@ 2014-05-16 10:17             ` Stefan Bader
  2014-05-16 10:32               ` Wei Liu
  0 siblings, 1 reply; 19+ messages in thread
From: Stefan Bader @ 2014-05-16 10:17 UTC (permalink / raw)
  To: Wei Liu; +Cc: xen-devel, Ian Campbell, Zoltan Kiss, netdev

[-- Attachment #1: Type: text/plain, Size: 2246 bytes --]

On 16.05.2014 12:09, Stefan Bader wrote:
> On 16.05.2014 11:48, Wei Liu wrote:
>> On Thu, May 15, 2014 at 02:14:00PM +0200, Stefan Bader wrote:
>> [...]
>>>> Wei.
>>>>
>>> Reading more of the code I would agree. The definition of MAX_SKB_FRAGS (at
>>> least now with compound pages) cannot be used in any way to derive the number of
>>> 4k slots a transfer will require.
>>>
>>> Zoltan already commented on worst cases. Not sure it would get as bad as that or
>>> "just" 16*4k frags all in the middle of compound pages. That would then end in
>>> around 33 or 34 slots, depending on the header.
>>>
>>> Zoltan wrote:
>>>> I think the worst case scenario is when every frag and the linear buffer contains 2 bytes,
>>>> which are overlapping a page boundary (that's (17+1)*2=36 so far), plus 15 of
>>>> them have a 4k page in the middle of them, so, a 1+4096+1 byte buffer can
>>>> span over 3 pages.
>>>> That's 51 individual pages.
>>>
>>> I cannot claim to really know what to expect worst case. Somewhat I was thinking
>>> of a
>>> worst case of (16+1)*2, which would be inconvenient enough.
>>>
>>> So without knowing exactly how to do it, but as Ian said it sounds best to come
>>> up with some sort of exception coalescing in cases the slot count goes over 18
>>> and we know the data size is below 64K.
>>>
>>
>> I took a stab at it this morning and came up with this patch. Ran
>> redis-benchmark, it seemed to fix that for me -- only saw one "failed to
>> linearize skb" during
>>
>>   redis-benchmark -h XXX -d 1000 -t lrange
>>
>> And before this change, a lot of "rides rocket" were triggered.
>>
>> Thoughts?
> 
> It appears, at least to me, to be something that nicely makes use of existing code. I
> was wondering about what could or could not be used; trying to get one's head
> around the whole thing is kind of a lot to look at.
> 
> The change at least looks straightforward enough.

The only woe for me is that I am somewhat puzzled by the implementation of
skb_linearize(). Somehow the data_len element decides whether an skb can be
linearized, and basically how much it tries to pull from the tail. It probably
makes sense ... just not to me, without deep experience here.

-Stefan


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 901 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Xen-devel] xen-netfront possibly rides the rocket too often
  2014-05-16 10:05           ` David Laight
@ 2014-05-16 10:22             ` Wei Liu
  0 siblings, 0 replies; 19+ messages in thread
From: Wei Liu @ 2014-05-16 10:22 UTC (permalink / raw)
  To: David Laight
  Cc: 'Wei Liu',
	Stefan Bader, Ian Campbell, Zoltan Kiss, xen-devel, netdev

On Fri, May 16, 2014 at 10:05:46AM +0000, David Laight wrote:
[...]
> > ---8<---
> > From 743495a2b2d338fc6cfe9bfd4b6e840392b87f4a Mon Sep 17 00:00:00 2001
> > From: Wei Liu <wei.liu2@citrix.com>
> > Date: Fri, 16 May 2014 10:39:01 +0100
> > Subject: [PATCH] xen-netfront: linearize SKB if it occupies too many slots
> > 
> > Some workloads, such as Redis, can generate SKBs which make use of compound
> > pages. Netfront doesn't quite like that because it doesn't want to send
> > excessive slots to the backend, as the backend might deem it malicious. On
> > the flip side these packets are actually legit; the size check at the
> > beginning of xennet_start_xmit ensures that the packet size is below 64K.
> > 
> > So we linearize the SKB if it occupies too many slots. If the linearization
> > fails then the SKB is dropped.
> > 
> > Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> > ---
> >  drivers/net/xen-netfront.c |   18 +++++++++++++++---
> >  1 file changed, 15 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> > index 895355d..0361fc5 100644
> > --- a/drivers/net/xen-netfront.c
> > +++ b/drivers/net/xen-netfront.c
> > @@ -573,9 +573,21 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
> >  	slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
> >  		xennet_count_skb_frag_slots(skb);
> >  	if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
> > -		net_alert_ratelimited(
> > -			"xennet: skb rides the rocket: %d slots\n", slots);
> > -		goto drop;
> > +		if (skb_linearize(skb)) {
> 
> You don't need to actually linearize the skb here.
> One with multiple fragments is fine.

Yes I know; but my idea is to get a SKB that doesn't use up too many
slots -- a linearized SKB will not use too many slots.
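
The arithmetic behind that claim, as a quick userspace back-of-the-envelope
check (assuming 4K pages, MAX_SKB_FRAGS == 17 and a packet of at most 65535
bytes, which is what the size check at the start of xennet_start_xmit
enforces):

#include <stdio.h>

#define PAGE_SIZE	4096u
#define MAX_SKB_FRAGS	17u
#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))

int main(void)
{
	unsigned int worst_offset = PAGE_SIZE - 1;	/* head starts at the very end of a page */
	unsigned int worst_len = 65535;			/* largest packet netfront lets through */
	unsigned int slots = DIV_ROUND_UP(worst_offset + worst_len, PAGE_SIZE);

	/* prints 17 against a budget of 18 */
	printf("worst-case linear slots = %u, budget = %u\n",
	       slots, MAX_SKB_FRAGS + 1);
	return 0;
}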

> I'm not sure there is a standard function to 'copy and refragment'
> the skb data though.
> 

That can only help if we can control how the fragment page is allocated.

> > +			net_alert_ratelimited(
> > +				"xennet: failed to linearize skb, skb dropped\n");
> > +			goto drop;
> > +		}
> > +		data = skb->data;
> > +		offset = offset_in_page(data);
> > +		len = skb_headlen(skb);
> > +		slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
> > +			xennet_count_skb_frag_slots(skb);
> 
> IIRC, if you have called skb_linearize() then there shouldn't be any fragments.
> 

Plain copy-and-paste from the code above. I will pay more attention when it
becomes a formal patch. ;-)

Wei.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Xen-devel] xen-netfront possibly rides the rocket too often
  2014-05-16 10:17             ` Stefan Bader
@ 2014-05-16 10:32               ` Wei Liu
  0 siblings, 0 replies; 19+ messages in thread
From: Wei Liu @ 2014-05-16 10:32 UTC (permalink / raw)
  To: Stefan Bader; +Cc: Wei Liu, xen-devel, Ian Campbell, Zoltan Kiss, netdev

On Fri, May 16, 2014 at 12:17:48PM +0200, Stefan Bader wrote:
> On 16.05.2014 12:09, Stefan Bader wrote:
> > On 16.05.2014 11:48, Wei Liu wrote:
> >> On Thu, May 15, 2014 at 02:14:00PM +0200, Stefan Bader wrote:
> >> [...]
> >>>> Wei.
> >>>>
> >>> Reading more of the code I would agree. The definition of MAX_SKB_FRAGS (at
> >>> least now with compound pages) cannot be used in any way to derive the number of
> >>> 4k slots a transfer will require.
> >>>
> >>> Zoltan already commented on worst cases. Not sure it would get as bad as that or
> >>> "just" 16*4k frags all in the middle of compound pages. That would then end in
> >>> around 33 or 34 slots, depending on the header.
> >>>
> >>> Zoltan wrote:
> >>>> I think the worst case scenario is when every frag and the linear buffer contains 2 bytes,
> >>>> which are overlapping a page boundary (that's (17+1)*2=36 so far), plus 15 of
> >>>> them have a 4k page in the middle of them, so, a 1+4096+1 byte buffer can
> >>>> span over 3 pages.
> >>>> That's 51 individual pages.
> >>>
> >>> I cannot claim to really know what to expect worst case. Somewhat I was thinking
> >>> of a
> >>> worst case of (16+1)*2, which would be inconvenient enough.
> >>>
> >>> So without knowing exactly how to do it, but as Ian said it sounds best to come
> >>> up with some sort of exception coalescing in cases the slot count goes over 18
> >>> and we know the data size is below 64K.
> >>>
> >>
> >> I took a stab at it this morning and came up with this patch. Ran
> >> redis-benchmark, it seemed to fix that for me -- only saw one "failed to
> >> linearize skb" during
> >>
> >>   redis-benchmark -h XXX -d 1000 -t lrange
> >>
> >> And before this change, a lot of "rides rocket" were triggered.
> >>
> >> Thoughts?
> > 
> > It appears, at least to me, to be something that nicely makes use of existing code. I
> > was wondering about what could or could not be used; trying to get one's head
> > around the whole thing is kind of a lot to look at.
> > 
> > The change at least looks straightforward enough.
> 
> The only woe for me is that I am somewhat puzzled by the implementation of
> skb_linearize(). Somehow the data_len element decides whether an skb can be
> linearized, and basically how much it tries to pull from the tail. It probably
> makes sense ... just not to me, without deep experience here.
> 

data_len is the size of the paged data (that is, the size of the data held in
the frags). If everything is pulled from the frags into the linear area, the
SKB is linearized.
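
For reference, this is roughly what skb_linearize() boils down to in
include/linux/skbuff.h (paraphrased, so double-check against the tree you
are on):

static inline int __skb_linearize(struct sk_buff *skb)
{
	/* __pskb_pull_tail() grows the linear head as needed and copies the
	 * requested number of bytes out of the frag pages (and frag_list) */
	return __pskb_pull_tail(skb, skb->data_len) ? 0 : -ENOMEM;
}

static inline int skb_linearize(struct sk_buff *skb)
{
	/* nonlinear simply means data_len != 0 */
	return skb_is_nonlinear(skb) ? __skb_linearize(skb) : 0;
}

Since data_len is exactly the number of bytes held in the frags, pulling
data_len bytes into the linear area leaves no paged data behind.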

Wei.

> -Stefan
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2014-05-16 10:32 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-05-13 18:21 xen-netfront possibly rides the rocket too often Stefan Bader
2014-05-14 19:49 ` Zoltan Kiss
2014-05-14 20:06   ` Zoltan Kiss
2014-05-15  8:38     ` [Xen-devel] " Sander Eikelenboom
2014-05-15  9:03       ` Stefan Bader
2014-05-15  8:46   ` Ian Campbell
2014-05-15  8:58     ` Stefan Bader
2014-05-15  9:38       ` Sander Eikelenboom
2014-05-15 11:04     ` Wei Liu
2014-05-15 11:14       ` David Laight
2014-05-15 11:47       ` Ian Campbell
2014-05-15 12:14       ` Stefan Bader
2014-05-16  9:48         ` Wei Liu
2014-05-16  9:57           ` Wei Liu
2014-05-16 10:05           ` David Laight
2014-05-16 10:22             ` Wei Liu
2014-05-16 10:09           ` Stefan Bader
2014-05-16 10:17             ` Stefan Bader
2014-05-16 10:32               ` Wei Liu
