* [PATCH 0/4] Bundle fixes for xen-netfront/back
From: Wei Liu @ 2013-03-18 10:35 UTC
  To: netdev, xen-devel; +Cc: ian.campbell, konrad.wilk, annie.li

This series contains 4 patches for the Xen netfront and netback modules.

Patches 1 and 3 just remove some unused variables, nothing special.

Patch 2 drops malformed packets (skb->len > 65535) in netfront before they
are sent to netback.

Patch 4 deals with the situation where netfront's MAX_SKB_FRAGS is bigger
than netback's MAX_SKB_FRAGS.
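
As a rough illustration of why the two limits can disagree, the sketch below
counts ring slots the way netfront's DIV_ROUND_UP(offset + len, PAGE_SIZE)
expression does for the linear area: one slot per page a buffer touches. It
assumes 4 KiB pages, and the helpers slots_for_buffer() / packet_slots() and
struct frag_desc are hypothetical, not netfront code. A fragment that crosses
a page boundary needs two slots, so a packet can use more slots than it has
frags.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB pages, as on x86. */
#define SKETCH_PAGE_SIZE 4096u
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical buffer descriptor: offset within its first page and length. */
struct frag_desc {
	unsigned int offset;
	unsigned int len;
};

/* Ring slots (one per page touched) needed for a single buffer. */
static unsigned int slots_for_buffer(unsigned int offset, unsigned int len)
{
	assert(offset < SKETCH_PAGE_SIZE);
	return DIV_ROUND_UP(offset + len, SKETCH_PAGE_SIZE);
}

/* Total slots for a packet: the linear head plus every fragment. */
static unsigned int packet_slots(const struct frag_desc *head,
				 const struct frag_desc *frags, size_t nr_frags)
{
	unsigned int slots = slots_for_buffer(head->offset, head->len);
	size_t i;

	for (i = 0; i < nr_frags; i++)
		slots += slots_for_buffer(frags[i].offset, frags[i].len);
	return slots;
}
```

For a 100-byte head plus frags of (offset 0, 4096 bytes) and (offset 4000,
200 bytes), packet_slots() returns 4: the second frag alone needs two slots.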

* [PATCH 1/4] xen-netfront: remove unused variable `extra'
From: Wei Liu @ 2013-03-18 10:35 UTC
  To: netdev, xen-devel; +Cc: ian.campbell, konrad.wilk, annie.li, Wei Liu

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 drivers/net/xen-netfront.c |    8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 7ffa43b..5527663 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -537,7 +537,6 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	struct netfront_info *np = netdev_priv(dev);
 	struct netfront_stats *stats = this_cpu_ptr(np->stats);
 	struct xen_netif_tx_request *tx;
-	struct xen_netif_extra_info *extra;
 	char *data = skb->data;
 	RING_IDX i;
 	grant_ref_t ref;
@@ -581,7 +580,6 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	tx->gref = np->grant_tx_ref[id] = ref;
 	tx->offset = offset;
 	tx->size = len;
-	extra = NULL;
 
 	tx->flags = 0;
 	if (skb->ip_summed == CHECKSUM_PARTIAL)
@@ -597,10 +595,7 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 		gso = (struct xen_netif_extra_info *)
 			RING_GET_REQUEST(&np->tx, ++i);
 
-		if (extra)
-			extra->flags |= XEN_NETIF_EXTRA_FLAG_MORE;
-		else
-			tx->flags |= XEN_NETTXF_extra_info;
+		tx->flags |= XEN_NETTXF_extra_info;
 
 		gso->u.gso.size = skb_shinfo(skb)->gso_size;
 		gso->u.gso.type = XEN_NETIF_GSO_TYPE_TCPV4;
@@ -609,7 +604,6 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 
 		gso->type = XEN_NETIF_EXTRA_TYPE_GSO;
 		gso->flags = 0;
-		extra = gso;
 	}
 
 	np->tx.req_prod_pvt = i + 1;
-- 
1.7.10.4

* [PATCH 2/4] xen-netfront: drop skb when skb->len > 65535
From: Wei Liu @ 2013-03-18 10:35 UTC
  To: netdev, xen-devel; +Cc: ian.campbell, konrad.wilk, annie.li, Wei Liu

The `size' field of the Xen network wire format is a uint16_t, so any skb
longer than 65535 bytes overflows it.
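
The overflow is plain unsigned truncation. The sketch below uses a
hypothetical struct tx_request_sketch with a uint16_t size field as a
stand-in for xen_netif_tx_request (it is not the real header). Storing 65536
into the field wraps to 0, which is why such an skb has to be dropped rather
than sent with a silently truncated length. fits_wire_format() applies the
same `len > (uint16_t)(~0)' test as the patch, inverted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the wire request; only the size field matters. */
struct tx_request_sketch {
	uint16_t size;
};

/* Same bound as the patch: (uint16_t)(~0) == 65535 is the largest legal size. */
static bool fits_wire_format(unsigned int len)
{
	return len <= (uint16_t)(~0);
}

/* Conversion to uint16_t keeps only the low 16 bits of len. */
static struct tx_request_sketch make_request(unsigned int len)
{
	struct tx_request_sketch req;

	req.size = (uint16_t)len;
	return req;
}
```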

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 drivers/net/xen-netfront.c |   12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 5527663..8c3d065 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -547,6 +547,18 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	unsigned int len = skb_headlen(skb);
 	unsigned long flags;
 
+	/*
+	 * wire format of xen_netif_tx_request only supports skb->len
+	 * < 64K, because size field in xen_netif_tx_request is
+	 * uint16_t.
+	 */
+	if (unlikely(skb->len > (uint16_t)(~0))) {
+		net_alert_ratelimited(
+			"xennet: skb->len = %d, too big for wire format\n",
+			skb->len);
+		goto drop;
+	}
+
 	slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
 		xennet_count_skb_frag_slots(skb);
 	if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
-- 
1.7.10.4

* [PATCH 3/4] xen-netback: remove skb in xen_netbk_alloc_page
From: Wei Liu @ 2013-03-18 10:35 UTC
  To: netdev, xen-devel; +Cc: ian.campbell, konrad.wilk, annie.li, Wei Liu

The skb parameter of xen_netbk_alloc_page() is never used.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 drivers/net/xen-netback/netback.c |    5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index da726a3..6e8e51a 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -948,7 +948,6 @@ static int netbk_count_requests(struct xenvif *vif,
 }
 
 static struct page *xen_netbk_alloc_page(struct xen_netbk *netbk,
-					 struct sk_buff *skb,
 					 u16 pending_idx)
 {
 	struct page *page;
@@ -982,7 +981,7 @@ static struct gnttab_copy *xen_netbk_get_requests(struct xen_netbk *netbk,
 
 		index = pending_index(netbk->pending_cons++);
 		pending_idx = netbk->pending_ring[index];
-		page = xen_netbk_alloc_page(netbk, skb, pending_idx);
+		page = xen_netbk_alloc_page(netbk, pending_idx);
 		if (!page)
 			goto err;
 
@@ -1387,7 +1386,7 @@ static unsigned xen_netbk_tx_build_gops(struct xen_netbk *netbk)
 		}
 
 		/* XXX could copy straight to head */
-		page = xen_netbk_alloc_page(netbk, skb, pending_idx);
+		page = xen_netbk_alloc_page(netbk, pending_idx);
 		if (!page) {
 			kfree_skb(skb);
 			netbk_tx_err(vif, &txreq, idx);
-- 
1.7.10.4

* [PATCH 4/4] xen-netback: coalesce slots before copying
From: Wei Liu @ 2013-03-18 10:35 UTC
  To: netdev, xen-devel; +Cc: ian.campbell, konrad.wilk, annie.li, Wei Liu

This patch coalesces tx requests when constructing grant copy structures.
It enables netback to deal with the situation where the frontend's
MAX_SKB_FRAGS is larger than the backend's MAX_SKB_FRAGS.
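
A user-space sketch of the coalescing idea, assuming 4 KiB pages and ignoring
grant references, offsets and domain IDs; coalesce() and struct
coalesce_result are hypothetical names. It follows the loop structure of
xen_netbk_get_requests() in the patch below: fill each freshly allocated page
with as many tx requests as fit, split a request that straddles the end of
the page, and count one grant copy per contiguous piece.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u

/* How many backend pages (frags) and grant-copy ops the slots turn into. */
struct coalesce_result {
	unsigned int nr_pages;
	unsigned int nr_copies;
};

/*
 * Pack the payload of each tx request (sizes[i] bytes) back to back into
 * fresh pages. A request that does not fit in the space left on the current
 * page is split: the first part goes to this page, the rest to the next.
 * sizes[] is modified in place, as txp->size is in the patch.
 */
static struct coalesce_result coalesce(unsigned int *sizes, size_t nr_slots)
{
	struct coalesce_result res = { 0, 0 };
	size_t slot = 0;

	while (slot < nr_slots) {
		unsigned int dst_offset = 0;

		res.nr_pages++;	/* one page allocated per resulting frag */
		while (dst_offset < SKETCH_PAGE_SIZE && slot < nr_slots) {
			if (dst_offset + sizes[slot] > SKETCH_PAGE_SIZE) {
				/* Partial copy; the remainder stays in this slot. */
				unsigned int len = SKETCH_PAGE_SIZE - dst_offset;

				sizes[slot] -= len;
				dst_offset += len;	/* page is now full */
			} else {
				/* Whole request fits; move on to the next slot. */
				dst_offset += sizes[slot];
				slot++;
			}
			res.nr_copies++;
		}
	}
	return res;
}
```

Five 1000-byte slots become two pages and six copies, because the fifth
request is split across the page boundary. Twenty half-page slots collapse
into ten frags, so the backend sees far fewer frags than the frontend used
slots.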

It defines max_skb_slots, an estimate of the maximum number of slots a guest
can send; anything bigger than that is considered malicious. It is currently
set to 20, which should be enough to accommodate Linux (16 to 19) and possibly
Windows (19?).
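
A simplified user-space sketch of how such a limit is enforced while walking
a request chain, modelled on netbk_count_requests() in the patch below. The
size and offset checks are omitted, and SKETCH_MORE_DATA, struct req_sketch
and count_slots() are hypothetical stand-ins for XEN_NETTXF_more_data,
xen_netif_tx_request and the real function. The chain length is checked
against both the requests actually on the ring (avail, like work_to_do) and
the max_slots bound.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bit standing in for XEN_NETTXF_more_data. */
#define SKETCH_MORE_DATA 0x1u

struct req_sketch {
	uint16_t flags;
	uint16_t size;
};

/*
 * Count the slots following the first request of a packet. Returns the
 * count, -ENODATA if the chain runs past the avail requests on the ring,
 * or -E2BIG if it is longer than max_slots.
 */
static int count_slots(const struct req_sketch *first,
		       const struct req_sketch *rest, size_t avail,
		       unsigned int max_slots)
{
	const struct req_sketch *txp = rest;
	int slots = 0;

	if (!(first->flags & SKETCH_MORE_DATA))
		return 0;

	do {
		if ((size_t)slots >= avail)
			return -ENODATA;
		if ((unsigned int)slots >= max_slots)
			return -E2BIG;	/* treated as a malicious guest */
		slots++;
	} while ((txp++)->flags & SKETCH_MORE_DATA);

	return slots;
}
```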

Also rename the variable "frags" to "slots" in netbk_count_requests.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 drivers/net/xen-netback/netback.c |  204 ++++++++++++++++++++++++++++---------
 1 file changed, 157 insertions(+), 47 deletions(-)

diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 6e8e51a..d7bbce9 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -47,9 +47,20 @@
 #include <asm/xen/hypercall.h>
 #include <asm/xen/page.h>
 
+/*
+ * This is an estimation of the maximum possible frags a SKB might
+ * have, anything larger than this is considered malicious. Typically
+ * Linux has 16 to 19, Windows has 19(?).
+ */
+#define MAX_SKB_SLOTS_DEFAULT 20
+static unsigned int max_skb_slots = MAX_SKB_SLOTS_DEFAULT;
+module_param(max_skb_slots, uint, 0444);
+
 struct pending_tx_info {
-	struct xen_netif_tx_request req;
+	struct xen_netif_tx_request req; /* coalesced tx request  */
 	struct xenvif *vif;
+	unsigned int nr_tx_req; /* how many tx req we have in a chain (>=1) */
+	unsigned int start_idx; /* starting index of pending ring index */
 };
 typedef unsigned int pending_ring_idx_t;
 
@@ -251,7 +262,7 @@ static int max_required_rx_slots(struct xenvif *vif)
 	int max = DIV_ROUND_UP(vif->dev->mtu, PAGE_SIZE);
 
 	if (vif->can_sg || vif->gso || vif->gso_prefix)
-		max += MAX_SKB_FRAGS + 1; /* extra_info + frags */
+		max += max_skb_slots + 1; /* extra_info + frags */
 
 	return max;
 }
@@ -657,7 +668,7 @@ static void xen_netbk_rx_action(struct xen_netbk *netbk)
 		__skb_queue_tail(&rxq, skb);
 
 		/* Filled the batch queue? */
-		if (count + MAX_SKB_FRAGS >= XEN_NETIF_RX_RING_SIZE)
+		if (count + max_skb_slots >= XEN_NETIF_RX_RING_SIZE)
 			break;
 	}
 
@@ -908,34 +919,34 @@ static int netbk_count_requests(struct xenvif *vif,
 				int work_to_do)
 {
 	RING_IDX cons = vif->tx.req_cons;
-	int frags = 0;
+	int slots = 0;
 
 	if (!(first->flags & XEN_NETTXF_more_data))
 		return 0;
 
 	do {
-		if (frags >= work_to_do) {
-			netdev_err(vif->dev, "Need more frags\n");
+		if (slots >= work_to_do) {
+			netdev_err(vif->dev, "Need more slots\n");
 			netbk_fatal_tx_err(vif);
 			return -ENODATA;
 		}
 
-		if (unlikely(frags >= MAX_SKB_FRAGS)) {
-			netdev_err(vif->dev, "Too many frags\n");
+		if (unlikely(slots >= max_skb_slots)) {
+			netdev_err(vif->dev, "Too many slots\n");
 			netbk_fatal_tx_err(vif);
 			return -E2BIG;
 		}
 
-		memcpy(txp, RING_GET_REQUEST(&vif->tx, cons + frags),
+		memcpy(txp, RING_GET_REQUEST(&vif->tx, cons + slots),
 		       sizeof(*txp));
 		if (txp->size > first->size) {
-			netdev_err(vif->dev, "Frag is bigger than frame.\n");
+			netdev_err(vif->dev, "Packet is bigger than frame.\n");
 			netbk_fatal_tx_err(vif);
 			return -EIO;
 		}
 
 		first->size -= txp->size;
-		frags++;
+		slots++;
 
 		if (unlikely((txp->offset + txp->size) > PAGE_SIZE)) {
 			netdev_err(vif->dev, "txp->offset: %x, size: %u\n",
@@ -944,7 +955,7 @@ static int netbk_count_requests(struct xenvif *vif,
 			return -EINVAL;
 		}
 	} while ((txp++)->flags & XEN_NETTXF_more_data);
-	return frags;
+	return slots;
 }
 
 static struct page *xen_netbk_alloc_page(struct xen_netbk *netbk,
@@ -968,48 +979,120 @@ static struct gnttab_copy *xen_netbk_get_requests(struct xen_netbk *netbk,
 	struct skb_shared_info *shinfo = skb_shinfo(skb);
 	skb_frag_t *frags = shinfo->frags;
 	u16 pending_idx = *((u16 *)skb->data);
-	int i, start;
+	u16 head_idx = 0;
+	int slot, start;
+	struct page *page;
+	pending_ring_idx_t index;
+	uint16_t dst_offset;
+	unsigned int nr_slots;
+	struct pending_tx_info *first = NULL;
+	int nr_txp;
+	unsigned int start_idx = 0;
+
+	/* At this point shinfo->nr_frags is in fact the number of
+	 * slots, which can be as large as max_skb_slots.
+	 */
+	nr_slots = shinfo->nr_frags;
 
 	/* Skip first skb fragment if it is on same page as header fragment. */
 	start = (frag_get_pending_idx(&shinfo->frags[0]) == pending_idx);
 
-	for (i = start; i < shinfo->nr_frags; i++, txp++) {
-		struct page *page;
-		pending_ring_idx_t index;
+	/* Coalesce tx requests, at this point the packet passed in
+	 * should be <= 64K. Any packets larger than 64K has been
+	 * dropped / caused fatal error early on.
+	 */
+	for (shinfo->nr_frags = slot = start; slot < nr_slots;
+	     shinfo->nr_frags++) {
 		struct pending_tx_info *pending_tx_info =
 			netbk->pending_tx_info;
 
-		index = pending_index(netbk->pending_cons++);
-		pending_idx = netbk->pending_ring[index];
-		page = xen_netbk_alloc_page(netbk, pending_idx);
+		page = alloc_page(GFP_KERNEL|__GFP_COLD);
 		if (!page)
 			goto err;
 
-		gop->source.u.ref = txp->gref;
-		gop->source.domid = vif->domid;
-		gop->source.offset = txp->offset;
-
-		gop->dest.u.gmfn = virt_to_mfn(page_address(page));
-		gop->dest.domid = DOMID_SELF;
-		gop->dest.offset = txp->offset;
-
-		gop->len = txp->size;
-		gop->flags = GNTCOPY_source_gref;
+		nr_txp = 0;
+		dst_offset = 0;
+		first = NULL;
+		while (dst_offset < PAGE_SIZE && slot < nr_slots) {
+			gop->flags = GNTCOPY_source_gref;
+
+			gop->source.u.ref = txp->gref;
+			gop->source.domid = vif->domid;
+			gop->source.offset = txp->offset;
+
+			gop->dest.domid = DOMID_SELF;
+
+			gop->dest.offset = dst_offset;
+			gop->dest.u.gmfn = virt_to_mfn(page_address(page));
+
+			if (dst_offset + txp->size > PAGE_SIZE) {
+				/* This page can only merge a portion
+				 * of tx request. Do not increment any
+				 * pointer / counter here. The txp
+				 * will be dealt with in future
+				 * rounds, eventually hitting the
+				 * `else` branch.
+				 */
+				gop->len = PAGE_SIZE - dst_offset;
+				txp->offset += gop->len;
+				txp->size -= gop->len;
+				dst_offset += gop->len; /* quit loop */
+			} else {
+				/* This tx request can be merged in the page */
+				gop->len = txp->size;
+				dst_offset += gop->len;
+
+				index = pending_index(netbk->pending_cons++);
+
+				pending_idx = netbk->pending_ring[index];
+
+				memcpy(&pending_tx_info[pending_idx].req, txp,
+				       sizeof(*txp));
+				xenvif_get(vif);
+
+				pending_tx_info[pending_idx].vif = vif;
+
+				/* Poison these fields, corresponding
+				 * fields for head tx req will be set
+				 * to correct values after the loop.
+				 */
+				pending_tx_info[pending_idx].nr_tx_req =
+					(u16)(~0);
+				netbk->mmap_pages[pending_idx] = (void *)(~0UL);
+				pending_tx_info[pending_idx].start_idx = ~0U;
+
+				if (unlikely(!first)) {
+					first = &pending_tx_info[pending_idx];
+					start_idx = index;
+					head_idx = pending_idx;
+				}
+
+				txp++;
+				nr_txp++;
+				slot++;
+			}
 
-		gop++;
+			gop++;
+		}
 
-		memcpy(&pending_tx_info[pending_idx].req, txp, sizeof(*txp));
-		xenvif_get(vif);
-		pending_tx_info[pending_idx].vif = vif;
-		frag_set_pending_idx(&frags[i], pending_idx);
+		first->req.offset = 0;
+		first->req.size = dst_offset;
+		first->nr_tx_req = nr_txp;
+		first->start_idx = start_idx;
+		set_page_ext(page, netbk, head_idx);
+		netbk->mmap_pages[head_idx] = page;
+		frag_set_pending_idx(&frags[shinfo->nr_frags], head_idx);
 	}
 
+	BUG_ON(shinfo->nr_frags > MAX_SKB_FRAGS);
+
 	return gop;
 err:
 	/* Unwind, freeing all pages and sending error responses. */
-	while (i-- > start) {
-		xen_netbk_idx_release(netbk, frag_get_pending_idx(&frags[i]),
-				      XEN_NETIF_RSP_ERROR);
+	while (shinfo->nr_frags-- > start) {
+		xen_netbk_idx_release(netbk,
+				frag_get_pending_idx(&frags[shinfo->nr_frags]),
+				XEN_NETIF_RSP_ERROR);
 	}
 	/* The head too, if necessary. */
 	if (start)
@@ -1025,6 +1108,7 @@ static int xen_netbk_tx_check_gop(struct xen_netbk *netbk,
 	struct gnttab_copy *gop = *gopp;
 	u16 pending_idx = *((u16 *)skb->data);
 	struct skb_shared_info *shinfo = skb_shinfo(skb);
+	struct pending_tx_info *tx_info;
 	int nr_frags = shinfo->nr_frags;
 	int i, err, start;
 
@@ -1037,12 +1121,17 @@ static int xen_netbk_tx_check_gop(struct xen_netbk *netbk,
 	start = (frag_get_pending_idx(&shinfo->frags[0]) == pending_idx);
 
 	for (i = start; i < nr_frags; i++) {
-		int j, newerr;
+		int j, newerr = 0, n;
 
 		pending_idx = frag_get_pending_idx(&shinfo->frags[i]);
+		tx_info = &netbk->pending_tx_info[pending_idx];
 
 		/* Check error status: if okay then remember grant handle. */
-		newerr = (++gop)->status;
+		for (n = 0; n < tx_info->nr_tx_req; n++) {
+			newerr = (++gop)->status;
+			if (newerr)
+				break;
+		}
 		if (likely(!newerr)) {
 			/* Had a previous error? Invalidate this fragment. */
 			if (unlikely(err))
@@ -1267,11 +1356,11 @@ static unsigned xen_netbk_tx_build_gops(struct xen_netbk *netbk)
 	struct sk_buff *skb;
 	int ret;
 
-	while (((nr_pending_reqs(netbk) + MAX_SKB_FRAGS) < MAX_PENDING_REQS) &&
+	while (((nr_pending_reqs(netbk) + max_skb_slots) < MAX_PENDING_REQS) &&
 		!list_empty(&netbk->net_schedule_list)) {
 		struct xenvif *vif;
 		struct xen_netif_tx_request txreq;
-		struct xen_netif_tx_request txfrags[MAX_SKB_FRAGS];
+		struct xen_netif_tx_request txfrags[max_skb_slots];
 		struct page *page;
 		struct xen_netif_extra_info extras[XEN_NETIF_EXTRA_TYPE_MAX-1];
 		u16 pending_idx;
@@ -1359,7 +1448,7 @@ static unsigned xen_netbk_tx_build_gops(struct xen_netbk *netbk)
 		pending_idx = netbk->pending_ring[index];
 
 		data_len = (txreq.size > PKT_PROT_LEN &&
-			    ret < MAX_SKB_FRAGS) ?
+			    ret < max_skb_slots) ?
 			PKT_PROT_LEN : txreq.size;
 
 		skb = alloc_skb(data_len + NET_SKB_PAD + NET_IP_ALIGN,
@@ -1409,6 +1498,8 @@ static unsigned xen_netbk_tx_build_gops(struct xen_netbk *netbk)
 		memcpy(&netbk->pending_tx_info[pending_idx].req,
 		       &txreq, sizeof(txreq));
 		netbk->pending_tx_info[pending_idx].vif = vif;
+		netbk->pending_tx_info[pending_idx].start_idx = index;
+		netbk->pending_tx_info[pending_idx].nr_tx_req = 1;
 		*((u16 *)skb->data) = pending_idx;
 
 		__skb_put(skb, data_len);
@@ -1540,6 +1631,11 @@ static void xen_netbk_idx_release(struct xen_netbk *netbk, u16 pending_idx,
 	struct xenvif *vif;
 	struct pending_tx_info *pending_tx_info;
 	pending_ring_idx_t index;
+	unsigned int nr = 0;
+	unsigned int i = 0;
+	unsigned int start_idx = 0;
+
+	BUG_ON(netbk->mmap_pages[pending_idx] == (void *)(~0UL));
 
 	/* Already complete? */
 	if (netbk->mmap_pages[pending_idx] == NULL)
@@ -1548,13 +1644,27 @@ static void xen_netbk_idx_release(struct xen_netbk *netbk, u16 pending_idx,
 	pending_tx_info = &netbk->pending_tx_info[pending_idx];
 
 	vif = pending_tx_info->vif;
+	nr = pending_tx_info->nr_tx_req;
+	start_idx = pending_tx_info->start_idx;
 
-	make_tx_response(vif, &pending_tx_info->req, status);
+	BUG_ON(nr == (u16)(~0));
 
-	index = pending_index(netbk->pending_prod++);
-	netbk->pending_ring[index] = pending_idx;
+	BUG_ON(netbk->pending_ring[pending_index(start_idx)] != pending_idx);
 
-	xenvif_put(vif);
+	for (i = 0; i < nr; i++) {
+		struct xen_netif_tx_request *txp;
+		unsigned int idx = pending_index(start_idx + i);
+		unsigned int info_idx = netbk->pending_ring[idx];
+
+		pending_tx_info = &netbk->pending_tx_info[info_idx];
+		txp = &pending_tx_info->req;
+		make_tx_response(vif, &pending_tx_info->req, status);
+
+		index = pending_index(netbk->pending_prod++);
+		netbk->pending_ring[index] = netbk->pending_ring[info_idx];
+
+		xenvif_put(vif);
+	}
 
 	netbk->mmap_pages[pending_idx]->mapping = 0;
 	put_page(netbk->mmap_pages[pending_idx]);
@@ -1613,7 +1723,7 @@ static inline int rx_work_todo(struct xen_netbk *netbk)
 static inline int tx_work_todo(struct xen_netbk *netbk)
 {
 
-	if (((nr_pending_reqs(netbk) + MAX_SKB_FRAGS) < MAX_PENDING_REQS) &&
+	if (((nr_pending_reqs(netbk) + max_skb_slots) < MAX_PENDING_REQS) &&
 			!list_empty(&netbk->net_schedule_list))
 		return 1;
 
-- 
1.7.10.4

 
 		/* Check error status: if okay then remember grant handle. */
-		newerr = (++gop)->status;
+		for (n = 0; n < tx_info->nr_tx_req; n++) {
+			newerr = (++gop)->status;
+			if (newerr)
+				break;
+		}
 		if (likely(!newerr)) {
 			/* Had a previous error? Invalidate this fragment. */
 			if (unlikely(err))
@@ -1267,11 +1356,11 @@ static unsigned xen_netbk_tx_build_gops(struct xen_netbk *netbk)
 	struct sk_buff *skb;
 	int ret;
 
-	while (((nr_pending_reqs(netbk) + MAX_SKB_FRAGS) < MAX_PENDING_REQS) &&
+	while (((nr_pending_reqs(netbk) + max_skb_slots) < MAX_PENDING_REQS) &&
 		!list_empty(&netbk->net_schedule_list)) {
 		struct xenvif *vif;
 		struct xen_netif_tx_request txreq;
-		struct xen_netif_tx_request txfrags[MAX_SKB_FRAGS];
+		struct xen_netif_tx_request txfrags[max_skb_slots];
 		struct page *page;
 		struct xen_netif_extra_info extras[XEN_NETIF_EXTRA_TYPE_MAX-1];
 		u16 pending_idx;
@@ -1359,7 +1448,7 @@ static unsigned xen_netbk_tx_build_gops(struct xen_netbk *netbk)
 		pending_idx = netbk->pending_ring[index];
 
 		data_len = (txreq.size > PKT_PROT_LEN &&
-			    ret < MAX_SKB_FRAGS) ?
+			    ret < max_skb_slots) ?
 			PKT_PROT_LEN : txreq.size;
 
 		skb = alloc_skb(data_len + NET_SKB_PAD + NET_IP_ALIGN,
@@ -1409,6 +1498,8 @@ static unsigned xen_netbk_tx_build_gops(struct xen_netbk *netbk)
 		memcpy(&netbk->pending_tx_info[pending_idx].req,
 		       &txreq, sizeof(txreq));
 		netbk->pending_tx_info[pending_idx].vif = vif;
+		netbk->pending_tx_info[pending_idx].start_idx = index;
+		netbk->pending_tx_info[pending_idx].nr_tx_req = 1;
 		*((u16 *)skb->data) = pending_idx;
 
 		__skb_put(skb, data_len);
@@ -1540,6 +1631,11 @@ static void xen_netbk_idx_release(struct xen_netbk *netbk, u16 pending_idx,
 	struct xenvif *vif;
 	struct pending_tx_info *pending_tx_info;
 	pending_ring_idx_t index;
+	unsigned int nr = 0;
+	unsigned int i = 0;
+	unsigned int start_idx = 0;
+
+	BUG_ON(netbk->mmap_pages[pending_idx] == (void *)(~0UL));
 
 	/* Already complete? */
 	if (netbk->mmap_pages[pending_idx] == NULL)
@@ -1548,13 +1644,27 @@ static void xen_netbk_idx_release(struct xen_netbk *netbk, u16 pending_idx,
 	pending_tx_info = &netbk->pending_tx_info[pending_idx];
 
 	vif = pending_tx_info->vif;
+	nr = pending_tx_info->nr_tx_req;
+	start_idx = pending_tx_info->start_idx;
 
-	make_tx_response(vif, &pending_tx_info->req, status);
+	BUG_ON(nr == (u16)(~0));
 
-	index = pending_index(netbk->pending_prod++);
-	netbk->pending_ring[index] = pending_idx;
+	BUG_ON(netbk->pending_ring[pending_index(start_idx)] != pending_idx);
 
-	xenvif_put(vif);
+	for (i = 0; i < nr; i++) {
+		struct xen_netif_tx_request *txp;
+		unsigned int idx = pending_index(start_idx + i);
+		unsigned int info_idx = netbk->pending_ring[idx];
+
+		pending_tx_info = &netbk->pending_tx_info[info_idx];
+		txp = &pending_tx_info->req;
+		make_tx_response(vif, &pending_tx_info->req, status);
+
+		index = pending_index(netbk->pending_prod++);
+		netbk->pending_ring[index] = netbk->pending_ring[info_idx];
+
+		xenvif_put(vif);
+	}
 
 	netbk->mmap_pages[pending_idx]->mapping = 0;
 	put_page(netbk->mmap_pages[pending_idx]);
@@ -1613,7 +1723,7 @@ static inline int rx_work_todo(struct xen_netbk *netbk)
 static inline int tx_work_todo(struct xen_netbk *netbk)
 {
 
-	if (((nr_pending_reqs(netbk) + MAX_SKB_FRAGS) < MAX_PENDING_REQS) &&
+	if (((nr_pending_reqs(netbk) + max_skb_slots) < MAX_PENDING_REQS) &&
 			!list_empty(&netbk->net_schedule_list))
 		return 1;
 
-- 
1.7.10.4
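
The inner `while` loop in the coalescing hunk above packs consecutive tx requests into one destination page. When a request does not fit in the space left, it is split across the page boundary. Below is a simplified sketch of just that packing arithmetic. It assumes each request is at most one page, which netbk_count_requests enforces via its offset + size > PAGE_SIZE check. It leaves out the grant-copy structures and the pending-ring bookkeeping. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u

/* Result of packing a chain of tx slot sizes into destination pages. */
struct coalesce_result {
	unsigned int pages;    /* destination pages (frags) used */
	unsigned int copy_ops; /* grant-copy operations generated */
};

/* Pack slot sizes back to back into pages, splitting a slot across a page
 * boundary when it does not fit. `sizes' is modified in place, much as the
 * patch shrinks txp->size after a partial copy. */
static struct coalesce_result coalesce(unsigned int *sizes, size_t nr_slots)
{
	struct coalesce_result r = { 0, 0 };
	size_t slot = 0;

	while (slot < nr_slots) {
		unsigned int dst_offset = 0;

		r.pages++;
		while (dst_offset < SKETCH_PAGE_SIZE && slot < nr_slots) {
			unsigned int len;

			assert(sizes[slot] <= SKETCH_PAGE_SIZE);
			if (dst_offset + sizes[slot] > SKETCH_PAGE_SIZE) {
				/* Only part of this slot fits: copy it, keep the rest */
				len = SKETCH_PAGE_SIZE - dst_offset;
				sizes[slot] -= len;
			} else {
				/* The whole (remaining) slot fits: consume it */
				len = sizes[slot];
				slot++;
			}
			dst_offset += len;
			r.copy_ops++;
		}
	}
	return r;
}
```

In this model, 20 requests of 1000 bytes each end up in 5 pages. That is the effect the patch relies on to accept a frontend whose MAX_SKB_FRAGS is larger than the backend's.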

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* Re: [PATCH 3/4] xen-netback: remove skb in xen_netbk_alloc_page
  2013-03-18 10:35 ` [PATCH 3/4] xen-netback: remove skb in xen_netbk_alloc_page Wei Liu
  2013-03-18 11:37   ` Ian Campbell
@ 2013-03-18 11:37   ` Ian Campbell
  1 sibling, 0 replies; 97+ messages in thread
From: Ian Campbell @ 2013-03-18 11:37 UTC (permalink / raw)
  To: Wei Liu; +Cc: netdev, xen-devel, konrad.wilk, annie.li

On Mon, 2013-03-18 at 10:35 +0000, Wei Liu wrote:
> This variable is never used.
> 
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>

Acked-by: Ian Campbell <ian.campbell@citrix.com>

> ---
>  drivers/net/xen-netback/netback.c |    5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
> index da726a3..6e8e51a 100644
> --- a/drivers/net/xen-netback/netback.c
> +++ b/drivers/net/xen-netback/netback.c
> @@ -948,7 +948,6 @@ static int netbk_count_requests(struct xenvif *vif,
>  }
>  
>  static struct page *xen_netbk_alloc_page(struct xen_netbk *netbk,
> -					 struct sk_buff *skb,
>  					 u16 pending_idx)
>  {
>  	struct page *page;
> @@ -982,7 +981,7 @@ static struct gnttab_copy *xen_netbk_get_requests(struct xen_netbk *netbk,
>  
>  		index = pending_index(netbk->pending_cons++);
>  		pending_idx = netbk->pending_ring[index];
> -		page = xen_netbk_alloc_page(netbk, skb, pending_idx);
> +		page = xen_netbk_alloc_page(netbk, pending_idx);
>  		if (!page)
>  			goto err;
>  
> @@ -1387,7 +1386,7 @@ static unsigned xen_netbk_tx_build_gops(struct xen_netbk *netbk)
>  		}
>  
>  		/* XXX could copy straight to head */
> -		page = xen_netbk_alloc_page(netbk, skb, pending_idx);
> +		page = xen_netbk_alloc_page(netbk, pending_idx);
>  		if (!page) {
>  			kfree_skb(skb);
>  			netbk_tx_err(vif, &txreq, idx);

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 1/4] xen-netfront: remove unused variable `extra'
  2013-03-18 10:35 ` Wei Liu
  2013-03-18 11:42   ` Ian Campbell
@ 2013-03-18 11:42   ` Ian Campbell
  2013-03-18 12:04     ` Wei Liu
  2013-03-18 12:04     ` Wei Liu
  1 sibling, 2 replies; 97+ messages in thread
From: Ian Campbell @ 2013-03-18 11:42 UTC (permalink / raw)
  To: Wei Liu; +Cc: netdev, xen-devel, konrad.wilk, annie.li

On Mon, 2013-03-18 at 10:35 +0000, Wei Liu wrote:

I think a few more words are needed here since from the code you are
removing it seems very much like gso is used for something. If you have
a proof that the "extra = gso" case is never hit then please explain it.
Perhaps a reference to the removal of the last user?

Or maybe it is the case that it should be used and the bug is that it
isn't?

> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> ---
>  drivers/net/xen-netfront.c |    8 +-------
>  1 file changed, 1 insertion(+), 7 deletions(-)
> 
> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> index 7ffa43b..5527663 100644
> --- a/drivers/net/xen-netfront.c
> +++ b/drivers/net/xen-netfront.c
> @@ -537,7 +537,6 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
>  	struct netfront_info *np = netdev_priv(dev);
>  	struct netfront_stats *stats = this_cpu_ptr(np->stats);
>  	struct xen_netif_tx_request *tx;
> -	struct xen_netif_extra_info *extra;
>  	char *data = skb->data;
>  	RING_IDX i;
>  	grant_ref_t ref;
> @@ -581,7 +580,6 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
>  	tx->gref = np->grant_tx_ref[id] = ref;
>  	tx->offset = offset;
>  	tx->size = len;
> -	extra = NULL;
>  
>  	tx->flags = 0;
>  	if (skb->ip_summed == CHECKSUM_PARTIAL)
> @@ -597,10 +595,7 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
>  		gso = (struct xen_netif_extra_info *)
>  			RING_GET_REQUEST(&np->tx, ++i);
>  
> -		if (extra)
> -			extra->flags |= XEN_NETIF_EXTRA_FLAG_MORE;
> -		else
> -			tx->flags |= XEN_NETTXF_extra_info;
> +		tx->flags |= XEN_NETTXF_extra_info;
>  
>  		gso->u.gso.size = skb_shinfo(skb)->gso_size;
>  		gso->u.gso.type = XEN_NETIF_GSO_TYPE_TCPV4;
> @@ -609,7 +604,6 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
>  
>  		gso->type = XEN_NETIF_EXTRA_TYPE_GSO;
>  		gso->flags = 0;
> -		extra = gso;
>  	}
>  
>  	np->tx.req_prod_pvt = i + 1;

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 2/4] xen-netfront: drop skb when skb->len > 65535
  2013-03-18 10:35 ` [PATCH 2/4] xen-netfront: drop skb when skb->len > 65535 Wei Liu
@ 2013-03-18 11:42   ` Ian Campbell
  2013-03-18 14:40     ` Wei Liu
  2013-03-18 14:40     ` Wei Liu
  2013-03-18 11:42   ` Ian Campbell
                     ` (7 subsequent siblings)
  8 siblings, 2 replies; 97+ messages in thread
From: Ian Campbell @ 2013-03-18 11:42 UTC (permalink / raw)
  To: Wei Liu; +Cc: netdev, xen-devel, konrad.wilk, annie.li

On Mon, 2013-03-18 at 10:35 +0000, Wei Liu wrote:
> The `size' field of Xen network wire format is uint16_t, anything bigger than
> 65535 will cause overflow.
> 
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> ---
>  drivers/net/xen-netfront.c |   12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> index 5527663..8c3d065 100644
> --- a/drivers/net/xen-netfront.c
> +++ b/drivers/net/xen-netfront.c
> @@ -547,6 +547,18 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
>  	unsigned int len = skb_headlen(skb);
>  	unsigned long flags;
>  
> +	/*
> +	 * wire format of xen_netif_tx_request only supports skb->len
> +	 * < 64K, because size field in xen_netif_tx_request is
> +	 * uint16_t.

Is there some field we can set e.g. in struct ethernet_device which
would stop this from happening?


> +	 */
> +	if (unlikely(skb->len > (uint16_t)(~0))) {
> +		net_alert_ratelimited(
> +			"xennet: skb->len = %d, too big for wire format\n",
> +			skb->len);
> +		goto drop;
> +	}
> +
>  	slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
>  		xennet_count_skb_frag_slots(skb);
>  	if (unlikely(slots > MAX_SKB_FRAGS + 1)) {

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 1/4] xen-netfront: remove unused variable `extra'
  2013-03-18 11:42   ` Ian Campbell
@ 2013-03-18 12:04     ` Wei Liu
  2013-03-18 12:14       ` Ian Campbell
  2013-03-18 12:14       ` Ian Campbell
  2013-03-18 12:04     ` Wei Liu
  1 sibling, 2 replies; 97+ messages in thread
From: Wei Liu @ 2013-03-18 12:04 UTC (permalink / raw)
  To: Ian Campbell; +Cc: wei.liu2, netdev, xen-devel, konrad.wilk, annie.li

On Mon, 2013-03-18 at 11:42 +0000, Ian Campbell wrote:
> On Mon, 2013-03-18 at 10:35 +0000, Wei Liu wrote:
> 
> I think a few more words are needed here since from the code you are
> removing it seems very much like gso is used for something. If you have
> a proof that the "extra = gso" case is never hit then please explain it.
> Perhaps a reference to the removal of the last user?
> 
> Or maybe it is the case that it should be used and the bug is that it
> isn't?
> 

It looks like the latter. The `extra' variable should be used to keep hold
of the last extra info segment in the ring. ;-)

But the only extra info type in the upstream kernel is
XEN_NETIF_EXTRA_TYPE_GSO, so there is never any other extra info in the
ring at that point. Could it be something left over from the classic Xen
kernel?

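In the quoted code, `extra' is always NULL when the GSO block runs. That means the `if (extra)' branch removed by the patch is never taken today. Below is a minimal sketch of what that chaining would do if a second extra-info segment were ever queued. The struct and flag names are simplified stand-ins (hypothetical), not the definitions from xen/interface/io/netif.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in flag values; the real XEN_NETTXF_extra_info and
 * XEN_NETIF_EXTRA_FLAG_MORE are defined in xen/interface/io/netif.h. */
#define SKETCH_TXF_EXTRA_INFO  (1u << 3)
#define SKETCH_EXTRA_FLAG_MORE (1u << 0)

/* Simplified stand-ins for the ring request and extra-info structures. */
struct sketch_tx_request {
	uint16_t flags;
};

struct sketch_extra_info {
	uint8_t type;
	uint8_t flags;
};

/* Queue n extra-info segments behind a head tx request. `extra' remembers
 * the last segment written: the first segment is announced by a flag on
 * the head request, each later one by the MORE flag on its predecessor. */
static void chain_extras(struct sketch_tx_request *tx,
			 struct sketch_extra_info *segs, size_t n)
{
	struct sketch_extra_info *extra = NULL;

	assert(tx != NULL && (n == 0 || segs != NULL));
	tx->flags = 0;

	for (size_t k = 0; k < n; k++) {
		struct sketch_extra_info *cur = &segs[k];

		if (extra)
			extra->flags |= SKETCH_EXTRA_FLAG_MORE;
		else
			tx->flags |= SKETCH_TXF_EXTRA_INFO;

		cur->type = (uint8_t)(k + 1); /* placeholder type id */
		cur->flags = 0;               /* MORE is set later if another follows */
		extra = cur;
	}
}
```

With a single GSO segment (n == 1), the loop only ever takes the `else' branch, which is the upstream case described above.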

Wei.

> > Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> > ---
> >  drivers/net/xen-netfront.c |    8 +-------
> >  1 file changed, 1 insertion(+), 7 deletions(-)
> > 
> > diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> > index 7ffa43b..5527663 100644
> > --- a/drivers/net/xen-netfront.c
> > +++ b/drivers/net/xen-netfront.c
> > @@ -537,7 +537,6 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
> >  	struct netfront_info *np = netdev_priv(dev);
> >  	struct netfront_stats *stats = this_cpu_ptr(np->stats);
> >  	struct xen_netif_tx_request *tx;
> > -	struct xen_netif_extra_info *extra;
> >  	char *data = skb->data;
> >  	RING_IDX i;
> >  	grant_ref_t ref;
> > @@ -581,7 +580,6 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
> >  	tx->gref = np->grant_tx_ref[id] = ref;
> >  	tx->offset = offset;
> >  	tx->size = len;
> > -	extra = NULL;
> >  
> >  	tx->flags = 0;
> >  	if (skb->ip_summed == CHECKSUM_PARTIAL)
> > @@ -597,10 +595,7 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
> >  		gso = (struct xen_netif_extra_info *)
> >  			RING_GET_REQUEST(&np->tx, ++i);
> >  
> > -		if (extra)
> > -			extra->flags |= XEN_NETIF_EXTRA_FLAG_MORE;
> > -		else
> > -			tx->flags |= XEN_NETTXF_extra_info;
> > +		tx->flags |= XEN_NETTXF_extra_info;
> >  
> >  		gso->u.gso.size = skb_shinfo(skb)->gso_size;
> >  		gso->u.gso.type = XEN_NETIF_GSO_TYPE_TCPV4;
> > @@ -609,7 +604,6 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
> >  
> >  		gso->type = XEN_NETIF_EXTRA_TYPE_GSO;
> >  		gso->flags = 0;
> > -		extra = gso;
> >  	}
> >  
> >  	np->tx.req_prod_pvt = i + 1;
> 
> 

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 4/4] xen-netback: coalesce slots before copying
  2013-03-18 10:35 ` [PATCH 4/4] xen-netback: coalesce slots before copying Wei Liu
  2013-03-18 12:07   ` Ian Campbell
@ 2013-03-18 12:07   ` Ian Campbell
  2013-03-21 18:37     ` Wei Liu
  2013-03-21 18:37     ` Wei Liu
  2013-03-18 13:09   ` James Harper
  2 siblings, 2 replies; 97+ messages in thread
From: Ian Campbell @ 2013-03-18 12:07 UTC (permalink / raw)
  To: Wei Liu; +Cc: netdev, xen-devel, konrad.wilk, annie.li

On Mon, 2013-03-18 at 10:35 +0000, Wei Liu wrote:
> This patch tries to coalesce tx requests when constructing grant copy
> structures. It enables netback to deal with the situation where the
> frontend's MAX_SKB_FRAGS is larger than the backend's MAX_SKB_FRAGS.
> 
> It defines max_skb_slots, which is an estimate of the maximum number of
> slots a guest can send; anything bigger than that is considered malicious.
> It is currently set to 20, which should be enough to accommodate Linux
> (16 to 19) and possibly Windows (19?).
> 
> Also change variable name from "frags" to "slots" in netbk_count_requests.
> 
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> ---
>  drivers/net/xen-netback/netback.c |  204 ++++++++++++++++++++++++++++---------
>  1 file changed, 157 insertions(+), 47 deletions(-)
> 
> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
> index 6e8e51a..d7bbce9 100644
> --- a/drivers/net/xen-netback/netback.c
> +++ b/drivers/net/xen-netback/netback.c
> @@ -47,9 +47,20 @@
>  #include <asm/xen/hypercall.h>
>  #include <asm/xen/page.h>
>  
> +/*
> + * This is an estimation of the maximum possible frags a SKB might
> + * have, anything larger than this is considered malicious. Typically
> + * Linux has 16 to 19, Windows has 19(?).
> + */
> +#define MAX_SKB_SLOTS_DEFAULT 20
> +static unsigned int max_skb_slots = MAX_SKB_SLOTS_DEFAULT;
> +module_param(max_skb_slots, uint, 0444);
> +
>  struct pending_tx_info {
> -	struct xen_netif_tx_request req;
> +	struct xen_netif_tx_request req; /* coalesced tx request  */
>  	struct xenvif *vif;
> +	unsigned int nr_tx_req; /* how many tx req we have in a chain (>=1) */
> +	unsigned int start_idx; /* starting index of pending ring index */

This one should be a RING_IDX I think, not an unsigned int.
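
On the type question: the release path computes pending_index(start_idx + i). That stays correct across the end of the ring as long as the ring size is a power of two and the arithmetic is unsigned. It holds whether start_idx holds an already-masked slot (as the patch stores it) or a free-running counter like RING_IDX, which is an unsigned int in Xen's ring.h. Below is a small sketch. It assumes pending_index() masks with MAX_PENDING_REQS - 1. The 256-entry size and all names here are illustrative assumptions.

```c
#include <assert.h>

/* Illustrative ring size; the sketch assumes it is a power of two. */
#define SKETCH_MAX_PENDING_REQS 256u
static_assert((SKETCH_MAX_PENDING_REQS & (SKETCH_MAX_PENDING_REQS - 1)) == 0,
	      "ring size must be a power of two");

/* Free-running index type, in the spirit of RING_IDX. */
typedef unsigned int sketch_ring_idx_t;

/* Map any counter value onto a slot of the pending ring. */
static unsigned int sketch_pending_index(sketch_ring_idx_t i)
{
	return i & (SKETCH_MAX_PENDING_REQS - 1);
}

/* Ring slot of the i-th request in a chain whose first request sits at
 * `start'. Unsigned wrap-around keeps this correct past the end of the
 * ring and past overflow of the counter itself. */
static unsigned int chain_slot(sketch_ring_idx_t start, unsigned int i)
{
	return sketch_pending_index(start + i);
}
```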

>  };
>  typedef unsigned int pending_ring_idx_t;
>  
> @@ -251,7 +262,7 @@ static int max_required_rx_slots(struct xenvif *vif)
>  	int max = DIV_ROUND_UP(vif->dev->mtu, PAGE_SIZE);
>  
>  	if (vif->can_sg || vif->gso || vif->gso_prefix)
> -		max += MAX_SKB_FRAGS + 1; /* extra_info + frags */
> +		max += max_skb_slots + 1; /* extra_info + frags */
>  
>  	return max;
>  }
> @@ -657,7 +668,7 @@ static void xen_netbk_rx_action(struct xen_netbk *netbk)
>  		__skb_queue_tail(&rxq, skb);
>  
>  		/* Filled the batch queue? */
> -		if (count + MAX_SKB_FRAGS >= XEN_NETIF_RX_RING_SIZE)
> +		if (count + max_skb_slots >= XEN_NETIF_RX_RING_SIZE)
>  			break;
>  	}
>  
> @@ -908,34 +919,34 @@ static int netbk_count_requests(struct xenvif *vif,
>  				int work_to_do)
>  {
>  	RING_IDX cons = vif->tx.req_cons;
> -	int frags = 0;
> +	int slots = 0;
>  
>  	if (!(first->flags & XEN_NETTXF_more_data))
>  		return 0;
>  
>  	do {
> -		if (frags >= work_to_do) {
> -			netdev_err(vif->dev, "Need more frags\n");
> +		if (slots >= work_to_do) {
> +			netdev_err(vif->dev, "Need more slots\n");
>  			netbk_fatal_tx_err(vif);
>  			return -ENODATA;
>  		}
>  
> -		if (unlikely(frags >= MAX_SKB_FRAGS)) {
> -			netdev_err(vif->dev, "Too many frags\n");
> +		if (unlikely(slots >= max_skb_slots)) {
> +			netdev_err(vif->dev, "Too many slots\n");
>  			netbk_fatal_tx_err(vif);
>  			return -E2BIG;
>  		}
>  
> -		memcpy(txp, RING_GET_REQUEST(&vif->tx, cons + frags),
> +		memcpy(txp, RING_GET_REQUEST(&vif->tx, cons + slots),
>  		       sizeof(*txp));
>  		if (txp->size > first->size) {
> -			netdev_err(vif->dev, "Frag is bigger than frame.\n");
> +			netdev_err(vif->dev, "Packet is bigger than frame.\n");
>  			netbk_fatal_tx_err(vif);
>  			return -EIO;
>  		}
>  
>  		first->size -= txp->size;
> -		frags++;
> +		slots++;
>  
>  		if (unlikely((txp->offset + txp->size) > PAGE_SIZE)) {
>  			netdev_err(vif->dev, "txp->offset: %x, size: %u\n",
> @@ -944,7 +955,7 @@ static int netbk_count_requests(struct xenvif *vif,
>  			return -EINVAL;
>  		}
>  	} while ((txp++)->flags & XEN_NETTXF_more_data);
> -	return frags;
> +	return slots;
>  }
>  
>  static struct page *xen_netbk_alloc_page(struct xen_netbk *netbk,
> @@ -968,48 +979,120 @@ static struct gnttab_copy *xen_netbk_get_requests(struct xen_netbk *netbk,
>  	struct skb_shared_info *shinfo = skb_shinfo(skb);
>  	skb_frag_t *frags = shinfo->frags;
>  	u16 pending_idx = *((u16 *)skb->data);
> -	int i, start;
> +	u16 head_idx = 0;
> +	int slot, start;
> +	struct page *page;
> +	pending_ring_idx_t index;
> +	uint16_t dst_offset;
> +	unsigned int nr_slots;
> +	struct pending_tx_info *first = NULL;
> +	int nr_txp;
> +	unsigned int start_idx = 0;
> +
> +	/* At this point shinfo->nr_frags is in fact the number of
> +	 * slots, which can be as large as max_skb_slots.
> +	 */
> +	nr_slots = shinfo->nr_frags;
>  
>  	/* Skip first skb fragment if it is on same page as header fragment. */
>  	start = (frag_get_pending_idx(&shinfo->frags[0]) == pending_idx);
>  
> -	for (i = start; i < shinfo->nr_frags; i++, txp++) {
> -		struct page *page;
> -		pending_ring_idx_t index;
> +	/* Coalesce tx requests, at this point the packet passed in
> +	 * should be <= 64K. Any packets larger than 64K has been
> +	 * dropped / caused fatal error early on.

Where does this happen? Since the size field is a u16, how do we even
detect this case? At least prior to your other fix in this series, the
size would already have overflowed when the guest constructed the request.

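The overflow point can be illustrated in isolation. The size field is 16 bits wide, so the guest silently reduces an oversized length modulo 65536 when it fills in the request. By the time netback sees the request, the original length is gone. Below is a small standalone sketch; the helper names are hypothetical. It contrasts that truncation with the check patch 2/4 adds to netfront.

```c
#include <assert.h>
#include <stdint.h>

/* The wire-format `size' field is 16 bits wide. */
static_assert((uint16_t)(~0) == 65535, "size field holds at most 65535");

/* What a guest effectively writes into the 16-bit `size' field when it
 * stores a length without a range check: the value wraps modulo 65536. */
static uint16_t wire_size(uint32_t len)
{
	return (uint16_t)len;
}

/* Standalone version of the netfront check from patch 2/4: anything
 * above 0xffff cannot be represented and must be dropped. */
static int fits_wire_format(uint32_t len)
{
	return len <= (uint16_t)(~0);
}
```

For example, a 65600-byte packet would be announced to the backend as a 64-byte one.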

> @@ -1025,6 +1108,7 @@ static int xen_netbk_tx_check_gop(struct xen_netbk *netbk,
>  	struct gnttab_copy *gop = *gopp;
>  	u16 pending_idx = *((u16 *)skb->data);
>  	struct skb_shared_info *shinfo = skb_shinfo(skb);
> +	struct pending_tx_info *tx_info;
>  	int nr_frags = shinfo->nr_frags;
>  	int i, err, start;
>  
> @@ -1037,12 +1121,17 @@ static int xen_netbk_tx_check_gop(struct xen_netbk *netbk,
>  	start = (frag_get_pending_idx(&shinfo->frags[0]) == pending_idx);
>  
>  	for (i = start; i < nr_frags; i++) {
> -		int j, newerr;
> +		int j, newerr = 0, n;
>  
>  		pending_idx = frag_get_pending_idx(&shinfo->frags[i]);
> +		tx_info = &netbk->pending_tx_info[pending_idx];
>  
>  		/* Check error status: if okay then remember grant handle. */
> -		newerr = (++gop)->status;
> +		for (n = 0; n < tx_info->nr_tx_req; n++) {

struct pending_tx_info is used in some arrays which can have a fair few
elements so if there are ways to reduce the size that is worth
considering I think.

So rather than storing both nr_tx_req and start_idx can we just store
start_idx and loop while start_idx != 0 (where the first one has
start_idx == zero)?

This might fall out more naturally if you were to instead store next_idx
in each pending tx with a suitable terminator at the end? Or could be
last_idx if it is convenient to count that way round, you don't need to
respond in-order.
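
The next_idx alternative can be sketched as a singly linked chain through the pending_tx_info array. Each entry records the index of the next entry belonging to the same packet, and a sentinel terminates the chain, so no separate count or start index is needed. Below is a minimal sketch; the struct, the array, its size and the terminator value are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_PENDING 256u
#define SKETCH_NO_NEXT     0xffffu /* hypothetical chain terminator */

/* Slimmed-down pending entry: only the link is modelled here; the
 * coalesced request and vif pointer are omitted. */
struct sketch_pending_tx_info {
	uint16_t next_idx;
};

static struct sketch_pending_tx_info info[SKETCH_MAX_PENDING];

/* Link entries idx[0..n-1] into one chain, head first. */
static void link_chain(const uint16_t *idx, unsigned int n)
{
	for (unsigned int k = 0; k < n; k++) {
		assert(idx[k] < SKETCH_MAX_PENDING);
		info[idx[k]].next_idx = (k + 1 < n) ? idx[k + 1] : SKETCH_NO_NEXT;
	}
}

/* Walk a chain from its head, recording each entry visited (the release
 * path would send a tx response per entry). Returns the chain length,
 * capped at max. */
static unsigned int walk_chain(uint16_t head, uint16_t *out, unsigned int max)
{
	unsigned int n = 0;

	for (uint16_t i = head; i != SKETCH_NO_NEXT && n < max;
	     i = info[i].next_idx) {
		assert(i < SKETCH_MAX_PENDING);
		out[n++] = i;
	}
	return n;
}
```

The cost is that releasing a packet becomes an index chase rather than a contiguous scan of the pending ring. Since responses need not be in order, the visiting order does not matter.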

Ian.

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 4/4] xen-netback: coalesce slots before copying
  2013-03-18 10:35 ` [PATCH 4/4] xen-netback: coalesce slots before copying Wei Liu
@ 2013-03-18 12:07   ` Ian Campbell
  2013-03-18 12:07   ` Ian Campbell
  2013-03-18 13:09   ` James Harper
  2 siblings, 0 replies; 97+ messages in thread
From: Ian Campbell @ 2013-03-18 12:07 UTC (permalink / raw)
  To: Wei Liu; +Cc: netdev, annie.li, konrad.wilk, xen-devel

On Mon, 2013-03-18 at 10:35 +0000, Wei Liu wrote:
> This patch tries to coalesce tx requests when constructing grant copy
> structures. It enables netback to deal with situation when frontend's
> MAX_SKB_FRAGS is larger than backend's MAX_SKB_FRAGS.
> 
> It defines max_skb_slots, which is a estimation of the maximum number of slots
> a guest can send, anything bigger than that is considered malicious. Now it is
> set to 20, which should be enough to accommodate Linux (16 to 19) and possibly
> Windows (19?).
> 
> Also change variable name from "frags" to "slots" in netbk_count_requests.
> 
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> ---
>  drivers/net/xen-netback/netback.c |  204 ++++++++++++++++++++++++++++---------
>  1 file changed, 157 insertions(+), 47 deletions(-)
> 
> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
> index 6e8e51a..d7bbce9 100644
> --- a/drivers/net/xen-netback/netback.c
> +++ b/drivers/net/xen-netback/netback.c
> @@ -47,9 +47,20 @@
>  #include <asm/xen/hypercall.h>
>  #include <asm/xen/page.h>
>  
> +/*
> + * This is an estimation of the maximum possible frags a SKB might
> + * have, anything larger than this is considered malicious. Typically
> + * Linux has 16 to 19, Windows has 19(?).
> + */
> +#define MAX_SKB_SLOTS_DEFAULT 20
> +static unsigned int max_skb_slots = MAX_SKB_SLOTS_DEFAULT;
> +module_param(max_skb_slots, uint, 0444);
> +
>  struct pending_tx_info {
> -	struct xen_netif_tx_request req;
> +	struct xen_netif_tx_request req; /* coalesced tx request  */
>  	struct xenvif *vif;
> +	unsigned int nr_tx_req; /* how many tx req we have in a chain (>=1) */
> +	unsigned int start_idx; /* starting index of pending ring index */

This one should be a RING_IDX I think, not an unsigned int.

>  };
>  typedef unsigned int pending_ring_idx_t;
>  
> @@ -251,7 +262,7 @@ static int max_required_rx_slots(struct xenvif *vif)
>  	int max = DIV_ROUND_UP(vif->dev->mtu, PAGE_SIZE);
>  
>  	if (vif->can_sg || vif->gso || vif->gso_prefix)
> -		max += MAX_SKB_FRAGS + 1; /* extra_info + frags */
> +		max += max_skb_slots + 1; /* extra_info + frags */
>  
>  	return max;
>  }
> @@ -657,7 +668,7 @@ static void xen_netbk_rx_action(struct xen_netbk *netbk)
>  		__skb_queue_tail(&rxq, skb);
>  
>  		/* Filled the batch queue? */
> -		if (count + MAX_SKB_FRAGS >= XEN_NETIF_RX_RING_SIZE)
> +		if (count + max_skb_slots >= XEN_NETIF_RX_RING_SIZE)
>  			break;
>  	}
>  
> @@ -908,34 +919,34 @@ static int netbk_count_requests(struct xenvif *vif,
>  				int work_to_do)
>  {
>  	RING_IDX cons = vif->tx.req_cons;
> -	int frags = 0;
> +	int slots = 0;
>  
>  	if (!(first->flags & XEN_NETTXF_more_data))
>  		return 0;
>  
>  	do {
> -		if (frags >= work_to_do) {
> -			netdev_err(vif->dev, "Need more frags\n");
> +		if (slots >= work_to_do) {
> +			netdev_err(vif->dev, "Need more slots\n");
>  			netbk_fatal_tx_err(vif);
>  			return -ENODATA;
>  		}
>  
> -		if (unlikely(frags >= MAX_SKB_FRAGS)) {
> -			netdev_err(vif->dev, "Too many frags\n");
> +		if (unlikely(slots >= max_skb_slots)) {
> +			netdev_err(vif->dev, "Too many slots\n");
>  			netbk_fatal_tx_err(vif);
>  			return -E2BIG;
>  		}
>  
> -		memcpy(txp, RING_GET_REQUEST(&vif->tx, cons + frags),
> +		memcpy(txp, RING_GET_REQUEST(&vif->tx, cons + slots),
>  		       sizeof(*txp));
>  		if (txp->size > first->size) {
> -			netdev_err(vif->dev, "Frag is bigger than frame.\n");
> +			netdev_err(vif->dev, "Packet is bigger than frame.\n");
>  			netbk_fatal_tx_err(vif);
>  			return -EIO;
>  		}
>  
>  		first->size -= txp->size;
> -		frags++;
> +		slots++;
>  
>  		if (unlikely((txp->offset + txp->size) > PAGE_SIZE)) {
>  			netdev_err(vif->dev, "txp->offset: %x, size: %u\n",
> @@ -944,7 +955,7 @@ static int netbk_count_requests(struct xenvif *vif,
>  			return -EINVAL;
>  		}
>  	} while ((txp++)->flags & XEN_NETTXF_more_data);
> -	return frags;
> +	return slots;
>  }
>  
>  static struct page *xen_netbk_alloc_page(struct xen_netbk *netbk,
> @@ -968,48 +979,120 @@ static struct gnttab_copy *xen_netbk_get_requests(struct xen_netbk *netbk,
>  	struct skb_shared_info *shinfo = skb_shinfo(skb);
>  	skb_frag_t *frags = shinfo->frags;
>  	u16 pending_idx = *((u16 *)skb->data);
> -	int i, start;
> +	u16 head_idx = 0;
> +	int slot, start;
> +	struct page *page;
> +	pending_ring_idx_t index;
> +	uint16_t dst_offset;
> +	unsigned int nr_slots;
> +	struct pending_tx_info *first = NULL;
> +	int nr_txp;
> +	unsigned int start_idx = 0;
> +
> +	/* At this point shinfo->nr_frags is in fact the number of
> +	 * slots, which can be as large as max_skb_slots.
> +	 */
> +	nr_slots = shinfo->nr_frags;
>  
>  	/* Skip first skb fragment if it is on same page as header fragment. */
>  	start = (frag_get_pending_idx(&shinfo->frags[0]) == pending_idx);
>  
> -	for (i = start; i < shinfo->nr_frags; i++, txp++) {
> -		struct page *page;
> -		pending_ring_idx_t index;
> +	/* Coalesce tx requests, at this point the packet passed in
> +	 * should be <= 64K. Any packets larger than 64K has been
> +	 * dropped / caused fatal error early on.

Whereabouts is this? Since the size field is a u16, how do we even detect
this case? At least prior to your other fix in this series, it would
have overflowed when the guest constructed the request.
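
To illustrate the question above: netfront writes the full frame length into the 16-bit `size` field of the first request, so a 70000-byte frame arrives as 70000 mod 65536 = 4464. The backend never sees the real length; the only symptom is that a later slot looks bigger than what is left of the frame, which is the existing "frag bigger than frame" check. A minimal user-space sketch, using hypothetical helper names (`wire_size`, `first_oversized_slot`) rather than netback code:

```c
#include <assert.h>
#include <stdint.h>

/* What ends up in the 16-bit size field for a frame of len bytes. */
static uint16_t wire_size(uint32_t len)
{
	return (uint16_t)len; /* keeps only len modulo 65536 */
}

/*
 * Mimics the "txp->size > first->size" test in netbk_count_requests():
 * first_size is what the guest wrote into the first request and
 * slot_sizes[] are the sizes of the follow-on slots. Returns the index
 * of the first slot that trips the check, or -1 if none does.
 */
static int first_oversized_slot(uint16_t first_size,
				const uint16_t *slot_sizes, int nr_slots)
{
	int i;

	for (i = 0; i < nr_slots; i++) {
		if (slot_sizes[i] > first_size)
			return i;
		first_size -= slot_sizes[i]; /* cannot wrap: checked above */
	}
	return -1;
}
```

With three 4096-byte follow-on slots, the wrapped 4464-byte frame trips the check at the second slot (index 1), while a genuine 9000-byte frame with two such slots passes.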


> @@ -1025,6 +1108,7 @@ static int xen_netbk_tx_check_gop(struct xen_netbk *netbk,
>  	struct gnttab_copy *gop = *gopp;
>  	u16 pending_idx = *((u16 *)skb->data);
>  	struct skb_shared_info *shinfo = skb_shinfo(skb);
> +	struct pending_tx_info *tx_info;
>  	int nr_frags = shinfo->nr_frags;
>  	int i, err, start;
>  
> @@ -1037,12 +1121,17 @@ static int xen_netbk_tx_check_gop(struct xen_netbk *netbk,
>  	start = (frag_get_pending_idx(&shinfo->frags[0]) == pending_idx);
>  
>  	for (i = start; i < nr_frags; i++) {
> -		int j, newerr;
> +		int j, newerr = 0, n;
>  
>  		pending_idx = frag_get_pending_idx(&shinfo->frags[i]);
> +		tx_info = &netbk->pending_tx_info[pending_idx];
>  
>  		/* Check error status: if okay then remember grant handle. */
> -		newerr = (++gop)->status;
> +		for (n = 0; n < tx_info->nr_tx_req; n++) {
struct pending_tx_info is used in some arrays which can have a fair few
elements, so if there are ways to reduce its size, I think they are worth
considering.

So rather than storing both nr_tx_req and start_idx can we just store
start_idx and loop while start_idx != 0 (where the first one has
start_idx == zero)?

This might fall out more naturally if you were to instead store next_idx
in each pending tx with a suitable terminator at the end? Or could be
last_idx if it is convenient to count that way round, you don't need to
respond in-order.
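
One possible shape for the chained layout suggested above, assuming a 16-bit pending index and an all-ones terminator: each entry carries a single `next_idx` link instead of two `unsigned int` fields, and the status-check loop walks the chain. All names (`pending_tx_info_sketch`, `PENDING_IDX_NONE`, `chain_slots`, `chain_length`) and the ring size are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NR_PENDING       256              /* hypothetical ring size */
#define PENDING_IDX_NONE ((uint16_t)~0)   /* hypothetical chain terminator */

struct pending_tx_info_sketch {
	/* ... request and vif fields elided ... */
	uint16_t next_idx; /* next slot coalesced into the same request */
};

static struct pending_tx_info_sketch pending[NR_PENDING];

/* Mark every entry as the end of its own (empty) chain. */
static void init_pending(void)
{
	int i;

	for (i = 0; i < NR_PENDING; i++)
		pending[i].next_idx = PENDING_IDX_NONE;
}

/* Link idx[0..n-1] into one chain headed by idx[0]. */
static void chain_slots(const uint16_t *idx, int n)
{
	int i;

	for (i = 0; i < n; i++)
		pending[idx[i]].next_idx =
			(i + 1 < n) ? idx[i + 1] : PENDING_IDX_NONE;
}

/* Walk a chain from head, counting entries (one grant-copy status each). */
static int chain_length(uint16_t head)
{
	int n = 0;
	uint16_t i;

	for (i = head; i != PENDING_IDX_NONE; i = pending[i].next_idx) {
		assert(i < NR_PENDING);
		n++;
	}
	return n;
}
```

For a chain 5 -> 9 -> 200, walking from 5 visits three entries, which is the number of statuses the loop would have to check for that request.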

Ian.

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 1/4] xen-netfront: remove unused variable `extra'
  2013-03-18 12:04     ` Wei Liu
  2013-03-18 12:14       ` Ian Campbell
@ 2013-03-18 12:14       ` Ian Campbell
  2013-03-19  2:39         ` annie li
  2013-03-19  2:39         ` annie li
  1 sibling, 2 replies; 97+ messages in thread
From: Ian Campbell @ 2013-03-18 12:14 UTC (permalink / raw)
  To: Wei Liu; +Cc: netdev, xen-devel, konrad.wilk, annie.li

On Mon, 2013-03-18 at 12:04 +0000, Wei Liu wrote:
> On Mon, 2013-03-18 at 11:42 +0000, Ian Campbell wrote:
> > On Mon, 2013-03-18 at 10:35 +0000, Wei Liu wrote:
> > 
> > I think a few more words are needed here since from the code you are
> > removing it seems very much like gso is used for something. If you have
> > a proof that the "extra = gso" case is never hit then please explain it.
> > Perhaps a reference to the removal of the last user?
> > 
> > Or maybe it is the case that it should be used and the bug is that it
> > isn't?
> > 
> 
> Looks like the latter one. The 'extra' field should be used to get hold of
> the last extra info in the ring. ;-)
> 
> But the only extra info in the upstream kernel is XEN_NETIF_EXTRA_TYPE_GSO,
> so there's really no other extra info in the ring at that point. Could
> it be something from the classic Xen kernel?

The classic kernel netfront has exactly the same code it seems and
netif_extra_type_gso is the only one I've ever heard of.

Maybe this extra thing is just redundant unless/until a second extra
comes along.
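
For reference, the pattern that an `extra` pointer like this normally supports, sketched on the assumption that netfront follows the usual scheme: the first extra-info slot is announced by a flag on the tx request itself, and each further one by setting the MORE flag on the previous extra. With GSO as the only extra type, there is never a previous extra, which is why the variable looks redundant. The struct and flag names below are hypothetical mirrors of `XEN_NETTXF_extra_info` and `XEN_NETIF_EXTRA_FLAG_MORE`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_TXF_EXTRA_INFO  (1u << 3) /* mirrors XEN_NETTXF_extra_info */
#define SKETCH_EXTRA_FLAG_MORE (1u << 0) /* mirrors XEN_NETIF_EXTRA_FLAG_MORE */

struct sketch_tx_req { uint16_t flags; };
struct sketch_extra  { uint8_t type; uint8_t flags; };

/*
 * Append one extra-info slot after a tx request. 'last' is the
 * previously appended extra (NULL if none). Returns the new last extra,
 * which the caller keeps for the next append.
 */
static struct sketch_extra *append_extra(struct sketch_tx_req *tx,
					 struct sketch_extra *last,
					 struct sketch_extra *next)
{
	if (last)
		last->flags |= SKETCH_EXTRA_FLAG_MORE; /* "another extra follows" */
	else
		tx->flags |= SKETCH_TXF_EXTRA_INFO;    /* "extras follow this request" */
	next->flags = 0;
	return next;
}
```

Only a second call ever touches `last`, so with a single extra type that branch is dead code, matching the observation above.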

Ian.

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 4/4] xen-netback: coalesce slots before copying
  2013-03-18 10:35 ` [PATCH 4/4] xen-netback: coalesce slots before copying Wei Liu
  2013-03-18 12:07   ` Ian Campbell
  2013-03-18 12:07   ` Ian Campbell
@ 2013-03-18 13:09   ` James Harper
  2013-03-18 13:27     ` James Harper
  2 siblings, 1 reply; 97+ messages in thread
From: James Harper @ 2013-03-18 13:09 UTC (permalink / raw)
  To: Wei Liu, netdev, xen-devel; +Cc: annie.li, ian.campbell, konrad.wilk

> 
> This patch tries to coalesce tx requests when constructing grant copy
> structures. It enables netback to deal with situation when frontend's
> MAX_SKB_FRAGS is larger than backend's MAX_SKB_FRAGS.
> 
> It defines max_skb_slots, which is an estimate of the maximum number of slots
> a guest can send; anything bigger than that is considered malicious. Now it is
> set to 20, which should be enough to accommodate Linux (16 to 19) and possibly
> Windows (19?).
> 
> +/*
> + * This is an estimation of the maximum possible frags a SKB might
> + * have, anything larger than this is considered malicious. Typically
> + * Linux has 16 to 19, Windows has 19(?).
> + */

Could you remove the "Windows has 19(?)" comment? I don't think it's helpful, even with the "(?)". I just checked, and Windows 2008 R2 gives GPLPV a maximum of 20 buffers in all the testing I've done, and that's after the header is coalesced, so the real number is probably higher. I'm pretty sure I tested Windows 2003 quite a while back, and I could coax it into giving ridiculous numbers of buffers when using iperf with tiny buffers.

Maybe "Windows has >19" if you need to put a number on it?

James

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 4/4] xen-netback: coalesce slots before copying
  2013-03-18 13:09   ` James Harper
@ 2013-03-18 13:27     ` James Harper
  2013-03-21 19:08       ` Wei Liu
  0 siblings, 1 reply; 97+ messages in thread
From: James Harper @ 2013-03-18 13:27 UTC (permalink / raw)
  To: James Harper, Wei Liu, netdev, xen-devel
  Cc: annie.li, ian.campbell, konrad.wilk

> >
> > This patch tries to coalesce tx requests when constructing grant copy
> > structures. It enables netback to deal with situation when frontend's
> > MAX_SKB_FRAGS is larger than backend's MAX_SKB_FRAGS.
> >
> > It defines max_skb_slots, which is an estimate of the maximum number of slots
> > a guest can send; anything bigger than that is considered malicious. Now it is
> > set to 20, which should be enough to accommodate Linux (16 to 19) and possibly
> > Windows (19?).
> >
> > +/*
> > + * This is an estimation of the maximum possible frags a SKB might
> > + * have, anything larger than this is considered malicious. Typically
> > + * Linux has 16 to 19, Windows has 19(?).
> > + */
> 
> Could you remove the "Windows has 19(?)" comment? I don't think it's
> helpful, even with the "(?)". I just checked, and Windows 2008 R2 gives
> GPLPV a maximum of 20 buffers in all the testing I've done, and that's after
> the header is coalesced, so the real number is probably higher. I'm pretty sure I
> tested Windows 2003 quite a while back, and I could coax it into giving
> ridiculous numbers of buffers when using iperf with tiny buffers.
> 
> Maybe "Windows has >19" if you need to put a number on it?
> 

Actually it turns out GPLPV just stops counting at 20. If I keep counting, I can sometimes see over 1000 buffers per GSO packet under Windows using "iperf -l50", so Windows will quite happily send thousands of buffers, and I don't have any evidence that it wouldn't cope with a similar number on receive. FWIW.

(of course coalescing vs using 1000 ring slots is an obvious choice...)
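
A rough illustration of the arithmetic behind that choice, assuming 4 KiB pages and, purely for illustration, 1000 slots of 50 bytes each: packing the slot payloads back to back into destination pages needs 13 pages and 1012 copy operations (each page boundary that falls inside a slot splits that slot into two copies), rather than 1000 separate buffers. This is a simplified greedy packer with hypothetical names, not the coalescing code in the patch; it ignores source page offsets and grant details.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u /* assumption: 4 KiB pages */

/*
 * Greedily pack slot payloads back to back into destination pages and
 * count the pages and copy operations needed. A slot that straddles a
 * page boundary is split into two copies.
 */
static void coalesce(const unsigned *sizes, size_t n,
		     unsigned *nr_pages, unsigned *nr_copies)
{
	unsigned offset = 0; /* fill level of the current destination page */
	unsigned pages = 0, copies = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		unsigned left = sizes[i];

		while (left > 0) {
			unsigned room, chunk;

			if (pages == 0 || offset == SKETCH_PAGE_SIZE) {
				pages++;    /* start a new destination page */
				offset = 0;
			}
			room = SKETCH_PAGE_SIZE - offset;
			chunk = left < room ? left : room;
			offset += chunk;
			left -= chunk;
			copies++;           /* one copy op per contiguous chunk */
		}
	}
	*nr_pages = pages;
	*nr_copies = copies;
}
```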

James

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 2/4] xen-netfront: drop skb when skb->len > 65535
  2013-03-18 10:35 ` [PATCH 2/4] xen-netfront: drop skb when skb->len > 65535 Wei Liu
  2013-03-18 11:42   ` Ian Campbell
  2013-03-18 11:42   ` Ian Campbell
@ 2013-03-18 13:44   ` Konrad Rzeszutek Wilk
  2013-03-18 13:44   ` Konrad Rzeszutek Wilk
                     ` (5 subsequent siblings)
  8 siblings, 0 replies; 97+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-03-18 13:44 UTC (permalink / raw)
  To: Wei Liu; +Cc: netdev, xen-devel, ian.campbell, annie.li

On Mon, Mar 18, 2013 at 10:35:53AM +0000, Wei Liu wrote:
> The `size' field of Xen network wire format is uint16_t, anything bigger than
> 65535 will cause overflow.

Should this also copy stable@vger.kernel.org?
> 
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> ---
>  drivers/net/xen-netfront.c |   12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> index 5527663..8c3d065 100644
> --- a/drivers/net/xen-netfront.c
> +++ b/drivers/net/xen-netfront.c
> @@ -547,6 +547,18 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
>  	unsigned int len = skb_headlen(skb);
>  	unsigned long flags;
>  
> +	/*
> +	 * wire format of xen_netif_tx_request only supports skb->len
> +	 * < 64K, because size field in xen_netif_tx_request is
> +	 * uint16_t.
> +	 */
> +	if (unlikely(skb->len > (uint16_t)(~0))) {
> +		net_alert_ratelimited(
> +			"xennet: skb->len = %d, too big for wire format\n",
> +			skb->len);
> +		goto drop;
> +	}
> +
>  	slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
>  		xennet_count_skb_frag_slots(skb);
>  	if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
> -- 
> 1.7.10.4
> 

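The guard added by this patch reduces to a comparison against the largest value the 16-bit field can hold; `(uint16_t)(~0)` and `UINT16_MAX` are both 65535. A minimal sketch with a hypothetical helper name:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if a frame of skb_len bytes can be described by the u16 size field. */
static bool len_fits_wire_format(unsigned int skb_len)
{
	/* (uint16_t)(~0) in the patch is the same value as UINT16_MAX */
	return skb_len <= UINT16_MAX;
}
```

A 65535-byte frame is accepted; anything from 65536 bytes up would be dropped rather than silently wrapped.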
^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Xen-devel] [PATCH 2/4] xen-netfront: drop skb when skb->len > 65535
  2013-03-18 10:35 ` [PATCH 2/4] xen-netfront: drop skb when skb->len > 65535 Wei Liu
                     ` (4 preceding siblings ...)
  2013-03-18 13:46   ` David Vrabel
@ 2013-03-18 13:46   ` David Vrabel
  2013-03-18 13:48     ` Ian Campbell
  2013-03-18 13:48     ` [Xen-devel] " Ian Campbell
  2013-03-19  1:35   ` [Xen-devel] " annie li
                     ` (2 subsequent siblings)
  8 siblings, 2 replies; 97+ messages in thread
From: David Vrabel @ 2013-03-18 13:46 UTC (permalink / raw)
  To: Wei Liu; +Cc: netdev, xen-devel, annie.li, ian.campbell, konrad.wilk

On 18/03/13 10:35, Wei Liu wrote:
> The `size' field of Xen network wire format is uint16_t, anything bigger than
> 65535 will cause overflow.

The backend needs to be able to handle these bad packets without
disconnecting the VIF -- we can't fix all the frontend drivers.

David

> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> ---
>  drivers/net/xen-netfront.c |   12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> index 5527663..8c3d065 100644
> --- a/drivers/net/xen-netfront.c
> +++ b/drivers/net/xen-netfront.c
> @@ -547,6 +547,18 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
>  	unsigned int len = skb_headlen(skb);
>  	unsigned long flags;
>  
> +	/*
> +	 * wire format of xen_netif_tx_request only supports skb->len
> +	 * < 64K, because size field in xen_netif_tx_request is
> +	 * uint16_t.
> +	 */
> +	if (unlikely(skb->len > (uint16_t)(~0))) {
> +		net_alert_ratelimited(
> +			"xennet: skb->len = %d, too big for wire format\n",
> +			skb->len);
> +		goto drop;
> +	}
> +
>  	slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
>  		xennet_count_skb_frag_slots(skb);
>  	if (unlikely(slots > MAX_SKB_FRAGS + 1)) {

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Xen-devel] [PATCH 2/4] xen-netfront: drop skb when skb->len > 65535
  2013-03-18 13:46   ` [Xen-devel] " David Vrabel
  2013-03-18 13:48     ` Ian Campbell
@ 2013-03-18 13:48     ` Ian Campbell
  2013-03-18 14:00       ` David Vrabel
  2013-03-18 14:00       ` David Vrabel
  1 sibling, 2 replies; 97+ messages in thread
From: Ian Campbell @ 2013-03-18 13:48 UTC (permalink / raw)
  To: David Vrabel; +Cc: Wei Liu, netdev, xen-devel, annie.li, konrad.wilk

On Mon, 2013-03-18 at 13:46 +0000, David Vrabel wrote:
> On 18/03/13 10:35, Wei Liu wrote:
> > The `size' field of Xen network wire format is uint16_t, anything bigger than
> > 65535 will cause overflow.
> 
> The backend needs to be able to handle these bad packets without
> disconnecting the VIF -- we can't fix all the frontend drivers.

Agreed, although that doesn't imply that we shouldn't fix the frontend
where we can -- such as upstream as Wei does here.

Ian.

> 
> David
> 
> > Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> > ---
> >  drivers/net/xen-netfront.c |   12 ++++++++++++
> >  1 file changed, 12 insertions(+)
> > 
> > diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> > index 5527663..8c3d065 100644
> > --- a/drivers/net/xen-netfront.c
> > +++ b/drivers/net/xen-netfront.c
> > @@ -547,6 +547,18 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
> >  	unsigned int len = skb_headlen(skb);
> >  	unsigned long flags;
> >  
> > +	/*
> > +	 * wire format of xen_netif_tx_request only supports skb->len
> > +	 * < 64K, because size field in xen_netif_tx_request is
> > +	 * uint16_t.
> > +	 */
> > +	if (unlikely(skb->len > (uint16_t)(~0))) {
> > +		net_alert_ratelimited(
> > +			"xennet: skb->len = %d, too big for wire format\n",
> > +			skb->len);
> > +		goto drop;
> > +	}
> > +
> >  	slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
> >  		xennet_count_skb_frag_slots(skb);
> >  	if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
> 

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Xen-devel] [PATCH 2/4] xen-netfront: drop skb when skb->len > 65535
  2013-03-18 13:48     ` [Xen-devel] " Ian Campbell
@ 2013-03-18 14:00       ` David Vrabel
  2013-03-18 14:19         ` Wei Liu
                           ` (3 more replies)
  2013-03-18 14:00       ` David Vrabel
  1 sibling, 4 replies; 97+ messages in thread
From: David Vrabel @ 2013-03-18 14:00 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Wei Liu, netdev, xen-devel, annie.li, konrad.wilk

On 18/03/13 13:48, Ian Campbell wrote:
> On Mon, 2013-03-18 at 13:46 +0000, David Vrabel wrote:
>> On 18/03/13 10:35, Wei Liu wrote:
>>> The `size' field of Xen network wire format is uint16_t, anything bigger than
>>> 65535 will cause overflow.
>>
>> The backend needs to be able to handle these bad packets without
>> disconnecting the VIF -- we can't fix all the frontend drivers.
> 
> Agreed, although that doesn't imply that we shouldn't fix the frontend
> where we can -- such as upstream as Wei does here.

Yes, frontends should be fixed where possible.

This is what I came up with for the backend.  I don't have time to look
into it further but, Wei, feel free to use it as a starting point.

David

diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index cd49ba9..18e2671 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -899,10 +899,11 @@ static void netbk_fatal_tx_err(struct xenvif *vif)
 static int netbk_count_requests(struct xenvif *vif,
 				struct xen_netif_tx_request *first,
 				struct xen_netif_tx_request *txp,
-				int work_to_do)
+				int work_to_do, int idx)
 {
 	RING_IDX cons = vif->tx.req_cons;
 	int frags = 0;
+	bool drop = false;

 	if (!(first->flags & XEN_NETTXF_more_data))
 		return 0;
@@ -922,10 +923,20 @@ static int netbk_count_requests(struct xenvif *vif,

 		memcpy(txp, RING_GET_REQUEST(&vif->tx, cons + frags),
 		       sizeof(*txp));
-		if (txp->size > first->size) {
-			netdev_err(vif->dev, "Frag is bigger than frame.\n");
-			netbk_fatal_tx_err(vif);
-			return -EIO;
+
+		/*
+		 * If the guest submitted a frame >= 64 KiB then
+		 * first->size overflowed and following frags will
+		 * appear to be larger than the frame.
+		 *
+		 * This cannot be a fatal error as there are buggy
+		 * frontends that do this.
+		 *
+		 * Consume all the frags and drop the packet.
+		 */
+		if (!drop && txp->size > first->size) {
+			netdev_dbg(vif->dev, "Frag is bigger than frame.\n");
+			drop = true;
 		}

 		first->size -= txp->size;
@@ -938,6 +949,12 @@ static int netbk_count_requests(struct xenvif *vif,
 			return -EINVAL;
 		}
 	} while ((txp++)->flags & XEN_NETTXF_more_data);
+
+	if (drop) {
+		netbk_tx_err(vif, txp, idx + frags);
+		return -EIO;
+	}
+
 	return frags;
 }

@@ -1327,7 +1344,7 @@ static unsigned xen_netbk_tx_build_gops(struct xen_netbk *netbk)
 				continue;
 		}

-		ret = netbk_count_requests(vif, &txreq, txfrags, work_to_do);
+		ret = netbk_count_requests(vif, &txreq, txfrags, work_to_do, idx);
 		if (unlikely(ret < 0))
 			continue;

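A user-space sketch of the control flow this patch proposes, with hypothetical names and an assumed slot limit: a slot larger than the remaining frame only marks the packet for dropping, and the loop keeps going so every slot of the packet is consumed, while the slot-count and work_to_do checks stay fatal and bound the walk. It assumes the caller has already seen `more_data` on the first request, and it leaves out the error responses that `netbk_tx_err()` sends in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MORE_DATA (1u << 2) /* mirrors XEN_NETTXF_more_data */
#define SKETCH_MAX_SLOTS 18        /* hypothetical per-packet slot limit */

struct sketch_req {
	uint16_t size;
	uint16_t flags;
};

enum sketch_verdict { SKETCH_OK, SKETCH_DROP, SKETCH_FATAL };

/* On SKETCH_OK or SKETCH_DROP, *consumed is the number of follow-on slots. */
static enum sketch_verdict count_requests(uint16_t first_size,
					  const struct sketch_req *reqs,
					  int work_to_do, int *consumed)
{
	int slots = 0;
	bool drop = false;

	do {
		/* Still fatal: these bound how far a guest can make us walk. */
		if (slots >= work_to_do || slots >= SKETCH_MAX_SLOTS)
			return SKETCH_FATAL;

		/* Not fatal: remember the bad frame but keep consuming slots. */
		if (!drop && reqs[slots].size > first_size)
			drop = true;

		/* May wrap once drop is set; the value is not compared again. */
		first_size -= reqs[slots].size;
		slots++;
	} while (reqs[slots - 1].flags & SKETCH_MORE_DATA);

	*consumed = slots;
	return drop ? SKETCH_DROP : SKETCH_OK;
}
```

With a wrapped first size of 4464 and three 4096-byte slots, all three slots are consumed and the packet is dropped without disabling the interface; only exceeding the slot or work limits remains fatal.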
^ permalink raw reply related	[flat|nested] 97+ messages in thread

* Re: [Xen-devel] [PATCH 2/4] xen-netfront: drop skb when skb->len > 65535
  2013-03-18 14:00       ` David Vrabel
@ 2013-03-18 14:19         ` Wei Liu
  2013-03-19 13:40           ` David Vrabel
  2013-03-19 13:40           ` [Xen-devel] " David Vrabel
  2013-03-18 14:19         ` Wei Liu
                           ` (2 subsequent siblings)
  3 siblings, 2 replies; 97+ messages in thread
From: Wei Liu @ 2013-03-18 14:19 UTC (permalink / raw)
  To: David Vrabel
  Cc: wei.liu2, Ian Campbell, netdev, xen-devel, annie.li, konrad.wilk

On Mon, 2013-03-18 at 14:00 +0000, David Vrabel wrote:
> On 18/03/13 13:48, Ian Campbell wrote:
> > On Mon, 2013-03-18 at 13:46 +0000, David Vrabel wrote:
> >> On 18/03/13 10:35, Wei Liu wrote:
> >>> The `size' field of Xen network wire format is uint16_t, anything bigger than
> >>> 65535 will cause overflow.
> >>
> >> The backend needs to be able to handle these bad packets without
> >> disconnecting the VIF -- we can't fix all the frontend drivers.
> > 
> > Agreed, although that doesn't imply that we shouldn't fix the frontend
> > where we can -- such as upstream as Wei does here.
> 
> Yes, frontends should be fixed where possible.
> 
> This is what I came up with for the backend.  I don't have time to look
> into it further but, Wei, feel free to use it as a starting point.
> 

Thanks for this patch.

I haven't gone through the XSA-39 discussion, which is why I didn't come
up with a fix for the backend: I need to make sure that dropping packets
like this won't reintroduce the security hole.


Wei.

> David
> 
> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
> index cd49ba9..18e2671 100644
> --- a/drivers/net/xen-netback/netback.c
> +++ b/drivers/net/xen-netback/netback.c
> @@ -899,10 +899,11 @@ static void netbk_fatal_tx_err(struct xenvif *vif)
>  static int netbk_count_requests(struct xenvif *vif,
>  				struct xen_netif_tx_request *first,
>  				struct xen_netif_tx_request *txp,
> -				int work_to_do)
> +				int work_to_do, int idx)
>  {
>  	RING_IDX cons = vif->tx.req_cons;
>  	int frags = 0;
> +	bool drop = false;
> 
>  	if (!(first->flags & XEN_NETTXF_more_data))
>  		return 0;
> @@ -922,10 +923,20 @@ static int netbk_count_requests(struct xenvif *vif,
> 
>  		memcpy(txp, RING_GET_REQUEST(&vif->tx, cons + frags),
>  		       sizeof(*txp));
> -		if (txp->size > first->size) {
> -			netdev_err(vif->dev, "Frag is bigger than frame.\n");
> -			netbk_fatal_tx_err(vif);
> -			return -EIO;
> +
> +		/*
> +		 * If the guest submitted a frame >= 64 KiB then
> +		 * first->size overflowed and following frags will
> +		 * appear to be larger than the frame.
> +		 *
> +		 * This cannot be a fatal error as there are buggy
> +		 * frontends that do this.
> +		 *
> +		 * Consume all the frags and drop the packet.
> +		 */
> +		if (!drop && txp->size > first->size) {
> +			netdev_dbg(vif->dev, "Frag is bigger than frame.\n");
> +			drop = true;
>  		}
> 
>  		first->size -= txp->size;
> @@ -938,6 +949,12 @@ static int netbk_count_requests(struct xenvif *vif,
>  			return -EINVAL;
>  		}
>  	} while ((txp++)->flags & XEN_NETTXF_more_data);
> +
> +	if (drop) {
> +		netbk_tx_err(vif, txp, idx + frags);
> +		return -EIO;
> +	}
> +
>  	return frags;
>  }
> 
> @@ -1327,7 +1344,7 @@ static unsigned xen_netbk_tx_build_gops(struct xen_netbk *netbk)
>  				continue;
>  		}
> 
> -		ret = netbk_count_requests(vif, &txreq, txfrags, work_to_do);
> +		ret = netbk_count_requests(vif, &txreq, txfrags, work_to_do, idx);
>  		if (unlikely(ret < 0))
>  			continue;
> 

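The idea in the backend patch (keep walking the fragment chain after spotting an oversized fragment, then drop the whole packet instead of calling netbk_fatal_tx_err()) can be modelled outside the kernel. The sketch below is a userspace simplification, not netback code: struct sketch_tx_req, SKETCH_MORE_DATA and sketch_count_requests() are hypothetical stand-ins for xen_netif_tx_request, XEN_NETTXF_more_data and netbk_count_requests(). It uses a plain slot count in place of work_to_do and ignores grant handling. It returns the fragment count, -1 when the packet should be dropped, or -2 when the chain runs past the available slots, and it always reports how many slots it walked so a caller could skip them.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for XEN_NETTXF_more_data: another slot follows this one. */
#define SKETCH_MORE_DATA 0x1u

/* Hypothetical, cut-down xen_netif_tx_request: only size and flags. */
struct sketch_tx_req {
	uint16_t size;  /* slot 0: whole frame length; later slots: one fragment */
	uint16_t flags;
};

/*
 * Walk the fragment slots after reqs[0]. An oversized fragment marks the
 * packet for dropping, but the walk continues to the end of the chain so
 * that every slot is consumed.
 *
 * Returns the number of fragments, -1 if the packet must be dropped, or
 * -2 if the chain runs past the nreqs available slots. *consumed is always
 * set to the number of fragment slots walked.
 */
int sketch_count_requests(const struct sketch_tx_req *reqs, size_t nreqs,
			  size_t *consumed)
{
	uint16_t remaining = reqs[0].size; /* bytes not yet claimed by frags */
	bool more = reqs[0].flags & SKETCH_MORE_DATA;
	bool drop = false;
	size_t frags = 0;

	while (more) {
		if (1 + frags >= nreqs) {
			*consumed = frags;
			return -2; /* truncated chain */
		}

		const struct sketch_tx_req *txp = &reqs[1 + frags];

		/* Same comparison as the patch, but only remember the failure. */
		if (!drop && txp->size > remaining)
			drop = true;
		if (!drop)
			remaining -= txp->size;

		more = txp->flags & SKETCH_MORE_DATA;
		frags++;
	}

	*consumed = frags;
	return drop ? -1 : (int)frags;
}
```

With a 65538-byte frame the first slot's size field reads 2, so the first 4096-byte fragment already trips the check, yet both fragment slots are still consumed before the drop is reported.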

* Re: [PATCH 2/4] xen-netfront: drop skb when skb->len > 65535
  2013-03-18 11:42   ` Ian Campbell
  2013-03-18 14:40     ` Wei Liu
@ 2013-03-18 14:40     ` Wei Liu
  2013-03-18 14:54       ` Ian Campbell
  2013-03-18 14:54       ` Ian Campbell
  1 sibling, 2 replies; 97+ messages in thread
From: Wei Liu @ 2013-03-18 14:40 UTC (permalink / raw)
  To: Ian Campbell; +Cc: wei.liu2, netdev, xen-devel, konrad.wilk, annie.li

On Mon, 2013-03-18 at 11:42 +0000, Ian Campbell wrote:
> On Mon, 2013-03-18 at 10:35 +0000, Wei Liu wrote:
> > The `size' field of Xen network wire format is uint16_t, anything bigger than
> > 65535 will cause overflow.
> > 
> > Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> > ---
> >  drivers/net/xen-netfront.c |   12 ++++++++++++
> >  1 file changed, 12 insertions(+)
> > 
> > diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> > index 5527663..8c3d065 100644
> > --- a/drivers/net/xen-netfront.c
> > +++ b/drivers/net/xen-netfront.c
> > @@ -547,6 +547,18 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
> >  	unsigned int len = skb_headlen(skb);
> >  	unsigned long flags;
> >  
> > +	/*
> > +	 * wire format of xen_netif_tx_request only supports skb->len
> > +	 * < 64K, because size field in xen_netif_tx_request is
> > +	 * uint16_t.
> 
> Is there some field we can set e.g. in struct ethernet_device which
> would stop this from happening?
> 

struct ethernet_device? I could not find it.

And for struct net_device, there is no field for this AFAICT.


Wei.

> 
> > +	 */
> > +	if (unlikely(skb->len > (uint16_t)(~0))) {
> > +		net_alert_ratelimited(
> > +			"xennet: skb->len = %d, too big for wire format\n",
> > +			skb->len);
> > +		goto drop;
> > +	}
> > +
> >  	slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
> >  		xennet_count_skb_frag_slots(skb);
> >  	if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
> 
> 

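The guard in the frontend patch compares skb->len against (uint16_t)(~0), which is 65535: ~0 is the int -1, and converting it to uint16_t gives 0xFFFF. A minimal sketch of the same bound written with UINT16_MAX from <stdint.h>; sketch_len_fits_tx_size() is a hypothetical helper name, since the patch open-codes the test inside xennet_start_xmit().

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if a frame of len bytes can be described by
 * the 16-bit size field of a tx request. Equivalent to the patch's
 * !(len > (uint16_t)(~0)), because (uint16_t)(~0) == 65535 == UINT16_MAX.
 */
static inline bool sketch_len_fits_tx_size(unsigned int len)
{
	return len <= UINT16_MAX;
}
```

Any frame for which the helper returns false would take the patch's `goto drop` path.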

* Re: [PATCH 2/4] xen-netfront: drop skb when skb->len > 65535
  2013-03-18 14:40     ` Wei Liu
  2013-03-18 14:54       ` Ian Campbell
@ 2013-03-18 14:54       ` Ian Campbell
  2013-03-18 15:04         ` Wei Liu
  2013-03-18 15:04         ` Wei Liu
  1 sibling, 2 replies; 97+ messages in thread
From: Ian Campbell @ 2013-03-18 14:54 UTC (permalink / raw)
  To: Wei Liu; +Cc: netdev, xen-devel, konrad.wilk, annie.li

On Mon, 2013-03-18 at 14:40 +0000, Wei Liu wrote:
> On Mon, 2013-03-18 at 11:42 +0000, Ian Campbell wrote:
> > On Mon, 2013-03-18 at 10:35 +0000, Wei Liu wrote:
> > > The `size' field of Xen network wire format is uint16_t, anything bigger than
> > > 65535 will cause overflow.
> > > 
> > > Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> > > ---
> > >  drivers/net/xen-netfront.c |   12 ++++++++++++
> > >  1 file changed, 12 insertions(+)
> > > 
> > > diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> > > index 5527663..8c3d065 100644
> > > --- a/drivers/net/xen-netfront.c
> > > +++ b/drivers/net/xen-netfront.c
> > > @@ -547,6 +547,18 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
> > >  	unsigned int len = skb_headlen(skb);
> > >  	unsigned long flags;
> > >  
> > > +	/*
> > > +	 * wire format of xen_netif_tx_request only supports skb->len
> > > +	 * < 64K, because size field in xen_netif_tx_request is
> > > +	 * uint16_t.
> > 
> > Is there some field we can set e.g. in struct ethernet_device which
> > would stop this from happening?
> > 
> 
> struct ethernet_device? I could not find it.
> 
> And for struct net_device,

I meant struct net_device.

>  there is no field for this AFAICT.

Interesting. Are hardware devices expected to cope with arbitrarily sized
GSO skbs, then, I wonder?

Ian.

* Re: [PATCH 2/4] xen-netfront: drop skb when skb->len > 65535
  2013-03-18 14:54       ` Ian Campbell
@ 2013-03-18 15:04         ` Wei Liu
  2013-03-18 15:07           ` Ian Campbell
  2013-03-18 15:07           ` Ian Campbell
  2013-03-18 15:04         ` Wei Liu
  1 sibling, 2 replies; 97+ messages in thread
From: Wei Liu @ 2013-03-18 15:04 UTC (permalink / raw)
  To: Ian Campbell; +Cc: wei.liu2, netdev, xen-devel, konrad.wilk, annie.li

On Mon, 2013-03-18 at 14:54 +0000, Ian Campbell wrote:
> On Mon, 2013-03-18 at 14:40 +0000, Wei Liu wrote:
> > On Mon, 2013-03-18 at 11:42 +0000, Ian Campbell wrote:
> > > On Mon, 2013-03-18 at 10:35 +0000, Wei Liu wrote:
> > > > The `size' field of Xen network wire format is uint16_t, anything bigger than
> > > > 65535 will cause overflow.
> > > > 
> > > > Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> > > > ---
> > > >  drivers/net/xen-netfront.c |   12 ++++++++++++
> > > >  1 file changed, 12 insertions(+)
> > > > 
> > > > diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> > > > index 5527663..8c3d065 100644
> > > > --- a/drivers/net/xen-netfront.c
> > > > +++ b/drivers/net/xen-netfront.c
> > > > @@ -547,6 +547,18 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
> > > >  	unsigned int len = skb_headlen(skb);
> > > >  	unsigned long flags;
> > > >  
> > > > +	/*
> > > > +	 * wire format of xen_netif_tx_request only supports skb->len
> > > > +	 * < 64K, because size field in xen_netif_tx_request is
> > > > +	 * uint16_t.
> > > 
> > > Is there some field we can set e.g. in struct ethernet_device which
> > > would stop this from happening?
> > > 
> > 
> > struct ethernet_device? I could not find it.
> > 
> > And for struct net_device,
> 
> I meant struct net_device.
> 
> >  there is no field for this AFAICT.
> 
> Interesting. Are hardware devices expected to cope with arbitrary sized
> GSO skbs then I wonder.
> 

No idea. But there is a macro called GSO_MAX_SIZE (65536) in struct
net_device. :-)


Wei.

> Ian.
> 

* Re: [PATCH 2/4] xen-netfront: drop skb when skb->len > 65535
  2013-03-18 15:04         ` Wei Liu
@ 2013-03-18 15:07           ` Ian Campbell
  2013-03-18 15:10             ` Wei Liu
                               ` (3 more replies)
  2013-03-18 15:07           ` Ian Campbell
  1 sibling, 4 replies; 97+ messages in thread
From: Ian Campbell @ 2013-03-18 15:07 UTC (permalink / raw)
  To: Wei Liu; +Cc: netdev, xen-devel, konrad.wilk, annie.li

On Mon, 2013-03-18 at 15:04 +0000, Wei Liu wrote:
> On Mon, 2013-03-18 at 14:54 +0000, Ian Campbell wrote:
> > On Mon, 2013-03-18 at 14:40 +0000, Wei Liu wrote:
> > > On Mon, 2013-03-18 at 11:42 +0000, Ian Campbell wrote:
> > > > On Mon, 2013-03-18 at 10:35 +0000, Wei Liu wrote:
> > > > > The `size' field of Xen network wire format is uint16_t, anything bigger than
> > > > > 65535 will cause overflow.
> > > > > 
> > > > > Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> > > > > ---
> > > > >  drivers/net/xen-netfront.c |   12 ++++++++++++
> > > > >  1 file changed, 12 insertions(+)
> > > > > 
> > > > > diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> > > > > index 5527663..8c3d065 100644
> > > > > --- a/drivers/net/xen-netfront.c
> > > > > +++ b/drivers/net/xen-netfront.c
> > > > > @@ -547,6 +547,18 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
> > > > >  	unsigned int len = skb_headlen(skb);
> > > > >  	unsigned long flags;
> > > > >  
> > > > > +	/*
> > > > > +	 * wire format of xen_netif_tx_request only supports skb->len
> > > > > +	 * < 64K, because size field in xen_netif_tx_request is
> > > > > +	 * uint16_t.
> > > > 
> > > > Is there some field we can set e.g. in struct ethernet_device which
> > > > would stop this from happening?
> > > > 
> > > 
> > > struct ethernet_device? I could not find it.
> > > 
> > > And for struct net_device,
> > 
> > I meant struct net_device.
> > 
> > >  there is no field for this AFAICT.
> > 
> > Interesting. Are hardware devices expected to cope with arbitrary sized
> > GSO skbs then I wonder.
> > 
> 
> No idea. But there is a macro called GSO_MAX_SIZE (65536) in struct
> net_device. :-)

But aren't we seeing skbs bigger than that?

Maybe this is just a historical bug in some older guests?

Ian.

* Re: [PATCH 2/4] xen-netfront: drop skb when skb->len > 65535
  2013-03-18 15:07           ` Ian Campbell
@ 2013-03-18 15:10             ` Wei Liu
  2013-03-18 15:10             ` Wei Liu
                               ` (2 subsequent siblings)
  3 siblings, 0 replies; 97+ messages in thread
From: Wei Liu @ 2013-03-18 15:10 UTC (permalink / raw)
  To: Ian Campbell; +Cc: wei.liu2, netdev, xen-devel, konrad.wilk, annie.li

On Mon, 2013-03-18 at 15:07 +0000, Ian Campbell wrote:
> On Mon, 2013-03-18 at 15:04 +0000, Wei Liu wrote:
> > On Mon, 2013-03-18 at 14:54 +0000, Ian Campbell wrote:
> > > On Mon, 2013-03-18 at 14:40 +0000, Wei Liu wrote:
> > > > On Mon, 2013-03-18 at 11:42 +0000, Ian Campbell wrote:
> > > > > On Mon, 2013-03-18 at 10:35 +0000, Wei Liu wrote:
> > > > > > The `size' field of Xen network wire format is uint16_t, anything bigger than
> > > > > > 65535 will cause overflow.
> > > > > > 
> > > > > > Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> > > > > > ---
> > > > > >  drivers/net/xen-netfront.c |   12 ++++++++++++
> > > > > >  1 file changed, 12 insertions(+)
> > > > > > 
> > > > > > diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> > > > > > index 5527663..8c3d065 100644
> > > > > > --- a/drivers/net/xen-netfront.c
> > > > > > +++ b/drivers/net/xen-netfront.c
> > > > > > @@ -547,6 +547,18 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
> > > > > >  	unsigned int len = skb_headlen(skb);
> > > > > >  	unsigned long flags;
> > > > > >  
> > > > > > +	/*
> > > > > > +	 * wire format of xen_netif_tx_request only supports skb->len
> > > > > > +	 * < 64K, because size field in xen_netif_tx_request is
> > > > > > +	 * uint16_t.
> > > > > 
> > > > > Is there some field we can set e.g. in struct ethernet_device which
> > > > > would stop this from happening?
> > > > > 
> > > > 
> > > > struct ethernet_device? I could not find it.
> > > > 
> > > > And for struct net_device,
> > > 
> > > I meant struct net_device.
> > > 
> > > >  there is no field for this AFAICT.
> > > 
> > > Interesting. Are hardware devices expected to cope with arbitrary sized
> > > GSO skbs then I wonder.
> > > 
> > 
> > No idea. But there is a macro called GSO_MAX_SIZE (65536) in struct
> > net_device. :-)
> 
> But aren't we seeing skb's bigger than that?
> 

Yes, skb->len = 65538.

> Maybe this is just a historical bug in some older guests?
> 

I saw this with the latest upstream kernel.


Wei.

> Ian.
> 
> 

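The numbers in this exchange can be checked with a few lines of C. A uint16_t holds at most 65535, so GSO_MAX_SIZE (65536) itself does not fit in the size field, and the 65538-byte skb reported above would be described on the ring as a 2-byte frame, so any fragment larger than 2 bytes looks bigger than the frame. This is only an arithmetic sketch; sketch_wire_size() is a hypothetical helper that mimics storing an unchecked length into the 16-bit field.

```c
#include <stdint.h>

/* Value quoted in the thread for GSO_MAX_SIZE; renamed so the sketch
 * builds outside the kernel tree. */
#define SKETCH_GSO_MAX_SIZE 65536u

/*
 * Hypothetical helper: the value a 16-bit size field ends up holding when
 * an unchecked frame length is stored into it. Conversion to an unsigned
 * type keeps the value modulo 2^16.
 */
static inline uint16_t sketch_wire_size(unsigned int len)
{
	return (uint16_t)len;
}
```

So sketch_wire_size(65538) is 2 and sketch_wire_size(SKETCH_GSO_MAX_SIZE) is 0.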

* Re: [Xen-devel] [PATCH 2/4] xen-netfront: drop skb when skb->len > 65535
  2013-03-18 10:35 ` [PATCH 2/4] xen-netfront: drop skb when skb->len > 65535 Wei Liu
                     ` (5 preceding siblings ...)
  2013-03-18 13:46   ` [Xen-devel] " David Vrabel
@ 2013-03-19  1:35   ` annie li
  2013-03-19  1:35   ` annie li
  2013-03-19 20:13   ` Nick Pegg
  8 siblings, 0 replies; 97+ messages in thread
From: annie li @ 2013-03-19  1:35 UTC (permalink / raw)
  To: Wei Liu; +Cc: netdev, xen-devel, ian.campbell, konrad.wilk


On 2013-3-18 18:35, Wei Liu wrote:
> The `size' field of Xen network wire format is uint16_t, anything bigger than
> 65535 will cause overflow.
>
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> ---
>   drivers/net/xen-netfront.c |   12 ++++++++++++
>   1 file changed, 12 insertions(+)
>
> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> index 5527663..8c3d065 100644
> --- a/drivers/net/xen-netfront.c
> +++ b/drivers/net/xen-netfront.c
> @@ -547,6 +547,18 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
>   	unsigned int len = skb_headlen(skb);
>   	unsigned long flags;
>   
> +	/*
> +	 * wire format of xen_netif_tx_request only supports skb->len
> +	 * < 64K, because size field in xen_netif_tx_request is
> +	 * uint16_t.
> +	 */
> +	if (unlikely(skb->len > (uint16_t)(~0))) {
> +		net_alert_ratelimited(
> +			"xennet: skb->len = %d, too big for wire format\n",
> +			skb->len);
> +		goto drop;
> +	}
> +

Maybe it would be better to segment packets larger than 65535 bytes that
support segmentation, and drop only those that do not.
This could also be implemented in a separate patch (just like what I did for
packets that require more slots than MAX_SKB_FRAGS).

Thanks
Annie
>   	slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
>   		xennet_count_skb_frag_slots(skb);
>   	if (unlikely(slots > MAX_SKB_FRAGS + 1)) {

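The suggestion above amounts to a three-way decision per frame: send frames that fit the 16-bit size field, split larger GSO frames in software, and drop the rest. The sketch below models only that decision plus the segment arithmetic; the enum, both function names, and the 66-byte header / 1448-byte MSS figures in the usage note are illustrative assumptions. In the kernel the actual split would presumably go through skb_gso_segment(), which this sketch does not attempt to model.

```c
#include <stdbool.h>

enum sketch_tx_action {
	SKETCH_TX_SEND,     /* fits the 16-bit size field as-is */
	SKETCH_TX_SEGMENT,  /* GSO frame: split in software first */
	SKETCH_TX_DROP,     /* too big and cannot be split */
};

/* Hypothetical policy following the suggestion in the mail. */
enum sketch_tx_action sketch_choose_action(unsigned int len, bool is_gso)
{
	if (len <= 65535u)
		return SKETCH_TX_SEND;
	return is_gso ? SKETCH_TX_SEGMENT : SKETCH_TX_DROP;
}

/*
 * Number of segments a GSO frame of len bytes yields when every segment
 * repeats hdr bytes of headers and carries at most mss bytes of payload.
 * Assumes mss > 0; returns 0 if len does not exceed hdr.
 */
unsigned int sketch_gso_segments(unsigned int len, unsigned int hdr,
				 unsigned int mss)
{
	if (len <= hdr)
		return 0;
	return (len - hdr + mss - 1) / mss; /* ceiling division */
}
```

For example, a 65538-byte TSO frame with 66 bytes of headers and an MSS of 1448 would become 46 segments of at most 1514 bytes each, far below the 65535-byte limit.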

* Re: [PATCH 1/4] xen-netfront: remove unused variable `extra'
  2013-03-18 12:14       ` Ian Campbell
  2013-03-19  2:39         ` annie li
@ 2013-03-19  2:39         ` annie li
  2013-03-19  3:02           ` [Xen-devel] " James Harper
                             ` (3 more replies)
  1 sibling, 4 replies; 97+ messages in thread
From: annie li @ 2013-03-19  2:39 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Wei Liu, netdev, xen-devel, konrad.wilk


On 2013-3-18 20:14, Ian Campbell wrote:
> On Mon, 2013-03-18 at 12:04 +0000, Wei Liu wrote:
>> On Mon, 2013-03-18 at 11:42 +0000, Ian Campbell wrote:
>>> On Mon, 2013-03-18 at 10:35 +0000, Wei Liu wrote:
>>>
>>> I think a few more words are needed here since from the code you are
>>> removing it seems very much like gso is used for something. If you have
>>> a proof that the "extra = gso" case is never hit then please explain it.
>>> Perhaps a reference to the removal of the last user?
>>>
>>> Or maybe it is the case that it should be used and the bug is that it
>>> isn't?
>>>
>> Looks like the latter one. 'extra' field should  be used to get hold of
>> the last extra info in the ring. ;-)
>>
>> But, the only extra info in upstream kernel is XEN_NETIF_EXTRA_TYPE_GSO,
>> so there's really no other extra info in the ring at that point. Could
>> it be possible that it is something from classic Xen kernel?
> The classic kernel netfront has exactly the same code it seems and
> netif_extra_type_gso is the only one I've ever heard of.
>
> Maybe this extra thing is just redundant unless/until a second extra
> comes along.

In our Windows PV driver, we do not process this for GSO in the tx path
either. Maybe we skipped this processing for some special GSO case?

BTW, what is XEN_NETIF_EXTRA_FLAG_MORE actually for? The backend only
processes it in xen_netback_tx_build_gops, but the netfront xmit path does
not set this flag. I did process it in the rx path of my Windows PV
driver (Linux netfront does that too), but it seems unnecessary since
netback does not set this flag at all.

Thanks
Annie

^ permalink raw reply	[flat|nested] 97+ messages in thread

* RE: [Xen-devel] [PATCH 1/4] xen-netfront: remove unused variable `extra'
  2013-03-19  2:39         ` annie li
@ 2013-03-19  3:02           ` James Harper
  2013-03-19  3:02           ` James Harper
                             ` (2 subsequent siblings)
  3 siblings, 0 replies; 97+ messages in thread
From: James Harper @ 2013-03-19  3:02 UTC (permalink / raw)
  To: annie li, Ian Campbell; +Cc: netdev, konrad.wilk, Wei Liu, xen-devel

> 
> On 2013-3-18 20:14, Ian Campbell wrote:
> > On Mon, 2013-03-18 at 12:04 +0000, Wei Liu wrote:
> >> On Mon, 2013-03-18 at 11:42 +0000, Ian Campbell wrote:
> >>> On Mon, 2013-03-18 at 10:35 +0000, Wei Liu wrote:
> >>>
> >>> I think a few more words are needed here since from the code you are
> >>> removing it seems very much like gso is used for something. If you have
> >>> a proof that the "extra = gso" case is never hit then please explain it.
> >>> Perhaps a reference to the removal of the last user?
> >>>
> >>> Or maybe it is the case that it should be used and the bug is that it
> >>> isn't?
> >>>
> >> Looks like the latter one. 'extra' field should  be used to get hold of
> >> the last extra info in the ring. ;-)
> >>
> >> But, the only extra info in upstream kernel is
> XEN_NETIF_EXTRA_TYPE_GSO,
> >> so there's really no other extra info in the ring at that point. Could
> >> it be possible that it is something from classic Xen kernel?
> > The classic kernel netfront has exactly the same code it seems and
> > netif_extra_type_gso is the only one I've ever heard of.
> >
> > Maybe this extra thing is just redundant unless/until a second extra
> > comes along.
> 
> In our windows pv driver, we do not process this for GSO in tx path
> either. Maybe we ignored processing for some special GSO?
> 
> BTW, what is XEN_NETIF_EXTRA_FLAG_MORE actually for? Backend only
> processes it in xen_netback_tx_build_gops, but netfront xmit path does
> not really set this flag. I did process it in rx path of my windows pv
> driver(linux netfront did that too), but it seems unnecessary since
> netback does not set this flag at all.
> 

This flag indicates whether another 'extra' ring entry follows. From netif.h:

/*
 * This is the 'wire' format for packets:
 *  Request 1: netif_tx_request -- NETTXF_* (any flags)
 * [Request 2: netif_tx_extra]  (only if request 1 has NETTXF_extra_info)
 * [Request 3: netif_tx_extra]  (only if request 2 has XEN_NETIF_EXTRA_MORE)
 *  Request 4: netif_tx_request -- NETTXF_more_data
 *  Request 5: netif_tx_request -- NETTXF_more_data
 *  ...
 *  Request N: netif_tx_request -- 0
 */

I think the only extra type is GSO so you'll probably never see it, but that's what it's for.

James
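The following simplified C sketch makes the wire format quoted from netif.h concrete by showing how a consumer might walk one packet's slots. Extras follow request 1 only when NETTXF_extra_info is set. Each extra that carries XEN_NETIF_EXTRA_FLAG_MORE (written XEN_NETIF_EXTRA_MORE in the header comment) announces one more extra. Further tx requests follow while the previous request has NETTXF_more_data set. The flag values are as we understand them from netif.h and should be checked against the header. struct slot, struct pkt_layout and walk_packet() are hypothetical and are not the real ring structures.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Flag bits as we understand them from Xen's public netif.h. */
#define NETTXF_more_data          (1U << 2)
#define NETTXF_extra_info         (1U << 3)
#define XEN_NETIF_EXTRA_FLAG_MORE (1U << 0)

/* Hypothetical, simplified ring slot: on the real ring a slot holds either
 * a netif_tx_request or a netif_extra_info; here a tag says which. */
struct slot {
    bool is_extra;
    uint16_t flags; /* NETTXF_* for requests, XEN_NETIF_EXTRA_FLAG_* for extras */
};

struct pkt_layout {
    size_t extras;   /* extra-info slots following request 1 */
    size_t requests; /* tx requests, including request 1 */
};

/* Walk one packet starting at ring[0] following the netif.h wire format.
 * Returns the number of slots consumed, or 0 if the layout is malformed. */
static size_t walk_packet(const struct slot *ring, size_t n,
                          struct pkt_layout *out)
{
    size_t i = 0;
    uint16_t first_flags, last_flags;

    out->extras = 0;
    out->requests = 0;
    if (n == 0 || ring[0].is_extra)
        return 0;
    first_flags = ring[i++].flags;
    out->requests = 1;

    /* Extras follow request 1 only if it set NETTXF_extra_info; an extra
     * with XEN_NETIF_EXTRA_FLAG_MORE means another extra follows it. */
    if (first_flags & NETTXF_extra_info) {
        bool more;
        do {
            if (i >= n || !ring[i].is_extra)
                return 0;
            more = (ring[i].flags & XEN_NETIF_EXTRA_FLAG_MORE) != 0;
            out->extras++;
            i++;
        } while (more);
    }

    /* Data requests follow while the previous request set NETTXF_more_data. */
    last_flags = first_flags;
    while (last_flags & NETTXF_more_data) {
        if (i >= n || ring[i].is_extra)
            return 0;
        last_flags = ring[i++].flags;
        out->requests++;
    }
    return i;
}
```

The loop honours XEN_NETIF_EXTRA_FLAG_MORE even though no current producer sets it. That would keep a consumer in step with the ring if a second extra type were ever added.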

^ permalink raw reply	[flat|nested] 97+ messages in thread

* RE: [Xen-devel] [PATCH 1/4] xen-netfront: remove unused variable `extra'
  2013-03-19  2:39         ` annie li
                             ` (2 preceding siblings ...)
  2013-03-19  9:28           ` Paul Durrant
@ 2013-03-19  9:28           ` Paul Durrant
  2013-03-19  9:53             ` annie li
                               ` (3 more replies)
  3 siblings, 4 replies; 97+ messages in thread
From: Paul Durrant @ 2013-03-19  9:28 UTC (permalink / raw)
  To: annie li, Ian Campbell; +Cc: netdev, konrad.wilk, Wei Liu, xen-devel

> -----Original Message-----
> From: xen-devel-bounces@lists.xen.org [mailto:xen-devel-
> bounces@lists.xen.org] On Behalf Of annie li
> Sent: 19 March 2013 02:39
> To: Ian Campbell
> Cc: netdev@vger.kernel.org; konrad.wilk@oracle.com; Wei Liu; xen-
> devel@lists.xen.org
> Subject: Re: [Xen-devel] [PATCH 1/4] xen-netfront: remove unused variable
> `extra'
> 
> 
> On 2013-3-18 20:14, Ian Campbell wrote:
> > On Mon, 2013-03-18 at 12:04 +0000, Wei Liu wrote:
> >> On Mon, 2013-03-18 at 11:42 +0000, Ian Campbell wrote:
> >>> On Mon, 2013-03-18 at 10:35 +0000, Wei Liu wrote:
> >>>
> >>> I think a few more words are needed here since from the code you are
> >>> removing it seems very much like gso is used for something. If you have
> >>> a proof that the "extra = gso" case is never hit then please explain it.
> >>> Perhaps a reference to the removal of the last user?
> >>>
> >>> Or maybe it is the case that it should be used and the bug is that it
> >>> isn't?
> >>>
> >> Looks like the latter one. 'extra' field should  be used to get hold of
> >> the last extra info in the ring. ;-)
> >>
> >> But, the only extra info in upstream kernel is
> XEN_NETIF_EXTRA_TYPE_GSO,
> >> so there's really no other extra info in the ring at that point. Could
> >> it be possible that it is something from classic Xen kernel?
> > The classic kernel netfront has exactly the same code it seems and
> > netif_extra_type_gso is the only one I've ever heard of.
> >
> > Maybe this extra thing is just redundant unless/until a second extra
> > comes along.
> 
> In our windows pv driver, we do not process this for GSO in tx path
> either. Maybe we ignored processing for some special GSO?
> 
> BTW, what is XEN_NETIF_EXTRA_FLAG_MORE actually for? Backend only
> processes it in xen_netback_tx_build_gops, but netfront xmit path does
> not really set this flag. I did process it in rx path of my windows pv
> driver(linux netfront did that too), but it seems unnecessary since
> netback does not set this flag at all.
> 

The flag is there to denote the existence of an 'extra' segment in the packet. The 'extra' segment goes after the 1st segment and specifies metadata such as the GSO type (TCPv4 is the only one at the moment but we'll need TCPv6 very shortly) and the MSS.
Extra segments are certainly not redundant; the Citrix Windows PV drivers send TSOs using them and handle LRO using them too.

  Paul
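Here is a rough sketch of the single-extra TSO layout described here. Request 1 sets NETTXF_extra_info. A single GSO extra carries the GSO type and the MSS, with XEN_NETIF_EXTRA_FLAG_MORE left clear. The data requests follow. The constants are as we understand them from netif.h. struct tx_req, struct gso_extra and build_tso() are hypothetical, flattened stand-ins for the real netif_tx_request and netif_extra_info structures, and the size fields are omitted.

```c
#include <stddef.h>
#include <stdint.h>

/* Constant values as we understand them from Xen's public netif.h. */
#define NETTXF_more_data         (1U << 2)
#define NETTXF_extra_info        (1U << 3)
#define XEN_NETIF_EXTRA_TYPE_GSO 1
#define XEN_NETIF_GSO_TYPE_TCPV4 1

/* Hypothetical, pared-down slot types; not the real netif.h structures. */
struct tx_req { uint16_t flags; };
struct gso_extra { uint8_t type; uint8_t flags; uint16_t gso_size; uint16_t gso_type; };

/* Fill request 1, its single GSO extra (carrying the MSS), and nfrags
 * further data requests. reqs must hold nfrags + 1 entries.
 * Returns the number of tx requests written. */
static size_t build_tso(struct tx_req *reqs, size_t nfrags,
                        struct gso_extra *extra, uint16_t mss)
{
    reqs[0].flags = NETTXF_extra_info | (nfrags ? NETTXF_more_data : 0);

    extra->type = XEN_NETIF_EXTRA_TYPE_GSO;
    extra->flags = 0; /* only one extra, so XEN_NETIF_EXTRA_FLAG_MORE stays clear */
    extra->gso_size = mss;
    extra->gso_type = XEN_NETIF_GSO_TYPE_TCPV4;

    /* Every data request except the last announces another one. */
    for (size_t i = 1; i <= nfrags; i++)
        reqs[i].flags = (i < nfrags) ? NETTXF_more_data : 0;
    return nfrags + 1;
}
```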

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Xen-devel] [PATCH 1/4] xen-netfront: remove unused variable `extra'
  2013-03-19  9:28           ` [Xen-devel] " Paul Durrant
  2013-03-19  9:53             ` annie li
@ 2013-03-19  9:53             ` annie li
  2013-03-19 10:03               ` Paul Durrant
  2013-03-19 10:03               ` Paul Durrant
  2013-03-19 15:26             ` Wei Liu
  2013-03-19 15:26             ` [Xen-devel] " Wei Liu
  3 siblings, 2 replies; 97+ messages in thread
From: annie li @ 2013-03-19  9:53 UTC (permalink / raw)
  To: Paul Durrant; +Cc: Ian Campbell, netdev, konrad.wilk, Wei Liu, xen-devel


On 2013-3-19 17:28, Paul Durrant wrote:
>> -----Original Message-----
>> From: xen-devel-bounces@lists.xen.org [mailto:xen-devel-
>> bounces@lists.xen.org] On Behalf Of annie li
>> Sent: 19 March 2013 02:39
>> To: Ian Campbell
>> Cc: netdev@vger.kernel.org; konrad.wilk@oracle.com; Wei Liu; xen-
>> devel@lists.xen.org
>> Subject: Re: [Xen-devel] [PATCH 1/4] xen-netfront: remove unused variable
>> `extra'
>>
>>
>> On 2013-3-18 20:14, Ian Campbell wrote:
>>> On Mon, 2013-03-18 at 12:04 +0000, Wei Liu wrote:
>>>> On Mon, 2013-03-18 at 11:42 +0000, Ian Campbell wrote:
>>>>> On Mon, 2013-03-18 at 10:35 +0000, Wei Liu wrote:
>>>>>
>>>>> I think a few more words are needed here since from the code you are
>>>>> removing it seems very much like gso is used for something. If you have
>>>>> a proof that the "extra = gso" case is never hit then please explain it.
>>>>> Perhaps a reference to the removal of the last user?
>>>>>
>>>>> Or maybe it is the case that it should be used and the bug is that it
>>>>> isn't?
>>>>>
>>>> Looks like the latter one. 'extra' field should  be used to get hold of
>>>> the last extra info in the ring. ;-)
>>>>
>>>> But, the only extra info in upstream kernel is
>> XEN_NETIF_EXTRA_TYPE_GSO,
>>>> so there's really no other extra info in the ring at that point. Could
>>>> it be possible that it is something from classic Xen kernel?
>>> The classic kernel netfront has exactly the same code it seems and
>>> netif_extra_type_gso is the only one I've ever heard of.
>>>
>>> Maybe this extra thing is just redundant unless/until a second extra
>>> comes along.
>> In our windows pv driver, we do not process this for GSO in tx path
>> either. Maybe we ignored processing for some special GSO?
>>
>> BTW, what is XEN_NETIF_EXTRA_FLAG_MORE actually for? Backend only
>> processes it in xen_netback_tx_build_gops, but netfront xmit path does
>> not really set this flag. I did process it in rx path of my windows pv
>> driver(linux netfront did that too), but it seems unnecessary since
>> netback does not set this flag at all.
>>
> The flag is there to denote the existence of an 'extra' segment in the packet. The 'extra' segment goes after the 1st segment and specifies metadata such as the GSO type (TCPv4 is the only one at the moment but we'll need TCPv6 very shortly) and the MSS.

For TCPv4 GSO, one extra info request (NETTXF_extra_info) seems to be
enough in my winpv driver, and I did not process XEN_NETIF_EXTRA_MORE.
Do you create two extra info requests for both TCPv4 and TCPv6 GSO, as in
the following?

  * [Request 2: netif_tx_extra]  (only if request 1 has NETTXF_extra_info)
  * [Request 3: netif_tx_extra]  (only if request 2 has XEN_NETIF_EXTRA_MORE)


> Extra segments are certainly not redundant; the Citrix Windows PV drivers send TSOs using them and handle LRO using them too.

Regarding LRO: upstream netback does not create any response with
XEN_NETIF_EXTRA_MORE, so I assume your dom0 does that processing?

Thanks
Annie

^ permalink raw reply	[flat|nested] 97+ messages in thread

* RE: [Xen-devel] [PATCH 1/4] xen-netfront: remove unused variable `extra'
  2013-03-19  9:53             ` [Xen-devel] " annie li
@ 2013-03-19 10:03               ` Paul Durrant
  2013-03-19 10:03               ` Paul Durrant
  1 sibling, 0 replies; 97+ messages in thread
From: Paul Durrant @ 2013-03-19 10:03 UTC (permalink / raw)
  To: annie li; +Cc: Ian Campbell, netdev, konrad.wilk, Wei Liu, xen-devel

> -----Original Message-----
[snip] 
> For TCPv4 GSO, it seems one extra info request(NETTXF_extra_info) is
> enough in my winpv driver, and I did not process the
> XEN_NETIF_EXTRA_MORE.
> Do you create two extra info requests for bothTCPv4 and TCPv6 GSO like
> following?
> 
>   * [Request 2: netif_tx_extra]  (only if request 1 has NETTXF_extra_info)
>   * [Request 3: netif_tx_extra]  (only if request 2 has
> XEN_NETIF_EXTRA_MORE)
> 

No, I just use one. I don't use XEN_NETIF_EXTRA_FLAG_MORE. I thought you were questioning the existence of extra segments rather than this flag. I guess I got the wrong end of the stick.
I've not seen anything use XEN_NETIF_EXTRA_FLAG_MORE, but that's not to say nothing will use it in the future. Clearly something is needed to indicate subsequent extra segments should they ever be needed.

  Paul

> 
> > Extra segments are certainly not redundant; the Citrix Windows PV drivers
> send TSOs using them and handle LRO using them too.
> 
> About the LRO, upstream netback does not create any response with
> XEN_NETIF_EXTRA_MORE, so I assume your dom0 did such process?
> 
> Thanks
> Annie

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Xen-devel] [PATCH 2/4] xen-netfront: drop skb when skb->len > 65535
  2013-03-18 14:19         ` Wei Liu
  2013-03-19 13:40           ` David Vrabel
@ 2013-03-19 13:40           ` David Vrabel
  2013-03-19 15:23             ` Wei Liu
  2013-03-19 15:23             ` [Xen-devel] " Wei Liu
  1 sibling, 2 replies; 97+ messages in thread
From: David Vrabel @ 2013-03-19 13:40 UTC (permalink / raw)
  To: Wei Liu; +Cc: Ian Campbell, netdev, xen-devel, annie.li, konrad.wilk

On 18/03/13 14:19, Wei Liu wrote:
> On Mon, 2013-03-18 at 14:00 +0000, David Vrabel wrote:
>> On 18/03/13 13:48, Ian Campbell wrote:
>>> On Mon, 2013-03-18 at 13:46 +0000, David Vrabel wrote:
>>>> On 18/03/13 10:35, Wei Liu wrote:
>>>>> The `size' field of Xen network wire format is uint16_t, anything bigger than
>>>>> 65535 will cause overflow.
>>>>
>>>> The backend needs to be able to handle these bad packets without
>>>> disconnecting the VIF -- we can't fix all the frontend drivers.
>>>
>>> Agreed, although that doesn't imply that we shouldn't fix the frontend
>>> where we can -- such as upstream as Wei does here.
>>
>> Yes, frontends should be fixed where possible.
>>
>> This is what I came up with for the backend.  I don't have time to look
>> into it further but, Wei, feel free to use it as a starting point.
>>
> 
> Thanks for this patch.
> 
> I haven't gone through XSA-39 discussion, this is why I didn't come up
> with a fix for backend -- I need to make sure dropping packet like this
> won't re-exhibit the security hole.

How are these overlarge packets generated?  How do you reproduce the issue?

David

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Xen-devel] [PATCH 2/4] xen-netfront: drop skb when skb->len > 65535
  2013-03-19 13:40           ` [Xen-devel] " David Vrabel
  2013-03-19 15:23             ` Wei Liu
@ 2013-03-19 15:23             ` Wei Liu
  1 sibling, 0 replies; 97+ messages in thread
From: Wei Liu @ 2013-03-19 15:23 UTC (permalink / raw)
  To: David Vrabel
  Cc: wei.liu2, Ian Campbell, netdev, xen-devel, annie.li, konrad.wilk

On Tue, 2013-03-19 at 13:40 +0000, David Vrabel wrote:
> On 18/03/13 14:19, Wei Liu wrote:
> > On Mon, 2013-03-18 at 14:00 +0000, David Vrabel wrote:
> >> On 18/03/13 13:48, Ian Campbell wrote:
> >>> On Mon, 2013-03-18 at 13:46 +0000, David Vrabel wrote:
> >>>> On 18/03/13 10:35, Wei Liu wrote:
> >>>>> The `size' field of Xen network wire format is uint16_t, anything bigger than
> >>>>> 65535 will cause overflow.
> >>>>
> >>>> The backend needs to be able to handle these bad packets without
> >>>> disconnecting the VIF -- we can't fix all the frontend drivers.
> >>>
> >>> Agreed, although that doesn't imply that we shouldn't fix the frontend
> >>> where we can -- such as upstream as Wei does here.
> >>
> >> Yes, frontends should be fixed where possible.
> >>
> >> This is what I came up with for the backend.  I don't have time to look
> >> into it further but, Wei, feel free to use it as a starting point.
> >>
> > 
> > Thanks for this patch.
> > 
> > I haven't gone through XSA-39 discussion, this is why I didn't come up
> > with a fix for backend -- I need to make sure dropping packet like this
> > won't re-exhibit the security hole.
> 
> How are these overlarge packets generated?  How do you reproduce the issue?
> 

Inside a VM: ifconfig eth0 mtu 100, then iperf -c XXXX.

But other people seeing this are probably not using the same method,
because nobody would set the MTU to 100 on a production system AFAICT.

Wei.

> David

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Xen-devel] [PATCH 1/4] xen-netfront: remove unused variable `extra'
  2013-03-19  9:28           ` [Xen-devel] " Paul Durrant
                               ` (2 preceding siblings ...)
  2013-03-19 15:26             ` Wei Liu
@ 2013-03-19 15:26             ` Wei Liu
  2013-04-09 14:28               ` Ian Campbell
  2013-04-09 14:28               ` Ian Campbell
  3 siblings, 2 replies; 97+ messages in thread
From: Wei Liu @ 2013-03-19 15:26 UTC (permalink / raw)
  To: Paul Durrant
  Cc: wei.liu2, annie li, Ian Campbell, netdev, konrad.wilk, xen-devel

On Tue, 2013-03-19 at 09:28 +0000, Paul Durrant wrote:
> > -----Original Message-----
> > From: xen-devel-bounces@lists.xen.org [mailto:xen-devel-bounces@lists.xen.org] On Behalf Of annie li
> > Sent: 19 March 2013 02:39
> > To: Ian Campbell
> > Cc: netdev@vger.kernel.org; konrad.wilk@oracle.com; Wei Liu; xen-devel@lists.xen.org
> > Subject: Re: [Xen-devel] [PATCH 1/4] xen-netfront: remove unused variable `extra'
> > 
> > 
> > On 2013-3-18 20:14, Ian Campbell wrote:
> > > On Mon, 2013-03-18 at 12:04 +0000, Wei Liu wrote:
> > >> On Mon, 2013-03-18 at 11:42 +0000, Ian Campbell wrote:
> > >>> On Mon, 2013-03-18 at 10:35 +0000, Wei Liu wrote:
> > >>>
> > >>> I think a few more words are needed here since from the code you are
> > >>> removing it seems very much like gso is used for something. If you have
> > >>> a proof that the "extra = gso" case is never hit then please explain it.
> > >>> Perhaps a reference to the removal of the last user?
> > >>>
> > >>> Or maybe it is the case that it should be used and the bug is that it
> > >>> isn't?
> > >>>
> > >> Looks like the latter one. 'extra' field should  be used to get hold of
> > >> the last extra info in the ring. ;-)
> > >>
> > >> But, the only extra info in upstream kernel is XEN_NETIF_EXTRA_TYPE_GSO,
> > >> so there's really no other extra info in the ring at that point. Could
> > >> it be possible that it is something from classic Xen kernel?
> > > The classic kernel netfront has exactly the same code it seems and
> > > netif_extra_type_gso is the only one I've ever heard of.
> > >
> > > Maybe this extra thing is just redundant unless/until a second extra
> > > comes along.
> > 
> > In our windows pv driver, we do not process this for GSO in tx path
> > either. Maybe we ignored processing for some special GSO?
> > 
> > BTW, what is XEN_NETIF_EXTRA_FLAG_MORE actually for? Backend only
> > processes it in xen_netback_tx_build_gops, but netfront xmit path does
> > not really set this flag. I did process it in rx path of my windows pv
> > driver(linux netfront did that too), but it seems unnecessary since
> > netback does not set this flag at all.
> > 
> 
> The flag is there to denote the existence of an 'extra' segment in the packet. The 'extra' segment goes after the 1st segment and specifies metadata such as the GSO type (TCPv4 is the only one at the moment but we'll need TCPv6 very shortly) and the MSS.
> Extra segments are certainly not redundant; the Citrix Windows PV drivers send TSOs using them and handle LRO using them too.
> 

By "redundant", Ian and I mean that this 'extra' variable is currently
never used in the code and only causes confusion. It can be removed now
and added back in the future if necessary.


Wei.

>   Paul
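Paul's description of extra segments (metadata such as GSO type and MSS carried in slots after the first one, chained by a "more" flag) can be illustrated with a small chain walk. This is a minimal sketch only. The DEMO_* constants, struct demo_extra_info and demo_count_extras() are hypothetical local stand-ins; the real definitions live in Xen's public netif.h, and the real extra-info structure uses a union that is not modelled here. The function assumes it is called only when the first request has already signalled that at least one extra follows:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins; the real definitions live in Xen's public netif.h. */
#define DEMO_EXTRA_TYPE_GSO	1
#define DEMO_EXTRA_FLAG_MORE	(1u << 0)

struct demo_extra_info {
	uint8_t type;		/* e.g. DEMO_EXTRA_TYPE_GSO */
	uint8_t flags;		/* DEMO_EXTRA_FLAG_MORE if another extra follows */
	uint16_t gso_size;	/* MSS, meaningful for GSO extras */
};

/*
 * Walk the extra segments that follow the first request, stopping at the
 * first one without the MORE flag. Returns the number of slots consumed,
 * or -1 if the chain runs past the available slots.
 */
static int demo_count_extras(const struct demo_extra_info *extras, size_t avail)
{
	size_t i;

	for (i = 0; i < avail; i++) {
		if (!(extras[i].flags & DEMO_EXTRA_FLAG_MORE))
			return (int)(i + 1);
	}
	return -1;
}
```

A single GSO extra without the MORE flag consumes one slot. A chain whose last available slot still has MORE set is treated as malformed.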

* Re: [PATCH 2/4] xen-netfront: drop skb when skb->len > 65535
From: Nick Pegg @ 2013-03-19 20:13 UTC (permalink / raw)
  To: Wei Liu; +Cc: netdev, annie.li, konrad.wilk, ian.campbell, xen-devel


On 3/18/13 6:35 AM, Wei Liu wrote:
> The `size' field of Xen network wire format is uint16_t, anything bigger than
> 65535 will cause overflow.
> 
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> ---
>  drivers/net/xen-netfront.c |   12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> index 5527663..8c3d065 100644
> --- a/drivers/net/xen-netfront.c
> +++ b/drivers/net/xen-netfront.c
> @@ -547,6 +547,18 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
>  	unsigned int len = skb_headlen(skb);
>  	unsigned long flags;
>  
> +	/*
> +	 * wire format of xen_netif_tx_request only supports skb->len
> +	 * < 64K, because size field in xen_netif_tx_request is
> +	 * uint16_t.
> +	 */
> +	if (unlikely(skb->len > (uint16_t)(~0))) {
> +		net_alert_ratelimited(
> +			"xennet: skb->len = %d, too big for wire format\n",
> +			skb->len);
> +		goto drop;
> +	}
> +
>  	slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
>  		xennet_count_skb_frag_slots(skb);
>  	if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
> 

I have tested this patch on a 3.7.10 DomU and have confirmed that it
works for the test case that Wei came up with (set MTU to 100 on DomU,
run iperf). I haven't come across any other ways to cause this to
happen, so my testing isn't very thorough.


-Nick
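The boundary condition in the patch Nick tested can be checked outside the kernel. This is a minimal sketch. demo_fits_wire_format() is a hypothetical name, and UINT16_MAX is used in place of the patch's equivalent (uint16_t)(~0):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Userspace version of the frontend check: a packet can only be sent if
 * its total length fits in the 16-bit size field of the first request.
 */
static bool demo_fits_wire_format(unsigned int skb_len)
{
	return skb_len <= UINT16_MAX;	/* same value as (uint16_t)(~0) */
}
```

Note that the patch uses a strict "skb->len > 65535" test to drop, so a packet of exactly 65535 bytes is still sent.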

* Re: [PATCH 2/4] xen-netfront: drop skb when skb->len > 65535
From: Ben Hutchings @ 2013-03-19 21:24 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Wei Liu, netdev, xen-devel, konrad.wilk, annie.li

On Mon, 2013-03-18 at 15:07 +0000, Ian Campbell wrote:
> On Mon, 2013-03-18 at 15:04 +0000, Wei Liu wrote:
> > On Mon, 2013-03-18 at 14:54 +0000, Ian Campbell wrote:
> > > On Mon, 2013-03-18 at 14:40 +0000, Wei Liu wrote:
> > > > On Mon, 2013-03-18 at 11:42 +0000, Ian Campbell wrote:
> > > > > On Mon, 2013-03-18 at 10:35 +0000, Wei Liu wrote:
> > > > > > The `size' field of Xen network wire format is uint16_t, anything bigger than
> > > > > > 65535 will cause overflow.
> > > > > > 
> > > > > > Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> > > > > > ---
> > > > > >  drivers/net/xen-netfront.c |   12 ++++++++++++
> > > > > >  1 file changed, 12 insertions(+)
> > > > > > 
> > > > > > diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> > > > > > index 5527663..8c3d065 100644
> > > > > > --- a/drivers/net/xen-netfront.c
> > > > > > +++ b/drivers/net/xen-netfront.c
> > > > > > @@ -547,6 +547,18 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
> > > > > >  	unsigned int len = skb_headlen(skb);
> > > > > >  	unsigned long flags;
> > > > > >  
> > > > > > +	/*
> > > > > > +	 * wire format of xen_netif_tx_request only supports skb->len
> > > > > > +	 * < 64K, because size field in xen_netif_tx_request is
> > > > > > +	 * uint16_t.
> > > > > 
> > > > > Is there some field we can set e.g. in struct ethernet_device which
> > > > > would stop this from happening?
> > > > > 
> > > > 
> > > > struct ethernet_device? I could not find it.
> > > > 
> > > > And for struct net_device,
> > > 
> > > I meant struct net_device.
> > > 
> > > >  there is no field for this AFAICT.
> > > 
> > > Interesting. Are hardware devices expected to cope with arbitrary sized
> > > GSO skbs then I wonder.
> > > 
> > 
> > No idea. But there is a macro called GSO_MAX_SIZE (65536) in struct
> > net_device. :-)
> 
> But aren't we seeing skb's bigger than that?
> 
> Maybe this is just a historical bug in some older guests?

GSO_MAX_SIZE is the maximum payload length, not the maximum total length
of an skb.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
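Ben's distinction between payload length and total length can be made concrete with simple arithmetic. This is a minimal sketch. It assumes a plain Ethernet + IPv4 + TCP header stack of 14 + 20 + 20 = 54 bytes; options or encapsulation would add more. The helper names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/* Default GSO limit discussed above: 65536 bytes of payload. */
#define DEMO_GSO_MAX_SIZE 65536u

/*
 * Total bytes a GSO packet occupies: the segmentable payload plus the
 * protocol headers in front of it. Only the total has to fit in the
 * 16-bit wire-format size field.
 */
static unsigned int demo_total_len(unsigned int payload, unsigned int hdr_len)
{
	return payload + hdr_len;
}

static bool demo_total_fits_u16(unsigned int payload, unsigned int hdr_len)
{
	return demo_total_len(payload, hdr_len) <= 65535u;
}
```

A maximal 65536-byte payload plus 54 bytes of headers gives 65590 bytes, which does not fit. The payload has to be at most 65481 bytes for the total to stay within 65535.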

* Re: [PATCH 2/4] xen-netfront: drop skb when skb->len > 65535
From: Ben Hutchings @ 2013-03-19 21:28 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Wei Liu, netdev, xen-devel, konrad.wilk, annie.li

On Tue, 2013-03-19 at 21:24 +0000, Ben Hutchings wrote:
> On Mon, 2013-03-18 at 15:07 +0000, Ian Campbell wrote:
> > On Mon, 2013-03-18 at 15:04 +0000, Wei Liu wrote:
> > > On Mon, 2013-03-18 at 14:54 +0000, Ian Campbell wrote:
> > > > On Mon, 2013-03-18 at 14:40 +0000, Wei Liu wrote:
> > > > > On Mon, 2013-03-18 at 11:42 +0000, Ian Campbell wrote:
> > > > > > On Mon, 2013-03-18 at 10:35 +0000, Wei Liu wrote:
> > > > > > > The `size' field of Xen network wire format is uint16_t, anything bigger than
> > > > > > > 65535 will cause overflow.
> > > > > > > 
> > > > > > > Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> > > > > > > ---
> > > > > > >  drivers/net/xen-netfront.c |   12 ++++++++++++
> > > > > > >  1 file changed, 12 insertions(+)
> > > > > > > 
> > > > > > > diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> > > > > > > index 5527663..8c3d065 100644
> > > > > > > --- a/drivers/net/xen-netfront.c
> > > > > > > +++ b/drivers/net/xen-netfront.c
> > > > > > > @@ -547,6 +547,18 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
> > > > > > >  	unsigned int len = skb_headlen(skb);
> > > > > > >  	unsigned long flags;
> > > > > > >  
> > > > > > > +	/*
> > > > > > > +	 * wire format of xen_netif_tx_request only supports skb->len
> > > > > > > +	 * < 64K, because size field in xen_netif_tx_request is
> > > > > > > +	 * uint16_t.
> > > > > > 
> > > > > > Is there some field we can set e.g. in struct ethernet_device which
> > > > > > would stop this from happening?
> > > > > > 
> > > > > 
> > > > > struct ethernet_device? I could not find it.
> > > > > 
> > > > > And for struct net_device,
> > > > 
> > > > I meant struct net_device.
> > > > 
> > > > >  there is no field for this AFAICT.
> > > > 
> > > > Interesting. Are hardware devices expected to cope with arbitrary sized
> > > > GSO skbs then I wonder.
> > > > 
> > > 
> > > No idea. But there is a macro called GSO_MAX_SIZE (65536) in struct
> > > net_device. :-)
> > 
> > But aren't we seeing skb's bigger than that?
> > 
> > Maybe this is just a historical bug in some older guests?
> 
> GSO_MAX_SIZE is the maximum payload length, not the maximum total length
> of an skb.

...and it's actually just the default value assigned to
dev->gso_max_size.  You'll want to change it to your actual maximum
(65535 - maximum length of headers) before registering your net devices.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
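Ben's suggestion can be expressed as a minimal userspace sketch. struct demo_net_device and demo_limit_gso() are hypothetical. The header bound passed in (256 bytes in the usage note) is an assumption, not a value from this thread. In-kernel code would instead set dev->gso_max_size, for example via the netif_set_gso_max_size() helper, before registering the device:

```c
#include <assert.h>

#define DEMO_WIRE_MAX_LEN 65535u	/* largest value a uint16_t size can hold */

/* Hypothetical stand-in for the one net_device field that matters here. */
struct demo_net_device {
	unsigned int gso_max_size;
};

/*
 * Cap the GSO payload so that payload + headers always fits the wire
 * format. max_hdr_len is an assumed upper bound on header bytes per packet.
 */
static void demo_limit_gso(struct demo_net_device *dev, unsigned int max_hdr_len)
{
	assert(max_hdr_len < DEMO_WIRE_MAX_LEN);
	dev->gso_max_size = DEMO_WIRE_MAX_LEN - max_hdr_len;
}
```

Starting from the default of 65536, demo_limit_gso(&dev, 256) leaves gso_max_size at 65279. A maximal payload plus 256 bytes of headers then stays within 65535.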

* Re: [Xen-devel] [PATCH 2/4] xen-netfront: drop skb when skb->len > 65535
From: David Vrabel @ 2013-03-20 20:02 UTC (permalink / raw)
  To: David Vrabel
  Cc: Ian Campbell, netdev, annie.li, konrad.wilk, Wei Liu, xen-devel

On 18/03/13 14:00, David Vrabel wrote:
> On 18/03/13 13:48, Ian Campbell wrote:
>> On Mon, 2013-03-18 at 13:46 +0000, David Vrabel wrote:
>>> On 18/03/13 10:35, Wei Liu wrote:
>>>> The `size' field of Xen network wire format is uint16_t, anything bigger than
>>>> 65535 will cause overflow.
>>>
>>> The backend needs to be able to handle these bad packets without
>>> disconnecting the VIF -- we can't fix all the frontend drivers.
>>
>> Agreed, although that doesn't imply that we shouldn't fix the frontend
>> where we can -- such as upstream as Wei does here.
> 
> Yes, frontends should be fixed where possible.
> 
> This is what I came up with for the backend.  I don't have time to look
> into it further but, Wei, feel free to use it as a starting point.

Got some time to test this (or more correctly, something similar with
XCP's kernel) and some fixes to the suggested patch are needed.  See below.

> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
> index cd49ba9..18e2671 100644
> --- a/drivers/net/xen-netback/netback.c
> +++ b/drivers/net/xen-netback/netback.c
> @@ -899,10 +899,11 @@ static void netbk_fatal_tx_err(struct xenvif *vif)
>  static int netbk_count_requests(struct xenvif *vif,
>  				struct xen_netif_tx_request *first,
>  				struct xen_netif_tx_request *txp,
> -				int work_to_do)
> +				int work_to_do, int idx)

idx should be of RING_IDX type.

>  {
>  	RING_IDX cons = vif->tx.req_cons;
>  	int frags = 0;
> +	bool drop = false;
> 
>  	if (!(first->flags & XEN_NETTXF_more_data))
>  		return 0;
> @@ -922,10 +923,20 @@ static int netbk_count_requests(struct xenvif *vif,
> 
>  		memcpy(txp, RING_GET_REQUEST(&vif->tx, cons + frags),
>  		       sizeof(*txp));
> -		if (txp->size > first->size) {
> -			netdev_err(vif->dev, "Frag is bigger than frame.\n");
> -			netbk_fatal_tx_err(vif);
> -			return -EIO;
> +
> +		/*
> +		 * If the guest submitted a frame >= 64 KiB then
> +		 * first->size overflowed and following frags will
> +		 * appear to be larger than the frame.
> +		 *
> +		 * This cannot be a fatal error as there are buggy
> +		 * frontends that do this.
> +		 *
> +		 * Consume all the frags and drop the packet.
> +		 */
> +		if (!drop && txp->size > first->size) {
> +			netdev_dbg(vif->dev, "Frag is bigger than frame.\n");
> +			drop = true;
>  		}
> 
>  		first->size -= txp->size;
> @@ -938,6 +949,12 @@ static int netbk_count_requests(struct xenvif *vif,
>  			return -EINVAL;
>  		}
>  	} while ((txp++)->flags & XEN_NETTXF_more_data);
> +
> +	if (drop) {
> +		netbk_tx_err(vif, txp, idx + frags);

This needs to be netbk_tx_err(vif, first, idx + frags) or the guest will
crash as we push a bunch of invalid responses.

David

> +		return -EIO;
> +	}
> +
>  	return frags;
>  }
> 
> @@ -1327,7 +1344,7 @@ static unsigned xen_netbk_tx_build_gops(struct xen_netbk *netbk)
>  				continue;
>  		}
> 
> -		ret = netbk_count_requests(vif, &txreq, txfrags, work_to_do);
> +		ret = netbk_count_requests(vif, &txreq, txfrags, work_to_do, idx);
>  		if (unlikely(ret < 0))
>  			continue;
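The comment in the suggested patch says that an overflowed first->size makes later frags look larger than the frame. This can be exercised in userspace. The sketch below models the count-and-drop logic only; it is not the netback code itself. struct demo_tx_req, DEMO_MORE_DATA and demo_count_requests() are hypothetical stand-ins, and the slot sizes in the usage note are invented for illustration. As in the loop quoted above, the first request's size is the whole packet length, and each follow-on slot's size is subtracted from it:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MORE_DATA (1u << 0)	/* stand-in for XEN_NETTXF_more_data */

struct demo_tx_req {
	uint16_t size;	/* first slot: whole packet length; others: slot length */
	uint16_t flags;
};

/*
 * Walk every follow-on slot of the packet in reqs[0..n). A frag larger than
 * what is left of the frame marks the packet for dropping, but the walk
 * continues so that all slots are consumed. *slots receives the number of
 * follow-on slots walked. Returns that count, or -1 if the packet is dropped.
 */
static int demo_count_requests(const struct demo_tx_req *reqs, size_t n, int *slots)
{
	uint16_t remaining = reqs[0].size;
	bool drop = false;
	int frags = 0;

	*slots = 0;
	if (!(reqs[0].flags & DEMO_MORE_DATA))
		return 0;

	do {
		const struct demo_tx_req *txp;

		frags++;
		if ((size_t)frags >= n)
			return -1;	/* chain runs past the ring: malformed */
		txp = &reqs[frags];
		/* An overflowed first size makes a later frag look too big. */
		if (!drop && txp->size > remaining)
			drop = true;
		if (!drop)
			remaining -= txp->size;
		*slots = frags;
	} while (reqs[frags].flags & DEMO_MORE_DATA);

	return drop ? -1 : frags;
}
```

Consider a 70000-byte packet whose head slot carries 1000 bytes and whose frags are sixteen 4096-byte slots followed by one 3464-byte slot. first->size reads 4464. After the first frag, only 368 bytes are left, so the second frag already appears bigger than the frame. The walk still consumes all 17 follow-on slots before reporting the drop.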

* Re: [Xen-devel] [PATCH 2/4] xen-netfront: drop skb when skb->len > 65535
From: Wei Liu @ 2013-03-21 13:40 UTC (permalink / raw)
  To: David Vrabel
  Cc: wei.liu2, Ian Campbell, netdev, annie.li, konrad.wilk, xen-devel

On Wed, 2013-03-20 at 20:02 +0000, David Vrabel wrote:

> >  		first->size -= txp->size;
> > @@ -938,6 +949,12 @@ static int netbk_count_requests(struct xenvif *vif,
> >  			return -EINVAL;
> >  		}
> >  	} while ((txp++)->flags & XEN_NETTXF_more_data);
> > +
> > +	if (drop) {
> > +		netbk_tx_err(vif, txp, idx + frags);
> 
> This needs to be netbk_tx_err(vif, first, idx + frags) or the guest will
> crash as we push a bunch of invalid responses.
> 

Can this really handle the situation where first->size overflows? In
that case frags == 0, so the call is in effect netbk_tx_err(vif, txp,
idx). idx is the ring index of the first txp, so aren't you only
responding to the head txp and ignoring the other tx requests for the
same skb?

Even if first->size doesn't overflow, a malicious or buggy frontend can
still generate a tx request with txp->size > first->size. In that case
there could also be some trailing tx requests left without a response.

I checked the code before the XSA-39 fix: its logic is more or less the
same, but it did work. My suspicion is that those trailing tx requests
were invalidated in later iterations of xen_netbk_tx_build_gops.

I think the correct action is to just take first txp and loop responding
until we consume the whole packet.


Wei.


> David
> 
> > +		return -EIO;
> > +	}
> > +
> >  	return frags;
> >  }
> > 
> > @@ -1327,7 +1344,7 @@ static unsigned xen_netbk_tx_build_gops(struct xen_netbk *netbk)
> >  				continue;
> >  		}
> > 
> > -		ret = netbk_count_requests(vif, &txreq, txfrags, work_to_do);
> > +		ret = netbk_count_requests(vif, &txreq, txfrags, work_to_do, idx);
> >  		if (unlikely(ret < 0))
> >  			continue;

* Re: [Xen-devel] [PATCH 2/4] xen-netfront: drop skb when skb->len > 65535
From: David Vrabel @ 2013-03-21 14:11 UTC (permalink / raw)
  To: Wei Liu; +Cc: Ian Campbell, netdev, annie.li, konrad.wilk, xen-devel

On 21/03/13 13:40, Wei Liu wrote:
> 
> 
> I think the correct action is to just take first txp and loop responding
> until we consume the whole packet.

Um.  This is what the patch is doing.

David

* Re: [Xen-devel] [PATCH 2/4] xen-netfront: drop skb when skb->len > 65535
  2013-03-21 14:11             ` [Xen-devel] " David Vrabel
  2013-03-21 14:15               ` Wei Liu
@ 2013-03-21 14:15               ` Wei Liu
  2013-03-21 14:20                 ` Wei Liu
  2013-03-21 14:20                 ` [Xen-devel] " Wei Liu
  1 sibling, 2 replies; 97+ messages in thread
From: Wei Liu @ 2013-03-21 14:15 UTC (permalink / raw)
  To: David Vrabel
  Cc: wei.liu2, Ian Campbell, netdev, annie.li, konrad.wilk, xen-devel

On Thu, 2013-03-21 at 14:11 +0000, David Vrabel wrote:
> On 21/03/13 13:40, Wei Liu wrote:
> > 
> > 
> > I think the correct action is to just take first txp and loop responding
> > until we consume the whole packet.
> 
> Um.  This is what the patch is doing.
> 

No. The idx you passed in is the index of the first txp, and idx + frags
doesn't necessarily point to the last tx request of the packet. We should
use XEN_NETTXF_more_data to loop through the packet.
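To illustrate why a precomputed count can fall short, here is a small sketch with hypothetical types: if the counting loop gives up partway through a packet (say with frags == 1), head + frags lands in the middle of the packet, while following the more_data flag finds the real end.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_TXF_MORE_DATA 0x4  /* hypothetical stand-in for XEN_NETTXF_more_data */

struct sketch_tx_req { uint16_t flags; uint16_t size; };

/*
 * Index one past the last slot of the packet starting at `head`, found by
 * following the more_data chain. If the chain runs off the n available
 * requests, n is returned.
 */
static size_t sketch_packet_end(const struct sketch_tx_req *reqs, size_t n,
				size_t head)
{
	size_t i = head;

	while (i < n && (reqs[i].flags & SKETCH_TXF_MORE_DATA))
		i++;
	return i < n ? i + 1 : n;
}
```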


Wei.

> David

* Re: [Xen-devel] [PATCH 2/4] xen-netfront: drop skb when skb->len > 65535
  2013-03-21 14:15               ` [Xen-devel] " Wei Liu
  2013-03-21 14:20                 ` Wei Liu
@ 2013-03-21 14:20                 ` Wei Liu
  1 sibling, 0 replies; 97+ messages in thread
From: Wei Liu @ 2013-03-21 14:20 UTC (permalink / raw)
  To: David Vrabel
  Cc: wei.liu2, Ian Campbell, netdev, annie.li, konrad.wilk, xen-devel

On Thu, 2013-03-21 at 14:15 +0000, Wei Liu wrote:
> On Thu, 2013-03-21 at 14:11 +0000, David Vrabel wrote:
> > On 21/03/13 13:40, Wei Liu wrote:
> > > 
> > > 
> > > I think the correct action is to just take first txp and loop responding
> > > until we consume the whole packet.
> > 
> > Um.  This is what the patch is doing.
> > 
> 
> No. The idx you passed in is the index of the first txp, and idx + frags
> doesn't necessarily point to the last tx request of the packet. We should use
> XEN_NETTXF_more_data to loop through the packet.
> 

Sorry for the noise; this patch already loops through the whole packet.


Wei.

> 
> Wei.
> 
> > David
> 
> 

* Re: [PATCH 4/4] xen-netback: coalesce slots before copying
  2013-03-18 12:07   ` Ian Campbell
  2013-03-21 18:37     ` Wei Liu
@ 2013-03-21 18:37     ` Wei Liu
  1 sibling, 0 replies; 97+ messages in thread
From: Wei Liu @ 2013-03-21 18:37 UTC (permalink / raw)
  To: Ian Campbell; +Cc: wei.liu2, netdev, xen-devel, konrad.wilk, annie.li

On Mon, 2013-03-18 at 12:07 +0000, Ian Campbell wrote:

> >  	/* Skip first skb fragment if it is on same page as header fragment. */
> >  	start = (frag_get_pending_idx(&shinfo->frags[0]) == pending_idx);
> >  
> > -	for (i = start; i < shinfo->nr_frags; i++, txp++) {
> > -		struct page *page;
> > -		pending_ring_idx_t index;
> > +	/* Coalesce tx requests, at this point the packet passed in
> > +	 * should be <= 64K. Any packets larger than 64K has been
> > +	 * dropped / caused fatal error early on.
> 
> Whereabouts is this? Since the size field is u16 how do we even detect
> this case. Since (at least prior to your other fix in this series) it
> would have overflowed when the guest constructed the request.
> 

This is done in netbk_count_requests(). I will fix the comment here.

> 
> > @@ -1025,6 +1108,7 @@ static int xen_netbk_tx_check_gop(struct xen_netbk *netbk,
> >  	struct gnttab_copy *gop = *gopp;
> >  	u16 pending_idx = *((u16 *)skb->data);
> >  	struct skb_shared_info *shinfo = skb_shinfo(skb);
> > +	struct pending_tx_info *tx_info;
> >  	int nr_frags = shinfo->nr_frags;
> >  	int i, err, start;
> >  
> > @@ -1037,12 +1121,17 @@ static int xen_netbk_tx_check_gop(struct xen_netbk *netbk,
> >  	start = (frag_get_pending_idx(&shinfo->frags[0]) == pending_idx);
> >  
> >  	for (i = start; i < nr_frags; i++) {
> > -		int j, newerr;
> > +		int j, newerr = 0, n;
> >  
> >  		pending_idx = frag_get_pending_idx(&shinfo->frags[i]);
> > +		tx_info = &netbk->pending_tx_info[pending_idx];
> >  
> >  		/* Check error status: if okay then remember grant handle. */
> > -		newerr = (++gop)->status;
> > +		for (n = 0; n < tx_info->nr_tx_req; n++) {
> struct pending_tx_info is used in some arrays which can have a fair few
> elements so if there are ways to reduce the size that is worth
> considering I think.
> 
> So rather than storing both nr_tx_req and start_idx can we just store
> start_idx and loop while start_idx != 0 (where the first one has
> start_idx == zero)?
> 
> This might fall out more naturally if you were to instead store next_idx
> in each pending tx with a suitable terminator at the end? Or could be
> last_idx if it is convenient to count that way round, you don't need to
> respond in-order.
> 

Done shrinking this structure.
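One way to realize Ian's suggestion is sketched below, with hypothetical names throughout: rather than storing both `nr_tx_req` and `start_idx`, each pending entry stores only the index of the next pending slot for the same frag, with a sentinel terminating the chain. The `status` field is a stand-in for the grant copy status that the real code reads from the gop array.

```c
#include <stdint.h>

#define SKETCH_MAX_PENDING 16u
#define SKETCH_INVALID_IDX 0xFFFFu  /* hypothetical chain terminator */

/* Hypothetical slimmed-down entry: just a link to the next slot. */
struct sketch_pending_tx_info {
	uint16_t next_idx;
	int16_t  status;   /* stand-in for the copy status of this slot */
};

/*
 * Walk the chain starting at head. Returns the number of slots visited and
 * stores the first non-zero status in *first_err (0 if all succeeded).
 * The step bound guards against a corrupted, cyclic chain.
 */
static unsigned sketch_chain_walk(const struct sketch_pending_tx_info *info,
				  uint16_t head, int *first_err)
{
	unsigned n = 0;
	uint16_t idx = head;

	*first_err = 0;
	while (idx != SKETCH_INVALID_IDX && idx < SKETCH_MAX_PENDING &&
	       n < SKETCH_MAX_PENDING) {
		if (*first_err == 0)
			*first_err = info[idx].status;
		idx = info[idx].next_idx;
		n++;
	}
	return n;
}
```

Because responses need not be in order, the chain can link slots in whatever order is convenient to build.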


Wei.

* Re: [PATCH 4/4] xen-netback: coalesce slots before copying
  2013-03-18 13:27     ` James Harper
@ 2013-03-21 19:08       ` Wei Liu
  2013-03-21 22:14         ` James Harper
  2013-03-21 22:14         ` [Xen-devel] " James Harper
  0 siblings, 2 replies; 97+ messages in thread
From: Wei Liu @ 2013-03-21 19:08 UTC (permalink / raw)
  To: James Harper
  Cc: Wei Liu, ian.campbell, konrad.wilk, netdev, xen-devel, annie.li

On Mon, Mar 18, 2013 at 1:27 PM, James Harper <james.harper@bendigoit.com.au> wrote:
>
> Actually it turns out GPLPV just stops counting at 20. If I keep counting
> I can sometimes see over 1000 buffers per GSO packet under Windows using
> "iperf -


Do you think it is necessary to increase MAX_SKB_SLOTS_DEFAULT to 21?


Wei.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

* RE: [Xen-devel] [PATCH 4/4] xen-netback: coalesce slots before copying
  2013-03-21 19:08       ` Wei Liu
  2013-03-21 22:14         ` James Harper
@ 2013-03-21 22:14         ` James Harper
  2013-03-22 11:06           ` Wei Liu
  2013-03-22 11:06           ` Wei Liu
  1 sibling, 2 replies; 97+ messages in thread
From: James Harper @ 2013-03-21 22:14 UTC (permalink / raw)
  To: Wei Liu; +Cc: Wei Liu, netdev, xen-devel, annie.li, ian.campbell, konrad.wilk

> 
>> Actually it turns out GPLPV just stops counting at 20. If I keep
>> counting I can sometimes see over 1000 buffers per GSO packet under
>> Windows using "iperf -
> 
> Do you think it is necessary to increase MAX_SKB_SLOTS_DEFAULT to 21?
> 

Doesn't really matter. Under Windows you have to coalesce anyway, and the number of cases where the skb count is 20 or 21 is very small, so there will be negligible gain, and it will break guests that can't handle more than 19.

Has anyone benchmarked whether using memcpy to coalesce is better or worse than consuming additional ring slots? Probably OT here, but I'm talking about packets that might have 19 buffers but could fit in a page or two once coalesced.
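For reference, the memcpy-coalescing side of that trade-off can be sketched as below. This is not GPLPV's code; the buffer descriptor, the contiguous page array and the 4096-byte page size are assumptions. Nineteen 300-byte buffers (5700 bytes) end up in two page-sized slots instead of nineteen.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size */

/* Hypothetical scatter-gather entry. */
struct sketch_buf { const unsigned char *data; size_t len; };

/*
 * Copy nbufs scattered buffers back-to-back into `pages`, a contiguous area
 * of max_pages page-sized chunks. Returns the number of pages (ring slots)
 * used, or 0 if the data does not fit (also 0 for empty input).
 */
static size_t sketch_coalesce(const struct sketch_buf *bufs, size_t nbufs,
			      unsigned char *pages, size_t max_pages)
{
	size_t off = 0;
	size_t cap = max_pages * SKETCH_PAGE_SIZE;

	for (size_t i = 0; i < nbufs; i++) {
		if (bufs[i].len > cap - off)
			return 0;
		memcpy(pages + off, bufs[i].data, bufs[i].len);
		off += bufs[i].len;
	}
	return (off + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}
```

Whether the copy beats spending extra ring slots would still need measuring, as James asks.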

James


* Re: [Xen-devel] [PATCH 4/4] xen-netback: coalesce slots before copying
  2013-03-21 22:14         ` [Xen-devel] " James Harper
@ 2013-03-22 11:06           ` Wei Liu
  2013-03-22 11:19             ` James Harper
  2013-03-22 11:19             ` [Xen-devel] " James Harper
  2013-03-22 11:06           ` Wei Liu
  1 sibling, 2 replies; 97+ messages in thread
From: Wei Liu @ 2013-03-22 11:06 UTC (permalink / raw)
  To: James Harper
  Cc: Wei Liu, netdev, xen-devel, annie.li, Ian Campbell, konrad.wilk

On Thu, Mar 21, 2013 at 10:14:17PM +0000, James Harper wrote:
> > 
> >> Actually it turns out GPLPV just stops counting at 20. If I keep
> >> counting I can sometimes see over 1000 buffers per GSO packet under
> >> Windows using "iperf -
> > 
> > Do you think it is necessary to increase MAX_SKB_SLOTS_DEFAULT to 21?
> > 
> 
> Doesn't really matter. Under windows you have to coalesce anyway and the number of cases where the skb count is 20 or 21 is very small so there will be negligible gain and it will break guests that can't handle more than 19.

It's not about performance, it's about usability. If the frontend uses
more slots than the backend allows, it gets disconnected. So that we don't
push the wrong value upstream, it is important to know whether 20 is
enough for the Windows PV driver.
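A minimal sketch of the usability problem, assuming hypothetical types and treating any excess as fatal: the backend counts the slots a packet uses and, past its limit (20 here, the value under discussion), gives up on the frontend rather than just dropping the packet.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_TXF_MORE_DATA 0x4  /* hypothetical stand-in for XEN_NETTXF_more_data */

struct sketch_tx_req { uint16_t flags; uint16_t size; };

enum sketch_verdict { SKETCH_OK, SKETCH_DISCONNECT };

/*
 * Count the slots of the packet starting at reqs[0]. Using more than
 * max_slots is reported as fatal: in this sketch, that is where the backend
 * would disconnect the frontend. Running past the n published requests is
 * also treated as fatal here (an assumption made for brevity).
 */
static enum sketch_verdict sketch_check_slots(const struct sketch_tx_req *reqs,
					      size_t n, unsigned max_slots,
					      unsigned *slots_out)
{
	unsigned slots = 0;

	for (size_t i = 0; ; i++) {
		if (i >= n || ++slots > max_slots)
			return SKETCH_DISCONNECT;
		if (!(reqs[i].flags & SKETCH_TXF_MORE_DATA))
			break;
	}
	*slots_out = slots;
	return SKETCH_OK;
}
```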

> 
> Has anyone done the benchmarks on if memcpy to coalesce is better or worse than consuming additional ring slots? Probably OT here but I'm talking about packets that might have 19 buffers but could fit on a page or two of coalesced.
> 

After this changeset the number of grant copy operations is greater than
or equal to the number of slots. I ran iperf as my functional test and
noticed that the results are in the same range as before this change.

A future improvement would be to use compound pages in the backend, which
could make the number of grant copy ops more or less equal to the number
of slots used.
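The relationship between slots and grant copy ops can be illustrated with a small counting sketch (not netback's code; slot sizes and page sizes are made-up inputs). Packing slots back-to-back into destination pages costs one copy op per (slot, destination page) pair, so a slot that straddles a page boundary costs two.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Count the copy operations needed to pack slots of the given sizes
 * back-to-back into destination pages of page_size bytes (page_size > 0).
 * A slot whose bytes straddle a destination page boundary needs one copy
 * per page it touches, so ops >= number of non-empty slots.
 */
static unsigned sketch_count_copy_ops(const uint16_t *sizes, size_t nslots,
				      uint32_t page_size, unsigned *pages_out)
{
	unsigned ops = 0, pages = 0;
	uint32_t dst_off = 0;   /* fill level of the current destination page */

	for (size_t i = 0; i < nslots; i++) {
		uint32_t left = sizes[i];

		while (left > 0) {
			uint32_t room, chunk;

			if (dst_off == 0)
				pages++;        /* start a fresh destination page */
			room = page_size - dst_off;
			chunk = left < room ? left : room;
			ops++;              /* one copy op per (slot, dest page) pair */
			left -= chunk;
			dst_off = (dst_off + chunk) % page_size;
		}
	}
	*pages_out = pages;
	return ops;
}
```

With slots of 3000, 3000 and 2192 bytes, 4096-byte pages need four ops; a 16 KiB destination (standing in for a compound page) needs three, one per slot, which is the effect described above.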


Wei.

> James
> 

* RE: [Xen-devel] [PATCH 4/4] xen-netback: coalesce slots before copying
  2013-03-22 11:06           ` Wei Liu
  2013-03-22 11:19             ` James Harper
@ 2013-03-22 11:19             ` James Harper
  2013-03-22 11:28               ` Wei Liu
  2013-03-22 11:28               ` [Xen-devel] " Wei Liu
  1 sibling, 2 replies; 97+ messages in thread
From: James Harper @ 2013-03-22 11:19 UTC (permalink / raw)
  To: Wei Liu; +Cc: Wei Liu, netdev, xen-devel, annie.li, Ian Campbell, konrad.wilk

> 
> On Thu, Mar 21, 2013 at 10:14:17PM +0000, James Harper wrote:
> > >
> > >> Actually it turns out GPLPV just stops counting at 20. If I keep
> > >> counting I can sometimes see over 1000 buffers per GSO packet under
> > >> Windows using "iperf -
> > >
> > > Do you think it is necessary to increase MAX_SKB_SLOTS_DEFAULT to 21?
> > >
> >
> > Doesn't really matter. Under windows you have to coalesce anyway and
> > the number of cases where the skb count is 20 or 21 is very small so there will
> > be negligible gain and it will break guests that can't handle more than 19.
> 
> It's not about performance, it's about usability. If frontend uses more
> slots than backend allows it to, it gets disconnected. In case we don't
> push the wrong value upstream, it is important to know whether 20 is
> enough for Windows PV driver.
> 

Windows will accept whatever you throw at it (there may be some upper limit, but I suspect it's quite high). Whatever Linux will accept, it will be less than the 1000+ buffers that Windows can generate, so some degree of coalescing will be required for Windows->Linux.

In GPLPV I already coalesce anything with more than 19 buffers, because I have no guarantee that Dom0 will accept anything more (and who knows what Solaris or BSD will accept, if those are still valid backends...). So whatever you increase Dom0's limit to won't matter: I would still need to assume that Linux can't accept more than 19 until Dom0 (or a driver domain) advertises the maximum buffer count it supports in xenstore...

So do what you need to do to make Linux work, just don't put the erroneous comment that "windows has a maximum of 20 buffers" or whatever it was in the comments :)
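The frontend policy James describes can be summarized in a short sketch. The function name and the idea of an advertised maximum are hypothetical (no such xenstore key exists per this thread); 19 is the conservative limit GPLPV assumes, and coalesced slots are counted as whole pages.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE        4096u  /* assumed page size */
#define SKETCH_DEFAULT_MAX_BUFS 19u    /* conservative limit GPLPV assumes */

/*
 * Decide how many ring slots a packet will use. advertised_max is the
 * backend's published maximum (hypothetical), or 0 if none was published,
 * in which case the conservative default applies. *coalesce is set when
 * the buffers must be copied together first.
 */
static size_t sketch_slots_for_packet(size_t nbufs, size_t total_len,
				      size_t advertised_max, int *coalesce)
{
	size_t limit = advertised_max ? advertised_max : SKETCH_DEFAULT_MAX_BUFS;

	*coalesce = nbufs > limit;
	if (!*coalesce)
		return nbufs;
	return (total_len + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}
```

A 1000-buffer, 65000-byte GSO packet becomes 16 page-sized slots, under the 19-slot assumption either way.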

James

* Re: [Xen-devel] [PATCH 4/4] xen-netback: coalesce slots before copying
  2013-03-22 11:19             ` [Xen-devel] " James Harper
  2013-03-22 11:28               ` Wei Liu
@ 2013-03-22 11:28               ` Wei Liu
  1 sibling, 0 replies; 97+ messages in thread
From: Wei Liu @ 2013-03-22 11:28 UTC (permalink / raw)
  To: James Harper
  Cc: Wei Liu, netdev, xen-devel, annie.li, Ian Campbell, konrad.wilk

On Fri, Mar 22, 2013 at 11:19:56AM +0000, James Harper wrote:
> > 
> > On Thu, Mar 21, 2013 at 10:14:17PM +0000, James Harper wrote:
> > > >
> > > >> Actually it turns out GPLPV just stops counting at 20. If I keep
> > > >> counting I can sometimes see over 1000 buffers per GSO packet under
> > > >> Windows using "iperf -
> > > >
> > > > Do you think it is necessary to increase MAX_SKB_SLOTS_DEFAULT to 21?
> > > >
> > >
> > > Doesn't really matter. Under windows you have to coalesce anyway and
> > > the number of cases where the skb count is 20 or 21 is very small so there will
> > > be negligible gain and it will break guests that can't handle more than 19.
> > 
> > It's not about performance, it's about usability. If frontend uses more
> > slots than backend allows it to, it gets disconnected. In case we don't
> > push the wrong value upstream, it is important to know whether 20 is
> > enough for Windows PV driver.
> > 
> 
> Windows will accept whatever you throw at it (there may be some upper limit, but I suspect it's quite high). Whatever Linux will accept, it will be less than the 1000+ buffers that Windows can generate, so some degree of coalescing will be required for Windows->Linux.
> 
> In GPLPV I already coalesce anything with more than 19 buffers, because I have no guarantee that Dom0 will accept anything more (and who knows what Solaris or BSD will accept, if those are still valid backends...), so whatever you increase Dom0 to won't matter because I would still need to assume that Linux can't accept more than 19, until such time as Dom0 (or driver domain) advertises the maximum buffer count it can support in xenstore...
> 
> So do what you need to do to make Linux work, just don't put the erroneous comment that "windows has a maximum of 20 buffers" or whatever it was in the comments :)
> 

OK, problem solved. :-)


Wei.

> James
> 

* Re: [Xen-devel] [PATCH 1/4] xen-netfront: remove unused variable `extra'
  2013-03-19 15:26             ` [Xen-devel] " Wei Liu
@ 2013-04-09 14:28               ` Ian Campbell
  2013-04-09 14:28               ` Ian Campbell
  1 sibling, 0 replies; 97+ messages in thread
From: Ian Campbell @ 2013-04-09 14:28 UTC (permalink / raw)
  To: Wei Liu; +Cc: Paul Durrant, annie li, netdev, konrad.wilk, xen-devel

(apologies for the late reply, I've been away)

On Tue, 2013-03-19 at 15:26 +0000, Wei Liu wrote:
> I think Ian's (and my) idea of redundant is that this 'extra' variable
> is never used in the code now and causes confusion. It can be removed
> now and add back in the future if necessary.

Right, the "extra" I was questioning at the top was a local variable in
the Linux code, not the XEN_NETIF_EXTRA_FLAG_MORE thing. Although the
variable was related to the handling of that flag, it was only ever
written and never read...

Ian.

* Re: [PATCH 2/4] xen-netfront: drop skb when skb->len > 65535
  2013-03-19 21:28               ` Ben Hutchings
  2013-04-09 14:30                 ` Ian Campbell
@ 2013-04-09 14:30                 ` Ian Campbell
  2013-04-09 14:45                   ` Ben Hutchings
  2013-04-09 14:45                   ` Ben Hutchings
  1 sibling, 2 replies; 97+ messages in thread
From: Ian Campbell @ 2013-04-09 14:30 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: Wei Liu, netdev, xen-devel, konrad.wilk, annie.li

On Tue, 2013-03-19 at 21:28 +0000, Ben Hutchings wrote:
> On Tue, 2013-03-19 at 21:24 +0000, Ben Hutchings wrote:
> > On Mon, 2013-03-18 at 15:07 +0000, Ian Campbell wrote:
> > > On Mon, 2013-03-18 at 15:04 +0000, Wei Liu wrote:
> > > > On Mon, 2013-03-18 at 14:54 +0000, Ian Campbell wrote:
> > > > > On Mon, 2013-03-18 at 14:40 +0000, Wei Liu wrote:
> > > > > > On Mon, 2013-03-18 at 11:42 +0000, Ian Campbell wrote:
> > > > > > > On Mon, 2013-03-18 at 10:35 +0000, Wei Liu wrote:
> > > > > > > > The `size' field of Xen network wire format is uint16_t, anything bigger than
> > > > > > > > 65535 will cause overflow.
> > > > > > > > 
> > > > > > > > Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> > > > > > > > ---
> > > > > > > >  drivers/net/xen-netfront.c |   12 ++++++++++++
> > > > > > > >  1 file changed, 12 insertions(+)
> > > > > > > > 
> > > > > > > > diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> > > > > > > > index 5527663..8c3d065 100644
> > > > > > > > --- a/drivers/net/xen-netfront.c
> > > > > > > > +++ b/drivers/net/xen-netfront.c
> > > > > > > > @@ -547,6 +547,18 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
> > > > > > > >  	unsigned int len = skb_headlen(skb);
> > > > > > > >  	unsigned long flags;
> > > > > > > >  
> > > > > > > > +	/*
> > > > > > > > +	 * wire format of xen_netif_tx_request only supports skb->len
> > > > > > > > +	 * < 64K, because size field in xen_netif_tx_request is
> > > > > > > > +	 * uint16_t.
> > > > > > > 
> > > > > > > Is there some field we can set e.g. in struct ethernet_device which
> > > > > > > would stop this from happening?
> > > > > > > 
> > > > > > 
> > > > > > struct ethernet_device? I could not find it.
> > > > > > 
> > > > > > And for struct net_device,
> > > > > 
> > > > > I meant struct net_device.
> > > > > 
> > > > > >  there is no field for this AFAICT.
> > > > > 
> > > > > Interesting. Are hardware devices expected to cope with arbitrary sized
> > > > > GSO skbs then I wonder.
> > > > > 
> > > > 
> > > > No idea. But there is a macro called GSO_MAX_SIZE (65536) in struct
> > > > net_device. :-)
> > > 
> > > But aren't we seeing skb's bigger than that?
> > > 
> > > Maybe this is just a historical bug in some older guests?
> > 
> > GSO_MAX_SIZE is the maximum payload length, not the maximum total length
> > of an skb.
> 
> ...and it's actually just the default value assigned to
> dev->gso_max_size.  You'll want to change it to your actual maximum
> (65535 - maximum length of headers) before registering your net devices.

Thanks. 

"maximum length of headers" might be a bit tricky to determine
generically :-(.

Ian.
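A minimal userspace sketch of the overflow the patch guards against. It assumes only what the commit message states: the `size' field of the tx request is a uint16_t. The struct below is a hypothetical stand-in, not the real xen_netif_tx_request from the Xen headers, and the helper names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the 16-bit `size' field in xen_netif_tx_request. */
struct tx_request_model {
	uint16_t size;
};

/* Largest total frame length a uint16_t can represent (65535). */
#define TX_SIZE_MAX UINT16_MAX

/* The condition the patch enforces: anything longer must be dropped. */
static bool tx_len_fits(unsigned int skb_len)
{
	return skb_len <= TX_SIZE_MAX;
}

/* What would be stored if an oversized length were written anyway:
 * conversion to uint16_t keeps only the low 16 bits. */
static uint16_t tx_size_as_stored(unsigned int skb_len)
{
	struct tx_request_model req = { .size = (uint16_t)skb_len };

	return req.size;
}
```

For example, tx_size_as_stored(65536) yields 0 and tx_size_as_stored(65537) yields 1, which is the silent wrap-around that dropping the skb avoids.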

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 2/4] xen-netfront: drop skb when skb->len > 65535
  2013-04-09 14:30                 ` Ian Campbell
@ 2013-04-09 14:45                   ` Ben Hutchings
  2013-04-09 14:53                     ` [Xen-devel] " Christoph Egger
  2013-04-09 14:53                     ` Christoph Egger
  2013-04-09 14:45                   ` Ben Hutchings
  1 sibling, 2 replies; 97+ messages in thread
From: Ben Hutchings @ 2013-04-09 14:45 UTC (permalink / raw)
  To: Ian Campbell; +Cc: Wei Liu, netdev, xen-devel, konrad.wilk, annie.li

On Tue, 2013-04-09 at 15:30 +0100, Ian Campbell wrote:
> On Tue, 2013-03-19 at 21:28 +0000, Ben Hutchings wrote:
> > On Tue, 2013-03-19 at 21:24 +0000, Ben Hutchings wrote:
> > > On Mon, 2013-03-18 at 15:07 +0000, Ian Campbell wrote:
> > > > On Mon, 2013-03-18 at 15:04 +0000, Wei Liu wrote:
> > > > > On Mon, 2013-03-18 at 14:54 +0000, Ian Campbell wrote:
> > > > > > On Mon, 2013-03-18 at 14:40 +0000, Wei Liu wrote:
> > > > > > > On Mon, 2013-03-18 at 11:42 +0000, Ian Campbell wrote:
> > > > > > > > On Mon, 2013-03-18 at 10:35 +0000, Wei Liu wrote:
> > > > > > > > > The `size' field of Xen network wire format is uint16_t, anything bigger than
> > > > > > > > > 65535 will cause overflow.
> > > > > > > > > 
> > > > > > > > > Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> > > > > > > > > ---
> > > > > > > > >  drivers/net/xen-netfront.c |   12 ++++++++++++
> > > > > > > > >  1 file changed, 12 insertions(+)
> > > > > > > > > 
> > > > > > > > > diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> > > > > > > > > index 5527663..8c3d065 100644
> > > > > > > > > --- a/drivers/net/xen-netfront.c
> > > > > > > > > +++ b/drivers/net/xen-netfront.c
> > > > > > > > > @@ -547,6 +547,18 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
> > > > > > > > >  	unsigned int len = skb_headlen(skb);
> > > > > > > > >  	unsigned long flags;
> > > > > > > > >  
> > > > > > > > > +	/*
> > > > > > > > > +	 * wire format of xen_netif_tx_request only supports skb->len
> > > > > > > > > +	 * < 64K, because size field in xen_netif_tx_request is
> > > > > > > > > +	 * uint16_t.
> > > > > > > > 
> > > > > > > > Is there some field we can set e.g. in struct ethernet_device which
> > > > > > > > would stop this from happening?
> > > > > > > > 
> > > > > > > 
> > > > > > > struct ethernet_device? I could not find it.
> > > > > > > 
> > > > > > > And for struct net_device,
> > > > > > 
> > > > > > I meant struct net_device.
> > > > > > 
> > > > > > >  there is no field for this AFAICT.
> > > > > > 
> > > > > > Interesting. Are hardware devices expected to cope with arbitrary sized
> > > > > > GSO skbs then I wonder.
> > > > > > 
> > > > > 
> > > > > No idea. But there is a macro called GSO_MAX_SIZE (65536) in struct
> > > > > net_device. :-)
> > > > 
> > > > But aren't we seeing skb's bigger than that?
> > > > 
> > > > Maybe this is just a historical bug in some older guests?
> > > 
> > > GSO_MAX_SIZE is the maximum payload length, not the maximum total length
> > > of an skb.
> > 
> > ...and it's actually just the default value assigned to
> > dev->gso_max_size.  You'll want to change it to your actual maximum
> > (65535 - maximum length of headers) before registering your net devices.
> 
> Thanks. 
> 
> "maximum length of headers" might be a bit tricky to determine
> generically :-(.

Well you don't need to be generic, you need to know the maximum length
of headers that might appear in a TSO skb.

Ethernet + VLAN tag + IPv6 + TCP + timestamp option = 90 bytes, but I'm
not sure whether there can be other IP or TCP options in a TSO skb.  I'd
really like to get the TSO requirements clearly documented somewhere.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
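Ben's 90-byte figure and the gso_max_size suggestion can be checked with a little arithmetic. A minimal sketch, assuming the standard header sizes below (14-byte Ethernet header, 4-byte 802.1Q tag, 40-byte fixed IPv6 header, 20-byte TCP header, 12-byte timestamp option including two NOPs of padding). The constant and function names are illustrative and not taken from netfront. In the driver itself, the result would presumably be stored in dev->gso_max_size (for example via netif_set_gso_max_size()) before register_netdev(), as suggested earlier in the thread.

```c
#include <assert.h>

/* Worst-case header sizes (bytes) from Ben's list; names are illustrative. */
enum {
	HDR_ETH    = 14, /* destination MAC + source MAC + EtherType */
	HDR_VLAN   = 4,  /* one 802.1Q tag */
	HDR_IPV6   = 40, /* fixed IPv6 header, no extension headers */
	HDR_TCP    = 20, /* TCP header without options */
	HDR_TCP_TS = 12, /* 10-byte timestamp option + 2 bytes of NOP padding */
};

/* Limit imposed by the 16-bit `size' field of the tx request. */
#define WIRE_LEN_MAX 65535u

static unsigned int max_tso_header_len(void)
{
	return HDR_ETH + HDR_VLAN + HDR_IPV6 + HDR_TCP + HDR_TCP_TS;
}

/* Candidate gso_max_size: headers plus payload must still fit in 16 bits. */
static unsigned int gso_size_limit(unsigned int max_hdr_len)
{
	assert(max_hdr_len < WIRE_LEN_MAX);
	return WIRE_LEN_MAX - max_hdr_len;
}
```

With these numbers the sketch gives 90 bytes of headers and a limit of 65535 - 90 = 65445. It covers only the headers Ben lists; as he notes, other IP or TCP options might also appear in a TSO skb.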

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Xen-devel] [PATCH 2/4] xen-netfront: drop skb when skb->len > 65535
  2013-04-09 14:45                   ` Ben Hutchings
@ 2013-04-09 14:53                     ` Christoph Egger
  2013-04-09 14:59                       ` Ben Hutchings
  2013-04-09 14:59                       ` [Xen-devel] " Ben Hutchings
  2013-04-09 14:53                     ` Christoph Egger
  1 sibling, 2 replies; 97+ messages in thread
From: Christoph Egger @ 2013-04-09 14:53 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Ian Campbell, netdev, annie.li, konrad.wilk, Wei Liu, xen-devel

On 09.04.13 16:45, Ben Hutchings wrote:
> On Tue, 2013-04-09 at 15:30 +0100, Ian Campbell wrote:
>> On Tue, 2013-03-19 at 21:28 +0000, Ben Hutchings wrote:
>>> On Tue, 2013-03-19 at 21:24 +0000, Ben Hutchings wrote:
>>>> On Mon, 2013-03-18 at 15:07 +0000, Ian Campbell wrote:
>>>>> On Mon, 2013-03-18 at 15:04 +0000, Wei Liu wrote:
>>>>>> On Mon, 2013-03-18 at 14:54 +0000, Ian Campbell wrote:
>>>>>>> On Mon, 2013-03-18 at 14:40 +0000, Wei Liu wrote:
>>>>>>>> On Mon, 2013-03-18 at 11:42 +0000, Ian Campbell wrote:
>>>>>>>>> On Mon, 2013-03-18 at 10:35 +0000, Wei Liu wrote:
>>>>>>>>>> The `size' field of Xen network wire format is uint16_t, anything bigger than
>>>>>>>>>> 65535 will cause overflow.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
>>>>>>>>>> ---
>>>>>>>>>>   drivers/net/xen-netfront.c |   12 ++++++++++++
>>>>>>>>>>   1 file changed, 12 insertions(+)
>>>>>>>>>>
>>>>>>>>>> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
>>>>>>>>>> index 5527663..8c3d065 100644
>>>>>>>>>> --- a/drivers/net/xen-netfront.c
>>>>>>>>>> +++ b/drivers/net/xen-netfront.c
>>>>>>>>>> @@ -547,6 +547,18 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
>>>>>>>>>>   	unsigned int len = skb_headlen(skb);
>>>>>>>>>>   	unsigned long flags;
>>>>>>>>>>
>>>>>>>>>> +	/*
>>>>>>>>>> +	 * wire format of xen_netif_tx_request only supports skb->len
>>>>>>>>>> +	 * < 64K, because size field in xen_netif_tx_request is
>>>>>>>>>> +	 * uint16_t.
>>>>>>>>>
>>>>>>>>> Is there some field we can set e.g. in struct ethernet_device which
>>>>>>>>> would stop this from happening?
>>>>>>>>>
>>>>>>>>
>>>>>>>> struct ethernet_device? I could not find it.
>>>>>>>>
>>>>>>>> And for struct net_device,
>>>>>>>
>>>>>>> I meant struct net_device.
>>>>>>>
>>>>>>>>   there is no field for this AFAICT.
>>>>>>>
>>>>>>> Interesting. Are hardware devices expected to cope with arbitrary sized
>>>>>>> GSO skbs then I wonder.
>>>>>>>
>>>>>>
>>>>>> No idea. But there is a macro called GSO_MAX_SIZE (65536) in struct
>>>>>> net_device. :-)
>>>>>
>>>>> But aren't we seeing skb's bigger than that?
>>>>>
>>>>> Maybe this is just a historical bug in some older guests?
>>>>
>>>> GSO_MAX_SIZE is the maximum payload length, not the maximum total length
>>>> of an skb.
>>>
>>> ...and it's actually just the default value assigned to
>>> dev->gso_max_size.  You'll want to change it to your actual maximum
>>> (65535 - maximum length of headers) before registering your net devices.
>>
>> Thanks.
>>
>> "maximum length of headers" might be a bit tricky to determine
>> generically :-(.
>
> Well you don't need to be generic, you need to know the maximum length
> of headers that might appear in a TSO skb.
>
> Ethernet + VLAN tag + IPv6 + TCP + timestamp option = 90 bytes, but I'm
> not sure whether there can be other IP or TCP options in a TSO skb.  I'd
> really like to get the TSO requirements clearly documented somewhere.

What about encapsulated IPSEC, IP-in-IP-tunnels, etc. ?

Christoph

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [Xen-devel] [PATCH 2/4] xen-netfront: drop skb when skb->len > 65535
  2013-04-09 14:53                     ` [Xen-devel] " Christoph Egger
  2013-04-09 14:59                       ` Ben Hutchings
@ 2013-04-09 14:59                       ` Ben Hutchings
  1 sibling, 0 replies; 97+ messages in thread
From: Ben Hutchings @ 2013-04-09 14:59 UTC (permalink / raw)
  To: Christoph Egger
  Cc: Ian Campbell, netdev, annie.li, konrad.wilk, Wei Liu, xen-devel

On Tue, 2013-04-09 at 16:53 +0200, Christoph Egger wrote:
> On 09.04.13 16:45, Ben Hutchings wrote:
> > On Tue, 2013-04-09 at 15:30 +0100, Ian Campbell wrote:
> >> On Tue, 2013-03-19 at 21:28 +0000, Ben Hutchings wrote:
> >>> On Tue, 2013-03-19 at 21:24 +0000, Ben Hutchings wrote:
> >>>> On Mon, 2013-03-18 at 15:07 +0000, Ian Campbell wrote:
> >>>>> On Mon, 2013-03-18 at 15:04 +0000, Wei Liu wrote:
> >>>>>> On Mon, 2013-03-18 at 14:54 +0000, Ian Campbell wrote:
> >>>>>>> On Mon, 2013-03-18 at 14:40 +0000, Wei Liu wrote:
> >>>>>>>> On Mon, 2013-03-18 at 11:42 +0000, Ian Campbell wrote:
> >>>>>>>>> On Mon, 2013-03-18 at 10:35 +0000, Wei Liu wrote:
> >>>>>>>>>> The `size' field of Xen network wire format is uint16_t, anything bigger than
> >>>>>>>>>> 65535 will cause overflow.
> >>>>>>>>>>
> >>>>>>>>>> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> >>>>>>>>>> ---
> >>>>>>>>>>   drivers/net/xen-netfront.c |   12 ++++++++++++
> >>>>>>>>>>   1 file changed, 12 insertions(+)
> >>>>>>>>>>
> >>>>>>>>>> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> >>>>>>>>>> index 5527663..8c3d065 100644
> >>>>>>>>>> --- a/drivers/net/xen-netfront.c
> >>>>>>>>>> +++ b/drivers/net/xen-netfront.c
> >>>>>>>>>> @@ -547,6 +547,18 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
> >>>>>>>>>>   	unsigned int len = skb_headlen(skb);
> >>>>>>>>>>   	unsigned long flags;
> >>>>>>>>>>
> >>>>>>>>>> +	/*
> >>>>>>>>>> +	 * wire format of xen_netif_tx_request only supports skb->len
> >>>>>>>>>> +	 * < 64K, because size field in xen_netif_tx_request is
> >>>>>>>>>> +	 * uint16_t.
> >>>>>>>>>
> >>>>>>>>> Is there some field we can set e.g. in struct ethernet_device which
> >>>>>>>>> would stop this from happening?
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> struct ethernet_device? I could not find it.
> >>>>>>>>
> >>>>>>>> And for struct net_device,
> >>>>>>>
> >>>>>>> I meant struct net_device.
> >>>>>>>
> >>>>>>>>   there is no field for this AFAICT.
> >>>>>>>
> >>>>>>> Interesting. Are hardware devices expected to cope with arbitrary sized
> >>>>>>> GSO skbs then I wonder.
> >>>>>>>
> >>>>>>
> >>>>>> No idea. But there is a macro called GSO_MAX_SIZE (65536) in struct
> >>>>>> net_device. :-)
> >>>>>
> >>>>> But aren't we seeing skb's bigger than that?
> >>>>>
> >>>>> Maybe this is just a historical bug in some older guests?
> >>>>
> >>>> GSO_MAX_SIZE is the maximum payload length, not the maximum total length
> >>>> of an skb.
> >>>
> >>> ...and it's actually just the default value assigned to
> >>> dev->gso_max_size.  You'll want to change it to your actual maximum
> >>> (65535 - maximum length of headers) before registering your net devices.
> >>
> >> Thanks.
> >>
> >> "maximum length of headers" might be a bit tricky to determine
> >> generically :-(.
> >
> > Well you don't need to be generic, you need to know the maximum length
> > of headers that might appear in a TSO skb.
> >
> > Ethernet + VLAN tag + IPv6 + TCP + timestamp option = 90 bytes, but I'm
> > not sure whether there can be other IP or TCP options in a TSO skb.  I'd
> > really like to get the TSO requirements clearly documented somewhere.
> 
> What about encapsulated IPSEC, IP-in-IP-tunnels, etc. ?

xen-netfront doesn't offload GSO for those, unless I'm much mistaken.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
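Christoph's question and Ben's answer both come down to which layers belong in the header budget: it is the sum of every header that can precede the GSO payload, so encapsulation headers only need to be counted for GSO types the device actually offloads. A hedged sketch with a hypothetical table of header layers (the struct and its field names are illustrative, not from any kernel API):

```c
#include <stddef.h>

/* One header that can precede the payload of a GSO skb (hypothetical). */
struct hdr_layer {
	const char *name; /* for readability only */
	unsigned int len; /* bytes */
};

/* Sum of all header lengths in the stack. */
static unsigned int header_budget(const struct hdr_layer *layers, size_t n)
{
	unsigned int total = 0;

	for (size_t i = 0; i < n; i++)
		total += layers[i].len;
	return total;
}

/* Payload limit left under the 16-bit wire length; 0 if headers alone overflow. */
static unsigned int gso_limit_for(const struct hdr_layer *layers, size_t n)
{
	unsigned int hdr = header_budget(layers, n);

	return hdr < 65535u ? 65535u - hdr : 0;
}
```

With Ben's five layers (14 + 4 + 40 + 20 + 12) the budget is 90 and the limit 65445. Adding a hypothetical 20-byte outer IPv4 tunnel header (no options) would raise the budget to 110 and lower the limit to 65425. Per Ben's reply, netfront does not offload GSO for tunnels, so that extra row would not apply here.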

^ permalink raw reply	[flat|nested] 97+ messages in thread

end of thread, other threads:[~2013-04-09 14:59 UTC | newest]

Thread overview: 97+ messages
2013-03-18 10:35 [PATCH 0/4] Bundle fixes for xen-netfront/back Wei Liu
2013-03-18 10:35 ` [PATCH 1/4] xen-netfront: remove unused variable `extra' Wei Liu
2013-03-18 10:35 ` Wei Liu
2013-03-18 11:42   ` Ian Campbell
2013-03-18 11:42   ` Ian Campbell
2013-03-18 12:04     ` Wei Liu
2013-03-18 12:14       ` Ian Campbell
2013-03-18 12:14       ` Ian Campbell
2013-03-19  2:39         ` annie li
2013-03-19  2:39         ` annie li
2013-03-19  3:02           ` [Xen-devel] " James Harper
2013-03-19  3:02           ` James Harper
2013-03-19  9:28           ` Paul Durrant
2013-03-19  9:28           ` [Xen-devel] " Paul Durrant
2013-03-19  9:53             ` annie li
2013-03-19  9:53             ` [Xen-devel] " annie li
2013-03-19 10:03               ` Paul Durrant
2013-03-19 10:03               ` Paul Durrant
2013-03-19 15:26             ` Wei Liu
2013-03-19 15:26             ` [Xen-devel] " Wei Liu
2013-04-09 14:28               ` Ian Campbell
2013-04-09 14:28               ` Ian Campbell
2013-03-18 12:04     ` Wei Liu
2013-03-18 10:35 ` [PATCH 2/4] xen-netfront: drop skb when skb->len > 65535 Wei Liu
2013-03-18 11:42   ` Ian Campbell
2013-03-18 14:40     ` Wei Liu
2013-03-18 14:40     ` Wei Liu
2013-03-18 14:54       ` Ian Campbell
2013-03-18 14:54       ` Ian Campbell
2013-03-18 15:04         ` Wei Liu
2013-03-18 15:07           ` Ian Campbell
2013-03-18 15:10             ` Wei Liu
2013-03-18 15:10             ` Wei Liu
2013-03-19 21:24             ` Ben Hutchings
2013-03-19 21:24             ` Ben Hutchings
2013-03-19 21:28               ` Ben Hutchings
2013-04-09 14:30                 ` Ian Campbell
2013-04-09 14:30                 ` Ian Campbell
2013-04-09 14:45                   ` Ben Hutchings
2013-04-09 14:53                     ` [Xen-devel] " Christoph Egger
2013-04-09 14:59                       ` Ben Hutchings
2013-04-09 14:59                       ` [Xen-devel] " Ben Hutchings
2013-04-09 14:53                     ` Christoph Egger
2013-04-09 14:45                   ` Ben Hutchings
2013-03-19 21:28               ` Ben Hutchings
2013-03-18 15:07           ` Ian Campbell
2013-03-18 15:04         ` Wei Liu
2013-03-18 11:42   ` Ian Campbell
2013-03-18 13:44   ` Konrad Rzeszutek Wilk
2013-03-18 13:44   ` Konrad Rzeszutek Wilk
2013-03-18 13:46   ` David Vrabel
2013-03-18 13:46   ` [Xen-devel] " David Vrabel
2013-03-18 13:48     ` Ian Campbell
2013-03-18 13:48     ` [Xen-devel] " Ian Campbell
2013-03-18 14:00       ` David Vrabel
2013-03-18 14:19         ` Wei Liu
2013-03-19 13:40           ` David Vrabel
2013-03-19 13:40           ` [Xen-devel] " David Vrabel
2013-03-19 15:23             ` Wei Liu
2013-03-19 15:23             ` [Xen-devel] " Wei Liu
2013-03-18 14:19         ` Wei Liu
2013-03-20 20:02         ` David Vrabel
2013-03-20 20:02         ` [Xen-devel] " David Vrabel
2013-03-21 13:40           ` Wei Liu
2013-03-21 14:11             ` David Vrabel
2013-03-21 14:11             ` [Xen-devel] " David Vrabel
2013-03-21 14:15               ` Wei Liu
2013-03-21 14:15               ` [Xen-devel] " Wei Liu
2013-03-21 14:20                 ` Wei Liu
2013-03-21 14:20                 ` [Xen-devel] " Wei Liu
2013-03-21 13:40           ` Wei Liu
2013-03-18 14:00       ` David Vrabel
2013-03-19  1:35   ` [Xen-devel] " annie li
2013-03-19  1:35   ` annie li
2013-03-19 20:13   ` Nick Pegg
2013-03-18 10:35 ` Wei Liu
2013-03-18 10:35 ` [PATCH 3/4] xen-netback: remove skb in xen_netbk_alloc_page Wei Liu
2013-03-18 11:37   ` Ian Campbell
2013-03-18 11:37   ` Ian Campbell
2013-03-18 10:35 ` Wei Liu
2013-03-18 10:35 ` [PATCH 4/4] xen-netback: coalesce slots before copying Wei Liu
2013-03-18 12:07   ` Ian Campbell
2013-03-18 12:07   ` Ian Campbell
2013-03-21 18:37     ` Wei Liu
2013-03-21 18:37     ` Wei Liu
2013-03-18 13:09   ` James Harper
2013-03-18 13:27     ` James Harper
2013-03-21 19:08       ` Wei Liu
2013-03-21 22:14         ` James Harper
2013-03-21 22:14         ` [Xen-devel] " James Harper
2013-03-22 11:06           ` Wei Liu
2013-03-22 11:19             ` James Harper
2013-03-22 11:19             ` [Xen-devel] " James Harper
2013-03-22 11:28               ` Wei Liu
2013-03-22 11:28               ` [Xen-devel] " Wei Liu
2013-03-22 11:06           ` Wei Liu
2013-03-18 10:35 ` Wei Liu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.