* [PATCH net-next v2 0/9] xen-netback: TX grant mapping with SKBTX_DEV_ZEROCOPY instead of copy
@ 2013-12-12 23:48 Zoltan Kiss
  2013-12-12 23:48 ` [PATCH net-next v2 1/9] xen-netback: Introduce TX grant map definitions Zoltan Kiss
                   ` (17 more replies)
  0 siblings, 18 replies; 76+ messages in thread
From: Zoltan Kiss @ 2013-12-12 23:48 UTC (permalink / raw)
  To: ian.campbell, wei.liu2, xen-devel, netdev, linux-kernel, jonathan.davies
  Cc: Zoltan Kiss

A long-known problem of the upstream netback implementation is that on the TX
path (from guest to Dom0) it copies the whole packet from guest memory into
Dom0. That has simply become a bottleneck with 10Gb NICs, and in general it is
a huge performance penalty. The classic kernel version of netback used grant
mapping, and to get notified when a page could be unmapped, it used page
destructors. Unfortunately that destructor is not an upstreamable solution.
Ian Campbell's skb fragment destructor patch series [1] tried to solve this
problem, but it turned out to be very invasive on the network stack's code,
and therefore hasn't progressed very well.
This patch series uses the SKBTX_DEV_ZEROCOPY flag to tell the stack that
netback needs to know when the skb is freed up. That is the way KVM solved the
same problem, and based on my initial tests it can do the same for us.
Avoiding the extra copy boosted TX throughput from 6.8 Gbps to 7.9 Gbps (I
used a slower Interlagos box, both Dom0 and guest on an upstream kernel, on
the same NUMA node, running iperf 2.0.5, and the remote end was a bare-metal
box on the same 10Gb switch).
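
For readers unfamiliar with the mechanism, here is a minimal sketch of how a
driver opts an skb into zerocopy tracking (the example_* names are
illustrative only, not part of this series; the real callback is
xenvif_zerocopy_callback in patch 1/9):

#include <linux/skbuff.h>

/* Called by the stack once the last user of the frag pages is done;
 * only then is it safe to unmap or recycle the granted pages. */
static void example_zerocopy_callback(struct ubuf_info *ubuf,
				      bool zerocopy_success)
{
	/* unmap / recycle the pages backing the skb frags here */
}

static void example_mark_zerocopy(struct sk_buff *skb,
				  struct ubuf_info *uarg)
{
	uarg->callback = example_zerocopy_callback;
	skb_shinfo(skb)->destructor_arg = uarg;
	skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY;
}
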
Based on my investigations the packet only gets copied if it is delivered to
the Dom0 stack, which is due to this [2] patch. That's a bit unfortunate, but
luckily it doesn't cause a major regression for this use case. In the future
we should try to eliminate that copy somehow.
There are a few spinoff tasks which will be addressed in separate patches:
- grant copy the header directly instead of map and memcpy. This should help
  us avoid TLB flushing
- use something else than ballooned pages
- fix grant map to use page->index properly
I will run some more extensive tests, but some basic XenRT tests have already
passed with good results.
I've tried to break the series down into smaller patches, with mixed results,
so I welcome suggestions on that part as well:
1: Introduce TX grant map definitions
2: Change TX path from grant copy to mapping
3: Remove old TX grant copy definitons and fix indentations
4: Change RX path for mapped SKB fragments
5: Add stat counters for zerocopy
6: Handle guests with too many frags
7: Add stat counters for frag_list skbs
8: Timeout packets in RX path
9: Aggregate TX unmap operations

v2: I've fixed some smaller things, see the individual patches. I've added a
few new stat counters, and handling for the important use case where an older
guest sends lots of slots. Instead of a delayed copy we now time out packets
on the RX path, based on the assumption that packets shouldn't get stuck
anywhere else. Finally, TX unmap operations are batched to avoid too much TLB
flushing.

[1] http://lwn.net/Articles/491522/
[2] https://lkml.org/lkml/2012/7/20/363

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>



* [PATCH net-next v2 1/9] xen-netback: Introduce TX grant map definitions
  2013-12-12 23:48 [PATCH net-next v2 0/9] xen-netback: TX grant mapping with SKBTX_DEV_ZEROCOPY instead of copy Zoltan Kiss
@ 2013-12-12 23:48 ` Zoltan Kiss
  2013-12-13 15:31   ` Wei Liu
  2013-12-12 23:48 ` Zoltan Kiss
                   ` (16 subsequent siblings)
  17 siblings, 2 replies; 76+ messages in thread
From: Zoltan Kiss @ 2013-12-12 23:48 UTC (permalink / raw)
  To: ian.campbell, wei.liu2, xen-devel, netdev, linux-kernel, jonathan.davies
  Cc: Zoltan Kiss

This patch contains the new definitions necessary for grant mapping.

v2:
- move unmapping to a separate thread. The NAPI instance has to be scheduled
  even from thread context, which can cause huge delays
- unfortunately that makes struct xenvif bigger
- store the grant handle only after checking its validity

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>

---
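As background for the diff below: the dealloc ring is a classic
single-producer/single-consumer ring. A minimal standalone C model of the
protocol may help when reading xenvif_zerocopy_callback and
xenvif_tx_dealloc_action (illustrative names; C11 release/acquire stands in
for the kernel's smp_wmb()/smp_rmb()):

#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_PENDING_REQS 256
#define pending_index(i) ((i) & (MAX_PENDING_REQS - 1))

static uint16_t dealloc_ring[MAX_PENDING_REQS];
static atomic_uint dealloc_prod, dealloc_cons;

/* producer side, models xenvif_zerocopy_callback() */
static void produce(uint16_t pending_idx)
{
	unsigned p = atomic_load_explicit(&dealloc_prod, memory_order_relaxed);

	dealloc_ring[pending_index(p)] = pending_idx;
	/* release ~ smp_wmb(): slot contents visible before the index */
	atomic_store_explicit(&dealloc_prod, p + 1, memory_order_release);
}

/* consumer side, models xenvif_tx_dealloc_action() */
static int consume(uint16_t *pending_idx)
{
	unsigned c = atomic_load_explicit(&dealloc_cons, memory_order_relaxed);
	/* acquire ~ smp_rmb(): index read before the slot contents */
	unsigned p = atomic_load_explicit(&dealloc_prod, memory_order_acquire);

	if (c == p)
		return 0;
	*pending_idx = dealloc_ring[pending_index(c)];
	atomic_store_explicit(&dealloc_cons, c + 1, memory_order_relaxed);
	return 1;
}

int main(void)
{
	uint16_t idx;

	produce(42);
	while (consume(&idx))
		printf("dealloc pending_idx %u\n", idx);
	return 0;
}

Unlike this model, the real producer additionally holds dealloc_lock, since
the callback can fire concurrently from several contexts.
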
 drivers/net/xen-netback/common.h    |   30 ++++++-
 drivers/net/xen-netback/interface.c |    1 +
 drivers/net/xen-netback/netback.c   |  164 +++++++++++++++++++++++++++++++++++
 3 files changed, 194 insertions(+), 1 deletion(-)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index ba30a6d..33cb12c 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -79,6 +79,11 @@ struct pending_tx_info {
 				  * if it is head of one or more tx
 				  * reqs
 				  */
+	/* callback data for released SKBs. The	callback is always
+	 * xenvif_zerocopy_callback, ctx points to the next fragment, desc
+	 * contains the pending_idx
+	 */
+	struct ubuf_info callback_struct;
 };
 
 #define XEN_NETIF_TX_RING_SIZE __CONST_RING_SIZE(xen_netif_tx, PAGE_SIZE)
@@ -101,6 +106,8 @@ struct xenvif_rx_meta {
 
 #define MAX_PENDING_REQS 256
 
+#define NETBACK_INVALID_HANDLE -1
+
 struct xenvif {
 	/* Unique identifier for this interface. */
 	domid_t          domid;
@@ -119,13 +126,26 @@ struct xenvif {
 	pending_ring_idx_t pending_cons;
 	u16 pending_ring[MAX_PENDING_REQS];
 	struct pending_tx_info pending_tx_info[MAX_PENDING_REQS];
+	grant_handle_t grant_tx_handle[MAX_PENDING_REQS];
 
 	/* Coalescing tx requests before copying makes number of grant
 	 * copy ops greater or equal to number of slots required. In
 	 * worst case a tx request consumes 2 gnttab_copy.
 	 */
 	struct gnttab_copy tx_copy_ops[2*MAX_PENDING_REQS];
-
+	struct gnttab_unmap_grant_ref tx_unmap_ops[MAX_PENDING_REQS];
+	struct gnttab_map_grant_ref tx_map_ops[MAX_PENDING_REQS];
+	/* passed to gnttab_[un]map_refs with pages under (un)mapping */
+	struct page *pages_to_map[MAX_PENDING_REQS];
+	struct page *pages_to_unmap[MAX_PENDING_REQS];
+
+	spinlock_t dealloc_lock;
+	spinlock_t response_lock;
+	pending_ring_idx_t dealloc_prod;
+	pending_ring_idx_t dealloc_cons;
+	u16 dealloc_ring[MAX_PENDING_REQS];
+	struct task_struct *dealloc_task;
+	wait_queue_head_t dealloc_wq;
 
 	/* Use kthread for guest RX */
 	struct task_struct *task;
@@ -215,6 +235,8 @@ int xenvif_tx_action(struct xenvif *vif, int budget);
 int xenvif_kthread(void *data);
 void xenvif_kick_thread(struct xenvif *vif);
 
+int xenvif_dealloc_kthread(void *data);
+
 /* Determine whether the needed number of slots (req) are available,
  * and set req_event if not.
  */
@@ -222,6 +244,12 @@ bool xenvif_rx_ring_slots_available(struct xenvif *vif, int needed);
 
 void xenvif_stop_queue(struct xenvif *vif);
 
+/* Callback from stack when TX packet can be released */
+void xenvif_zerocopy_callback(struct ubuf_info *ubuf, bool zerocopy_success);
+
+/* Unmap a pending page, usually has to be called before xenvif_idx_release */
+void xenvif_idx_unmap(struct xenvif *vif, u16 pending_idx);
+
 extern bool separate_tx_rx_irq;
 
 #endif /* __XEN_NETBACK__COMMON_H__ */
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index 1dcb960..1c27e9e 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -37,6 +37,7 @@
 
 #include <xen/events.h>
 #include <asm/xen/hypercall.h>
+#include <xen/balloon.h>
 
 #define XENVIF_QUEUE_LENGTH 32
 #define XENVIF_NAPI_WEIGHT  64
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index c1b7a42..3ddc474 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -772,6 +772,20 @@ static struct page *xenvif_alloc_page(struct xenvif *vif,
 	return page;
 }
 
+static inline void xenvif_tx_create_gop(struct xenvif *vif, u16 pending_idx,
+	       struct xen_netif_tx_request *txp,
+	       struct gnttab_map_grant_ref *gop)
+{
+	vif->pages_to_map[gop-vif->tx_map_ops] = vif->mmap_pages[pending_idx];
+	gnttab_set_map_op(gop, idx_to_kaddr(vif, pending_idx),
+			  GNTMAP_host_map | GNTMAP_readonly,
+			  txp->gref, vif->domid);
+
+	memcpy(&vif->pending_tx_info[pending_idx].req, txp,
+	       sizeof(*txp));
+
+}
+
 static struct gnttab_copy *xenvif_get_requests(struct xenvif *vif,
 					       struct sk_buff *skb,
 					       struct xen_netif_tx_request *txp,
@@ -1593,6 +1607,106 @@ static int xenvif_tx_submit(struct xenvif *vif)
 	return work_done;
 }
 
+void xenvif_zerocopy_callback(struct ubuf_info *ubuf, bool zerocopy_success)
+{
+	unsigned long flags;
+	pending_ring_idx_t index;
+	u16 pending_idx = ubuf->desc;
+	struct pending_tx_info *temp =
+		container_of(ubuf, struct pending_tx_info, callback_struct);
+	struct xenvif *vif =
+		container_of(temp - pending_idx, struct xenvif,
+			pending_tx_info[0]);
+
+	spin_lock_irqsave(&vif->dealloc_lock, flags);
+	do {
+		pending_idx = ubuf->desc;
+		ubuf = (struct ubuf_info *) ubuf->ctx;
+		index = pending_index(vif->dealloc_prod);
+		vif->dealloc_ring[index] = pending_idx;
+		/* Sync with xenvif_tx_dealloc_action:
+		 * insert idx then incr producer.
+		 */
+		smp_wmb();
+		vif->dealloc_prod++;
+	} while (ubuf);
+	wake_up(&vif->dealloc_wq);
+	spin_unlock_irqrestore(&vif->dealloc_lock, flags);
+}
+
+static inline void xenvif_tx_dealloc_action(struct xenvif *vif)
+{
+	struct gnttab_unmap_grant_ref *gop;
+	pending_ring_idx_t dc, dp;
+	u16 pending_idx, pending_idx_release[MAX_PENDING_REQS];
+	unsigned int i = 0;
+
+	dc = vif->dealloc_cons;
+	gop = vif->tx_unmap_ops;
+
+	/* Free up any grants we have finished using */
+	do {
+		dp = vif->dealloc_prod;
+
+		/* Ensure we see all indices enqueued by netif_idx_release(). */
+		smp_rmb();
+
+		while (dc != dp) {
+			pending_idx =
+				vif->dealloc_ring[pending_index(dc++)];
+
+			/* Already unmapped? */
+			if (vif->grant_tx_handle[pending_idx] ==
+				NETBACK_INVALID_HANDLE) {
+				netdev_err(vif->dev,
+					"Trying to unmap invalid handle! "
+					"pending_idx: %x\n", pending_idx);
+				continue;
+			}
+
+			pending_idx_release[gop-vif->tx_unmap_ops] =
+				pending_idx;
+			vif->pages_to_unmap[gop-vif->tx_unmap_ops] =
+				vif->mmap_pages[pending_idx];
+			gnttab_set_unmap_op(gop,
+					idx_to_kaddr(vif, pending_idx),
+					GNTMAP_host_map,
+					vif->grant_tx_handle[pending_idx]);
+			vif->grant_tx_handle[pending_idx] =
+				NETBACK_INVALID_HANDLE;
+			++gop;
+		}
+
+	} while (dp != vif->dealloc_prod);
+
+	vif->dealloc_cons = dc;
+
+	if (gop - vif->tx_unmap_ops > 0) {
+		int ret;
+		ret = gnttab_unmap_refs(vif->tx_unmap_ops,
+			NULL,
+			vif->pages_to_unmap,
+			gop - vif->tx_unmap_ops);
+		if (ret) {
+			netdev_err(vif->dev, "Unmap fail: nr_ops %x ret %d\n",
+				gop - vif->tx_unmap_ops, ret);
+			for (i = 0; i < gop - vif->tx_unmap_ops; ++i) {
+				netdev_err(vif->dev,
+					" host_addr: %llx handle: %x status: %d\n",
+					gop[i].host_addr,
+					gop[i].handle,
+					gop[i].status);
+			}
+			BUG();
+		}
+	}
+
+	for (i = 0; i < gop - vif->tx_unmap_ops; ++i)
+		xenvif_idx_release(vif, pending_idx_release[i],
+				XEN_NETIF_RSP_OKAY);
+}
+
+
 /* Called after netfront has transmitted */
 int xenvif_tx_action(struct xenvif *vif, int budget)
 {
@@ -1659,6 +1773,26 @@ static void xenvif_idx_release(struct xenvif *vif, u16 pending_idx,
 	vif->mmap_pages[pending_idx] = NULL;
 }
 
+void xenvif_idx_unmap(struct xenvif *vif, u16 pending_idx)
+{
+	int ret;
+	if (vif->grant_tx_handle[pending_idx] == NETBACK_INVALID_HANDLE) {
+		netdev_err(vif->dev,
+				"Trying to unmap invalid handle! pending_idx: %x\n",
+				pending_idx);
+		return;
+	}
+	gnttab_set_unmap_op(&vif->tx_unmap_ops[0],
+			idx_to_kaddr(vif, pending_idx),
+			GNTMAP_host_map,
+			vif->grant_tx_handle[pending_idx]);
+	ret = gnttab_unmap_refs(vif->tx_unmap_ops,
+			NULL,
+			&vif->mmap_pages[pending_idx],
+			1);
+	BUG_ON(ret);
+	vif->grant_tx_handle[pending_idx] = NETBACK_INVALID_HANDLE;
+}
 
 static void make_tx_response(struct xenvif *vif,
 			     struct xen_netif_tx_request *txp,
@@ -1720,6 +1854,14 @@ static inline int tx_work_todo(struct xenvif *vif)
 	return 0;
 }
 
+static inline int tx_dealloc_work_todo(struct xenvif *vif)
+{
+	if (vif->dealloc_cons != vif->dealloc_prod)
+		return 1;
+
+	return 0;
+}
+
 void xenvif_unmap_frontend_rings(struct xenvif *vif)
 {
 	if (vif->tx.sring)
@@ -1808,6 +1950,28 @@ int xenvif_kthread(void *data)
 	return 0;
 }
 
+int xenvif_dealloc_kthread(void *data)
+{
+	struct xenvif *vif = data;
+
+	while (!kthread_should_stop()) {
+		wait_event_interruptible(vif->dealloc_wq,
+					tx_dealloc_work_todo(vif) ||
+					 kthread_should_stop());
+		if (kthread_should_stop())
+			break;
+
+		xenvif_tx_dealloc_action(vif);
+		cond_resched();
+	}
+
+	/* Unmap anything remaining */
+	if (tx_dealloc_work_todo(vif))
+		xenvif_tx_dealloc_action(vif);
+
+	return 0;
+}
+
 static int __init netback_init(void)
 {
 	int rc = 0;

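A note on the pointer arithmetic in xenvif_zerocopy_callback above:
container_of first recovers pending_tx_info[pending_idx] from its embedded
callback_struct, then subtracting pending_idx steps back to element 0 of the
array, from which the enclosing xenvif is found. A standalone C model of that
trick (simplified stand-in types, not the kernel's):

#include <stdio.h>
#include <stddef.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct pending_info { int req; };
struct vif { int domid; struct pending_info pending_tx_info[4]; };

int main(void)
{
	struct vif v = { .domid = 7 };
	int pending_idx = 2;
	struct pending_info *temp = &v.pending_tx_info[pending_idx];
	/* step back to element 0, then to the enclosing struct */
	struct vif *back = container_of(temp - pending_idx, struct vif,
					pending_tx_info[0]);

	printf("domid %d\n", back->domid); /* prints 7 */
	return 0;
}
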
* [PATCH net-next v2 2/9] xen-netback: Change TX path from grant copy to mapping
  2013-12-12 23:48 [PATCH net-next v2 0/9] xen-netback: TX grant mapping with SKBTX_DEV_ZEROCOPY instead of copy Zoltan Kiss
@ 2013-12-12 23:48   ` Zoltan Kiss
  2013-12-12 23:48 ` Zoltan Kiss
                     ` (16 subsequent siblings)
  17 siblings, 0 replies; 76+ messages in thread
From: Zoltan Kiss @ 2013-12-12 23:48 UTC (permalink / raw)
  To: ian.campbell, wei.liu2, xen-devel, netdev, linux-kernel, jonathan.davies
  Cc: Zoltan Kiss

This patch changes the TX path from grant copy to grant mapping.

v2:
- delete the branch for handling fragmented packets whose first request fits
  in PKT_PROT_LEN
- mark the effect of using ballooned pages in a comment
- place the setting of skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY right
  before netif_receive_skb, and mark its importance
- grab dealloc_lock before __napi_complete to avoid contention with the
  callback's napi_schedule
- handle fragmented packets where the first request is smaller than
  PKT_PROT_LEN
- fix up the error path when checksum_setup fails
- check for pending grants before teardown, and start complaining if they are
  still there after 10 seconds

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
---
 drivers/net/xen-netback/interface.c |   57 +++++++-
 drivers/net/xen-netback/netback.c   |  257 ++++++++++++++---------------------
 2 files changed, 156 insertions(+), 158 deletions(-)
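
One thing worth spelling out before the diff: xenvif_fill_frags below chains
one ubuf_info per pending slot through the ctx pointer, so the single zerocopy
callback can walk every slot belonging to an skb. A standalone C model of that
walk (simplified stand-in types, not the kernel's):

#include <stdio.h>
#include <stdint.h>
#include <stddef.h>

struct ubuf_model {
	void *ctx;     /* next fragment's ubuf_info, or NULL */
	uint16_t desc; /* pending_idx of this slot */
};

/* models the loop in xenvif_zerocopy_callback */
static void callback_model(struct ubuf_model *ubuf)
{
	do {
		uint16_t pending_idx = ubuf->desc;

		ubuf = ubuf->ctx; /* advance before the slot is reused */
		printf("queue pending_idx %u for dealloc\n", pending_idx);
	} while (ubuf);
}

int main(void)
{
	struct ubuf_model frag2 = { .ctx = NULL,   .desc = 7 };
	struct ubuf_model frag1 = { .ctx = &frag2, .desc = 3 };

	/* skb_shinfo(skb)->destructor_arg would point at frag1 */
	callback_model(&frag1);
	return 0;
}
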

diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index 1c27e9e..42946de 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -122,7 +122,9 @@ static int xenvif_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	BUG_ON(skb->dev != dev);
 
 	/* Drop the packet if vif is not ready */
-	if (vif->task == NULL || !xenvif_schedulable(vif))
+	if (vif->task == NULL ||
+		vif->dealloc_task == NULL ||
+		!xenvif_schedulable(vif))
 		goto drop;
 
 	/* At best we'll need one slot for the header and one for each
@@ -335,8 +337,25 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
 	vif->pending_prod = MAX_PENDING_REQS;
 	for (i = 0; i < MAX_PENDING_REQS; i++)
 		vif->pending_ring[i] = i;
-	for (i = 0; i < MAX_PENDING_REQS; i++)
-		vif->mmap_pages[i] = NULL;
+	/* If ballooning is disabled, this will consume real memory, so you
+	 * better enable it. The long term solution would be to use just a
+	 * bunch of valid page descriptors, without dependency on ballooning
+	 */
+	err = alloc_xenballooned_pages(MAX_PENDING_REQS,
+		vif->mmap_pages,
+		false);
+	if (err) {
+		netdev_err(dev, "Could not reserve mmap_pages\n");
+		return NULL;
+	}
+	for (i = 0; i < MAX_PENDING_REQS; i++) {
+		vif->pending_tx_info[i].callback_struct = (struct ubuf_info)
+			{ .callback = xenvif_zerocopy_callback,
+			  .ctx = NULL,
+			  .desc = i };
+		vif->grant_tx_handle[i] = NETBACK_INVALID_HANDLE;
+	}
+	init_timer(&vif->dealloc_delay);
 
 	/*
 	 * Initialise a dummy MAC address. We choose the numerically
@@ -380,6 +399,7 @@ int xenvif_connect(struct xenvif *vif, unsigned long tx_ring_ref,
 		goto err;
 
 	init_waitqueue_head(&vif->wq);
+	init_waitqueue_head(&vif->dealloc_wq);
 
 	if (tx_evtchn == rx_evtchn) {
 		/* feature-split-event-channels == 0 */
@@ -421,6 +441,14 @@ int xenvif_connect(struct xenvif *vif, unsigned long tx_ring_ref,
 		goto err_rx_unbind;
 	}
 
+	vif->dealloc_task = kthread_create(xenvif_dealloc_kthread,
+				   (void *)vif, "%s-dealloc", vif->dev->name);
+	if (IS_ERR(vif->dealloc_task)) {
+		pr_warn("Could not allocate kthread for %s\n", vif->dev->name);
+		err = PTR_ERR(vif->dealloc_task);
+		goto err_rx_unbind;
+	}
+
 	vif->task = task;
 
 	rtnl_lock();
@@ -433,6 +461,7 @@ int xenvif_connect(struct xenvif *vif, unsigned long tx_ring_ref,
 	rtnl_unlock();
 
 	wake_up_process(vif->task);
+	wake_up_process(vif->dealloc_task);
 
 	return 0;
 
@@ -470,6 +499,12 @@ void xenvif_disconnect(struct xenvif *vif)
 		vif->task = NULL;
 	}
 
+	if (vif->dealloc_task) {
+		del_timer_sync(&vif->dealloc_delay);
+		kthread_stop(vif->dealloc_task);
+		vif->dealloc_task = NULL;
+	}
+
 	if (vif->tx_irq) {
 		if (vif->tx_irq == vif->rx_irq)
 			unbind_from_irqhandler(vif->tx_irq, vif);
@@ -485,6 +520,22 @@ void xenvif_disconnect(struct xenvif *vif)
 
 void xenvif_free(struct xenvif *vif)
 {
+	int i, unmap_timeout = 0;
+
+	for (i = 0; i < MAX_PENDING_REQS; ++i) {
+		if (vif->grant_tx_handle[i] != NETBACK_INVALID_HANDLE) {
+			i = 0;
+			unmap_timeout++;
+			msleep(1000);
+			if (unmap_timeout > 9 &&
+				net_ratelimit())
+				netdev_err(vif->dev,
+					"Page still granted! Index: %x\n", i);
+		}
+	}
+
+	free_xenballooned_pages(MAX_PENDING_REQS, vif->mmap_pages);
+
 	netif_napi_del(&vif->napi);
 
 	unregister_netdev(vif->dev);
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 3ddc474..20352be 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -645,9 +645,12 @@ static void xenvif_tx_err(struct xenvif *vif,
 			  struct xen_netif_tx_request *txp, RING_IDX end)
 {
 	RING_IDX cons = vif->tx.req_cons;
+	unsigned long flags;
 
 	do {
+		spin_lock_irqsave(&vif->response_lock, flags);
 		make_tx_response(vif, txp, XEN_NETIF_RSP_ERROR);
+		spin_unlock_irqrestore(&vif->response_lock, flags);
 		if (cons == end)
 			break;
 		txp = RING_GET_REQUEST(&vif->tx, cons++);
@@ -786,10 +789,10 @@ static inline void xenvif_tx_create_gop(struct xenvif *vif, u16 pending_idx,
 
 }
 
-static struct gnttab_copy *xenvif_get_requests(struct xenvif *vif,
+static struct gnttab_map_grant_ref *xenvif_get_requests(struct xenvif *vif,
 					       struct sk_buff *skb,
 					       struct xen_netif_tx_request *txp,
-					       struct gnttab_copy *gop)
+					       struct gnttab_map_grant_ref *gop)
 {
 	struct skb_shared_info *shinfo = skb_shinfo(skb);
 	skb_frag_t *frags = shinfo->frags;
@@ -810,83 +813,12 @@ static struct gnttab_copy *xenvif_get_requests(struct xenvif *vif,
 	/* Skip first skb fragment if it is on same page as header fragment. */
 	start = (frag_get_pending_idx(&shinfo->frags[0]) == pending_idx);
 
-	/* Coalesce tx requests, at this point the packet passed in
-	 * should be <= 64K. Any packets larger than 64K have been
-	 * handled in xenvif_count_requests().
-	 */
-	for (shinfo->nr_frags = slot = start; slot < nr_slots;
-	     shinfo->nr_frags++) {
-		struct pending_tx_info *pending_tx_info =
-			vif->pending_tx_info;
-
-		page = alloc_page(GFP_ATOMIC|__GFP_COLD);
-		if (!page)
-			goto err;
-
-		dst_offset = 0;
-		first = NULL;
-		while (dst_offset < PAGE_SIZE && slot < nr_slots) {
-			gop->flags = GNTCOPY_source_gref;
-
-			gop->source.u.ref = txp->gref;
-			gop->source.domid = vif->domid;
-			gop->source.offset = txp->offset;
-
-			gop->dest.domid = DOMID_SELF;
-
-			gop->dest.offset = dst_offset;
-			gop->dest.u.gmfn = virt_to_mfn(page_address(page));
-
-			if (dst_offset + txp->size > PAGE_SIZE) {
-				/* This page can only merge a portion
-				 * of tx request. Do not increment any
-				 * pointer / counter here. The txp
-				 * will be dealt with in future
-				 * rounds, eventually hitting the
-				 * `else` branch.
-				 */
-				gop->len = PAGE_SIZE - dst_offset;
-				txp->offset += gop->len;
-				txp->size -= gop->len;
-				dst_offset += gop->len; /* quit loop */
-			} else {
-				/* This tx request can be merged in the page */
-				gop->len = txp->size;
-				dst_offset += gop->len;
-
+	for (shinfo->nr_frags = start; shinfo->nr_frags < nr_slots;
+	     shinfo->nr_frags++, txp++, gop++) {
 				index = pending_index(vif->pending_cons++);
-
 				pending_idx = vif->pending_ring[index];
-
-				memcpy(&pending_tx_info[pending_idx].req, txp,
-				       sizeof(*txp));
-
-				/* Poison these fields, corresponding
-				 * fields for head tx req will be set
-				 * to correct values after the loop.
-				 */
-				vif->mmap_pages[pending_idx] = (void *)(~0UL);
-				pending_tx_info[pending_idx].head =
-					INVALID_PENDING_RING_IDX;
-
-				if (!first) {
-					first = &pending_tx_info[pending_idx];
-					start_idx = index;
-					head_idx = pending_idx;
-				}
-
-				txp++;
-				slot++;
-			}
-
-			gop++;
-		}
-
-		first->req.offset = 0;
-		first->req.size = dst_offset;
-		first->head = start_idx;
-		vif->mmap_pages[head_idx] = page;
-		frag_set_pending_idx(&frags[shinfo->nr_frags], head_idx);
+		xenvif_tx_create_gop(vif, pending_idx, txp, gop);
+		frag_set_pending_idx(&frags[shinfo->nr_frags], pending_idx);
 	}
 
 	BUG_ON(shinfo->nr_frags > MAX_SKB_FRAGS);
@@ -908,9 +840,9 @@ err:
 
 static int xenvif_tx_check_gop(struct xenvif *vif,
 			       struct sk_buff *skb,
-			       struct gnttab_copy **gopp)
+			       struct gnttab_map_grant_ref **gopp)
 {
-	struct gnttab_copy *gop = *gopp;
+	struct gnttab_map_grant_ref *gop = *gopp;
 	u16 pending_idx = *((u16 *)skb->data);
 	struct skb_shared_info *shinfo = skb_shinfo(skb);
 	struct pending_tx_info *tx_info;
@@ -922,6 +854,16 @@ static int xenvif_tx_check_gop(struct xenvif *vif,
 	err = gop->status;
 	if (unlikely(err))
 		xenvif_idx_release(vif, pending_idx, XEN_NETIF_RSP_ERROR);
+	else {
+		if (vif->grant_tx_handle[pending_idx] !=
+			NETBACK_INVALID_HANDLE) {
+			netdev_err(vif->dev,
+				"Stale mapped handle! pending_idx %x handle %x\n",
+				pending_idx, vif->grant_tx_handle[pending_idx]);
+			xenvif_fatal_tx_err(vif);
+		}
+		vif->grant_tx_handle[pending_idx] = gop->handle;
+	}
 
 	/* Skip first skb fragment if it is on same page as header fragment. */
 	start = (frag_get_pending_idx(&shinfo->frags[0]) == pending_idx);
@@ -935,18 +877,24 @@ static int xenvif_tx_check_gop(struct xenvif *vif,
 		head = tx_info->head;
 
 		/* Check error status: if okay then remember grant handle. */
-		do {
 			newerr = (++gop)->status;
-			if (newerr)
-				break;
-			peek = vif->pending_ring[pending_index(++head)];
-		} while (!pending_tx_is_head(vif, peek));
 
 		if (likely(!newerr)) {
+			if (vif->grant_tx_handle[pending_idx] !=
+				NETBACK_INVALID_HANDLE) {
+				netdev_err(vif->dev,
+					"Stale mapped handle! pending_idx %x handle %x\n",
+					pending_idx,
+					vif->grant_tx_handle[pending_idx]);
+				xenvif_fatal_tx_err(vif);
+			}
+			vif->grant_tx_handle[pending_idx] = gop->handle;
 			/* Had a previous error? Invalidate this fragment. */
-			if (unlikely(err))
+			if (unlikely(err)) {
+				xenvif_idx_unmap(vif, pending_idx);
 				xenvif_idx_release(vif, pending_idx,
 						   XEN_NETIF_RSP_OKAY);
+			}
 			continue;
 		}
 
@@ -959,9 +907,11 @@ static int xenvif_tx_check_gop(struct xenvif *vif,
 
 		/* First error: invalidate header and preceding fragments. */
 		pending_idx = *((u16 *)skb->data);
+		xenvif_idx_unmap(vif, pending_idx);
 		xenvif_idx_release(vif, pending_idx, XEN_NETIF_RSP_OKAY);
 		for (j = start; j < i; j++) {
 			pending_idx = frag_get_pending_idx(&shinfo->frags[j]);
+			xenvif_idx_unmap(vif, pending_idx);
 			xenvif_idx_release(vif, pending_idx,
 					   XEN_NETIF_RSP_OKAY);
 		}
@@ -974,7 +924,8 @@ static int xenvif_tx_check_gop(struct xenvif *vif,
 	return err;
 }
 
-static void xenvif_fill_frags(struct xenvif *vif, struct sk_buff *skb)
+static void xenvif_fill_frags(struct xenvif *vif, struct sk_buff *skb,
+		u16 prev_pending_idx)
 {
 	struct skb_shared_info *shinfo = skb_shinfo(skb);
 	int nr_frags = shinfo->nr_frags;
@@ -988,6 +939,17 @@ static void xenvif_fill_frags(struct xenvif *vif, struct sk_buff *skb)
 
 		pending_idx = frag_get_pending_idx(frag);
 
+		/* If this is not the first frag, chain it to the previous */
+		if (unlikely(prev_pending_idx == INVALID_PENDING_IDX))
+			skb_shinfo(skb)->destructor_arg =
+				&vif->pending_tx_info[pending_idx].callback_struct;
+		else if (likely(pending_idx != prev_pending_idx))
+			vif->pending_tx_info[prev_pending_idx].callback_struct.ctx =
+				&(vif->pending_tx_info[pending_idx].callback_struct);
+
+		vif->pending_tx_info[pending_idx].callback_struct.ctx = NULL;
+		prev_pending_idx = pending_idx;
+
 		txp = &vif->pending_tx_info[pending_idx].req;
 		page = virt_to_page(idx_to_kaddr(vif, pending_idx));
 		__skb_fill_page_desc(skb, i, page, txp->offset, txp->size);
@@ -995,10 +957,15 @@ static void xenvif_fill_frags(struct xenvif *vif, struct sk_buff *skb)
 		skb->data_len += txp->size;
 		skb->truesize += txp->size;
 
-		/* Take an extra reference to offset xenvif_idx_release */
+		/* Take an extra reference to offset network stack's put_page */
 		get_page(vif->mmap_pages[pending_idx]);
-		xenvif_idx_release(vif, pending_idx, XEN_NETIF_RSP_OKAY);
 	}
+	/* FIXME: __skb_fill_page_desc set this to true because page->pfmemalloc
+	 * overlaps with "index", and "mapping" is not set. I think mapping
+	 * should be set. If delivered to local stack, it would drop this
+	 * skb in sk_filter unless the socket has the right to use it.
+	 */
+	skb->pfmemalloc	= false;
 }
 
 static int xenvif_get_extras(struct xenvif *vif,
@@ -1367,7 +1334,7 @@ static bool tx_credit_exceeded(struct xenvif *vif, unsigned size)
 
 static unsigned xenvif_tx_build_gops(struct xenvif *vif, int budget)
 {
-	struct gnttab_copy *gop = vif->tx_copy_ops, *request_gop;
+	struct gnttab_map_grant_ref *gop = vif->tx_map_ops, *request_gop;
 	struct sk_buff *skb;
 	int ret;
 
@@ -1475,30 +1442,10 @@ static unsigned xenvif_tx_build_gops(struct xenvif *vif, int budget)
 			}
 		}
 
-		/* XXX could copy straight to head */
-		page = xenvif_alloc_page(vif, pending_idx);
-		if (!page) {
-			kfree_skb(skb);
-			xenvif_tx_err(vif, &txreq, idx);
-			break;
-		}
-
-		gop->source.u.ref = txreq.gref;
-		gop->source.domid = vif->domid;
-		gop->source.offset = txreq.offset;
-
-		gop->dest.u.gmfn = virt_to_mfn(page_address(page));
-		gop->dest.domid = DOMID_SELF;
-		gop->dest.offset = txreq.offset;
-
-		gop->len = txreq.size;
-		gop->flags = GNTCOPY_source_gref;
+		xenvif_tx_create_gop(vif, pending_idx, &txreq, gop);
 
 		gop++;
 
-		memcpy(&vif->pending_tx_info[pending_idx].req,
-		       &txreq, sizeof(txreq));
-		vif->pending_tx_info[pending_idx].head = index;
 		*((u16 *)skb->data) = pending_idx;
 
 		__skb_put(skb, data_len);
@@ -1527,17 +1474,17 @@ static unsigned xenvif_tx_build_gops(struct xenvif *vif, int budget)
 
 		vif->tx.req_cons = idx;
 
-		if ((gop-vif->tx_copy_ops) >= ARRAY_SIZE(vif->tx_copy_ops))
+		if ((gop-vif->tx_map_ops) >= ARRAY_SIZE(vif->tx_map_ops))
 			break;
 	}
 
-	return gop - vif->tx_copy_ops;
+	return gop - vif->tx_map_ops;
 }
 
 
 static int xenvif_tx_submit(struct xenvif *vif)
 {
-	struct gnttab_copy *gop = vif->tx_copy_ops;
+	struct gnttab_map_grant_ref *gop = vif->tx_map_ops;
 	struct sk_buff *skb;
 	int work_done = 0;
 
@@ -1561,12 +1508,17 @@ static int xenvif_tx_submit(struct xenvif *vif)
 		memcpy(skb->data,
 		       (void *)(idx_to_kaddr(vif, pending_idx)|txp->offset),
 		       data_len);
+		vif->pending_tx_info[pending_idx].callback_struct.ctx = NULL;
 		if (data_len < txp->size) {
 			/* Append the packet payload as a fragment. */
 			txp->offset += data_len;
 			txp->size -= data_len;
+			skb_shinfo(skb)->destructor_arg =
+				&vif->pending_tx_info[pending_idx].callback_struct;
 		} else {
 			/* Schedule a response immediately. */
+			skb_shinfo(skb)->destructor_arg = NULL;
+			xenvif_idx_unmap(vif, pending_idx);
 			xenvif_idx_release(vif, pending_idx,
 					   XEN_NETIF_RSP_OKAY);
 		}
@@ -1576,7 +1528,11 @@ static int xenvif_tx_submit(struct xenvif *vif)
 		else if (txp->flags & XEN_NETTXF_data_validated)
 			skb->ip_summed = CHECKSUM_UNNECESSARY;
 
-		xenvif_fill_frags(vif, skb);
+		xenvif_fill_frags(vif,
+			skb,
+			skb_shinfo(skb)->destructor_arg ?
+					pending_idx :
+					INVALID_PENDING_IDX);
 
 		if (skb_is_nonlinear(skb) && skb_headlen(skb) < PKT_PROT_LEN) {
 			int target = min_t(int, skb->len, PKT_PROT_LEN);
@@ -1590,6 +1546,8 @@ static int xenvif_tx_submit(struct xenvif *vif)
 		if (checksum_setup(vif, skb)) {
 			netdev_dbg(vif->dev,
 				   "Can't setup checksum in net_tx_action\n");
+			if (skb_shinfo(skb)->destructor_arg)
+				skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY;
 			kfree_skb(skb);
 			continue;
 		}
@@ -1601,6 +1559,14 @@ static int xenvif_tx_submit(struct xenvif *vif)
 
 		work_done++;
 
+		/* Set this flag right before netif_receive_skb, otherwise
+		 * someone might think this packet already left netback, and
+		 * do a skb_copy_ubufs while we are still in control of the
+		 * skb. E.g. the __pskb_pull_tail earlier can do such thing.
+		 */
+		if (skb_shinfo(skb)->destructor_arg)
+			skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY;
+
 		netif_receive_skb(skb);
 	}
 
@@ -1711,7 +1677,7 @@ static inline void xenvif_tx_dealloc_action(struct xenvif *vif)
 int xenvif_tx_action(struct xenvif *vif, int budget)
 {
 	unsigned nr_gops;
-	int work_done;
+	int work_done, ret;
 
 	if (unlikely(!tx_work_todo(vif)))
 		return 0;
@@ -1721,7 +1687,13 @@ int xenvif_tx_action(struct xenvif *vif, int budget)
 	if (nr_gops == 0)
 		return 0;
 
-	gnttab_batch_copy(vif->tx_copy_ops, nr_gops);
+	if (nr_gops) {
+		ret = gnttab_map_refs(vif->tx_map_ops,
+			NULL,
+			vif->pages_to_map,
+			nr_gops);
+		BUG_ON(ret);
+	}
 
 	work_done = xenvif_tx_submit(vif);
 
@@ -1732,61 +1704,37 @@ static void xenvif_idx_release(struct xenvif *vif, u16 pending_idx,
 			       u8 status)
 {
 	struct pending_tx_info *pending_tx_info;
-	pending_ring_idx_t head;
+	pending_ring_idx_t index;
 	u16 peek; /* peek into next tx request */
+	unsigned long flags;
 
-	BUG_ON(vif->mmap_pages[pending_idx] == (void *)(~0UL));
-
-	/* Already complete? */
-	if (vif->mmap_pages[pending_idx] == NULL)
-		return;
-
-	pending_tx_info = &vif->pending_tx_info[pending_idx];
-
-	head = pending_tx_info->head;
-
-	BUG_ON(!pending_tx_is_head(vif, head));
-	BUG_ON(vif->pending_ring[pending_index(head)] != pending_idx);
-
-	do {
-		pending_ring_idx_t index;
-		pending_ring_idx_t idx = pending_index(head);
-		u16 info_idx = vif->pending_ring[idx];
-
-		pending_tx_info = &vif->pending_tx_info[info_idx];
+		pending_tx_info = &vif->pending_tx_info[pending_idx];
+		spin_lock_irqsave(&vif->response_lock, flags);
 		make_tx_response(vif, &pending_tx_info->req, status);
-
-		/* Setting any number other than
-		 * INVALID_PENDING_RING_IDX indicates this slot is
-		 * starting a new packet / ending a previous packet.
-		 */
-		pending_tx_info->head = 0;
-
-		index = pending_index(vif->pending_prod++);
-		vif->pending_ring[index] = vif->pending_ring[info_idx];
-
-		peek = vif->pending_ring[pending_index(++head)];
-
-	} while (!pending_tx_is_head(vif, peek));
-
-	put_page(vif->mmap_pages[pending_idx]);
-	vif->mmap_pages[pending_idx] = NULL;
+		index = pending_index(vif->pending_prod);
+		vif->pending_ring[index] = pending_idx;
+		/* TX shouldn't use the index before we give it back here */
+		mb();
+		vif->pending_prod++;
+		spin_unlock_irqrestore(&vif->response_lock, flags);
 }
 
 void xenvif_idx_unmap(struct xenvif *vif, u16 pending_idx)
 {
 	int ret;
+	struct gnttab_unmap_grant_ref tx_unmap_op;
+
 	if (vif->grant_tx_handle[pending_idx] == NETBACK_INVALID_HANDLE) {
 		netdev_err(vif->dev,
 				"Trying to unmap invalid handle! pending_idx: %x\n",
 				pending_idx);
 		return;
 	}
-	gnttab_set_unmap_op(&vif->tx_unmap_ops[0],
+	gnttab_set_unmap_op(&tx_unmap_op,
 			idx_to_kaddr(vif, pending_idx),
 			GNTMAP_host_map,
 			vif->grant_tx_handle[pending_idx]);
-	ret = gnttab_unmap_refs(vif->tx_unmap_ops,
+	ret = gnttab_unmap_refs(&tx_unmap_op,
 			NULL,
 			&vif->mmap_pages[pending_idx],
 			1);
@@ -1845,7 +1793,6 @@ static inline int rx_work_todo(struct xenvif *vif)
 
 static inline int tx_work_todo(struct xenvif *vif)
 {
-
 	if (likely(RING_HAS_UNCONSUMED_REQUESTS(&vif->tx)) &&
 	    (nr_pending_reqs(vif) + XEN_NETBK_LEGACY_SLOTS_MAX
 	     < MAX_PENDING_REQS))

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH net-next v2 2/9] xen-netback: Change TX path from grant copy to mapping
@ 2013-12-12 23:48   ` Zoltan Kiss
  0 siblings, 0 replies; 76+ messages in thread
From: Zoltan Kiss @ 2013-12-12 23:48 UTC (permalink / raw)
  To: ian.campbell, wei.liu2, xen-devel, netdev, linux-kernel, jonathan.davies
  Cc: Zoltan Kiss

This patch changes the grant copy on the TX patch to grant mapping

v2:
- delete branch for handling fragmented packets fit PKT_PROT_LINE sized first
  request
- mark the effect of using ballooned pages in a comment
- place setting of skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY right
  before netif_receive_skb, and mark the importance of it
- grab dealloc_lock before __napi_complete to avoid contention with the
  callback's napi_schedule
- handle fragmented packets where first request < PKT_PROT_LINE
- fix up error path when checksum_setup failed
- check before teardown for pending grants, and start complain if they are
  there after 10 second

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
---
 drivers/net/xen-netback/interface.c |   57 +++++++-
 drivers/net/xen-netback/netback.c   |  257 ++++++++++++++---------------------
 2 files changed, 156 insertions(+), 158 deletions(-)

diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index 1c27e9e..42946de 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -122,7 +122,9 @@ static int xenvif_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	BUG_ON(skb->dev != dev);
 
 	/* Drop the packet if vif is not ready */
-	if (vif->task == NULL || !xenvif_schedulable(vif))
+	if (vif->task == NULL ||
+		vif->dealloc_task == NULL ||
+		!xenvif_schedulable(vif))
 		goto drop;
 
 	/* At best we'll need one slot for the header and one for each
@@ -335,8 +337,25 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
 	vif->pending_prod = MAX_PENDING_REQS;
 	for (i = 0; i < MAX_PENDING_REQS; i++)
 		vif->pending_ring[i] = i;
-	for (i = 0; i < MAX_PENDING_REQS; i++)
-		vif->mmap_pages[i] = NULL;
+	/* If ballooning is disabled, this will consume real memory, so you
+	 * better enable it. The long term solution would be to use just a
+	 * bunch of valid page descriptors, without dependency on ballooning
+	 */
+	err = alloc_xenballooned_pages(MAX_PENDING_REQS,
+		vif->mmap_pages,
+		false);
+	if (err) {
+		netdev_err(dev, "Could not reserve mmap_pages\n");
+		return NULL;
+	}
+	for (i = 0; i < MAX_PENDING_REQS; i++) {
+		vif->pending_tx_info[i].callback_struct = (struct ubuf_info)
+			{ .callback = xenvif_zerocopy_callback,
+			  .ctx = NULL,
+			  .desc = i };
+		vif->grant_tx_handle[i] = NETBACK_INVALID_HANDLE;
+	}
+	init_timer(&vif->dealloc_delay);
 
 	/*
 	 * Initialise a dummy MAC address. We choose the numerically
@@ -380,6 +399,7 @@ int xenvif_connect(struct xenvif *vif, unsigned long tx_ring_ref,
 		goto err;
 
 	init_waitqueue_head(&vif->wq);
+	init_waitqueue_head(&vif->dealloc_wq);
 
 	if (tx_evtchn == rx_evtchn) {
 		/* feature-split-event-channels == 0 */
@@ -421,6 +441,14 @@ int xenvif_connect(struct xenvif *vif, unsigned long tx_ring_ref,
 		goto err_rx_unbind;
 	}
 
+	vif->dealloc_task = kthread_create(xenvif_dealloc_kthread,
+				   (void *)vif, "%s-dealloc", vif->dev->name);
+	if (IS_ERR(vif->dealloc_task)) {
+		pr_warn("Could not allocate kthread for %s\n", vif->dev->name);
+		err = PTR_ERR(vif->dealloc_task);
+		goto err_rx_unbind;
+	}
+
 	vif->task = task;
 
 	rtnl_lock();
@@ -433,6 +461,7 @@ int xenvif_connect(struct xenvif *vif, unsigned long tx_ring_ref,
 	rtnl_unlock();
 
 	wake_up_process(vif->task);
+	wake_up_process(vif->dealloc_task);
 
 	return 0;
 
@@ -470,6 +499,12 @@ void xenvif_disconnect(struct xenvif *vif)
 		vif->task = NULL;
 	}
 
+	if (vif->dealloc_task) {
+		del_timer_sync(&vif->dealloc_delay);
+		kthread_stop(vif->dealloc_task);
+		vif->dealloc_task = NULL;
+	}
+
 	if (vif->tx_irq) {
 		if (vif->tx_irq == vif->rx_irq)
 			unbind_from_irqhandler(vif->tx_irq, vif);
@@ -485,6 +520,22 @@ void xenvif_disconnect(struct xenvif *vif)
 
 void xenvif_free(struct xenvif *vif)
 {
+	int i, unmap_timeout = 0;
+
+	for (i = 0; i < MAX_PENDING_REQS; ++i) {
+		if (vif->grant_tx_handle[i] != NETBACK_INVALID_HANDLE) {
+			i = 0;
+			unmap_timeout++;
+			msleep(1000);
+			if (unmap_timeout > 9 &&
+				net_ratelimit())
+				netdev_err(vif->dev,
+					"Page still granted! Index: %x\n", i);
+		}
+	}
+
+	free_xenballooned_pages(MAX_PENDING_REQS, vif->mmap_pages);
+
 	netif_napi_del(&vif->napi);
 
 	unregister_netdev(vif->dev);
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 3ddc474..20352be 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -645,9 +645,12 @@ static void xenvif_tx_err(struct xenvif *vif,
 			  struct xen_netif_tx_request *txp, RING_IDX end)
 {
 	RING_IDX cons = vif->tx.req_cons;
+	unsigned long flags;
 
 	do {
+		spin_lock_irqsave(&vif->response_lock, flags);
 		make_tx_response(vif, txp, XEN_NETIF_RSP_ERROR);
+		spin_unlock_irqrestore(&vif->response_lock, flags);
 		if (cons == end)
 			break;
 		txp = RING_GET_REQUEST(&vif->tx, cons++);
@@ -786,10 +789,10 @@ static inline void xenvif_tx_create_gop(struct xenvif *vif, u16 pending_idx,
 
 }
 
-static struct gnttab_copy *xenvif_get_requests(struct xenvif *vif,
+static struct gnttab_map_grant_ref *xenvif_get_requests(struct xenvif *vif,
 					       struct sk_buff *skb,
 					       struct xen_netif_tx_request *txp,
-					       struct gnttab_copy *gop)
+					       struct gnttab_map_grant_ref *gop)
 {
 	struct skb_shared_info *shinfo = skb_shinfo(skb);
 	skb_frag_t *frags = shinfo->frags;
@@ -810,83 +813,12 @@ static struct gnttab_copy *xenvif_get_requests(struct xenvif *vif,
 	/* Skip first skb fragment if it is on same page as header fragment. */
 	start = (frag_get_pending_idx(&shinfo->frags[0]) == pending_idx);
 
-	/* Coalesce tx requests, at this point the packet passed in
-	 * should be <= 64K. Any packets larger than 64K have been
-	 * handled in xenvif_count_requests().
-	 */
-	for (shinfo->nr_frags = slot = start; slot < nr_slots;
-	     shinfo->nr_frags++) {
-		struct pending_tx_info *pending_tx_info =
-			vif->pending_tx_info;
-
-		page = alloc_page(GFP_ATOMIC|__GFP_COLD);
-		if (!page)
-			goto err;
-
-		dst_offset = 0;
-		first = NULL;
-		while (dst_offset < PAGE_SIZE && slot < nr_slots) {
-			gop->flags = GNTCOPY_source_gref;
-
-			gop->source.u.ref = txp->gref;
-			gop->source.domid = vif->domid;
-			gop->source.offset = txp->offset;
-
-			gop->dest.domid = DOMID_SELF;
-
-			gop->dest.offset = dst_offset;
-			gop->dest.u.gmfn = virt_to_mfn(page_address(page));
-
-			if (dst_offset + txp->size > PAGE_SIZE) {
-				/* This page can only merge a portion
-				 * of tx request. Do not increment any
-				 * pointer / counter here. The txp
-				 * will be dealt with in future
-				 * rounds, eventually hitting the
-				 * `else` branch.
-				 */
-				gop->len = PAGE_SIZE - dst_offset;
-				txp->offset += gop->len;
-				txp->size -= gop->len;
-				dst_offset += gop->len; /* quit loop */
-			} else {
-				/* This tx request can be merged in the page */
-				gop->len = txp->size;
-				dst_offset += gop->len;
-
+	for (shinfo->nr_frags = start; shinfo->nr_frags < nr_slots;
+	     shinfo->nr_frags++, txp++, gop++) {
 				index = pending_index(vif->pending_cons++);
-
 				pending_idx = vif->pending_ring[index];
-
-				memcpy(&pending_tx_info[pending_idx].req, txp,
-				       sizeof(*txp));
-
-				/* Poison these fields, corresponding
-				 * fields for head tx req will be set
-				 * to correct values after the loop.
-				 */
-				vif->mmap_pages[pending_idx] = (void *)(~0UL);
-				pending_tx_info[pending_idx].head =
-					INVALID_PENDING_RING_IDX;
-
-				if (!first) {
-					first = &pending_tx_info[pending_idx];
-					start_idx = index;
-					head_idx = pending_idx;
-				}
-
-				txp++;
-				slot++;
-			}
-
-			gop++;
-		}
-
-		first->req.offset = 0;
-		first->req.size = dst_offset;
-		first->head = start_idx;
-		vif->mmap_pages[head_idx] = page;
-		frag_set_pending_idx(&frags[shinfo->nr_frags], head_idx);
+		xenvif_tx_create_gop(vif, pending_idx, txp, gop);
+		frag_set_pending_idx(&frags[shinfo->nr_frags], pending_idx);
 	}
 
 	BUG_ON(shinfo->nr_frags > MAX_SKB_FRAGS);
@@ -908,9 +840,9 @@ err:
 
 static int xenvif_tx_check_gop(struct xenvif *vif,
 			       struct sk_buff *skb,
-			       struct gnttab_copy **gopp)
+			       struct gnttab_map_grant_ref **gopp)
 {
-	struct gnttab_copy *gop = *gopp;
+	struct gnttab_map_grant_ref *gop = *gopp;
 	u16 pending_idx = *((u16 *)skb->data);
 	struct skb_shared_info *shinfo = skb_shinfo(skb);
 	struct pending_tx_info *tx_info;
@@ -922,6 +854,16 @@ static int xenvif_tx_check_gop(struct xenvif *vif,
 	err = gop->status;
 	if (unlikely(err))
 		xenvif_idx_release(vif, pending_idx, XEN_NETIF_RSP_ERROR);
+	else {
+		if (vif->grant_tx_handle[pending_idx] !=
+			NETBACK_INVALID_HANDLE) {
+			netdev_err(vif->dev,
+				"Stale mapped handle! pending_idx %x handle %x\n",
+				pending_idx, vif->grant_tx_handle[pending_idx]);
+			xenvif_fatal_tx_err(vif);
+		}
+		vif->grant_tx_handle[pending_idx] = gop->handle;
+	}
 
 	/* Skip first skb fragment if it is on same page as header fragment. */
 	start = (frag_get_pending_idx(&shinfo->frags[0]) == pending_idx);
@@ -935,18 +877,24 @@ static int xenvif_tx_check_gop(struct xenvif *vif,
 		head = tx_info->head;
 
 		/* Check error status: if okay then remember grant handle. */
-		do {
 			newerr = (++gop)->status;
-			if (newerr)
-				break;
-			peek = vif->pending_ring[pending_index(++head)];
-		} while (!pending_tx_is_head(vif, peek));
 
 		if (likely(!newerr)) {
+			if (vif->grant_tx_handle[pending_idx] !=
+				NETBACK_INVALID_HANDLE) {
+				netdev_err(vif->dev,
+					"Stale mapped handle! pending_idx %x handle %x\n",
+					pending_idx,
+					vif->grant_tx_handle[pending_idx]);
+				xenvif_fatal_tx_err(vif);
+			}
+			vif->grant_tx_handle[pending_idx] = gop->handle;
 			/* Had a previous error? Invalidate this fragment. */
-			if (unlikely(err))
+			if (unlikely(err)) {
+				xenvif_idx_unmap(vif, pending_idx);
 				xenvif_idx_release(vif, pending_idx,
 						   XEN_NETIF_RSP_OKAY);
+			}
 			continue;
 		}
 
@@ -959,9 +907,11 @@ static int xenvif_tx_check_gop(struct xenvif *vif,
 
 		/* First error: invalidate header and preceding fragments. */
 		pending_idx = *((u16 *)skb->data);
+		xenvif_idx_unmap(vif, pending_idx);
 		xenvif_idx_release(vif, pending_idx, XEN_NETIF_RSP_OKAY);
 		for (j = start; j < i; j++) {
 			pending_idx = frag_get_pending_idx(&shinfo->frags[j]);
+			xenvif_idx_unmap(vif, pending_idx);
 			xenvif_idx_release(vif, pending_idx,
 					   XEN_NETIF_RSP_OKAY);
 		}
@@ -974,7 +924,8 @@ static int xenvif_tx_check_gop(struct xenvif *vif,
 	return err;
 }
 
-static void xenvif_fill_frags(struct xenvif *vif, struct sk_buff *skb)
+static void xenvif_fill_frags(struct xenvif *vif, struct sk_buff *skb,
+		u16 prev_pending_idx)
 {
 	struct skb_shared_info *shinfo = skb_shinfo(skb);
 	int nr_frags = shinfo->nr_frags;
@@ -988,6 +939,17 @@ static void xenvif_fill_frags(struct xenvif *vif, struct sk_buff *skb)
 
 		pending_idx = frag_get_pending_idx(frag);
 
+		/* If this is not the first frag, chain it to the previous*/
+		if (unlikely(prev_pending_idx == INVALID_PENDING_IDX))
+			skb_shinfo(skb)->destructor_arg =
+				&vif->pending_tx_info[pending_idx].callback_struct;
+		else if (likely(pending_idx != prev_pending_idx))
+			vif->pending_tx_info[prev_pending_idx].callback_struct.ctx =
+				&(vif->pending_tx_info[pending_idx].callback_struct);
+
+		vif->pending_tx_info[pending_idx].callback_struct.ctx = NULL;
+		prev_pending_idx = pending_idx;
+
 		txp = &vif->pending_tx_info[pending_idx].req;
 		page = virt_to_page(idx_to_kaddr(vif, pending_idx));
 		__skb_fill_page_desc(skb, i, page, txp->offset, txp->size);
@@ -995,10 +957,15 @@ static void xenvif_fill_frags(struct xenvif *vif, struct sk_buff *skb)
 		skb->data_len += txp->size;
 		skb->truesize += txp->size;
 
-		/* Take an extra reference to offset xenvif_idx_release */
+		/* Take an extra reference to offset network stack's put_page */
 		get_page(vif->mmap_pages[pending_idx]);
-		xenvif_idx_release(vif, pending_idx, XEN_NETIF_RSP_OKAY);
 	}
+	/* FIXME: __skb_fill_page_desc set this to true because page->pfmemalloc
+	 * overlaps with "index", and "mapping" is not set. I think mapping
+	 * should be set. If delivered to local stack, it would drop this
+	 * skb in sk_filter unless the socket has the right to use it.
+	 */
+	skb->pfmemalloc	= false;
 }
 
 static int xenvif_get_extras(struct xenvif *vif,
@@ -1367,7 +1334,7 @@ static bool tx_credit_exceeded(struct xenvif *vif, unsigned size)
 
 static unsigned xenvif_tx_build_gops(struct xenvif *vif, int budget)
 {
-	struct gnttab_copy *gop = vif->tx_copy_ops, *request_gop;
+	struct gnttab_map_grant_ref *gop = vif->tx_map_ops, *request_gop;
 	struct sk_buff *skb;
 	int ret;
 
@@ -1475,30 +1442,10 @@ static unsigned xenvif_tx_build_gops(struct xenvif *vif, int budget)
 			}
 		}
 
-		/* XXX could copy straight to head */
-		page = xenvif_alloc_page(vif, pending_idx);
-		if (!page) {
-			kfree_skb(skb);
-			xenvif_tx_err(vif, &txreq, idx);
-			break;
-		}
-
-		gop->source.u.ref = txreq.gref;
-		gop->source.domid = vif->domid;
-		gop->source.offset = txreq.offset;
-
-		gop->dest.u.gmfn = virt_to_mfn(page_address(page));
-		gop->dest.domid = DOMID_SELF;
-		gop->dest.offset = txreq.offset;
-
-		gop->len = txreq.size;
-		gop->flags = GNTCOPY_source_gref;
+		xenvif_tx_create_gop(vif, pending_idx, &txreq, gop);
 
 		gop++;
 
-		memcpy(&vif->pending_tx_info[pending_idx].req,
-		       &txreq, sizeof(txreq));
-		vif->pending_tx_info[pending_idx].head = index;
 		*((u16 *)skb->data) = pending_idx;
 
 		__skb_put(skb, data_len);
@@ -1527,17 +1474,17 @@ static unsigned xenvif_tx_build_gops(struct xenvif *vif, int budget)
 
 		vif->tx.req_cons = idx;
 
-		if ((gop-vif->tx_copy_ops) >= ARRAY_SIZE(vif->tx_copy_ops))
+		if ((gop-vif->tx_map_ops) >= ARRAY_SIZE(vif->tx_map_ops))
 			break;
 	}
 
-	return gop - vif->tx_copy_ops;
+	return gop - vif->tx_map_ops;
 }
 
 
 static int xenvif_tx_submit(struct xenvif *vif)
 {
-	struct gnttab_copy *gop = vif->tx_copy_ops;
+	struct gnttab_map_grant_ref *gop = vif->tx_map_ops;
 	struct sk_buff *skb;
 	int work_done = 0;
 
@@ -1561,12 +1508,17 @@ static int xenvif_tx_submit(struct xenvif *vif)
 		memcpy(skb->data,
 		       (void *)(idx_to_kaddr(vif, pending_idx)|txp->offset),
 		       data_len);
+		vif->pending_tx_info[pending_idx].callback_struct.ctx = NULL;
 		if (data_len < txp->size) {
 			/* Append the packet payload as a fragment. */
 			txp->offset += data_len;
 			txp->size -= data_len;
+			skb_shinfo(skb)->destructor_arg =
+				&vif->pending_tx_info[pending_idx].callback_struct;
 		} else {
 			/* Schedule a response immediately. */
+			skb_shinfo(skb)->destructor_arg = NULL;
+			xenvif_idx_unmap(vif, pending_idx);
 			xenvif_idx_release(vif, pending_idx,
 					   XEN_NETIF_RSP_OKAY);
 		}
@@ -1576,7 +1528,11 @@ static int xenvif_tx_submit(struct xenvif *vif)
 		else if (txp->flags & XEN_NETTXF_data_validated)
 			skb->ip_summed = CHECKSUM_UNNECESSARY;
 
-		xenvif_fill_frags(vif, skb);
+		xenvif_fill_frags(vif,
+			skb,
+			skb_shinfo(skb)->destructor_arg ?
+					pending_idx :
+					INVALID_PENDING_IDX);
 
 		if (skb_is_nonlinear(skb) && skb_headlen(skb) < PKT_PROT_LEN) {
 			int target = min_t(int, skb->len, PKT_PROT_LEN);
@@ -1590,6 +1546,8 @@ static int xenvif_tx_submit(struct xenvif *vif)
 		if (checksum_setup(vif, skb)) {
 			netdev_dbg(vif->dev,
 				   "Can't setup checksum in net_tx_action\n");
+			if (skb_shinfo(skb)->destructor_arg)
+				skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY;
 			kfree_skb(skb);
 			continue;
 		}
@@ -1601,6 +1559,14 @@ static int xenvif_tx_submit(struct xenvif *vif)
 
 		work_done++;
 
+		/* Set this flag right before netif_receive_skb, otherwise
+		 * someone might think this packet already left netback, and
+		 * do a skb_copy_ubufs while we are still in control of the
+		 * skb. E.g. the __pskb_pull_tail earlier can do such a thing.
+		 */
+		if (skb_shinfo(skb)->destructor_arg)
+			skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY;
+
 		netif_receive_skb(skb);
 	}
 
@@ -1711,7 +1677,7 @@ static inline void xenvif_tx_dealloc_action(struct xenvif *vif)
 int xenvif_tx_action(struct xenvif *vif, int budget)
 {
 	unsigned nr_gops;
-	int work_done;
+	int work_done, ret;
 
 	if (unlikely(!tx_work_todo(vif)))
 		return 0;
@@ -1721,7 +1687,13 @@ int xenvif_tx_action(struct xenvif *vif, int budget)
 	if (nr_gops == 0)
 		return 0;
 
-	gnttab_batch_copy(vif->tx_copy_ops, nr_gops);
+	if (nr_gops) {
+		ret = gnttab_map_refs(vif->tx_map_ops,
+			NULL,
+			vif->pages_to_map,
+			nr_gops);
+		BUG_ON(ret);
+	}
 
 	work_done = xenvif_tx_submit(vif);
 
@@ -1732,61 +1704,37 @@ static void xenvif_idx_release(struct xenvif *vif, u16 pending_idx,
 			       u8 status)
 {
 	struct pending_tx_info *pending_tx_info;
-	pending_ring_idx_t head;
+	pending_ring_idx_t index;
 	u16 peek; /* peek into next tx request */
+	unsigned long flags;
 
-	BUG_ON(vif->mmap_pages[pending_idx] == (void *)(~0UL));
-
-	/* Already complete? */
-	if (vif->mmap_pages[pending_idx] == NULL)
-		return;
-
-	pending_tx_info = &vif->pending_tx_info[pending_idx];
-
-	head = pending_tx_info->head;
-
-	BUG_ON(!pending_tx_is_head(vif, head));
-	BUG_ON(vif->pending_ring[pending_index(head)] != pending_idx);
-
-	do {
-		pending_ring_idx_t index;
-		pending_ring_idx_t idx = pending_index(head);
-		u16 info_idx = vif->pending_ring[idx];
-
-		pending_tx_info = &vif->pending_tx_info[info_idx];
+		pending_tx_info = &vif->pending_tx_info[pending_idx];
+		spin_lock_irqsave(&vif->response_lock, flags);
 		make_tx_response(vif, &pending_tx_info->req, status);
-
-		/* Setting any number other than
-		 * INVALID_PENDING_RING_IDX indicates this slot is
-		 * starting a new packet / ending a previous packet.
-		 */
-		pending_tx_info->head = 0;
-
-		index = pending_index(vif->pending_prod++);
-		vif->pending_ring[index] = vif->pending_ring[info_idx];
-
-		peek = vif->pending_ring[pending_index(++head)];
-
-	} while (!pending_tx_is_head(vif, peek));
-
-	put_page(vif->mmap_pages[pending_idx]);
-	vif->mmap_pages[pending_idx] = NULL;
+		index = pending_index(vif->pending_prod);
+		vif->pending_ring[index] = pending_idx;
+		/* TX shouldn't use the index before we give it back here */
+		mb();
+		vif->pending_prod++;
+		spin_unlock_irqrestore(&vif->response_lock, flags);
 }
 
 void xenvif_idx_unmap(struct xenvif *vif, u16 pending_idx)
 {
 	int ret;
+	struct gnttab_unmap_grant_ref tx_unmap_op;
+
 	if (vif->grant_tx_handle[pending_idx] == NETBACK_INVALID_HANDLE) {
 		netdev_err(vif->dev,
 				"Trying to unmap invalid handle! pending_idx: %x\n",
 				pending_idx);
 		return;
 	}
-	gnttab_set_unmap_op(&vif->tx_unmap_ops[0],
+	gnttab_set_unmap_op(&tx_unmap_op,
 			idx_to_kaddr(vif, pending_idx),
 			GNTMAP_host_map,
 			vif->grant_tx_handle[pending_idx]);
-	ret = gnttab_unmap_refs(vif->tx_unmap_ops,
+	ret = gnttab_unmap_refs(&tx_unmap_op,
 			NULL,
 			&vif->mmap_pages[pending_idx],
 			1);
@@ -1845,7 +1793,6 @@ static inline int rx_work_todo(struct xenvif *vif)
 
 static inline int tx_work_todo(struct xenvif *vif)
 {
-
 	if (likely(RING_HAS_UNCONSUMED_REQUESTS(&vif->tx)) &&
 	    (nr_pending_reqs(vif) + XEN_NETBK_LEGACY_SLOTS_MAX
 	     < MAX_PENDING_REQS))
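
The net effect of the destructor_arg/ctx handling above is that the mapped
slots of an skb form a singly linked list of ubuf_info structures:
destructor_arg points to the first slot's callback_struct, each ctx points to
the next one, and the last ctx is NULL. A minimal sketch of the linking step,
using the driver's structures from this series but a made-up helper name:

/* Sketch only, not the actual driver code: link one more mapped slot
 * into the skb's zerocopy callback chain.
 */
static void chain_slot(struct xenvif *vif, struct sk_buff *skb,
		       u16 pending_idx, u16 *prev_pending_idx)
{
	struct ubuf_info *cb =
		&vif->pending_tx_info[pending_idx].callback_struct;

	if (*prev_pending_idx == INVALID_PENDING_IDX)
		/* First slot: anchor the chain in the skb itself */
		skb_shinfo(skb)->destructor_arg = cb;
	else
		/* Append to the previous slot's callback_struct */
		vif->pending_tx_info[*prev_pending_idx].callback_struct.ctx = cb;

	cb->ctx = NULL;	/* the newest slot terminates the chain */
	*prev_pending_idx = pending_idx;
}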

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH net-next v2 3/9] xen-netback: Remove old TX grant copy definitons and fix indentations
  2013-12-12 23:48 [PATCH net-next v2 0/9] xen-netback: TX grant mapping with SKBTX_DEV_ZEROCOPY instead of copy Zoltan Kiss
@ 2013-12-12 23:48   ` Zoltan Kiss
  2013-12-12 23:48 ` Zoltan Kiss
                     ` (16 subsequent siblings)
  17 siblings, 0 replies; 76+ messages in thread
From: Zoltan Kiss @ 2013-12-12 23:48 UTC (permalink / raw)
  To: ian.campbell, wei.liu2, xen-devel, netdev, linux-kernel, jonathan.davies
  Cc: Zoltan Kiss

These became obsolete with grant mapping. I've intentionally left the
indentation like this to improve the readability of the previous patches.

v2:
- move the indentation fixup patch here

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
---
 drivers/net/xen-netback/common.h  |   37 +------------------
 drivers/net/xen-netback/netback.c |   72 ++++++++-----------------------------
 2 files changed, 15 insertions(+), 94 deletions(-)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index 33cb12c..f286879 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -46,39 +46,9 @@
 #include <xen/xenbus.h>
 
 typedef unsigned int pending_ring_idx_t;
-#define INVALID_PENDING_RING_IDX (~0U)
 
-/* For the head field in pending_tx_info: it is used to indicate
- * whether this tx info is the head of one or more coalesced requests.
- *
- * When head != INVALID_PENDING_RING_IDX, it means the start of a new
- * tx requests queue and the end of previous queue.
- *
- * An example sequence of head fields (I = INVALID_PENDING_RING_IDX):
- *
- * ...|0 I I I|5 I|9 I I I|...
- * -->|<-INUSE----------------
- *
- * After consuming the first slot(s) we have:
- *
- * ...|V V V V|5 I|9 I I I|...
- * -----FREE->|<-INUSE--------
- *
- * where V stands for "valid pending ring index". Any number other
- * than INVALID_PENDING_RING_IDX is OK. These entries are considered
- * free and can contain any number other than
- * INVALID_PENDING_RING_IDX. In practice we use 0.
- *
- * The in use non-INVALID_PENDING_RING_IDX (say 0, 5 and 9 in the
- * above example) number is the index into pending_tx_info and
- * mmap_pages arrays.
- */
 struct pending_tx_info {
-	struct xen_netif_tx_request req; /* coalesced tx request */
-	pending_ring_idx_t head; /* head != INVALID_PENDING_RING_IDX
-				  * if it is head of one or more tx
-				  * reqs
-				  */
+	struct xen_netif_tx_request req; /* tx request */
 	/* callback data for released SKBs. The	callback is always
 	 * xenvif_zerocopy_callback, ctx points to the next fragment, desc
 	 * contains the pending_idx
@@ -128,11 +98,6 @@ struct xenvif {
 	struct pending_tx_info pending_tx_info[MAX_PENDING_REQS];
 	grant_handle_t grant_tx_handle[MAX_PENDING_REQS];
 
-	/* Coalescing tx requests before copying makes number of grant
-	 * copy ops greater or equal to number of slots required. In
-	 * worst case a tx request consumes 2 gnttab_copy.
-	 */
-	struct gnttab_copy tx_copy_ops[2*MAX_PENDING_REQS];
 	struct gnttab_unmap_grant_ref tx_unmap_ops[MAX_PENDING_REQS];
 	struct gnttab_map_grant_ref tx_map_ops[MAX_PENDING_REQS];
 	/* passed to gnttab_[un]map_refs with pages under (un)mapping */
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 20352be..88a0fad 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -71,16 +71,6 @@ module_param(fatal_skb_slots, uint, 0444);
  */
 #define XEN_NETBK_LEGACY_SLOTS_MAX XEN_NETIF_NR_SLOTS_MIN
 
-/*
- * If head != INVALID_PENDING_RING_IDX, it means this tx request is head of
- * one or more merged tx requests, otherwise it is the continuation of
- * previous tx request.
- */
-static inline int pending_tx_is_head(struct xenvif *vif, RING_IDX idx)
-{
-	return vif->pending_tx_info[idx].head != INVALID_PENDING_RING_IDX;
-}
-
 static void xenvif_idx_release(struct xenvif *vif, u16 pending_idx,
 			       u8 status);
 
@@ -762,19 +752,6 @@ static int xenvif_count_requests(struct xenvif *vif,
 	return slots;
 }
 
-static struct page *xenvif_alloc_page(struct xenvif *vif,
-				      u16 pending_idx)
-{
-	struct page *page;
-
-	page = alloc_page(GFP_ATOMIC|__GFP_COLD);
-	if (!page)
-		return NULL;
-	vif->mmap_pages[pending_idx] = page;
-
-	return page;
-}
-
 static inline void xenvif_tx_create_gop(struct xenvif *vif, u16 pending_idx,
 	       struct xen_netif_tx_request *txp,
 	       struct gnttab_map_grant_ref *gop)
@@ -797,13 +774,9 @@ static struct gnttab_map_grant_ref *xenvif_get_requests(struct xenvif *vif,
 	struct skb_shared_info *shinfo = skb_shinfo(skb);
 	skb_frag_t *frags = shinfo->frags;
 	u16 pending_idx = *((u16 *)skb->data);
-	u16 head_idx = 0;
-	int slot, start;
-	struct page *page;
-	pending_ring_idx_t index, start_idx = 0;
-	uint16_t dst_offset;
+	int start;
+	pending_ring_idx_t index;
 	unsigned int nr_slots;
-	struct pending_tx_info *first = NULL;
 
 	/* At this point shinfo->nr_frags is in fact the number of
 	 * slots, which can be as large as XEN_NETBK_LEGACY_SLOTS_MAX.
@@ -815,8 +788,8 @@ static struct gnttab_map_grant_ref *xenvif_get_requests(struct xenvif *vif,
 
 	for (shinfo->nr_frags = start; shinfo->nr_frags < nr_slots;
 	     shinfo->nr_frags++, txp++, gop++) {
-				index = pending_index(vif->pending_cons++);
-				pending_idx = vif->pending_ring[index];
+		index = pending_index(vif->pending_cons++);
+		pending_idx = vif->pending_ring[index];
 		xenvif_tx_create_gop(vif, pending_idx, txp, gop);
 		frag_set_pending_idx(&frags[shinfo->nr_frags], pending_idx);
 	}
@@ -824,18 +797,6 @@ static struct gnttab_map_grant_ref *xenvif_get_requests(struct xenvif *vif,
 	BUG_ON(shinfo->nr_frags > MAX_SKB_FRAGS);
 
 	return gop;
-err:
-	/* Unwind, freeing all pages and sending error responses. */
-	while (shinfo->nr_frags-- > start) {
-		xenvif_idx_release(vif,
-				frag_get_pending_idx(&frags[shinfo->nr_frags]),
-				XEN_NETIF_RSP_ERROR);
-	}
-	/* The head too, if necessary. */
-	if (start)
-		xenvif_idx_release(vif, pending_idx, XEN_NETIF_RSP_ERROR);
-
-	return NULL;
 }
 
 static int xenvif_tx_check_gop(struct xenvif *vif,
@@ -848,7 +809,6 @@ static int xenvif_tx_check_gop(struct xenvif *vif,
 	struct pending_tx_info *tx_info;
 	int nr_frags = shinfo->nr_frags;
 	int i, err, start;
-	u16 peek; /* peek into next tx request */
 
 	/* Check status of header. */
 	err = gop->status;
@@ -870,14 +830,12 @@ static int xenvif_tx_check_gop(struct xenvif *vif,
 
 	for (i = start; i < nr_frags; i++) {
 		int j, newerr;
-		pending_ring_idx_t head;
 
 		pending_idx = frag_get_pending_idx(&shinfo->frags[i]);
 		tx_info = &vif->pending_tx_info[pending_idx];
-		head = tx_info->head;
 
 		/* Check error status: if okay then remember grant handle. */
-			newerr = (++gop)->status;
+		newerr = (++gop)->status;
 
 		if (likely(!newerr)) {
 			if (vif->grant_tx_handle[pending_idx] !=
@@ -1343,7 +1301,6 @@ static unsigned xenvif_tx_build_gops(struct xenvif *vif, int budget)
 		(skb_queue_len(&vif->tx_queue) < budget)) {
 		struct xen_netif_tx_request txreq;
 		struct xen_netif_tx_request txfrags[XEN_NETBK_LEGACY_SLOTS_MAX];
-		struct page *page;
 		struct xen_netif_extra_info extras[XEN_NETIF_EXTRA_TYPE_MAX-1];
 		u16 pending_idx;
 		RING_IDX idx;
@@ -1705,18 +1662,17 @@ static void xenvif_idx_release(struct xenvif *vif, u16 pending_idx,
 {
 	struct pending_tx_info *pending_tx_info;
 	pending_ring_idx_t index;
-	u16 peek; /* peek into next tx request */
 	unsigned long flags;
 
-		pending_tx_info = &vif->pending_tx_info[pending_idx];
-		spin_lock_irqsave(&vif->response_lock, flags);
-		make_tx_response(vif, &pending_tx_info->req, status);
-		index = pending_index(vif->pending_prod);
-		vif->pending_ring[index] = pending_idx;
-		/* TX shouldn't use the index before we give it back here */
-		mb();
-		vif->pending_prod++;
-		spin_unlock_irqrestore(&vif->response_lock, flags);
+	pending_tx_info = &vif->pending_tx_info[pending_idx];
+	spin_lock_irqsave(&vif->response_lock, flags);
+	make_tx_response(vif, &pending_tx_info->req, status);
+	index = pending_index(vif->pending_prod);
+	vif->pending_ring[index] = pending_idx;
+	/* TX shouldn't use the index before we give it back here */
+	mb();
+	vif->pending_prod++;
+	spin_unlock_irqrestore(&vif->response_lock, flags);
 }
 
 void xenvif_idx_unmap(struct xenvif *vif, u16 pending_idx)
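
The re-indented xenvif_idx_release() also shows the new way a slot is returned
to the pending ring: the ring entry must be visible before pending_prod moves,
hence the mb(). A condensed sketch of that producer-side ordering (helper name
made up, locking omitted, fields as in the driver):

/* Sketch: give a slot back to the pending ring. The barrier ensures the
 * ring entry is written before the producer index is advanced, so the
 * TX path can never read a stale pending_idx.
 */
static void give_back_slot(struct xenvif *vif, u16 pending_idx)
{
	pending_ring_idx_t index = pending_index(vif->pending_prod);

	vif->pending_ring[index] = pending_idx;
	mb();	/* entry first, index second */
	vif->pending_prod++;
}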

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH net-next v2 4/9] xen-netback: Change RX path for mapped SKB fragments
  2013-12-12 23:48 [PATCH net-next v2 0/9] xen-netback: TX grant mapping with SKBTX_DEV_ZEROCOPY instead of copy Zoltan Kiss
                   ` (3 preceding siblings ...)
  2013-12-12 23:48   ` Zoltan Kiss
@ 2013-12-12 23:48 ` Zoltan Kiss
  2013-12-12 23:48 ` Zoltan Kiss
                   ` (12 subsequent siblings)
  17 siblings, 0 replies; 76+ messages in thread
From: Zoltan Kiss @ 2013-12-12 23:48 UTC (permalink / raw)
  To: ian.campbell, wei.liu2, xen-devel, netdev, linux-kernel, jonathan.davies
  Cc: Zoltan Kiss

The RX path needs to know whether the SKB fragments are stored on pages from
another domain.
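
Concretely, the grant-copy source is now chosen per fragment: pages that were
grant-mapped from another domain are addressed by that domain's grant
reference, local pages by their MFN. A condensed sketch of the selection done
in xenvif_gop_frag_copy() below (fields as in the patch, the helper itself is
illustrative only):

/* Sketch: pick the grant-copy source for one fragment */
static void set_copy_source(struct gnttab_copy *copy_gop, struct page *page,
			    struct xenvif *foreign_vif,
			    grant_ref_t foreign_gref)
{
	if (foreign_vif) {
		/* Foreign page: let the hypervisor resolve the gref */
		copy_gop->source.domid = foreign_vif->domid;
		copy_gop->source.u.ref = foreign_gref;
		copy_gop->flags |= GNTCOPY_source_gref;
	} else {
		/* Local page: address it directly by its MFN */
		copy_gop->source.domid = DOMID_SELF;
		copy_gop->source.u.gmfn = virt_to_mfn(page_address(page));
	}
}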

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
---
 drivers/net/xen-netback/netback.c |   46 +++++++++++++++++++++++++++++++++----
 1 file changed, 41 insertions(+), 5 deletions(-)

diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 10d0cf0..e070475 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -322,7 +322,9 @@ static struct xenvif_rx_meta *get_next_rx_buffer(struct xenvif *vif,
 static void xenvif_gop_frag_copy(struct xenvif *vif, struct sk_buff *skb,
 				 struct netrx_pending_operations *npo,
 				 struct page *page, unsigned long size,
-				 unsigned long offset, int *head)
+				 unsigned long offset, int *head,
+				 struct xenvif *foreign_vif,
+				 grant_ref_t foreign_gref)
 {
 	struct gnttab_copy *copy_gop;
 	struct xenvif_rx_meta *meta;
@@ -364,8 +366,15 @@ static void xenvif_gop_frag_copy(struct xenvif *vif, struct sk_buff *skb,
 		copy_gop->flags = GNTCOPY_dest_gref;
 		copy_gop->len = bytes;
 
-		copy_gop->source.domid = DOMID_SELF;
-		copy_gop->source.u.gmfn = virt_to_mfn(page_address(page));
+		if (foreign_vif) {
+			copy_gop->source.domid = foreign_vif->domid;
+			copy_gop->source.u.ref = foreign_gref;
+			copy_gop->flags |= GNTCOPY_source_gref;
+		} else {
+			copy_gop->source.domid = DOMID_SELF;
+			copy_gop->source.u.gmfn =
+				virt_to_mfn(page_address(page));
+		}
 		copy_gop->source.offset = offset;
 
 		copy_gop->dest.domid = vif->domid;
@@ -426,6 +435,9 @@ static int xenvif_gop_skb(struct sk_buff *skb,
 	int old_meta_prod;
 	int gso_type;
 	int gso_size;
+	struct ubuf_info *ubuf = skb_shinfo(skb)->destructor_arg;
+	grant_ref_t foreign_grefs[MAX_SKB_FRAGS];
+	struct xenvif *foreign_vif = NULL;
 
 	old_meta_prod = npo->meta_prod;
 
@@ -466,6 +478,26 @@ static int xenvif_gop_skb(struct sk_buff *skb,
 	npo->copy_off = 0;
 	npo->copy_gref = req->gref;
 
+	if ((skb_shinfo(skb)->tx_flags & SKBTX_DEV_ZEROCOPY) &&
+		 (ubuf->callback == &xenvif_zerocopy_callback)) {
+		u16 pending_idx = ubuf->desc;
+		int i = 0;
+		struct pending_tx_info *temp =
+			container_of(ubuf,
+				struct pending_tx_info,
+				callback_struct);
+		foreign_vif =
+			container_of(temp - pending_idx,
+				struct xenvif,
+				pending_tx_info[0]);
+		do {
+			pending_idx = ubuf->desc;
+			foreign_grefs[i++] =
+				foreign_vif->pending_tx_info[pending_idx].req.gref;
+			ubuf = (struct ubuf_info *) ubuf->ctx;
+		} while (ubuf);
+	}
+
 	data = skb->data;
 	while (data < skb_tail_pointer(skb)) {
 		unsigned int offset = offset_in_page(data);
@@ -475,7 +507,9 @@ static int xenvif_gop_skb(struct sk_buff *skb,
 			len = skb_tail_pointer(skb) - data;
 
 		xenvif_gop_frag_copy(vif, skb, npo,
-				     virt_to_page(data), len, offset, &head);
+				     virt_to_page(data), len, offset, &head,
+				     NULL,
+				     0);
 		data += len;
 	}
 
@@ -484,7 +518,9 @@ static int xenvif_gop_skb(struct sk_buff *skb,
 				     skb_frag_page(&skb_shinfo(skb)->frags[i]),
 				     skb_frag_size(&skb_shinfo(skb)->frags[i]),
 				     skb_shinfo(skb)->frags[i].page_offset,
-				     &head);
+				     &head,
+				     foreign_vif,
+				     foreign_grefs[i]);
 	}
 
 	return npo->meta_prod - old_meta_prod;

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH net-next v2 5/9] xen-netback: Add stat counters for zerocopy
  2013-12-12 23:48 [PATCH net-next v2 0/9] xen-netback: TX grant mapping with SKBTX_DEV_ZEROCOPY instead of copy Zoltan Kiss
                   ` (5 preceding siblings ...)
  2013-12-12 23:48 ` Zoltan Kiss
@ 2013-12-12 23:48 ` Zoltan Kiss
  2013-12-12 23:48 ` Zoltan Kiss
                   ` (10 subsequent siblings)
  17 siblings, 0 replies; 76+ messages in thread
From: Zoltan Kiss @ 2013-12-12 23:48 UTC (permalink / raw)
  To: ian.campbell, wei.liu2, xen-devel, netdev, linux-kernel, jonathan.davies
  Cc: Zoltan Kiss

These counters help determine how often the buffers had to be copied. They
also help to spot leaks: if sent != success + fail, some packets were probably
never freed up properly.

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
---
 drivers/net/xen-netback/common.h    |    3 +++
 drivers/net/xen-netback/interface.c |   15 +++++++++++++++
 drivers/net/xen-netback/netback.c   |    9 ++++++++-
 3 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index 419e63c..e3c28ff 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -155,6 +155,9 @@ struct xenvif {
 
 	/* Statistics */
 	unsigned long rx_gso_checksum_fixup;
+	unsigned long tx_zerocopy_sent;
+	unsigned long tx_zerocopy_success;
+	unsigned long tx_zerocopy_fail;
 
 	/* Miscellaneous private stuff. */
 	struct net_device *dev;
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index af5216f..75fe683 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -239,6 +239,21 @@ static const struct xenvif_stat {
 		"rx_gso_checksum_fixup",
 		offsetof(struct xenvif, rx_gso_checksum_fixup)
 	},
+	/* If (sent != success + fail), there are probably packets never
+	 * freed up properly!
+	 */
+	{
+		"tx_zerocopy_sent",
+		offsetof(struct xenvif, tx_zerocopy_sent),
+	},
+	{
+		"tx_zerocopy_success",
+		offsetof(struct xenvif, tx_zerocopy_success),
+	},
+	{
+		"tx_zerocopy_fail",
+		offsetof(struct xenvif, tx_zerocopy_fail)
+	},
 };
 
 static int xenvif_get_sset_count(struct net_device *dev, int string_set)
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index a1b03e4..e2dd565 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -1611,8 +1611,10 @@ static int xenvif_tx_submit(struct xenvif *vif, int budget)
 		 * skb_copy_ubufs while we are still in control of the skb. E.g.
 		 * the __pskb_pull_tail earlier can do such thing.
 		 */
-		if (skb_shinfo(skb)->destructor_arg)
+		if (skb_shinfo(skb)->destructor_arg) {
 			skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY;
+			vif->tx_zerocopy_sent++;
+		}
 
 		netif_receive_skb(skb);
 	}
@@ -1645,6 +1647,11 @@ void xenvif_zerocopy_callback(struct ubuf_info *ubuf, bool zerocopy_success)
 		napi_schedule(&vif->napi);
 	} while (ubuf);
 	spin_unlock_irqrestore(&vif->dealloc_lock, flags);
+
+	if (likely(zerocopy_success))
+		vif->tx_zerocopy_success++;
+	else
+		vif->tx_zerocopy_fail++;
 }
 
 static inline void xenvif_tx_action_dealloc(struct xenvif *vif)

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH net-next v2 6/9] xen-netback: Handle guests with too many frags
  2013-12-12 23:48 [PATCH net-next v2 0/9] xen-netback: TX grant mapping with SKBTX_DEV_ZEROCOPY instead of copy Zoltan Kiss
                   ` (8 preceding siblings ...)
  2013-12-12 23:48 ` [PATCH net-next v2 6/9] xen-netback: Handle guests with too many frags Zoltan Kiss
@ 2013-12-12 23:48 ` Zoltan Kiss
  2013-12-13 15:43   ` Wei Liu
  2013-12-13 15:43   ` Wei Liu
  2013-12-12 23:48 ` [PATCH net-next v2 7/9] xen-netback: Add stat counters for frag_list skbs Zoltan Kiss
                   ` (7 subsequent siblings)
  17 siblings, 2 replies; 76+ messages in thread
From: Zoltan Kiss @ 2013-12-12 23:48 UTC (permalink / raw)
  To: ian.campbell, wei.liu2, xen-devel, netdev, linux-kernel, jonathan.davies
  Cc: Zoltan Kiss

The Xen network protocol had an implicit dependency on MAX_SKB_FRAGS. Netback
has to handle guests sending up to XEN_NETBK_LEGACY_SLOTS_MAX slots. To
achieve that (see the sketch after this list):
- create a new skb
- map the leftover slots to its frags (no linear buffer here!)
- chain it to the previous one through skb_shinfo(skb)->frag_list
- map them
- copy everything into a brand new skb and send it to the stack
- unmap the two old skbs' pages
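
A condensed sketch of the submit-side flow for such packets, based on the
hunks below (error handling omitted, so treat it as an illustration rather
than the exact driver code):

/* Sketch: coalesce the two-skb zerocopy chain into one new skb before
 * handing it to the stack, then free the mapped originals.
 */
if (skb_shinfo(skb)->frag_list) {
	struct sk_buff *nskb = skb_shinfo(skb)->frag_list;

	xenvif_fill_frags(vif, nskb, INVALID_PENDING_IDX);
	skb->len += nskb->len;		/* account the extra frags */
	skb->data_len += nskb->len;
	skb->truesize += nskb->truesize;

	/* Both skbs hold grant-mapped pages, so both are zerocopy */
	skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY;
	skb_shinfo(nskb)->tx_flags |= SKBTX_DEV_ZEROCOPY;

	nskb = skb;			/* remember the old chain head */
	skb = skb_copy_expand(skb, 0, 0, GFP_ATOMIC | __GFP_NOWARN);
	skb_shinfo(skb)->destructor_arg = NULL;	/* the copy owns no grants */
	/* ... after netif_receive_skb(skb), kfree_skb(nskb) fires the
	 * zerocopy callback and releases the grants
	 */
}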

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>

---
 drivers/net/xen-netback/netback.c |   99 +++++++++++++++++++++++++++++++++++--
 1 file changed, 94 insertions(+), 5 deletions(-)

diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index e26cdda..f6ed1c8 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -906,11 +906,15 @@ static struct gnttab_map_grant_ref *xenvif_get_requests(struct xenvif *vif,
 	u16 pending_idx = *((u16 *)skb->data);
 	int start;
 	pending_ring_idx_t index;
-	unsigned int nr_slots;
+	unsigned int nr_slots, frag_overflow = 0;
 
 	/* At this point shinfo->nr_frags is in fact the number of
 	 * slots, which can be as large as XEN_NETBK_LEGACY_SLOTS_MAX.
 	 */
+	if (shinfo->nr_frags > MAX_SKB_FRAGS) {
+		frag_overflow = shinfo->nr_frags - MAX_SKB_FRAGS;
+		shinfo->nr_frags = MAX_SKB_FRAGS;
+	}
 	nr_slots = shinfo->nr_frags;
 
 	/* Skip first skb fragment if it is on same page as header fragment. */
@@ -926,6 +930,33 @@ static struct gnttab_map_grant_ref *xenvif_get_requests(struct xenvif *vif,
 
 	BUG_ON(shinfo->nr_frags > MAX_SKB_FRAGS);
 
+	if (frag_overflow) {
+		struct sk_buff *nskb = alloc_skb(NET_SKB_PAD + NET_IP_ALIGN,
+				GFP_ATOMIC | __GFP_NOWARN);
+		if (unlikely(nskb == NULL)) {
+			netdev_err(vif->dev,
+				   "Can't allocate the frag_list skb.\n");
+			return NULL;
+		}
+
+		/* Packets passed to netif_rx() must have some headroom. */
+		skb_reserve(nskb, NET_SKB_PAD + NET_IP_ALIGN);
+
+		shinfo = skb_shinfo(nskb);
+		frags = shinfo->frags;
+
+		for (shinfo->nr_frags = 0; shinfo->nr_frags < frag_overflow;
+		     shinfo->nr_frags++, txp++, gop++) {
+			index = pending_index(vif->pending_cons++);
+			pending_idx = vif->pending_ring[index];
+			xenvif_tx_create_gop(vif, pending_idx, txp, gop);
+			frag_set_pending_idx(&frags[shinfo->nr_frags],
+				pending_idx);
+		}
+
+		skb_shinfo(skb)->frag_list = nskb;
+	}
+
 	return gop;
 }
 
@@ -939,6 +970,7 @@ static int xenvif_tx_check_gop(struct xenvif *vif,
 	struct pending_tx_info *tx_info;
 	int nr_frags = shinfo->nr_frags;
 	int i, err, start;
+	struct sk_buff *first_skb = NULL;
 
 	/* Check status of header. */
 	err = gop->status;
@@ -958,6 +990,7 @@ static int xenvif_tx_check_gop(struct xenvif *vif,
 	/* Skip first skb fragment if it is on same page as header fragment. */
 	start = (frag_get_pending_idx(&shinfo->frags[0]) == pending_idx);
 
+check_frags:
 	for (i = start; i < nr_frags; i++) {
 		int j, newerr;
 
@@ -992,11 +1025,20 @@ static int xenvif_tx_check_gop(struct xenvif *vif,
 		/* Not the first error? Preceding frags already invalidated. */
 		if (err)
 			continue;
-
 		/* First error: invalidate header and preceding fragments. */
-		pending_idx = *((u16 *)skb->data);
-		xenvif_idx_unmap(vif, pending_idx);
-		xenvif_idx_release(vif, pending_idx, XEN_NETIF_RSP_OKAY);
+		if (!first_skb) {
+			pending_idx = *((u16 *)skb->data);
+			xenvif_idx_unmap(vif, pending_idx);
+			xenvif_idx_release(vif,
+				pending_idx,
+				XEN_NETIF_RSP_OKAY);
+		} else {
+			pending_idx = *((u16 *)first_skb->data);
+			xenvif_idx_unmap(vif, pending_idx);
+			xenvif_idx_release(vif,
+				pending_idx,
+				XEN_NETIF_RSP_OKAY);
+		}
 		for (j = start; j < i; j++) {
 			pending_idx = frag_get_pending_idx(&shinfo->frags[j]);
 			xenvif_idx_unmap(vif, pending_idx);
@@ -1008,6 +1050,32 @@ static int xenvif_tx_check_gop(struct xenvif *vif,
 		err = newerr;
 	}
 
+	if (shinfo->frag_list) {
+		first_skb = skb;
+		skb = shinfo->frag_list;
+		shinfo = skb_shinfo(skb);
+		nr_frags = shinfo->nr_frags;
+		start = 0;
+
+		goto check_frags;
+	}
+
+	/* There was a mapping error in the frag_list skb. We have to unmap
+	 * the first skb's frags
+	 */
+	if (first_skb && err) {
+		int j;
+		shinfo = skb_shinfo(first_skb);
+		pending_idx = *((u16 *)first_skb->data);
+		start = (frag_get_pending_idx(&shinfo->frags[0]) == pending_idx);
+		for (j = start; j < shinfo->nr_frags; j++) {
+			pending_idx = frag_get_pending_idx(&shinfo->frags[j]);
+			xenvif_idx_unmap(vif, pending_idx);
+			xenvif_idx_release(vif, pending_idx,
+					   XEN_NETIF_RSP_OKAY);
+		}
+	}
+
 	*gopp = gop + 1;
 	return err;
 }
@@ -1541,6 +1609,7 @@ static int xenvif_tx_submit(struct xenvif *vif, int budget)
 		struct xen_netif_tx_request *txp;
 		u16 pending_idx;
 		unsigned data_len;
+		struct sk_buff *nskb = NULL;
 
 		pending_idx = *((u16 *)skb->data);
 		txp = &vif->pending_tx_info[pending_idx].req;
@@ -1583,6 +1652,23 @@ static int xenvif_tx_submit(struct xenvif *vif, int budget)
 					pending_idx :
 					INVALID_PENDING_IDX);
 
+		if (skb_shinfo(skb)->frag_list) {
+			nskb = skb_shinfo(skb)->frag_list;
+			xenvif_fill_frags(vif, nskb, INVALID_PENDING_IDX);
+			skb->len += nskb->len;
+			skb->data_len += nskb->len;
+			skb->truesize += nskb->truesize;
+			skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY;
+			skb_shinfo(nskb)->tx_flags |= SKBTX_DEV_ZEROCOPY;
+			vif->tx_zerocopy_sent += 2;
+			nskb = skb;
+
+			skb = skb_copy_expand(skb,
+					0,
+					0,
+					GFP_ATOMIC | __GFP_NOWARN);
+			skb_shinfo(skb)->destructor_arg = NULL;
+		}
 		if (skb_is_nonlinear(skb) && skb_headlen(skb) < PKT_PROT_LEN) {
 			int target = min_t(int, skb->len, PKT_PROT_LEN);
 			__pskb_pull_tail(skb, target - skb_headlen(skb));
@@ -1619,6 +1705,9 @@ static int xenvif_tx_submit(struct xenvif *vif, int budget)
 		}
 
 		netif_receive_skb(skb);
+
+		if (nskb)
+			kfree_skb(nskb);
 	}
 
 	return work_done;

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH net-next v2 7/9] xen-netback: Add stat counters for frag_list skbs
  2013-12-12 23:48 [PATCH net-next v2 0/9] xen-netback: TX grant mapping with SKBTX_DEV_ZEROCOPY instead of copy Zoltan Kiss
                   ` (10 preceding siblings ...)
  2013-12-12 23:48 ` [PATCH net-next v2 7/9] xen-netback: Add stat counters for frag_list skbs Zoltan Kiss
@ 2013-12-12 23:48 ` Zoltan Kiss
  2013-12-12 23:48 ` [PATCH net-next v2 8/9] xen-netback: Timeout packets in RX path Zoltan Kiss
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 76+ messages in thread
From: Zoltan Kiss @ 2013-12-12 23:48 UTC (permalink / raw)
  To: ian.campbell, wei.liu2, xen-devel, netdev, linux-kernel, jonathan.davies
  Cc: Zoltan Kiss

This counter helps determine how often the guest sends a packet with more
than MAX_SKB_FRAGS frags.

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
---
 drivers/net/xen-netback/common.h    |    1 +
 drivers/net/xen-netback/interface.c |    7 +++++++
 drivers/net/xen-netback/netback.c   |    1 +
 3 files changed, 9 insertions(+)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index e3c28ff..c037efb 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -158,6 +158,7 @@ struct xenvif {
 	unsigned long tx_zerocopy_sent;
 	unsigned long tx_zerocopy_success;
 	unsigned long tx_zerocopy_fail;
+	unsigned long tx_frag_overflow;
 
 	/* Miscellaneous private stuff. */
 	struct net_device *dev;
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index ac27af3..b7daf8d 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -254,6 +254,13 @@ static const struct xenvif_stat {
 		"tx_zerocopy_fail",
 		offsetof(struct xenvif, tx_zerocopy_fail)
 	},
+	/* Number of packets exceeding MAX_SKB_FRAGS slots. You should use
+	 * a guest with the same MAX_SKB_FRAGS
+	 */
+	{
+		"tx_frag_overflow",
+		offsetof(struct xenvif, tx_frag_overflow)
+	},
 };
 
 static int xenvif_get_sset_count(struct net_device *dev, int string_set)
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 9841429..4305965 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -1656,6 +1656,7 @@ static int xenvif_tx_submit(struct xenvif *vif, int budget)
 			skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY;
 			skb_shinfo(nskb)->tx_flags |= SKBTX_DEV_ZEROCOPY;
 			vif->tx_zerocopy_sent += 2;
+			vif->tx_frag_overflow++;
 			nskb = skb;
 
 			skb = skb_copy_expand(skb, 0, 0, GFP_ATOMIC | __GFP_NOWARN);

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH net-next v2 8/9] xen-netback: Timeout packets in RX path
  2013-12-12 23:48 [PATCH net-next v2 0/9] xen-netback: TX grant mapping with SKBTX_DEV_ZEROCOPY instead of copy Zoltan Kiss
                   ` (11 preceding siblings ...)
  2013-12-12 23:48 ` Zoltan Kiss
@ 2013-12-12 23:48 ` Zoltan Kiss
  2013-12-13 15:44   ` Wei Liu
  2013-12-13 15:44   ` Wei Liu
  2013-12-12 23:48 ` Zoltan Kiss
                   ` (4 subsequent siblings)
  17 siblings, 2 replies; 76+ messages in thread
From: Zoltan Kiss @ 2013-12-12 23:48 UTC (permalink / raw)
  To: ian.campbell, wei.liu2, xen-devel, netdev, linux-kernel, jonathan.davies
  Cc: Zoltan Kiss

A malicious or buggy guest can leave its queue filled indefinitely, in which
case the qdisc starts to queue packets for that VIF. If those packets came
from another guest, they can block its slots and prevent shutdown. To avoid
that, we make sure the queue is drained every 10 seconds.
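
The mechanism, condensed from the hunks below (a sketch of the start_xmit
side only, names as in the patch):

/* Sketch: when the ring is full, stop the queue but arm a timer so a
 * stuck guest cannot hold other guests' packets in the qdisc forever.
 */
if (!xenvif_rx_ring_slots_available(vif, min_slots_needed)) {
	vif->wake_queue.function = xenvif_wake_queue;
	vif->wake_queue.data = (unsigned long)vif;
	xenvif_stop_queue(vif);
	mod_timer(&vif->wake_queue, jiffies + rx_drain_timeout_jiffies);
}
/* xenvif_wake_queue() later calls netif_wake_queue(), so the queued
 * packets are drained (and eventually freed) even if the guest never
 * makes room in its ring.
 */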

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
---
 drivers/net/xen-netback/common.h    |    5 +++++
 drivers/net/xen-netback/interface.c |   21 ++++++++++++++++++++-
 drivers/net/xen-netback/netback.c   |   10 ++++++++++
 3 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index e022812..a834818 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -128,6 +128,8 @@ struct xenvif {
 	 */
 	bool rx_event;
 
+	struct timer_list wake_queue;
+
 	/* Given MAX_BUFFER_OFFSET of 4096 the worst case is that each
 	 * head/fragment page uses 2 copy operations because it
 	 * straddles two buffers in the frontend.
@@ -223,4 +225,7 @@ void xenvif_idx_unmap(struct xenvif *vif, u16 pending_idx);
 
 extern bool separate_tx_rx_irq;
 
+extern unsigned int rx_drain_timeout_msecs;
+extern unsigned int rx_drain_timeout_jiffies;
+
 #endif /* __XEN_NETBACK__COMMON_H__ */
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index 7aa3535..eaf406f 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -114,6 +114,17 @@ static irqreturn_t xenvif_interrupt(int irq, void *dev_id)
 	return IRQ_HANDLED;
 }
 
+static void xenvif_wake_queue(unsigned long data)
+{
+	struct xenvif *vif = (struct xenvif *)data;
+
+	netdev_err(vif->dev, "timer fires\n");
+	if (netif_queue_stopped(vif->dev)) {
+		netdev_err(vif->dev, "draining TX queue\n");
+		netif_wake_queue(vif->dev);
+	}
+}
+
 static int xenvif_start_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct xenvif *vif = netdev_priv(dev);
@@ -141,8 +152,13 @@ static int xenvif_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	 * then turn off the queue to give the ring a chance to
 	 * drain.
 	 */
-	if (!xenvif_rx_ring_slots_available(vif, min_slots_needed))
+	if (!xenvif_rx_ring_slots_available(vif, min_slots_needed)) {
+		vif->wake_queue.function = xenvif_wake_queue;
+		vif->wake_queue.data = (unsigned long)vif;
 		xenvif_stop_queue(vif);
+		mod_timer(&vif->wake_queue,
+			jiffies + rx_drain_timeout_jiffies);
+	}
 
 	skb_queue_tail(&vif->rx_queue, skb);
 	xenvif_kick_thread(vif);
@@ -341,6 +357,8 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
 	init_timer(&vif->credit_timeout);
 	vif->credit_window_start = get_jiffies_64();
 
+	init_timer(&vif->wake_queue);
+
 	dev->netdev_ops	= &xenvif_netdev_ops;
 	dev->hw_features = NETIF_F_SG |
 		NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM |
@@ -515,6 +533,7 @@ void xenvif_disconnect(struct xenvif *vif)
 		xenvif_carrier_off(vif);
 
 	if (vif->task) {
+		del_timer_sync(&vif->wake_queue);
 		kthread_stop(vif->task);
 		vif->task = NULL;
 	}
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 1078ae8..e6c56b5 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -64,6 +64,14 @@ static unsigned int fatal_skb_slots = FATAL_SKB_SLOTS_DEFAULT;
 module_param(fatal_skb_slots, uint, 0444);
 
 /*
+ * When the guest ring is filled up, the qdisc queues the packets for us, but we
+ * have to time them out, otherwise other guests' packets can get stuck there
+ */
+unsigned int rx_drain_timeout_msecs = 10000;
+module_param(rx_drain_timeout_msecs, uint, 0444);
+unsigned int rx_drain_timeout_jiffies;
+
+/*
  * To avoid confusion, we define XEN_NETBK_LEGACY_SLOTS_MAX indicating
  * the maximum slots a valid packet can use. Now this value is defined
  * to be XEN_NETIF_NR_SLOTS_MIN, which is supposed to be supported by
@@ -2051,6 +2059,8 @@ static int __init netback_init(void)
 	if (rc)
 		goto failed_init;
 
+	rx_drain_timeout_jiffies = msecs_to_jiffies(rx_drain_timeout_msecs);
+
 	return 0;
 
 failed_init:

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH net-next v2 9/9] xen-netback: Aggregate TX unmap operations
  2013-12-12 23:48 [PATCH net-next v2 0/9] xen-netback: TX grant mapping with SKBTX_DEV_ZEROCOPY instead of copy Zoltan Kiss
                   ` (13 preceding siblings ...)
  2013-12-12 23:48 ` Zoltan Kiss
@ 2013-12-12 23:48 ` Zoltan Kiss
  2013-12-13 15:44   ` Wei Liu
  2013-12-13 15:44   ` Wei Liu
  2013-12-12 23:48 ` Zoltan Kiss
                   ` (2 subsequent siblings)
  17 siblings, 2 replies; 76+ messages in thread
From: Zoltan Kiss @ 2013-12-12 23:48 UTC (permalink / raw)
  To: ian.campbell, wei.liu2, xen-devel, netdev, linux-kernel, jonathan.davies
  Cc: Zoltan Kiss

Unmapping causes TLB flushing, therefore we should do it in the largest
possible batches. However, we shouldn't starve the guest for too long. So if
the guest has space for at least two big packets and we don't have at least a
quarter of the ring to unmap, delay it for at most 1 millisecond.
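
To put numbers on that (assuming the usual values of MAX_PENDING_REQS = 256
and XEN_NETBK_LEGACY_SLOTS_MAX = XEN_NETIF_NR_SLOTS_MIN = 18): the unmap is
deferred only while the guest still has more than 2 * 18 = 36 free ring slots
and fewer than 256 / 4 = 64 pending grants are waiting to be unmapped; once
either threshold is crossed, or the 1 ms timer fires, the dealloc thread
proceeds with the batch.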

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
---
 drivers/net/xen-netback/common.h  |    2 ++
 drivers/net/xen-netback/netback.c |   30 +++++++++++++++++++++++++++++-
 2 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index 05fa6be..a834818 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -111,6 +111,8 @@ struct xenvif {
 	u16 dealloc_ring[MAX_PENDING_REQS];
 	struct task_struct *dealloc_task;
 	wait_queue_head_t dealloc_wq;
+	struct timer_list dealloc_delay;
+	bool dealloc_delay_timed_out;
 
 	/* Use kthread for guest RX */
 	struct task_struct *task;
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 5252416..f4a9876 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -136,6 +136,11 @@ static inline pending_ring_idx_t nr_pending_reqs(struct xenvif *vif)
 		vif->pending_prod + vif->pending_cons;
 }
 
+static inline pending_ring_idx_t nr_free_slots(struct xen_netif_tx_back_ring *ring)
+{
+	return ring->nr_ents -	(ring->sring->req_prod - ring->rsp_prod_pvt);
+}
+
 bool xenvif_rx_ring_slots_available(struct xenvif *vif, int needed)
 {
 	RING_IDX prod, cons;
@@ -1898,10 +1903,33 @@ static inline int tx_work_todo(struct xenvif *vif)
 	return 0;
 }
 
+static void xenvif_dealloc_delay(unsigned long data)
+{
+	struct xenvif *vif = (struct xenvif *)data;
+
+	vif->dealloc_delay_timed_out = true;
+	wake_up(&vif->dealloc_wq);
+}
+
 static inline int tx_dealloc_work_todo(struct xenvif *vif)
 {
-	if (vif->dealloc_cons != vif->dealloc_prod)
+	if (vif->dealloc_cons != vif->dealloc_prod) {
+		if ((nr_free_slots(&vif->tx) > 2 * XEN_NETBK_LEGACY_SLOTS_MAX) &&
+			(vif->dealloc_prod - vif->dealloc_cons < MAX_PENDING_REQS / 4) &&
+			!vif->dealloc_delay_timed_out) {
+			if (!timer_pending(&vif->dealloc_delay)) {
+				vif->dealloc_delay.function = xenvif_dealloc_delay;
+				vif->dealloc_delay.data = (unsigned long)vif;
+				mod_timer(&vif->dealloc_delay,
+					jiffies + msecs_to_jiffies(1));
+
+			}
+			return 0;
+		}
+		del_timer_sync(&vif->dealloc_delay);
+		vif->dealloc_delay_timed_out = false;
 		return 1;
+	}
 
 	return 0;
 }

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* Re: [PATCH net-next v2 1/9] xen-netback: Introduce TX grant map definitions
  2013-12-12 23:48 ` [PATCH net-next v2 1/9] xen-netback: Introduce TX grant map definitions Zoltan Kiss
  2013-12-13 15:31   ` Wei Liu
@ 2013-12-13 15:31   ` Wei Liu
  2013-12-13 18:22     ` Zoltan Kiss
  2013-12-13 18:22     ` Zoltan Kiss
  1 sibling, 2 replies; 76+ messages in thread
From: Wei Liu @ 2013-12-13 15:31 UTC (permalink / raw)
  To: Zoltan Kiss
  Cc: ian.campbell, wei.liu2, xen-devel, netdev, linux-kernel, jonathan.davies

On Thu, Dec 12, 2013 at 11:48:09PM +0000, Zoltan Kiss wrote:
> This patch contains the new definitions necessary for grant mapping.
> 
> v2:
> - move unmapping to separate thread. The NAPI instance has to be scheduled
>   even from thread context, which can cause huge delays
> - that causes unfortunately bigger struct xenvif
> - store grant handle after checking validity
> 

If the size of xenvif really becomes a problem, you can try to make
scratch space like struct gnttab_copy per-cpu. The downside is that this
approach requires more coding and careful guarding against race
conditions. You would need to weigh cost vs. benefit.
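
(A rough sketch of that per-CPU scratch idea, purely illustrative and not
part of this series:

	struct tx_copy_scratch {
		struct gnttab_copy ops[MAX_PENDING_REQS];
	};
	static DEFINE_PER_CPU(struct tx_copy_scratch, tx_scratch);

	/* in the non-sleeping TX path: */
	struct tx_copy_scratch *scratch = get_cpu_ptr(&tx_scratch);
	/* ... build and issue scratch->ops ... */
	put_cpu_ptr(&tx_scratch);

The race to guard against is anything that could reschedule or re-enter the
TX path on the same CPU while scratch is in use.)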

> Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
> 
> ---
[...]
>  #define XENVIF_QUEUE_LENGTH 32
>  #define XENVIF_NAPI_WEIGHT  64
> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
> index c1b7a42..3ddc474 100644
> --- a/drivers/net/xen-netback/netback.c
> +++ b/drivers/net/xen-netback/netback.c
> @@ -772,6 +772,20 @@ static struct page *xenvif_alloc_page(struct xenvif *vif,
>  	return page;
>  }
>  
> +static inline void xenvif_tx_create_gop(struct xenvif *vif, u16 pending_idx,
> +	       struct xen_netif_tx_request *txp,
> +	       struct gnttab_map_grant_ref *gop)
> +{
> +	vif->pages_to_map[gop-vif->tx_map_ops] = vif->mmap_pages[pending_idx];
> +	gnttab_set_map_op(gop, idx_to_kaddr(vif, pending_idx),
> +			  GNTMAP_host_map | GNTMAP_readonly,
> +			  txp->gref, vif->domid);
> +
> +	memcpy(&vif->pending_tx_info[pending_idx].req, txp,
> +	       sizeof(*txp));
> +
> +}
> +

This helper function is not used until the next patch. Probably you can move
it to the second patch.

The same applies to other helper functions as well. Move them to the
patch where they are used. It would be easier for people to review.

>  static struct gnttab_copy *xenvif_get_requests(struct xenvif *vif,
>  					       struct sk_buff *skb,
>  					       struct xen_netif_tx_request *txp,
> @@ -1593,6 +1607,106 @@ static int xenvif_tx_submit(struct xenvif *vif)
>  	return work_done;
>  }
>  
> +void xenvif_zerocopy_callback(struct ubuf_info *ubuf, bool zerocopy_success)
> +{

Do we care about zerocopy_success? I don't see it used in this function.

> +	unsigned long flags;
> +	pending_ring_idx_t index;
> +	u16 pending_idx = ubuf->desc;
> +	struct pending_tx_info *temp =
> +		container_of(ubuf, struct pending_tx_info, callback_struct);
> +	struct xenvif *vif =
> +		container_of(temp - pending_idx, struct xenvif,
> +			pending_tx_info[0]);
> +

The third parameter to container_of should be the name of the member
within the struct.

> +	spin_lock_irqsave(&vif->dealloc_lock, flags);
> +	do {
> +		pending_idx = ubuf->desc;
> +		ubuf = (struct ubuf_info *) ubuf->ctx;
> +		index = pending_index(vif->dealloc_prod);
> +		vif->dealloc_ring[index] = pending_idx;
> +		/* Sync with xenvif_tx_action_dealloc:
> +		 * insert idx then incr producer.
> +		 */
> +		smp_wmb();
> +		vif->dealloc_prod++;
> +	} while (ubuf);
> +	wake_up(&vif->dealloc_wq);
> +	spin_unlock_irqrestore(&vif->dealloc_lock, flags);
> +}
> +
> +static inline void xenvif_tx_dealloc_action(struct xenvif *vif)
> +{
> +	struct gnttab_unmap_grant_ref *gop;
> +	pending_ring_idx_t dc, dp;
> +	u16 pending_idx, pending_idx_release[MAX_PENDING_REQS];
> +	unsigned int i = 0;
> +
> +	dc = vif->dealloc_cons;
> +	gop = vif->tx_unmap_ops;
> +
> +	/* Free up any grants we have finished using */
> +	do {
> +		dp = vif->dealloc_prod;
> +
> +		/* Ensure we see all indices enqueued by netif_idx_release(). */

There is no netif_idx_release in the netback code. :-)

> +		smp_rmb();
> +
> +		while (dc != dp) {
> +			pending_idx =
> +				vif->dealloc_ring[pending_index(dc++)];
> +
> +			/* Already unmapped? */
> +			if (vif->grant_tx_handle[pending_idx] ==
> +				NETBACK_INVALID_HANDLE) {
> +				netdev_err(vif->dev,
> +					"Trying to unmap invalid handle! "
> +					"pending_idx: %x\n", pending_idx);
> +				continue;
> +			}

Should this be BUG_ON? AIUI this kthread should be the only one doing
unmap, right?

> +
> +			pending_idx_release[gop-vif->tx_unmap_ops] =
> +				pending_idx;
> +			vif->pages_to_unmap[gop-vif->tx_unmap_ops] =
> +				vif->mmap_pages[pending_idx];
> +			gnttab_set_unmap_op(gop,
> +					idx_to_kaddr(vif, pending_idx),
> +					GNTMAP_host_map,
> +					vif->grant_tx_handle[pending_idx]);
> +			vif->grant_tx_handle[pending_idx] =
> +				NETBACK_INVALID_HANDLE;
> +			++gop;
> +		}
> +
[...]
> +}
>  
>  static void make_tx_response(struct xenvif *vif,
>  			     struct xen_netif_tx_request *txp,
> @@ -1720,6 +1854,14 @@ static inline int tx_work_todo(struct xenvif *vif)
>  	return 0;
>  }
>  
> +static inline int tx_dealloc_work_todo(struct xenvif *vif)

static inline bool

Wei.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH net-next v2 2/9] xen-netback: Change TX path from grant copy to mapping
  2013-12-12 23:48   ` Zoltan Kiss
  (?)
  (?)
@ 2013-12-13 15:36   ` Wei Liu
  2013-12-16 15:38     ` Zoltan Kiss
  2013-12-16 15:38     ` Zoltan Kiss
  -1 siblings, 2 replies; 76+ messages in thread
From: Wei Liu @ 2013-12-13 15:36 UTC (permalink / raw)
  To: Zoltan Kiss
  Cc: ian.campbell, wei.liu2, xen-devel, netdev, linux-kernel, jonathan.davies

On Thu, Dec 12, 2013 at 11:48:10PM +0000, Zoltan Kiss wrote:
> This patch changes the grant copy on the TX path to grant mapping
> 
> v2:
> - delete branch for handling fragmented packets fit PKT_PROT_LINE sized first
                                                      ^ PKT_PROT_LEN
>   request
> - mark the effect of using ballooned pages in a comment
> - place setting of skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY right
>   before netif_receive_skb, and mark the importance of it
> - grab dealloc_lock before __napi_complete to avoid contention with the
>   callback's napi_schedule
> - handle fragmented packets where first request < PKT_PROT_LINE
                                                    ^ PKT_PROT_LEN
> - fix up error path when checksum_setup failed
> - check before teardown for pending grants, and start complaining if they
>   are still there after 10 seconds
> 
> Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
> ---
[...]
>  void xenvif_free(struct xenvif *vif)
>  {
> +	int i, unmap_timeout = 0;
> +
> +	for (i = 0; i < MAX_PENDING_REQS; ++i) {
> +		if (vif->grant_tx_handle[i] != NETBACK_INVALID_HANDLE) {
> +			i = 0;
> +			unmap_timeout++;
> +			msleep(1000);
> +			if (unmap_timeout > 9 &&
> +				net_ratelimit())
> +				netdev_err(vif->dev,
> +					"Page still granted! Index: %x\n", i);
> +		}
> +	}
> +
> +	free_xenballooned_pages(MAX_PENDING_REQS, vif->mmap_pages);
> +

If some pages are stuck and you just free them will it cause Dom0 to
crash? I mean, if those pages are recycled by other balloon page users.

Even if it will not cause Dom0 to crash, will it leak any resources in
Dom0? At first sight it looks like at least a grant table entry is leaked,
isn't it? We need to be careful about this because a malicious guest might
be able to DoS Dom0 with resource leakage.

>  	netif_napi_del(&vif->napi);
>  
>  	unregister_netdev(vif->dev);
> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
> index 3ddc474..20352be 100644
> --- a/drivers/net/xen-netback/netback.c
> +++ b/drivers/net/xen-netback/netback.c
> @@ -645,9 +645,12 @@ static void xenvif_tx_err(struct xenvif *vif,
>  			  struct xen_netif_tx_request *txp, RING_IDX end)
>  {
>  	RING_IDX cons = vif->tx.req_cons;
> +	unsigned long flags;
>  
>  	do {
> +		spin_lock_irqsave(&vif->response_lock, flags);
>  		make_tx_response(vif, txp, XEN_NETIF_RSP_ERROR);
> +		spin_unlock_irqrestore(&vif->response_lock, flags);

You only hold the lock for one function call; is this intentional?

>  		if (cons == end)
>  			break;
>  		txp = RING_GET_REQUEST(&vif->tx, cons++);
> @@ -786,10 +789,10 @@ static inline void xenvif_tx_create_gop(struct xenvif *vif, u16 pending_idx,
>  
>  }
>  
> -static struct gnttab_copy *xenvif_get_requests(struct xenvif *vif,
> +static struct gnttab_map_grant_ref *xenvif_get_requests(struct xenvif *vif,
>  					       struct sk_buff *skb,
>  					       struct xen_netif_tx_request *txp,
> -					       struct gnttab_copy *gop)
> +					       struct gnttab_map_grant_ref *gop)
>  {
>  	struct skb_shared_info *shinfo = skb_shinfo(skb);
>  	skb_frag_t *frags = shinfo->frags;
> @@ -810,83 +813,12 @@ static struct gnttab_copy *xenvif_get_requests(struct xenvif *vif,
>  	/* Skip first skb fragment if it is on same page as header fragment. */
>  	start = (frag_get_pending_idx(&shinfo->frags[0]) == pending_idx);
>  
[...]
>  		if (checksum_setup(vif, skb)) {
>  			netdev_dbg(vif->dev,
>  				   "Can't setup checksum in net_tx_action\n");
> +			if (skb_shinfo(skb)->destructor_arg)
> +				skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY;
>  			kfree_skb(skb);
>  			continue;
>  		}
> @@ -1601,6 +1559,14 @@ static int xenvif_tx_submit(struct xenvif *vif)
>  
>  		work_done++;
>  
> +		/* Set this flag right before netif_receive_skb, otherwise
> +		 * someone might think this packet already left netback, and
> +		 * do a skb_copy_ubufs while we are still in control of the
> +		 * skb. E.g. the __pskb_pull_tail earlier can do such thing.
> +		 */
> +		if (skb_shinfo(skb)->destructor_arg)
> +			skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY;
> +

This is really tricky. :-P

>  		netif_receive_skb(skb);
>  	}
>  
> @@ -1711,7 +1677,7 @@ static inline void xenvif_tx_dealloc_action(struct xenvif *vif)
>  int xenvif_tx_action(struct xenvif *vif, int budget)
>  {
>  	unsigned nr_gops;
> -	int work_done;
> +	int work_done, ret;
>  
>  	if (unlikely(!tx_work_todo(vif)))
>  		return 0;
> @@ -1721,7 +1687,13 @@ int xenvif_tx_action(struct xenvif *vif, int budget)
>  	if (nr_gops == 0)
>  		return 0;
>  
> -	gnttab_batch_copy(vif->tx_copy_ops, nr_gops);
> +	if (nr_gops) {

Surely you can remove this "if". At this point nr_gops cannot be zero --
see two lines above.

> +		ret = gnttab_map_refs(vif->tx_map_ops,
> +			NULL,
> +			vif->pages_to_map,
> +			nr_gops);
> +		BUG_ON(ret);
> +	}
>  
>  	work_done = xenvif_tx_submit(vif);
>  
> @@ -1732,61 +1704,37 @@ static void xenvif_idx_release(struct xenvif *vif, u16 pending_idx,
>  			       u8 status)
>  {
[...]
>  
>  void xenvif_idx_unmap(struct xenvif *vif, u16 pending_idx)
>  {
>  	int ret;
> +	struct gnttab_unmap_grant_ref tx_unmap_op;
> +
>  	if (vif->grant_tx_handle[pending_idx] == NETBACK_INVALID_HANDLE) {
>  		netdev_err(vif->dev,
>  				"Trying to unmap invalid handle! pending_idx: %x\n",
>  				pending_idx);
>  		return;
>  	}
> -	gnttab_set_unmap_op(&vif->tx_unmap_ops[0],
> +	gnttab_set_unmap_op(&tx_unmap_op,
>  			idx_to_kaddr(vif, pending_idx),
>  			GNTMAP_host_map,
>  			vif->grant_tx_handle[pending_idx]);
> -	ret = gnttab_unmap_refs(vif->tx_unmap_ops,
> +	ret = gnttab_unmap_refs(&tx_unmap_op,
>  			NULL,
>  			&vif->mmap_pages[pending_idx],
>  			1);

This change should be squashed into patch 1. Or, as I suggested, the changes
in patch 1 should be moved here.

> @@ -1845,7 +1793,6 @@ static inline int rx_work_todo(struct xenvif *vif)
>  
>  static inline int tx_work_todo(struct xenvif *vif)
>  {
> -

Stray blank line change.

Wei.

>  	if (likely(RING_HAS_UNCONSUMED_REQUESTS(&vif->tx)) &&
>  	    (nr_pending_reqs(vif) + XEN_NETBK_LEGACY_SLOTS_MAX
>  	     < MAX_PENDING_REQS))

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH net-next v2 6/9] xen-netback: Handle guests with too many frags
  2013-12-12 23:48 ` Zoltan Kiss
  2013-12-13 15:43   ` Wei Liu
@ 2013-12-13 15:43   ` Wei Liu
  2013-12-16 16:10     ` Zoltan Kiss
  2013-12-16 16:10     ` Zoltan Kiss
  1 sibling, 2 replies; 76+ messages in thread
From: Wei Liu @ 2013-12-13 15:43 UTC (permalink / raw)
  To: Zoltan Kiss
  Cc: ian.campbell, wei.liu2, xen-devel, netdev, linux-kernel, jonathan.davies

On Thu, Dec 12, 2013 at 11:48:14PM +0000, Zoltan Kiss wrote:
> The Xen network protocol had an implicit dependency on MAX_SKB_FRAGS. Netback
> has to handle guests sending up to XEN_NETBK_LEGACY_SLOTS_MAX slots. To
> achieve that:
> - create a new skb
> - map the leftover slots to its frags (no linear buffer here!)
> - chain it to the previous one through skb_shinfo(skb)->frag_list
> - map them
> - copy everything into a brand new skb and send it to the stack
> - unmap the two old skbs' pages
> 

Do you see performance regression with this approach?
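
(For readers following along, the chaining quoted above in miniature;
illustrative names, error handling omitted:

	struct sk_buff *nskb = alloc_skb(NET_SKB_PAD + NET_IP_ALIGN,
					 GFP_ATOMIC | __GFP_NOWARN);
	skb_reserve(nskb, NET_SKB_PAD + NET_IP_ALIGN);
	/* the leftover slots become frags of nskb; it has no linear area */
	skb_shinfo(first_skb)->frag_list = nskb;
	/* after mapping, both are flattened into one fresh skb for the stack */

The extra copy at the end is presumably what the performance question is
about.)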

> Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
> 
> ---
>  drivers/net/xen-netback/netback.c |   99 +++++++++++++++++++++++++++++++++++--
>  1 file changed, 94 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
> index e26cdda..f6ed1c8 100644
> --- a/drivers/net/xen-netback/netback.c
> +++ b/drivers/net/xen-netback/netback.c
> @@ -906,11 +906,15 @@ static struct gnttab_map_grant_ref *xenvif_get_requests(struct xenvif *vif,
>  	u16 pending_idx = *((u16 *)skb->data);
>  	int start;
>  	pending_ring_idx_t index;
> -	unsigned int nr_slots;
> +	unsigned int nr_slots, frag_overflow = 0;
>  
>  	/* At this point shinfo->nr_frags is in fact the number of
>  	 * slots, which can be as large as XEN_NETBK_LEGACY_SLOTS_MAX.
>  	 */
> +	if (shinfo->nr_frags > MAX_SKB_FRAGS) {
> +		frag_overflow = shinfo->nr_frags - MAX_SKB_FRAGS;
> +		shinfo->nr_frags = MAX_SKB_FRAGS;
> +	}
>  	nr_slots = shinfo->nr_frags;
>  

It is also probably better to check whether shinfo->nr_frags is too
large, which would make frag_overflow > MAX_SKB_FRAGS. I know the skb
should already be valid at this point, but it wouldn't hurt to be more
careful.

>  	/* Skip first skb fragment if it is on same page as header fragment. */
> @@ -926,6 +930,33 @@ static struct gnttab_map_grant_ref *xenvif_get_requests(struct xenvif *vif,
>  
>  	BUG_ON(shinfo->nr_frags > MAX_SKB_FRAGS);
>  
> +	if (frag_overflow) {
> +		struct sk_buff *nskb = alloc_skb(NET_SKB_PAD + NET_IP_ALIGN,
> +				GFP_ATOMIC | __GFP_NOWARN);
> +		if (unlikely(nskb == NULL)) {
> +			netdev_err(vif->dev,
> +				   "Can't allocate the frag_list skb.\n");
> +			return NULL;
> +		}
> +
> +		/* Packets passed to netif_rx() must have some headroom. */
> +		skb_reserve(nskb, NET_SKB_PAD + NET_IP_ALIGN);
> +

The code to call alloc_skb and skb_reserve is copied from another
location. I would like to have a dedicated skb allocation function in
netback if possible.

Wei.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH net-next v2 8/9] xen-netback: Timeout packets in RX path
  2013-12-12 23:48 ` [PATCH net-next v2 8/9] xen-netback: Timeout packets in RX path Zoltan Kiss
@ 2013-12-13 15:44   ` Wei Liu
  2013-12-16 17:16     ` Zoltan Kiss
  2013-12-16 17:16     ` Zoltan Kiss
  2013-12-13 15:44   ` Wei Liu
  1 sibling, 2 replies; 76+ messages in thread
From: Wei Liu @ 2013-12-13 15:44 UTC (permalink / raw)
  To: Zoltan Kiss
  Cc: ian.campbell, wei.liu2, xen-devel, netdev, linux-kernel, jonathan.davies

On Thu, Dec 12, 2013 at 11:48:16PM +0000, Zoltan Kiss wrote:
> A malicious or buggy guest can leave its queue filled indefinitely, in which
> case the qdisc starts to queue packets for that VIF. If those packets came
> from another guest, they can block that guest's slots and prevent its
> shutdown. To avoid that, we make sure the queue is drained every 10 seconds.
> 

Oh, I see where the 10 second constraint in the previous patch comes from.

Could you define a macro for this constant and then use it everywhere?

> Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
> ---
[...]
> +static void xenvif_wake_queue(unsigned long data)
> +{
> +	struct xenvif *vif = (struct xenvif *)data;
> +
> +	netdev_err(vif->dev, "timer fires\n");

What timer? This error message needs to be more specific.

> +	if (netif_queue_stopped(vif->dev)) {
> +		netdev_err(vif->dev, "draining TX queue\n");
> +		netif_wake_queue(vif->dev);
> +	}
> +}
> +
>  static int xenvif_start_xmit(struct sk_buff *skb, struct net_device *dev)
>  {
>  	struct xenvif *vif = netdev_priv(dev);
> @@ -141,8 +152,13 @@ static int xenvif_start_xmit(struct sk_buff *skb, struct net_device *dev)
>  	 * then turn off the queue to give the ring a chance to
>  	 * drain.
>  	 */
> -	if (!xenvif_rx_ring_slots_available(vif, min_slots_needed))
> +	if (!xenvif_rx_ring_slots_available(vif, min_slots_needed)) {
> +		vif->wake_queue.function = xenvif_wake_queue;
> +		vif->wake_queue.data = (unsigned long)vif;
>  		xenvif_stop_queue(vif);
> +		mod_timer(&vif->wake_queue,
> +			jiffies + rx_drain_timeout_jiffies);
> +	}
>  

Do you need to use jiffies_64 instead of jiffies?

This timer is only armed when the ring is full. So what happens when the
ring is not full and some other part of the system holds on to the
packets forever? Can this happen?

Wei.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH net-next v2 9/9] xen-netback: Aggregate TX unmap operations
  2013-12-12 23:48 ` [PATCH net-next v2 9/9] xen-netback: Aggregate TX unmap operations Zoltan Kiss
  2013-12-13 15:44   ` Wei Liu
@ 2013-12-13 15:44   ` Wei Liu
  2013-12-16 16:30     ` Zoltan Kiss
  2013-12-16 16:30     ` Zoltan Kiss
  1 sibling, 2 replies; 76+ messages in thread
From: Wei Liu @ 2013-12-13 15:44 UTC (permalink / raw)
  To: Zoltan Kiss
  Cc: ian.campbell, wei.liu2, xen-devel, netdev, linux-kernel, jonathan.davies

On Thu, Dec 12, 2013 at 11:48:17PM +0000, Zoltan Kiss wrote:
> Unmapping causes TLB flushing, therefore we should do it in the largest
> possible batches. However, we shouldn't starve the guest for too long. So if
> the guest has space for at least two big packets and we don't have at least
> a quarter of the ring to unmap, delay it for at most 1 millisecond.
> 

Is this solution temporary or permanent? If it is permanent, would it
make sense to make these parameters tunable?

Wei.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH net-next v2 1/9] xen-netback: Introduce TX grant map definitions
  2013-12-13 15:31   ` Wei Liu
  2013-12-13 18:22     ` Zoltan Kiss
@ 2013-12-13 18:22     ` Zoltan Kiss
  2013-12-13 19:14       ` Wei Liu
  2013-12-13 19:14       ` Wei Liu
  1 sibling, 2 replies; 76+ messages in thread
From: Zoltan Kiss @ 2013-12-13 18:22 UTC (permalink / raw)
  To: Wei Liu; +Cc: ian.campbell, xen-devel, netdev, linux-kernel, jonathan.davies

On 13/12/13 15:31, Wei Liu wrote:
> On Thu, Dec 12, 2013 at 11:48:09PM +0000, Zoltan Kiss wrote:
>> This patch contains the new definitions necessary for grant mapping.
>>
>> v2:
>> - move unmapping to separate thread. The NAPI instance has to be scheduled
>>    even from thread context, which can cause huge delays
>> - that causes unfortunately bigger struct xenvif
>> - store grant handle after checking validity
>>
>
> If the size of xenvif really becomes a problem, you can try to make
> sratch space like struct gnttab_copy per-cpu. The downside is that
> approach requires much coding and carefully guard against race
> conditions. You would need to consider cost v.s. benefit.

I mentioned this because for the first series I had comments that I
should be more vigilant about this. At that time there was a problem
with struct xenvif allocation, which has since been solved. My quick
calculation showed this patch will increase the size by ~15 KB.

>
>> Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
>>
>> ---
> [...]
>>   #define XENVIF_QUEUE_LENGTH 32
>>   #define XENVIF_NAPI_WEIGHT  64
>> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
>> index c1b7a42..3ddc474 100644
>> --- a/drivers/net/xen-netback/netback.c
>> +++ b/drivers/net/xen-netback/netback.c
>> @@ -772,6 +772,20 @@ static struct page *xenvif_alloc_page(struct xenvif *vif,
>>   	return page;
>>   }
>>
>> +static inline void xenvif_tx_create_gop(struct xenvif *vif, u16 pending_idx,
>> +	       struct xen_netif_tx_request *txp,
>> +	       struct gnttab_map_grant_ref *gop)
>> +{
>> +	vif->pages_to_map[gop-vif->tx_map_ops] = vif->mmap_pages[pending_idx];
>> +	gnttab_set_map_op(gop, idx_to_kaddr(vif, pending_idx),
>> +			  GNTMAP_host_map | GNTMAP_readonly,
>> +			  txp->gref, vif->domid);
>> +
>> +	memcpy(&vif->pending_tx_info[pending_idx].req, txp,
>> +	       sizeof(*txp));
>> +
>> +}
>> +
>
> This helper function is not used until next patch. Probably you can move
> it to the second patch.
>
> The same applies to other helper functions as well. Move them to the
> patch they are used. It would be easier for people to review.
I just moved them here because the second patch is already huge, and I
couldn't find a way to split it up while keeping it bisectable and
logically consistent. As I mentioned, I welcome ideas about that.

>
>>   static struct gnttab_copy *xenvif_get_requests(struct xenvif *vif,
>>   					       struct sk_buff *skb,
>>   					       struct xen_netif_tx_request *txp,
>> @@ -1593,6 +1607,106 @@ static int xenvif_tx_submit(struct xenvif *vif)
>>   	return work_done;
>>   }
>>
>> +void xenvif_zerocopy_callback(struct ubuf_info *ubuf, bool zerocopy_success)
>> +{
>
> Do we care about zerocopy_success? I don't see it used in this function.
It will be used in the 5th patch. Anyway, it's part of the zerocopy
callback's signature.

>
>> +	unsigned long flags;
>> +	pending_ring_idx_t index;
>> +	u16 pending_idx = ubuf->desc;
>> +	struct pending_tx_info *temp =
>> +		container_of(ubuf, struct pending_tx_info, callback_struct);
>> +	struct xenvif *vif =
>> +		container_of(temp - pending_idx, struct xenvif,
>> +			pending_tx_info[0]);
>> +
>
> The third parameter to container_of should be the name of the member
> within the struct.
Here we have the pending_idx, so we get a pointer to the enclosing struct
pending_tx_info, then to the beginning of the pending_tx_info array (temp -
pending_idx), and from there to the struct xenvif. It's a bit tricky and not
straightforward, I admit :)
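
To make the trick concrete, here is a small self-contained sketch of the same
pattern in plain C (simplified, made-up types; only the pointer arithmetic
mirrors the code above):

	#include <stddef.h>

	#define container_of(ptr, type, member) \
		((type *)((char *)(ptr) - offsetof(type, member)))

	struct pending { int req; };
	struct owner   { struct pending tbl[16]; };

	/* given &o->tbl[idx], step back to tbl[0], then up to the owner */
	static struct owner *owner_of(struct pending *elem, unsigned int idx)
	{
		return container_of(elem - idx, struct owner, tbl[0]);
	}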

>
>> +	spin_lock_irqsave(&vif->dealloc_lock, flags);
>> +	do {
>> +		pending_idx = ubuf->desc;
>> +		ubuf = (struct ubuf_info *) ubuf->ctx;
>> +		index = pending_index(vif->dealloc_prod);
>> +		vif->dealloc_ring[index] = pending_idx;
>> +		/* Sync with xenvif_tx_action_dealloc:
>> +		 * insert idx then incr producer.
>> +		 */
>> +		smp_wmb();
>> +		vif->dealloc_prod++;
>> +	} while (ubuf);
>> +	wake_up(&vif->dealloc_wq);
>> +	spin_unlock_irqrestore(&vif->dealloc_lock, flags);
>> +}
>> +
>> +static inline void xenvif_tx_dealloc_action(struct xenvif *vif)
>> +{
>> +	struct gnttab_unmap_grant_ref *gop;
>> +	pending_ring_idx_t dc, dp;
>> +	u16 pending_idx, pending_idx_release[MAX_PENDING_REQS];
>> +	unsigned int i = 0;
>> +
>> +	dc = vif->dealloc_cons;
>> +	gop = vif->tx_unmap_ops;
>> +
>> +	/* Free up any grants we have finished using */
>> +	do {
>> +		dp = vif->dealloc_prod;
>> +
>> +		/* Ensure we see all indices enqueued by netif_idx_release(). */
>
> There is no netif_idx_release in netback code. :-)
Oh yes, that's from the classic code, it should be 
xenvif_zerocopy_callback. I will fix it.

>
>> +		smp_rmb();
>> +
>> +		while (dc != dp) {
>> +			pending_idx =
>> +				vif->dealloc_ring[pending_index(dc++)];
>> +
>> +			/* Already unmapped? */
>> +			if (vif->grant_tx_handle[pending_idx] ==
>> +				NETBACK_INVALID_HANDLE) {
>> +				netdev_err(vif->dev,
>> +					"Trying to unmap invalid handle! "
>> +					"pending_idx: %x\n", pending_idx);
>> +				continue;
>> +			}
>
> Should this be BUG_ON? AIUI this kthread should be the only one doing
> unmap, right?
The NAPI instance can do it as well if it is a small packet that fits into
PKT_PROT_LEN. Still, this scenario shouldn't really happen, so I was just
not sure we have to crash immediately. Maybe handle it as a fatal error and
destroy the vif?



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH net-next v2 1/9] xen-netback: Introduce TX grant map definitions
  2013-12-13 15:31   ` Wei Liu
@ 2013-12-13 18:22     ` Zoltan Kiss
  2013-12-13 18:22     ` Zoltan Kiss
  1 sibling, 0 replies; 76+ messages in thread
From: Zoltan Kiss @ 2013-12-13 18:22 UTC (permalink / raw)
  To: Wei Liu; +Cc: xen-devel, jonathan.davies, ian.campbell, linux-kernel, netdev

On 13/12/13 15:31, Wei Liu wrote:
> On Thu, Dec 12, 2013 at 11:48:09PM +0000, Zoltan Kiss wrote:
>> This patch contains the new definitions necessary for grant mapping.
>>
>> v2:
>> - move unmapping to separate thread. The NAPI instance has to be scheduled
>>    even from thread context, which can cause huge delays
>> - that causes unfortunately bigger struct xenvif
>> - store grant handle after checking validity
>>
>
> If the size of xenvif really becomes a problem, you can try to make
> scratch space like struct gnttab_copy per-cpu. The downside is that
> this approach requires a lot of coding and careful guarding against
> race conditions. You would need to consider cost vs. benefit.

I mentioned this because for the first series I had comments that I
should be more vigilant about this. At that time there was a problem
with struct xenvif allocation, which has since been solved. My quick
calculation showed this patch will increase the size by ~15 KB.
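
(As a rough sanity check, assuming MAX_PENDING_REQS is 256 and a 64-bit
Dom0: the ubuf_info embedded in each pending_tx_info is a callback
pointer, a ctx pointer and an unsigned long desc, i.e. 24 bytes, so
that alone is 256 * 24 = 6 KB, and the new unmap op array, dealloc
ring, grant handle array and page pointer arrays make up the rest.)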

>
>> Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
>>
>> ---
> [...]
>>   #define XENVIF_QUEUE_LENGTH 32
>>   #define XENVIF_NAPI_WEIGHT  64
>> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
>> index c1b7a42..3ddc474 100644
>> --- a/drivers/net/xen-netback/netback.c
>> +++ b/drivers/net/xen-netback/netback.c
>> @@ -772,6 +772,20 @@ static struct page *xenvif_alloc_page(struct xenvif *vif,
>>   	return page;
>>   }
>>
>> +static inline void xenvif_tx_create_gop(struct xenvif *vif, u16 pending_idx,
>> +	       struct xen_netif_tx_request *txp,
>> +	       struct gnttab_map_grant_ref *gop)
>> +{
>> +	vif->pages_to_map[gop-vif->tx_map_ops] = vif->mmap_pages[pending_idx];
>> +	gnttab_set_map_op(gop, idx_to_kaddr(vif, pending_idx),
>> +			  GNTMAP_host_map | GNTMAP_readonly,
>> +			  txp->gref, vif->domid);
>> +
>> +	memcpy(&vif->pending_tx_info[pending_idx].req, txp,
>> +	       sizeof(*txp));
>> +
>> +}
>> +
>
> This helper function is not used until next patch. Probably you can move
> it to the second patch.
>
> The same applies to other helper functions as well. Move them to the
> patch they are used. It would be easier for people to review.
I just moved them here because the second patch is already huge, and I 
couldn't find a way to split it up while keeping it bisectable and 
logically consistent. As I mentioned, I welcome ideas about that.

>
>>   static struct gnttab_copy *xenvif_get_requests(struct xenvif *vif,
>>   					       struct sk_buff *skb,
>>   					       struct xen_netif_tx_request *txp,
>> @@ -1593,6 +1607,106 @@ static int xenvif_tx_submit(struct xenvif *vif)
>>   	return work_done;
>>   }
>>
>> +void xenvif_zerocopy_callback(struct ubuf_info *ubuf, bool zerocopy_success)
>> +{
>
> Do we care about zerocopy_success? I don't see it used in this function.
It will be used in the 5th patch. Anyway, it's part of the zerocopy 
callback's signature.

>
>> +	unsigned long flags;
>> +	pending_ring_idx_t index;
>> +	u16 pending_idx = ubuf->desc;
>> +	struct pending_tx_info *temp =
>> +		container_of(ubuf, struct pending_tx_info, callback_struct);
>> +	struct xenvif *vif =
>> +		container_of(temp - pending_idx, struct xenvif,
>> +			pending_tx_info[0]);
>> +
>
> The third parameter to container_of should be the name of the member
> within the struct.
Here we have the pending_idx, so container_of() gives us a pointer to 
the enclosing struct pending_tx_info; stepping back pending_idx 
elements (temp - pending_idx) gives the start of the pending_tx_info 
array, from which we get to the struct xenvif. It's a bit tricky and 
not straightforward, I admit :)
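
Spelled out, the recovery looks roughly like this (same fields as in
the hunk quoted above):

	/* ubuf lives inside pending_tx_info[pending_idx]: */
	struct pending_tx_info *temp =
		container_of(ubuf, struct pending_tx_info, callback_struct);
	/* Step back pending_idx elements to the start of the array: */
	struct pending_tx_info *base = temp - pending_idx;
	/* base == &vif->pending_tx_info[0], so container_of() on the
	 * first array element recovers the enclosing struct xenvif:
	 */
	struct xenvif *vif =
		container_of(base, struct xenvif, pending_tx_info[0]);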

>
>> +	spin_lock_irqsave(&vif->dealloc_lock, flags);
>> +	do {
>> +		pending_idx = ubuf->desc;
>> +		ubuf = (struct ubuf_info *) ubuf->ctx;
>> +		index = pending_index(vif->dealloc_prod);
>> +		vif->dealloc_ring[index] = pending_idx;
>> +		/* Sync with xenvif_tx_action_dealloc:
>> +		 * insert idx then incr producer.
>> +		 */
>> +		smp_wmb();
>> +		vif->dealloc_prod++;
>> +	} while (ubuf);
>> +	wake_up(&vif->dealloc_wq);
>> +	spin_unlock_irqrestore(&vif->dealloc_lock, flags);
>> +}
>> +
>> +static inline void xenvif_tx_dealloc_action(struct xenvif *vif)
>> +{
>> +	struct gnttab_unmap_grant_ref *gop;
>> +	pending_ring_idx_t dc, dp;
>> +	u16 pending_idx, pending_idx_release[MAX_PENDING_REQS];
>> +	unsigned int i = 0;
>> +
>> +	dc = vif->dealloc_cons;
>> +	gop = vif->tx_unmap_ops;
>> +
>> +	/* Free up any grants we have finished using */
>> +	do {
>> +		dp = vif->dealloc_prod;
>> +
>> +		/* Ensure we see all indices enqueued by netif_idx_release(). */
>
> There is no netif_idx_release in netback code. :-)
Oh yes, that's from the classic code, it should be 
xenvif_zerocopy_callback. I will fix it.

>
>> +		smp_rmb();
>> +
>> +		while (dc != dp) {
>> +			pending_idx =
>> +				vif->dealloc_ring[pending_index(dc++)];
>> +
>> +			/* Already unmapped? */
>> +			if (vif->grant_tx_handle[pending_idx] ==
>> +				NETBACK_INVALID_HANDLE) {
>> +				netdev_err(vif->dev,
>> +					"Trying to unmap invalid handle! "
>> +					"pending_idx: %x\n", pending_idx);
>> +				continue;
>> +			}
>
> Should this be BUG_ON? AIUI this kthread should be the only one doing
> unmap, right?
The NAPI instance can do it as well if it is a small packet that fits 
into PKT_PROT_LEN. But still, this scenario shouldn't really happen; I 
was just not sure we have to crash immediately. Maybe handle it as a 
fatal error and destroy the vif?

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH net-next v2 1/9] xen-netback: Introduce TX grant map definitions
  2013-12-13 18:22     ` Zoltan Kiss
  2013-12-13 19:14       ` Wei Liu
@ 2013-12-13 19:14       ` Wei Liu
  2013-12-16 15:21         ` Zoltan Kiss
  2013-12-16 15:21         ` Zoltan Kiss
  1 sibling, 2 replies; 76+ messages in thread
From: Wei Liu @ 2013-12-13 19:14 UTC (permalink / raw)
  To: Zoltan Kiss
  Cc: Wei Liu, ian.campbell, xen-devel, netdev, linux-kernel, jonathan.davies

On Fri, Dec 13, 2013 at 06:22:38PM +0000, Zoltan Kiss wrote:
> On 13/12/13 15:31, Wei Liu wrote:
> >On Thu, Dec 12, 2013 at 11:48:09PM +0000, Zoltan Kiss wrote:
> >>This patch contains the new definitions necessary for grant mapping.
> >>
> >>v2:
> >>- move unmapping to separate thread. The NAPI instance has to be scheduled
> >>   even from thread context, which can cause huge delays
> >>- that causes unfortunately bigger struct xenvif
> >>- store grant handle after checking validity
> >>
> >
> >If the size of xenvif really becomes a problem, you can try to make
> >scratch space like struct gnttab_copy per-cpu. The downside is that
> >this approach requires a lot of coding and careful guarding against
> >race conditions. You would need to consider cost vs. benefit.
> 
> I mentioned this because for the first series I had comments that I
> should be more vigilant about this. At that time there was a problem
> with struct xenvif allocation, which has since been solved. My quick
> calculation showed this patch will increase the size by ~15 KB.
> 

15 KB doesn't seem like a lot. And the fragmentation problem causing
allocation failures was fixed. So I guess this won't be a problem.

> >
> >>Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
> >>
> >>---
> >[...]
> >>  #define XENVIF_QUEUE_LENGTH 32
> >>  #define XENVIF_NAPI_WEIGHT  64
> >>diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
> >>index c1b7a42..3ddc474 100644
> >>--- a/drivers/net/xen-netback/netback.c
> >>+++ b/drivers/net/xen-netback/netback.c
> >>@@ -772,6 +772,20 @@ static struct page *xenvif_alloc_page(struct xenvif *vif,
> >>  	return page;
> >>  }
> >>
> >>+static inline void xenvif_tx_create_gop(struct xenvif *vif, u16 pending_idx,
> >>+	       struct xen_netif_tx_request *txp,
> >>+	       struct gnttab_map_grant_ref *gop)
> >>+{
> >>+	vif->pages_to_map[gop-vif->tx_map_ops] = vif->mmap_pages[pending_idx];
> >>+	gnttab_set_map_op(gop, idx_to_kaddr(vif, pending_idx),
> >>+			  GNTMAP_host_map | GNTMAP_readonly,
> >>+			  txp->gref, vif->domid);
> >>+
> >>+	memcpy(&vif->pending_tx_info[pending_idx].req, txp,
> >>+	       sizeof(*txp));
> >>+
> >>+}
> >>+
> >
> >This helper function is not used until next patch. Probably you can move
> >it to the second patch.
> >
> >The same applies to other helper functions as well. Move them to the
> >patch they are used. It would be easier for people to review.
> I just moved them here because the second patch is already huge, and
> I couldn't find a way to split it up while keeping it bisectable
> and logically consistent. As I mentioned, I welcome ideas about
> that.
> 

My personal view is that for a new feature it is not the size of a
patch that matters most, it's the logic. So I'm fine with long,
consolidated patches. The incremental changes you made make the
patches hard to review IMHO.

That's merely my personal opinion. Let's see what other people say.

> >
> >>  static struct gnttab_copy *xenvif_get_requests(struct xenvif *vif,
> >>  					       struct sk_buff *skb,
> >>  					       struct xen_netif_tx_request *txp,
> >>@@ -1593,6 +1607,106 @@ static int xenvif_tx_submit(struct xenvif *vif)
> >>  	return work_done;
> >>  }
> >>
> >>+void xenvif_zerocopy_callback(struct ubuf_info *ubuf, bool zerocopy_success)
> >>+{
> >
> >Do we care about zerocopy_success? I don't see it used in this function.
> It will be used in the 5th patch. Anyway, it's in the definition of
> the zerocopy callback.
> 

Oh right, it is actually used. Then you should rearrange your series:
move the patches that add stats before this one, then provide a single
patch per function instead of making incremental changes.

> >
> >>+	unsigned long flags;
> >>+	pending_ring_idx_t index;
> >>+	u16 pending_idx = ubuf->desc;
> >>+	struct pending_tx_info *temp =
> >>+		container_of(ubuf, struct pending_tx_info, callback_struct);
> >>+	struct xenvif *vif =
> >>+		container_of(temp - pending_idx, struct xenvif,
> >>+			pending_tx_info[0]);
> >>+
> >
> >The third parameter to container_of should be the name of the member
> >within the struct.
> Here we have the pending_idx, so container_of() gives us a pointer
> to the enclosing struct pending_tx_info; stepping back pending_idx
> elements (temp - pending_idx) gives the start of the pending_tx_info
> array, from which we get to the struct xenvif. It's a bit
> tricky and not straightforward, I admit :)
> 

Well, macro trick. :-)

> >
> >>+	spin_lock_irqsave(&vif->dealloc_lock, flags);
> >>+	do {
> >>+		pending_idx = ubuf->desc;
> >>+		ubuf = (struct ubuf_info *) ubuf->ctx;
> >>+		index = pending_index(vif->dealloc_prod);
> >>+		vif->dealloc_ring[index] = pending_idx;
> >>+		/* Sync with xenvif_tx_action_dealloc:
> >>+		 * insert idx then incr producer.
> >>+		 */
> >>+		smp_wmb();
> >>+		vif->dealloc_prod++;
> >>+	} while (ubuf);
> >>+	wake_up(&vif->dealloc_wq);
> >>+	spin_unlock_irqrestore(&vif->dealloc_lock, flags);
[...]
> >
> >>+		smp_rmb();
> >>+
> >>+		while (dc != dp) {
> >>+			pending_idx =
> >>+				vif->dealloc_ring[pending_index(dc++)];
> >>+
> >>+			/* Already unmapped? */
> >>+			if (vif->grant_tx_handle[pending_idx] ==
> >>+				NETBACK_INVALID_HANDLE) {
> >>+				netdev_err(vif->dev,
> >>+					"Trying to unmap invalid handle! "
> >>+					"pending_idx: %x\n", pending_idx);
> >>+				continue;
> >>+			}
> >
> >Should this be BUG_ON? AIUI this kthread should be the only one doing
> >unmap, right?
> The NAPI instance can do it as well if it is a small packet that
> fits into PKT_PROT_LEN. But still, this scenario shouldn't really
> happen; I was just not sure we have to crash immediately. Maybe
> handle it as a fatal error and destroy the vif?
> 

It depends. If this is within the trust boundary, i.e. everything at
this stage should have been sanitized, then we should BUG_ON, because
there's clearly a bug somewhere in the sanitization process, or in the
interaction of the various backend routines.
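
In code that would be just (sketch):

	/* If everything really is sanitized by this point, a stale
	 * handle is a plain invariant violation:
	 */
	BUG_ON(vif->grant_tx_handle[pending_idx] ==
	       NETBACK_INVALID_HANDLE);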

Wei.


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Xen-devel] [PATCH net-next v2 0/9] xen-netback: TX grant mapping with SKBTX_DEV_ZEROCOPY instead of copy
  2013-12-12 23:48 [PATCH net-next v2 0/9] xen-netback: TX grant mapping with SKBTX_DEV_ZEROCOPY instead of copy Zoltan Kiss
                   ` (16 preceding siblings ...)
  2013-12-16  6:32 ` [PATCH net-next v2 0/9] xen-netback: TX grant mapping with SKBTX_DEV_ZEROCOPY instead of copy annie li
@ 2013-12-16  6:32 ` annie li
  2013-12-16 16:13   ` Zoltan Kiss
  2013-12-16 16:13   ` Zoltan Kiss
  17 siblings, 2 replies; 76+ messages in thread
From: annie li @ 2013-12-16  6:32 UTC (permalink / raw)
  To: Zoltan Kiss
  Cc: ian.campbell, wei.liu2, xen-devel, netdev, linux-kernel, jonathan.davies


On 2013/12/13 7:48, Zoltan Kiss wrote:
> A long known problem of the upstream netback implementation that on the TX
> path (from guest to Dom0) it copies the whole packet from guest memory into
> Dom0. That simply became a bottleneck with 10Gb NICs, and generally it's a
> huge perfomance penalty. The classic kernel version of netback used grant
> mapping, and to get notified when the page can be unmapped, it used page
> destructors. Unfortunately that destructor is not an upstreamable solution.
> Ian Campbell's skb fragment destructor patch series [1] tried to solve this
> problem, however it seems to be very invasive on the network stack's code,
> and therefore haven't progressed very well.
> This patch series use SKBTX_DEV_ZEROCOPY flags to tell the stack it needs to
> know when the skb is freed up. That is the way KVM solved the same problem,
> and based on my initial tests it can do the same for us. Avoiding the extra
> copy boosted up TX throughput from 6.8 Gbps to 7.9 (I used a slower
> Interlagos box, both Dom0 and guest on upstream kernel, on the same NUMA node,
> running iperf 2.0.5, and the remote end was a bare metal box on the same 10Gb
> switch)
Sounds good.
Was the TX throughput measured between one VM and one bare metal box, 
or between multiple VMs and bare metal? Do you have any test results 
with netperf?

Thanks
Annie
> Based on my investigations the packet get only copied if it is delivered to
> Dom0 stack, which is due to this [2] patch. That's a bit unfortunate, but
> luckily it doesn't cause a major regression for this usecase. In the future
> we should try to eliminate that copy somehow.
> There are a few spinoff tasks which will be addressed in separate patches:
> - grant copy the header directly instead of map and memcpy. This should help
>    us avoiding TLB flushing
> - use something else than ballooned pages
> - fix grant map to use page->index properly
> I will run some more extensive tests, but some basic XenRT tests were already
> passed with good results.
> I've tried to broke it down to smaller patches, with mixed results, so I
> welcome suggestions on that part as well:
> 1: Introduce TX grant map definitions
> 2: Change TX path from grant copy to mapping
> 3: Remove old TX grant copy definitons and fix indentations
> 4: Change RX path for mapped SKB fragments
> 5: Add stat counters for zerocopy
> 6: Handle guests with too many frags
> 7: Add stat counters for frag_list skbs
> 8: Timeout packets in RX path
> 9: Aggregate TX unmap operations
>
> v2: I've fixed some smaller things, see the individual patches. I've added a
> few new stat counters, and handling the important use case when an older guest
> sends lots of slots. Instead of delayed copy now we timeout packets on the RX
> path, based on the assumption that otherwise packets should get stucked
> anywhere else. Finally some unmap batching to avoid too much TLB flush
>
> [1] http://lwn.net/Articles/491522/
> [2] https://lkml.org/lkml/2012/7/20/363
>
> Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH net-next v2 1/9] xen-netback: Introduce TX grant map definitions
  2013-12-13 19:14       ` Wei Liu
  2013-12-16 15:21         ` Zoltan Kiss
@ 2013-12-16 15:21         ` Zoltan Kiss
  2013-12-16 17:50           ` Wei Liu
  2013-12-16 17:50           ` Wei Liu
  1 sibling, 2 replies; 76+ messages in thread
From: Zoltan Kiss @ 2013-12-16 15:21 UTC (permalink / raw)
  To: Wei Liu; +Cc: ian.campbell, xen-devel, netdev, linux-kernel, jonathan.davies

On 13/12/13 19:14, Wei Liu wrote:
>>>> +	spin_lock_irqsave(&vif->dealloc_lock, flags);
>>>> > >>+	do {
>>>> > >>+		pending_idx = ubuf->desc;
>>>> > >>+		ubuf = (struct ubuf_info *) ubuf->ctx;
>>>> > >>+		index = pending_index(vif->dealloc_prod);
>>>> > >>+		vif->dealloc_ring[index] = pending_idx;
>>>> > >>+		/* Sync with xenvif_tx_action_dealloc:
>>>> > >>+		 * insert idx then incr producer.
>>>> > >>+		 */
>>>> > >>+		smp_wmb();
>>>> > >>+		vif->dealloc_prod++;
>>>> > >>+	} while (ubuf);
>>>> > >>+	wake_up(&vif->dealloc_wq);
>>>> > >>+	spin_unlock_irqrestore(&vif->dealloc_lock, flags);
> [...]
>>> > >
>>>> > >>+		smp_rmb();
>>>> > >>+
>>>> > >>+		while (dc != dp) {
>>>> > >>+			pending_idx =
>>>> > >>+				vif->dealloc_ring[pending_index(dc++)];
>>>> > >>+
>>>> > >>+			/* Already unmapped? */
>>>> > >>+			if (vif->grant_tx_handle[pending_idx] ==
>>>> > >>+				NETBACK_INVALID_HANDLE) {
>>>> > >>+				netdev_err(vif->dev,
>>>> > >>+					"Trying to unmap invalid handle! "
>>>> > >>+					"pending_idx: %x\n", pending_idx);
>>>> > >>+				continue;
>>>> > >>+			}
>>> > >
>>> > >Should this be BUG_ON? AIUI this kthread should be the only one doing
>>> > >unmap, right?
>> >The NAPI instance can do it as well if it is a small packet that
>> >fits into PKT_PROT_LEN. But still, this scenario shouldn't really
>> >happen; I was just not sure we have to crash immediately. Maybe
>> >handle it as a fatal error and destroy the vif?
>> >
> It depends. If this is within the trust boundary, i.e. everything at the
> stage should have been sanitized then we should BUG_ON because there's
> clearly a bug somewhere in the sanitization process, or in the
> interaction of various backend routines.

My understanding is that crashing should be avoided if we can bail out 
somehow. At this point there is clearly a bug in netback somewhere: 
something unmapped that page before it should have happened, or at 
least that array got corrupted somehow. However, there is a chance that 
xenvif_fatal_tx_err() can contain the issue, and the rest of the system 
can carry on unaffected.
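
Roughly like this (an untested sketch, reusing the hunk quoted above):

	/* Already unmapped? Treat it as a fatal backend bug, but try
	 * to contain it to this vif instead of BUG():
	 */
	if (vif->grant_tx_handle[pending_idx] ==
		NETBACK_INVALID_HANDLE) {
		netdev_err(vif->dev,
			"Trying to unmap invalid handle! pending_idx: %x\n",
			pending_idx);
		xenvif_fatal_tx_err(vif);
		continue;
	}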

Zoli

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH net-next v2 2/9] xen-netback: Change TX path from grant copy to mapping
  2013-12-13 15:36   ` Wei Liu
@ 2013-12-16 15:38     ` Zoltan Kiss
  2013-12-16 18:21       ` Wei Liu
  2013-12-16 18:21       ` Wei Liu
  2013-12-16 15:38     ` Zoltan Kiss
  1 sibling, 2 replies; 76+ messages in thread
From: Zoltan Kiss @ 2013-12-16 15:38 UTC (permalink / raw)
  To: Wei Liu; +Cc: ian.campbell, xen-devel, netdev, linux-kernel, jonathan.davies

On 13/12/13 15:36, Wei Liu wrote:
> On Thu, Dec 12, 2013 at 11:48:10PM +0000, Zoltan Kiss wrote:
>> This patch changes the grant copy on the TX path to grant mapping
>>
>> v2:
>> - delete branch for handling fragmented packets fit PKT_PROT_LINE sized first
>                                                        ^ PKT_PROT_LEN
>>    request
>> - mark the effect of using ballooned pages in a comment
>> - place setting of skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY right
>>    before netif_receive_skb, and mark the importance of it
>> - grab dealloc_lock before __napi_complete to avoid contention with the
>>    callback's napi_schedule
>> - handle fragmented packets where first request < PKT_PROT_LINE
>                                                      ^ PKT_PROT_LEN
Oh, some dyslexia of mine, I will fix that :)

>> - fix up error path when checksum_setup failed
>> - check before teardown for pending grants, and start complaining if they
>>    are still there after 10 seconds
>>
>> Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
>> ---
> [...]
>>   void xenvif_free(struct xenvif *vif)
>>   {
>> +	int i, unmap_timeout = 0;
>> +
>> +	for (i = 0; i < MAX_PENDING_REQS; ++i) {
>> +		if (vif->grant_tx_handle[i] != NETBACK_INVALID_HANDLE) {
>> +			i = 0;
>> +			unmap_timeout++;
>> +			msleep(1000);
>> +			if (unmap_timeout > 9 &&
>> +				net_ratelimit())
>> +				netdev_err(vif->dev,
>> +					"Page still granted! Index: %x\n", i);
>> +		}
>> +	}
>> +
>> +	free_xenballooned_pages(MAX_PENDING_REQS, vif->mmap_pages);
>> +
>
> If some pages are stuck and you just free them will it cause Dom0 to
> crash? I mean, if those pages are recycled by other balloon page users.
>
> Even if it will not cause Dom0 to crash, will it leak any resources in
> Dom0? At first sight it looks like at least a grant table entry is
> leaked, isn't it? We need to be careful about this because a malicious
> guest might be able to DoS Dom0 with resource leakage.
Yes, if we call free_xenballooned_pages while something is still 
mapped, Xen kills Dom0, because the balloon driver tries to touch the 
PTE of a grant-mapped page. That's why we make sure beforehand that 
everything is unmapped, and keep repeating an error message if it's 
not. I'm afraid we can't do anything better here; it would mean a 
serious netback bug.
But a malicious guest cannot take advantage of this unless it finds a 
way to screw up netback's internal bookkeeping. Then it can 
indefinitely block the teardown of the VIF and its associated 
resources.

>
>>   	netif_napi_del(&vif->napi);
>>
>>   	unregister_netdev(vif->dev);
>> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
>> index 3ddc474..20352be 100644
>> --- a/drivers/net/xen-netback/netback.c
>> +++ b/drivers/net/xen-netback/netback.c
>> @@ -645,9 +645,12 @@ static void xenvif_tx_err(struct xenvif *vif,
>>   			  struct xen_netif_tx_request *txp, RING_IDX end)
>>   {
>>   	RING_IDX cons = vif->tx.req_cons;
>> +	unsigned long flags;
>>
>>   	do {
>> +		spin_lock_irqsave(&vif->response_lock, flags);
>>   		make_tx_response(vif, txp, XEN_NETIF_RSP_ERROR);
>> +		spin_unlock_irqrestore(&vif->response_lock, flags);
>
> You only hold the lock for one function call; is this intentional?
Yes, make_tx_response can be called from xenvif_tx_err or 
xenvif_idx_release, and those can be called from both the NAPI 
instance and the dealloc thread (xenvif_tx_err only from NAPI).
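
The other call site takes the same lock; roughly (a sketch, assuming
the xenvif_idx_release() side ends up looking like this):

	spin_lock_irqsave(&vif->response_lock, flags);
	make_tx_response(vif, &vif->pending_tx_info[pending_idx].req,
			status);
	spin_unlock_irqrestore(&vif->response_lock, flags);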

>
>>   		netif_receive_skb(skb);
>>   	}
>>
>> @@ -1711,7 +1677,7 @@ static inline void xenvif_tx_dealloc_action(struct xenvif *vif)
>>   int xenvif_tx_action(struct xenvif *vif, int budget)
>>   {
>>   	unsigned nr_gops;
>> -	int work_done;
>> +	int work_done, ret;
>>
>>   	if (unlikely(!tx_work_todo(vif)))
>>   		return 0;
>> @@ -1721,7 +1687,13 @@ int xenvif_tx_action(struct xenvif *vif, int budget)
>>   	if (nr_gops == 0)
>>   		return 0;
>>
>> -	gnttab_batch_copy(vif->tx_copy_ops, nr_gops);
>> +	if (nr_gops) {
>
> Surely you can remove this "if". At this point nr_gops cannot be zero --
> see two lines above.

>>   void xenvif_idx_unmap(struct xenvif *vif, u16 pending_idx)
>>   {
>>   	int ret;
>> +	struct gnttab_unmap_grant_ref tx_unmap_op;
>> +
>>   	if (vif->grant_tx_handle[pending_idx] == NETBACK_INVALID_HANDLE) {
>>   		netdev_err(vif->dev,
>>   				"Trying to unmap invalid handle! pending_idx: %x\n",
>>   				pending_idx);
>>   		return;
>>   	}
>> -	gnttab_set_unmap_op(&vif->tx_unmap_ops[0],
>> +	gnttab_set_unmap_op(&tx_unmap_op,
>>   			idx_to_kaddr(vif, pending_idx),
>>   			GNTMAP_host_map,
>>   			vif->grant_tx_handle[pending_idx]);
>> -	ret = gnttab_unmap_refs(vif->tx_unmap_ops,
>> +	ret = gnttab_unmap_refs(&tx_unmap_op,
>>   			NULL,
>>   			&vif->mmap_pages[pending_idx],
>>   			1);
>
> This change should be squashed into patch 1. Or, as I suggested, the
> changes in patch 1 should be moved here.
>
>> @@ -1845,7 +1793,6 @@ static inline int rx_work_todo(struct xenvif *vif)
>>
>>   static inline int tx_work_todo(struct xenvif *vif)
>>   {
>> -
>
> Stray blank line change.
Agreed on the previous 3 comments; I will apply them.

Zoli

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH net-next v2 6/9] xen-netback: Handle guests with too many frags
  2013-12-13 15:43   ` Wei Liu
  2013-12-16 16:10     ` Zoltan Kiss
@ 2013-12-16 16:10     ` Zoltan Kiss
  2013-12-16 18:09       ` Wei Liu
  2013-12-16 18:09       ` Wei Liu
  1 sibling, 2 replies; 76+ messages in thread
From: Zoltan Kiss @ 2013-12-16 16:10 UTC (permalink / raw)
  To: Wei Liu; +Cc: ian.campbell, xen-devel, netdev, linux-kernel, jonathan.davies

On 13/12/13 15:43, Wei Liu wrote:
> On Thu, Dec 12, 2013 at 11:48:14PM +0000, Zoltan Kiss wrote:
>> The Xen network protocol had an implicit dependency on MAX_SKB_FRAGS. Netback has to
>> handle guests sending up to XEN_NETBK_LEGACY_SLOTS_MAX slots. To achieve that:
>> - create a new skb
>> - map the leftover slots to its frags (no linear buffer here!)
>> - chain it to the previous through skb_shinfo(skb)->frag_list
>> - map them
>> - copy the whole stuff into a brand new skb and send it to the stack
>> - unmap the 2 old skb's pages
>>
>
> Do you see performance regression with this approach?
Well, it was pretty hard to reproduce that behaviour even with NFS. I 
don't think it happens often enough to cause a noticeable performance 
regression. Anyway, it would be just as slow as the current grant copy 
with coalescing, maybe a bit slower due to the unmapping. But at least 
we use a core network function to do the coalescing.
Or, if you mean generic performance: if this problem doesn't appear, 
then no, I don't see a performance regression.

>> Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
>>
>> ---
>>   drivers/net/xen-netback/netback.c |   99 +++++++++++++++++++++++++++++++++++--
>>   1 file changed, 94 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
>> index e26cdda..f6ed1c8 100644
>> --- a/drivers/net/xen-netback/netback.c
>> +++ b/drivers/net/xen-netback/netback.c
>> @@ -906,11 +906,15 @@ static struct gnttab_map_grant_ref *xenvif_get_requests(struct xenvif *vif,
>>   	u16 pending_idx = *((u16 *)skb->data);
>>   	int start;
>>   	pending_ring_idx_t index;
>> -	unsigned int nr_slots;
>> +	unsigned int nr_slots, frag_overflow = 0;
>>
>>   	/* At this point shinfo->nr_frags is in fact the number of
>>   	 * slots, which can be as large as XEN_NETBK_LEGACY_SLOTS_MAX.
>>   	 */
>> +	if (shinfo->nr_frags > MAX_SKB_FRAGS) {
>> +		frag_overflow = shinfo->nr_frags - MAX_SKB_FRAGS;
>> +		shinfo->nr_frags = MAX_SKB_FRAGS;
>> +	}
>>   	nr_slots = shinfo->nr_frags;
>>
>
> It is also probably better to check whether shinfo->nr_frags is too
> large, which makes frag_overflow > MAX_SKB_FRAGS. I know the skb should
> already be valid at this point, but it wouldn't hurt to be more careful.
Ok, I've added this:

	/* At this point shinfo->nr_frags is in fact the number of
	 * slots, which can be as large as XEN_NETBK_LEGACY_SLOTS_MAX.
	 */
+	if (shinfo->nr_frags > MAX_SKB_FRAGS) {
+		if (shinfo->nr_frags > XEN_NETBK_LEGACY_SLOTS_MAX)
+			return NULL;
+		frag_overflow = shinfo->nr_frags - MAX_SKB_FRAGS;


>
>>   	/* Skip first skb fragment if it is on same page as header fragment. */
>> @@ -926,6 +930,33 @@ static struct gnttab_map_grant_ref *xenvif_get_requests(struct xenvif *vif,
>>
>>   	BUG_ON(shinfo->nr_frags > MAX_SKB_FRAGS);
>>
>> +	if (frag_overflow) {
>> +		struct sk_buff *nskb = alloc_skb(NET_SKB_PAD + NET_IP_ALIGN,
>> +				GFP_ATOMIC | __GFP_NOWARN);
>> +		if (unlikely(nskb == NULL)) {
>> +			netdev_err(vif->dev,
>> +				   "Can't allocate the frag_list skb.\n");
>> +			return NULL;
>> +		}
>> +
>> +		/* Packets passed to netif_rx() must have some headroom. */
>> +		skb_reserve(nskb, NET_SKB_PAD + NET_IP_ALIGN);
>> +
>
> The code to call alloc_skb and skb_reserve is copied from another
> location. I would like to have a dedicated function to allocate skbs
> in netback if possible.
OK
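
Something like this, presumably (a sketch; the name is just a
suggestion):

static struct sk_buff *xenvif_alloc_skb(unsigned int size)
{
	/* Packets passed to netif_rx() must have some headroom. */
	struct sk_buff *skb =
		alloc_skb(size + NET_SKB_PAD + NET_IP_ALIGN,
			  GFP_ATOMIC | __GFP_NOWARN);
	if (skb)
		skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN);
	return skb;
}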


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Xen-devel] [PATCH net-next v2 0/9] xen-netback: TX grant mapping with SKBTX_DEV_ZEROCOPY instead of copy
  2013-12-16  6:32 ` [Xen-devel] " annie li
@ 2013-12-16 16:13   ` Zoltan Kiss
  2013-12-16 16:13   ` Zoltan Kiss
  1 sibling, 0 replies; 76+ messages in thread
From: Zoltan Kiss @ 2013-12-16 16:13 UTC (permalink / raw)
  To: annie li
  Cc: ian.campbell, wei.liu2, xen-devel, netdev, linux-kernel, jonathan.davies

On 16/12/13 06:32, annie li wrote:
>
> On 2013/12/13 7:48, Zoltan Kiss wrote:
>> A long known problem of the upstream netback implementation that on
>> the TX
>> path (from guest to Dom0) it copies the whole packet from guest memory
>> into
>> Dom0. That simply became a bottleneck with 10Gb NICs, and generally
>> it's a
>> huge perfomance penalty. The classic kernel version of netback used grant
>> mapping, and to get notified when the page can be unmapped, it used page
>> destructors. Unfortunately that destructor is not an upstreamable
>> solution.
>> Ian Campbell's skb fragment destructor patch series [1] tried to solve
>> this
>> problem, however it seems to be very invasive on the network stack's
>> code,
>> and therefore haven't progressed very well.
>> This patch series use SKBTX_DEV_ZEROCOPY flags to tell the stack it
>> needs to
>> know when the skb is freed up. That is the way KVM solved the same
>> problem,
>> and based on my initial tests it can do the same for us. Avoiding the
>> extra
>> copy boosted up TX throughput from 6.8 Gbps to 7.9 (I used a slower
>> Interlagos box, both Dom0 and guest on upstream kernel, on the same
>> NUMA node,
>> running iperf 2.0.5, and the remote end was a bare metal box on the
>> same 10Gb
>> switch)
> Sounds good.
> Is the TX throughput gotten between one vm and one bare metal box? or
> between multiple vms and bare metal? Do you have any test results with
> netperf?
One VM and a bare metal box. I've used only iperf.

Regards,

Zoli


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH net-next v2 9/9] xen-netback: Aggregate TX unmap operations
  2013-12-13 15:44   ` Wei Liu
  2013-12-16 16:30     ` Zoltan Kiss
@ 2013-12-16 16:30     ` Zoltan Kiss
  1 sibling, 0 replies; 76+ messages in thread
From: Zoltan Kiss @ 2013-12-16 16:30 UTC (permalink / raw)
  To: Wei Liu; +Cc: ian.campbell, xen-devel, netdev, linux-kernel, jonathan.davies

On 13/12/13 15:44, Wei Liu wrote:
> On Thu, Dec 12, 2013 at 11:48:17PM +0000, Zoltan Kiss wrote:
>> Unmapping causes TLB flushing, therefore we should do it in the largest
>> possible batches. However we shouldn't starve the guest for too long. So if
>> the guest has space for at least two big packets and we don't have at least a
>> quarter ring to unmap, delay it for at most 1 millisecond.
>>
>
> Is this solution temporary or permanent? If it is permanent would it
> make sense to make these parameters tunable?

Well, I'm not entirely sure yet this is the best way to do this, so in 
that sense it's temporary. But generally we should do some sort of 
batching, as a TLB flush cannot be avoided every time. If we settle on 
something, we should make these parameters tunable.
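
Exposing them would be simple enough; e.g. (a sketch, with made-up
names):

static unsigned int tx_unmap_delay_msecs = 1;
module_param(tx_unmap_delay_msecs, uint, 0644);
MODULE_PARM_DESC(tx_unmap_delay_msecs,
		"Maximum time to delay batched TX unmap operations (msec)");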

The problem is that there is a fine line to find here. My first 
approach was to leave tx_dealloc_work_todo as it was, and after the 
thread woke up, but before anything was done, make it sleep for 50 ns 
and measure how fast the guest was running out of free slots:

if (kthread_should_stop())
	break;

+i = 0;
+do {
+	++i;
+	prev_free_slots = nr_free_slots(&vif->tx);
+	__set_current_state(TASK_UNINTERRUPTIBLE);
+	rc = schedule_hrtimeout_range(&tx_dealloc_delay_ktime, 10,
+				      HRTIMER_MODE_REL);
+	if (rc)
+		trace_printk("%s sleep was interrupted! %d\n",
+			     vif->dev->name, rc);
+	curr_free_slots = nr_free_slots(&vif->tx);
+} while (curr_free_slots < 4 * (prev_free_slots - curr_free_slots) &&
+	 i < 11);
+
xenvif_tx_dealloc_action(vif);

And worst case, after 500 ns I let the thread do the unmap anyway. But 
I was a bit worried about this approach, so I chose a slightly more 
conservative one for this patch.

There are also ideas to use some other mechanism for unmapping instead 
of the current separate-thread approach. Putting it into the NAPI 
instance was the original idea, but that caused problems. Placing it 
into the other thread, where the RX work happens, also doesn't sound 
too good; these things can and should happen in parallel.
Other ideas are work queues and tasklets; I'll spend some more time 
checking whether they are feasible.

Regards,

Zoli

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH net-next v2 8/9] xen-netback: Timeout packets in RX path
  2013-12-13 15:44   ` Wei Liu
@ 2013-12-16 17:16     ` Zoltan Kiss
  2013-12-16 19:03       ` Wei Liu
  2013-12-16 19:03       ` Wei Liu
  2013-12-16 17:16     ` Zoltan Kiss
  1 sibling, 2 replies; 76+ messages in thread
From: Zoltan Kiss @ 2013-12-16 17:16 UTC (permalink / raw)
  To: Wei Liu; +Cc: ian.campbell, xen-devel, netdev, linux-kernel, jonathan.davies

On 13/12/13 15:44, Wei Liu wrote:
> On Thu, Dec 12, 2013 at 11:48:16PM +0000, Zoltan Kiss wrote:
>> A malicious or buggy guest can leave its queue filled indefinitely, in which
>> case qdisc start to queue packets for that VIF. If those packets came from an
>> another guest, it can block its slots and prevent shutdown. To avoid that, we
>> make sure the queue is drained in every 10 seconds
>>
>
> Oh I see where the 10 second constraint in previous patch comes from.
>
> Could you define a macro for this constant then use it everywhere.
Well, they are not entirely the same thing, but it is worth making them
the same. How about using "unmap_timeout > (rx_drain_timeout_msecs/1000)"
in xenvif_free()? Then netback won't complain about a stuck page while
another guest is still permitted to hold on to it.
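
I.e. the condition in the teardown loop would become something like
this (a sketch; rx_drain_timeout_msecs being the parameter this patch
introduces):

	if (unmap_timeout > (rx_drain_timeout_msecs / 1000) &&
	    net_ratelimit())
		netdev_err(vif->dev,
			   "Page still granted! Index: %x\n", i);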

>
>> Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
>> ---
> [...]
>> +static void xenvif_wake_queue(unsigned long data)
>> +{
>> +	struct xenvif *vif = (struct xenvif *)data;
>> +
>> +	netdev_err(vif->dev, "timer fires\n");
>
> What timer? This error message needs to be more specific.
I forgot to remove this; I used it only for debugging. The other
message two lines below is the important one.

>
>> +	if (netif_queue_stopped(vif->dev)) {
>> +		netdev_err(vif->dev, "draining TX queue\n");
>> +		netif_wake_queue(vif->dev);
>> +	}
>> +}
>> +
>>   static int xenvif_start_xmit(struct sk_buff *skb, struct net_device *dev)
>>   {
>>   	struct xenvif *vif = netdev_priv(dev);
>> @@ -141,8 +152,13 @@ static int xenvif_start_xmit(struct sk_buff *skb, struct net_device *dev)
>>   	 * then turn off the queue to give the ring a chance to
>>   	 * drain.
>>   	 */
>> -	if (!xenvif_rx_ring_slots_available(vif, min_slots_needed))
>> +	if (!xenvif_rx_ring_slots_available(vif, min_slots_needed)) {
>> +		vif->wake_queue.function = xenvif_wake_queue;
>> +		vif->wake_queue.data = (unsigned long)vif;
>>   		xenvif_stop_queue(vif);
>> +		mod_timer(&vif->wake_queue,
>> +			jiffies + rx_drain_timeout_jiffies);
>> +	}
>>
>
> Do you need to use jiffies_64 instead of jiffies?
Well, we don't use time_after_eq() here, we just set the timer. AFAIK
that should be OK.

> This timer is only armed when ring is full. So what happens when the
> ring is not full and some other parts of the system holds on to the
> packets forever? Can this happen?
This timer is not there to protect the receiving guest, but to protect
the sender. If the ring is not full, netback will put the packet there
and release the skb back.
This patch replaces the delayed copy from classic kernel times. There
we handled this problem on the sender side: after a timer expired, we
made a local copy of the packet and released the pages back. That gave
stronger guarantees that a guest always gets its pages back, but it
also caused more unnecessary copies when the system was already loaded
and we should really have trashed the packet. Unfortunately we can't do
that here, as the sender is no longer in control.
Instead I chose this more lightweight solution, because in practice
another guest's queue is the only place where a packet can get stuck,
especially if that guest is malicious, buggy, or too slow.
Other parts (e.g. a driver) can also hold on to the packet if they are
buggy, but then we should fix that bug rather than feed it more guest
pages.

Zoli

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH net-next v2 1/9] xen-netback: Introduce TX grant map definitions
  2013-12-16 15:21         ` Zoltan Kiss
@ 2013-12-16 17:50           ` Wei Liu
  2014-01-07 14:50             ` Zoltan Kiss
  2014-01-07 14:50             ` Zoltan Kiss
  2013-12-16 17:50           ` Wei Liu
  1 sibling, 2 replies; 76+ messages in thread
From: Wei Liu @ 2013-12-16 17:50 UTC (permalink / raw)
  To: Zoltan Kiss
  Cc: Wei Liu, ian.campbell, xen-devel, netdev, linux-kernel, jonathan.davies

On Mon, Dec 16, 2013 at 03:21:40PM +0000, Zoltan Kiss wrote:
[...]
> >>>> >
> >>>> >Should this be BUG_ON? AIUI this kthread should be the only one doing
> >>>> >unmap, right?
> >>>The NAPI instance can do it as well if it is a small packet fits
> >>>into PKT_PROT_LEN. But still this scenario shouldn't really happen,
> >>>I was just not sure we have to crash immediately. Maybe handle it as
> >>>a fatal error and destroy the vif?
> >>>
> >It depends. If this is within the trust boundary, i.e. everything at the
> >stage should have been sanitized then we should BUG_ON because there's
> >clearly a bug somewhere in the sanitization process, or in the
> >interaction of various backend routines.
> 
> My understanding is that crashing should be avoided if we can bail
> out somehow. At this point there is clearly a bug in netback
> somewhere, something unmapped that page before it should have
> happened, or at least that array get corrupted somehow. However
> there is a chance that xenvif_fatal_tx_err() can contain the issue,
> and the rest of the system can go unaffected.
> 

IMHO that would make debugging much harder, if a crash is caused by a
previously corrupted array and we pretend we can carry on serving. Now
netback has three routines (NAPI and two kthreads) serving a single
vif, and the interaction among them makes bugs hard to reproduce.

Wei.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH net-next v2 6/9] xen-netback: Handle guests with too many frags
  2013-12-16 16:10     ` Zoltan Kiss
@ 2013-12-16 18:09       ` Wei Liu
  2014-01-07 15:23         ` Zoltan Kiss
  2014-01-07 15:23         ` Zoltan Kiss
  2013-12-16 18:09       ` Wei Liu
  1 sibling, 2 replies; 76+ messages in thread
From: Wei Liu @ 2013-12-16 18:09 UTC (permalink / raw)
  To: Zoltan Kiss
  Cc: Wei Liu, ian.campbell, xen-devel, netdev, linux-kernel, jonathan.davies

On Mon, Dec 16, 2013 at 04:10:42PM +0000, Zoltan Kiss wrote:
> On 13/12/13 15:43, Wei Liu wrote:
> >On Thu, Dec 12, 2013 at 11:48:14PM +0000, Zoltan Kiss wrote:
> >>Xen network protocol had implicit dependency on MAX_SKB_FRAGS. Netback has to
> >>handle guests sending up to XEN_NETBK_LEGACY_SLOTS_MAX slots. To achieve that:
> >>- create a new skb
> >>- map the leftover slots to its frags (no linear buffer here!)
> >>- chain it to the previous through skb_shinfo(skb)->frag_list
> >>- map them
> >>- copy the whole stuff into a brand new skb and send it to the stack
> >>- unmap the 2 old skb's pages
> >>
> >
> >Do you see performance regression with this approach?
> Well, it was pretty hard to reproduce that behaviour even with NFS.
> I don't think it happens often enough that it causes a noticable
> performance regression. Anyway, it would be just as slow as the
> current grant copy with coalescing, maybe a bit slower due to the
> unmapping. But at least we use a core network function to do the
> coalescing.
> Or, if you mean the generic performance, if this problem doesn't
> appear, then no, I don't see performance regression.
> 

OK, thanks for confirming.

> >>Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
> >>
> >>---
> >>  drivers/net/xen-netback/netback.c |   99 +++++++++++++++++++++++++++++++++++--
> >>  1 file changed, 94 insertions(+), 5 deletions(-)
> >>
> >>diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
> >>index e26cdda..f6ed1c8 100644
> >>--- a/drivers/net/xen-netback/netback.c
> >>+++ b/drivers/net/xen-netback/netback.c
> >>@@ -906,11 +906,15 @@ static struct gnttab_map_grant_ref *xenvif_get_requests(struct xenvif *vif,
> >>  	u16 pending_idx = *((u16 *)skb->data);
> >>  	int start;
> >>  	pending_ring_idx_t index;
> >>-	unsigned int nr_slots;
> >>+	unsigned int nr_slots, frag_overflow = 0;
> >>
> >>  	/* At this point shinfo->nr_frags is in fact the number of
> >>  	 * slots, which can be as large as XEN_NETBK_LEGACY_SLOTS_MAX.
> >>  	 */
> >>+	if (shinfo->nr_frags > MAX_SKB_FRAGS) {
> >>+		frag_overflow = shinfo->nr_frags - MAX_SKB_FRAGS;
> >>+		shinfo->nr_frags = MAX_SKB_FRAGS;
> >>+	}
> >>  	nr_slots = shinfo->nr_frags;
> >>
> >
> >It is also probably better to check whether shinfo->nr_frags is too
> >large which makes frag_overflow > MAX_SKB_FRAGS. I know skb should be
> >already be valid at this point but it wouldn't hurt to be more careful.
> Ok, I've added this:
> 	/* At this point shinfo->nr_frags is in fact the number of
> 	 * slots, which can be as large as XEN_NETBK_LEGACY_SLOTS_MAX.
> 	 */
> +	if (shinfo->nr_frags > MAX_SKB_FRAGS) {
> +		if (shinfo->nr_frags > XEN_NETBK_LEGACY_SLOTS_MAX) return NULL;
> +		frag_overflow = shinfo->nr_frags - MAX_SKB_FRAGS;
> 

What I suggested is

   BUG_ON(frag_overflow > MAX_SKB_FRAGS)

Wei.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH net-next v2 2/9] xen-netback: Change TX path from grant copy to mapping
  2013-12-16 15:38     ` Zoltan Kiss
@ 2013-12-16 18:21       ` Wei Liu
  2013-12-16 18:57         ` Zoltan Kiss
  2013-12-16 18:57         ` Zoltan Kiss
  2013-12-16 18:21       ` Wei Liu
  1 sibling, 2 replies; 76+ messages in thread
From: Wei Liu @ 2013-12-16 18:21 UTC (permalink / raw)
  To: Zoltan Kiss
  Cc: Wei Liu, ian.campbell, xen-devel, netdev, linux-kernel, jonathan.davies

On Mon, Dec 16, 2013 at 03:38:05PM +0000, Zoltan Kiss wrote:
[...]
> >>+	for (i = 0; i < MAX_PENDING_REQS; ++i) {
> >>+		if (vif->grant_tx_handle[i] != NETBACK_INVALID_HANDLE) {
> >>+			i = 0;
> >>+			unmap_timeout++;
> >>+			msleep(1000);
> >>+			if (unmap_timeout > 9 &&
> >>+				net_ratelimit())
> >>+				netdev_err(vif->dev,
> >>+					"Page still granted! Index: %x\n", i);
> >>+		}
> >>+	}
> >>+
> >>+	free_xenballooned_pages(MAX_PENDING_REQS, vif->mmap_pages);
> >>+
> >
> >If some pages are stuck and you just free them will it cause Dom0 to
> >crash? I mean, if those pages are recycled by other balloon page users.
> >
> >Even if it will not cause Dom0 to crash, will it leak any resource in
> >Dom0? At plain sight it looks like at least grant table entry is leaked,
> >isn't it? We need to be careful about this because a malicious might be
> >able to DoS Dom0 with resource leakage.
> Yes, if we call free_xenballooned_pages while something is still
> mapped, Xen kills Dom0 because balloon driver tries to touch the PTE
> of a grant mapped page. That's why we make sure before that
> everything is unmapped, and repeat an error message if it's not. I'm

The code snippet above doesn't loop ten times over the whole array if
stale pages are found, nor does it loop ten times on any stale page.

So imagine that the very last page in the array is stale. This routine
sleeps for 1 second and then frees all ballooned pages. It's still not
guaranteed that, at the point we call free_xenballooned_pages, all
pages are unmapped, right?

> afraid we can't do anything better here, that means a serious
> netback bug.
> But a malicious guest cannot take advantage of this unless it's find
> a way to screw up netback's internal bookkeeping. Then it can block
> here indefinitely the teardown of the VIF, and it's associated
> resources.
> 

OK.

Wei.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH net-next v2 2/9] xen-netback: Change TX path from grant copy to mapping
  2013-12-16 18:21       ` Wei Liu
@ 2013-12-16 18:57         ` Zoltan Kiss
  2013-12-16 19:06           ` Wei Liu
  2013-12-16 19:06           ` Wei Liu
  2013-12-16 18:57         ` Zoltan Kiss
  1 sibling, 2 replies; 76+ messages in thread
From: Zoltan Kiss @ 2013-12-16 18:57 UTC (permalink / raw)
  To: Wei Liu; +Cc: ian.campbell, xen-devel, netdev, linux-kernel, jonathan.davies

On 16/12/13 18:21, Wei Liu wrote:
> On Mon, Dec 16, 2013 at 03:38:05PM +0000, Zoltan Kiss wrote:
> [...]
>>>> +	for (i = 0; i < MAX_PENDING_REQS; ++i) {
>>>> +		if (vif->grant_tx_handle[i] != NETBACK_INVALID_HANDLE) {
>>>> +			i = 0;
>>>> +			unmap_timeout++;
>>>> +			msleep(1000);
>>>> +			if (unmap_timeout > 9 &&
>>>> +				net_ratelimit())
>>>> +				netdev_err(vif->dev,
>>>> +					"Page still granted! Index: %x\n", i);
>>>> +		}
>>>> +	}
>>>> +
>>>> +	free_xenballooned_pages(MAX_PENDING_REQS, vif->mmap_pages);
>>>> +
>>>
>>> If some pages are stuck and you just free them will it cause Dom0 to
>>> crash? I mean, if those pages are recycled by other balloon page users.
>>>
>>> Even if it will not cause Dom0 to crash, will it leak any resource in
>>> Dom0? At plain sight it looks like at least grant table entry is leaked,
>>> isn't it? We need to be careful about this because a malicious might be
>>> able to DoS Dom0 with resource leakage.
>> Yes, if we call free_xenballooned_pages while something is still
>> mapped, Xen kills Dom0 because balloon driver tries to touch the PTE
>> of a grant mapped page. That's why we make sure before that
>> everything is unmapped, and repeat an error message if it's not. I'm

There is an "i = 0" if we find a valid handle. So we start again 
checking the whole array from the second element (incorrectly, it should 
be "i = -1"!), and we print an incorrect error message, but essentially 
we are not leaving the loop, unless the first element was the 
problematic. We can modify that to "i--" or "i = -1" if we want to 
recheck the whole array. It shouldn't happen at this point that we 
transmit new packets, starting from the beginning is just an extra 
safety check.
Also, we should modify i after the printing of the error message.
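
A sketch of the fixed loop (the loop's ++i turns the -1 back into 0,
and i is only reset after the message has been printed):

	for (i = 0; i < MAX_PENDING_REQS; ++i) {
		if (vif->grant_tx_handle[i] != NETBACK_INVALID_HANDLE) {
			unmap_timeout++;
			msleep(1000);
			if (unmap_timeout > 9 && net_ratelimit())
				netdev_err(vif->dev,
					   "Page still granted! Index: %x\n",
					   i);
			i = -1;
		}
	}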

Zoli


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH net-next v2 8/9] xen-netback: Timeout packets in RX path
  2013-12-16 17:16     ` Zoltan Kiss
  2013-12-16 19:03       ` Wei Liu
@ 2013-12-16 19:03       ` Wei Liu
  1 sibling, 0 replies; 76+ messages in thread
From: Wei Liu @ 2013-12-16 19:03 UTC (permalink / raw)
  To: Zoltan Kiss
  Cc: Wei Liu, ian.campbell, xen-devel, netdev, linux-kernel, jonathan.davies

On Mon, Dec 16, 2013 at 05:16:17PM +0000, Zoltan Kiss wrote:
> On 13/12/13 15:44, Wei Liu wrote:
> >On Thu, Dec 12, 2013 at 11:48:16PM +0000, Zoltan Kiss wrote:
> >>A malicious or buggy guest can leave its queue filled indefinitely, in which
> >>case qdisc start to queue packets for that VIF. If those packets came from an
> >>another guest, it can block its slots and prevent shutdown. To avoid that, we
> >>make sure the queue is drained in every 10 seconds
> >>
> >
> >Oh I see where the 10 second constraint in previous patch comes from.
> >
> >Could you define a macro for this constant then use it everywhere.
> Well, they are not entirely the same thing, but worth making them
> the same. How about using "unmap_timeout >
> (rx_drain_timeout_msecs/1000)" in xenvif_free()? Then netback won't
> complain about a stucked page if an another guest is permitted to
> hold on to it.
> 

Thanks for clarification. I see the difference. If they are not the same
by definition then we need to think more about making them the same in
practice.

If we use "unmap_timeout > (rx_drain_timeout_msecs/1000)" then we
basically assume that the guest RX path is the one most likely to hold
the packet for the longest time.

Wei.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH net-next v2 2/9] xen-netback: Change TX path from grant copy to mapping
  2013-12-16 18:57         ` Zoltan Kiss
@ 2013-12-16 19:06           ` Wei Liu
  2013-12-16 19:06           ` Wei Liu
  1 sibling, 0 replies; 76+ messages in thread
From: Wei Liu @ 2013-12-16 19:06 UTC (permalink / raw)
  To: Zoltan Kiss
  Cc: Wei Liu, ian.campbell, xen-devel, netdev, linux-kernel, jonathan.davies

On Mon, Dec 16, 2013 at 06:57:44PM +0000, Zoltan Kiss wrote:
> On 16/12/13 18:21, Wei Liu wrote:
> >On Mon, Dec 16, 2013 at 03:38:05PM +0000, Zoltan Kiss wrote:
> >[...]
> >>>>+	for (i = 0; i < MAX_PENDING_REQS; ++i) {
> >>>>+		if (vif->grant_tx_handle[i] != NETBACK_INVALID_HANDLE) {
> >>>>+			i = 0;
> >>>>+			unmap_timeout++;
> >>>>+			msleep(1000);
> >>>>+			if (unmap_timeout > 9 &&
> >>>>+				net_ratelimit())
> >>>>+				netdev_err(vif->dev,
> >>>>+					"Page still granted! Index: %x\n", i);
> >>>>+		}
> >>>>+	}
> >>>>+
> >>>>+	free_xenballooned_pages(MAX_PENDING_REQS, vif->mmap_pages);
> >>>>+
> >>>
> >>>If some pages are stuck and you just free them will it cause Dom0 to
> >>>crash? I mean, if those pages are recycled by other balloon page users.
> >>>
> >>>Even if it will not cause Dom0 to crash, will it leak any resource in
> >>>Dom0? At plain sight it looks like at least grant table entry is leaked,
> >>>isn't it? We need to be careful about this because a malicious might be
> >>>able to DoS Dom0 with resource leakage.
> >>Yes, if we call free_xenballooned_pages while something is still
> >>mapped, Xen kills Dom0 because balloon driver tries to touch the PTE
> >>of a grant mapped page. That's why we make sure before that
> >>everything is unmapped, and repeat an error message if it's not. I'm
> 
> There is an "i = 0" if we find a valid handle. So we start again

Oops, missed that.

> checking the whole array from the second element (incorrectly, it
> should be "i = -1"!), and we print an incorrect error message, but
> essentially we are not leaving the loop, unless the first element
> was the problematic. We can modify that to "i--" or "i = -1" if we
> want to recheck the whole array. It shouldn't happen at this point
> that we transmit new packets, starting from the beginning is just an
> extra safety check.
> Also, we should modify i after the printing of the error message.
> 

So I did help find a bug though. :-)

Wei.

> Zoli

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Xen-devel] [PATCH net-next v2 2/9] xen-netback: Change TX path from grant copy to mapping
  2013-12-12 23:48   ` Zoltan Kiss
                     ` (2 preceding siblings ...)
  (?)
@ 2013-12-17 21:49   ` Konrad Rzeszutek Wilk
  2013-12-30 17:58     ` Zoltan Kiss
  2013-12-30 17:58     ` [Xen-devel] " Zoltan Kiss
  -1 siblings, 2 replies; 76+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-12-17 21:49 UTC (permalink / raw)
  To: Zoltan Kiss
  Cc: ian.campbell, wei.liu2, xen-devel, netdev, linux-kernel, jonathan.davies

On Thu, Dec 12, 2013 at 11:48:10PM +0000, Zoltan Kiss wrote:
> This patch changes the grant copy on the TX patch to grant mapping
> 
> v2:
> - delete branch for handling fragmented packets fit PKT_PROT_LINE sized first
>   request
> - mark the effect of using ballooned pages in a comment
> - place setting of skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY right
>   before netif_receive_skb, and mark the importance of it
> - grab dealloc_lock before __napi_complete to avoid contention with the
>   callback's napi_schedule
> - handle fragmented packets where first request < PKT_PROT_LINE
> - fix up error path when checksum_setup failed
> - check before teardown for pending grants, and start complain if they are
>   there after 10 second
> 
> Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
> ---
>  drivers/net/xen-netback/interface.c |   57 +++++++-
>  drivers/net/xen-netback/netback.c   |  257 ++++++++++++++---------------------
>  2 files changed, 156 insertions(+), 158 deletions(-)
> 
> diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
> index 1c27e9e..42946de 100644
> --- a/drivers/net/xen-netback/interface.c
> +++ b/drivers/net/xen-netback/interface.c
> @@ -122,7 +122,9 @@ static int xenvif_start_xmit(struct sk_buff *skb, struct net_device *dev)
>  	BUG_ON(skb->dev != dev);
>  
>  	/* Drop the packet if vif is not ready */
> -	if (vif->task == NULL || !xenvif_schedulable(vif))
> +	if (vif->task == NULL ||
> +		vif->dealloc_task == NULL ||
> +		!xenvif_schedulable(vif))
>  		goto drop;
>  
>  	/* At best we'll need one slot for the header and one for each
> @@ -335,8 +337,25 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
>  	vif->pending_prod = MAX_PENDING_REQS;
>  	for (i = 0; i < MAX_PENDING_REQS; i++)
>  		vif->pending_ring[i] = i;
> -	for (i = 0; i < MAX_PENDING_REQS; i++)
> -		vif->mmap_pages[i] = NULL;
> +	/* If ballooning is disabled, this will consume real memory, so you
> +	 * better enable it. The long term solution would be to use just a
> +	 * bunch of valid page descriptors, without dependency on ballooning
> +	 */
> +	err = alloc_xenballooned_pages(MAX_PENDING_REQS,
> +		vif->mmap_pages,
> +		false);
> +	if (err) {
> +		netdev_err(dev, "Could not reserve mmap_pages\n");
> +		return NULL;
> +	}
> +	for (i = 0; i < MAX_PENDING_REQS; i++) {
> +		vif->pending_tx_info[i].callback_struct = (struct ubuf_info)
> +			{ .callback = xenvif_zerocopy_callback,
> +			  .ctx = NULL,
> +			  .desc = i };
> +		vif->grant_tx_handle[i] = NETBACK_INVALID_HANDLE;
> +	}
> +	init_timer(&vif->dealloc_delay);
>  
>  	/*
>  	 * Initialise a dummy MAC address. We choose the numerically
> @@ -380,6 +399,7 @@ int xenvif_connect(struct xenvif *vif, unsigned long tx_ring_ref,
>  		goto err;
>  
>  	init_waitqueue_head(&vif->wq);
> +	init_waitqueue_head(&vif->dealloc_wq);
>  
>  	if (tx_evtchn == rx_evtchn) {
>  		/* feature-split-event-channels == 0 */
> @@ -421,6 +441,14 @@ int xenvif_connect(struct xenvif *vif, unsigned long tx_ring_ref,
>  		goto err_rx_unbind;
>  	}
>  
> +	vif->dealloc_task = kthread_create(xenvif_dealloc_kthread,
> +				   (void *)vif, "%s-dealloc", vif->dev->name);
> +	if (IS_ERR(vif->dealloc_task)) {
> +		pr_warn("Could not allocate kthread for %s\n", vif->dev->name);
> +		err = PTR_ERR(vif->dealloc_task);
> +		goto err_rx_unbind;
> +	}
> +
>  	vif->task = task;
>  
>  	rtnl_lock();
> @@ -433,6 +461,7 @@ int xenvif_connect(struct xenvif *vif, unsigned long tx_ring_ref,
>  	rtnl_unlock();
>  
>  	wake_up_process(vif->task);
> +	wake_up_process(vif->dealloc_task);
>  
>  	return 0;
>  
> @@ -470,6 +499,12 @@ void xenvif_disconnect(struct xenvif *vif)
>  		vif->task = NULL;
>  	}
>  
> +	if (vif->dealloc_task) {
> +		del_timer_sync(&vif->dealloc_delay);
> +		kthread_stop(vif->dealloc_task);
> +		vif->dealloc_task = NULL;
> +	}
> +
>  	if (vif->tx_irq) {
>  		if (vif->tx_irq == vif->rx_irq)
>  			unbind_from_irqhandler(vif->tx_irq, vif);
> @@ -485,6 +520,22 @@ void xenvif_disconnect(struct xenvif *vif)
>  
>  void xenvif_free(struct xenvif *vif)
>  {
> +	int i, unmap_timeout = 0;
> +
> +	for (i = 0; i < MAX_PENDING_REQS; ++i) {
> +		if (vif->grant_tx_handle[i] != NETBACK_INVALID_HANDLE) {
> +			i = 0;
> +			unmap_timeout++;
> +			msleep(1000);

You don't want to use schedule() and a wakeup here to allow other threads
to do their work?

> +			if (unmap_timeout > 9 &&
> +				net_ratelimit())
> +				netdev_err(vif->dev,
> +					"Page still granted! Index: %x\n", i);
> +		}
> +	}
> +
> +	free_xenballooned_pages(MAX_PENDING_REQS, vif->mmap_pages);

How about just stashing those pages on an 'I can't free them' list that
will keep them forever? And if that list gets truly large, then switch
back to grant_copy?

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [Xen-devel] [PATCH net-next v2 2/9] xen-netback: Change TX path from grant copy to mapping
  2013-12-17 21:49   ` [Xen-devel] " Konrad Rzeszutek Wilk
  2013-12-30 17:58     ` Zoltan Kiss
@ 2013-12-30 17:58     ` Zoltan Kiss
  1 sibling, 0 replies; 76+ messages in thread
From: Zoltan Kiss @ 2013-12-30 17:58 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: ian.campbell, wei.liu2, xen-devel, netdev, linux-kernel, jonathan.davies

On 17/12/13 21:49, Konrad Rzeszutek Wilk wrote:
> On Thu, Dec 12, 2013 at 11:48:10PM +0000, Zoltan Kiss wrote:
>> @@ -485,6 +520,22 @@ void xenvif_disconnect(struct xenvif *vif)
>>
>>   void xenvif_free(struct xenvif *vif)
>>   {
>> +	int i, unmap_timeout = 0;
>> +
>> +	for (i = 0; i < MAX_PENDING_REQS; ++i) {
>> +		if (vif->grant_tx_handle[i] != NETBACK_INVALID_HANDLE) {
>> +			i = 0;
>> +			unmap_timeout++;
>> +			msleep(1000);
>
> You don't want to use schedule() and a wakeup here to allow other threads
> to do their work?
Yep, schedule_timeout() would be nicer indeed
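
E.g. something like this instead of the msleep() (a sketch, assuming
the zerocopy callback wakes dealloc_wq whenever it releases a handle):

	wait_event_timeout(vif->dealloc_wq,
			   vif->grant_tx_handle[i] == NETBACK_INVALID_HANDLE,
			   msecs_to_jiffies(1000));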

>
>> +			if (unmap_timeout > 9 &&
>> +				net_ratelimit())
>> +				netdev_err(vif->dev,
>> +					"Page still granted! Index: %x\n", i);
>> +		}
>> +	}
>> +
>> +	free_xenballooned_pages(MAX_PENDING_REQS, vif->mmap_pages);
>
> How about just stashing those pages on a 'I can't free them' list that will
> keep them forever. And if that list gets truly large then switch back to
> grant_copy?
But then what would you answer to the guest? You can't shoot down the
shared ring while there is still an outstanding slot.
On the other hand, doing a copy would just move the memory leak into
the backend, which could be problematic if a guest figures out how to
craft a packet that can get stuck somewhere in the backend.

Zoli

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH net-next v2 1/9] xen-netback: Introduce TX grant map definitions
  2013-12-16 17:50           ` Wei Liu
  2014-01-07 14:50             ` Zoltan Kiss
@ 2014-01-07 14:50             ` Zoltan Kiss
  1 sibling, 0 replies; 76+ messages in thread
From: Zoltan Kiss @ 2014-01-07 14:50 UTC (permalink / raw)
  To: Wei Liu; +Cc: ian.campbell, xen-devel, netdev, linux-kernel, jonathan.davies

On 16/12/13 17:50, Wei Liu wrote:
> On Mon, Dec 16, 2013 at 03:21:40PM +0000, Zoltan Kiss wrote:
> [...]
>>>>>>>
>>>>>>> Should this be BUG_ON? AIUI this kthread should be the only one doing
>>>>>>> unmap, right?
>>>>> The NAPI instance can do it as well if it is a small packet fits
>>>>> into PKT_PROT_LEN. But still this scenario shouldn't really happen,
>>>>> I was just not sure we have to crash immediately. Maybe handle it as
>>>>> a fatal error and destroy the vif?
>>>>>
>>> It depends. If this is within the trust boundary, i.e. everything at the
>>> stage should have been sanitized then we should BUG_ON because there's
>>> clearly a bug somewhere in the sanitization process, or in the
>>> interaction of various backend routines.
>>
>> My understanding is that crashing should be avoided if we can bail
>> out somehow. At this point there is clearly a bug in netback
>> somewhere, something unmapped that page before it should have
>> happened, or at least that array get corrupted somehow. However
>> there is a chance that xenvif_fatal_tx_err() can contain the issue,
>> and the rest of the system can go unaffected.
>>
>
> That would make debugging much harder if a crash is caused by a previous
> corrupted array and we pretend we can carry on serving IMHO. Now netback
> is having three routines (NAPI, two kthreads) to serve a single vif, the
> interation among them makes bug hard to reproduce.

OK, I'll make this a BUG() in the next series.

Zoli


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH net-next v2 6/9] xen-netback: Handle guests with too many frags
  2013-12-16 18:09       ` Wei Liu
@ 2014-01-07 15:23         ` Zoltan Kiss
  2014-01-07 15:23         ` Zoltan Kiss
  1 sibling, 0 replies; 76+ messages in thread
From: Zoltan Kiss @ 2014-01-07 15:23 UTC (permalink / raw)
  To: Wei Liu; +Cc: ian.campbell, xen-devel, netdev, linux-kernel, jonathan.davies

On 16/12/13 18:09, Wei Liu wrote:
>>>> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
>>>> index e26cdda..f6ed1c8 100644
>>>> --- a/drivers/net/xen-netback/netback.c
>>>> +++ b/drivers/net/xen-netback/netback.c
>>>> @@ -906,11 +906,15 @@ static struct gnttab_map_grant_ref *xenvif_get_requests(struct xenvif *vif,
>>>>   	u16 pending_idx = *((u16 *)skb->data);
>>>>   	int start;
>>>>   	pending_ring_idx_t index;
>>>> -	unsigned int nr_slots;
>>>> +	unsigned int nr_slots, frag_overflow = 0;
>>>>
>>>>   	/* At this point shinfo->nr_frags is in fact the number of
>>>>   	 * slots, which can be as large as XEN_NETBK_LEGACY_SLOTS_MAX.
>>>>   	 */
>>>> +	if (shinfo->nr_frags > MAX_SKB_FRAGS) {
>>>> +		frag_overflow = shinfo->nr_frags - MAX_SKB_FRAGS;
>>>> +		shinfo->nr_frags = MAX_SKB_FRAGS;
>>>> +	}
>>>>   	nr_slots = shinfo->nr_frags;
>>>>
>>>
>>> It is also probably better to check whether shinfo->nr_frags is so
>>> large that it makes frag_overflow > MAX_SKB_FRAGS. I know the skb
>>> should already be valid at this point, but it wouldn't hurt to be
>>> more careful.
>> Ok, I've added this:
>> 	/* At this point shinfo->nr_frags is in fact the number of
>> 	 * slots, which can be as large as XEN_NETBK_LEGACY_SLOTS_MAX.
>> 	 */
>> +	if (shinfo->nr_frags > MAX_SKB_FRAGS) {
>> +		if (shinfo->nr_frags > XEN_NETBK_LEGACY_SLOTS_MAX) return NULL;
>> +		frag_overflow = shinfo->nr_frags - MAX_SKB_FRAGS;
>>
>
> What I suggested is
>
>     BUG_ON(frag_overflow > MAX_SKB_FRAGS)

Ok, I've changed it.
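
For reference, the check in xenvif_get_requests() could then look
roughly like this (a sketch, not the final patch):

	/* At this point shinfo->nr_frags is in fact the number of
	 * slots, which can be as large as XEN_NETBK_LEGACY_SLOTS_MAX.
	 */
	if (shinfo->nr_frags > MAX_SKB_FRAGS) {
		frag_overflow = shinfo->nr_frags - MAX_SKB_FRAGS;
		/* A sanitized request can never exceed this; a larger
		 * value means a bug earlier in the TX path.
		 */
		BUG_ON(frag_overflow > MAX_SKB_FRAGS);
		shinfo->nr_frags = MAX_SKB_FRAGS;
	}
	nr_slots = shinfo->nr_frags;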

Zoli


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH net-next v2 0/9] xen-netback: TX grant mapping with SKBTX_DEV_ZEROCOPY instead of copy
  2014-01-08 14:43 ` Wei Liu
  2014-01-08 14:44   ` Zoltan Kiss
@ 2014-01-08 14:44   ` Zoltan Kiss
  1 sibling, 0 replies; 76+ messages in thread
From: Zoltan Kiss @ 2014-01-08 14:44 UTC (permalink / raw)
  To: Wei Liu; +Cc: ian.campbell, xen-devel, netdev, linux-kernel, jonathan.davies

On 08/01/14 14:43, Wei Liu wrote:
> You once mentioned that you have a trick to avoid touching TLB, is it in
> this series?
>
> (Haven't really looked at this series as I'm in today. Will have a
> closer look tonight. I'm just curious now.)
>
> Wei.
>
No, I'm currently working on that; it will be a separate series, as it
also needs some Xen modifications which haven't reached upstream yet,
AFAIK.

Zoli

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH net-next v2 0/9] xen-netback: TX grant mapping with SKBTX_DEV_ZEROCOPY instead of copy
  2014-01-08  0:10 Zoltan Kiss
                   ` (2 preceding siblings ...)
  2014-01-08 14:43 ` Wei Liu
@ 2014-01-08 14:43 ` Wei Liu
  2014-01-08 14:44   ` Zoltan Kiss
  2014-01-08 14:44   ` Zoltan Kiss
  3 siblings, 2 replies; 76+ messages in thread
From: Wei Liu @ 2014-01-08 14:43 UTC (permalink / raw)
  To: Zoltan Kiss
  Cc: ian.campbell, wei.liu2, xen-devel, netdev, linux-kernel, jonathan.davies

You once mentioned that you have a trick to avoid touching TLB, is it in
this series?

(Haven't really looked at this series as I'm in today. Will have a
closer look tonight. I'm just curious now.)

Wei.

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH net-next v2 0/9] xen-netback: TX grant mapping with SKBTX_DEV_ZEROCOPY instead of copy
  2014-01-08  0:10 Zoltan Kiss
@ 2014-01-08  0:16 ` Zoltan Kiss
  2014-01-08  0:16 ` Zoltan Kiss
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 76+ messages in thread
From: Zoltan Kiss @ 2014-01-08  0:16 UTC (permalink / raw)
  To: ian.campbell, wei.liu2, xen-devel, netdev, linux-kernel, jonathan.davies
  Cc: Zoltan Kiss

Sorry, the version number in the subject should be v3

Zoli


^ permalink raw reply	[flat|nested] 76+ messages in thread

* [PATCH net-next v2 0/9] xen-netback: TX grant mapping with SKBTX_DEV_ZEROCOPY instead of copy
@ 2014-01-08  0:10 Zoltan Kiss
  2014-01-08  0:16 ` Zoltan Kiss
                   ` (3 more replies)
  0 siblings, 4 replies; 76+ messages in thread
From: Zoltan Kiss @ 2014-01-08  0:10 UTC (permalink / raw)
  To: ian.campbell, wei.liu2, xen-devel, netdev, linux-kernel, jonathan.davies
  Cc: Zoltan Kiss

A long-known problem of the upstream netback implementation is that on the TX
path (from guest to Dom0) it copies the whole packet from guest memory into
Dom0. That simply became a bottleneck with 10Gb NICs, and generally it's a
huge performance penalty. The classic kernel version of netback used grant
mapping, and to get notified when the page can be unmapped, it used page
destructors. Unfortunately that destructor is not an upstreamable solution.
Ian Campbell's skb fragment destructor patch series [1] tried to solve this
problem, but it is very invasive on the network stack's code and therefore
hasn't progressed very well.
This patch series uses the SKBTX_DEV_ZEROCOPY flag to tell the stack it needs
to know when the skb is freed up. That is the way KVM solved the same problem,
and based on my initial tests it can do the same for us. Avoiding the extra
copy boosted TX throughput from 6.8 Gbps to 7.9 Gbps (I used a slower
Interlagos box, both Dom0 and guest on an upstream kernel, on the same NUMA
node, running iperf 2.0.5; the remote end was a bare metal box on the same
10Gb switch).
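
The core of the zerocopy signalling is roughly the following sketch.
The pending_tx_info/callback_struct names follow this series, but the
fragment is illustrative, not the exact patch:

	/* Invoked by the stack when the last reference to the skb is
	 * dropped; only then is it safe to unmap the granted pages and
	 * return the pending slots to the frontend.
	 */
	static void xenvif_zerocopy_callback(struct ubuf_info *ubuf,
					     bool zerocopy_success)
	{
		/* hand the slots behind ubuf over to the dealloc path */
	}

	/* When the skb is built from mapped grant pages: */
	skb_shinfo(skb)->destructor_arg =
		&vif->pending_tx_info[pending_idx].callback_struct;
	skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY;

Here callback_struct is a struct ubuf_info whose callback member is
set to xenvif_zerocopy_callback at vif setup time.
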
Based on my investigations the packet gets copied only if it is delivered to
the Dom0 stack, which is due to this [2] patch. That's a bit unfortunate, but
luckily it doesn't cause a major regression for this use case. In the future
we should try to eliminate that copy somehow.
There are a few spinoff tasks which will be addressed in separate patches:
- grant copy the header directly instead of map and memcpy. This should help
  us avoid TLB flushing (see the sketch after this list)
- use something other than ballooned pages
- fix grant map to use page->index properly
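
A rough sketch of that first spinoff item, using GNTTABOP_copy to pull
just the header into the local skb data area (illustrative; variable
names and error handling are simplified):

	struct gnttab_copy copy;

	copy.flags         = GNTCOPY_source_gref;
	copy.source.u.ref  = txreq.gref;   /* guest's grant reference */
	copy.source.domid  = vif->domid;
	copy.source.offset = txreq.offset;
	copy.dest.u.gmfn   = virt_to_mfn(skb->data);
	copy.dest.domid    = DOMID_SELF;
	copy.dest.offset   = offset_in_page(skb->data);
	copy.len           = data_len;     /* header bytes only */

	if (HYPERVISOR_grant_table_op(GNTTABOP_copy, &copy, 1) ||
	    copy.status != GNTST_okay)
		xenvif_fatal_tx_err(vif);  /* treat as frontend error */
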
I will run some more extensive tests, but some basic XenRT tests have already
passed with good results.
I've tried to break it down into smaller patches, with mixed results, so I
welcome suggestions on that part as well:
1: Introduce TX grant map definitions
2: Change TX path from grant copy to mapping
3: Remove old TX grant copy definitions and fix indentations
4: Change RX path for mapped SKB fragments
5: Add stat counters for zerocopy
6: Handle guests with too many frags
7: Add stat counters for frag_list skbs
8: Timeout packets in RX path
9: Aggregate TX unmap operations

v2: I've fixed some smaller things, see the individual patches. I've added a
few new stat counters, and handling for the important use case when an older
guest sends lots of slots. Instead of a delayed copy we now time out packets
on the RX path, based on the assumption that packets shouldn't get stuck
anywhere else. Finally, some unmap batching to avoid too much TLB flushing.

v3: Apart from fixing a few things mentioned in responses, the important
change is using the hypercall directly for grant [un]mapping, so we can
avoid the m2p override (see the sketch below).
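
Roughly, that means setting up the batch with gnttab_set_unmap_op()
and issuing the grant-table hypercall directly, e.g. (a sketch; page
bookkeeping omitted):

	static void xenvif_tx_unmap_batch(struct xenvif *vif,
					  struct gnttab_unmap_grant_ref *ops,
					  unsigned int nr_ops)
	{
		int ret;

		/* One hypercall for the whole batch: no m2p override,
		 * and the TLB flush cost is amortized over nr_ops slots.
		 */
		ret = HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref,
						ops, nr_ops);
		BUG_ON(ret);
	}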

[1] http://lwn.net/Articles/491522/
[2] https://lkml.org/lkml/2012/7/20/363

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>


^ permalink raw reply	[flat|nested] 76+ messages in thread
