[PATCH net-next v4 0/9] xen-netback: TX grant mapping with SKBTX_DEV

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH net-next v4 0/9] xen-netback: TX grant mapping with SKBTX_DEV_ZEROCOPY instead of copy
@ 2014-01-14 20:39 Zoltan Kiss
  2014-01-14 20:39 ` [PATCH net-next v4 1/9] xen-netback: Introduce TX grant map definitions Zoltan Kiss
                   ` (8 more replies)
  0 siblings, 9 replies; 22+ messages in thread
From: Zoltan Kiss @ 2014-01-14 20:39 UTC (permalink / raw)
  To: ian.campbell, wei.liu2, xen-devel, netdev, linux-kernel, jonathan.davies
  Cc: Zoltan Kiss

A long known problem of the upstream netback implementation that on the TX
path (from guest to Dom0) it copies the whole packet from guest memory into
Dom0. That simply became a bottleneck with 10Gb NICs, and generally it's a
huge perfomance penalty. The classic kernel version of netback used grant
mapping, and to get notified when the page can be unmapped, it used page
destructors. Unfortunately that destructor is not an upstreamable solution.
Ian Campbell's skb fragment destructor patch series [1] tried to solve this
problem, however it seems to be very invasive on the network stack's code,
and therefore haven't progressed very well.
This patch series use SKBTX_DEV_ZEROCOPY flags to tell the stack it needs to
know when the skb is freed up. That is the way KVM solved the same problem,
and based on my initial tests it can do the same for us. Avoiding the extra
copy boosted up TX throughput from 6.8 Gbps to 7.9 (I used a slower
Interlagos box, both Dom0 and guest on upstream kernel, on the same NUMA node,
running iperf 2.0.5, and the remote end was a bare metal box on the same 10Gb
switch)
Based on my investigations the packet get only copied if it is delivered to
Dom0 stack, which is due to this [2] patch. That's a bit unfortunate, but
luckily it doesn't cause a major regression for this usecase. In the future
we should try to eliminate that copy somehow.
There are a few spinoff tasks which will be addressed in separate patches:
- grant copy the header directly instead of map and memcpy. This should help
  us avoiding TLB flushing
- use something else than ballooned pages
- fix grant map to use page->index properly
I will run some more extensive tests, but some basic XenRT tests were already
passed with good results.
I've tried to broke it down to smaller patches, with mixed results, so I
welcome suggestions on that part as well:
1: Introduce TX grant map definitions
2: Change TX path from grant copy to mapping
3: Remove old TX grant copy definitons and fix indentations
4: Change RX path for mapped SKB fragments
5: Add stat counters for zerocopy
6: Handle guests with too many frags
7: Add stat counters for frag_list skbs
8: Timeout packets in RX path
9: Aggregate TX unmap operations

v2: I've fixed some smaller things, see the individual patches. I've added a
few new stat counters, and handling the important use case when an older guest
sends lots of slots. Instead of delayed copy now we timeout packets on the RX
path, based on the assumption that otherwise packets should get stucked
anywhere else. Finally some unmap batching to avoid too much TLB flush

v3: Apart from fixing a few things mentioned in responses the important change
is the use the hypercall directly for grant [un]mapping, therefore we can
avoid m2p override.

v4: Now we are using a new grant mapping API to avoid m2p_override. The RX queue
timeout logic changed also.

[1] http://lwn.net/Articles/491522/
[2] https://lkml.org/lkml/2012/7/20/363

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH net-next v4 1/9] xen-netback: Introduce TX grant map definitions
  2014-01-14 20:39 [PATCH net-next v4 0/9] xen-netback: TX grant mapping with SKBTX_DEV_ZEROCOPY instead of copy Zoltan Kiss
@ 2014-01-14 20:39 ` Zoltan Kiss
  2014-01-15 15:16   ` Zoltan Kiss
  2014-01-16  0:00   ` Wei Liu
  2014-01-14 20:39 ` [PATCH net-next v4 2/9] xen-netback: Change TX path from grant copy to mapping Zoltan Kiss
                   ` (7 subsequent siblings)
  8 siblings, 2 replies; 22+ messages in thread
From: Zoltan Kiss @ 2014-01-14 20:39 UTC (permalink / raw)
  To: ian.campbell, wei.liu2, xen-devel, netdev, linux-kernel, jonathan.davies
  Cc: Zoltan Kiss

This patch contains the new definitions necessary for grant mapping.

v2:
- move unmapping to separate thread. The NAPI instance has to be scheduled
  even from thread context, which can cause huge delays
- that causes unfortunately bigger struct xenvif
- store grant handle after checking validity

v3:
- fix comment in xenvif_tx_dealloc_action()
- call unmap hypercall directly instead of gnttab_unmap_refs(), which does
  unnecessary m2p_override. Also remove pages_to_[un]map members
- BUG() if grant_tx_handle corrupted

v4:
- fix indentations and comments
- use bool for tx_dealloc_work_todo
- BUG() if grant_tx_handle corrupted - now really :)
- go back to gnttab_unmap_refs, now we rely on API changes

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>

---
 drivers/net/xen-netback/common.h    |   30 +++++-
 drivers/net/xen-netback/interface.c |    1 +
 drivers/net/xen-netback/netback.c   |  171 +++++++++++++++++++++++++++++++++++
 3 files changed, 201 insertions(+), 1 deletion(-)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index c955fc3..3e5ca11 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -79,6 +79,11 @@ struct pending_tx_info {
 				  * if it is head of one or more tx
 				  * reqs
 				  */
+	/* callback data for released SKBs. The	callback is always
+	 * xenvif_zerocopy_callback, ctx points to the next fragment, desc
+	 * contains the pending_idx
+	 */
+	struct ubuf_info callback_struct;
 };
 
 #define XEN_NETIF_TX_RING_SIZE __CONST_RING_SIZE(xen_netif_tx, PAGE_SIZE)
@@ -108,6 +113,8 @@ struct xenvif_rx_meta {
  */
 #define MAX_GRANT_COPY_OPS (MAX_SKB_FRAGS * XEN_NETIF_RX_RING_SIZE)
 
+#define NETBACK_INVALID_HANDLE -1
+
 struct xenvif {
 	/* Unique identifier for this interface. */
 	domid_t          domid;
@@ -126,13 +133,26 @@ struct xenvif {
 	pending_ring_idx_t pending_cons;
 	u16 pending_ring[MAX_PENDING_REQS];
 	struct pending_tx_info pending_tx_info[MAX_PENDING_REQS];
+	grant_handle_t grant_tx_handle[MAX_PENDING_REQS];
 
 	/* Coalescing tx requests before copying makes number of grant
 	 * copy ops greater or equal to number of slots required. In
 	 * worst case a tx request consumes 2 gnttab_copy.
 	 */
 	struct gnttab_copy tx_copy_ops[2*MAX_PENDING_REQS];
-
+	struct gnttab_map_grant_ref tx_map_ops[MAX_PENDING_REQS];
+	struct gnttab_unmap_grant_ref tx_unmap_ops[MAX_PENDING_REQS];
+	/* passed to gnttab_[un]map_refs with pages under (un)mapping */
+	struct page *pages_to_map[MAX_PENDING_REQS];
+	struct page *pages_to_unmap[MAX_PENDING_REQS];
+
+	spinlock_t dealloc_lock;
+	spinlock_t response_lock;
+	pending_ring_idx_t dealloc_prod;
+	pending_ring_idx_t dealloc_cons;
+	u16 dealloc_ring[MAX_PENDING_REQS];
+	struct task_struct *dealloc_task;
+	wait_queue_head_t dealloc_wq;
 
 	/* Use kthread for guest RX */
 	struct task_struct *task;
@@ -222,6 +242,8 @@ int xenvif_tx_action(struct xenvif *vif, int budget);
 int xenvif_kthread(void *data);
 void xenvif_kick_thread(struct xenvif *vif);
 
+int xenvif_dealloc_kthread(void *data);
+
 /* Determine whether the needed number of slots (req) are available,
  * and set req_event if not.
  */
@@ -229,6 +251,12 @@ bool xenvif_rx_ring_slots_available(struct xenvif *vif, int needed);
 
 void xenvif_stop_queue(struct xenvif *vif);
 
+/* Callback from stack when TX packet can be released */
+void xenvif_zerocopy_callback(struct ubuf_info *ubuf, bool zerocopy_success);
+
+/* Unmap a pending page, usually has to be called before xenvif_idx_release */
+void xenvif_idx_unmap(struct xenvif *vif, u16 pending_idx);
+
 extern bool separate_tx_rx_irq;
 
 #endif /* __XEN_NETBACK__COMMON_H__ */
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index b9de31e..a7855b3 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -38,6 +38,7 @@
 
 #include <xen/events.h>
 #include <asm/xen/hypercall.h>
+#include <xen/balloon.h>
 
 #define XENVIF_QUEUE_LENGTH 32
 #define XENVIF_NAPI_WEIGHT  64
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 4f81ac0..b84d2b8 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -772,6 +772,21 @@ static struct page *xenvif_alloc_page(struct xenvif *vif,
 	return page;
 }
 
+static inline void xenvif_tx_create_gop(struct xenvif *vif,
+					u16 pending_idx,
+					struct xen_netif_tx_request *txp,
+					struct gnttab_map_grant_ref *gop)
+{
+	vif->pages_to_map[gop-vif->tx_map_ops] = vif->mmap_pages[pending_idx];
+	gnttab_set_map_op(gop, idx_to_kaddr(vif, pending_idx),
+			  GNTMAP_host_map | GNTMAP_readonly,
+			  txp->gref, vif->domid);
+
+	memcpy(&vif->pending_tx_info[pending_idx].req, txp,
+	       sizeof(*txp));
+
+}
+
 static struct gnttab_copy *xenvif_get_requests(struct xenvif *vif,
 					       struct sk_buff *skb,
 					       struct xen_netif_tx_request *txp,
@@ -1611,6 +1626,107 @@ static int xenvif_tx_submit(struct xenvif *vif)
 	return work_done;
 }
 
+void xenvif_zerocopy_callback(struct ubuf_info *ubuf, bool zerocopy_success)
+{
+	unsigned long flags;
+	pending_ring_idx_t index;
+	u16 pending_idx = ubuf->desc;
+	struct pending_tx_info *temp =
+		container_of(ubuf, struct pending_tx_info, callback_struct);
+	struct xenvif *vif = container_of(temp - pending_idx,
+					  struct xenvif,
+					  pending_tx_info[0]);
+
+	spin_lock_irqsave(&vif->dealloc_lock, flags);
+	do {
+		pending_idx = ubuf->desc;
+		ubuf = (struct ubuf_info *) ubuf->ctx;
+		index = pending_index(vif->dealloc_prod);
+		vif->dealloc_ring[index] = pending_idx;
+		/* Sync with xenvif_tx_dealloc_action:
+		 * insert idx then incr producer.
+		 */
+		smp_wmb();
+		vif->dealloc_prod++;
+	} while (ubuf);
+	wake_up(&vif->dealloc_wq);
+	spin_unlock_irqrestore(&vif->dealloc_lock, flags);
+}
+
+static inline void xenvif_tx_dealloc_action(struct xenvif *vif)
+{
+	struct gnttab_unmap_grant_ref *gop;
+	pending_ring_idx_t dc, dp;
+	u16 pending_idx, pending_idx_release[MAX_PENDING_REQS];
+	unsigned int i = 0;
+
+	dc = vif->dealloc_cons;
+	gop = vif->tx_unmap_ops;
+
+	/* Free up any grants we have finished using */
+	do {
+		dp = vif->dealloc_prod;
+
+		/* Ensure we see all indices enqueued by all
+		 * xenvif_zerocopy_callback().
+		 */
+		smp_rmb();
+
+		while (dc != dp) {
+			pending_idx =
+				vif->dealloc_ring[pending_index(dc++)];
+
+			/* Already unmapped? */
+			if (vif->grant_tx_handle[pending_idx] ==
+				NETBACK_INVALID_HANDLE) {
+				netdev_err(vif->dev,
+					   "Trying to unmap invalid handle! "
+					   "pending_idx: %x\n", pending_idx);
+				BUG();
+			}
+
+			pending_idx_release[gop-vif->tx_unmap_ops] =
+				pending_idx;
+			vif->pages_to_unmap[gop-vif->tx_unmap_ops] =
+				vif->mmap_pages[pending_idx];
+			gnttab_set_unmap_op(gop,
+					    idx_to_kaddr(vif, pending_idx),
+					    GNTMAP_host_map,
+					    vif->grant_tx_handle[pending_idx]);
+			vif->grant_tx_handle[pending_idx] =
+				NETBACK_INVALID_HANDLE;
+			++gop;
+		}
+
+	} while (dp != vif->dealloc_prod);
+
+	vif->dealloc_cons = dc;
+
+	if (gop - vif->tx_unmap_ops > 0) {
+		int ret;
+		ret = gnttab_unmap_refs(vif->tx_unmap_ops,
+					vif->pages_to_unmap,
+					gop - vif->tx_unmap_ops);
+		if (ret) {
+			netdev_err(vif->dev, "Unmap fail: nr_ops %x ret %d\n",
+				   gop - vif->tx_unmap_ops, ret);
+			for (i = 0; i < gop - vif->tx_unmap_ops; ++i) {
+				netdev_err(vif->dev,
+					   " host_addr: %llx handle: %x status: %d\n",
+					   gop[i].host_addr,
+					   gop[i].handle,
+					   gop[i].status);
+			}
+			BUG();
+		}
+	}
+
+	for (i = 0; i < gop - vif->tx_unmap_ops; ++i)
+		xenvif_idx_release(vif, pending_idx_release[i],
+				   XEN_NETIF_RSP_OKAY);
+}
+
+
 /* Called after netfront has transmitted */
 int xenvif_tx_action(struct xenvif *vif, int budget)
 {
@@ -1677,6 +1793,31 @@ static void xenvif_idx_release(struct xenvif *vif, u16 pending_idx,
 	vif->mmap_pages[pending_idx] = NULL;
 }
 
+void xenvif_idx_unmap(struct xenvif *vif, u16 pending_idx)
+{
+	int ret;
+	struct gnttab_unmap_grant_ref tx_unmap_op;
+
+	if (vif->grant_tx_handle[pending_idx] == NETBACK_INVALID_HANDLE) {
+		netdev_err(vif->dev,
+			   "Trying to unmap invalid handle! pending_idx: %x\n",
+			   pending_idx);
+		return;
+	}
+	gnttab_set_unmap_op(&tx_unmap_op,
+			    idx_to_kaddr(vif, pending_idx),
+			    GNTMAP_host_map,
+			    vif->grant_tx_handle[pending_idx]);
+	ret = gnttab_unmap_refs(&tx_unmap_op,
+				&vif->mmap_pages[pending_idx],
+				1);
+
+	ret = HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref,
+					&tx_unmap_op,
+					1);
+	BUG_ON(ret);
+	vif->grant_tx_handle[pending_idx] = NETBACK_INVALID_HANDLE;
+}
 
 static void make_tx_response(struct xenvif *vif,
 			     struct xen_netif_tx_request *txp,
@@ -1738,6 +1879,14 @@ static inline int tx_work_todo(struct xenvif *vif)
 	return 0;
 }
 
+static inline bool tx_dealloc_work_todo(struct xenvif *vif)
+{
+	if (vif->dealloc_cons != vif->dealloc_prod)
+		return true;
+
+	return false;
+}
+
 void xenvif_unmap_frontend_rings(struct xenvif *vif)
 {
 	if (vif->tx.sring)
@@ -1826,6 +1975,28 @@ int xenvif_kthread(void *data)
 	return 0;
 }
 
+int xenvif_dealloc_kthread(void *data)
+{
+	struct xenvif *vif = data;
+
+	while (!kthread_should_stop()) {
+		wait_event_interruptible(vif->dealloc_wq,
+					 tx_dealloc_work_todo(vif) ||
+					 kthread_should_stop());
+		if (kthread_should_stop())
+			break;
+
+		xenvif_tx_dealloc_action(vif);
+		cond_resched();
+	}
+
+	/* Unmap anything remaining*/
+	if (tx_dealloc_work_todo(vif))
+		xenvif_tx_dealloc_action(vif);
+
+	return 0;
+}
+
 static int __init netback_init(void)
 {
 	int rc = 0;

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH net-next v4 2/9] xen-netback: Change TX path from grant copy to mapping
  2014-01-14 20:39 [PATCH net-next v4 0/9] xen-netback: TX grant mapping with SKBTX_DEV_ZEROCOPY instead of copy Zoltan Kiss
  2014-01-14 20:39 ` [PATCH net-next v4 1/9] xen-netback: Introduce TX grant map definitions Zoltan Kiss
@ 2014-01-14 20:39 ` Zoltan Kiss
  2014-01-16  0:01   ` Wei Liu
  2014-01-14 20:39 ` [PATCH net-next v4 3/9] xen-netback: Remove old TX grant copy definitons and fix indentations Zoltan Kiss
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 22+ messages in thread
From: Zoltan Kiss @ 2014-01-14 20:39 UTC (permalink / raw)
  To: ian.campbell, wei.liu2, xen-devel, netdev, linux-kernel, jonathan.davies
  Cc: Zoltan Kiss

This patch changes the grant copy on the TX patch to grant mapping

v2:
- delete branch for handling fragmented packets fit PKT_PROT_LEN sized first
  request
- mark the effect of using ballooned pages in a comment
- place setting of skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY right
  before netif_receive_skb, and mark the importance of it
- grab dealloc_lock before __napi_complete to avoid contention with the
  callback's napi_schedule
- handle fragmented packets where first request < PKT_PROT_LEN
- fix up error path when checksum_setup failed
- check before teardown for pending grants, and start complain if they are
  there after 10 second

v3:
- delete a surplus checking from tx_action
- remove stray line
- squash xenvif_idx_unmap changes into the first patch
- init spinlocks
- call map hypercall directly instead of gnttab_map_refs()
- fix unmapping timeout in xenvif_free()

v4:
- fix indentations and comments
- handle errors of set_phys_to_machine
- go back to gnttab_map_refs instead of direct hypercall. Now we rely on the
  modified API

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
---
 drivers/net/xen-netback/interface.c |   60 +++++++-
 drivers/net/xen-netback/netback.c   |  256 ++++++++++++++---------------------
 2 files changed, 159 insertions(+), 157 deletions(-)

diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index a7855b3..1e0bf71 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -123,7 +123,9 @@ static int xenvif_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	BUG_ON(skb->dev != dev);
 
 	/* Drop the packet if vif is not ready */
-	if (vif->task == NULL || !xenvif_schedulable(vif))
+	if (vif->task == NULL ||
+	    vif->dealloc_task == NULL ||
+	    !xenvif_schedulable(vif))
 		goto drop;
 
 	/* At best we'll need one slot for the header and one for each
@@ -345,8 +347,26 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
 	vif->pending_prod = MAX_PENDING_REQS;
 	for (i = 0; i < MAX_PENDING_REQS; i++)
 		vif->pending_ring[i] = i;
-	for (i = 0; i < MAX_PENDING_REQS; i++)
-		vif->mmap_pages[i] = NULL;
+	spin_lock_init(&vif->dealloc_lock);
+	spin_lock_init(&vif->response_lock);
+	/* If ballooning is disabled, this will consume real memory, so you
+	 * better enable it. The long term solution would be to use just a
+	 * bunch of valid page descriptors, without dependency on ballooning
+	 */
+	err = alloc_xenballooned_pages(MAX_PENDING_REQS,
+				       vif->mmap_pages,
+				       false);
+	if (err) {
+		netdev_err(dev, "Could not reserve mmap_pages\n");
+		return NULL;
+	}
+	for (i = 0; i < MAX_PENDING_REQS; i++) {
+		vif->pending_tx_info[i].callback_struct = (struct ubuf_info)
+			{ .callback = xenvif_zerocopy_callback,
+			  .ctx = NULL,
+			  .desc = i };
+		vif->grant_tx_handle[i] = NETBACK_INVALID_HANDLE;
+	}
 
 	/*
 	 * Initialise a dummy MAC address. We choose the numerically
@@ -390,6 +410,7 @@ int xenvif_connect(struct xenvif *vif, unsigned long tx_ring_ref,
 		goto err;
 
 	init_waitqueue_head(&vif->wq);
+	init_waitqueue_head(&vif->dealloc_wq);
 
 	if (tx_evtchn == rx_evtchn) {
 		/* feature-split-event-channels == 0 */
@@ -431,6 +452,16 @@ int xenvif_connect(struct xenvif *vif, unsigned long tx_ring_ref,
 		goto err_rx_unbind;
 	}
 
+	vif->dealloc_task = kthread_create(xenvif_dealloc_kthread,
+					   (void *)vif,
+					   "%s-dealloc",
+					   vif->dev->name);
+	if (IS_ERR(vif->dealloc_task)) {
+		pr_warn("Could not allocate kthread for %s\n", vif->dev->name);
+		err = PTR_ERR(vif->dealloc_task);
+		goto err_rx_unbind;
+	}
+
 	vif->task = task;
 
 	rtnl_lock();
@@ -443,6 +474,7 @@ int xenvif_connect(struct xenvif *vif, unsigned long tx_ring_ref,
 	rtnl_unlock();
 
 	wake_up_process(vif->task);
+	wake_up_process(vif->dealloc_task);
 
 	return 0;
 
@@ -480,6 +512,11 @@ void xenvif_disconnect(struct xenvif *vif)
 		vif->task = NULL;
 	}
 
+	if (vif->dealloc_task) {
+		kthread_stop(vif->dealloc_task);
+		vif->dealloc_task = NULL;
+	}
+
 	if (vif->tx_irq) {
 		if (vif->tx_irq == vif->rx_irq)
 			unbind_from_irqhandler(vif->tx_irq, vif);
@@ -495,6 +532,23 @@ void xenvif_disconnect(struct xenvif *vif)
 
 void xenvif_free(struct xenvif *vif)
 {
+	int i, unmap_timeout = 0;
+
+	for (i = 0; i < MAX_PENDING_REQS; ++i) {
+		if (vif->grant_tx_handle[i] != NETBACK_INVALID_HANDLE) {
+			unmap_timeout++;
+			schedule_timeout(msecs_to_jiffies(1000));
+			if (unmap_timeout > 9 &&
+			    net_ratelimit())
+				netdev_err(vif->dev,
+					   "Page still granted! Index: %x\n",
+					   i);
+			i = -1;
+		}
+	}
+
+	free_xenballooned_pages(MAX_PENDING_REQS, vif->mmap_pages);
+
 	netif_napi_del(&vif->napi);
 
 	unregister_netdev(vif->dev);
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 7050f63..e73af87 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -645,9 +645,12 @@ static void xenvif_tx_err(struct xenvif *vif,
 			  struct xen_netif_tx_request *txp, RING_IDX end)
 {
 	RING_IDX cons = vif->tx.req_cons;
+	unsigned long flags;
 
 	do {
+		spin_lock_irqsave(&vif->response_lock, flags);
 		make_tx_response(vif, txp, XEN_NETIF_RSP_ERROR);
+		spin_unlock_irqrestore(&vif->response_lock, flags);
 		if (cons == end)
 			break;
 		txp = RING_GET_REQUEST(&vif->tx, cons++);
@@ -787,10 +790,10 @@ static inline void xenvif_tx_create_gop(struct xenvif *vif,
 
 }
 
-static struct gnttab_copy *xenvif_get_requests(struct xenvif *vif,
-					       struct sk_buff *skb,
-					       struct xen_netif_tx_request *txp,
-					       struct gnttab_copy *gop)
+static struct gnttab_map_grant_ref *xenvif_get_requests(struct xenvif *vif,
+							struct sk_buff *skb,
+							struct xen_netif_tx_request *txp,
+							struct gnttab_map_grant_ref *gop)
 {
 	struct skb_shared_info *shinfo = skb_shinfo(skb);
 	skb_frag_t *frags = shinfo->frags;
@@ -811,83 +814,12 @@ static struct gnttab_copy *xenvif_get_requests(struct xenvif *vif,
 	/* Skip first skb fragment if it is on same page as header fragment. */
 	start = (frag_get_pending_idx(&shinfo->frags[0]) == pending_idx);
 
-	/* Coalesce tx requests, at this point the packet passed in
-	 * should be <= 64K. Any packets larger than 64K have been
-	 * handled in xenvif_count_requests().
-	 */
-	for (shinfo->nr_frags = slot = start; slot < nr_slots;
-	     shinfo->nr_frags++) {
-		struct pending_tx_info *pending_tx_info =
-			vif->pending_tx_info;
-
-		page = alloc_page(GFP_ATOMIC|__GFP_COLD);
-		if (!page)
-			goto err;
-
-		dst_offset = 0;
-		first = NULL;
-		while (dst_offset < PAGE_SIZE && slot < nr_slots) {
-			gop->flags = GNTCOPY_source_gref;
-
-			gop->source.u.ref = txp->gref;
-			gop->source.domid = vif->domid;
-			gop->source.offset = txp->offset;
-
-			gop->dest.domid = DOMID_SELF;
-
-			gop->dest.offset = dst_offset;
-			gop->dest.u.gmfn = virt_to_mfn(page_address(page));
-
-			if (dst_offset + txp->size > PAGE_SIZE) {
-				/* This page can only merge a portion
-				 * of tx request. Do not increment any
-				 * pointer / counter here. The txp
-				 * will be dealt with in future
-				 * rounds, eventually hitting the
-				 * `else` branch.
-				 */
-				gop->len = PAGE_SIZE - dst_offset;
-				txp->offset += gop->len;
-				txp->size -= gop->len;
-				dst_offset += gop->len; /* quit loop */
-			} else {
-				/* This tx request can be merged in the page */
-				gop->len = txp->size;
-				dst_offset += gop->len;
-
+	for (shinfo->nr_frags = start; shinfo->nr_frags < nr_slots;
+	     shinfo->nr_frags++, txp++, gop++) {
 				index = pending_index(vif->pending_cons++);
-
 				pending_idx = vif->pending_ring[index];
-
-				memcpy(&pending_tx_info[pending_idx].req, txp,
-				       sizeof(*txp));
-
-				/* Poison these fields, corresponding
-				 * fields for head tx req will be set
-				 * to correct values after the loop.
-				 */
-				vif->mmap_pages[pending_idx] = (void *)(~0UL);
-				pending_tx_info[pending_idx].head =
-					INVALID_PENDING_RING_IDX;
-
-				if (!first) {
-					first = &pending_tx_info[pending_idx];
-					start_idx = index;
-					head_idx = pending_idx;
-				}
-
-				txp++;
-				slot++;
-			}
-
-			gop++;
-		}
-
-		first->req.offset = 0;
-		first->req.size = dst_offset;
-		first->head = start_idx;
-		vif->mmap_pages[head_idx] = page;
-		frag_set_pending_idx(&frags[shinfo->nr_frags], head_idx);
+		xenvif_tx_create_gop(vif, pending_idx, txp, gop);
+		frag_set_pending_idx(&frags[shinfo->nr_frags], pending_idx);
 	}
 
 	BUG_ON(shinfo->nr_frags > MAX_SKB_FRAGS);
@@ -909,9 +841,9 @@ err:
 
 static int xenvif_tx_check_gop(struct xenvif *vif,
 			       struct sk_buff *skb,
-			       struct gnttab_copy **gopp)
+			       struct gnttab_map_grant_ref **gopp)
 {
-	struct gnttab_copy *gop = *gopp;
+	struct gnttab_map_grant_ref *gop = *gopp;
 	u16 pending_idx = *((u16 *)skb->data);
 	struct skb_shared_info *shinfo = skb_shinfo(skb);
 	struct pending_tx_info *tx_info;
@@ -923,6 +855,17 @@ static int xenvif_tx_check_gop(struct xenvif *vif,
 	err = gop->status;
 	if (unlikely(err))
 		xenvif_idx_release(vif, pending_idx, XEN_NETIF_RSP_ERROR);
+	else {
+		if (vif->grant_tx_handle[pending_idx] !=
+		    NETBACK_INVALID_HANDLE) {
+			netdev_err(vif->dev,
+				   "Stale mapped handle! pending_idx %x handle %x\n",
+				   pending_idx,
+				   vif->grant_tx_handle[pending_idx]);
+			BUG();
+		}
+		vif->grant_tx_handle[pending_idx] = gop->handle;
+	}
 
 	/* Skip first skb fragment if it is on same page as header fragment. */
 	start = (frag_get_pending_idx(&shinfo->frags[0]) == pending_idx);
@@ -936,18 +879,24 @@ static int xenvif_tx_check_gop(struct xenvif *vif,
 		head = tx_info->head;
 
 		/* Check error status: if okay then remember grant handle. */
-		do {
 			newerr = (++gop)->status;
-			if (newerr)
-				break;
-			peek = vif->pending_ring[pending_index(++head)];
-		} while (!pending_tx_is_head(vif, peek));
 
 		if (likely(!newerr)) {
+			if (vif->grant_tx_handle[pending_idx] !=
+			    NETBACK_INVALID_HANDLE) {
+				netdev_err(vif->dev,
+					   "Stale mapped handle! pending_idx %x handle %x\n",
+					   pending_idx,
+					   vif->grant_tx_handle[pending_idx]);
+				xenvif_fatal_tx_err(vif);
+			}
+			vif->grant_tx_handle[pending_idx] = gop->handle;
 			/* Had a previous error? Invalidate this fragment. */
-			if (unlikely(err))
+			if (unlikely(err)) {
+				xenvif_idx_unmap(vif, pending_idx);
 				xenvif_idx_release(vif, pending_idx,
 						   XEN_NETIF_RSP_OKAY);
+			}
 			continue;
 		}
 
@@ -960,9 +909,11 @@ static int xenvif_tx_check_gop(struct xenvif *vif,
 
 		/* First error: invalidate header and preceding fragments. */
 		pending_idx = *((u16 *)skb->data);
+		xenvif_idx_unmap(vif, pending_idx);
 		xenvif_idx_release(vif, pending_idx, XEN_NETIF_RSP_OKAY);
 		for (j = start; j < i; j++) {
 			pending_idx = frag_get_pending_idx(&shinfo->frags[j]);
+			xenvif_idx_unmap(vif, pending_idx);
 			xenvif_idx_release(vif, pending_idx,
 					   XEN_NETIF_RSP_OKAY);
 		}
@@ -975,7 +926,9 @@ static int xenvif_tx_check_gop(struct xenvif *vif,
 	return err;
 }
 
-static void xenvif_fill_frags(struct xenvif *vif, struct sk_buff *skb)
+static void xenvif_fill_frags(struct xenvif *vif,
+			      struct sk_buff *skb,
+			      u16 prev_pending_idx)
 {
 	struct skb_shared_info *shinfo = skb_shinfo(skb);
 	int nr_frags = shinfo->nr_frags;
@@ -989,6 +942,17 @@ static void xenvif_fill_frags(struct xenvif *vif, struct sk_buff *skb)
 
 		pending_idx = frag_get_pending_idx(frag);
 
+		/* If this is not the first frag, chain it to the previous*/
+		if (unlikely(prev_pending_idx == INVALID_PENDING_IDX))
+			skb_shinfo(skb)->destructor_arg =
+				&vif->pending_tx_info[pending_idx].callback_struct;
+		else if (likely(pending_idx != prev_pending_idx))
+			vif->pending_tx_info[prev_pending_idx].callback_struct.ctx =
+				&(vif->pending_tx_info[pending_idx].callback_struct);
+
+		vif->pending_tx_info[pending_idx].callback_struct.ctx = NULL;
+		prev_pending_idx = pending_idx;
+
 		txp = &vif->pending_tx_info[pending_idx].req;
 		page = virt_to_page(idx_to_kaddr(vif, pending_idx));
 		__skb_fill_page_desc(skb, i, page, txp->offset, txp->size);
@@ -996,10 +960,15 @@ static void xenvif_fill_frags(struct xenvif *vif, struct sk_buff *skb)
 		skb->data_len += txp->size;
 		skb->truesize += txp->size;
 
-		/* Take an extra reference to offset xenvif_idx_release */
+		/* Take an extra reference to offset network stack's put_page */
 		get_page(vif->mmap_pages[pending_idx]);
-		xenvif_idx_release(vif, pending_idx, XEN_NETIF_RSP_OKAY);
 	}
+	/* FIXME: __skb_fill_page_desc set this to true because page->pfmemalloc
+	 * overlaps with "index", and "mapping" is not set. I think mapping
+	 * should be set. If delivered to local stack, it would drop this
+	 * skb in sk_filter unless the socket has the right to use it.
+	 */
+	skb->pfmemalloc	= false;
 }
 
 static int xenvif_get_extras(struct xenvif *vif,
@@ -1372,7 +1341,7 @@ static bool tx_credit_exceeded(struct xenvif *vif, unsigned size)
 
 static unsigned xenvif_tx_build_gops(struct xenvif *vif, int budget)
 {
-	struct gnttab_copy *gop = vif->tx_copy_ops, *request_gop;
+	struct gnttab_map_grant_ref *gop = vif->tx_map_ops, *request_gop;
 	struct sk_buff *skb;
 	int ret;
 
@@ -1480,30 +1449,10 @@ static unsigned xenvif_tx_build_gops(struct xenvif *vif, int budget)
 			}
 		}
 
-		/* XXX could copy straight to head */
-		page = xenvif_alloc_page(vif, pending_idx);
-		if (!page) {
-			kfree_skb(skb);
-			xenvif_tx_err(vif, &txreq, idx);
-			break;
-		}
-
-		gop->source.u.ref = txreq.gref;
-		gop->source.domid = vif->domid;
-		gop->source.offset = txreq.offset;
-
-		gop->dest.u.gmfn = virt_to_mfn(page_address(page));
-		gop->dest.domid = DOMID_SELF;
-		gop->dest.offset = txreq.offset;
-
-		gop->len = txreq.size;
-		gop->flags = GNTCOPY_source_gref;
+		xenvif_tx_create_gop(vif, pending_idx, &txreq, gop);
 
 		gop++;
 
-		memcpy(&vif->pending_tx_info[pending_idx].req,
-		       &txreq, sizeof(txreq));
-		vif->pending_tx_info[pending_idx].head = index;
 		*((u16 *)skb->data) = pending_idx;
 
 		__skb_put(skb, data_len);
@@ -1532,17 +1481,17 @@ static unsigned xenvif_tx_build_gops(struct xenvif *vif, int budget)
 
 		vif->tx.req_cons = idx;
 
-		if ((gop-vif->tx_copy_ops) >= ARRAY_SIZE(vif->tx_copy_ops))
+		if ((gop-vif->tx_map_ops) >= ARRAY_SIZE(vif->tx_map_ops))
 			break;
 	}
 
-	return gop - vif->tx_copy_ops;
+	return gop - vif->tx_map_ops;
 }
 
 
 static int xenvif_tx_submit(struct xenvif *vif)
 {
-	struct gnttab_copy *gop = vif->tx_copy_ops;
+	struct gnttab_map_grant_ref *gop = vif->tx_map_ops;
 	struct sk_buff *skb;
 	int work_done = 0;
 
@@ -1566,12 +1515,17 @@ static int xenvif_tx_submit(struct xenvif *vif)
 		memcpy(skb->data,
 		       (void *)(idx_to_kaddr(vif, pending_idx)|txp->offset),
 		       data_len);
+		vif->pending_tx_info[pending_idx].callback_struct.ctx = NULL;
 		if (data_len < txp->size) {
 			/* Append the packet payload as a fragment. */
 			txp->offset += data_len;
 			txp->size -= data_len;
+			skb_shinfo(skb)->destructor_arg =
+				&vif->pending_tx_info[pending_idx].callback_struct;
 		} else {
 			/* Schedule a response immediately. */
+			skb_shinfo(skb)->destructor_arg = NULL;
+			xenvif_idx_unmap(vif, pending_idx);
 			xenvif_idx_release(vif, pending_idx,
 					   XEN_NETIF_RSP_OKAY);
 		}
@@ -1581,7 +1535,11 @@ static int xenvif_tx_submit(struct xenvif *vif)
 		else if (txp->flags & XEN_NETTXF_data_validated)
 			skb->ip_summed = CHECKSUM_UNNECESSARY;
 
-		xenvif_fill_frags(vif, skb);
+		xenvif_fill_frags(vif,
+				  skb,
+				  skb_shinfo(skb)->destructor_arg ?
+				  pending_idx :
+				  INVALID_PENDING_IDX);
 
 		if (skb_is_nonlinear(skb) && skb_headlen(skb) < PKT_PROT_LEN) {
 			int target = min_t(int, skb->len, PKT_PROT_LEN);
@@ -1595,6 +1553,11 @@ static int xenvif_tx_submit(struct xenvif *vif)
 		if (checksum_setup(vif, skb)) {
 			netdev_dbg(vif->dev,
 				   "Can't setup checksum in net_tx_action\n");
+			/* We have to set this flag so the dealloc thread can
+			 * send the slots back
+			 */
+			if (skb_shinfo(skb)->destructor_arg)
+				skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY;
 			kfree_skb(skb);
 			continue;
 		}
@@ -1620,6 +1583,14 @@ static int xenvif_tx_submit(struct xenvif *vif)
 
 		work_done++;
 
+		/* Set this flag right before netif_receive_skb, otherwise
+		 * someone might think this packet already left netback, and
+		 * do a skb_copy_ubufs while we are still in control of the
+		 * skb. E.g. the __pskb_pull_tail earlier can do such thing.
+		 */
+		if (skb_shinfo(skb)->destructor_arg)
+			skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY;
+
 		netif_receive_skb(skb);
 	}
 
@@ -1731,7 +1702,7 @@ static inline void xenvif_tx_dealloc_action(struct xenvif *vif)
 int xenvif_tx_action(struct xenvif *vif, int budget)
 {
 	unsigned nr_gops;
-	int work_done;
+	int work_done, ret;
 
 	if (unlikely(!tx_work_todo(vif)))
 		return 0;
@@ -1741,7 +1712,10 @@ int xenvif_tx_action(struct xenvif *vif, int budget)
 	if (nr_gops == 0)
 		return 0;
 
-	gnttab_batch_copy(vif->tx_copy_ops, nr_gops);
+	ret = gnttab_map_refs(vif->tx_map_ops,
+			      vif->pages_to_map,
+			      nr_gops);
+	BUG_ON(ret);
 
 	work_done = xenvif_tx_submit(vif);
 
@@ -1752,45 +1726,19 @@ static void xenvif_idx_release(struct xenvif *vif, u16 pending_idx,
 			       u8 status)
 {
 	struct pending_tx_info *pending_tx_info;
-	pending_ring_idx_t head;
+	pending_ring_idx_t index;
 	u16 peek; /* peek into next tx request */
+	unsigned long flags;
 
-	BUG_ON(vif->mmap_pages[pending_idx] == (void *)(~0UL));
-
-	/* Already complete? */
-	if (vif->mmap_pages[pending_idx] == NULL)
-		return;
-
-	pending_tx_info = &vif->pending_tx_info[pending_idx];
-
-	head = pending_tx_info->head;
-
-	BUG_ON(!pending_tx_is_head(vif, head));
-	BUG_ON(vif->pending_ring[pending_index(head)] != pending_idx);
-
-	do {
-		pending_ring_idx_t index;
-		pending_ring_idx_t idx = pending_index(head);
-		u16 info_idx = vif->pending_ring[idx];
-
-		pending_tx_info = &vif->pending_tx_info[info_idx];
+		pending_tx_info = &vif->pending_tx_info[pending_idx];
+		spin_lock_irqsave(&vif->response_lock, flags);
 		make_tx_response(vif, &pending_tx_info->req, status);
-
-		/* Setting any number other than
-		 * INVALID_PENDING_RING_IDX indicates this slot is
-		 * starting a new packet / ending a previous packet.
-		 */
-		pending_tx_info->head = 0;
-
-		index = pending_index(vif->pending_prod++);
-		vif->pending_ring[index] = vif->pending_ring[info_idx];
-
-		peek = vif->pending_ring[pending_index(++head)];
-
-	} while (!pending_tx_is_head(vif, peek));
-
-	put_page(vif->mmap_pages[pending_idx]);
-	vif->mmap_pages[pending_idx] = NULL;
+		index = pending_index(vif->pending_prod);
+		vif->pending_ring[index] = pending_idx;
+		/* TX shouldn't use the index before we give it back here */
+		mb();
+		vif->pending_prod++;
+		spin_unlock_irqrestore(&vif->response_lock, flags);
 }
 
 void xenvif_idx_unmap(struct xenvif *vif, u16 pending_idx)

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH net-next v4 3/9] xen-netback: Remove old TX grant copy definitons and fix indentations
  2014-01-14 20:39 [PATCH net-next v4 0/9] xen-netback: TX grant mapping with SKBTX_DEV_ZEROCOPY instead of copy Zoltan Kiss
  2014-01-14 20:39 ` [PATCH net-next v4 1/9] xen-netback: Introduce TX grant map definitions Zoltan Kiss
  2014-01-14 20:39 ` [PATCH net-next v4 2/9] xen-netback: Change TX path from grant copy to mapping Zoltan Kiss
@ 2014-01-14 20:39 ` Zoltan Kiss
  2014-01-16  0:02   ` Wei Liu
  2014-01-14 20:39 ` [PATCH net-next v4 4/9] xen-netback: Change RX path for mapped SKB fragments Zoltan Kiss
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 22+ messages in thread
From: Zoltan Kiss @ 2014-01-14 20:39 UTC (permalink / raw)
  To: ian.campbell, wei.liu2, xen-devel, netdev, linux-kernel, jonathan.davies
  Cc: Zoltan Kiss

These became obsolate with grant mapping. I've left intentionally the
indentations in this way, to improve readability of previous patches.

v2:
- move the indentation fixup patch here

v4:
- indentation fixes

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
---
 drivers/net/xen-netback/common.h  |   37 +------------------
 drivers/net/xen-netback/netback.c |   72 ++++++++-----------------------------
 2 files changed, 15 insertions(+), 94 deletions(-)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index f35a3ce..2b1cd83 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -46,39 +46,9 @@
 #include <xen/xenbus.h>
 
 typedef unsigned int pending_ring_idx_t;
-#define INVALID_PENDING_RING_IDX (~0U)
 
-/* For the head field in pending_tx_info: it is used to indicate
- * whether this tx info is the head of one or more coalesced requests.
- *
- * When head != INVALID_PENDING_RING_IDX, it means the start of a new
- * tx requests queue and the end of previous queue.
- *
- * An example sequence of head fields (I = INVALID_PENDING_RING_IDX):
- *
- * ...|0 I I I|5 I|9 I I I|...
- * -->|<-INUSE----------------
- *
- * After consuming the first slot(s) we have:
- *
- * ...|V V V V|5 I|9 I I I|...
- * -----FREE->|<-INUSE--------
- *
- * where V stands for "valid pending ring index". Any number other
- * than INVALID_PENDING_RING_IDX is OK. These entries are considered
- * free and can contain any number other than
- * INVALID_PENDING_RING_IDX. In practice we use 0.
- *
- * The in use non-INVALID_PENDING_RING_IDX (say 0, 5 and 9 in the
- * above example) number is the index into pending_tx_info and
- * mmap_pages arrays.
- */
 struct pending_tx_info {
-	struct xen_netif_tx_request req; /* coalesced tx request */
-	pending_ring_idx_t head; /* head != INVALID_PENDING_RING_IDX
-				  * if it is head of one or more tx
-				  * reqs
-				  */
+	struct xen_netif_tx_request req; /* tx request */
 	/* callback data for released SKBs. The	callback is always
 	 * xenvif_zerocopy_callback, ctx points to the next fragment, desc
 	 * contains the pending_idx
@@ -135,11 +105,6 @@ struct xenvif {
 	struct pending_tx_info pending_tx_info[MAX_PENDING_REQS];
 	grant_handle_t grant_tx_handle[MAX_PENDING_REQS];
 
-	/* Coalescing tx requests before copying makes number of grant
-	 * copy ops greater or equal to number of slots required. In
-	 * worst case a tx request consumes 2 gnttab_copy.
-	 */
-	struct gnttab_copy tx_copy_ops[2*MAX_PENDING_REQS];
 	struct gnttab_map_grant_ref tx_map_ops[MAX_PENDING_REQS];
 	struct gnttab_unmap_grant_ref tx_unmap_ops[MAX_PENDING_REQS];
 
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 5724468..f74fa92 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -71,16 +71,6 @@ module_param(fatal_skb_slots, uint, 0444);
  */
 #define XEN_NETBK_LEGACY_SLOTS_MAX XEN_NETIF_NR_SLOTS_MIN
 
-/*
- * If head != INVALID_PENDING_RING_IDX, it means this tx request is head of
- * one or more merged tx requests, otherwise it is the continuation of
- * previous tx request.
- */
-static inline int pending_tx_is_head(struct xenvif *vif, RING_IDX idx)
-{
-	return vif->pending_tx_info[idx].head != INVALID_PENDING_RING_IDX;
-}
-
 static void xenvif_idx_release(struct xenvif *vif, u16 pending_idx,
 			       u8 status);
 
@@ -762,19 +752,6 @@ static int xenvif_count_requests(struct xenvif *vif,
 	return slots;
 }
 
-static struct page *xenvif_alloc_page(struct xenvif *vif,
-				      u16 pending_idx)
-{
-	struct page *page;
-
-	page = alloc_page(GFP_ATOMIC|__GFP_COLD);
-	if (!page)
-		return NULL;
-	vif->mmap_pages[pending_idx] = page;
-
-	return page;
-}
-
 static inline void xenvif_tx_create_gop(struct xenvif *vif,
 					u16 pending_idx,
 					struct xen_netif_tx_request *txp,
@@ -797,13 +774,9 @@ static struct gnttab_map_grant_ref *xenvif_get_requests(struct xenvif *vif,
 	struct skb_shared_info *shinfo = skb_shinfo(skb);
 	skb_frag_t *frags = shinfo->frags;
 	u16 pending_idx = *((u16 *)skb->data);
-	u16 head_idx = 0;
-	int slot, start;
-	struct page *page;
-	pending_ring_idx_t index, start_idx = 0;
-	uint16_t dst_offset;
+	int start;
+	pending_ring_idx_t index;
 	unsigned int nr_slots;
-	struct pending_tx_info *first = NULL;
 
 	/* At this point shinfo->nr_frags is in fact the number of
 	 * slots, which can be as large as XEN_NETBK_LEGACY_SLOTS_MAX.
@@ -815,8 +788,8 @@ static struct gnttab_map_grant_ref *xenvif_get_requests(struct xenvif *vif,
 
 	for (shinfo->nr_frags = start; shinfo->nr_frags < nr_slots;
 	     shinfo->nr_frags++, txp++, gop++) {
-				index = pending_index(vif->pending_cons++);
-				pending_idx = vif->pending_ring[index];
+		index = pending_index(vif->pending_cons++);
+		pending_idx = vif->pending_ring[index];
 		xenvif_tx_create_gop(vif, pending_idx, txp, gop);
 		frag_set_pending_idx(&frags[shinfo->nr_frags], pending_idx);
 	}
@@ -824,18 +797,6 @@ static struct gnttab_map_grant_ref *xenvif_get_requests(struct xenvif *vif,
 	BUG_ON(shinfo->nr_frags > MAX_SKB_FRAGS);
 
 	return gop;
-err:
-	/* Unwind, freeing all pages and sending error responses. */
-	while (shinfo->nr_frags-- > start) {
-		xenvif_idx_release(vif,
-				frag_get_pending_idx(&frags[shinfo->nr_frags]),
-				XEN_NETIF_RSP_ERROR);
-	}
-	/* The head too, if necessary. */
-	if (start)
-		xenvif_idx_release(vif, pending_idx, XEN_NETIF_RSP_ERROR);
-
-	return NULL;
 }
 
 static int xenvif_tx_check_gop(struct xenvif *vif,
@@ -848,7 +809,6 @@ static int xenvif_tx_check_gop(struct xenvif *vif,
 	struct pending_tx_info *tx_info;
 	int nr_frags = shinfo->nr_frags;
 	int i, err, start;
-	u16 peek; /* peek into next tx request */
 
 	/* Check status of header. */
 	err = gop->status;
@@ -873,14 +833,12 @@ static int xenvif_tx_check_gop(struct xenvif *vif,
 
 	for (i = start; i < nr_frags; i++) {
 		int j, newerr;
-		pending_ring_idx_t head;
 
 		pending_idx = frag_get_pending_idx(&shinfo->frags[i]);
 		tx_info = &vif->pending_tx_info[pending_idx];
-		head = tx_info->head;
 
 		/* Check error status: if okay then remember grant handle. */
-			newerr = (++gop)->status;
+		newerr = (++gop)->status;
 
 		if (likely(!newerr)) {
 			if (vif->grant_tx_handle[pending_idx] !=
@@ -1353,7 +1311,6 @@ static unsigned xenvif_tx_build_gops(struct xenvif *vif, int budget)
 	       (skb_queue_len(&vif->tx_queue) < budget)) {
 		struct xen_netif_tx_request txreq;
 		struct xen_netif_tx_request txfrags[XEN_NETBK_LEGACY_SLOTS_MAX];
-		struct page *page;
 		struct xen_netif_extra_info extras[XEN_NETIF_EXTRA_TYPE_MAX-1];
 		u16 pending_idx;
 		RING_IDX idx;
@@ -1728,18 +1685,17 @@ static void xenvif_idx_release(struct xenvif *vif, u16 pending_idx,
 {
 	struct pending_tx_info *pending_tx_info;
 	pending_ring_idx_t index;
-	u16 peek; /* peek into next tx request */
 	unsigned long flags;
 
-		pending_tx_info = &vif->pending_tx_info[pending_idx];
-		spin_lock_irqsave(&vif->response_lock, flags);
-		make_tx_response(vif, &pending_tx_info->req, status);
-		index = pending_index(vif->pending_prod);
-		vif->pending_ring[index] = pending_idx;
-		/* TX shouldn't use the index before we give it back here */
-		mb();
-		vif->pending_prod++;
-		spin_unlock_irqrestore(&vif->response_lock, flags);
+	pending_tx_info = &vif->pending_tx_info[pending_idx];
+	spin_lock_irqsave(&vif->response_lock, flags);
+	make_tx_response(vif, &pending_tx_info->req, status);
+	index = pending_index(vif->pending_prod);
+	vif->pending_ring[index] = pending_idx;
+	/* TX shouldn't use the index before we give it back here */
+	mb();
+	vif->pending_prod++;
+	spin_unlock_irqrestore(&vif->response_lock, flags);
 }
 
 void xenvif_idx_unmap(struct xenvif *vif, u16 pending_idx)

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH net-next v4 4/9] xen-netback: Change RX path for mapped SKB fragments
  2014-01-14 20:39 [PATCH net-next v4 0/9] xen-netback: TX grant mapping with SKBTX_DEV_ZEROCOPY instead of copy Zoltan Kiss
                   ` (2 preceding siblings ...)
  2014-01-14 20:39 ` [PATCH net-next v4 3/9] xen-netback: Remove old TX grant copy definitons and fix indentations Zoltan Kiss
@ 2014-01-14 20:39 ` Zoltan Kiss
  2014-01-14 20:39 ` [PATCH net-next v4 5/9] xen-netback: Add stat counters for zerocopy Zoltan Kiss
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 22+ messages in thread
From: Zoltan Kiss @ 2014-01-14 20:39 UTC (permalink / raw)
  To: ian.campbell, wei.liu2, xen-devel, netdev, linux-kernel, jonathan.davies
  Cc: Zoltan Kiss

RX path need to know if the SKB fragments are stored on pages from another
domain.

v4:
- indentation fixes

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
---
 drivers/net/xen-netback/netback.c |   46 +++++++++++++++++++++++++++++++++----
 1 file changed, 41 insertions(+), 5 deletions(-)

diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index f74fa92..d43444d 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -226,7 +226,9 @@ static struct xenvif_rx_meta *get_next_rx_buffer(struct xenvif *vif,
 static void xenvif_gop_frag_copy(struct xenvif *vif, struct sk_buff *skb,
 				 struct netrx_pending_operations *npo,
 				 struct page *page, unsigned long size,
-				 unsigned long offset, int *head)
+				 unsigned long offset, int *head,
+				 struct xenvif *foreign_vif,
+				 grant_ref_t foreign_gref)
 {
 	struct gnttab_copy *copy_gop;
 	struct xenvif_rx_meta *meta;
@@ -268,8 +270,15 @@ static void xenvif_gop_frag_copy(struct xenvif *vif, struct sk_buff *skb,
 		copy_gop->flags = GNTCOPY_dest_gref;
 		copy_gop->len = bytes;
 
-		copy_gop->source.domid = DOMID_SELF;
-		copy_gop->source.u.gmfn = virt_to_mfn(page_address(page));
+		if (foreign_vif) {
+			copy_gop->source.domid = foreign_vif->domid;
+			copy_gop->source.u.ref = foreign_gref;
+			copy_gop->flags |= GNTCOPY_source_gref;
+		} else {
+			copy_gop->source.domid = DOMID_SELF;
+			copy_gop->source.u.gmfn =
+				virt_to_mfn(page_address(page));
+		}
 		copy_gop->source.offset = offset;
 
 		copy_gop->dest.domid = vif->domid;
@@ -330,6 +339,9 @@ static int xenvif_gop_skb(struct sk_buff *skb,
 	int old_meta_prod;
 	int gso_type;
 	int gso_size;
+	struct ubuf_info *ubuf = skb_shinfo(skb)->destructor_arg;
+	grant_ref_t foreign_grefs[MAX_SKB_FRAGS];
+	struct xenvif *foreign_vif = NULL;
 
 	old_meta_prod = npo->meta_prod;
 
@@ -370,6 +382,26 @@ static int xenvif_gop_skb(struct sk_buff *skb,
 	npo->copy_off = 0;
 	npo->copy_gref = req->gref;
 
+	if ((skb_shinfo(skb)->tx_flags & SKBTX_DEV_ZEROCOPY) &&
+		 (ubuf->callback == &xenvif_zerocopy_callback)) {
+		u16 pending_idx = ubuf->desc;
+		int i = 0;
+		struct pending_tx_info *temp =
+			container_of(ubuf,
+				     struct pending_tx_info,
+				     callback_struct);
+		foreign_vif =
+			container_of(temp - pending_idx,
+				     struct xenvif,
+				     pending_tx_info[0]);
+		do {
+			pending_idx = ubuf->desc;
+			foreign_grefs[i++] =
+				foreign_vif->pending_tx_info[pending_idx].req.gref;
+			ubuf = (struct ubuf_info *) ubuf->ctx;
+		} while (ubuf);
+	}
+
 	data = skb->data;
 	while (data < skb_tail_pointer(skb)) {
 		unsigned int offset = offset_in_page(data);
@@ -379,7 +411,9 @@ static int xenvif_gop_skb(struct sk_buff *skb,
 			len = skb_tail_pointer(skb) - data;
 
 		xenvif_gop_frag_copy(vif, skb, npo,
-				     virt_to_page(data), len, offset, &head);
+				     virt_to_page(data), len, offset, &head,
+				     NULL,
+				     0);
 		data += len;
 	}
 
@@ -388,7 +422,9 @@ static int xenvif_gop_skb(struct sk_buff *skb,
 				     skb_frag_page(&skb_shinfo(skb)->frags[i]),
 				     skb_frag_size(&skb_shinfo(skb)->frags[i]),
 				     skb_shinfo(skb)->frags[i].page_offset,
-				     &head);
+				     &head,
+				     foreign_vif,
+				     foreign_grefs[i]);
 	}
 
 	return npo->meta_prod - old_meta_prod;

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH net-next v4 5/9] xen-netback: Add stat counters for zerocopy
  2014-01-14 20:39 [PATCH net-next v4 0/9] xen-netback: TX grant mapping with SKBTX_DEV_ZEROCOPY instead of copy Zoltan Kiss
                   ` (3 preceding siblings ...)
  2014-01-14 20:39 ` [PATCH net-next v4 4/9] xen-netback: Change RX path for mapped SKB fragments Zoltan Kiss
@ 2014-01-14 20:39 ` Zoltan Kiss
  2014-01-14 20:39 ` [PATCH net-next v4 6/9] xen-netback: Handle guests with too many frags Zoltan Kiss
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 22+ messages in thread
From: Zoltan Kiss @ 2014-01-14 20:39 UTC (permalink / raw)
  To: ian.campbell, wei.liu2, xen-devel, netdev, linux-kernel, jonathan.davies
  Cc: Zoltan Kiss

These counters help determine how often the buffers had to be copied. Also
they help find out if packets are leaked, as if "sent != success + fail",
there are probably packets never freed up properly.

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
---
 drivers/net/xen-netback/common.h    |    3 +++
 drivers/net/xen-netback/interface.c |   15 +++++++++++++++
 drivers/net/xen-netback/netback.c   |    9 ++++++++-
 3 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index 419e63c..e3c28ff 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -155,6 +155,9 @@ struct xenvif {
 
 	/* Statistics */
 	unsigned long rx_gso_checksum_fixup;
+	unsigned long tx_zerocopy_sent;
+	unsigned long tx_zerocopy_success;
+	unsigned long tx_zerocopy_fail;
 
 	/* Miscellaneous private stuff. */
 	struct net_device *dev;
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index af5216f..75fe683 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -239,6 +239,21 @@ static const struct xenvif_stat {
 		"rx_gso_checksum_fixup",
 		offsetof(struct xenvif, rx_gso_checksum_fixup)
 	},
+	/* If (sent != success + fail), there are probably packets never
+	 * freed up properly!
+	 */
+	{
+		"tx_zerocopy_sent",
+		offsetof(struct xenvif, tx_zerocopy_sent),
+	},
+	{
+		"tx_zerocopy_success",
+		offsetof(struct xenvif, tx_zerocopy_success),
+	},
+	{
+		"tx_zerocopy_fail",
+		offsetof(struct xenvif, tx_zerocopy_fail)
+	},
 };
 
 static int xenvif_get_sset_count(struct net_device *dev, int string_set)
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index a1b03e4..e2dd565 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -1611,8 +1611,10 @@ static int xenvif_tx_submit(struct xenvif *vif, int budget)
 		 * skb_copy_ubufs while we are still in control of the skb. E.g.
 		 * the __pskb_pull_tail earlier can do such thing.
 		 */
-		if (skb_shinfo(skb)->destructor_arg)
+		if (skb_shinfo(skb)->destructor_arg) {
 			skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY;
+			vif->tx_zerocopy_sent++;
+		}
 
 		netif_receive_skb(skb);
 	}
@@ -1645,6 +1647,11 @@ void xenvif_zerocopy_callback(struct ubuf_info *ubuf, bool zerocopy_success)
 		napi_schedule(&vif->napi);
 	} while (ubuf);
 	spin_unlock_irqrestore(&vif->dealloc_lock, flags);
+
+	if (likely(zerocopy_success))
+		vif->tx_zerocopy_success++;
+	else
+		vif->tx_zerocopy_fail++;
 }
 
 static inline void xenvif_tx_action_dealloc(struct xenvif *vif)

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH net-next v4 6/9] xen-netback: Handle guests with too many frags
  2014-01-14 20:39 [PATCH net-next v4 0/9] xen-netback: TX grant mapping with SKBTX_DEV_ZEROCOPY instead of copy Zoltan Kiss
                   ` (4 preceding siblings ...)
  2014-01-14 20:39 ` [PATCH net-next v4 5/9] xen-netback: Add stat counters for zerocopy Zoltan Kiss
@ 2014-01-14 20:39 ` Zoltan Kiss
  2014-01-16  0:03   ` Wei Liu
  2014-01-14 20:39 ` [PATCH net-next v4 7/9] xen-netback: Add stat counters for frag_list skbs Zoltan Kiss
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 22+ messages in thread
From: Zoltan Kiss @ 2014-01-14 20:39 UTC (permalink / raw)
  To: ian.campbell, wei.liu2, xen-devel, netdev, linux-kernel, jonathan.davies
  Cc: Zoltan Kiss

Xen network protocol had implicit dependency on MAX_SKB_FRAGS. Netback has to
handle guests sending up to XEN_NETBK_LEGACY_SLOTS_MAX slots. To achieve that:
- create a new skb
- map the leftover slots to its frags (no linear buffer here!)
- chain it to the previous through skb_shinfo(skb)->frag_list
- map them
- copy the whole stuff into a brand new skb and send it to the stack
- unmap the 2 old skb's pages

v3:
- adding extra check for frag number
- consolidate alloc_skb's into xenvif_alloc_skb()
- BUG_ON(frag_overflow > MAX_SKB_FRAGS)

v4:
- handle error of skb_copy_expand()

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>

---
 drivers/net/xen-netback/netback.c |  125 ++++++++++++++++++++++++++++++++++---
 1 file changed, 115 insertions(+), 10 deletions(-)

diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index c2b2597..345c6a2 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -802,6 +802,20 @@ static inline void xenvif_tx_create_gop(struct xenvif *vif,
 
 }
 
+static inline struct sk_buff *xenvif_alloc_skb(unsigned int size)
+{
+	struct sk_buff *skb =
+		alloc_skb(size + NET_SKB_PAD + NET_IP_ALIGN,
+			  GFP_ATOMIC | __GFP_NOWARN);
+	if (unlikely(skb == NULL))
+		return NULL;
+
+	/* Packets passed to netif_rx() must have some headroom. */
+	skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN);
+
+	return skb;
+}
+
 static struct gnttab_map_grant_ref *xenvif_get_requests(struct xenvif *vif,
 							struct sk_buff *skb,
 							struct xen_netif_tx_request *txp,
@@ -812,11 +826,16 @@ static struct gnttab_map_grant_ref *xenvif_get_requests(struct xenvif *vif,
 	u16 pending_idx = *((u16 *)skb->data);
 	int start;
 	pending_ring_idx_t index;
-	unsigned int nr_slots;
+	unsigned int nr_slots, frag_overflow = 0;
 
 	/* At this point shinfo->nr_frags is in fact the number of
 	 * slots, which can be as large as XEN_NETBK_LEGACY_SLOTS_MAX.
 	 */
+	if (shinfo->nr_frags > MAX_SKB_FRAGS) {
+		frag_overflow = shinfo->nr_frags - MAX_SKB_FRAGS;
+		BUG_ON(frag_overflow > MAX_SKB_FRAGS);
+		shinfo->nr_frags = MAX_SKB_FRAGS;
+	}
 	nr_slots = shinfo->nr_frags;
 
 	/* Skip first skb fragment if it is on same page as header fragment. */
@@ -832,6 +851,29 @@ static struct gnttab_map_grant_ref *xenvif_get_requests(struct xenvif *vif,
 
 	BUG_ON(shinfo->nr_frags > MAX_SKB_FRAGS);
 
+	if (frag_overflow) {
+		struct sk_buff *nskb = xenvif_alloc_skb(0);
+		if (unlikely(nskb == NULL)) {
+			netdev_err(vif->dev,
+				   "Can't allocate the frag_list skb.\n");
+			return NULL;
+		}
+
+		shinfo = skb_shinfo(nskb);
+		frags = shinfo->frags;
+
+		for (shinfo->nr_frags = 0; shinfo->nr_frags < frag_overflow;
+		     shinfo->nr_frags++, txp++, gop++) {
+			index = pending_index(vif->pending_cons++);
+			pending_idx = vif->pending_ring[index];
+			xenvif_tx_create_gop(vif, pending_idx, txp, gop);
+			frag_set_pending_idx(&frags[shinfo->nr_frags],
+					     pending_idx);
+		}
+
+		skb_shinfo(skb)->frag_list = nskb;
+	}
+
 	return gop;
 }
 
@@ -845,6 +887,7 @@ static int xenvif_tx_check_gop(struct xenvif *vif,
 	struct pending_tx_info *tx_info;
 	int nr_frags = shinfo->nr_frags;
 	int i, err, start;
+	struct sk_buff *first_skb = NULL;
 
 	/* Check status of header. */
 	err = gop->status;
@@ -867,6 +910,7 @@ static int xenvif_tx_check_gop(struct xenvif *vif,
 	/* Skip first skb fragment if it is on same page as header fragment. */
 	start = (frag_get_pending_idx(&shinfo->frags[0]) == pending_idx);
 
+check_frags:
 	for (i = start; i < nr_frags; i++) {
 		int j, newerr;
 
@@ -903,11 +947,20 @@ static int xenvif_tx_check_gop(struct xenvif *vif,
 		/* Not the first error? Preceding frags already invalidated. */
 		if (err)
 			continue;
-
 		/* First error: invalidate header and preceding fragments. */
-		pending_idx = *((u16 *)skb->data);
-		xenvif_idx_unmap(vif, pending_idx);
-		xenvif_idx_release(vif, pending_idx, XEN_NETIF_RSP_OKAY);
+		if (!first_skb) {
+			pending_idx = *((u16 *)skb->data);
+			xenvif_idx_unmap(vif, pending_idx);
+			xenvif_idx_release(vif,
+					   pending_idx,
+					   XEN_NETIF_RSP_OKAY);
+		} else {
+			pending_idx = *((u16 *)first_skb->data);
+			xenvif_idx_unmap(vif, pending_idx);
+			xenvif_idx_release(vif,
+					   pending_idx,
+					   XEN_NETIF_RSP_OKAY);
+		}
 		for (j = start; j < i; j++) {
 			pending_idx = frag_get_pending_idx(&shinfo->frags[j]);
 			xenvif_idx_unmap(vif, pending_idx);
@@ -919,6 +972,32 @@ static int xenvif_tx_check_gop(struct xenvif *vif,
 		err = newerr;
 	}
 
+	if (shinfo->frag_list) {
+		first_skb = skb;
+		skb = shinfo->frag_list;
+		shinfo = skb_shinfo(skb);
+		nr_frags = shinfo->nr_frags;
+		start = 0;
+
+		goto check_frags;
+	}
+
+	/* There was a mapping error in the frag_list skb. We have to unmap
+	 * the first skb's frags
+	 */
+	if (first_skb && err) {
+		int j;
+		shinfo = skb_shinfo(first_skb);
+		pending_idx = *((u16 *)first_skb->data);
+		start = (frag_get_pending_idx(&shinfo->frags[0]) == pending_idx);
+		for (j = start; j < shinfo->nr_frags; j++) {
+			pending_idx = frag_get_pending_idx(&shinfo->frags[j]);
+			xenvif_idx_unmap(vif, pending_idx);
+			xenvif_idx_release(vif, pending_idx,
+					   XEN_NETIF_RSP_OKAY);
+		}
+	}
+
 	*gopp = gop + 1;
 	return err;
 }
@@ -1422,8 +1501,7 @@ static unsigned xenvif_tx_build_gops(struct xenvif *vif, int budget)
 			    ret < XEN_NETBK_LEGACY_SLOTS_MAX) ?
 			PKT_PROT_LEN : txreq.size;
 
-		skb = alloc_skb(data_len + NET_SKB_PAD + NET_IP_ALIGN,
-				GFP_ATOMIC | __GFP_NOWARN);
+		skb = xenvif_alloc_skb(data_len);
 		if (unlikely(skb == NULL)) {
 			netdev_dbg(vif->dev,
 				   "Can't allocate a skb in start_xmit.\n");
@@ -1431,9 +1509,6 @@ static unsigned xenvif_tx_build_gops(struct xenvif *vif, int budget)
 			break;
 		}
 
-		/* Packets passed to netif_rx() must have some headroom. */
-		skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN);
-
 		if (extras[XEN_NETIF_EXTRA_TYPE_GSO - 1].type) {
 			struct xen_netif_extra_info *gso;
 			gso = &extras[XEN_NETIF_EXTRA_TYPE_GSO - 1];
@@ -1495,6 +1570,7 @@ static int xenvif_tx_submit(struct xenvif *vif)
 		struct xen_netif_tx_request *txp;
 		u16 pending_idx;
 		unsigned data_len;
+		struct sk_buff *nskb = NULL;
 
 		pending_idx = *((u16 *)skb->data);
 		txp = &vif->pending_tx_info[pending_idx].req;
@@ -1537,6 +1613,32 @@ static int xenvif_tx_submit(struct xenvif *vif)
 				  pending_idx :
 				  INVALID_PENDING_IDX);
 
+		if (skb_shinfo(skb)->frag_list) {
+			nskb = skb_shinfo(skb)->frag_list;
+			xenvif_fill_frags(vif, nskb, INVALID_PENDING_IDX);
+			skb->len += nskb->len;
+			skb->data_len += nskb->len;
+			skb->truesize += nskb->truesize;
+			skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY;
+			skb_shinfo(nskb)->tx_flags |= SKBTX_DEV_ZEROCOPY;
+			vif->tx_zerocopy_sent += 2;
+			nskb = skb;
+
+			skb = skb_copy_expand(skb,
+					      0,
+					      0,
+					      GFP_ATOMIC | __GFP_NOWARN);
+			if (!skb) {
+				netdev_dbg(vif->dev,
+					   "Can't consolidate skb with too many fragments\n");
+				if (skb_shinfo(nskb)->destructor_arg)
+					skb_shinfo(nskb)->tx_flags |=
+						SKBTX_DEV_ZEROCOPY;
+				kfree_skb(nskb);
+				continue;
+			}
+			skb_shinfo(skb)->destructor_arg = NULL;
+		}
 		if (skb_is_nonlinear(skb) && skb_headlen(skb) < PKT_PROT_LEN) {
 			int target = min_t(int, skb->len, PKT_PROT_LEN);
 			__pskb_pull_tail(skb, target - skb_headlen(skb));
@@ -1590,6 +1692,9 @@ static int xenvif_tx_submit(struct xenvif *vif)
 		}
 
 		netif_receive_skb(skb);
+
+		if (nskb)
+			kfree_skb(nskb);
 	}
 
 	return work_done;

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH net-next v4 7/9] xen-netback: Add stat counters for frag_list skbs
  2014-01-14 20:39 [PATCH net-next v4 0/9] xen-netback: TX grant mapping with SKBTX_DEV_ZEROCOPY instead of copy Zoltan Kiss
                   ` (5 preceding siblings ...)
  2014-01-14 20:39 ` [PATCH net-next v4 6/9] xen-netback: Handle guests with too many frags Zoltan Kiss
@ 2014-01-14 20:39 ` Zoltan Kiss
  2014-01-14 20:39 ` [PATCH net-next v4 8/9] xen-netback: Timeout packets in RX path Zoltan Kiss
  2014-01-14 20:39 ` [PATCH net-next v4 9/9] xen-netback: Aggregate TX unmap operations Zoltan Kiss
  8 siblings, 0 replies; 22+ messages in thread
From: Zoltan Kiss @ 2014-01-14 20:39 UTC (permalink / raw)
  To: ian.campbell, wei.liu2, xen-devel, netdev, linux-kernel, jonathan.davies
  Cc: Zoltan Kiss

These counters help determine how often the guest sends a packet with more
than MAX_SKB_FRAGS frags.

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
---
 drivers/net/xen-netback/common.h    |    1 +
 drivers/net/xen-netback/interface.c |    7 +++++++
 drivers/net/xen-netback/netback.c   |    1 +
 3 files changed, 9 insertions(+)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index e3c28ff..c037efb 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -158,6 +158,7 @@ struct xenvif {
 	unsigned long tx_zerocopy_sent;
 	unsigned long tx_zerocopy_success;
 	unsigned long tx_zerocopy_fail;
+	unsigned long tx_frag_overflow;
 
 	/* Miscellaneous private stuff. */
 	struct net_device *dev;
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index ac27af3..b7daf8d 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -254,6 +254,13 @@ static const struct xenvif_stat {
 		"tx_zerocopy_fail",
 		offsetof(struct xenvif, tx_zerocopy_fail)
 	},
+	/* Number of packets exceeding MAX_SKB_FRAG slots. You should use
+	 * a guest with the same MAX_SKB_FRAG
+	 */
+	{
+		"tx_frag_overflow",
+		offsetof(struct xenvif, tx_frag_overflow)
+	},
 };
 
 static int xenvif_get_sset_count(struct net_device *dev, int string_set)
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 9841429..4305965 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -1656,6 +1656,7 @@ static int xenvif_tx_submit(struct xenvif *vif, int budget)
 			skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY;
 			skb_shinfo(nskb)->tx_flags |= SKBTX_DEV_ZEROCOPY;
 			vif->tx_zerocopy_sent += 2;
+			vif->tx_frag_overflow++;
 			nskb = skb;
 
 			skb = skb_copy_expand(skb, 0, 0, GFP_ATOMIC | __GFP_NOWARN);

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH net-next v4 8/9] xen-netback: Timeout packets in RX path
  2014-01-14 20:39 [PATCH net-next v4 0/9] xen-netback: TX grant mapping with SKBTX_DEV_ZEROCOPY instead of copy Zoltan Kiss
                   ` (6 preceding siblings ...)
  2014-01-14 20:39 ` [PATCH net-next v4 7/9] xen-netback: Add stat counters for frag_list skbs Zoltan Kiss
@ 2014-01-14 20:39 ` Zoltan Kiss
  2014-01-16  0:03   ` Wei Liu
  2014-01-14 20:39 ` [PATCH net-next v4 9/9] xen-netback: Aggregate TX unmap operations Zoltan Kiss
  8 siblings, 1 reply; 22+ messages in thread
From: Zoltan Kiss @ 2014-01-14 20:39 UTC (permalink / raw)
  To: ian.campbell, wei.liu2, xen-devel, netdev, linux-kernel, jonathan.davies
  Cc: Zoltan Kiss

A malicious or buggy guest can leave its queue filled indefinitely, in which
case qdisc start to queue packets for that VIF. If those packets came from an
another guest, it can block its slots and prevent shutdown. To avoid that, we
make sure the queue is drained in every 10 seconds.
The QDisc queue in worst case takes 3 round to flush usually.

v3:
- remove stale debug log
- tie unmap timeout in xenvif_free to this timeout

v4:
- due to RX flow control changes now start_xmit doesn't drop the packets but
  place them on the internal queue. So the timer set rx_queue_purge and kick in
  the thread to drop the packets there
- we shoot down the timer if a previously filled internal queue drains
- adjust the teardown timeout as in worst case it can take more time now

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
---
 drivers/net/xen-netback/common.h    |    6 ++++++
 drivers/net/xen-netback/interface.c |   24 ++++++++++++++++++++++--
 drivers/net/xen-netback/netback.c   |   23 ++++++++++++++++++++---
 3 files changed, 48 insertions(+), 5 deletions(-)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index 109c29f..d1cd8ce 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -129,6 +129,9 @@ struct xenvif {
 	struct xen_netif_rx_back_ring rx;
 	struct sk_buff_head rx_queue;
 	RING_IDX rx_last_skb_slots;
+	bool rx_queue_purge;
+
+	struct timer_list wake_queue;
 
 	/* This array is allocated seperately as it is large */
 	struct gnttab_copy *grant_copy_op;
@@ -225,4 +228,7 @@ void xenvif_idx_unmap(struct xenvif *vif, u16 pending_idx);
 
 extern bool separate_tx_rx_irq;
 
+extern unsigned int rx_drain_timeout_msecs;
+extern unsigned int rx_drain_timeout_jiffies;
+
 #endif /* __XEN_NETBACK__COMMON_H__ */
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index c531f6c..2616d51 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -114,6 +114,18 @@ static irqreturn_t xenvif_interrupt(int irq, void *dev_id)
 	return IRQ_HANDLED;
 }
 
+static void xenvif_wake_queue(unsigned long data)
+{
+	struct xenvif *vif = (struct xenvif *)data;
+
+	if (netif_queue_stopped(vif->dev)) {
+		netdev_err(vif->dev, "draining TX queue\n");
+		vif->rx_queue_purge = true;
+		xenvif_kick_thread(vif);
+		netif_wake_queue(vif->dev);
+	}
+}
+
 static int xenvif_start_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct xenvif *vif = netdev_priv(dev);
@@ -143,8 +155,13 @@ static int xenvif_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	 * then turn off the queue to give the ring a chance to
 	 * drain.
 	 */
-	if (!xenvif_rx_ring_slots_available(vif, min_slots_needed))
+	if (!xenvif_rx_ring_slots_available(vif, min_slots_needed)) {
+		vif->wake_queue.function = xenvif_wake_queue;
+		vif->wake_queue.data = (unsigned long)vif;
 		xenvif_stop_queue(vif);
+		mod_timer(&vif->wake_queue,
+			jiffies + rx_drain_timeout_jiffies);
+	}
 
 	skb_queue_tail(&vif->rx_queue, skb);
 	xenvif_kick_thread(vif);
@@ -352,6 +369,8 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
 	init_timer(&vif->credit_timeout);
 	vif->credit_window_start = get_jiffies_64();
 
+	init_timer(&vif->wake_queue);
+
 	dev->netdev_ops	= &xenvif_netdev_ops;
 	dev->hw_features = NETIF_F_SG |
 		NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM |
@@ -529,6 +548,7 @@ void xenvif_disconnect(struct xenvif *vif)
 		xenvif_carrier_off(vif);
 
 	if (vif->task) {
+		del_timer_sync(&vif->wake_queue);
 		kthread_stop(vif->task);
 		vif->task = NULL;
 	}
@@ -559,7 +579,7 @@ void xenvif_free(struct xenvif *vif)
 		if (vif->grant_tx_handle[i] != NETBACK_INVALID_HANDLE) {
 			unmap_timeout++;
 			schedule_timeout(msecs_to_jiffies(1000));
-			if (unmap_timeout > 9 &&
+			if (unmap_timeout > ((rx_drain_timeout_msecs/1000) * DIV_ROUND_UP(XENVIF_QUEUE_LENGTH, (XEN_NETIF_RX_RING_SIZE / MAX_SKB_FRAGS))) &&
 			    net_ratelimit())
 				netdev_err(vif->dev,
 					   "Page still granted! Index: %x\n",
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index d2ccb55..1378abd 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -63,6 +63,13 @@ module_param(separate_tx_rx_irq, bool, 0644);
 static unsigned int fatal_skb_slots = FATAL_SKB_SLOTS_DEFAULT;
 module_param(fatal_skb_slots, uint, 0444);
 
+/* When guest ring is filled up, qdisc queues the packets for us, but we have
+ * to timeout them, otherwise other guests' packets can get stucked there
+ */
+unsigned int rx_drain_timeout_msecs = 10000;
+module_param(rx_drain_timeout_msecs, uint, 0444);
+unsigned int rx_drain_timeout_jiffies;
+
 /*
  * To avoid confusion, we define XEN_NETBK_LEGACY_SLOTS_MAX indicating
  * the maximum slots a valid packet can use. Now this value is defined
@@ -1919,8 +1926,9 @@ static struct xen_netif_rx_response *make_rx_response(struct xenvif *vif,
 
 static inline int rx_work_todo(struct xenvif *vif)
 {
-	return !skb_queue_empty(&vif->rx_queue) &&
-	       xenvif_rx_ring_slots_available(vif, vif->rx_last_skb_slots);
+	return (!skb_queue_empty(&vif->rx_queue) &&
+	       xenvif_rx_ring_slots_available(vif, vif->rx_last_skb_slots)) ||
+	       vif->rx_queue_purge;
 }
 
 static inline int tx_work_todo(struct xenvif *vif)
@@ -2011,12 +2019,19 @@ int xenvif_kthread(void *data)
 		if (kthread_should_stop())
 			break;
 
+		if (vif->rx_queue_purge) {
+			skb_queue_purge(&vif->rx_queue);
+			vif->rx_queue_purge = false;
+		}
+
 		if (!skb_queue_empty(&vif->rx_queue))
 			xenvif_rx_action(vif);
 
 		if (skb_queue_empty(&vif->rx_queue) &&
-		    netif_queue_stopped(vif->dev))
+		    netif_queue_stopped(vif->dev)) {
+			del_timer_sync(&vif->wake_queue);
 			xenvif_start_queue(vif);
+		}
 
 		cond_resched();
 	}
@@ -2067,6 +2082,8 @@ static int __init netback_init(void)
 	if (rc)
 		goto failed_init;
 
+	rx_drain_timeout_jiffies = msecs_to_jiffies(rx_drain_timeout_msecs);
+
 	return 0;
 
 failed_init:

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH net-next v4 9/9] xen-netback: Aggregate TX unmap operations
  2014-01-14 20:39 [PATCH net-next v4 0/9] xen-netback: TX grant mapping with SKBTX_DEV_ZEROCOPY instead of copy Zoltan Kiss
                   ` (7 preceding siblings ...)
  2014-01-14 20:39 ` [PATCH net-next v4 8/9] xen-netback: Timeout packets in RX path Zoltan Kiss
@ 2014-01-14 20:39 ` Zoltan Kiss
  8 siblings, 0 replies; 22+ messages in thread
From: Zoltan Kiss @ 2014-01-14 20:39 UTC (permalink / raw)
  To: ian.campbell, wei.liu2, xen-devel, netdev, linux-kernel, jonathan.davies
  Cc: Zoltan Kiss

Unmapping causes TLB flushing, therefore we should make it in the largest
possible batches. However we shouldn't starve the guest for too long. So if
the guest has space for at least two big packets and we don't have at least a
quarter ring to unmap, delay it for at most 1 milisec.

v4:
- use bool for tx_dealloc_work_todo

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
---
 drivers/net/xen-netback/common.h    |    2 ++
 drivers/net/xen-netback/interface.c |    2 ++
 drivers/net/xen-netback/netback.c   |   31 ++++++++++++++++++++++++++++++-
 3 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index 1594109..ce6b5b5 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -115,6 +115,8 @@ struct xenvif {
 	u16 dealloc_ring[MAX_PENDING_REQS];
 	struct task_struct *dealloc_task;
 	wait_queue_head_t dealloc_wq;
+	struct timer_list dealloc_delay;
+	bool dealloc_delay_timed_out;
 
 	/* Use kthread for guest RX */
 	struct task_struct *task;
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index 669bd55..1c34f56 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -406,6 +406,7 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
 			  .desc = i };
 		vif->grant_tx_handle[i] = NETBACK_INVALID_HANDLE;
 	}
+	init_timer(&vif->dealloc_delay);
 
 	/*
 	 * Initialise a dummy MAC address. We choose the numerically
@@ -553,6 +554,7 @@ void xenvif_disconnect(struct xenvif *vif)
 	}
 
 	if (vif->dealloc_task) {
+		del_timer_sync(&vif->dealloc_delay);
 		kthread_stop(vif->dealloc_task);
 		vif->dealloc_task = NULL;
 	}
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 9e7ba04..b1d1d4c 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -135,6 +135,11 @@ static inline pending_ring_idx_t nr_pending_reqs(struct xenvif *vif)
 		vif->pending_prod + vif->pending_cons;
 }
 
+static inline pending_ring_idx_t nr_free_slots(struct xen_netif_tx_back_ring *ring)
+{
+	return ring->nr_ents -	(ring->sring->req_prod - ring->rsp_prod_pvt);
+}
+
 bool xenvif_rx_ring_slots_available(struct xenvif *vif, int needed)
 {
 	RING_IDX prod, cons;
@@ -1936,10 +1941,34 @@ static inline int tx_work_todo(struct xenvif *vif)
 	return 0;
 }
 
+static void xenvif_dealloc_delay(unsigned long data)
+{
+	struct xenvif *vif = (struct xenvif *)data;
+
+	vif->dealloc_delay_timed_out = true;
+	wake_up(&vif->dealloc_wq);
+}
+
 static inline bool tx_dealloc_work_todo(struct xenvif *vif)
 {
-	if (vif->dealloc_cons != vif->dealloc_prod)
+	if (vif->dealloc_cons != vif->dealloc_prod) {
+		if ((nr_free_slots(&vif->tx) > 2 * XEN_NETBK_LEGACY_SLOTS_MAX) &&
+		    (vif->dealloc_prod - vif->dealloc_cons < MAX_PENDING_REQS / 4) &&
+		    !vif->dealloc_delay_timed_out) {
+			if (!timer_pending(&vif->dealloc_delay)) {
+				vif->dealloc_delay.function =
+					xenvif_dealloc_delay;
+				vif->dealloc_delay.data = (unsigned long)vif;
+				mod_timer(&vif->dealloc_delay,
+					  jiffies + msecs_to_jiffies(1));
+
+			}
+			return false;
+		}
+		del_timer_sync(&vif->dealloc_delay);
+		vif->dealloc_delay_timed_out = false;
 		return true;
+	}
 
 	return false;
 }

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next v4 1/9] xen-netback: Introduce TX grant map definitions
  2014-01-14 20:39 ` [PATCH net-next v4 1/9] xen-netback: Introduce TX grant map definitions Zoltan Kiss
@ 2014-01-15 15:16   ` Zoltan Kiss
  2014-01-16  0:00   ` Wei Liu
  1 sibling, 0 replies; 22+ messages in thread
From: Zoltan Kiss @ 2014-01-15 15:16 UTC (permalink / raw)
  To: ian.campbell, wei.liu2, xen-devel, netdev, linux-kernel, jonathan.davies
  Cc: Zoltan Kiss

On 14/01/14 20:39, Zoltan Kiss wrote:
> @@ -1677,6 +1793,31 @@ static void xenvif_idx_release(struct xenvif *vif, u16 pending_idx,
>   	vif->mmap_pages[pending_idx] = NULL;
>   }
>
> +void xenvif_idx_unmap(struct xenvif *vif, u16 pending_idx)
> +{
> +	int ret;
> +	struct gnttab_unmap_grant_ref tx_unmap_op;
> +
> +	if (vif->grant_tx_handle[pending_idx] == NETBACK_INVALID_HANDLE) {
> +		netdev_err(vif->dev,
> +			   "Trying to unmap invalid handle! pending_idx: %x\n",
> +			   pending_idx);
> +		return;
> +	}
> +	gnttab_set_unmap_op(&tx_unmap_op,
> +			    idx_to_kaddr(vif, pending_idx),
> +			    GNTMAP_host_map,
> +			    vif->grant_tx_handle[pending_idx]);
> +	ret = gnttab_unmap_refs(&tx_unmap_op,
> +				&vif->mmap_pages[pending_idx],
> +				1);
> +
> +	ret = HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref,
> +					&tx_unmap_op,
> +					1);
> +	BUG_ON(ret);
> +	vif->grant_tx_handle[pending_idx] = NETBACK_INVALID_HANDLE;
> +}

Awkward mistake, I forgot to delete the hypercall ... Even more 
interesting, it caused troubles only very rarely ...

Zoli

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next v4 1/9] xen-netback: Introduce TX grant map definitions
  2014-01-14 20:39 ` [PATCH net-next v4 1/9] xen-netback: Introduce TX grant map definitions Zoltan Kiss
  2014-01-15 15:16   ` Zoltan Kiss
@ 2014-01-16  0:00   ` Wei Liu
  2014-01-20 16:53     ` Zoltan Kiss
  1 sibling, 1 reply; 22+ messages in thread
From: Wei Liu @ 2014-01-16  0:00 UTC (permalink / raw)
  To: Zoltan Kiss
  Cc: ian.campbell, wei.liu2, xen-devel, netdev, linux-kernel, jonathan.davies

There is a stray blank line change in xenvif_tx_create_gop. (I removed
that part too early and didn't bother to paste it back...)

On Tue, Jan 14, 2014 at 08:39:47PM +0000, Zoltan Kiss wrote:
[...]
> +void xenvif_idx_unmap(struct xenvif *vif, u16 pending_idx)
> +{
> +	int ret;
> +	struct gnttab_unmap_grant_ref tx_unmap_op;
> +
> +	if (vif->grant_tx_handle[pending_idx] == NETBACK_INVALID_HANDLE) {
> +		netdev_err(vif->dev,
> +			   "Trying to unmap invalid handle! pending_idx: %x\n",
> +			   pending_idx);
> +		return;
> +	}
> +	gnttab_set_unmap_op(&tx_unmap_op,
> +			    idx_to_kaddr(vif, pending_idx),
> +			    GNTMAP_host_map,
> +			    vif->grant_tx_handle[pending_idx]);
> +	ret = gnttab_unmap_refs(&tx_unmap_op,
> +				&vif->mmap_pages[pending_idx],
> +				1);
> +
> +	ret = HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref,
> +					&tx_unmap_op,
> +					1);

As you said in your other email, this should be removed. :-)

> +	BUG_ON(ret);
> +	vif->grant_tx_handle[pending_idx] = NETBACK_INVALID_HANDLE;
> +}
>  
>  static void make_tx_response(struct xenvif *vif,
>  			     struct xen_netif_tx_request *txp,
> @@ -1738,6 +1879,14 @@ static inline int tx_work_todo(struct xenvif *vif)
>  	return 0;
>  }
>  
> +static inline bool tx_dealloc_work_todo(struct xenvif *vif)
> +{
> +	if (vif->dealloc_cons != vif->dealloc_prod)
> +		return true;
> +
> +	return false;

This can be simplified as
  return vif->dealloc_cons != vif->dealloc_prod;

Wei.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next v4 2/9] xen-netback: Change TX path from grant copy to mapping
  2014-01-14 20:39 ` [PATCH net-next v4 2/9] xen-netback: Change TX path from grant copy to mapping Zoltan Kiss
@ 2014-01-16  0:01   ` Wei Liu
  2014-01-20 17:04     ` Zoltan Kiss
  0 siblings, 1 reply; 22+ messages in thread
From: Wei Liu @ 2014-01-16  0:01 UTC (permalink / raw)
  To: Zoltan Kiss
  Cc: ian.campbell, wei.liu2, xen-devel, netdev, linux-kernel, jonathan.davies

On Tue, Jan 14, 2014 at 08:39:48PM +0000, Zoltan Kiss wrote:
> This patch changes the grant copy on the TX patch to grant mapping
> 
> v2:
> - delete branch for handling fragmented packets fit PKT_PROT_LEN sized first
>   request
> - mark the effect of using ballooned pages in a comment
> - place setting of skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY right
>   before netif_receive_skb, and mark the importance of it
> - grab dealloc_lock before __napi_complete to avoid contention with the
>   callback's napi_schedule
> - handle fragmented packets where first request < PKT_PROT_LEN
> - fix up error path when checksum_setup failed
> - check before teardown for pending grants, and start complain if they are
>   there after 10 second
> 
> v3:
> - delete a surplus checking from tx_action
> - remove stray line
> - squash xenvif_idx_unmap changes into the first patch
> - init spinlocks
> - call map hypercall directly instead of gnttab_map_refs()
> - fix unmapping timeout in xenvif_free()
> 
> v4:
> - fix indentations and comments
> - handle errors of set_phys_to_machine

There's no call to set_phys_to_machine in this patch. Did I miss
something?

> - go back to gnttab_map_refs instead of direct hypercall. Now we rely on the
>   modified API
> 
> Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
> ---
>  drivers/net/xen-netback/interface.c |   60 +++++++-
>  drivers/net/xen-netback/netback.c   |  256 ++++++++++++++---------------------
>  2 files changed, 159 insertions(+), 157 deletions(-)
> 
> diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
> index a7855b3..1e0bf71 100644
> --- a/drivers/net/xen-netback/interface.c
> +++ b/drivers/net/xen-netback/interface.c
> @@ -123,7 +123,9 @@ static int xenvif_start_xmit(struct sk_buff *skb, struct net_device *dev)
>  	BUG_ON(skb->dev != dev);
>  
>  	/* Drop the packet if vif is not ready */
> -	if (vif->task == NULL || !xenvif_schedulable(vif))
> +	if (vif->task == NULL ||
> +	    vif->dealloc_task == NULL ||
> +	    !xenvif_schedulable(vif))
>  		goto drop;
>  
>  	/* At best we'll need one slot for the header and one for each
> @@ -345,8 +347,26 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,

At the beginning of the function there's BUG_ON checks for vif->task. I
would suggest you do the same for vif->dealloc_task, just to be
consistent.

>  	vif->pending_prod = MAX_PENDING_REQS;
>  	for (i = 0; i < MAX_PENDING_REQS; i++)
>  		vif->pending_ring[i] = i;
> -	for (i = 0; i < MAX_PENDING_REQS; i++)
> -		vif->mmap_pages[i] = NULL;
> +	spin_lock_init(&vif->dealloc_lock);
> +	spin_lock_init(&vif->response_lock);
> +	/* If ballooning is disabled, this will consume real memory, so you
> +	 * better enable it. The long term solution would be to use just a
> +	 * bunch of valid page descriptors, without dependency on ballooning
> +	 */
> +	err = alloc_xenballooned_pages(MAX_PENDING_REQS,
> +				       vif->mmap_pages,
> +				       false);
> +	if (err) {
> +		netdev_err(dev, "Could not reserve mmap_pages\n");
> +		return NULL;
> +	}
> +	for (i = 0; i < MAX_PENDING_REQS; i++) {
> +		vif->pending_tx_info[i].callback_struct = (struct ubuf_info)
> +			{ .callback = xenvif_zerocopy_callback,
> +			  .ctx = NULL,
> +			  .desc = i };
> +		vif->grant_tx_handle[i] = NETBACK_INVALID_HANDLE;
> +	}
>  
>  	/*
>  	 * Initialise a dummy MAC address. We choose the numerically
> @@ -390,6 +410,7 @@ int xenvif_connect(struct xenvif *vif, unsigned long tx_ring_ref,
>  		goto err;
>  
>  	init_waitqueue_head(&vif->wq);
> +	init_waitqueue_head(&vif->dealloc_wq);
>  
>  	if (tx_evtchn == rx_evtchn) {
>  		/* feature-split-event-channels == 0 */
> @@ -431,6 +452,16 @@ int xenvif_connect(struct xenvif *vif, unsigned long tx_ring_ref,
>  		goto err_rx_unbind;
>  	}
>  
> +	vif->dealloc_task = kthread_create(xenvif_dealloc_kthread,
> +					   (void *)vif,
> +					   "%s-dealloc",
> +					   vif->dev->name);
> +	if (IS_ERR(vif->dealloc_task)) {
> +		pr_warn("Could not allocate kthread for %s\n", vif->dev->name);
> +		err = PTR_ERR(vif->dealloc_task);
> +		goto err_rx_unbind;
> +	}
> +
>  	vif->task = task;

Please move this line before the above hunk. Don't separate it from
corresponding kthread_create.

Last but not least, though I've looked at this patch for several rounds
and and the basic logic looks correct to me, I would like it to go
through XenRT tests if possible -- eye inspection is error-prone to such
complicated change. (If I'm not mistaken you once told me you've done
regression tests already. That would be neat!)

Wei.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next v4 3/9] xen-netback: Remove old TX grant copy definitons and fix indentations
  2014-01-14 20:39 ` [PATCH net-next v4 3/9] xen-netback: Remove old TX grant copy definitons and fix indentations Zoltan Kiss
@ 2014-01-16  0:02   ` Wei Liu
  0 siblings, 0 replies; 22+ messages in thread
From: Wei Liu @ 2014-01-16  0:02 UTC (permalink / raw)
  To: Zoltan Kiss
  Cc: ian.campbell, wei.liu2, xen-devel, netdev, linux-kernel, jonathan.davies

On Tue, Jan 14, 2014 at 08:39:49PM +0000, Zoltan Kiss wrote:
> These became obsolate with grant mapping. I've left intentionally the
               ^ obsolete
> indentations in this way, to improve readability of previous patches.
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next v4 6/9] xen-netback: Handle guests with too many frags
  2014-01-14 20:39 ` [PATCH net-next v4 6/9] xen-netback: Handle guests with too many frags Zoltan Kiss
@ 2014-01-16  0:03   ` Wei Liu
  2014-01-20 17:26     ` Zoltan Kiss
  0 siblings, 1 reply; 22+ messages in thread
From: Wei Liu @ 2014-01-16  0:03 UTC (permalink / raw)
  To: Zoltan Kiss
  Cc: ian.campbell, wei.liu2, xen-devel, netdev, linux-kernel, jonathan.davies

On Tue, Jan 14, 2014 at 08:39:52PM +0000, Zoltan Kiss wrote:
[...]
>  	/* Skip first skb fragment if it is on same page as header fragment. */
> @@ -832,6 +851,29 @@ static struct gnttab_map_grant_ref *xenvif_get_requests(struct xenvif *vif,
>  
>  	BUG_ON(shinfo->nr_frags > MAX_SKB_FRAGS);
>  
> +	if (frag_overflow) {
> +		struct sk_buff *nskb = xenvif_alloc_skb(0);
> +		if (unlikely(nskb == NULL)) {
> +			netdev_err(vif->dev,
> +				   "Can't allocate the frag_list skb.\n");

This, and other occurences of netdev_* logs need to be rate limit.
Otherwise you risk flooding kernel log when system is under memory
pressure.

> +			return NULL;
> +		}
> +
> +		shinfo = skb_shinfo(nskb);
> +		frags = shinfo->frags;
> +
> +		for (shinfo->nr_frags = 0; shinfo->nr_frags < frag_overflow;
> +		     shinfo->nr_frags++, txp++, gop++) {
> +			index = pending_index(vif->pending_cons++);
> +			pending_idx = vif->pending_ring[index];
> +			xenvif_tx_create_gop(vif, pending_idx, txp, gop);
> +			frag_set_pending_idx(&frags[shinfo->nr_frags],
> +					     pending_idx);
> +		}
> +
> +		skb_shinfo(skb)->frag_list = nskb;
> +	}
> +
>  	return gop;
>  }
>  
[...]
> @@ -1537,6 +1613,32 @@ static int xenvif_tx_submit(struct xenvif *vif)
>  				  pending_idx :
>  				  INVALID_PENDING_IDX);
>  
> +		if (skb_shinfo(skb)->frag_list) {
> +			nskb = skb_shinfo(skb)->frag_list;
> +			xenvif_fill_frags(vif, nskb, INVALID_PENDING_IDX);
> +			skb->len += nskb->len;
> +			skb->data_len += nskb->len;
> +			skb->truesize += nskb->truesize;
> +			skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY;
> +			skb_shinfo(nskb)->tx_flags |= SKBTX_DEV_ZEROCOPY;
> +			vif->tx_zerocopy_sent += 2;
> +			nskb = skb;
> +
> +			skb = skb_copy_expand(skb,
> +					      0,
> +					      0,
> +					      GFP_ATOMIC | __GFP_NOWARN);
> +			if (!skb) {
> +				netdev_dbg(vif->dev,
> +					   "Can't consolidate skb with too many fragments\n");

Rate limit.

> +				if (skb_shinfo(nskb)->destructor_arg)
> +					skb_shinfo(nskb)->tx_flags |=
> +						SKBTX_DEV_ZEROCOPY;

Why is this needed? nskb is the saved pointer to original skb, which has
already had SKBTX_DEV_ZEROCOPY in tx_flags. Did I miss something?

Wei.

> +				kfree_skb(nskb);
> +				continue;
> +			}
> +			skb_shinfo(skb)->destructor_arg = NULL;
> +		}
>  		if (skb_is_nonlinear(skb) && skb_headlen(skb) < PKT_PROT_LEN) {
>  			int target = min_t(int, skb->len, PKT_PROT_LEN);
>  			__pskb_pull_tail(skb, target - skb_headlen(skb));
> @@ -1590,6 +1692,9 @@ static int xenvif_tx_submit(struct xenvif *vif)
>  		}
>  
>  		netif_receive_skb(skb);
> +
> +		if (nskb)
> +			kfree_skb(nskb);
>  	}
>  
>  	return work_done;

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next v4 8/9] xen-netback: Timeout packets in RX path
  2014-01-14 20:39 ` [PATCH net-next v4 8/9] xen-netback: Timeout packets in RX path Zoltan Kiss
@ 2014-01-16  0:03   ` Wei Liu
  2014-01-17 19:27     ` Zoltan Kiss
  0 siblings, 1 reply; 22+ messages in thread
From: Wei Liu @ 2014-01-16  0:03 UTC (permalink / raw)
  To: Zoltan Kiss
  Cc: ian.campbell, wei.liu2, xen-devel, netdev, linux-kernel, jonathan.davies

On Tue, Jan 14, 2014 at 08:39:54PM +0000, Zoltan Kiss wrote:
[...]
> diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
> index 109c29f..d1cd8ce 100644
> --- a/drivers/net/xen-netback/common.h
> +++ b/drivers/net/xen-netback/common.h
> @@ -129,6 +129,9 @@ struct xenvif {
>  	struct xen_netif_rx_back_ring rx;
>  	struct sk_buff_head rx_queue;
>  	RING_IDX rx_last_skb_slots;

Hmm... You seemed to mix your other patch with this series. :-)

> +	bool rx_queue_purge;
> +
> +	struct timer_list wake_queue;
>  
>  	/* This array is allocated seperately as it is large */
>  	struct gnttab_copy *grant_copy_op;
> @@ -225,4 +228,7 @@ void xenvif_idx_unmap(struct xenvif *vif, u16 pending_idx);
>  
>  extern bool separate_tx_rx_irq;
>  
[...]
> @@ -559,7 +579,7 @@ void xenvif_free(struct xenvif *vif)
>  		if (vif->grant_tx_handle[i] != NETBACK_INVALID_HANDLE) {
>  			unmap_timeout++;
>  			schedule_timeout(msecs_to_jiffies(1000));
> -			if (unmap_timeout > 9 &&
> +			if (unmap_timeout > ((rx_drain_timeout_msecs/1000) * DIV_ROUND_UP(XENVIF_QUEUE_LENGTH, (XEN_NETIF_RX_RING_SIZE / MAX_SKB_FRAGS))) &&

This line is really too long. And what's the rationale behind this long
expression?

Wei.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next v4 8/9] xen-netback: Timeout packets in RX path
  2014-01-16  0:03   ` Wei Liu
@ 2014-01-17 19:27     ` Zoltan Kiss
  2014-01-20 16:53       ` Wei Liu
  0 siblings, 1 reply; 22+ messages in thread
From: Zoltan Kiss @ 2014-01-17 19:27 UTC (permalink / raw)
  To: Wei Liu; +Cc: ian.campbell, xen-devel, netdev, linux-kernel, jonathan.davies

On 16/01/14 00:03, Wei Liu wrote:
> On Tue, Jan 14, 2014 at 08:39:54PM +0000, Zoltan Kiss wrote:
> [...]
>> diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
>> index 109c29f..d1cd8ce 100644
>> --- a/drivers/net/xen-netback/common.h
>> +++ b/drivers/net/xen-netback/common.h
>> @@ -129,6 +129,9 @@ struct xenvif {
>>   	struct xen_netif_rx_back_ring rx;
>>   	struct sk_buff_head rx_queue;
>>   	RING_IDX rx_last_skb_slots;
>
> Hmm... You seemed to mix your other patch with this series. :-)
Yep, this series doesn't work without that patch (actually that is a bug 
in netback even without my series), so at the moment it is based on it.

>
>> +	bool rx_queue_purge;
>> +
>> +	struct timer_list wake_queue;
>>
>>   	/* This array is allocated seperately as it is large */
>>   	struct gnttab_copy *grant_copy_op;
>> @@ -225,4 +228,7 @@ void xenvif_idx_unmap(struct xenvif *vif, u16 pending_idx);
>>
>>   extern bool separate_tx_rx_irq;
>>
> [...]
>> @@ -559,7 +579,7 @@ void xenvif_free(struct xenvif *vif)
>>   		if (vif->grant_tx_handle[i] != NETBACK_INVALID_HANDLE) {
>>   			unmap_timeout++;
>>   			schedule_timeout(msecs_to_jiffies(1000));
>> -			if (unmap_timeout > 9 &&
>> +			if (unmap_timeout > ((rx_drain_timeout_msecs/1000) * DIV_ROUND_UP(XENVIF_QUEUE_LENGTH, (XEN_NETIF_RX_RING_SIZE / MAX_SKB_FRAGS))) &&
>
> This line is really too long. And what's the rationale behind this long
> expression?
It calculates how many times you should ditch the internal queue of an 
another (maybe stucked) vif before Qdisc empties it's actual content. 
After that there shouldn't be any mapped handle left, so we should start 
printing these messages. Actually it should use vif->dev->tx_queue_len, 
and yes, it is probably better to move it to the beginning of the 
function into a new variable, and use that here.

Zoli

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next v4 1/9] xen-netback: Introduce TX grant map definitions
  2014-01-16  0:00   ` Wei Liu
@ 2014-01-20 16:53     ` Zoltan Kiss
  0 siblings, 0 replies; 22+ messages in thread
From: Zoltan Kiss @ 2014-01-20 16:53 UTC (permalink / raw)
  To: Wei Liu; +Cc: ian.campbell, xen-devel, netdev, linux-kernel, jonathan.davies

On 16/01/14 00:00, Wei Liu wrote:
> There is a stray blank line change in xenvif_tx_create_gop. (I removed
> that part too early and didn't bother to paste it back...)
Ok, fixed

>> +static inline bool tx_dealloc_work_todo(struct xenvif *vif)
>> +{
>> +	if (vif->dealloc_cons != vif->dealloc_prod)
>> +		return true;
>> +
>> +	return false;
>
> This can be simplified as
>    return vif->dealloc_cons != vif->dealloc_prod;
Indeed, done.



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next v4 8/9] xen-netback: Timeout packets in RX path
  2014-01-17 19:27     ` Zoltan Kiss
@ 2014-01-20 16:53       ` Wei Liu
  2014-01-20 17:47         ` Zoltan Kiss
  0 siblings, 1 reply; 22+ messages in thread
From: Wei Liu @ 2014-01-20 16:53 UTC (permalink / raw)
  To: Zoltan Kiss
  Cc: Wei Liu, ian.campbell, xen-devel, netdev, linux-kernel, jonathan.davies

On Fri, Jan 17, 2014 at 07:27:35PM +0000, Zoltan Kiss wrote:
> On 16/01/14 00:03, Wei Liu wrote:
> >On Tue, Jan 14, 2014 at 08:39:54PM +0000, Zoltan Kiss wrote:
> >[...]
> >>diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
> >>index 109c29f..d1cd8ce 100644
> >>--- a/drivers/net/xen-netback/common.h
> >>+++ b/drivers/net/xen-netback/common.h
> >>@@ -129,6 +129,9 @@ struct xenvif {
> >>  	struct xen_netif_rx_back_ring rx;
> >>  	struct sk_buff_head rx_queue;
> >>  	RING_IDX rx_last_skb_slots;
> >
> >Hmm... You seemed to mix your other patch with this series. :-)
> Yep, this series doesn't work without that patch (actually that is a
> bug in netback even without my series), so at the moment it is based
> on it.
> 
> >
> >>+	bool rx_queue_purge;
> >>+
> >>+	struct timer_list wake_queue;
> >>
> >>  	/* This array is allocated seperately as it is large */
> >>  	struct gnttab_copy *grant_copy_op;
> >>@@ -225,4 +228,7 @@ void xenvif_idx_unmap(struct xenvif *vif, u16 pending_idx);
> >>
> >>  extern bool separate_tx_rx_irq;
> >>
> >[...]
> >>@@ -559,7 +579,7 @@ void xenvif_free(struct xenvif *vif)
> >>  		if (vif->grant_tx_handle[i] != NETBACK_INVALID_HANDLE) {
> >>  			unmap_timeout++;
> >>  			schedule_timeout(msecs_to_jiffies(1000));
> >>-			if (unmap_timeout > 9 &&
> >>+			if (unmap_timeout > ((rx_drain_timeout_msecs/1000) * DIV_ROUND_UP(XENVIF_QUEUE_LENGTH, (XEN_NETIF_RX_RING_SIZE / MAX_SKB_FRAGS))) &&
> >
> >This line is really too long. And what's the rationale behind this long
> >expression?
> It calculates how many times you should ditch the internal queue of
> an another (maybe stucked) vif before Qdisc empties it's actual
> content. After that there shouldn't be any mapped handle left, so we
> should start printing these messages. Actually it should use
> vif->dev->tx_queue_len, and yes, it is probably better to move it to
> the beginning of the function into a new variable, and use that
> here.
> 

Why is relative to tx queue length?

What's the meaning of drain_timeout multipled by the last part
(DIV_ROUND_UP)?

If you proposed to use vif->dev->tx_queue_len to replace DIV_ROUND_UP
then ignore the above question. But I still don't understand the
rationale behind this. Could you elaborate a bit more? Wouldn't
rx_drain_timeout_msecs/1000 along suffice?

Wei.

> Zoli

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next v4 2/9] xen-netback: Change TX path from grant copy to mapping
  2014-01-16  0:01   ` Wei Liu
@ 2014-01-20 17:04     ` Zoltan Kiss
  0 siblings, 0 replies; 22+ messages in thread
From: Zoltan Kiss @ 2014-01-20 17:04 UTC (permalink / raw)
  To: Wei Liu; +Cc: ian.campbell, xen-devel, netdev, linux-kernel, jonathan.davies

On 16/01/14 00:01, Wei Liu wrote:
> On Tue, Jan 14, 2014 at 08:39:48PM +0000, Zoltan Kiss wrote:
>> v3:
>> - delete a surplus checking from tx_action
>> - remove stray line
>> - squash xenvif_idx_unmap changes into the first patch
>> - init spinlocks
>> - call map hypercall directly instead of gnttab_map_refs()
>> - fix unmapping timeout in xenvif_free()
>>
>> v4:
>> - fix indentations and comments
>> - handle errors of set_phys_to_machine
>
> There's no call to set_phys_to_machine in this patch. Did I miss
> something?
I've made several changes between v3 and v4 about the grant mapping 
stuff, this was an earlier concept, not the one I've finally sent in. It 
should be the same comment as in the first patch: "go back to 
gnttab_map_refs, now we rely on API changes"

>> --- a/drivers/net/xen-netback/interface.c
>> +++ b/drivers/net/xen-netback/interface.c
>> @@ -123,7 +123,9 @@ static int xenvif_start_xmit(struct sk_buff *skb, struct net_device *dev)
>>   	BUG_ON(skb->dev != dev);
>>
>>   	/* Drop the packet if vif is not ready */
>> -	if (vif->task == NULL || !xenvif_schedulable(vif))
>> +	if (vif->task == NULL ||
>> +	    vif->dealloc_task == NULL ||
>> +	    !xenvif_schedulable(vif))
>>   		goto drop;
>>
>>   	/* At best we'll need one slot for the header and one for each
>> @@ -345,8 +347,26 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
>
> At the beginning of the function there's BUG_ON checks for vif->task. I
> would suggest you do the same for vif->dealloc_task, just to be
> consistent.
I guess you mean in xenvif_connect. Applied.

>> @@ -431,6 +452,16 @@ int xenvif_connect(struct xenvif *vif, unsigned long tx_ring_ref,
>>   		goto err_rx_unbind;
>>   	}
>>
>> +	vif->dealloc_task = kthread_create(xenvif_dealloc_kthread,
>> +					   (void *)vif,
>> +					   "%s-dealloc",
>> +					   vif->dev->name);
>> +	if (IS_ERR(vif->dealloc_task)) {
>> +		pr_warn("Could not allocate kthread for %s\n", vif->dev->name);
>> +		err = PTR_ERR(vif->dealloc_task);
>> +		goto err_rx_unbind;
>> +	}
>> +
>>   	vif->task = task;
>
> Please move this line before the above hunk. Don't separate it from
> corresponding kthread_create.
Done, I've also used task for dealloc thread creation, the same way as 
the rx thread does.

> Last but not least, though I've looked at this patch for several rounds
> and and the basic logic looks correct to me, I would like it to go
> through XenRT tests if possible -- eye inspection is error-prone to such
> complicated change. (If I'm not mistaken you once told me you've done
> regression tests already. That would be neat!)
Yes, that's ongoing, I don't expect the patches to be accepted before 
they pass XenRT.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next v4 6/9] xen-netback: Handle guests with too many frags
  2014-01-16  0:03   ` Wei Liu
@ 2014-01-20 17:26     ` Zoltan Kiss
  0 siblings, 0 replies; 22+ messages in thread
From: Zoltan Kiss @ 2014-01-20 17:26 UTC (permalink / raw)
  To: Wei Liu; +Cc: ian.campbell, xen-devel, netdev, linux-kernel, jonathan.davies

On 16/01/14 00:03, Wei Liu wrote:
> On Tue, Jan 14, 2014 at 08:39:52PM +0000, Zoltan Kiss wrote:
> [...]
>>   	/* Skip first skb fragment if it is on same page as header fragment. */
>> @@ -832,6 +851,29 @@ static struct gnttab_map_grant_ref *xenvif_get_requests(struct xenvif *vif,
>>
>>   	BUG_ON(shinfo->nr_frags > MAX_SKB_FRAGS);
>>
>> +	if (frag_overflow) {
>> +		struct sk_buff *nskb = xenvif_alloc_skb(0);
>> +		if (unlikely(nskb == NULL)) {
>> +			netdev_err(vif->dev,
>> +				   "Can't allocate the frag_list skb.\n");
>
> This, and other occurences of netdev_* logs need to be rate limit.
> Otherwise you risk flooding kernel log when system is under memory
> pressure.
Done.

>> @@ -1537,6 +1613,32 @@ static int xenvif_tx_submit(struct xenvif *vif)
>>   				  pending_idx :
>>   				  INVALID_PENDING_IDX);
>>
>> +		if (skb_shinfo(skb)->frag_list) {
>> +			nskb = skb_shinfo(skb)->frag_list;
>> +			xenvif_fill_frags(vif, nskb, INVALID_PENDING_IDX);
>> +			skb->len += nskb->len;
>> +			skb->data_len += nskb->len;
>> +			skb->truesize += nskb->truesize;
>> +			skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY;
>> +			skb_shinfo(nskb)->tx_flags |= SKBTX_DEV_ZEROCOPY;
>> +			vif->tx_zerocopy_sent += 2;
>> +			nskb = skb;
>> +
>> +			skb = skb_copy_expand(skb,
>> +					      0,
>> +					      0,
>> +					      GFP_ATOMIC | __GFP_NOWARN);
>> +			if (!skb) {
>> +				netdev_dbg(vif->dev,
>> +					   "Can't consolidate skb with too many fragments\n");
>
> Rate limit.
>
>> +				if (skb_shinfo(nskb)->destructor_arg)
>> +					skb_shinfo(nskb)->tx_flags |=
>> +						SKBTX_DEV_ZEROCOPY;
>
> Why is this needed? nskb is the saved pointer to original skb, which has
> already had SKBTX_DEV_ZEROCOPY in tx_flags. Did I miss something?
Indeed. This actually belongs to the header grant copy patches I've sent 
in as well. I move it there.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next v4 8/9] xen-netback: Timeout packets in RX path
  2014-01-20 16:53       ` Wei Liu
@ 2014-01-20 17:47         ` Zoltan Kiss
  0 siblings, 0 replies; 22+ messages in thread
From: Zoltan Kiss @ 2014-01-20 17:47 UTC (permalink / raw)
  To: Wei Liu; +Cc: ian.campbell, xen-devel, netdev, linux-kernel, jonathan.davies

On 20/01/14 16:53, Wei Liu wrote:
>>>> @@ -559,7 +579,7 @@ void xenvif_free(struct xenvif *vif)
>>>>   		if (vif->grant_tx_handle[i] != NETBACK_INVALID_HANDLE) {
>>>>   			unmap_timeout++;
>>>>   			schedule_timeout(msecs_to_jiffies(1000));
>>>> -			if (unmap_timeout > 9 &&
>>>> +			if (unmap_timeout > ((rx_drain_timeout_msecs/1000) * DIV_ROUND_UP(XENVIF_QUEUE_LENGTH, (XEN_NETIF_RX_RING_SIZE / MAX_SKB_FRAGS))) &&
>>>
>>> This line is really too long. And what's the rationale behind this long
>>> expression?
>> It calculates how many times you should ditch the internal queue of
>> an another (maybe stucked) vif before Qdisc empties it's actual
>> content. After that there shouldn't be any mapped handle left, so we
>> should start printing these messages. Actually it should use
>> vif->dev->tx_queue_len, and yes, it is probably better to move it to
>> the beginning of the function into a new variable, and use that
>> here.
>>
>
> Why is relative to tx queue length?
>
> What's the meaning of drain_timeout multipled by the last part
> (DIV_ROUND_UP)?
>
> If you proposed to use vif->dev->tx_queue_len to replace DIV_ROUND_UP
> then ignore the above question. But I still don't understand the
> rationale behind this. Could you elaborate a bit more? Wouldn't
> rx_drain_timeout_msecs/1000 along suffice?

Here we want to avoid timeout messages if an skb can be legitimatly 
stucked somewhere else. As we discussed earlier, realisticly this could 
be an another vif's internal or QDisc queue. That another vif also has 
this rx_drain_timeout_msecs timeout, but now with Paul's recent changes 
the timer only ditches the internal queue. After that, the QDisc queue 
can put in worst case XEN_NETIF_RX_RING_SIZE / MAX_SKB_FRAGS skbs into 
that another vif's internal queue, so we need several rounds of such 
timeouts until we can be sure that no another vif should have skb's from 
us. We are not sending more skb's, so newly stucked packets are not 
interesting for us here.
But actually using the current vif's queue length is not relevant in 
this calculation, as it doesn't mean other vif's has the same. I think 
it is better to stick with XENVIF_QUEUE_LENGTH.
I've added this explanation as a comment and moved the calculation into 
a separate variable, so it doesn't cause such long lines.

Zoli

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2014-01-20 17:48 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-01-14 20:39 [PATCH net-next v4 0/9] xen-netback: TX grant mapping with SKBTX_DEV_ZEROCOPY instead of copy Zoltan Kiss
2014-01-14 20:39 ` [PATCH net-next v4 1/9] xen-netback: Introduce TX grant map definitions Zoltan Kiss
2014-01-15 15:16   ` Zoltan Kiss
2014-01-16  0:00   ` Wei Liu
2014-01-20 16:53     ` Zoltan Kiss
2014-01-14 20:39 ` [PATCH net-next v4 2/9] xen-netback: Change TX path from grant copy to mapping Zoltan Kiss
2014-01-16  0:01   ` Wei Liu
2014-01-20 17:04     ` Zoltan Kiss
2014-01-14 20:39 ` [PATCH net-next v4 3/9] xen-netback: Remove old TX grant copy definitons and fix indentations Zoltan Kiss
2014-01-16  0:02   ` Wei Liu
2014-01-14 20:39 ` [PATCH net-next v4 4/9] xen-netback: Change RX path for mapped SKB fragments Zoltan Kiss
2014-01-14 20:39 ` [PATCH net-next v4 5/9] xen-netback: Add stat counters for zerocopy Zoltan Kiss
2014-01-14 20:39 ` [PATCH net-next v4 6/9] xen-netback: Handle guests with too many frags Zoltan Kiss
2014-01-16  0:03   ` Wei Liu
2014-01-20 17:26     ` Zoltan Kiss
2014-01-14 20:39 ` [PATCH net-next v4 7/9] xen-netback: Add stat counters for frag_list skbs Zoltan Kiss
2014-01-14 20:39 ` [PATCH net-next v4 8/9] xen-netback: Timeout packets in RX path Zoltan Kiss
2014-01-16  0:03   ` Wei Liu
2014-01-17 19:27     ` Zoltan Kiss
2014-01-20 16:53       ` Wei Liu
2014-01-20 17:47         ` Zoltan Kiss
2014-01-14 20:39 ` [PATCH net-next v4 9/9] xen-netback: Aggregate TX unmap operations Zoltan Kiss

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).