* [RFC PATCH V3] Xen netback / netfront improvement
@ 2012-01-30 14:45 Wei Liu
  2012-01-30 14:45 ` [RFC PATCH V3 01/16] netback: page pool version 1 Wei Liu
                   ` (15 more replies)
  0 siblings, 16 replies; 59+ messages in thread
From: Wei Liu @ 2012-01-30 14:45 UTC (permalink / raw)
  To: netdev, xen-devel; +Cc: ian.campbell, konrad.wilk

Since this series includes both netback and netfront changes, the
whole series is named "Xen netback / netfront improvement".

Changes in V3:
 - Rework of per-cpu scratch space
 - Multi page ring support
 - Split event channels
 - Rx protocol stub
 - Fix a minor bug in module_put path

Changes in V2:
 - Fix minor bugs in V1
 - Embed pending_tx_info into page pool
 - Per-cpu scratch space
 - Notification code path clean up

This version has been tested by 
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

V1:
A new netback implementation which includes three major features:

 - Global page pool support
 - NAPI + kthread 1:1 model
 - Netback internal name changes

This patch series is the foundation of future work, so it is better
to get it right first. Patches 1 and 3 contain the real meat.

The first benefit of the 1:1 model is scheduling fairness.

The rationale behind a global page pool is that we need to limit the
overall memory consumed by all vifs.

Utilizing NAPI opens up the possibility of mitigating
interrupts/events; the code path is cleaned up in a separate patch.

The netback internal name changes clean up the code structure after
switching to the 1:1 model. They also prepare netback for further
code layout changes.

----
 drivers/net/xen-netback/Makefile              |    2 +-
 drivers/net/xen-netback/common.h              |  149 ++-
 drivers/net/xen-netback/interface.c           |  256 ++++--
 drivers/net/xen-netback/netback.c             | 1344 +++++++------------------
 drivers/net/xen-netback/page_pool.c           |  185 ++++
 drivers/net/xen-netback/page_pool.h           |   66 ++
 drivers/net/xen-netback/xenbus.c              |  185 ++++-
 drivers/net/xen-netback/xenvif_rx_protocol0.c |  616 +++++++++++
 drivers/net/xen-netback/xenvif_rx_protocol0.h |   53 +
 drivers/net/xen-netfront.c                    |  399 ++++++--
 10 files changed, 2062 insertions(+), 1193 deletions(-)


* [RFC PATCH V3 01/16] netback: page pool version 1
  2012-01-30 14:45 [RFC PATCH V3] Xen netback / netfront improvement Wei Liu
@ 2012-01-30 14:45 ` Wei Liu
  2012-01-30 14:45 ` [RFC PATCH V3 02/16] netback: add module unload function Wei Liu
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 59+ messages in thread
From: Wei Liu @ 2012-01-30 14:45 UTC (permalink / raw)
  To: netdev, xen-devel; +Cc: Wei Liu, ian.campbell, konrad.wilk

Add a global page pool. Since we are moving netback to a 1:1 model, it
is better to limit the total RAM consumed by all the vifs.

With this patch, each vif gets pages from the pool and puts them back
when it is finished with them.

This pool is only meant to be accessed via the exported interfaces.
The internals are subject to change as we discover new requirements
for the pool.

Current exported interfaces include:

page_pool_init: pool init
page_pool_destroy: pool destruction
page_pool_get: get a page from pool
page_pool_put: put page back to pool
is_in_pool: tell whether a page belongs to the pool
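
For illustration, here is a minimal sketch of the intended lifetime of
a pool entry. The helper below is hypothetical and not part of the
patch; the real call sites are in the netback.c changes further down.

/* Hypothetical example, mirroring the netback.c call sites below. */
static int example_use_pool(struct xen_netbk *netbk, u16 pending_idx)
{
	struct page *page;
	int idx;

	page = page_pool_get(netbk, &idx);	/* a fresh page plus a pool slot */
	if (!page)
		return -ENOMEM;		/* pool exhausted or allocation failed */

	/* netback now remembers the pool index, not the struct page */
	netbk->mmap_pages[pending_idx] = idx;

	/* ... grant-copy into to_page(idx), stash state in to_txinfo(idx) ... */

	page_pool_put(idx);			/* drop the page, free the slot */
	netbk->mmap_pages[pending_idx] = INVALID_ENTRY;

	return 0;
}

The index stored in page->mapping by the pool is what allows
is_in_pool() to recognise a foreign page later and recover its
pending_tx_info via to_txinfo().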

The current implementation has the following defects:
 - Global locking
 - No starve prevention mechanism / reservation logic

Global locking tends to cause contention on the pool. The lack of
reservation logic may cause a vif to starve. A possible solution to
both problems is for each vif to maintain a local cache and claim a
portion of the pool. However, the implementation gets tricky when it
comes to pool management, so let's worry about that later.

Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 drivers/net/xen-netback/Makefile    |    2 +-
 drivers/net/xen-netback/common.h    |    6 +
 drivers/net/xen-netback/netback.c   |  158 ++++++++++++------------------
 drivers/net/xen-netback/page_pool.c |  185 +++++++++++++++++++++++++++++++++++
 drivers/net/xen-netback/page_pool.h |   63 ++++++++++++
 5 files changed, 317 insertions(+), 97 deletions(-)
 create mode 100644 drivers/net/xen-netback/page_pool.c
 create mode 100644 drivers/net/xen-netback/page_pool.h

diff --git a/drivers/net/xen-netback/Makefile b/drivers/net/xen-netback/Makefile
index e346e81..dc4b8b1 100644
--- a/drivers/net/xen-netback/Makefile
+++ b/drivers/net/xen-netback/Makefile
@@ -1,3 +1,3 @@
 obj-$(CONFIG_XEN_NETDEV_BACKEND) := xen-netback.o
 
-xen-netback-y := netback.o xenbus.o interface.o
+xen-netback-y := netback.o xenbus.o interface.o page_pool.o
diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index 94b79c3..288b2f3 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -45,6 +45,12 @@
 #include <xen/grant_table.h>
 #include <xen/xenbus.h>
 
+struct pending_tx_info {
+	struct xen_netif_tx_request req;
+	struct xenvif *vif;
+};
+typedef unsigned int pending_ring_idx_t;
+
 struct xen_netbk;
 
 struct xenvif {
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 59effac..d11205f 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -33,6 +33,7 @@
  */
 
 #include "common.h"
+#include "page_pool.h"
 
 #include <linux/kthread.h>
 #include <linux/if_vlan.h>
@@ -46,12 +47,6 @@
 #include <asm/xen/hypercall.h>
 #include <asm/xen/page.h>
 
-struct pending_tx_info {
-	struct xen_netif_tx_request req;
-	struct xenvif *vif;
-};
-typedef unsigned int pending_ring_idx_t;
-
 struct netbk_rx_meta {
 	int id;
 	int size;
@@ -65,21 +60,6 @@ struct netbk_rx_meta {
 
 #define MAX_BUFFER_OFFSET PAGE_SIZE
 
-/* extra field used in struct page */
-union page_ext {
-	struct {
-#if BITS_PER_LONG < 64
-#define IDX_WIDTH   8
-#define GROUP_WIDTH (BITS_PER_LONG - IDX_WIDTH)
-		unsigned int group:GROUP_WIDTH;
-		unsigned int idx:IDX_WIDTH;
-#else
-		unsigned int group, idx;
-#endif
-	} e;
-	void *mapping;
-};
-
 struct xen_netbk {
 	wait_queue_head_t wq;
 	struct task_struct *task;
@@ -89,7 +69,7 @@ struct xen_netbk {
 
 	struct timer_list net_timer;
 
-	struct page *mmap_pages[MAX_PENDING_REQS];
+	idx_t mmap_pages[MAX_PENDING_REQS];
 
 	pending_ring_idx_t pending_prod;
 	pending_ring_idx_t pending_cons;
@@ -100,7 +80,6 @@ struct xen_netbk {
 
 	atomic_t netfront_count;
 
-	struct pending_tx_info pending_tx_info[MAX_PENDING_REQS];
 	struct gnttab_copy tx_copy_ops[MAX_PENDING_REQS];
 
 	u16 pending_ring[MAX_PENDING_REQS];
@@ -160,7 +139,7 @@ static struct xen_netif_rx_response *make_rx_response(struct xenvif *vif,
 static inline unsigned long idx_to_pfn(struct xen_netbk *netbk,
 				       u16 idx)
 {
-	return page_to_pfn(netbk->mmap_pages[idx]);
+	return page_to_pfn(to_page(netbk->mmap_pages[idx]));
 }
 
 static inline unsigned long idx_to_kaddr(struct xen_netbk *netbk,
@@ -169,45 +148,6 @@ static inline unsigned long idx_to_kaddr(struct xen_netbk *netbk,
 	return (unsigned long)pfn_to_kaddr(idx_to_pfn(netbk, idx));
 }
 
-/* extra field used in struct page */
-static inline void set_page_ext(struct page *pg, struct xen_netbk *netbk,
-				unsigned int idx)
-{
-	unsigned int group = netbk - xen_netbk;
-	union page_ext ext = { .e = { .group = group + 1, .idx = idx } };
-
-	BUILD_BUG_ON(sizeof(ext) > sizeof(ext.mapping));
-	pg->mapping = ext.mapping;
-}
-
-static int get_page_ext(struct page *pg,
-			unsigned int *pgroup, unsigned int *pidx)
-{
-	union page_ext ext = { .mapping = pg->mapping };
-	struct xen_netbk *netbk;
-	unsigned int group, idx;
-
-	group = ext.e.group - 1;
-
-	if (group < 0 || group >= xen_netbk_group_nr)
-		return 0;
-
-	netbk = &xen_netbk[group];
-
-	idx = ext.e.idx;
-
-	if ((idx < 0) || (idx >= MAX_PENDING_REQS))
-		return 0;
-
-	if (netbk->mmap_pages[idx] != pg)
-		return 0;
-
-	*pgroup = group;
-	*pidx = idx;
-
-	return 1;
-}
-
 /*
  * This is the amount of packet we copy rather than map, so that the
  * guest can't fiddle with the contents of the headers while we do
@@ -398,8 +338,8 @@ static void netbk_gop_frag_copy(struct xenvif *vif, struct sk_buff *skb,
 	 * These variables are used iff get_page_ext returns true,
 	 * in which case they are guaranteed to be initialized.
 	 */
-	unsigned int uninitialized_var(group), uninitialized_var(idx);
-	int foreign = get_page_ext(page, &group, &idx);
+	unsigned int uninitialized_var(idx);
+	int foreign = is_in_pool(page, &idx);
 	unsigned long bytes;
 
 	/* Data must not cross a page boundary. */
@@ -427,10 +367,7 @@ static void netbk_gop_frag_copy(struct xenvif *vif, struct sk_buff *skb,
 		copy_gop = npo->copy + npo->copy_prod++;
 		copy_gop->flags = GNTCOPY_dest_gref;
 		if (foreign) {
-			struct xen_netbk *netbk = &xen_netbk[group];
-			struct pending_tx_info *src_pend;
-
-			src_pend = &netbk->pending_tx_info[idx];
+			struct pending_tx_info *src_pend = to_txinfo(idx);
 
 			copy_gop->source.domid = src_pend->vif->domid;
 			copy_gop->source.u.ref = src_pend->req.gref;
@@ -906,11 +843,11 @@ static struct page *xen_netbk_alloc_page(struct xen_netbk *netbk,
 					 u16 pending_idx)
 {
 	struct page *page;
-	page = alloc_page(GFP_KERNEL|__GFP_COLD);
+	int idx;
+	page = page_pool_get(netbk, &idx);
 	if (!page)
 		return NULL;
-	set_page_ext(page, netbk, pending_idx);
-	netbk->mmap_pages[pending_idx] = page;
+	netbk->mmap_pages[pending_idx] = idx;
 	return page;
 }
 
@@ -931,8 +868,8 @@ static struct gnttab_copy *xen_netbk_get_requests(struct xen_netbk *netbk,
 	for (i = start; i < shinfo->nr_frags; i++, txp++) {
 		struct page *page;
 		pending_ring_idx_t index;
-		struct pending_tx_info *pending_tx_info =
-			netbk->pending_tx_info;
+		int idx;
+		struct pending_tx_info *pending_tx_info;
 
 		index = pending_index(netbk->pending_cons++);
 		pending_idx = netbk->pending_ring[index];
@@ -940,6 +877,9 @@ static struct gnttab_copy *xen_netbk_get_requests(struct xen_netbk *netbk,
 		if (!page)
 			return NULL;
 
+		idx = netbk->mmap_pages[pending_idx];
+		pending_tx_info = to_txinfo(idx);
+
 		gop->source.u.ref = txp->gref;
 		gop->source.domid = vif->domid;
 		gop->source.offset = txp->offset;
@@ -953,9 +893,9 @@ static struct gnttab_copy *xen_netbk_get_requests(struct xen_netbk *netbk,
 
 		gop++;
 
-		memcpy(&pending_tx_info[pending_idx].req, txp, sizeof(*txp));
+		memcpy(&pending_tx_info->req, txp, sizeof(*txp));
 		xenvif_get(vif);
-		pending_tx_info[pending_idx].vif = vif;
+		pending_tx_info->vif = vif;
 		frag_set_pending_idx(&frags[i], pending_idx);
 	}
 
@@ -968,8 +908,9 @@ static int xen_netbk_tx_check_gop(struct xen_netbk *netbk,
 {
 	struct gnttab_copy *gop = *gopp;
 	u16 pending_idx = *((u16 *)skb->data);
-	struct pending_tx_info *pending_tx_info = netbk->pending_tx_info;
-	struct xenvif *vif = pending_tx_info[pending_idx].vif;
+	struct pending_tx_info *pending_tx_info;
+	int idx;
+	struct xenvif *vif = NULL;
 	struct xen_netif_tx_request *txp;
 	struct skb_shared_info *shinfo = skb_shinfo(skb);
 	int nr_frags = shinfo->nr_frags;
@@ -980,7 +921,10 @@ static int xen_netbk_tx_check_gop(struct xen_netbk *netbk,
 	if (unlikely(err)) {
 		pending_ring_idx_t index;
 		index = pending_index(netbk->pending_prod++);
-		txp = &pending_tx_info[pending_idx].req;
+		idx = netbk->mmap_pages[pending_idx];
+		pending_tx_info = to_txinfo(idx);
+		txp = &pending_tx_info->req;
+		vif = pending_tx_info->vif;
 		make_tx_response(vif, txp, XEN_NETIF_RSP_ERROR);
 		netbk->pending_ring[index] = pending_idx;
 		xenvif_put(vif);
@@ -1005,7 +949,9 @@ static int xen_netbk_tx_check_gop(struct xen_netbk *netbk,
 		}
 
 		/* Error on this fragment: respond to client with an error. */
-		txp = &netbk->pending_tx_info[pending_idx].req;
+		idx = netbk->mmap_pages[pending_idx];
+		txp = &to_txinfo(idx)->req;
+		vif = to_txinfo(idx)->vif;
 		make_tx_response(vif, txp, XEN_NETIF_RSP_ERROR);
 		index = pending_index(netbk->pending_prod++);
 		netbk->pending_ring[index] = pending_idx;
@@ -1042,10 +988,15 @@ static void xen_netbk_fill_frags(struct xen_netbk *netbk, struct sk_buff *skb)
 		struct xen_netif_tx_request *txp;
 		struct page *page;
 		u16 pending_idx;
+		int idx;
+		struct pending_tx_info *pending_tx_info;
 
 		pending_idx = frag_get_pending_idx(frag);
 
-		txp = &netbk->pending_tx_info[pending_idx].req;
+		idx = netbk->mmap_pages[pending_idx];
+		pending_tx_info = to_txinfo(idx);
+
+		txp = &pending_tx_info->req;
 		page = virt_to_page(idx_to_kaddr(netbk, pending_idx));
 		__skb_fill_page_desc(skb, i, page, txp->offset, txp->size);
 		skb->len += txp->size;
@@ -1053,7 +1004,7 @@ static void xen_netbk_fill_frags(struct xen_netbk *netbk, struct sk_buff *skb)
 		skb->truesize += txp->size;
 
 		/* Take an extra reference to offset xen_netbk_idx_release */
-		get_page(netbk->mmap_pages[pending_idx]);
+		get_page(page);
 		xen_netbk_idx_release(netbk, pending_idx);
 	}
 }
@@ -1233,6 +1184,8 @@ static unsigned xen_netbk_tx_build_gops(struct xen_netbk *netbk)
 		int work_to_do;
 		unsigned int data_len;
 		pending_ring_idx_t index;
+		int pool_idx;
+		struct pending_tx_info *pending_tx_info;
 
 		/* Get a netif from the list with work to do. */
 		vif = poll_net_schedule_list(netbk);
@@ -1347,9 +1300,12 @@ static unsigned xen_netbk_tx_build_gops(struct xen_netbk *netbk)
 
 		gop++;
 
-		memcpy(&netbk->pending_tx_info[pending_idx].req,
+		pool_idx = netbk->mmap_pages[pending_idx];
+		pending_tx_info = to_txinfo(pool_idx);
+
+		memcpy(&pending_tx_info->req,
 		       &txreq, sizeof(txreq));
-		netbk->pending_tx_info[pending_idx].vif = vif;
+		pending_tx_info->vif = vif;
 		*((u16 *)skb->data) = pending_idx;
 
 		__skb_put(skb, data_len);
@@ -1397,10 +1353,16 @@ static void xen_netbk_tx_submit(struct xen_netbk *netbk)
 		struct xenvif *vif;
 		u16 pending_idx;
 		unsigned data_len;
+		int idx;
+		struct pending_tx_info *pending_tx_info;
 
 		pending_idx = *((u16 *)skb->data);
-		vif = netbk->pending_tx_info[pending_idx].vif;
-		txp = &netbk->pending_tx_info[pending_idx].req;
+
+		idx = netbk->mmap_pages[pending_idx];
+		pending_tx_info = to_txinfo(idx);
+
+		vif = pending_tx_info->vif;
+		txp = &pending_tx_info->req;
 
 		/* Check the remap error code. */
 		if (unlikely(xen_netbk_tx_check_gop(netbk, skb, &gop))) {
@@ -1480,12 +1442,14 @@ static void xen_netbk_idx_release(struct xen_netbk *netbk, u16 pending_idx)
 	struct xenvif *vif;
 	struct pending_tx_info *pending_tx_info;
 	pending_ring_idx_t index;
+	int idx;
 
 	/* Already complete? */
-	if (netbk->mmap_pages[pending_idx] == NULL)
+	if (netbk->mmap_pages[pending_idx] == INVALID_ENTRY)
 		return;
 
-	pending_tx_info = &netbk->pending_tx_info[pending_idx];
+	idx = netbk->mmap_pages[pending_idx];
+	pending_tx_info = to_txinfo(idx);
 
 	vif = pending_tx_info->vif;
 
@@ -1496,9 +1460,9 @@ static void xen_netbk_idx_release(struct xen_netbk *netbk, u16 pending_idx)
 
 	xenvif_put(vif);
 
-	netbk->mmap_pages[pending_idx]->mapping = 0;
-	put_page(netbk->mmap_pages[pending_idx]);
-	netbk->mmap_pages[pending_idx] = NULL;
+	page_pool_put(netbk->mmap_pages[pending_idx]);
+
+	netbk->mmap_pages[pending_idx] = INVALID_ENTRY;
 }
 
 static void make_tx_response(struct xenvif *vif,
@@ -1681,19 +1645,21 @@ static int __init netback_init(void)
 		wake_up_process(netbk->task);
 	}
 
-	rc = xenvif_xenbus_init();
+	rc = page_pool_init();
 	if (rc)
 		goto failed_init;
 
+	rc = xenvif_xenbus_init();
+	if (rc)
+		goto pool_failed_init;
+
 	return 0;
 
+pool_failed_init:
+	page_pool_destroy();
 failed_init:
 	while (--group >= 0) {
 		struct xen_netbk *netbk = &xen_netbk[group];
-		for (i = 0; i < MAX_PENDING_REQS; i++) {
-			if (netbk->mmap_pages[i])
-				__free_page(netbk->mmap_pages[i]);
-		}
 		del_timer(&netbk->net_timer);
 		kthread_stop(netbk->task);
 	}
diff --git a/drivers/net/xen-netback/page_pool.c b/drivers/net/xen-netback/page_pool.c
new file mode 100644
index 0000000..294f48b
--- /dev/null
+++ b/drivers/net/xen-netback/page_pool.c
@@ -0,0 +1,185 @@
+/*
+ * Global page pool for netback.
+ *
+ * Wei Liu <wei.liu2@citrix.com>
+ * Copyright (c) Citrix Systems
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version 2
+ * as published by the Free Software Foundation; or, when distributed
+ * separately from the Linux kernel or incorporated into other
+ * software packages, subject to the following license:
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this source file (the "Software"), to deal in the Software without
+ * restriction, including without limitation the rights to use, copy, modify,
+ * merge, publish, distribute, sublicense, and/or sell copies of the Software,
+ * and to permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include "common.h"
+#include "page_pool.h"
+#include <asm/xen/page.h>
+
+static idx_t free_head;
+static int free_count;
+static unsigned long pool_size;
+static DEFINE_SPINLOCK(pool_lock);
+static struct page_pool_entry *pool;
+
+static int get_free_entry(void)
+{
+	int idx;
+
+	spin_lock(&pool_lock);
+
+	if (free_count == 0) {
+		spin_unlock(&pool_lock);
+		return -ENOSPC;
+	}
+
+	idx = free_head;
+	free_count--;
+	free_head = pool[idx].u.fl;
+	pool[idx].u.fl = INVALID_ENTRY;
+
+	spin_unlock(&pool_lock);
+
+	return idx;
+}
+
+static void put_free_entry(idx_t idx)
+{
+	spin_lock(&pool_lock);
+
+	pool[idx].u.fl = free_head;
+	free_head = idx;
+	free_count++;
+
+	spin_unlock(&pool_lock);
+}
+
+static inline void set_page_ext(struct page *pg, unsigned int idx)
+{
+	union page_ext ext = { .idx = idx };
+
+	BUILD_BUG_ON(sizeof(ext) > sizeof(ext.mapping));
+	pg->mapping = ext.mapping;
+}
+
+static int get_page_ext(struct page *pg, unsigned int *pidx)
+{
+	union page_ext ext = { .mapping = pg->mapping };
+	int idx;
+
+	idx = ext.idx;
+
+	if ((idx < 0) || (idx >= pool_size))
+		return 0;
+
+	if (pool[idx].page != pg)
+		return 0;
+
+	*pidx = idx;
+
+	return 1;
+}
+
+int is_in_pool(struct page *page, int *pidx)
+{
+	return get_page_ext(page, pidx);
+}
+
+struct page *page_pool_get(struct xen_netbk *netbk, int *pidx)
+{
+	int idx;
+	struct page *page;
+
+	idx = get_free_entry();
+	if (idx < 0)
+		return NULL;
+	page = alloc_page(GFP_ATOMIC);
+
+	if (page == NULL) {
+		put_free_entry(idx);
+		return NULL;
+	}
+
+	set_page_ext(page, idx);
+	pool[idx].u.netbk = netbk;
+	pool[idx].page = page;
+
+	*pidx = idx;
+
+	return page;
+}
+
+void page_pool_put(int idx)
+{
+	struct page *page = pool[idx].page;
+
+	pool[idx].page = NULL;
+	pool[idx].u.netbk = NULL;
+	page->mapping = 0;
+	put_page(page);
+	put_free_entry(idx);
+}
+
+int page_pool_init()
+{
+	int cpus = 0;
+	int i;
+
+	cpus = num_online_cpus();
+	pool_size = cpus * ENTRIES_PER_CPU;
+
+	pool = vzalloc(sizeof(struct page_pool_entry) * pool_size);
+
+	if (!pool)
+		return -ENOMEM;
+
+	for (i = 0; i < pool_size - 1; i++)
+		pool[i].u.fl = i+1;
+	pool[pool_size-1].u.fl = INVALID_ENTRY;
+	free_count = pool_size;
+	free_head = 0;
+
+	return 0;
+}
+
+void page_pool_destroy()
+{
+	int i;
+	for (i = 0; i < pool_size; i++)
+		if (pool[i].page)
+			put_page(pool[i].page);
+
+	vfree(pool);
+}
+
+struct page *to_page(int idx)
+{
+	return pool[idx].page;
+}
+
+struct xen_netbk *to_netbk(int idx)
+{
+	return pool[idx].u.netbk;
+}
+
+struct pending_tx_info *to_txinfo(int idx)
+{
+	return &pool[idx].tx_info;
+}
diff --git a/drivers/net/xen-netback/page_pool.h b/drivers/net/xen-netback/page_pool.h
new file mode 100644
index 0000000..572b037
--- /dev/null
+++ b/drivers/net/xen-netback/page_pool.h
@@ -0,0 +1,63 @@
+/*
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version 2
+ * as published by the Free Software Foundation; or, when distributed
+ * separately from the Linux kernel or incorporated into other
+ * software packages, subject to the following license:
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this source file (the "Software"), to deal in the Software without
+ * restriction, including without limitation the rights to use, copy, modify,
+ * merge, publish, distribute, sublicense, and/or sell copies of the Software,
+ * and to permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#ifndef __PAGE_POOL_H__
+#define __PAGE_POOL_H__
+
+#include "common.h"
+
+typedef uint32_t idx_t;
+
+#define ENTRIES_PER_CPU (1024)
+#define INVALID_ENTRY 0xffffffff
+
+struct page_pool_entry {
+	struct page *page;
+	struct pending_tx_info tx_info;
+	union {
+		struct xen_netbk *netbk;
+		idx_t             fl;
+	} u;
+};
+
+union page_ext {
+	idx_t idx;
+	void *mapping;
+};
+
+int  page_pool_init(void);
+void page_pool_destroy(void);
+
+
+struct page *page_pool_get(struct xen_netbk *netbk, int *pidx);
+void         page_pool_put(int idx);
+int          is_in_pool(struct page *page, int *pidx);
+
+struct page            *to_page(int idx);
+struct xen_netbk       *to_netbk(int idx);
+struct pending_tx_info *to_txinfo(int idx);
+
+#endif /* __PAGE_POOL_H__ */
-- 
1.7.2.5


* [RFC PATCH V3 02/16] netback: add module unload function.
  2012-01-30 14:45 [RFC PATCH V3] Xen netback / netfront improvement Wei Liu
  2012-01-30 14:45 ` [RFC PATCH V3 01/16] netback: page pool version 1 Wei Liu
@ 2012-01-30 14:45 ` Wei Liu
  2012-01-30 14:45 ` [RFC PATCH V3 03/16] netback: switch to NAPI + kthread model Wei Liu
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 59+ messages in thread
From: Wei Liu @ 2012-01-30 14:45 UTC (permalink / raw)
  To: netdev, xen-devel; +Cc: ian.campbell, konrad.wilk, Wei Liu

Enable users to unload the netback module.

Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 drivers/net/xen-netback/common.h  |    1 +
 drivers/net/xen-netback/netback.c |   14 ++++++++++++++
 drivers/net/xen-netback/xenbus.c  |    5 +++++
 3 files changed, 20 insertions(+), 0 deletions(-)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index 288b2f3..372c7f5 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -126,6 +126,7 @@ void xenvif_get(struct xenvif *vif);
 void xenvif_put(struct xenvif *vif);
 
 int xenvif_xenbus_init(void);
+void xenvif_xenbus_exit(void);
 
 int xenvif_schedulable(struct xenvif *vif);
 
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index d11205f..3059684 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -1670,5 +1670,19 @@ failed_init:
 
 module_init(netback_init);
 
+static void __exit netback_exit(void)
+{
+	int i;
+	xenvif_xenbus_exit();
+	for (i = 0; i < xen_netbk_group_nr; i++) {
+		struct xen_netbk *netbk = &xen_netbk[i];
+		del_timer_sync(&netbk->net_timer);
+		kthread_stop(netbk->task);
+	}
+	vfree(xen_netbk);
+	page_pool_destroy();
+}
+module_exit(netback_exit);
+
 MODULE_LICENSE("Dual BSD/GPL");
 MODULE_ALIAS("xen-backend:vif");
diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
index 410018c..65d14f2 100644
--- a/drivers/net/xen-netback/xenbus.c
+++ b/drivers/net/xen-netback/xenbus.c
@@ -485,3 +485,8 @@ int xenvif_xenbus_init(void)
 {
 	return xenbus_register_backend(&netback_driver);
 }
+
+void xenvif_xenbus_exit(void)
+{
+	return xenbus_unregister_driver(&netback_driver);
+}
-- 
1.7.2.5


* [RFC PATCH V3 03/16] netback: switch to NAPI + kthread model
  2012-01-30 14:45 [RFC PATCH V3] Xen netback / netfront improvement Wei Liu
  2012-01-30 14:45 ` [RFC PATCH V3 01/16] netback: page pool version 1 Wei Liu
  2012-01-30 14:45 ` [RFC PATCH V3 02/16] netback: add module unload function Wei Liu
@ 2012-01-30 14:45 ` Wei Liu
  2012-01-30 14:45 ` [RFC PATCH V3 04/16] netback: switch to per-cpu scratch space Wei Liu
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 59+ messages in thread
From: Wei Liu @ 2012-01-30 14:45 UTC (permalink / raw)
  To: netdev, xen-devel; +Cc: ian.campbell, konrad.wilk, Wei Liu

This patch implements the 1:1 model netback. We utilize NAPI and a
kthread to do the heavy lifting:

  - NAPI is used for guest side TX (host side RX)
  - kthread is used for guest side RX (host side TX)

This model provides better scheduling fairness among vifs. It also
lays the foundation for future work.

The major defect of the current implementation is that in the NAPI
poll handler we don't actually prevent the frontend from generating
interrupts. Any tuning of the ring pointers will come in a separate
commit.
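
In outline, the per-vif pairing introduced here looks as follows
(abridged from the common.h hunk below, shown only to make the 1:1
model explicit):

struct xenvif {
	/* Reference to netback processing backend, now private to this vif. */
	struct xen_netbk *netbk;

	/* Use NAPI for guest TX (host RX). */
	struct napi_struct napi;

	/* Use kthread for guest RX (host TX). */
	struct task_struct *task;
	wait_queue_head_t wq;

	/* ... remaining fields unchanged ... */
};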

Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 drivers/net/xen-netback/common.h    |   34 ++--
 drivers/net/xen-netback/interface.c |   92 ++++++---
 drivers/net/xen-netback/netback.c   |  367 ++++++++++-------------------------
 drivers/net/xen-netback/xenbus.c    |    1 -
 4 files changed, 186 insertions(+), 308 deletions(-)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index 372c7f5..31c331c 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -47,7 +47,6 @@
 
 struct pending_tx_info {
 	struct xen_netif_tx_request req;
-	struct xenvif *vif;
 };
 typedef unsigned int pending_ring_idx_t;
 
@@ -61,14 +60,17 @@ struct xenvif {
 	/* Reference to netback processing backend. */
 	struct xen_netbk *netbk;
 
+	/* Use NAPI for guest TX */
+	struct napi_struct napi;
+	/* Use kthread for guest RX */
+	struct task_struct *task;
+	wait_queue_head_t wq;
+
 	u8               fe_dev_addr[6];
 
 	/* Physical parameters of the comms window. */
 	unsigned int     irq;
 
-	/* List of frontends to notify after a batch of frames sent. */
-	struct list_head notify_list;
-
 	/* The shared rings and indexes. */
 	struct xen_netif_tx_back_ring tx;
 	struct xen_netif_rx_back_ring rx;
@@ -99,11 +101,7 @@ struct xenvif {
 	unsigned long rx_gso_checksum_fixup;
 
 	/* Miscellaneous private stuff. */
-	struct list_head schedule_list;
-	atomic_t         refcnt;
 	struct net_device *dev;
-
-	wait_queue_head_t waiting_to_free;
 };
 
 static inline struct xenbus_device *xenvif_to_xenbus_device(struct xenvif *vif)
@@ -122,9 +120,6 @@ int xenvif_connect(struct xenvif *vif, unsigned long tx_ring_ref,
 		   unsigned long rx_ring_ref, unsigned int evtchn);
 void xenvif_disconnect(struct xenvif *vif);
 
-void xenvif_get(struct xenvif *vif);
-void xenvif_put(struct xenvif *vif);
-
 int xenvif_xenbus_init(void);
 void xenvif_xenbus_exit(void);
 
@@ -140,14 +135,6 @@ int xen_netbk_map_frontend_rings(struct xenvif *vif,
 				 grant_ref_t tx_ring_ref,
 				 grant_ref_t rx_ring_ref);
 
-/* (De)Register a xenvif with the netback backend. */
-void xen_netbk_add_xenvif(struct xenvif *vif);
-void xen_netbk_remove_xenvif(struct xenvif *vif);
-
-/* (De)Schedule backend processing for a xenvif */
-void xen_netbk_schedule_xenvif(struct xenvif *vif);
-void xen_netbk_deschedule_xenvif(struct xenvif *vif);
-
 /* Check for SKBs from frontend and schedule backend processing */
 void xen_netbk_check_rx_xenvif(struct xenvif *vif);
 /* Receive an SKB from the frontend */
@@ -161,4 +148,13 @@ void xenvif_notify_tx_completion(struct xenvif *vif);
 /* Returns number of ring slots required to send an skb to the frontend */
 unsigned int xen_netbk_count_skb_slots(struct xenvif *vif, struct sk_buff *skb);
 
+/* Allocate and free xen_netbk structure */
+struct xen_netbk *xen_netbk_alloc_netbk(struct xenvif *vif);
+void xen_netbk_free_netbk(struct xen_netbk *netbk);
+
+void xen_netbk_tx_action(struct xen_netbk *netbk, int *work_done, int budget);
+void xen_netbk_rx_action(struct xen_netbk *netbk);
+
+int xen_netbk_kthread(void *data);
+
 #endif /* __XEN_NETBACK__COMMON_H__ */
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index 1825629..dfc04f8 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -30,6 +30,7 @@
 
 #include "common.h"
 
+#include <linux/kthread.h>
 #include <linux/ethtool.h>
 #include <linux/rtnetlink.h>
 #include <linux/if_vlan.h>
@@ -38,17 +39,7 @@
 #include <asm/xen/hypercall.h>
 
 #define XENVIF_QUEUE_LENGTH 32
-
-void xenvif_get(struct xenvif *vif)
-{
-	atomic_inc(&vif->refcnt);
-}
-
-void xenvif_put(struct xenvif *vif)
-{
-	if (atomic_dec_and_test(&vif->refcnt))
-		wake_up(&vif->waiting_to_free);
-}
+#define XENVIF_NAPI_WEIGHT  64
 
 int xenvif_schedulable(struct xenvif *vif)
 {
@@ -67,14 +58,37 @@ static irqreturn_t xenvif_interrupt(int irq, void *dev_id)
 	if (vif->netbk == NULL)
 		return IRQ_NONE;
 
-	xen_netbk_schedule_xenvif(vif);
-
 	if (xenvif_rx_schedulable(vif))
 		netif_wake_queue(vif->dev);
 
+	if (likely(napi_schedule_prep(&vif->napi)))
+		__napi_schedule(&vif->napi);
+
 	return IRQ_HANDLED;
 }
 
+static int xenvif_poll(struct napi_struct *napi, int budget)
+{
+	struct xenvif *vif = container_of(napi, struct xenvif, napi);
+	int work_done = 0;
+
+	xen_netbk_tx_action(vif->netbk, &work_done, budget);
+
+	if (work_done < budget) {
+		int more_to_do = 0;
+		unsigned long flag;
+		local_irq_save(flag);
+
+		RING_FINAL_CHECK_FOR_REQUESTS(&vif->tx, more_to_do);
+		if (!more_to_do)
+			__napi_complete(napi);
+
+		local_irq_restore(flag);
+	}
+
+	return work_done;
+}
+
 static int xenvif_start_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct xenvif *vif = netdev_priv(dev);
@@ -90,7 +104,6 @@ static int xenvif_start_xmit(struct sk_buff *skb, struct net_device *dev)
 
 	/* Reserve ring slots for the worst-case number of fragments. */
 	vif->rx_req_cons_peek += xen_netbk_count_skb_slots(vif, skb);
-	xenvif_get(vif);
 
 	if (vif->can_queue && xen_netbk_must_stop_queue(vif))
 		netif_stop_queue(dev);
@@ -107,7 +120,7 @@ static int xenvif_start_xmit(struct sk_buff *skb, struct net_device *dev)
 
 void xenvif_receive_skb(struct xenvif *vif, struct sk_buff *skb)
 {
-	netif_rx_ni(skb);
+	netif_receive_skb(skb);
 }
 
 void xenvif_notify_tx_completion(struct xenvif *vif)
@@ -124,16 +137,15 @@ static struct net_device_stats *xenvif_get_stats(struct net_device *dev)
 
 static void xenvif_up(struct xenvif *vif)
 {
-	xen_netbk_add_xenvif(vif);
+	napi_enable(&vif->napi);
 	enable_irq(vif->irq);
 	xen_netbk_check_rx_xenvif(vif);
 }
 
 static void xenvif_down(struct xenvif *vif)
 {
+	napi_disable(&vif->napi);
 	disable_irq(vif->irq);
-	xen_netbk_deschedule_xenvif(vif);
-	xen_netbk_remove_xenvif(vif);
 }
 
 static int xenvif_open(struct net_device *dev)
@@ -259,14 +271,11 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
 	vif = netdev_priv(dev);
 	vif->domid  = domid;
 	vif->handle = handle;
-	vif->netbk  = NULL;
+	vif->netbk = NULL;
+
 	vif->can_sg = 1;
 	vif->csum = 1;
-	atomic_set(&vif->refcnt, 1);
-	init_waitqueue_head(&vif->waiting_to_free);
 	vif->dev = dev;
-	INIT_LIST_HEAD(&vif->schedule_list);
-	INIT_LIST_HEAD(&vif->notify_list);
 
 	vif->credit_bytes = vif->remaining_credit = ~0UL;
 	vif->credit_usec  = 0UL;
@@ -290,6 +299,8 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
 	memset(dev->dev_addr, 0xFF, ETH_ALEN);
 	dev->dev_addr[0] &= ~0x01;
 
+	netif_napi_add(dev, &vif->napi, xenvif_poll, XENVIF_NAPI_WEIGHT);
+
 	netif_carrier_off(dev);
 
 	err = register_netdev(dev);
@@ -324,7 +335,23 @@ int xenvif_connect(struct xenvif *vif, unsigned long tx_ring_ref,
 	vif->irq = err;
 	disable_irq(vif->irq);
 
-	xenvif_get(vif);
+	vif->netbk = xen_netbk_alloc_netbk(vif);
+	if (!vif->netbk) {
+		pr_warn("Could not allocate xen_netbk\n");
+		err = -ENOMEM;
+		goto err_unbind;
+	}
+
+
+	init_waitqueue_head(&vif->wq);
+	vif->task = kthread_create(xen_netbk_kthread,
+				   (void *)vif,
+				   "vif%d.%d", vif->domid, vif->handle);
+	if (IS_ERR(vif->task)) {
+		pr_warn("Could not create kthread\n");
+		err = PTR_ERR(vif->task);
+		goto err_free_netbk;
+	}
 
 	rtnl_lock();
 	if (!vif->can_sg && vif->dev->mtu > ETH_DATA_LEN)
@@ -335,7 +362,13 @@ int xenvif_connect(struct xenvif *vif, unsigned long tx_ring_ref,
 		xenvif_up(vif);
 	rtnl_unlock();
 
+	wake_up_process(vif->task);
+
 	return 0;
+err_free_netbk:
+	xen_netbk_free_netbk(vif->netbk);
+err_unbind:
+	unbind_from_irqhandler(vif->irq, vif);
 err_unmap:
 	xen_netbk_unmap_frontend_rings(vif);
 err:
@@ -345,17 +378,22 @@ err:
 void xenvif_disconnect(struct xenvif *vif)
 {
 	struct net_device *dev = vif->dev;
+
 	if (netif_carrier_ok(dev)) {
 		rtnl_lock();
 		netif_carrier_off(dev); /* discard queued packets */
 		if (netif_running(dev))
 			xenvif_down(vif);
 		rtnl_unlock();
-		xenvif_put(vif);
 	}
 
-	atomic_dec(&vif->refcnt);
-	wait_event(vif->waiting_to_free, atomic_read(&vif->refcnt) == 0);
+	if (vif->task)
+		kthread_stop(vif->task);
+
+	if (vif->netbk)
+		xen_netbk_free_netbk(vif->netbk);
+
+	netif_napi_del(&vif->napi);
 
 	del_timer_sync(&vif->credit_timeout);
 
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 3059684..9a72993 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -61,24 +61,15 @@ struct netbk_rx_meta {
 #define MAX_BUFFER_OFFSET PAGE_SIZE
 
 struct xen_netbk {
-	wait_queue_head_t wq;
-	struct task_struct *task;
-
 	struct sk_buff_head rx_queue;
 	struct sk_buff_head tx_queue;
 
-	struct timer_list net_timer;
-
 	idx_t mmap_pages[MAX_PENDING_REQS];
 
 	pending_ring_idx_t pending_prod;
 	pending_ring_idx_t pending_cons;
-	struct list_head net_schedule_list;
-
-	/* Protect the net_schedule_list in netif. */
-	spinlock_t net_schedule_list_lock;
 
-	atomic_t netfront_count;
+	struct xenvif *vif;
 
 	struct gnttab_copy tx_copy_ops[MAX_PENDING_REQS];
 
@@ -93,42 +84,14 @@ struct xen_netbk {
 	struct netbk_rx_meta meta[2*XEN_NETIF_RX_RING_SIZE];
 };
 
-static struct xen_netbk *xen_netbk;
-static int xen_netbk_group_nr;
-
-void xen_netbk_add_xenvif(struct xenvif *vif)
-{
-	int i;
-	int min_netfront_count;
-	int min_group = 0;
-	struct xen_netbk *netbk;
-
-	min_netfront_count = atomic_read(&xen_netbk[0].netfront_count);
-	for (i = 0; i < xen_netbk_group_nr; i++) {
-		int netfront_count = atomic_read(&xen_netbk[i].netfront_count);
-		if (netfront_count < min_netfront_count) {
-			min_group = i;
-			min_netfront_count = netfront_count;
-		}
-	}
-
-	netbk = &xen_netbk[min_group];
-
-	vif->netbk = netbk;
-	atomic_inc(&netbk->netfront_count);
-}
-
-void xen_netbk_remove_xenvif(struct xenvif *vif)
-{
-	struct xen_netbk *netbk = vif->netbk;
-	vif->netbk = NULL;
-	atomic_dec(&netbk->netfront_count);
-}
-
 static void xen_netbk_idx_release(struct xen_netbk *netbk, u16 pending_idx);
 static void make_tx_response(struct xenvif *vif,
 			     struct xen_netif_tx_request *txp,
 			     s8       st);
+
+static inline int tx_work_todo(struct xen_netbk *netbk);
+static inline int rx_work_todo(struct xen_netbk *netbk);
+
 static struct xen_netif_rx_response *make_rx_response(struct xenvif *vif,
 					     u16      id,
 					     s8       st,
@@ -179,11 +142,6 @@ static inline pending_ring_idx_t nr_pending_reqs(struct xen_netbk *netbk)
 		netbk->pending_prod + netbk->pending_cons;
 }
 
-static void xen_netbk_kick_thread(struct xen_netbk *netbk)
-{
-	wake_up(&netbk->wq);
-}
-
 static int max_required_rx_slots(struct xenvif *vif)
 {
 	int max = DIV_ROUND_UP(vif->dev->mtu, PAGE_SIZE);
@@ -368,8 +326,9 @@ static void netbk_gop_frag_copy(struct xenvif *vif, struct sk_buff *skb,
 		copy_gop->flags = GNTCOPY_dest_gref;
 		if (foreign) {
 			struct pending_tx_info *src_pend = to_txinfo(idx);
+			struct xen_netbk *rnetbk = to_netbk(idx);
 
-			copy_gop->source.domid = src_pend->vif->domid;
+			copy_gop->source.domid = rnetbk->vif->domid;
 			copy_gop->source.u.ref = src_pend->req.gref;
 			copy_gop->flags |= GNTCOPY_source_gref;
 		} else {
@@ -527,11 +486,18 @@ struct skb_cb_overlay {
 	int meta_slots_used;
 };
 
-static void xen_netbk_rx_action(struct xen_netbk *netbk)
+static void xen_netbk_kick_thread(struct xen_netbk *netbk)
 {
-	struct xenvif *vif = NULL, *tmp;
+	struct xenvif *vif = netbk->vif;
+
+	wake_up(&vif->wq);
+}
+
+void xen_netbk_rx_action(struct xen_netbk *netbk)
+{
+	struct xenvif *vif = NULL;
 	s8 status;
-	u16 irq, flags;
+	u16 flags;
 	struct xen_netif_rx_response *resp;
 	struct sk_buff_head rxq;
 	struct sk_buff *skb;
@@ -541,6 +507,7 @@ static void xen_netbk_rx_action(struct xen_netbk *netbk)
 	int count;
 	unsigned long offset;
 	struct skb_cb_overlay *sco;
+	int need_to_notify = 0;
 
 	struct netrx_pending_operations npo = {
 		.copy  = netbk->grant_copy_op,
@@ -641,25 +608,19 @@ static void xen_netbk_rx_action(struct xen_netbk *netbk)
 					 sco->meta_slots_used);
 
 		RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(&vif->rx, ret);
-		irq = vif->irq;
-		if (ret && list_empty(&vif->notify_list))
-			list_add_tail(&vif->notify_list, &notify);
+		if (ret)
+			need_to_notify = 1;
 
 		xenvif_notify_tx_completion(vif);
 
-		xenvif_put(vif);
 		npo.meta_cons += sco->meta_slots_used;
 		dev_kfree_skb(skb);
 	}
 
-	list_for_each_entry_safe(vif, tmp, &notify, notify_list) {
+	if (need_to_notify)
 		notify_remote_via_irq(vif->irq);
-		list_del_init(&vif->notify_list);
-	}
 
-	/* More work to do? */
-	if (!skb_queue_empty(&netbk->rx_queue) &&
-			!timer_pending(&netbk->net_timer))
+	if (!skb_queue_empty(&netbk->rx_queue))
 		xen_netbk_kick_thread(netbk);
 }
 
@@ -672,86 +633,17 @@ void xen_netbk_queue_tx_skb(struct xenvif *vif, struct sk_buff *skb)
 	xen_netbk_kick_thread(netbk);
 }
 
-static void xen_netbk_alarm(unsigned long data)
-{
-	struct xen_netbk *netbk = (struct xen_netbk *)data;
-	xen_netbk_kick_thread(netbk);
-}
-
-static int __on_net_schedule_list(struct xenvif *vif)
-{
-	return !list_empty(&vif->schedule_list);
-}
-
-/* Must be called with net_schedule_list_lock held */
-static void remove_from_net_schedule_list(struct xenvif *vif)
-{
-	if (likely(__on_net_schedule_list(vif))) {
-		list_del_init(&vif->schedule_list);
-		xenvif_put(vif);
-	}
-}
-
-static struct xenvif *poll_net_schedule_list(struct xen_netbk *netbk)
-{
-	struct xenvif *vif = NULL;
-
-	spin_lock_irq(&netbk->net_schedule_list_lock);
-	if (list_empty(&netbk->net_schedule_list))
-		goto out;
-
-	vif = list_first_entry(&netbk->net_schedule_list,
-			       struct xenvif, schedule_list);
-	if (!vif)
-		goto out;
-
-	xenvif_get(vif);
-
-	remove_from_net_schedule_list(vif);
-out:
-	spin_unlock_irq(&netbk->net_schedule_list_lock);
-	return vif;
-}
-
-void xen_netbk_schedule_xenvif(struct xenvif *vif)
-{
-	unsigned long flags;
-	struct xen_netbk *netbk = vif->netbk;
-
-	if (__on_net_schedule_list(vif))
-		goto kick;
-
-	spin_lock_irqsave(&netbk->net_schedule_list_lock, flags);
-	if (!__on_net_schedule_list(vif) &&
-	    likely(xenvif_schedulable(vif))) {
-		list_add_tail(&vif->schedule_list, &netbk->net_schedule_list);
-		xenvif_get(vif);
-	}
-	spin_unlock_irqrestore(&netbk->net_schedule_list_lock, flags);
-
-kick:
-	smp_mb();
-	if ((nr_pending_reqs(netbk) < (MAX_PENDING_REQS/2)) &&
-	    !list_empty(&netbk->net_schedule_list))
-		xen_netbk_kick_thread(netbk);
-}
-
-void xen_netbk_deschedule_xenvif(struct xenvif *vif)
-{
-	struct xen_netbk *netbk = vif->netbk;
-	spin_lock_irq(&netbk->net_schedule_list_lock);
-	remove_from_net_schedule_list(vif);
-	spin_unlock_irq(&netbk->net_schedule_list_lock);
-}
-
 void xen_netbk_check_rx_xenvif(struct xenvif *vif)
 {
 	int more_to_do;
 
 	RING_FINAL_CHECK_FOR_REQUESTS(&vif->tx, more_to_do);
 
+	/* In this check function we are checking the backend's RX path,
+	 * which means the frontend's TX */
+
 	if (more_to_do)
-		xen_netbk_schedule_xenvif(vif);
+		napi_schedule(&vif->napi);
 }
 
 static void tx_add_credit(struct xenvif *vif)
@@ -794,7 +686,6 @@ static void netbk_tx_err(struct xenvif *vif,
 	} while (1);
 	vif->tx.req_cons = cons;
 	xen_netbk_check_rx_xenvif(vif);
-	xenvif_put(vif);
 }
 
 static int netbk_count_requests(struct xenvif *vif,
@@ -894,8 +785,7 @@ static struct gnttab_copy *xen_netbk_get_requests(struct xen_netbk *netbk,
 		gop++;
 
 		memcpy(&pending_tx_info->req, txp, sizeof(*txp));
-		xenvif_get(vif);
-		pending_tx_info->vif = vif;
+
 		frag_set_pending_idx(&frags[i], pending_idx);
 	}
 
@@ -910,7 +800,8 @@ static int xen_netbk_tx_check_gop(struct xen_netbk *netbk,
 	u16 pending_idx = *((u16 *)skb->data);
 	struct pending_tx_info *pending_tx_info;
 	int idx;
-	struct xenvif *vif = NULL;
+	struct xenvif *vif = netbk->vif;
+
 	struct xen_netif_tx_request *txp;
 	struct skb_shared_info *shinfo = skb_shinfo(skb);
 	int nr_frags = shinfo->nr_frags;
@@ -924,10 +815,8 @@ static int xen_netbk_tx_check_gop(struct xen_netbk *netbk,
 		idx = netbk->mmap_pages[pending_idx];
 		pending_tx_info = to_txinfo(idx);
 		txp = &pending_tx_info->req;
-		vif = pending_tx_info->vif;
 		make_tx_response(vif, txp, XEN_NETIF_RSP_ERROR);
 		netbk->pending_ring[index] = pending_idx;
-		xenvif_put(vif);
 	}
 
 	/* Skip first skb fragment if it is on same page as header fragment. */
@@ -951,11 +840,9 @@ static int xen_netbk_tx_check_gop(struct xen_netbk *netbk,
 		/* Error on this fragment: respond to client with an error. */
 		idx = netbk->mmap_pages[pending_idx];
 		txp = &to_txinfo(idx)->req;
-		vif = to_txinfo(idx)->vif;
 		make_tx_response(vif, txp, XEN_NETIF_RSP_ERROR);
 		index = pending_index(netbk->pending_prod++);
 		netbk->pending_ring[index] = pending_idx;
-		xenvif_put(vif);
 
 		/* Not the first error? Preceding frags already invalidated. */
 		if (err)
@@ -1171,10 +1058,9 @@ static unsigned xen_netbk_tx_build_gops(struct xen_netbk *netbk)
 	struct gnttab_copy *gop = netbk->tx_copy_ops, *request_gop;
 	struct sk_buff *skb;
 	int ret;
+	struct xenvif *vif = netbk->vif;
 
-	while (((nr_pending_reqs(netbk) + MAX_SKB_FRAGS) < MAX_PENDING_REQS) &&
-		!list_empty(&netbk->net_schedule_list)) {
-		struct xenvif *vif;
+	while ((nr_pending_reqs(netbk) + MAX_SKB_FRAGS) < MAX_PENDING_REQS) {
 		struct xen_netif_tx_request txreq;
 		struct xen_netif_tx_request txfrags[MAX_SKB_FRAGS];
 		struct page *page;
@@ -1187,26 +1073,19 @@ static unsigned xen_netbk_tx_build_gops(struct xen_netbk *netbk)
 		int pool_idx;
 		struct pending_tx_info *pending_tx_info;
 
-		/* Get a netif from the list with work to do. */
-		vif = poll_net_schedule_list(netbk);
-		if (!vif)
-			continue;
-
 		RING_FINAL_CHECK_FOR_REQUESTS(&vif->tx, work_to_do);
 		if (!work_to_do) {
-			xenvif_put(vif);
-			continue;
+			break;
 		}
 
 		idx = vif->tx.req_cons;
 		rmb(); /* Ensure that we see the request before we copy it. */
 		memcpy(&txreq, RING_GET_REQUEST(&vif->tx, idx), sizeof(txreq));
 
-		/* Credit-based scheduling. */
+		/* Credit-based traffic shaping. */
 		if (txreq.size > vif->remaining_credit &&
 		    tx_credit_exceeded(vif, txreq.size)) {
-			xenvif_put(vif);
-			continue;
+			break;
 		}
 
 		vif->remaining_credit -= txreq.size;
@@ -1221,14 +1100,14 @@ static unsigned xen_netbk_tx_build_gops(struct xen_netbk *netbk)
 			idx = vif->tx.req_cons;
 			if (unlikely(work_to_do < 0)) {
 				netbk_tx_err(vif, &txreq, idx);
-				continue;
+				break;
 			}
 		}
 
 		ret = netbk_count_requests(vif, &txreq, txfrags, work_to_do);
 		if (unlikely(ret < 0)) {
 			netbk_tx_err(vif, &txreq, idx - ret);
-			continue;
+			break;
 		}
 		idx += ret;
 
@@ -1236,7 +1115,7 @@ static unsigned xen_netbk_tx_build_gops(struct xen_netbk *netbk)
 			netdev_dbg(vif->dev,
 				   "Bad packet size: %d\n", txreq.size);
 			netbk_tx_err(vif, &txreq, idx);
-			continue;
+			break;
 		}
 
 		/* No crossing a page as the payload mustn't fragment. */
@@ -1246,7 +1125,7 @@ static unsigned xen_netbk_tx_build_gops(struct xen_netbk *netbk)
 				   txreq.offset, txreq.size,
 				   (txreq.offset&~PAGE_MASK) + txreq.size);
 			netbk_tx_err(vif, &txreq, idx);
-			continue;
+			break;
 		}
 
 		index = pending_index(netbk->pending_cons);
@@ -1275,7 +1154,7 @@ static unsigned xen_netbk_tx_build_gops(struct xen_netbk *netbk)
 			if (netbk_set_skb_gso(vif, skb, gso)) {
 				kfree_skb(skb);
 				netbk_tx_err(vif, &txreq, idx);
-				continue;
+				break;
 			}
 		}
 
@@ -1284,7 +1163,7 @@ static unsigned xen_netbk_tx_build_gops(struct xen_netbk *netbk)
 		if (!page) {
 			kfree_skb(skb);
 			netbk_tx_err(vif, &txreq, idx);
-			continue;
+			break;
 		}
 
 		gop->source.u.ref = txreq.gref;
@@ -1305,7 +1184,7 @@ static unsigned xen_netbk_tx_build_gops(struct xen_netbk *netbk)
 
 		memcpy(&pending_tx_info->req,
 		       &txreq, sizeof(txreq));
-		pending_tx_info->vif = vif;
+
 		*((u16 *)skb->data) = pending_idx;
 
 		__skb_put(skb, data_len);
@@ -1329,7 +1208,7 @@ static unsigned xen_netbk_tx_build_gops(struct xen_netbk *netbk)
 		if (request_gop == NULL) {
 			kfree_skb(skb);
 			netbk_tx_err(vif, &txreq, idx);
-			continue;
+			break;
 		}
 		gop = request_gop;
 
@@ -1343,14 +1222,16 @@ static unsigned xen_netbk_tx_build_gops(struct xen_netbk *netbk)
 	return gop - netbk->tx_copy_ops;
 }
 
-static void xen_netbk_tx_submit(struct xen_netbk *netbk)
+static void xen_netbk_tx_submit(struct xen_netbk *netbk,
+				int *work_done, int budget)
 {
 	struct gnttab_copy *gop = netbk->tx_copy_ops;
 	struct sk_buff *skb;
+	struct xenvif *vif = netbk->vif;
 
-	while ((skb = __skb_dequeue(&netbk->tx_queue)) != NULL) {
+	while ((*work_done < budget) &&
+	       (skb = __skb_dequeue(&netbk->tx_queue)) != NULL) {
 		struct xen_netif_tx_request *txp;
-		struct xenvif *vif;
 		u16 pending_idx;
 		unsigned data_len;
 		int idx;
@@ -1361,7 +1242,6 @@ static void xen_netbk_tx_submit(struct xen_netbk *netbk)
 		idx = netbk->mmap_pages[pending_idx];
 		pending_tx_info = to_txinfo(idx);
 
-		vif = pending_tx_info->vif;
 		txp = &pending_tx_info->req;
 
 		/* Check the remap error code. */
@@ -1415,16 +1295,21 @@ static void xen_netbk_tx_submit(struct xen_netbk *netbk)
 		vif->dev->stats.rx_bytes += skb->len;
 		vif->dev->stats.rx_packets++;
 
+		(*work_done)++;
+
 		xenvif_receive_skb(vif, skb);
 	}
 }
 
 /* Called after netfront has transmitted */
-static void xen_netbk_tx_action(struct xen_netbk *netbk)
+void xen_netbk_tx_action(struct xen_netbk *netbk, int *work_done, int budget)
 {
 	unsigned nr_gops;
 	int ret;
 
+	if (unlikely(!tx_work_todo(netbk)))
+		return;
+
 	nr_gops = xen_netbk_tx_build_gops(netbk);
 
 	if (nr_gops == 0)
@@ -1433,13 +1318,12 @@ static void xen_netbk_tx_action(struct xen_netbk *netbk)
 					netbk->tx_copy_ops, nr_gops);
 	BUG_ON(ret);
 
-	xen_netbk_tx_submit(netbk);
-
+	xen_netbk_tx_submit(netbk, work_done, budget);
 }
 
 static void xen_netbk_idx_release(struct xen_netbk *netbk, u16 pending_idx)
 {
-	struct xenvif *vif;
+	struct xenvif *vif = netbk->vif;
 	struct pending_tx_info *pending_tx_info;
 	pending_ring_idx_t index;
 	int idx;
@@ -1451,15 +1335,11 @@ static void xen_netbk_idx_release(struct xen_netbk *netbk, u16 pending_idx)
 	idx = netbk->mmap_pages[pending_idx];
 	pending_tx_info = to_txinfo(idx);
 
-	vif = pending_tx_info->vif;
-
 	make_tx_response(vif, &pending_tx_info->req, XEN_NETIF_RSP_OKAY);
 
 	index = pending_index(netbk->pending_prod++);
 	netbk->pending_ring[index] = pending_idx;
 
-	xenvif_put(vif);
-
 	page_pool_put(netbk->mmap_pages[pending_idx]);
 
 	netbk->mmap_pages[pending_idx] = INVALID_ENTRY;
@@ -1516,37 +1396,13 @@ static inline int rx_work_todo(struct xen_netbk *netbk)
 
 static inline int tx_work_todo(struct xen_netbk *netbk)
 {
-
-	if (((nr_pending_reqs(netbk) + MAX_SKB_FRAGS) < MAX_PENDING_REQS) &&
-			!list_empty(&netbk->net_schedule_list))
+	if (likely(RING_HAS_UNCONSUMED_REQUESTS(&netbk->vif->tx)) &&
+	    (nr_pending_reqs(netbk) + MAX_SKB_FRAGS) < MAX_PENDING_REQS)
 		return 1;
 
 	return 0;
 }
 
-static int xen_netbk_kthread(void *data)
-{
-	struct xen_netbk *netbk = data;
-	while (!kthread_should_stop()) {
-		wait_event_interruptible(netbk->wq,
-				rx_work_todo(netbk) ||
-				tx_work_todo(netbk) ||
-				kthread_should_stop());
-		cond_resched();
-
-		if (kthread_should_stop())
-			break;
-
-		if (rx_work_todo(netbk))
-			xen_netbk_rx_action(netbk);
-
-		if (tx_work_todo(netbk))
-			xen_netbk_tx_action(netbk);
-	}
-
-	return 0;
-}
-
 void xen_netbk_unmap_frontend_rings(struct xenvif *vif)
 {
 	if (vif->tx.sring)
@@ -1592,78 +1448,74 @@ err:
 	return err;
 }
 
-static int __init netback_init(void)
+struct xen_netbk *xen_netbk_alloc_netbk(struct xenvif *vif)
 {
 	int i;
-	int rc = 0;
-	int group;
-
-	if (!xen_domain())
-		return -ENODEV;
+	struct xen_netbk *netbk;
 
-	xen_netbk_group_nr = num_online_cpus();
-	xen_netbk = vzalloc(sizeof(struct xen_netbk) * xen_netbk_group_nr);
-	if (!xen_netbk) {
+	netbk = vzalloc(sizeof(struct xen_netbk));
+	if (!netbk) {
 		printk(KERN_ALERT "%s: out of memory\n", __func__);
-		return -ENOMEM;
+		return NULL;
 	}
 
-	for (group = 0; group < xen_netbk_group_nr; group++) {
-		struct xen_netbk *netbk = &xen_netbk[group];
-		skb_queue_head_init(&netbk->rx_queue);
-		skb_queue_head_init(&netbk->tx_queue);
-
-		init_timer(&netbk->net_timer);
-		netbk->net_timer.data = (unsigned long)netbk;
-		netbk->net_timer.function = xen_netbk_alarm;
-
-		netbk->pending_cons = 0;
-		netbk->pending_prod = MAX_PENDING_REQS;
-		for (i = 0; i < MAX_PENDING_REQS; i++)
-			netbk->pending_ring[i] = i;
-
-		init_waitqueue_head(&netbk->wq);
-		netbk->task = kthread_create(xen_netbk_kthread,
-					     (void *)netbk,
-					     "netback/%u", group);
-
-		if (IS_ERR(netbk->task)) {
-			printk(KERN_ALERT "kthread_create() fails at netback\n");
-			del_timer(&netbk->net_timer);
-			rc = PTR_ERR(netbk->task);
-			goto failed_init;
-		}
+	netbk->vif = vif;
 
-		kthread_bind(netbk->task, group);
+	skb_queue_head_init(&netbk->rx_queue);
+	skb_queue_head_init(&netbk->tx_queue);
 
-		INIT_LIST_HEAD(&netbk->net_schedule_list);
+	netbk->pending_cons = 0;
+	netbk->pending_prod = MAX_PENDING_REQS;
+	for (i = 0; i < MAX_PENDING_REQS; i++)
+		netbk->pending_ring[i] = i;
 
-		spin_lock_init(&netbk->net_schedule_list_lock);
+	for (i = 0; i < MAX_PENDING_REQS; i++)
+		netbk->mmap_pages[i] = INVALID_ENTRY;
 
-		atomic_set(&netbk->netfront_count, 0);
+	return netbk;
+}
 
-		wake_up_process(netbk->task);
+void xen_netbk_free_netbk(struct xen_netbk *netbk)
+{
+	vfree(netbk);
+}
+
+int xen_netbk_kthread(void *data)
+{
+	struct xenvif *vif = data;
+	struct xen_netbk *netbk = vif->netbk;
+
+	while (!kthread_should_stop()) {
+		wait_event_interruptible(vif->wq,
+					 rx_work_todo(netbk) ||
+					 kthread_should_stop());
+		cond_resched();
+
+		if (kthread_should_stop())
+			break;
+
+		if (rx_work_todo(netbk))
+			xen_netbk_rx_action(netbk);
 	}
 
+	return 0;
+}
+
+
+static int __init netback_init(void)
+{
+	int rc = 0;
+
+	if (!xen_domain())
+		return -ENODEV;
+
 	rc = page_pool_init();
 	if (rc)
 		goto failed_init;
 
-	rc = xenvif_xenbus_init();
-	if (rc)
-		goto pool_failed_init;
-
-	return 0;
+	return xenvif_xenbus_init();
 
-pool_failed_init:
-	page_pool_destroy();
 failed_init:
-	while (--group >= 0) {
-		struct xen_netbk *netbk = &xen_netbk[group];
-		del_timer(&netbk->net_timer);
-		kthread_stop(netbk->task);
-	}
-	vfree(xen_netbk);
 	return rc;
 
 }
@@ -1672,14 +1524,7 @@ module_init(netback_init);
 
 static void __exit netback_exit(void)
 {
-	int i;
 	xenvif_xenbus_exit();
-	for (i = 0; i < xen_netbk_group_nr; i++) {
-		struct xen_netbk *netbk = &xen_netbk[i];
-		del_timer_sync(&netbk->net_timer);
-		kthread_stop(netbk->task);
-	}
-	vfree(xen_netbk);
 	page_pool_destroy();
 }
 module_exit(netback_exit);
diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
index 65d14f2..f1e89ca 100644
--- a/drivers/net/xen-netback/xenbus.c
+++ b/drivers/net/xen-netback/xenbus.c
@@ -387,7 +387,6 @@ static void connect(struct backend_info *be)
 	netif_wake_queue(be->vif->dev);
 }
 
-
 static int connect_rings(struct backend_info *be)
 {
 	struct xenvif *vif = be->vif;
-- 
1.7.2.5


* [RFC PATCH V3 04/16] netback: switch to per-cpu scratch space.
  2012-01-30 14:45 [RFC PATCH V3] Xen netback / netfront improvement Wei Liu
                   ` (2 preceding siblings ...)
  2012-01-30 14:45 ` [RFC PATCH V3 03/16] netback: switch to NAPI + kthread model Wei Liu
@ 2012-01-30 14:45 ` Wei Liu
  2012-01-30 16:49   ` Viral Mehta
  2012-01-30 14:45 ` [RFC PATCH V3 05/16] netback: add module get/put operations along with vif connect/disconnect Wei Liu
                   ` (11 subsequent siblings)
  15 siblings, 1 reply; 59+ messages in thread
From: Wei Liu @ 2012-01-30 14:45 UTC (permalink / raw)
  To: netdev, xen-devel; +Cc: ian.campbell, konrad.wilk, Wei Liu

In the 1:1 model there are at most nr_online_cpus netbacks running, so
we can use per-cpu scratch space, thus shrinking the size of struct
xen_netbk.
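
For reference, a minimal sketch of the per-cpu scratch pattern relied
on here. The names below are illustrative only; the actual allocation
and setup happen elsewhere in this patch.

/* Illustrative only: one scratch area per CPU, pinned while in use. */
static struct gnttab_copy __percpu *scratch_copy_op;

static int scratch_init(void)
{
	scratch_copy_op = __alloc_percpu(sizeof(struct gnttab_copy) *
					 2 * XEN_NETIF_RX_RING_SIZE,
					 __alignof__(struct gnttab_copy));
	return scratch_copy_op ? 0 : -ENOMEM;
}

static void scratch_use(void)
{
	/* get_cpu_ptr() disables preemption, so the area stays ours */
	struct gnttab_copy *gco = get_cpu_ptr(scratch_copy_op);

	/* ... build and issue grant copy operations from gco ... */

	put_cpu_ptr(scratch_copy_op);
}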

Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 drivers/net/xen-netback/common.h  |   13 ++++
 drivers/net/xen-netback/netback.c |  134 ++++++++++++++++++++++++-------------
 2 files changed, 100 insertions(+), 47 deletions(-)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index 31c331c..3b85563 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -45,6 +45,19 @@
 #include <xen/grant_table.h>
 #include <xen/xenbus.h>
 
+struct netbk_rx_meta {
+	int id;
+	int size;
+	int gso_size;
+};
+
+#define MAX_PENDING_REQS 256
+
+/* Discriminate from any valid pending_idx value. */
+#define INVALID_PENDING_IDX 0xFFFF
+
+#define MAX_BUFFER_OFFSET PAGE_SIZE
+
 struct pending_tx_info {
 	struct xen_netif_tx_request req;
 };
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 9a72993..1c68afb 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -1,3 +1,4 @@
+
 /*
  * Back-end of the driver for virtual network devices. This portion of the
  * driver exports a 'unified' network-device interface that can be accessed
@@ -47,18 +48,17 @@
 #include <asm/xen/hypercall.h>
 #include <asm/xen/page.h>
 
-struct netbk_rx_meta {
-	int id;
-	int size;
-	int gso_size;
-};
 
-#define MAX_PENDING_REQS 256
+struct gnttab_copy *tx_copy_ops;
 
-/* Discriminate from any valid pending_idx value. */
-#define INVALID_PENDING_IDX 0xFFFF
+/*
+ * Given MAX_BUFFER_OFFSET of 4096 the worst case is that each
+ * head/fragment page uses 2 copy operations because it
+ * straddles two buffers in the frontend.
+ */
+struct gnttab_copy *grant_copy_op;
+struct netbk_rx_meta *meta;
 
-#define MAX_BUFFER_OFFSET PAGE_SIZE
 
 struct xen_netbk {
 	struct sk_buff_head rx_queue;
@@ -71,17 +71,7 @@ struct xen_netbk {
 
 	struct xenvif *vif;
 
-	struct gnttab_copy tx_copy_ops[MAX_PENDING_REQS];
-
 	u16 pending_ring[MAX_PENDING_REQS];
-
-	/*
-	 * Given MAX_BUFFER_OFFSET of 4096 the worst case is that each
-	 * head/fragment page uses 2 copy operations because it
-	 * straddles two buffers in the frontend.
-	 */
-	struct gnttab_copy grant_copy_op[2*XEN_NETIF_RX_RING_SIZE];
-	struct netbk_rx_meta meta[2*XEN_NETIF_RX_RING_SIZE];
 };
 
 static void xen_netbk_idx_release(struct xen_netbk *netbk, u16 pending_idx);
@@ -509,9 +499,12 @@ void xen_netbk_rx_action(struct xen_netbk *netbk)
 	struct skb_cb_overlay *sco;
 	int need_to_notify = 0;
 
+	struct gnttab_copy *gco = get_cpu_ptr(grant_copy_op);
+	struct netbk_rx_meta *m = get_cpu_ptr(meta);
+
 	struct netrx_pending_operations npo = {
-		.copy  = netbk->grant_copy_op,
-		.meta  = netbk->meta,
+		.copy  = gco,
+		.meta  = m,
 	};
 
 	skb_queue_head_init(&rxq);
@@ -534,13 +527,16 @@ void xen_netbk_rx_action(struct xen_netbk *netbk)
 			break;
 	}
 
-	BUG_ON(npo.meta_prod > ARRAY_SIZE(netbk->meta));
+	BUG_ON(npo.meta_prod > MAX_PENDING_REQS);
 
-	if (!npo.copy_prod)
+	if (!npo.copy_prod) {
+		put_cpu_ptr(gco);
+		put_cpu_ptr(m);
 		return;
+	}
 
-	BUG_ON(npo.copy_prod > ARRAY_SIZE(netbk->grant_copy_op));
-	ret = HYPERVISOR_grant_table_op(GNTTABOP_copy, &netbk->grant_copy_op,
+	BUG_ON(npo.copy_prod > (2 * XEN_NETIF_RX_RING_SIZE));
+	ret = HYPERVISOR_grant_table_op(GNTTABOP_copy, gco,
 					npo.copy_prod);
 	BUG_ON(ret != 0);
 
@@ -549,14 +545,14 @@ void xen_netbk_rx_action(struct xen_netbk *netbk)
 
 		vif = netdev_priv(skb->dev);
 
-		if (netbk->meta[npo.meta_cons].gso_size && vif->gso_prefix) {
+		if (m[npo.meta_cons].gso_size && vif->gso_prefix) {
 			resp = RING_GET_RESPONSE(&vif->rx,
 						vif->rx.rsp_prod_pvt++);
 
 			resp->flags = XEN_NETRXF_gso_prefix | XEN_NETRXF_more_data;
 
-			resp->offset = netbk->meta[npo.meta_cons].gso_size;
-			resp->id = netbk->meta[npo.meta_cons].id;
+			resp->offset = m[npo.meta_cons].gso_size;
+			resp->id = m[npo.meta_cons].id;
 			resp->status = sco->meta_slots_used;
 
 			npo.meta_cons++;
@@ -581,12 +577,12 @@ void xen_netbk_rx_action(struct xen_netbk *netbk)
 			flags |= XEN_NETRXF_data_validated;
 
 		offset = 0;
-		resp = make_rx_response(vif, netbk->meta[npo.meta_cons].id,
+		resp = make_rx_response(vif, m[npo.meta_cons].id,
 					status, offset,
-					netbk->meta[npo.meta_cons].size,
+					m[npo.meta_cons].size,
 					flags);
 
-		if (netbk->meta[npo.meta_cons].gso_size && !vif->gso_prefix) {
+		if (m[npo.meta_cons].gso_size && !vif->gso_prefix) {
 			struct xen_netif_extra_info *gso =
 				(struct xen_netif_extra_info *)
 				RING_GET_RESPONSE(&vif->rx,
@@ -594,7 +590,7 @@ void xen_netbk_rx_action(struct xen_netbk *netbk)
 
 			resp->flags |= XEN_NETRXF_extra_info;
 
-			gso->u.gso.size = netbk->meta[npo.meta_cons].gso_size;
+			gso->u.gso.size = m[npo.meta_cons].gso_size;
 			gso->u.gso.type = XEN_NETIF_GSO_TYPE_TCPV4;
 			gso->u.gso.pad = 0;
 			gso->u.gso.features = 0;
@@ -604,7 +600,7 @@ void xen_netbk_rx_action(struct xen_netbk *netbk)
 		}
 
 		netbk_add_frag_responses(vif, status,
-					 netbk->meta + npo.meta_cons + 1,
+					 m + npo.meta_cons + 1,
 					 sco->meta_slots_used);
 
 		RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(&vif->rx, ret);
@@ -622,6 +618,9 @@ void xen_netbk_rx_action(struct xen_netbk *netbk)
 
 	if (!skb_queue_empty(&netbk->rx_queue))
 		xen_netbk_kick_thread(netbk);
+
+	put_cpu_ptr(gco);
+	put_cpu_ptr(m);
 }
 
 void xen_netbk_queue_tx_skb(struct xenvif *vif, struct sk_buff *skb)
@@ -1053,9 +1052,10 @@ static bool tx_credit_exceeded(struct xenvif *vif, unsigned size)
 	return false;
 }
 
-static unsigned xen_netbk_tx_build_gops(struct xen_netbk *netbk)
+static unsigned xen_netbk_tx_build_gops(struct xen_netbk *netbk,
+					struct gnttab_copy *tco)
 {
-	struct gnttab_copy *gop = netbk->tx_copy_ops, *request_gop;
+	struct gnttab_copy *gop = tco, *request_gop;
 	struct sk_buff *skb;
 	int ret;
 	struct xenvif *vif = netbk->vif;
@@ -1215,17 +1215,18 @@ static unsigned xen_netbk_tx_build_gops(struct xen_netbk *netbk)
 		vif->tx.req_cons = idx;
 		xen_netbk_check_rx_xenvif(vif);
 
-		if ((gop-netbk->tx_copy_ops) >= ARRAY_SIZE(netbk->tx_copy_ops))
+		if ((gop - tco) >= MAX_PENDING_REQS)
 			break;
 	}
 
-	return gop - netbk->tx_copy_ops;
+	return gop - tco;
 }
 
 static void xen_netbk_tx_submit(struct xen_netbk *netbk,
+				struct gnttab_copy *tco,
 				int *work_done, int budget)
 {
-	struct gnttab_copy *gop = netbk->tx_copy_ops;
+	struct gnttab_copy *gop = tco;
 	struct sk_buff *skb;
 	struct xenvif *vif = netbk->vif;
 
@@ -1306,19 +1307,25 @@ void xen_netbk_tx_action(struct xen_netbk *netbk, int *work_done, int budget)
 {
 	unsigned nr_gops;
 	int ret;
+	struct gnttab_copy *tco;
 
 	if (unlikely(!tx_work_todo(netbk)))
 		return;
 
-	nr_gops = xen_netbk_tx_build_gops(netbk);
+	tco = get_cpu_ptr(tx_copy_ops);
 
-	if (nr_gops == 0)
-		return;
-	ret = HYPERVISOR_grant_table_op(GNTTABOP_copy,
-					netbk->tx_copy_ops, nr_gops);
+	nr_gops = xen_netbk_tx_build_gops(netbk, tco);
+
+	if (nr_gops == 0) {
+		put_cpu_ptr(tco);
+		return 0;
+	}
+
+	ret = HYPERVISOR_grant_table_op(GNTTABOP_copy, tco, nr_gops);
 	BUG_ON(ret);
 
-	xen_netbk_tx_submit(netbk, work_done, budget);
+	xen_netbk_tx_submit(netbk, tco, work_done, budget);
+	put_cpu_ptr(tco);
 }
 
 static void xen_netbk_idx_release(struct xen_netbk *netbk, u16 pending_idx)
@@ -1504,17 +1511,47 @@ int xen_netbk_kthread(void *data)
 
 static int __init netback_init(void)
 {
-	int rc = 0;
+	int rc = -ENOMEM;
 
 	if (!xen_domain())
 		return -ENODEV;
 
+	tx_copy_ops = __alloc_percpu(sizeof(struct gnttab_copy)
+				     * MAX_PENDING_REQS,
+				     __alignof__(struct gnttab_copy));
+	if (!tx_copy_ops)
+		goto failed_init;
+
+	grant_copy_op = __alloc_percpu(sizeof(struct gnttab_copy)
+				       * 2 * XEN_NETIF_RX_RING_SIZE,
+				       __alignof__(struct gnttab_copy));
+	if (!grant_copy_op)
+		goto failed_init_gco;
+
+	meta = __alloc_percpu(sizeof(struct netbk_rx_meta)
+			      * 2 * XEN_NETIF_RX_RING_SIZE,
+			      __alignof__(struct netbk_rx_meta));
+	if (!meta)
+		goto failed_init_meta;
+
 	rc = page_pool_init();
 	if (rc)
-		goto failed_init;
+		goto failed_init_pool;
+
+	rc = xenvif_xenbus_init();
+	if (rc)
+		goto failed_init_xenbus;
 
-	return xenvif_xenbus_init();
+	return rc;
 
+failed_init_xenbus:
+	page_pool_destroy();
+failed_init_pool:
+	free_percpu(meta);
+failed_init_meta:
+	free_percpu(grant_copy_op);
+failed_init_gco:
+	free_percpu(tx_copy_ops);
 failed_init:
 	return rc;
 
@@ -1526,6 +1563,9 @@ static void __exit netback_exit(void)
 {
 	xenvif_xenbus_exit();
 	page_pool_destroy();
+	free_percpu(meta);
+	free_percpu(grant_copy_op);
+	free_percpu(tx_copy_ops);
 }
 module_exit(netback_exit);
 
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [RFC PATCH V3 05/16] netback: add module get/put operations along with vif connect/disconnect.
  2012-01-30 14:45 [RFC PATCH V3] Xen netback / netfront improvement Wei Liu
                   ` (3 preceding siblings ...)
  2012-01-30 14:45 ` [RFC PATCH V3 04/16] netback: switch to per-cpu scratch space Wei Liu
@ 2012-01-30 14:45 ` Wei Liu
  2012-01-31 10:24   ` Ian Campbell
  2012-01-30 14:45 ` [RFC PATCH V3 06/16] netback: melt xen_netbk into xenvif Wei Liu
                   ` (10 subsequent siblings)
  15 siblings, 1 reply; 59+ messages in thread
From: Wei Liu @ 2012-01-30 14:45 UTC (permalink / raw)
  To: netdev, xen-devel; +Cc: ian.campbell, konrad.wilk, Wei Liu

If there is a vif running and the user unloads netback, it will
certainly cause problems -- the guest's network interface just
mysteriously stops working.

v2: fix module_put path

The disconnect function may be called by the generic framework even
before the vif connects.
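
In sketch form (with a hypothetical helper; the real change is the small
diff below): take a module reference when a vif connects, drop it on
connect failure or when an actually-connected vif is torn down, and skip
the put when disconnect runs for a vif that never connected.

int example_connect(struct xenvif *vif)
{
        int err;

        __module_get(THIS_MODULE);      /* pin netback while the vif is live */

        err = example_setup_rings_and_irq(vif); /* hypothetical helper */
        if (err) {
                module_put(THIS_MODULE);        /* connect failed: undo the get */
                return err;
        }
        return 0;
}

void example_disconnect(struct xenvif *vif)
{
        /* vif->irq is only set once connect succeeded, so use it to
         * decide whether a module reference was ever taken. */
        if (vif->irq) {
                unbind_from_irqhandler(vif->irq, vif);
                module_put(THIS_MODULE);
        }
}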

Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 drivers/net/xen-netback/interface.c |   11 ++++++++++-
 1 files changed, 10 insertions(+), 1 deletions(-)

diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index dfc04f8..7914f60 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -323,6 +323,8 @@ int xenvif_connect(struct xenvif *vif, unsigned long tx_ring_ref,
 	if (vif->irq)
 		return 0;
 
+	__module_get(THIS_MODULE);
+
 	err = xen_netbk_map_frontend_rings(vif, tx_ring_ref, rx_ring_ref);
 	if (err < 0)
 		goto err;
@@ -372,12 +374,14 @@ err_unbind:
 err_unmap:
 	xen_netbk_unmap_frontend_rings(vif);
 err:
+	module_put(THIS_MODULE);
 	return err;
 }
 
 void xenvif_disconnect(struct xenvif *vif)
 {
 	struct net_device *dev = vif->dev;
+	int need_module_put = 0;
 
 	if (netif_carrier_ok(dev)) {
 		rtnl_lock();
@@ -397,12 +401,17 @@ void xenvif_disconnect(struct xenvif *vif)
 
 	del_timer_sync(&vif->credit_timeout);
 
-	if (vif->irq)
+	if (vif->irq) {
 		unbind_from_irqhandler(vif->irq, vif);
+		need_module_put = 1;
+	}
 
 	unregister_netdev(vif->dev);
 
 	xen_netbk_unmap_frontend_rings(vif);
 
 	free_netdev(vif->dev);
+
+	if (need_module_put)
+		module_put(THIS_MODULE);
 }
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [RFC PATCH V3 06/16] netback: melt xen_netbk into xenvif
  2012-01-30 14:45 [RFC PATCH V3] Xen netback / netfront improvement Wei Liu
                   ` (4 preceding siblings ...)
  2012-01-30 14:45 ` [RFC PATCH V3 05/16] netback: add module get/put operations along with vif connect/disconnect Wei Liu
@ 2012-01-30 14:45 ` Wei Liu
  2012-01-30 14:45 ` [RFC PATCH V3 07/16] netback: alter internal function/structure names Wei Liu
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 59+ messages in thread
From: Wei Liu @ 2012-01-30 14:45 UTC (permalink / raw)
  To: netdev, xen-devel; +Cc: ian.campbell, konrad.wilk, Wei Liu

In the 1:1 model, there is no need to keep xen_netbk and xenvif
separated.
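
In other words, the per-netback state folds straight into struct xenvif;
abridged from the diff below, the result looks like:

struct xenvif {
        domid_t          domid;
        unsigned int     handle;
        /* ... existing vif fields ... */

        /* formerly in struct xen_netbk */
        struct sk_buff_head rx_queue;
        struct sk_buff_head tx_queue;
        idx_t mmap_pages[MAX_PENDING_REQS];
        pending_ring_idx_t pending_prod;
        pending_ring_idx_t pending_cons;
        u16 pending_ring[MAX_PENDING_REQS];
};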

Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 drivers/net/xen-netback/common.h    |   36 +++---
 drivers/net/xen-netback/interface.c |   36 +++----
 drivers/net/xen-netback/netback.c   |  213 +++++++++++++----------------------
 drivers/net/xen-netback/page_pool.c |   10 +-
 drivers/net/xen-netback/page_pool.h |   13 ++-
 5 files changed, 122 insertions(+), 186 deletions(-)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index 3b85563..17d4e1a 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -45,34 +45,29 @@
 #include <xen/grant_table.h>
 #include <xen/xenbus.h>
 
+#include "page_pool.h"
+
 struct netbk_rx_meta {
 	int id;
 	int size;
 	int gso_size;
 };
 
-#define MAX_PENDING_REQS 256
-
 /* Discriminate from any valid pending_idx value. */
 #define INVALID_PENDING_IDX 0xFFFF
 
 #define MAX_BUFFER_OFFSET PAGE_SIZE
 
-struct pending_tx_info {
-	struct xen_netif_tx_request req;
-};
-typedef unsigned int pending_ring_idx_t;
+#define XEN_NETIF_TX_RING_SIZE __CONST_RING_SIZE(xen_netif_tx, PAGE_SIZE)
+#define XEN_NETIF_RX_RING_SIZE __CONST_RING_SIZE(xen_netif_rx, PAGE_SIZE)
 
-struct xen_netbk;
+#define MAX_PENDING_REQS 256
 
 struct xenvif {
 	/* Unique identifier for this interface. */
 	domid_t          domid;
 	unsigned int     handle;
 
-	/* Reference to netback processing backend. */
-	struct xen_netbk *netbk;
-
 	/* Use NAPI for guest TX */
 	struct napi_struct napi;
 	/* Use kthread for guest RX */
@@ -115,6 +110,16 @@ struct xenvif {
 
 	/* Miscellaneous private stuff. */
 	struct net_device *dev;
+
+	struct sk_buff_head rx_queue;
+	struct sk_buff_head tx_queue;
+
+	idx_t mmap_pages[MAX_PENDING_REQS];
+
+	pending_ring_idx_t pending_prod;
+	pending_ring_idx_t pending_cons;
+
+	u16 pending_ring[MAX_PENDING_REQS];
 };
 
 static inline struct xenbus_device *xenvif_to_xenbus_device(struct xenvif *vif)
@@ -122,9 +127,6 @@ static inline struct xenbus_device *xenvif_to_xenbus_device(struct xenvif *vif)
 	return to_xenbus_device(vif->dev->dev.parent);
 }
 
-#define XEN_NETIF_TX_RING_SIZE __CONST_RING_SIZE(xen_netif_tx, PAGE_SIZE)
-#define XEN_NETIF_RX_RING_SIZE __CONST_RING_SIZE(xen_netif_rx, PAGE_SIZE)
-
 struct xenvif *xenvif_alloc(struct device *parent,
 			    domid_t domid,
 			    unsigned int handle);
@@ -161,12 +163,8 @@ void xenvif_notify_tx_completion(struct xenvif *vif);
 /* Returns number of ring slots required to send an skb to the frontend */
 unsigned int xen_netbk_count_skb_slots(struct xenvif *vif, struct sk_buff *skb);
 
-/* Allocate and free xen_netbk structure */
-struct xen_netbk *xen_netbk_alloc_netbk(struct xenvif *vif);
-void xen_netbk_free_netbk(struct xen_netbk *netbk);
-
-void xen_netbk_tx_action(struct xen_netbk *netbk, int *work_done, int budget);
-void xen_netbk_rx_action(struct xen_netbk *netbk);
+void xen_netbk_tx_action(struct xenvif *vif, int *work_done, int budget);
+void xen_netbk_rx_action(struct xenvif *vif);
 
 int xen_netbk_kthread(void *data);
 
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index 7914f60..3c004fa 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -55,9 +55,6 @@ static irqreturn_t xenvif_interrupt(int irq, void *dev_id)
 {
 	struct xenvif *vif = dev_id;
 
-	if (vif->netbk == NULL)
-		return IRQ_NONE;
-
 	if (xenvif_rx_schedulable(vif))
 		netif_wake_queue(vif->dev);
 
@@ -72,7 +69,7 @@ static int xenvif_poll(struct napi_struct *napi, int budget)
 	struct xenvif *vif = container_of(napi, struct xenvif, napi);
 	int work_done = 0;
 
-	xen_netbk_tx_action(vif->netbk, &work_done, budget);
+	xen_netbk_tx_action(vif, &work_done, budget);
 
 	if (work_done < budget) {
 		int more_to_do = 0;
@@ -95,7 +92,8 @@ static int xenvif_start_xmit(struct sk_buff *skb, struct net_device *dev)
 
 	BUG_ON(skb->dev != dev);
 
-	if (vif->netbk == NULL)
+	/* Drop the packet if vif is not ready */
+	if (vif->task == NULL)
 		goto drop;
 
 	/* Drop the packet if the target domain has no receive buffers. */
@@ -257,6 +255,7 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
 	int err;
 	struct net_device *dev;
 	struct xenvif *vif;
+	int i;
 	char name[IFNAMSIZ] = {};
 
 	snprintf(name, IFNAMSIZ - 1, "vif%u.%u", domid, handle);
@@ -271,7 +270,6 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
 	vif = netdev_priv(dev);
 	vif->domid  = domid;
 	vif->handle = handle;
-	vif->netbk = NULL;
 
 	vif->can_sg = 1;
 	vif->csum = 1;
@@ -290,6 +288,17 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
 
 	dev->tx_queue_len = XENVIF_QUEUE_LENGTH;
 
+	skb_queue_head_init(&vif->rx_queue);
+	skb_queue_head_init(&vif->tx_queue);
+
+	vif->pending_cons = 0;
+	vif->pending_prod = MAX_PENDING_REQS;
+	for (i = 0; i < MAX_PENDING_REQS; i++)
+		vif->pending_ring[i] = i;
+
+	for (i = 0; i < MAX_PENDING_REQS; i++)
+		vif->mmap_pages[i] = INVALID_ENTRY;
+
 	/*
 	 * Initialise a dummy MAC address. We choose the numerically
 	 * largest non-broadcast address to prevent the address getting
@@ -337,14 +346,6 @@ int xenvif_connect(struct xenvif *vif, unsigned long tx_ring_ref,
 	vif->irq = err;
 	disable_irq(vif->irq);
 
-	vif->netbk = xen_netbk_alloc_netbk(vif);
-	if (!vif->netbk) {
-		pr_warn("Could not allocate xen_netbk\n");
-		err = -ENOMEM;
-		goto err_unbind;
-	}
-
-
 	init_waitqueue_head(&vif->wq);
 	vif->task = kthread_create(xen_netbk_kthread,
 				   (void *)vif,
@@ -352,7 +353,7 @@ int xenvif_connect(struct xenvif *vif, unsigned long tx_ring_ref,
 	if (IS_ERR(vif->task)) {
 		pr_warn("Could not create kthread\n");
 		err = PTR_ERR(vif->task);
-		goto err_free_netbk;
+		goto err_unbind;
 	}
 
 	rtnl_lock();
@@ -367,8 +368,6 @@ int xenvif_connect(struct xenvif *vif, unsigned long tx_ring_ref,
 	wake_up_process(vif->task);
 
 	return 0;
-err_free_netbk:
-	xen_netbk_free_netbk(vif->netbk);
 err_unbind:
 	unbind_from_irqhandler(vif->irq, vif);
 err_unmap:
@@ -394,9 +393,6 @@ void xenvif_disconnect(struct xenvif *vif)
 	if (vif->task)
 		kthread_stop(vif->task);
 
-	if (vif->netbk)
-		xen_netbk_free_netbk(vif->netbk);
-
 	netif_napi_del(&vif->napi);
 
 	del_timer_sync(&vif->credit_timeout);
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 1c68afb..0a52bb1 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -59,28 +59,13 @@ struct gnttab_copy *tx_copy_ops;
 struct gnttab_copy *grant_copy_op;
 struct netbk_rx_meta *meta;
 
-
-struct xen_netbk {
-	struct sk_buff_head rx_queue;
-	struct sk_buff_head tx_queue;
-
-	idx_t mmap_pages[MAX_PENDING_REQS];
-
-	pending_ring_idx_t pending_prod;
-	pending_ring_idx_t pending_cons;
-
-	struct xenvif *vif;
-
-	u16 pending_ring[MAX_PENDING_REQS];
-};
-
-static void xen_netbk_idx_release(struct xen_netbk *netbk, u16 pending_idx);
+static void xen_netbk_idx_release(struct xenvif *vif, u16 pending_idx);
 static void make_tx_response(struct xenvif *vif,
 			     struct xen_netif_tx_request *txp,
 			     s8       st);
 
-static inline int tx_work_todo(struct xen_netbk *netbk);
-static inline int rx_work_todo(struct xen_netbk *netbk);
+static inline int tx_work_todo(struct xenvif *vif);
+static inline int rx_work_todo(struct xenvif *vif);
 
 static struct xen_netif_rx_response *make_rx_response(struct xenvif *vif,
 					     u16      id,
@@ -89,16 +74,16 @@ static struct xen_netif_rx_response *make_rx_response(struct xenvif *vif,
 					     u16      size,
 					     u16      flags);
 
-static inline unsigned long idx_to_pfn(struct xen_netbk *netbk,
+static inline unsigned long idx_to_pfn(struct xenvif *vif,
 				       u16 idx)
 {
-	return page_to_pfn(to_page(netbk->mmap_pages[idx]));
+	return page_to_pfn(to_page(vif->mmap_pages[idx]));
 }
 
-static inline unsigned long idx_to_kaddr(struct xen_netbk *netbk,
+static inline unsigned long idx_to_kaddr(struct xenvif *vif,
 					 u16 idx)
 {
-	return (unsigned long)pfn_to_kaddr(idx_to_pfn(netbk, idx));
+	return (unsigned long)pfn_to_kaddr(idx_to_pfn(vif, idx));
 }
 
 /*
@@ -126,10 +111,10 @@ static inline pending_ring_idx_t pending_index(unsigned i)
 	return i & (MAX_PENDING_REQS-1);
 }
 
-static inline pending_ring_idx_t nr_pending_reqs(struct xen_netbk *netbk)
+static inline pending_ring_idx_t nr_pending_reqs(struct xenvif *vif)
 {
 	return MAX_PENDING_REQS -
-		netbk->pending_prod + netbk->pending_cons;
+		vif->pending_prod + vif->pending_cons;
 }
 
 static int max_required_rx_slots(struct xenvif *vif)
@@ -316,9 +301,9 @@ static void netbk_gop_frag_copy(struct xenvif *vif, struct sk_buff *skb,
 		copy_gop->flags = GNTCOPY_dest_gref;
 		if (foreign) {
 			struct pending_tx_info *src_pend = to_txinfo(idx);
-			struct xen_netbk *rnetbk = to_netbk(idx);
+			struct xenvif *rvif = to_vif(idx);
 
-			copy_gop->source.domid = rnetbk->vif->domid;
+			copy_gop->source.domid = rvif->domid;
 			copy_gop->source.u.ref = src_pend->req.gref;
 			copy_gop->flags |= GNTCOPY_source_gref;
 		} else {
@@ -476,16 +461,13 @@ struct skb_cb_overlay {
 	int meta_slots_used;
 };
 
-static void xen_netbk_kick_thread(struct xen_netbk *netbk)
+static void xen_netbk_kick_thread(struct xenvif *vif)
 {
-	struct xenvif *vif = netbk->vif;
-
 	wake_up(&vif->wq);
 }
 
-void xen_netbk_rx_action(struct xen_netbk *netbk)
+void xen_netbk_rx_action(struct xenvif *vif)
 {
-	struct xenvif *vif = NULL;
 	s8 status;
 	u16 flags;
 	struct xen_netif_rx_response *resp;
@@ -511,7 +493,7 @@ void xen_netbk_rx_action(struct xen_netbk *netbk)
 
 	count = 0;
 
-	while ((skb = skb_dequeue(&netbk->rx_queue)) != NULL) {
+	while ((skb = skb_dequeue(&vif->rx_queue)) != NULL) {
 		vif = netdev_priv(skb->dev);
 		nr_frags = skb_shinfo(skb)->nr_frags;
 
@@ -543,8 +525,6 @@ void xen_netbk_rx_action(struct xen_netbk *netbk)
 	while ((skb = __skb_dequeue(&rxq)) != NULL) {
 		sco = (struct skb_cb_overlay *)skb->cb;
 
-		vif = netdev_priv(skb->dev);
-
 		if (m[npo.meta_cons].gso_size && vif->gso_prefix) {
 			resp = RING_GET_RESPONSE(&vif->rx,
 						vif->rx.rsp_prod_pvt++);
@@ -616,8 +596,8 @@ void xen_netbk_rx_action(struct xen_netbk *netbk)
 	if (need_to_notify)
 		notify_remote_via_irq(vif->irq);
 
-	if (!skb_queue_empty(&netbk->rx_queue))
-		xen_netbk_kick_thread(netbk);
+	if (!skb_queue_empty(&vif->rx_queue))
+		xen_netbk_kick_thread(vif);
 
 	put_cpu_ptr(gco);
 	put_cpu_ptr(m);
@@ -625,11 +605,9 @@ void xen_netbk_rx_action(struct xen_netbk *netbk)
 
 void xen_netbk_queue_tx_skb(struct xenvif *vif, struct sk_buff *skb)
 {
-	struct xen_netbk *netbk = vif->netbk;
+	skb_queue_tail(&vif->rx_queue, skb);
 
-	skb_queue_tail(&netbk->rx_queue, skb);
-
-	xen_netbk_kick_thread(netbk);
+	xen_netbk_kick_thread(vif);
 }
 
 void xen_netbk_check_rx_xenvif(struct xenvif *vif)
@@ -728,21 +706,20 @@ static int netbk_count_requests(struct xenvif *vif,
 	return frags;
 }
 
-static struct page *xen_netbk_alloc_page(struct xen_netbk *netbk,
+static struct page *xen_netbk_alloc_page(struct xenvif *vif,
 					 struct sk_buff *skb,
 					 u16 pending_idx)
 {
 	struct page *page;
 	int idx;
-	page = page_pool_get(netbk, &idx);
+	page = page_pool_get(vif, &idx);
 	if (!page)
 		return NULL;
-	netbk->mmap_pages[pending_idx] = idx;
+	vif->mmap_pages[pending_idx] = idx;
 	return page;
 }
 
-static struct gnttab_copy *xen_netbk_get_requests(struct xen_netbk *netbk,
-						  struct xenvif *vif,
+static struct gnttab_copy *xen_netbk_get_requests(struct xenvif *vif,
 						  struct sk_buff *skb,
 						  struct xen_netif_tx_request *txp,
 						  struct gnttab_copy *gop)
@@ -761,13 +738,13 @@ static struct gnttab_copy *xen_netbk_get_requests(struct xen_netbk *netbk,
 		int idx;
 		struct pending_tx_info *pending_tx_info;
 
-		index = pending_index(netbk->pending_cons++);
-		pending_idx = netbk->pending_ring[index];
-		page = xen_netbk_alloc_page(netbk, skb, pending_idx);
+		index = pending_index(vif->pending_cons++);
+		pending_idx = vif->pending_ring[index];
+		page = xen_netbk_alloc_page(vif, skb, pending_idx);
 		if (!page)
 			return NULL;
 
-		idx = netbk->mmap_pages[pending_idx];
+		idx = vif->mmap_pages[pending_idx];
 		pending_tx_info = to_txinfo(idx);
 
 		gop->source.u.ref = txp->gref;
@@ -791,7 +768,7 @@ static struct gnttab_copy *xen_netbk_get_requests(struct xen_netbk *netbk,
 	return gop;
 }
 
-static int xen_netbk_tx_check_gop(struct xen_netbk *netbk,
+static int xen_netbk_tx_check_gop(struct xenvif *vif,
 				  struct sk_buff *skb,
 				  struct gnttab_copy **gopp)
 {
@@ -799,8 +776,6 @@ static int xen_netbk_tx_check_gop(struct xen_netbk *netbk,
 	u16 pending_idx = *((u16 *)skb->data);
 	struct pending_tx_info *pending_tx_info;
 	int idx;
-	struct xenvif *vif = netbk->vif;
-
 	struct xen_netif_tx_request *txp;
 	struct skb_shared_info *shinfo = skb_shinfo(skb);
 	int nr_frags = shinfo->nr_frags;
@@ -810,12 +785,12 @@ static int xen_netbk_tx_check_gop(struct xen_netbk *netbk,
 	err = gop->status;
 	if (unlikely(err)) {
 		pending_ring_idx_t index;
-		index = pending_index(netbk->pending_prod++);
-		idx = netbk->mmap_pages[index];
+		index = pending_index(vif->pending_prod++);
+		idx = vif->mmap_pages[index];
 		pending_tx_info = to_txinfo(idx);
 		txp = &pending_tx_info->req;
 		make_tx_response(vif, txp, XEN_NETIF_RSP_ERROR);
-		netbk->pending_ring[index] = pending_idx;
+		vif->pending_ring[index] = pending_idx;
 	}
 
 	/* Skip first skb fragment if it is on same page as header fragment. */
@@ -832,16 +807,16 @@ static int xen_netbk_tx_check_gop(struct xen_netbk *netbk,
 		if (likely(!newerr)) {
 			/* Had a previous error? Invalidate this fragment. */
 			if (unlikely(err))
-				xen_netbk_idx_release(netbk, pending_idx);
+				xen_netbk_idx_release(vif, pending_idx);
 			continue;
 		}
 
 		/* Error on this fragment: respond to client with an error. */
-		idx = netbk->mmap_pages[pending_idx];
+		idx = vif->mmap_pages[pending_idx];
 		txp = &to_txinfo(idx)->req;
 		make_tx_response(vif, txp, XEN_NETIF_RSP_ERROR);
-		index = pending_index(netbk->pending_prod++);
-		netbk->pending_ring[index] = pending_idx;
+		index = pending_index(vif->pending_prod++);
+		vif->pending_ring[index] = pending_idx;
 
 		/* Not the first error? Preceding frags already invalidated. */
 		if (err)
@@ -849,10 +824,10 @@ static int xen_netbk_tx_check_gop(struct xen_netbk *netbk,
 
 		/* First error: invalidate header and preceding fragments. */
 		pending_idx = *((u16 *)skb->data);
-		xen_netbk_idx_release(netbk, pending_idx);
+		xen_netbk_idx_release(vif, pending_idx);
 		for (j = start; j < i; j++) {
 			pending_idx = frag_get_pending_idx(&shinfo->frags[j]);
-			xen_netbk_idx_release(netbk, pending_idx);
+			xen_netbk_idx_release(vif, pending_idx);
 		}
 
 		/* Remember the error: invalidate all subsequent fragments. */
@@ -863,7 +838,7 @@ static int xen_netbk_tx_check_gop(struct xen_netbk *netbk,
 	return err;
 }
 
-static void xen_netbk_fill_frags(struct xen_netbk *netbk, struct sk_buff *skb)
+static void xen_netbk_fill_frags(struct xenvif *vif, struct sk_buff *skb)
 {
 	struct skb_shared_info *shinfo = skb_shinfo(skb);
 	int nr_frags = shinfo->nr_frags;
@@ -879,11 +854,11 @@ static void xen_netbk_fill_frags(struct xen_netbk *netbk, struct sk_buff *skb)
 
 		pending_idx = frag_get_pending_idx(frag);
 
-		idx = netbk->mmap_pages[pending_idx];
+		idx = vif->mmap_pages[pending_idx];
 		pending_tx_info = to_txinfo(idx);
 
 		txp = &pending_tx_info->req;
-		page = virt_to_page(idx_to_kaddr(netbk, pending_idx));
+		page = virt_to_page(idx_to_kaddr(vif, pending_idx));
 		__skb_fill_page_desc(skb, i, page, txp->offset, txp->size);
 		skb->len += txp->size;
 		skb->data_len += txp->size;
@@ -891,7 +866,7 @@ static void xen_netbk_fill_frags(struct xen_netbk *netbk, struct sk_buff *skb)
 
 		/* Take an extra reference to offset xen_netbk_idx_release */
 		get_page(page);
-		xen_netbk_idx_release(netbk, pending_idx);
+		xen_netbk_idx_release(vif, pending_idx);
 	}
 }
 
@@ -1052,15 +1027,14 @@ static bool tx_credit_exceeded(struct xenvif *vif, unsigned size)
 	return false;
 }
 
-static unsigned xen_netbk_tx_build_gops(struct xen_netbk *netbk,
+static unsigned xen_netbk_tx_build_gops(struct xenvif *vif,
 					struct gnttab_copy *tco)
 {
 	struct gnttab_copy *gop = tco, *request_gop;
 	struct sk_buff *skb;
 	int ret;
-	struct xenvif *vif = netbk->vif;
 
-	while ((nr_pending_reqs(netbk) + MAX_SKB_FRAGS) < MAX_PENDING_REQS) {
+	while ((nr_pending_reqs(vif) + MAX_SKB_FRAGS) < MAX_PENDING_REQS) {
 		struct xen_netif_tx_request txreq;
 		struct xen_netif_tx_request txfrags[MAX_SKB_FRAGS];
 		struct page *page;
@@ -1128,8 +1102,8 @@ static unsigned xen_netbk_tx_build_gops(struct xen_netbk *netbk,
 			break;
 		}
 
-		index = pending_index(netbk->pending_cons);
-		pending_idx = netbk->pending_ring[index];
+		index = pending_index(vif->pending_cons);
+		pending_idx = vif->pending_ring[index];
 
 		data_len = (txreq.size > PKT_PROT_LEN &&
 			    ret < MAX_SKB_FRAGS) ?
@@ -1159,7 +1133,7 @@ static unsigned xen_netbk_tx_build_gops(struct xen_netbk *netbk,
 		}
 
 		/* XXX could copy straight to head */
-		page = xen_netbk_alloc_page(netbk, skb, pending_idx);
+		page = xen_netbk_alloc_page(vif, skb, pending_idx);
 		if (!page) {
 			kfree_skb(skb);
 			netbk_tx_err(vif, &txreq, idx);
@@ -1179,7 +1153,7 @@ static unsigned xen_netbk_tx_build_gops(struct xen_netbk *netbk,
 
 		gop++;
 
-		pool_idx = netbk->mmap_pages[pending_idx];
+		pool_idx = vif->mmap_pages[pending_idx];
 		pending_tx_info = to_txinfo(pool_idx);
 
 		memcpy(&pending_tx_info->req,
@@ -1199,11 +1173,11 @@ static unsigned xen_netbk_tx_build_gops(struct xen_netbk *netbk,
 					     INVALID_PENDING_IDX);
 		}
 
-		__skb_queue_tail(&netbk->tx_queue, skb);
+		__skb_queue_tail(&vif->tx_queue, skb);
 
-		netbk->pending_cons++;
+		vif->pending_cons++;
 
-		request_gop = xen_netbk_get_requests(netbk, vif,
+		request_gop = xen_netbk_get_requests(vif,
 						     skb, txfrags, gop);
 		if (request_gop == NULL) {
 			kfree_skb(skb);
@@ -1222,16 +1196,15 @@ static unsigned xen_netbk_tx_build_gops(struct xen_netbk *netbk,
 	return gop - tco;
 }
 
-static void xen_netbk_tx_submit(struct xen_netbk *netbk,
+static void xen_netbk_tx_submit(struct xenvif *vif,
 				struct gnttab_copy *tco,
 				int *work_done, int budget)
 {
 	struct gnttab_copy *gop = tco;
 	struct sk_buff *skb;
-	struct xenvif *vif = netbk->vif;
 
 	while ((*work_done < budget) &&
-	       (skb = __skb_dequeue(&netbk->tx_queue)) != NULL) {
+	       (skb = __skb_dequeue(&vif->tx_queue)) != NULL) {
 		struct xen_netif_tx_request *txp;
 		u16 pending_idx;
 		unsigned data_len;
@@ -1240,13 +1213,13 @@ static void xen_netbk_tx_submit(struct xen_netbk *netbk,
 
 		pending_idx = *((u16 *)skb->data);
 
-		idx = netbk->mmap_pages[pending_idx];
+		idx = vif->mmap_pages[pending_idx];
 		pending_tx_info = to_txinfo(idx);
 
 		txp = &pending_tx_info->req;
 
 		/* Check the remap error code. */
-		if (unlikely(xen_netbk_tx_check_gop(netbk, skb, &gop))) {
+		if (unlikely(xen_netbk_tx_check_gop(vif, skb, &gop))) {
 			netdev_dbg(vif->dev, "netback grant failed.\n");
 			skb_shinfo(skb)->nr_frags = 0;
 			kfree_skb(skb);
@@ -1255,7 +1228,7 @@ static void xen_netbk_tx_submit(struct xen_netbk *netbk,
 
 		data_len = skb->len;
 		memcpy(skb->data,
-		       (void *)(idx_to_kaddr(netbk, pending_idx)|txp->offset),
+		       (void *)(idx_to_kaddr(vif, pending_idx)|txp->offset),
 		       data_len);
 		if (data_len < txp->size) {
 			/* Append the packet payload as a fragment. */
@@ -1263,7 +1236,7 @@ static void xen_netbk_tx_submit(struct xen_netbk *netbk,
 			txp->size -= data_len;
 		} else {
 			/* Schedule a response immediately. */
-			xen_netbk_idx_release(netbk, pending_idx);
+			xen_netbk_idx_release(vif, pending_idx);
 		}
 
 		if (txp->flags & XEN_NETTXF_csum_blank)
@@ -1271,7 +1244,7 @@ static void xen_netbk_tx_submit(struct xen_netbk *netbk,
 		else if (txp->flags & XEN_NETTXF_data_validated)
 			skb->ip_summed = CHECKSUM_UNNECESSARY;
 
-		xen_netbk_fill_frags(netbk, skb);
+		xen_netbk_fill_frags(vif, skb);
 
 		/*
 		 * If the initial fragment was < PKT_PROT_LEN then
@@ -1303,53 +1276,52 @@ static void xen_netbk_tx_submit(struct xen_netbk *netbk,
 }
 
 /* Called after netfront has transmitted */
-void xen_netbk_tx_action(struct xen_netbk *netbk, int *work_done, int budget)
+void xen_netbk_tx_action(struct xenvif *vif, int *work_done, int budget)
 {
 	unsigned nr_gops;
 	int ret;
 	struct gnttab_copy *tco;
 
-	if (unlikely(!tx_work_todo(netbk)))
+	if (unlikely(!tx_work_todo(vif)))
 		return;
 
 	tco = get_cpu_ptr(tx_copy_ops);
 
-	nr_gops = xen_netbk_tx_build_gops(netbk, tco);
+	nr_gops = xen_netbk_tx_build_gops(vif, tco);
 
 	if (nr_gops == 0) {
 		put_cpu_ptr(tco);
-		return 0;
+		return;
 	}
 
 	ret = HYPERVISOR_grant_table_op(GNTTABOP_copy, tco, nr_gops);
 	BUG_ON(ret);
 
-	xen_netbk_tx_submit(netbk, tco, work_done, budget);
+	xen_netbk_tx_submit(vif, tco, work_done, budget);
 	put_cpu_ptr(tco);
 }
 
-static void xen_netbk_idx_release(struct xen_netbk *netbk, u16 pending_idx)
+static void xen_netbk_idx_release(struct xenvif *vif, u16 pending_idx)
 {
-	struct xenvif *vif = netbk->vif;
 	struct pending_tx_info *pending_tx_info;
 	pending_ring_idx_t index;
 	int idx;
 
 	/* Already complete? */
-	if (netbk->mmap_pages[pending_idx] == INVALID_ENTRY)
+	if (vif->mmap_pages[pending_idx] == INVALID_ENTRY)
 		return;
 
-	idx = netbk->mmap_pages[pending_idx];
+	idx = vif->mmap_pages[pending_idx];
 	pending_tx_info = to_txinfo(idx);
 
 	make_tx_response(vif, &pending_tx_info->req, XEN_NETIF_RSP_OKAY);
 
-	index = pending_index(netbk->pending_prod++);
-	netbk->pending_ring[index] = pending_idx;
+	index = pending_index(vif->pending_prod++);
+	vif->pending_ring[index] = pending_idx;
 
-	page_pool_put(netbk->mmap_pages[pending_idx]);
+	page_pool_put(vif->mmap_pages[pending_idx]);
 
-	netbk->mmap_pages[pending_idx] = INVALID_ENTRY;
+	vif->mmap_pages[pending_idx] = INVALID_ENTRY;
 }
 
 static void make_tx_response(struct xenvif *vif,
@@ -1396,15 +1368,15 @@ static struct xen_netif_rx_response *make_rx_response(struct xenvif *vif,
 	return resp;
 }
 
-static inline int rx_work_todo(struct xen_netbk *netbk)
+static inline int rx_work_todo(struct xenvif *vif)
 {
-	return !skb_queue_empty(&netbk->rx_queue);
+	return !skb_queue_empty(&vif->rx_queue);
 }
 
-static inline int tx_work_todo(struct xen_netbk *netbk)
+static inline int tx_work_todo(struct xenvif *vif)
 {
-	if (likely(RING_HAS_UNCONSUMED_REQUESTS(&netbk->vif->tx)) &&
-	    (nr_pending_reqs(netbk) + MAX_SKB_FRAGS) < MAX_PENDING_REQS)
+	if (likely(RING_HAS_UNCONSUMED_REQUESTS(&vif->tx)) &&
+	    (nr_pending_reqs(vif) + MAX_SKB_FRAGS) < MAX_PENDING_REQS)
 		return 1;
 
 	return 0;
@@ -1455,54 +1427,21 @@ err:
 	return err;
 }
 
-struct xen_netbk *xen_netbk_alloc_netbk(struct xenvif *vif)
-{
-	int i;
-	struct xen_netbk *netbk;
-
-	netbk = vzalloc(sizeof(struct xen_netbk));
-	if (!netbk) {
-		printk(KERN_ALERT "%s: out of memory\n", __func__);
-		return NULL;
-	}
-
-	netbk->vif = vif;
-
-	skb_queue_head_init(&netbk->rx_queue);
-	skb_queue_head_init(&netbk->tx_queue);
-
-	netbk->pending_cons = 0;
-	netbk->pending_prod = MAX_PENDING_REQS;
-	for (i = 0; i < MAX_PENDING_REQS; i++)
-		netbk->pending_ring[i] = i;
-
-	for (i = 0; i < MAX_PENDING_REQS; i++)
-		netbk->mmap_pages[i] = INVALID_ENTRY;
-
-	return netbk;
-}
-
-void xen_netbk_free_netbk(struct xen_netbk *netbk)
-{
-	vfree(netbk);
-}
-
 int xen_netbk_kthread(void *data)
 {
 	struct xenvif *vif = data;
-	struct xen_netbk *netbk = vif->netbk;
 
 	while (!kthread_should_stop()) {
 		wait_event_interruptible(vif->wq,
-					 rx_work_todo(netbk) ||
+					 rx_work_todo(vif) ||
 					 kthread_should_stop());
 		cond_resched();
 
 		if (kthread_should_stop())
 			break;
 
-		if (rx_work_todo(netbk))
-			xen_netbk_rx_action(netbk);
+		if (rx_work_todo(vif))
+			xen_netbk_rx_action(vif);
 	}
 
 	return 0;
diff --git a/drivers/net/xen-netback/page_pool.c b/drivers/net/xen-netback/page_pool.c
index 294f48b..ce00a93 100644
--- a/drivers/net/xen-netback/page_pool.c
+++ b/drivers/net/xen-netback/page_pool.c
@@ -102,7 +102,7 @@ int is_in_pool(struct page *page, int *pidx)
 	return get_page_ext(page, pidx);
 }
 
-struct page *page_pool_get(struct xen_netbk *netbk, int *pidx)
+struct page *page_pool_get(struct xenvif *vif, int *pidx)
 {
 	int idx;
 	struct page *page;
@@ -118,7 +118,7 @@ struct page *page_pool_get(struct xen_netbk *netbk, int *pidx)
 	}
 
 	set_page_ext(page, idx);
-	pool[idx].u.netbk = netbk;
+	pool[idx].u.vif = vif;
 	pool[idx].page = page;
 
 	*pidx = idx;
@@ -131,7 +131,7 @@ void page_pool_put(int idx)
 	struct page *page = pool[idx].page;
 
 	pool[idx].page = NULL;
-	pool[idx].u.netbk = NULL;
+	pool[idx].u.vif = NULL;
 	page->mapping = 0;
 	put_page(page);
 	put_free_entry(idx);
@@ -174,9 +174,9 @@ struct page *to_page(int idx)
 	return pool[idx].page;
 }
 
-struct xen_netbk *to_netbk(int idx)
+struct xenvif *to_vif(int idx)
 {
-	return pool[idx].u.netbk;
+	return pool[idx].u.vif;
 }
 
 struct pending_tx_info *to_txinfo(int idx)
diff --git a/drivers/net/xen-netback/page_pool.h b/drivers/net/xen-netback/page_pool.h
index 572b037..efae17c 100644
--- a/drivers/net/xen-netback/page_pool.h
+++ b/drivers/net/xen-netback/page_pool.h
@@ -27,7 +27,10 @@
 #ifndef __PAGE_POOL_H__
 #define __PAGE_POOL_H__
 
-#include "common.h"
+struct pending_tx_info {
+	struct xen_netif_tx_request req;
+};
+typedef unsigned int pending_ring_idx_t;
 
 typedef uint32_t idx_t;
 
@@ -38,8 +41,8 @@ struct page_pool_entry {
 	struct page *page;
 	struct pending_tx_info tx_info;
 	union {
-		struct xen_netbk *netbk;
-		idx_t             fl;
+		struct xenvif *vif;
+		idx_t          fl;
 	} u;
 };
 
@@ -52,12 +55,12 @@ int  page_pool_init(void);
 void page_pool_destroy(void);
 
 
-struct page *page_pool_get(struct xen_netbk *netbk, int *pidx);
+struct page *page_pool_get(struct xenvif *vif, int *pidx);
 void         page_pool_put(int idx);
 int          is_in_pool(struct page *page, int *pidx);
 
 struct page            *to_page(int idx);
-struct xen_netbk       *to_netbk(int idx);
+struct xenvif          *to_vif(int idx);
 struct pending_tx_info *to_txinfo(int idx);
 
 #endif /* __PAGE_POOL_H__ */
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [RFC PATCH V3 07/16] netback: alter internal function/structure names.
  2012-01-30 14:45 [RFC PATCH V3] Xen netback / netfront improvement Wei Liu
                   ` (5 preceding siblings ...)
  2012-01-30 14:45 ` [RFC PATCH V3 06/16] netback: melt xen_netbk into xenvif Wei Liu
@ 2012-01-30 14:45 ` Wei Liu
  2012-01-30 14:45 ` [RFC PATCH V3 08/16] netback: remove unwanted notification generation during NAPI processing Wei Liu
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 59+ messages in thread
From: Wei Liu @ 2012-01-30 14:45 UTC (permalink / raw)
  To: netdev, xen-devel; +Cc: ian.campbell, konrad.wilk, Wei Liu

Since we have melted xen_netbk into xenvif, it is better to give the
functions clearer names.

Also alter the NAPI poll handler function prototype slightly.
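
With the new prototype the poll handler consumes the return value
directly instead of passing work_done by pointer. A simplified sketch
(the real handler below also re-checks the ring for more requests
before completing NAPI):

static int example_poll(struct napi_struct *napi, int budget)
{
        struct xenvif *vif = container_of(napi, struct xenvif, napi);
        int work_done = xenvif_tx_action(vif, budget);

        if (work_done < budget) {
                /* All pending work done: leave polling mode and
                 * re-enable events. */
                napi_complete(napi);
                xenvif_check_rx_xenvif(vif);
        }

        return work_done;
}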

Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 drivers/net/xen-netback/common.h    |   26 ++--
 drivers/net/xen-netback/interface.c |   20 ++--
 drivers/net/xen-netback/netback.c   |  231 ++++++++++++++++++-----------------
 3 files changed, 142 insertions(+), 135 deletions(-)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index 17d4e1a..53141c7 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -47,7 +47,7 @@
 
 #include "page_pool.h"
 
-struct netbk_rx_meta {
+struct xenvif_rx_meta {
 	int id;
 	int size;
 	int gso_size;
@@ -140,32 +140,32 @@ void xenvif_xenbus_exit(void);
 
 int xenvif_schedulable(struct xenvif *vif);
 
-int xen_netbk_rx_ring_full(struct xenvif *vif);
+int xenvif_rx_ring_full(struct xenvif *vif);
 
-int xen_netbk_must_stop_queue(struct xenvif *vif);
+int xenvif_must_stop_queue(struct xenvif *vif);
 
 /* (Un)Map communication rings. */
-void xen_netbk_unmap_frontend_rings(struct xenvif *vif);
-int xen_netbk_map_frontend_rings(struct xenvif *vif,
-				 grant_ref_t tx_ring_ref,
-				 grant_ref_t rx_ring_ref);
+void xenvif_unmap_frontend_rings(struct xenvif *vif);
+int xenvif_map_frontend_rings(struct xenvif *vif,
+			      grant_ref_t tx_ring_ref,
+			      grant_ref_t rx_ring_ref);
 
 /* Check for SKBs from frontend and schedule backend processing */
-void xen_netbk_check_rx_xenvif(struct xenvif *vif);
+void xenvif_check_rx_xenvif(struct xenvif *vif);
 /* Receive an SKB from the frontend */
 void xenvif_receive_skb(struct xenvif *vif, struct sk_buff *skb);
 
 /* Queue an SKB for transmission to the frontend */
-void xen_netbk_queue_tx_skb(struct xenvif *vif, struct sk_buff *skb);
+void xenvif_queue_tx_skb(struct xenvif *vif, struct sk_buff *skb);
 /* Notify xenvif that ring now has space to send an skb to the frontend */
 void xenvif_notify_tx_completion(struct xenvif *vif);
 
 /* Returns number of ring slots required to send an skb to the frontend */
-unsigned int xen_netbk_count_skb_slots(struct xenvif *vif, struct sk_buff *skb);
+unsigned int xenvif_count_skb_slots(struct xenvif *vif, struct sk_buff *skb);
 
-void xen_netbk_tx_action(struct xenvif *vif, int *work_done, int budget);
-void xen_netbk_rx_action(struct xenvif *vif);
+int xenvif_tx_action(struct xenvif *vif, int budget);
+void xenvif_rx_action(struct xenvif *vif);
 
-int xen_netbk_kthread(void *data);
+int xenvif_kthread(void *data);
 
 #endif /* __XEN_NETBACK__COMMON_H__ */
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index 3c004fa..ebed26a 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -48,7 +48,7 @@ int xenvif_schedulable(struct xenvif *vif)
 
 static int xenvif_rx_schedulable(struct xenvif *vif)
 {
-	return xenvif_schedulable(vif) && !xen_netbk_rx_ring_full(vif);
+	return xenvif_schedulable(vif) && !xenvif_rx_ring_full(vif);
 }
 
 static irqreturn_t xenvif_interrupt(int irq, void *dev_id)
@@ -69,7 +69,7 @@ static int xenvif_poll(struct napi_struct *napi, int budget)
 	struct xenvif *vif = container_of(napi, struct xenvif, napi);
 	int work_done = 0;
 
-	xen_netbk_tx_action(vif, &work_done, budget);
+	work_done = xenvif_tx_action(vif, budget);
 
 	if (work_done < budget) {
 		int more_to_do = 0;
@@ -101,12 +101,12 @@ static int xenvif_start_xmit(struct sk_buff *skb, struct net_device *dev)
 		goto drop;
 
 	/* Reserve ring slots for the worst-case number of fragments. */
-	vif->rx_req_cons_peek += xen_netbk_count_skb_slots(vif, skb);
+	vif->rx_req_cons_peek += xenvif_count_skb_slots(vif, skb);
 
-	if (vif->can_queue && xen_netbk_must_stop_queue(vif))
+	if (vif->can_queue && xenvif_must_stop_queue(vif))
 		netif_stop_queue(dev);
 
-	xen_netbk_queue_tx_skb(vif, skb);
+	xenvif_queue_tx_skb(vif, skb);
 
 	return NETDEV_TX_OK;
 
@@ -137,7 +137,7 @@ static void xenvif_up(struct xenvif *vif)
 {
 	napi_enable(&vif->napi);
 	enable_irq(vif->irq);
-	xen_netbk_check_rx_xenvif(vif);
+	xenvif_check_rx_xenvif(vif);
 }
 
 static void xenvif_down(struct xenvif *vif)
@@ -334,7 +334,7 @@ int xenvif_connect(struct xenvif *vif, unsigned long tx_ring_ref,
 
 	__module_get(THIS_MODULE);
 
-	err = xen_netbk_map_frontend_rings(vif, tx_ring_ref, rx_ring_ref);
+	err = xenvif_map_frontend_rings(vif, tx_ring_ref, rx_ring_ref);
 	if (err < 0)
 		goto err;
 
@@ -347,7 +347,7 @@ int xenvif_connect(struct xenvif *vif, unsigned long tx_ring_ref,
 	disable_irq(vif->irq);
 
 	init_waitqueue_head(&vif->wq);
-	vif->task = kthread_create(xen_netbk_kthread,
+	vif->task = kthread_create(xenvif_kthread,
 				   (void *)vif,
 				   "vif%d.%d", vif->domid, vif->handle);
 	if (IS_ERR(vif->task)) {
@@ -371,7 +371,7 @@ int xenvif_connect(struct xenvif *vif, unsigned long tx_ring_ref,
 err_unbind:
 	unbind_from_irqhandler(vif->irq, vif);
 err_unmap:
-	xen_netbk_unmap_frontend_rings(vif);
+	xenvif_unmap_frontend_rings(vif);
 err:
 	module_put(THIS_MODULE);
 	return err;
@@ -404,7 +404,7 @@ void xenvif_disconnect(struct xenvif *vif)
 
 	unregister_netdev(vif->dev);
 
-	xen_netbk_unmap_frontend_rings(vif);
+	xenvif_unmap_frontend_rings(vif);
 
 	free_netdev(vif->dev);
 
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 0a52bb1..2a2835e 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -57,9 +57,9 @@ struct gnttab_copy *tx_copy_ops;
  * straddles two buffers in the frontend.
  */
 struct gnttab_copy *grant_copy_op;
-struct netbk_rx_meta *meta;
+struct xenvif_rx_meta *meta;
 
-static void xen_netbk_idx_release(struct xenvif *vif, u16 pending_idx);
+static void xenvif_idx_release(struct xenvif *vif, u16 pending_idx);
 static void make_tx_response(struct xenvif *vif,
 			     struct xen_netif_tx_request *txp,
 			     s8       st);
@@ -127,7 +127,7 @@ static int max_required_rx_slots(struct xenvif *vif)
 	return max;
 }
 
-int xen_netbk_rx_ring_full(struct xenvif *vif)
+int xenvif_rx_ring_full(struct xenvif *vif)
 {
 	RING_IDX peek   = vif->rx_req_cons_peek;
 	RING_IDX needed = max_required_rx_slots(vif);
@@ -136,16 +136,16 @@ int xen_netbk_rx_ring_full(struct xenvif *vif)
 	       ((vif->rx.rsp_prod_pvt + XEN_NETIF_RX_RING_SIZE - peek) < needed);
 }
 
-int xen_netbk_must_stop_queue(struct xenvif *vif)
+int xenvif_must_stop_queue(struct xenvif *vif)
 {
-	if (!xen_netbk_rx_ring_full(vif))
+	if (!xenvif_rx_ring_full(vif))
 		return 0;
 
 	vif->rx.sring->req_event = vif->rx_req_cons_peek +
 		max_required_rx_slots(vif);
 	mb(); /* request notification /then/ check the queue */
 
-	return xen_netbk_rx_ring_full(vif);
+	return xenvif_rx_ring_full(vif);
 }
 
 /*
@@ -191,9 +191,9 @@ static bool start_new_rx_buffer(int offset, unsigned long size, int head)
 /*
  * Figure out how many ring slots we're going to need to send @skb to
  * the guest. This function is essentially a dry run of
- * netbk_gop_frag_copy.
+ * xenvif_gop_frag_copy.
  */
-unsigned int xen_netbk_count_skb_slots(struct xenvif *vif, struct sk_buff *skb)
+unsigned int xenvif_count_skb_slots(struct xenvif *vif, struct sk_buff *skb)
 {
 	unsigned int count;
 	int i, copy_off;
@@ -232,15 +232,15 @@ struct netrx_pending_operations {
 	unsigned copy_prod, copy_cons;
 	unsigned meta_prod, meta_cons;
 	struct gnttab_copy *copy;
-	struct netbk_rx_meta *meta;
+	struct xenvif_rx_meta *meta;
 	int copy_off;
 	grant_ref_t copy_gref;
 };
 
-static struct netbk_rx_meta *get_next_rx_buffer(struct xenvif *vif,
-						struct netrx_pending_operations *npo)
+static struct xenvif_rx_meta *get_next_rx_buffer(struct xenvif *vif,
+					struct netrx_pending_operations *npo)
 {
-	struct netbk_rx_meta *meta;
+	struct xenvif_rx_meta *meta;
 	struct xen_netif_rx_request *req;
 
 	req = RING_GET_REQUEST(&vif->rx, vif->rx.req_cons++);
@@ -260,13 +260,13 @@ static struct netbk_rx_meta *get_next_rx_buffer(struct xenvif *vif,
  * Set up the grant operations for this fragment. If it's a flipping
  * interface, we also set up the unmap request from here.
  */
-static void netbk_gop_frag_copy(struct xenvif *vif, struct sk_buff *skb,
-				struct netrx_pending_operations *npo,
-				struct page *page, unsigned long size,
-				unsigned long offset, int *head)
+static void xenvif_gop_frag_copy(struct xenvif *vif, struct sk_buff *skb,
+				 struct netrx_pending_operations *npo,
+				 struct page *page, unsigned long size,
+				 unsigned long offset, int *head)
 {
 	struct gnttab_copy *copy_gop;
-	struct netbk_rx_meta *meta;
+	struct xenvif_rx_meta *meta;
 	/*
 	 * These variables are used iff get_page_ext returns true,
 	 * in which case they are guaranteed to be initialized.
@@ -345,14 +345,14 @@ static void netbk_gop_frag_copy(struct xenvif *vif, struct sk_buff *skb,
  * zero GSO descriptors (for non-GSO packets) or one descriptor (for
  * frontend-side LRO).
  */
-static int netbk_gop_skb(struct sk_buff *skb,
-			 struct netrx_pending_operations *npo)
+static int xenvif_gop_skb(struct sk_buff *skb,
+			  struct netrx_pending_operations *npo)
 {
 	struct xenvif *vif = netdev_priv(skb->dev);
 	int nr_frags = skb_shinfo(skb)->nr_frags;
 	int i;
 	struct xen_netif_rx_request *req;
-	struct netbk_rx_meta *meta;
+	struct xenvif_rx_meta *meta;
 	unsigned char *data;
 	int head = 1;
 	int old_meta_prod;
@@ -389,30 +389,30 @@ static int netbk_gop_skb(struct sk_buff *skb,
 		if (data + len > skb_tail_pointer(skb))
 			len = skb_tail_pointer(skb) - data;
 
-		netbk_gop_frag_copy(vif, skb, npo,
-				    virt_to_page(data), len, offset, &head);
+		xenvif_gop_frag_copy(vif, skb, npo,
+				     virt_to_page(data), len, offset, &head);
 		data += len;
 	}
 
 	for (i = 0; i < nr_frags; i++) {
-		netbk_gop_frag_copy(vif, skb, npo,
-				    skb_frag_page(&skb_shinfo(skb)->frags[i]),
-				    skb_frag_size(&skb_shinfo(skb)->frags[i]),
-				    skb_shinfo(skb)->frags[i].page_offset,
-				    &head);
+		xenvif_gop_frag_copy(vif, skb, npo,
+				     skb_frag_page(&skb_shinfo(skb)->frags[i]),
+				     skb_frag_size(&skb_shinfo(skb)->frags[i]),
+				     skb_shinfo(skb)->frags[i].page_offset,
+				     &head);
 	}
 
 	return npo->meta_prod - old_meta_prod;
 }
 
 /*
- * This is a twin to netbk_gop_skb.  Assume that netbk_gop_skb was
+ * This is a twin to xenvif_gop_skb.  Assume that xenvif_gop_skb was
  * used to set up the operations on the top of
  * netrx_pending_operations, which have since been done.  Check that
  * they didn't give any errors and advance over them.
  */
-static int netbk_check_gop(struct xenvif *vif, int nr_meta_slots,
-			   struct netrx_pending_operations *npo)
+static int xenvif_check_gop(struct xenvif *vif, int nr_meta_slots,
+			    struct netrx_pending_operations *npo)
 {
 	struct gnttab_copy     *copy_op;
 	int status = XEN_NETIF_RSP_OKAY;
@@ -431,9 +431,9 @@ static int netbk_check_gop(struct xenvif *vif, int nr_meta_slots,
 	return status;
 }
 
-static void netbk_add_frag_responses(struct xenvif *vif, int status,
-				     struct netbk_rx_meta *meta,
-				     int nr_meta_slots)
+static void xenvif_add_frag_responses(struct xenvif *vif, int status,
+				      struct xenvif_rx_meta *meta,
+				      int nr_meta_slots)
 {
 	int i;
 	unsigned long offset;
@@ -461,12 +461,12 @@ struct skb_cb_overlay {
 	int meta_slots_used;
 };
 
-static void xen_netbk_kick_thread(struct xenvif *vif)
+static void xenvif_kick_thread(struct xenvif *vif)
 {
 	wake_up(&vif->wq);
 }
 
-void xen_netbk_rx_action(struct xenvif *vif)
+void xenvif_rx_action(struct xenvif *vif)
 {
 	s8 status;
 	u16 flags;
@@ -482,7 +482,7 @@ void xen_netbk_rx_action(struct xenvif *vif)
 	int need_to_notify = 0;
 
 	struct gnttab_copy *gco = get_cpu_ptr(grant_copy_op);
-	struct netbk_rx_meta *m = get_cpu_ptr(meta);
+	struct xenvif_rx_meta *m = get_cpu_ptr(meta);
 
 	struct netrx_pending_operations npo = {
 		.copy  = gco,
@@ -498,7 +498,7 @@ void xen_netbk_rx_action(struct xenvif *vif)
 		nr_frags = skb_shinfo(skb)->nr_frags;
 
 		sco = (struct skb_cb_overlay *)skb->cb;
-		sco->meta_slots_used = netbk_gop_skb(skb, &npo);
+		sco->meta_slots_used = xenvif_gop_skb(skb, &npo);
 
 		count += nr_frags + 1;
 
@@ -543,7 +543,7 @@ void xen_netbk_rx_action(struct xenvif *vif)
 		vif->dev->stats.tx_bytes += skb->len;
 		vif->dev->stats.tx_packets++;
 
-		status = netbk_check_gop(vif, sco->meta_slots_used, &npo);
+		status = xenvif_check_gop(vif, sco->meta_slots_used, &npo);
 
 		if (sco->meta_slots_used == 1)
 			flags = 0;
@@ -579,9 +579,9 @@ void xen_netbk_rx_action(struct xenvif *vif)
 			gso->flags = 0;
 		}
 
-		netbk_add_frag_responses(vif, status,
-					 m + npo.meta_cons + 1,
-					 sco->meta_slots_used);
+		xenvif_add_frag_responses(vif, status,
+					  m + npo.meta_cons + 1,
+					  sco->meta_slots_used);
 
 		RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(&vif->rx, ret);
 		if (ret)
@@ -597,20 +597,20 @@ void xen_netbk_rx_action(struct xenvif *vif)
 		notify_remote_via_irq(vif->irq);
 
 	if (!skb_queue_empty(&vif->rx_queue))
-		xen_netbk_kick_thread(vif);
+		xenvif_kick_thread(vif);
 
 	put_cpu_ptr(gco);
 	put_cpu_ptr(m);
 }
 
-void xen_netbk_queue_tx_skb(struct xenvif *vif, struct sk_buff *skb)
+void xenvif_queue_tx_skb(struct xenvif *vif, struct sk_buff *skb)
 {
 	skb_queue_tail(&vif->rx_queue, skb);
 
-	xen_netbk_kick_thread(vif);
+	xenvif_kick_thread(vif);
 }
 
-void xen_netbk_check_rx_xenvif(struct xenvif *vif)
+void xenvif_check_rx_xenvif(struct xenvif *vif)
 {
 	int more_to_do;
 
@@ -647,11 +647,11 @@ static void tx_credit_callback(unsigned long data)
 {
 	struct xenvif *vif = (struct xenvif *)data;
 	tx_add_credit(vif);
-	xen_netbk_check_rx_xenvif(vif);
+	xenvif_check_rx_xenvif(vif);
 }
 
-static void netbk_tx_err(struct xenvif *vif,
-			 struct xen_netif_tx_request *txp, RING_IDX end)
+static void xenvif_tx_err(struct xenvif *vif,
+			  struct xen_netif_tx_request *txp, RING_IDX end)
 {
 	RING_IDX cons = vif->tx.req_cons;
 
@@ -662,10 +662,10 @@ static void netbk_tx_err(struct xenvif *vif,
 		txp = RING_GET_REQUEST(&vif->tx, cons++);
 	} while (1);
 	vif->tx.req_cons = cons;
-	xen_netbk_check_rx_xenvif(vif);
+	xenvif_check_rx_xenvif(vif);
 }
 
-static int netbk_count_requests(struct xenvif *vif,
+static int xenvif_count_requests(struct xenvif *vif,
 				struct xen_netif_tx_request *first,
 				struct xen_netif_tx_request *txp,
 				int work_to_do)
@@ -706,9 +706,9 @@ static int netbk_count_requests(struct xenvif *vif,
 	return frags;
 }
 
-static struct page *xen_netbk_alloc_page(struct xenvif *vif,
-					 struct sk_buff *skb,
-					 u16 pending_idx)
+static struct page *xenvif_alloc_page(struct xenvif *vif,
+				      struct sk_buff *skb,
+				      u16 pending_idx)
 {
 	struct page *page;
 	int idx;
@@ -719,10 +719,10 @@ static struct page *xen_netbk_alloc_page(struct xenvif *vif,
 	return page;
 }
 
-static struct gnttab_copy *xen_netbk_get_requests(struct xenvif *vif,
-						  struct sk_buff *skb,
-						  struct xen_netif_tx_request *txp,
-						  struct gnttab_copy *gop)
+static struct gnttab_copy *xenvif_get_requests(struct xenvif *vif,
+					       struct sk_buff *skb,
+					       struct xen_netif_tx_request *txp,
+					       struct gnttab_copy *gop)
 {
 	struct skb_shared_info *shinfo = skb_shinfo(skb);
 	skb_frag_t *frags = shinfo->frags;
@@ -740,7 +740,7 @@ static struct gnttab_copy *xen_netbk_get_requests(struct xenvif *vif,
 
 		index = pending_index(vif->pending_cons++);
 		pending_idx = vif->pending_ring[index];
-		page = xen_netbk_alloc_page(vif, skb, pending_idx);
+		page = xenvif_alloc_page(vif, skb, pending_idx);
 		if (!page)
 			return NULL;
 
@@ -768,9 +768,9 @@ static struct gnttab_copy *xen_netbk_get_requests(struct xenvif *vif,
 	return gop;
 }
 
-static int xen_netbk_tx_check_gop(struct xenvif *vif,
-				  struct sk_buff *skb,
-				  struct gnttab_copy **gopp)
+static int xenvif_tx_check_gop(struct xenvif *vif,
+			       struct sk_buff *skb,
+			       struct gnttab_copy **gopp)
 {
 	struct gnttab_copy *gop = *gopp;
 	u16 pending_idx = *((u16 *)skb->data);
@@ -807,7 +807,7 @@ static int xen_netbk_tx_check_gop(struct xenvif *vif,
 		if (likely(!newerr)) {
 			/* Had a previous error? Invalidate this fragment. */
 			if (unlikely(err))
-				xen_netbk_idx_release(vif, pending_idx);
+				xenvif_idx_release(vif, pending_idx);
 			continue;
 		}
 
@@ -824,10 +824,10 @@ static int xen_netbk_tx_check_gop(struct xenvif *vif,
 
 		/* First error: invalidate header and preceding fragments. */
 		pending_idx = *((u16 *)skb->data);
-		xen_netbk_idx_release(vif, pending_idx);
+		xenvif_idx_release(vif, pending_idx);
 		for (j = start; j < i; j++) {
 			pending_idx = frag_get_pending_idx(&shinfo->frags[j]);
-			xen_netbk_idx_release(vif, pending_idx);
+			xenvif_idx_release(vif, pending_idx);
 		}
 
 		/* Remember the error: invalidate all subsequent fragments. */
@@ -838,7 +838,7 @@ static int xen_netbk_tx_check_gop(struct xenvif *vif,
 	return err;
 }
 
-static void xen_netbk_fill_frags(struct xenvif *vif, struct sk_buff *skb)
+static void xenvif_fill_frags(struct xenvif *vif, struct sk_buff *skb)
 {
 	struct skb_shared_info *shinfo = skb_shinfo(skb);
 	int nr_frags = shinfo->nr_frags;
@@ -864,15 +864,15 @@ static void xen_netbk_fill_frags(struct xenvif *vif, struct sk_buff *skb)
 		skb->data_len += txp->size;
 		skb->truesize += txp->size;
 
-		/* Take an extra reference to offset xen_netbk_idx_release */
+		/* Take an extra reference to offset xenvif_idx_release */
 		get_page(page);
-		xen_netbk_idx_release(vif, pending_idx);
+		xenvif_idx_release(vif, pending_idx);
 	}
 }
 
-static int xen_netbk_get_extras(struct xenvif *vif,
-				struct xen_netif_extra_info *extras,
-				int work_to_do)
+static int xenvif_get_extras(struct xenvif *vif,
+			     struct xen_netif_extra_info *extras,
+			     int work_to_do)
 {
 	struct xen_netif_extra_info extra;
 	RING_IDX cons = vif->tx.req_cons;
@@ -900,9 +900,9 @@ static int xen_netbk_get_extras(struct xenvif *vif,
 	return work_to_do;
 }
 
-static int netbk_set_skb_gso(struct xenvif *vif,
-			     struct sk_buff *skb,
-			     struct xen_netif_extra_info *gso)
+static int xenvif_set_skb_gso(struct xenvif *vif,
+			      struct sk_buff *skb,
+			      struct xen_netif_extra_info *gso)
 {
 	if (!gso->u.gso.size) {
 		netdev_dbg(vif->dev, "GSO size must not be zero.\n");
@@ -1027,8 +1027,8 @@ static bool tx_credit_exceeded(struct xenvif *vif, unsigned size)
 	return false;
 }
 
-static unsigned xen_netbk_tx_build_gops(struct xenvif *vif,
-					struct gnttab_copy *tco)
+static unsigned xenvif_tx_build_gops(struct xenvif *vif,
+				     struct gnttab_copy *tco)
 {
 	struct gnttab_copy *gop = tco, *request_gop;
 	struct sk_buff *skb;
@@ -1069,18 +1069,18 @@ static unsigned xen_netbk_tx_build_gops(struct xenvif *vif,
 
 		memset(extras, 0, sizeof(extras));
 		if (txreq.flags & XEN_NETTXF_extra_info) {
-			work_to_do = xen_netbk_get_extras(vif, extras,
+			work_to_do = xenvif_get_extras(vif, extras,
 							  work_to_do);
 			idx = vif->tx.req_cons;
 			if (unlikely(work_to_do < 0)) {
-				netbk_tx_err(vif, &txreq, idx);
+				xenvif_tx_err(vif, &txreq, idx);
 				break;
 			}
 		}
 
-		ret = netbk_count_requests(vif, &txreq, txfrags, work_to_do);
+		ret = xenvif_count_requests(vif, &txreq, txfrags, work_to_do);
 		if (unlikely(ret < 0)) {
-			netbk_tx_err(vif, &txreq, idx - ret);
+			xenvif_tx_err(vif, &txreq, idx - ret);
 			break;
 		}
 		idx += ret;
@@ -1088,7 +1088,7 @@ static unsigned xen_netbk_tx_build_gops(struct xenvif *vif,
 		if (unlikely(txreq.size < ETH_HLEN)) {
 			netdev_dbg(vif->dev,
 				   "Bad packet size: %d\n", txreq.size);
-			netbk_tx_err(vif, &txreq, idx);
+			xenvif_tx_err(vif, &txreq, idx);
 			break;
 		}
 
@@ -1098,7 +1098,7 @@ static unsigned xen_netbk_tx_build_gops(struct xenvif *vif,
 				   "txreq.offset: %x, size: %u, end: %lu\n",
 				   txreq.offset, txreq.size,
 				   (txreq.offset&~PAGE_MASK) + txreq.size);
-			netbk_tx_err(vif, &txreq, idx);
+			xenvif_tx_err(vif, &txreq, idx);
 			break;
 		}
 
@@ -1114,7 +1114,7 @@ static unsigned xen_netbk_tx_build_gops(struct xenvif *vif,
 		if (unlikely(skb == NULL)) {
 			netdev_dbg(vif->dev,
 				   "Can't allocate a skb in start_xmit.\n");
-			netbk_tx_err(vif, &txreq, idx);
+			xenvif_tx_err(vif, &txreq, idx);
 			break;
 		}
 
@@ -1125,18 +1125,18 @@ static unsigned xen_netbk_tx_build_gops(struct xenvif *vif,
 			struct xen_netif_extra_info *gso;
 			gso = &extras[XEN_NETIF_EXTRA_TYPE_GSO - 1];
 
-			if (netbk_set_skb_gso(vif, skb, gso)) {
+			if (xenvif_set_skb_gso(vif, skb, gso)) {
 				kfree_skb(skb);
-				netbk_tx_err(vif, &txreq, idx);
+				xenvif_tx_err(vif, &txreq, idx);
 				break;
 			}
 		}
 
 		/* XXX could copy straight to head */
-		page = xen_netbk_alloc_page(vif, skb, pending_idx);
+		page = xenvif_alloc_page(vif, skb, pending_idx);
 		if (!page) {
 			kfree_skb(skb);
-			netbk_tx_err(vif, &txreq, idx);
+			xenvif_tx_err(vif, &txreq, idx);
 			break;
 		}
 
@@ -1177,17 +1177,17 @@ static unsigned xen_netbk_tx_build_gops(struct xenvif *vif,
 
 		vif->pending_cons++;
 
-		request_gop = xen_netbk_get_requests(vif,
+		request_gop = xenvif_get_requests(vif,
 						     skb, txfrags, gop);
 		if (request_gop == NULL) {
 			kfree_skb(skb);
-			netbk_tx_err(vif, &txreq, idx);
+			xenvif_tx_err(vif, &txreq, idx);
 			break;
 		}
 		gop = request_gop;
 
 		vif->tx.req_cons = idx;
-		xen_netbk_check_rx_xenvif(vif);
+		xenvif_check_rx_xenvif(vif);
 
 		if ((gop - tco) >= MAX_PENDING_REQS)
 			break;
@@ -1196,14 +1196,15 @@ static unsigned xen_netbk_tx_build_gops(struct xenvif *vif,
 	return gop - tco;
 }
 
-static void xen_netbk_tx_submit(struct xenvif *vif,
-				struct gnttab_copy *tco,
-				int *work_done, int budget)
+static int xenvif_tx_submit(struct xenvif *vif,
+			    struct gnttab_copy *tco,
+			    int budget)
 {
 	struct gnttab_copy *gop = tco;
 	struct sk_buff *skb;
+	int work_done = 0;
 
-	while ((*work_done < budget) &&
+	while ((work_done < budget) &&
 	       (skb = __skb_dequeue(&vif->tx_queue)) != NULL) {
 		struct xen_netif_tx_request *txp;
 		u16 pending_idx;
@@ -1219,7 +1220,7 @@ static void xen_netbk_tx_submit(struct xenvif *vif,
 		txp = &pending_tx_info->req;
 
 		/* Check the remap error code. */
-		if (unlikely(xen_netbk_tx_check_gop(vif, skb, &gop))) {
+		if (unlikely(xenvif_tx_check_gop(vif, skb, &gop))) {
 			netdev_dbg(vif->dev, "netback grant failed.\n");
 			skb_shinfo(skb)->nr_frags = 0;
 			kfree_skb(skb);
@@ -1236,7 +1237,7 @@ static void xen_netbk_tx_submit(struct xenvif *vif,
 			txp->size -= data_len;
 		} else {
 			/* Schedule a response immediately. */
-			xen_netbk_idx_release(vif, pending_idx);
+			xenvif_idx_release(vif, pending_idx);
 		}
 
 		if (txp->flags & XEN_NETTXF_csum_blank)
@@ -1244,7 +1245,7 @@ static void xen_netbk_tx_submit(struct xenvif *vif,
 		else if (txp->flags & XEN_NETTXF_data_validated)
 			skb->ip_summed = CHECKSUM_UNNECESSARY;
 
-		xen_netbk_fill_frags(vif, skb);
+		xenvif_fill_frags(vif, skb);
 
 		/*
 		 * If the initial fragment was < PKT_PROT_LEN then
@@ -1269,39 +1270,45 @@ static void xen_netbk_tx_submit(struct xenvif *vif,
 		vif->dev->stats.rx_bytes += skb->len;
 		vif->dev->stats.rx_packets++;
 
-		(*work_done)++;
+		work_done++;
 
 		xenvif_receive_skb(vif, skb);
 	}
+
+	return work_done;
 }
 
 /* Called after netfront has transmitted */
-void xen_netbk_tx_action(struct xenvif *vif, int *work_done, int budget)
+int xenvif_tx_action(struct xenvif *vif, int budget)
 {
 	unsigned nr_gops;
 	int ret;
 	struct gnttab_copy *tco;
+	int work_done;
 
 	if (unlikely(!tx_work_todo(vif)))
-		return;
+		return 0;
 
 	tco = get_cpu_ptr(tx_copy_ops);
 
-	nr_gops = xen_netbk_tx_build_gops(vif, tco);
+	nr_gops = xenvif_tx_build_gops(vif, tco);
 
 	if (nr_gops == 0) {
 		put_cpu_ptr(tco);
-		return;
+		return 0;
 	}
 
 	ret = HYPERVISOR_grant_table_op(GNTTABOP_copy, tco, nr_gops);
 	BUG_ON(ret);
 
-	xen_netbk_tx_submit(vif, tco, work_done, budget);
+	work_done = xenvif_tx_submit(vif, tco, budget);
+
 	put_cpu_ptr(tco);
+
+	return work_done;
 }
 
-static void xen_netbk_idx_release(struct xenvif *vif, u16 pending_idx)
+static void xenvif_idx_release(struct xenvif *vif, u16 pending_idx)
 {
 	struct pending_tx_info *pending_tx_info;
 	pending_ring_idx_t index;
@@ -1382,7 +1389,7 @@ static inline int tx_work_todo(struct xenvif *vif)
 	return 0;
 }
 
-void xen_netbk_unmap_frontend_rings(struct xenvif *vif)
+void xenvif_unmap_frontend_rings(struct xenvif *vif)
 {
 	if (vif->tx.sring)
 		xenbus_unmap_ring_vfree(xenvif_to_xenbus_device(vif),
@@ -1392,9 +1399,9 @@ void xen_netbk_unmap_frontend_rings(struct xenvif *vif)
 					vif->rx.sring);
 }
 
-int xen_netbk_map_frontend_rings(struct xenvif *vif,
-				 grant_ref_t tx_ring_ref,
-				 grant_ref_t rx_ring_ref)
+int xenvif_map_frontend_rings(struct xenvif *vif,
+			      grant_ref_t tx_ring_ref,
+			      grant_ref_t rx_ring_ref)
 {
 	void *addr;
 	struct xen_netif_tx_sring *txs;
@@ -1423,11 +1430,11 @@ int xen_netbk_map_frontend_rings(struct xenvif *vif,
 	return 0;
 
 err:
-	xen_netbk_unmap_frontend_rings(vif);
+	xenvif_unmap_frontend_rings(vif);
 	return err;
 }
 
-int xen_netbk_kthread(void *data)
+int xenvif_kthread(void *data)
 {
 	struct xenvif *vif = data;
 
@@ -1441,7 +1448,7 @@ int xen_netbk_kthread(void *data)
 			break;
 
 		if (rx_work_todo(vif))
-			xen_netbk_rx_action(vif);
+			xenvif_rx_action(vif);
 	}
 
 	return 0;
@@ -1467,9 +1474,9 @@ static int __init netback_init(void)
 	if (!grant_copy_op)
 		goto failed_init_gco;
 
-	meta = __alloc_percpu(sizeof(struct netbk_rx_meta)
+	meta = __alloc_percpu(sizeof(struct xenvif_rx_meta)
 			      * 2 * XEN_NETIF_RX_RING_SIZE,
-			      __alignof__(struct netbk_rx_meta));
+			      __alignof__(struct xenvif_rx_meta));
 	if (!meta)
 		goto failed_init_meta;
 
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [RFC PATCH V3 08/16] netback: remove unwanted notification generation during NAPI processing.
  2012-01-30 14:45 [RFC PATCH V3] Xen netback / netfront improvement Wei Liu
                   ` (6 preceding siblings ...)
  2012-01-30 14:45 ` [RFC PATCH V3 07/16] netback: alter internal function/structure names Wei Liu
@ 2012-01-30 14:45 ` Wei Liu
  2012-01-30 14:45 ` [RFC PATCH V3 09/16] netback: nuke xenvif_receive_skb Wei Liu
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 59+ messages in thread
From: Wei Liu @ 2012-01-30 14:45 UTC (permalink / raw)
  To: netdev, xen-devel; +Cc: ian.campbell, konrad.wilk, Wei Liu

In the original implementation, tx_build_gops tends to update the
req_event pointer every time it sees a tx error or finishes one
batch. Remove that code so that the req_event pointer is only updated
when we really want to shut down NAPI.
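
The sketch below mirrors xenvif_poll() as changed by this series (an
illustration, not an addition to the patch): after this change
req_event is only advanced, via RING_FINAL_CHECK_FOR_REQUESTS(), on
the NAPI-complete path, so the frontend only sends an event once we
have actually stopped polling; the hot path uses
RING_HAS_UNCONSUMED_REQUESTS(), which does not touch req_event.

	/* Minimal sketch of the polling side, based on this series. */
	static int example_poll(struct napi_struct *napi, int budget)
	{
		struct xenvif *vif = container_of(napi, struct xenvif, napi);
		int work_done = xenvif_tx_action(vif, budget);

		if (work_done < budget) {
			int more_to_do = 0;
			unsigned long flags;

			local_irq_save(flags);
			/* The only place req_event gets re-armed now. */
			RING_FINAL_CHECK_FOR_REQUESTS(&vif->tx, more_to_do);
			if (!more_to_do)
				__napi_complete(napi);
			local_irq_restore(flags);
		}

		return work_done;
	}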

Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 drivers/net/xen-netback/interface.c |    5 +++--
 drivers/net/xen-netback/netback.c   |    4 +---
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index ebed26a..fe37143 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -58,8 +58,8 @@ static irqreturn_t xenvif_interrupt(int irq, void *dev_id)
 	if (xenvif_rx_schedulable(vif))
 		netif_wake_queue(vif->dev);
 
-	if (likely(napi_schedule_prep(&vif->napi)))
-		__napi_schedule(&vif->napi);
+	if (RING_HAS_UNCONSUMED_REQUESTS(&vif->tx))
+		napi_schedule(&vif->napi);
 
 	return IRQ_HANDLED;
 }
@@ -74,6 +74,7 @@ static int xenvif_poll(struct napi_struct *napi, int budget)
 	if (work_done < budget) {
 		int more_to_do = 0;
 		unsigned long flag;
+
 		local_irq_save(flag);
 
 		RING_FINAL_CHECK_FOR_REQUESTS(&vif->tx, more_to_do);
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 2a2835e..065cd65 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -662,7 +662,6 @@ static void xenvif_tx_err(struct xenvif *vif,
 		txp = RING_GET_REQUEST(&vif->tx, cons++);
 	} while (1);
 	vif->tx.req_cons = cons;
-	xenvif_check_rx_xenvif(vif);
 }
 
 static int xenvif_count_requests(struct xenvif *vif,
@@ -1047,7 +1046,7 @@ static unsigned xenvif_tx_build_gops(struct xenvif *vif,
 		int pool_idx;
 		struct pending_tx_info *pending_tx_info;
 
-		RING_FINAL_CHECK_FOR_REQUESTS(&vif->tx, work_to_do);
+		work_to_do = RING_HAS_UNCONSUMED_REQUESTS(&vif->tx);
 		if (!work_to_do) {
 			break;
 		}
@@ -1187,7 +1186,6 @@ static unsigned xenvif_tx_build_gops(struct xenvif *vif,
 		gop = request_gop;
 
 		vif->tx.req_cons = idx;
-		xenvif_check_rx_xenvif(vif);
 
 		if ((gop - tco) >= MAX_PENDING_REQS)
 			break;
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [RFC PATCH V3 09/16] netback: nuke xenvif_receive_skb
  2012-01-30 14:45 [RFC PATCH V3] Xen netback / netfront improvement Wei Liu
                   ` (7 preceding siblings ...)
  2012-01-30 14:45 ` [RFC PATCH V3 08/16] netback: remove unwanted notification generation during NAPI processing Wei Liu
@ 2012-01-30 14:45 ` Wei Liu
  2012-01-30 14:45 ` [RFC PATCH V3 10/16] netback: rework of per-cpu scratch space Wei Liu
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 59+ messages in thread
From: Wei Liu @ 2012-01-30 14:45 UTC (permalink / raw)
  To: netdev, xen-devel; +Cc: ian.campbell, konrad.wilk, Wei Liu

Replace it with a direct call to netif_receive_skb.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 drivers/net/xen-netback/common.h    |    2 --
 drivers/net/xen-netback/interface.c |    5 -----
 drivers/net/xen-netback/netback.c   |    2 +-
 3 files changed, 1 insertions(+), 8 deletions(-)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index 53141c7..28121f1 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -152,8 +152,6 @@ int xenvif_map_frontend_rings(struct xenvif *vif,
 
 /* Check for SKBs from frontend and schedule backend processing */
 void xenvif_check_rx_xenvif(struct xenvif *vif);
-/* Receive an SKB from the frontend */
-void xenvif_receive_skb(struct xenvif *vif, struct sk_buff *skb);
 
 /* Queue an SKB for transmission to the frontend */
 void xenvif_queue_tx_skb(struct xenvif *vif, struct sk_buff *skb);
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index fe37143..d7a7cd9 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -117,11 +117,6 @@ static int xenvif_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	return NETDEV_TX_OK;
 }
 
-void xenvif_receive_skb(struct xenvif *vif, struct sk_buff *skb)
-{
-	netif_receive_skb(skb);
-}
-
 void xenvif_notify_tx_completion(struct xenvif *vif)
 {
 	if (netif_queue_stopped(vif->dev) && xenvif_rx_schedulable(vif))
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 065cd65..a8d58a9 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -1270,7 +1270,7 @@ static int xenvif_tx_submit(struct xenvif *vif,
 
 		work_done++;
 
-		xenvif_receive_skb(vif, skb);
+		netif_receive_skb(skb);
 	}
 
 	return work_done;
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [RFC PATCH V3 10/16] netback: rework of per-cpu scratch space.
  2012-01-30 14:45 [RFC PATCH V3] Xen netback / netfront improvement Wei Liu
                   ` (8 preceding siblings ...)
  2012-01-30 14:45 ` [RFC PATCH V3 09/16] netback: nuke xenvif_receive_skb Wei Liu
@ 2012-01-30 14:45 ` Wei Liu
  2012-01-30 21:53   ` Konrad Rzeszutek Wilk
  2012-01-31  1:25   ` Eric Dumazet
  2012-01-30 14:45 ` [RFC PATCH V3 11/16] netback: print alert and bail when scratch space is not available Wei Liu
                   ` (5 subsequent siblings)
  15 siblings, 2 replies; 59+ messages in thread
From: Wei Liu @ 2012-01-30 14:45 UTC (permalink / raw)
  To: netdev, xen-devel; +Cc: ian.campbell, konrad.wilk, Wei Liu

If we allocate large arrays in the per-cpu section, the multi-page
ring feature is likely to blow up the per-cpu section. So avoid
allocating large arrays there; instead, only store pointers to the
scratch spaces in the per-cpu section and allocate the spaces
themselves dynamically.

CPU hotplug events are also taken care of.
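
A minimal sketch of the scheme (hypothetical example_ names; struct
gnttab_copy and MAX_PENDING_REQS are assumed from this driver): only
a pointer lives in the per-cpu section, the buffer itself comes from
vzalloc() when the CPU comes online, and users pick the pointer up
under get_cpu_var()/put_cpu_var().

	#include <linux/percpu.h>
	#include <linux/vmalloc.h>

	DEFINE_PER_CPU(struct gnttab_copy *, example_tx_copy_ops);

	static int example_create_scratch(unsigned int cpu)
	{
		/* The large buffer lives outside the per-cpu section. */
		per_cpu(example_tx_copy_ops, cpu) =
			vzalloc(sizeof(struct gnttab_copy) * MAX_PENDING_REQS);
		return per_cpu(example_tx_copy_ops, cpu) ? 0 : -ENOMEM;
	}

	static void example_free_scratch(unsigned int cpu)
	{
		vfree(per_cpu(example_tx_copy_ops, cpu)); /* vfree(NULL) is ok */
		per_cpu(example_tx_copy_ops, cpu) = NULL;
	}

	static void example_use_scratch(void)
	{
		/* get_cpu_var() disables preemption until put_cpu_var(). */
		struct gnttab_copy *tco = get_cpu_var(example_tx_copy_ops);

		if (tco) {
			/* ... build grant copy operations in tco ... */
		}
		put_cpu_var(example_tx_copy_ops);
	}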

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 drivers/net/xen-netback/netback.c |  140 +++++++++++++++++++++++++++----------
 1 files changed, 104 insertions(+), 36 deletions(-)

diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index a8d58a9..2ac9b84 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -39,6 +39,7 @@
 #include <linux/kthread.h>
 #include <linux/if_vlan.h>
 #include <linux/udp.h>
+#include <linux/cpu.h>
 
 #include <net/tcp.h>
 
@@ -49,15 +50,15 @@
 #include <asm/xen/page.h>
 
 
-struct gnttab_copy *tx_copy_ops;
+DEFINE_PER_CPU(struct gnttab_copy *, tx_copy_ops);
 
 /*
  * Given MAX_BUFFER_OFFSET of 4096 the worst case is that each
  * head/fragment page uses 2 copy operations because it
  * straddles two buffers in the frontend.
  */
-struct gnttab_copy *grant_copy_op;
-struct xenvif_rx_meta *meta;
+DEFINE_PER_CPU(struct gnttab_copy *, grant_copy_op);
+DEFINE_PER_CPU(struct xenvif_rx_meta *, meta);
 
 static void xenvif_idx_release(struct xenvif *vif, u16 pending_idx);
 static void make_tx_response(struct xenvif *vif,
@@ -481,8 +482,8 @@ void xenvif_rx_action(struct xenvif *vif)
 	struct skb_cb_overlay *sco;
 	int need_to_notify = 0;
 
-	struct gnttab_copy *gco = get_cpu_ptr(grant_copy_op);
-	struct xenvif_rx_meta *m = get_cpu_ptr(meta);
+	struct gnttab_copy *gco = get_cpu_var(grant_copy_op);
+	struct xenvif_rx_meta *m = get_cpu_var(meta);
 
 	struct netrx_pending_operations npo = {
 		.copy  = gco,
@@ -512,8 +513,8 @@ void xenvif_rx_action(struct xenvif *vif)
 	BUG_ON(npo.meta_prod > MAX_PENDING_REQS);
 
 	if (!npo.copy_prod) {
-		put_cpu_ptr(gco);
-		put_cpu_ptr(m);
+		put_cpu_var(grant_copy_op);
+		put_cpu_var(meta);
 		return;
 	}
 
@@ -599,8 +600,8 @@ void xenvif_rx_action(struct xenvif *vif)
 	if (!skb_queue_empty(&vif->rx_queue))
 		xenvif_kick_thread(vif);
 
-	put_cpu_ptr(gco);
-	put_cpu_ptr(m);
+	put_cpu_var(grant_copy_op);
+	put_cpu_var(meta);
 }
 
 void xenvif_queue_tx_skb(struct xenvif *vif, struct sk_buff *skb)
@@ -1287,12 +1288,12 @@ int xenvif_tx_action(struct xenvif *vif, int budget)
 	if (unlikely(!tx_work_todo(vif)))
 		return 0;
 
-	tco = get_cpu_ptr(tx_copy_ops);
+	tco = get_cpu_var(tx_copy_ops);
 
 	nr_gops = xenvif_tx_build_gops(vif, tco);
 
 	if (nr_gops == 0) {
-		put_cpu_ptr(tco);
+		put_cpu_var(tx_copy_ops);
 		return 0;
 	}
 
@@ -1301,7 +1302,7 @@ int xenvif_tx_action(struct xenvif *vif, int budget)
 
 	work_done = xenvif_tx_submit(vif, tco, budget);
 
-	put_cpu_ptr(tco);
+	put_cpu_var(tx_copy_ops);
 
 	return work_done;
 }
@@ -1452,31 +1453,97 @@ int xenvif_kthread(void *data)
 	return 0;
 }
 
+static int __create_percpu_scratch_space(unsigned int cpu)
+{
+	per_cpu(tx_copy_ops, cpu) =
+		vzalloc(sizeof(struct gnttab_copy) * MAX_PENDING_REQS);
+
+	per_cpu(grant_copy_op, cpu) =
+		vzalloc(sizeof(struct gnttab_copy)
+			* 2 * XEN_NETIF_RX_RING_SIZE);
+
+	per_cpu(meta, cpu) = vzalloc(sizeof(struct xenvif_rx_meta)
+				     * 2 * XEN_NETIF_RX_RING_SIZE);
+
+	if (!per_cpu(tx_copy_ops, cpu) ||
+	    !per_cpu(grant_copy_op, cpu) ||
+	    !per_cpu(meta, cpu))
+		return -ENOMEM;
+
+	return 0;
+}
+
+static void __free_percpu_scratch_space(unsigned int cpu)
+{
+	/* freeing NULL pointer is legit */
+	vfree(per_cpu(tx_copy_ops, cpu));
+	vfree(per_cpu(grant_copy_op, cpu));
+	vfree(per_cpu(meta, cpu));
+}
+
+static int __netback_percpu_callback(struct notifier_block *nfb,
+				     unsigned long action, void *hcpu)
+{
+	unsigned int cpu = (unsigned long)hcpu;
+	int rc = NOTIFY_DONE;
+
+	switch (action) {
+	case CPU_ONLINE:
+	case CPU_ONLINE_FROZEN:
+		printk(KERN_INFO
+		       "netback: CPU %x online, creating scratch space\n", cpu);
+		rc = __create_percpu_scratch_space(cpu);
+		if (rc) {
+			printk(KERN_ALERT
+			       "netback: failed to create scratch space for CPU"
+			       " %x\n", cpu);
+			/* FIXME: nothing more we can do here, we will
+			 * print out warning message when thread or
+			 * NAPI runs on this cpu. Also stop getting
+			 * called in the future.
+			 */
+			__free_percpu_scratch_space(cpu);
+			rc = NOTIFY_BAD;
+		} else {
+			rc = NOTIFY_OK;
+		}
+		break;
+	case CPU_DEAD:
+	case CPU_DEAD_FROZEN:
+		printk("netback: CPU %x offline, destroying scratch space\n",
+		       cpu);
+		__free_percpu_scratch_space(cpu);
+		rc = NOTIFY_OK;
+		break;
+	default:
+		break;
+	}
+
+	return rc;
+}
+
+static struct notifier_block netback_notifier_block = {
+	.notifier_call = __netback_percpu_callback,
+};
 
 static int __init netback_init(void)
 {
 	int rc = -ENOMEM;
+	int cpu;
 
 	if (!xen_domain())
 		return -ENODEV;
 
-	tx_copy_ops = __alloc_percpu(sizeof(struct gnttab_copy)
-				     * MAX_PENDING_REQS,
-				     __alignof__(struct gnttab_copy));
-	if (!tx_copy_ops)
-		goto failed_init;
+	/* Don't need to disable preempt here, since nobody else will
+	 * touch these percpu areas during start up. */
+	for_each_online_cpu(cpu) {
+		rc = __create_percpu_scratch_space(cpu);
 
-	grant_copy_op = __alloc_percpu(sizeof(struct gnttab_copy)
-				       * 2 * XEN_NETIF_RX_RING_SIZE,
-				       __alignof__(struct gnttab_copy));
-	if (!grant_copy_op)
-		goto failed_init_gco;
+		if (rc)
+			goto failed_init;
+	}
 
-	meta = __alloc_percpu(sizeof(struct xenvif_rx_meta)
-			      * 2 * XEN_NETIF_RX_RING_SIZE,
-			      __alignof__(struct xenvif_rx_meta));
-	if (!meta)
-		goto failed_init_meta;
+	register_hotcpu_notifier(&netback_notifier_block);
 
 	rc = page_pool_init();
 	if (rc)
@@ -1491,25 +1558,26 @@ static int __init netback_init(void)
 failed_init_xenbus:
 	page_pool_destroy();
 failed_init_pool:
-	free_percpu(meta);
-failed_init_meta:
-	free_percpu(grant_copy_op);
-failed_init_gco:
-	free_percpu(tx_copy_ops);
+	for_each_online_cpu(cpu)
+		__free_percpu_scratch_space(cpu);
 failed_init:
 	return rc;
-
 }
 
 module_init(netback_init);
 
 static void __exit netback_exit(void)
 {
+	int cpu;
+
 	xenvif_xenbus_exit();
 	page_pool_destroy();
-	free_percpu(meta);
-	free_percpu(grant_copy_op);
-	free_percpu(tx_copy_ops);
+
+	unregister_hotcpu_notifier(&netback_notifier_block);
+
+	/* Since we're here, nobody else will touch per-cpu area. */
+	for_each_online_cpu(cpu)
+		__free_percpu_scratch_space(cpu);
 }
 module_exit(netback_exit);
 
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [RFC PATCH V3 11/16] netback: print alert and bail when scratch space is not available.
  2012-01-30 14:45 [RFC PATCH V3] Xen netback / netfront improvement Wei Liu
                   ` (9 preceding siblings ...)
  2012-01-30 14:45 ` [RFC PATCH V3 10/16] netback: rework of per-cpu scratch space Wei Liu
@ 2012-01-30 14:45 ` Wei Liu
  2012-01-30 14:45 ` [RFC PATCH V3 12/16] netback: multi-page ring support Wei Liu
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 59+ messages in thread
From: Wei Liu @ 2012-01-30 14:45 UTC (permalink / raw)
  To: netdev, xen-devel; +Cc: ian.campbell, konrad.wilk, Wei Liu

A CPU online event causes our callback to allocate scratch space for
that CPU, which may fail. The simplest and best action when NAPI or
the kthread is then scheduled on that CPU is to bail.
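
A minimal sketch of the bail-out path (example_tx_action() is a
hypothetical stand-in; tx_copy_ops is the per-cpu pointer from the
previous patch): a NULL pointer means the hotplug-time allocation
failed, so this pass is skipped, and the negative return lets
xenvif_poll() shut NAPI down.

	static int example_tx_action(struct xenvif *vif, int budget)
	{
		struct gnttab_copy *tco = get_cpu_var(tx_copy_ops);

		if (tco == NULL) {
			put_cpu_var(tx_copy_ops);
			printk(KERN_ALERT
			       "netback: CPU %d scratch space unusable\n",
			       smp_processor_id());
			/* xenvif_poll() completes NAPI on a negative return */
			return -ENOMEM;
		}

		/* ... normal TX processing using tco ... */

		put_cpu_var(tx_copy_ops);
		return 0;
	}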

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 drivers/net/xen-netback/interface.c |    5 ++++-
 drivers/net/xen-netback/netback.c   |   17 +++++++++++++++++
 2 files changed, 21 insertions(+), 1 deletions(-)

diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index d7a7cd9..a5de556 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -69,6 +69,9 @@ static int xenvif_poll(struct napi_struct *napi, int budget)
 	struct xenvif *vif = container_of(napi, struct xenvif, napi);
 	int work_done = 0;
 
+	/* N.B.: work_done may be -ENOMEM, indicating that scratch space
+	 * on this CPU is not usable. In this situation, we shut down
+	 * NAPI. See the __napi_complete check below. */
 	work_done = xenvif_tx_action(vif, budget);
 
 	if (work_done < budget) {
@@ -78,7 +81,7 @@ static int xenvif_poll(struct napi_struct *napi, int budget)
 		local_irq_save(flag);
 
 		RING_FINAL_CHECK_FOR_REQUESTS(&vif->tx, more_to_do);
-		if (!more_to_do)
+		if (!more_to_do || work_done < 0)
 			__napi_complete(napi);
 
 		local_irq_restore(flag);
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 2ac9b84..df63703 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -490,6 +490,15 @@ void xenvif_rx_action(struct xenvif *vif)
 		.meta  = m,
 	};
 
+	if (gco == NULL || m == NULL) {
+		put_cpu_var(grant_copy_op);
+		put_cpu_var(meta);
+		printk(KERN_ALERT "netback: CPU %x scratch space is not usable,"
+		       " not doing any TX work for vif%u.%u\n",
+		       smp_processor_id(), vif->domid, vif->handle);
+		return;
+	}
+
 	skb_queue_head_init(&rxq);
 
 	count = 0;
@@ -1290,6 +1299,14 @@ int xenvif_tx_action(struct xenvif *vif, int budget)
 
 	tco = get_cpu_var(tx_copy_ops);
 
+	if (tco == NULL) {
+		put_cpu_var(tx_copy_ops);
+		printk(KERN_ALERT "netback: CPU %x scratch space is not usable,"
+		       " not doing any RX work for vif%u.%u\n",
+		       smp_processor_id(), vif->domid, vif->handle);
+		return -ENOMEM;
+	}
+
 	nr_gops = xenvif_tx_build_gops(vif, tco);
 
 	if (nr_gops == 0) {
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [RFC PATCH V3 12/16] netback: multi-page ring support
  2012-01-30 14:45 [RFC PATCH V3] Xen netback / netfront improvement Wei Liu
                   ` (10 preceding siblings ...)
  2012-01-30 14:45 ` [RFC PATCH V3 11/16] netback: print alert and bail when scratch space is not available Wei Liu
@ 2012-01-30 14:45 ` Wei Liu
  2012-01-30 16:35     ` Jan Beulich
  2012-01-30 14:45 ` [RFC PATCH V3 13/16] netback: stub for multi receive protocol support Wei Liu
                   ` (3 subsequent siblings)
  15 siblings, 1 reply; 59+ messages in thread
From: Wei Liu @ 2012-01-30 14:45 UTC (permalink / raw)
  To: netdev, xen-devel; +Cc: ian.campbell, konrad.wilk, Wei Liu

Extend netback to support multi-page rings.
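
A minimal sketch of the xenstore negotiation this relies on
(example_read_tx_ring_refs() is a hypothetical helper; the series
does the equivalent inline in connect_rings()): the frontend either
writes the legacy single "tx-ring-ref" key, or "tx-ring-order" plus
one "tx-ring-ref<n>" key per ring page, and the backend rejects
orders above its advertised maximum.

	#include <linux/kernel.h>
	#include <xen/xenbus.h>

	static int example_read_tx_ring_refs(struct xenbus_device *dev,
					     unsigned long refs[NETBK_MAX_RING_PAGES],
					     unsigned int *nr_pages)
	{
		unsigned int order, i;
		int err;

		err = xenbus_scanf(XBT_NIL, dev->otherend, "tx-ring-order",
				   "%u", &order);
		if (err < 0) {
			/* Legacy single-page frontend. */
			*nr_pages = 1;
			err = xenbus_scanf(XBT_NIL, dev->otherend,
					   "tx-ring-ref", "%lu", &refs[0]);
			return err < 0 ? err : 0;
		}

		/* The real code compares against the module parameter. */
		if (order > NETBK_MAX_RING_PAGE_ORDER)
			return -EINVAL;

		*nr_pages = 1U << order;
		for (i = 0; i < *nr_pages; i++) {
			char name[sizeof("tx-ring-ref") + 2];

			snprintf(name, sizeof(name), "tx-ring-ref%u", i);
			err = xenbus_scanf(XBT_NIL, dev->otherend, name,
					   "%lu", &refs[i]);
			if (err < 0)
				return err;
		}

		return 0;
	}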

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 drivers/net/xen-netback/common.h    |   44 ++++++++++---
 drivers/net/xen-netback/interface.c |   33 +++++++--
 drivers/net/xen-netback/netback.c   |  116 +++++++++++++++++++++----------
 drivers/net/xen-netback/xenbus.c    |  129 +++++++++++++++++++++++++++++++++--
 4 files changed, 262 insertions(+), 60 deletions(-)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index 28121f1..3cf9b8f 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -58,16 +58,36 @@ struct xenvif_rx_meta {
 
 #define MAX_BUFFER_OFFSET PAGE_SIZE
 
-#define XEN_NETIF_TX_RING_SIZE __CONST_RING_SIZE(xen_netif_tx, PAGE_SIZE)
-#define XEN_NETIF_RX_RING_SIZE __CONST_RING_SIZE(xen_netif_rx, PAGE_SIZE)
+#define NETBK_TX_RING_SIZE(_nr_pages)					\
+	(__CONST_RING_SIZE(xen_netif_tx, PAGE_SIZE * (_nr_pages)))
+#define NETBK_RX_RING_SIZE(_nr_pages)					\
+	(__CONST_RING_SIZE(xen_netif_rx, PAGE_SIZE * (_nr_pages)))
 
-#define MAX_PENDING_REQS 256
+#define NETBK_MAX_RING_PAGE_ORDER 2
+#define NETBK_MAX_RING_PAGES      (1U << NETBK_MAX_RING_PAGE_ORDER)
+
+#define NETBK_MAX_TX_RING_SIZE NETBK_TX_RING_SIZE(NETBK_MAX_RING_PAGES)
+#define NETBK_MAX_RX_RING_SIZE NETBK_RX_RING_SIZE(NETBK_MAX_RING_PAGES)
+
+#define INVALID_GRANT_HANDLE ((grant_handle_t)~0U)
+
+#define MAX_PENDING_REQS NETBK_MAX_TX_RING_SIZE
+
+struct xen_comms {
+	struct vm_struct *ring_area;
+	grant_handle_t    shmem_handle[NETBK_MAX_RING_PAGES];
+	unsigned int      nr_handles;
+};
 
 struct xenvif {
 	/* Unique identifier for this interface. */
 	domid_t          domid;
 	unsigned int     handle;
 
+	/* Multi-page ring support */
+	struct xen_comms tx_comms;
+	struct xen_comms rx_comms;
+
 	/* Use NAPI for guest TX */
 	struct napi_struct napi;
 	/* Use kthread for guest RX */
@@ -131,8 +151,10 @@ struct xenvif *xenvif_alloc(struct device *parent,
 			    domid_t domid,
 			    unsigned int handle);
 
-int xenvif_connect(struct xenvif *vif, unsigned long tx_ring_ref,
-		   unsigned long rx_ring_ref, unsigned int evtchn);
+int xenvif_connect(struct xenvif *vif,
+		   unsigned long tx_ring_ref[], unsigned int tx_ring_order,
+		   unsigned long rx_ring_ref[], unsigned int rx_ring_order,
+		   unsigned int evtchn);
 void xenvif_disconnect(struct xenvif *vif);
 
 int xenvif_xenbus_init(void);
@@ -145,10 +167,11 @@ int xenvif_rx_ring_full(struct xenvif *vif);
 int xenvif_must_stop_queue(struct xenvif *vif);
 
 /* (Un)Map communication rings. */
-void xenvif_unmap_frontend_rings(struct xenvif *vif);
-int xenvif_map_frontend_rings(struct xenvif *vif,
-			      grant_ref_t tx_ring_ref,
-			      grant_ref_t rx_ring_ref);
+void xenvif_unmap_frontend_rings(struct xen_comms *comms);
+int xenvif_map_frontend_rings(struct xen_comms *comms,
+			      int domid,
+			      unsigned long ring_ref[],
+			      unsigned int  ring_ref_count);
 
 /* Check for SKBs from frontend and schedule backend processing */
 void xenvif_check_rx_xenvif(struct xenvif *vif);
@@ -166,4 +189,7 @@ void xenvif_rx_action(struct xenvif *vif);
 
 int xenvif_kthread(void *data);
 
+extern unsigned int MODPARM_netback_max_tx_ring_page_order;
+extern unsigned int MODPARM_netback_max_rx_ring_page_order;
+
 #endif /* __XEN_NETBACK__COMMON_H__ */
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index a5de556..29f4fd9 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -322,10 +322,14 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
 	return vif;
 }
 
-int xenvif_connect(struct xenvif *vif, unsigned long tx_ring_ref,
-		   unsigned long rx_ring_ref, unsigned int evtchn)
+int xenvif_connect(struct xenvif *vif,
+		   unsigned long tx_ring_ref[], unsigned int tx_ring_ref_count,
+		   unsigned long rx_ring_ref[], unsigned int rx_ring_ref_count,
+		   unsigned int evtchn)
 {
 	int err = -ENOMEM;
+	struct xen_netif_tx_sring *txs;
+	struct xen_netif_rx_sring *rxs;
 
 	/* Already connected through? */
 	if (vif->irq)
@@ -333,15 +337,25 @@ int xenvif_connect(struct xenvif *vif, unsigned long tx_ring_ref,
 
 	__module_get(THIS_MODULE);
 
-	err = xenvif_map_frontend_rings(vif, tx_ring_ref, rx_ring_ref);
-	if (err < 0)
+	err = xenvif_map_frontend_rings(&vif->tx_comms, vif->domid,
+					tx_ring_ref, tx_ring_ref_count);
+	if (err)
 		goto err;
+	txs = (struct xen_netif_tx_sring *)vif->tx_comms.ring_area->addr;
+	BACK_RING_INIT(&vif->tx, txs, PAGE_SIZE * tx_ring_ref_count);
+
+	err = xenvif_map_frontend_rings(&vif->rx_comms, vif->domid,
+					rx_ring_ref, rx_ring_ref_count);
+	if (err)
+		goto err_tx_unmap;
+	rxs = (struct xen_netif_rx_sring *)vif->rx_comms.ring_area->addr;
+	BACK_RING_INIT(&vif->rx, rxs, PAGE_SIZE * rx_ring_ref_count);
 
 	err = bind_interdomain_evtchn_to_irqhandler(
 		vif->domid, evtchn, xenvif_interrupt, 0,
 		vif->dev->name, vif);
 	if (err < 0)
-		goto err_unmap;
+		goto err_rx_unmap;
 	vif->irq = err;
 	disable_irq(vif->irq);
 
@@ -369,8 +383,10 @@ int xenvif_connect(struct xenvif *vif, unsigned long tx_ring_ref,
 	return 0;
 err_unbind:
 	unbind_from_irqhandler(vif->irq, vif);
-err_unmap:
-	xenvif_unmap_frontend_rings(vif);
+err_rx_unmap:
+	xenvif_unmap_frontend_rings(&vif->rx_comms);
+err_tx_unmap:
+	xenvif_unmap_frontend_rings(&vif->tx_comms);
 err:
 	module_put(THIS_MODULE);
 	return err;
@@ -403,7 +419,8 @@ void xenvif_disconnect(struct xenvif *vif)
 
 	unregister_netdev(vif->dev);
 
-	xenvif_unmap_frontend_rings(vif);
+	xenvif_unmap_frontend_rings(&vif->tx_comms);
+	xenvif_unmap_frontend_rings(&vif->rx_comms);
 
 	free_netdev(vif->dev);
 
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index df63703..96f354c 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -49,6 +49,17 @@
 #include <asm/xen/hypercall.h>
 #include <asm/xen/page.h>
 
+unsigned int MODPARM_netback_max_rx_ring_page_order = NETBK_MAX_RING_PAGE_ORDER;
+module_param_named(netback_max_rx_ring_page_order,
+		   MODPARM_netback_max_rx_ring_page_order, uint, 0);
+MODULE_PARM_DESC(netback_max_rx_ring_page_order,
+		 "Maximum supported receiver ring page order");
+
+unsigned int MODPARM_netback_max_tx_ring_page_order = NETBK_MAX_RING_PAGE_ORDER;
+module_param_named(netback_max_tx_ring_page_order,
+		   MODPARM_netback_max_tx_ring_page_order, uint, 0);
+MODULE_PARM_DESC(netback_max_tx_ring_page_order,
+		 "Maximum supported transmitter ring page order");
 
 DEFINE_PER_CPU(struct gnttab_copy *, tx_copy_ops);
 
@@ -132,9 +143,11 @@ int xenvif_rx_ring_full(struct xenvif *vif)
 {
 	RING_IDX peek   = vif->rx_req_cons_peek;
 	RING_IDX needed = max_required_rx_slots(vif);
+	struct xen_comms *comms = &vif->rx_comms;
 
 	return ((vif->rx.sring->req_prod - peek) < needed) ||
-	       ((vif->rx.rsp_prod_pvt + XEN_NETIF_RX_RING_SIZE - peek) < needed);
+	       ((vif->rx.rsp_prod_pvt +
+		 NETBK_RX_RING_SIZE(comms->nr_handles) - peek) < needed);
 }
 
 int xenvif_must_stop_queue(struct xenvif *vif)
@@ -481,6 +494,7 @@ void xenvif_rx_action(struct xenvif *vif)
 	unsigned long offset;
 	struct skb_cb_overlay *sco;
 	int need_to_notify = 0;
+	struct xen_comms *comms = &vif->rx_comms;
 
 	struct gnttab_copy *gco = get_cpu_var(grant_copy_op);
 	struct xenvif_rx_meta *m = get_cpu_var(meta);
@@ -515,7 +529,8 @@ void xenvif_rx_action(struct xenvif *vif)
 		__skb_queue_tail(&rxq, skb);
 
 		/* Filled the batch queue? */
-		if (count + MAX_SKB_FRAGS >= XEN_NETIF_RX_RING_SIZE)
+		if (count + MAX_SKB_FRAGS >=
+		    NETBK_RX_RING_SIZE(comms->nr_handles))
 			break;
 	}
 
@@ -527,7 +542,7 @@ void xenvif_rx_action(struct xenvif *vif)
 		return;
 	}
 
-	BUG_ON(npo.copy_prod > (2 * XEN_NETIF_RX_RING_SIZE));
+	BUG_ON(npo.copy_prod > (2 * NETBK_MAX_RX_RING_SIZE));
 	ret = HYPERVISOR_grant_table_op(GNTTABOP_copy, gco,
 					npo.copy_prod);
 	BUG_ON(ret != 0);
@@ -1405,48 +1420,77 @@ static inline int tx_work_todo(struct xenvif *vif)
 	return 0;
 }
 
-void xenvif_unmap_frontend_rings(struct xenvif *vif)
+void xenvif_unmap_frontend_rings(struct xen_comms *comms)
 {
-	if (vif->tx.sring)
-		xenbus_unmap_ring_vfree(xenvif_to_xenbus_device(vif),
-					vif->tx.sring);
-	if (vif->rx.sring)
-		xenbus_unmap_ring_vfree(xenvif_to_xenbus_device(vif),
-					vif->rx.sring);
+	struct gnttab_unmap_grant_ref op[NETBK_MAX_RING_PAGES];
+	unsigned int i;
+	unsigned int j;
+
+	if (!comms->ring_area)
+		return;
+
+	j = 0;
+	for (i = 0; i < comms->nr_handles; i++) {
+		unsigned long addr = (unsigned long)comms->ring_area->addr +
+			(i * PAGE_SIZE);
+
+		if (comms->shmem_handle[i] != INVALID_GRANT_HANDLE) {
+			gnttab_set_unmap_op(&op[j++], addr,
+					    GNTMAP_host_map,
+					    comms->shmem_handle[i]);
+			comms->shmem_handle[i] = INVALID_GRANT_HANDLE;
+		}
+	}
+
+	comms->nr_handles = 0;
+
+	if (j != 0) {
+		if (HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref,
+					      op, j))
+			BUG();
+	}
+
+	free_vm_area(comms->ring_area);
 }
 
-int xenvif_map_frontend_rings(struct xenvif *vif,
-			      grant_ref_t tx_ring_ref,
-			      grant_ref_t rx_ring_ref)
+int xenvif_map_frontend_rings(struct xen_comms *comms,
+			      int domid,
+			      unsigned long ring_ref[],
+			      unsigned int  ring_ref_count)
 {
-	void *addr;
-	struct xen_netif_tx_sring *txs;
-	struct xen_netif_rx_sring *rxs;
-
-	int err = -ENOMEM;
+	struct gnttab_map_grant_ref op[NETBK_MAX_RING_PAGES];
+	unsigned int i;
+	int err = 0;
 
-	err = xenbus_map_ring_valloc(xenvif_to_xenbus_device(vif),
-				     tx_ring_ref, &addr);
-	if (err)
-		goto err;
+	comms->ring_area = alloc_vm_area(PAGE_SIZE * ring_ref_count, NULL);
+	if (comms->ring_area == NULL)
+		return -ENOMEM;
 
-	txs = (struct xen_netif_tx_sring *)addr;
-	BACK_RING_INIT(&vif->tx, txs, PAGE_SIZE);
+	for (i = 0; i < ring_ref_count; i++) {
+		unsigned long addr = (unsigned long)comms->ring_area->addr +
+			(i * PAGE_SIZE);
+		gnttab_set_map_op(&op[i], addr, GNTMAP_host_map,
+				  ring_ref[i], domid);
+	}
 
-	err = xenbus_map_ring_valloc(xenvif_to_xenbus_device(vif),
-				     rx_ring_ref, &addr);
-	if (err)
-		goto err;
+	if (HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref,
+				      &op, ring_ref_count))
+		BUG();
 
-	rxs = (struct xen_netif_rx_sring *)addr;
-	BACK_RING_INIT(&vif->rx, rxs, PAGE_SIZE);
+	comms->nr_handles = ring_ref_count;
 
-	vif->rx_req_cons_peek = 0;
+	for (i = 0; i < ring_ref_count; i++) {
+		if (op[i].status != 0) {
+			err = op[i].status;
+			comms->shmem_handle[i] = INVALID_GRANT_HANDLE;
+			continue;
+		}
+		comms->shmem_handle[i] = op[i].handle;
+	}
 
-	return 0;
+	if (err != 0)
+		xenvif_unmap_frontend_rings(comms);
 
-err:
-	xenvif_unmap_frontend_rings(vif);
 	return err;
 }
 
@@ -1477,10 +1521,10 @@ static int __create_percpu_scratch_space(unsigned int cpu)
 
 	per_cpu(grant_copy_op, cpu) =
 		vzalloc(sizeof(struct gnttab_copy)
-			* 2 * XEN_NETIF_RX_RING_SIZE);
+			* 2 * NETBK_MAX_RX_RING_SIZE);
 
 	per_cpu(meta, cpu) = vzalloc(sizeof(struct xenvif_rx_meta)
-				     * 2 * XEN_NETIF_RX_RING_SIZE);
+				     * 2 * NETBK_MAX_RX_RING_SIZE);
 
 	if (!per_cpu(tx_copy_ops, cpu) ||
 	    !per_cpu(grant_copy_op, cpu) ||
diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
index f1e89ca..79499fc 100644
--- a/drivers/net/xen-netback/xenbus.c
+++ b/drivers/net/xen-netback/xenbus.c
@@ -113,6 +113,23 @@ static int netback_probe(struct xenbus_device *dev,
 			message = "writing feature-rx-flip";
 			goto abort_transaction;
 		}
+		err = xenbus_printf(xbt, dev->nodename,
+				    "max-tx-ring-page-order",
+				    "%u",
+				    MODPARM_netback_max_tx_ring_page_order);
+		if (err) {
+			message = "writing max-tx-ring-page-order";
+			goto abort_transaction;
+		}
+
+		err = xenbus_printf(xbt, dev->nodename,
+				    "max-rx-ring-page-order",
+				    "%u",
+				    MODPARM_netback_max_rx_ring_page_order);
+		if (err) {
+			message = "writing max-rx-ring-page-order";
+			goto abort_transaction;
+		}
 
 		err = xenbus_transaction_end(xbt, 0);
 	} while (err == -EAGAIN);
@@ -391,22 +408,108 @@ static int connect_rings(struct backend_info *be)
 {
 	struct xenvif *vif = be->vif;
 	struct xenbus_device *dev = be->dev;
-	unsigned long tx_ring_ref, rx_ring_ref;
 	unsigned int evtchn, rx_copy;
 	int err;
 	int val;
+	unsigned long tx_ring_ref[NETBK_MAX_RING_PAGES];
+	unsigned long rx_ring_ref[NETBK_MAX_RING_PAGES];
+	unsigned int  tx_ring_order;
+	unsigned int  rx_ring_order;
 
 	err = xenbus_gather(XBT_NIL, dev->otherend,
-			    "tx-ring-ref", "%lu", &tx_ring_ref,
-			    "rx-ring-ref", "%lu", &rx_ring_ref,
 			    "event-channel", "%u", &evtchn, NULL);
 	if (err) {
 		xenbus_dev_fatal(dev, err,
-				 "reading %s/ring-ref and event-channel",
+				 "reading %s/event-channel",
 				 dev->otherend);
 		return err;
 	}
 
+	err = xenbus_scanf(XBT_NIL, dev->otherend, "tx-ring-order", "%u",
+			   &tx_ring_order);
+	if (err < 0) {
+		tx_ring_order = 0;
+
+		err = xenbus_scanf(XBT_NIL, dev->otherend, "tx-ring-ref", "%lu",
+				   &tx_ring_ref[0]);
+		if (err < 0) {
+			xenbus_dev_fatal(dev, err, "reading %s/tx-ring-ref",
+					 dev->otherend);
+			return err;
+		}
+	} else {
+		unsigned int i;
+
+		if (tx_ring_order > MODPARM_netback_max_tx_ring_page_order) {
+			err = -EINVAL;
+
+			xenbus_dev_fatal(dev, err,
+					 "%s/tx-ring-page-order too big",
+					 dev->otherend);
+			return err;
+		}
+
+		for (i = 0; i < (1U << tx_ring_order); i++) {
+			char ring_ref_name[sizeof("tx-ring-ref") + 2];
+
+			snprintf(ring_ref_name, sizeof(ring_ref_name),
+				 "tx-ring-ref%u", i);
+
+			err = xenbus_scanf(XBT_NIL, dev->otherend,
+					   ring_ref_name, "%lu",
+					   &tx_ring_ref[i]);
+			if (err < 0) {
+				xenbus_dev_fatal(dev, err,
+						 "reading %s/%s",
+						 dev->otherend,
+						 ring_ref_name);
+				return err;
+			}
+		}
+	}
+
+	err = xenbus_scanf(XBT_NIL, dev->otherend, "rx-ring-order", "%u",
+			   &rx_ring_order);
+	if (err < 0) {
+		rx_ring_order = 0;
+		err = xenbus_scanf(XBT_NIL, dev->otherend, "rx-ring-ref", "%lu",
+				   &rx_ring_ref[0]);
+		if (err < 0) {
+			xenbus_dev_fatal(dev, err, "reading %s/rx-ring-ref",
+					 dev->otherend);
+			return err;
+		}
+	} else {
+		unsigned int i;
+
+		if (rx_ring_order > MODPARM_netback_max_rx_ring_page_order) {
+			err = -EINVAL;
+
+			xenbus_dev_fatal(dev, err,
+					 "%s/rx-ring-page-order too big",
+					 dev->otherend);
+			return err;
+		}
+
+		for (i = 0; i < (1U << rx_ring_order); i++) {
+			char ring_ref_name[sizeof("rx-ring-ref") + 2];
+
+			snprintf(ring_ref_name, sizeof(ring_ref_name),
+				 "rx-ring-ref%u", i);
+
+			err = xenbus_scanf(XBT_NIL, dev->otherend,
+					   ring_ref_name, "%lu",
+					   &rx_ring_ref[i]);
+			if (err < 0) {
+				xenbus_dev_fatal(dev, err,
+						 "reading %s/%s",
+						 dev->otherend,
+						 ring_ref_name);
+				return err;
+			}
+		}
+	}
+
 	err = xenbus_scanf(XBT_NIL, dev->otherend, "request-rx-copy", "%u",
 			   &rx_copy);
 	if (err == -ENOENT) {
@@ -453,11 +556,23 @@ static int connect_rings(struct backend_info *be)
 	vif->csum = !val;
 
 	/* Map the shared frame, irq etc. */
-	err = xenvif_connect(vif, tx_ring_ref, rx_ring_ref, evtchn);
+	err = xenvif_connect(vif,
+			     tx_ring_ref, (1U << tx_ring_order),
+			     rx_ring_ref, (1U << rx_ring_order),
+			     evtchn);
 	if (err) {
+		int i;
 		xenbus_dev_fatal(dev, err,
-				 "mapping shared-frames %lu/%lu port %u",
-				 tx_ring_ref, rx_ring_ref, evtchn);
+				 "binding port %u",
+				 evtchn);
+		for (i = 0; i < (1U << tx_ring_order); i++)
+			xenbus_dev_fatal(dev, err,
+					 "mapping tx ring handle: %lu",
+					 tx_ring_ref[i]);
+		for (i = 0; i < (1U << rx_ring_order); i++)
+			xenbus_dev_fatal(dev, err,
+					 "mapping rx ring handle: %lu",
+					 rx_ring_ref[i]);
 		return err;
 	}
 	return 0;
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [RFC PATCH V3 13/16] netback: stub for multi receive protocol support.
  2012-01-30 14:45 [RFC PATCH V3] Xen netback / netfront improvement Wei Liu
                   ` (11 preceding siblings ...)
  2012-01-30 14:45 ` [RFC PATCH V3 12/16] netback: multi-page ring support Wei Liu
@ 2012-01-30 14:45 ` Wei Liu
  2012-01-30 21:47   ` [Xen-devel] " Konrad Rzeszutek Wilk
  2012-01-30 14:45 ` [RFC PATCH V3 14/16] netback: split event channels support Wei Liu
                   ` (2 subsequent siblings)
  15 siblings, 1 reply; 59+ messages in thread
From: Wei Liu @ 2012-01-30 14:45 UTC (permalink / raw)
  To: netdev, xen-devel; +Cc: ian.campbell, konrad.wilk, Wei Liu

Refactor netback and add stubs for multiple receive protocols. Also
stub the existing code as protocol 0.

Now the file layout becomes:

 - interface.c: xenvif interfaces
 - xenbus.c: xenbus related functions
 - netback.c: common functions for various protocols

For different protocols:

 - xenvif_rx_protocolX.h: header file for the protocol, including
                          protocol structures and functions
 - xenvif_rx_protocolX.c: implementations

To add a new protocol:

 - include protocol header in common.h
 - modify XENVIF_MAX_RX_PROTOCOL in common.h
 - add protocol structure in xenvif.rx union
 - stub in xenbus.c
 - modify Makefile

A protocol should define five functions (see the sketch after this list):

 - setup: setup frontend / backend ring connections
 - teardown: teardown frontend / backend ring connections
 - start_xmit: host start xmit (i.e. the guest needs to do rx)
 - event: rx completion event
 - action: prepare host side data for guest rx
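
As a hypothetical illustration (the xenvif_rx_protocol1 /
xenvif_p1_* names below are invented for this sketch and are not part
of the series; the hook signatures match the function pointers added
to struct xenvif), a new protocol's header would look roughly like:

	/* drivers/net/xen-netback/xenvif_rx_protocol1.h (hypothetical) */
	#ifndef __XENVIF_RX_PROTOCOL1_H__
	#define __XENVIF_RX_PROTOCOL1_H__

	struct xenvif;
	struct sk_buff;

	struct xenvif_rx_protocol1 {
		/* protocol-private ring state goes here */
	};

	int  xenvif_p1_setup(struct xenvif *vif);
	void xenvif_p1_teardown(struct xenvif *vif);
	void xenvif_p1_start_xmit(struct xenvif *vif, struct sk_buff *skb);
	void xenvif_p1_event(struct xenvif *vif);
	void xenvif_p1_action(struct xenvif *vif);

	#endif /* __XENVIF_RX_PROTOCOL1_H__ */

xenvif_connect() would then assign these five hooks in its switch on
rx_protocol, the union in struct xenvif would gain a
struct xenvif_rx_protocol1 member, XENVIF_MAX_RX_PROTOCOL would be
bumped to 1, and the new .c file would be added to the Makefile.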

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 drivers/net/xen-netback/Makefile              |    2 +-
 drivers/net/xen-netback/common.h              |   34 +-
 drivers/net/xen-netback/interface.c           |   49 +-
 drivers/net/xen-netback/netback.c             |  528 +---------------------
 drivers/net/xen-netback/xenbus.c              |    8 +-
 drivers/net/xen-netback/xenvif_rx_protocol0.c |  616 +++++++++++++++++++++++++
 drivers/net/xen-netback/xenvif_rx_protocol0.h |   53 +++
 7 files changed, 732 insertions(+), 558 deletions(-)
 create mode 100644 drivers/net/xen-netback/xenvif_rx_protocol0.c
 create mode 100644 drivers/net/xen-netback/xenvif_rx_protocol0.h

diff --git a/drivers/net/xen-netback/Makefile b/drivers/net/xen-netback/Makefile
index dc4b8b1..fed8add 100644
--- a/drivers/net/xen-netback/Makefile
+++ b/drivers/net/xen-netback/Makefile
@@ -1,3 +1,3 @@
 obj-$(CONFIG_XEN_NETDEV_BACKEND) := xen-netback.o
 
-xen-netback-y := netback.o xenbus.o interface.o page_pool.o
+xen-netback-y := netback.o xenbus.o interface.o page_pool.o xenvif_rx_protocol0.o
diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index 3cf9b8f..f3d95b3 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -46,6 +46,7 @@
 #include <xen/xenbus.h>
 
 #include "page_pool.h"
+#include "xenvif_rx_protocol0.h"
 
 struct xenvif_rx_meta {
 	int id;
@@ -79,6 +80,9 @@ struct xen_comms {
 	unsigned int      nr_handles;
 };
 
+#define XENVIF_MIN_RX_PROTOCOL 0
+#define XENVIF_MAX_RX_PROTOCOL 0
+
 struct xenvif {
 	/* Unique identifier for this interface. */
 	domid_t          domid;
@@ -99,9 +103,13 @@ struct xenvif {
 	/* Physical parameters of the comms window. */
 	unsigned int     irq;
 
-	/* The shared rings and indexes. */
+	/* The shared tx ring and index. */
 	struct xen_netif_tx_back_ring tx;
-	struct xen_netif_rx_back_ring rx;
+
+	/* Multi receive protocol support */
+	union {
+		struct xenvif_rx_protocol0 p0;
+	} rx;
 
 	/* Frontend feature information. */
 	u8 can_sg:1;
@@ -112,13 +120,6 @@ struct xenvif {
 	/* Internal feature information. */
 	u8 can_queue:1;	    /* can queue packets for receiver? */
 
-	/*
-	 * Allow xenvif_start_xmit() to peek ahead in the rx request
-	 * ring.  This is a prediction of what rx_req_cons will be
-	 * once all queued skbs are put on the ring.
-	 */
-	RING_IDX rx_req_cons_peek;
-
 	/* Transmit shaping: allow 'credit_bytes' every 'credit_usec'. */
 	unsigned long   credit_bytes;
 	unsigned long   credit_usec;
@@ -128,6 +129,13 @@ struct xenvif {
 	/* Statistics */
 	unsigned long rx_gso_checksum_fixup;
 
+	/* Hooks for multi receive protocol support */
+	int  (*setup)(struct xenvif *);
+	void (*start_xmit)(struct xenvif *, struct sk_buff *);
+	void (*teardown)(struct xenvif *);
+	void (*event)(struct xenvif *);
+	void (*action)(struct xenvif *);
+
 	/* Miscellaneous private stuff. */
 	struct net_device *dev;
 
@@ -154,7 +162,7 @@ struct xenvif *xenvif_alloc(struct device *parent,
 int xenvif_connect(struct xenvif *vif,
 		   unsigned long tx_ring_ref[], unsigned int tx_ring_order,
 		   unsigned long rx_ring_ref[], unsigned int rx_ring_order,
-		   unsigned int evtchn);
+		   unsigned int evtchn, unsigned int rx_protocol);
 void xenvif_disconnect(struct xenvif *vif);
 
 int xenvif_xenbus_init(void);
@@ -178,8 +186,6 @@ void xenvif_check_rx_xenvif(struct xenvif *vif);
 
 /* Queue an SKB for transmission to the frontend */
 void xenvif_queue_tx_skb(struct xenvif *vif, struct sk_buff *skb);
-/* Notify xenvif that ring now has space to send an skb to the frontend */
-void xenvif_notify_tx_completion(struct xenvif *vif);
 
 /* Returns number of ring slots required to send an skb to the frontend */
 unsigned int xenvif_count_skb_slots(struct xenvif *vif, struct sk_buff *skb);
@@ -188,7 +194,11 @@ int xenvif_tx_action(struct xenvif *vif, int budget);
 void xenvif_rx_action(struct xenvif *vif);
 
 int xenvif_kthread(void *data);
+void xenvif_kick_thread(struct xenvif *vif);
+
+int xenvif_max_required_rx_slots(struct xenvif *vif);
 
+extern unsigned int MODPARM_netback_max_rx_protocol;
 extern unsigned int MODPARM_netback_max_tx_ring_page_order;
 extern unsigned int MODPARM_netback_max_rx_ring_page_order;
 
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index 29f4fd9..0f05f03 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -46,17 +46,12 @@ int xenvif_schedulable(struct xenvif *vif)
 	return netif_running(vif->dev) && netif_carrier_ok(vif->dev);
 }
 
-static int xenvif_rx_schedulable(struct xenvif *vif)
-{
-	return xenvif_schedulable(vif) && !xenvif_rx_ring_full(vif);
-}
-
 static irqreturn_t xenvif_interrupt(int irq, void *dev_id)
 {
 	struct xenvif *vif = dev_id;
 
-	if (xenvif_rx_schedulable(vif))
-		netif_wake_queue(vif->dev);
+	if (xenvif_schedulable(vif) && vif->event != NULL)
+		vif->event(vif);
 
 	if (RING_HAS_UNCONSUMED_REQUESTS(&vif->tx))
 		napi_schedule(&vif->napi);
@@ -100,17 +95,11 @@ static int xenvif_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	if (vif->task == NULL)
 		goto drop;
 
-	/* Drop the packet if the target domain has no receive buffers. */
-	if (!xenvif_rx_schedulable(vif))
+	/* Drop the packet if vif does not support transmit */
+	if (vif->start_xmit == NULL)
 		goto drop;
 
-	/* Reserve ring slots for the worst-case number of fragments. */
-	vif->rx_req_cons_peek += xenvif_count_skb_slots(vif, skb);
-
-	if (vif->can_queue && xenvif_must_stop_queue(vif))
-		netif_stop_queue(dev);
-
-	xenvif_queue_tx_skb(vif, skb);
+	vif->start_xmit(vif, skb);
 
 	return NETDEV_TX_OK;
 
@@ -120,12 +109,6 @@ static int xenvif_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	return NETDEV_TX_OK;
 }
 
-void xenvif_notify_tx_completion(struct xenvif *vif)
-{
-	if (netif_queue_stopped(vif->dev) && xenvif_rx_schedulable(vif))
-		netif_wake_queue(vif->dev);
-}
-
 static struct net_device_stats *xenvif_get_stats(struct net_device *dev)
 {
 	struct xenvif *vif = netdev_priv(dev);
@@ -325,11 +308,10 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
 int xenvif_connect(struct xenvif *vif,
 		   unsigned long tx_ring_ref[], unsigned int tx_ring_ref_count,
 		   unsigned long rx_ring_ref[], unsigned int rx_ring_ref_count,
-		   unsigned int evtchn)
+		   unsigned int evtchn, unsigned int rx_protocol)
 {
 	int err = -ENOMEM;
 	struct xen_netif_tx_sring *txs;
-	struct xen_netif_rx_sring *rxs;
 
 	/* Already connected through? */
 	if (vif->irq)
@@ -348,8 +330,20 @@ int xenvif_connect(struct xenvif *vif,
 					rx_ring_ref, rx_ring_ref_count);
 	if (err)
 		goto err_tx_unmap;
-	rxs = (struct xen_netif_rx_sring *)vif->rx_comms.ring_area->addr;
-	BACK_RING_INIT(&vif->rx, rxs, PAGE_SIZE * rx_ring_ref_count);
+	switch (rx_protocol) {
+	case 0:
+		vif->setup = xenvif_p0_setup;
+		vif->start_xmit = xenvif_p0_start_xmit;
+		vif->teardown = xenvif_p0_teardown;
+		vif->event = xenvif_p0_event;
+		vif->action = xenvif_p0_action;
+		break;
+	default:
+		err = -EOPNOTSUPP;
+		goto err_rx_unmap;
+	}
+	if (vif->setup(vif))
+		goto err_rx_unmap;
 
 	err = bind_interdomain_evtchn_to_irqhandler(
 		vif->domid, evtchn, xenvif_interrupt, 0,
@@ -422,6 +416,9 @@ void xenvif_disconnect(struct xenvif *vif)
 	xenvif_unmap_frontend_rings(&vif->tx_comms);
 	xenvif_unmap_frontend_rings(&vif->rx_comms);
 
+	if (vif->teardown)
+		vif->teardown(vif);
+
 	free_netdev(vif->dev);
 
 	if (need_module_put)
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 96f354c..2ea43d4 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -49,6 +49,12 @@
 #include <asm/xen/hypercall.h>
 #include <asm/xen/page.h>
 
+unsigned int MODPARM_netback_max_rx_protocol = XENVIF_MAX_RX_PROTOCOL;
+module_param_named(netback_max_rx_protocol,
+		   MODPARM_netback_max_rx_protocol, uint, 0);
+MODULE_PARM_DESC(netback_max_rx_protocol,
+		 "Maximum supported receiver protocol version");
+
 unsigned int MODPARM_netback_max_rx_ring_page_order = NETBK_MAX_RING_PAGE_ORDER;
 module_param_named(netback_max_rx_ring_page_order,
 		   MODPARM_netback_max_rx_ring_page_order, uint, 0);
@@ -79,13 +85,6 @@ static void make_tx_response(struct xenvif *vif,
 static inline int tx_work_todo(struct xenvif *vif);
 static inline int rx_work_todo(struct xenvif *vif);
 
-static struct xen_netif_rx_response *make_rx_response(struct xenvif *vif,
-					     u16      id,
-					     s8       st,
-					     u16      offset,
-					     u16      size,
-					     u16      flags);
-
 static inline unsigned long idx_to_pfn(struct xenvif *vif,
 				       u16 idx)
 {
@@ -129,7 +128,7 @@ static inline pending_ring_idx_t nr_pending_reqs(struct xenvif *vif)
 		vif->pending_prod + vif->pending_cons;
 }
 
-static int max_required_rx_slots(struct xenvif *vif)
+int xenvif_max_required_rx_slots(struct xenvif *vif)
 {
 	int max = DIV_ROUND_UP(vif->dev->mtu, PAGE_SIZE);
 
@@ -139,495 +138,11 @@ static int max_required_rx_slots(struct xenvif *vif)
 	return max;
 }
 
-int xenvif_rx_ring_full(struct xenvif *vif)
-{
-	RING_IDX peek   = vif->rx_req_cons_peek;
-	RING_IDX needed = max_required_rx_slots(vif);
-	struct xen_comms *comms = &vif->rx_comms;
-
-	return ((vif->rx.sring->req_prod - peek) < needed) ||
-	       ((vif->rx.rsp_prod_pvt +
-		 NETBK_RX_RING_SIZE(comms->nr_handles) - peek) < needed);
-}
-
-int xenvif_must_stop_queue(struct xenvif *vif)
-{
-	if (!xenvif_rx_ring_full(vif))
-		return 0;
-
-	vif->rx.sring->req_event = vif->rx_req_cons_peek +
-		max_required_rx_slots(vif);
-	mb(); /* request notification /then/ check the queue */
-
-	return xenvif_rx_ring_full(vif);
-}
-
-/*
- * Returns true if we should start a new receive buffer instead of
- * adding 'size' bytes to a buffer which currently contains 'offset'
- * bytes.
- */
-static bool start_new_rx_buffer(int offset, unsigned long size, int head)
-{
-	/* simple case: we have completely filled the current buffer. */
-	if (offset == MAX_BUFFER_OFFSET)
-		return true;
-
-	/*
-	 * complex case: start a fresh buffer if the current frag
-	 * would overflow the current buffer but only if:
-	 *     (i)   this frag would fit completely in the next buffer
-	 * and (ii)  there is already some data in the current buffer
-	 * and (iii) this is not the head buffer.
-	 *
-	 * Where:
-	 * - (i) stops us splitting a frag into two copies
-	 *   unless the frag is too large for a single buffer.
-	 * - (ii) stops us from leaving a buffer pointlessly empty.
-	 * - (iii) stops us leaving the first buffer
-	 *   empty. Strictly speaking this is already covered
-	 *   by (ii) but is explicitly checked because
-	 *   netfront relies on the first buffer being
-	 *   non-empty and can crash otherwise.
-	 *
-	 * This means we will effectively linearise small
-	 * frags but do not needlessly split large buffers
-	 * into multiple copies tend to give large frags their
-	 * own buffers as before.
-	 */
-	if ((offset + size > MAX_BUFFER_OFFSET) &&
-	    (size <= MAX_BUFFER_OFFSET) && offset && !head)
-		return true;
-
-	return false;
-}
-
-/*
- * Figure out how many ring slots we're going to need to send @skb to
- * the guest. This function is essentially a dry run of
- * xenvif_gop_frag_copy.
- */
-unsigned int xenvif_count_skb_slots(struct xenvif *vif, struct sk_buff *skb)
-{
-	unsigned int count;
-	int i, copy_off;
-
-	count = DIV_ROUND_UP(
-			offset_in_page(skb->data)+skb_headlen(skb), PAGE_SIZE);
-
-	copy_off = skb_headlen(skb) % PAGE_SIZE;
-
-	if (skb_shinfo(skb)->gso_size)
-		count++;
-
-	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
-		unsigned long size = skb_frag_size(&skb_shinfo(skb)->frags[i]);
-		unsigned long bytes;
-		while (size > 0) {
-			BUG_ON(copy_off > MAX_BUFFER_OFFSET);
-
-			if (start_new_rx_buffer(copy_off, size, 0)) {
-				count++;
-				copy_off = 0;
-			}
-
-			bytes = size;
-			if (copy_off + bytes > MAX_BUFFER_OFFSET)
-				bytes = MAX_BUFFER_OFFSET - copy_off;
-
-			copy_off += bytes;
-			size -= bytes;
-		}
-	}
-	return count;
-}
-
-struct netrx_pending_operations {
-	unsigned copy_prod, copy_cons;
-	unsigned meta_prod, meta_cons;
-	struct gnttab_copy *copy;
-	struct xenvif_rx_meta *meta;
-	int copy_off;
-	grant_ref_t copy_gref;
-};
-
-static struct xenvif_rx_meta *get_next_rx_buffer(struct xenvif *vif,
-					struct netrx_pending_operations *npo)
-{
-	struct xenvif_rx_meta *meta;
-	struct xen_netif_rx_request *req;
-
-	req = RING_GET_REQUEST(&vif->rx, vif->rx.req_cons++);
-
-	meta = npo->meta + npo->meta_prod++;
-	meta->gso_size = 0;
-	meta->size = 0;
-	meta->id = req->id;
-
-	npo->copy_off = 0;
-	npo->copy_gref = req->gref;
-
-	return meta;
-}
-
-/*
- * Set up the grant operations for this fragment. If it's a flipping
- * interface, we also set up the unmap request from here.
- */
-static void xenvif_gop_frag_copy(struct xenvif *vif, struct sk_buff *skb,
-				 struct netrx_pending_operations *npo,
-				 struct page *page, unsigned long size,
-				 unsigned long offset, int *head)
-{
-	struct gnttab_copy *copy_gop;
-	struct xenvif_rx_meta *meta;
-	/*
-	 * These variables are used iff get_page_ext returns true,
-	 * in which case they are guaranteed to be initialized.
-	 */
-	unsigned int uninitialized_var(idx);
-	int foreign = is_in_pool(page, &idx);
-	unsigned long bytes;
-
-	/* Data must not cross a page boundary. */
-	BUG_ON(size + offset > PAGE_SIZE);
-
-	meta = npo->meta + npo->meta_prod - 1;
-
-	while (size > 0) {
-		BUG_ON(npo->copy_off > MAX_BUFFER_OFFSET);
-
-		if (start_new_rx_buffer(npo->copy_off, size, *head)) {
-			/*
-			 * Netfront requires there to be some data in the head
-			 * buffer.
-			 */
-			BUG_ON(*head);
-
-			meta = get_next_rx_buffer(vif, npo);
-		}
-
-		bytes = size;
-		if (npo->copy_off + bytes > MAX_BUFFER_OFFSET)
-			bytes = MAX_BUFFER_OFFSET - npo->copy_off;
-
-		copy_gop = npo->copy + npo->copy_prod++;
-		copy_gop->flags = GNTCOPY_dest_gref;
-		if (foreign) {
-			struct pending_tx_info *src_pend = to_txinfo(idx);
-			struct xenvif *rvif = to_vif(idx);
-
-			copy_gop->source.domid = rvif->domid;
-			copy_gop->source.u.ref = src_pend->req.gref;
-			copy_gop->flags |= GNTCOPY_source_gref;
-		} else {
-			void *vaddr = page_address(page);
-			copy_gop->source.domid = DOMID_SELF;
-			copy_gop->source.u.gmfn = virt_to_mfn(vaddr);
-		}
-		copy_gop->source.offset = offset;
-		copy_gop->dest.domid = vif->domid;
-
-		copy_gop->dest.offset = npo->copy_off;
-		copy_gop->dest.u.ref = npo->copy_gref;
-		copy_gop->len = bytes;
-
-		npo->copy_off += bytes;
-		meta->size += bytes;
-
-		offset += bytes;
-		size -= bytes;
-
-		/* Leave a gap for the GSO descriptor. */
-		if (*head && skb_shinfo(skb)->gso_size && !vif->gso_prefix)
-			vif->rx.req_cons++;
-
-		*head = 0; /* There must be something in this buffer now. */
-
-	}
-}
-
-/*
- * Prepare an SKB to be transmitted to the frontend.
- *
- * This function is responsible for allocating grant operations, meta
- * structures, etc.
- *
- * It returns the number of meta structures consumed. The number of
- * ring slots used is always equal to the number of meta slots used
- * plus the number of GSO descriptors used. Currently, we use either
- * zero GSO descriptors (for non-GSO packets) or one descriptor (for
- * frontend-side LRO).
- */
-static int xenvif_gop_skb(struct sk_buff *skb,
-			  struct netrx_pending_operations *npo)
-{
-	struct xenvif *vif = netdev_priv(skb->dev);
-	int nr_frags = skb_shinfo(skb)->nr_frags;
-	int i;
-	struct xen_netif_rx_request *req;
-	struct xenvif_rx_meta *meta;
-	unsigned char *data;
-	int head = 1;
-	int old_meta_prod;
-
-	old_meta_prod = npo->meta_prod;
-
-	/* Set up a GSO prefix descriptor, if necessary */
-	if (skb_shinfo(skb)->gso_size && vif->gso_prefix) {
-		req = RING_GET_REQUEST(&vif->rx, vif->rx.req_cons++);
-		meta = npo->meta + npo->meta_prod++;
-		meta->gso_size = skb_shinfo(skb)->gso_size;
-		meta->size = 0;
-		meta->id = req->id;
-	}
-
-	req = RING_GET_REQUEST(&vif->rx, vif->rx.req_cons++);
-	meta = npo->meta + npo->meta_prod++;
-
-	if (!vif->gso_prefix)
-		meta->gso_size = skb_shinfo(skb)->gso_size;
-	else
-		meta->gso_size = 0;
-
-	meta->size = 0;
-	meta->id = req->id;
-	npo->copy_off = 0;
-	npo->copy_gref = req->gref;
-
-	data = skb->data;
-	while (data < skb_tail_pointer(skb)) {
-		unsigned int offset = offset_in_page(data);
-		unsigned int len = PAGE_SIZE - offset;
-
-		if (data + len > skb_tail_pointer(skb))
-			len = skb_tail_pointer(skb) - data;
-
-		xenvif_gop_frag_copy(vif, skb, npo,
-				     virt_to_page(data), len, offset, &head);
-		data += len;
-	}
-
-	for (i = 0; i < nr_frags; i++) {
-		xenvif_gop_frag_copy(vif, skb, npo,
-				     skb_frag_page(&skb_shinfo(skb)->frags[i]),
-				     skb_frag_size(&skb_shinfo(skb)->frags[i]),
-				     skb_shinfo(skb)->frags[i].page_offset,
-				     &head);
-	}
-
-	return npo->meta_prod - old_meta_prod;
-}
-
-/*
- * This is a twin to xenvif_gop_skb.  Assume that xenvif_gop_skb was
- * used to set up the operations on the top of
- * netrx_pending_operations, which have since been done.  Check that
- * they didn't give any errors and advance over them.
- */
-static int xenvif_check_gop(struct xenvif *vif, int nr_meta_slots,
-			    struct netrx_pending_operations *npo)
-{
-	struct gnttab_copy     *copy_op;
-	int status = XEN_NETIF_RSP_OKAY;
-	int i;
-
-	for (i = 0; i < nr_meta_slots; i++) {
-		copy_op = npo->copy + npo->copy_cons++;
-		if (copy_op->status != GNTST_okay) {
-			netdev_dbg(vif->dev,
-				   "Bad status %d from copy to DOM%d.\n",
-				   copy_op->status, vif->domid);
-			status = XEN_NETIF_RSP_ERROR;
-		}
-	}
-
-	return status;
-}
-
-static void xenvif_add_frag_responses(struct xenvif *vif, int status,
-				      struct xenvif_rx_meta *meta,
-				      int nr_meta_slots)
-{
-	int i;
-	unsigned long offset;
-
-	/* No fragments used */
-	if (nr_meta_slots <= 1)
-		return;
-
-	nr_meta_slots--;
-
-	for (i = 0; i < nr_meta_slots; i++) {
-		int flags;
-		if (i == nr_meta_slots - 1)
-			flags = 0;
-		else
-			flags = XEN_NETRXF_more_data;
-
-		offset = 0;
-		make_rx_response(vif, meta[i].id, status, offset,
-				 meta[i].size, flags);
-	}
-}
-
-struct skb_cb_overlay {
-	int meta_slots_used;
-};
-
-static void xenvif_kick_thread(struct xenvif *vif)
+void xenvif_kick_thread(struct xenvif *vif)
 {
 	wake_up(&vif->wq);
 }
 
-void xenvif_rx_action(struct xenvif *vif)
-{
-	s8 status;
-	u16 flags;
-	struct xen_netif_rx_response *resp;
-	struct sk_buff_head rxq;
-	struct sk_buff *skb;
-	LIST_HEAD(notify);
-	int ret;
-	int nr_frags;
-	int count;
-	unsigned long offset;
-	struct skb_cb_overlay *sco;
-	int need_to_notify = 0;
-	struct xen_comms *comms = &vif->rx_comms;
-
-	struct gnttab_copy *gco = get_cpu_var(grant_copy_op);
-	struct xenvif_rx_meta *m = get_cpu_var(meta);
-
-	struct netrx_pending_operations npo = {
-		.copy  = gco,
-		.meta  = m,
-	};
-
-	if (gco == NULL || m == NULL) {
-		put_cpu_var(grant_copy_op);
-		put_cpu_var(meta);
-		printk(KERN_ALERT "netback: CPU %x scratch space is not usable,"
-		       " not doing any TX work for vif%u.%u\n",
-		       smp_processor_id(), vif->domid, vif->handle);
-		return;
-	}
-
-	skb_queue_head_init(&rxq);
-
-	count = 0;
-
-	while ((skb = skb_dequeue(&vif->rx_queue)) != NULL) {
-		vif = netdev_priv(skb->dev);
-		nr_frags = skb_shinfo(skb)->nr_frags;
-
-		sco = (struct skb_cb_overlay *)skb->cb;
-		sco->meta_slots_used = xenvif_gop_skb(skb, &npo);
-
-		count += nr_frags + 1;
-
-		__skb_queue_tail(&rxq, skb);
-
-		/* Filled the batch queue? */
-		if (count + MAX_SKB_FRAGS >=
-		    NETBK_RX_RING_SIZE(comms->nr_handles))
-			break;
-	}
-
-	BUG_ON(npo.meta_prod > MAX_PENDING_REQS);
-
-	if (!npo.copy_prod) {
-		put_cpu_var(grant_copy_op);
-		put_cpu_var(meta);
-		return;
-	}
-
-	BUG_ON(npo.copy_prod > (2 * NETBK_MAX_RX_RING_SIZE));
-	ret = HYPERVISOR_grant_table_op(GNTTABOP_copy, gco,
-					npo.copy_prod);
-	BUG_ON(ret != 0);
-
-	while ((skb = __skb_dequeue(&rxq)) != NULL) {
-		sco = (struct skb_cb_overlay *)skb->cb;
-
-		if (m[npo.meta_cons].gso_size && vif->gso_prefix) {
-			resp = RING_GET_RESPONSE(&vif->rx,
-						vif->rx.rsp_prod_pvt++);
-
-			resp->flags = XEN_NETRXF_gso_prefix | XEN_NETRXF_more_data;
-
-			resp->offset = m[npo.meta_cons].gso_size;
-			resp->id = m[npo.meta_cons].id;
-			resp->status = sco->meta_slots_used;
-
-			npo.meta_cons++;
-			sco->meta_slots_used--;
-		}
-
-
-		vif->dev->stats.tx_bytes += skb->len;
-		vif->dev->stats.tx_packets++;
-
-		status = xenvif_check_gop(vif, sco->meta_slots_used, &npo);
-
-		if (sco->meta_slots_used == 1)
-			flags = 0;
-		else
-			flags = XEN_NETRXF_more_data;
-
-		if (skb->ip_summed == CHECKSUM_PARTIAL) /* local packet? */
-			flags |= XEN_NETRXF_csum_blank | XEN_NETRXF_data_validated;
-		else if (skb->ip_summed == CHECKSUM_UNNECESSARY)
-			/* remote but checksummed. */
-			flags |= XEN_NETRXF_data_validated;
-
-		offset = 0;
-		resp = make_rx_response(vif, m[npo.meta_cons].id,
-					status, offset,
-					m[npo.meta_cons].size,
-					flags);
-
-		if (m[npo.meta_cons].gso_size && !vif->gso_prefix) {
-			struct xen_netif_extra_info *gso =
-				(struct xen_netif_extra_info *)
-				RING_GET_RESPONSE(&vif->rx,
-						  vif->rx.rsp_prod_pvt++);
-
-			resp->flags |= XEN_NETRXF_extra_info;
-
-			gso->u.gso.size = m[npo.meta_cons].gso_size;
-			gso->u.gso.type = XEN_NETIF_GSO_TYPE_TCPV4;
-			gso->u.gso.pad = 0;
-			gso->u.gso.features = 0;
-
-			gso->type = XEN_NETIF_EXTRA_TYPE_GSO;
-			gso->flags = 0;
-		}
-
-		xenvif_add_frag_responses(vif, status,
-					  m + npo.meta_cons + 1,
-					  sco->meta_slots_used);
-
-		RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(&vif->rx, ret);
-		if (ret)
-			need_to_notify = 1;
-
-		xenvif_notify_tx_completion(vif);
-
-		npo.meta_cons += sco->meta_slots_used;
-		dev_kfree_skb(skb);
-	}
-
-	if (need_to_notify)
-		notify_remote_via_irq(vif->irq);
-
-	if (!skb_queue_empty(&vif->rx_queue))
-		xenvif_kick_thread(vif);
-
-	put_cpu_var(grant_copy_op);
-	put_cpu_var(meta);
-}
-
 void xenvif_queue_tx_skb(struct xenvif *vif, struct sk_buff *skb)
 {
 	skb_queue_tail(&vif->rx_queue, skb);
@@ -1383,29 +898,6 @@ static void make_tx_response(struct xenvif *vif,
 		notify_remote_via_irq(vif->irq);
 }
 
-static struct xen_netif_rx_response *make_rx_response(struct xenvif *vif,
-					     u16      id,
-					     s8       st,
-					     u16      offset,
-					     u16      size,
-					     u16      flags)
-{
-	RING_IDX i = vif->rx.rsp_prod_pvt;
-	struct xen_netif_rx_response *resp;
-
-	resp = RING_GET_RESPONSE(&vif->rx, i);
-	resp->offset     = offset;
-	resp->flags      = flags;
-	resp->id         = id;
-	resp->status     = (s16)size;
-	if (st < 0)
-		resp->status = (s16)st;
-
-	vif->rx.rsp_prod_pvt = ++i;
-
-	return resp;
-}
-
 static inline int rx_work_todo(struct xenvif *vif)
 {
 	return !skb_queue_empty(&vif->rx_queue);
@@ -1507,8 +999,8 @@ int xenvif_kthread(void *data)
 		if (kthread_should_stop())
 			break;
 
-		if (rx_work_todo(vif))
-			xenvif_rx_action(vif);
+		if (rx_work_todo(vif) && vif->action)
+			vif->action(vif);
 	}
 
 	return 0;
diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
index 79499fc..4067286 100644
--- a/drivers/net/xen-netback/xenbus.c
+++ b/drivers/net/xen-netback/xenbus.c
@@ -415,6 +415,7 @@ static int connect_rings(struct backend_info *be)
 	unsigned long rx_ring_ref[NETBK_MAX_RING_PAGES];
 	unsigned int  tx_ring_order;
 	unsigned int  rx_ring_order;
+	unsigned int  rx_protocol;
 
 	err = xenbus_gather(XBT_NIL, dev->otherend,
 			    "event-channel", "%u", &evtchn, NULL);
@@ -510,6 +511,11 @@ static int connect_rings(struct backend_info *be)
 		}
 	}
 
+	err = xenbus_scanf(XBT_NIL, dev->otherend, "rx-protocol",
+			   "%u", &rx_protocol);
+	if (err < 0)
+		rx_protocol = XENVIF_MIN_RX_PROTOCOL;
+
 	err = xenbus_scanf(XBT_NIL, dev->otherend, "request-rx-copy", "%u",
 			   &rx_copy);
 	if (err == -ENOENT) {
@@ -559,7 +565,7 @@ static int connect_rings(struct backend_info *be)
 	err = xenvif_connect(vif,
 			     tx_ring_ref, (1U << tx_ring_order),
 			     rx_ring_ref, (1U << rx_ring_order),
-			     evtchn);
+			     evtchn, rx_protocol);
 	if (err) {
 		int i;
 		xenbus_dev_fatal(dev, err,
diff --git a/drivers/net/xen-netback/xenvif_rx_protocol0.c b/drivers/net/xen-netback/xenvif_rx_protocol0.c
new file mode 100644
index 0000000..3c95d65
--- /dev/null
+++ b/drivers/net/xen-netback/xenvif_rx_protocol0.c
@@ -0,0 +1,616 @@
+/*
+ * netback rx protocol 0 implementation.
+ *
+ * Copyright (c) 2012, Citrix Systems Inc.
+ *
+ * Author: Wei Liu <wei.liu2@citrix.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version 2
+ * as published by the Free Software Foundation; or, when distributed
+ * separately from the Linux kernel or incorporated into other
+ * software packages, subject to the following license:
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this source file (the "Software"), to deal in the Software without
+ * restriction, including without limitation the rights to use, copy, modify,
+ * merge, publish, distribute, sublicense, and/or sell copies of the Software,
+ * and to permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include "common.h"
+
+#include <xen/events.h>
+#include <xen/interface/memory.h>
+
+#include <asm/xen/hypercall.h>
+#include <asm/xen/page.h>
+
+struct xenvif_rx_meta;
+
+#define MAX_BUFFER_OFFSET PAGE_SIZE
+
+DECLARE_PER_CPU(struct gnttab_copy *, grant_copy_op);
+DECLARE_PER_CPU(struct xenvif_rx_meta *, meta);
+
+struct netrx_pending_operations {
+	unsigned copy_prod, copy_cons;
+	unsigned meta_prod, meta_cons;
+	struct gnttab_copy *copy;
+	struct xenvif_rx_meta *meta;
+	int copy_off;
+	grant_ref_t copy_gref;
+};
+
+struct skb_cb_overlay {
+	int meta_slots_used;
+};
+
+static struct xen_netif_rx_response *make_rx_response(struct xenvif *vif,
+					     u16      id,
+					     s8       st,
+					     u16      offset,
+					     u16      size,
+					     u16      flags)
+{
+	RING_IDX i = vif->rx.p0.back.rsp_prod_pvt;
+	struct xen_netif_rx_response *resp;
+
+	resp = RING_GET_RESPONSE(&vif->rx.p0.back, i);
+	resp->offset     = offset;
+	resp->flags      = flags;
+	resp->id         = id;
+	resp->status     = (s16)size;
+	if (st < 0)
+		resp->status = (s16)st;
+
+	vif->rx.p0.back.rsp_prod_pvt = ++i;
+
+	return resp;
+}
+
+int xenvif_rx_ring_full(struct xenvif *vif)
+{
+	RING_IDX peek   = vif->rx.p0.rx_req_cons_peek;
+	RING_IDX needed = xenvif_max_required_rx_slots(vif);
+	struct xen_comms *comms = &vif->rx_comms;
+
+	return ((vif->rx.p0.back.sring->req_prod - peek) < needed) ||
+		((vif->rx.p0.back.rsp_prod_pvt +
+		  NETBK_RX_RING_SIZE(comms->nr_handles) - peek) < needed);
+}
+
+int xenvif_must_stop_queue(struct xenvif *vif)
+{
+	if (!xenvif_rx_ring_full(vif))
+		return 0;
+
+	vif->rx.p0.back.sring->req_event = vif->rx.p0.rx_req_cons_peek +
+		xenvif_max_required_rx_slots(vif);
+	mb(); /* request notification /then/ check the queue */
+
+	return xenvif_rx_ring_full(vif);
+}
+
+/*
+ * Returns true if we should start a new receive buffer instead of
+ * adding 'size' bytes to a buffer which currently contains 'offset'
+ * bytes.
+ */
+static bool start_new_rx_buffer(int offset, unsigned long size, int head)
+{
+	/* simple case: we have completely filled the current buffer. */
+	if (offset == MAX_BUFFER_OFFSET)
+		return true;
+
+	/*
+	 * complex case: start a fresh buffer if the current frag
+	 * would overflow the current buffer but only if:
+	 *     (i)   this frag would fit completely in the next buffer
+	 * and (ii)  there is already some data in the current buffer
+	 * and (iii) this is not the head buffer.
+	 *
+	 * Where:
+	 * - (i) stops us splitting a frag into two copies
+	 *   unless the frag is too large for a single buffer.
+	 * - (ii) stops us from leaving a buffer pointlessly empty.
+	 * - (iii) stops us leaving the first buffer
+	 *   empty. Strictly speaking this is already covered
+	 *   by (ii) but is explicitly checked because
+	 *   netfront relies on the first buffer being
+	 *   non-empty and can crash otherwise.
+	 *
+	 * This means we will effectively linearise small
+	 * frags but do not needlessly split large buffers
+	 * into multiple copies; large frags tend to get
+	 * their own buffers as before.
+	 */
+	if ((offset + size > MAX_BUFFER_OFFSET) &&
+	    (size <= MAX_BUFFER_OFFSET) && offset && !head)
+		return true;
+
+	return false;
+}
+
+static struct xenvif_rx_meta *get_next_rx_buffer(struct xenvif *vif,
+				struct netrx_pending_operations *npo)
+{
+	struct xenvif_rx_meta *meta;
+	struct xen_netif_rx_request *req;
+
+	req = RING_GET_REQUEST(&vif->rx.p0.back, vif->rx.p0.back.req_cons++);
+
+	meta = npo->meta + npo->meta_prod++;
+	meta->gso_size = 0;
+	meta->size = 0;
+	meta->id = req->id;
+
+	npo->copy_off = 0;
+	npo->copy_gref = req->gref;
+
+	return meta;
+}
+
+/*
+ * Set up the grant operations for this fragment. If it's a flipping
+ * interface, we also set up the unmap request from here.
+ */
+static void xenvif_gop_frag_copy(struct xenvif *vif, struct sk_buff *skb,
+				struct netrx_pending_operations *npo,
+				struct page *page, unsigned long size,
+				unsigned long offset, int *head)
+{
+	struct gnttab_copy *copy_gop;
+	struct xenvif_rx_meta *meta;
+	/*
+	 * These variables are used iff is_in_pool returns true,
+	 * in which case they are guaranteed to be initialized.
+	 */
+	unsigned int uninitialized_var(idx);
+	int foreign = is_in_pool(page, &idx);
+	unsigned long bytes;
+
+	/* Data must not cross a page boundary. */
+	BUG_ON(size + offset > PAGE_SIZE);
+
+	meta = npo->meta + npo->meta_prod - 1;
+
+	while (size > 0) {
+		BUG_ON(npo->copy_off > MAX_BUFFER_OFFSET);
+
+		if (start_new_rx_buffer(npo->copy_off, size, *head)) {
+			/*
+			 * Netfront requires there to be some data in the head
+			 * buffer.
+			 */
+			BUG_ON(*head);
+
+			meta = get_next_rx_buffer(vif, npo);
+		}
+
+		bytes = size;
+		if (npo->copy_off + bytes > MAX_BUFFER_OFFSET)
+			bytes = MAX_BUFFER_OFFSET - npo->copy_off;
+
+		copy_gop = npo->copy + npo->copy_prod++;
+		copy_gop->flags = GNTCOPY_dest_gref;
+		if (foreign) {
+			struct pending_tx_info *src_pend = to_txinfo(idx);
+			struct xenvif *rvif = to_vif(idx);
+
+			copy_gop->source.domid = rvif->domid;
+			copy_gop->source.u.ref = src_pend->req.gref;
+			copy_gop->flags |= GNTCOPY_source_gref;
+		} else {
+			void *vaddr = page_address(page);
+			copy_gop->source.domid = DOMID_SELF;
+			copy_gop->source.u.gmfn = virt_to_mfn(vaddr);
+		}
+		copy_gop->source.offset = offset;
+		copy_gop->dest.domid = vif->domid;
+
+		copy_gop->dest.offset = npo->copy_off;
+		copy_gop->dest.u.ref = npo->copy_gref;
+		copy_gop->len = bytes;
+
+		npo->copy_off += bytes;
+		meta->size += bytes;
+
+		offset += bytes;
+		size -= bytes;
+
+		/* Leave a gap for the GSO descriptor. */
+		if (*head && skb_shinfo(skb)->gso_size && !vif->gso_prefix)
+			vif->rx.p0.back.req_cons++;
+
+		*head = 0; /* There must be something in this buffer now. */
+	}
+}
+
+/*
+ * Prepare an SKB to be transmitted to the frontend.
+ *
+ * This function is responsible for allocating grant operations, meta
+ * structures, etc.
+ *
+ * It returns the number of meta structures consumed. The number of
+ * ring slots used is always equal to the number of meta slots used
+ * plus the number of GSO descriptors used. Currently, we use either
+ * zero GSO descriptors (for non-GSO packets) or one descriptor (for
+ * frontend-side LRO).
+ */
+static int xenvif_gop_skb(struct sk_buff *skb,
+			 struct netrx_pending_operations *npo)
+{
+	struct xenvif *vif = netdev_priv(skb->dev);
+	int nr_frags = skb_shinfo(skb)->nr_frags;
+	int i;
+	struct xen_netif_rx_request *req;
+	struct xenvif_rx_meta *meta;
+	unsigned char *data;
+	int head = 1;
+	int old_meta_prod;
+
+	old_meta_prod = npo->meta_prod;
+
+	/* Set up a GSO prefix descriptor, if necessary */
+	if (skb_shinfo(skb)->gso_size && vif->gso_prefix) {
+		req = RING_GET_REQUEST(&vif->rx.p0.back,
+				       vif->rx.p0.back.req_cons++);
+		meta = npo->meta + npo->meta_prod++;
+		meta->gso_size = skb_shinfo(skb)->gso_size;
+		meta->size = 0;
+		meta->id = req->id;
+	}
+
+	req = RING_GET_REQUEST(&vif->rx.p0.back, vif->rx.p0.back.req_cons++);
+	meta = npo->meta + npo->meta_prod++;
+
+	if (!vif->gso_prefix)
+		meta->gso_size = skb_shinfo(skb)->gso_size;
+	else
+		meta->gso_size = 0;
+
+	meta->size = 0;
+	meta->id = req->id;
+	npo->copy_off = 0;
+	npo->copy_gref = req->gref;
+
+	data = skb->data;
+
+	while (data < skb_tail_pointer(skb)) {
+		unsigned int offset = offset_in_page(data);
+		unsigned int len = PAGE_SIZE - offset;
+
+		if (data + len > skb_tail_pointer(skb))
+			len = skb_tail_pointer(skb) - data;
+
+		xenvif_gop_frag_copy(vif, skb, npo,
+				    virt_to_page(data), len, offset, &head);
+		data += len;
+	}
+
+	for (i = 0; i < nr_frags; i++) {
+		xenvif_gop_frag_copy(vif, skb, npo,
+				    skb_frag_page(&skb_shinfo(skb)->frags[i]),
+				    skb_frag_size(&skb_shinfo(skb)->frags[i]),
+				    skb_shinfo(skb)->frags[i].page_offset,
+				    &head);
+	}
+
+	return npo->meta_prod - old_meta_prod;
+}
+
+/*
+ * This is a twin to xenvif_gop_skb.  Assume that xenvif_gop_skb was
+ * used to set up the operations on the top of
+ * netrx_pending_operations, which have since been done.  Check that
+ * they didn't give any errors and advance over them.
+ */
+static int xenvif_check_gop(struct xenvif *vif, int nr_meta_slots,
+			   struct netrx_pending_operations *npo)
+{
+	struct gnttab_copy     *copy_op;
+	int status = XEN_NETIF_RSP_OKAY;
+	int i;
+
+	for (i = 0; i < nr_meta_slots; i++) {
+		copy_op = npo->copy + npo->copy_cons++;
+		if (copy_op->status != GNTST_okay) {
+			netdev_dbg(vif->dev,
+				   "Bad status %d from copy to DOM%d.\n",
+				   copy_op->status, vif->domid);
+			status = XEN_NETIF_RSP_ERROR;
+		}
+	}
+
+	return status;
+}
+
+static void xenvif_add_frag_responses(struct xenvif *vif, int status,
+				     struct xenvif_rx_meta *meta,
+				     int nr_meta_slots)
+{
+	int i;
+	unsigned long offset;
+
+	/* No fragments used */
+	if (nr_meta_slots <= 1)
+		return;
+
+	nr_meta_slots--;
+
+	for (i = 0; i < nr_meta_slots; i++) {
+		int flags;
+		if (i == nr_meta_slots - 1)
+			flags = 0;
+		else
+			flags = XEN_NETRXF_more_data;
+
+		offset = 0;
+		make_rx_response(vif, meta[i].id, status, offset,
+				 meta[i].size, flags);
+	}
+}
+
+/*
+ * Figure out how many ring slots we're going to need to send @skb to
+ * the guest. This function is essentially a dry run of
+ * xenvif_gop_frag_copy.
+ */
+unsigned int xenvif_count_skb_slots(struct xenvif *vif, struct sk_buff *skb)
+{
+	unsigned int count;
+	int i, copy_off;
+
+	count = DIV_ROUND_UP(
+			offset_in_page(skb->data)+skb_headlen(skb), PAGE_SIZE);
+
+	copy_off = skb_headlen(skb) % PAGE_SIZE;
+
+	if (skb_shinfo(skb)->gso_size)
+		count++;
+
+	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
+		unsigned long size = skb_frag_size(&skb_shinfo(skb)->frags[i]);
+		unsigned long bytes;
+		while (size > 0) {
+			BUG_ON(copy_off > MAX_BUFFER_OFFSET);
+
+			if (start_new_rx_buffer(copy_off, size, 0)) {
+				count++;
+				copy_off = 0;
+			}
+
+			bytes = size;
+			if (copy_off + bytes > MAX_BUFFER_OFFSET)
+				bytes = MAX_BUFFER_OFFSET - copy_off;
+
+			copy_off += bytes;
+			size -= bytes;
+		}
+	}
+	return count;
+}
+
+
+void xenvif_rx_action(struct xenvif *vif)
+{
+	s8 status;
+	u16 flags;
+	struct xen_netif_rx_response *resp;
+	struct sk_buff_head rxq;
+	struct sk_buff *skb;
+	LIST_HEAD(notify);
+	int ret;
+	int nr_frags;
+	int count;
+	unsigned long offset;
+	struct skb_cb_overlay *sco;
+	int need_to_notify = 0;
+	struct xen_comms *comms = &vif->rx_comms;
+
+	struct gnttab_copy *gco = get_cpu_var(grant_copy_op);
+	struct xenvif_rx_meta *m = get_cpu_var(meta);
+
+	struct netrx_pending_operations npo = {
+		.copy  = gco,
+		.meta  = m,
+	};
+
+	if (gco == NULL || m == NULL) {
+		put_cpu_var(grant_copy_op);
+		put_cpu_var(meta);
+		printk(KERN_ALERT "netback: CPU %x scratch space is not usable,"
+		       " not doing any TX work for vif%u.%u\n",
+		       smp_processor_id(), vif->domid, vif->handle);
+		return;
+	}
+
+	skb_queue_head_init(&rxq);
+
+	count = 0;
+
+	while ((skb = skb_dequeue(&vif->rx_queue)) != NULL) {
+		vif = netdev_priv(skb->dev);
+		nr_frags = skb_shinfo(skb)->nr_frags;
+
+		sco = (struct skb_cb_overlay *)skb->cb;
+		sco->meta_slots_used = xenvif_gop_skb(skb, &npo);
+
+		count += nr_frags + 1;
+
+		__skb_queue_tail(&rxq, skb);
+
+		/* Filled the batch queue? */
+		if (count + MAX_SKB_FRAGS >=
+		    NETBK_RX_RING_SIZE(comms->nr_handles))
+			break;
+	}
+
+	BUG_ON(npo.meta_prod > MAX_PENDING_REQS);
+
+	if (!npo.copy_prod) {
+		put_cpu_var(grant_copy_op);
+		put_cpu_var(meta);
+		return;
+	}
+
+	BUG_ON(npo.copy_prod > (2 * NETBK_MAX_RX_RING_SIZE));
+	ret = HYPERVISOR_grant_table_op(GNTTABOP_copy, gco,
+					npo.copy_prod);
+	BUG_ON(ret != 0);
+
+	while ((skb = __skb_dequeue(&rxq)) != NULL) {
+		sco = (struct skb_cb_overlay *)skb->cb;
+
+		if (m[npo.meta_cons].gso_size && vif->gso_prefix) {
+			resp = RING_GET_RESPONSE(&vif->rx.p0.back,
+					 vif->rx.p0.back.rsp_prod_pvt++);
+
+			resp->flags =
+				XEN_NETRXF_gso_prefix | XEN_NETRXF_more_data;
+
+			resp->offset = m[npo.meta_cons].gso_size;
+			resp->id = m[npo.meta_cons].id;
+			resp->status = sco->meta_slots_used;
+
+			npo.meta_cons++;
+			sco->meta_slots_used--;
+		}
+
+
+		vif->dev->stats.tx_bytes += skb->len;
+		vif->dev->stats.tx_packets++;
+
+		status = xenvif_check_gop(vif, sco->meta_slots_used, &npo);
+
+		if (sco->meta_slots_used == 1)
+			flags = 0;
+		else
+			flags = XEN_NETRXF_more_data;
+
+		if (skb->ip_summed == CHECKSUM_PARTIAL) /* local packet? */
+			flags |= XEN_NETRXF_csum_blank |
+				XEN_NETRXF_data_validated;
+		else if (skb->ip_summed == CHECKSUM_UNNECESSARY)
+			/* remote but checksummed. */
+			flags |= XEN_NETRXF_data_validated;
+
+		offset = 0;
+		resp = make_rx_response(vif, m[npo.meta_cons].id,
+					status, offset,
+					m[npo.meta_cons].size,
+					flags);
+
+		if (m[npo.meta_cons].gso_size && !vif->gso_prefix) {
+			struct xen_netif_extra_info *gso =
+				(struct xen_netif_extra_info *)
+				RING_GET_RESPONSE(&vif->rx.p0.back,
+					  vif->rx.p0.back.rsp_prod_pvt++);
+
+			resp->flags |= XEN_NETRXF_extra_info;
+
+			gso->u.gso.size = m[npo.meta_cons].gso_size;
+			gso->u.gso.type = XEN_NETIF_GSO_TYPE_TCPV4;
+			gso->u.gso.pad = 0;
+			gso->u.gso.features = 0;
+
+			gso->type = XEN_NETIF_EXTRA_TYPE_GSO;
+			gso->flags = 0;
+		}
+
+		xenvif_add_frag_responses(vif, status,
+					  m + npo.meta_cons + 1,
+					  sco->meta_slots_used);
+
+		RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(&vif->rx.p0.back, ret);
+		if (ret)
+			need_to_notify = 1;
+
+		if (netif_queue_stopped(vif->dev) &&
+		    xenvif_schedulable(vif) &&
+		    !xenvif_rx_ring_full(vif))
+			netif_wake_queue(vif->dev);
+
+		npo.meta_cons += sco->meta_slots_used;
+		dev_kfree_skb(skb);
+	}
+
+	if (need_to_notify)
+		notify_remote_via_irq(vif->irq);
+
+	if (!skb_queue_empty(&vif->rx_queue))
+		xenvif_kick_thread(vif);
+
+	put_cpu_var(grant_copy_op);
+	put_cpu_var(meta);
+}
+
+int xenvif_p0_setup(struct xenvif *vif)
+{
+	struct xenvif_rx_protocol0 *p0 = &vif->rx.p0;
+	struct xen_netif_rx_sring *sring;
+
+	p0->rx_req_cons_peek = 0;
+
+	sring = (struct xen_netif_rx_sring *)vif->rx_comms.ring_area->addr;
+	BACK_RING_INIT(&p0->back, sring, PAGE_SIZE * vif->rx_comms.nr_handles);
+
+	return 0;
+}
+
+void xenvif_p0_start_xmit(struct xenvif *vif, struct sk_buff *skb)
+{
+	struct net_device *dev = vif->dev;
+
+	/* Drop the packet if there is no carrier */
+	if (unlikely(!xenvif_schedulable(vif)))
+		goto drop;
+
+	/* Drop the packet if the target domain has no receive buffers. */
+	if (unlikely(xenvif_rx_ring_full(vif)))
+		goto drop;
+
+	/* Reserve ring slots for the worst-case number of fragments. */
+	vif->rx.p0.rx_req_cons_peek += xenvif_count_skb_slots(vif, skb);
+
+	if (vif->can_queue && xenvif_must_stop_queue(vif))
+		netif_stop_queue(dev);
+
+	xenvif_queue_tx_skb(vif, skb);
+
+	return;
+
+drop:
+	vif->dev->stats.tx_dropped++;
+	dev_kfree_skb(skb);
+}
+
+void xenvif_p0_teardown(struct xenvif *vif)
+{
+	/* Nothing to teardown, relax */
+}
+
+void xenvif_p0_event(struct xenvif *vif)
+{
+	if (!xenvif_rx_ring_full(vif))
+		netif_wake_queue(vif->dev);
+}
+
+void xenvif_p0_action(struct xenvif *vif)
+{
+	xenvif_rx_action(vif);
+}
diff --git a/drivers/net/xen-netback/xenvif_rx_protocol0.h b/drivers/net/xen-netback/xenvif_rx_protocol0.h
new file mode 100644
index 0000000..aceb2ec
--- /dev/null
+++ b/drivers/net/xen-netback/xenvif_rx_protocol0.h
@@ -0,0 +1,53 @@
+/*
+ * netback rx protocol 0 implementation.
+ *
+ * Copyright (c) 2012, Citrix Systems Inc.
+ *
+ * Author: Wei Liu <wei.liu2@citrix.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version 2
+ * as published by the Free Software Foundation; or, when distributed
+ * separately from the Linux kernel or incorporated into other
+ * software packages, subject to the following license:
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this source file (the "Software"), to deal in the Software without
+ * restriction, including without limitation the rights to use, copy, modify,
+ * merge, publish, distribute, sublicense, and/or sell copies of the Software,
+ * and to permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#ifndef __XENVIF_RX_PROTOCOL0_H__
+#define __XENVIF_RX_PROTOCOL0_H__
+
+struct xenvif_rx_protocol0 {
+	struct xen_netif_rx_back_ring back;
+	/*
+	 * Allow xenvif_start_xmit() to peek ahead in the rx request
+	 * ring.  This is a prediction of what rx_req_cons will be
+	 * once all queued skbs are put on the ring.
+	 */
+	RING_IDX rx_req_cons_peek;
+};
+
+
+int  xenvif_p0_setup(struct xenvif *vif);
+void xenvif_p0_start_xmit(struct xenvif *vif, struct sk_buff *skb);
+void xenvif_p0_teardown(struct xenvif *vif);
+void xenvif_p0_event(struct xenvif *vif);
+void xenvif_p0_action(struct xenvif *vif);
+
+#endif /* __XENVIF_RX_PROTOCOL0_H__ */
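
The functions above are installed on the vif as function pointers; the
kthread hunk earlier in this patch already dispatches through
vif->action(vif).  As a rough sketch (not the exact code of this series:
the vif->start_xmit and vif->teardown pointer names are assumed here,
while ->setup, ->event and ->action do appear in the hunks quoted
above), protocol selection in the backend could look like:

	switch (rx_protocol) {
	case 0:
		vif->setup      = xenvif_p0_setup;
		vif->start_xmit = xenvif_p0_start_xmit;
		vif->teardown   = xenvif_p0_teardown;
		vif->event      = xenvif_p0_event;
		vif->action     = xenvif_p0_action;
		break;
	default:
		/* unknown rx protocol requested by the frontend */
		return -EOPNOTSUPP;
	}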
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [RFC PATCH V3 14/16] netback: split event channels support
  2012-01-30 14:45 [RFC PATCH V3] Xen netback / netfront improvement Wei Liu
                   ` (12 preceding siblings ...)
  2012-01-30 14:45 ` [RFC PATCH V3 13/16] netback: stub for multi receive protocol support Wei Liu
@ 2012-01-30 14:45 ` Wei Liu
  2012-01-31 10:37   ` Ian Campbell
  2012-01-30 14:45 ` [RFC PATCH V3 15/16] netfront: multi page ring support Wei Liu
  2012-01-30 14:45 ` [RFC PATCH V3 16/16] netfront: split event channels support Wei Liu
  15 siblings, 1 reply; 59+ messages in thread
From: Wei Liu @ 2012-01-30 14:45 UTC (permalink / raw)
  To: netdev, xen-devel; +Cc: ian.campbell, konrad.wilk, Wei Liu

Originally, netback and netfront use a single event channel for both tx
and rx notification, which can cause unnecessary wake-ups of the NAPI
instance / kthread.

When guest tx is completed, netback now notifies only tx_irq.

Also modify xenvif_rx_protocol0 to reflect this change: the rx protocol
notifies only rx_irq.

If the split-event-channels feature is not activated, rx_irq = tx_irq,
so the rx protocol works exactly as before.
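
The resulting xenstore layout, roughly (relative paths under the
frontend device directory, port numbers purely illustrative):

  # single event channel, as before
  event-channel = "9"

  # split event channels, chosen by the frontend only if the backend
  # advertised split-event-channels = 1
  event-channel-tx = "9"
  event-channel-rx = "10"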

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 drivers/net/xen-netback/common.h              |    9 ++-
 drivers/net/xen-netback/interface.c           |   90 ++++++++++++++++++++-----
 drivers/net/xen-netback/netback.c             |    2 +-
 drivers/net/xen-netback/xenbus.c              |   52 ++++++++++++---
 drivers/net/xen-netback/xenvif_rx_protocol0.c |    2 +-
 5 files changed, 123 insertions(+), 32 deletions(-)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index f3d95b3..376f0bf 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -100,8 +100,10 @@ struct xenvif {
 
 	u8               fe_dev_addr[6];
 
-	/* Physical parameters of the comms window. */
-	unsigned int     irq;
+	/* when split_irq == 0, only use tx_irq */
+	int              split_irq;
+	unsigned int     tx_irq;
+	unsigned int     rx_irq;
 
 	/* The shared tx ring and index. */
 	struct xen_netif_tx_back_ring tx;
@@ -162,7 +164,8 @@ struct xenvif *xenvif_alloc(struct device *parent,
 int xenvif_connect(struct xenvif *vif,
 		   unsigned long tx_ring_ref[], unsigned int tx_ring_order,
 		   unsigned long rx_ring_ref[], unsigned int rx_ring_order,
-		   unsigned int evtchn, unsigned int rx_protocol);
+		   unsigned int evtchn[], int split_evtchn,
+		   unsigned int rx_protocol);
 void xenvif_disconnect(struct xenvif *vif);
 
 int xenvif_xenbus_init(void);
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index 0f05f03..afccd5d 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -46,15 +46,31 @@ int xenvif_schedulable(struct xenvif *vif)
 	return netif_running(vif->dev) && netif_carrier_ok(vif->dev);
 }
 
-static irqreturn_t xenvif_interrupt(int irq, void *dev_id)
+static irqreturn_t xenvif_tx_interrupt(int irq, void *dev_id)
+{
+	struct xenvif *vif = dev_id;
+
+	if (RING_HAS_UNCONSUMED_REQUESTS(&vif->tx))
+		napi_schedule(&vif->napi);
+
+	return IRQ_HANDLED;
+}
+
+static irqreturn_t xenvif_rx_interrupt(int irq, void *dev_id)
 {
 	struct xenvif *vif = dev_id;
 
 	if (xenvif_schedulable(vif) && vif->event != NULL)
 		vif->event(vif);
 
-	if (RING_HAS_UNCONSUMED_REQUESTS(&vif->tx))
-		napi_schedule(&vif->napi);
+	return IRQ_HANDLED;
+}
+
+static irqreturn_t xenvif_interrupt(int irq, void *dev_id)
+{
+	xenvif_tx_interrupt(0, dev_id);
+
+	xenvif_rx_interrupt(0, dev_id);
 
 	return IRQ_HANDLED;
 }
@@ -118,14 +134,24 @@ static struct net_device_stats *xenvif_get_stats(struct net_device *dev)
 static void xenvif_up(struct xenvif *vif)
 {
 	napi_enable(&vif->napi);
-	enable_irq(vif->irq);
+	if (!vif->split_irq)
+		enable_irq(vif->tx_irq);
+	else {
+		enable_irq(vif->tx_irq);
+		enable_irq(vif->rx_irq);
+	}
 	xenvif_check_rx_xenvif(vif);
 }
 
 static void xenvif_down(struct xenvif *vif)
 {
 	napi_disable(&vif->napi);
-	disable_irq(vif->irq);
+	if (!vif->split_irq)
+		disable_irq(vif->tx_irq);
+	else {
+		disable_irq(vif->tx_irq);
+		disable_irq(vif->rx_irq);
+	}
 }
 
 static int xenvif_open(struct net_device *dev)
@@ -308,13 +334,14 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
 int xenvif_connect(struct xenvif *vif,
 		   unsigned long tx_ring_ref[], unsigned int tx_ring_ref_count,
 		   unsigned long rx_ring_ref[], unsigned int rx_ring_ref_count,
-		   unsigned int evtchn, unsigned int rx_protocol)
+		   unsigned int evtchn[], int split_evtchn,
+		   unsigned int rx_protocol)
 {
 	int err = -ENOMEM;
 	struct xen_netif_tx_sring *txs;
 
 	/* Already connected through? */
-	if (vif->irq)
+	if (vif->tx_irq)
 		return 0;
 
 	__module_get(THIS_MODULE);
@@ -345,13 +372,35 @@ int xenvif_connect(struct xenvif *vif,
 	if (vif->setup(vif))
 		goto err_rx_unmap;
 
-	err = bind_interdomain_evtchn_to_irqhandler(
-		vif->domid, evtchn, xenvif_interrupt, 0,
-		vif->dev->name, vif);
-	if (err < 0)
-		goto err_rx_unmap;
-	vif->irq = err;
-	disable_irq(vif->irq);
+	if (!split_evtchn) {
+		err = bind_interdomain_evtchn_to_irqhandler(
+			vif->domid, evtchn[0], xenvif_interrupt, 0,
+			vif->dev->name, vif);
+		if (err < 0)
+			goto err_rx_unmap;
+		vif->tx_irq = vif->rx_irq = err;
+		disable_irq(vif->tx_irq);
+		vif->split_irq = 0;
+	} else {
+		err = bind_interdomain_evtchn_to_irqhandler(
+			vif->domid, evtchn[0], xenvif_tx_interrupt,
+			0, vif->dev->name, vif);
+		if (err < 0)
+			goto err_rx_unmap;
+		vif->tx_irq = err;
+		disable_irq(vif->tx_irq);
+
+		err = bind_interdomain_evtchn_to_irqhandler(
+			vif->domid, evtchn[1], xenvif_rx_interrupt,
+			0, vif->dev->name, vif);
+		if (err < 0) {
+			unbind_from_irqhandler(vif->tx_irq, vif);
+			goto err_rx_unmap;
+		}
+		vif->rx_irq = err;
+		disable_irq(vif->rx_irq);
+		vif->split_irq = 1;
+	}
 
 	init_waitqueue_head(&vif->wq);
 	vif->task = kthread_create(xenvif_kthread,
@@ -376,7 +425,12 @@ int xenvif_connect(struct xenvif *vif,
 
 	return 0;
 err_unbind:
-	unbind_from_irqhandler(vif->irq, vif);
+	if (!vif->split_irq)
+		unbind_from_irqhandler(vif->tx_irq, vif);
+	else {
+		unbind_from_irqhandler(vif->tx_irq, vif);
+		unbind_from_irqhandler(vif->rx_irq, vif);
+	}
 err_rx_unmap:
 	xenvif_unmap_frontend_rings(&vif->rx_comms);
 err_tx_unmap:
@@ -406,10 +460,12 @@ void xenvif_disconnect(struct xenvif *vif)
 
 	del_timer_sync(&vif->credit_timeout);
 
-	if (vif->irq) {
-		unbind_from_irqhandler(vif->irq, vif);
+	if (vif->tx_irq) {
+		unbind_from_irqhandler(vif->tx_irq, vif);
 		need_module_put = 1;
 	}
+	if (vif->split_irq && vif->rx_irq)
+		unbind_from_irqhandler(vif->rx_irq, vif);
 
 	unregister_netdev(vif->dev);
 
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 2ea43d4..f4ec292 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -895,7 +895,7 @@ static void make_tx_response(struct xenvif *vif,
 	vif->tx.rsp_prod_pvt = ++i;
 	RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(&vif->tx, notify);
 	if (notify)
-		notify_remote_via_irq(vif->irq);
+		notify_remote_via_irq(vif->tx_irq);
 }
 
 static inline int rx_work_todo(struct xenvif *vif)
diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
index 4067286..c5a3b27 100644
--- a/drivers/net/xen-netback/xenbus.c
+++ b/drivers/net/xen-netback/xenbus.c
@@ -131,6 +131,14 @@ static int netback_probe(struct xenbus_device *dev,
 			goto abort_transaction;
 		}
 
+		err = xenbus_printf(xbt, dev->nodename,
+				    "split-event-channels",
+				    "%u", 1);
+		if (err) {
+			message = "writing split-event-channels";
+			goto abort_transaction;
+		}
+
 		err = xenbus_transaction_end(xbt, 0);
 	} while (err == -EAGAIN);
 
@@ -408,7 +416,7 @@ static int connect_rings(struct backend_info *be)
 {
 	struct xenvif *vif = be->vif;
 	struct xenbus_device *dev = be->dev;
-	unsigned int evtchn, rx_copy;
+	unsigned int evtchn[2], split_evtchn, rx_copy;
 	int err;
 	int val;
 	unsigned long tx_ring_ref[NETBK_MAX_RING_PAGES];
@@ -418,12 +426,31 @@ static int connect_rings(struct backend_info *be)
 	unsigned int  rx_protocol;
 
 	err = xenbus_gather(XBT_NIL, dev->otherend,
-			    "event-channel", "%u", &evtchn, NULL);
+			    "event-channel", "%u", &evtchn[0], NULL);
 	if (err) {
-		xenbus_dev_fatal(dev, err,
-				 "reading %s/event-channel",
-				 dev->otherend);
-		return err;
+		err = xenbus_gather(XBT_NIL, dev->otherend,
+				    "event-channel-tx", "%u", &evtchn[0],
+				    NULL);
+		if (err) {
+			xenbus_dev_fatal(dev, err,
+					 "reading %s/event-channel-tx",
+					 dev->otherend);
+			return err;
+		}
+		err = xenbus_gather(XBT_NIL, dev->otherend,
+				    "event-channel-rx", "%u", &evtchn[1],
+				    NULL);
+		if (err) {
+			xenbus_dev_fatal(dev, err,
+					 "reading %s/event-channel-rx",
+					 dev->otherend);
+			return err;
+		}
+		split_evtchn = 1;
+		dev_info(&dev->dev, "split event channels\n");
+	} else {
+		split_evtchn = 0;
+		dev_info(&dev->dev, "single event channel\n");
 	}
 
 	err = xenbus_scanf(XBT_NIL, dev->otherend, "tx-ring-order", "%u",
@@ -565,12 +592,17 @@ static int connect_rings(struct backend_info *be)
 	err = xenvif_connect(vif,
 			     tx_ring_ref, (1U << tx_ring_order),
 			     rx_ring_ref, (1U << rx_ring_order),
-			     evtchn, rx_protocol);
+			     evtchn, split_evtchn, rx_protocol);
 	if (err) {
 		int i;
-		xenbus_dev_fatal(dev, err,
-				 "binding port %u",
-				 evtchn);
+		if (!split_evtchn)
+			xenbus_dev_fatal(dev, err,
+					 "binding port %u",
+					 evtchn[0]);
+		else
+			xenbus_dev_fatal(dev, err,
+					 "binding tx port %u, rx port %u",
+					 evtchn[0], evtchn[1]);
 		for (i = 0; i < (1U << tx_ring_order); i++)
 			xenbus_dev_fatal(dev, err,
 					 "mapping tx ring handle: %lu",
diff --git a/drivers/net/xen-netback/xenvif_rx_protocol0.c b/drivers/net/xen-netback/xenvif_rx_protocol0.c
index 3c95d65..6959a1d 100644
--- a/drivers/net/xen-netback/xenvif_rx_protocol0.c
+++ b/drivers/net/xen-netback/xenvif_rx_protocol0.c
@@ -550,7 +550,7 @@ void xenvif_rx_action(struct xenvif *vif)
 	}
 
 	if (need_to_notify)
-		notify_remote_via_irq(vif->irq);
+		notify_remote_via_irq(vif->rx_irq);
 
 	if (!skb_queue_empty(&vif->rx_queue))
 		xenvif_kick_thread(vif);
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [RFC PATCH V3 15/16] netfront: multi page ring support.
  2012-01-30 14:45 [RFC PATCH V3] Xen netback / netfront improvement Wei Liu
                   ` (13 preceding siblings ...)
  2012-01-30 14:45 ` [RFC PATCH V3 14/16] netback: split event channels support Wei Liu
@ 2012-01-30 14:45 ` Wei Liu
  2012-01-30 21:39   ` [Xen-devel] " Konrad Rzeszutek Wilk
  2012-01-30 14:45 ` [RFC PATCH V3 16/16] netfront: split event channels support Wei Liu
  15 siblings, 1 reply; 59+ messages in thread
From: Wei Liu @ 2012-01-30 14:45 UTC (permalink / raw)
  To: netdev, xen-devel; +Cc: ian.campbell, konrad.wilk, Wei Liu

Use the DMA API to allocate the ring pages, because we need
machine-contiguous memory for the multi-page rings.
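
The core of the change, condensed from the hunks below (tx side shown,
rx is analogous; cleanup paths omitted): allocate one contiguous,
zeroed block covering all ring pages, then grant each PAGE_SIZE chunk
to the backend individually.

	txs = dma_alloc_coherent(NULL, PAGE_SIZE * info->tx_ring_pages,
				 &info->tx_ring_dma_handle,
				 __GFP_ZERO | GFP_NOIO | __GFP_HIGH);
	SHARED_RING_INIT(txs);
	FRONT_RING_INIT(&info->tx, txs, PAGE_SIZE * info->tx_ring_pages);

	for (i = 0; i < info->tx_ring_pages; i++) {
		void *addr = (void *)((unsigned long)txs + PAGE_SIZE * i);
		err = xenbus_grant_ring(dev, virt_to_mfn(addr));
		if (err < 0)
			goto grant_tx_ring_fail;
		info->tx_ring_ref[i] = err;
	}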

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 drivers/net/xen-netfront.c |  258 ++++++++++++++++++++++++++++++++------------
 1 files changed, 187 insertions(+), 71 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 01f589d..32ec212 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -66,9 +66,18 @@ struct netfront_cb {
 
 #define GRANT_INVALID_REF	0
 
-#define NET_TX_RING_SIZE __CONST_RING_SIZE(xen_netif_tx, PAGE_SIZE)
-#define NET_RX_RING_SIZE __CONST_RING_SIZE(xen_netif_rx, PAGE_SIZE)
-#define TX_MAX_TARGET min_t(int, NET_TX_RING_SIZE, 256)
+#define XENNET_MAX_RING_PAGE_ORDER 2
+#define XENNET_MAX_RING_PAGES      (1U << XENNET_MAX_RING_PAGE_ORDER)
+
+#define NET_TX_RING_SIZE(_nr_pages)					\
+	__CONST_RING_SIZE(xen_netif_tx, PAGE_SIZE * (_nr_pages))
+#define NET_RX_RING_SIZE(_nr_pages)					\
+	__CONST_RING_SIZE(xen_netif_rx, PAGE_SIZE * (_nr_pages))
+
+#define XENNET_MAX_TX_RING_SIZE NET_TX_RING_SIZE(XENNET_MAX_RING_PAGES)
+#define XENNET_MAX_RX_RING_SIZE NET_RX_RING_SIZE(XENNET_MAX_RING_PAGES)
+
+#define TX_MAX_TARGET XENNET_MAX_TX_RING_SIZE
 
 struct netfront_stats {
 	u64			rx_packets;
@@ -84,12 +93,20 @@ struct netfront_info {
 
 	struct napi_struct napi;
 
+	/* Statistics */
+	struct netfront_stats __percpu *stats;
+
+	unsigned long rx_gso_checksum_fixup;
+
 	unsigned int evtchn;
 	struct xenbus_device *xbdev;
 
 	spinlock_t   tx_lock;
 	struct xen_netif_tx_front_ring tx;
-	int tx_ring_ref;
+	dma_addr_t tx_ring_dma_handle;
+	int tx_ring_ref[XENNET_MAX_RING_PAGES];
+	int tx_ring_page_order;
+	int tx_ring_pages;
 
 	/*
 	 * {tx,rx}_skbs store outstanding skbuffs. Free tx_skb entries
@@ -103,36 +120,34 @@ struct netfront_info {
 	union skb_entry {
 		struct sk_buff *skb;
 		unsigned long link;
-	} tx_skbs[NET_TX_RING_SIZE];
+	} tx_skbs[XENNET_MAX_TX_RING_SIZE];
 	grant_ref_t gref_tx_head;
-	grant_ref_t grant_tx_ref[NET_TX_RING_SIZE];
+	grant_ref_t grant_tx_ref[XENNET_MAX_TX_RING_SIZE];
 	unsigned tx_skb_freelist;
 
 	spinlock_t   rx_lock ____cacheline_aligned_in_smp;
 	struct xen_netif_rx_front_ring rx;
-	int rx_ring_ref;
+	dma_addr_t rx_ring_dma_handle;
+	int rx_ring_ref[XENNET_MAX_RING_PAGES];
+	int rx_ring_page_order;
+	int rx_ring_pages;
 
 	/* Receive-ring batched refills. */
 #define RX_MIN_TARGET 8
 #define RX_DFL_MIN_TARGET 64
-#define RX_MAX_TARGET min_t(int, NET_RX_RING_SIZE, 256)
+#define RX_MAX_TARGET XENNET_MAX_RX_RING_SIZE
 	unsigned rx_min_target, rx_max_target, rx_target;
 	struct sk_buff_head rx_batch;
 
 	struct timer_list rx_refill_timer;
 
-	struct sk_buff *rx_skbs[NET_RX_RING_SIZE];
+	struct sk_buff *rx_skbs[XENNET_MAX_RX_RING_SIZE];
 	grant_ref_t gref_rx_head;
-	grant_ref_t grant_rx_ref[NET_RX_RING_SIZE];
-
-	unsigned long rx_pfn_array[NET_RX_RING_SIZE];
-	struct multicall_entry rx_mcl[NET_RX_RING_SIZE+1];
-	struct mmu_update rx_mmu[NET_RX_RING_SIZE];
-
-	/* Statistics */
-	struct netfront_stats __percpu *stats;
+	grant_ref_t grant_rx_ref[XENNET_MAX_RX_RING_SIZE];
 
-	unsigned long rx_gso_checksum_fixup;
+	unsigned long rx_pfn_array[XENNET_MAX_RX_RING_SIZE];
+	struct multicall_entry rx_mcl[XENNET_MAX_RX_RING_SIZE+1];
+	struct mmu_update rx_mmu[XENNET_MAX_RX_RING_SIZE];
 };
 
 struct netfront_rx_info {
@@ -170,15 +185,15 @@ static unsigned short get_id_from_freelist(unsigned *head,
 	return id;
 }
 
-static int xennet_rxidx(RING_IDX idx)
+static int xennet_rxidx(RING_IDX idx, struct netfront_info *info)
 {
-	return idx & (NET_RX_RING_SIZE - 1);
+	return idx & (NET_RX_RING_SIZE(info->rx_ring_pages) - 1);
 }
 
 static struct sk_buff *xennet_get_rx_skb(struct netfront_info *np,
 					 RING_IDX ri)
 {
-	int i = xennet_rxidx(ri);
+	int i = xennet_rxidx(ri, np);
 	struct sk_buff *skb = np->rx_skbs[i];
 	np->rx_skbs[i] = NULL;
 	return skb;
@@ -187,7 +202,7 @@ static struct sk_buff *xennet_get_rx_skb(struct netfront_info *np,
 static grant_ref_t xennet_get_rx_ref(struct netfront_info *np,
 					    RING_IDX ri)
 {
-	int i = xennet_rxidx(ri);
+	int i = xennet_rxidx(ri, np);
 	grant_ref_t ref = np->grant_rx_ref[i];
 	np->grant_rx_ref[i] = GRANT_INVALID_REF;
 	return ref;
@@ -300,7 +315,7 @@ no_skb:
 
 		skb->dev = dev;
 
-		id = xennet_rxidx(req_prod + i);
+		id = xennet_rxidx(req_prod + i, np);
 
 		BUG_ON(np->rx_skbs[id]);
 		np->rx_skbs[id] = skb;
@@ -596,7 +611,7 @@ static int xennet_close(struct net_device *dev)
 static void xennet_move_rx_slot(struct netfront_info *np, struct sk_buff *skb,
 				grant_ref_t ref)
 {
-	int new = xennet_rxidx(np->rx.req_prod_pvt);
+	int new = xennet_rxidx(np->rx.req_prod_pvt, np);
 
 	BUG_ON(np->rx_skbs[new]);
 	np->rx_skbs[new] = skb;
@@ -1089,7 +1104,7 @@ static void xennet_release_tx_bufs(struct netfront_info *np)
 	struct sk_buff *skb;
 	int i;
 
-	for (i = 0; i < NET_TX_RING_SIZE; i++) {
+	for (i = 0; i < NET_TX_RING_SIZE(np->tx_ring_pages); i++) {
 		/* Skip over entries which are actually freelist references */
 		if (skb_entry_is_link(&np->tx_skbs[i]))
 			continue;
@@ -1123,7 +1138,7 @@ static void xennet_release_rx_bufs(struct netfront_info *np)
 
 	spin_lock_bh(&np->rx_lock);
 
-	for (id = 0; id < NET_RX_RING_SIZE; id++) {
+	for (id = 0; id < NET_RX_RING_SIZE(np->rx_ring_pages); id++) {
 		ref = np->grant_rx_ref[id];
 		if (ref == GRANT_INVALID_REF) {
 			unused++;
@@ -1305,13 +1320,13 @@ static struct net_device * __devinit xennet_create_dev(struct xenbus_device *dev
 
 	/* Initialise tx_skbs as a free chain containing every entry. */
 	np->tx_skb_freelist = 0;
-	for (i = 0; i < NET_TX_RING_SIZE; i++) {
+	for (i = 0; i < XENNET_MAX_TX_RING_SIZE; i++) {
 		skb_entry_set_link(&np->tx_skbs[i], i+1);
 		np->grant_tx_ref[i] = GRANT_INVALID_REF;
 	}
 
 	/* Clear out rx_skbs */
-	for (i = 0; i < NET_RX_RING_SIZE; i++) {
+	for (i = 0; i < XENNET_MAX_RX_RING_SIZE; i++) {
 		np->rx_skbs[i] = NULL;
 		np->grant_rx_ref[i] = GRANT_INVALID_REF;
 	}
@@ -1409,15 +1424,11 @@ static int __devinit netfront_probe(struct xenbus_device *dev,
 	return err;
 }
 
-static void xennet_end_access(int ref, void *page)
-{
-	/* This frees the page as a side-effect */
-	if (ref != GRANT_INVALID_REF)
-		gnttab_end_foreign_access(ref, 0, (unsigned long)page);
-}
-
 static void xennet_disconnect_backend(struct netfront_info *info)
 {
+	int i;
+	struct xenbus_device *dev = info->xbdev;
+
 	/* Stop old i/f to prevent errors whilst we rebuild the state. */
 	spin_lock_bh(&info->rx_lock);
 	spin_lock_irq(&info->tx_lock);
@@ -1429,12 +1440,24 @@ static void xennet_disconnect_backend(struct netfront_info *info)
 		unbind_from_irqhandler(info->netdev->irq, info->netdev);
 	info->evtchn = info->netdev->irq = 0;
 
-	/* End access and free the pages */
-	xennet_end_access(info->tx_ring_ref, info->tx.sring);
-	xennet_end_access(info->rx_ring_ref, info->rx.sring);
+	for (i = 0; i < info->tx_ring_pages; i++) {
+		int ref = info->tx_ring_ref[i];
+		gnttab_end_foreign_access_ref(ref, 0);
+		info->tx_ring_ref[i] = GRANT_INVALID_REF;
+	}
+	dma_free_coherent(NULL, PAGE_SIZE * info->tx_ring_pages,
+			  (void *)info->tx.sring,
+			  info->tx_ring_dma_handle);
+
+	for (i = 0; i < info->rx_ring_pages; i++) {
+		int ref = info->rx_ring_ref[i];
+		gnttab_end_foreign_access_ref(ref, 0);
+		info->rx_ring_ref[i] = GRANT_INVALID_REF;
+	}
+	dma_free_coherent(NULL, PAGE_SIZE * info->rx_ring_pages,
+			  (void *)info->rx.sring,
+			  info->rx_ring_dma_handle);
 
-	info->tx_ring_ref = GRANT_INVALID_REF;
-	info->rx_ring_ref = GRANT_INVALID_REF;
 	info->tx.sring = NULL;
 	info->rx.sring = NULL;
 }
@@ -1483,9 +1506,13 @@ static int setup_netfront(struct xenbus_device *dev, struct netfront_info *info)
 	struct xen_netif_rx_sring *rxs;
 	int err;
 	struct net_device *netdev = info->netdev;
+	unsigned int max_tx_ring_page_order, max_rx_ring_page_order;
+	int i, j;
 
-	info->tx_ring_ref = GRANT_INVALID_REF;
-	info->rx_ring_ref = GRANT_INVALID_REF;
+	for (i = 0; i < XENNET_MAX_RING_PAGES; i++) {
+		info->tx_ring_ref[i] = GRANT_INVALID_REF;
+		info->rx_ring_ref[i] = GRANT_INVALID_REF;
+	}
 	info->rx.sring = NULL;
 	info->tx.sring = NULL;
 	netdev->irq = 0;
@@ -1496,50 +1523,105 @@ static int setup_netfront(struct xenbus_device *dev, struct netfront_info *info)
 		goto fail;
 	}
 
-	txs = (struct xen_netif_tx_sring *)get_zeroed_page(GFP_NOIO | __GFP_HIGH);
+	err = xenbus_scanf(XBT_NIL, info->xbdev->otherend,
+			   "max-tx-ring-page-order", "%u",
+			   &max_tx_ring_page_order);
+	if (err < 0) {
+		info->tx_ring_page_order = 0;
+		dev_info(&dev->dev, "single tx ring\n");
+	} else {
+		info->tx_ring_page_order = max_tx_ring_page_order;
+		dev_info(&dev->dev, "multi page tx ring, order = %d\n",
+			 max_tx_ring_page_order);
+	}
+	info->tx_ring_pages = (1U << info->tx_ring_page_order);
+
+	txs = (struct xen_netif_tx_sring *)
+		dma_alloc_coherent(NULL, PAGE_SIZE * info->tx_ring_pages,
+				   &info->tx_ring_dma_handle,
+				   __GFP_ZERO | GFP_NOIO | __GFP_HIGH);
 	if (!txs) {
 		err = -ENOMEM;
 		xenbus_dev_fatal(dev, err, "allocating tx ring page");
 		goto fail;
 	}
 	SHARED_RING_INIT(txs);
-	FRONT_RING_INIT(&info->tx, txs, PAGE_SIZE);
+	FRONT_RING_INIT(&info->tx, txs, PAGE_SIZE * info->tx_ring_pages);
+
+	for (i = 0; i < info->tx_ring_pages; i++) {
+		void *addr = (void *)((unsigned long)txs + PAGE_SIZE * i);
+		err = xenbus_grant_ring(dev, virt_to_mfn(addr));
+		if (err < 0)
+			goto grant_tx_ring_fail;
+		info->tx_ring_ref[i] = err;
+	}
 
-	err = xenbus_grant_ring(dev, virt_to_mfn(txs));
+	err = xenbus_scanf(XBT_NIL, info->xbdev->otherend,
+			   "max-rx-ring-page-order", "%u",
+			   &max_rx_ring_page_order);
 	if (err < 0) {
-		free_page((unsigned long)txs);
-		goto fail;
+		info->rx_ring_page_order = 0;
+		dev_info(&dev->dev, "single rx ring\n");
+	} else {
+		info->rx_ring_page_order = max_rx_ring_page_order;
+		dev_info(&dev->dev, "multi page rx ring, order = %d\n",
+			 max_rx_ring_page_order);
 	}
+	info->rx_ring_pages = (1U << info->rx_ring_page_order);
 
-	info->tx_ring_ref = err;
-	rxs = (struct xen_netif_rx_sring *)get_zeroed_page(GFP_NOIO | __GFP_HIGH);
+	rxs = (struct xen_netif_rx_sring *)
+		dma_alloc_coherent(NULL, PAGE_SIZE * info->rx_ring_pages,
+				   &info->rx_ring_dma_handle,
+				   __GFP_ZERO | GFP_NOIO | __GFP_HIGH);
 	if (!rxs) {
 		err = -ENOMEM;
 		xenbus_dev_fatal(dev, err, "allocating rx ring page");
-		goto fail;
+		goto alloc_rx_ring_fail;
 	}
 	SHARED_RING_INIT(rxs);
-	FRONT_RING_INIT(&info->rx, rxs, PAGE_SIZE);
-
-	err = xenbus_grant_ring(dev, virt_to_mfn(rxs));
-	if (err < 0) {
-		free_page((unsigned long)rxs);
-		goto fail;
+	FRONT_RING_INIT(&info->rx, rxs, PAGE_SIZE * info->rx_ring_pages);
+
+	for (j = 0; j < info->rx_ring_pages; j++) {
+		void *addr = (void *)((unsigned long)rxs + PAGE_SIZE * j);
+		err = xenbus_grant_ring(dev, virt_to_mfn(addr));
+		if (err < 0)
+			goto grant_rx_ring_fail;
+		info->rx_ring_ref[j] = err;
 	}
-	info->rx_ring_ref = err;
 
 	err = xenbus_alloc_evtchn(dev, &info->evtchn);
 	if (err)
-		goto fail;
+		goto alloc_evtchn_fail;
 
 	err = bind_evtchn_to_irqhandler(info->evtchn, xennet_interrupt,
 					0, netdev->name, netdev);
 	if (err < 0)
-		goto fail;
+		goto bind_fail;
 	netdev->irq = err;
+
 	return 0;
 
- fail:
+bind_fail:
+	xenbus_free_evtchn(dev, info->evtchn);
+alloc_evtchn_fail:
+	for (; j >= 0; j--) {
+		int ref = info->rx_ring_ref[j];
+		gnttab_end_foreign_access_ref(ref, 0);
+		info->rx_ring_ref[j] = GRANT_INVALID_REF;
+	}
+grant_rx_ring_fail:
+	dma_free_coherent(NULL, PAGE_SIZE * info->rx_ring_pages,
+			  (void *)rxs, info->rx_ring_dma_handle);
+alloc_rx_ring_fail:
+	for (; i >= 0; i--) {
+		int ref = info->tx_ring_ref[i];
+		gnttab_end_foreign_access_ref(ref, 0);
+		info->tx_ring_ref[i] = GRANT_INVALID_REF;
+	}
+grant_tx_ring_fail:
+	dma_free_coherent(NULL, PAGE_SIZE * info->tx_ring_pages,
+			  (void *)txs, info->tx_ring_dma_handle);
+fail:
 	return err;
 }
 
@@ -1550,6 +1632,7 @@ static int talk_to_netback(struct xenbus_device *dev,
 	const char *message;
 	struct xenbus_transaction xbt;
 	int err;
+	int i;
 
 	/* Create shared ring, alloc event channel. */
 	err = setup_netfront(dev, info);
@@ -1563,18 +1646,50 @@ again:
 		goto destroy_ring;
 	}
 
-	err = xenbus_printf(xbt, dev->nodename, "tx-ring-ref", "%u",
-			    info->tx_ring_ref);
-	if (err) {
-		message = "writing tx ring-ref";
-		goto abort_transaction;
+	if (info->tx_ring_page_order == 0)
+		err = xenbus_printf(xbt, dev->nodename, "tx-ring-ref", "%u",
+				    info->tx_ring_ref[0]);
+	else {
+		err = xenbus_printf(xbt, dev->nodename, "tx-ring-order", "%u",
+				    info->tx_ring_page_order);
+		if (err) {
+			message = "writing tx ring-ref";
+			goto abort_transaction;
+		}
+		for (i = 0; i < info->tx_ring_pages; i++) {
+			char name[sizeof("tx-ring-ref")+2];
+			snprintf(name, sizeof(name), "tx-ring-ref%u", i);
+			err = xenbus_printf(xbt, dev->nodename, name, "%u",
+					    info->tx_ring_ref[i]);
+			if (err) {
+				message = "writing tx ring-ref";
+				goto abort_transaction;
+			}
+		}
 	}
-	err = xenbus_printf(xbt, dev->nodename, "rx-ring-ref", "%u",
-			    info->rx_ring_ref);
-	if (err) {
-		message = "writing rx ring-ref";
-		goto abort_transaction;
+
+	if (info->rx_ring_page_order == 0)
+		err = xenbus_printf(xbt, dev->nodename, "rx-ring-ref", "%u",
+				    info->rx_ring_ref[0]);
+	else {
+		err = xenbus_printf(xbt, dev->nodename, "rx-ring-order", "%u",
+				    info->rx_ring_page_order);
+		if (err) {
+			message = "writing rx ring-order";
+			goto abort_transaction;
+		}
+		for (i = 0; i < info->rx_ring_pages; i++) {
+			char name[sizeof("rx-ring-ref")+2];
+			snprintf(name, sizeof(name), "rx-ring-ref%u", i);
+			err = xenbus_printf(xbt, dev->nodename, name, "%u",
+					    info->rx_ring_ref[i]);
+			if (err) {
+				message = "writing rx ring-ref";
+				goto abort_transaction;
+			}
+		}
 	}
+
 	err = xenbus_printf(xbt, dev->nodename,
 			    "event-channel", "%u", info->evtchn);
 	if (err) {
@@ -1661,7 +1776,8 @@ static int xennet_connect(struct net_device *dev)
 	xennet_release_tx_bufs(np);
 
 	/* Step 2: Rebuild the RX buffer freelist and the RX ring itself. */
-	for (requeue_idx = 0, i = 0; i < NET_RX_RING_SIZE; i++) {
+	for (requeue_idx = 0, i = 0; i < NET_RX_RING_SIZE(np->rx_ring_pages);
+	     i++) {
 		skb_frag_t *frag;
 		const struct page *page;
 		if (!np->rx_skbs[i])
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [RFC PATCH V3 16/16] netfront: split event channels support.
  2012-01-30 14:45 [RFC PATCH V3] Xen netback / netfront improvement Wei Liu
                   ` (14 preceding siblings ...)
  2012-01-30 14:45 ` [RFC PATCH V3 15/16] netfront: multi page ring support Wei Liu
@ 2012-01-30 14:45 ` Wei Liu
  2012-01-30 21:25   ` [Xen-devel] " Konrad Rzeszutek Wilk
  15 siblings, 1 reply; 59+ messages in thread
From: Wei Liu @ 2012-01-30 14:45 UTC (permalink / raw)
  To: netdev, xen-devel; +Cc: ian.campbell, konrad.wilk, Wei Liu

If this feature is not activated, rx_irq = tx_irq. See the
corresponding netback change log (patch 14 of this series) for details.
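
Condensed from the hunks below, the notification sites simply pick the
appropriate irq; with a single event channel tx_irq == rx_irq, so
behaviour is unchanged:

	/* rx buffer refill path */
	RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&np->rx, notify);
	if (notify)
		notify_remote_via_irq(np->rx_irq);

	/* transmit path */
	RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&np->tx, notify);
	if (notify)
		notify_remote_via_irq(np->tx_irq);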

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
 drivers/net/xen-netfront.c |  147 ++++++++++++++++++++++++++++++++++----------
 1 files changed, 115 insertions(+), 32 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 32ec212..72c0429 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -98,7 +98,9 @@ struct netfront_info {
 
 	unsigned long rx_gso_checksum_fixup;
 
-	unsigned int evtchn;
+	unsigned int split_evtchn;
+	unsigned int tx_evtchn, rx_evtchn;
+	unsigned int tx_irq, rx_irq;
 	struct xenbus_device *xbdev;
 
 	spinlock_t   tx_lock;
@@ -344,7 +346,7 @@ no_skb:
  push:
 	RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&np->rx, notify);
 	if (notify)
-		notify_remote_via_irq(np->netdev->irq);
+		notify_remote_via_irq(np->rx_irq);
 }
 
 static int xennet_open(struct net_device *dev)
@@ -577,7 +579,7 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 
 	RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&np->tx, notify);
 	if (notify)
-		notify_remote_via_irq(np->netdev->irq);
+		notify_remote_via_irq(np->tx_irq);
 
 	u64_stats_update_begin(&stats->syncp);
 	stats->tx_bytes += skb->len;
@@ -1242,22 +1244,35 @@ static int xennet_set_features(struct net_device *dev, u32 features)
 	return 0;
 }
 
-static irqreturn_t xennet_interrupt(int irq, void *dev_id)
+static irqreturn_t xennet_tx_interrupt(int irq, void *dev_id)
 {
-	struct net_device *dev = dev_id;
-	struct netfront_info *np = netdev_priv(dev);
+	struct netfront_info *np = dev_id;
+	struct net_device *dev = np->netdev;
 	unsigned long flags;
 
 	spin_lock_irqsave(&np->tx_lock, flags);
+	xennet_tx_buf_gc(dev);
+	spin_unlock_irqrestore(&np->tx_lock, flags);
 
-	if (likely(netif_carrier_ok(dev))) {
-		xennet_tx_buf_gc(dev);
-		/* Under tx_lock: protects access to rx shared-ring indexes. */
-		if (RING_HAS_UNCONSUMED_RESPONSES(&np->rx))
-			napi_schedule(&np->napi);
-	}
+	return IRQ_HANDLED;
+}
 
-	spin_unlock_irqrestore(&np->tx_lock, flags);
+static irqreturn_t xennet_rx_interrupt(int irq, void *dev_id)
+{
+	struct netfront_info *np = dev_id;
+	struct net_device *dev = np->netdev;
+
+	if (likely(netif_carrier_ok(dev)) &&
+	    RING_HAS_UNCONSUMED_RESPONSES(&np->rx))
+		napi_schedule(&np->napi);
+
+	return IRQ_HANDLED;
+}
+static irqreturn_t xennet_interrupt(int irq, void *dev_id)
+{
+	xennet_tx_interrupt(0, dev_id);
+
+	xennet_rx_interrupt(0, dev_id);
 
 	return IRQ_HANDLED;
 }
@@ -1436,9 +1451,14 @@ static void xennet_disconnect_backend(struct netfront_info *info)
 	spin_unlock_irq(&info->tx_lock);
 	spin_unlock_bh(&info->rx_lock);
 
-	if (info->netdev->irq)
-		unbind_from_irqhandler(info->netdev->irq, info->netdev);
-	info->evtchn = info->netdev->irq = 0;
+	if (info->tx_irq && (info->tx_irq == info->rx_irq))
+		unbind_from_irqhandler(info->tx_irq, info);
+	if (info->tx_irq && (info->tx_irq != info->rx_irq)) {
+		unbind_from_irqhandler(info->tx_irq, info);
+		unbind_from_irqhandler(info->rx_irq, info);
+	}
+	info->tx_evtchn = info->tx_irq = 0;
+	info->rx_evtchn = info->rx_irq = 0;
 
 	for (i = 0; i < info->tx_ring_pages; i++) {
 		int ref = info->tx_ring_ref[i];
@@ -1507,6 +1527,7 @@ static int setup_netfront(struct xenbus_device *dev, struct netfront_info *info)
 	int err;
 	struct net_device *netdev = info->netdev;
 	unsigned int max_tx_ring_page_order, max_rx_ring_page_order;
+	unsigned int split_evtchn;
 	int i, j;
 
 	for (i = 0; i < XENNET_MAX_RING_PAGES; i++) {
@@ -1515,7 +1536,6 @@ static int setup_netfront(struct xenbus_device *dev, struct netfront_info *info)
 	}
 	info->rx.sring = NULL;
 	info->tx.sring = NULL;
-	netdev->irq = 0;
 
 	err = xen_net_read_mac(dev, netdev->dev_addr);
 	if (err) {
@@ -1524,6 +1544,12 @@ static int setup_netfront(struct xenbus_device *dev, struct netfront_info *info)
 	}
 
 	err = xenbus_scanf(XBT_NIL, info->xbdev->otherend,
+			   "split-event-channels", "%u",
+			   &split_evtchn);
+	if (err < 0)
+		split_evtchn = 0;
+
+	err = xenbus_scanf(XBT_NIL, info->xbdev->otherend,
 			   "max-tx-ring-page-order", "%u",
 			   &max_tx_ring_page_order);
 	if (err < 0) {
@@ -1589,20 +1615,59 @@ static int setup_netfront(struct xenbus_device *dev, struct netfront_info *info)
 		info->rx_ring_ref[j] = err;
 	}
 
-	err = xenbus_alloc_evtchn(dev, &info->evtchn);
-	if (err)
-		goto alloc_evtchn_fail;
+	if (!split_evtchn) {
+		err = xenbus_alloc_evtchn(dev, &info->tx_evtchn);
+		if (err)
+			goto alloc_evtchn_fail;
 
-	err = bind_evtchn_to_irqhandler(info->evtchn, xennet_interrupt,
-					0, netdev->name, netdev);
-	if (err < 0)
-		goto bind_fail;
-	netdev->irq = err;
+		err = bind_evtchn_to_irqhandler(info->tx_evtchn,
+						xennet_interrupt,
+						0, netdev->name, info);
+		if (err < 0)
+			goto bind_fail;
+		info->rx_evtchn = info->tx_evtchn;
+		info->tx_irq = info->rx_irq = err;
+		info->split_evtchn = 0;
+		dev_info(&dev->dev, "single event channel, irq = %d\n",
+			 info->tx_irq);
+	} else {
+		err = xenbus_alloc_evtchn(dev, &info->tx_evtchn);
+		if (err)
+			goto alloc_evtchn_fail;
+		err = xenbus_alloc_evtchn(dev, &info->rx_evtchn);
+		if (err) {
+			xenbus_free_evtchn(dev, info->tx_evtchn);
+			goto alloc_evtchn_fail;
+		}
+		err = bind_evtchn_to_irqhandler(info->tx_evtchn,
+						xennet_tx_interrupt,
+						0, netdev->name, info);
+		if (err < 0)
+			goto bind_fail;
+		info->tx_irq = err;
+		err = bind_evtchn_to_irqhandler(info->rx_evtchn,
+						xennet_rx_interrupt,
+						0, netdev->name, info);
+		if (err < 0) {
+			unbind_from_irqhandler(info->tx_irq, info);
+			goto bind_fail;
+		}
+		info->rx_irq = err;
+		info->split_evtchn = 1;
+		dev_info(&dev->dev, "split event channels,"
+			 " tx_irq = %d, rx_irq = %d\n",
+			 info->tx_irq, info->rx_irq);
+	}
 
 	return 0;
 
 bind_fail:
-	xenbus_free_evtchn(dev, info->evtchn);
+	if (!split_evtchn)
+		xenbus_free_evtchn(dev, info->tx_evtchn);
+	else {
+		xenbus_free_evtchn(dev, info->tx_evtchn);
+		xenbus_free_evtchn(dev, info->rx_evtchn);
+	}
 alloc_evtchn_fail:
 	for (; j >= 0; j--) {
 		int ref = info->rx_ring_ref[j];
@@ -1690,11 +1755,27 @@ again:
 		}
 	}
 
-	err = xenbus_printf(xbt, dev->nodename,
-			    "event-channel", "%u", info->evtchn);
-	if (err) {
-		message = "writing event-channel";
-		goto abort_transaction;
+
+	if (!info->split_evtchn) {
+		err = xenbus_printf(xbt, dev->nodename,
+				    "event-channel", "%u", info->tx_evtchn);
+		if (err) {
+			message = "writing event-channel";
+			goto abort_transaction;
+		}
+	} else {
+		err = xenbus_printf(xbt, dev->nodename,
+				    "event-channel-tx", "%u", info->tx_evtchn);
+		if (err) {
+			message = "writing event-channel-tx";
+			goto abort_transaction;
+		}
+		err = xenbus_printf(xbt, dev->nodename,
+				    "event-channel-rx", "%u", info->rx_evtchn);
+		if (err) {
+			message = "writing event-channel-rx";
+			goto abort_transaction;
+		}
 	}
 
 	err = xenbus_printf(xbt, dev->nodename, "request-rx-copy", "%u",
@@ -1808,7 +1889,9 @@ static int xennet_connect(struct net_device *dev)
 	 * packets.
 	 */
 	netif_carrier_on(np->netdev);
-	notify_remote_via_irq(np->netdev->irq);
+	notify_remote_via_irq(np->tx_irq);
+	if (np->split_evtchn)
+		notify_remote_via_irq(np->rx_irq);
 	xennet_tx_buf_gc(dev);
 	xennet_alloc_rx_buffers(dev);
 
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* Re: [Xen-devel] [RFC PATCH V3 12/16] netback: multi-page ring support
  2012-01-30 14:45 ` [RFC PATCH V3 12/16] netback: multi-page ring support Wei Liu
@ 2012-01-30 16:35     ` Jan Beulich
  0 siblings, 0 replies; 59+ messages in thread
From: Jan Beulich @ 2012-01-30 16:35 UTC (permalink / raw)
  To: Wei Liu; +Cc: ian.campbell, xen-devel, konrad.wilk, netdev

>>> On 30.01.12 at 15:45, Wei Liu <wei.liu2@citrix.com> wrote:
> -int xenvif_map_frontend_rings(struct xenvif *vif,
> -			      grant_ref_t tx_ring_ref,
> -			      grant_ref_t rx_ring_ref)
> +int xenvif_map_frontend_rings(struct xen_comms *comms,
> +			      int domid,
> +			      unsigned long ring_ref[],
> +			      unsigned int  ring_ref_count)
>  {
> -	void *addr;
> -	struct xen_netif_tx_sring *txs;
> -	struct xen_netif_rx_sring *rxs;
> -
> -	int err = -ENOMEM;
> +	struct gnttab_map_grant_ref op[NETBK_MAX_RING_PAGES];
> +	unsigned int i;
> +	int err = 0;
>  
> -	err = xenbus_map_ring_valloc(xenvif_to_xenbus_device(vif),
> -				     tx_ring_ref, &addr);

Any reason why you don't just extend this function (in a prerequisite
patch) rather than open coding a common utility function (twice) here,
so that other backends (blkback!) can benefit later as well.
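For example, something along these lines (just a rough, untested sketch of a
possible prototype -- the exact signature is of course up for discussion):

int xenbus_map_ring_valloc(struct xenbus_device *dev,
			   grant_ref_t *ring_refs,
			   unsigned int nr_refs,
			   void **vaddr);

Each backend could then pass in however many ring pages it supports.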

Jan

> -	if (err)
> -		goto err;
> +	comms->ring_area = alloc_vm_area(PAGE_SIZE * ring_ref_count, NULL);
> +	if (comms->ring_area == NULL)
> +		return -ENOMEM;
>  
> -	txs = (struct xen_netif_tx_sring *)addr;
> -	BACK_RING_INIT(&vif->tx, txs, PAGE_SIZE);
> +	for (i = 0; i < ring_ref_count; i++) {
> +		unsigned long addr = (unsigned long)comms->ring_area->addr +
> +			(i * PAGE_SIZE);
> +		gnttab_set_map_op(&op[i], addr, GNTMAP_host_map,
> +				  ring_ref[i], domid);
> +	}
>  
> -	err = xenbus_map_ring_valloc(xenvif_to_xenbus_device(vif),
> -				     rx_ring_ref, &addr);
> -	if (err)
> -		goto err;
> +	if (HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref,
> +				      &op, ring_ref_count))
> +		BUG();
>  
> -	rxs = (struct xen_netif_rx_sring *)addr;
> -	BACK_RING_INIT(&vif->rx, rxs, PAGE_SIZE);
> +	comms->nr_handles = ring_ref_count;
>  
> -	vif->rx_req_cons_peek = 0;
> +	for (i = 0; i < ring_ref_count; i++) {
> +		if (op[i].status != 0) {
> +			err = op[i].status;
> +			comms->shmem_handle[i] = INVALID_GRANT_HANDLE;
> +			continue;
> +		}
> +		comms->shmem_handle[i] = op[i].handle;
> +	}
>  
> -	return 0;
> +	if (err != 0)
> +		xenvif_unmap_frontend_rings(comms);
>  
> -err:
> -	xenvif_unmap_frontend_rings(vif);
>  	return err;
>  }
>  

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC PATCH V3 04/16] netback: switch to per-cpu scratch space.
  2012-01-30 14:45 ` [RFC PATCH V3 04/16] netback: switch to per-cpu scratch space Wei Liu
@ 2012-01-30 16:49   ` Viral Mehta
  2012-01-30 17:05       ` Wei Liu
  0 siblings, 1 reply; 59+ messages in thread
From: Viral Mehta @ 2012-01-30 16:49 UTC (permalink / raw)
  To: Wei Liu; +Cc: netdev, xen-devel, ian.campbell, konrad.wilk

Hi,

On Mon, Jan 30, 2012 at 9:45 AM, Wei Liu <wei.liu2@citrix.com> wrote:
>
>        skb_queue_head_init(&rxq);
> @@ -534,13 +527,16 @@ void xen_netbk_rx_action(struct xen_netbk *netbk)
>                        break;
>        }
>
> -       BUG_ON(npo.meta_prod > ARRAY_SIZE(netbk->meta));
> +       BUG_ON(npo.meta_prod > MAX_PENDING_REQS);

While you are already here,
how about having WARN_ON()?

>
> -       if (!npo.copy_prod)
> +       if (!npo.copy_prod) {
> +               put_cpu_ptr(gco);
> +               put_cpu_ptr(m);
>                return;
> +       }
>
> -       BUG_ON(npo.copy_prod > ARRAY_SIZE(netbk->grant_copy_op));
> -       ret = HYPERVISOR_grant_table_op(GNTTABOP_copy, &netbk->grant_copy_op,
> +       BUG_ON(npo.copy_prod > (2 * XEN_NETIF_RX_RING_SIZE));

And maybe here, too...

If there is a serious bug, maybe the system will crash at a later point.
But, IMHO, WARN_ON() is the correct function for drivers, at least.

> +       ret = HYPERVISOR_grant_table_op(GNTTABOP_copy, gco,
>                                        npo.copy_prod);
>        BUG_ON(ret != 0);
>
-- 
Thanks,
Viral Mehta

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC PATCH V3 04/16] netback: switch to per-cpu scratch space.
  2012-01-30 16:49   ` Viral Mehta
@ 2012-01-30 17:05       ` Wei Liu
  0 siblings, 0 replies; 59+ messages in thread
From: Wei Liu @ 2012-01-30 17:05 UTC (permalink / raw)
  To: Viral Mehta; +Cc: wei.liu2, netdev, xen-devel, Ian Campbell, konrad.wilk

On Mon, 2012-01-30 at 16:49 +0000, Viral Mehta wrote:
> Hi,
> 
> On Mon, Jan 30, 2012 at 9:45 AM, Wei Liu <wei.liu2@citrix.com> wrote:
> >
> >        skb_queue_head_init(&rxq);
> > @@ -534,13 +527,16 @@ void xen_netbk_rx_action(struct xen_netbk *netbk)
> >                        break;
> >        }
> >
> > -       BUG_ON(npo.meta_prod > ARRAY_SIZE(netbk->meta));
> > +       BUG_ON(npo.meta_prod > MAX_PENDING_REQS);
> 
> While you are already here,
> how about having WARN_ON() ?
> 
> >
> > -       if (!npo.copy_prod)
> > +       if (!npo.copy_prod) {
> > +               put_cpu_ptr(gco);
> > +               put_cpu_ptr(m);
> >                return;
> > +       }
> >
> > -       BUG_ON(npo.copy_prod > ARRAY_SIZE(netbk->grant_copy_op));
> > -       ret = HYPERVISOR_grant_table_op(GNTTABOP_copy, &netbk->grant_copy_op,
> > +       BUG_ON(npo.copy_prod > (2 * XEN_NETIF_RX_RING_SIZE));
> 
> And may be here, too...
> 
> If there is serious bug, may be system will crash at a later point
> But, IMHO, WARN_ON() is the correct function for drivers at least.
> 

I don't agree. Here BUG_ON means the code logic has defects that we haven't
discovered. I won't take the risk of any undefined behavior.

Furthermore, there are a bunch of drivers that use BUG_ON. A simple grep
will show you hundreds (even thousands?) of results.


Wei.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [Xen-devel] [RFC PATCH V3 12/16] netback: multi-page ring support
  2012-01-30 16:35     ` Jan Beulich
@ 2012-01-30 17:10       ` Wei Liu
  -1 siblings, 0 replies; 59+ messages in thread
From: Wei Liu @ 2012-01-30 17:10 UTC (permalink / raw)
  To: Jan Beulich; +Cc: wei.liu2, Ian Campbell, xen-devel, konrad.wilk, netdev

On Mon, 2012-01-30 at 16:35 +0000, Jan Beulich wrote:
> >>> On 30.01.12 at 15:45, Wei Liu <wei.liu2@citrix.com> wrote:
> > -int xenvif_map_frontend_rings(struct xenvif *vif,
> > -			      grant_ref_t tx_ring_ref,
> > -			      grant_ref_t rx_ring_ref)
> > +int xenvif_map_frontend_rings(struct xen_comms *comms,
> > +			      int domid,
> > +			      unsigned long ring_ref[],
> > +			      unsigned int  ring_ref_count)
> >  {
> > -	void *addr;
> > -	struct xen_netif_tx_sring *txs;
> > -	struct xen_netif_rx_sring *rxs;
> > -
> > -	int err = -ENOMEM;
> > +	struct gnttab_map_grant_ref op[NETBK_MAX_RING_PAGES];
> > +	unsigned int i;
> > +	int err = 0;
> >  
> > -	err = xenbus_map_ring_valloc(xenvif_to_xenbus_device(vif),
> > -				     tx_ring_ref, &addr);
> 
> Any reason why you don't just extend this function (in a prerequisite
> patch) rather than open coding a common utility function (twice) here,
> so that other backends (blkback!) can benefit later as well.
> 
> Jan
> 

I'm mainly focusing on netback stuff, so the code is slightly coupled
with netback -- NETBK_MAX_RING_PAGES.

Extending xenbus_map_ring_valloc to make it more generic would require
setting a global maximum on the number of ring pages; I think that needs
further investigation and code refactoring, which I have no time to
attend to at the moment. :-/


Wei.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [Xen-devel] [RFC PATCH V3 16/16] netfront: split event channels support.
  2012-01-30 14:45 ` [RFC PATCH V3 16/16] netfront: split event channels support Wei Liu
@ 2012-01-30 21:25   ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 59+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-01-30 21:25 UTC (permalink / raw)
  To: Wei Liu; +Cc: netdev, xen-devel, ian.campbell

On Mon, Jan 30, 2012 at 02:45:34PM +0000, Wei Liu wrote:
> If this feature is not activated, rx_irq = tx_irq. See corresponding
> netback change log for details.
> 
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> ---
>  drivers/net/xen-netfront.c |  147 ++++++++++++++++++++++++++++++++++----------
>  1 files changed, 115 insertions(+), 32 deletions(-)
> 
> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> index 32ec212..72c0429 100644
> --- a/drivers/net/xen-netfront.c
> +++ b/drivers/net/xen-netfront.c
> @@ -98,7 +98,9 @@ struct netfront_info {
>  
>  	unsigned long rx_gso_checksum_fixup;
>  
> -	unsigned int evtchn;
> +	unsigned int split_evtchn;

bool?

> +	unsigned int tx_evtchn, rx_evtchn;
> +	unsigned int tx_irq, rx_irq;
>  	struct xenbus_device *xbdev;
>  
>  	spinlock_t   tx_lock;
> @@ -344,7 +346,7 @@ no_skb:
>   push:
>  	RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&np->rx, notify);
>  	if (notify)
> -		notify_remote_via_irq(np->netdev->irq);
> +		notify_remote_via_irq(np->rx_irq);
>  }
>  
>  static int xennet_open(struct net_device *dev)
> @@ -577,7 +579,7 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
>  
>  	RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&np->tx, notify);
>  	if (notify)
> -		notify_remote_via_irq(np->netdev->irq);
> +		notify_remote_via_irq(np->tx_irq);
>  
>  	u64_stats_update_begin(&stats->syncp);
>  	stats->tx_bytes += skb->len;
> @@ -1242,22 +1244,35 @@ static int xennet_set_features(struct net_device *dev, u32 features)
>  	return 0;
>  }
>  
> -static irqreturn_t xennet_interrupt(int irq, void *dev_id)
> +static irqreturn_t xennet_tx_interrupt(int irq, void *dev_id)
>  {
> -	struct net_device *dev = dev_id;
> -	struct netfront_info *np = netdev_priv(dev);
> +	struct netfront_info *np = dev_id;
> +	struct net_device *dev = np->netdev;
>  	unsigned long flags;
>  
>  	spin_lock_irqsave(&np->tx_lock, flags);
> +	xennet_tx_buf_gc(dev);
> +	spin_unlock_irqrestore(&np->tx_lock, flags);
>  
> -	if (likely(netif_carrier_ok(dev))) {
> -		xennet_tx_buf_gc(dev);
> -		/* Under tx_lock: protects access to rx shared-ring indexes. */
> -		if (RING_HAS_UNCONSUMED_RESPONSES(&np->rx))
> -			napi_schedule(&np->napi);
> -	}
> +	return IRQ_HANDLED;
> +}
>  
> -	spin_unlock_irqrestore(&np->tx_lock, flags);
> +static irqreturn_t xennet_rx_interrupt(int irq, void *dev_id)
> +{
> +	struct netfront_info *np = dev_id;
> +	struct net_device *dev = np->netdev;
> +
> +	if (likely(netif_carrier_ok(dev)) &&
> +	    RING_HAS_UNCONSUMED_RESPONSES(&np->rx))
> +		napi_schedule(&np->napi);
> +
> +	return IRQ_HANDLED;
> +}
> +static irqreturn_t xennet_interrupt(int irq, void *dev_id)
> +{
> +	xennet_tx_interrupt(0, dev_id);
> +
> +	xennet_rx_interrupt(0, dev_id);
>  
>  	return IRQ_HANDLED;
>  }
> @@ -1436,9 +1451,14 @@ static void xennet_disconnect_backend(struct netfront_info *info)
>  	spin_unlock_irq(&info->tx_lock);
>  	spin_unlock_bh(&info->rx_lock);
>  
> -	if (info->netdev->irq)
> -		unbind_from_irqhandler(info->netdev->irq, info->netdev);
> -	info->evtchn = info->netdev->irq = 0;
> +	if (info->tx_irq && (info->tx_irq == info->rx_irq))
> +		unbind_from_irqhandler(info->tx_irq, info);
> +	if (info->tx_irq && (info->tx_irq != info->rx_irq)) {
> +		unbind_from_irqhandler(info->tx_irq, info);
> +		unbind_from_irqhandler(info->rx_irq, info);
> +	}
> +	info->tx_evtchn = info->tx_irq = 0;
> +	info->rx_evtchn = info->rx_irq = 0;
>  
>  	for (i = 0; i < info->tx_ring_pages; i++) {
>  		int ref = info->tx_ring_ref[i];
> @@ -1507,6 +1527,7 @@ static int setup_netfront(struct xenbus_device *dev, struct netfront_info *info)
>  	int err;
>  	struct net_device *netdev = info->netdev;
>  	unsigned int max_tx_ring_page_order, max_rx_ring_page_order;
> +	unsigned int split_evtchn;
>  	int i, j;
>  
>  	for (i = 0; i < XENNET_MAX_RING_PAGES; i++) {
> @@ -1515,7 +1536,6 @@ static int setup_netfront(struct xenbus_device *dev, struct netfront_info *info)
>  	}
>  	info->rx.sring = NULL;
>  	info->tx.sring = NULL;
> -	netdev->irq = 0;
>  
>  	err = xen_net_read_mac(dev, netdev->dev_addr);
>  	if (err) {
> @@ -1524,6 +1544,12 @@ static int setup_netfront(struct xenbus_device *dev, struct netfront_info *info)
>  	}
>  
>  	err = xenbus_scanf(XBT_NIL, info->xbdev->otherend,
> +			   "split-event-channels", "%u",


We don't want to call it 'feature-split-event-channels'?

> +			   &split_evtchn);
> +	if (err < 0)
> +		split_evtchn = 0;
> +
> +	err = xenbus_scanf(XBT_NIL, info->xbdev->otherend,
>  			   "max-tx-ring-page-order", "%u",
>  			   &max_tx_ring_page_order);
>  	if (err < 0) {
> @@ -1589,20 +1615,59 @@ static int setup_netfront(struct xenbus_device *dev, struct netfront_info *info)
>  		info->rx_ring_ref[j] = err;
>  	}
>  
> -	err = xenbus_alloc_evtchn(dev, &info->evtchn);
> -	if (err)
> -		goto alloc_evtchn_fail;
> +	if (!split_evtchn) {

Why not just move most of the code that deals with this
allocation into two separate functions: setup_netfront_split
and setup_netfront_generic?
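E.g. something along these lines (untested sketch, only to illustrate the
shape; the xenbus reads and the error labels would stay in setup_netfront):

static int setup_netfront_split(struct xenbus_device *dev,
				struct netfront_info *info)
{
	struct net_device *netdev = info->netdev;
	int err;

	err = xenbus_alloc_evtchn(dev, &info->tx_evtchn);
	if (err)
		return err;
	err = xenbus_alloc_evtchn(dev, &info->rx_evtchn);
	if (err)
		goto fail_free_tx_evtchn;

	err = bind_evtchn_to_irqhandler(info->tx_evtchn,
					xennet_tx_interrupt,
					0, netdev->name, info);
	if (err < 0)
		goto fail_free_rx_evtchn;
	info->tx_irq = err;

	err = bind_evtchn_to_irqhandler(info->rx_evtchn,
					xennet_rx_interrupt,
					0, netdev->name, info);
	if (err < 0)
		goto fail_unbind_tx;
	info->rx_irq = err;
	info->split_evtchn = 1;

	return 0;

fail_unbind_tx:
	unbind_from_irqhandler(info->tx_irq, info);
fail_free_rx_evtchn:
	xenbus_free_evtchn(dev, info->rx_evtchn);
fail_free_tx_evtchn:
	xenbus_free_evtchn(dev, info->tx_evtchn);
	return err;
}

and an analogous setup_netfront_generic() for the single / shared event
channel case.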


> +		err = xenbus_alloc_evtchn(dev, &info->tx_evtchn);
> +		if (err)
> +			goto alloc_evtchn_fail;
>  
> -	err = bind_evtchn_to_irqhandler(info->evtchn, xennet_interrupt,
> -					0, netdev->name, netdev);
> -	if (err < 0)
> -		goto bind_fail;
> -	netdev->irq = err;
> +		err = bind_evtchn_to_irqhandler(info->tx_evtchn,
> +						xennet_interrupt,
> +						0, netdev->name, info);
> +		if (err < 0)
> +			goto bind_fail;
> +		info->rx_evtchn = info->tx_evtchn;
> +		info->tx_irq = info->rx_irq = err;
> +		info->split_evtchn = 0;
> +		dev_info(&dev->dev, "single event channel, irq = %d\n",
> +			 info->tx_irq);
> +	} else {
> +		err = xenbus_alloc_evtchn(dev, &info->tx_evtchn);
> +		if (err)
> +			goto alloc_evtchn_fail;
> +		err = xenbus_alloc_evtchn(dev, &info->rx_evtchn);
> +		if (err) {
> +			xenbus_free_evtchn(dev, info->tx_evtchn);
> +			goto alloc_evtchn_fail;
> +		}
> +		err = bind_evtchn_to_irqhandler(info->tx_evtchn,
> +						xennet_tx_interrupt,
> +						0, netdev->name, info);
> +		if (err < 0)
> +			goto bind_fail;
> +		info->tx_irq = err;
> +		err = bind_evtchn_to_irqhandler(info->rx_evtchn,
> +						xennet_rx_interrupt,
> +						0, netdev->name, info);
> +		if (err < 0) {
> +			unbind_from_irqhandler(info->tx_irq, info);
> +			goto bind_fail;
> +		}
> +		info->rx_irq = err;
> +		info->split_evtchn = 1;
> +		dev_info(&dev->dev, "split event channels,"
> +			 " tx_irq = %d, rx_irq = %d\n",
> +			 info->tx_irq, info->rx_irq);
> +	}
>  
>  	return 0;
>  
>  bind_fail:
> -	xenbus_free_evtchn(dev, info->evtchn);
> +	if (!split_evtchn)
> +		xenbus_free_evtchn(dev, info->tx_evtchn);
> +	else {
> +		xenbus_free_evtchn(dev, info->tx_evtchn);
> +		xenbus_free_evtchn(dev, info->rx_evtchn);
> +	}
>  alloc_evtchn_fail:
>  	for (; j >= 0; j--) {
>  		int ref = info->rx_ring_ref[j];
> @@ -1690,11 +1755,27 @@ again:
>  		}
>  	}
>  
> -	err = xenbus_printf(xbt, dev->nodename,
> -			    "event-channel", "%u", info->evtchn);
> -	if (err) {
> -		message = "writing event-channel";
> -		goto abort_transaction;
> +
> +	if (!info->split_evtchn) {
> +		err = xenbus_printf(xbt, dev->nodename,
> +				    "event-channel", "%u", info->tx_evtchn);
> +		if (err) {
> +			message = "writing event-channel";
> +			goto abort_transaction;
> +		}
> +	} else {
> +		err = xenbus_printf(xbt, dev->nodename,
> +				    "event-channel-tx", "%u", info->tx_evtchn);
> +		if (err) {
> +			message = "writing event-channel-tx";
> +			goto abort_transaction;
> +		}
> +		err = xenbus_printf(xbt, dev->nodename,
> +				    "event-channel-rx", "%u", info->rx_evtchn);
> +		if (err) {
> +			message = "writing event-channel-rx";
> +			goto abort_transaction;
> +		}
>  	}
>  
>  	err = xenbus_printf(xbt, dev->nodename, "request-rx-copy", "%u",
> @@ -1808,7 +1889,9 @@ static int xennet_connect(struct net_device *dev)
>  	 * packets.
>  	 */
>  	netif_carrier_on(np->netdev);
> -	notify_remote_via_irq(np->netdev->irq);
> +	notify_remote_via_irq(np->tx_irq);
> +	if (np->split_evtchn)
> +		notify_remote_via_irq(np->rx_irq);
>  	xennet_tx_buf_gc(dev);
>  	xennet_alloc_rx_buffers(dev);
>  
> -- 
> 1.7.2.5
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [Xen-devel] [RFC PATCH V3 15/16] netfront: multi page ring support.
  2012-01-30 14:45 ` [RFC PATCH V3 15/16] netfront: multi page ring support Wei Liu
@ 2012-01-30 21:39   ` Konrad Rzeszutek Wilk
  2012-01-31  9:12     ` Ian Campbell
                       ` (2 more replies)
  0 siblings, 3 replies; 59+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-01-30 21:39 UTC (permalink / raw)
  To: Wei Liu; +Cc: netdev, xen-devel, ian.campbell

On Mon, Jan 30, 2012 at 02:45:33PM +0000, Wei Liu wrote:
> Use DMA API to allocate ring pages, because we need to get machine
> contiginous memory.

> 
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> ---
>  drivers/net/xen-netfront.c |  258 ++++++++++++++++++++++++++++++++------------
>  1 files changed, 187 insertions(+), 71 deletions(-)
> 
> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> index 01f589d..32ec212 100644
> --- a/drivers/net/xen-netfront.c
> +++ b/drivers/net/xen-netfront.c
> @@ -66,9 +66,18 @@ struct netfront_cb {
>  
>  #define GRANT_INVALID_REF	0
>  
> -#define NET_TX_RING_SIZE __CONST_RING_SIZE(xen_netif_tx, PAGE_SIZE)
> -#define NET_RX_RING_SIZE __CONST_RING_SIZE(xen_netif_rx, PAGE_SIZE)
> -#define TX_MAX_TARGET min_t(int, NET_TX_RING_SIZE, 256)
> +#define XENNET_MAX_RING_PAGE_ORDER 2
> +#define XENNET_MAX_RING_PAGES      (1U << XENNET_MAX_RING_PAGE_ORDER)
> +
> +#define NET_TX_RING_SIZE(_nr_pages)					\
> +	__CONST_RING_SIZE(xen_netif_tx, PAGE_SIZE * (_nr_pages))
> +#define NET_RX_RING_SIZE(_nr_pages)					\
> +	__CONST_RING_SIZE(xen_netif_rx, PAGE_SIZE * (_nr_pages))
> +
> +#define XENNET_MAX_TX_RING_SIZE NET_TX_RING_SIZE(XENNET_MAX_RING_PAGES)
> +#define XENNET_MAX_RX_RING_SIZE NET_RX_RING_SIZE(XENNET_MAX_RING_PAGES)
> +
> +#define TX_MAX_TARGET XENNET_MAX_TX_RING_SIZE
>  
>  struct netfront_stats {
>  	u64			rx_packets;
> @@ -84,12 +93,20 @@ struct netfront_info {
>  
>  	struct napi_struct napi;
>  
> +	/* Statistics */
> +	struct netfront_stats __percpu *stats;
> +
> +	unsigned long rx_gso_checksum_fixup;
> +
>  	unsigned int evtchn;
>  	struct xenbus_device *xbdev;
>  
>  	spinlock_t   tx_lock;
>  	struct xen_netif_tx_front_ring tx;
> -	int tx_ring_ref;
> +	dma_addr_t tx_ring_dma_handle;
> +	int tx_ring_ref[XENNET_MAX_RING_PAGES];
> +	int tx_ring_page_order;
> +	int tx_ring_pages;
>  
>  	/*
>  	 * {tx,rx}_skbs store outstanding skbuffs. Free tx_skb entries
> @@ -103,36 +120,34 @@ struct netfront_info {
>  	union skb_entry {
>  		struct sk_buff *skb;
>  		unsigned long link;
> -	} tx_skbs[NET_TX_RING_SIZE];
> +	} tx_skbs[XENNET_MAX_TX_RING_SIZE];
>  	grant_ref_t gref_tx_head;
> -	grant_ref_t grant_tx_ref[NET_TX_RING_SIZE];
> +	grant_ref_t grant_tx_ref[XENNET_MAX_TX_RING_SIZE];
>  	unsigned tx_skb_freelist;
>  
>  	spinlock_t   rx_lock ____cacheline_aligned_in_smp;
>  	struct xen_netif_rx_front_ring rx;
> -	int rx_ring_ref;
> +	dma_addr_t rx_ring_dma_handle;
> +	int rx_ring_ref[XENNET_MAX_RING_PAGES];
> +	int rx_ring_page_order;
> +	int rx_ring_pages;
>  
>  	/* Receive-ring batched refills. */
>  #define RX_MIN_TARGET 8
>  #define RX_DFL_MIN_TARGET 64
> -#define RX_MAX_TARGET min_t(int, NET_RX_RING_SIZE, 256)
> +#define RX_MAX_TARGET XENNET_MAX_RX_RING_SIZE
>  	unsigned rx_min_target, rx_max_target, rx_target;
>  	struct sk_buff_head rx_batch;
>  
>  	struct timer_list rx_refill_timer;
>  
> -	struct sk_buff *rx_skbs[NET_RX_RING_SIZE];
> +	struct sk_buff *rx_skbs[XENNET_MAX_RX_RING_SIZE];
>  	grant_ref_t gref_rx_head;
> -	grant_ref_t grant_rx_ref[NET_RX_RING_SIZE];
> -
> -	unsigned long rx_pfn_array[NET_RX_RING_SIZE];
> -	struct multicall_entry rx_mcl[NET_RX_RING_SIZE+1];
> -	struct mmu_update rx_mmu[NET_RX_RING_SIZE];
> -
> -	/* Statistics */
> -	struct netfront_stats __percpu *stats;
> +	grant_ref_t grant_rx_ref[XENNET_MAX_RX_RING_SIZE];
>  
> -	unsigned long rx_gso_checksum_fixup;
> +	unsigned long rx_pfn_array[XENNET_MAX_RX_RING_SIZE];
> +	struct multicall_entry rx_mcl[XENNET_MAX_RX_RING_SIZE+1];
> +	struct mmu_update rx_mmu[XENNET_MAX_RX_RING_SIZE];
>  };
>  
>  struct netfront_rx_info {
> @@ -170,15 +185,15 @@ static unsigned short get_id_from_freelist(unsigned *head,
>  	return id;
>  }
>  
> -static int xennet_rxidx(RING_IDX idx)
> +static int xennet_rxidx(RING_IDX idx, struct netfront_info *info)
>  {
> -	return idx & (NET_RX_RING_SIZE - 1);
> +	return idx & (NET_RX_RING_SIZE(info->rx_ring_pages) - 1);
>  }
>  
>  static struct sk_buff *xennet_get_rx_skb(struct netfront_info *np,
>  					 RING_IDX ri)
>  {
> -	int i = xennet_rxidx(ri);
> +	int i = xennet_rxidx(ri, np);
>  	struct sk_buff *skb = np->rx_skbs[i];
>  	np->rx_skbs[i] = NULL;
>  	return skb;
> @@ -187,7 +202,7 @@ static struct sk_buff *xennet_get_rx_skb(struct netfront_info *np,
>  static grant_ref_t xennet_get_rx_ref(struct netfront_info *np,
>  					    RING_IDX ri)
>  {
> -	int i = xennet_rxidx(ri);
> +	int i = xennet_rxidx(ri, np);
>  	grant_ref_t ref = np->grant_rx_ref[i];
>  	np->grant_rx_ref[i] = GRANT_INVALID_REF;
>  	return ref;
> @@ -300,7 +315,7 @@ no_skb:
>  
>  		skb->dev = dev;
>  
> -		id = xennet_rxidx(req_prod + i);
> +		id = xennet_rxidx(req_prod + i, np);
>  
>  		BUG_ON(np->rx_skbs[id]);
>  		np->rx_skbs[id] = skb;
> @@ -596,7 +611,7 @@ static int xennet_close(struct net_device *dev)
>  static void xennet_move_rx_slot(struct netfront_info *np, struct sk_buff *skb,
>  				grant_ref_t ref)
>  {
> -	int new = xennet_rxidx(np->rx.req_prod_pvt);
> +	int new = xennet_rxidx(np->rx.req_prod_pvt, np);
>  
>  	BUG_ON(np->rx_skbs[new]);
>  	np->rx_skbs[new] = skb;
> @@ -1089,7 +1104,7 @@ static void xennet_release_tx_bufs(struct netfront_info *np)
>  	struct sk_buff *skb;
>  	int i;
>  
> -	for (i = 0; i < NET_TX_RING_SIZE; i++) {
> +	for (i = 0; i < NET_TX_RING_SIZE(np->tx_ring_pages); i++) {
>  		/* Skip over entries which are actually freelist references */
>  		if (skb_entry_is_link(&np->tx_skbs[i]))
>  			continue;
> @@ -1123,7 +1138,7 @@ static void xennet_release_rx_bufs(struct netfront_info *np)
>  
>  	spin_lock_bh(&np->rx_lock);
>  
> -	for (id = 0; id < NET_RX_RING_SIZE; id++) {
> +	for (id = 0; id < NET_RX_RING_SIZE(np->rx_ring_pages); id++) {
>  		ref = np->grant_rx_ref[id];
>  		if (ref == GRANT_INVALID_REF) {
>  			unused++;
> @@ -1305,13 +1320,13 @@ static struct net_device * __devinit xennet_create_dev(struct xenbus_device *dev
>  
>  	/* Initialise tx_skbs as a free chain containing every entry. */
>  	np->tx_skb_freelist = 0;
> -	for (i = 0; i < NET_TX_RING_SIZE; i++) {
> +	for (i = 0; i < XENNET_MAX_TX_RING_SIZE; i++) {
>  		skb_entry_set_link(&np->tx_skbs[i], i+1);
>  		np->grant_tx_ref[i] = GRANT_INVALID_REF;
>  	}
>  
>  	/* Clear out rx_skbs */
> -	for (i = 0; i < NET_RX_RING_SIZE; i++) {
> +	for (i = 0; i < XENNET_MAX_RX_RING_SIZE; i++) {
>  		np->rx_skbs[i] = NULL;
>  		np->grant_rx_ref[i] = GRANT_INVALID_REF;
>  	}
> @@ -1409,15 +1424,11 @@ static int __devinit netfront_probe(struct xenbus_device *dev,
>  	return err;
>  }
>  
> -static void xennet_end_access(int ref, void *page)
> -{
> -	/* This frees the page as a side-effect */
> -	if (ref != GRANT_INVALID_REF)
> -		gnttab_end_foreign_access(ref, 0, (unsigned long)page);
> -}
> -
>  static void xennet_disconnect_backend(struct netfront_info *info)
>  {
> +	int i;
> +	struct xenbus_device *dev = info->xbdev;
> +
>  	/* Stop old i/f to prevent errors whilst we rebuild the state. */
>  	spin_lock_bh(&info->rx_lock);
>  	spin_lock_irq(&info->tx_lock);
> @@ -1429,12 +1440,24 @@ static void xennet_disconnect_backend(struct netfront_info *info)
>  		unbind_from_irqhandler(info->netdev->irq, info->netdev);
>  	info->evtchn = info->netdev->irq = 0;
>  
> -	/* End access and free the pages */
> -	xennet_end_access(info->tx_ring_ref, info->tx.sring);
> -	xennet_end_access(info->rx_ring_ref, info->rx.sring);
> +	for (i = 0; i < info->tx_ring_pages; i++) {
> +		int ref = info->tx_ring_ref[i];
> +		gnttab_end_foreign_access_ref(ref, 0);
> +		info->tx_ring_ref[i] = GRANT_INVALID_REF;
> +	}
> +	dma_free_coherent(NULL, PAGE_SIZE * info->tx_ring_pages,
> +			  (void *)info->tx.sring,
> +			  info->tx_ring_dma_handle);
> +
> +	for (i = 0; i < info->rx_ring_pages; i++) {
> +		int ref = info->rx_ring_ref[i];
> +		gnttab_end_foreign_access_ref(ref, 0);
> +		info->rx_ring_ref[i] = GRANT_INVALID_REF;
> +	}
> +	dma_free_coherent(NULL, PAGE_SIZE * info->rx_ring_pages,
> +			  (void *)info->rx.sring,
> +			  info->rx_ring_dma_handle);
>  
> -	info->tx_ring_ref = GRANT_INVALID_REF;
> -	info->rx_ring_ref = GRANT_INVALID_REF;
>  	info->tx.sring = NULL;
>  	info->rx.sring = NULL;
>  }
> @@ -1483,9 +1506,13 @@ static int setup_netfront(struct xenbus_device *dev, struct netfront_info *info)
>  	struct xen_netif_rx_sring *rxs;
>  	int err;
>  	struct net_device *netdev = info->netdev;
> +	unsigned int max_tx_ring_page_order, max_rx_ring_page_order;
> +	int i, j;
>  
> -	info->tx_ring_ref = GRANT_INVALID_REF;
> -	info->rx_ring_ref = GRANT_INVALID_REF;
> +	for (i = 0; i < XENNET_MAX_RING_PAGES; i++) {
> +		info->tx_ring_ref[i] = GRANT_INVALID_REF;
> +		info->rx_ring_ref[i] = GRANT_INVALID_REF;
> +	}
>  	info->rx.sring = NULL;
>  	info->tx.sring = NULL;
>  	netdev->irq = 0;
> @@ -1496,50 +1523,105 @@ static int setup_netfront(struct xenbus_device *dev, struct netfront_info *info)
>  		goto fail;
>  	}
>  
> -	txs = (struct xen_netif_tx_sring *)get_zeroed_page(GFP_NOIO | __GFP_HIGH);
> +	err = xenbus_scanf(XBT_NIL, info->xbdev->otherend,
> +			   "max-tx-ring-page-order", "%u",
> +			   &max_tx_ring_page_order);
> +	if (err < 0) {
> +		info->tx_ring_page_order = 0;
> +		dev_info(&dev->dev, "single tx ring\n");
> +	} else {
> +		info->tx_ring_page_order = max_tx_ring_page_order;
> +		dev_info(&dev->dev, "multi page tx ring, order = %d\n",
> +			 max_tx_ring_page_order);
> +	}
> +	info->tx_ring_pages = (1U << info->tx_ring_page_order);
> +
> +	txs = (struct xen_netif_tx_sring *)
> +		dma_alloc_coherent(NULL, PAGE_SIZE * info->tx_ring_pages,
> +				   &info->tx_ring_dma_handle,
> +				   __GFP_ZERO | GFP_NOIO | __GFP_HIGH);

Hm, so I see you are using 'NULL', which is a big no-no (the API docs say that).
But the other reason why it is a no-no is that this way the generic DMA engine
has no clue whether you are OK getting pages under 4GB or above it (so 64-bit
support).

If you don't supply a 'dev' it will assume 4GB. But when you run this as a
pure PV guest that won't matter in the slightest, b/c there is no DMA code in
action (well, there is dma_alloc_coherent - which, looking at the code, seems
to accept NULL).

Anyhow, if you get to have more than 4GB in the guest, or do PCI passthrough
and use 'iommu=soft', the Xen SWIOTLB will kick in and you will end up
'swizzling' the pages to be under 4GB. That can be fixed if you declare a
'fake' device where you set the coherent_dma_mask to DMA_BIT_MASK(64).

But if you boot the guest under HVM, then it will use the generic SWIOTLB code,
which won't guarantee the pages to be "machine" contiguous, only "guest machine"
contiguous. Is that sufficient for this?

How did you test this? Did you supply iommu=soft to your guest or boot it
with more than 4GB?


>  	if (!txs) {
>  		err = -ENOMEM;
>  		xenbus_dev_fatal(dev, err, "allocating tx ring page");
>  		goto fail;
>  	}
>  	SHARED_RING_INIT(txs);
> -	FRONT_RING_INIT(&info->tx, txs, PAGE_SIZE);
> +	FRONT_RING_INIT(&info->tx, txs, PAGE_SIZE * info->tx_ring_pages);
> +
> +	for (i = 0; i < info->tx_ring_pages; i++) {
> +		void *addr = (void *)((unsigned long)txs + PAGE_SIZE * i);
> +		err = xenbus_grant_ring(dev, virt_to_mfn(addr));
> +		if (err < 0)
> +			goto grant_tx_ring_fail;
> +		info->tx_ring_ref[i] = err;
> +	}
>  
> -	err = xenbus_grant_ring(dev, virt_to_mfn(txs));
> +	err = xenbus_scanf(XBT_NIL, info->xbdev->otherend,
> +			   "max-rx-ring-page-order", "%u",
> +			   &max_rx_ring_page_order);
>  	if (err < 0) {
> -		free_page((unsigned long)txs);
> -		goto fail;
> +		info->rx_ring_page_order = 0;
> +		dev_info(&dev->dev, "single rx ring\n");
> +	} else {
> +		info->rx_ring_page_order = max_rx_ring_page_order;
> +		dev_info(&dev->dev, "multi page rx ring, order = %d\n",
> +			 max_rx_ring_page_order);
>  	}
> +	info->rx_ring_pages = (1U << info->rx_ring_page_order);
>  
> -	info->tx_ring_ref = err;
> -	rxs = (struct xen_netif_rx_sring *)get_zeroed_page(GFP_NOIO | __GFP_HIGH);
> +	rxs = (struct xen_netif_rx_sring *)
> +		dma_alloc_coherent(NULL, PAGE_SIZE * info->rx_ring_pages,
> +				   &info->rx_ring_dma_handle,
> +				   __GFP_ZERO | GFP_NOIO | __GFP_HIGH);
>  	if (!rxs) {
>  		err = -ENOMEM;
>  		xenbus_dev_fatal(dev, err, "allocating rx ring page");
> -		goto fail;
> +		goto alloc_rx_ring_fail;
>  	}
>  	SHARED_RING_INIT(rxs);
> -	FRONT_RING_INIT(&info->rx, rxs, PAGE_SIZE);
> -
> -	err = xenbus_grant_ring(dev, virt_to_mfn(rxs));
> -	if (err < 0) {
> -		free_page((unsigned long)rxs);
> -		goto fail;
> +	FRONT_RING_INIT(&info->rx, rxs, PAGE_SIZE * info->rx_ring_pages);
> +
> +	for (j = 0; j < info->rx_ring_pages; j++) {
> +		void *addr = (void *)((unsigned long)rxs + PAGE_SIZE * j);
> +		err = xenbus_grant_ring(dev, virt_to_mfn(addr));
> +		if (err < 0)
> +			goto grant_rx_ring_fail;
> +		info->rx_ring_ref[j] = err;
>  	}
> -	info->rx_ring_ref = err;
>  
>  	err = xenbus_alloc_evtchn(dev, &info->evtchn);
>  	if (err)
> -		goto fail;
> +		goto alloc_evtchn_fail;
>  
>  	err = bind_evtchn_to_irqhandler(info->evtchn, xennet_interrupt,
>  					0, netdev->name, netdev);
>  	if (err < 0)
> -		goto fail;
> +		goto bind_fail;
>  	netdev->irq = err;
> +
>  	return 0;
>  
> - fail:
> +bind_fail:
> +	xenbus_free_evtchn(dev, info->evtchn);
> +alloc_evtchn_fail:
> +	for (; j >= 0; j--) {
> +		int ref = info->rx_ring_ref[j];
> +		gnttab_end_foreign_access_ref(ref, 0);
> +		info->rx_ring_ref[j] = GRANT_INVALID_REF;
> +	}
> +grant_rx_ring_fail:
> +	dma_free_coherent(NULL, PAGE_SIZE * info->rx_ring_pages,
> +			  (void *)rxs, info->rx_ring_dma_handle);
> +alloc_rx_ring_fail:
> +	for (; i >= 0; i--) {
> +		int ref = info->tx_ring_ref[i];
> +		gnttab_end_foreign_access_ref(ref, 0);
> +		info->tx_ring_ref[i] = GRANT_INVALID_REF;
> +	}
> +grant_tx_ring_fail:
> +	dma_free_coherent(NULL, PAGE_SIZE * info->tx_ring_pages,
> +			  (void *)txs, info->tx_ring_dma_handle);
> +fail:
>  	return err;
>  }
>  
> @@ -1550,6 +1632,7 @@ static int talk_to_netback(struct xenbus_device *dev,
>  	const char *message;
>  	struct xenbus_transaction xbt;
>  	int err;
> +	int i;
>  
>  	/* Create shared ring, alloc event channel. */
>  	err = setup_netfront(dev, info);
> @@ -1563,18 +1646,50 @@ again:
>  		goto destroy_ring;
>  	}
>  
> -	err = xenbus_printf(xbt, dev->nodename, "tx-ring-ref", "%u",
> -			    info->tx_ring_ref);
> -	if (err) {
> -		message = "writing tx ring-ref";
> -		goto abort_transaction;
> +	if (info->tx_ring_page_order == 0)
> +		err = xenbus_printf(xbt, dev->nodename, "tx-ring-ref", "%u",
> +				    info->tx_ring_ref[0]);
> +	else {
> +		err = xenbus_printf(xbt, dev->nodename, "tx-ring-order", "%u",
> +				    info->tx_ring_page_order);
> +		if (err) {
> +			message = "writing tx ring-ref";
> +			goto abort_transaction;
> +		}
> +		for (i = 0; i < info->tx_ring_pages; i++) {
> +			char name[sizeof("tx-ring-ref")+2];
> +			snprintf(name, sizeof(name), "tx-ring-ref%u", i);
> +			err = xenbus_printf(xbt, dev->nodename, name, "%u",
> +					    info->tx_ring_ref[i]);
> +			if (err) {
> +				message = "writing tx ring-ref";
> +				goto abort_transaction;
> +			}
> +		}
>  	}
> -	err = xenbus_printf(xbt, dev->nodename, "rx-ring-ref", "%u",
> -			    info->rx_ring_ref);
> -	if (err) {
> -		message = "writing rx ring-ref";
> -		goto abort_transaction;
> +
> +	if (info->rx_ring_page_order == 0)
> +		err = xenbus_printf(xbt, dev->nodename, "rx-ring-ref", "%u",
> +				    info->rx_ring_ref[0]);
> +	else {
> +		err = xenbus_printf(xbt, dev->nodename, "rx-ring-order", "%u",
> +				    info->rx_ring_page_order);
> +		if (err) {
> +			message = "writing tx ring-ref";
> +			goto abort_transaction;
> +		}
> +		for (i = 0; i < info->rx_ring_pages; i++) {
> +			char name[sizeof("rx-ring-ref")+2];
> +			snprintf(name, sizeof(name), "rx-ring-ref%u", i);
> +			err = xenbus_printf(xbt, dev->nodename, name, "%u",
> +					    info->rx_ring_ref[i]);
> +			if (err) {
> +				message = "writing rx ring-ref";
> +				goto abort_transaction;
> +			}
> +		}
>  	}
> +
>  	err = xenbus_printf(xbt, dev->nodename,
>  			    "event-channel", "%u", info->evtchn);
>  	if (err) {
> @@ -1661,7 +1776,8 @@ static int xennet_connect(struct net_device *dev)
>  	xennet_release_tx_bufs(np);
>  
>  	/* Step 2: Rebuild the RX buffer freelist and the RX ring itself. */
> -	for (requeue_idx = 0, i = 0; i < NET_RX_RING_SIZE; i++) {
> +	for (requeue_idx = 0, i = 0; i < NET_RX_RING_SIZE(np->rx_ring_pages);
> +	     i++) {
>  		skb_frag_t *frag;
>  		const struct page *page;
>  		if (!np->rx_skbs[i])
> -- 
> 1.7.2.5
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [Xen-devel] [RFC PATCH V3 13/16] netback: stub for multi receive protocol support.
  2012-01-30 14:45 ` [RFC PATCH V3 13/16] netback: stub for multi receive protocol support Wei Liu
@ 2012-01-30 21:47   ` Konrad Rzeszutek Wilk
  2012-01-31 11:03       ` Wei Liu
  0 siblings, 1 reply; 59+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-01-30 21:47 UTC (permalink / raw)
  To: Wei Liu; +Cc: netdev, xen-devel, ian.campbell

On Mon, Jan 30, 2012 at 02:45:31PM +0000, Wei Liu wrote:
> Refactor netback, make stub for mutli receive protocols. Also stub

multi.

> existing code as protocol 0.

Why not 1?

Why do we need a new rework without anything using it besides
the existing framework? OR if you are, you should say which
patch is doing it...

> 
> Now the file layout becomes:
> 
>  - interface.c: xenvif interfaces
>  - xenbus.c: xenbus related functions
>  - netback.c: common functions for various protocols
> 
> For different protocols:
> 
>  - xenvif_rx_protocolX.h: header file for the protocol, including
>                           protocol structures and functions
>  - xenvif_rx_protocolX.c: implementations
> 
> To add a new protocol:
> 
>  - include protocol header in common.h
>  - modify XENVIF_MAX_RX_PROTOCOL in common.h
>  - add protocol structure in xenvif.rx union
>  - stub in xenbus.c
>  - modify Makefile
> 
> A protocol should define five functions:
> 
>  - setup: setup frontend / backend ring connections
>  - teardown: teardown frontend / backend ring connections
>  - start_xmit: host start xmit (i.e. guest need to do rx)
>  - event: rx completion event
>  - action: prepare host side data for guest rx
> 
.. snip..

> -
> -	return resp;
> -}
> -
>  static inline int rx_work_todo(struct xenvif *vif)
>  {
>  	return !skb_queue_empty(&vif->rx_queue);
> @@ -1507,8 +999,8 @@ int xenvif_kthread(void *data)
>  		if (kthread_should_stop())
>  			break;
>  
> -		if (rx_work_todo(vif))
> -			xenvif_rx_action(vif);
> +		if (rx_work_todo(vif) && vif->action)
> +			vif->action(vif);
>  	}
>  
>  	return 0;
> diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
> index 79499fc..4067286 100644
> --- a/drivers/net/xen-netback/xenbus.c
> +++ b/drivers/net/xen-netback/xenbus.c
> @@ -415,6 +415,7 @@ static int connect_rings(struct backend_info *be)
>  	unsigned long rx_ring_ref[NETBK_MAX_RING_PAGES];
>  	unsigned int  tx_ring_order;
>  	unsigned int  rx_ring_order;
> +	unsigned int  rx_protocol;
>  
>  	err = xenbus_gather(XBT_NIL, dev->otherend,
>  			    "event-channel", "%u", &evtchn, NULL);
> @@ -510,6 +511,11 @@ static int connect_rings(struct backend_info *be)
>  		}
>  	}
>  
> +	err = xenbus_scanf(XBT_NIL, dev->otherend, "rx-protocol",

feature-rx-protocol?

> +			   "%u", &rx_protocol);
> +	if (err < 0)
> +		rx_protocol = XENVIF_MIN_RX_PROTOCOL;
> +

You should check to see if the protocol is higher than what we can support.
The guest could be playing funny games and putting in 39432...
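Something like this, for example (untested; XENVIF_MIN_RX_PROTOCOL and
XENVIF_MAX_RX_PROTOCOL are the limits this series already defines in
common.h):

	if (rx_protocol < XENVIF_MIN_RX_PROTOCOL ||
	    rx_protocol > XENVIF_MAX_RX_PROTOCOL) {
		xenbus_dev_fatal(dev, -EINVAL,
				 "unsupported rx protocol %u", rx_protocol);
		return -EINVAL;
	}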


>  	err = xenbus_scanf(XBT_NIL, dev->otherend, "request-rx-copy", "%u",
>  			   &rx_copy);
>  	if (err == -ENOENT) {
> @@ -559,7 +565,7 @@ static int connect_rings(struct backend_info *be)
>  	err = xenvif_connect(vif,
>  			     tx_ring_ref, (1U << tx_ring_order),
>  			     rx_ring_ref, (1U << rx_ring_order),
> -			     evtchn);
> +			     evtchn, rx_protocol);
>  	if (err) {
>  		int i;
>  		xenbus_dev_fatal(dev, err,

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC PATCH V3 10/16] netback: rework of per-cpu scratch space.
  2012-01-30 14:45 ` [RFC PATCH V3 10/16] netback: rework of per-cpu scratch space Wei Liu
@ 2012-01-30 21:53   ` Konrad Rzeszutek Wilk
  2012-01-31 10:48       ` Wei Liu
  2012-01-31  1:25   ` Eric Dumazet
  1 sibling, 1 reply; 59+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-01-30 21:53 UTC (permalink / raw)
  To: Wei Liu; +Cc: netdev, xen-devel, ian.campbell

On Mon, Jan 30, 2012 at 02:45:28PM +0000, Wei Liu wrote:
> If we allocate large arrays in per-cpu section, multi-page ring
> feature is likely to blow up the per-cpu section. So avoid allocating
> large arrays, instead we only store pointers to scratch spaces in
> per-cpu section.
> 
> CPU hotplug event is also taken care of.
> 
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> ---
>  drivers/net/xen-netback/netback.c |  140 +++++++++++++++++++++++++++----------
>  1 files changed, 104 insertions(+), 36 deletions(-)
> 
> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
> index a8d58a9..2ac9b84 100644
> --- a/drivers/net/xen-netback/netback.c
> +++ b/drivers/net/xen-netback/netback.c
> @@ -39,6 +39,7 @@
>  #include <linux/kthread.h>
>  #include <linux/if_vlan.h>
>  #include <linux/udp.h>
> +#include <linux/cpu.h>
>  
>  #include <net/tcp.h>
>  
> @@ -49,15 +50,15 @@
>  #include <asm/xen/page.h>
>  
>  
> -struct gnttab_copy *tx_copy_ops;
> +DEFINE_PER_CPU(struct gnttab_copy *, tx_copy_ops);
>  
>  /*
>   * Given MAX_BUFFER_OFFSET of 4096 the worst case is that each
>   * head/fragment page uses 2 copy operations because it
>   * straddles two buffers in the frontend.
>   */
> -struct gnttab_copy *grant_copy_op;
> -struct xenvif_rx_meta *meta;
> +DEFINE_PER_CPU(struct gnttab_copy *, grant_copy_op);
> +DEFINE_PER_CPU(struct xenvif_rx_meta *, meta);
>  
>  static void xenvif_idx_release(struct xenvif *vif, u16 pending_idx);
>  static void make_tx_response(struct xenvif *vif,
> @@ -481,8 +482,8 @@ void xenvif_rx_action(struct xenvif *vif)
>  	struct skb_cb_overlay *sco;
>  	int need_to_notify = 0;
>  
> -	struct gnttab_copy *gco = get_cpu_ptr(grant_copy_op);
> -	struct xenvif_rx_meta *m = get_cpu_ptr(meta);
> +	struct gnttab_copy *gco = get_cpu_var(grant_copy_op);
> +	struct xenvif_rx_meta *m = get_cpu_var(meta);
>  
>  	struct netrx_pending_operations npo = {
>  		.copy  = gco,
> @@ -512,8 +513,8 @@ void xenvif_rx_action(struct xenvif *vif)
>  	BUG_ON(npo.meta_prod > MAX_PENDING_REQS);
>  
>  	if (!npo.copy_prod) {
> -		put_cpu_ptr(gco);
> -		put_cpu_ptr(m);
> +		put_cpu_var(grant_copy_op);
> +		put_cpu_var(meta);
>  		return;
>  	}
>  
> @@ -599,8 +600,8 @@ void xenvif_rx_action(struct xenvif *vif)
>  	if (!skb_queue_empty(&vif->rx_queue))
>  		xenvif_kick_thread(vif);
>  
> -	put_cpu_ptr(gco);
> -	put_cpu_ptr(m);
> +	put_cpu_var(grant_copy_op);
> +	put_cpu_var(meta);
>  }
>  
>  void xenvif_queue_tx_skb(struct xenvif *vif, struct sk_buff *skb)
> @@ -1287,12 +1288,12 @@ int xenvif_tx_action(struct xenvif *vif, int budget)
>  	if (unlikely(!tx_work_todo(vif)))
>  		return 0;
>  
> -	tco = get_cpu_ptr(tx_copy_ops);
> +	tco = get_cpu_var(tx_copy_ops);
>  
>  	nr_gops = xenvif_tx_build_gops(vif, tco);
>  
>  	if (nr_gops == 0) {
> -		put_cpu_ptr(tco);
> +		put_cpu_var(tx_copy_ops);
>  		return 0;
>  	}
>  
> @@ -1301,7 +1302,7 @@ int xenvif_tx_action(struct xenvif *vif, int budget)
>  
>  	work_done = xenvif_tx_submit(vif, tco, budget);
>  
> -	put_cpu_ptr(tco);
> +	put_cpu_var(tx_copy_ops);
>  
>  	return work_done;
>  }
> @@ -1452,31 +1453,97 @@ int xenvif_kthread(void *data)
>  	return 0;
>  }
>  
> +static int __create_percpu_scratch_space(unsigned int cpu)
> +{
> +	per_cpu(tx_copy_ops, cpu) =
> +		vzalloc(sizeof(struct gnttab_copy) * MAX_PENDING_REQS);
> +
> +	per_cpu(grant_copy_op, cpu) =
> +		vzalloc(sizeof(struct gnttab_copy)
> +			* 2 * XEN_NETIF_RX_RING_SIZE);
> +
> +	per_cpu(meta, cpu) = vzalloc(sizeof(struct xenvif_rx_meta)
> +				     * 2 * XEN_NETIF_RX_RING_SIZE);
> +
> +	if (!per_cpu(tx_copy_ops, cpu) ||
> +	    !per_cpu(grant_copy_op, cpu) ||
> +	    !per_cpu(meta, cpu))

Hm, shouldn't you vfree at least the ones that succeeded? It might be
that just one of them failed.
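I.e. something like (untested; __free_percpu_scratch_space() below copes
fine with NULL pointers):

	if (!per_cpu(tx_copy_ops, cpu) ||
	    !per_cpu(grant_copy_op, cpu) ||
	    !per_cpu(meta, cpu)) {
		__free_percpu_scratch_space(cpu);
		return -ENOMEM;
	}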

> +		return -ENOMEM;
> +
> +	return 0;
> +}
> +
> +static void __free_percpu_scratch_space(unsigned int cpu)
> +{
> +	/* freeing NULL pointer is legit */
> +	vfree(per_cpu(tx_copy_ops, cpu));
> +	vfree(per_cpu(grant_copy_op, cpu));
> +	vfree(per_cpu(meta, cpu));
> +}
> +
> +static int __netback_percpu_callback(struct notifier_block *nfb,
> +				     unsigned long action, void *hcpu)
> +{
> +	unsigned int cpu = (unsigned long)hcpu;
> +	int rc = NOTIFY_DONE;
> +
> +	switch (action) {
> +	case CPU_ONLINE:
> +	case CPU_ONLINE_FROZEN:
> +		printk(KERN_INFO
> +		       "netback: CPU %x online, creating scratch space\n", cpu);

Is there any way to use 'pr_info(DRV_NAME' for these printks? It might
require another patch, but it would make it nicer.
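E.g. (material for a separate patch, untested; the usual pr_fmt trick is just
one way to do it and would avoid repeating the prefix everywhere):

#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt	/* at the top, before the #includes */

		pr_info("CPU %x online, creating scratch space\n", cpu);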

> +		rc = __create_percpu_scratch_space(cpu);
> +		if (rc) {
> +			printk(KERN_ALERT
> +			       "netback: failed to create scratch space for CPU"
> +			       " %x\n", cpu);
> +			/* FIXME: nothing more we can do here, we will
> +			 * print out warning message when thread or
> +			 * NAPI runs on this cpu. Also stop getting
> +			 * called in the future.
> +			 */
> +			__free_percpu_scratch_space(cpu);
> +			rc = NOTIFY_BAD;
> +		} else {
> +			rc = NOTIFY_OK;
> +		}
> +		break;
> +	case CPU_DEAD:
> +	case CPU_DEAD_FROZEN:
> +		printk("netback: CPU %x offline, destroying scratch space\n",
> +		       cpu);

pr_debug?

> +		__free_percpu_scratch_space(cpu);
> +		rc = NOTIFY_OK;
> +		break;
> +	default:
> +		break;
> +	}
> +
> +	return rc;
> +}
> +
> +static struct notifier_block netback_notifier_block = {
> +	.notifier_call = __netback_percpu_callback,
> +};
>  
>  static int __init netback_init(void)
>  {
>  	int rc = -ENOMEM;
> +	int cpu;
>  
>  	if (!xen_domain())
>  		return -ENODEV;
>  
> -	tx_copy_ops = __alloc_percpu(sizeof(struct gnttab_copy)
> -				     * MAX_PENDING_REQS,
> -				     __alignof__(struct gnttab_copy));
> -	if (!tx_copy_ops)
> -		goto failed_init;
> +	/* Don't need to disable preempt here, since nobody else will
> +	 * touch these percpu areas during start up. */
> +	for_each_online_cpu(cpu) {
> +		rc = __create_percpu_scratch_space(cpu);
>  
> -	grant_copy_op = __alloc_percpu(sizeof(struct gnttab_copy)
> -				       * 2 * XEN_NETIF_RX_RING_SIZE,
> -				       __alignof__(struct gnttab_copy));
> -	if (!grant_copy_op)
> -		goto failed_init_gco;
> +		if (rc)
> +			goto failed_init;
> +	}
>  
> -	meta = __alloc_percpu(sizeof(struct xenvif_rx_meta)
> -			      * 2 * XEN_NETIF_RX_RING_SIZE,
> -			      __alignof__(struct xenvif_rx_meta));
> -	if (!meta)
> -		goto failed_init_meta;
> +	register_hotcpu_notifier(&netback_notifier_block);
>  
>  	rc = page_pool_init();
>  	if (rc)
> @@ -1491,25 +1558,26 @@ static int __init netback_init(void)
>  failed_init_xenbus:
>  	page_pool_destroy();
>  failed_init_pool:
> -	free_percpu(meta);
> -failed_init_meta:
> -	free_percpu(grant_copy_op);
> -failed_init_gco:
> -	free_percpu(tx_copy_ops);
> +	for_each_online_cpu(cpu)
> +		__free_percpu_scratch_space(cpu);
>  failed_init:
>  	return rc;

We don't want to try to clean up the per_cpu_spaces that might
have gotten allocated in the loop?
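E.g. (untested) the failure path inside the loop could simply reuse the
existing label that frees all per-cpu areas, since vfree(NULL) is a no-op:

	for_each_online_cpu(cpu) {
		rc = __create_percpu_scratch_space(cpu);
		if (rc)
			goto failed_init_pool;
	}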

> -
>  }
>  
>  module_init(netback_init);
>  
>  static void __exit netback_exit(void)
>  {
> +	int cpu;
> +
>  	xenvif_xenbus_exit();
>  	page_pool_destroy();
> -	free_percpu(meta);
> -	free_percpu(grant_copy_op);
> -	free_percpu(tx_copy_ops);
> +
> +	unregister_hotcpu_notifier(&netback_notifier_block);
> +
> +	/* Since we're here, nobody else will touch per-cpu area. */
> +	for_each_online_cpu(cpu)
> +		__free_percpu_scratch_space(cpu);
>  }
>  module_exit(netback_exit);
>  
> -- 
> 1.7.2.5

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC PATCH V3 10/16] netback: rework of per-cpu scratch space.
  2012-01-30 14:45 ` [RFC PATCH V3 10/16] netback: rework of per-cpu scratch space Wei Liu
  2012-01-30 21:53   ` Konrad Rzeszutek Wilk
@ 2012-01-31  1:25   ` Eric Dumazet
  2012-01-31 10:43       ` Wei Liu
  1 sibling, 1 reply; 59+ messages in thread
From: Eric Dumazet @ 2012-01-31  1:25 UTC (permalink / raw)
  To: Wei Liu; +Cc: netdev, xen-devel, ian.campbell, konrad.wilk

On 30 January 2012 at 15:45, Wei Liu <wei.liu2@citrix.com> wrote:
> If we allocate large arrays in per-cpu section, multi-page ring
> feature is likely to blow up the per-cpu section. So avoid allocating
> large arrays, instead we only store pointers to scratch spaces in
> per-cpu section.
>
> CPU hotplug event is also taken care of.
>

>  }
>
> +static int __create_percpu_scratch_space(unsigned int cpu)
> +{
> +       per_cpu(tx_copy_ops, cpu) =
> +               vzalloc(sizeof(struct gnttab_copy) * MAX_PENDING_REQS);
> +
> +       per_cpu(grant_copy_op, cpu) =
> +               vzalloc(sizeof(struct gnttab_copy)
> +                       * 2 * XEN_NETIF_RX_RING_SIZE);
> +
> +       per_cpu(meta, cpu) = vzalloc(sizeof(struct xenvif_rx_meta)
> +                                    * 2 * XEN_NETIF_RX_RING_SIZE);
> +
> +       if (!per_cpu(tx_copy_ops, cpu) ||
> +           !per_cpu(grant_copy_op, cpu) ||
> +           !per_cpu(meta, cpu))
> +               return -ENOMEM;
> +
> +       return 0;
> +}
> +

Problem is you lost NUMA awareness here.

Please check vzalloc_node()
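
Something like (untested):

	per_cpu(tx_copy_ops, cpu) =
		vzalloc_node(sizeof(struct gnttab_copy) * MAX_PENDING_REQS,
			     cpu_to_node(cpu));

and the same for the other two arrays.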

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [Xen-devel] [RFC PATCH V3 12/16] netback: multi-page ring support
  2012-01-30 17:10       ` Wei Liu
  (?)
@ 2012-01-31  9:01       ` Jan Beulich
  2012-01-31 11:09           ` Wei Liu
  -1 siblings, 1 reply; 59+ messages in thread
From: Jan Beulich @ 2012-01-31  9:01 UTC (permalink / raw)
  To: Wei Liu; +Cc: Ian Campbell, xen-devel, konrad.wilk, netdev

>>> On 30.01.12 at 18:10, Wei Liu <wei.liu2@citrix.com> wrote:
> On Mon, 2012-01-30 at 16:35 +0000, Jan Beulich wrote:
>> >>> On 30.01.12 at 15:45, Wei Liu <wei.liu2@citrix.com> wrote:
>> > -int xenvif_map_frontend_rings(struct xenvif *vif,
>> > -			      grant_ref_t tx_ring_ref,
>> > -			      grant_ref_t rx_ring_ref)
>> > +int xenvif_map_frontend_rings(struct xen_comms *comms,
>> > +			      int domid,
>> > +			      unsigned long ring_ref[],
>> > +			      unsigned int  ring_ref_count)
>> >  {
>> > -	void *addr;
>> > -	struct xen_netif_tx_sring *txs;
>> > -	struct xen_netif_rx_sring *rxs;
>> > -
>> > -	int err = -ENOMEM;
>> > +	struct gnttab_map_grant_ref op[NETBK_MAX_RING_PAGES];
>> > +	unsigned int i;
>> > +	int err = 0;
>> >  
>> > -	err = xenbus_map_ring_valloc(xenvif_to_xenbus_device(vif),
>> > -				     tx_ring_ref, &addr);
>> 
>> Any reason why you don't just extend this function (in a prerequisite
>> patch) rather than open coding a common utility function (twice) here,
>> so that other backends (blkback!) can benefit later as well.
>> 
>> Jan
>> 
> 
> I'm mainly focusing on netback stuffs, so the code is slightly coupled
> with netback -- NETBK_MAX_RING_PAGES.
> 
> To extend xenbus_map_ring_valloc and make more generic, it requires
> setting a global maximum page number limits on rings, I think it will
> require further investigation and code refactor -- which I have no time
> to attend to at the moment. :-/

Why? You can simply pass in the number of pages; there's no need
for a global maximum.

Jan

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [Xen-devel] [RFC PATCH V3 15/16] netfront: multi page ring support.
  2012-01-30 21:39   ` [Xen-devel] " Konrad Rzeszutek Wilk
@ 2012-01-31  9:12     ` Ian Campbell
  2012-01-31 11:17         ` Wei Liu
  2012-01-31  9:53       ` Jan Beulich
  2012-01-31 10:58       ` Wei Liu
  2 siblings, 1 reply; 59+ messages in thread
From: Ian Campbell @ 2012-01-31  9:12 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: Wei Liu (Intern), netdev, xen-devel

On Mon, 2012-01-30 at 21:39 +0000, Konrad Rzeszutek Wilk wrote:

[...snip... please do consider trimming unnecessary quotes]

> > @@ -1496,50 +1523,105 @@ static int setup_netfront(struct xenbus_device *dev, struct netfront_info *info)
> >               goto fail;
> >       }
> >
> > -     txs = (struct xen_netif_tx_sring *)get_zeroed_page(GFP_NOIO | __GFP_HIGH);
> > +     err = xenbus_scanf(XBT_NIL, info->xbdev->otherend,
> > +                        "max-tx-ring-page-order", "%u",
> > +                        &max_tx_ring_page_order);
> > +     if (err < 0) {
> > +             info->tx_ring_page_order = 0;
> > +             dev_info(&dev->dev, "single tx ring\n");
> > +     } else {
> > +             info->tx_ring_page_order = max_tx_ring_page_order;
> > +             dev_info(&dev->dev, "multi page tx ring, order = %d\n",
> > +                      max_tx_ring_page_order);
> > +     }
> > +     info->tx_ring_pages = (1U << info->tx_ring_page_order);
> > +
> > +     txs = (struct xen_netif_tx_sring *)
> > +             dma_alloc_coherent(NULL, PAGE_SIZE * info->tx_ring_pages,
> > +                                &info->tx_ring_dma_handle,
> > +                                __GFP_ZERO | GFP_NOIO | __GFP_HIGH);
> 
> Hm, so I see you are using 'NULL', which is a big no-no (the API docs say that).
> But the other reason why it is a no-no is that this way the generic DMA engine has
> no clue whether you are OK getting pages under 4GB or above it (i.e. 64-bit support).

Does this allocation even need to be physically contiguous? I'd have
thought that virtually contiguous would be sufficient, and even then
only as a convenience at either end to avoid the need for more
complicated ring macros.

Ian.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [Xen-devel] [RFC PATCH V3 15/16] netfront: multi page ring support.
  2012-01-30 21:39   ` [Xen-devel] " Konrad Rzeszutek Wilk
@ 2012-01-31  9:53       ` Jan Beulich
  2012-01-31  9:53       ` Jan Beulich
  2012-01-31 10:58       ` Wei Liu
  2 siblings, 0 replies; 59+ messages in thread
From: Jan Beulich @ 2012-01-31  9:53 UTC (permalink / raw)
  To: Wei Liu, Konrad Rzeszutek Wilk; +Cc: ian.campbell, xen-devel, netdev

>>> On 30.01.12 at 22:39, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
> On Mon, Jan 30, 2012 at 02:45:33PM +0000, Wei Liu wrote:
>> @@ -1496,50 +1523,105 @@ static int setup_netfront(struct xenbus_device *dev, struct netfront_info *info)
>>  		goto fail;
>>  	}
>>  
>> -	txs = (struct xen_netif_tx_sring *)get_zeroed_page(GFP_NOIO | __GFP_HIGH);
>> +	err = xenbus_scanf(XBT_NIL, info->xbdev->otherend,
>> +			   "max-tx-ring-page-order", "%u",
>> +			   &max_tx_ring_page_order);
>> +	if (err < 0) {
>> +		info->tx_ring_page_order = 0;
>> +		dev_info(&dev->dev, "single tx ring\n");
>> +	} else {
>> +		info->tx_ring_page_order = max_tx_ring_page_order;
>> +		dev_info(&dev->dev, "multi page tx ring, order = %d\n",
>> +			 max_tx_ring_page_order);
>> +	}
>> +	info->tx_ring_pages = (1U << info->tx_ring_page_order);
>> +
>> +	txs = (struct xen_netif_tx_sring *)
>> +		dma_alloc_coherent(NULL, PAGE_SIZE * info->tx_ring_pages,
>> +				   &info->tx_ring_dma_handle,
>> +				   __GFP_ZERO | GFP_NOIO | __GFP_HIGH);
> 
> Hm, so I see you are using 'NULL', which is a big no-no (the API docs
> say that). But the other reason why it is a no-no is that this way the
> generic DMA engine has no clue whether you are OK getting pages under
> 4GB or above it (i.e. 64-bit support).
> 
> If you don't supply a 'dev' it will assume 4GB. But when you run this
> as a pure PV guest that won't matter in the slightest, b/c there is no
> DMA code in action (well, there is dma_alloc_coherent - which, looking
> at the code, seems to accept NULL).
> 
> Anyhow, if you get to have more than 4GB in the guest, or do PCI
> passthrough and use 'iommu=soft', the Xen SWIOTLB will kick in and you
> will end up 'swizzling' the pages to be under 4GB. That can be fixed if
> you declare a 'fake' device where you set the coherent_dma_mask to
> DMA_BIT_MASK(64).
> 
> But if you boot the guest under HVM, then it will use the generic
> SWIOTLB code, which won't guarantee the pages to be "machine"
> contiguous but will be "guest machine" contiguous. Is that sufficient
> for this?
> 
> How did you test this? Did you supply iommu=soft to your guest or boot
> it with more than 4GB?

IMO the use of the DMA API is a mistake here anyway. There's no need
for anything to be contiguous in a PV frontend/backend handshake
protocol; if one finds there is, it's very likely just because of trying
to avoid doing something properly.

Jan
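
For illustration only, a rough sketch of the kind of frontend-side
allocation this implies: ordinary pages, granted one by one, with no DMA
API involved.  The function name is made up and error unwinding is
omitted:

static int xennet_alloc_tx_ring(struct xenbus_device *dev,
				struct netfront_info *info)
{
	struct xen_netif_tx_sring *txs;
	unsigned long addr;
	int i, err;

	/* Guest-virtually contiguous is all the ring macros need; the
	 * underlying MFNs may be scattered. */
	addr = __get_free_pages(GFP_NOIO | __GFP_HIGH | __GFP_ZERO,
				info->tx_ring_page_order);
	if (!addr)
		return -ENOMEM;
	txs = (struct xen_netif_tx_sring *)addr;

	/* Grant each page individually; the backend maps them into its
	 * own virtually contiguous area. */
	for (i = 0; i < info->tx_ring_pages; i++) {
		err = gnttab_grant_foreign_access(dev->otherend_id,
				virt_to_mfn((void *)(addr + i * PAGE_SIZE)),
				0);
		if (err < 0)
			return err;	/* real code must revoke and free */
		info->tx_ring_ref[i] = err;
	}

	SHARED_RING_INIT(txs);
	FRONT_RING_INIT(&info->tx, txs, PAGE_SIZE * info->tx_ring_pages);
	return 0;
}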

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC PATCH V3 05/16] netback: add module get/put operations along with vif connect/disconnect.
  2012-01-30 14:45 ` [RFC PATCH V3 05/16] netback: add module get/put operations along with vif connect/disconnect Wei Liu
@ 2012-01-31 10:24   ` Ian Campbell
  2012-01-31 10:39       ` Wei Liu
  0 siblings, 1 reply; 59+ messages in thread
From: Ian Campbell @ 2012-01-31 10:24 UTC (permalink / raw)
  To: Wei Liu; +Cc: netdev, xen-devel, konrad.wilk

On Mon, 2012-01-30 at 14:45 +0000, Wei Liu wrote:
> If there is vif running and user unloads netback, it will certainly
> cause problems -- guest's network interface just mysteriously stops
> working.

This seems like a bug fix for  02/16 "netback: add module unload
function". Please could you fold back such fixes where appropriate? I
think there's a handful of these sorts of patches in the series.

> v2: fix module_put path
> 
> disconnect function may get called by the generic framework even
> before vif connects.
> 
> Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> ---
>  drivers/net/xen-netback/interface.c |   11 ++++++++++-
>  1 files changed, 10 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
> index dfc04f8..7914f60 100644
> --- a/drivers/net/xen-netback/interface.c
> +++ b/drivers/net/xen-netback/interface.c
> @@ -323,6 +323,8 @@ int xenvif_connect(struct xenvif *vif, unsigned long tx_ring_ref,
>  	if (vif->irq)
>  		return 0;
>  
> +	__module_get(THIS_MODULE);
> +
>  	err = xen_netbk_map_frontend_rings(vif, tx_ring_ref, rx_ring_ref);
>  	if (err < 0)
>  		goto err;
> @@ -372,12 +374,14 @@ err_unbind:
>  err_unmap:
>  	xen_netbk_unmap_frontend_rings(vif);
>  err:
> +	module_put(THIS_MODULE);
>  	return err;
>  }
>  
>  void xenvif_disconnect(struct xenvif *vif)
>  {
>  	struct net_device *dev = vif->dev;
> +	int need_module_put = 0;
>  
>  	if (netif_carrier_ok(dev)) {
>  		rtnl_lock();
> @@ -397,12 +401,17 @@ void xenvif_disconnect(struct xenvif *vif)
>  
>  	del_timer_sync(&vif->credit_timeout);
>  
> -	if (vif->irq)
> +	if (vif->irq) {
>  		unbind_from_irqhandler(vif->irq, vif);
> +		need_module_put = 1;

This seems like a slightly odd condition. Why is the put not
unconditional?


> +	}
>  
>  	unregister_netdev(vif->dev);
>  
>  	xen_netbk_unmap_frontend_rings(vif);
>  
>  	free_netdev(vif->dev);
> +
> +	if (need_module_put)
> +		module_put(THIS_MODULE);
>  }

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC PATCH V3 14/16] netback: split event channels support
  2012-01-30 14:45 ` [RFC PATCH V3 14/16] netback: split event channels support Wei Liu
@ 2012-01-31 10:37   ` Ian Campbell
  2012-01-31 11:57       ` Wei Liu
  0 siblings, 1 reply; 59+ messages in thread
From: Ian Campbell @ 2012-01-31 10:37 UTC (permalink / raw)
  To: Wei Liu; +Cc: netdev, xen-devel, konrad.wilk

On Mon, 2012-01-30 at 14:45 +0000, Wei Liu wrote:
> Originally, netback and netfront only use one event channel to do tx /
> rx notification. This may cause unnecessary wake-up of NAPI / kthread.
> 
> When guest tx is completed, netback will only notify tx_irq.
> 
> Also modify xenvif_protocol0 to reflect this change. Rx protocol
> only notifies rx_irq.
> 
> If split-event-channels feature is not activated, rx_irq = tx_irq, so
> RX protocol will just work as expected.
> 
> Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> ---
>  drivers/net/xen-netback/common.h              |    9 ++-
>  drivers/net/xen-netback/interface.c           |   90 ++++++++++++++++++++-----
>  drivers/net/xen-netback/netback.c             |    2 +-
>  drivers/net/xen-netback/xenbus.c              |   52 ++++++++++++---
>  drivers/net/xen-netback/xenvif_rx_protocol0.c |    2 +-
>  5 files changed, 123 insertions(+), 32 deletions(-)
> 
> diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
> index f3d95b3..376f0bf 100644
> --- a/drivers/net/xen-netback/common.h
> +++ b/drivers/net/xen-netback/common.h
> @@ -100,8 +100,10 @@ struct xenvif {
>  
>  	u8               fe_dev_addr[6];
>  
> -	/* Physical parameters of the comms window. */
> -	unsigned int     irq;
> +	/* when split_irq == 0, only use tx_irq */
> +	int              split_irq;
> +	unsigned int     tx_irq;
> +	unsigned int     rx_irq;

Can you get rid of split_irq by setting tx_irq == rx_irq in that case
and simplify the code by doing so?

I think this should work even for places like:

	if (!vif->split_irq)
		enable_irq(vif->tx_irq);
	else {
		enable_irq(vif->tx_irq);
		enable_irq(vif->rx_irq);
	}

Just by doing
		enable_irq(vif->tx_irq);
		enable_irq(vif->rx_irq);

Since enable/disable_irq maintain a count, this will do the right thing
even if the two IRQs happen to be the same.
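
In other words, something along these lines (helper names invented for
illustration); the depth counting in enable_irq()/disable_irq() keeps
the shared case safe as long as the calls stay balanced:

static void xenvif_disable_irqs(struct xenvif *vif)
{
	/* tx_irq == rx_irq when a single event channel is used, which is
	 * fine: the second call just bumps the disable depth to 2. */
	disable_irq(vif->tx_irq);
	disable_irq(vif->rx_irq);
}

static void xenvif_enable_irqs(struct xenvif *vif)
{
	enable_irq(vif->tx_irq);
	enable_irq(vif->rx_irq);
}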

>  	/* The shared tx ring and index. */
>  	struct xen_netif_tx_back_ring tx;
> @@ -162,7 +164,8 @@ struct xenvif *xenvif_alloc(struct device *parent,
>  int xenvif_connect(struct xenvif *vif,
>  		   unsigned long tx_ring_ref[], unsigned int tx_ring_order,
>  		   unsigned long rx_ring_ref[], unsigned int rx_ring_order,
> -		   unsigned int evtchn, unsigned int rx_protocol);
> +		   unsigned int evtchn[], int split_evtchn,
> +		   unsigned int rx_protocol);
>  void xenvif_disconnect(struct xenvif *vif);
>  
>  int xenvif_xenbus_init(void);
> diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
> index 0f05f03..afccd5d 100644
> --- a/drivers/net/xen-netback/interface.c
> +++ b/drivers/net/xen-netback/interface.c
> @@ -46,15 +46,31 @@ int xenvif_schedulable(struct xenvif *vif)
>  	return netif_running(vif->dev) && netif_carrier_ok(vif->dev);
>  }
>  
> -static irqreturn_t xenvif_interrupt(int irq, void *dev_id)
> +static irqreturn_t xenvif_tx_interrupt(int irq, void *dev_id)
> +{
> +	struct xenvif *vif = dev_id;
> +
> +	if (RING_HAS_UNCONSUMED_REQUESTS(&vif->tx))
> +		napi_schedule(&vif->napi);
> +
> +	return IRQ_HANDLED;
> +}
> +
> +static irqreturn_t xenvif_rx_interrupt(int irq, void *dev_id)
>  {
>  	struct xenvif *vif = dev_id;
>  
>  	if (xenvif_schedulable(vif) && vif->event != NULL)
>  		vif->event(vif);
>  
> -	if (RING_HAS_UNCONSUMED_REQUESTS(&vif->tx))
> -		napi_schedule(&vif->napi);
> +	return IRQ_HANDLED;
> +}
> +
> +static irqreturn_t xenvif_interrupt(int irq, void *dev_id)
> +{
> +	xenvif_tx_interrupt(0, dev_id);

Might as well pass irq down.
[...]
> @@ -308,13 +334,14 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
>  int xenvif_connect(struct xenvif *vif,
>  		   unsigned long tx_ring_ref[], unsigned int tx_ring_ref_count,
>  		   unsigned long rx_ring_ref[], unsigned int rx_ring_ref_count,
> -		   unsigned int evtchn, unsigned int rx_protocol)
> +		   unsigned int evtchn[], int split_evtchn,

Explicitly tx_evtchn and rx_evtchn would be clearer than remembering
that [0]==tx and [1]==rx I think.
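
Illustrative prototype only; with the channels named explicitly, the
split case could even be inferred from tx_evtchn != rx_evtchn instead of
being passed as a separate flag:

int xenvif_connect(struct xenvif *vif,
		   unsigned long tx_ring_ref[], unsigned int tx_ring_ref_count,
		   unsigned long rx_ring_ref[], unsigned int rx_ring_ref_count,
		   unsigned int tx_evtchn, unsigned int rx_evtchn,
		   unsigned int rx_protocol);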

> +		   unsigned int rx_protocol)
>  {
>  	int err = -ENOMEM;
>  	struct xen_netif_tx_sring *txs;
>  
>  	/* Already connected through? */
> -	if (vif->irq)
> +	if (vif->tx_irq)
>  		return 0;
>  
>  	__module_get(THIS_MODULE);
> @@ -345,13 +372,35 @@ int xenvif_connect(struct xenvif *vif,
>  	if (vif->setup(vif))
>  		goto err_rx_unmap;
>  
> -	err = bind_interdomain_evtchn_to_irqhandler(
> -		vif->domid, evtchn, xenvif_interrupt, 0,
> -		vif->dev->name, vif);
> -	if (err < 0)
> -		goto err_rx_unmap;
> -	vif->irq = err;
> -	disable_irq(vif->irq);
> +	if (!split_evtchn) {

Presumably this is one of the places where you do have to care about
split vs non. I did consider whether simply registering two handlers for
the interrupt in a shared-interrupt style would work, but I think that
way lies madness and confusion...

> +		err = bind_interdomain_evtchn_to_irqhandler(
> +			vif->domid, evtchn[0], xenvif_interrupt, 0,
> +			vif->dev->name, vif);
> +		if (err < 0)
> +			goto err_rx_unmap;
> +		vif->tx_irq = vif->rx_irq = err;
> +		disable_irq(vif->tx_irq);
> +		vif->split_irq = 0;
> +	} else {
> +		err = bind_interdomain_evtchn_to_irqhandler(
> +			vif->domid, evtchn[0], xenvif_tx_interrupt,
> +			0, vif->dev->name, vif);
> +		if (err < 0)
> +			goto err_rx_unmap;
> +		vif->tx_irq = err;
> +		disable_irq(vif->tx_irq);
> +
> +		err = bind_interdomain_evtchn_to_irqhandler(
> +			vif->domid, evtchn[1], xenvif_rx_interrupt,
> +			0, vif->dev->name, vif);
> +		if (err < 0) {
> +			unbind_from_irqhandler(vif->tx_irq, vif);
> +			goto err_rx_unmap;
> +		}
> +		vif->rx_irq = err;
> +		disable_irq(vif->rx_irq);
> +		vif->split_irq = 1;
> +	}
>  
>  	init_waitqueue_head(&vif->wq);
>  	vif->task = kthread_create(xenvif_kthread,
> diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
> index 4067286..c5a3b27 100644
> --- a/drivers/net/xen-netback/xenbus.c
> +++ b/drivers/net/xen-netback/xenbus.c
> @@ -131,6 +131,14 @@ static int netback_probe(struct xenbus_device *dev,
>  			goto abort_transaction;
>  		}
>  
> +		err = xenbus_printf(xbt, dev->nodename,
> +				    "split-event-channels",

Usually we use "feature-FOO" as the names for these sorts of nodes.

> +				    "%u", 1);
> +		if (err) {
> +			message = "writing split-event-channels";
> +			goto abort_transaction;
> +		}
> +
>  		err = xenbus_transaction_end(xbt, 0);
>  	} while (err == -EAGAIN);
>  
> @@ -408,7 +416,7 @@ static int connect_rings(struct backend_info *be)
>  {
>  	struct xenvif *vif = be->vif;
>  	struct xenbus_device *dev = be->dev;
> -	unsigned int evtchn, rx_copy;
> +	unsigned int evtchn[2], split_evtchn, rx_copy;

Another case where I think two vars is better than a small array.

>  	int err;
>  	int val;
>  	unsigned long tx_ring_ref[NETBK_MAX_RING_PAGES];

Ian.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC PATCH V3 05/16] netback: add module get/put operations along with vif connect/disconnect.
  2012-01-31 10:24   ` Ian Campbell
@ 2012-01-31 10:39       ` Wei Liu
  0 siblings, 0 replies; 59+ messages in thread
From: Wei Liu @ 2012-01-31 10:39 UTC (permalink / raw)
  To: Ian Campbell; +Cc: wei.liu2, netdev, xen-devel, konrad.wilk

On Tue, 2012-01-31 at 10:24 +0000, Ian Campbell wrote:
> On Mon, 2012-01-30 at 14:45 +0000, Wei Liu wrote:
> > If there is vif running and user unloads netback, it will certainly
> > cause problems -- guest's network interface just mysteriously stops
> > working.
> 
> This seems like a bug fix for  02/16 "netback: add module unload
> function". Please could you fold back such fixes where appropriate? I
> think there's a handful of these sorts of patches in the series.
> 

Sure.

> > v2: fix module_put path
> > 
> > disconnect function may get called by the generic framework even
> > before vif connects.
> > 
> > Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> > Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> > ---
> >  drivers/net/xen-netback/interface.c |   11 ++++++++++-
> >  1 files changed, 10 insertions(+), 1 deletions(-)
> > 
> > diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
> > index dfc04f8..7914f60 100644
> > --- a/drivers/net/xen-netback/interface.c
> > +++ b/drivers/net/xen-netback/interface.c
> > @@ -323,6 +323,8 @@ int xenvif_connect(struct xenvif *vif, unsigned long tx_ring_ref,
> >  	if (vif->irq)
> >  		return 0;
> >  
> > +	__module_get(THIS_MODULE);
> > +
> >  	err = xen_netbk_map_frontend_rings(vif, tx_ring_ref, rx_ring_ref);
> >  	if (err < 0)
> >  		goto err;
> > @@ -372,12 +374,14 @@ err_unbind:
> >  err_unmap:
> >  	xen_netbk_unmap_frontend_rings(vif);
> >  err:
> > +	module_put(THIS_MODULE);
> >  	return err;
> >  }
> >  
> >  void xenvif_disconnect(struct xenvif *vif)
> >  {
> >  	struct net_device *dev = vif->dev;
> > +	int need_module_put = 0;
> >  
> >  	if (netif_carrier_ok(dev)) {
> >  		rtnl_lock();
> > @@ -397,12 +401,17 @@ void xenvif_disconnect(struct xenvif *vif)
> >  
> >  	del_timer_sync(&vif->credit_timeout);
> >  
> > -	if (vif->irq)
> > +	if (vif->irq) {
> >  		unbind_from_irqhandler(vif->irq, vif);
> > +		need_module_put = 1;
> 
> This seems like a slightly odd condition. Why is the put not
> unconditional?
> 

This is what I observed. The framework will call disconnect
unconditionally in the cleanup phase. If the frontend fails to
initialize, the connect function never gets called, so there is no
corresponding module_get() to balance the put.


Wei.

> 
> > +	}
> >  
> >  	unregister_netdev(vif->dev);
> >  
> >  	xen_netbk_unmap_frontend_rings(vif);
> >  
> >  	free_netdev(vif->dev);
> > +
> > +	if (need_module_put)
> > +		module_put(THIS_MODULE);
> >  }
> 
> 

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC PATCH V3 10/16] netback: rework of per-cpu scratch space.
  2012-01-31  1:25   ` Eric Dumazet
@ 2012-01-31 10:43       ` Wei Liu
  0 siblings, 0 replies; 59+ messages in thread
From: Wei Liu @ 2012-01-31 10:43 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: wei.liu2, netdev, xen-devel, Ian Campbell, konrad.wilk

On Tue, 2012-01-31 at 01:25 +0000, Eric Dumazet wrote:
> 
> Problem is you lost NUMA awareness here.
> 
> Please check vzalloc_node()

Good point, thanks.

Wei.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC PATCH V3 10/16] netback: rework of per-cpu scratch space.
  2012-01-30 21:53   ` Konrad Rzeszutek Wilk
@ 2012-01-31 10:48       ` Wei Liu
  0 siblings, 0 replies; 59+ messages in thread
From: Wei Liu @ 2012-01-31 10:48 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: wei.liu2, netdev, xen-devel, Ian Campbell

On Mon, 2012-01-30 at 21:53 +0000, Konrad Rzeszutek Wilk wrote:
> On Mon, Jan 30, 2012 at 02:45:28PM +0000, Wei Liu wrote:
> > If we allocate large arrays in per-cpu section, multi-page ring
> > feature is likely to blow up the per-cpu section. So avoid allocating
> > large arrays, instead we only store pointers to scratch spaces in
> > per-cpu section.
> > 
> > CPU hotplug event is also taken care of.
> > 
> > Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> > ---
> >  drivers/net/xen-netback/netback.c |  140 +++++++++++++++++++++++++++----------
> >  1 files changed, 104 insertions(+), 36 deletions(-)
> > 
> > diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
> > index a8d58a9..2ac9b84 100644
> > --- a/drivers/net/xen-netback/netback.c
> > +++ b/drivers/net/xen-netback/netback.c
> > @@ -39,6 +39,7 @@
> >  #include <linux/kthread.h>
> >  #include <linux/if_vlan.h>
> >  #include <linux/udp.h>
> > +#include <linux/cpu.h>
> >  
> >  #include <net/tcp.h>
> >  
> > @@ -49,15 +50,15 @@
> >  #include <asm/xen/page.h>
> >  
> >  
> > -struct gnttab_copy *tx_copy_ops;
> > +DEFINE_PER_CPU(struct gnttab_copy *, tx_copy_ops);
> >  
> >  /*
> >   * Given MAX_BUFFER_OFFSET of 4096 the worst case is that each
> >   * head/fragment page uses 2 copy operations because it
> >   * straddles two buffers in the frontend.
> >   */
> > -struct gnttab_copy *grant_copy_op;
> > -struct xenvif_rx_meta *meta;
> > +DEFINE_PER_CPU(struct gnttab_copy *, grant_copy_op);
> > +DEFINE_PER_CPU(struct xenvif_rx_meta *, meta);
> >  
> >  static void xenvif_idx_release(struct xenvif *vif, u16 pending_idx);
> >  static void make_tx_response(struct xenvif *vif,
> > @@ -481,8 +482,8 @@ void xenvif_rx_action(struct xenvif *vif)
> >  	struct skb_cb_overlay *sco;
> >  	int need_to_notify = 0;
> >  
> > -	struct gnttab_copy *gco = get_cpu_ptr(grant_copy_op);
> > -	struct xenvif_rx_meta *m = get_cpu_ptr(meta);
> > +	struct gnttab_copy *gco = get_cpu_var(grant_copy_op);
> > +	struct xenvif_rx_meta *m = get_cpu_var(meta);
> >  
> >  	struct netrx_pending_operations npo = {
> >  		.copy  = gco,
> > @@ -512,8 +513,8 @@ void xenvif_rx_action(struct xenvif *vif)
> >  	BUG_ON(npo.meta_prod > MAX_PENDING_REQS);
> >  
> >  	if (!npo.copy_prod) {
> > -		put_cpu_ptr(gco);
> > -		put_cpu_ptr(m);
> > +		put_cpu_var(grant_copy_op);
> > +		put_cpu_var(meta);
> >  		return;
> >  	}
> >  
> > @@ -599,8 +600,8 @@ void xenvif_rx_action(struct xenvif *vif)
> >  	if (!skb_queue_empty(&vif->rx_queue))
> >  		xenvif_kick_thread(vif);
> >  
> > -	put_cpu_ptr(gco);
> > -	put_cpu_ptr(m);
> > +	put_cpu_var(grant_copy_op);
> > +	put_cpu_var(meta);
> >  }
> >  
> >  void xenvif_queue_tx_skb(struct xenvif *vif, struct sk_buff *skb)
> > @@ -1287,12 +1288,12 @@ int xenvif_tx_action(struct xenvif *vif, int budget)
> >  	if (unlikely(!tx_work_todo(vif)))
> >  		return 0;
> >  
> > -	tco = get_cpu_ptr(tx_copy_ops);
> > +	tco = get_cpu_var(tx_copy_ops);
> >  
> >  	nr_gops = xenvif_tx_build_gops(vif, tco);
> >  
> >  	if (nr_gops == 0) {
> > -		put_cpu_ptr(tco);
> > +		put_cpu_var(tx_copy_ops);
> >  		return 0;
> >  	}
> >  
> > @@ -1301,7 +1302,7 @@ int xenvif_tx_action(struct xenvif *vif, int budget)
> >  
> >  	work_done = xenvif_tx_submit(vif, tco, budget);
> >  
> > -	put_cpu_ptr(tco);
> > +	put_cpu_var(tx_copy_ops);
> >  
> >  	return work_done;
> >  }
> > @@ -1452,31 +1453,97 @@ int xenvif_kthread(void *data)
> >  	return 0;
> >  }
> >  
> > +static int __create_percpu_scratch_space(unsigned int cpu)
> > +{
> > +	per_cpu(tx_copy_ops, cpu) =
> > +		vzalloc(sizeof(struct gnttab_copy) * MAX_PENDING_REQS);
> > +
> > +	per_cpu(grant_copy_op, cpu) =
> > +		vzalloc(sizeof(struct gnttab_copy)
> > +			* 2 * XEN_NETIF_RX_RING_SIZE);
> > +
> > +	per_cpu(meta, cpu) = vzalloc(sizeof(struct xenvif_rx_meta)
> > +				     * 2 * XEN_NETIF_RX_RING_SIZE);
> > +
> > +	if (!per_cpu(tx_copy_ops, cpu) ||
> > +	    !per_cpu(grant_copy_op, cpu) ||
> > +	    !per_cpu(meta, cpu))
> 
> Hm, shouldn't you at least vfree them? It might be that just one of
> them failed.
> 

The caller will clean up.

> > +		return -ENOMEM;
> > +
> > +	return 0;
> > +}
> > +
> > +static void __free_percpu_scratch_space(unsigned int cpu)
> > +{
> > +	/* freeing NULL pointer is legit */
> > +	vfree(per_cpu(tx_copy_ops, cpu));
> > +	vfree(per_cpu(grant_copy_op, cpu));
> > +	vfree(per_cpu(meta, cpu));
> > +}
> > +
> > +static int __netback_percpu_callback(struct notifier_block *nfb,
> > +				     unsigned long action, void *hcpu)
> > +{
> > +	unsigned int cpu = (unsigned long)hcpu;
> > +	int rc = NOTIFY_DONE;
> > +
> > +	switch (action) {
> > +	case CPU_ONLINE:
> > +	case CPU_ONLINE_FROZEN:
> > +		printk(KERN_INFO
> > +		       "netback: CPU %x online, creating scratch space\n", cpu);
> 
> Is there any way to use 'pr_info(DRV_NAME' for these printk's? It might
> require another patch, but it would  make it nicer.
> 

Hmm... will look into that.
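
A minimal sketch of what that could look like; the prefix string and the
helper wrapping the calls are illustrative only:

/* At the very top of netback.c, before any #include, so every pr_*()
 * call in the file picks up the prefix automatically. */
#define pr_fmt(fmt) "xen-netback: " fmt

#include <linux/kernel.h>

/* The notifier messages from the quoted hunk would then shrink to: */
static void report_scratch_event(unsigned int cpu, int rc, bool online)
{
	if (online && !rc)
		pr_info("CPU %x online, creating scratch space\n", cpu);
	else if (online)
		pr_alert("failed to create scratch space for CPU %x\n", cpu);
	else
		pr_debug("CPU %x offline, destroying scratch space\n", cpu);
}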

> > +		rc = __create_percpu_scratch_space(cpu);
> > +		if (rc) {
> > +			printk(KERN_ALERT
> > +			       "netback: failed to create scratch space for CPU"
> > +			       " %x\n", cpu);
> > +			/* FIXME: nothing more we can do here, we will
> > +			 * print out warning message when thread or
> > +			 * NAPI runs on this cpu. Also stop getting
> > +			 * called in the future.
> > +			 */
> > +			__free_percpu_scratch_space(cpu);
> > +			rc = NOTIFY_BAD;
> > +		} else {
> > +			rc = NOTIFY_OK;
> > +		}
> > +		break;
> > +	case CPU_DEAD:
> > +	case CPU_DEAD_FROZEN:
> > +		printk("netback: CPU %x offline, destroying scratch space\n",
> > +		       cpu);
> 
> pr_debug?
> 
> > +		__free_percpu_scratch_space(cpu);
> > +		rc = NOTIFY_OK;
> > +		break;
> > +	default:
> > +		break;
> > +	}
> > +
> > +	return rc;
> > +}
> > +
> > +static struct notifier_block netback_notifier_block = {
> > +	.notifier_call = __netback_percpu_callback,
> > +};
> >  
> >  static int __init netback_init(void)
> >  {
> >  	int rc = -ENOMEM;
> > +	int cpu;
> >  
> >  	if (!xen_domain())
> >  		return -ENODEV;
> >  
> > -	tx_copy_ops = __alloc_percpu(sizeof(struct gnttab_copy)
> > -				     * MAX_PENDING_REQS,
> > -				     __alignof__(struct gnttab_copy));
> > -	if (!tx_copy_ops)
> > -		goto failed_init;
> > +	/* Don't need to disable preempt here, since nobody else will
> > +	 * touch these percpu areas during start up. */
> > +	for_each_online_cpu(cpu) {
> > +		rc = __create_percpu_scratch_space(cpu);
> >  
> > -	grant_copy_op = __alloc_percpu(sizeof(struct gnttab_copy)
> > -				       * 2 * XEN_NETIF_RX_RING_SIZE,
> > -				       __alignof__(struct gnttab_copy));
> > -	if (!grant_copy_op)
> > -		goto failed_init_gco;
> > +		if (rc)
> > +			goto failed_init;
> > +	}
> >  
> > -	meta = __alloc_percpu(sizeof(struct xenvif_rx_meta)
> > -			      * 2 * XEN_NETIF_RX_RING_SIZE,
> > -			      __alignof__(struct xenvif_rx_meta));
> > -	if (!meta)
> > -		goto failed_init_meta;
> > +	register_hotcpu_notifier(&netback_notifier_block);
> >  
> >  	rc = page_pool_init();
> >  	if (rc)
> > @@ -1491,25 +1558,26 @@ static int __init netback_init(void)
> >  failed_init_xenbus:
> >  	page_pool_destroy();
> >  failed_init_pool:
> > -	free_percpu(meta);
> > -failed_init_meta:
> > -	free_percpu(grant_copy_op);
> > -failed_init_gco:
> > -	free_percpu(tx_copy_ops);
> > +	for_each_online_cpu(cpu)
> > +		__free_percpu_scratch_space(cpu);
> >  failed_init:
> >  	return rc;
> 
> We don't want to try to clean up the per_cpu_spaces that might
> have gotten allocated in the loop?
> 

Good catch, thanks.


Wei.
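
For what it's worth, a possible shape for that unwinding (helper name
invented; it relies on the fact, noted in the patch, that vfree(NULL) is
legal, so the partially-allocated CPU can be freed along with the rest):

static int __init netback_alloc_scratch_space(void)
{
	int cpu, rc = 0;

	for_each_online_cpu(cpu) {
		rc = __create_percpu_scratch_space(cpu);
		if (rc)
			goto unwind;
	}
	return 0;

unwind:
	/* Free everything allocated so far, including whatever subset the
	 * failing CPU managed to get. */
	for_each_online_cpu(cpu)
		__free_percpu_scratch_space(cpu);
	return rc;
}

netback_init() would then call this once before page_pool_init() and
jump to a single failure label on error.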

> > -
> >  }
> >  
> >  module_init(netback_init);
> >  
> >  static void __exit netback_exit(void)
> >  {
> > +	int cpu;
> > +
> >  	xenvif_xenbus_exit();
> >  	page_pool_destroy();
> > -	free_percpu(meta);
> > -	free_percpu(grant_copy_op);
> > -	free_percpu(tx_copy_ops);
> > +
> > +	unregister_hotcpu_notifier(&netback_notifier_block);
> > +
> > +	/* Since we're here, nobody else will touch per-cpu area. */
> > +	for_each_online_cpu(cpu)
> > +		__free_percpu_scratch_space(cpu);
> >  }
> >  module_exit(netback_exit);
> >  
> > -- 
> > 1.7.2.5

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [Xen-devel] [RFC PATCH V3 15/16] netfront: multi page ring support.
  2012-01-30 21:39   ` [Xen-devel] " Konrad Rzeszutek Wilk
@ 2012-01-31 10:58       ` Wei Liu
  2012-01-31  9:53       ` Jan Beulich
  2012-01-31 10:58       ` Wei Liu
  2 siblings, 0 replies; 59+ messages in thread
From: Wei Liu @ 2012-01-31 10:58 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: wei.liu2, netdev, xen-devel, Ian Campbell

On Mon, 2012-01-30 at 21:39 +0000, Konrad Rzeszutek Wilk wrote:
> On Mon, Jan 30, 2012 at 02:45:33PM +0000, Wei Liu wrote:
> > Use the DMA API to allocate ring pages, because we need to get machine
> > contiguous memory.
> 
> >
> > Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> > ---
> >  drivers/net/xen-netfront.c |  258 ++++++++++++++++++++++++++++++++------------
> >  1 files changed, 187 insertions(+), 71 deletions(-)
> >
> > diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> > index 01f589d..32ec212 100644
> > --- a/drivers/net/xen-netfront.c
> > +++ b/drivers/net/xen-netfront.c
> > @@ -66,9 +66,18 @@ struct netfront_cb {
> >
> >  #define GRANT_INVALID_REF    0
> >
> > -#define NET_TX_RING_SIZE __CONST_RING_SIZE(xen_netif_tx, PAGE_SIZE)
> > -#define NET_RX_RING_SIZE __CONST_RING_SIZE(xen_netif_rx, PAGE_SIZE)
> > -#define TX_MAX_TARGET min_t(int, NET_TX_RING_SIZE, 256)
> > +#define XENNET_MAX_RING_PAGE_ORDER 2
> > +#define XENNET_MAX_RING_PAGES      (1U << XENNET_MAX_RING_PAGE_ORDER)
> > +
> > +#define NET_TX_RING_SIZE(_nr_pages)                                  \
> > +     __CONST_RING_SIZE(xen_netif_tx, PAGE_SIZE * (_nr_pages))
> > +#define NET_RX_RING_SIZE(_nr_pages)                                  \
> > +     __CONST_RING_SIZE(xen_netif_rx, PAGE_SIZE * (_nr_pages))
> > +
> > +#define XENNET_MAX_TX_RING_SIZE NET_TX_RING_SIZE(XENNET_MAX_RING_PAGES)
> > +#define XENNET_MAX_RX_RING_SIZE NET_RX_RING_SIZE(XENNET_MAX_RING_PAGES)
> > +
> > +#define TX_MAX_TARGET XENNET_MAX_TX_RING_SIZE
> >
> >  struct netfront_stats {
> >       u64                     rx_packets;
> > @@ -84,12 +93,20 @@ struct netfront_info {
> >
> >       struct napi_struct napi;
> >
> > +     /* Statistics */
> > +     struct netfront_stats __percpu *stats;
> > +
> > +     unsigned long rx_gso_checksum_fixup;
> > +
> >       unsigned int evtchn;
> >       struct xenbus_device *xbdev;
> >
> >       spinlock_t   tx_lock;
> >       struct xen_netif_tx_front_ring tx;
> > -     int tx_ring_ref;
> > +     dma_addr_t tx_ring_dma_handle;
> > +     int tx_ring_ref[XENNET_MAX_RING_PAGES];
> > +     int tx_ring_page_order;
> > +     int tx_ring_pages;
> >
> >       /*
> >        * {tx,rx}_skbs store outstanding skbuffs. Free tx_skb entries
> > @@ -103,36 +120,34 @@ struct netfront_info {
> >       union skb_entry {
> >               struct sk_buff *skb;
> >               unsigned long link;
> > -     } tx_skbs[NET_TX_RING_SIZE];
> > +     } tx_skbs[XENNET_MAX_TX_RING_SIZE];
> >       grant_ref_t gref_tx_head;
> > -     grant_ref_t grant_tx_ref[NET_TX_RING_SIZE];
> > +     grant_ref_t grant_tx_ref[XENNET_MAX_TX_RING_SIZE];
> >       unsigned tx_skb_freelist;
> >
> >       spinlock_t   rx_lock ____cacheline_aligned_in_smp;
> >       struct xen_netif_rx_front_ring rx;
> > -     int rx_ring_ref;
> > +     dma_addr_t rx_ring_dma_handle;
> > +     int rx_ring_ref[XENNET_MAX_RING_PAGES];
> > +     int rx_ring_page_order;
> > +     int rx_ring_pages;
> >
> >       /* Receive-ring batched refills. */
> >  #define RX_MIN_TARGET 8
> >  #define RX_DFL_MIN_TARGET 64
> > -#define RX_MAX_TARGET min_t(int, NET_RX_RING_SIZE, 256)
> > +#define RX_MAX_TARGET XENNET_MAX_RX_RING_SIZE
> >       unsigned rx_min_target, rx_max_target, rx_target;
> >       struct sk_buff_head rx_batch;
> >
> >       struct timer_list rx_refill_timer;
> >
> > -     struct sk_buff *rx_skbs[NET_RX_RING_SIZE];
> > +     struct sk_buff *rx_skbs[XENNET_MAX_RX_RING_SIZE];
> >       grant_ref_t gref_rx_head;
> > -     grant_ref_t grant_rx_ref[NET_RX_RING_SIZE];
> > -
> > -     unsigned long rx_pfn_array[NET_RX_RING_SIZE];
> > -     struct multicall_entry rx_mcl[NET_RX_RING_SIZE+1];
> > -     struct mmu_update rx_mmu[NET_RX_RING_SIZE];
> > -
> > -     /* Statistics */
> > -     struct netfront_stats __percpu *stats;
> > +     grant_ref_t grant_rx_ref[XENNET_MAX_RX_RING_SIZE];
> >
> > -     unsigned long rx_gso_checksum_fixup;
> > +     unsigned long rx_pfn_array[XENNET_MAX_RX_RING_SIZE];
> > +     struct multicall_entry rx_mcl[XENNET_MAX_RX_RING_SIZE+1];
> > +     struct mmu_update rx_mmu[XENNET_MAX_RX_RING_SIZE];
> >  };
> >
> >  struct netfront_rx_info {
> > @@ -170,15 +185,15 @@ static unsigned short get_id_from_freelist(unsigned *head,
> >       return id;
> >  }
> >
> > -static int xennet_rxidx(RING_IDX idx)
> > +static int xennet_rxidx(RING_IDX idx, struct netfront_info *info)
> >  {
> > -     return idx & (NET_RX_RING_SIZE - 1);
> > +     return idx & (NET_RX_RING_SIZE(info->rx_ring_pages) - 1);
> >  }
> >
> >  static struct sk_buff *xennet_get_rx_skb(struct netfront_info *np,
> >                                        RING_IDX ri)
> >  {
> > -     int i = xennet_rxidx(ri);
> > +     int i = xennet_rxidx(ri, np);
> >       struct sk_buff *skb = np->rx_skbs[i];
> >       np->rx_skbs[i] = NULL;
> >       return skb;
> > @@ -187,7 +202,7 @@ static struct sk_buff *xennet_get_rx_skb(struct netfront_info *np,
> >  static grant_ref_t xennet_get_rx_ref(struct netfront_info *np,
> >                                           RING_IDX ri)
> >  {
> > -     int i = xennet_rxidx(ri);
> > +     int i = xennet_rxidx(ri, np);
> >       grant_ref_t ref = np->grant_rx_ref[i];
> >       np->grant_rx_ref[i] = GRANT_INVALID_REF;
> >       return ref;
> > @@ -300,7 +315,7 @@ no_skb:
> >
> >               skb->dev = dev;
> >
> > -             id = xennet_rxidx(req_prod + i);
> > +             id = xennet_rxidx(req_prod + i, np);
> >
> >               BUG_ON(np->rx_skbs[id]);
> >               np->rx_skbs[id] = skb;
> > @@ -596,7 +611,7 @@ static int xennet_close(struct net_device *dev)
> >  static void xennet_move_rx_slot(struct netfront_info *np, struct sk_buff *skb,
> >                               grant_ref_t ref)
> >  {
> > -     int new = xennet_rxidx(np->rx.req_prod_pvt);
> > +     int new = xennet_rxidx(np->rx.req_prod_pvt, np);
> >
> >       BUG_ON(np->rx_skbs[new]);
> >       np->rx_skbs[new] = skb;
> > @@ -1089,7 +1104,7 @@ static void xennet_release_tx_bufs(struct netfront_info *np)
> >       struct sk_buff *skb;
> >       int i;
> >
> > -     for (i = 0; i < NET_TX_RING_SIZE; i++) {
> > +     for (i = 0; i < NET_TX_RING_SIZE(np->tx_ring_pages); i++) {
> >               /* Skip over entries which are actually freelist references */
> >               if (skb_entry_is_link(&np->tx_skbs[i]))
> >                       continue;
> > @@ -1123,7 +1138,7 @@ static void xennet_release_rx_bufs(struct netfront_info *np)
> >
> >       spin_lock_bh(&np->rx_lock);
> >
> > -     for (id = 0; id < NET_RX_RING_SIZE; id++) {
> > +     for (id = 0; id < NET_RX_RING_SIZE(np->rx_ring_pages); id++) {
> >               ref = np->grant_rx_ref[id];
> >               if (ref == GRANT_INVALID_REF) {
> >                       unused++;
> > @@ -1305,13 +1320,13 @@ static struct net_device * __devinit xennet_create_dev(struct xenbus_device *dev
> >
> >       /* Initialise tx_skbs as a free chain containing every entry. */
> >       np->tx_skb_freelist = 0;
> > -     for (i = 0; i < NET_TX_RING_SIZE; i++) {
> > +     for (i = 0; i < XENNET_MAX_TX_RING_SIZE; i++) {
> >               skb_entry_set_link(&np->tx_skbs[i], i+1);
> >               np->grant_tx_ref[i] = GRANT_INVALID_REF;
> >       }
> >
> >       /* Clear out rx_skbs */
> > -     for (i = 0; i < NET_RX_RING_SIZE; i++) {
> > +     for (i = 0; i < XENNET_MAX_RX_RING_SIZE; i++) {
> >               np->rx_skbs[i] = NULL;
> >               np->grant_rx_ref[i] = GRANT_INVALID_REF;
> >       }
> > @@ -1409,15 +1424,11 @@ static int __devinit netfront_probe(struct xenbus_device *dev,
> >       return err;
> >  }
> >
> > -static void xennet_end_access(int ref, void *page)
> > -{
> > -     /* This frees the page as a side-effect */
> > -     if (ref != GRANT_INVALID_REF)
> > -             gnttab_end_foreign_access(ref, 0, (unsigned long)page);
> > -}
> > -
> >  static void xennet_disconnect_backend(struct netfront_info *info)
> >  {
> > +     int i;
> > +     struct xenbus_device *dev = info->xbdev;
> > +
> >       /* Stop old i/f to prevent errors whilst we rebuild the state. */
> >       spin_lock_bh(&info->rx_lock);
> >       spin_lock_irq(&info->tx_lock);
> > @@ -1429,12 +1440,24 @@ static void xennet_disconnect_backend(struct netfront_info *info)
> >               unbind_from_irqhandler(info->netdev->irq, info->netdev);
> >       info->evtchn = info->netdev->irq = 0;
> >
> > -     /* End access and free the pages */
> > -     xennet_end_access(info->tx_ring_ref, info->tx.sring);
> > -     xennet_end_access(info->rx_ring_ref, info->rx.sring);
> > +     for (i = 0; i < info->tx_ring_pages; i++) {
> > +             int ref = info->tx_ring_ref[i];
> > +             gnttab_end_foreign_access_ref(ref, 0);
> > +             info->tx_ring_ref[i] = GRANT_INVALID_REF;
> > +     }
> > +     dma_free_coherent(NULL, PAGE_SIZE * info->tx_ring_pages,
> > +                       (void *)info->tx.sring,
> > +                       info->tx_ring_dma_handle);
> > +
> > +     for (i = 0; i < info->rx_ring_pages; i++) {
> > +             int ref = info->rx_ring_ref[i];
> > +             gnttab_end_foreign_access_ref(ref, 0);
> > +             info->rx_ring_ref[i] = GRANT_INVALID_REF;
> > +     }
> > +     dma_free_coherent(NULL, PAGE_SIZE * info->rx_ring_pages,
> > +                       (void *)info->rx.sring,
> > +                       info->rx_ring_dma_handle);
> >
> > -     info->tx_ring_ref = GRANT_INVALID_REF;
> > -     info->rx_ring_ref = GRANT_INVALID_REF;
> >       info->tx.sring = NULL;
> >       info->rx.sring = NULL;
> >  }
> > @@ -1483,9 +1506,13 @@ static int setup_netfront(struct xenbus_device *dev, struct netfront_info *info)
> >       struct xen_netif_rx_sring *rxs;
> >       int err;
> >       struct net_device *netdev = info->netdev;
> > +     unsigned int max_tx_ring_page_order, max_rx_ring_page_order;
> > +     int i, j;
> >
> > -     info->tx_ring_ref = GRANT_INVALID_REF;
> > -     info->rx_ring_ref = GRANT_INVALID_REF;
> > +     for (i = 0; i < XENNET_MAX_RING_PAGES; i++) {
> > +             info->tx_ring_ref[i] = GRANT_INVALID_REF;
> > +             info->rx_ring_ref[i] = GRANT_INVALID_REF;
> > +     }
> >       info->rx.sring = NULL;
> >       info->tx.sring = NULL;
> >       netdev->irq = 0;
> > @@ -1496,50 +1523,105 @@ static int setup_netfront(struct xenbus_device *dev, struct netfront_info *info)
> >               goto fail;
> >       }
> >
> > -     txs = (struct xen_netif_tx_sring *)get_zeroed_page(GFP_NOIO | __GFP_HIGH);
> > +     err = xenbus_scanf(XBT_NIL, info->xbdev->otherend,
> > +                        "max-tx-ring-page-order", "%u",
> > +                        &max_tx_ring_page_order);
> > +     if (err < 0) {
> > +             info->tx_ring_page_order = 0;
> > +             dev_info(&dev->dev, "single tx ring\n");
> > +     } else {
> > +             info->tx_ring_page_order = max_tx_ring_page_order;
> > +             dev_info(&dev->dev, "multi page tx ring, order = %d\n",
> > +                      max_tx_ring_page_order);
> > +     }
> > +     info->tx_ring_pages = (1U << info->tx_ring_page_order);
> > +
> > +     txs = (struct xen_netif_tx_sring *)
> > +             dma_alloc_coherent(NULL, PAGE_SIZE * info->tx_ring_pages,
> > +                                &info->tx_ring_dma_handle,
> > +                                __GFP_ZERO | GFP_NOIO | __GFP_HIGH);
> 
> Hm, so I see you are using 'NULL', which is a big no-no (the API docs say that).
> But the other reason why it is a no-no is that this way the generic DMA engine has
> no clue whether you are OK getting pages under 4GB or above it (i.e. 64-bit support).
> 
> If you don't supply a 'dev' it will assume 4GB. But when you run this as a
> pure PV guest that won't matter in the slightest, b/c there is no DMA code in
> action (well, there is dma_alloc_coherent - which, looking at the code, seems
> to accept NULL).
> 
> Anyhow, if you get to have more than 4GB in the guest, or do PCI passthrough and use
> 'iommu=soft', the Xen SWIOTLB will kick in and you will end up 'swizzling'
> the pages to be under 4GB. That can be fixed if you declare a 'fake' device where
> you set the coherent_dma_mask to DMA_BIT_MASK(64).
> 

This seems to be a reasonable solution. I could not set netfront's DMA
mask, which is why I used a NULL device. How do I create a 'fake'
device?
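
For reference, a rough and untested sketch of the sort of 'fake' device
being described -- the names are made up, and in real code it would need
proper registration (e.g. as a platform device) rather than a bare
static struct device:

static u64 netfront_ring_dma_mask = DMA_BIT_MASK(64);

/* Dummy device whose only job is to carry a 64-bit coherent DMA mask
 * so the DMA layer does not assume a 32-bit limit. */
static struct device netfront_ring_dev = {
	.init_name         = "xen-netfront-ring",
	.coherent_dma_mask = DMA_BIT_MASK(64),
	.dma_mask          = &netfront_ring_dma_mask,
};

static void *netfront_alloc_ring(unsigned int nr_pages, dma_addr_t *handle)
{
	return dma_alloc_coherent(&netfront_ring_dev, PAGE_SIZE * nr_pages,
				  handle,
				  __GFP_ZERO | GFP_NOIO | __GFP_HIGH);
}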

> But if you boot the guest under HVM, then it will use the generic SWIOTLB code, which
> won't guarantee the pages to be "machine" contiguous but will be "guest machine"
> contiguous. Is that sufficient for this?
> 

For HVM, this is sufficient.

> How did you test this? Did you supply iommu=soft to your guest or boot it
> with more than 4GB?
> 

I haven't tested a guest with more than 4GB of RAM.


Wei.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [Xen-devel] [RFC PATCH V3 15/16] netfront: multi page ring support.
@ 2012-01-31 10:58       ` Wei Liu
  0 siblings, 0 replies; 59+ messages in thread
From: Wei Liu @ 2012-01-31 10:58 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: wei.liu2, netdev, xen-devel, Ian Campbell

On Mon, 2012-01-30 at 21:39 +0000, Konrad Rzeszutek Wilk wrote:
> On Mon, Jan 30, 2012 at 02:45:33PM +0000, Wei Liu wrote:
> > Use the DMA API to allocate ring pages, because we need to get machine
> > contiguous memory.
> 
> >
> > Signed-off-by: Wei Liu <wei.liu2@citrix.com>
> > ---
> >  drivers/net/xen-netfront.c |  258 ++++++++++++++++++++++++++++++++------------
> >  1 files changed, 187 insertions(+), 71 deletions(-)
> >
> > diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> > index 01f589d..32ec212 100644
> > --- a/drivers/net/xen-netfront.c
> > +++ b/drivers/net/xen-netfront.c
> > @@ -66,9 +66,18 @@ struct netfront_cb {
> >
> >  #define GRANT_INVALID_REF    0
> >
> > -#define NET_TX_RING_SIZE __CONST_RING_SIZE(xen_netif_tx, PAGE_SIZE)
> > -#define NET_RX_RING_SIZE __CONST_RING_SIZE(xen_netif_rx, PAGE_SIZE)
> > -#define TX_MAX_TARGET min_t(int, NET_TX_RING_SIZE, 256)
> > +#define XENNET_MAX_RING_PAGE_ORDER 2
> > +#define XENNET_MAX_RING_PAGES      (1U << XENNET_MAX_RING_PAGE_ORDER)
> > +
> > +#define NET_TX_RING_SIZE(_nr_pages)                                  \
> > +     __CONST_RING_SIZE(xen_netif_tx, PAGE_SIZE * (_nr_pages))
> > +#define NET_RX_RING_SIZE(_nr_pages)                                  \
> > +     __CONST_RING_SIZE(xen_netif_rx, PAGE_SIZE * (_nr_pages))
> > +
> > +#define XENNET_MAX_TX_RING_SIZE NET_TX_RING_SIZE(XENNET_MAX_RING_PAGES)
> > +#define XENNET_MAX_RX_RING_SIZE NET_RX_RING_SIZE(XENNET_MAX_RING_PAGES)
> > +
> > +#define TX_MAX_TARGET XENNET_MAX_TX_RING_SIZE
> >
> >  struct netfront_stats {
> >       u64                     rx_packets;
> > @@ -84,12 +93,20 @@ struct netfront_info {
> >
> >       struct napi_struct napi;
> >
> > +     /* Statistics */
> > +     struct netfront_stats __percpu *stats;
> > +
> > +     unsigned long rx_gso_checksum_fixup;
> > +
> >       unsigned int evtchn;
> >       struct xenbus_device *xbdev;
> >
> >       spinlock_t   tx_lock;
> >       struct xen_netif_tx_front_ring tx;
> > -     int tx_ring_ref;
> > +     dma_addr_t tx_ring_dma_handle;
> > +     int tx_ring_ref[XENNET_MAX_RING_PAGES];
> > +     int tx_ring_page_order;
> > +     int tx_ring_pages;
> >
> >       /*
> >        * {tx,rx}_skbs store outstanding skbuffs. Free tx_skb entries
> > @@ -103,36 +120,34 @@ struct netfront_info {
> >       union skb_entry {
> >               struct sk_buff *skb;
> >               unsigned long link;
> > -     } tx_skbs[NET_TX_RING_SIZE];
> > +     } tx_skbs[XENNET_MAX_TX_RING_SIZE];
> >       grant_ref_t gref_tx_head;
> > -     grant_ref_t grant_tx_ref[NET_TX_RING_SIZE];
> > +     grant_ref_t grant_tx_ref[XENNET_MAX_TX_RING_SIZE];
> >       unsigned tx_skb_freelist;
> >
> >       spinlock_t   rx_lock ____cacheline_aligned_in_smp;
> >       struct xen_netif_rx_front_ring rx;
> > -     int rx_ring_ref;
> > +     dma_addr_t rx_ring_dma_handle;
> > +     int rx_ring_ref[XENNET_MAX_RING_PAGES];
> > +     int rx_ring_page_order;
> > +     int rx_ring_pages;
> >
> >       /* Receive-ring batched refills. */
> >  #define RX_MIN_TARGET 8
> >  #define RX_DFL_MIN_TARGET 64
> > -#define RX_MAX_TARGET min_t(int, NET_RX_RING_SIZE, 256)
> > +#define RX_MAX_TARGET XENNET_MAX_RX_RING_SIZE
> >       unsigned rx_min_target, rx_max_target, rx_target;
> >       struct sk_buff_head rx_batch;
> >
> >       struct timer_list rx_refill_timer;
> >
> > -     struct sk_buff *rx_skbs[NET_RX_RING_SIZE];
> > +     struct sk_buff *rx_skbs[XENNET_MAX_RX_RING_SIZE];
> >       grant_ref_t gref_rx_head;
> > -     grant_ref_t grant_rx_ref[NET_RX_RING_SIZE];
> > -
> > -     unsigned long rx_pfn_array[NET_RX_RING_SIZE];
> > -     struct multicall_entry rx_mcl[NET_RX_RING_SIZE+1];
> > -     struct mmu_update rx_mmu[NET_RX_RING_SIZE];
> > -
> > -     /* Statistics */
> > -     struct netfront_stats __percpu *stats;
> > +     grant_ref_t grant_rx_ref[XENNET_MAX_RX_RING_SIZE];
> >
> > -     unsigned long rx_gso_checksum_fixup;
> > +     unsigned long rx_pfn_array[XENNET_MAX_RX_RING_SIZE];
> > +     struct multicall_entry rx_mcl[XENNET_MAX_RX_RING_SIZE+1];
> > +     struct mmu_update rx_mmu[XENNET_MAX_RX_RING_SIZE];
> >  };
> >
> >  struct netfront_rx_info {
> > @@ -170,15 +185,15 @@ static unsigned short get_id_from_freelist(unsigned *head,
> >       return id;
> >  }
> >
> > -static int xennet_rxidx(RING_IDX idx)
> > +static int xennet_rxidx(RING_IDX idx, struct netfront_info *info)
> >  {
> > -     return idx & (NET_RX_RING_SIZE - 1);
> > +     return idx & (NET_RX_RING_SIZE(info->rx_ring_pages) - 1);
> >  }
> >
> >  static struct sk_buff *xennet_get_rx_skb(struct netfront_info *np,
> >                                        RING_IDX ri)
> >  {
> > -     int i = xennet_rxidx(ri);
> > +     int i = xennet_rxidx(ri, np);
> >       struct sk_buff *skb = np->rx_skbs[i];
> >       np->rx_skbs[i] = NULL;
> >       return skb;
> > @@ -187,7 +202,7 @@ static struct sk_buff *xennet_get_rx_skb(struct netfront_info *np,
> >  static grant_ref_t xennet_get_rx_ref(struct netfront_info *np,
> >                                           RING_IDX ri)
> >  {
> > -     int i = xennet_rxidx(ri);
> > +     int i = xennet_rxidx(ri, np);
> >       grant_ref_t ref = np->grant_rx_ref[i];
> >       np->grant_rx_ref[i] = GRANT_INVALID_REF;
> >       return ref;
> > @@ -300,7 +315,7 @@ no_skb:
> >
> >               skb->dev = dev;
> >
> > -             id = xennet_rxidx(req_prod + i);
> > +             id = xennet_rxidx(req_prod + i, np);
> >
> >               BUG_ON(np->rx_skbs[id]);
> >               np->rx_skbs[id] = skb;
> > @@ -596,7 +611,7 @@ static int xennet_close(struct net_device *dev)
> >  static void xennet_move_rx_slot(struct netfront_info *np, struct sk_buff *skb,
> >                               grant_ref_t ref)
> >  {
> > -     int new = xennet_rxidx(np->rx.req_prod_pvt);
> > +     int new = xennet_rxidx(np->rx.req_prod_pvt, np);
> >
> >       BUG_ON(np->rx_skbs[new]);
> >       np->rx_skbs[new] = skb;
> > @@ -1089,7 +1104,7 @@ static void xennet_release_tx_bufs(struct netfront_info *np)
> >       struct sk_buff *skb;
> >       int i;
> >
> > -     for (i = 0; i < NET_TX_RING_SIZE; i++) {
> > +     for (i = 0; i < NET_TX_RING_SIZE(np->tx_ring_pages); i++) {
> >               /* Skip over entries which are actually freelist references */
> >               if (skb_entry_is_link(&np->tx_skbs[i]))
> >                       continue;
> > @@ -1123,7 +1138,7 @@ static void xennet_release_rx_bufs(struct netfront_info *np)
> >
> >       spin_lock_bh(&np->rx_lock);
> >
> > -     for (id = 0; id < NET_RX_RING_SIZE; id++) {
> > +     for (id = 0; id < NET_RX_RING_SIZE(np->rx_ring_pages); id++) {
> >               ref = np->grant_rx_ref[id];
> >               if (ref == GRANT_INVALID_REF) {
> >                       unused++;
> > @@ -1305,13 +1320,13 @@ static struct net_device * __devinit xennet_create_dev(struct xenbus_device *dev
> >
> >       /* Initialise tx_skbs as a free chain containing every entry. */
> >       np->tx_skb_freelist = 0;
> > -     for (i = 0; i < NET_TX_RING_SIZE; i++) {
> > +     for (i = 0; i < XENNET_MAX_TX_RING_SIZE; i++) {
> >               skb_entry_set_link(&np->tx_skbs[i], i+1);
> >               np->grant_tx_ref[i] = GRANT_INVALID_REF;
> >       }
> >
> >       /* Clear out rx_skbs */
> > -     for (i = 0; i < NET_RX_RING_SIZE; i++) {
> > +     for (i = 0; i < XENNET_MAX_RX_RING_SIZE; i++) {
> >               np->rx_skbs[i] = NULL;
> >               np->grant_rx_ref[i] = GRANT_INVALID_REF;
> >       }
> > @@ -1409,15 +1424,11 @@ static int __devinit netfront_probe(struct xenbus_device *dev,
> >       return err;
> >  }
> >
> > -static void xennet_end_access(int ref, void *page)
> > -{
> > -     /* This frees the page as a side-effect */
> > -     if (ref != GRANT_INVALID_REF)
> > -             gnttab_end_foreign_access(ref, 0, (unsigned long)page);
> > -}
> > -
> >  static void xennet_disconnect_backend(struct netfront_info *info)
> >  {
> > +     int i;
> > +     struct xenbus_device *dev = info->xbdev;
> > +
> >       /* Stop old i/f to prevent errors whilst we rebuild the state. */
> >       spin_lock_bh(&info->rx_lock);
> >       spin_lock_irq(&info->tx_lock);
> > @@ -1429,12 +1440,24 @@ static void xennet_disconnect_backend(struct netfront_info *info)
> >               unbind_from_irqhandler(info->netdev->irq, info->netdev);
> >       info->evtchn = info->netdev->irq = 0;
> >
> > -     /* End access and free the pages */
> > -     xennet_end_access(info->tx_ring_ref, info->tx.sring);
> > -     xennet_end_access(info->rx_ring_ref, info->rx.sring);
> > +     for (i = 0; i < info->tx_ring_pages; i++) {
> > +             int ref = info->tx_ring_ref[i];
> > +             gnttab_end_foreign_access_ref(ref, 0);
> > +             info->tx_ring_ref[i] = GRANT_INVALID_REF;
> > +     }
> > +     dma_free_coherent(NULL, PAGE_SIZE * info->tx_ring_pages,
> > +                       (void *)info->tx.sring,
> > +                       info->tx_ring_dma_handle);
> > +
> > +     for (i = 0; i < info->rx_ring_pages; i++) {
> > +             int ref = info->rx_ring_ref[i];
> > +             gnttab_end_foreign_access_ref(ref, 0);
> > +             info->rx_ring_ref[i] = GRANT_INVALID_REF;
> > +     }
> > +     dma_free_coherent(NULL, PAGE_SIZE * info->rx_ring_pages,
> > +                       (void *)info->rx.sring,
> > +                       info->rx_ring_dma_handle);
> >
> > -     info->tx_ring_ref = GRANT_INVALID_REF;
> > -     info->rx_ring_ref = GRANT_INVALID_REF;
> >       info->tx.sring = NULL;
> >       info->rx.sring = NULL;
> >  }
> > @@ -1483,9 +1506,13 @@ static int setup_netfront(struct xenbus_device *dev, struct netfront_info *info)
> >       struct xen_netif_rx_sring *rxs;
> >       int err;
> >       struct net_device *netdev = info->netdev;
> > +     unsigned int max_tx_ring_page_order, max_rx_ring_page_order;
> > +     int i, j;
> >
> > -     info->tx_ring_ref = GRANT_INVALID_REF;
> > -     info->rx_ring_ref = GRANT_INVALID_REF;
> > +     for (i = 0; i < XENNET_MAX_RING_PAGES; i++) {
> > +             info->tx_ring_ref[i] = GRANT_INVALID_REF;
> > +             info->rx_ring_ref[i] = GRANT_INVALID_REF;
> > +     }
> >       info->rx.sring = NULL;
> >       info->tx.sring = NULL;
> >       netdev->irq = 0;
> > @@ -1496,50 +1523,105 @@ static int setup_netfront(struct xenbus_device *dev, struct netfront_info *info)
> >               goto fail;
> >       }
> >
> > -     txs = (struct xen_netif_tx_sring *)get_zeroed_page(GFP_NOIO | __GFP_HIGH);
> > +     err = xenbus_scanf(XBT_NIL, info->xbdev->otherend,
> > +                        "max-tx-ring-page-order", "%u",
> > +                        &max_tx_ring_page_order);
> > +     if (err < 0) {
> > +             info->tx_ring_page_order = 0;
> > +             dev_info(&dev->dev, "single tx ring\n");
> > +     } else {
> > +             info->tx_ring_page_order = max_tx_ring_page_order;
> > +             dev_info(&dev->dev, "multi page tx ring, order = %d\n",
> > +                      max_tx_ring_page_order);
> > +     }
> > +     info->tx_ring_pages = (1U << info->tx_ring_page_order);
> > +
> > +     txs = (struct xen_netif_tx_sring *)
> > +             dma_alloc_coherent(NULL, PAGE_SIZE * info->tx_ring_pages,
> > +                                &info->tx_ring_dma_handle,
> > +                                __GFP_ZERO | GFP_NOIO | __GFP_HIGH);
> 
> Hm, so I see you are using 'NULL' which is a big nono (the API docs say that).
> But the other reason why it is a no-no, is b/c this way the generic DMA engine has no
> clue whether you are OK getting pages under 4GB or above it (so 64-bit support).
> 
> If you don't supply a 'dev' it will assume 4GB. But when you are run this as a
> pure PV guest that won't matter the slighest b/I there are no DMA code in action
> (well, there is dma_alloc_coherent - which looking at the code would NULL it seems).
> 
> Anyhow, if you get to have more than 4GB in the guest or do PCI passthrough and use
> 'iommu=soft'- at which point the Xen SWIOTLB will kick and you will end up 'swizzling'
> the pages to be under 4GB. That can be fixed if you declerae a 'fake' device where you set
> the coherent_dma_mask to DMA_BIT_MASK(64).
> 

This seems to be a reasonable solution. I could not set netfront's DMA
mask, that's why I used NULL device. And, how do I create a 'fake'
device?

> But if you boot the guest under HVM, then it will use the generic SWIOTLB code, which
> won't guaranteeing the pages to be "machine" contingous but will be "guest machine"
> contingous. Is that sufficient for this?
> 

For HVM, this is sufficient.

> How did you test this? Did you supply iommu=soft  to your guest or booted it
> with more than 4GB?
> 

I haven't tested guest with more than 4GB RAM.


Wei.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [Xen-devel] [RFC PATCH V3 13/16] netback: stub for multi receive protocol support.
  2012-01-30 21:47   ` [Xen-devel] " Konrad Rzeszutek Wilk
@ 2012-01-31 11:03       ` Wei Liu
  0 siblings, 0 replies; 59+ messages in thread
From: Wei Liu @ 2012-01-31 11:03 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk; +Cc: wei.liu2, netdev, xen-devel, Ian Campbell

On Mon, 2012-01-30 at 21:47 +0000, Konrad Rzeszutek Wilk wrote:
> On Mon, Jan 30, 2012 at 02:45:31PM +0000, Wei Liu wrote:
> > Refactor netback, make stub for mutli receive protocols. Also stub
> 
> multi.
> 

Good catch, thanks.

> > existing code as protocol 0.
> 
> Why not 1?
> 

We have some existing xenolinux code, not yet upstreamed, which calls
this protocol 0; I'm just trying to be compatible with it.

> Why do we need a new rework without anything using it besides
> the existing framework? OR if you are, you should say which
> patch is doing it...
> 

It is not in use at the moment, and will be in use in the future.

> > 
> > Now the file layout becomes:
> > 
> >  - interface.c: xenvif interfaces
> >  - xenbus.c: xenbus related functions
> >  - netback.c: common functions for various protocols
> > 
> > For different protocols:
> > 
> >  - xenvif_rx_protocolX.h: header file for the protocol, including
> >                           protocol structures and functions
> >  - xenvif_rx_protocolX.c: implementations
> > 
> > To add a new protocol:
> > 
> >  - include protocol header in common.h
> >  - modify XENVIF_MAX_RX_PROTOCOL in common.h
> >  - add protocol structure in xenvif.rx union
> >  - stub in xenbus.c
> >  - modify Makefile
> > 
> > A protocol should define five functions:
> > 
> >  - setup: setup frontend / backend ring connections
> >  - teardown: teardown frontend / backend ring connections
> >  - start_xmit: host start xmit (i.e. guest need to do rx)
> >  - event: rx completion event
> >  - action: prepare host side data for guest rx
> > 
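(For reference, these five hooks boil down to a table like the one below.
The struct itself is only illustration - the patch keeps the pointers
directly in struct xenvif - and the setup/start_xmit prototypes are my
shorthand for how they are called.)

	struct xenvif_rx_protocol_ops {
		int  (*setup)(struct xenvif *vif);	/* ring connect */
		void (*teardown)(struct xenvif *vif);	/* ring disconnect */
		int  (*start_xmit)(struct xenvif *vif,
				   struct sk_buff *skb);/* host xmit */
		void (*event)(struct xenvif *vif);	/* rx completion */
		void (*action)(struct xenvif *vif);	/* prepare guest rx */
	};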
> .. snip..
> 
> > -
> > -	return resp;
> > -}
> > -
> >  static inline int rx_work_todo(struct xenvif *vif)
> >  {
> >  	return !skb_queue_empty(&vif->rx_queue);
> > @@ -1507,8 +999,8 @@ int xenvif_kthread(void *data)
> >  		if (kthread_should_stop())
> >  			break;
> >  
> > -		if (rx_work_todo(vif))
> > -			xenvif_rx_action(vif);
> > +		if (rx_work_todo(vif) && vif->action)
> > +			vif->action(vif);
> >  	}
> >  
> >  	return 0;
> > diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
> > index 79499fc..4067286 100644
> > --- a/drivers/net/xen-netback/xenbus.c
> > +++ b/drivers/net/xen-netback/xenbus.c
> > @@ -415,6 +415,7 @@ static int connect_rings(struct backend_info *be)
> >  	unsigned long rx_ring_ref[NETBK_MAX_RING_PAGES];
> >  	unsigned int  tx_ring_order;
> >  	unsigned int  rx_ring_order;
> > +	unsigned int  rx_protocol;
> >  
> >  	err = xenbus_gather(XBT_NIL, dev->otherend,
> >  			    "event-channel", "%u", &evtchn, NULL);
> > @@ -510,6 +511,11 @@ static int connect_rings(struct backend_info *be)
> >  		}
> >  	}
> >  
> > +	err = xenbus_scanf(XBT_NIL, dev->otherend, "rx-protocol",
> 
> feature-rx-protocol?
> 

This is not a feature switch. Does it make sense to add a "feature-"
prefix?

> > +			   "%u", &rx_protocol);
> > +	if (err < 0)
> > +		rx_protocol = XENVIF_MIN_RX_PROTOCOL;
> > +
> 
> You should check to see if the protocol is higher than what we can support.
> The guest could be playing funny games and putting in 39432...
> 
> 

Good point.
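Will fold in something along these lines (just a sketch):

	err = xenbus_scanf(XBT_NIL, dev->otherend, "rx-protocol",
			   "%u", &rx_protocol);
	if (err < 0)
		rx_protocol = XENVIF_MIN_RX_PROTOCOL;

	if (rx_protocol > XENVIF_MAX_RX_PROTOCOL) {
		xenbus_dev_fatal(dev, -EINVAL,
				 "unsupported rx protocol %u", rx_protocol);
		return -EINVAL;
	}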


Wei.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [Xen-devel] [RFC PATCH V3 12/16] netback: multi-page ring support
  2012-01-31  9:01       ` Jan Beulich
@ 2012-01-31 11:09           ` Wei Liu
  0 siblings, 0 replies; 59+ messages in thread
From: Wei Liu @ 2012-01-31 11:09 UTC (permalink / raw)
  To: Jan Beulich; +Cc: wei.liu2, Ian Campbell, xen-devel, konrad.wilk, netdev

On Tue, 2012-01-31 at 09:01 +0000, Jan Beulich wrote:
> >>> On 30.01.12 at 18:10, Wei Liu <wei.liu2@citrix.com> wrote:
> > On Mon, 2012-01-30 at 16:35 +0000, Jan Beulich wrote:
> >> >>> On 30.01.12 at 15:45, Wei Liu <wei.liu2@citrix.com> wrote:
> >> > -int xenvif_map_frontend_rings(struct xenvif *vif,
> >> > -			      grant_ref_t tx_ring_ref,
> >> > -			      grant_ref_t rx_ring_ref)
> >> > +int xenvif_map_frontend_rings(struct xen_comms *comms,
> >> > +			      int domid,
> >> > +			      unsigned long ring_ref[],
> >> > +			      unsigned int  ring_ref_count)
> >> >  {
> >> > -	void *addr;
> >> > -	struct xen_netif_tx_sring *txs;
> >> > -	struct xen_netif_rx_sring *rxs;
> >> > -
> >> > -	int err = -ENOMEM;
> >> > +	struct gnttab_map_grant_ref op[NETBK_MAX_RING_PAGES];
> >> > +	unsigned int i;
> >> > +	int err = 0;
> >> >  
> >> > -	err = xenbus_map_ring_valloc(xenvif_to_xenbus_device(vif),
> >> > -				     tx_ring_ref, &addr);
> >> 
> >> Any reason why you don't just extend this function (in a prerequisite
> >> patch) rather than open coding a common utility function (twice) here,
> >> so that other backends (blkback!) can benefit later as well.
> >> 
> >> Jan
> >> 
> > 
> > I'm mainly focusing on netback stuffs, so the code is slightly coupled
> > with netback -- NETBK_MAX_RING_PAGES.
> > 
> > To extend xenbus_map_ring_valloc and make more generic, it requires
> > setting a global maximum page number limits on rings, I think it will
> > require further investigation and code refactor -- which I have no time
> > to attend to at the moment. :-/
> 
> Why? You can simply pass in the number of pages, there's no need
> for a global maximum.
> 

I mean the gnttab_map_grant_ref array; it is statically allocated at the
moment. Of course we could make it dynamically allocated, but why take
on the risk of allocation failure?


Wei.

> Jan
> 

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [Xen-devel] [RFC PATCH V3 12/16] netback: multi-page ring support
  2012-01-31 11:09           ` Wei Liu
  (?)
@ 2012-01-31 11:12           ` Ian Campbell
  -1 siblings, 0 replies; 59+ messages in thread
From: Ian Campbell @ 2012-01-31 11:12 UTC (permalink / raw)
  To: Wei Liu (Intern); +Cc: Jan Beulich, xen-devel, konrad.wilk, netdev

On Tue, 2012-01-31 at 11:09 +0000, Wei Liu (Intern) wrote:
> On Tue, 2012-01-31 at 09:01 +0000, Jan Beulich wrote:
> > >>> On 30.01.12 at 18:10, Wei Liu <wei.liu2@citrix.com> wrote:
> > > On Mon, 2012-01-30 at 16:35 +0000, Jan Beulich wrote:
> > >> >>> On 30.01.12 at 15:45, Wei Liu <wei.liu2@citrix.com> wrote:
> > >> > -int xenvif_map_frontend_rings(struct xenvif *vif,
> > >> > -			      grant_ref_t tx_ring_ref,
> > >> > -			      grant_ref_t rx_ring_ref)
> > >> > +int xenvif_map_frontend_rings(struct xen_comms *comms,
> > >> > +			      int domid,
> > >> > +			      unsigned long ring_ref[],
> > >> > +			      unsigned int  ring_ref_count)
> > >> >  {
> > >> > -	void *addr;
> > >> > -	struct xen_netif_tx_sring *txs;
> > >> > -	struct xen_netif_rx_sring *rxs;
> > >> > -
> > >> > -	int err = -ENOMEM;
> > >> > +	struct gnttab_map_grant_ref op[NETBK_MAX_RING_PAGES];
> > >> > +	unsigned int i;
> > >> > +	int err = 0;
> > >> >  
> > >> > -	err = xenbus_map_ring_valloc(xenvif_to_xenbus_device(vif),
> > >> > -				     tx_ring_ref, &addr);
> > >> 
> > >> Any reason why you don't just extend this function (in a prerequisite
> > >> patch) rather than open coding a common utility function (twice) here,
> > >> so that other backends (blkback!) can benefit later as well.
> > >> 
> > >> Jan
> > >> 
> > > 
> > > I'm mainly focusing on netback stuffs, so the code is slightly coupled
> > > with netback -- NETBK_MAX_RING_PAGES.
> > > 
> > > To extend xenbus_map_ring_valloc and make more generic, it requires
> > > setting a global maximum page number limits on rings, I think it will
> > > require further investigation and code refactor -- which I have no time
> > > to attend to at the moment. :-/
> > 
> > Why? You can simply pass in the number of pages, there's no need
> > for a global maximum.
> > 
> 
> I mean the gnttab_map_gran_ref array, it is statically allocated at the
> moment. Of course we can make it dynamically allocated, but why bother
> taking the risk of allocation failure.

You can do
	struct gnttab_map_grant_ref op[nr_pages];
and the compiler will do the right thing with the on-stack data
structure. In the kernel you'd need to sanity check how large nr_pages
can get first (the array lives on the stack) but that should be all.
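Something like the below, say (not even compile tested; the unwind of a
partially successful map and the saving of op[i].handle for the eventual
unmap are left out, and I'm assuming the two-argument alloc_vm_area()):

/* needs <linux/vmalloc.h>, <xen/grant_table.h>, <xen/xenbus.h>,
 * <asm/xen/hypercall.h> */
static void *map_frontend_ring(struct xenbus_device *dev, int domid,
			       unsigned long ring_ref[],
			       unsigned int nr_pages)
{
	struct gnttab_map_grant_ref op[nr_pages];	/* on-stack VLA */
	struct vm_struct *area;
	unsigned int i;

	area = alloc_vm_area(PAGE_SIZE * nr_pages, NULL);
	if (!area)
		return NULL;

	/* map each frontend grant into the virtually contiguous area */
	for (i = 0; i < nr_pages; i++)
		gnttab_set_map_op(&op[i],
				  (unsigned long)area->addr + i * PAGE_SIZE,
				  GNTMAP_host_map, ring_ref[i], domid);

	if (HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, op, nr_pages))
		BUG();

	for (i = 0; i < nr_pages; i++) {
		if (op[i].status != GNTST_okay) {
			/* a real version must unmap what did succeed */
			xenbus_dev_fatal(dev, op[i].status,
					 "mapping ring page %u", i);
			free_vm_area(area);
			return NULL;
		}
	}

	return area->addr;
}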

Ian.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [Xen-devel] [RFC PATCH V3 15/16] netfront: multi page ring support.
  2012-01-31  9:53       ` Jan Beulich
@ 2012-01-31 11:15         ` Wei Liu
  -1 siblings, 0 replies; 59+ messages in thread
From: Wei Liu @ 2012-01-31 11:15 UTC (permalink / raw)
  To: Jan Beulich
  Cc: wei.liu2, Konrad Rzeszutek Wilk, Ian Campbell, xen-devel, netdev

On Tue, 2012-01-31 at 09:53 +0000, Jan Beulich wrote:
> > with more than 4GB?
> 
> Imo the use of the DMA API is a mistake here anyway. There's no need
> for anything to be contiguous in a PV frontend/backend handshake
> protocol, or if one finds there is it's very likely just because of trying to
> avoid doing something properly.
> 

Seems you're right. I looked at the backend code again; I make use of a
vm_struct to create a virtually contiguous area in the backend kernel.
In fact I can avoid the DMA API completely. ;-)
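For the frontend the rough idea would be the below (only a sketch; the
helper name is made up, grant failures are not unwound, and it needs
<linux/vmalloc.h>, <xen/grant_table.h> and <asm/xen/page.h>):

static void *xennet_alloc_ring(struct netfront_info *info,
			       unsigned int nr_pages, int ring_ref[])
{
	struct page *pages[XENNET_MAX_RING_PAGES];
	void *vaddr;
	unsigned int i;

	for (i = 0; i < nr_pages; i++) {
		pages[i] = alloc_page(GFP_NOIO | __GFP_HIGH | __GFP_ZERO);
		if (!pages[i])
			goto err;
	}

	/* the ring macros only need virtual contiguity */
	vaddr = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
	if (!vaddr)
		goto err;

	/* grant each page to the backend individually */
	for (i = 0; i < nr_pages; i++)
		ring_ref[i] = gnttab_grant_foreign_access(
			info->xbdev->otherend_id,
			pfn_to_mfn(page_to_pfn(pages[i])), 0);

	return vaddr;

err:
	while (i--)
		__free_page(pages[i]);
	return NULL;
}

Teardown is then vunmap() plus gnttab_end_foreign_access() per page, with
no dma_free_coherent() anywhere.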


Wei.

> Jan
> 

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [Xen-devel] [RFC PATCH V3 15/16] netfront: multi page ring support.
  2012-01-31  9:12     ` Ian Campbell
@ 2012-01-31 11:17         ` Wei Liu
  0 siblings, 0 replies; 59+ messages in thread
From: Wei Liu @ 2012-01-31 11:17 UTC (permalink / raw)
  To: Ian Campbell; +Cc: wei.liu2, Konrad Rzeszutek Wilk, netdev, xen-devel

On Tue, 2012-01-31 at 09:12 +0000, Ian Campbell wrote:
> On Mon, 2012-01-30 at 21:39 +0000, Konrad Rzeszutek Wilk wrote:
> 
> [...snip... please do consider trimming unnecessary quotes]
> 
> > > @@ -1496,50 +1523,105 @@ static int setup_netfront(struct xenbus_device *dev, struct netfront_info *info)
> > >               goto fail;
> > >       }
> > >
> > > -     txs = (struct xen_netif_tx_sring *)get_zeroed_page(GFP_NOIO | __GFP_HIGH);
> > > +     err = xenbus_scanf(XBT_NIL, info->xbdev->otherend,
> > > +                        "max-tx-ring-page-order", "%u",
> > > +                        &max_tx_ring_page_order);
> > > +     if (err < 0) {
> > > +             info->tx_ring_page_order = 0;
> > > +             dev_info(&dev->dev, "single tx ring\n");
> > > +     } else {
> > > +             info->tx_ring_page_order = max_tx_ring_page_order;
> > > +             dev_info(&dev->dev, "multi page tx ring, order = %d\n",
> > > +                      max_tx_ring_page_order);
> > > +     }
> > > +     info->tx_ring_pages = (1U << info->tx_ring_page_order);
> > > +
> > > +     txs = (struct xen_netif_tx_sring *)
> > > +             dma_alloc_coherent(NULL, PAGE_SIZE * info->tx_ring_pages,
> > > +                                &info->tx_ring_dma_handle,
> > > +                                __GFP_ZERO | GFP_NOIO | __GFP_HIGH);
> > 
> > Hm, so I see you are using 'NULL' which is a big nono (the API docs say that).
> > But the other reason why it is a no-no, is b/c this way the generic DMA engine has no
> > clue whether you are OK getting pages under 4GB or above it (so 64-bit support).
> 
> Does this allocation even need to be physically contiguous? I'd have
> thought that virtually contiguous would be sufficient, and even then
> only as a convenience at either end to avoid the need for more
> complicated ring macros.
> 

I had second thoughts about this, and you're right. I just need to make
sure the ring is virtually contiguous in both frontend and backend;
that's sufficient.

Wei.

> Ian.
> 
> 

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC PATCH V3 14/16] netback: split event channels support
  2012-01-31 10:37   ` Ian Campbell
@ 2012-01-31 11:57       ` Wei Liu
  0 siblings, 0 replies; 59+ messages in thread
From: Wei Liu @ 2012-01-31 11:57 UTC (permalink / raw)
  To: Ian Campbell; +Cc: wei.liu2, netdev, xen-devel, konrad.wilk

On Tue, 2012-01-31 at 10:37 +0000, Ian Campbell wrote:
> 
> Can you get rid of split_irq by setting tx_irq == rx_irq in that case
> and simplify the code by doing so?
> 
> I think this should work even for places like:
> 
> 	if (!vif->split_irq)
> 		enable_irq(vif->tx_irq);
> 	else {
> 		enable_irq(vif->tx_irq);
> 		enable_irq(vif->rx_irq);
> 	}
> 
> Just by doing
> 		enable_irq(vif->tx_irq);
> 		enable_irq(vif->rx_irq);
> 
> Since enable/disable_irq maintain a count and so it will do the right
> thing if they happen to be the same.
> 

Hmm... OK.
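Using the tx_evtchn/rx_evtchn naming you suggest below, the connect path
would then collapse to something like this (relative to the hunk quoted
further down, untested):

	if (tx_evtchn == rx_evtchn) {
		/* single shared event channel */
		err = bind_interdomain_evtchn_to_irqhandler(
			vif->domid, tx_evtchn, xenvif_interrupt, 0,
			vif->dev->name, vif);
		if (err < 0)
			goto err_rx_unmap;
		vif->tx_irq = vif->rx_irq = err;
	} else {
		/* split: one handler per channel */
		err = bind_interdomain_evtchn_to_irqhandler(
			vif->domid, tx_evtchn, xenvif_tx_interrupt, 0,
			vif->dev->name, vif);
		if (err < 0)
			goto err_rx_unmap;
		vif->tx_irq = err;

		err = bind_interdomain_evtchn_to_irqhandler(
			vif->domid, rx_evtchn, xenvif_rx_interrupt, 0,
			vif->dev->name, vif);
		if (err < 0) {
			unbind_from_irqhandler(vif->tx_irq, vif);
			goto err_rx_unmap;
		}
		vif->rx_irq = err;
	}

	/* no split_irq flag: enable_irq()/disable_irq() nest, so callers
	 * can always poke both, even when tx_irq == rx_irq */
	disable_irq(vif->tx_irq);
	disable_irq(vif->rx_irq);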

> >  	/* The shared tx ring and index. */
> >  	struct xen_netif_tx_back_ring tx;
> > @@ -162,7 +164,8 @@ struct xenvif *xenvif_alloc(struct device *parent,
> >  int xenvif_connect(struct xenvif *vif,
> >  		   unsigned long tx_ring_ref[], unsigned int tx_ring_order,
> >  		   unsigned long rx_ring_ref[], unsigned int rx_ring_order,
> > -		   unsigned int evtchn, unsigned int rx_protocol);
> > +		   unsigned int evtchn[], int split_evtchn,
> > +		   unsigned int rx_protocol);
> >  void xenvif_disconnect(struct xenvif *vif);
> >  
> >  int xenvif_xenbus_init(void);
> > diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
> > index 0f05f03..afccd5d 100644
> > --- a/drivers/net/xen-netback/interface.c
> > +++ b/drivers/net/xen-netback/interface.c
> > @@ -46,15 +46,31 @@ int xenvif_schedulable(struct xenvif *vif)
> >  	return netif_running(vif->dev) && netif_carrier_ok(vif->dev);
> >  }
> >  
> > -static irqreturn_t xenvif_interrupt(int irq, void *dev_id)
> > +static irqreturn_t xenvif_tx_interrupt(int irq, void *dev_id)
> > +{
> > +	struct xenvif *vif = dev_id;
> > +
> > +	if (RING_HAS_UNCONSUMED_REQUESTS(&vif->tx))
> > +		napi_schedule(&vif->napi);
> > +
> > +	return IRQ_HANDLED;
> > +}
> > +
> > +static irqreturn_t xenvif_rx_interrupt(int irq, void *dev_id)
> >  {
> >  	struct xenvif *vif = dev_id;
> >  
> >  	if (xenvif_schedulable(vif) && vif->event != NULL)
> >  		vif->event(vif);
> >  
> > -	if (RING_HAS_UNCONSUMED_REQUESTS(&vif->tx))
> > -		napi_schedule(&vif->napi);
> > +	return IRQ_HANDLED;
> > +}
> > +
> > +static irqreturn_t xenvif_interrupt(int irq, void *dev_id)
> > +{
> > +	xenvif_tx_interrupt(0, dev_id);
> 
> Might as well pass irq down.

Sure.

> [...]
> > @@ -308,13 +334,14 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
> >  int xenvif_connect(struct xenvif *vif,
> >  		   unsigned long tx_ring_ref[], unsigned int tx_ring_ref_count,
> >  		   unsigned long rx_ring_ref[], unsigned int rx_ring_ref_count,
> > -		   unsigned int evtchn, unsigned int rx_protocol)
> > +		   unsigned int evtchn[], int split_evtchn,
> 
> Explicitly tx_evtchn and rx_evtchn would be clearer than remembering
> that [0]==tx and [1]==rx I think.
> 
> > +		   unsigned int rx_protocol)
> >  {
> >  	int err = -ENOMEM;
> >  	struct xen_netif_tx_sring *txs;
> >  
> >  	/* Already connected through? */
> > -	if (vif->irq)
> > +	if (vif->tx_irq)
> >  		return 0;
> >  
> >  	__module_get(THIS_MODULE);
> > @@ -345,13 +372,35 @@ int xenvif_connect(struct xenvif *vif,
> >  	if (vif->setup(vif))
> >  		goto err_rx_unmap;
> >  
> > -	err = bind_interdomain_evtchn_to_irqhandler(
> > -		vif->domid, evtchn, xenvif_interrupt, 0,
> > -		vif->dev->name, vif);
> > -	if (err < 0)
> > -		goto err_rx_unmap;
> > -	vif->irq = err;
> > -	disable_irq(vif->irq);
> > +	if (!split_evtchn) {
> 
> Presumably this is one of the places where you do have to care about
> split vs non. I did consider whether simply registering two handlers for
> the interrupt in a shared-interrupt style would work, but I think that
> way lies madness and confusion...
> 
> > +		err = bind_interdomain_evtchn_to_irqhandler(
> > +			vif->domid, evtchn[0], xenvif_interrupt, 0,
> > +			vif->dev->name, vif);
> > +		if (err < 0)
> > +			goto err_rx_unmap;
> > +		vif->tx_irq = vif->rx_irq = err;
> > +		disable_irq(vif->tx_irq);
> > +		vif->split_irq = 0;
> > +	} else {
> > +		err = bind_interdomain_evtchn_to_irqhandler(
> > +			vif->domid, evtchn[0], xenvif_tx_interrupt,
> > +			0, vif->dev->name, vif);
> > +		if (err < 0)
> > +			goto err_rx_unmap;
> > +		vif->tx_irq = err;
> > +		disable_irq(vif->tx_irq);
> > +
> > +		err = bind_interdomain_evtchn_to_irqhandler(
> > +			vif->domid, evtchn[1], xenvif_rx_interrupt,
> > +			0, vif->dev->name, vif);
> > +		if (err < 0) {
> > +			unbind_from_irqhandler(vif->tx_irq, vif);
> > +			goto err_rx_unmap;
> > +		}
> > +		vif->rx_irq = err;
> > +		disable_irq(vif->rx_irq);
> > +		vif->split_irq = 1;
> > +	}
> >  
> >  	init_waitqueue_head(&vif->wq);
> >  	vif->task = kthread_create(xenvif_kthread,
> > diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
> > index 4067286..c5a3b27 100644
> > --- a/drivers/net/xen-netback/xenbus.c
> > +++ b/drivers/net/xen-netback/xenbus.c
> > @@ -131,6 +131,14 @@ static int netback_probe(struct xenbus_device *dev,
> >  			goto abort_transaction;
> >  		}
> >  
> > +		err = xenbus_printf(xbt, dev->nodename,
> > +				    "split-event-channels",
> 
> Usually we use "feature-FOO" as the names for these sorts of nodes.
> 

Got it.

> > +				    "%u", 1);
> > +		if (err) {
> > +			message = "writing split-event-channels";
> > +			goto abort_transaction;
> > +		}
> > +
> >  		err = xenbus_transaction_end(xbt, 0);
> >  	} while (err == -EAGAIN);
> >  
> > @@ -408,7 +416,7 @@ static int connect_rings(struct backend_info *be)
> >  {
> >  	struct xenvif *vif = be->vif;
> >  	struct xenbus_device *dev = be->dev;
> > -	unsigned int evtchn, rx_copy;
> > +	unsigned int evtchn[2], split_evtchn, rx_copy;
> 
> Another case where I think two vars is better than a small array.
> 
> >  	int err;
> >  	int val;
> >  	unsigned long tx_ring_ref[NETBK_MAX_RING_PAGES];
> 

Reasonable change.
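i.e. something like the below on the backend side (the split node names
are placeholders until the frontend side is settled):

	unsigned int tx_evtchn, rx_evtchn;

	/* try split event channels first, fall back to a shared one */
	err = xenbus_gather(XBT_NIL, dev->otherend,
			    "event-channel-tx", "%u", &tx_evtchn,
			    "event-channel-rx", "%u", &rx_evtchn, NULL);
	if (err) {
		err = xenbus_gather(XBT_NIL, dev->otherend,
				    "event-channel", "%u", &tx_evtchn, NULL);
		if (err) {
			xenbus_dev_fatal(dev, err,
					 "reading %s/event-channel",
					 dev->otherend);
			return err;
		}
		rx_evtchn = tx_evtchn;
	}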


Wei.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [Xen-devel] [RFC PATCH V3 12/16] netback: multi-page ring support
  2012-01-31 11:09           ` Wei Liu
  (?)
  (?)
@ 2012-01-31 13:24           ` Jan Beulich
  2012-01-31 13:32               ` Wei Liu
  -1 siblings, 1 reply; 59+ messages in thread
From: Jan Beulich @ 2012-01-31 13:24 UTC (permalink / raw)
  To: Wei Liu; +Cc: Ian Campbell, xen-devel, konrad.wilk, netdev

>>> On 31.01.12 at 12:09, Wei Liu <wei.liu2@citrix.com> wrote:
> On Tue, 2012-01-31 at 09:01 +0000, Jan Beulich wrote:
>> >>> On 30.01.12 at 18:10, Wei Liu <wei.liu2@citrix.com> wrote:
>> > On Mon, 2012-01-30 at 16:35 +0000, Jan Beulich wrote:
>> >> >>> On 30.01.12 at 15:45, Wei Liu <wei.liu2@citrix.com> wrote:
>> >> > -int xenvif_map_frontend_rings(struct xenvif *vif,
>> >> > -			      grant_ref_t tx_ring_ref,
>> >> > -			      grant_ref_t rx_ring_ref)
>> >> > +int xenvif_map_frontend_rings(struct xen_comms *comms,
>> >> > +			      int domid,
>> >> > +			      unsigned long ring_ref[],
>> >> > +			      unsigned int  ring_ref_count)
>> >> >  {
>> >> > -	void *addr;
>> >> > -	struct xen_netif_tx_sring *txs;
>> >> > -	struct xen_netif_rx_sring *rxs;
>> >> > -
>> >> > -	int err = -ENOMEM;
>> >> > +	struct gnttab_map_grant_ref op[NETBK_MAX_RING_PAGES];
>> >> > +	unsigned int i;
>> >> > +	int err = 0;
>> >> >  
>> >> > -	err = xenbus_map_ring_valloc(xenvif_to_xenbus_device(vif),
>> >> > -				     tx_ring_ref, &addr);
>> >> 
>> >> Any reason why you don't just extend this function (in a prerequisite
>> >> patch) rather than open coding a common utility function (twice) here,
>> >> so that other backends (blkback!) can benefit later as well.
>> >> 
>> >> Jan
>> >> 
>> > 
>> > I'm mainly focusing on netback stuffs, so the code is slightly coupled
>> > with netback -- NETBK_MAX_RING_PAGES.
>> > 
>> > To extend xenbus_map_ring_valloc and make more generic, it requires
>> > setting a global maximum page number limits on rings, I think it will
>> > require further investigation and code refactor -- which I have no time
>> > to attend to at the moment. :-/
>> 
>> Why? You can simply pass in the number of pages, there's no need
>> for a global maximum.
>> 
> 
> I mean the gnttab_map_gran_ref array, it is statically allocated at the
> moment. Of course we can make it dynamically allocated, but why bother
> taking the risk of allocation failure.

There are so many other allocations; why would you worry about this
one?

But of course you can undo what a recent change did, and then
someone else will subsequently have to clean up after you again.
I'm just asking you to follow good programming practices and write
re-usable code where the potential for re-use is obvious (after all,
multi-page block interface patches have been floating around for
much longer than yours for the net interface).

Jan

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [Xen-devel] [RFC PATCH V3 12/16] netback: multi-page ring support
  2012-01-31 13:24           ` Jan Beulich
@ 2012-01-31 13:32               ` Wei Liu
  0 siblings, 0 replies; 59+ messages in thread
From: Wei Liu @ 2012-01-31 13:32 UTC (permalink / raw)
  To: Jan Beulich; +Cc: wei.liu2, Ian Campbell, xen-devel, konrad.wilk, netdev

On Tue, 2012-01-31 at 13:24 +0000, Jan Beulich wrote:
> >>> On 31.01.12 at 12:09, Wei Liu <wei.liu2@citrix.com> wrote:
> > On Tue, 2012-01-31 at 09:01 +0000, Jan Beulich wrote:
> >> >>> On 30.01.12 at 18:10, Wei Liu <wei.liu2@citrix.com> wrote:
> >> > On Mon, 2012-01-30 at 16:35 +0000, Jan Beulich wrote:
> >> >> >>> On 30.01.12 at 15:45, Wei Liu <wei.liu2@citrix.com> wrote:
> >> >> > -int xenvif_map_frontend_rings(struct xenvif *vif,
> >> >> > -			      grant_ref_t tx_ring_ref,
> >> >> > -			      grant_ref_t rx_ring_ref)
> >> >> > +int xenvif_map_frontend_rings(struct xen_comms *comms,
> >> >> > +			      int domid,
> >> >> > +			      unsigned long ring_ref[],
> >> >> > +			      unsigned int  ring_ref_count)
> >> >> >  {
> >> >> > -	void *addr;
> >> >> > -	struct xen_netif_tx_sring *txs;
> >> >> > -	struct xen_netif_rx_sring *rxs;
> >> >> > -
> >> >> > -	int err = -ENOMEM;
> >> >> > +	struct gnttab_map_grant_ref op[NETBK_MAX_RING_PAGES];
> >> >> > +	unsigned int i;
> >> >> > +	int err = 0;
> >> >> >  
> >> >> > -	err = xenbus_map_ring_valloc(xenvif_to_xenbus_device(vif),
> >> >> > -				     tx_ring_ref, &addr);
> >> >> 
> >> >> Any reason why you don't just extend this function (in a prerequisite
> >> >> patch) rather than open coding a common utility function (twice) here,
> >> >> so that other backends (blkback!) can benefit later as well.
> >> >> 
> >> >> Jan
> >> >> 
> >> > 
> >> > I'm mainly focusing on netback stuffs, so the code is slightly coupled
> >> > with netback -- NETBK_MAX_RING_PAGES.
> >> > 
> >> > To extend xenbus_map_ring_valloc and make more generic, it requires
> >> > setting a global maximum page number limits on rings, I think it will
> >> > require further investigation and code refactor -- which I have no time
> >> > to attend to at the moment. :-/
> >> 
> >> Why? You can simply pass in the number of pages, there's no need
> >> for a global maximum.
> >> 
> > 
> > I mean the gnttab_map_gran_ref array, it is statically allocated at the
> > moment. Of course we can make it dynamically allocated, but why bother
> > taking the risk of allocation failure.
> 
> There's so many other allocations, why would you worry about this
> one.
> 

IMHO, this is not a critical part of the code, so having a failure here
render everything else unusable would be very strange.

> But of course you can undo what a recent change did, and then
> subsequently someone else will have to clean up again after you.
> I'm just asking to follow good programming practices and write
> re-usable code where potential for re-use is obvious (after all,
> multi-page block interface patches have been floating around for
> much longer than yours for the net interface).
> 

I understand your concern. If the required changes won't make this
series much longer or drag in major changes to the block interface, I'm
happy to refactor the xenbus interface.


Wei.

> Jan
> 

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [Xen-devel] [RFC PATCH V3 13/16] netback: stub for multi receive protocol support.
  2012-01-31 11:03       ` Wei Liu
  (?)
@ 2012-01-31 14:43       ` Konrad Rzeszutek Wilk
  -1 siblings, 0 replies; 59+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-01-31 14:43 UTC (permalink / raw)
  To: Wei Liu; +Cc: netdev, xen-devel, Ian Campbell

> > > existing code as protocol 0.
> > 
> > Why not 1?
> > 
> 
> We have some existing xenolinux code which has not been upstreamed calls
> this protocol 0, just try to be compatible.

Ah. Please do mention that in the description.

> 
> > Why do we need a new rework without anything using it besides
> > the existing framework? OR if you are, you should say which
> > patch is doing it...
> > 
> 
> It is not in use at the moment, and will be in use in the future.

Ok, should it be part of the "in the future" patchset then?

> 
> > > 
> > > Now the file layout becomes:
> > > 
> > >  - interface.c: xenvif interfaces
> > >  - xenbus.c: xenbus related functions
> > >  - netback.c: common functions for various protocols
> > > 
> > > For different protocols:
> > > 
> > >  - xenvif_rx_protocolX.h: header file for the protocol, including
> > >                           protocol structures and functions
> > >  - xenvif_rx_protocolX.c: implementations
> > > 
> > > To add a new protocol:
> > > 
> > >  - include protocol header in common.h
> > >  - modify XENVIF_MAX_RX_PROTOCOL in common.h
> > >  - add protocol structure in xenvif.rx union
> > >  - stub in xenbus.c
> > >  - modify Makefile
> > > 
> > > A protocol should define five functions:
> > > 
> > >  - setup: setup frontend / backend ring connections
> > >  - teardown: teardown frontend / backend ring connections
> > >  - start_xmit: host start xmit (i.e. guest need to do rx)
> > >  - event: rx completion event
> > >  - action: prepare host side data for guest rx
> > > 
> > .. snip..
> > 
> > > -
> > > -	return resp;
> > > -}
> > > -
> > >  static inline int rx_work_todo(struct xenvif *vif)
> > >  {
> > >  	return !skb_queue_empty(&vif->rx_queue);
> > > @@ -1507,8 +999,8 @@ int xenvif_kthread(void *data)
> > >  		if (kthread_should_stop())
> > >  			break;
> > >  
> > > -		if (rx_work_todo(vif))
> > > -			xenvif_rx_action(vif);
> > > +		if (rx_work_todo(vif) && vif->action)
> > > +			vif->action(vif);
> > >  	}
> > >  
> > >  	return 0;
> > > diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
> > > index 79499fc..4067286 100644
> > > --- a/drivers/net/xen-netback/xenbus.c
> > > +++ b/drivers/net/xen-netback/xenbus.c
> > > @@ -415,6 +415,7 @@ static int connect_rings(struct backend_info *be)
> > >  	unsigned long rx_ring_ref[NETBK_MAX_RING_PAGES];
> > >  	unsigned int  tx_ring_order;
> > >  	unsigned int  rx_ring_order;
> > > +	unsigned int  rx_protocol;
> > >  
> > >  	err = xenbus_gather(XBT_NIL, dev->otherend,
> > >  			    "event-channel", "%u", &evtchn, NULL);
> > > @@ -510,6 +511,11 @@ static int connect_rings(struct backend_info *be)
> > >  		}
> > >  	}
> > >  
> > > +	err = xenbus_scanf(XBT_NIL, dev->otherend, "rx-protocol",
> > 
> > feature-rx-protocol?
> > 
> 
> This is not a feature switch. Does it make sense to add "feature-"
> prefix?

Good point.

It is negotiating a new protocol. Hm, perhaps 'protocol-rx-version' instead?
Or just 'protocol-version'?

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [Xen-devel] [RFC PATCH V3 12/16] netback: multi-page ring support
  2012-01-31 13:32               ` Wei Liu
  (?)
@ 2012-01-31 14:48               ` Konrad Rzeszutek Wilk
  -1 siblings, 0 replies; 59+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-01-31 14:48 UTC (permalink / raw)
  To: Wei Liu; +Cc: Jan Beulich, Ian Campbell, xen-devel, netdev

On Tue, Jan 31, 2012 at 01:32:50PM +0000, Wei Liu wrote:
> On Tue, 2012-01-31 at 13:24 +0000, Jan Beulich wrote:
> > >>> On 31.01.12 at 12:09, Wei Liu <wei.liu2@citrix.com> wrote:
> > > On Tue, 2012-01-31 at 09:01 +0000, Jan Beulich wrote:
> > >> >>> On 30.01.12 at 18:10, Wei Liu <wei.liu2@citrix.com> wrote:
> > >> > On Mon, 2012-01-30 at 16:35 +0000, Jan Beulich wrote:
> > >> >> >>> On 30.01.12 at 15:45, Wei Liu <wei.liu2@citrix.com> wrote:
> > >> >> > -int xenvif_map_frontend_rings(struct xenvif *vif,
> > >> >> > -			      grant_ref_t tx_ring_ref,
> > >> >> > -			      grant_ref_t rx_ring_ref)
> > >> >> > +int xenvif_map_frontend_rings(struct xen_comms *comms,
> > >> >> > +			      int domid,
> > >> >> > +			      unsigned long ring_ref[],
> > >> >> > +			      unsigned int  ring_ref_count)
> > >> >> >  {
> > >> >> > -	void *addr;
> > >> >> > -	struct xen_netif_tx_sring *txs;
> > >> >> > -	struct xen_netif_rx_sring *rxs;
> > >> >> > -
> > >> >> > -	int err = -ENOMEM;
> > >> >> > +	struct gnttab_map_grant_ref op[NETBK_MAX_RING_PAGES];
> > >> >> > +	unsigned int i;
> > >> >> > +	int err = 0;
> > >> >> >  
> > >> >> > -	err = xenbus_map_ring_valloc(xenvif_to_xenbus_device(vif),
> > >> >> > -				     tx_ring_ref, &addr);
> > >> >> 
> > >> >> Any reason why you don't just extend this function (in a prerequisite
> > >> >> patch) rather than open coding a common utility function (twice) here,
> > >> >> so that other backends (blkback!) can benefit later as well.
> > >> >> 
> > >> >> Jan
> > >> >> 
> > >> > 
> > >> > I'm mainly focusing on netback stuff, so the code is slightly coupled
> > >> > with netback -- NETBK_MAX_RING_PAGES.
> > >> > 
> > >> > Extending xenbus_map_ring_valloc to make it more generic requires
> > >> > setting a global maximum on the number of ring pages; I think that
> > >> > will require further investigation and code refactoring -- which I
> > >> > have no time to attend to at the moment. :-/
> > >> 
> > >> Why? You can simply pass in the number of pages; there's no need
> > >> for a global maximum.
> > >> 
> > > 
> > > I mean the gnttab_map_grant_ref array; it is statically allocated at the
> > > moment. Of course we could make it dynamically allocated, but why bother
> > > taking the risk of allocation failure?
> > 
> > There are so many other allocations, why would you worry about this
> > one?
> > 
> 
> IMHO, this is not a critical part of the code, so having a failure here
> render everything else unworkable would be very strange.
> 
> > But of course you can undo what a recent change did, and then
> > subsequently someone else will have to clean up again after you.
> > I'm just asking you to follow good programming practice and write
> > re-usable code where the potential for re-use is obvious (after all,
> > multi-page block interface patches have been floating around for much
> > longer than your patches for the net interface).
> > 
> 
> I understand your concern. If the required changes will not make this
> series longer and do not involve major changes to the block interface,
> I'm happy to refactor the xenbus interface.

Please do. Patches to make it more versatile are more than welcome.
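
[To make the refactoring discussed in this sub-thread a little more
concrete, here is a rough sketch of the kind of count-taking helper Jan
is suggesting. The name xenbus_map_ring_pages, the XENBUS_MAX_RING_PAGES
cap and the exact signature are made up for illustration and are not
code from this series; the existing xenbus_map_ring_valloc() handles a
single grant reference only. The sketch assumes the caller already owns
a virtually contiguous area (e.g. from alloc_vm_area()) covering
nr_pages pages, so the grant-op array can stay small and on the stack --
the allocation-failure concern raised above.]

#include <linux/bug.h>
#include <linux/errno.h>
#include <linux/mm.h>
#include <xen/xenbus.h>
#include <xen/grant_table.h>
#include <xen/interface/grant_table.h>
#include <asm/xen/hypercall.h>

#define XENBUS_MAX_RING_PAGES 8	/* illustrative cap, not an existing limit */

/* Map nr_pages grant references from 'domid' into the virtually
 * contiguous area at 'vaddr', recording one grant handle per page. */
static int xenbus_map_ring_pages(struct xenbus_device *dev, domid_t domid,
				 grant_ref_t refs[], unsigned int nr_pages,
				 void *vaddr, grant_handle_t handles[])
{
	struct gnttab_map_grant_ref op[XENBUS_MAX_RING_PAGES];
	unsigned int i;

	if (nr_pages > XENBUS_MAX_RING_PAGES)
		return -EINVAL;

	for (i = 0; i < nr_pages; i++)
		gnttab_set_map_op(&op[i],
				  (unsigned long)vaddr + i * PAGE_SIZE,
				  GNTMAP_host_map, refs[i], domid);

	if (HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, op, nr_pages))
		BUG();

	for (i = 0; i < nr_pages; i++) {
		if (op[i].status != GNTST_okay) {
			xenbus_dev_fatal(dev, op[i].status,
					 "mapping ring page %u", i);
			return -EINVAL;
		}
		handles[i] = op[i].handle;
	}

	return 0;
}

[Whether the op array should live on the stack at all is exactly the
allocation question debated above; passing nr_pages in, as suggested, at
least keeps any backend-specific limit out of the generic xenbus code.]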

^ permalink raw reply	[flat|nested] 59+ messages in thread

end of thread, other threads:[~2012-01-31 14:51 UTC | newest]

Thread overview: 59+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-01-30 14:45 [RFC PATCH V3] Xen netback / netfront improvement Wei Liu
2012-01-30 14:45 ` [RFC PATCH V3 01/16] netback: page pool version 1 Wei Liu
2012-01-30 14:45 ` [RFC PATCH V3 02/16] netback: add module unload function Wei Liu
2012-01-30 14:45 ` [RFC PATCH V3 03/16] netback: switch to NAPI + kthread model Wei Liu
2012-01-30 14:45 ` [RFC PATCH V3 04/16] netback: switch to per-cpu scratch space Wei Liu
2012-01-30 16:49   ` Viral Mehta
2012-01-30 17:05     ` Wei Liu
2012-01-30 17:05       ` Wei Liu
2012-01-30 14:45 ` [RFC PATCH V3 05/16] netback: add module get/put operations along with vif connect/disconnect Wei Liu
2012-01-31 10:24   ` Ian Campbell
2012-01-31 10:39     ` Wei Liu
2012-01-31 10:39       ` Wei Liu
2012-01-30 14:45 ` [RFC PATCH V3 06/16] netback: melt xen_netbk into xenvif Wei Liu
2012-01-30 14:45 ` [RFC PATCH V3 07/16] netback: alter internal function/structure names Wei Liu
2012-01-30 14:45 ` [RFC PATCH V3 08/16] netback: remove unwanted notification generation during NAPI processing Wei Liu
2012-01-30 14:45 ` [RFC PATCH V3 09/16] netback: nuke xenvif_receive_skb Wei Liu
2012-01-30 14:45 ` [RFC PATCH V3 10/16] netback: rework of per-cpu scratch space Wei Liu
2012-01-30 21:53   ` Konrad Rzeszutek Wilk
2012-01-31 10:48     ` Wei Liu
2012-01-31 10:48       ` Wei Liu
2012-01-31  1:25   ` Eric Dumazet
2012-01-31 10:43     ` Wei Liu
2012-01-31 10:43       ` Wei Liu
2012-01-30 14:45 ` [RFC PATCH V3 11/16] netback: print alert and bail when scratch space is not available Wei Liu
2012-01-30 14:45 ` [RFC PATCH V3 12/16] netback: multi-page ring support Wei Liu
2012-01-30 16:35   ` [Xen-devel] " Jan Beulich
2012-01-30 16:35     ` Jan Beulich
2012-01-30 17:10     ` Wei Liu
2012-01-30 17:10       ` Wei Liu
2012-01-31  9:01       ` Jan Beulich
2012-01-31 11:09         ` Wei Liu
2012-01-31 11:09           ` Wei Liu
2012-01-31 11:12           ` Ian Campbell
2012-01-31 13:24           ` Jan Beulich
2012-01-31 13:32             ` Wei Liu
2012-01-31 13:32               ` Wei Liu
2012-01-31 14:48               ` Konrad Rzeszutek Wilk
2012-01-30 14:45 ` [RFC PATCH V3 13/16] netback: stub for multi receive protocol support Wei Liu
2012-01-30 21:47   ` [Xen-devel] " Konrad Rzeszutek Wilk
2012-01-31 11:03     ` Wei Liu
2012-01-31 11:03       ` Wei Liu
2012-01-31 14:43       ` Konrad Rzeszutek Wilk
2012-01-30 14:45 ` [RFC PATCH V3 14/16] netback: split event channels support Wei Liu
2012-01-31 10:37   ` Ian Campbell
2012-01-31 11:57     ` Wei Liu
2012-01-31 11:57       ` Wei Liu
2012-01-30 14:45 ` [RFC PATCH V3 15/16] netfront: multi page ring support Wei Liu
2012-01-30 21:39   ` [Xen-devel] " Konrad Rzeszutek Wilk
2012-01-31  9:12     ` Ian Campbell
2012-01-31 11:17       ` Wei Liu
2012-01-31 11:17         ` Wei Liu
2012-01-31  9:53     ` Jan Beulich
2012-01-31  9:53       ` Jan Beulich
2012-01-31 11:15       ` Wei Liu
2012-01-31 11:15         ` Wei Liu
2012-01-31 10:58     ` Wei Liu
2012-01-31 10:58       ` Wei Liu
2012-01-30 14:45 ` [RFC PATCH V3 16/16] netfront: split event channels support Wei Liu
2012-01-30 21:25   ` [Xen-devel] " Konrad Rzeszutek Wilk
