* [RFC PATCH 00/13] Persistent grant maps for xen net drivers
@ 2015-05-12 17:18 ` Joao Martins
  0 siblings, 0 replies; 98+ messages in thread
From: Joao Martins @ 2015-05-12 17:18 UTC (permalink / raw)
  To: xen-devel, netdev
  Cc: wei.liu2, ian.campbell, Joao Martins, david.vrabel, boris.ostrovsky

This series implements persistent grants for xen-net{back,front}. There has
been work on persistent grants in the past[1], but the approach described
here differs in a few ways: 1) it uses zerocopy skbs for the RX path in
xen-netfront, as opposed to memcpying every packet; 2) it stores the grants
in a tree (using the same interfaces as blkback/blkfront, to hopefully
share a common layer); 3) it reuses the TX map/unmap paths for the
persistent grants case; 4) it sends the buffer on ndo_start_xmit, as
opposed to bouncing it through the RX kthread.

Intrahost performance increases significantly, between 16% and 78%
(depending on the host), and per-queue performance (tested with up to 6
queues) scales nicely all the way to ~30 Gbit/s, especially on the TX path.
Rates for small packet sizes increase up to 1.82 Mpps TX (1.2 Gbit/s on
wire, pkt_size 64) and 2.78 Mpps RX (1.8 Gbit/s on wire, pkt_size 64).
This is around a 1.5x to 2x improvement compared with grant copy/map. On
bigger packet sizes the improvement is even more noticeable. The only
problem is that performance seems to decrease on the RX path with 1 queue
(and 2 queues with slower CPUs). This is because of the extra memcpy in
xen-netfront (in skb_copy_ubufs, frags > 0) for pkt len > RX_COPY_THRESHOLD.

This series is organized as follows: Patch 1 defines the routines for
managing the tree; Patches 2 and 11 implement feature detection; Patches
3-4 are the actual implementation of TX/RX grants on the backend; Patch 6
proposes copying the buffer on ndo_start_xmit; Patch 7 fixes a bug when
using pktgen with burst >1 without persistent grants; Patches 12-13
implement the frontend part. Patches 5, 9 and 10 are only structural,
preparing for the main changes. Overall, it works as follows:

On the transmit path, xen-netfront grants a new page and memcpys the skb
into it. On netback NAPI poll, we check whether a persistent grant is
available for the header and frags grefs. If none is found in the tree,
netback resorts to grant copy (only for the header) but still prepares the
grant map and adds it to the tree. The frags are handled as before, with
the exception of adding the grant map to the tree and not having to unmap
it in the skb callback (when freeing).

On the receive path we look up the page mapped for the guest gref. If it
exists in the tree, we memcpy into it, reverting to grant copy on the
first failed lookup or when we don't have free pages to create mappings.
On xen-netfront RX we then grab the responses and use zerocopy skbs, so
that the same pages can be reused for future requests and an extra copy is
avoided for packets with len <= RX_COPY_THRESHOLD. Additionally, I also
propose copying the buffer on ndo_start_xmit to avoid wait queue and
rx_queue contention, and because no grant table batching is required with
persistent grants (besides the initial map/copy). The latter improved the
packet rates (by ~2x), especially for smaller packet sizes and for pkt
len <= RX_COPY_THRESHOLD on the RX path.

Packet I/O Tests:

Measured on an Intel Xeon E5-1650 v2, Xen 4.5, no HT. Used pktgen "burst 1"
and "clone_skbs 100000" (to avoid skb alloc overheads) with various pkt
sizes. All tests are DomU <-> Dom0, unless specified otherwise.
Graphs:
http://cnp.neclab.eu/images/pgntperf/udprx_hostA.png
http://cnp.neclab.eu/images/pgntperf/udptx_hostA.png

                             | Baseline  | Pers. Grants       |
---------------------------------------------------------------
1q DomU TX (pkt_size 64)     | 1.24 Mpps | 1.82  Mpps (+ 46%) |
1q DomU TX (pkt_size 1496)   | 518  Kpps | 1.48  Mpps (+150%) |
1q DomU TX (pkt_size 65535)  | 66.4 Kpps | 205.1 Kpps         |
---------------------------------------------------------------
1q DomU RX (pkt_size 64)     | 1.33 Mpps | 2.78  Mpps (+109%) |
1q DomU RX (pkt_size 1496)   | 1.03 Mpps | 1.66  Mpps (+ 60%) |
1q DomU RX (pkt_size 65535)  | 52.5 Kpps | 97.8  Kpps         |
---------------------------------------------------------------

I also ran a micro-benchmark with a MiniOS-based guest, which was able
to reach up to 4.17 Mpps (with pktgen burst 8, pkt_size 64), hinting that
throughput grows with bigger batches when using xmit_more. In this case
the guest netfront was busy looping and not setting the ring rsp_event,
which would (only) lead to the backend not triggering the notification.
Note that my purpose with this experiment was just to see whether copying
the buffer in xenvif_start_xmit was indeed performing better.

Bulk Transfer Tests A:

Measured on an Intel Xeon E5-1650 v2 @ 3.5 GHz, Xen 4.5, no HT. Used
iperf (TCP) with an increased number of flows, following a methodology
similar to the one explained in [2]. All vif irqs are balanced across
cores in both Dom0 and DomU. Tests are between DomU <-> Dom0, unless
specified otherwise.
Graphs:
http://cnp.neclab.eu/images/pgntperf/tcprx_hostA.png
http://cnp.neclab.eu/images/pgntperf/tcptx_hostA.png
http://cnp.neclab.eu/images/pgntperf/tcpintra_hostA.png

                | Baseline  | Pers. Grants      |
-------------------------------------------------
1q DomU TX      | 14.5 Gbit | 21.6 Gbit (+ 48%) |
2q DomU TX      | 17.6 Gbit | 27.4 Gbit         |
3q DomU TX      | 17.2 Gbit | 29.3 Gbit (+ 70%) |
-------------------------------------------------
1q DomU RX      | 20.9 Gbit | 17.8 Gbit (- 15%) |
2q DomU RX      | 21.1 Gbit | 24.9 Gbit         |
3q DomU RX      | 22.1 Gbit | 31.0 Gbit (+ 40%) |
-------------------------------------------------
1q DomU-to-DomU | 12.4 Gbit | 18.9 Gbit (+ 52%) |

Bulk Transfer Tests B: 

Same as before, but measured on an Intel Xeon E5-2697 v2 @ 2.7 GHz,
to test guests with a higher number of queues (>3).
Graphs:
http://cnp.neclab.eu/images/pgntperf/tcprx_hostB.png
http://cnp.neclab.eu/images/pgntperf/tcptx_hostB.png
http://cnp.neclab.eu/images/pgntperf/tcpintra_hostB.png

                | Baseline  | Pers. Grants      |
-------------------------------------------------
1q DomU TX      | 10.5 Gbit | 15.9 Gbit (+ 51%) |
2q DomU TX      | 14.0 Gbit | 20.1 Gbit         |
3q DomU TX      | 15.7 Gbit | 23.5 Gbit         |
4q DomU TX      | 15.0 Gbit | 25.9 Gbit         |
6q DomU TX      | 15.9 Gbit | 30.0 Gbit (+ 88%) |
-------------------------------------------------
1q DomU RX      | 15.1 Gbit | 13.3 Gbit (- 11%) |
2q DomU RX      | 19.5 Gbit | 18.1 Gbit (-  7%) |
3q DomU RX      | 22.2 Gbit | 22.7 Gbit         |
4q DomU RX      | 23.7 Gbit | 25.8 Gbit         |
6q DomU RX      | 24.0 Gbit | 29.8 Gbit (+ 24%) |
-------------------------------------------------
1q DomU-to-DomU | 12.5 Gbit | 14.5 Gbit (+ 16%) |
2q DomU-to-DomU | 12.6 Gbit | 20.6 Gbit (+ 63%) |
3q DomU-to-DomU | 13.7 Gbit | 24.5 Gbit (+ 78%) |

There have recently[3] been some discussions and issues raised regarding
persistent grants for the block layer. The numbers above, though, show
significant improvements, especially on more network-intensive workloads,
and provide a margin for comparison against future map/unmap improvements.

Any comments or suggestions are welcome,
Thanks!
Joao

[1] http://article.gmane.org/gmane.linux.network/249383
[2] http://bit.ly/1IhJfXD
[3] http://lists.xen.org/archives/html/xen-devel/2015-02/msg02292.html

Joao Martins (13):
  xen-netback: add persistent grant tree ops
  xen-netback: xenbus feature persistent support
  xen-netback: implement TX persistent grants
  xen-netback: implement RX persistent grants
  xen-netback: refactor xenvif_rx_action
  xen-netback: copy buffer on xenvif_start_xmit()
  xen-netback: add persistent tree counters to debugfs
  xen-netback: clone skb if skb->xmit_more is set
  xen-netfront: move grant_{ref,page} to struct grant
  xen-netfront: refactor claim/release grant
  xen-netfront: feature-persistent xenbus support
  xen-netfront: implement TX persistent grants
  xen-netfront: implement RX persistent grants

 drivers/net/xen-netback/common.h    |  79 ++++
 drivers/net/xen-netback/interface.c |  78 +++-
 drivers/net/xen-netback/netback.c   | 873 ++++++++++++++++++++++++++++++------
 drivers/net/xen-netback/xenbus.c    |  24 +
 drivers/net/xen-netfront.c          | 362 ++++++++++++---
 5 files changed, 1216 insertions(+), 200 deletions(-)

-- 
2.1.3

* [RFC PATCH 01/13] xen-netback: add persistent grant tree ops
  2015-05-12 17:18 ` Joao Martins
@ 2015-05-12 17:18   ` Joao Martins
  -1 siblings, 0 replies; 98+ messages in thread
From: Joao Martins @ 2015-05-12 17:18 UTC (permalink / raw)
  To: xen-devel, netdev
  Cc: wei.liu2, ian.campbell, Joao Martins, david.vrabel, boris.ostrovsky

Implement the necessary routines for managing the grant tree. These
routines are ported from the blkback driver and slightly modified to be
more generic. This patch is separated because it relates to code that
could be shared with other drivers, in case persistent grants are adopted.

The changes compared to blkback are: a struct persistent_gnt_tree is
declared to store the grant tree info, so that these routines take a tree
argument rather than a driver-private data structure. The tree has a pool
of free pages to be used for grant maps added to it. We can't sleep in
xenvif_tx_action/xenvif_start_xmit, so this pool is prefilled with Xen
ballooned pages when initializing the tree.

Regarding the *_persistent_gnt API changes: get_persistent_gnt() will
return ERR_PTR(-EBUSY) if we try to fetch a grant ref that is already in
use. This is useful in the netback case, so that we fall back to map/unmap
when fetching an already in-use grant. This way we save a map (plus an
unmap on error) and prevent the error in add_persistent_gnt that would
also lead to dropping the packet.

Signed-off-by: Joao Martins <joao.martins@neclab.eu>
---
 drivers/net/xen-netback/common.h  |  57 +++++++++++++++
 drivers/net/xen-netback/netback.c | 145 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 202 insertions(+)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index 8a495b3..dd02386 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -106,6 +106,48 @@ struct xenvif_rx_meta {
 /* IRQ name is queue name with "-tx" or "-rx" appended */
 #define IRQ_NAME_SIZE (QUEUE_NAME_SIZE + 3)
 
+/* Number of available flags */
+#define PERSISTENT_GNT_FLAGS_SIZE      2
+/* This persistent grant is currently in use */
+#define PERSISTENT_GNT_ACTIVE          0
+/* This persistent grant has been used, this flag is set when we remove the
+ * PERSISTENT_GNT_ACTIVE, to know that this grant has been used recently.
+ */
+#define PERSISTENT_GNT_WAS_ACTIVE      1
+
+struct persistent_gnt {
+	struct page *page; /* mapped page */
+	grant_ref_t gnt;
+	grant_handle_t handle;
+	DECLARE_BITMAP(flags, PERSISTENT_GNT_FLAGS_SIZE);
+	struct rb_node node;
+};
+
+struct persistent_gnt_tree {
+	/* Tree to store persistent grants */
+	struct rb_root root;
+
+	/* Number of grants in use */
+	atomic_t gnt_in_use;
+
+	/* Number of grants in the tree */
+	unsigned int gnt_c;
+
+	/* Maximum number of grants in the tree */
+	unsigned int gnt_max;
+
+	/* True if we reached maximum number of
+	 * persistent grants in the tree
+	 */
+	bool overflow;
+
+	/* Free pages for grant maps */
+	struct list_head free_pages;
+
+	/* Initialized with <gnt_max> pages */
+	unsigned int free_pages_num;
+};
+
 struct xenvif;
 
 struct xenvif_stats {
@@ -224,6 +266,7 @@ struct xenvif {
 	u8 can_sg:1;
 	u8 ip_csum:1;
 	u8 ipv6_csum:1;
+	u8 persistent_grants:1;
 
 	/* Is this interface disabled? True when backend discovers
 	 * frontend is rogue.
@@ -344,4 +387,18 @@ void xenvif_skb_zerocopy_prepare(struct xenvif_queue *queue,
 				 struct sk_buff *skb);
 void xenvif_skb_zerocopy_complete(struct xenvif_queue *queue);
 
+/* tree ops for persistent grants */
+struct persistent_gnt *get_persistent_gnt(struct persistent_gnt_tree *tree,
+					  grant_ref_t gref);
+int add_persistent_gnt(struct persistent_gnt_tree *tree,
+		       struct persistent_gnt *persistent_gnt);
+void put_persistent_gnt(struct persistent_gnt_tree *tree,
+			struct persistent_gnt *persistent_gnt);
+void free_persistent_gnts(struct persistent_gnt_tree *tree, unsigned int num);
+/* Gets one page from the free pool in the tree */
+int get_free_page(struct persistent_gnt_tree *tree, struct page **page);
+/* Adds pages to the free pool in the tree */
+void put_free_pages(struct persistent_gnt_tree *tree, struct page **page,
+		    int num);
+
 #endif /* __XEN_NETBACK__COMMON_H__ */
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 4de46aa..8df0a73 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -107,6 +107,151 @@ static struct xen_netif_rx_response *make_rx_response(struct xenvif_queue *queue
 					     u16      size,
 					     u16      flags);
 
+#define foreach_grant_safe(pos, n, rbtree, node) \
+	for ((pos) = container_of(rb_first((rbtree)), typeof(*(pos)), node), \
+	     (n) = (&(pos)->node) ? rb_next(&(pos)->node) : NULL; \
+	     &(pos)->node; \
+	     (pos) = container_of(n, typeof(*(pos)), node), \
+	     (n) = (&(pos)->node) ? rb_next(&(pos)->node) : NULL)
+
+int add_persistent_gnt(struct persistent_gnt_tree *tree,
+		       struct persistent_gnt *persistent_gnt)
+{
+	struct rb_node **new = NULL, *parent = NULL;
+	struct persistent_gnt *this;
+
+	if (tree->gnt_c >= tree->gnt_max) {
+		pr_err("Using maximum number of persistent grants\n");
+		tree->overflow = true;
+		return -EBUSY;
+	}
+	/* Figure out where to put new node */
+	new = &tree->root.rb_node;
+	while (*new) {
+		this = container_of(*new, struct persistent_gnt, node);
+
+		parent = *new;
+		if (persistent_gnt->gnt < this->gnt) {
+			new = &((*new)->rb_left);
+		} else if (persistent_gnt->gnt > this->gnt) {
+			new = &((*new)->rb_right);
+		} else {
+			pr_err("Trying to add a gref that's already in the tree\n");
+			return -EINVAL;
+		}
+	}
+
+	bitmap_zero(persistent_gnt->flags, PERSISTENT_GNT_FLAGS_SIZE);
+	set_bit(PERSISTENT_GNT_ACTIVE, persistent_gnt->flags);
+	/* Add new node and rebalance tree. */
+	rb_link_node(&persistent_gnt->node, parent, new);
+	rb_insert_color(&persistent_gnt->node, &tree->root);
+	tree->gnt_c++;
+	atomic_inc(&tree->gnt_in_use);
+	return 0;
+}
+
+struct persistent_gnt *get_persistent_gnt(struct persistent_gnt_tree *tree,
+					  grant_ref_t gref)
+{
+	struct persistent_gnt *data;
+	struct rb_node *node = NULL;
+
+	node = tree->root.rb_node;
+	while (node) {
+		data = container_of(node, struct persistent_gnt, node);
+
+		if (gref < data->gnt) {
+			node = node->rb_left;
+		} else if (gref > data->gnt) {
+			node = node->rb_right;
+		} else {
+			if (test_bit(PERSISTENT_GNT_ACTIVE, data->flags)) {
+				pr_err("Requesting a grant already in use\n");
+				return ERR_PTR(-EBUSY);
+			}
+			set_bit(PERSISTENT_GNT_ACTIVE, data->flags);
+			atomic_inc(&tree->gnt_in_use);
+			return data;
+		}
+	}
+	return NULL;
+}
+
+void put_persistent_gnt(struct persistent_gnt_tree *tree,
+			struct persistent_gnt *persistent_gnt)
+{
+	if (!test_bit(PERSISTENT_GNT_ACTIVE, persistent_gnt->flags))
+		pr_alert("Freeing a grant already unused\n");
+	set_bit(PERSISTENT_GNT_WAS_ACTIVE, persistent_gnt->flags);
+	clear_bit(PERSISTENT_GNT_ACTIVE, persistent_gnt->flags);
+	atomic_dec(&tree->gnt_in_use);
+}
+
+void free_persistent_gnts(struct persistent_gnt_tree *tree, unsigned int num)
+
+{
+	struct gnttab_unmap_grant_ref unmap[FATAL_SKB_SLOTS_DEFAULT];
+	struct page *pages[FATAL_SKB_SLOTS_DEFAULT];
+	struct persistent_gnt *persistent_gnt;
+	struct rb_root *root = &tree->root;
+	struct rb_node *n;
+	int ret = 0;
+	int pages_to_unmap = 0;
+	void *addr;
+
+	foreach_grant_safe(persistent_gnt, n, root, node) {
+		BUG_ON(persistent_gnt->handle ==
+			NETBACK_INVALID_HANDLE);
+
+		addr = pfn_to_kaddr(page_to_pfn(persistent_gnt->page));
+		gnttab_set_unmap_op(&unmap[pages_to_unmap],
+				    (unsigned long)addr,
+				    GNTMAP_host_map | GNTMAP_readonly,
+				    persistent_gnt->handle);
+
+		pages[pages_to_unmap] = persistent_gnt->page;
+
+		if (++pages_to_unmap == FATAL_SKB_SLOTS_DEFAULT ||
+		    !rb_next(&persistent_gnt->node)) {
+			ret = gnttab_unmap_refs(unmap, NULL, pages,
+						pages_to_unmap);
+			BUG_ON(ret);
+			put_free_pages(tree, pages, pages_to_unmap);
+			pages_to_unmap = 0;
+		}
+
+		rb_erase(&persistent_gnt->node, root);
+		kfree(persistent_gnt);
+		num--;
+	}
+	BUG_ON(num != 0);
+}
+
+int get_free_page(struct persistent_gnt_tree *tree,
+		  struct page **page)
+{
+	if (list_empty(&tree->free_pages)) {
+		BUG_ON(tree->free_pages_num != 0);
+		return 1;
+	}
+	BUG_ON(tree->free_pages_num == 0);
+	page[0] = list_first_entry(&tree->free_pages, struct page, lru);
+	list_del(&page[0]->lru);
+	tree->free_pages_num--;
+	return 0;
+}
+
+void put_free_pages(struct persistent_gnt_tree *tree,
+		    struct page **page, int num)
+{
+	int i;
+
+	for (i = 0; i < num; i++)
+		list_add(&page[i]->lru, &tree->free_pages);
+	tree->free_pages_num += num;
+}
+
 static inline unsigned long idx_to_pfn(struct xenvif_queue *queue,
 				       u16 idx)
 {
-- 
2.1.3

^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [RFC PATCH 01/13] xen-netback: add persistent grant tree ops
@ 2015-05-12 17:18   ` Joao Martins
  0 siblings, 0 replies; 98+ messages in thread
From: Joao Martins @ 2015-05-12 17:18 UTC (permalink / raw)
  To: xen-devel, netdev
  Cc: wei.liu2, ian.campbell, Joao Martins, david.vrabel, boris.ostrovsky

Implement the necessary routines for managing the grant tree. These
routines are ported from blkback driver and slightly modified to be
more generic. This patch is separated because it relates to code that
could be shared with other drivers, in case persistent grants are adopted.

The changes compared to blkback are: declaring a struct persistent_gnt_tree
to store grant tree info so that these routines are called with a tree
argument rather than a driver private data structure. It has a pool of
free pages that should be used for grant maps to be added to the tree.
We can't sleep on xenvif_tx_action/xenvif_start_xmit, so this pool is
prefilled with xen ballooned pages when initializing the tree.

Regarding *_persistent_gnt API changes: get_persistent_gnt() will return
ERR_PTR(-EBUSY) if we try to fetch an already in use grant ref. This is
useful on netback case so that we fallback to map/unmap in case we try to
fetch an already  in use grant. This way we save a map (plus unmap on
error) and prevent the error on add_persistent_gnt that would also lead
towards dropping the packet.

Signed-off-by: Joao Martins <joao.martins@neclab.eu>
---
 drivers/net/xen-netback/common.h  |  57 +++++++++++++++
 drivers/net/xen-netback/netback.c | 145 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 202 insertions(+)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index 8a495b3..dd02386 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -106,6 +106,48 @@ struct xenvif_rx_meta {
 /* IRQ name is queue name with "-tx" or "-rx" appended */
 #define IRQ_NAME_SIZE (QUEUE_NAME_SIZE + 3)
 
+/* Number of available flags */
+#define PERSISTENT_GNT_FLAGS_SIZE      2
+/* This persistent grant is currently in use */
+#define PERSISTENT_GNT_ACTIVE          0
+/* This persistent grant has been used, this flag is set when we remove the
+ * PERSISTENT_GNT_ACTIVE, to know that this grant has been used recently.
+ */
+#define PERSISTENT_GNT_WAS_ACTIVE      1
+
+struct persistent_gnt {
+	struct page *page; /* mapped page */
+	grant_ref_t gnt;
+	grant_handle_t handle;
+	DECLARE_BITMAP(flags, PERSISTENT_GNT_FLAGS_SIZE);
+	struct rb_node node;
+};
+
+struct persistent_gnt_tree {
+	/* Tree to store persistent grants */
+	struct rb_root root;
+
+	/* Number of grants in use */
+	atomic_t gnt_in_use;
+
+	/* Number of grants in the tree */
+	unsigned int gnt_c;
+
+	/* Maximum number of grants in the tree */
+	unsigned int gnt_max;
+
+	/* True if we reached maximum number of
+	 * persistent grants in the tree
+	 */
+	bool overflow;
+
+	/* Free pages for grant maps */
+	struct list_head free_pages;
+
+	/* Initialized with <gnt_max> pages */
+	unsigned int free_pages_num;
+};
+
 struct xenvif;
 
 struct xenvif_stats {
@@ -224,6 +266,7 @@ struct xenvif {
 	u8 can_sg:1;
 	u8 ip_csum:1;
 	u8 ipv6_csum:1;
+	u8 persistent_grants:1;
 
 	/* Is this interface disabled? True when backend discovers
 	 * frontend is rogue.
@@ -344,4 +387,18 @@ void xenvif_skb_zerocopy_prepare(struct xenvif_queue *queue,
 				 struct sk_buff *skb);
 void xenvif_skb_zerocopy_complete(struct xenvif_queue *queue);
 
+/* tree ops for persistent grants */
+struct persistent_gnt *get_persistent_gnt(struct persistent_gnt_tree *tree,
+					  grant_ref_t gref);
+int add_persistent_gnt(struct persistent_gnt_tree *tree,
+		       struct persistent_gnt *persistent_gnt);
+void put_persistent_gnt(struct persistent_gnt_tree *tree,
+			struct persistent_gnt *persistent_gnt);
+void free_persistent_gnts(struct persistent_gnt_tree *tree, unsigned int num);
+/* Gets one page from the free pool in the tree */
+int get_free_page(struct persistent_gnt_tree *tree, struct page **page);
+/* Adds pages to the free pool in the tree */
+void put_free_pages(struct persistent_gnt_tree *tree, struct page **page,
+		    int num);
+
 #endif /* __XEN_NETBACK__COMMON_H__ */
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 4de46aa..8df0a73 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -107,6 +107,151 @@ static struct xen_netif_rx_response *make_rx_response(struct xenvif_queue *queue
 					     u16      size,
 					     u16      flags);
 
+#define foreach_grant_safe(pos, n, rbtree, node) \
+	for ((pos) = container_of(rb_first((rbtree)), typeof(*(pos)), node), \
+	     (n) = (&(pos)->node) ? rb_next(&(pos)->node) : NULL; \
+	     &(pos)->node; \
+	     (pos) = container_of(n, typeof(*(pos)), node), \
+	     (n) = (&(pos)->node) ? rb_next(&(pos)->node) : NULL)
+
+int add_persistent_gnt(struct persistent_gnt_tree *tree,
+		       struct persistent_gnt *persistent_gnt)
+{
+	struct rb_node **new = NULL, *parent = NULL;
+	struct persistent_gnt *this;
+
+	if (tree->gnt_c >= tree->gnt_max) {
+		pr_err("Using maximum number of peristent grants\n");
+		tree->overflow = true;
+		return -EBUSY;
+	}
+	/* Figure out where to put new node */
+	new = &tree->root.rb_node;
+	while (*new) {
+		this = container_of(*new, struct persistent_gnt, node);
+
+		parent = *new;
+		if (persistent_gnt->gnt < this->gnt) {
+			new = &((*new)->rb_left);
+		} else if (persistent_gnt->gnt > this->gnt) {
+			new = &((*new)->rb_right);
+		} else {
+			pr_err("Trying to add a gref that's already in the tree\n");
+			return -EINVAL;
+		}
+	}
+
+	bitmap_zero(persistent_gnt->flags, PERSISTENT_GNT_FLAGS_SIZE);
+	set_bit(PERSISTENT_GNT_ACTIVE, persistent_gnt->flags);
+	/* Add new node and rebalance tree. */
+	rb_link_node(&persistent_gnt->node, parent, new);
+	rb_insert_color(&persistent_gnt->node, &tree->root);
+	tree->gnt_c++;
+	atomic_inc(&tree->gnt_in_use);
+	return 0;
+}
+
+struct persistent_gnt *get_persistent_gnt(struct persistent_gnt_tree *tree,
+					  grant_ref_t gref)
+{
+	struct persistent_gnt *data;
+	struct rb_node *node = NULL;
+
+	node = tree->root.rb_node;
+	while (node) {
+		data = container_of(node, struct persistent_gnt, node);
+
+		if (gref < data->gnt) {
+			node = node->rb_left;
+		} else if (gref > data->gnt) {
+			node = node->rb_right;
+		} else {
+			if (test_bit(PERSISTENT_GNT_ACTIVE, data->flags)) {
+				pr_err("Requesting a grant already in use\n");
+				return ERR_PTR(-EBUSY);
+			}
+			set_bit(PERSISTENT_GNT_ACTIVE, data->flags);
+			atomic_inc(&tree->gnt_in_use);
+			return data;
+		}
+	}
+	return NULL;
+}
+
+void put_persistent_gnt(struct persistent_gnt_tree *tree,
+			struct persistent_gnt *persistent_gnt)
+{
+	if (!test_bit(PERSISTENT_GNT_ACTIVE, persistent_gnt->flags))
+		pr_alert("Freeing a grant already unused\n");
+	set_bit(PERSISTENT_GNT_WAS_ACTIVE, persistent_gnt->flags);
+	clear_bit(PERSISTENT_GNT_ACTIVE, persistent_gnt->flags);
+	atomic_dec(&tree->gnt_in_use);
+}
+
+void free_persistent_gnts(struct persistent_gnt_tree *tree,
+			  unsigned int num)
+{
+	struct gnttab_unmap_grant_ref unmap[FATAL_SKB_SLOTS_DEFAULT];
+	struct page *pages[FATAL_SKB_SLOTS_DEFAULT];
+	struct persistent_gnt *persistent_gnt;
+	struct rb_root *root = &tree->root;
+	struct rb_node *n;
+	int ret = 0;
+	int pages_to_unmap = 0;
+	void *addr;
+
+	foreach_grant_safe(persistent_gnt, n, root, node) {
+		BUG_ON(persistent_gnt->handle ==
+			NETBACK_INVALID_HANDLE);
+
+		addr = pfn_to_kaddr(page_to_pfn(persistent_gnt->page));
+		gnttab_set_unmap_op(&unmap[pages_to_unmap],
+				    (unsigned long)addr,
+				    GNTMAP_host_map | GNTMAP_readonly,
+				    persistent_gnt->handle);
+
+		pages[pages_to_unmap] = persistent_gnt->page;
+
+		if (++pages_to_unmap == FATAL_SKB_SLOTS_DEFAULT ||
+		    !rb_next(&persistent_gnt->node)) {
+			ret = gnttab_unmap_refs(unmap, NULL, pages,
+						pages_to_unmap);
+			BUG_ON(ret);
+			put_free_pages(tree, pages, pages_to_unmap);
+			pages_to_unmap = 0;
+		}
+
+		rb_erase(&persistent_gnt->node, root);
+		kfree(persistent_gnt);
+		num--;
+	}
+	BUG_ON(num != 0);
+}
+
+int get_free_page(struct persistent_gnt_tree *tree,
+		  struct page **page)
+{
+	if (list_empty(&tree->free_pages)) {
+		BUG_ON(tree->free_pages_num != 0);
+		return 1;
+	}
+	BUG_ON(tree->free_pages_num == 0);
+	page[0] = list_first_entry(&tree->free_pages, struct page, lru);
+	list_del(&page[0]->lru);
+	tree->free_pages_num--;
+	return 0;
+}
+
+void put_free_pages(struct persistent_gnt_tree *tree,
+		    struct page **page, int num)
+{
+	int i;
+
+	for (i = 0; i < num; i++)
+		list_add(&page[i]->lru, &tree->free_pages);
+	tree->free_pages_num += num;
+}
+
 static inline unsigned long idx_to_pfn(struct xenvif_queue *queue,
 				       u16 idx)
 {
-- 
2.1.3

* [RFC PATCH 02/13] xen-netback: xenbus feature persistent support
  2015-05-12 17:18 ` Joao Martins
@ 2015-05-12 17:18   ` Joao Martins
  -1 siblings, 0 replies; 98+ messages in thread
From: Joao Martins @ 2015-05-12 17:18 UTC (permalink / raw)
  To: xen-devel, netdev
  Cc: wei.liu2, ian.campbell, Joao Martins, david.vrabel, boris.ostrovsky

Checks for "feature-persistent", which indicates persistent grant
support. Adds a max_persistent_grants module param specifying the
maximum number of persistent grants; setting it to zero disables
persistent grants.

Signed-off-by: Joao Martins <joao.martins@neclab.eu>
---
 drivers/net/xen-netback/common.h  |  1 +
 drivers/net/xen-netback/netback.c |  5 +++++
 drivers/net/xen-netback/xenbus.c  | 13 +++++++++++++
 3 files changed, 19 insertions(+)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index dd02386..e70ace7 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -378,6 +378,7 @@ extern bool separate_tx_rx_irq;
 extern unsigned int rx_drain_timeout_msecs;
 extern unsigned int rx_stall_timeout_msecs;
 extern unsigned int xenvif_max_queues;
+extern unsigned int xenvif_max_pgrants;
 
 #ifdef CONFIG_DEBUG_FS
 extern struct dentry *xen_netback_dbg_root;
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 8df0a73..332e489 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -72,6 +72,11 @@ module_param_named(max_queues, xenvif_max_queues, uint, 0644);
 MODULE_PARM_DESC(max_queues,
 		 "Maximum number of queues per virtual interface");
 
+unsigned int xenvif_max_pgrants = XEN_NETIF_RX_RING_SIZE;
+module_param_named(max_persistent_grants, xenvif_max_pgrants, uint, 0644);
+MODULE_PARM_DESC(max_persistent_grants,
+		 "Maximum number of grants to map persistently");
+
 /*
  * This is the maximum slots a skb can have. If a guest sends a skb
  * which exceeds this limit it is considered malicious.
diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
index 3d8dbf5..766f7e5 100644
--- a/drivers/net/xen-netback/xenbus.c
+++ b/drivers/net/xen-netback/xenbus.c
@@ -331,6 +331,14 @@ static int netback_probe(struct xenbus_device *dev,
 		goto fail;
 	}
 
+	/* Persistent grants support. This is an optional feature. */
+	err = xenbus_printf(XBT_NIL, dev->nodename,
+			    "feature-persistent", "%d", xenvif_max_pgrants > 0);
+	if (err) {
+		message = "writing feature-persistent";
+		goto abort_transaction;
+	}
+
 	/*
 	 * Split event channels support, this is optional so it is not
 	 * put inside the above loop.
@@ -961,6 +969,11 @@ static int read_xenbus_vif_flags(struct backend_info *be)
 		val = 0;
 	vif->can_sg = !!val;
 
+	if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-persistent",
+			 "%d", &val) < 0)
+		val = 0;
+	vif->persistent_grants = (xenvif_max_pgrants && !!val);
+
 	vif->gso_mask = 0;
 	vif->gso_prefix_mask = 0;
 
-- 
2.1.3

* [RFC PATCH 03/13] xen-netback: implement TX persistent grants
  2015-05-12 17:18 ` Joao Martins
@ 2015-05-12 17:18   ` Joao Martins
  -1 siblings, 0 replies; 98+ messages in thread
From: Joao Martins @ 2015-05-12 17:18 UTC (permalink / raw)
  To: xen-devel, netdev
  Cc: wei.liu2, ian.campbell, Joao Martins, david.vrabel, boris.ostrovsky

Introduces persistent grants for the TX path, which follows a code path
similar to grant mapping.

It starts by checking whether a persistent grant is available for the
header and frag grefs, and if so sets it in tx_pgrants. If no
persistent grant is found in the tree for the header, it resorts to
grant copy (but prepares the map ops and adds them later). For the
frags it uses the tree page pool, and if no pages are left it falls
back to grant map/unmap using mmap_pages. When the skb destructor
callback gets called, we release the slot and persistent grant within
the callback to avoid waking up the dealloc thread; as long as there
are no unmaps to be done, the dealloc thread remains inactive.

Results show an improvement of 46% (1.82 vs 1.24 Mpps, 64 byte packet
size) measured with pktgen and of over 48% (21.6 vs 14.5 Gbit/s)
measured with iperf (TCP) with 4 parallel flows on a 1-queue vif, DomU
to Dom0. Tests were run on an Intel Xeon E5-1650 v2 with HT disabled.

Signed-off-by: Joao Martins <joao.martins@neclab.eu>
---
 drivers/net/xen-netback/common.h    |  12 ++
 drivers/net/xen-netback/interface.c |  46 +++++
 drivers/net/xen-netback/netback.c   | 341 +++++++++++++++++++++++++++++++-----
 3 files changed, 360 insertions(+), 39 deletions(-)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index e70ace7..e5ee220 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -191,6 +191,15 @@ struct xenvif_queue { /* Per-queue data for xenvif */
 	struct gnttab_copy tx_copy_ops[MAX_PENDING_REQS];
 	struct gnttab_map_grant_ref tx_map_ops[MAX_PENDING_REQS];
 	struct gnttab_unmap_grant_ref tx_unmap_ops[MAX_PENDING_REQS];
+
+	/* Tree to store the TX grants
+	 * Only used if feature-persistent = 1
+	 */
+	struct persistent_gnt_tree tx_gnts_tree;
+	struct page *tx_gnts_pages[XEN_NETIF_TX_RING_SIZE];
+	/* persistent grants in use */
+	struct persistent_gnt *tx_pgrants[MAX_PENDING_REQS];
+
 	/* passed to gnttab_[un]map_refs with pages under (un)mapping */
 	struct page *pages_to_map[MAX_PENDING_REQS];
 	struct page *pages_to_unmap[MAX_PENDING_REQS];
@@ -361,6 +370,9 @@ void xenvif_zerocopy_callback(struct ubuf_info *ubuf, bool zerocopy_success);
 
 /* Unmap a pending page and release it back to the guest */
 void xenvif_idx_unmap(struct xenvif_queue *queue, u16 pending_idx);
+void xenvif_page_unmap(struct xenvif_queue *queue,
+		       grant_handle_t handle,
+		       struct page **page);
 
 static inline pending_ring_idx_t nr_pending_reqs(struct xenvif_queue *queue)
 {
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index 1a83e19..6f996ac 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -456,6 +456,34 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
 	return vif;
 }
 
+static int init_persistent_gnt_tree(struct persistent_gnt_tree *tree,
+				    struct page **pages, int max)
+{
+	int err;
+
+	tree->gnt_max = min_t(unsigned, max, xenvif_max_pgrants);
+	tree->root.rb_node = NULL;
+	atomic_set(&tree->gnt_in_use, 0);
+
+	err = gnttab_alloc_pages(tree->gnt_max, pages);
+	if (!err) {
+		tree->free_pages_num = 0;
+		INIT_LIST_HEAD(&tree->free_pages);
+		put_free_pages(tree, pages, tree->gnt_max);
+	}
+
+	return err;
+}
+
+static void deinit_persistent_gnt_tree(struct persistent_gnt_tree *tree,
+				       struct page **pages)
+{
+	free_persistent_gnts(tree, tree->gnt_c);
+	BUG_ON(!RB_EMPTY_ROOT(&tree->root));
+	tree->gnt_c = 0;
+	gnttab_free_pages(tree->gnt_max, pages);
+}
+
 int xenvif_init_queue(struct xenvif_queue *queue)
 {
 	int err, i;
@@ -496,9 +524,23 @@ int xenvif_init_queue(struct xenvif_queue *queue)
 			  .ctx = NULL,
 			  .desc = i };
 		queue->grant_tx_handle[i] = NETBACK_INVALID_HANDLE;
+		queue->tx_pgrants[i] = NULL;
+	}
+
+	if (queue->vif->persistent_grants) {
+		err = init_persistent_gnt_tree(&queue->tx_gnts_tree,
+					       queue->tx_gnts_pages,
+					       XEN_NETIF_TX_RING_SIZE);
+		if (err)
+			goto err_disable;
 	}
 
 	return 0;
+
+err_disable:
+	netdev_err(queue->vif->dev, "Could not reserve tree pages\n");
+	queue->vif->persistent_grants = 0;
+	return 0;
 }
 
 void xenvif_carrier_on(struct xenvif *vif)
@@ -654,6 +696,10 @@ void xenvif_disconnect(struct xenvif *vif)
 		}
 
 		xenvif_unmap_frontend_rings(queue);
+
+		if (queue->vif->persistent_grants)
+			deinit_persistent_gnt_tree(&queue->tx_gnts_tree,
+						   queue->tx_gnts_pages);
 	}
 }
 
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 332e489..529d7c3 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -269,6 +269,11 @@ static inline unsigned long idx_to_kaddr(struct xenvif_queue *queue,
 	return (unsigned long)pfn_to_kaddr(idx_to_pfn(queue, idx));
 }
 
+static inline void *page_to_kaddr(struct page *page)
+{
+	return pfn_to_kaddr(page_to_pfn(page));
+}
+
 #define callback_param(vif, pending_idx) \
 	(vif->pending_tx_info[pending_idx].callback_struct)
 
@@ -299,6 +304,29 @@ static inline pending_ring_idx_t pending_index(unsigned i)
 	return i & (MAX_PENDING_REQS-1);
 }
 
+/* Creates a new persistent grant and adds it to the tree.
+ */
+static struct persistent_gnt *xenvif_pgrant_new(struct persistent_gnt_tree *tree,
+						struct gnttab_map_grant_ref *gop)
+{
+	struct persistent_gnt *persistent_gnt;
+
+	persistent_gnt = kmalloc(sizeof(*persistent_gnt), GFP_KERNEL);
+	if (!persistent_gnt)
+		return NULL;
+
+	persistent_gnt->gnt = gop->ref;
+	persistent_gnt->page = virt_to_page(gop->host_addr);
+	persistent_gnt->handle = gop->handle;
+
+	if (unlikely(add_persistent_gnt(tree, persistent_gnt))) {
+		kfree(persistent_gnt);
+		persistent_gnt = NULL;
+	}
+
+	return persistent_gnt;
+}
+
 bool xenvif_rx_ring_slots_available(struct xenvif_queue *queue, int needed)
 {
 	RING_IDX prod, cons;
@@ -927,22 +955,59 @@ static int xenvif_count_requests(struct xenvif_queue *queue,
 
 struct xenvif_tx_cb {
 	u16 pending_idx;
+	bool pending_map;
 };
 
 #define XENVIF_TX_CB(skb) ((struct xenvif_tx_cb *)(skb)->cb)
 
+static inline void xenvif_pgrant_set(struct xenvif_queue *queue,
+				     u16 pending_idx,
+				     struct persistent_gnt *pgrant)
+{
+	if (unlikely(queue->tx_pgrants[pending_idx])) {
+		netdev_err(queue->vif->dev,
+			   "Trying to overwrite an active persistent grant! pending_idx: %x\n",
+			   pending_idx);
+		BUG();
+	}
+	queue->tx_pgrants[pending_idx] = pgrant;
+}
+
+static inline void xenvif_pgrant_reset(struct xenvif_queue *queue,
+				       u16 pending_idx)
+{
+	struct persistent_gnt *pgrant = queue->tx_pgrants[pending_idx];
+
+	if (unlikely(!pgrant)) {
+		netdev_err(queue->vif->dev,
+			   "Trying to release an inactive persistent grant! pending_idx: %x\n",
+			   pending_idx);
+		BUG();
+	}
+	put_persistent_gnt(&queue->tx_gnts_tree, pgrant);
+	queue->tx_pgrants[pending_idx] = NULL;
+}
+
 static inline void xenvif_tx_create_map_op(struct xenvif_queue *queue,
-					  u16 pending_idx,
-					  struct xen_netif_tx_request *txp,
-					  struct gnttab_map_grant_ref *mop)
+					   u16 pending_idx,
+					   struct xen_netif_tx_request *txp,
+					   struct gnttab_map_grant_ref *mop,
+					   bool use_persistent_gnts)
 {
-	queue->pages_to_map[mop-queue->tx_map_ops] = queue->mmap_pages[pending_idx];
-	gnttab_set_map_op(mop, idx_to_kaddr(queue, pending_idx),
+	struct page *page = NULL;
+
+	if (use_persistent_gnts &&
+	    get_free_page(&queue->tx_gnts_tree, &page)) {
+		xenvif_pgrant_reset(queue, pending_idx);
+		use_persistent_gnts = false;
+	}
+
+	page = (!use_persistent_gnts ? queue->mmap_pages[pending_idx] : page);
+	queue->pages_to_map[mop - queue->tx_map_ops] = page;
+	gnttab_set_map_op(mop,
+			  (unsigned long)page_to_kaddr(page),
 			  GNTMAP_host_map | GNTMAP_readonly,
 			  txp->gref, queue->vif->domid);
-
-	memcpy(&queue->pending_tx_info[pending_idx].req, txp,
-	       sizeof(*txp));
 }
 
 static inline struct sk_buff *xenvif_alloc_skb(unsigned int size)
@@ -962,6 +1027,39 @@ static inline struct sk_buff *xenvif_alloc_skb(unsigned int size)
 	return skb;
 }
 
+/* Checks if there's a persistent grant available for gref and
+ * if so, set it also in the tx_pgrants array that keeps the ones
+ * in use.
+ */
+static bool xenvif_tx_pgrant_available(struct xenvif_queue *queue,
+				       grant_ref_t ref, u16 pending_idx,
+				       bool *can_map)
+{
+	struct persistent_gnt_tree *tree = &queue->tx_gnts_tree;
+	struct persistent_gnt *persistent_gnt;
+	bool busy;
+
+	if (!queue->vif->persistent_grants)
+		return false;
+
+	persistent_gnt = get_persistent_gnt(tree, ref);
+
+	/* If gref is already in use we fall back, since it would
+	 * otherwise mean re-adding the same gref to the tree
+	 */
+	busy = IS_ERR(persistent_gnt);
+	if (unlikely(busy))
+		persistent_gnt = NULL;
+
+	xenvif_pgrant_set(queue, pending_idx, persistent_gnt);
+	if (likely(persistent_gnt))
+		return true;
+
+	/* Check if we can create another persistent grant */
+	*can_map = (!busy && tree->free_pages_num);
+	return false;
+}
+
 static struct gnttab_map_grant_ref *xenvif_get_requests(struct xenvif_queue *queue,
 							struct sk_buff *skb,
 							struct xen_netif_tx_request *txp,
@@ -973,6 +1071,7 @@ static struct gnttab_map_grant_ref *xenvif_get_requests(struct xenvif_queue *que
 	int start;
 	pending_ring_idx_t index;
 	unsigned int nr_slots, frag_overflow = 0;
+	bool map_pgrant = false;
 
 	/* At this point shinfo->nr_frags is in fact the number of
 	 * slots, which can be as large as XEN_NETBK_LEGACY_SLOTS_MAX.
@@ -988,11 +1087,16 @@ static struct gnttab_map_grant_ref *xenvif_get_requests(struct xenvif_queue *que
 	start = (frag_get_pending_idx(&shinfo->frags[0]) == pending_idx);
 
 	for (shinfo->nr_frags = start; shinfo->nr_frags < nr_slots;
-	     shinfo->nr_frags++, txp++, gop++) {
+	     shinfo->nr_frags++, txp++) {
 		index = pending_index(queue->pending_cons++);
 		pending_idx = queue->pending_ring[index];
-		xenvif_tx_create_map_op(queue, pending_idx, txp, gop);
 		frag_set_pending_idx(&frags[shinfo->nr_frags], pending_idx);
+		memcpy(&queue->pending_tx_info[pending_idx].req, txp,
+		       sizeof(*txp));
+		if (!xenvif_tx_pgrant_available(queue, txp->gref, pending_idx,
+						&map_pgrant))
+			xenvif_tx_create_map_op(queue, pending_idx, txp, gop++,
+						map_pgrant);
 	}
 
 	if (frag_overflow) {
@@ -1006,14 +1110,21 @@ static struct gnttab_map_grant_ref *xenvif_get_requests(struct xenvif_queue *que
 
 		shinfo = skb_shinfo(nskb);
 		frags = shinfo->frags;
+		map_pgrant = false;
 
 		for (shinfo->nr_frags = 0; shinfo->nr_frags < frag_overflow;
-		     shinfo->nr_frags++, txp++, gop++) {
+		     shinfo->nr_frags++, txp++) {
 			index = pending_index(queue->pending_cons++);
 			pending_idx = queue->pending_ring[index];
-			xenvif_tx_create_map_op(queue, pending_idx, txp, gop);
 			frag_set_pending_idx(&frags[shinfo->nr_frags],
 					     pending_idx);
+			memcpy(&queue->pending_tx_info[pending_idx].req, txp,
+			       sizeof(*txp));
+			if (!xenvif_tx_pgrant_available(queue, txp->gref,
+							pending_idx,
+							&map_pgrant))
+				xenvif_tx_create_map_op(queue, pending_idx, txp,
+							gop++, map_pgrant);
 		}
 
 		skb_shinfo(skb)->frag_list = nskb;
@@ -1049,6 +1160,65 @@ static inline void xenvif_grant_handle_reset(struct xenvif_queue *queue,
 	queue->grant_tx_handle[pending_idx] = NETBACK_INVALID_HANDLE;
 }
 
+/* Creates a new persistent grant and sets it in tx_pgrants.
+ * In case of failure, unmaps the grant and inserts the pages
+ * back in the tree pool.
+ */
+static int xenvif_tx_pgrant_check(struct xenvif_queue *queue,
+				  struct gnttab_map_grant_ref *gop_map,
+				  u16 pending_idx)
+{
+	struct persistent_gnt *persistent_gnt = queue->tx_pgrants[pending_idx];
+	struct page *page = virt_to_page(gop_map->host_addr);
+
+	BUG_ON(persistent_gnt);
+	persistent_gnt = xenvif_pgrant_new(&queue->tx_gnts_tree, gop_map);
+	if (unlikely(!persistent_gnt)) {
+		netdev_err(queue->vif->dev,
+			   "Couldn't add grant! ref: %d pending_idx: %d",
+			   gop_map->ref, pending_idx);
+		xenvif_page_unmap(queue, gop_map->handle, &page);
+		put_free_pages(&queue->tx_gnts_tree, &page, 1);
+		return 1;
+	}
+
+	xenvif_pgrant_set(queue, pending_idx, persistent_gnt);
+	return 0;
+}
+
+/* Skip the frags that have persistent grants active, and if there's
+ * a preceding error, invalidate the frag instead. Returns whether
+ * there are no frags left to check with new grant maps.
+ */
+static int xenvif_tx_pgrant_skip(struct xenvif_queue *queue,
+				 struct skb_shared_info *shinfo,
+				 bool invalidate, unsigned *i)
+{
+	int j;
+	u16 pending_idx;
+	struct persistent_gnt *persistent_gnt;
+
+	for (j = *i; j < shinfo->nr_frags; j++) {
+		pending_idx = frag_get_pending_idx(&shinfo->frags[j]);
+		persistent_gnt = queue->tx_pgrants[pending_idx];
+
+		if (!persistent_gnt)
+			break;
+
+		xenvif_grant_handle_set(queue, pending_idx,
+					persistent_gnt->handle);
+
+		if (unlikely(invalidate)) {
+			xenvif_idx_unmap(queue, pending_idx);
+			xenvif_idx_release(queue, pending_idx,
+					   XEN_NETIF_RSP_OKAY);
+		}
+	}
+
+	*i = j;
+	return !(j < shinfo->nr_frags);
+}
+
 static int xenvif_tx_check_gop(struct xenvif_queue *queue,
 			       struct sk_buff *skb,
 			       struct gnttab_map_grant_ref **gopp_map,
@@ -1067,7 +1237,16 @@ static int xenvif_tx_check_gop(struct xenvif_queue *queue,
 	int nr_frags = shinfo->nr_frags;
 	const bool sharedslot = nr_frags &&
 				frag_get_pending_idx(&shinfo->frags[0]) == pending_idx;
-	int i, err;
+	int i, err = 0;
+	bool pgrantslot = XENVIF_TX_CB(skb)->pending_map;
+	struct page *page;
+
+	/* Check the frags if there's persistent grant for the header */
+	if (likely(queue->tx_pgrants[pending_idx])) {
+		if (!sharedslot)
+			xenvif_pgrant_reset(queue, pending_idx);
+		goto check_frags;
+	}
 
 	/* Check status of header. */
 	err = (*gopp_copy)->status;
@@ -1085,22 +1264,50 @@ static int xenvif_tx_check_gop(struct xenvif_queue *queue,
 	}
 	(*gopp_copy)++;
 
+	if (unlikely(pgrantslot && !sharedslot)) {
+		if (!xenvif_tx_pgrant_check(queue, gop_map++,
+					    pending_idx))
+			xenvif_pgrant_reset(queue, pending_idx);
+	}
+
 check_frags:
 	for (i = 0; i < nr_frags; i++, gop_map++) {
 		int j, newerr;
 
+		/* Skip the frags that use persistent grants */
+		if (xenvif_tx_pgrant_skip(queue, shinfo, err != 0, &i))
+			break;
+
 		pending_idx = frag_get_pending_idx(&shinfo->frags[i]);
+		page = virt_to_page(gop_map->host_addr);
+		pgrantslot = (page != queue->mmap_pages[pending_idx]);
 
 		/* Check error status: if okay then remember grant handle. */
 		newerr = gop_map->status;
 
+		/* Newly mapped grant to be added to the tree.
+		 * Append error in case of tree errors.
+		 */
+		if (!newerr && pgrantslot)
+			newerr |= xenvif_tx_pgrant_check(queue, gop_map,
+							 pending_idx);
+
 		if (likely(!newerr)) {
 			xenvif_grant_handle_set(queue,
 						pending_idx,
 						gop_map->handle);
+
 			/* Had a previous error? Invalidate this fragment. */
 			if (unlikely(err)) {
-				xenvif_idx_unmap(queue, pending_idx);
+				xenvif_page_unmap(queue,
+						  gop_map->handle,
+						  &page);
+				xenvif_grant_handle_reset(queue,
+							  pending_idx);
+				if (pgrantslot)
+					put_free_pages(&queue->tx_gnts_tree,
+						       &page, 1);
+
 				/* If the mapping of the first frag was OK, but
 				 * the header's copy failed, and they are
 				 * sharing a slot, send an error
@@ -1116,7 +1323,7 @@ check_frags:
 		}
 
 		/* Error on this fragment: respond to client with an error. */
-		if (net_ratelimit())
+		if (net_ratelimit() && gop_map->status)
 			netdev_dbg(queue->vif->dev,
 				   "Grant map of %d. frag failed! status: %d pending_idx: %u ref: %u\n",
 				   i,
@@ -1186,6 +1393,7 @@ static void xenvif_fill_frags(struct xenvif_queue *queue, struct sk_buff *skb)
 		struct xen_netif_tx_request *txp;
 		struct page *page;
 		u16 pending_idx;
+		struct persistent_gnt *persistent_gnt = NULL;
 
 		pending_idx = frag_get_pending_idx(frag);
 
@@ -1201,14 +1409,16 @@ static void xenvif_fill_frags(struct xenvif_queue *queue, struct sk_buff *skb)
 		prev_pending_idx = pending_idx;
 
 		txp = &queue->pending_tx_info[pending_idx].req;
-		page = virt_to_page(idx_to_kaddr(queue, pending_idx));
+		persistent_gnt = queue->tx_pgrants[pending_idx];
+		page = (persistent_gnt ? persistent_gnt->page :
+					  queue->mmap_pages[pending_idx]);
 		__skb_fill_page_desc(skb, i, page, txp->offset, txp->size);
 		skb->len += txp->size;
 		skb->data_len += txp->size;
 		skb->truesize += txp->size;
 
 		/* Take an extra reference to offset network stack's put_page */
-		get_page(queue->mmap_pages[pending_idx]);
+		get_page(page);
 	}
 }
 
@@ -1332,17 +1542,21 @@ static void xenvif_tx_build_gops(struct xenvif_queue *queue,
 {
 	struct gnttab_map_grant_ref *gop = queue->tx_map_ops, *request_gop;
 	struct sk_buff *skb;
+	bool use_persistent_gnts = queue->vif->persistent_grants;
 	int ret;
 
 	while (skb_queue_len(&queue->tx_queue) < budget) {
 		struct xen_netif_tx_request txreq;
 		struct xen_netif_tx_request txfrags[XEN_NETBK_LEGACY_SLOTS_MAX];
 		struct xen_netif_extra_info extras[XEN_NETIF_EXTRA_TYPE_MAX-1];
+		struct persistent_gnt *persistent_gnt = NULL;
 		u16 pending_idx;
 		RING_IDX idx;
 		int work_to_do;
 		unsigned int data_len;
 		pending_ring_idx_t index;
+		bool need_map = !use_persistent_gnts;
+		bool map_pgrant = false;
 
 		if (queue->tx.sring->req_prod - queue->tx.req_cons >
 		    XEN_NETIF_TX_RING_SIZE) {
@@ -1432,8 +1646,24 @@ static void xenvif_tx_build_gops(struct xenvif_queue *queue,
 		}
 
 		XENVIF_TX_CB(skb)->pending_idx = pending_idx;
+		XENVIF_TX_CB(skb)->pending_map = false;
 
 		__skb_put(skb, data_len);
+		if (use_persistent_gnts) {
+			xenvif_tx_pgrant_available(queue, txreq.gref,
+						   pending_idx, &map_pgrant);
+			persistent_gnt = queue->tx_pgrants[pending_idx];
+		}
+
+		if (persistent_gnt) {
+			void *saddr = page_to_kaddr(persistent_gnt->page);
+
+			memcpy(skb->data, saddr + txreq.offset, data_len);
+			goto skip_gop;
+		}
+
+		need_map = true;
+
 		queue->tx_copy_ops[*copy_ops].source.u.ref = txreq.gref;
 		queue->tx_copy_ops[*copy_ops].source.domid = queue->vif->domid;
 		queue->tx_copy_ops[*copy_ops].source.offset = txreq.offset;
@@ -1449,20 +1679,28 @@ static void xenvif_tx_build_gops(struct xenvif_queue *queue,
 
 		(*copy_ops)++;
 
+skip_gop:
 		skb_shinfo(skb)->nr_frags = ret;
 		if (data_len < txreq.size) {
 			skb_shinfo(skb)->nr_frags++;
 			frag_set_pending_idx(&skb_shinfo(skb)->frags[0],
 					     pending_idx);
-			xenvif_tx_create_map_op(queue, pending_idx, &txreq, gop);
-			gop++;
 		} else {
 			frag_set_pending_idx(&skb_shinfo(skb)->frags[0],
 					     INVALID_PENDING_IDX);
-			memcpy(&queue->pending_tx_info[pending_idx].req, &txreq,
-			       sizeof(txreq));
+			need_map = use_persistent_gnts && map_pgrant;
+			XENVIF_TX_CB(skb)->pending_map = need_map;
 		}
 
+		if (need_map)
+			xenvif_tx_create_map_op(queue,
+						pending_idx,
+						&txreq, gop++,
+						map_pgrant);
+
+		memcpy(&queue->pending_tx_info[pending_idx].req, &txreq,
+		       sizeof(txreq));
+
 		queue->pending_cons++;
 
 		request_gop = xenvif_get_requests(queue, skb, txfrags, gop);
@@ -1671,16 +1909,26 @@ static int xenvif_tx_submit(struct xenvif_queue *queue)
 void xenvif_zerocopy_callback(struct ubuf_info *ubuf, bool zerocopy_success)
 {
 	unsigned long flags;
-	pending_ring_idx_t index;
+	pending_ring_idx_t index, dealloc_prod_save;
 	struct xenvif_queue *queue = ubuf_to_queue(ubuf);
 
 	/* This is the only place where we grab this lock, to protect callbacks
 	 * from each other.
 	 */
 	spin_lock_irqsave(&queue->callback_lock, flags);
+	dealloc_prod_save = queue->dealloc_prod;
 	do {
 		u16 pending_idx = ubuf->desc;
 		ubuf = (struct ubuf_info *) ubuf->ctx;
+
+		if (queue->vif->persistent_grants &&
+		    queue->tx_pgrants[pending_idx]) {
+			xenvif_pgrant_reset(queue, pending_idx);
+			xenvif_grant_handle_reset(queue, pending_idx);
+			xenvif_idx_release(queue, pending_idx,
+					   XEN_NETIF_RSP_OKAY);
+			continue;
+		}
 		BUG_ON(queue->dealloc_prod - queue->dealloc_cons >=
 			MAX_PENDING_REQS);
 		index = pending_index(queue->dealloc_prod);
@@ -1691,7 +1939,10 @@ void xenvif_zerocopy_callback(struct ubuf_info *ubuf, bool zerocopy_success)
 		smp_wmb();
 		queue->dealloc_prod++;
 	} while (ubuf);
-	wake_up(&queue->dealloc_wq);
+	/* Wake up only when there are grants to unmap */
+	if (dealloc_prod_save != queue->dealloc_prod)
+		wake_up(&queue->dealloc_wq);
+
 	spin_unlock_irqrestore(&queue->callback_lock, flags);
 
 	if (likely(zerocopy_success))
@@ -1779,10 +2030,13 @@ int xenvif_tx_action(struct xenvif_queue *queue, int budget)
 
* [RFC PATCH 03/13] xen-netback: implement TX persistent grants
@ 2015-05-12 17:18   ` Joao Martins
  0 siblings, 0 replies; 98+ messages in thread
From: Joao Martins @ 2015-05-12 17:18 UTC (permalink / raw)
  To: xen-devel, netdev
  Cc: wei.liu2, ian.campbell, Joao Martins, david.vrabel, boris.ostrovsky

Introduces persistent grants for the TX path, which follow a code path
similar to grant mapping.

It starts by checking if a persistent grant is available for the header
and frag grefs and, if so, sets it in tx_pgrants. If no persistent grant
is found in the tree for the header, it resorts to grant copy (but
prepares the map ops and adds them later). For the frags it uses the
tree page pool, and if no pages are available it falls back to grant
map/unmap using mmap_pages. When the skb destructor callback gets
called, we release the slot and persistent grant within the callback to
avoid waking up the dealloc thread. As long as there are no unmaps to
be done, the dealloc thread remains inactive.

Results show an improvement of 46% (1.82 vs 1.24 Mpps, 64 byte packets)
measured with pktgen, and of up to 48% (21.6 vs 14.5 Gbit/s) measured
with iperf (TCP) with 4 parallel flows on a 1-queue vif, DomU to Dom0.
Tests ran on an Intel Xeon E5-1650 v2 with HT disabled.

Signed-off-by: Joao Martins <joao.martins@neclab.eu>
---
 drivers/net/xen-netback/common.h    |  12 ++
 drivers/net/xen-netback/interface.c |  46 +++++
 drivers/net/xen-netback/netback.c   | 341 +++++++++++++++++++++++++++++++-----
 3 files changed, 360 insertions(+), 39 deletions(-)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index e70ace7..e5ee220 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -191,6 +191,15 @@ struct xenvif_queue { /* Per-queue data for xenvif */
 	struct gnttab_copy tx_copy_ops[MAX_PENDING_REQS];
 	struct gnttab_map_grant_ref tx_map_ops[MAX_PENDING_REQS];
 	struct gnttab_unmap_grant_ref tx_unmap_ops[MAX_PENDING_REQS];
+
+	/* Tree to store the TX grants
+	 * Only used if feature-persistent = 1
+	 */
+	struct persistent_gnt_tree tx_gnts_tree;
+	struct page *tx_gnts_pages[XEN_NETIF_TX_RING_SIZE];
+	/* persistent grants in use */
+	struct persistent_gnt *tx_pgrants[MAX_PENDING_REQS];
+
 	/* passed to gnttab_[un]map_refs with pages under (un)mapping */
 	struct page *pages_to_map[MAX_PENDING_REQS];
 	struct page *pages_to_unmap[MAX_PENDING_REQS];
@@ -361,6 +370,9 @@ void xenvif_zerocopy_callback(struct ubuf_info *ubuf, bool zerocopy_success);
 
 /* Unmap a pending page and release it back to the guest */
 void xenvif_idx_unmap(struct xenvif_queue *queue, u16 pending_idx);
+void xenvif_page_unmap(struct xenvif_queue *queue,
+		       grant_handle_t handle,
+		       struct page **page);
 
 static inline pending_ring_idx_t nr_pending_reqs(struct xenvif_queue *queue)
 {
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index 1a83e19..6f996ac 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -456,6 +456,34 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
 	return vif;
 }
 
+static int init_persistent_gnt_tree(struct persistent_gnt_tree *tree,
+				    struct page **pages, int max)
+{
+	int err;
+
+	tree->gnt_max = min_t(unsigned, max, xenvif_max_pgrants);
+	tree->root.rb_node = NULL;
+	atomic_set(&tree->gnt_in_use, 0);
+
+	err = gnttab_alloc_pages(tree->gnt_max, pages);
+	if (!err) {
+		tree->free_pages_num = 0;
+		INIT_LIST_HEAD(&tree->free_pages);
+		put_free_pages(tree, pages, tree->gnt_max);
+	}
+
+	return err;
+}
+
+static void deinit_persistent_gnt_tree(struct persistent_gnt_tree *tree,
+				       struct page **pages)
+{
+	free_persistent_gnts(tree, tree->gnt_c);
+	BUG_ON(!RB_EMPTY_ROOT(&tree->root));
+	tree->gnt_c = 0;
+	gnttab_free_pages(tree->gnt_max, pages);
+}
+
 int xenvif_init_queue(struct xenvif_queue *queue)
 {
 	int err, i;
@@ -496,9 +524,23 @@ int xenvif_init_queue(struct xenvif_queue *queue)
 			  .ctx = NULL,
 			  .desc = i };
 		queue->grant_tx_handle[i] = NETBACK_INVALID_HANDLE;
+		queue->tx_pgrants[i] = NULL;
+	}
+
+	if (queue->vif->persistent_grants) {
+		err = init_persistent_gnt_tree(&queue->tx_gnts_tree,
+					       queue->tx_gnts_pages,
+					       XEN_NETIF_TX_RING_SIZE);
+		if (err)
+			goto err_disable;
 	}
 
 	return 0;
+
+err_disable:
+	netdev_err(queue->vif->dev, "Could not reserve tree pages.\n");
+	queue->vif->persistent_grants = 0;
+	return 0;
 }
 
 void xenvif_carrier_on(struct xenvif *vif)
@@ -654,6 +696,10 @@ void xenvif_disconnect(struct xenvif *vif)
 		}
 
 		xenvif_unmap_frontend_rings(queue);
+
+		if (queue->vif->persistent_grants)
+			deinit_persistent_gnt_tree(&queue->tx_gnts_tree,
+						   queue->tx_gnts_pages);
 	}
 }
 
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 332e489..529d7c3 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -269,6 +269,11 @@ static inline unsigned long idx_to_kaddr(struct xenvif_queue *queue,
 	return (unsigned long)pfn_to_kaddr(idx_to_pfn(queue, idx));
 }
 
+static inline void *page_to_kaddr(struct page *page)
+{
+	return pfn_to_kaddr(page_to_pfn(page));
+}
+
 #define callback_param(vif, pending_idx) \
 	(vif->pending_tx_info[pending_idx].callback_struct)
 
@@ -299,6 +304,29 @@ static inline pending_ring_idx_t pending_index(unsigned i)
 	return i & (MAX_PENDING_REQS-1);
 }
 
+/* Creates a new persistent grant and adds it to the tree.
+ */
+static struct persistent_gnt *xenvif_pgrant_new(struct persistent_gnt_tree *tree,
+						struct gnttab_map_grant_ref *gop)
+{
+	struct persistent_gnt *persistent_gnt;
+
+	persistent_gnt = kmalloc(sizeof(*persistent_gnt), GFP_KERNEL);
+	if (!persistent_gnt)
+		return NULL;
+
+	persistent_gnt->gnt = gop->ref;
+	persistent_gnt->page = virt_to_page(gop->host_addr);
+	persistent_gnt->handle = gop->handle;
+
+	if (unlikely(add_persistent_gnt(tree, persistent_gnt))) {
+		kfree(persistent_gnt);
+		persistent_gnt = NULL;
+	}
+
+	return persistent_gnt;
+}
+
 bool xenvif_rx_ring_slots_available(struct xenvif_queue *queue, int needed)
 {
 	RING_IDX prod, cons;
@@ -927,22 +955,59 @@ static int xenvif_count_requests(struct xenvif_queue *queue,
 
 struct xenvif_tx_cb {
 	u16 pending_idx;
+	bool pending_map;
 };
 
 #define XENVIF_TX_CB(skb) ((struct xenvif_tx_cb *)(skb)->cb)
 
+static inline void xenvif_pgrant_set(struct xenvif_queue *queue,
+				     u16 pending_idx,
+				     struct persistent_gnt *pgrant)
+{
+	if (unlikely(queue->tx_pgrants[pending_idx])) {
+		netdev_err(queue->vif->dev,
+			   "Trying to overwrite an active persistent grant! pending_idx: %x\n",
+			   pending_idx);
+		BUG();
+	}
+	queue->tx_pgrants[pending_idx] = pgrant;
+}
+
+static inline void xenvif_pgrant_reset(struct xenvif_queue *queue,
+				       u16 pending_idx)
+{
+	struct persistent_gnt *pgrant = queue->tx_pgrants[pending_idx];
+
+	if (unlikely(!pgrant)) {
+		netdev_err(queue->vif->dev,
+			   "Trying to release an inactive persistent grant! pending_idx: %x\n",
+			   pending_idx);
+		BUG();
+	}
+	put_persistent_gnt(&queue->tx_gnts_tree, pgrant);
+	queue->tx_pgrants[pending_idx] = NULL;
+}
+
 static inline void xenvif_tx_create_map_op(struct xenvif_queue *queue,
-					  u16 pending_idx,
-					  struct xen_netif_tx_request *txp,
-					  struct gnttab_map_grant_ref *mop)
+					   u16 pending_idx,
+					   struct xen_netif_tx_request *txp,
+					   struct gnttab_map_grant_ref *mop,
+					   bool use_persistent_gnts)
 {
-	queue->pages_to_map[mop-queue->tx_map_ops] = queue->mmap_pages[pending_idx];
-	gnttab_set_map_op(mop, idx_to_kaddr(queue, pending_idx),
+	struct page *page = NULL;
+
+	if (use_persistent_gnts &&
+	    get_free_page(&queue->tx_gnts_tree, &page)) {
+		xenvif_pgrant_reset(queue, pending_idx);
+		use_persistent_gnts = false;
+	}
+
+	page = (!use_persistent_gnts ? queue->mmap_pages[pending_idx] : page);
+	queue->pages_to_map[mop - queue->tx_map_ops] = page;
+	gnttab_set_map_op(mop,
+			  (unsigned long)page_to_kaddr(page),
 			  GNTMAP_host_map | GNTMAP_readonly,
 			  txp->gref, queue->vif->domid);
-
-	memcpy(&queue->pending_tx_info[pending_idx].req, txp,
-	       sizeof(*txp));
 }
 
 static inline struct sk_buff *xenvif_alloc_skb(unsigned int size)
@@ -962,6 +1027,39 @@ static inline struct sk_buff *xenvif_alloc_skb(unsigned int size)
 	return skb;
 }
 
+/* Checks if there's a persistent grant available for gref and
+ * if so, sets it in the tx_pgrants array that tracks the ones
+ * in use.
+ */
+static bool xenvif_tx_pgrant_available(struct xenvif_queue *queue,
+				       grant_ref_t ref, u16 pending_idx,
+				       bool *can_map)
+{
+	struct persistent_gnt_tree *tree = &queue->tx_gnts_tree;
+	struct persistent_gnt *persistent_gnt;
+	bool busy;
+
+	if (!queue->vif->persistent_grants)
+		return false;
+
+	persistent_gnt = get_persistent_gnt(tree, ref);
+
+	/* If gref is already in use we fall back, since it would
+	 * otherwise mean re-adding the same gref to the tree
+	 */
+	busy = IS_ERR(persistent_gnt);
+	if (unlikely(busy))
+		persistent_gnt = NULL;
+
+	xenvif_pgrant_set(queue, pending_idx, persistent_gnt);
+	if (likely(persistent_gnt))
+		return true;
+
+	/* Check if we can create another persistent grant */
+	*can_map = (!busy && tree->free_pages_num);
+	return false;
+}
+
 static struct gnttab_map_grant_ref *xenvif_get_requests(struct xenvif_queue *queue,
 							struct sk_buff *skb,
 							struct xen_netif_tx_request *txp,
@@ -973,6 +1071,7 @@ static struct gnttab_map_grant_ref *xenvif_get_requests(struct xenvif_queue *que
 	int start;
 	pending_ring_idx_t index;
 	unsigned int nr_slots, frag_overflow = 0;
+	bool map_pgrant = false;
 
 	/* At this point shinfo->nr_frags is in fact the number of
 	 * slots, which can be as large as XEN_NETBK_LEGACY_SLOTS_MAX.
@@ -988,11 +1087,16 @@ static struct gnttab_map_grant_ref *xenvif_get_requests(struct xenvif_queue *que
 	start = (frag_get_pending_idx(&shinfo->frags[0]) == pending_idx);
 
 	for (shinfo->nr_frags = start; shinfo->nr_frags < nr_slots;
-	     shinfo->nr_frags++, txp++, gop++) {
+	     shinfo->nr_frags++, txp++) {
 		index = pending_index(queue->pending_cons++);
 		pending_idx = queue->pending_ring[index];
-		xenvif_tx_create_map_op(queue, pending_idx, txp, gop);
 		frag_set_pending_idx(&frags[shinfo->nr_frags], pending_idx);
+		memcpy(&queue->pending_tx_info[pending_idx].req, txp,
+		       sizeof(*txp));
+		if (!xenvif_tx_pgrant_available(queue, txp->gref, pending_idx,
+						&map_pgrant))
+			xenvif_tx_create_map_op(queue, pending_idx, txp, gop++,
+						map_pgrant);
 	}
 
 	if (frag_overflow) {
@@ -1006,14 +1110,21 @@ static struct gnttab_map_grant_ref *xenvif_get_requests(struct xenvif_queue *que
 
 		shinfo = skb_shinfo(nskb);
 		frags = shinfo->frags;
+		map_pgrant = false;
 
 		for (shinfo->nr_frags = 0; shinfo->nr_frags < frag_overflow;
-		     shinfo->nr_frags++, txp++, gop++) {
+		     shinfo->nr_frags++, txp++) {
 			index = pending_index(queue->pending_cons++);
 			pending_idx = queue->pending_ring[index];
-			xenvif_tx_create_map_op(queue, pending_idx, txp, gop);
 			frag_set_pending_idx(&frags[shinfo->nr_frags],
 					     pending_idx);
+			memcpy(&queue->pending_tx_info[pending_idx].req, txp,
+			       sizeof(*txp));
+			if (!xenvif_tx_pgrant_available(queue, txp->gref,
+							pending_idx,
+							&map_pgrant))
+				xenvif_tx_create_map_op(queue, pending_idx, txp,
+							gop++, map_pgrant);
 		}
 
 		skb_shinfo(skb)->frag_list = nskb;
@@ -1049,6 +1160,65 @@ static inline void xenvif_grant_handle_reset(struct xenvif_queue *queue,
 	queue->grant_tx_handle[pending_idx] = NETBACK_INVALID_HANDLE;
 }
 
+/* Creates a new persistent grant and sets it in tx_pgrants.
+ * In case of failure, unmaps the grant and inserts the pages
+ * back into the tree pool.
+ */
+static int xenvif_tx_pgrant_check(struct xenvif_queue *queue,
+				  struct gnttab_map_grant_ref *gop_map,
+				  u16 pending_idx)
+{
+	struct persistent_gnt *persistent_gnt = queue->tx_pgrants[pending_idx];
+	struct page *page = virt_to_page(gop_map->host_addr);
+
+	BUG_ON(persistent_gnt);
+	persistent_gnt = xenvif_pgrant_new(&queue->tx_gnts_tree, gop_map);
+	if (unlikely(!persistent_gnt)) {
+		netdev_err(queue->vif->dev,
+			   "Couldn't add grant! ref: %d pending_idx: %d",
+			   gop_map->ref, pending_idx);
+		xenvif_page_unmap(queue, gop_map->handle, &page);
+		put_free_pages(&queue->tx_gnts_tree, &page, 1);
+		return 1;
+	}
+
+	xenvif_pgrant_set(queue, pending_idx, persistent_gnt);
+	return 0;
+}
+
+/* Skip the frags that have persistent grants active, and if there's
+ * a preceding error, invalidate the frag instead. Returns whether
+ * there are still frags to check with new grant maps.
+ */
+static int xenvif_tx_pgrant_skip(struct xenvif_queue *queue,
+				 struct skb_shared_info *shinfo,
+				 bool invalidate, unsigned *i)
+{
+	int j;
+	u16 pending_idx;
+	struct persistent_gnt *persistent_gnt;
+
+	for (j = *i; j < shinfo->nr_frags; j++) {
+		pending_idx = frag_get_pending_idx(&shinfo->frags[j]);
+		persistent_gnt = queue->tx_pgrants[pending_idx];
+
+		if (!persistent_gnt)
+			break;
+
+		xenvif_grant_handle_set(queue, pending_idx,
+					persistent_gnt->handle);
+
+		if (unlikely(invalidate)) {
+			xenvif_idx_unmap(queue, pending_idx);
+			xenvif_idx_release(queue, pending_idx,
+					   XEN_NETIF_RSP_OKAY);
+		}
+	}
+
+	*i = j;
+	return !(j < shinfo->nr_frags);
+}
+
 static int xenvif_tx_check_gop(struct xenvif_queue *queue,
 			       struct sk_buff *skb,
 			       struct gnttab_map_grant_ref **gopp_map,
@@ -1067,7 +1237,16 @@ static int xenvif_tx_check_gop(struct xenvif_queue *queue,
 	int nr_frags = shinfo->nr_frags;
 	const bool sharedslot = nr_frags &&
 				frag_get_pending_idx(&shinfo->frags[0]) == pending_idx;
-	int i, err;
+	int i, err = 0;
+	bool pgrantslot = XENVIF_TX_CB(skb)->pending_map;
+	struct page *page;
+
+	/* Check the frags if there's persistent grant for the header */
+	if (likely(queue->tx_pgrants[pending_idx])) {
+		if (!sharedslot)
+			xenvif_pgrant_reset(queue, pending_idx);
+		goto check_frags;
+	}
 
 	/* Check status of header. */
 	err = (*gopp_copy)->status;
@@ -1085,22 +1264,50 @@ static int xenvif_tx_check_gop(struct xenvif_queue *queue,
 	}
 	(*gopp_copy)++;
 
+	if (unlikely(pgrantslot && !sharedslot)) {
+		if (!xenvif_tx_pgrant_check(queue, gop_map++,
+					    pending_idx))
+			xenvif_pgrant_reset(queue, pending_idx);
+	}
+
 check_frags:
 	for (i = 0; i < nr_frags; i++, gop_map++) {
 		int j, newerr;
 
+		/* Skip the frags that use persistent grants */
+		if (xenvif_tx_pgrant_skip(queue, shinfo, err != 0, &i))
+			break;
+
 		pending_idx = frag_get_pending_idx(&shinfo->frags[i]);
+		page = virt_to_page(gop_map->host_addr);
+		pgrantslot = (page != queue->mmap_pages[pending_idx]);
 
 		/* Check error status: if okay then remember grant handle. */
 		newerr = gop_map->status;
 
+		/* Newly mapped grant to be added to the tree.
+		 * Append error in case of tree errors.
+		 */
+		if (!newerr && pgrantslot)
+			newerr |= xenvif_tx_pgrant_check(queue, gop_map,
+							 pending_idx);
+
 		if (likely(!newerr)) {
 			xenvif_grant_handle_set(queue,
 						pending_idx,
 						gop_map->handle);
+
 			/* Had a previous error? Invalidate this fragment. */
 			if (unlikely(err)) {
-				xenvif_idx_unmap(queue, pending_idx);
+				xenvif_page_unmap(queue,
+						  gop_map->handle,
+						  &page);
+				xenvif_grant_handle_reset(queue,
+							  pending_idx);
+				if (pgrantslot)
+					put_free_pages(&queue->tx_gnts_tree,
+						       &page, 1);
+
 				/* If the mapping of the first frag was OK, but
 				 * the header's copy failed, and they are
 				 * sharing a slot, send an error
@@ -1116,7 +1323,7 @@ check_frags:
 		}
 
 		/* Error on this fragment: respond to client with an error. */
-		if (net_ratelimit())
+		if (net_ratelimit() && gop_map->status)
 			netdev_dbg(queue->vif->dev,
 				   "Grant map of %d. frag failed! status: %d pending_idx: %u ref: %u\n",
 				   i,
@@ -1186,6 +1393,7 @@ static void xenvif_fill_frags(struct xenvif_queue *queue, struct sk_buff *skb)
 		struct xen_netif_tx_request *txp;
 		struct page *page;
 		u16 pending_idx;
+		struct persistent_gnt *persistent_gnt = NULL;
 
 		pending_idx = frag_get_pending_idx(frag);
 
@@ -1201,14 +1409,16 @@ static void xenvif_fill_frags(struct xenvif_queue *queue, struct sk_buff *skb)
 		prev_pending_idx = pending_idx;
 
 		txp = &queue->pending_tx_info[pending_idx].req;
-		page = virt_to_page(idx_to_kaddr(queue, pending_idx));
+		persistent_gnt = queue->tx_pgrants[pending_idx];
+		page = (persistent_gnt ? persistent_gnt->page :
+					  queue->mmap_pages[pending_idx]);
 		__skb_fill_page_desc(skb, i, page, txp->offset, txp->size);
 		skb->len += txp->size;
 		skb->data_len += txp->size;
 		skb->truesize += txp->size;
 
 		/* Take an extra reference to offset network stack's put_page */
-		get_page(queue->mmap_pages[pending_idx]);
+		get_page(page);
 	}
 }
 
@@ -1332,17 +1542,21 @@ static void xenvif_tx_build_gops(struct xenvif_queue *queue,
 {
 	struct gnttab_map_grant_ref *gop = queue->tx_map_ops, *request_gop;
 	struct sk_buff *skb;
+	bool use_persistent_gnts = queue->vif->persistent_grants;
 	int ret;
 
 	while (skb_queue_len(&queue->tx_queue) < budget) {
 		struct xen_netif_tx_request txreq;
 		struct xen_netif_tx_request txfrags[XEN_NETBK_LEGACY_SLOTS_MAX];
 		struct xen_netif_extra_info extras[XEN_NETIF_EXTRA_TYPE_MAX-1];
+		struct persistent_gnt *persistent_gnt = NULL;
 		u16 pending_idx;
 		RING_IDX idx;
 		int work_to_do;
 		unsigned int data_len;
 		pending_ring_idx_t index;
+		bool need_map = !use_persistent_gnts;
+		bool map_pgrant = false;
 
 		if (queue->tx.sring->req_prod - queue->tx.req_cons >
 		    XEN_NETIF_TX_RING_SIZE) {
@@ -1432,8 +1646,24 @@ static void xenvif_tx_build_gops(struct xenvif_queue *queue,
 		}
 
 		XENVIF_TX_CB(skb)->pending_idx = pending_idx;
+		XENVIF_TX_CB(skb)->pending_map = false;
 
 		__skb_put(skb, data_len);
+		if (use_persistent_gnts) {
+			xenvif_tx_pgrant_available(queue, txreq.gref,
+						   pending_idx, &map_pgrant);
+			persistent_gnt = queue->tx_pgrants[pending_idx];
+		}
+
+		if (persistent_gnt) {
+			void *saddr = page_to_kaddr(persistent_gnt->page);
+
+			memcpy(skb->data, saddr + txreq.offset, data_len);
+			goto skip_gop;
+		}
+
+		need_map = true;
+
 		queue->tx_copy_ops[*copy_ops].source.u.ref = txreq.gref;
 		queue->tx_copy_ops[*copy_ops].source.domid = queue->vif->domid;
 		queue->tx_copy_ops[*copy_ops].source.offset = txreq.offset;
@@ -1449,20 +1679,28 @@ static void xenvif_tx_build_gops(struct xenvif_queue *queue,
 
 		(*copy_ops)++;
 
+skip_gop:
 		skb_shinfo(skb)->nr_frags = ret;
 		if (data_len < txreq.size) {
 			skb_shinfo(skb)->nr_frags++;
 			frag_set_pending_idx(&skb_shinfo(skb)->frags[0],
 					     pending_idx);
-			xenvif_tx_create_map_op(queue, pending_idx, &txreq, gop);
-			gop++;
 		} else {
 			frag_set_pending_idx(&skb_shinfo(skb)->frags[0],
 					     INVALID_PENDING_IDX);
-			memcpy(&queue->pending_tx_info[pending_idx].req, &txreq,
-			       sizeof(txreq));
+			need_map = use_persistent_gnts && map_pgrant;
+			XENVIF_TX_CB(skb)->pending_map = need_map;
 		}
 
+		if (need_map)
+			xenvif_tx_create_map_op(queue,
+						pending_idx,
+						&txreq, gop++,
+						map_pgrant);
+
+		memcpy(&queue->pending_tx_info[pending_idx].req, &txreq,
+		       sizeof(txreq));
+
 		queue->pending_cons++;
 
 		request_gop = xenvif_get_requests(queue, skb, txfrags, gop);
@@ -1671,16 +1909,26 @@ static int xenvif_tx_submit(struct xenvif_queue *queue)
 void xenvif_zerocopy_callback(struct ubuf_info *ubuf, bool zerocopy_success)
 {
 	unsigned long flags;
-	pending_ring_idx_t index;
+	pending_ring_idx_t index, dealloc_prod_save;
 	struct xenvif_queue *queue = ubuf_to_queue(ubuf);
 
 	/* This is the only place where we grab this lock, to protect callbacks
 	 * from each other.
 	 */
 	spin_lock_irqsave(&queue->callback_lock, flags);
+	dealloc_prod_save = queue->dealloc_prod;
 	do {
 		u16 pending_idx = ubuf->desc;
 		ubuf = (struct ubuf_info *) ubuf->ctx;
+
+		if (queue->vif->persistent_grants &&
+		    queue->tx_pgrants[pending_idx]) {
+			xenvif_pgrant_reset(queue, pending_idx);
+			xenvif_grant_handle_reset(queue, pending_idx);
+			xenvif_idx_release(queue, pending_idx,
+					   XEN_NETIF_RSP_OKAY);
+			continue;
+		}
 		BUG_ON(queue->dealloc_prod - queue->dealloc_cons >=
 			MAX_PENDING_REQS);
 		index = pending_index(queue->dealloc_prod);
@@ -1691,7 +1939,10 @@ void xenvif_zerocopy_callback(struct ubuf_info *ubuf, bool zerocopy_success)
 		smp_wmb();
 		queue->dealloc_prod++;
 	} while (ubuf);
-	wake_up(&queue->dealloc_wq);
+	/* Wake up only when there are grants to unmap */
+	if (dealloc_prod_save != queue->dealloc_prod)
+		wake_up(&queue->dealloc_wq);
+
 	spin_unlock_irqrestore(&queue->callback_lock, flags);
 
 	if (likely(zerocopy_success))
@@ -1779,10 +2030,13 @@ int xenvif_tx_action(struct xenvif_queue *queue, int budget)
 
 	xenvif_tx_build_gops(queue, budget, &nr_cops, &nr_mops);
 
-	if (nr_cops == 0)
+	if (!queue->vif->persistent_grants &&
+	    nr_cops == 0)
 		return 0;
 
-	gnttab_batch_copy(queue->tx_copy_ops, nr_cops);
+	if (nr_cops != 0)
+		gnttab_batch_copy(queue->tx_copy_ops, nr_cops);
+
 	if (nr_mops != 0) {
 		ret = gnttab_map_refs(queue->tx_map_ops,
 				      NULL,
@@ -1871,31 +2125,40 @@ static struct xen_netif_rx_response *make_rx_response(struct xenvif_queue *queue
 	return resp;
 }
 
-void xenvif_idx_unmap(struct xenvif_queue *queue, u16 pending_idx)
+void xenvif_page_unmap(struct xenvif_queue *queue,
+		       grant_handle_t handle,
+		       struct page **page)
 {
 	int ret;
-	struct gnttab_unmap_grant_ref tx_unmap_op;
+	struct gnttab_unmap_grant_ref unmap_op;
 
-	gnttab_set_unmap_op(&tx_unmap_op,
-			    idx_to_kaddr(queue, pending_idx),
+	gnttab_set_unmap_op(&unmap_op,
+			    (unsigned long)page_to_kaddr(*page),
 			    GNTMAP_host_map,
-			    queue->grant_tx_handle[pending_idx]);
-	xenvif_grant_handle_reset(queue, pending_idx);
-
-	ret = gnttab_unmap_refs(&tx_unmap_op, NULL,
-				&queue->mmap_pages[pending_idx], 1);
+			    handle);
+	ret = gnttab_unmap_refs(&unmap_op, NULL, page, 1);
 	if (ret) {
 		netdev_err(queue->vif->dev,
-			   "Unmap fail: ret: %d pending_idx: %d host_addr: %llx handle: %x status: %d\n",
+			   "Unmap fail: ret: %d host_addr: %llx handle: %x status: %d\n",
 			   ret,
-			   pending_idx,
-			   tx_unmap_op.host_addr,
-			   tx_unmap_op.handle,
-			   tx_unmap_op.status);
+			   unmap_op.host_addr,
+			   unmap_op.handle,
+			   unmap_op.status);
 		BUG();
 	}
 }
 
+void xenvif_idx_unmap(struct xenvif_queue *queue, u16 pending_idx)
+{
+	if (queue->tx_pgrants[pending_idx])
+		xenvif_pgrant_reset(queue, pending_idx);
+	else
+		xenvif_page_unmap(queue,
+				  queue->grant_tx_handle[pending_idx],
+				  &queue->mmap_pages[pending_idx]);
+	xenvif_grant_handle_reset(queue, pending_idx);
+}
+
 static inline int tx_work_todo(struct xenvif_queue *queue)
 {
 	if (likely(RING_HAS_UNCONSUMED_REQUESTS(&queue->tx)))
-- 
2.1.3

^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [RFC PATCH 04/13] xen-netback: implement RX persistent grants
  2015-05-12 17:18 ` Joao Martins
@ 2015-05-12 17:18   ` Joao Martins
  -1 siblings, 0 replies; 98+ messages in thread
From: Joao Martins @ 2015-05-12 17:18 UTC (permalink / raw)
  To: xen-devel, netdev
  Cc: wei.liu2, ian.campbell, Joao Martins, david.vrabel, boris.ostrovsky

It starts by doing a lookup in the tree for the gref. If no persistent
grant is found in the tree, it does a grant copy and prepares the
grant maps, then validates the grant map and adds it to the tree.
Once mapped, these grants can be pulled from the tree on subsequent
requests. If the tree pool runs out of pages, it falls back to
grant copy.

It adds four new fields to netrx_pending_operations: copy_done
to track how many copies were made; map_prod and map_cons to track
how many maps are awaiting validation; and copy_page, the
corresponding page (in the tree) for copy_gref.

Results are 1.04 Mpps measured with pktgen (pkt_size 64, burst 1)
with persistent grants versus 1.23 Mpps with grant copy (a 20%
regression). Persistent grants add contention on queue->wq, as
kthread_guest_rx goes to sleep more often. If we speed up the
sender (burst 2, 4 and 8) it goes up to 1.7 Mpps with persistent
grants. This issue is addressed in a later commit by copying the
skb in xenvif_start_xmit() instead of going through the RX kthread.

Signed-off-by: Joao Martins <joao.martins@neclab.eu>
---
 drivers/net/xen-netback/common.h    |   7 ++
 drivers/net/xen-netback/interface.c |  14 ++-
 drivers/net/xen-netback/netback.c   | 190 ++++++++++++++++++++++++++++++------
 3 files changed, 178 insertions(+), 33 deletions(-)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index e5ee220..23deb6a 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -235,6 +235,13 @@ struct xenvif_queue { /* Per-queue data for xenvif */
 
 	struct gnttab_copy grant_copy_op[MAX_GRANT_COPY_OPS];
 
+	/* To map the grefs to be added to the tree */
+	struct gnttab_map_grant_ref rx_map_ops[XEN_NETIF_RX_RING_SIZE];
+	struct page *rx_pages_to_map[XEN_NETIF_RX_RING_SIZE];
+	/* Only used if feature-persistent = 1 */
+	struct persistent_gnt_tree rx_gnts_tree;
+	struct page *rx_gnts_pages[XEN_NETIF_RX_RING_SIZE];
+
 	/* We create one meta structure per ring request we consume, so
 	 * the maximum number is the same as the ring size.
 	 */
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index 6f996ac..1103568 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -533,10 +533,19 @@ int xenvif_init_queue(struct xenvif_queue *queue)
 					       XEN_NETIF_TX_RING_SIZE);
 		if (err)
 			goto err_disable;
+
+		err = init_persistent_gnt_tree(&queue->rx_gnts_tree,
+					       queue->rx_gnts_pages,
+					       XEN_NETIF_RX_RING_SIZE);
+		if (err)
+			goto err_free_tx;
 	}
 
 	return 0;
 
+err_free_tx:
+	gnttab_free_pages(XEN_NETIF_TX_RING_SIZE,
+			  queue->tx_gnts_pages);
 err_disable:
 	netdev_err(queue->vif->dev, "Could not reserve tree pages.\n");
 	queue->vif->persistent_grants = 0;
@@ -697,9 +706,12 @@ void xenvif_disconnect(struct xenvif *vif)
 
 		xenvif_unmap_frontend_rings(queue);
 
-		if (queue->vif->persistent_grants)
+		if (queue->vif->persistent_grants) {
 			deinit_persistent_gnt_tree(&queue->tx_gnts_tree,
 						   queue->tx_gnts_pages);
+			deinit_persistent_gnt_tree(&queue->rx_gnts_tree,
+						   queue->rx_gnts_pages);
+		}
 	}
 }
 
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 529d7c3..738b6ee 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -413,14 +413,62 @@ static void xenvif_rx_queue_drop_expired(struct xenvif_queue *queue)
 }
 
 struct netrx_pending_operations {
+	unsigned map_prod, map_cons;
 	unsigned copy_prod, copy_cons;
 	unsigned meta_prod, meta_cons;
 	struct gnttab_copy *copy;
 	struct xenvif_rx_meta *meta;
 	int copy_off;
 	grant_ref_t copy_gref;
+	struct page *copy_page;
+	unsigned copy_done;
 };
 
+static void xenvif_create_rx_map_op(struct xenvif_queue *queue,
+				    struct gnttab_map_grant_ref *mop,
+				    grant_ref_t ref,
+				    struct page *page)
+{
+	queue->rx_pages_to_map[mop - queue->rx_map_ops] = page;
+	gnttab_set_map_op(mop,
+			  (unsigned long)page_to_kaddr(page),
+			  GNTMAP_host_map,
+			  ref, queue->vif->domid);
+}
+
+static struct page *get_next_rx_page(struct xenvif_queue *queue,
+				     struct netrx_pending_operations *npo)
+{
+	struct persistent_gnt_tree *tree = &queue->rx_gnts_tree;
+	struct persistent_gnt *gnt;
+	struct page *page = NULL;
+
+	gnt = get_persistent_gnt(tree, npo->copy_gref);
+	BUG_ON(IS_ERR(gnt));
+
+	if (likely(gnt)) {
+		page = gnt->page;
+		put_persistent_gnt(tree, gnt);
+		npo->copy_done++;
+		return page;
+	}
+
+	/* We couldn't find a match for the gref in the tree.
+	 * Map the page and add it to the tree. This page won't
+	 * be used for copying the packet; instead we will rely on
+	 * grant copy. The next time the gref is requested, the
+	 * persistent grant will be used instead.
+	 */
+	if (!get_free_page(tree, &page)) {
+		struct gnttab_map_grant_ref *mop;
+
+		mop = queue->rx_map_ops + npo->map_prod++;
+		xenvif_create_rx_map_op(queue, mop, npo->copy_gref, page);
+	}
+
+	return NULL;
+}
+
 static struct xenvif_rx_meta *get_next_rx_buffer(struct xenvif_queue *queue,
 						 struct netrx_pending_operations *npo)
 {
@@ -437,10 +485,48 @@ static struct xenvif_rx_meta *get_next_rx_buffer(struct xenvif_queue *queue,
 
 	npo->copy_off = 0;
 	npo->copy_gref = req->gref;
-
+	npo->copy_page = NULL;
+	if (queue->vif->persistent_grants)
+		npo->copy_page = get_next_rx_page(queue, npo);
 	return meta;
 }
 
+static void xenvif_rx_copy_page(struct xenvif_queue *queue,
+				struct netrx_pending_operations *npo,
+				struct page *page, unsigned len,
+				unsigned offset)
+{
+	struct gnttab_copy *copy_gop;
+	struct xen_page_foreign *foreign = xen_page_foreign(page);
+
+	if (likely(npo->copy_page)) {
+		memcpy(page_address(npo->copy_page) + npo->copy_off,
+		       page_to_kaddr(page) + offset, len);
+		return;
+	}
+
+	/* No persistent grant found, so we rely on grant copy
+	 */
+	copy_gop = npo->copy + npo->copy_prod++;
+	copy_gop->flags = GNTCOPY_dest_gref;
+	copy_gop->len = len;
+
+	if (foreign) {
+		copy_gop->source.domid = foreign->domid;
+		copy_gop->source.u.ref = foreign->gref;
+		copy_gop->flags |= GNTCOPY_source_gref;
+	} else {
+		copy_gop->source.domid = DOMID_SELF;
+		copy_gop->source.u.gmfn =
+			virt_to_mfn(page_address(page));
+	}
+	copy_gop->source.offset = offset;
+
+	copy_gop->dest.domid = queue->vif->domid;
+	copy_gop->dest.offset = npo->copy_off;
+	copy_gop->dest.u.ref = npo->copy_gref;
+}
+
 /*
  * Set up the grant operations for this fragment. If it's a flipping
  * interface, we also set up the unmap request from here.
@@ -450,7 +536,6 @@ static void xenvif_gop_frag_copy(struct xenvif_queue *queue, struct sk_buff *skb
 				 struct page *page, unsigned long size,
 				 unsigned long offset, int *head)
 {
-	struct gnttab_copy *copy_gop;
 	struct xenvif_rx_meta *meta;
 	unsigned long bytes;
 	int gso_type = XEN_NETIF_GSO_TYPE_NONE;
@@ -465,8 +550,6 @@ static void xenvif_gop_frag_copy(struct xenvif_queue *queue, struct sk_buff *skb
 	offset &= ~PAGE_MASK;
 
 	while (size > 0) {
-		struct xen_page_foreign *foreign;
-
 		BUG_ON(offset >= PAGE_SIZE);
 		BUG_ON(npo->copy_off > MAX_BUFFER_OFFSET);
 
@@ -480,25 +563,7 @@ static void xenvif_gop_frag_copy(struct xenvif_queue *queue, struct sk_buff *skb
 		if (npo->copy_off + bytes > MAX_BUFFER_OFFSET)
 			bytes = MAX_BUFFER_OFFSET - npo->copy_off;
 
-		copy_gop = npo->copy + npo->copy_prod++;
-		copy_gop->flags = GNTCOPY_dest_gref;
-		copy_gop->len = bytes;
-
-		foreign = xen_page_foreign(page);
-		if (foreign) {
-			copy_gop->source.domid = foreign->domid;
-			copy_gop->source.u.ref = foreign->gref;
-			copy_gop->flags |= GNTCOPY_source_gref;
-		} else {
-			copy_gop->source.domid = DOMID_SELF;
-			copy_gop->source.u.gmfn =
-				virt_to_mfn(page_address(page));
-		}
-		copy_gop->source.offset = offset;
-
-		copy_gop->dest.domid = queue->vif->domid;
-		copy_gop->dest.offset = npo->copy_off;
-		copy_gop->dest.u.ref = npo->copy_gref;
+		xenvif_rx_copy_page(queue, npo, page, bytes, offset);
 
 		npo->copy_off += bytes;
 		meta->size += bytes;
@@ -590,6 +655,8 @@ static int xenvif_gop_skb(struct sk_buff *skb,
 	meta->id = req->id;
 	npo->copy_off = 0;
 	npo->copy_gref = req->gref;
+	if (queue->vif->persistent_grants)
+		npo->copy_page = get_next_rx_page(queue, npo);
 
 	data = skb->data;
 	while (data < skb_tail_pointer(skb)) {
@@ -616,24 +683,74 @@ static int xenvif_gop_skb(struct sk_buff *skb,
 }
 
 /*
+ * Called to check whether the grant maps succeeded, and to add
+ * them to the grant tree. If some of the grants already exist in the
+ * tree, it will unmap those.
+ */
+static void xenvif_check_mop(struct xenvif_queue *queue, int nr_mops,
+			     struct netrx_pending_operations *npo)
+{
+	struct persistent_gnt_tree *tree = &queue->rx_gnts_tree;
+	struct gnttab_map_grant_ref *gop_map;
+	struct page *page;
+	int i;
+
+	for (i = 0; i < nr_mops; i++) {
+		struct persistent_gnt *persistent_gnt;
+
+		gop_map = queue->rx_map_ops + npo->map_cons++;
+		page = virt_to_page(gop_map->host_addr);
+
+		if (gop_map->status != GNTST_okay) {
+			if (net_ratelimit())
+				netdev_err(queue->vif->dev,
+					   "Bad status %d from map to DOM%d.\n",
+					   gop_map->status, queue->vif->domid);
+			put_free_pages(tree, &page, 1);
+			continue;
+		}
+
+		persistent_gnt = xenvif_pgrant_new(tree, gop_map);
+		if (unlikely(!persistent_gnt)) {
+			netdev_err(queue->vif->dev,
+				   "Couldn't add gref to the tree! ref: %d",
+				   gop_map->ref);
+			xenvif_page_unmap(queue, gop_map->handle, &page);
+			put_free_pages(tree, &page, 1);
+			kfree(persistent_gnt);
+			persistent_gnt = NULL;
+			continue;
+		}
+
+		put_persistent_gnt(tree, persistent_gnt);
+	}
+}
+
+/*
  * This is a twin to xenvif_gop_skb.  Assume that xenvif_gop_skb was
  * used to set up the operations on the top of
  * netrx_pending_operations, which have since been done.  Check that
  * they didn't give any errors and advance over them.
  */
-static int xenvif_check_gop(struct xenvif *vif, int nr_meta_slots,
+static int xenvif_check_gop(struct xenvif_queue *queue, int nr_meta_slots,
 			    struct netrx_pending_operations *npo)
 {
 	struct gnttab_copy     *copy_op;
 	int status = XEN_NETIF_RSP_OKAY;
 	int i;
 
+	nr_meta_slots -= npo->copy_done;
+	if (npo->map_prod)
+		xenvif_check_mop(queue,
+				 npo->map_prod - npo->map_cons,
+				 npo);
+
 	for (i = 0; i < nr_meta_slots; i++) {
 		copy_op = npo->copy + npo->copy_cons++;
 		if (copy_op->status != GNTST_okay) {
-			netdev_dbg(vif->dev,
+			netdev_dbg(queue->vif->dev,
 				   "Bad status %d from copy to DOM%d.\n",
-				   copy_op->status, vif->domid);
+				   copy_op->status, queue->vif->domid);
 			status = XEN_NETIF_RSP_ERROR;
 		}
 	}
@@ -686,7 +803,7 @@ static void xenvif_rx_action(struct xenvif_queue *queue)
 
 	struct netrx_pending_operations npo = {
 		.copy  = queue->grant_copy_op,
-		.meta  = queue->meta,
+		.meta  = queue->meta
 	};
 
 	skb_queue_head_init(&rxq);
@@ -705,13 +822,22 @@ static void xenvif_rx_action(struct xenvif_queue *queue)
 		__skb_queue_tail(&rxq, skb);
 	}
 
-	BUG_ON(npo.meta_prod > ARRAY_SIZE(queue->meta));
-
-	if (!npo.copy_prod)
+	BUG_ON(npo.meta_prod > XEN_NETIF_RX_RING_SIZE);
+	if (!npo.copy_done && !npo.copy_prod)
 		goto done;
 
 	BUG_ON(npo.copy_prod > MAX_GRANT_COPY_OPS);
-	gnttab_batch_copy(queue->grant_copy_op, npo.copy_prod);
+	if (npo.copy_prod)
+		gnttab_batch_copy(npo.copy, npo.copy_prod);
+
+	BUG_ON(npo.map_prod > MAX_GRANT_COPY_OPS);
+	if (npo.map_prod) {
+		ret = gnttab_map_refs(queue->rx_map_ops,
+				      NULL,
+				      queue->rx_pages_to_map,
+				      npo.map_prod);
+		BUG_ON(ret);
+	}
 
 	while ((skb = __skb_dequeue(&rxq)) != NULL) {
 
@@ -734,7 +860,7 @@ static void xenvif_rx_action(struct xenvif_queue *queue)
 		queue->stats.tx_bytes += skb->len;
 		queue->stats.tx_packets++;
 
-		status = xenvif_check_gop(queue->vif,
+		status = xenvif_check_gop(queue,
 					  XENVIF_RX_CB(skb)->meta_slots_used,
 					  &npo);
 
-- 
2.1.3

^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [RFC PATCH 05/13] xen-netback: refactor xenvif_rx_action
  2015-05-12 17:18 ` Joao Martins
@ 2015-05-12 17:18   ` Joao Martins
  -1 siblings, 0 replies; 98+ messages in thread
From: Joao Martins @ 2015-05-12 17:18 UTC (permalink / raw)
  To: xen-devel, netdev
  Cc: wei.liu2, ian.campbell, Joao Martins, david.vrabel, boris.ostrovsky

Refactor xenvif_rx_action by dividing it into build_gops and submit
stages, mirroring the structure of xenvif_tx_action.

Signed-off-by: Joao Martins <joao.martins@neclab.eu>
---
 drivers/net/xen-netback/netback.c | 180 ++++++++++++++++++++------------------
 1 file changed, 96 insertions(+), 84 deletions(-)

diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 738b6ee..c4f57d7 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -789,16 +789,105 @@ void xenvif_kick_thread(struct xenvif_queue *queue)
 	wake_up(&queue->wq);
 }
 
-static void xenvif_rx_action(struct xenvif_queue *queue)
+static void xenvif_rx_build_gops(struct xenvif_queue *queue,
+				 struct netrx_pending_operations *npo,
+				 struct sk_buff *skb)
+{
+	RING_IDX old_req_cons;
+	RING_IDX ring_slots_used;
+
+	queue->last_rx_time = jiffies;
+
+	old_req_cons = queue->rx.req_cons;
+	XENVIF_RX_CB(skb)->meta_slots_used = xenvif_gop_skb(skb, npo, queue);
+	ring_slots_used = queue->rx.req_cons - old_req_cons;
+}
+
+static bool xenvif_rx_submit(struct xenvif_queue *queue,
+			     struct netrx_pending_operations *npo,
+			     struct sk_buff *skb)
 {
 	s8 status;
 	u16 flags;
 	struct xen_netif_rx_response *resp;
-	struct sk_buff_head rxq;
-	struct sk_buff *skb;
 	LIST_HEAD(notify);
 	int ret;
 	unsigned long offset;
+
+	if ((1 << queue->meta[npo->meta_cons].gso_type) &
+	    queue->vif->gso_prefix_mask) {
+		resp = RING_GET_RESPONSE(&queue->rx,
+					 queue->rx.rsp_prod_pvt++);
+
+		resp->flags = XEN_NETRXF_gso_prefix | XEN_NETRXF_more_data;
+
+		resp->offset = queue->meta[npo->meta_cons].gso_size;
+		resp->id = queue->meta[npo->meta_cons].id;
+		resp->status = XENVIF_RX_CB(skb)->meta_slots_used;
+
+		npo->meta_cons++;
+		XENVIF_RX_CB(skb)->meta_slots_used--;
+	}
+
+	queue->stats.tx_bytes += skb->len;
+	queue->stats.tx_packets++;
+
+	status = xenvif_check_gop(queue,
+				  XENVIF_RX_CB(skb)->meta_slots_used,
+				  npo);
+
+	if (XENVIF_RX_CB(skb)->meta_slots_used == 1)
+		flags = 0;
+	else
+		flags = XEN_NETRXF_more_data;
+
+	if (skb->ip_summed == CHECKSUM_PARTIAL) /* local packet? */
+		flags |= XEN_NETRXF_csum_blank | XEN_NETRXF_data_validated;
+	else if (skb->ip_summed == CHECKSUM_UNNECESSARY)
+		/* remote but checksummed. */
+		flags |= XEN_NETRXF_data_validated;
+
+	offset = 0;
+	resp = make_rx_response(queue, queue->meta[npo->meta_cons].id,
+				status, offset,
+				queue->meta[npo->meta_cons].size,
+				flags);
+
+	if ((1 << queue->meta[npo->meta_cons].gso_type) &
+	    queue->vif->gso_mask) {
+		struct xen_netif_extra_info *gso =
+			(struct xen_netif_extra_info *)
+			RING_GET_RESPONSE(&queue->rx,
+					  queue->rx.rsp_prod_pvt++);
+
+		resp->flags |= XEN_NETRXF_extra_info;
+
+		gso->u.gso.type = queue->meta[npo->meta_cons].gso_type;
+		gso->u.gso.size = queue->meta[npo->meta_cons].gso_size;
+		gso->u.gso.pad = 0;
+		gso->u.gso.features = 0;
+
+		gso->type = XEN_NETIF_EXTRA_TYPE_GSO;
+		gso->flags = 0;
+	}
+
+	xenvif_add_frag_responses(queue, status,
+				  queue->meta + npo->meta_cons + 1,
+				  XENVIF_RX_CB(skb)->meta_slots_used);
+
+	RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(&queue->rx, ret);
+
+	npo->meta_cons += XENVIF_RX_CB(skb)->meta_slots_used;
+	dev_kfree_skb(skb);
+
+	return !!ret;
+}
+
+static void xenvif_rx_action(struct xenvif_queue *queue)
+{
+	int ret;
+	struct sk_buff *skb;
+	struct sk_buff_head rxq;
 	bool need_to_notify = false;
 
 	struct netrx_pending_operations npo = {
@@ -810,21 +899,14 @@ static void xenvif_rx_action(struct xenvif_queue *queue)
 
 	while (xenvif_rx_ring_slots_available(queue, XEN_NETBK_RX_SLOTS_MAX)
 	       && (skb = xenvif_rx_dequeue(queue)) != NULL) {
-		RING_IDX old_req_cons;
-		RING_IDX ring_slots_used;
-
-		queue->last_rx_time = jiffies;
-
-		old_req_cons = queue->rx.req_cons;
-		XENVIF_RX_CB(skb)->meta_slots_used = xenvif_gop_skb(skb, &npo, queue);
-		ring_slots_used = queue->rx.req_cons - old_req_cons;
 
+		xenvif_rx_build_gops(queue, &npo, skb);
 		__skb_queue_tail(&rxq, skb);
 	}
 
 	BUG_ON(npo.meta_prod > XEN_NETIF_RX_RING_SIZE);
 	if (!npo.copy_done && !npo.copy_prod)
-		goto done;
+		return;
 
 	BUG_ON(npo.copy_prod > MAX_GRANT_COPY_OPS);
 	if (npo.copy_prod)
@@ -839,79 +921,9 @@ static void xenvif_rx_action(struct xenvif_queue *queue)
 		BUG_ON(ret);
 	}
 
-	while ((skb = __skb_dequeue(&rxq)) != NULL) {
-
-		if ((1 << queue->meta[npo.meta_cons].gso_type) &
-		    queue->vif->gso_prefix_mask) {
-			resp = RING_GET_RESPONSE(&queue->rx,
-						 queue->rx.rsp_prod_pvt++);
-
-			resp->flags = XEN_NETRXF_gso_prefix | XEN_NETRXF_more_data;
-
-			resp->offset = queue->meta[npo.meta_cons].gso_size;
-			resp->id = queue->meta[npo.meta_cons].id;
-			resp->status = XENVIF_RX_CB(skb)->meta_slots_used;
-
-			npo.meta_cons++;
-			XENVIF_RX_CB(skb)->meta_slots_used--;
-		}
-
-
-		queue->stats.tx_bytes += skb->len;
-		queue->stats.tx_packets++;
-
-		status = xenvif_check_gop(queue,
-					  XENVIF_RX_CB(skb)->meta_slots_used,
-					  &npo);
-
-		if (XENVIF_RX_CB(skb)->meta_slots_used == 1)
-			flags = 0;
-		else
-			flags = XEN_NETRXF_more_data;
-
-		if (skb->ip_summed == CHECKSUM_PARTIAL) /* local packet? */
-			flags |= XEN_NETRXF_csum_blank | XEN_NETRXF_data_validated;
-		else if (skb->ip_summed == CHECKSUM_UNNECESSARY)
-			/* remote but checksummed. */
-			flags |= XEN_NETRXF_data_validated;
-
-		offset = 0;
-		resp = make_rx_response(queue, queue->meta[npo.meta_cons].id,
-					status, offset,
-					queue->meta[npo.meta_cons].size,
-					flags);
-
-		if ((1 << queue->meta[npo.meta_cons].gso_type) &
-		    queue->vif->gso_mask) {
-			struct xen_netif_extra_info *gso =
-				(struct xen_netif_extra_info *)
-				RING_GET_RESPONSE(&queue->rx,
-						  queue->rx.rsp_prod_pvt++);
-
-			resp->flags |= XEN_NETRXF_extra_info;
-
-			gso->u.gso.type = queue->meta[npo.meta_cons].gso_type;
-			gso->u.gso.size = queue->meta[npo.meta_cons].gso_size;
-			gso->u.gso.pad = 0;
-			gso->u.gso.features = 0;
-
-			gso->type = XEN_NETIF_EXTRA_TYPE_GSO;
-			gso->flags = 0;
-		}
-
-		xenvif_add_frag_responses(queue, status,
-					  queue->meta + npo.meta_cons + 1,
-					  XENVIF_RX_CB(skb)->meta_slots_used);
-
-		RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(&queue->rx, ret);
-
-		need_to_notify |= !!ret;
-
-		npo.meta_cons += XENVIF_RX_CB(skb)->meta_slots_used;
-		dev_kfree_skb(skb);
-	}
+	while ((skb = __skb_dequeue(&rxq)) != NULL)
+		need_to_notify |= xenvif_rx_submit(queue, &npo, skb);
 
-done:
 	if (need_to_notify)
 		notify_remote_via_irq(queue->rx_irq);
 }
-- 
2.1.3

^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [RFC PATCH 06/13] xen-netback: copy buffer on xenvif_start_xmit()
  2015-05-12 17:18 ` Joao Martins
@ 2015-05-12 17:18   ` Joao Martins
  -1 siblings, 0 replies; 98+ messages in thread
From: Joao Martins @ 2015-05-12 17:18 UTC (permalink / raw)
  To: xen-devel, netdev
  Cc: wei.liu2, ian.campbell, Joao Martins, david.vrabel, boris.ostrovsky

Introducing persistent grants speeds up the RX thread thanks to the
decreased copy cost, yet it leads to a throughput decrease of 20%.
It is observed that the rx_queue stays mostly at 10% of its capacity,
as opposed to full capacity when using grant copy. A finer measurement
with lock_stat (below, with pkt_size 64, burst 1) shows much higher
wait queue contention on queue->wq, which hints that the RX kthread
waits and wakes up more often than it spends actually doing work.

Without persistent grants:

&queue->wq:
  con-bounces 792, contentions 792
  waittime min/max/total/avg:  0.36 / 24.36 / 1140.30 / 1.44
  acq-bounces 4208, acquisitions 1002671
  holdtime min/max/total/avg:  0.00 / 46.75 / 538164.02 / 0.54
----------
&queue->wq    326          [<ffffffff8115949f>] __wake_up+0x2f/0x80
&queue->wq    410          [<ffffffff811592bf>] finish_wait+0x4f/0xa0
&queue->wq     56          [<ffffffff811593eb>] prepare_to_wait+0x2b/0xb0
----------
&queue->wq    202          [<ffffffff811593eb>] prepare_to_wait+0x2b/0xb0
&queue->wq    467          [<ffffffff8115949f>] __wake_up+0x2f/0x80
&queue->wq    123          [<ffffffff811592bf>] finish_wait+0x4f/0xa0

With persistent grants:

&queue->wq:
  con-bounces 61834, contentions 61836
  waittime min/max/total/avg:  0.32 / 30.12 / 99710.27 / 1.61
  acq-bounces 241400, acquisitions 1125308
  holdtime min/max/total/avg:  0.00 / 75.61 / 1106578.82 / 0.98
----------
&queue->wq     5079        [<ffffffff8115949f>] __wake_up+0x2f/0x80
&queue->wq    56280        [<ffffffff811592bf>] finish_wait+0x4f/0xa0
&queue->wq      479        [<ffffffff811593eb>] prepare_to_wait+0x2b/0xb0
----------
&queue->wq     1005        [<ffffffff811592bf>] finish_wait+0x4f/0xa0
&queue->wq    56761        [<ffffffff8115949f>] __wake_up+0x2f/0x80
&queue->wq     4072        [<ffffffff811593eb>] prepare_to_wait+0x2b/0xb0

Also, with persistent grants we don't require batching grant copy ops
(besides the initial copy+map), which makes me believe that deferring
the skb to the RX kthread just adds unnecessary overhead (for this
particular case). This patch proposes copying the buffer in
xenvif_start_xmit(), which lets us remove both the contention on
queue->wq and the lock on rx_queue. An alternative to the
xenvif_rx_action routine is added, namely xenvif_rx_map(), which maps
and copies the buffer to the guest. It is only used when persistent
grants are enabled, since it would otherwise mean a hypercall per
packet.

Improvements are up to a factor of 2.14 with a single queue, getting us
from 1.04 Mpps to 1.7 Mpps (burst 1, pkt_size 64) and from 1.5 to 2.6 Mpps
(burst 2, pkt_size 64) compared to using the kthread. The maximum with
grant copy is 1.2 Mpps, irrespective of the burst. All of this was
measured on an Intel(R) Xeon(R) CPU E5-1650 v2 @ 3.50GHz.

Signed-off-by: Joao Martins <joao.martins@neclab.eu>
---
 drivers/net/xen-netback/common.h    |  2 ++
 drivers/net/xen-netback/interface.c | 11 +++++---
 drivers/net/xen-netback/netback.c   | 52 +++++++++++++++++++++++++++++--------
 3 files changed, 51 insertions(+), 14 deletions(-)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index 23deb6a..f3ece12 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -363,6 +363,8 @@ void xenvif_kick_thread(struct xenvif_queue *queue);
 
 int xenvif_dealloc_kthread(void *data);
 
+int xenvif_rx_map(struct xenvif_queue *queue, struct sk_buff *skb);
+
 void xenvif_rx_queue_tail(struct xenvif_queue *queue, struct sk_buff *skb);
 
 /* Determine whether the needed number of slots (req) are available,
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index 1103568..dfe2b7b 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -109,7 +109,8 @@ static irqreturn_t xenvif_rx_interrupt(int irq, void *dev_id)
 {
 	struct xenvif_queue *queue = dev_id;
 
-	xenvif_kick_thread(queue);
+	if (!queue->vif->persistent_grants)
+		xenvif_kick_thread(queue);
 
 	return IRQ_HANDLED;
 }
@@ -168,8 +169,12 @@ static int xenvif_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	cb = XENVIF_RX_CB(skb);
 	cb->expires = jiffies + vif->drain_timeout;
 
-	xenvif_rx_queue_tail(queue, skb);
-	xenvif_kick_thread(queue);
+	if (!queue->vif->persistent_grants) {
+		xenvif_rx_queue_tail(queue, skb);
+		xenvif_kick_thread(queue);
+	} else if (xenvif_rx_map(queue, skb)) {
+		return NETDEV_TX_BUSY;
+	}
 
 	return NETDEV_TX_OK;
 
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index c4f57d7..228df92 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -883,9 +883,48 @@ static bool xenvif_rx_submit(struct xenvif_queue *queue,
 	return !!ret;
 }
 
+int xenvif_rx_map(struct xenvif_queue *queue, struct sk_buff *skb)
+{
+	int ret = -EBUSY;
+	struct netrx_pending_operations npo = {
+		.copy  = queue->grant_copy_op,
+		.meta  = queue->meta
+	};
+
+	if (!xenvif_rx_ring_slots_available(queue, XEN_NETBK_LEGACY_SLOTS_MAX))
+		goto done;
+
+	xenvif_rx_build_gops(queue, &npo, skb);
+
+	BUG_ON(npo.meta_prod > ARRAY_SIZE(queue->meta));
+	if (!npo.copy_done && !npo.copy_prod)
+		goto done;
+
+	BUG_ON(npo.map_prod > MAX_GRANT_COPY_OPS);
+	if (npo.map_prod) {
+		ret = gnttab_map_refs(queue->rx_map_ops,
+				      NULL,
+				      queue->rx_pages_to_map,
+				      npo.map_prod);
+		BUG_ON(ret);
+	}
+
+	BUG_ON(npo.copy_prod > MAX_GRANT_COPY_OPS);
+	if (npo.copy_prod)
+		gnttab_batch_copy(npo.copy, npo.copy_prod);
+
+	if (xenvif_rx_submit(queue, &npo, skb))
+		notify_remote_via_irq(queue->rx_irq);
+
+	ret = 0; /* clear error */
+done:
+	if (xenvif_queue_stopped(queue))
+		xenvif_wake_queue(queue);
+	return ret;
+}
+
 static void xenvif_rx_action(struct xenvif_queue *queue)
 {
-	int ret;
 	struct sk_buff *skb;
 	struct sk_buff_head rxq;
 	bool need_to_notify = false;
@@ -905,22 +944,13 @@ static void xenvif_rx_action(struct xenvif_queue *queue)
 	}
 
 	BUG_ON(npo.meta_prod > XEN_NETIF_RX_RING_SIZE);
-	if (!npo.copy_done && !npo.copy_prod)
+	if (!npo.copy_prod)
 		return;
 
 	BUG_ON(npo.copy_prod > MAX_GRANT_COPY_OPS);
 	if (npo.copy_prod)
 		gnttab_batch_copy(npo.copy, npo.copy_prod);
 
-	BUG_ON(npo.map_prod > MAX_GRANT_COPY_OPS);
-	if (npo.map_prod) {
-		ret = gnttab_map_refs(queue->rx_map_ops,
-				      NULL,
-				      queue->rx_pages_to_map,
-				      npo.map_prod);
-		BUG_ON(ret);
-	}
-
 	while ((skb = __skb_dequeue(&rxq)) != NULL)
 		need_to_notify |= xenvif_rx_submit(queue, &npo, skb);
 
-- 
2.1.3

^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [RFC PATCH 07/13] xen-netback: add persistent tree counters to debugfs
  2015-05-12 17:18 ` Joao Martins
@ 2015-05-12 17:18   ` Joao Martins
  0 siblings, 0 replies; 98+ messages in thread
From: Joao Martins @ 2015-05-12 17:18 UTC (permalink / raw)
  To: xen-devel, netdev
  Cc: wei.liu2, ian.campbell, Joao Martins, david.vrabel, boris.ostrovsky

Prints the total/max number of persistent grants and how many of
them are in use.

Signed-off-by: Joao Martins <joao.martins@neclab.eu>
---
 drivers/net/xen-netback/xenbus.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
index 766f7e5..1e6f27a 100644
--- a/drivers/net/xen-netback/xenbus.c
+++ b/drivers/net/xen-netback/xenbus.c
@@ -121,6 +121,17 @@ static int xenvif_read_io_ring(struct seq_file *m, void *v)
 		   skb_queue_len(&queue->rx_queue),
 		   netif_tx_queue_stopped(dev_queue) ? "stopped" : "running");
 
+	if (queue->vif->persistent_grants) {
+		seq_printf(m, "\nRx persistent_gnts: in_use %d max %d gnts %d\n",
+			   atomic_read(&queue->rx_gnts_tree.gnt_in_use),
+			   queue->rx_gnts_tree.gnt_max,
+			   queue->rx_gnts_tree.gnt_c);
+		seq_printf(m, "\nTx persistent_gnts: in_use %d max %d gnts %d\n",
+			   atomic_read(&queue->tx_gnts_tree.gnt_in_use),
+			   queue->tx_gnts_tree.gnt_max,
+			   queue->tx_gnts_tree.gnt_c);
+	}
+
 	return 0;
 }
 
-- 
2.1.3

^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [RFC PATCH 08/13] xen-netback: clone skb if skb->xmit_more is set
  2015-05-12 17:18 ` Joao Martins
@ 2015-05-12 17:18   ` Joao Martins
  0 siblings, 0 replies; 98+ messages in thread
From: Joao Martins @ 2015-05-12 17:18 UTC (permalink / raw)
  To: xen-devel, netdev
  Cc: wei.liu2, ian.campbell, Joao Martins, david.vrabel, boris.ostrovsky

In xenvif_start_xmit() we have an additional queue to the netback RX
kthread that will send the packet. When using burst>1, pktgen sets
skb->xmit_more to tell the driver that there are more skbs in the
queue. However, pktgen transmits the same skb <burst> times, which
leads to the BUG below. Long story short, adding the same skb to the
rx_queue twice leads to a crash. Specifically, with pktgen running with
burst=2, what happens is: when we queue the second skb (which is the
same as the first queued skb), the tail element of the list ends up
with skb->prev pointing to the skb itself. On skb_unlink (i.e. when
dequeueing the skb) skb->prev becomes NULL, but list->next still points
to the unlinked skb. Because of this, skb_peek will still return an
skb, which redoes the skb_unlink, trying to set (skb->prev)->next where
skb->prev is now NULL, thus leading to the crash (trace below).

I'm not sure what the best way to fix this is, but since it only
happens when using pktgen with burst>1, I chose to do an skb_clone when
we don't use persistent grants, the skb->xmit_more flag is set, and
CONFIG_NET_PKTGEN is built in.

BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
IP: [<ffffffffa01dbcdc>] xenvif_rx_dequeue+0x7c/0x120 [xen_netback]
PGD 0
Oops: 0002 [#1] SMP
CPU: 1 PID: 10391 Comm: vif510.1-q0-gue Not tainted 4.0.0-rc2-net-next+
task: ffff88003b0ce400 ti: ffff880008538000 task.ti: ffff880008538000
RIP: e030:[<ffffffffa01dbcdc>]  [<ffffffffa01dbcdc>]
xenvif_rx_dequeue+0x7c/0x120 [xen_netback]
RSP: e02b:ffff88000853bde8  EFLAGS: 00010006
RAX: 0000000000000000 RBX: ffffc9000212e000 RCX: 00000000000000e4
RDX: 0000000000000000 RSI: ffff88003b0c0200 RDI: ffffc90002139a24
RBP: ffff88000853bdf8 R08: ffff880008538000 R09: 0000000000000000
R10: aaaaaaaaaaaaaaaa R11: 0000000000000000 R12: ffff8800089a6400
R13: ffffc9000212e000 R14: ffffc90002139a10 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff88003f700000(0000)
knlGS:ffff88003f700000
CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 000000000260b000 CR4: 0000000000042660
Stack:
 ffff88000853be48 ffff88000853be30 ffff88000853beb8 ffffffffa01e19ea
 ffff88000853be60 ffff88003b0ce400 ffff88003ba418c0 ffffc900021399c0
 0000000000000000 ffff88000853be30 ffff88000853be30 ffff000000000000
Call Trace:
 [<ffffffffa01e19ea>] xenvif_kthread_guest_rx+0x26a/0x6e0 [xen_netback]
 [<ffffffffa01e1780>] ? xenvif_map_frontend_rings+0x110/0x110 [xen_netback]
 [<ffffffff8111ae9b>] kthread+0x11b/0x150
 [<ffffffff81120000>] ? clean_sort_range+0x170/0x2f0
 [<ffffffff8111ad80>] ? kthread_stop+0x230/0x230
 [<ffffffff81d6957c>] ret_from_fork+0x7c/0xb0
 [<ffffffff8111ad80>] ? kthread_stop+0x230/0x230
Code: 01 48 83 05 9e f5 00 00 01 49 8b 44 24 08 49 8b 14 24 49 c7 44 24 08
00 00 00 00 49 c7 04 24 00 00 00 00 48 83 05 84 f5 00 00 01 <48> 89 42 08
48 89 10 41 8b 84 24 80 00 00 00 29 83 2c ba 00 00
RIP  [<ffffffffa01dbcdc>] xenvif_rx_dequeue+0x7c/0x120 [xen_netback]
 RSP <ffff88000853bde8>
CR2: 0000000000000008
---[ end trace b3caaf6875c8a975 ]---

Signed-off-by: Joao Martins <joao.martins@neclab.eu>
---
 drivers/net/xen-netback/interface.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index dfe2b7b..5748ba5 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -170,6 +170,15 @@ static int xenvif_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	cb->expires = jiffies + vif->drain_timeout;
 
 	if (!queue->vif->persistent_grants) {
+#ifdef CONFIG_NET_PKTGEN
+		if (skb->xmit_more) {
+			struct sk_buff *nskb;
+
+			nskb = skb_clone(skb, GFP_ATOMIC | __GFP_NOWARN);
+			dev_kfree_skb(skb);
+			skb = nskb;
+		}
+#endif
 		xenvif_rx_queue_tail(queue, skb);
 		xenvif_kick_thread(queue);
 	} else if (xenvif_rx_map(queue, skb)) {
-- 
2.1.3

^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [RFC PATCH 09/13] xen-netfront: move grant_{ref, page} to struct grant
  2015-05-12 17:18 ` Joao Martins
@ 2015-05-12 17:18   ` Joao Martins
  0 siblings, 0 replies; 98+ messages in thread
From: Joao Martins @ 2015-05-12 17:18 UTC (permalink / raw)
  To: xen-devel, netdev
  Cc: wei.liu2, ian.campbell, Joao Martins, david.vrabel, boris.ostrovsky

Refactor a little how grants are stored by moving
grant_rx_ref/grant_tx_ref and grant_tx_page into their own
structure, namely struct grant.

Signed-off-by: Joao Martins <joao.martins@neclab.eu>
---
 drivers/net/xen-netfront.c | 56 ++++++++++++++++++++++++++--------------------
 1 file changed, 32 insertions(+), 24 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 3f45afd..8f49ed4 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -87,6 +87,11 @@ struct netfront_cb {
 /* IRQ name is queue name with "-tx" or "-rx" appended */
 #define IRQ_NAME_SIZE (QUEUE_NAME_SIZE + 3)
 
+struct grant {
+	grant_ref_t ref;
+	struct page *page;
+};
+
 struct netfront_stats {
 	u64			packets;
 	u64			bytes;
@@ -129,8 +134,7 @@ struct netfront_queue {
 		unsigned long link;
 	} tx_skbs[NET_TX_RING_SIZE];
 	grant_ref_t gref_tx_head;
-	grant_ref_t grant_tx_ref[NET_TX_RING_SIZE];
-	struct page *grant_tx_page[NET_TX_RING_SIZE];
+	struct grant grant_tx[NET_TX_RING_SIZE];
 	unsigned tx_skb_freelist;
 
 	spinlock_t   rx_lock ____cacheline_aligned_in_smp;
@@ -141,7 +145,7 @@ struct netfront_queue {
 
 	struct sk_buff *rx_skbs[NET_RX_RING_SIZE];
 	grant_ref_t gref_rx_head;
-	grant_ref_t grant_rx_ref[NET_RX_RING_SIZE];
+	struct grant grant_rx[NET_RX_RING_SIZE];
 };
 
 struct netfront_info {
@@ -213,8 +217,9 @@ static grant_ref_t xennet_get_rx_ref(struct netfront_queue *queue,
 					    RING_IDX ri)
 {
 	int i = xennet_rxidx(ri);
-	grant_ref_t ref = queue->grant_rx_ref[i];
-	queue->grant_rx_ref[i] = GRANT_INVALID_REF;
+	grant_ref_t ref = queue->grant_rx[i].ref;
+
+	queue->grant_rx[i].ref = GRANT_INVALID_REF;
 	return ref;
 }
 
@@ -306,7 +311,7 @@ static void xennet_alloc_rx_buffers(struct netfront_queue *queue)
 
 		ref = gnttab_claim_grant_reference(&queue->gref_rx_head);
 		BUG_ON((signed short)ref < 0);
-		queue->grant_rx_ref[id] = ref;
+		queue->grant_rx[id].ref = ref;
 
 		pfn = page_to_pfn(skb_frag_page(&skb_shinfo(skb)->frags[0]));
 
@@ -383,17 +388,17 @@ static void xennet_tx_buf_gc(struct netfront_queue *queue)
 			id  = txrsp->id;
 			skb = queue->tx_skbs[id].skb;
 			if (unlikely(gnttab_query_foreign_access(
-				queue->grant_tx_ref[id]) != 0)) {
+				queue->grant_tx[id].ref) != 0)) {
 				pr_alert("%s: warning -- grant still in use by backend domain\n",
 					 __func__);
 				BUG();
 			}
 			gnttab_end_foreign_access_ref(
-				queue->grant_tx_ref[id], GNTMAP_readonly);
+				queue->grant_tx[id].ref, GNTMAP_readonly);
 			gnttab_release_grant_reference(
-				&queue->gref_tx_head, queue->grant_tx_ref[id]);
-			queue->grant_tx_ref[id] = GRANT_INVALID_REF;
-			queue->grant_tx_page[id] = NULL;
+				&queue->gref_tx_head, queue->grant_tx[id].ref);
+			queue->grant_tx[id].ref = GRANT_INVALID_REF;
+			queue->grant_tx[id].page = NULL;
 			add_id_to_freelist(&queue->tx_skb_freelist, queue->tx_skbs, id);
 			dev_kfree_skb_irq(skb);
 		}
@@ -435,8 +440,8 @@ static struct xen_netif_tx_request *xennet_make_one_txreq(
 					page_to_mfn(page), GNTMAP_readonly);
 
 	queue->tx_skbs[id].skb = skb;
-	queue->grant_tx_page[id] = page;
-	queue->grant_tx_ref[id] = ref;
+	queue->grant_tx[id].page = page;
+	queue->grant_tx[id].ref = ref;
 
 	tx->id = id;
 	tx->gref = ref;
@@ -659,7 +664,7 @@ static void xennet_move_rx_slot(struct netfront_queue *queue, struct sk_buff *sk
 
 	BUG_ON(queue->rx_skbs[new]);
 	queue->rx_skbs[new] = skb;
-	queue->grant_rx_ref[new] = ref;
+	queue->grant_rx[new].ref = ref;
 	RING_GET_REQUEST(&queue->rx, queue->rx.req_prod_pvt)->id = new;
 	RING_GET_REQUEST(&queue->rx, queue->rx.req_prod_pvt)->gref = ref;
 	queue->rx.req_prod_pvt++;
@@ -1055,6 +1060,7 @@ static struct rtnl_link_stats64 *xennet_get_stats64(struct net_device *dev,
 static void xennet_release_tx_bufs(struct netfront_queue *queue)
 {
 	struct sk_buff *skb;
+	struct page *page;
 	int i;
 
 	for (i = 0; i < NET_TX_RING_SIZE; i++) {
@@ -1063,12 +1069,13 @@ static void xennet_release_tx_bufs(struct netfront_queue *queue)
 			continue;
 
 		skb = queue->tx_skbs[i].skb;
-		get_page(queue->grant_tx_page[i]);
-		gnttab_end_foreign_access(queue->grant_tx_ref[i],
+		page = queue->grant_tx[i].page;
+		get_page(page);
+		gnttab_end_foreign_access(queue->grant_tx[i].ref,
 					  GNTMAP_readonly,
-					  (unsigned long)page_address(queue->grant_tx_page[i]));
-		queue->grant_tx_page[i] = NULL;
-		queue->grant_tx_ref[i] = GRANT_INVALID_REF;
+					  (unsigned long)page_address(page));
+		queue->grant_tx[i].page = NULL;
+		queue->grant_tx[i].ref = GRANT_INVALID_REF;
 		add_id_to_freelist(&queue->tx_skb_freelist, queue->tx_skbs, i);
 		dev_kfree_skb_irq(skb);
 	}
@@ -1088,7 +1095,7 @@ static void xennet_release_rx_bufs(struct netfront_queue *queue)
 		if (!skb)
 			continue;
 
-		ref = queue->grant_rx_ref[id];
+		ref = queue->grant_rx[id].ref;
 		if (ref == GRANT_INVALID_REF)
 			continue;
 
@@ -1100,7 +1107,7 @@ static void xennet_release_rx_bufs(struct netfront_queue *queue)
 		get_page(page);
 		gnttab_end_foreign_access(ref, 0,
 					  (unsigned long)page_address(page));
-		queue->grant_rx_ref[id] = GRANT_INVALID_REF;
+		queue->grant_rx[id].ref = GRANT_INVALID_REF;
 
 		kfree_skb(skb);
 	}
@@ -1571,14 +1578,15 @@ static int xennet_init_queue(struct netfront_queue *queue)
 	queue->tx_skb_freelist = 0;
 	for (i = 0; i < NET_TX_RING_SIZE; i++) {
 		skb_entry_set_link(&queue->tx_skbs[i], i+1);
-		queue->grant_tx_ref[i] = GRANT_INVALID_REF;
-		queue->grant_tx_page[i] = NULL;
+		queue->grant_tx[i].ref = GRANT_INVALID_REF;
+		queue->grant_tx[i].page = NULL;
 	}
 
 	/* Clear out rx_skbs */
 	for (i = 0; i < NET_RX_RING_SIZE; i++) {
 		queue->rx_skbs[i] = NULL;
-		queue->grant_rx_ref[i] = GRANT_INVALID_REF;
+		queue->grant_rx[i].ref = GRANT_INVALID_REF;
+		queue->grant_rx[i].page = NULL;
 	}
 
 	/* A grant for every tx ring slot */
-- 
2.1.3

^ permalink raw reply related	[flat|nested] 98+ messages in thread

 {
 	struct sk_buff *skb;
+	struct page *page;
 	int i;
 
 	for (i = 0; i < NET_TX_RING_SIZE; i++) {
@@ -1063,12 +1069,13 @@ static void xennet_release_tx_bufs(struct netfront_queue *queue)
 			continue;
 
 		skb = queue->tx_skbs[i].skb;
-		get_page(queue->grant_tx_page[i]);
-		gnttab_end_foreign_access(queue->grant_tx_ref[i],
+		page = queue->grant_tx[i].page;
+		get_page(page);
+		gnttab_end_foreign_access(queue->grant_tx[i].ref,
 					  GNTMAP_readonly,
-					  (unsigned long)page_address(queue->grant_tx_page[i]));
-		queue->grant_tx_page[i] = NULL;
-		queue->grant_tx_ref[i] = GRANT_INVALID_REF;
+					  (unsigned long)page_address(page));
+		queue->grant_tx[i].page = NULL;
+		queue->grant_tx[i].ref = GRANT_INVALID_REF;
 		add_id_to_freelist(&queue->tx_skb_freelist, queue->tx_skbs, i);
 		dev_kfree_skb_irq(skb);
 	}
@@ -1088,7 +1095,7 @@ static void xennet_release_rx_bufs(struct netfront_queue *queue)
 		if (!skb)
 			continue;
 
-		ref = queue->grant_rx_ref[id];
+		ref = queue->grant_rx[id].ref;
 		if (ref == GRANT_INVALID_REF)
 			continue;
 
@@ -1100,7 +1107,7 @@ static void xennet_release_rx_bufs(struct netfront_queue *queue)
 		get_page(page);
 		gnttab_end_foreign_access(ref, 0,
 					  (unsigned long)page_address(page));
-		queue->grant_rx_ref[id] = GRANT_INVALID_REF;
+		queue->grant_rx[id].ref = GRANT_INVALID_REF;
 
 		kfree_skb(skb);
 	}
@@ -1571,14 +1578,15 @@ static int xennet_init_queue(struct netfront_queue *queue)
 	queue->tx_skb_freelist = 0;
 	for (i = 0; i < NET_TX_RING_SIZE; i++) {
 		skb_entry_set_link(&queue->tx_skbs[i], i+1);
-		queue->grant_tx_ref[i] = GRANT_INVALID_REF;
-		queue->grant_tx_page[i] = NULL;
+		queue->grant_tx[i].ref = GRANT_INVALID_REF;
+		queue->grant_tx[i].page = NULL;
 	}
 
 	/* Clear out rx_skbs */
 	for (i = 0; i < NET_RX_RING_SIZE; i++) {
 		queue->rx_skbs[i] = NULL;
-		queue->grant_rx_ref[i] = GRANT_INVALID_REF;
+		queue->grant_rx[i].ref = GRANT_INVALID_REF;
+		queue->grant_rx[i].page = NULL;
 	}
 
 	/* A grant for every tx ring slot */
-- 
2.1.3

^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [RFC PATCH 10/13] xen-netfront: refactor claim/release grant
  2015-05-12 17:18 ` Joao Martins
@ 2015-05-12 17:18   ` Joao Martins
  -1 siblings, 0 replies; 98+ messages in thread
From: Joao Martins @ 2015-05-12 17:18 UTC (permalink / raw)
  To: xen-devel, netdev
  Cc: wei.liu2, ian.campbell, Joao Martins, david.vrabel, boris.ostrovsky

Refactor how grants are claimed/released/revoked by moving that code
into claim_grant and release_grant helper routines that can be shared
by both the TX and RX paths.

Signed-off-by: Joao Martins <joao.martins@neclab.eu>
---
 drivers/net/xen-netfront.c | 87 ++++++++++++++++++++++++++++++----------------
 1 file changed, 58 insertions(+), 29 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 8f49ed4..99c17c9 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -256,6 +256,43 @@ static void xennet_maybe_wake_tx(struct netfront_queue *queue)
 		netif_tx_wake_queue(netdev_get_tx_queue(dev, queue->id));
 }
 
+static grant_ref_t claim_grant(struct page *page,
+			       grant_ref_t *gref_head,
+			       int otherend_id,
+			       int flags)
+{
+	grant_ref_t ref;
+	unsigned long mfn;
+
+	ref = gnttab_claim_grant_reference(gref_head);
+	BUG_ON(ref < 0);
+
+	mfn = pfn_to_mfn(page_to_pfn(page));
+	gnttab_grant_foreign_access_ref(
+		ref, otherend_id, mfn, flags);
+
+	return ref;
+}
+
+static void release_grant(grant_ref_t ref,
+			  grant_ref_t *gref_head,
+			  int otherend_id,
+			  int flags)
+{
+	int ret;
+
+	if (unlikely(gnttab_query_foreign_access(
+		ref) != 0)) {
+		pr_alert("%s: warning -- grant still in use by backend domain\n",
+			 __func__);
+		BUG();
+	}
+
+	ret = gnttab_end_foreign_access_ref(ref, flags);
+	BUG_ON(!ret);
+
+	gnttab_release_grant_reference(gref_head, ref);
+}
 
 static struct sk_buff *xennet_alloc_one_rx_buffer(struct netfront_queue *queue)
 {
@@ -297,7 +334,7 @@ static void xennet_alloc_rx_buffers(struct netfront_queue *queue)
 		struct sk_buff *skb;
 		unsigned short id;
 		grant_ref_t ref;
-		unsigned long pfn;
+		struct page *page;
 		struct xen_netif_rx_request *req;
 
 		skb = xennet_alloc_one_rx_buffer(queue);
@@ -309,17 +346,15 @@ static void xennet_alloc_rx_buffers(struct netfront_queue *queue)
 		BUG_ON(queue->rx_skbs[id]);
 		queue->rx_skbs[id] = skb;
 
-		ref = gnttab_claim_grant_reference(&queue->gref_rx_head);
-		BUG_ON((signed short)ref < 0);
-		queue->grant_rx[id].ref = ref;
+		page = skb_frag_page(&skb_shinfo(skb)->frags[0]);
+		ref = claim_grant(page,
+				  &queue->gref_rx_head,
+				  queue->info->xbdev->otherend_id,
+				  0);
 
-		pfn = page_to_pfn(skb_frag_page(&skb_shinfo(skb)->frags[0]));
+		queue->grant_rx[id].ref = ref;
 
 		req = RING_GET_REQUEST(&queue->rx, req_prod);
-		gnttab_grant_foreign_access_ref(ref,
-						queue->info->xbdev->otherend_id,
-						pfn_to_mfn(pfn),
-						0);
 
 		req->id = id;
 		req->gref = ref;
@@ -387,16 +422,12 @@ static void xennet_tx_buf_gc(struct netfront_queue *queue)
 
 			id  = txrsp->id;
 			skb = queue->tx_skbs[id].skb;
-			if (unlikely(gnttab_query_foreign_access(
-				queue->grant_tx[id].ref) != 0)) {
-				pr_alert("%s: warning -- grant still in use by backend domain\n",
-					 __func__);
-				BUG();
-			}
-			gnttab_end_foreign_access_ref(
-				queue->grant_tx[id].ref, GNTMAP_readonly);
-			gnttab_release_grant_reference(
-				&queue->gref_tx_head, queue->grant_tx[id].ref);
+
+			release_grant(queue->grant_tx[id].ref,
+				      &queue->gref_tx_head,
+				      queue->info->xbdev->otherend_id,
+				      GNTMAP_readonly);
+
 			queue->grant_tx[id].ref = GRANT_INVALID_REF;
 			queue->grant_tx[id].page = NULL;
 			add_id_to_freelist(&queue->tx_skb_freelist, queue->tx_skbs, id);
@@ -433,11 +464,10 @@ static struct xen_netif_tx_request *xennet_make_one_txreq(
 
 	id = get_id_from_freelist(&queue->tx_skb_freelist, queue->tx_skbs);
 	tx = RING_GET_REQUEST(&queue->tx, queue->tx.req_prod_pvt++);
-	ref = gnttab_claim_grant_reference(&queue->gref_tx_head);
-	BUG_ON((signed short)ref < 0);
-
-	gnttab_grant_foreign_access_ref(ref, queue->info->xbdev->otherend_id,
-					page_to_mfn(page), GNTMAP_readonly);
+	ref = claim_grant(page,
+			  &queue->gref_tx_head,
+			  queue->info->xbdev->otherend_id,
+			  GNTMAP_readonly);
 
 	queue->tx_skbs[id].skb = skb;
 	queue->grant_tx[id].page = page;
@@ -727,7 +757,6 @@ static int xennet_get_responses(struct netfront_queue *queue,
 	int max = MAX_SKB_FRAGS + (rx->status <= RX_COPY_THRESHOLD);
 	int slots = 1;
 	int err = 0;
-	unsigned long ret;
 
 	if (rx->flags & XEN_NETRXF_extra_info) {
 		err = xennet_get_extras(queue, extras, rp);
@@ -758,10 +787,10 @@ static int xennet_get_responses(struct netfront_queue *queue,
 			goto next;
 		}
 
-		ret = gnttab_end_foreign_access_ref(ref, 0);
-		BUG_ON(!ret);
-
-		gnttab_release_grant_reference(&queue->gref_rx_head, ref);
+		release_grant(ref,
+			      &queue->gref_rx_head,
+			      queue->info->xbdev->otherend_id,
+			      0);
 
 		__skb_queue_tail(list, skb);
 
-- 
2.1.3

^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [RFC PATCH 11/13] xen-netfront: feature-persistent xenbus support
  2015-05-12 17:18 ` Joao Martins
@ 2015-05-12 17:18   ` Joao Martins
  -1 siblings, 0 replies; 98+ messages in thread
From: Joao Martins @ 2015-05-12 17:18 UTC (permalink / raw)
  To: xen-devel, netdev
  Cc: wei.liu2, ian.campbell, Joao Martins, david.vrabel, boris.ostrovsky

"feature-persistent" check on xenbus for persistent grants
support on the backend.

Signed-off-by: Joao Martins <joao.martins@neclab.eu>
---
 drivers/net/xen-netfront.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 99c17c9..7f44cc7 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -162,6 +162,8 @@ struct netfront_info {
 	struct netfront_stats __percpu *tx_stats;
 
 	atomic_t rx_gso_checksum_fixup;
+
+	unsigned int feature_persistent:1;
 };
 
 struct netfront_rx_info {
@@ -1919,6 +1921,12 @@ again:
 		goto abort_transaction;
 	}
 
+	err = xenbus_write(xbt, dev->nodename, "feature-persistent", "1");
+	if (err) {
+		message = "writing feature-persistent";
+		goto abort_transaction;
+	}
+
 	err = xenbus_transaction_end(xbt, 0);
 	if (err) {
 		if (err == -EAGAIN)
@@ -1950,6 +1958,7 @@ static int xennet_connect(struct net_device *dev)
 	unsigned int num_queues = 0;
 	int err;
 	unsigned int feature_rx_copy;
+	unsigned int feature_persistent;
 	unsigned int j = 0;
 	struct netfront_queue *queue = NULL;
 
@@ -1964,6 +1973,13 @@ static int xennet_connect(struct net_device *dev)
 		return -ENODEV;
 	}
 
+	err = xenbus_gather(XBT_NIL, np->xbdev->otherend,
+			    "feature-persistent", "%u", &feature_persistent,
+			    NULL);
+	if (err)
+		feature_persistent = 0;
+	np->feature_persistent = !!feature_persistent;
+
 	err = talk_to_netback(np->xbdev, np);
 	if (err)
 		return err;
-- 
2.1.3

^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [RFC PATCH 12/13] xen-netfront: implement TX persistent grants
  2015-05-12 17:18 ` Joao Martins
@ 2015-05-12 17:18   ` Joao Martins
  -1 siblings, 0 replies; 98+ messages in thread
From: Joao Martins @ 2015-05-12 17:18 UTC (permalink / raw)
  To: xen-devel, netdev
  Cc: wei.liu2, ian.campbell, Joao Martins, david.vrabel, boris.ostrovsky

Instead of granting/revoking the buffer related to the skb, use an
already granted page and memcpy to it. The grants will be mapped by
xen-netback and reused over time, but only unmapped when the vif
disconnects, as opposed to on every packet.

This is done only when the backend supports persistent grants, since
it would otherwise add the overhead of a memcpy on top of the grant
map.

Signed-off-by: Joao Martins <joao.martins@neclab.eu>
---
 drivers/net/xen-netfront.c | 45 ++++++++++++++++++++++++++++++---------------
 1 file changed, 30 insertions(+), 15 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 7f44cc7..ae0a13b 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -408,6 +408,7 @@ static void xennet_tx_buf_gc(struct netfront_queue *queue)
 	RING_IDX cons, prod;
 	unsigned short id;
 	struct sk_buff *skb;
+	unsigned use_persistent_gnts = queue->info->feature_persistent;
 
 	BUG_ON(!netif_carrier_ok(queue->info->netdev));
 
@@ -425,13 +426,16 @@ static void xennet_tx_buf_gc(struct netfront_queue *queue)
 			id  = txrsp->id;
 			skb = queue->tx_skbs[id].skb;
 
-			release_grant(queue->grant_tx[id].ref,
-				      &queue->gref_tx_head,
-				      queue->info->xbdev->otherend_id,
-				      GNTMAP_readonly);
+			if (!use_persistent_gnts) {
+				release_grant(queue->grant_tx[id].ref,
+					      &queue->gref_tx_head,
+					      queue->info->xbdev->otherend_id,
+					      GNTMAP_readonly);
+
+				queue->grant_tx[id].ref = GRANT_INVALID_REF;
+				queue->grant_tx[id].page = NULL;
+			}
 
-			queue->grant_tx[id].ref = GRANT_INVALID_REF;
-			queue->grant_tx[id].page = NULL;
 			add_id_to_freelist(&queue->tx_skb_freelist, queue->tx_skbs, id);
 			dev_kfree_skb_irq(skb);
 		}
@@ -460,23 +464,31 @@ static struct xen_netif_tx_request *xennet_make_one_txreq(
 {
 	unsigned int id;
 	struct xen_netif_tx_request *tx;
-	grant_ref_t ref;
+	struct grant *gnt;
 
 	len = min_t(unsigned int, PAGE_SIZE - offset, len);
 
 	id = get_id_from_freelist(&queue->tx_skb_freelist, queue->tx_skbs);
 	tx = RING_GET_REQUEST(&queue->tx, queue->tx.req_prod_pvt++);
-	ref = claim_grant(page,
-			  &queue->gref_tx_head,
-			  queue->info->xbdev->otherend_id,
-			  GNTMAP_readonly);
+	gnt = &queue->grant_tx[id];
+
+	if (queue->info->feature_persistent)
+		memcpy(pfn_to_kaddr(page_to_pfn(gnt->page)) + offset,
+		       pfn_to_kaddr(page_to_pfn(page)) + offset,
+		       len);
+	else
+		gnt->page = page;
+
+	if (gnt->ref == GRANT_INVALID_REF)
+		gnt->ref = claim_grant(gnt->page,
+				       &queue->gref_tx_head,
+				       queue->info->xbdev->otherend_id,
+				       GNTMAP_readonly);
 
 	queue->tx_skbs[id].skb = skb;
-	queue->grant_tx[id].page = page;
-	queue->grant_tx[id].ref = ref;
 
 	tx->id = id;
-	tx->gref = ref;
+	tx->gref = gnt->ref;
 	tx->offset = offset;
 	tx->size = len;
 	tx->flags = 0;
@@ -1610,7 +1622,10 @@ static int xennet_init_queue(struct netfront_queue *queue)
 	for (i = 0; i < NET_TX_RING_SIZE; i++) {
 		skb_entry_set_link(&queue->tx_skbs[i], i+1);
 		queue->grant_tx[i].ref = GRANT_INVALID_REF;
-		queue->grant_tx[i].page = NULL;
+		if (queue->info->feature_persistent)
+			queue->grant_tx[i].page = alloc_page(GFP_NOIO);
+		else
+			queue->grant_tx[i].page = NULL;
 	}
 
 	/* Clear out rx_skbs */
-- 
2.1.3

^ permalink raw reply related	[flat|nested] 98+ messages in thread

* [RFC PATCH 13/13] xen-netfront: implement RX persistent grants
  2015-05-12 17:18 ` Joao Martins
@ 2015-05-12 17:18   ` Joao Martins
  -1 siblings, 0 replies; 98+ messages in thread
From: Joao Martins @ 2015-05-12 17:18 UTC (permalink / raw)
  To: xen-devel, netdev
  Cc: wei.liu2, ian.campbell, Joao Martins, david.vrabel, boris.ostrovsky

Allow a newly allocated skb to reuse a gref taken from the
pending_ring, which means xennet grants the pages once and releases
them only when freeing the device. This changes how netfront handles
new skbs so it can reuse the allocated pages, similarly to what
netback already does on its TX path.

alloc_rx_buffers() consumes pages from the pending_ring to allocate
new skbs. When responses are handled, the grants are moved from
grant_rx to pending_grants; the latter is a shadow ring that keeps
all grants belonging to inflight skbs. The ubuf_info of all the skb's
frags are then chained together before the packet is passed up to the
network stack. SKBTX_DEV_ZEROCOPY is used to get notified once the
skb is freed, so that its pages can be reused: the destructor
callback then adds the grant back to the pending_ring.

The only catch of this approach: when frags are orphaned,
skb_copy_ubufs() performs a memcpy (if the skb has any frags).
Depending on the CPU and number of queues this leads to a performance
drop of 7-11%. For this reason, SKBTX_DEV_ZEROCOPY skbs are only used
with persistent grants.

Signed-off-by: Joao Martins <joao.martins@neclab.eu>
---
 drivers/net/xen-netfront.c | 212 ++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 202 insertions(+), 10 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index ae0a13b..7067bbb 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -67,6 +67,7 @@ static const struct ethtool_ops xennet_ethtool_ops;
 
 struct netfront_cb {
 	int pull_to;
+	u16 pending_idx;
 };
 
 #define NETFRONT_SKB_CB(skb)	((struct netfront_cb *)((skb)->cb))
@@ -87,9 +88,13 @@ struct netfront_cb {
 /* IRQ name is queue name with "-tx" or "-rx" appended */
 #define IRQ_NAME_SIZE (QUEUE_NAME_SIZE + 3)
 
+#define callback_param(queue, id) \
+	(queue->pending_grants[id].callback_struct)
+
 struct grant {
 	grant_ref_t ref;
 	struct page *page;
+	struct ubuf_info callback_struct;
 };
 
 struct netfront_stats {
@@ -146,6 +151,21 @@ struct netfront_queue {
 	struct sk_buff *rx_skbs[NET_RX_RING_SIZE];
 	grant_ref_t gref_rx_head;
 	struct grant grant_rx[NET_RX_RING_SIZE];
+
+	/* Store the grants inflight or freed.
+	 * Only used when persistent grants are enabled
+	 */
+	struct grant pending_grants[NET_RX_RING_SIZE];
+	/* Ring containing the indexes of the free grants */
+	u16 pending_ring[NET_RX_RING_SIZE];
+	unsigned pending_cons;
+	unsigned pending_prod;
+	/* Used to represent how many grants are still inflight */
+	unsigned pending_event;
+
+	/* Protects zerocopy callbacks to race over pending_ring */
+	spinlock_t callback_lock;
+	atomic_t inflight_packets;
 };
 
 struct netfront_info {
@@ -296,6 +316,50 @@ static void release_grant(grant_ref_t ref,
 	gnttab_release_grant_reference(gref_head, ref);
 }
 
+static struct grant *xennet_get_pending_gnt(struct netfront_queue *queue,
+					    unsigned ri)
+{
+	int pending_idx = xennet_rxidx(ri);
+	u16 id = queue->pending_ring[pending_idx];
+
+	return &queue->pending_grants[id];
+}
+
+static void xennet_set_pending_gnt(struct netfront_queue *queue,
+				   grant_ref_t ref, struct sk_buff *skb)
+{
+	int i = xennet_rxidx(queue->pending_event++);
+	struct grant *gnt = &queue->pending_grants[i];
+
+	gnt->ref = ref;
+	gnt->page = skb_frag_page(&skb_shinfo(skb)->frags[0]);
+	NETFRONT_SKB_CB(skb)->pending_idx = gnt->callback_struct.desc;
+}
+
+static bool pending_grant_available(struct netfront_queue *queue)
+{
+	return (queue->pending_prod - queue->pending_cons);
+}
+
+static struct page *xennet_alloc_page(struct netfront_queue *queue,
+				      struct netfront_cb *cb)
+{
+	struct page *page;
+	struct grant *gnt;
+
+	if (!queue->info->feature_persistent)
+		return alloc_page(GFP_ATOMIC | __GFP_NOWARN);
+
+	if (unlikely(!pending_grant_available(queue)))
+		return NULL;
+
+	gnt = xennet_get_pending_gnt(queue, queue->pending_cons++);
+	cb->pending_idx = gnt - queue->pending_grants;
+	page = gnt->page;
+	gnt->page = NULL;
+	return page;
+}
+
 static struct sk_buff *xennet_alloc_one_rx_buffer(struct netfront_queue *queue)
 {
* [RFC PATCH 13/13] xen-netfront: implement RX persistent grants
@ 2015-05-12 17:18   ` Joao Martins
  0 siblings, 0 replies; 98+ messages in thread
From: Joao Martins @ 2015-05-12 17:18 UTC (permalink / raw)
  To: xen-devel, netdev
  Cc: wei.liu2, ian.campbell, Joao Martins, david.vrabel, boris.ostrovsky

It allows a newly allocated skb to reuse a gref taken from the
pending_ring, which means xennet grants the pages once and releases
them only when freeing the device. It changes how netfront handles new
skbs so that the allocated pages can be reused, similarly to what
netback already does on its TX path.

alloc_rx_buffers() consumes pages from the pending_ring to allocate
new skbs. When responses are handled, the grants are moved from
grant_rx to pending_grants, a shadow ring that keeps all grants
belonging to inflight skbs. All the skbs' ubuf_info are then chained
together before the packet is passed up to the network stack. We make
use of SKBTX_DEV_ZEROCOPY to be notified once the skb is freed, so
that the pages can be reused; the destructor callback then returns the
grant to the pending_ring.
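The pending_ring described here is essentially a ring of free grant
indexes with free-running producer/consumer counters masked by the ring
size. A standalone userspace sketch of that recycling scheme (names
mirror the patch, but this is an illustrative model, not the driver
code):

```c
#include <assert.h>

#define RING_SIZE 256 /* must be a power of two, like NET_RX_RING_SIZE */

/* Simplified model of the pending_ring free-index recycling: the
 * zerocopy callback produces freed indexes, skb allocation consumes
 * them. Counters run free; indexing masks by RING_SIZE - 1. */
struct pending_ring {
	unsigned short ring[RING_SIZE];
	unsigned prod;	/* indexes returned by the zerocopy callback */
	unsigned cons;	/* indexes consumed when allocating new skbs */
};

static void pending_init(struct pending_ring *p)
{
	unsigned i;

	for (i = 0; i < RING_SIZE; i++)
		p->ring[i] = i;
	p->prod = RING_SIZE; /* all grants start out free */
	p->cons = 0;
}

static unsigned pending_available(const struct pending_ring *p)
{
	return p->prod - p->cons; /* unsigned wrap-around is intentional */
}

static unsigned short pending_get(struct pending_ring *p)
{
	return p->ring[p->cons++ & (RING_SIZE - 1)];
}

static void pending_put(struct pending_ring *p, unsigned short idx)
{
	p->ring[p->prod++ & (RING_SIZE - 1)] = idx;
}
```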

The only catch with this approach: when frags are orphaned,
skb_copy_ubufs() performs a memcpy (if nr_frags > 0). Depending on the
CPU and number of queues, this leads to a performance drop of 7-11%.
For this reason, SKBTX_DEV_ZEROCOPY skbs are only used with persistent
grants.

Signed-off-by: Joao Martins <joao.martins@neclab.eu>
---
 drivers/net/xen-netfront.c | 212 ++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 202 insertions(+), 10 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index ae0a13b..7067bbb 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -67,6 +67,7 @@ static const struct ethtool_ops xennet_ethtool_ops;
 
 struct netfront_cb {
 	int pull_to;
+	u16 pending_idx;
 };
 
 #define NETFRONT_SKB_CB(skb)	((struct netfront_cb *)((skb)->cb))
@@ -87,9 +88,13 @@ struct netfront_cb {
 /* IRQ name is queue name with "-tx" or "-rx" appended */
 #define IRQ_NAME_SIZE (QUEUE_NAME_SIZE + 3)
 
+#define callback_param(queue, id) \
+	(queue->pending_grants[id].callback_struct)
+
 struct grant {
 	grant_ref_t ref;
 	struct page *page;
+	struct ubuf_info callback_struct;
 };
 
 struct netfront_stats {
@@ -146,6 +151,21 @@ struct netfront_queue {
 	struct sk_buff *rx_skbs[NET_RX_RING_SIZE];
 	grant_ref_t gref_rx_head;
 	struct grant grant_rx[NET_RX_RING_SIZE];
+
+	/* Store the grants inflight or freed.
+	 * Only used when persistent grants are enabled
+	 */
+	struct grant pending_grants[NET_RX_RING_SIZE];
+	/* Ring containing the indexes of the free grants */
+	u16 pending_ring[NET_RX_RING_SIZE];
+	unsigned pending_cons;
+	unsigned pending_prod;
+	/* Used to represent how many grants are still inflight */
+	unsigned pending_event;
+
+	/* Protects zerocopy callbacks from racing over the pending_ring */
+	spinlock_t callback_lock;
+	atomic_t inflight_packets;
 };
 
 struct netfront_info {
@@ -296,6 +316,50 @@ static void release_grant(grant_ref_t ref,
 	gnttab_release_grant_reference(gref_head, ref);
 }
 
+static struct grant *xennet_get_pending_gnt(struct netfront_queue *queue,
+					    unsigned ri)
+{
+	int pending_idx = xennet_rxidx(ri);
+	u16 id = queue->pending_ring[pending_idx];
+
+	return &queue->pending_grants[id];
+}
+
+static void xennet_set_pending_gnt(struct netfront_queue *queue,
+				   grant_ref_t ref, struct sk_buff *skb)
+{
+	int i = xennet_rxidx(queue->pending_event++);
+	struct grant *gnt = &queue->pending_grants[i];
+
+	gnt->ref = ref;
+	gnt->page = skb_frag_page(&skb_shinfo(skb)->frags[0]);
+	NETFRONT_SKB_CB(skb)->pending_idx = gnt->callback_struct.desc;
+}
+
+static bool pending_grant_available(struct netfront_queue *queue)
+{
+	return (queue->pending_prod - queue->pending_cons);
+}
+
+static struct page *xennet_alloc_page(struct netfront_queue *queue,
+				      struct netfront_cb *cb)
+{
+	struct page *page;
+	struct grant *gnt;
+
+	if (!queue->info->feature_persistent)
+		return alloc_page(GFP_ATOMIC | __GFP_NOWARN);
+
+	if (unlikely(!pending_grant_available(queue)))
+		return NULL;
+
+	gnt = xennet_get_pending_gnt(queue, queue->pending_cons++);
+	cb->pending_idx = gnt - queue->pending_grants;
+	page = gnt->page;
+	gnt->page = NULL;
+	return page;
+}
+
 static struct sk_buff *xennet_alloc_one_rx_buffer(struct netfront_queue *queue)
 {
 	struct sk_buff *skb;
@@ -307,7 +371,7 @@ static struct sk_buff *xennet_alloc_one_rx_buffer(struct netfront_queue *queue)
 	if (unlikely(!skb))
 		return NULL;
 
-	page = alloc_page(GFP_ATOMIC | __GFP_NOWARN);
+	page = xennet_alloc_page(queue, NETFRONT_SKB_CB(skb));
 	if (!page) {
 		kfree_skb(skb);
 		return NULL;
@@ -317,6 +381,7 @@ static struct sk_buff *xennet_alloc_one_rx_buffer(struct netfront_queue *queue)
 	/* Align ip header to a 16 bytes boundary */
 	skb_reserve(skb, NET_IP_ALIGN);
 	skb->dev = queue->info->netdev;
+	skb_shinfo(skb)->destructor_arg = NULL;
 
 	return skb;
 }
@@ -324,6 +389,7 @@ static struct sk_buff *xennet_alloc_one_rx_buffer(struct netfront_queue *queue)
 
 static void xennet_alloc_rx_buffers(struct netfront_queue *queue)
 {
+	bool use_persistent_gnts = queue->info->feature_persistent;
 	RING_IDX req_prod = queue->rx.req_prod_pvt;
 	int notify;
 
@@ -343,16 +409,24 @@ static void xennet_alloc_rx_buffers(struct netfront_queue *queue)
 		if (!skb)
 			break;
 
+		ref = GRANT_INVALID_REF;
+		if (use_persistent_gnts) {
+			id = NETFRONT_SKB_CB(skb)->pending_idx;
+			ref = queue->pending_grants[id].ref;
+			queue->pending_grants[id].ref = GRANT_INVALID_REF;
+		}
+
 		id = xennet_rxidx(req_prod);
 
 		BUG_ON(queue->rx_skbs[id]);
 		queue->rx_skbs[id] = skb;
 
 		page = skb_frag_page(&skb_shinfo(skb)->frags[0]);
-		ref = claim_grant(page,
-				  &queue->gref_rx_head,
-				  queue->info->xbdev->otherend_id,
-				  0);
+		if (ref == GRANT_INVALID_REF)
+			ref = claim_grant(page,
+					  &queue->gref_rx_head,
+					  queue->info->xbdev->otherend_id,
+					  0);
 
 		queue->grant_rx[id].ref = ref;
 
@@ -771,6 +845,10 @@ static int xennet_get_responses(struct netfront_queue *queue,
 	int max = MAX_SKB_FRAGS + (rx->status <= RX_COPY_THRESHOLD);
 	int slots = 1;
 	int err = 0;
+	bool use_persistent_gnts = queue->info->feature_persistent;
+
+	if (use_persistent_gnts)
+		xennet_set_pending_gnt(queue, ref, skb);
 
 	if (rx->flags & XEN_NETRXF_extra_info) {
 		err = xennet_get_extras(queue, extras, rp);
@@ -801,10 +879,11 @@ static int xennet_get_responses(struct netfront_queue *queue,
 			goto next;
 		}
 
-		release_grant(ref,
-			      &queue->gref_rx_head,
-			      queue->info->xbdev->otherend_id,
-			      0);
+		if (!use_persistent_gnts)
+			release_grant(ref,
+				      &queue->gref_rx_head,
+				      queue->info->xbdev->otherend_id,
+				      0);
 
 		__skb_queue_tail(list, skb);
 
@@ -822,6 +901,8 @@ next:
 		rx = RING_GET_RESPONSE(&queue->rx, cons + slots);
 		skb = xennet_get_rx_skb(queue, cons + slots);
 		ref = xennet_get_rx_ref(queue, cons + slots);
+		if (use_persistent_gnts)
+			xennet_set_pending_gnt(queue, ref, skb);
 		slots++;
 	}
 
@@ -866,6 +947,50 @@ static int xennet_set_skb_gso(struct sk_buff *skb,
 	return 0;
 }
 
+static void xennet_zerocopy_prepare(struct netfront_queue *queue,
+				    struct sk_buff *skb)
+{
+	skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY;
+	atomic_inc(&queue->inflight_packets);
+}
+
+static void xennet_zerocopy_complete(struct netfront_queue *queue)
+{
+	atomic_dec(&queue->inflight_packets);
+}
+
+static inline struct netfront_queue *ubuf_to_queue(const struct ubuf_info *ubuf)
+{
+	u16 pending_idx = ubuf->desc;
+	struct grant *tmp =
+		container_of(ubuf, struct grant, callback_struct);
+	return container_of(tmp - pending_idx,
+			    struct netfront_queue,
+			    pending_grants[0]);
+}
+
+static void xennet_zerocopy_callback(struct ubuf_info *ubuf,
+				     bool zerocopy_success)
+{
+	struct netfront_queue *queue = ubuf_to_queue(ubuf);
+	unsigned long flags;
+
+	spin_lock_irqsave(&queue->callback_lock, flags);
+	do {
+		int index = xennet_rxidx(queue->pending_prod++);
+
+		BUG_ON(queue->pending_prod - queue->pending_cons
+				>= NET_RX_RING_SIZE);
+		queue->pending_ring[index] = ubuf->desc;
+		ubuf = (struct ubuf_info *)ubuf->ctx;
+	} while (ubuf);
+	spin_unlock_irqrestore(&queue->callback_lock, flags);
+
+	BUG_ON(queue->pending_prod > queue->pending_event);
+
+	xennet_zerocopy_complete(queue);
+}
+
 static RING_IDX xennet_fill_frags(struct netfront_queue *queue,
 				  struct sk_buff *skb,
 				  struct sk_buff_head *list)
@@ -873,6 +998,9 @@ static RING_IDX xennet_fill_frags(struct netfront_queue *queue,
 	struct skb_shared_info *shinfo = skb_shinfo(skb);
 	RING_IDX cons = queue->rx.rsp_cons;
 	struct sk_buff *nskb;
+	bool use_persistent_gnts = queue->info->feature_persistent;
+	u16 prev_pending_idx = NETFRONT_SKB_CB(skb)->pending_idx;
+	u16 pending_idx;
 
 	while ((nskb = __skb_dequeue(list))) {
 		struct xen_netif_rx_response *rx =
@@ -887,6 +1015,16 @@ static RING_IDX xennet_fill_frags(struct netfront_queue *queue,
 		}
 		BUG_ON(shinfo->nr_frags >= MAX_SKB_FRAGS);
 
+		/* Chain it to the previous */
+		if (use_persistent_gnts) {
+			pending_idx = NETFRONT_SKB_CB(nskb)->pending_idx;
+			callback_param(queue, prev_pending_idx).ctx =
+					&callback_param(queue, pending_idx);
+			callback_param(queue, pending_idx).ctx = NULL;
+			prev_pending_idx = pending_idx;
+			get_page(skb_frag_page(nfrag));
+		}
+
 		skb_add_rx_frag(skb, shinfo->nr_frags, skb_frag_page(nfrag),
 				rx->offset, rx->status, PAGE_SIZE);
 
@@ -939,6 +1077,9 @@ static int handle_incoming_queue(struct netfront_queue *queue,
 		skb_reset_network_header(skb);
 
 		if (checksum_setup(queue->info->netdev, skb)) {
+			if (skb_shinfo(skb)->destructor_arg)
+				xennet_zerocopy_prepare(queue, skb);
+
 			kfree_skb(skb);
 			packets_dropped++;
 			queue->info->netdev->stats.rx_errors++;
@@ -950,8 +1091,11 @@ static int handle_incoming_queue(struct netfront_queue *queue,
 		rx_stats->bytes += skb->len;
 		u64_stats_update_end(&rx_stats->syncp);
 
+		if (skb_shinfo(skb)->destructor_arg)
+			xennet_zerocopy_prepare(queue, skb);
+
 		/* Pass it up. */
-		napi_gro_receive(&queue->napi, skb);
+		netif_receive_skb(skb);
 	}
 
 	return packets_dropped;
@@ -1015,6 +1159,16 @@ err:
 		if (NETFRONT_SKB_CB(skb)->pull_to > RX_COPY_THRESHOLD)
 			NETFRONT_SKB_CB(skb)->pull_to = RX_COPY_THRESHOLD;
 
+		if (queue->info->feature_persistent) {
+			u16 pending_idx;
+
+			pending_idx = NETFRONT_SKB_CB(skb)->pending_idx;
+			callback_param(queue, pending_idx).ctx = NULL;
+			skb_shinfo(skb)->destructor_arg =
+				&callback_param(queue, pending_idx);
+			get_page(skb_frag_page(&skb_shinfo(skb)->frags[0]));
+		}
+
 		skb_shinfo(skb)->frags[0].page_offset = rx->offset;
 		skb_frag_size_set(&skb_shinfo(skb)->frags[0], rx->status);
 		skb->data_len = rx->status;
@@ -1158,6 +1312,25 @@ static void xennet_release_rx_bufs(struct netfront_queue *queue)
 	spin_unlock_bh(&queue->rx_lock);
 }
 
+static void xennet_release_pending(struct netfront_queue *queue)
+{
+	RING_IDX i;
+
+	for (i = queue->pending_prod;
+	     i < queue->pending_event; i++) {
+		struct grant *gnt = xennet_get_pending_gnt(queue, i);
+		struct page *page = gnt->page;
+
+		if (gnt->ref == GRANT_INVALID_REF)
+			continue;
+
+		get_page(page);
+		gnttab_end_foreign_access(gnt->ref, 0,
+					  (unsigned long)page_address(page));
+		gnt->ref = GRANT_INVALID_REF;
+	}
+}
+
 static netdev_features_t xennet_fix_features(struct net_device *dev,
 	netdev_features_t features)
 {
@@ -1407,6 +1580,9 @@ static void xennet_disconnect_backend(struct netfront_info *info)
 
 		xennet_release_tx_bufs(queue);
 		xennet_release_rx_bufs(queue);
+		if (queue->info->feature_persistent)
+			xennet_release_pending(queue);
+
 		gnttab_free_grant_references(queue->gref_tx_head);
 		gnttab_free_grant_references(queue->gref_rx_head);
 
@@ -1609,6 +1785,7 @@ static int xennet_init_queue(struct netfront_queue *queue)
 
 	spin_lock_init(&queue->tx_lock);
 	spin_lock_init(&queue->rx_lock);
+	spin_lock_init(&queue->callback_lock);
 
 	init_timer(&queue->rx_refill_timer);
 	queue->rx_refill_timer.data = (unsigned long)queue;
@@ -1633,8 +1810,23 @@ static int xennet_init_queue(struct netfront_queue *queue)
 		queue->rx_skbs[i] = NULL;
 		queue->grant_rx[i].ref = GRANT_INVALID_REF;
 		queue->grant_rx[i].page = NULL;
+
+		if (!queue->info->feature_persistent)
+			continue;
+
+		queue->pending_grants[i].ref = GRANT_INVALID_REF;
+		queue->pending_grants[i].page = alloc_page(GFP_NOIO);
+		queue->pending_grants[i].callback_struct = (struct ubuf_info)
+			{ .callback = xennet_zerocopy_callback,
+			  .ctx = NULL,
+			  .desc = (unsigned long)i };
+
+		queue->pending_ring[i] = i;
+		queue->pending_prod++;
 	}
 
+	queue->pending_event = queue->pending_prod;
+
 	/* A grant for every tx ring slot */
 	if (gnttab_alloc_grant_references(NET_TX_RING_SIZE,
 					  &queue->gref_tx_head) < 0) {
-- 
2.1.3

^ permalink raw reply related	[flat|nested] 98+ messages in thread
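As an aside on the ubuf_to_queue() helper in the patch above: it
recovers the enclosing netfront_queue from an embedded ubuf_info by
stepping back pending_idx grant entries and applying container_of. A
minimal userspace model of that pointer arithmetic (the simplified
struct layouts here are assumptions for illustration only):

```c
#include <assert.h>
#include <stddef.h>

#define NR_GRANTS 4

/* Cut-down stand-ins for the kernel structures. */
struct ubuf_info { unsigned long desc; };
struct grant { int ref; struct ubuf_info callback_struct; };
struct queue { int id; struct grant pending_grants[NR_GRANTS]; };

/* Userspace equivalent of the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static struct queue *ubuf_to_queue(struct ubuf_info *ubuf)
{
	unsigned long pending_idx = ubuf->desc;
	/* From the embedded ubuf_info back to its struct grant... */
	struct grant *g = container_of(ubuf, struct grant, callback_struct);

	/* ...then back pending_idx entries to pending_grants[0], and
	 * from there to the enclosing queue. */
	return container_of(g - pending_idx, struct queue,
			    pending_grants[0]);
}
```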

* Re: [Xen-devel] [RFC PATCH 00/13] Persistent grant maps for xen net drivers
  2015-05-12 17:18 ` Joao Martins
                   ` (13 preceding siblings ...)
  (?)
@ 2015-05-13 10:50 ` David Vrabel
  2015-05-13 13:01   ` Joao Martins
  2015-05-13 13:01   ` Joao Martins
  -1 siblings, 2 replies; 98+ messages in thread
From: David Vrabel @ 2015-05-13 10:50 UTC (permalink / raw)
  To: Joao Martins, xen-devel, netdev
  Cc: wei.liu2, ian.campbell, david.vrabel, boris.ostrovsky

On 12/05/15 18:18, Joao Martins wrote:
> 
> Packet I/O Tests:
> 
> Measured on a Intel Xeon E5-1650 v2, Xen 4.5, no HT. Used pktgen "burst 1"
> and "clone_skbs 100000" (to avoid alloc skb overheads) with various pkt
> sizes. All tests are DomU <-> Dom0, unless specified otherwise.

Are all these measurements with a single domU with a single VIF?

The biggest problem with a persistent grant method is the amount of
grant table and maptrack resources it requires.  How well does this
scale to 1000s of VIFs?

David

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [Xen-devel] [RFC PATCH 00/13] Persistent grant maps for xen net drivers
  2015-05-13 10:50 ` [Xen-devel] [RFC PATCH 00/13] Persistent grant maps for xen net drivers David Vrabel
@ 2015-05-13 13:01   ` Joao Martins
  2015-05-13 13:01   ` Joao Martins
  1 sibling, 0 replies; 98+ messages in thread
From: Joao Martins @ 2015-05-13 13:01 UTC (permalink / raw)
  To: David Vrabel; +Cc: xen-devel, netdev, wei.liu2, ian.campbell, boris.ostrovsky


On 13 May 2015, at 12:50, David Vrabel <david.vrabel@citrix.com> wrote:

> On 12/05/15 18:18, Joao Martins wrote:
>> 
>> Packet I/O Tests:
>> 
>> Measured on a Intel Xeon E5-1650 v2, Xen 4.5, no HT. Used pktgen "burst 1"
>> and "clone_skbs 100000" (to avoid alloc skb overheads) with various pkt
>> sizes. All tests are DomU <-> Dom0, unless specified otherwise.
> 
> Are all these measurements with a single domU with a single VIF?
> 
> The biggest problem with a persistent grant method is the amount of
> grant table and maptrack resources it requires.  How well does this
> scale to 1000s of VIFs?

Correct. I was more focused on the throughput benefits of persistent
grants, as opposed to scalability with a large number of guests. I will
do more tests with more guests and provide you the numbers. Most likely
it won't scale to that number of VIFs, given that the maptrack size
increases much more quickly (nr_vifs * 512 * nr_queues grants mapped),
so I also added the option of not exposing "feature-persistent" when the
xen-netback.max_persistent_gnts module param is set to 0. I am aware of
these issues with persistent grants, but the case I had in mind was
fewer VMs with higher throughput, which I believe is the trade-off
persistent grants make.
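As a back-of-the-envelope check of the scaling figure quoted above, the
arithmetic can be written out directly; the constant 512 is taken from
the message (assumed here to be the per-queue count of persistently
mapped grants), so this is purely illustrative:

```c
#include <assert.h>

/* Rough estimate of backend maptrack entries consumed by persistent
 * grants, per the nr_vifs * 512 * nr_queues figure in the message. */
static unsigned long persistent_maptrack_entries(unsigned long nr_vifs,
						 unsigned long nr_queues)
{
	return nr_vifs * 512UL * nr_queues;
}
```

For example, 1000 single-queue VIFs would already need on the order of
half a million mapped grants, which motivates the module parameter to
disable the feature.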

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [Xen-devel] [RFC PATCH 09/13] xen-netfront: move grant_{ref, page} to struct grant
  2015-05-12 17:18   ` Joao Martins
  (?)
  (?)
@ 2015-05-18 15:44   ` David Vrabel
  2015-05-19 10:19     ` Joao Martins
  2015-05-19 10:19     ` Joao Martins
  -1 siblings, 2 replies; 98+ messages in thread
From: David Vrabel @ 2015-05-18 15:44 UTC (permalink / raw)
  To: Joao Martins, xen-devel, netdev
  Cc: wei.liu2, ian.campbell, david.vrabel, boris.ostrovsky

On 12/05/15 18:18, Joao Martins wrote:
> Refactors a little how grants are stored by moving
> grant_rx_ref/grant_tx_ref and grant_tx_page into their
> own structure, namely struct grant.

Reviewed-by: David Vrabel <david.vrabel@citrix.com>

Although...

> --- a/drivers/net/xen-netfront.c
> +++ b/drivers/net/xen-netfront.c
> @@ -87,6 +87,11 @@ struct netfront_cb {
>  /* IRQ name is queue name with "-tx" or "-rx" appended */
>  #define IRQ_NAME_SIZE (QUEUE_NAME_SIZE + 3)
>  
> +struct grant {
> +	grant_ref_t ref;
> +	struct page *page;
> +};

Is this sort of structure (and the following patch) useful for other
frontends?

David

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [Xen-devel] [RFC PATCH 10/13] xen-netfront: refactor claim/release grant
  2015-05-12 17:18   ` Joao Martins
  (?)
@ 2015-05-18 15:48   ` David Vrabel
  2015-05-19 10:19     ` Joao Martins
  2015-05-19 10:19     ` Joao Martins
  -1 siblings, 2 replies; 98+ messages in thread
From: David Vrabel @ 2015-05-18 15:48 UTC (permalink / raw)
  To: Joao Martins, xen-devel, netdev
  Cc: wei.liu2, ian.campbell, david.vrabel, boris.ostrovsky

On 12/05/15 18:18, Joao Martins wrote:
> Refactors how grants are claimed/released/revoked by moving that code
> into claim_grant and release_grant helper routines that can be shared
> by both the TX and RX paths.

Reviewed-by: David Vrabel <david.vrabel@citrix.com>

But should this be generic?  Is it useful to other frontends?  And some
of the line splitting looks a bit odd.

David

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [Xen-devel] [RFC PATCH 11/13] xen-netfront: feature-persistent xenbus support
  2015-05-12 17:18   ` Joao Martins
  (?)
  (?)
@ 2015-05-18 15:51   ` David Vrabel
  2015-05-19 10:19     ` Joao Martins
  2015-05-19 10:19     ` Joao Martins
  -1 siblings, 2 replies; 98+ messages in thread
From: David Vrabel @ 2015-05-18 15:51 UTC (permalink / raw)
  To: Joao Martins, xen-devel, netdev
  Cc: wei.liu2, ian.campbell, david.vrabel, boris.ostrovsky

On 12/05/15 18:18, Joao Martins wrote:
> Check "feature-persistent" on xenbus for persistent grant
> support on the backend.

You can't expose/check for this feature until you actually support it.
This should probably be the last patch.

David

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [Xen-devel] [RFC PATCH 12/13] xen-netfront: implement TX persistent grants
  2015-05-12 17:18   ` Joao Martins
  (?)
@ 2015-05-18 15:55   ` David Vrabel
  2015-05-19 10:20     ` Joao Martins
  2015-05-19 10:20     ` Joao Martins
  -1 siblings, 2 replies; 98+ messages in thread
From: David Vrabel @ 2015-05-18 15:55 UTC (permalink / raw)
  To: Joao Martins, xen-devel, netdev
  Cc: wei.liu2, ian.campbell, david.vrabel, boris.ostrovsky

On 12/05/15 18:18, Joao Martins wrote:
> Instead of granting/revoking the buffer related to the skb, it will use
> an already granted page and memcpy to it. The grants will be mapped
> by xen-netback and reused over time, but only unmapped when the vif
> disconnects, as opposed to every packet.
> 
> This only happens if the backend supports persistent grants since it
> would, otherwise, introduce the overhead of a memcpy on top of the
> grant map.
[...]
> --- a/drivers/net/xen-netfront.c
> +++ b/drivers/net/xen-netfront.c
[...]
> @@ -1610,7 +1622,10 @@ static int xennet_init_queue(struct netfront_queue *queue)
>  	for (i = 0; i < NET_TX_RING_SIZE; i++) {
>  		skb_entry_set_link(&queue->tx_skbs[i], i+1);
>  		queue->grant_tx[i].ref = GRANT_INVALID_REF;
> -		queue->grant_tx[i].page = NULL;
> +		if (queue->info->feature_persistent)
> +			queue->grant_tx[i].page = alloc_page(GFP_NOIO);

Need to check for alloc failure here and unwind correctly?

Why NOIO?

> +		else
> +			queue->grant_tx[i].page = NULL;
>  	}
>  
>  	/* Clear out rx_skbs */
> 

* Re: [Xen-devel] [RFC PATCH 13/13] xen-netfront: implement RX persistent grants
  2015-05-12 17:18   ` Joao Martins
@ 2015-05-18 16:04   ` David Vrabel
  2015-05-19 10:22     ` Joao Martins
  2015-05-19 10:22     ` [Xen-devel] " Joao Martins
  -1 siblings, 2 replies; 98+ messages in thread
From: David Vrabel @ 2015-05-18 16:04 UTC (permalink / raw)
  To: Joao Martins, xen-devel, netdev
  Cc: wei.liu2, ian.campbell, david.vrabel, boris.ostrovsky

On 12/05/15 18:18, Joao Martins wrote:
> It allows a newly allocated skb to reuse the gref taken from the
> pending_ring, which means xennet will grant the pages once and release
> them only when freeing the device. It changes how netfront handles new
> skbs to be able to reuse the allocated pages similarly to how netback
> is already doing for the netback TX path.
> 
> alloc_rx_buffers() will consume pages from the pending_ring to
> allocate new skbs. When responses are handled we will move the grants
> from the grant_rx to the pending_grants. The latter is a shadow ring
> that keeps all grants belonging to inflight skbs. Finally chaining
> all skbs ubuf_info together to finally pass the packet up to the
> network stack. We make use of SKBTX_DEV_ZEROCOPY to get notified
> once the skb is freed to be able to reuse pages. On the destructor
> callback we will then add the grant to the pending_ring.
> 
> The only catch about this approach is: when we orphan frags, there
> will be a memcpy on skb_copy_ubufs() (if frags bigger than 0).
> Depending on the CPU and number of queues this leads to a performance
> drop of between 7-11%. For this reason, SKBTX_DEV_ZEROCOPY skbs will
> only be used with persistent grants.

This means that skbs are passed further up the stack while they are
still granted to the backend.

I think this makes it too difficult to validate that the backend can't
fiddle with the skb frags inappropriately (both now and in the future
when other changes in the network stack are made).

David

* Re: [Xen-devel] [RFC PATCH 09/13] xen-netfront: move grant_{ref, page} to struct grant
  2015-05-18 15:44   ` [Xen-devel] " David Vrabel
@ 2015-05-19 10:19     ` Joao Martins
  2015-05-19 10:19     ` Joao Martins
  1 sibling, 0 replies; 98+ messages in thread
From: Joao Martins @ 2015-05-19 10:19 UTC (permalink / raw)
  To: David Vrabel, xen-devel; +Cc: netdev, wei.liu2, ian.campbell, boris.ostrovsky


On 18 May 2015, at 17:44, David Vrabel <david.vrabel@citrix.com> wrote:
> On 12/05/15 18:18, Joao Martins wrote:
>> Refactors a little bit how grants are stored by moving
>> grant_rx_ref/grant_tx_ref and grant_tx_page to its
>> own structure, namely struct grant.
> 
> Reviewed-by: David Vrabel <david.vrabel@citrix.com>
> 
> Although...
> 
>> --- a/drivers/net/xen-netfront.c
>> +++ b/drivers/net/xen-netfront.c
>> @@ -87,6 +87,11 @@ struct netfront_cb {
>> /* IRQ name is queue name with "-tx" or "-rx" appended */
>> #define IRQ_NAME_SIZE (QUEUE_NAME_SIZE + 3)
>> 
>> +struct grant {
>> +	grant_ref_t ref;
>> +	struct page *page;
>> +};
> 
> Is this sort of structure (and the following patch) useful for other
> frontends?

Perhaps not. It seems that blkfront is the only one that uses a similar
structure, though it creates a struct grant containing an additional
struct list_head field that is used for the free grants list within
blkfront. In my case I extend the struct grant later in the patch
"xen-netfront: implement RX persistent grants" to have the struct ubuf_info.

* Re: [Xen-devel] [RFC PATCH 10/13] xen-netfront: refactor claim/release grant
  2015-05-18 15:48   ` [Xen-devel] " David Vrabel
@ 2015-05-19 10:19     ` Joao Martins
  2015-05-19 10:19     ` Joao Martins
  1 sibling, 0 replies; 98+ messages in thread
From: Joao Martins @ 2015-05-19 10:19 UTC (permalink / raw)
  To: David Vrabel, xen-devel; +Cc: netdev, wei.liu2, ian.campbell, boris.ostrovsky


On 18 May 2015, at 17:48, David Vrabel <david.vrabel@citrix.com> wrote:
> On 12/05/15 18:18, Joao Martins wrote:
>> Refactors how grants are claimed/released/revoked by moving that code
>> into claim_grant and release_grant helpers routines that can be shared
>> in both TX/RX path.
> 
> Reviewed-by: David Vrabel <david.vrabel@citrix.com>
> 
> But should this be generic?  Is it useful to other frontends?  And some
> of the line splitting looks a bit odd.

It looks like scsifront (lines 384-389, 418-423), blkfront (lines 250-257)
and netfront all grant their buffers in the same way, so perhaps these two
helpers could be reused.

* Re: [Xen-devel] [RFC PATCH 11/13] xen-netfront: feature-persistent xenbus support
  2015-05-18 15:51   ` [Xen-devel] " David Vrabel
@ 2015-05-19 10:19     ` Joao Martins
  2015-05-19 10:19     ` Joao Martins
  1 sibling, 0 replies; 98+ messages in thread
From: Joao Martins @ 2015-05-19 10:19 UTC (permalink / raw)
  To: David Vrabel, xen-devel; +Cc: netdev, wei.liu2, ian.campbell, boris.ostrovsky


On 18 May 2015, at 17:51, David Vrabel <david.vrabel@citrix.com> wrote:
> On 12/05/15 18:18, Joao Martins wrote:
>> Check "feature-persistent" on xenbus for persistent grants
>> support on the backend.
> 
> You can't expose/check for this feature until you actually support it.
> This should probably be the last patch.

Makes sense. Will address this and put it as last.

* Re: [Xen-devel] [RFC PATCH 12/13] xen-netfront: implement TX persistent grants
  2015-05-18 15:55   ` [Xen-devel] " David Vrabel
@ 2015-05-19 10:20     ` Joao Martins
  2015-05-19 10:23       ` David Vrabel
  2015-05-19 10:23       ` [Xen-devel] " David Vrabel
  2015-05-19 10:20     ` Joao Martins
  1 sibling, 2 replies; 98+ messages in thread
From: Joao Martins @ 2015-05-19 10:20 UTC (permalink / raw)
  To: David Vrabel, xen-devel; +Cc: netdev, wei.liu2, ian.campbell, boris.ostrovsky


On 18 May 2015, at 17:55, David Vrabel <david.vrabel@citrix.com> wrote:
> On 12/05/15 18:18, Joao Martins wrote:
>> Instead of granting/revoking the buffer related to the skb, it will use
>> an already granted page and memcpy to it. The grants will be mapped
>> by xen-netback and reused over time, but only unmapped when the vif
>> disconnects, as opposed to every packet.
>> 
>> This only happens if the backend supports persistent grants since it
>> would, otherwise, introduce the overhead of a memcpy on top of the
>> grant map.
> [...]
>> --- a/drivers/net/xen-netfront.c
>> +++ b/drivers/net/xen-netfront.c
> [...]
>> @@ -1610,7 +1622,10 @@ static int xennet_init_queue(struct netfront_queue *queue)
>> 	for (i = 0; i < NET_TX_RING_SIZE; i++) {
>> 		skb_entry_set_link(&queue->tx_skbs[i], i+1);
>> 		queue->grant_tx[i].ref = GRANT_INVALID_REF;
>> -		queue->grant_tx[i].page = NULL;
>> +		if (queue->info->feature_persistent)
>> +			queue->grant_tx[i].page = alloc_page(GFP_NOIO);
> 
> Need to check for alloc failure here and unwind correctly?
Sorry, I overlooked this check. I will fix that.

> Why NOIO?
Maybe I am misusing NOIO where I meant __GFP_WAIT.
Though given we are under rtnl_lock(), perhaps GFP_ATOMIC should be used instead.

* Re: [Xen-devel] [RFC PATCH 13/13] xen-netfront: implement RX persistent grants
  2015-05-18 16:04   ` [Xen-devel] " David Vrabel
  2015-05-19 10:22     ` Joao Martins
@ 2015-05-19 10:22     ` Joao Martins
  1 sibling, 0 replies; 98+ messages in thread
From: Joao Martins @ 2015-05-19 10:22 UTC (permalink / raw)
  To: David Vrabel, xen-devel; +Cc: netdev, wei.liu2, ian.campbell, boris.ostrovsky


On 18 May 2015, at 18:04, David Vrabel <david.vrabel@citrix.com> wrote:
> On 12/05/15 18:18, Joao Martins wrote:
>> It allows a newly allocated skb to reuse the gref taken from the
>> pending_ring, which means xennet will grant the pages once and release
>> them only when freeing the device. It changes how netfront handles new
>> skbs to be able to reuse the allocated pages similarly to how netback
>> is already doing for the netback TX path.
>> 
>> alloc_rx_buffers() will consume pages from the pending_ring to
>> allocate new skbs. When responses are handled we will move the grants
>> from the grant_rx to the pending_grants. The latter is a shadow ring
>> that keeps all grants belonging to inflight skbs. Finally chaining
>> all skbs ubuf_info together to finally pass the packet up to the
>> network stack. We make use of SKBTX_DEV_ZEROCOPY to get notified
>> once the skb is freed to be able to reuse pages. On the destructor
>> callback we will then add the grant to the pending_ring.
>> 
>> The only catch about this approach is: when we orphan frags, there
>> will be a memcpy on skb_copy_ubufs() (if frags bigger than 0).
>> Depending on the CPU and number of queues this leads to a performance
>> drop of between 7-11%. For this reason, SKBTX_DEV_ZEROCOPY skbs will
>> only be used with persistent grants.
> 
> This means that skbs are passed further up the stack while they are
> still granted to the backend.

__pskb_pull_tail copies to skb->data and unrefs the frag if no data
remains in the frag after the pull. When the packet is then delivered to the stack (in
netif_receive_skb) skb_orphan_frags will be called where it will allocate
pages for frags and memcpy to them (from the granted pages). The zerocopy
callback is then called which then releases the grants. So, in the end the
granted buffers aren't passed up to the protocol stack, though that could
change in the future like you said. Would you prefer an explicit memcpy
instead of using SKBTX_DEV_ZEROCOPY?

> I think this makes it too difficult to validate that the backend can't
> fiddle with the skb frags inappropriately (both now and in the future
> when other changes in the network stack are made).

But wouldn't this be the case for netback TX as well, since it uses a
similar approach?

* Re: [Xen-devel] [RFC PATCH 12/13] xen-netfront: implement TX persistent grants
  2015-05-19 10:20     ` Joao Martins
  2015-05-19 10:23       ` David Vrabel
@ 2015-05-19 10:23       ` David Vrabel
  1 sibling, 0 replies; 98+ messages in thread
From: David Vrabel @ 2015-05-19 10:23 UTC (permalink / raw)
  To: Joao Martins, xen-devel; +Cc: netdev, wei.liu2, ian.campbell, boris.ostrovsky

On 19/05/15 11:20, Joao Martins wrote:
> 
> On 18 May 2015, at 17:55, David Vrabel <david.vrabel@citrix.com> wrote:
>> On 12/05/15 18:18, Joao Martins wrote:
>>> Instead of granting/revoking the buffer related to the skb, it will use
>>> an already granted page and memcpy to it. The grants will be mapped
>>> by xen-netback and reused over time, but only unmapped when the vif
>>> disconnects, as opposed to every packet.
>>>
>>> This only happens if the backend supports persistent grants since it
>>> would, otherwise, introduce the overhead of a memcpy on top of the
>>> grant map.
>> [...]
>>> --- a/drivers/net/xen-netfront.c
>>> +++ b/drivers/net/xen-netfront.c
>> [...]
>>> @@ -1610,7 +1622,10 @@ static int xennet_init_queue(struct netfront_queue *queue)
>>> 	for (i = 0; i < NET_TX_RING_SIZE; i++) {
>>> 		skb_entry_set_link(&queue->tx_skbs[i], i+1);
>>> 		queue->grant_tx[i].ref = GRANT_INVALID_REF;
>>> -		queue->grant_tx[i].page = NULL;
>>> +		if (queue->info->feature_persistent)
>>> +			queue->grant_tx[i].page = alloc_page(GFP_NOIO);
>>
>> Need to check for alloc failure here and unwind correctly?
> Sorry, I overlooked this check. I will fix that.
> 
>> Why NOIO?
> Maybe I am misusing NOIO where I meant __GFP_WAIT.
> Though given we are under rtnl_lock(), perhaps GFP_ATOMIC should be used instead.

rtnl_lock() is a mutex, so sleeping is allowed, so GFP_KERNEL is fine
here I think.

David

* Re: [RFC PATCH 02/13] xen-netback: xenbus feature persistent support
  2015-05-12 17:18   ` Joao Martins
@ 2015-05-19 15:19   ` Wei Liu
  2015-05-22 10:24     ` Joao Martins
  2015-05-22 10:24     ` Joao Martins
  -1 siblings, 2 replies; 98+ messages in thread
From: Wei Liu @ 2015-05-19 15:19 UTC (permalink / raw)
  To: Joao Martins
  Cc: xen-devel, netdev, wei.liu2, ian.campbell, david.vrabel,
	boris.ostrovsky, konrad.wilk

On Tue, May 12, 2015 at 07:18:26PM +0200, Joao Martins wrote:
> Checks for "feature-persistent" that indicates persistent grants
> support. Adds max_persistent_grants module param that specifies the max
> number of persistent grants, which if set to zero disables persistent
> grants.
> 
> Signed-off-by: Joao Martins <joao.martins@neclab.eu>

This patch needs to be moved later. The feature needs to be implemented
first.

Also you need to patch netif.h to document this new feature. To do this,
you need to patch the master netif.h in the Xen tree and then sync the change
to Linux. Of course I don't mind if we discuss the wording in the Linux
copy first and then you devise a patch for Xen.

Wei.

* Re: [RFC PATCH 03/13] xen-netback: implement TX persistent grants
  2015-05-12 17:18   ` Joao Martins
@ 2015-05-19 15:23   ` Wei Liu
  2015-05-22 10:24     ` Joao Martins
  2015-05-22 10:24     ` Joao Martins
  -1 siblings, 2 replies; 98+ messages in thread
From: Wei Liu @ 2015-05-19 15:23 UTC (permalink / raw)
  To: Joao Martins
  Cc: xen-devel, netdev, wei.liu2, ian.campbell, david.vrabel,
	boris.ostrovsky, konrad.wilk

On Tue, May 12, 2015 at 07:18:27PM +0200, Joao Martins wrote:
> Introduces persistent grants for TX path which follows similar code path
> as the grant mapping.
> 
> It starts by checking if there's a persistent grant available for header
> and frags grefs and if so setting it in tx_pgrants. If no persistent grant
> is found in the tree for the header it will resort to grant copy (but
> preparing the map ops and add them laster). For the frags it will use the
                                     ^
                                     later

> tree page pool, and in case of no pages it falls back to grant map/unmap
> using mmap_pages. When the skb destructor callback gets called we release the
> slot and persistent grant within the callback to avoid waking up the
> dealloc thread. As long as there are no unmaps to be done the dealloc thread
> will remain inactive.
> 

This scheme looks complicated. Can we just use one scheme at a
time? What's the rationale for the combined scheme?
Maybe you're thinking about using a max_grants < ring_size to save
memory?

I've only skimmed the patch. I will do detailed reviews after we're sure
this is the right way to go.

> Results show an improvement of 46% (1.82 vs 1.24 Mpps, 64 pkt size)
> measured with pktgen and up to over 48% (21.6 vs 14.5 Gbit/s) measured
> with iperf (TCP) with 4 parallel flows 1 queue vif, DomU to Dom0.
> Tests ran on a Intel Xeon E5-1650 v2 with HT disabled.
> 
> Signed-off-by: Joao Martins <joao.martins@neclab.eu>
> ---
>  drivers/net/xen-netback/common.h    |  12 ++
>  drivers/net/xen-netback/interface.c |  46 +++++
>  drivers/net/xen-netback/netback.c   | 341 +++++++++++++++++++++++++++++++-----
>  3 files changed, 360 insertions(+), 39 deletions(-)
> 
> diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
> index e70ace7..e5ee220 100644
> --- a/drivers/net/xen-netback/common.h
> +++ b/drivers/net/xen-netback/common.h
> @@ -191,6 +191,15 @@ struct xenvif_queue { /* Per-queue data for xenvif */
>  	struct gnttab_copy tx_copy_ops[MAX_PENDING_REQS];
>  	struct gnttab_map_grant_ref tx_map_ops[MAX_PENDING_REQS];
>  	struct gnttab_unmap_grant_ref tx_unmap_ops[MAX_PENDING_REQS];
> +
> +	/* Tree to store the TX grants
> +	 * Only used if feature-persistent = 1
> +	 */
> +	struct persistent_gnt_tree tx_gnts_tree;
> +	struct page *tx_gnts_pages[XEN_NETIF_TX_RING_SIZE];
> +	/* persistent grants in use */
> +	struct persistent_gnt *tx_pgrants[MAX_PENDING_REQS];
> +
>  	/* passed to gnttab_[un]map_refs with pages under (un)mapping */
>  	struct page *pages_to_map[MAX_PENDING_REQS];
>  	struct page *pages_to_unmap[MAX_PENDING_REQS];
> @@ -361,6 +370,9 @@ void xenvif_zerocopy_callback(struct ubuf_info *ubuf, bool zerocopy_success);
>  
>  /* Unmap a pending page and release it back to the guest */
>  void xenvif_idx_unmap(struct xenvif_queue *queue, u16 pending_idx);
> +void xenvif_page_unmap(struct xenvif_queue *queue,
> +		       grant_handle_t handle,
> +		       struct page **page);
>  
>  static inline pending_ring_idx_t nr_pending_reqs(struct xenvif_queue *queue)
>  {
> diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
> index 1a83e19..6f996ac 100644
> --- a/drivers/net/xen-netback/interface.c
> +++ b/drivers/net/xen-netback/interface.c
> @@ -456,6 +456,34 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
>  	return vif;
>  }
>  
> +static int init_persistent_gnt_tree(struct persistent_gnt_tree *tree,
> +				    struct page **pages, int max)
> +{
> +	int err;
> +
> +	tree->gnt_max = min_t(unsigned, max, xenvif_max_pgrants);
> +	tree->root.rb_node = NULL;
> +	atomic_set(&tree->gnt_in_use, 0);
> +
> +	err = gnttab_alloc_pages(tree->gnt_max, pages);
> +	if (!err) {
> +		tree->free_pages_num = 0;
> +		INIT_LIST_HEAD(&tree->free_pages);
> +		put_free_pages(tree, pages, tree->gnt_max);
> +	}
> +
> +	return err;
> +}
> +
> +static void deinit_persistent_gnt_tree(struct persistent_gnt_tree *tree,
> +				       struct page **pages)
> +{
> +	free_persistent_gnts(tree, tree->gnt_c);
> +	BUG_ON(!RB_EMPTY_ROOT(&tree->root));
> +	tree->gnt_c = 0;
> +	gnttab_free_pages(tree->gnt_max, pages);
> +}
> +
>  int xenvif_init_queue(struct xenvif_queue *queue)
>  {
>  	int err, i;
> @@ -496,9 +524,23 @@ int xenvif_init_queue(struct xenvif_queue *queue)
>  			  .ctx = NULL,
>  			  .desc = i };
>  		queue->grant_tx_handle[i] = NETBACK_INVALID_HANDLE;
> +		queue->tx_pgrants[i] = NULL;
> +	}
> +
> +	if (queue->vif->persistent_grants) {
> +		err = init_persistent_gnt_tree(&queue->tx_gnts_tree,
> +					       queue->tx_gnts_pages,
> +					       XEN_NETIF_TX_RING_SIZE);
> +		if (err)
> +			goto err_disable;
>  	}
>  
>  	return 0;
> +
> +err_disable:
> +	netdev_err(queue->vif->dev, "Could not reserve tree pages.");
> +	queue->vif->persistent_grants = 0;

You can just move the above two lines under `if (err)'.

Also see below.
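[Editorial sketch: a stubbed userspace illustration of Wei's suggestion — folding the error handling under the existing `if (err)` instead of jumping to a separate `err_disable` label. All names here are hypothetical stand-ins; the real function allocates grant pages via gnttab_alloc_pages().]

```c
#include <assert.h>
#include <stdio.h>

/* Stand-in for init_persistent_gnt_tree(): returns 0 on success. */
static int init_tree_stub(int should_fail)
{
	return should_fail ? -1 : 0;
}

/* Sketch of xenvif_init_queue() with the failure path folded under
 * `if (err)`. Returns the resulting persistent_grants flag: on
 * failure the feature is simply disabled and init still succeeds. */
static int init_queue_sketch(int persistent_grants, int should_fail)
{
	if (persistent_grants) {
		int err = init_tree_stub(should_fail);

		if (err) {
			fprintf(stderr, "Could not reserve tree pages.\n");
			persistent_grants = 0; /* fall back to grant map/copy */
		}
	}
	return persistent_grants;
}
```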

> +	return 0;
>  }
>  
>  void xenvif_carrier_on(struct xenvif *vif)
> @@ -654,6 +696,10 @@ void xenvif_disconnect(struct xenvif *vif)
>  		}
>  
>  		xenvif_unmap_frontend_rings(queue);
> +
> +		if (queue->vif->persistent_grants)
> +			deinit_persistent_gnt_tree(&queue->tx_gnts_tree,
> +						   queue->tx_gnts_pages);

If the init function fails on queue N (N>0) you now leak resources.
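[Editorial sketch: the leak Wei points out calls for the usual unwind pattern — if initialization fails on queue N, deinitialize queues 0..N-1 before returning. A toy userspace model with hypothetical stub names:]

```c
#include <assert.h>

#define NUM_QUEUES 4

/* Tracks which per-queue trees are currently initialized. */
static int tree_initialized[NUM_QUEUES];

/* Stand-in for init_persistent_gnt_tree(): fails on queue fail_at. */
static int init_tree_stub(int q, int fail_at)
{
	if (q == fail_at)
		return -1;
	tree_initialized[q] = 1;
	return 0;
}

/* Stand-in for deinit_persistent_gnt_tree(). */
static void deinit_tree_stub(int q)
{
	tree_initialized[q] = 0;
}

/* Initialize all queues; on failure at queue q, unwind the queues
 * already initialized so no resources are leaked. */
static int init_all_queues(int fail_at)
{
	int q;

	for (q = 0; q < NUM_QUEUES; q++) {
		if (init_tree_stub(q, fail_at)) {
			while (--q >= 0)
				deinit_tree_stub(q);
			return -1;
		}
	}
	return 0;
}
```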

>  	}
>  }
>  
> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
> index 332e489..529d7c3 100644
> --- a/drivers/net/xen-netback/netback.c
> +++ b/drivers/net/xen-netback/netback.c
> @@ -269,6 +269,11 @@ static inline unsigned long idx_to_kaddr(struct xenvif_queue *queue,
>  	return (unsigned long)pfn_to_kaddr(idx_to_pfn(queue, idx));
>  }
>  
> +static inline void *page_to_kaddr(struct page *page)
> +{
> +	return pfn_to_kaddr(page_to_pfn(page));
> +}
> +
>  #define callback_param(vif, pending_idx) \
>  	(vif->pending_tx_info[pending_idx].callback_struct)
>  
> @@ -299,6 +304,29 @@ static inline pending_ring_idx_t pending_index(unsigned i)
>  	return i & (MAX_PENDING_REQS-1);
>  }
>  
> +/*  Creates a new persistent grant and add it to the tree.
> + */
> +static struct persistent_gnt *xenvif_pgrant_new(struct persistent_gnt_tree *tree,
> +						struct gnttab_map_grant_ref *gop)
> +{
> +	struct persistent_gnt *persistent_gnt;
> +
> +	persistent_gnt = kmalloc(sizeof(*persistent_gnt), GFP_KERNEL);

xenvif_pgrant_new can be called from NAPI, which runs in softirq
context and therefore must not sleep.
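[Editorial sketch: the conventional fix for this class of bug is to allocate with GFP_ATOMIC, which never sleeps, instead of GFP_KERNEL — whether that is the right fix here is for the patch author to decide. A toy userspace model of the rule, with stubbed flag values:]

```c
#include <assert.h>
#include <stdlib.h>

/* Arbitrary stand-ins for the kernel's gfp_t flags. */
enum { GFP_KERNEL_STUB, GFP_ATOMIC_STUB };

static int in_softirq = 1; /* pretend we are inside NAPI poll */

/* Stubbed kmalloc(): a GFP_KERNEL allocation may sleep, which is
 * illegal in softirq context. In the real kernel this would be a
 * might_sleep() splat or worse; NULL just marks the bug here. */
static void *kmalloc_stub(size_t size, int flags)
{
	if (in_softirq && flags == GFP_KERNEL_STUB)
		return NULL;
	return malloc(size);
}
```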

> +	if (!persistent_gnt)
> +		return NULL;
> +
> +	persistent_gnt->gnt = gop->ref;
> +	persistent_gnt->page = virt_to_page(gop->host_addr);
> +	persistent_gnt->handle = gop->handle;
> +
> +	if (unlikely(add_persistent_gnt(tree, persistent_gnt))) {
> +		kfree(persistent_gnt);
> +		persistent_gnt = NULL;
> +	}
> +
> +	return persistent_gnt;
> +}
> +
>  bool xenvif_rx_ring_slots_available(struct xenvif_queue *queue, int needed)
>  {
>  	RING_IDX prod, cons;
> @@ -927,22 +955,59 @@ static int xenvif_count_requests(struct xenvif_queue *queue,
>  
>  struct xenvif_tx_cb {
>  	u16 pending_idx;
> +	bool pending_map;
>  };
>  
>  #define XENVIF_TX_CB(skb) ((struct xenvif_tx_cb *)(skb)->cb)
>  
> +static inline void xenvif_pgrant_set(struct xenvif_queue *queue,
> +				     u16 pending_idx,
> +				     struct persistent_gnt *pgrant)
> +{
> +	if (unlikely(queue->tx_pgrants[pending_idx])) {
> +		netdev_err(queue->vif->dev,
> +			   "Trying to overwrite an active persistent grant ! pending_idx: %x\n",
> +			   pending_idx);
> +		BUG();
> +	}
> +	queue->tx_pgrants[pending_idx] = pgrant;
> +}
> +
> +static inline void xenvif_pgrant_reset(struct xenvif_queue *queue,
> +				       u16 pending_idx)
> +{
> +	struct persistent_gnt *pgrant = queue->tx_pgrants[pending_idx];
> +
> +	if (unlikely(!pgrant)) {
> +		netdev_err(queue->vif->dev,
> +			   "Trying to release an inactive persistent_grant ! pending_idx: %x\n",
> +			   pending_idx);
> +		BUG();
> +	}
> +	put_persistent_gnt(&queue->tx_gnts_tree, pgrant);
> +	queue->tx_pgrants[pending_idx] = NULL;
> +}
> +
>  static inline void xenvif_tx_create_map_op(struct xenvif_queue *queue,
> -					  u16 pending_idx,
> -					  struct xen_netif_tx_request *txp,
> -					  struct gnttab_map_grant_ref *mop)
> +					   u16 pending_idx,
> +					   struct xen_netif_tx_request *txp,
> +					   struct gnttab_map_grant_ref *mop,
> +					   bool use_persistent_gnts)
>  {
> -	queue->pages_to_map[mop-queue->tx_map_ops] = queue->mmap_pages[pending_idx];
> -	gnttab_set_map_op(mop, idx_to_kaddr(queue, pending_idx),
> +	struct page *page = NULL;
> +
> +	if (use_persistent_gnts &&
> +	    get_free_page(&queue->tx_gnts_tree, &page)) {
> +		xenvif_pgrant_reset(queue, pending_idx);
> +		use_persistent_gnts = false;
> +	}
> +
> +	page = (!use_persistent_gnts ? queue->mmap_pages[pending_idx] : page);
> +	queue->pages_to_map[mop - queue->tx_map_ops] = page;
> +	gnttab_set_map_op(mop,
> +			  (unsigned long)page_to_kaddr(page),
>  			  GNTMAP_host_map | GNTMAP_readonly,
>  			  txp->gref, queue->vif->domid);
> -
> -	memcpy(&queue->pending_tx_info[pending_idx].req, txp,
> -	       sizeof(*txp));
>  }
>  
>  static inline struct sk_buff *xenvif_alloc_skb(unsigned int size)
> @@ -962,6 +1027,39 @@ static inline struct sk_buff *xenvif_alloc_skb(unsigned int size)
>  	return skb;
>  }
>  
> +/* Checks if there's a persistent grant available for gref and
> + * if so, set it also in the tx_pgrants array that keeps the ones
> + * in use.
> + */
> +static bool xenvif_tx_pgrant_available(struct xenvif_queue *queue,
> +				       grant_ref_t ref, u16 pending_idx,
> +				       bool *can_map)
> +{
> +	struct persistent_gnt_tree *tree = &queue->tx_gnts_tree;
> +	struct persistent_gnt *persistent_gnt;
> +	bool busy;
> +
> +	if (!queue->vif->persistent_grants)
> +		return false;
> +
> +	persistent_gnt = get_persistent_gnt(tree, ref);
> +
> +	/* If gref is already in use we fallback, since it would
> +	 * otherwise mean re-adding the same gref to the tree
> +	 */
> +	busy = IS_ERR(persistent_gnt);
> +	if (unlikely(busy))
> +		persistent_gnt = NULL;
> +

Under what circumstance can we retrieve an already-in-use persistent
grant? You seem to suggest this is a bug in the RX case.

> +	xenvif_pgrant_set(queue, pending_idx, persistent_gnt);
> +	if (likely(persistent_gnt))
> +		return true;
> +
> +	/* Check if we can create another persistent grant */
> +	*can_map = (!busy && tree->free_pages_num);
> +	return false;
> +}
> +
>  static struct gnttab_map_grant_ref *xenvif_get_requests(struct xenvif_queue *queue,
>  							struct sk_buff *skb,
>  							struct xen_netif_tx_request *txp,
> @@ -973,6 +1071,7 @@ static struct gnttab_map_grant_ref *xenvif_get_requests(struct xenvif_queue *que
>  	int start;
>  	pending_ring_idx_t index;
>  	unsigned int nr_slots, frag_overflow = 0;
> +	bool map_pgrant = false;
>  
>  	/* At this point shinfo->nr_frags is in fact the number of
>  	 * slots, which can be as large as XEN_NETBK_LEGACY_SLOTS_MAX.
> @@ -988,11 +1087,16 @@ static struct gnttab_map_grant_ref *xenvif_get_requests(struct xenvif_queue *que
>  	start = (frag_get_pending_idx(&shinfo->frags[0]) == pending_idx);
>  
>  	for (shinfo->nr_frags = start; shinfo->nr_frags < nr_slots;
> -	     shinfo->nr_frags++, txp++, gop++) {
> +	     shinfo->nr_frags++, txp++) {
>  		index = pending_index(queue->pending_cons++);
>  		pending_idx = queue->pending_ring[index];
> -		xenvif_tx_create_map_op(queue, pending_idx, txp, gop);
>  		frag_set_pending_idx(&frags[shinfo->nr_frags], pending_idx);
> +		memcpy(&queue->pending_tx_info[pending_idx].req, txp,
> +		       sizeof(*txp));
> +		if (!xenvif_tx_pgrant_available(queue, txp->gref, pending_idx,
> +						&map_pgrant))
> +			xenvif_tx_create_map_op(queue, pending_idx, txp, gop++,
> +						map_pgrant);
>  	}
>  
>  	if (frag_overflow) {

[...]

>  			MAX_PENDING_REQS);
>  		index = pending_index(queue->dealloc_prod);
> @@ -1691,7 +1939,10 @@ void xenvif_zerocopy_callback(struct ubuf_info *ubuf, bool zerocopy_success)
>  		smp_wmb();
>  		queue->dealloc_prod++;
>  	} while (ubuf);
> -	wake_up(&queue->dealloc_wq);
> +	/* Wake up only when there are grants to unmap */
> +	if (dealloc_prod_save != queue->dealloc_prod)
> +		wake_up(&queue->dealloc_wq);
> +
>  	spin_unlock_irqrestore(&queue->callback_lock, flags);
>  
>  	if (likely(zerocopy_success))
> @@ -1779,10 +2030,13 @@ int xenvif_tx_action(struct xenvif_queue *queue, int budget)
>  
>  	xenvif_tx_build_gops(queue, budget, &nr_cops, &nr_mops);
>  
> -	if (nr_cops == 0)
> +	if (!queue->vif->persistent_grants &&
> +	    nr_cops == 0)

You can just move nr_cops to previous line.

Wei.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [RFC PATCH 04/13] xen-netback: implement RX persistent grants
  2015-05-12 17:18   ` Joao Martins
  (?)
@ 2015-05-19 15:32   ` Wei Liu
  2015-05-22 10:25     ` Joao Martins
  2015-05-22 10:25     ` Joao Martins
  -1 siblings, 2 replies; 98+ messages in thread
From: Wei Liu @ 2015-05-19 15:32 UTC (permalink / raw)
  To: Joao Martins
  Cc: xen-devel, netdev, wei.liu2, ian.campbell, david.vrabel,
	boris.ostrovsky, konrad.wilk

On Tue, May 12, 2015 at 07:18:28PM +0200, Joao Martins wrote:
> It starts by doing a lookup in the tree for a gref. If no persistent
> grant is found on the tree, it will do grant copy and prepare
> the grant maps. Finally valides the grant map and adds it to the tree.

validates?

> After mapped these grants can be pulled from the tree in the subsequent
> requests. If it's out of pages in the tree pool, it will fallback to
> grant copy.
> 

Again, this looks complicated. Why use a combined scheme? I will do
detailed reviews after we're sure we need such a scheme.

> It adds four new fields in the netrx_pending_operations: copy_done
> to track how many copies were made; map_prod and map_cons to track
> how many maps are outstanding validation and finally copy_page for
> the correspondent page (in tree) for copy_gref.
> 
> Results are 1.04 Mpps measured with pktgen (pkt_size 64, burst 1)
> with persistent grants versus 1.23 Mpps with grant copy (20%
> regression). With persistent grants it adds up contention on
> queue->wq as the kthread_guest_rx goes to sleep more often. If we
> speed up the sender (burst 2,4 and 8) it goes up to 1.7 Mpps with
> persistent grants. This issue is addressed in later a commit, by
> copying the skb on xenvif_start_xmit() instead of going through
> the RX kthread.
> 
> Signed-off-by: Joao Martins <joao.martins@neclab.eu>
> ---
>  drivers/net/xen-netback/common.h    |   7 ++
>  drivers/net/xen-netback/interface.c |  14 ++-
>  drivers/net/xen-netback/netback.c   | 190 ++++++++++++++++++++++++++++++------
>  3 files changed, 178 insertions(+), 33 deletions(-)
> 
> diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
> index e5ee220..23deb6a 100644
> --- a/drivers/net/xen-netback/common.h
> +++ b/drivers/net/xen-netback/common.h
> @@ -235,6 +235,13 @@ struct xenvif_queue { /* Per-queue data for xenvif */
>  
>  	struct gnttab_copy grant_copy_op[MAX_GRANT_COPY_OPS];
>  
> +	/* To map the grefs to be added to the tree */
> +	struct gnttab_map_grant_ref rx_map_ops[XEN_NETIF_RX_RING_SIZE];
> +	struct page *rx_pages_to_map[XEN_NETIF_RX_RING_SIZE];
> +	/* Only used if feature-persistent = 1 */

This comment applies to rx_map_ops and rx_pages_to_map as well. Could
you move it up?

> +	struct persistent_gnt_tree rx_gnts_tree;
> +	struct page *rx_gnts_pages[XEN_NETIF_RX_RING_SIZE];
> +
>  	/* We create one meta structure per ring request we consume, so
>  	 * the maximum number is the same as the ring size.
>  	 */
> diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c

[...]

>  
> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
> index 529d7c3..738b6ee 100644
> --- a/drivers/net/xen-netback/netback.c
> +++ b/drivers/net/xen-netback/netback.c
> @@ -413,14 +413,62 @@ static void xenvif_rx_queue_drop_expired(struct xenvif_queue *queue)
>  }
>  
>  struct netrx_pending_operations {
> +	unsigned map_prod, map_cons;
>  	unsigned copy_prod, copy_cons;
>  	unsigned meta_prod, meta_cons;
>  	struct gnttab_copy *copy;
>  	struct xenvif_rx_meta *meta;
>  	int copy_off;
>  	grant_ref_t copy_gref;
> +	struct page *copy_page;
> +	unsigned copy_done;
>  };
>  
> +static void xenvif_create_rx_map_op(struct xenvif_queue *queue,
> +				    struct gnttab_map_grant_ref *mop,
> +				    grant_ref_t ref,
> +				    struct page *page)

Rename it to xenvif_rx_create_map_op to be consistent with
xenvif_tx_create_map_op?

> +{
> +	queue->rx_pages_to_map[mop - queue->rx_map_ops] = page;
> +	gnttab_set_map_op(mop,
> +			  (unsigned long)page_to_kaddr(page),
> +			  GNTMAP_host_map,
> +			  ref, queue->vif->domid);
> +}
> +

[...]

> +
> +		persistent_gnt = xenvif_pgrant_new(tree, gop_map);
> +		if (unlikely(!persistent_gnt)) {
> +			netdev_err(queue->vif->dev,
> +				   "Couldn't add gref to the tree! ref: %d",
> +				   gop_map->ref);
> +			xenvif_page_unmap(queue, gop_map->handle, &page);
> +			put_free_pages(tree, &page, 1);
> +			kfree(persistent_gnt);
> +			persistent_gnt = NULL;

persistent_gnt is already NULL here, so the kfree() and the `= NULL`
assignment are pointless.

> +			continue;
> +		}
> +
> +		put_persistent_gnt(tree, persistent_gnt);
> +	}
> +}
> +
> +/*
>   * This is a twin to xenvif_gop_skb.  Assume that xenvif_gop_skb was
>   * used to set up the operations on the top of
>   * netrx_pending_operations, which have since been done.  Check that
>   * they didn't give any errors and advance over them.
>   */
> -static int xenvif_check_gop(struct xenvif *vif, int nr_meta_slots,
> +static int xenvif_check_gop(struct xenvif_queue *queue, int nr_meta_slots,
>  			    struct netrx_pending_operations *npo)
>  {
>  	struct gnttab_copy     *copy_op;
>  	int status = XEN_NETIF_RSP_OKAY;
>  	int i;
>  
> +	nr_meta_slots -= npo->copy_done;
> +	if (npo->map_prod)

Should be "if (npo->map_prod != npo->map_cons)"?
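[Editorial sketch: if map_prod and map_cons are free-running counters, outstanding work is `prod - cons`, so the "any maps pending?" test has to be `prod != cons`; `if (prod)` stays true forever once a single map op has ever been queued. A minimal demonstration with a toy struct (not the patch's actual type):]

```c
#include <assert.h>

/* Toy stand-in for the map_prod/map_cons pair in
 * netrx_pending_operations, assumed free-running. */
struct npo_stub {
	unsigned map_prod, map_cons;
};

/* Maps still awaiting validation: the producer/consumer distance. */
static unsigned maps_outstanding(const struct npo_stub *npo)
{
	return npo->map_prod - npo->map_cons;
}
```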

Wei.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [RFC PATCH 05/13] xen-netback: refactor xenvif_rx_action
  2015-05-12 17:18   ` Joao Martins
  (?)
  (?)
@ 2015-05-19 15:32   ` Wei Liu
  -1 siblings, 0 replies; 98+ messages in thread
From: Wei Liu @ 2015-05-19 15:32 UTC (permalink / raw)
  To: Joao Martins
  Cc: xen-devel, netdev, wei.liu2, ian.campbell, david.vrabel,
	boris.ostrovsky, konrad.wilk

On Tue, May 12, 2015 at 07:18:29PM +0200, Joao Martins wrote:
> Refactor xenvif_rx_action by dividing it into build_gops and
> submit, similar to what xenvif_tx_action looks like.
> 
> Signed-off-by: Joao Martins <joao.martins@neclab.eu>

Reviewed-by: Wei Liu <wei.liu2@citrix.com>

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [RFC PATCH 06/13] xen-netback: copy buffer on xenvif_start_xmit()
  2015-05-12 17:18   ` Joao Martins
  (?)
  (?)
@ 2015-05-19 15:35   ` Wei Liu
  2015-05-22 10:26     ` Joao Martins
  2015-05-22 10:26     ` Joao Martins
  -1 siblings, 2 replies; 98+ messages in thread
From: Wei Liu @ 2015-05-19 15:35 UTC (permalink / raw)
  To: Joao Martins
  Cc: xen-devel, netdev, wei.liu2, ian.campbell, david.vrabel,
	boris.ostrovsky, konrad.wilk

On Tue, May 12, 2015 at 07:18:30PM +0200, Joao Martins wrote:
> Introducing persistent grants speeds up the RX kthread thanks to the
> decreased copy cost, yet it leads to a throughput decrease of 20%.
> The rx_queue is observed to stay mostly at 10% of its capacity, as
> opposed to full capacity when using grant copy. A finer measurement
> with lock_stat (below with pkt_size 64, burst 1) shows much higher
> wait-queue contention on queue->wq, which hints that the RX kthread
> waits and wakes up more often than it actually does work.
> 
> Without persistent grants:
> 
> class name    con-bounces    contentions   waittime-min   waittime-max
> waittime-total   waittime-avg    acq-bounces   acquisitions   holdtime-min
> holdtime-max holdtime-total   holdtime-avg
> --------------------------------------------------------------------------
> &queue->wq:   792            792           0.36          24.36
> 1140.30           1.44           4208        1002671           0.00
> 46.75      538164.02           0.54
> ----------
> &queue->wq    326          [<ffffffff8115949f>] __wake_up+0x2f/0x80
> &queue->wq    410          [<ffffffff811592bf>] finish_wait+0x4f/0xa0
> &queue->wq     56          [<ffffffff811593eb>] prepare_to_wait+0x2b/0xb0
> ----------
> &queue->wq    202          [<ffffffff811593eb>] prepare_to_wait+0x2b/0xb0
> &queue->wq    467          [<ffffffff8115949f>] __wake_up+0x2f/0x80
> &queue->wq    123          [<ffffffff811592bf>] finish_wait+0x4f/0xa0
> 
> With persistent grants:
> 
> &queue->wq:   61834          61836           0.32          30.12
> 99710.27           1.61         241400        1125308           0.00
> 75.61     1106578.82           0.98
> ----------
> &queue->wq     5079        [<ffffffff8115949f>] __wake_up+0x2f/0x80
> &queue->wq    56280        [<ffffffff811592bf>] finish_wait+0x4f/0xa0
> &queue->wq      479        [<ffffffff811593eb>] prepare_to_wait+0x2b/0xb0
> ----------
> &queue->wq     1005        [<ffffffff811592bf>] finish_wait+0x4f/0xa0
> &queue->wq    56761        [<ffffffff8115949f>] __wake_up+0x2f/0x80
> &queue->wq     4072        [<ffffffff811593eb>] prepare_to_wait+0x2b/0xb0
> 
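
[Editorial aside: contention tables like the two above come from the
kernel's lock statistics facility. Assuming a kernel built with
CONFIG_LOCK_STAT=y, they can be collected roughly as follows; this is a
generic how-to sketch, not part of the patch under review.]

```shell
# Enable lock statistics collection (requires CONFIG_LOCK_STAT=y).
echo 1 > /proc/sys/kernel/lock_stat

# Reset the counters before the measurement run.
echo 0 > /proc/lock_stat

# ... run the workload, e.g. pktgen with pkt_size 64, burst 1 ...

# Dump the per-lock statistics; filter for the wait queue of interest.
grep -A 8 "queue->wq" /proc/lock_stat

# Disable collection again to avoid the runtime overhead.
echo 0 > /proc/sys/kernel/lock_stat
```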
> Also, with persistent grants we no longer need to batch grant copy ops
> (besides the initial copy+map), which suggests that deferring the skb
> to the RX kthread just adds unnecessary overhead (for this particular
> case). This patch proposes copying the buffer on xenvif_start_xmit(),
> which lets us remove both the contention on queue->wq and the lock on
> rx_queue. An alternative to the xenvif_rx_action routine is added,
> namely xenvif_rx_map(), which maps and copies the buffer to the guest.
> It is only used when persistent grants are enabled, since it would
> otherwise mean a hypercall per packet.
> 
> Improvements are up to a factor of 2.14 with a single queue getting us
> from 1.04 Mpps to 1.7 Mpps (burst 1, pkt_size 64) and 1.5 to 2.6 Mpps
> (burst 2, pkt_size 64) compared to using the kthread. Maximum with grant
> copy is 1.2 Mpps, irrespective of the burst. All of this, measured on
> an Intel(R) Xeon(R) CPU E5-1650 v2 @ 3.50GHz.
> 
> Signed-off-by: Joao Martins <joao.martins@neclab.eu>
> ---
>  drivers/net/xen-netback/common.h    |  2 ++
>  drivers/net/xen-netback/interface.c | 11 +++++---
>  drivers/net/xen-netback/netback.c   | 52 +++++++++++++++++++++++++++++--------
>  3 files changed, 51 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
> index 23deb6a..f3ece12 100644
> --- a/drivers/net/xen-netback/common.h
> +++ b/drivers/net/xen-netback/common.h
> @@ -363,6 +363,8 @@ void xenvif_kick_thread(struct xenvif_queue *queue);
>  
>  int xenvif_dealloc_kthread(void *data);
>  
> +int xenvif_rx_map(struct xenvif_queue *queue, struct sk_buff *skb);
> +
>  void xenvif_rx_queue_tail(struct xenvif_queue *queue, struct sk_buff *skb);
>  
>  /* Determine whether the needed number of slots (req) are available,
> diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
> index 1103568..dfe2b7b 100644
> --- a/drivers/net/xen-netback/interface.c
> +++ b/drivers/net/xen-netback/interface.c
> @@ -109,7 +109,8 @@ static irqreturn_t xenvif_rx_interrupt(int irq, void *dev_id)
>  {
>  	struct xenvif_queue *queue = dev_id;
>  
> -	xenvif_kick_thread(queue);
> +	if (!queue->vif->persistent_grants)
> +		xenvif_kick_thread(queue);
>  
>  	return IRQ_HANDLED;
>  }
> @@ -168,8 +169,12 @@ static int xenvif_start_xmit(struct sk_buff *skb, struct net_device *dev)
>  	cb = XENVIF_RX_CB(skb);
>  	cb->expires = jiffies + vif->drain_timeout;
>  
> -	xenvif_rx_queue_tail(queue, skb);
> -	xenvif_kick_thread(queue);
> +	if (!queue->vif->persistent_grants) {
> +		xenvif_rx_queue_tail(queue, skb);
> +		xenvif_kick_thread(queue);
> +	} else if (xenvif_rx_map(queue, skb)) {
> +		return NETDEV_TX_BUSY;
> +	}
>  

We now have two different functions for guest RX, one is xenvif_rx_map,
the other is xenvif_rx_action. They look very similar. Can we only have
one?

>  	return NETDEV_TX_OK;
>  
> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
> index c4f57d7..228df92 100644
> --- a/drivers/net/xen-netback/netback.c
> +++ b/drivers/net/xen-netback/netback.c
> @@ -883,9 +883,48 @@ static bool xenvif_rx_submit(struct xenvif_queue *queue,
>  	return !!ret;
>  }
>  
> +int xenvif_rx_map(struct xenvif_queue *queue, struct sk_buff *skb)
> +{
> +	int ret = -EBUSY;
> +	struct netrx_pending_operations npo = {
> +		.copy  = queue->grant_copy_op,
> +		.meta  = queue->meta
> +	};
> +
> +	if (!xenvif_rx_ring_slots_available(queue, XEN_NETBK_LEGACY_SLOTS_MAX))

I think you meant XEN_NETBK_RX_SLOTS_MAX?

> +		goto done;
> +
> +	xenvif_rx_build_gops(queue, &npo, skb);
> +
> +	BUG_ON(npo.meta_prod > ARRAY_SIZE(queue->meta));
> +	if (!npo.copy_done && !npo.copy_prod)
> +		goto done;
> +
> +	BUG_ON(npo.map_prod > MAX_GRANT_COPY_OPS);
> +	if (npo.map_prod) {
> +		ret = gnttab_map_refs(queue->rx_map_ops,
> +				      NULL,
> +				      queue->rx_pages_to_map,
> +				      npo.map_prod);
> +		BUG_ON(ret);
> +	}
> +
> +	BUG_ON(npo.copy_prod > MAX_GRANT_COPY_OPS);
> +	if (npo.copy_prod)
> +		gnttab_batch_copy(npo.copy, npo.copy_prod);
> +
> +	if (xenvif_rx_submit(queue, &npo, skb))
> +		notify_remote_via_irq(queue->rx_irq);
> +
> +	ret = 0; /* clear error */

No need to have that comment.

> +done:
> +	if (xenvif_queue_stopped(queue))
> +		xenvif_wake_queue(queue);
> +	return ret;
> +}
> +
>  static void xenvif_rx_action(struct xenvif_queue *queue)
>  {
> -	int ret;
>  	struct sk_buff *skb;
>  	struct sk_buff_head rxq;
>  	bool need_to_notify = false;
> @@ -905,22 +944,13 @@ static void xenvif_rx_action(struct xenvif_queue *queue)
>  	}
>  
>  	BUG_ON(npo.meta_prod > XEN_NETIF_RX_RING_SIZE);
> -	if (!npo.copy_done && !npo.copy_prod)
> +	if (!npo.copy_prod)

You modified this line back and forth. You could just avoid modifying it
in your previous patch implementing RX persistent grants.

>  		return;
>  
>  	BUG_ON(npo.copy_prod > MAX_GRANT_COPY_OPS);
>  	if (npo.copy_prod)
>  		gnttab_batch_copy(npo.copy, npo.copy_prod);
>  
> -	BUG_ON(npo.map_prod > MAX_GRANT_COPY_OPS);
> -	if (npo.map_prod) {
> -		ret = gnttab_map_refs(queue->rx_map_ops,
> -				      NULL,
> -				      queue->rx_pages_to_map,
> -				      npo.map_prod);
> -		BUG_ON(ret);
> -	}
> -

And this? You delete the hunk you added in previous patch.

Wei.

>  	while ((skb = __skb_dequeue(&rxq)) != NULL)
>  		need_to_notify |= xenvif_rx_submit(queue, &npo, skb);
>  
> -- 
> 2.1.3

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [RFC PATCH 07/13] xen-netback: add persistent tree counters to debugfs
  2015-05-12 17:18   ` Joao Martins
  (?)
@ 2015-05-19 15:36   ` Wei Liu
  -1 siblings, 0 replies; 98+ messages in thread
From: Wei Liu @ 2015-05-19 15:36 UTC (permalink / raw)
  To: Joao Martins
  Cc: xen-devel, netdev, wei.liu2, ian.campbell, david.vrabel,
	boris.ostrovsky, konrad.wilk

On Tue, May 12, 2015 at 07:18:31PM +0200, Joao Martins wrote:
> Prints the total/max number of persistent grants and how many of
> them are in use.
> 
> Signed-off-by: Joao Martins <joao.martins@neclab.eu>

Reviewed-by: Wei Liu <wei.liu2@citrix.com>

> ---
>  drivers/net/xen-netback/xenbus.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
> index 766f7e5..1e6f27a 100644
> --- a/drivers/net/xen-netback/xenbus.c
> +++ b/drivers/net/xen-netback/xenbus.c
> @@ -121,6 +121,17 @@ static int xenvif_read_io_ring(struct seq_file *m, void *v)
>  		   skb_queue_len(&queue->rx_queue),
>  		   netif_tx_queue_stopped(dev_queue) ? "stopped" : "running");
>  
> +	if (queue->vif->persistent_grants) {
> +		seq_printf(m, "\nRx persistent_gnts: in_use %d max %d gnts %d\n",
> +			   atomic_read(&queue->rx_gnts_tree.gnt_in_use),
> +			   queue->rx_gnts_tree.gnt_max,
> +			   queue->rx_gnts_tree.gnt_c);
> +		seq_printf(m, "\nTx persistent_gnts: in_use %d max %d gnts %d\n",
> +			   atomic_read(&queue->tx_gnts_tree.gnt_in_use),
> +			   queue->tx_gnts_tree.gnt_max,
> +			   queue->tx_gnts_tree.gnt_c);
> +	}
> +
>  	return 0;
>  }
>  
> -- 
> 2.1.3

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [RFC PATCH 08/13] xen-netback: clone skb if skb->xmit_more is set
  2015-05-12 17:18   ` Joao Martins
  (?)
  (?)
@ 2015-05-19 15:36   ` Wei Liu
  2015-05-22 17:14     ` Joao Martins
  2015-05-22 17:14     ` Joao Martins
  -1 siblings, 2 replies; 98+ messages in thread
From: Wei Liu @ 2015-05-19 15:36 UTC (permalink / raw)
  To: Joao Martins
  Cc: xen-devel, netdev, wei.liu2, ian.campbell, david.vrabel,
	boris.ostrovsky, konrad.wilk

On Tue, May 12, 2015 at 07:18:32PM +0200, Joao Martins wrote:
> In xenvif_start_xmit() we have an additional queue towards the netback
> RX kthread that will send the packet. When using burst>1, pktgen sets
> skb->xmit_more to tell the driver that there are more skbs in the
> queue. However, pktgen transmits the same skb <burst> times, which
> leads to the BUG below. In short, adding the same skb to rx_queue
> twice leads to a crash. Specifically, with pktgen running with
> burst=2, what happens is: when we queue the second skb (which is the
> same as the first queued skb), the tail element of the list ends up
> with skb->prev pointing to the skb itself. On skb_unlink (i.e. when
> dequeueing the skb), skb->prev becomes NULL, but list->next is left
> pointing to the unlinked skb. Because of this, skb_peek still returns
> the skb, which redoes the skb_unlink, trying to set (skb->prev)->next
> while skb->prev is now NULL, thus leading to the crash (trace below).
> 

From your description this doesn't sound Xen-specific. It sounds like
pktgen breaks any driver that has an internal queue, and there are
plenty of those.

> I'm not sure what the best way to fix this is, but since it only
> happens when using pktgen with burst>1, I chose to do an skb_clone
> when we don't use persistent grants, the skb->xmit_more flag is set,
> and CONFIG_NET_PKTGEN is built in.
> 

I don't think we should do this.

Wei.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [RFC PATCH 00/13] Persistent grant maps for xen net drivers
  2015-05-12 17:18 ` Joao Martins
                   ` (16 preceding siblings ...)
  (?)
@ 2015-05-19 15:39 ` Wei Liu
  2015-05-22 10:27   ` Joao Martins
  2015-05-22 10:27   ` Joao Martins
  -1 siblings, 2 replies; 98+ messages in thread
From: Wei Liu @ 2015-05-19 15:39 UTC (permalink / raw)
  To: Joao Martins
  Cc: xen-devel, netdev, wei.liu2, ian.campbell, david.vrabel,
	boris.ostrovsky, konrad.wilk

On Tue, May 12, 2015 at 07:18:24PM +0200, Joao Martins wrote:

> There have been recently[3] some discussions and issues raised on
> persistent grants for the block layer, though the numbers above
> show some significant improvements specially on more network intensive
> workloads and provide a margin for comparison against future map/unmap
> improvements.
> 
> Any comments or suggestions are welcome,
> Thanks!

Thanks, the numbers certainly look interesting.

I'm just a bit concerned about the complexity of netback. I've
commented on individual patches; we can discuss the issues there.

Wei.

> Joao
> 
> [1] http://article.gmane.org/gmane.linux.network/249383
> [2] http://bit.ly/1IhJfXD
> [3] http://lists.xen.org/archives/html/xen-devel/2015-02/msg02292.html
> 
> Joao Martins (13):
>   xen-netback: add persistent grant tree ops
>   xen-netback: xenbus feature persistent support
>   xen-netback: implement TX persistent grants
>   xen-netback: implement RX persistent grants
>   xen-netback: refactor xenvif_rx_action
>   xen-netback: copy buffer on xenvif_start_xmit()
>   xen-netback: add persistent tree counters to debugfs
>   xen-netback: clone skb if skb->xmit_more is set
>   xen-netfront: move grant_{ref,page} to struct grant
>   xen-netfront: refactor claim/release grant
>   xen-netfront: feature-persistent xenbus support
>   xen-netfront: implement TX persistent grants
>   xen-netfront: implement RX persistent grants
> 
>  drivers/net/xen-netback/common.h    |  79 ++++
>  drivers/net/xen-netback/interface.c |  78 +++-
>  drivers/net/xen-netback/netback.c   | 873 ++++++++++++++++++++++++++++++------
>  drivers/net/xen-netback/xenbus.c    |  24 +
>  drivers/net/xen-netfront.c          | 362 ++++++++++++---
>  5 files changed, 1216 insertions(+), 200 deletions(-)
> 
> -- 
> 2.1.3

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [RFC PATCH 02/13] xen-netback: xenbus feature persistent support
  2015-05-19 15:19   ` Wei Liu
  2015-05-22 10:24     ` Joao Martins
@ 2015-05-22 10:24     ` Joao Martins
  1 sibling, 0 replies; 98+ messages in thread
From: Joao Martins @ 2015-05-22 10:24 UTC (permalink / raw)
  To: Wei Liu
  Cc: xen-devel, netdev, ian.campbell, david.vrabel, boris.ostrovsky,
	konrad.wilk


On 19 May 2015, at 17:19, Wei Liu <wei.liu2@citrix.com> wrote:

> On Tue, May 12, 2015 at 07:18:26PM +0200, Joao Martins wrote:
>> Checks for "feature-persistent" that indicates persistent grants
>> support. Adds max_persistent_grants module param that specifies the max
>> number of persistent grants, which if set to zero disables persistent
>> grants.
>> 
>> Signed-off-by: Joao Martins <joao.martins@neclab.eu>
> 
> This patch needs to be moved later. The feature needs to be implemented
> first.
> 
> Also you need to patch netif.h to document this new feature. To do this,
> you need to patch the master netif.h in Xen tree then sync the change to
> Linux. Of course I don't mind if we discuss wording in the Linux
> copy then you devise a patch for Xen.

Ok, I will do that. I made the same mistake on xen-netfront.

Joao

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [RFC PATCH 03/13] xen-netback: implement TX persistent grants
  2015-05-19 15:23   ` Wei Liu
  2015-05-22 10:24     ` Joao Martins
@ 2015-05-22 10:24     ` Joao Martins
  2015-06-02 14:53       ` Wei Liu
  2015-06-02 14:53       ` Wei Liu
  1 sibling, 2 replies; 98+ messages in thread
From: Joao Martins @ 2015-05-22 10:24 UTC (permalink / raw)
  To: Wei Liu
  Cc: xen-devel, netdev, ian.campbell, david.vrabel, boris.ostrovsky,
	konrad.wilk


On 19 May 2015, at 17:23, Wei Liu <wei.liu2@citrix.com> wrote:
> On Tue, May 12, 2015 at 07:18:27PM +0200, Joao Martins wrote:
>> Introduces persistent grants for TX path which follows similar code path
>> as the grant mapping.
>> 
>> It starts by checking if there's a persistent grant available for header
>> and frags grefs and if so setting it in tx_pgrants. If no persistent grant
>> is found in the tree for the header it will resort to grant copy (but
>> preparing the map ops and add them laster). For the frags it will use the
>                                     ^
>                                     later
> 
>> tree page pool, and in case of no pages it fallbacks to grant map/unmap
>> using mmap_pages. When skb destructor callback gets called we release the
>> slot and persistent grant within the callback to avoid waking up the
>> dealloc thread. As long as there are no unmaps to done the dealloc thread
>> will remain inactive.
>> 
> 
> This scheme looks complicated. Can we just only use one
> scheme at a time? What's the rationale for using this combined scheme?
> Maybe you're thinking about using a max_grants < ring_size to save
> memory?

Yes, my purpose was to allow max_grants < ring_size to reduce the amount of
memory mapped. In a bulk transfer test with iperf, the tree held fewer than
160 TX grants at its peak, without affecting maximum performance; though
using pktgen fills the tree completely.
The second reason is to handle the case of a (malicious?) frontend providing
more grefs than the maximum allowed, in which case I fall back to grant map/unmap.
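The fallback order described above can be modeled with a minimal userspace sketch (hypothetical names, not the actual netback code): prefer an existing persistent grant, then a free page from the bounded tree pool, and only then plain grant map/unmap.

```c
#include <assert.h>

/* Hypothetical model of the combined TX scheme: prefer an already
 * persistent grant, then a free page from the bounded tree pool
 * (max_grants < ring_size), and only then fall back to map/unmap. */
enum tx_path { TX_PERSISTENT, TX_POOL_MAP, TX_MAP_UNMAP };

static enum tx_path choose_tx_path(int have_pgrant, unsigned tree_pages_free)
{
	if (have_pgrant)		/* gref already mapped persistently */
		return TX_PERSISTENT;
	if (tree_pages_free > 0)	/* room left under max_grants */
		return TX_POOL_MAP;
	return TX_MAP_UNMAP;		/* e.g. frontend offered too many grefs */
}
```

A frontend offering more grefs than max_grants simply takes the last branch, which is why the combined scheme degrades gracefully instead of failing.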

> 
> Only skim the patch. I will do detailed reviews after we're sure this is
> the right way to go.
> 
>> Results show an improvement of 46% (1.82 vs 1.24 Mpps, 64 pkt size)
>> measured with pktgen and up to over 48% (21.6 vs 14.5 Gbit/s) measured
>> with iperf (TCP) with 4 parallel flows 1 queue vif, DomU to Dom0.
>> Tests ran on a Intel Xeon E5-1650 v2 with HT disabled.
>> […]
>> int xenvif_init_queue(struct xenvif_queue *queue)
>> {
>> 	int err, i;
>> @@ -496,9 +524,23 @@ int xenvif_init_queue(struct xenvif_queue *queue)
>> 			  .ctx = NULL,
>> 			  .desc = i };
>> 		queue->grant_tx_handle[i] = NETBACK_INVALID_HANDLE;
>> +		queue->tx_pgrants[i] = NULL;
>> +	}
>> +
>> +	if (queue->vif->persistent_grants) {
>> +		err = init_persistent_gnt_tree(&queue->tx_gnts_tree,
>> +					       queue->tx_gnts_pages,
>> +					       XEN_NETIF_TX_RING_SIZE);
>> +		if (err)
>> +			goto err_disable;
>> 	}
>> 
>> 	return 0;
>> +
>> +err_disable:
>> +	netdev_err(queue->vif->dev, "Could not reserve tree pages.");
>> +	queue->vif->persistent_grants = 0;
> 
> You can just move the above two lines under `if (err)'.
> 
> Also see below.
In the next patch this is also a common cleanup path. I did it this way to
avoid moving this hunk around.


>> +	return 0;
>> }
>> 
>> void xenvif_carrier_on(struct xenvif *vif)
>> @@ -654,6 +696,10 @@ void xenvif_disconnect(struct xenvif *vif)
>> 		}
>> 
>> 		xenvif_unmap_frontend_rings(queue);
>> +
>> +		if (queue->vif->persistent_grants)
>> +			deinit_persistent_gnt_tree(&queue->tx_gnts_tree,
>> +						   queue->tx_gnts_pages);
> 
> If the init function fails on queue N (N>0) you now leak resources.
Correct. I should return -ENOMEM (on init) and not 0.


>> 	}
>> }
>> 
>> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
>> index 332e489..529d7c3 100644
>> --- a/drivers/net/xen-netback/netback.c
>> +++ b/drivers/net/xen-netback/netback.c
>> @@ -269,6 +269,11 @@ static inline unsigned long idx_to_kaddr(struct xenvif_queue *queue,
>> 	return (unsigned long)pfn_to_kaddr(idx_to_pfn(queue, idx));
>> }
>> 
>> +static inline void *page_to_kaddr(struct page *page)
>> +{
>> +	return pfn_to_kaddr(page_to_pfn(page));
>> +}
>> +
>> #define callback_param(vif, pending_idx) \
>> 	(vif->pending_tx_info[pending_idx].callback_struct)
>> 
>> @@ -299,6 +304,29 @@ static inline pending_ring_idx_t pending_index(unsigned i)
>> 	return i & (MAX_PENDING_REQS-1);
>> }
>> 
>> +/*  Creates a new persistent grant and add it to the tree.
>> + */
>> +static struct persistent_gnt *xenvif_pgrant_new(struct persistent_gnt_tree *tree,
>> +						struct gnttab_map_grant_ref *gop)
>> +{
>> +	struct persistent_gnt *persistent_gnt;
>> +
>> +	persistent_gnt = kmalloc(sizeof(*persistent_gnt), GFP_KERNEL);
> 
> xenvif_pgrant_new can be called in NAPI which runs in softirq context
> which doesn't allow you to sleep.

Silly mistake, I will fix this. The whole point of the tree page pool is that we
aren't allowed to sleep either here or in ndo_start_xmit (in patch #6).
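The page-pool idea can be sketched in userspace C (hypothetical names, not the netback code): all allocation is done up front where sleeping is allowed, so the hot path only pops preallocated entries and never calls the allocator, mirroring why GFP_KERNEL is unusable from softirq context.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model of a preallocated entry pool: allocation happens
 * only at init time (where sleeping is fine), so the hot path never
 * calls the allocator -- the reason xenvif_pgrant_new must not use
 * GFP_KERNEL from softirq context. */
struct pgrant { struct pgrant *next; };
struct pgrant_pool { struct pgrant *free; };

static int pool_init(struct pgrant_pool *p, unsigned n)
{
	while (n--) {
		struct pgrant *g = calloc(1, sizeof(*g));
		if (!g)
			return -1;
		g->next = p->free;
		p->free = g;
	}
	return 0;
}

/* Hot path: pop without allocating; returns NULL when exhausted. */
static struct pgrant *pool_get(struct pgrant_pool *p)
{
	struct pgrant *g = p->free;
	if (g)
		p->free = g->next;
	return g;
}

static void pool_put(struct pgrant_pool *p, struct pgrant *g)
{
	g->next = p->free;
	p->free = g;
}
```

When the pool is exhausted, pool_get() returning NULL corresponds to the fallback to grant map/unmap in the patch.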

>> […]
>> 
>> +/* Checks if there's a persistent grant available for gref and
>> + * if so, set it also in the tx_pgrants array that keeps the ones
>> + * in use.
>> + */
>> +static bool xenvif_tx_pgrant_available(struct xenvif_queue *queue,
>> +				       grant_ref_t ref, u16 pending_idx,
>> +				       bool *can_map)
>> +{
>> +	struct persistent_gnt_tree *tree = &queue->tx_gnts_tree;
>> +	struct persistent_gnt *persistent_gnt;
>> +	bool busy;
>> +
>> +	if (!queue->vif->persistent_grants)
>> +		return false;
>> +
>> +	persistent_gnt = get_persistent_gnt(tree, ref);
>> +
>> +	/* If gref is already in use we fallback, since it would
>> +	 * otherwise mean re-adding the same gref to the tree
>> +	 */
>> +	busy = IS_ERR(persistent_gnt);
>> +	if (unlikely(busy))
>> +		persistent_gnt = NULL;
>> +
> 
> Under what circumstance can we retrieve a already in use persistent
> grant? You seem to suggest this is a bug in RX case.

A guest could try to share the same mapped page across multiple frags,
in which case I fall back to map/unmap. I think this is a limitation in
the way we manage the persistent grants, where we can only have a single
reference to a persistent grant in flight.
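The "single in-flight reference" limitation can be illustrated with a small userspace model (hypothetical names and a flat table standing in for the red-black tree): a lookup of a gref that is already checked out fails, which is what forces the map/unmap fallback when one page backs several frags.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the single in-flight reference limit: a gref
 * that is already checked out cannot be obtained again until it is
 * released, so a frontend reusing one page across several frags forces
 * the caller onto the map/unmap fallback path. */
#define NGNTS 4

struct pgnt { unsigned ref; int in_use; };

static struct pgnt *pgnt_get(struct pgnt *tbl, unsigned ref)
{
	for (int i = 0; i < NGNTS; i++) {
		if (tbl[i].ref == ref) {
			if (tbl[i].in_use)
				return NULL;	/* busy: caller falls back */
			tbl[i].in_use = 1;
			return &tbl[i];
		}
	}
	return NULL;				/* not in the tree */
}

static void pgnt_put(struct pgnt *g)
{
	g->in_use = 0;
}
```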

>> […]
>> 			MAX_PENDING_REQS);
>> 		index = pending_index(queue->dealloc_prod);
>> @@ -1691,7 +1939,10 @@ void xenvif_zerocopy_callback(struct ubuf_info *ubuf, bool zerocopy_success)
>> 		smp_wmb();
>> 		queue->dealloc_prod++;
>> 	} while (ubuf);
>> -	wake_up(&queue->dealloc_wq);
>> +	/* Wake up only when there are grants to unmap */
>> +	if (dealloc_prod_save != queue->dealloc_prod)
>> +		wake_up(&queue->dealloc_wq);
>> +
>> 	spin_unlock_irqrestore(&queue->callback_lock, flags);
>> 
>> 	if (likely(zerocopy_success))
>> @@ -1779,10 +2030,13 @@ int xenvif_tx_action(struct xenvif_queue *queue, int budget)
>> 
>> 	xenvif_tx_build_gops(queue, budget, &nr_cops, &nr_mops);
>> 
>> -	if (nr_cops == 0)
>> +	if (!queue->vif->persistent_grants &&
>> +	    nr_cops == 0)
> 
> You can just move nr_cops to previous line.
> 
> Wei.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [RFC PATCH 04/13] xen-netback: implement RX persistent grants
  2015-05-19 15:32   ` Wei Liu
  2015-05-22 10:25     ` Joao Martins
@ 2015-05-22 10:25     ` Joao Martins
  2015-06-02 15:07       ` Wei Liu
  2015-06-02 15:07       ` Wei Liu
  1 sibling, 2 replies; 98+ messages in thread
From: Joao Martins @ 2015-05-22 10:25 UTC (permalink / raw)
  To: Wei Liu
  Cc: xen-devel, netdev, ian.campbell, david.vrabel, boris.ostrovsky,
	konrad.wilk


On 19 May 2015, at 17:32, Wei Liu <wei.liu2@citrix.com> wrote:

> On Tue, May 12, 2015 at 07:18:28PM +0200, Joao Martins wrote:
>> It starts by doing a lookup in the tree for a gref. If no persistent
>> grant is found on the tree, it will do grant copy and prepare
>> the grant maps. Finally valides the grant map and adds it to the tree.
> 
> validates?
> 
>> After mapped these grants can be pulled from the tree in the subsequent
>> requests. If it's out of pages in the tree pool, it will fallback to
>> grant copy.
>> 
> 
> Again, this looks complicated. Why use combined scheme? I will do
> detailed reviews after we're sure we need such scheme.
When we don't have the gref in the tree we need to map it and then copy
into the newly mapped page afterwards (this only happens once, until the
grant is in the tree). My options here were either to do this copy
explicitly after adding the persistent grant, which would require saving
the dst/src addresses and the length to copy, or to reuse the grant copy
(since it happens only once, until the grant is in the tree) and use
memcpy on the following requests. Additionally, I allow the fallback to
grant copy in case the guest provides more grefs than max_grants.

Note that this is also the case for TX, with regard to grant-copying the
header. I was unsure which one is the more correct way of doing it, but
ultimately the latter involved a smaller codepath, and that's why I
chose it. What do you think?
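The map-once-then-memcpy behaviour described above can be sketched in userspace C (hypothetical names, simulated mapping): the first request for a gref pays the map cost, every later request is a plain memcpy into the cached page.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of the RX scheme: the first time a gref is seen
 * its page is "mapped" (simulated here by a flag) and filled via the
 * slow path; every later request for the same gref is a plain memcpy
 * into the cached page, so the map cost is paid only once per grant. */
#define PAGE_SZ 64

struct rx_slot { int mapped; char page[PAGE_SZ]; };

/* Returns 1 if this call had to map (slow path), 0 on the fast path. */
static int rx_deliver(struct rx_slot *slot, const char *buf, size_t len)
{
	int did_map = 0;
	if (!slot->mapped) {		/* once: map + initial grant copy */
		slot->mapped = 1;
		did_map = 1;
	}
	memcpy(slot->page, buf, len);	/* steady state: memcpy only */
	return did_map;
}
```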


>> It adds four new fields in the netrx_pending_operations: copy_done
>> to track how many copies were made; map_prod and map_cons to track
>> how many maps are outstanding validation and finally copy_page for
>> the correspondent page (in tree) for copy_gref.
>> 
>> Results are 1.04 Mpps measured with pktgen (pkt_size 64, burst 1)
>> with persistent grants versus 1.23 Mpps with grant copy (20%
>> regression). With persistent grants it adds up contention on
>> queue->wq as the kthread_guest_rx goes to sleep more often. If we
>> speed up the sender (burst 2,4 and 8) it goes up to 1.7 Mpps with
>> persistent grants. This issue is addressed in a later commit, by
>> copying the skb on xenvif_start_xmit() instead of going through
>> the RX kthread.
>> 
>> Signed-off-by: Joao Martins <joao.martins@neclab.eu>
>> ---
>> drivers/net/xen-netback/common.h    |   7 ++
>> drivers/net/xen-netback/interface.c |  14 ++-
>> drivers/net/xen-netback/netback.c   | 190 ++++++++++++++++++++++++++++++------
>> 3 files changed, 178 insertions(+), 33 deletions(-)
>> 
>> diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
>> index e5ee220..23deb6a 100644
>> --- a/drivers/net/xen-netback/common.h
>> +++ b/drivers/net/xen-netback/common.h
>> @@ -235,6 +235,13 @@ struct xenvif_queue { /* Per-queue data for xenvif */
>> 
>> 	struct gnttab_copy grant_copy_op[MAX_GRANT_COPY_OPS];
>> 
>> +	/* To map the grefs to be added to the tree */
>> +	struct gnttab_map_grant_ref rx_map_ops[XEN_NETIF_RX_RING_SIZE];
>> +	struct page *rx_pages_to_map[XEN_NETIF_RX_RING_SIZE];
>> +	/* Only used if feature-persistent = 1 */
> 
> This comment applies to rx_map_ops and rx_pages_to_map as well. Could
> you move it up?
Ok.

>> +	struct persistent_gnt_tree rx_gnts_tree;
>> +	struct page *rx_gnts_pages[XEN_NETIF_RX_RING_SIZE];
>> +
>> 	/* We create one meta structure per ring request we consume, so
>> 	 * the maximum number is the same as the ring size.
>> 	 */
>> diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
> 
> [...]
> 
>> 
>> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
>> index 529d7c3..738b6ee 100644
>> --- a/drivers/net/xen-netback/netback.c
>> +++ b/drivers/net/xen-netback/netback.c
>> @@ -413,14 +413,62 @@ static void xenvif_rx_queue_drop_expired(struct xenvif_queue *queue)
>> }
>> 
>> struct netrx_pending_operations {
>> +	unsigned map_prod, map_cons;
>> 	unsigned copy_prod, copy_cons;
>> 	unsigned meta_prod, meta_cons;
>> 	struct gnttab_copy *copy;
>> 	struct xenvif_rx_meta *meta;
>> 	int copy_off;
>> 	grant_ref_t copy_gref;
>> +	struct page *copy_page;
>> +	unsigned copy_done;
>> };
>> 
>> +static void xenvif_create_rx_map_op(struct xenvif_queue *queue,
>> +				    struct gnttab_map_grant_ref *mop,
>> +				    grant_ref_t ref,
>> +				    struct page *page)
> 
> Rename it to xenvif_rx_create_map_op to be consistent with
> xenvif_tx_create_map_op?
> 
>> +{
>> +	queue->rx_pages_to_map[mop - queue->rx_map_ops] = page;
>> +	gnttab_set_map_op(mop,
>> +			  (unsigned long)page_to_kaddr(page),
>> +			  GNTMAP_host_map,
>> +			  ref, queue->vif->domid);
>> +}
>> +
> 
> [...]
> 
>> +
>> +		persistent_gnt = xenvif_pgrant_new(tree, gop_map);
>> +		if (unlikely(!persistent_gnt)) {
>> +			netdev_err(queue->vif->dev,
>> +				   "Couldn't add gref to the tree! ref: %d",
>> +				   gop_map->ref);
>> +			xenvif_page_unmap(queue, gop_map->handle, &page);
>> +			put_free_pages(tree, &page, 1);
>> +			kfree(persistent_gnt);
>> +			persistent_gnt = NULL;
> 
> persistent_gnt is already NULL.
> 
> So the kfree and = NULL is pointless.
Indeed, I will also retest this error path. This was a remnant of a refactoring I did in
the RX/TX shared paths.

>> +			continue;
>> +		}
>> +
>> +		put_persistent_gnt(tree, persistent_gnt);
>> +	}
>> +}
>> +
>> +/*
>>  * This is a twin to xenvif_gop_skb.  Assume that xenvif_gop_skb was
>>  * used to set up the operations on the top of
>>  * netrx_pending_operations, which have since been done.  Check that
>>  * they didn't give any errors and advance over them.
>>  */
>> -static int xenvif_check_gop(struct xenvif *vif, int nr_meta_slots,
>> +static int xenvif_check_gop(struct xenvif_queue *queue, int nr_meta_slots,
>> 			    struct netrx_pending_operations *npo)
>> {
>> 	struct gnttab_copy     *copy_op;
>> 	int status = XEN_NETIF_RSP_OKAY;
>> 	int i;
>> 
>> +	nr_meta_slots -= npo->copy_done;
>> +	if (npo->map_prod)
> 
> Should be "if (npo->map_prod != npo->map_cons)”?
Correct. Using just npo->map_prod is buggy.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [RFC PATCH 06/13] xen-netback: copy buffer on xenvif_start_xmit()
  2015-05-19 15:35   ` Wei Liu
  2015-05-22 10:26     ` Joao Martins
@ 2015-05-22 10:26     ` Joao Martins
  2015-06-02 15:10       ` Wei Liu
  2015-06-02 15:10       ` Wei Liu
  1 sibling, 2 replies; 98+ messages in thread
From: Joao Martins @ 2015-05-22 10:26 UTC (permalink / raw)
  To: Wei Liu
  Cc: xen-devel, netdev, ian.campbell, david.vrabel, boris.ostrovsky,
	konrad.wilk


On 19 May 2015, at 17:35, Wei Liu <wei.liu2@citrix.com> wrote:

> On Tue, May 12, 2015 at 07:18:30PM +0200, Joao Martins wrote:
>> By introducing persistent grants we speed up the RX thread with the
>> decreased copy cost, which leads to a throughput decrease of 20%.
>> It is observed that the rx_queue stays mostly at 10% of its capacity,
>> as opposed to full capacity when using grant copy. And a finer measure
>> with lock_stat (below with pkt_size 64, burst 1) shows much higher wait
>> queue contention on queue->wq, which hints that the RX kthread
>> waits/wakes up more often, i.e. is actually doing work.
>> 
>> Without persistent grants:
>> 
>> class name: &queue->wq
>>   con-bounces:      792         contentions:    792
>>   waittime-min:     0.36        waittime-max:   24.36
>>   waittime-total:   1140.30     waittime-avg:   1.44
>>   acq-bounces:      4208        acquisitions:   1002671
>>   holdtime-min:     0.00        holdtime-max:   46.75
>>   holdtime-total:   538164.02   holdtime-avg:   0.54
>> ----------
>> &queue->wq    326          [<ffffffff8115949f>] __wake_up+0x2f/0x80
>> &queue->wq    410          [<ffffffff811592bf>] finish_wait+0x4f/0xa0
>> &queue->wq     56          [<ffffffff811593eb>] prepare_to_wait+0x2b/0xb0
>> ----------
>> &queue->wq    202          [<ffffffff811593eb>] prepare_to_wait+0x2b/0xb0
>> &queue->wq    467          [<ffffffff8115949f>] __wake_up+0x2f/0x80
>> &queue->wq    123          [<ffffffff811592bf>] finish_wait+0x4f/0xa0
>> 
>> With persistent grants:
>> 
>> class name: &queue->wq
>>   con-bounces:      61834       contentions:    61836
>>   waittime-min:     0.32        waittime-max:   30.12
>>   waittime-total:   99710.27    waittime-avg:   1.61
>>   acq-bounces:      241400      acquisitions:   1125308
>>   holdtime-min:     0.00        holdtime-max:   75.61
>>   holdtime-total:   1106578.82  holdtime-avg:   0.98
>> ----------
>> &queue->wq     5079        [<ffffffff8115949f>] __wake_up+0x2f/0x80
>> &queue->wq    56280        [<ffffffff811592bf>] finish_wait+0x4f/0xa0
>> &queue->wq      479        [<ffffffff811593eb>] prepare_to_wait+0x2b/0xb0
>> ----------
>> &queue->wq     1005        [<ffffffff811592bf>] finish_wait+0x4f/0xa0
>> &queue->wq    56761        [<ffffffff8115949f>] __wake_up+0x2f/0x80
>> &queue->wq     4072        [<ffffffff811593eb>] prepare_to_wait+0x2b/0xb0
>> 
>> Also, with persistent grants, we don't require batching grant copy ops
>> (besides the initial copy+map) which makes me believe that deferring
>> the skb to the RX kthread just adds up unnecessary overhead (for this
>> particular case). This patch proposes copying the buffer on
>> xenvif_start_xmit(), which lets us both remove the contention on
>> queue->wq and lock on rx_queue. Here, an alternative to
>> xenvif_rx_action routine is added namely xenvif_rx_map() that maps
>> and copies the buffer to the guest. This is only used when persistent
>> grants are used, since it would otherwise mean a hypercall per
>> packet.
>> 
>> Improvements are up to a factor of 2.14 with a single queue getting us
>> from 1.04 Mpps to 1.7 Mpps (burst 1, pkt_size 64) and 1.5 to 2.6 Mpps
>> (burst 2, pkt_size 64) compared to using the kthread. Maximum with grant
>> copy is 1.2 Mpps, irrespective of the burst. All of this, measured on
>> an Intel(R) Xeon(R) CPU E5-1650 v2 @ 3.50GHz.
>> 
>> Signed-off-by: Joao Martins <joao.martins@neclab.eu>
>> ---
>> drivers/net/xen-netback/common.h    |  2 ++
>> drivers/net/xen-netback/interface.c | 11 +++++---
>> drivers/net/xen-netback/netback.c   | 52 +++++++++++++++++++++++++++++--------
>> 3 files changed, 51 insertions(+), 14 deletions(-)
>> 
>> diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
>> index 23deb6a..f3ece12 100644
>> --- a/drivers/net/xen-netback/common.h
>> +++ b/drivers/net/xen-netback/common.h
>> @@ -363,6 +363,8 @@ void xenvif_kick_thread(struct xenvif_queue *queue);
>> 
>> int xenvif_dealloc_kthread(void *data);
>> 
>> +int xenvif_rx_map(struct xenvif_queue *queue, struct sk_buff *skb);
>> +
>> void xenvif_rx_queue_tail(struct xenvif_queue *queue, struct sk_buff *skb);
>> 
>> /* Determine whether the needed number of slots (req) are available,
>> diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
>> index 1103568..dfe2b7b 100644
>> --- a/drivers/net/xen-netback/interface.c
>> +++ b/drivers/net/xen-netback/interface.c
>> @@ -109,7 +109,8 @@ static irqreturn_t xenvif_rx_interrupt(int irq, void *dev_id)
>> {
>> 	struct xenvif_queue *queue = dev_id;
>> 
>> -	xenvif_kick_thread(queue);
>> +	if (!queue->vif->persistent_grants)
>> +		xenvif_kick_thread(queue);
>> 
>> 	return IRQ_HANDLED;
>> }
>> @@ -168,8 +169,12 @@ static int xenvif_start_xmit(struct sk_buff *skb, struct net_device *dev)
>> 	cb = XENVIF_RX_CB(skb);
>> 	cb->expires = jiffies + vif->drain_timeout;
>> 
>> -	xenvif_rx_queue_tail(queue, skb);
>> -	xenvif_kick_thread(queue);
>> +	if (!queue->vif->persistent_grants) {
>> +		xenvif_rx_queue_tail(queue, skb);
>> +		xenvif_kick_thread(queue);
>> +	} else if (xenvif_rx_map(queue, skb)) {
>> +		return NETDEV_TX_BUSY;
>> +	}
>> 
> 
> We now have two different functions for guest RX, one is xenvif_rx_map,
> the other is xenvif_rx_action. They look very similar. Can we only have
> one?
I think I can merge this into xenvif_rx_action, and I notice that the stall
detection its missing. I will also add that.
Perhaps I could also disable the RX kthread, since this doesn't get used with
persistent grants?

>> 	return NETDEV_TX_OK;
>> 
>> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
>> index c4f57d7..228df92 100644
>> --- a/drivers/net/xen-netback/netback.c
>> +++ b/drivers/net/xen-netback/netback.c
>> @@ -883,9 +883,48 @@ static bool xenvif_rx_submit(struct xenvif_queue *queue,
>> 	return !!ret;
>> }
>> 
>> +int xenvif_rx_map(struct xenvif_queue *queue, struct sk_buff *skb)
>> +{
>> +	int ret = -EBUSY;
>> +	struct netrx_pending_operations npo = {
>> +		.copy  = queue->grant_copy_op,
>> +		.meta  = queue->meta
>> +	};
>> +
>> +	if (!xenvif_rx_ring_slots_available(queue, XEN_NETBK_LEGACY_SLOTS_MAX))
> 
> I think you meant XEN_NETBK_RX_SLOTS_MAX?
Yes.

>> +		goto done;
>> +
>> +	xenvif_rx_build_gops(queue, &npo, skb);
>> +
>> +	BUG_ON(npo.meta_prod > ARRAY_SIZE(queue->meta));
>> +	if (!npo.copy_done && !npo.copy_prod)
>> +		goto done;
>> +
>> +	BUG_ON(npo.map_prod > MAX_GRANT_COPY_OPS);
>> +	if (npo.map_prod) {
>> +		ret = gnttab_map_refs(queue->rx_map_ops,
>> +				      NULL,
>> +				      queue->rx_pages_to_map,
>> +				      npo.map_prod);
>> +		BUG_ON(ret);
>> +	}
>> +
>> +	BUG_ON(npo.copy_prod > MAX_GRANT_COPY_OPS);
>> +	if (npo.copy_prod)
>> +		gnttab_batch_copy(npo.copy, npo.copy_prod);
>> +
>> +	if (xenvif_rx_submit(queue, &npo, skb))
>> +		notify_remote_via_irq(queue->rx_irq);
>> +
>> +	ret = 0; /* clear error */
> 
> No need to have that comment.
> 
>> +done:
>> +	if (xenvif_queue_stopped(queue))
>> +		xenvif_wake_queue(queue);
>> +	return ret;
>> +}
>> +
>> static void xenvif_rx_action(struct xenvif_queue *queue)
>> {
>> -	int ret;
>> 	struct sk_buff *skb;
>> 	struct sk_buff_head rxq;
>> 	bool need_to_notify = false;
>> @@ -905,22 +944,13 @@ static void xenvif_rx_action(struct xenvif_queue *queue)
>> 	}
>> 
>> 	BUG_ON(npo.meta_prod > XEN_NETIF_RX_RING_SIZE);
>> -	if (!npo.copy_done && !npo.copy_prod)
>> +	if (!npo.copy_prod)
> 
> You modified this line back and forth. You could just avoid modifying it
> in you previous patch to implement RX persistent grants.
> 
>> 		return;
>> 
>> 	BUG_ON(npo.copy_prod > MAX_GRANT_COPY_OPS);
>> 	if (npo.copy_prod)
>> 		gnttab_batch_copy(npo.copy, npo.copy_prod);
>> 
>> -	BUG_ON(npo.map_prod > MAX_GRANT_COPY_OPS);
>> -	if (npo.map_prod) {
>> -		ret = gnttab_map_refs(queue->rx_map_ops,
>> -				      NULL,
>> -				      queue->rx_pages_to_map,
>> -				      npo.map_prod);
>> -		BUG_ON(ret);
>> -	}
>> -
> 
> And this? You delete the hunk you added in previous patch.
Having only one routine instead of adding xenvif_rx_map like you previously
suggested, solves moving this hunks around.

> 
>> 	while ((skb = __skb_dequeue(&rxq)) != NULL)
>> 		need_to_notify |= xenvif_rx_submit(queue, &npo, skb);
>> 
>> -- 
>> 2.1.3

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [RFC PATCH 06/13] xen-netback: copy buffer on xenvif_start_xmit()
  2015-05-19 15:35   ` Wei Liu
@ 2015-05-22 10:26     ` Joao Martins
  2015-05-22 10:26     ` Joao Martins
  1 sibling, 0 replies; 98+ messages in thread
From: Joao Martins @ 2015-05-22 10:26 UTC (permalink / raw)
  To: Wei Liu; +Cc: ian.campbell, netdev, david.vrabel, xen-devel, boris.ostrovsky


On 19 May 2015, at 17:35, Wei Liu <wei.liu2@citrix.com> wrote:

> On Tue, May 12, 2015 at 07:18:30PM +0200, Joao Martins wrote:
>> By introducing persistent grants we speed up the RX thread thanks to the
>> decreased copy cost, yet throughput decreases by 20%. It is observed
>> that the rx_queue stays mostly at 10% of its capacity, as opposed to
>> full capacity when using grant copy. A finer measurement with lock_stat
>> (below, with pkt_size 64, burst 1) shows much higher wait queue
>> contention on queue->wq, which hints that the RX kthread waits and
>> wakes up more often than it actually does work.
>> 
>> Without persistent grants:
>> 
>> class name   con-bounces  contentions  waittime-min  waittime-max  waittime-total  waittime-avg  acq-bounces  acquisitions  holdtime-min  holdtime-max  holdtime-total  holdtime-avg
>> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>> &queue->wq:  792          792          0.36          24.36         1140.30         1.44          4208         1002671       0.00          46.75         538164.02       0.54
>> ----------
>> &queue->wq    326          [<ffffffff8115949f>] __wake_up+0x2f/0x80
>> &queue->wq    410          [<ffffffff811592bf>] finish_wait+0x4f/0xa0
>> &queue->wq     56          [<ffffffff811593eb>] prepare_to_wait+0x2b/0xb0
>> ----------
>> &queue->wq    202          [<ffffffff811593eb>] prepare_to_wait+0x2b/0xb0
>> &queue->wq    467          [<ffffffff8115949f>] __wake_up+0x2f/0x80
>> &queue->wq    123          [<ffffffff811592bf>] finish_wait+0x4f/0xa0
>> 
>> With persistent grants:
>> 
>> &queue->wq:  61834        61836        0.32          30.12         99710.27        1.61          241400       1125308       0.00          75.61         1106578.82      0.98
>> ----------
>> &queue->wq     5079        [<ffffffff8115949f>] __wake_up+0x2f/0x80
>> &queue->wq    56280        [<ffffffff811592bf>] finish_wait+0x4f/0xa0
>> &queue->wq      479        [<ffffffff811593eb>] prepare_to_wait+0x2b/0xb0
>> ----------
>> &queue->wq     1005        [<ffffffff811592bf>] finish_wait+0x4f/0xa0
>> &queue->wq    56761        [<ffffffff8115949f>] __wake_up+0x2f/0x80
>> &queue->wq     4072        [<ffffffff811593eb>] prepare_to_wait+0x2b/0xb0
>> 
>> Also, with persistent grants, we don't require batching grant copy ops
>> (besides the initial copy+map), which makes me believe that deferring
>> the skb to the RX kthread just adds unnecessary overhead (for this
>> particular case). This patch proposes copying the buffer on
>> xenvif_start_xmit(), which lets us remove both the contention on
>> queue->wq and the lock on rx_queue. Here, an alternative to the
>> xenvif_rx_action routine is added, namely xenvif_rx_map(), which maps
>> and copies the buffer to the guest. This is only used when persistent
>> grants are in use, since it would otherwise mean a hypercall per
>> packet.
>> 
>> Improvements are up to a factor of 2.14 with a single queue, getting us
>> from 1.04 Mpps to 1.7 Mpps (burst 1, pkt_size 64) and from 1.5 to 2.6 Mpps
>> (burst 2, pkt_size 64) compared to using the kthread. The maximum with grant
>> copy is 1.2 Mpps, irrespective of the burst. All of this was measured on
>> an Intel(R) Xeon(R) CPU E5-1650 v2 @ 3.50GHz.
>> 
>> Signed-off-by: Joao Martins <joao.martins@neclab.eu>
>> ---
>> drivers/net/xen-netback/common.h    |  2 ++
>> drivers/net/xen-netback/interface.c | 11 +++++---
>> drivers/net/xen-netback/netback.c   | 52 +++++++++++++++++++++++++++++--------
>> 3 files changed, 51 insertions(+), 14 deletions(-)
>> 
>> diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
>> index 23deb6a..f3ece12 100644
>> --- a/drivers/net/xen-netback/common.h
>> +++ b/drivers/net/xen-netback/common.h
>> @@ -363,6 +363,8 @@ void xenvif_kick_thread(struct xenvif_queue *queue);
>> 
>> int xenvif_dealloc_kthread(void *data);
>> 
>> +int xenvif_rx_map(struct xenvif_queue *queue, struct sk_buff *skb);
>> +
>> void xenvif_rx_queue_tail(struct xenvif_queue *queue, struct sk_buff *skb);
>> 
>> /* Determine whether the needed number of slots (req) are available,
>> diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
>> index 1103568..dfe2b7b 100644
>> --- a/drivers/net/xen-netback/interface.c
>> +++ b/drivers/net/xen-netback/interface.c
>> @@ -109,7 +109,8 @@ static irqreturn_t xenvif_rx_interrupt(int irq, void *dev_id)
>> {
>> 	struct xenvif_queue *queue = dev_id;
>> 
>> -	xenvif_kick_thread(queue);
>> +	if (!queue->vif->persistent_grants)
>> +		xenvif_kick_thread(queue);
>> 
>> 	return IRQ_HANDLED;
>> }
>> @@ -168,8 +169,12 @@ static int xenvif_start_xmit(struct sk_buff *skb, struct net_device *dev)
>> 	cb = XENVIF_RX_CB(skb);
>> 	cb->expires = jiffies + vif->drain_timeout;
>> 
>> -	xenvif_rx_queue_tail(queue, skb);
>> -	xenvif_kick_thread(queue);
>> +	if (!queue->vif->persistent_grants) {
>> +		xenvif_rx_queue_tail(queue, skb);
>> +		xenvif_kick_thread(queue);
>> +	} else if (xenvif_rx_map(queue, skb)) {
>> +		return NETDEV_TX_BUSY;
>> +	}
>> 
> 
> We now have two different functions for guest RX, one is xenvif_rx_map,
> the other is xenvif_rx_action. They look very similar. Can we only have
> one?
I think I can merge this into xenvif_rx_action, and I notice that the stall
detection is missing; I will also add that.
Perhaps I could also disable the RX kthread, since it doesn't get used with
persistent grants?

>> 	return NETDEV_TX_OK;
>> 
>> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
>> index c4f57d7..228df92 100644
>> --- a/drivers/net/xen-netback/netback.c
>> +++ b/drivers/net/xen-netback/netback.c
>> @@ -883,9 +883,48 @@ static bool xenvif_rx_submit(struct xenvif_queue *queue,
>> 	return !!ret;
>> }
>> 
>> +int xenvif_rx_map(struct xenvif_queue *queue, struct sk_buff *skb)
>> +{
>> +	int ret = -EBUSY;
>> +	struct netrx_pending_operations npo = {
>> +		.copy  = queue->grant_copy_op,
>> +		.meta  = queue->meta
>> +	};
>> +
>> +	if (!xenvif_rx_ring_slots_available(queue, XEN_NETBK_LEGACY_SLOTS_MAX))
> 
> I think you meant XEN_NETBK_RX_SLOTS_MAX?
Yes.

>> +		goto done;
>> +
>> +	xenvif_rx_build_gops(queue, &npo, skb);
>> +
>> +	BUG_ON(npo.meta_prod > ARRAY_SIZE(queue->meta));
>> +	if (!npo.copy_done && !npo.copy_prod)
>> +		goto done;
>> +
>> +	BUG_ON(npo.map_prod > MAX_GRANT_COPY_OPS);
>> +	if (npo.map_prod) {
>> +		ret = gnttab_map_refs(queue->rx_map_ops,
>> +				      NULL,
>> +				      queue->rx_pages_to_map,
>> +				      npo.map_prod);
>> +		BUG_ON(ret);
>> +	}
>> +
>> +	BUG_ON(npo.copy_prod > MAX_GRANT_COPY_OPS);
>> +	if (npo.copy_prod)
>> +		gnttab_batch_copy(npo.copy, npo.copy_prod);
>> +
>> +	if (xenvif_rx_submit(queue, &npo, skb))
>> +		notify_remote_via_irq(queue->rx_irq);
>> +
>> +	ret = 0; /* clear error */
> 
> No need to have that comment.
> 
>> +done:
>> +	if (xenvif_queue_stopped(queue))
>> +		xenvif_wake_queue(queue);
>> +	return ret;
>> +}
>> +
>> static void xenvif_rx_action(struct xenvif_queue *queue)
>> {
>> -	int ret;
>> 	struct sk_buff *skb;
>> 	struct sk_buff_head rxq;
>> 	bool need_to_notify = false;
>> @@ -905,22 +944,13 @@ static void xenvif_rx_action(struct xenvif_queue *queue)
>> 	}
>> 
>> 	BUG_ON(npo.meta_prod > XEN_NETIF_RX_RING_SIZE);
>> -	if (!npo.copy_done && !npo.copy_prod)
>> +	if (!npo.copy_prod)
> 
> You modified this line back and forth. You could just avoid modifying it
> in your previous patch to implement RX persistent grants.
> 
>> 		return;
>> 
>> 	BUG_ON(npo.copy_prod > MAX_GRANT_COPY_OPS);
>> 	if (npo.copy_prod)
>> 		gnttab_batch_copy(npo.copy, npo.copy_prod);
>> 
>> -	BUG_ON(npo.map_prod > MAX_GRANT_COPY_OPS);
>> -	if (npo.map_prod) {
>> -		ret = gnttab_map_refs(queue->rx_map_ops,
>> -				      NULL,
>> -				      queue->rx_pages_to_map,
>> -				      npo.map_prod);
>> -		BUG_ON(ret);
>> -	}
>> -
> 
> And this? You delete the hunk you added in previous patch.
Having only one routine, instead of adding xenvif_rx_map as you previously
suggested, avoids moving these hunks around.

> 
>> 	while ((skb = __skb_dequeue(&rxq)) != NULL)
>> 		need_to_notify |= xenvif_rx_submit(queue, &npo, skb);
>> 
>> -- 
>> 2.1.3

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [RFC PATCH 00/13] Persistent grant maps for xen net drivers
  2015-05-19 15:39 ` Wei Liu
  2015-05-22 10:27   ` Joao Martins
@ 2015-05-22 10:27   ` Joao Martins
  2015-05-29  6:53     ` Yuzhou (C)
  2015-05-29  6:53     ` [Xen-devel] " Yuzhou (C)
  1 sibling, 2 replies; 98+ messages in thread
From: Joao Martins @ 2015-05-22 10:27 UTC (permalink / raw)
  To: Wei Liu
  Cc: xen-devel, netdev, ian.campbell, david.vrabel, boris.ostrovsky,
	konrad.wilk


On 19 May 2015, at 17:39, Wei Liu <wei.liu2@citrix.com> wrote:

> On Tue, May 12, 2015 at 07:18:24PM +0200, Joao Martins wrote:
> 
>> There have recently been some discussions and issues raised[3] on
>> persistent grants for the block layer, though the numbers above
>> show some significant improvements, especially on more network-intensive
>> workloads, and provide a margin for comparison against future map/unmap
>> improvements.
>> 
>> Any comments or suggestions are welcome,
>> Thanks!
> 
> Thanks, the numbers certainly look interesting.
> 
> I'm just a bit concerned about the complexity of netback. I've
> commented on individual patches, we can discuss the issues there.

Thanks a lot for the review! It does add more complexity, mainly for
the TX path, but I would also like to mention that a portion of this
changeset is the persistent grant ops, which could potentially live
outside.

Joao

>> [1] http://article.gmane.org/gmane.linux.network/249383
>> [2] http://bit.ly/1IhJfXD
>> [3] http://lists.xen.org/archives/html/xen-devel/2015-02/msg02292.html
>> 
>> Joao Martins (13):
>>  xen-netback: add persistent grant tree ops
>>  xen-netback: xenbus feature persistent support
>>  xen-netback: implement TX persistent grants
>>  xen-netback: implement RX persistent grants
>>  xen-netback: refactor xenvif_rx_action
>>  xen-netback: copy buffer on xenvif_start_xmit()
>>  xen-netback: add persistent tree counters to debugfs
>>  xen-netback: clone skb if skb->xmit_more is set
>>  xen-netfront: move grant_{ref,page} to struct grant
>>  xen-netfront: refactor claim/release grant
>>  xen-netfront: feature-persistent xenbus support
>>  xen-netfront: implement TX persistent grants
>>  xen-netfront: implement RX persistent grants
>> 
>> drivers/net/xen-netback/common.h    |  79 ++++
>> drivers/net/xen-netback/interface.c |  78 +++-
>> drivers/net/xen-netback/netback.c   | 873 ++++++++++++++++++++++++++++++------
>> drivers/net/xen-netback/xenbus.c    |  24 +
>> drivers/net/xen-netfront.c          | 362 ++++++++++++---
>> 5 files changed, 1216 insertions(+), 200 deletions(-)
>> 
>> -- 
>> 2.1.3

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [RFC PATCH 08/13] xen-netback: clone skb if skb->xmit_more is set
  2015-05-19 15:36   ` Wei Liu
  2015-05-22 17:14     ` Joao Martins
@ 2015-05-22 17:14     ` Joao Martins
  1 sibling, 0 replies; 98+ messages in thread
From: Joao Martins @ 2015-05-22 17:14 UTC (permalink / raw)
  To: Wei Liu
  Cc: xen-devel, netdev, ian.campbell, david.vrabel, boris.ostrovsky,
	konrad.wilk


On 19 May 2015, at 17:36, Wei Liu <wei.liu2@citrix.com> wrote:

> On Tue, May 12, 2015 at 07:18:32PM +0200, Joao Martins wrote:
>> On xenvif_start_xmit() we have an additional queue to the netback RX
>> kthread that sends the packet. When using burst>1, pktgen sets
>> skb->xmit_more to tell the driver that there are more skbs in the queue.
>> However, pktgen transmits the same skb <burst> times, which leads to
>> the BUG below. Long story short, adding the same skb to the rx_queue
>> twice leads to a crash. Specifically, with pktgen running with burst=2,
>> what happens is: when we queue the second skb (which is the same as
>> the first queued skb), the tail element of the list will have skb->prev
>> pointing to the skb itself. On skb_unlink (i.e. when dequeueing the skb)
>> skb->prev will become NULL, but list->next still points to the
>> unlinked skb. Because of this, skb_peek will still return an skb, which
>> will redo the skb_unlink, trying to set (skb->prev)->next where skb->prev
>> is now NULL, thus leading to the crash (trace below).
>> 
> 
> From your description this doesn't sound Xen specific. Sounds like
> pktgen breaks in any driver that has an internal queue, which is plenty.
Yes, it’s only on drivers with an internal queue.

>> I'm not sure what the best way to fix this is, but since it only happens
>> when using pktgen with burst>1, I chose to do an skb_clone when we don't
>> use persistent grants, the skb->xmit_more flag is set, and
>> CONFIG_NET_PKTGEN is compiled builtin.
>> 
> 
> I don't think we should do this.
Ok, I will drop this one then.
Part of the reason I submitted it was to make you aware of the crash.

Joao

^ permalink raw reply	[flat|nested] 98+ messages in thread

* RE: [Xen-devel] [RFC PATCH 00/13] Persistent grant maps for xen net drivers
  2015-05-22 10:27   ` Joao Martins
  2015-05-29  6:53     ` Yuzhou (C)
@ 2015-05-29  6:53     ` Yuzhou (C)
  2015-05-29 14:51       ` Joao Martins
  2015-05-29 14:51       ` [Xen-devel] " Joao Martins
  1 sibling, 2 replies; 98+ messages in thread
From: Yuzhou (C) @ 2015-05-29  6:53 UTC (permalink / raw)
  To: Joao Martins, Wei Liu
  Cc: ian.campbell, netdev, david.vrabel, xen-devel, boris.ostrovsky,
	Luohao (brian), Zhangleiqiang (Trump), Zhuangyuxin, Xiaoding (B)

Hi,

	About rx zerocopy, I have a question:

	If some application creates a socket, then listens and accepts, and the client sends packets to it but doesn't recv() from this socket right away, all persistent grant pages would be in use.
So other applications could not receive any packets.  Is my guess right or wrong?

YuZhou

-----Original Message-----
From: xen-devel-bounces@lists.xen.org [mailto:xen-devel-bounces@lists.xen.org] On Behalf Of Joao Martins
Sent: Friday, May 22, 2015 6:27 PM
To: Wei Liu
Cc: ian.campbell@citrix.com; netdev@vger.kernel.org; david.vrabel@citrix.com; xen-devel@lists.xenproject.org; boris.ostrovsky@oracle.com
Subject: Re: [Xen-devel] [RFC PATCH 00/13] Persistent grant maps for xen net drivers


On 19 May 2015, at 17:39, Wei Liu <wei.liu2@citrix.com> wrote:

> On Tue, May 12, 2015 at 07:18:24PM +0200, Joao Martins wrote:
> 
>> There have been recently[3] some discussions and issues raised on 
>> persistent grants for the block layer, though the numbers above show 
>> some significant improvements specially on more network intensive 
>> workloads and provide a margin for comparison against future 
>> map/unmap improvements.
>> 
>> Any comments or suggestions are welcome, Thanks!
> 
> Thanks, the numbers certainly look interesting.
> 
> I'm just a bit concerned about the complexity of netback. I've 
> commented on individual patches, we can discuss the issues there.

Thanks a lot for the review! It does add more complexity, mainly for the TX path, but I also would like to mention that a portion of this changeset is also the persistent grants ops that could potentially live outside.

Joao

>> [1] http://article.gmane.org/gmane.linux.network/249383
>> [2] http://bit.ly/1IhJfXD
>> [3] 
>> http://lists.xen.org/archives/html/xen-devel/2015-02/msg02292.html
>> 
>> Joao Martins (13):
>>  xen-netback: add persistent grant tree ops
>>  xen-netback: xenbus feature persistent support
>>  xen-netback: implement TX persistent grants
>>  xen-netback: implement RX persistent grants
>>  xen-netback: refactor xenvif_rx_action
>>  xen-netback: copy buffer on xenvif_start_xmit()
>>  xen-netback: add persistent tree counters to debugfs
>>  xen-netback: clone skb if skb->xmit_more is set
>>  xen-netfront: move grant_{ref,page} to struct grant
>>  xen-netfront: refactor claim/release grant
>>  xen-netfront: feature-persistent xenbus support
>>  xen-netfront: implement TX persistent grants
>>  xen-netfront: implement RX persistent grants
>> 
>> drivers/net/xen-netback/common.h    |  79 ++++
>> drivers/net/xen-netback/interface.c |  78 +++-
>> drivers/net/xen-netback/netback.c   | 873 ++++++++++++++++++++++++++++++------
>> drivers/net/xen-netback/xenbus.c    |  24 +
>> drivers/net/xen-netfront.c          | 362 ++++++++++++---
>> 5 files changed, 1216 insertions(+), 200 deletions(-)
>> 
>> --
>> 2.1.3



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [Xen-devel] [RFC PATCH 00/13] Persistent grant maps for xen net drivers
  2015-05-29  6:53     ` [Xen-devel] " Yuzhou (C)
  2015-05-29 14:51       ` Joao Martins
@ 2015-05-29 14:51       ` Joao Martins
  1 sibling, 0 replies; 98+ messages in thread
From: Joao Martins @ 2015-05-29 14:51 UTC (permalink / raw)
  To: Yuzhou (C)
  Cc: Wei Liu, ian.campbell, netdev, david.vrabel, xen-devel,
	boris.ostrovsky, Luohao (brian), Zhangleiqiang (Trump),
	Zhuangyuxin, Xiaoding (B)


On 29 May 2015, at 08:53, Yuzhou (C) <vitas.yuzhou@huawei.com> wrote:
> Hi,
> 
> 	About rx zerocopy, I have a question:
> 
> 	If some application creates a socket, then listens and accepts, and the client sends packets to it but doesn't recv() from this socket right away, all persistent grant pages would be in use.
> So other applications could not receive any packets.  Is my guess right or wrong?

I believe that doesn’t happen: before the skb gets delivered to the protocol stack,
skb_orphan_frags gets called, which releases the original pages (i.e. the persistent
grants) and memcpys the data to new ones (if the skb is fragmented). This happens
because I previously set the SKBTX_DEV_ZEROCOPY flag, which also invokes a callback
on that event.

Once the callback is invoked, the released pages are added to a pool within
xen-netfront, which it later uses for new requests to the backend. Note that part
of the data is copied beforehand in pskb_pull_tail, which may unref the initial frag
before skb_orphan_frags is called. The callback is still invoked and that page is
added to the pool as well.

Joao

> YuZhou
> 
> -----Original Message-----
> From: xen-devel-bounces@lists.xen.org [mailto:xen-devel-bounces@lists.xen.org] On Behalf Of Joao Martins
> Sent: Friday, May 22, 2015 6:27 PM
> To: Wei Liu
> Cc: ian.campbell@citrix.com; netdev@vger.kernel.org; david.vrabel@citrix.com; xen-devel@lists.xenproject.org;boris.ostrovsky@oracle.com
> Subject: Re: [Xen-devel] [RFC PATCH 00/13] Persistent grant maps for xen net drivers
> 
> 
> On 19 May 2015, at 17:39, Wei Liu <wei.liu2@citrix.com> wrote:
> 
>> On Tue, May 12, 2015 at 07:18:24PM +0200, Joao Martins wrote:
>> 
>>> There have been recently[3] some discussions and issues raised on 
>>> persistent grants for the block layer, though the numbers above show 
>>> some significant improvements specially on more network intensive 
>>> workloads and provide a margin for comparison against future 
>>> map/unmap improvements.
>>> 
>>> Any comments or suggestions are welcome, Thanks!
>> 
>> Thanks, the numbers certainly look interesting.
>> 
>> I'm just a bit concerned about the complexity of netback. I've 
>> commented on individual patches, we can discuss the issues there.
> 
> Thanks a lot for the review! It does add more complexity, mainly for the TX path, but I also would like to mention that a portion of this changeset is also the persistent grants ops that could potentially live outside.
> 
> Joao
> 
>>> [1] http://article.gmane.org/gmane.linux.network/249383
>>> [2] http://bit.ly/1IhJfXD
>>> [3] 
>>> http://lists.xen.org/archives/html/xen-devel/2015-02/msg02292.html
>>> 
>>> Joao Martins (13):
>>> xen-netback: add persistent grant tree ops
>>> xen-netback: xenbus feature persistent support
>>> xen-netback: implement TX persistent grants
>>> xen-netback: implement RX persistent grants
>>> xen-netback: refactor xenvif_rx_action
>>> xen-netback: copy buffer on xenvif_start_xmit()
>>> xen-netback: add persistent tree counters to debugfs
>>> xen-netback: clone skb if skb->xmit_more is set
>>> xen-netfront: move grant_{ref,page} to struct grant
>>> xen-netfront: refactor claim/release grant
>>> xen-netfront: feature-persistent xenbus support
>>> xen-netfront: implement TX persistent grants
>>> xen-netfront: implement RX persistent grants
>>> 
>>> drivers/net/xen-netback/common.h    |  79 ++++
>>> drivers/net/xen-netback/interface.c |  78 +++-
>>> drivers/net/xen-netback/netback.c   | 873 ++++++++++++++++++++++++++++++------
>>> drivers/net/xen-netback/xenbus.c    |  24 +
>>> drivers/net/xen-netfront.c          | 362 ++++++++++++---
>>> 5 files changed, 1216 insertions(+), 200 deletions(-)
>>> 
>>> --
>>> 2.1.3
> 
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

===================================================================
João Martins
Research Scientist, Networked Systems and Data Analytics Group

NEC Laboratories Europe
Kurfuerstenanlage 36
D-69115 Heidelberg

Tel.     +49 (0)6221 4342-208
Fax:     +49 (0)6221 4342-155
e-mail:  joao.martins@neclab.eu
===================================================================
NEC Europe Ltd | Registered Office: Athene, Odyssey Business Park,
West End Road, London, HA4 6QE, GB | Registered in England 2832014

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [RFC PATCH 03/13] xen-netback: implement TX persistent grants
  2015-05-22 10:24     ` Joao Martins
  2015-06-02 14:53       ` Wei Liu
@ 2015-06-02 14:53       ` Wei Liu
  2015-06-03 17:07         ` Joao Martins
  2015-06-03 17:07         ` Joao Martins
  1 sibling, 2 replies; 98+ messages in thread
From: Wei Liu @ 2015-06-02 14:53 UTC (permalink / raw)
  To: Joao Martins
  Cc: Wei Liu, xen-devel, netdev, ian.campbell, david.vrabel,
	boris.ostrovsky, konrad.wilk

On Fri, May 22, 2015 at 10:24:39AM +0000, Joao Martins wrote:
> 
> On 19 May 2015, at 17:23, Wei Liu <wei.liu2@citrix.com> wrote:
> > On Tue, May 12, 2015 at 07:18:27PM +0200, Joao Martins wrote:
> >> Introduces persistent grants for TX path which follows similar code path
> >> as the grant mapping.
> >> 
> >> It starts by checking if there's a persistent grant available for header
> >> and frags grefs and if so setting it in tx_pgrants. If no persistent grant
> >> is found in the tree for the header it will resort to grant copy (but
> >> preparing the map ops and add them laster). For the frags it will use the
> >                                     ^
> >                                     later
> > 
> >> tree page pool, and in case of no pages it fallbacks to grant map/unmap
> >> using mmap_pages. When skb destructor callback gets called we release the
> >> slot and persistent grant within the callback to avoid waking up the
> >> dealloc thread. As long as there are no unmaps to done the dealloc thread
> >> will remain inactive.
> >> 
> > 
> > This scheme looks complicated. Can we just only use one
> > scheme at a time? What's the rationale for using this combined scheme?
> > Maybe you're thinking about using a max_grants < ring_size to save
> > memory?
> 
> Yes, my purpose was to allow a max_grants < ring_size to save amount of
> memory mapped. I did a bulk transfer test with iperf and the max amount of
> grants in tree was <160 TX gnts, without affecting the max performance;
> though using pktgen fills the tree completely.
> The second reason is to handle the case for a (malicious?) frontend providing
> more grefs than the max allowed in which I would fallback to grant map/unmap.
> 

This is indeed a valid concern. The only method is to expire the oldest
grant when that happens -- but this is just complexity in another place,
not really simplifying anything.

> > 
> > Only skim the patch. I will do detailed reviews after we're sure this is
> > the right way to go.
> > 
[...]
> > 
> > Under what circumstance can we retrieve a already in use persistent
> > grant? You seem to suggest this is a bug in RX case.
> 
> A guest could try to share the same mapped page in multiple frags,
> in which case I fallback to map/unmap. I think this is a limitation in
> the way we manage the persistent gnts where we can only have a single
> reference of a persistent grant inflight.
> 

How much harder would it be to ref-count inflight grants? Would that
simplify or perplex things? I'm just asking, not suggesting you should
choose ref-counting over current scheme.
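Concretely, ref-counting an inflight grant would amount to something like the sketch below; the struct layout and names are hypothetical, not taken from the patches:

```c
#include <assert.h>

/* Hypothetical per-grant state: instead of a single busy flag (which
 * forces a fallback when a guest reuses the same gref in two frags),
 * keep a count of inflight users of the mapping. */
struct pgrant {
	unsigned int gref;
	unsigned int inflight;	/* 0 == idle, slot reclaimable */
};

static void pgrant_get(struct pgrant *g)
{
	g->inflight++;		/* a second frag with the same gref is fine */
}

/* Returns 1 when the last user is gone and the slot can be reused. */
static int pgrant_put(struct pgrant *g)
{
	return --g->inflight == 0;
}
```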

In principle I favour simple code path over optimisation for every
possible corner case.

Wei.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [RFC PATCH 04/13] xen-netback: implement RX persistent grants
  2015-05-22 10:25     ` Joao Martins
  2015-06-02 15:07       ` Wei Liu
@ 2015-06-02 15:07       ` Wei Liu
  2015-06-03 17:08         ` Joao Martins
  2015-06-03 17:08         ` Joao Martins
  1 sibling, 2 replies; 98+ messages in thread
From: Wei Liu @ 2015-06-02 15:07 UTC (permalink / raw)
  To: Joao Martins
  Cc: Wei Liu, xen-devel, netdev, ian.campbell, david.vrabel,
	boris.ostrovsky, konrad.wilk

On Fri, May 22, 2015 at 10:25:10AM +0000, Joao Martins wrote:
> 
> On 19 May 2015, at 17:32, Wei Liu <wei.liu2@citrix.com> wrote:
> 
> > On Tue, May 12, 2015 at 07:18:28PM +0200, Joao Martins wrote:
> >> It starts by doing a lookup in the tree for a gref. If no persistent
> >> grant is found on the tree, it will do grant copy and prepare
> >> the grant maps. Finally valides the grant map and adds it to the tree.
> > 
> > validates?
> > 
> >> After mapped these grants can be pulled from the tree in the subsequent
> >> requests. If it's out of pages in the tree pool, it will fallback to
> >> grant copy.
> >> 
> > 
> > Again, this looks complicated. Why use combined scheme? I will do
> > detailed reviews after we're sure we need such scheme.
> When we don't have the gref in tree we need to map it and then copying
> afterwards into the newly mapped page (and this only happens once until
> the grant is in tree). My options here were to either do this explicitly,
> after we add the persistent grant in which we would need to save to
> dst/src address and len to copy. The other option is to reuse the grant
> copy (since it's only once until the grant is in the tree) and use memcpy
> in followings requests. Additionally I allow the fallback to grant copy in

Which approach were you using here? I looked at the code but couldn't
quite tell which one you were getting at. I guess the first one?

> case the guest provides more grefs > max_grants.
> 
> Note that this is also the case for TX as well, with regard to grant
> copying the header. I was unsure about which one is the most correct way
> of doing it, but ultimately the latter involved a smaller codepath, and
> that's why I chose it. What do you think?
> 

Shorter is better. Easier to understand.

Wei.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [RFC PATCH 06/13] xen-netback: copy buffer on xenvif_start_xmit()
  2015-05-22 10:26     ` Joao Martins
  2015-06-02 15:10       ` Wei Liu
@ 2015-06-02 15:10       ` Wei Liu
  1 sibling, 0 replies; 98+ messages in thread
From: Wei Liu @ 2015-06-02 15:10 UTC (permalink / raw)
  To: Joao Martins
  Cc: Wei Liu, xen-devel, netdev, ian.campbell, david.vrabel,
	boris.ostrovsky, konrad.wilk

On Fri, May 22, 2015 at 10:26:48AM +0000, Joao Martins wrote:
[...]
> >> 	return IRQ_HANDLED;
> >> }
> >> @@ -168,8 +169,12 @@ static int xenvif_start_xmit(struct sk_buff *skb, struct net_device *dev)
> >> 	cb = XENVIF_RX_CB(skb);
> >> 	cb->expires = jiffies + vif->drain_timeout;
> >> 
> >> -	xenvif_rx_queue_tail(queue, skb);
> >> -	xenvif_kick_thread(queue);
> >> +	if (!queue->vif->persistent_grants) {
> >> +		xenvif_rx_queue_tail(queue, skb);
> >> +		xenvif_kick_thread(queue);
> >> +	} else if (xenvif_rx_map(queue, skb)) {
> >> +		return NETDEV_TX_BUSY;
> >> +	}
> >> 
> > 
> > We now have two different functions for guest RX, one is xenvif_rx_map,
> > the other is xenvif_rx_action. They look very similar. Can we only have
> > one?
> I think I can merge this into xenvif_rx_action, and I notice that the stall
> detection its missing. I will also add that.
> Perhaps I could also disable the RX kthread, since this doesn't get used with
> persistent grants?
> 

Disabling that kthread is fine. But we do need to make sure we can do
the same things in start_xmit as we do in the kthread, i.e. what context
start_xmit runs in and what the restrictions are.

Wei.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [RFC PATCH 03/13] xen-netback: implement TX persistent grants
  2015-06-02 14:53       ` Wei Liu
  2015-06-03 17:07         ` Joao Martins
@ 2015-06-03 17:07         ` Joao Martins
  2015-06-07 12:04           ` Wei Liu
  2015-06-07 12:04           ` Wei Liu
  1 sibling, 2 replies; 98+ messages in thread
From: Joao Martins @ 2015-06-03 17:07 UTC (permalink / raw)
  To: Wei Liu
  Cc: xen-devel, netdev, ian.campbell, david.vrabel, boris.ostrovsky,
	konrad.wilk


On 02 Jun 2015, at 16:53, Wei Liu <wei.liu2@citrix.com> wrote:

> On Fri, May 22, 2015 at 10:24:39AM +0000, Joao Martins wrote:
>> 
>> On 19 May 2015, at 17:23, Wei Liu <wei.liu2@citrix.com> wrote:
>>> On Tue, May 12, 2015 at 07:18:27PM +0200, Joao Martins wrote:
>>>> Introduces persistent grants for TX path which follows similar code path
>>>> as the grant mapping.
>>>> 
>>>> It starts by checking if there's a persistent grant available for header
>>>> and frags grefs and if so setting it in tx_pgrants. If no persistent grant
>>>> is found in the tree for the header it will resort to grant copy (but
>>>> preparing the map ops and add them laster). For the frags it will use the
>>>                                    ^
>>>                                    later
>>> 
>>>> tree page pool, and in case of no pages it fallbacks to grant map/unmap
>>>> using mmap_pages. When skb destructor callback gets called we release the
>>>> slot and persistent grant within the callback to avoid waking up the
>>>> dealloc thread. As long as there are no unmaps to done the dealloc thread
>>>> will remain inactive.
>>>> 
>>> 
>>> This scheme looks complicated. Can we just only use one
>>> scheme at a time? What's the rationale for using this combined scheme?
>>> Maybe you're thinking about using a max_grants < ring_size to save
>>> memory?
>> 
>> Yes, my purpose was to allow a max_grants < ring_size to save amount of
>> memory mapped. I did a bulk transfer test with iperf and the max amount of
>> grants in tree was <160 TX gnts, without affecting the max performance;
>> though using pktgen fills the tree completely.
>> The second reason is to handle the case for a (malicious?) frontend providing
>> more grefs than the max allowed in which I would fallback to grant map/unmap.
>> 
> 
> This is indeed a valid concern. The only method is to expires oldest
> grant when that happens -- but this is just complexity in another place,
> not really simplifying anything.
> 
>>> 
>>> Only skim the patch. I will do detailed reviews after we're sure this is
>>> the right way to go.
>>> 
> [...]
>>> 
>>> Under what circumstance can we retrieve a already in use persistent
>>> grant? You seem to suggest this is a bug in RX case.
>> 
>> A guest could try to share the same mapped page in multiple frags,
>> in which case I fallback to map/unmap. I think this is a limitation in
>> the way we manage the persistent gnts where we can only have a single
>> reference of a persistent grant inflight.
>> 
> 
> How much harder would it be to ref-count inflight grants? Would that
> simplify or perplex things? I'm just asking, not suggesting you should
> choose ref-counting over current scheme.
> 
> In principle I favour simple code path over optimisation for every
> possible corner case.

ref-counting the persistent grants would mean eliminating the check for
EBUSY in xenvif_pgrant_new, though it isn’t that much of a simplification.

What would simplify things a lot is to grant map when we don’t get a
persistent_gnt in xenvif_pgrant_new() and add it to the tree there, instead
of doing it in xenvif_tx_check_gop. Since this happens only once per
persistent grant (and up to ring size times), I believe it wouldn't hurt
performance.

This way we would remove a lot of the checks in xenvif_tx_check_gop and
hopefully leave those parts (almost) intact, mainly to be used for the
grant map/unmap case. The reason I didn’t do it is that I wanted to reuse
the grant map code and thought that preference was given to batching the
grant maps. But it looks like it definitely makes things more complicated
and adds more corner cases.

The same goes for the RX case, where this change would remove a lot of the
code for adding the grant maps (thus sharing a lot with the TX part) besides
removing the mixed initial grant copy + map. What do you think?
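A rough, self-contained sketch of that proposed flow (the "tree" is just a flat array here, map_one stands in for the real grant-map path, and all names are illustrative):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Model of the proposal: do the mapping and the tree insertion in
 * pgrant_new() itself, so the check_gop path no longer has to handle
 * the first-use case. */
#define MAX_GRANTS 8

struct pgrant { unsigned int gref; };

struct tree {
	struct pgrant *slots[MAX_GRANTS];
	size_t n;
};

static struct pgrant *tree_lookup(struct tree *t, unsigned int gref)
{
	for (size_t i = 0; i < t->n; i++)
		if (t->slots[i]->gref == gref)
			return t->slots[i];
	return NULL;
}

/* Stand-in for the actual grant map operation. */
static struct pgrant *map_one(unsigned int gref)
{
	struct pgrant *g = malloc(sizeof(*g));
	if (g)
		g->gref = gref;
	return g;
}

static struct pgrant *pgrant_new(struct tree *t, unsigned int gref)
{
	struct pgrant *g = tree_lookup(t, gref);

	if (g)
		return g;		/* fast path: already mapped */
	if (t->n == MAX_GRANTS)
		return NULL;		/* tree full: fall back to map/unmap */

	g = map_one(gref);		/* slow path, taken once per gref */
	if (g)
		t->slots[t->n++] = g;	/* next request hits the fast path */
	return g;
}
```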

Joao

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [RFC PATCH 04/13] xen-netback: implement RX persistent grants
  2015-06-02 15:07       ` Wei Liu
@ 2015-06-03 17:08         ` Joao Martins
  2015-06-03 17:08         ` Joao Martins
  1 sibling, 0 replies; 98+ messages in thread
From: Joao Martins @ 2015-06-03 17:08 UTC (permalink / raw)
  To: Wei Liu
  Cc: xen-devel, netdev, ian.campbell, david.vrabel, boris.ostrovsky,
	konrad.wilk


On 02 Jun 2015, at 17:07, Wei Liu <wei.liu2@citrix.com> wrote:

> On Fri, May 22, 2015 at 10:25:10AM +0000, Joao Martins wrote:
>> 
>> On 19 May 2015, at 17:32, Wei Liu <wei.liu2@citrix.com> wrote:
>> 
>>> On Tue, May 12, 2015 at 07:18:28PM +0200, Joao Martins wrote:
>>>> It starts by doing a lookup in the tree for a gref. If no persistent
>>>> grant is found on the tree, it will do grant copy and prepare
>>>> the grant maps. Finally valides the grant map and adds it to the tree.
>>> 
>>> validates?
>>> 
>>>> After mapped these grants can be pulled from the tree in the subsequent
>>>> requests. If it's out of pages in the tree pool, it will fallback to
>>>> grant copy.
>>>> 
>>> 
>>> Again, this looks complicated. Why use combined scheme? I will do
>>> detailed reviews after we're sure we need such scheme.
>> When we don't have the gref in tree we need to map it and then copying
>> afterwards into the newly mapped page (and this only happens once until
>> the grant is in tree). My options here were to either do this explicitly,
>> after we add the persistent grant in which we would need to save to
>> dst/src address and len to copy. The other option is to reuse the grant
>> copy (since it's only once until the grant is in the tree) and use memcpy
>> in followings requests. Additionally I allow the fallback to grant copy in
> 
> Which approach were you using here? I looked at the code but couldn't
> quite get which one you were getting at. I guess the first one?

The one I used was the second one, i.e. grant copy when the gref is not in
the tree, and memcpy on subsequent requests. The only difference between
these options is really whether to memcpy (first option) or to grant copy
(second option) on the first lookup of the gref in the tree. The problem
with the first option is adding more state to record where to memcpy, which
we can only do after mapping the grant. Thus reusing the grant copy
simplifies things, but probably makes them not as clear.
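In other words, the chosen scheme classifies each RX request roughly like this (a toy model; the flat array and names are illustrative, not the real netback helpers):

```c
#include <assert.h>

/* The first request carrying a given gref is served by grant copy
 * while its map is set up; once the gref is in the tree, later
 * requests are served by plain memcpy. */
enum rx_op { RX_GRANT_COPY, RX_MEMCPY };

#define MAX_GRANTS 8
static unsigned int tree[MAX_GRANTS];
static unsigned int tree_n;

static enum rx_op rx_classify(unsigned int gref)
{
	unsigned int i;

	for (i = 0; i < tree_n; i++)
		if (tree[i] == gref)
			return RX_MEMCPY;	/* already persistently mapped */
	if (tree_n < MAX_GRANTS)
		tree[tree_n++] = gref;		/* mapped now; memcpy next time */
	return RX_GRANT_COPY;			/* first use, or tree full */
}
```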

Perhaps what I suggested in an earlier comment (regarding TX persistent grants) could
simplify things.


>> case the guest provides providing more grefs > max_grants.
>> 
>> Note that this is also the case for TX as well, with regard to grant
>> copying the header. I was unsure about which one is the most correct way
>> of doing it, but ultimately the latter involved a smaller codepath, and
>> that's why I chose it. What do you think?
>> 
> 
> Shorter is better. Easier to understand.

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [RFC PATCH 04/13] xen-netback: implement RX persistent grants
  2015-06-02 15:07       ` Wei Liu
  2015-06-03 17:08         ` Joao Martins
@ 2015-06-03 17:08         ` Joao Martins
  1 sibling, 0 replies; 98+ messages in thread
From: Joao Martins @ 2015-06-03 17:08 UTC (permalink / raw)
  To: Wei Liu; +Cc: ian.campbell, netdev, david.vrabel, xen-devel, boris.ostrovsky


On 02 Jun 2015, at 17:07, Wei Liu <wei.liu2@citrix.com> wrote:

> On Fri, May 22, 2015 at 10:25:10AM +0000, Joao Martins wrote:
>> 
>> On 19 May 2015, at 17:32, Wei Liu <wei.liu2@citrix.com> wrote:
>> 
>>> On Tue, May 12, 2015 at 07:18:28PM +0200, Joao Martins wrote:
>>>> It starts by doing a lookup in the tree for a gref. If no persistent
>>>> grant is found on the tree, it will do grant copy and prepare
>>>> the grant maps. Finally valides the grant map and adds it to the tree.
>>> 
>>> validates?
>>> 
>>>> After mapped these grants can be pulled from the tree in the subsequent
>>>> requests. If it's out of pages in the tree pool, it will fallback to
>>>> grant copy.
>>>> 
>>> 
>>> Again, this looks complicated. Why use combined scheme? I will do
>>> detailed reviews after we're sure we need such scheme.
>> When we don't have the gref in the tree we need to map it and then copy
>> afterwards into the newly mapped page (and this only happens once, until
>> the grant is in the tree). My options here were to either do this explicitly
>> after we add the persistent grant, in which case we would need to save the
>> dst/src addresses and len to copy. The other option is to reuse the grant
>> copy (since it happens only once, until the grant is in the tree) and use
>> memcpy in subsequent requests. Additionally I allow the fallback to grant copy in
> 
> Which approach were you using here? I looked at the code but couldn't
> quite get which one you were getting at. I guess the first one?

The one I used was the second one, i.e. grant copy when the gref is not in the
tree, and memcpy on subsequent requests. The only real difference between the
two options is whether to memcpy (first option) or to grant copy (second option)
on the first lookup of the gref in the tree. The problem with the first option
is that it adds more state to record where to memcpy, which we can only do after
mapping the grant. Thus reusing the grant copy simplifies things, but probably
makes them less clear.

Perhaps what I suggested in an earlier comment (regarding TX persistent grants) could
simplify things.


>> case the guest provides more grefs than max_grants.
>> 
>> Note that this is the case for TX as well, with regard to grant copying
>> the header. I was unsure which one is the more correct way of doing it,
>> but ultimately the latter involved a smaller codepath, and that's why I
>> chose it. What do you think?
>> 
> 
> Shorter is better. Easier to understand.
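For reference, the combined scheme discussed above — tree lookup, a one-off grant copy on first use of a gref (when the persistent mapping is set up), a plain memcpy on subsequent hits, and a grant-copy fallback once the pool is exhausted — can be sketched in plain C. Everything here is illustrative: the names, the fixed-size pool standing in for the red-black tree, and the memcpy standing in for the real grant-copy hypercall are not the actual xen-netback code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PGRANT_MAX 4          /* stand-in for max_grants */
#define PAGE_LEN   16

struct pgrant {
    uint32_t gref;            /* guest grant reference */
    char page[PAGE_LEN];      /* stand-in for the persistently mapped page */
};

struct pgrant_tree {
    struct pgrant slot[PGRANT_MAX];   /* real code uses a red-black tree */
    int count;
};

/* Lookup: return the persistent grant for gref, or NULL on a miss. */
static struct pgrant *pgrant_find(struct pgrant_tree *t, uint32_t gref)
{
    for (int i = 0; i < t->count; i++)
        if (t->slot[i].gref == gref)
            return &t->slot[i];
    return NULL;
}

/* First use of a gref: map it persistently and insert it into the tree.
 * Returns NULL when the pool is full, in which case the caller stays on
 * the plain grant-copy path. */
static struct pgrant *pgrant_insert(struct pgrant_tree *t, uint32_t gref,
                                    const char *src, size_t len)
{
    if (t->count >= PGRANT_MAX)
        return NULL;                   /* out of pages in the tree pool */
    struct pgrant *pg = &t->slot[t->count++];
    pg->gref = gref;
    memcpy(pg->page, src, len < PAGE_LEN ? len : PAGE_LEN);
    return pg;
}

/* One RX request: tree hit -> memcpy from the persistent page;
 * tree miss -> grant copy, and insert into the tree if the pool allows.
 * Returns 1 on a persistent-grant hit, 0 on the grant-copy path. */
int pgrant_rx(struct pgrant_tree *t, uint32_t gref,
              const char *src, char *dst, size_t len)
{
    struct pgrant *pg = pgrant_find(t, gref);
    if (pg) {
        memcpy(dst, pg->page, len);    /* subsequent requests: memcpy only */
        return 1;
    }
    pgrant_insert(t, gref, src, len);  /* may fail when pool is full */
    memcpy(dst, src, len);             /* first use / fallback: grant copy */
    return 0;
}
```

The point of the sketch is the single decision site: whether a request costs a grant copy or just a memcpy depends only on whether the gref is already in the tree, which is why reusing the grant-copy path for the first use keeps the code short.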

^ permalink raw reply	[flat|nested] 98+ messages in thread

* Re: [RFC PATCH 03/13] xen-netback: implement TX persistent grants
  2015-06-03 17:07         ` Joao Martins
  2015-06-07 12:04           ` Wei Liu
@ 2015-06-07 12:04           ` Wei Liu
  1 sibling, 0 replies; 98+ messages in thread
From: Wei Liu @ 2015-06-07 12:04 UTC (permalink / raw)
  To: Joao Martins
  Cc: Wei Liu, xen-devel, netdev, ian.campbell, david.vrabel,
	boris.ostrovsky, konrad.wilk

On Wed, Jun 03, 2015 at 05:07:59PM +0000, Joao Martins wrote:
[...]
> > 
> > How much harder would it be to ref-count inflight grants? Would that
> > simplify or perplex things? I'm just asking, not suggesting you should
> > choose ref-counting over current scheme.
> > 
> > In principle I favour simple code path over optimisation for every
> > possible corner case.
> 
> ref-counting the persistent grants would mean eliminating the check for
> EBUSY on xenvif_pgrant_new, though it isn’t that much of a simplification.
> 

Right.

> What would simplify things a lot is if I grant map when we don’t get a persistent_gnt
> in xenvif_pgrant_new() and add it to the tree there, instead of doing it in
> xenvif_tx_check_gop. Since this happens only once per persistent grant (and up to
> ring size times), I believe it wouldn't hurt performance.
> 

Yeah. Mapping page inside xenvif_tx_check_gop doesn't sound nice.

> This way we would remove a lot of the checks in xenvif_tx_check_gop and
> hopefully leave those parts (almost) intact, mainly to be used for the
> grant map/unmap case. The reason I didn’t do it is that I wanted to reuse
> the grant map code and thought that preference was given to batching the
> grant maps. But it looks like that definitely makes things more complicated
> and adds more corner cases.
> 
> The same goes for the RX case, where this change would remove a lot of the
> code for adding the grant maps (thus sharing a lot with the TX part),
> besides removing the mixed initial grant copy + map. What do you think?
> 

I can't really comment until I see the code. But in principle I think
this is a step in the right direction.

Wei.

> Joao
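
The refactor proposed above — doing the one-off grant map and tree insertion inside xenvif_pgrant_new() itself, so that xenvif_tx_check_gop carries no persistent-grant special cases — can be sketched as follows. All names here are illustrative stand-ins, not the actual xen-netback functions, and the fixed-size array stands in for the grant tree.

```c
#include <assert.h>
#include <stdint.h>

#define POOL_MAX 4            /* stand-in for the ring-sized grant pool */

struct tree {
    uint32_t gref[POOL_MAX];
    int count;
};

static int tree_find(struct tree *t, uint32_t gref)
{
    for (int i = 0; i < t->count; i++)
        if (t->gref[i] == gref)
            return 1;
    return 0;
}

/* Stand-in for the hypercall-backed grant map; always succeeds here. */
static int grant_map(uint32_t gref) { (void)gref; return 0; }

/* Sketch of pgrant_new doing everything up front: returns 1 if the
 * request is served from a persistent grant (already in the tree, or
 * mapped and inserted right now), 0 if the caller must fall back to the
 * plain map/unmap path.  check_gop then only validates that batch and
 * needs no persistent-grant branches. */
int pgrant_new(struct tree *t, uint32_t gref)
{
    if (tree_find(t, gref))
        return 1;                      /* fast path: already persistent */
    if (t->count >= POOL_MAX)
        return 0;                      /* pool exhausted: map/unmap path */
    if (grant_map(gref))
        return 0;                      /* map failed: fall back */
    t->gref[t->count++] = gref;        /* insert here, not in check_gop */
    return 1;
}
```

The trade-off discussed in the thread is visible in the sketch: the map happens synchronously per gref instead of being batched, which costs something on the very first use of each grant but leaves the completion path free of persistent-grant corner cases.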

^ permalink raw reply	[flat|nested] 98+ messages in thread


end of thread, other threads:[~2015-06-07 12:04 UTC | newest]

Thread overview: 98+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-05-12 17:18 [RFC PATCH 00/13] Persistent grant maps for xen net drivers Joao Martins
2015-05-12 17:18 ` Joao Martins
2015-05-12 17:18 ` [RFC PATCH 01/13] xen-netback: add persistent grant tree ops Joao Martins
2015-05-12 17:18   ` Joao Martins
2015-05-12 17:18 ` [RFC PATCH 02/13] xen-netback: xenbus feature persistent support Joao Martins
2015-05-12 17:18   ` Joao Martins
2015-05-19 15:19   ` Wei Liu
2015-05-22 10:24     ` Joao Martins
2015-05-22 10:24     ` Joao Martins
2015-05-19 15:19   ` Wei Liu
2015-05-12 17:18 ` [RFC PATCH 03/13] xen-netback: implement TX persistent grants Joao Martins
2015-05-12 17:18   ` Joao Martins
2015-05-19 15:23   ` Wei Liu
2015-05-22 10:24     ` Joao Martins
2015-05-22 10:24     ` Joao Martins
2015-06-02 14:53       ` Wei Liu
2015-06-02 14:53       ` Wei Liu
2015-06-03 17:07         ` Joao Martins
2015-06-03 17:07         ` Joao Martins
2015-06-07 12:04           ` Wei Liu
2015-06-07 12:04           ` Wei Liu
2015-05-19 15:23   ` Wei Liu
2015-05-12 17:18 ` [RFC PATCH 04/13] xen-netback: implement RX " Joao Martins
2015-05-12 17:18   ` Joao Martins
2015-05-19 15:32   ` Wei Liu
2015-05-22 10:25     ` Joao Martins
2015-05-22 10:25     ` Joao Martins
2015-06-02 15:07       ` Wei Liu
2015-06-02 15:07       ` Wei Liu
2015-06-03 17:08         ` Joao Martins
2015-06-03 17:08         ` Joao Martins
2015-05-19 15:32   ` Wei Liu
2015-05-12 17:18 ` [RFC PATCH 05/13] xen-netback: refactor xenvif_rx_action Joao Martins
2015-05-12 17:18   ` Joao Martins
2015-05-19 15:32   ` Wei Liu
2015-05-19 15:32   ` Wei Liu
2015-05-12 17:18 ` [RFC PATCH 06/13] xen-netback: copy buffer on xenvif_start_xmit() Joao Martins
2015-05-12 17:18   ` Joao Martins
2015-05-19 15:35   ` Wei Liu
2015-05-19 15:35   ` Wei Liu
2015-05-22 10:26     ` Joao Martins
2015-05-22 10:26     ` Joao Martins
2015-06-02 15:10       ` Wei Liu
2015-06-02 15:10       ` Wei Liu
2015-05-12 17:18 ` [RFC PATCH 07/13] xen-netback: add persistent tree counters to debugfs Joao Martins
2015-05-12 17:18   ` Joao Martins
2015-05-19 15:36   ` Wei Liu
2015-05-19 15:36   ` Wei Liu
2015-05-12 17:18 ` [RFC PATCH 08/13] xen-netback: clone skb if skb->xmit_more is set Joao Martins
2015-05-12 17:18   ` Joao Martins
2015-05-19 15:36   ` Wei Liu
2015-05-19 15:36   ` Wei Liu
2015-05-22 17:14     ` Joao Martins
2015-05-22 17:14     ` Joao Martins
2015-05-12 17:18 ` [RFC PATCH 09/13] xen-netfront: move grant_{ref, page} to struct grant Joao Martins
2015-05-12 17:18   ` Joao Martins
2015-05-18 15:44   ` David Vrabel
2015-05-18 15:44   ` [Xen-devel] " David Vrabel
2015-05-19 10:19     ` Joao Martins
2015-05-19 10:19     ` Joao Martins
2015-05-12 17:18 ` [RFC PATCH 10/13] xen-netfront: refactor claim/release grant Joao Martins
2015-05-12 17:18   ` Joao Martins
2015-05-18 15:48   ` [Xen-devel] " David Vrabel
2015-05-19 10:19     ` Joao Martins
2015-05-19 10:19     ` Joao Martins
2015-05-18 15:48   ` David Vrabel
2015-05-12 17:18 ` [RFC PATCH 11/13] xen-netfront: feature-persistent xenbus support Joao Martins
2015-05-12 17:18   ` Joao Martins
2015-05-18 15:51   ` David Vrabel
2015-05-18 15:51   ` [Xen-devel] " David Vrabel
2015-05-19 10:19     ` Joao Martins
2015-05-19 10:19     ` Joao Martins
2015-05-12 17:18 ` [RFC PATCH 12/13] xen-netfront: implement TX persistent grants Joao Martins
2015-05-12 17:18   ` Joao Martins
2015-05-18 15:55   ` [Xen-devel] " David Vrabel
2015-05-19 10:20     ` Joao Martins
2015-05-19 10:23       ` David Vrabel
2015-05-19 10:23       ` [Xen-devel] " David Vrabel
2015-05-19 10:20     ` Joao Martins
2015-05-18 15:55   ` David Vrabel
2015-05-12 17:18 ` [RFC PATCH 13/13] xen-netfront: implement RX " Joao Martins
2015-05-12 17:18   ` Joao Martins
2015-05-18 16:04   ` [Xen-devel] " David Vrabel
2015-05-19 10:22     ` Joao Martins
2015-05-19 10:22     ` [Xen-devel] " Joao Martins
2015-05-18 16:04   ` David Vrabel
2015-05-13 10:50 ` [Xen-devel] [RFC PATCH 00/13] Persistent grant maps for xen net drivers David Vrabel
2015-05-13 13:01   ` Joao Martins
2015-05-13 13:01   ` Joao Martins
2015-05-13 10:50 ` David Vrabel
2015-05-19 15:39 ` Wei Liu
2015-05-19 15:39 ` Wei Liu
2015-05-22 10:27   ` Joao Martins
2015-05-22 10:27   ` Joao Martins
2015-05-29  6:53     ` Yuzhou (C)
2015-05-29  6:53     ` [Xen-devel] " Yuzhou (C)
2015-05-29 14:51       ` Joao Martins
2015-05-29 14:51       ` [Xen-devel] " Joao Martins
