* [RFC PATCHv1 net-next] xen-netback: always fully coalesce guest Rx packets
@ 2015-01-13 14:05 David Vrabel
  2015-01-13 14:30 ` Wei Liu
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: David Vrabel @ 2015-01-13 14:05 UTC (permalink / raw)
  To: netdev; +Cc: David Vrabel, xen-devel, Ian Campbell, Wei Liu

Always fully coalesce guest Rx packets into the minimum number of ring
slots.  Reducing the number of slots per packet has significant
performance benefits (e.g., 7.2 Gbit/s to 11 Gbit/s in an off-host
receive test).
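
To see where the slot saving comes from (an illustrative calculation,
not a measurement from the test above): without coalescing, a frag that
straddles a page boundary can occupy two partially filled ring buffers,
so a large GSO skb can approach two slots per frag.  Fully coalesced,
each slot is filled to MAX_BUFFER_OFFSET before the next one is
started, so e.g. 64 KiB of payload needs only

	DIV_ROUND_UP(65536, PAGE_SIZE) = 16	/* with 4 KiB buffers */

data slots, plus one slot for the GSO extra info.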

However, this does increase the number of grant ops per packet which
decreases performance with some workloads (intrahost VM to VM)
/unless/ grant copy has been optimized for adjacent ops with the same
source or destination (see "grant-table: defer releasing pages
acquired in a grant copy"[1]).
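
As a rough sketch of why adjacency matters here (illustrative C only;
the field layout is the gnttab_copy interface from
include/xen/interface/grant_table.h, and frag0_gfn, frag1_gfn, rx_gref
and guest_domid are placeholder names, not variables in this driver):
with full coalescing, consecutive chunks of an skb are copied into the
same guest buffer until it is full, so neighbouring ops share a
destination grant:

	xen_pfn_t   frag0_gfn, frag1_gfn;  /* placeholder source frames  */
	grant_ref_t rx_gref;               /* placeholder guest Rx grant */
	domid_t     guest_domid;           /* placeholder guest domid    */

	struct gnttab_copy op[2] = {
		{	/* first half of the 4 KiB guest buffer */
			.source = { .u.gmfn = frag0_gfn, .domid = DOMID_SELF },
			.dest   = { .u.ref = rx_gref, .domid = guest_domid,
				    .offset = 0 },
			.len	= 2048,
			.flags	= GNTCOPY_dest_gref,
		},
		{	/* second half: same destination grant as above */
			.source = { .u.gmfn = frag1_gfn, .domid = DOMID_SELF },
			.dest   = { .u.ref = rx_gref, .domid = guest_domid,
				    .offset = 2048 },
			.len	= 2048,
			.flags	= GNTCOPY_dest_gref,
		},
	};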

Do we need to retain the existing path and make the always coalesce
path conditional on a suitable version of Xen?

[1] http://lists.xen.org/archives/html/xen-devel/2015-01/msg01118.html

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
---
 drivers/net/xen-netback/common.h  |    1 -
 drivers/net/xen-netback/netback.c |  106 ++-----------------------------------
 2 files changed, 3 insertions(+), 104 deletions(-)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index 5f1fda4..589fa25 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -251,7 +251,6 @@ struct xenvif {
 struct xenvif_rx_cb {
 	unsigned long expires;
 	int meta_slots_used;
-	bool full_coalesce;
 };
 
 #define XENVIF_RX_CB(skb) ((struct xenvif_rx_cb *)(skb)->cb)
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 908e65e..568238d 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -233,51 +233,6 @@ static void xenvif_rx_queue_drop_expired(struct xenvif_queue *queue)
 	}
 }
 
-/*
- * Returns true if we should start a new receive buffer instead of
- * adding 'size' bytes to a buffer which currently contains 'offset'
- * bytes.
- */
-static bool start_new_rx_buffer(int offset, unsigned long size, int head,
-				bool full_coalesce)
-{
-	/* simple case: we have completely filled the current buffer. */
-	if (offset == MAX_BUFFER_OFFSET)
-		return true;
-
-	/*
-	 * complex case: start a fresh buffer if the current frag
-	 * would overflow the current buffer but only if:
-	 *     (i)   this frag would fit completely in the next buffer
-	 * and (ii)  there is already some data in the current buffer
-	 * and (iii) this is not the head buffer.
-	 * and (iv)  there is no need to fully utilize the buffers
-	 *
-	 * Where:
-	 * - (i) stops us splitting a frag into two copies
-	 *   unless the frag is too large for a single buffer.
-	 * - (ii) stops us from leaving a buffer pointlessly empty.
-	 * - (iii) stops us leaving the first buffer
-	 *   empty. Strictly speaking this is already covered
-	 *   by (ii) but is explicitly checked because
-	 *   netfront relies on the first buffer being
-	 *   non-empty and can crash otherwise.
-	 * - (iv) is needed for skbs which can use up more than MAX_SKB_FRAGS
-	 *   slot
-	 *
-	 * This means we will effectively linearise small
-	 * frags but do not needlessly split large buffers
-	 * into multiple copies tend to give large frags their
-	 * own buffers as before.
-	 */
-	BUG_ON(size > MAX_BUFFER_OFFSET);
-	if ((offset + size > MAX_BUFFER_OFFSET) && offset && !head &&
-	    !full_coalesce)
-		return true;
-
-	return false;
-}
-
 struct netrx_pending_operations {
 	unsigned copy_prod, copy_cons;
 	unsigned meta_prod, meta_cons;
@@ -336,24 +291,13 @@ static void xenvif_gop_frag_copy(struct xenvif_queue *queue, struct sk_buff *skb
 		BUG_ON(offset >= PAGE_SIZE);
 		BUG_ON(npo->copy_off > MAX_BUFFER_OFFSET);
 
-		bytes = PAGE_SIZE - offset;
+		if (npo->copy_off == MAX_BUFFER_OFFSET)
+			meta = get_next_rx_buffer(queue, npo);
 
+		bytes = PAGE_SIZE - offset;
 		if (bytes > size)
 			bytes = size;
 
-		if (start_new_rx_buffer(npo->copy_off,
-					bytes,
-					*head,
-					XENVIF_RX_CB(skb)->full_coalesce)) {
-			/*
-			 * Netfront requires there to be some data in the head
-			 * buffer.
-			 */
-			BUG_ON(*head);
-
-			meta = get_next_rx_buffer(queue, npo);
-		}
-
 		if (npo->copy_off + bytes > MAX_BUFFER_OFFSET)
 			bytes = MAX_BUFFER_OFFSET - npo->copy_off;
 
@@ -652,60 +596,16 @@ static void xenvif_rx_action(struct xenvif_queue *queue)
 
 	while (xenvif_rx_ring_slots_available(queue, XEN_NETBK_RX_SLOTS_MAX)
 	       && (skb = xenvif_rx_dequeue(queue)) != NULL) {
-		RING_IDX max_slots_needed;
 		RING_IDX old_req_cons;
 		RING_IDX ring_slots_used;
 		int i;
 
 		queue->last_rx_time = jiffies;
 
-		/* We need a cheap worse case estimate for the number of
-		 * slots we'll use.
-		 */
-
-		max_slots_needed = DIV_ROUND_UP(offset_in_page(skb->data) +
-						skb_headlen(skb),
-						PAGE_SIZE);
-		for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
-			unsigned int size;
-			unsigned int offset;
-
-			size = skb_frag_size(&skb_shinfo(skb)->frags[i]);
-			offset = skb_shinfo(skb)->frags[i].page_offset;
-
-			/* For a worse-case estimate we need to factor in
-			 * the fragment page offset as this will affect the
-			 * number of times xenvif_gop_frag_copy() will
-			 * call start_new_rx_buffer().
-			 */
-			max_slots_needed += DIV_ROUND_UP(offset + size,
-							 PAGE_SIZE);
-		}
-
-		/* To avoid the estimate becoming too pessimal for some
-		 * frontends that limit posted rx requests, cap the estimate
-		 * at MAX_SKB_FRAGS. In this case netback will fully coalesce
-		 * the skb into the provided slots.
-		 */
-		if (max_slots_needed > MAX_SKB_FRAGS) {
-			max_slots_needed = MAX_SKB_FRAGS;
-			XENVIF_RX_CB(skb)->full_coalesce = true;
-		} else {
-			XENVIF_RX_CB(skb)->full_coalesce = false;
-		}
-
-		/* We may need one more slot for GSO metadata */
-		if (skb_is_gso(skb) &&
-		   (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4 ||
-		    skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6))
-			max_slots_needed++;
-
 		old_req_cons = queue->rx.req_cons;
 		XENVIF_RX_CB(skb)->meta_slots_used = xenvif_gop_skb(skb, &npo, queue);
 		ring_slots_used = queue->rx.req_cons - old_req_cons;
 
-		BUG_ON(ring_slots_used > max_slots_needed);
-
 		__skb_queue_tail(&rxq, skb);
 	}
 
-- 
1.7.10.4
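
For reference, a condensed sketch of the per-frag copy loop that
remains in xenvif_gop_frag_copy() after this patch (simplified; the
actual grant-copy setup and GSO handling are omitted):

	while (size > 0) {
		unsigned long bytes;

		/* Only start a new ring slot when the current one is
		 * completely full, so every slot but the last carries
		 * MAX_BUFFER_OFFSET bytes.
		 */
		if (npo->copy_off == MAX_BUFFER_OFFSET)
			meta = get_next_rx_buffer(queue, npo);

		/* Copy whatever both the source page and the space
		 * left in the current slot allow.
		 */
		bytes = PAGE_SIZE - offset;
		if (bytes > size)
			bytes = size;
		if (npo->copy_off + bytes > MAX_BUFFER_OFFSET)
			bytes = MAX_BUFFER_OFFSET - npo->copy_off;

		/* ...queue a grant copy of 'bytes' into the slot... */

		npo->copy_off += bytes;
		meta->size += bytes;
		offset += bytes;
		size -= bytes;

		/* Advance to the next source page when this one is done. */
		if (offset == PAGE_SIZE && size) {
			offset = 0;
			page++;
		}
	}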


* Re: [RFC PATCHv1 net-next] xen-netback: always fully coalesce guest Rx packets
  2015-01-13 14:05 [RFC PATCHv1 net-next] xen-netback: always fully coalesce guest Rx packets David Vrabel
@ 2015-01-13 14:30 ` Wei Liu
  2015-01-19 17:36   ` David Vrabel
  2015-01-19 17:36   ` [Xen-devel] " David Vrabel
  2015-01-13 14:30 ` Wei Liu
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 13+ messages in thread
From: Wei Liu @ 2015-01-13 14:30 UTC (permalink / raw)
  To: David Vrabel; +Cc: netdev, xen-devel, Ian Campbell, Wei Liu

On Tue, Jan 13, 2015 at 02:05:17PM +0000, David Vrabel wrote:
> Always fully coalesce guest Rx packets into the minimum number of ring
> slots.  Reducing the number of slots per packet has significant
> performance benefits (e.g., 7.2 Gbit/s to 11 Gbit/s in an off-host
> receive test).
> 

Good number.

> However, this does increase the number of grant ops per packet which
> decreases performance with some workloads (intrahost VM to VM)

Do you have figures before and after this change?

> /unless/ grant copy has been optimized for adjacent ops with the same
> source or destination (see "grant-table: defer releasing pages
> acquired in a grant copy"[1]).
> 
> Do we need to retain the existing path and make the always coalesce
> path conditional on a suitable version of Xen?
> 

If the new path improves off-host RX on all Xen versions and doesn't
degrade intrahost VM to VM RX that much, I think we should use it
unconditionally.  Is intrahost VM to VM RX important to XenServer?

I don't consider intrahost VM to VM RX a very important use case, at
least not as important as off-host RX. I would expect that in a cloud
environment users would not count on their VMs residing on the same
host.  Plus, some cloud providers might deliberately route traffic
off-host for various reasons even if the VMs are on the same host.
(Verizon, for one, mentioned they do that during last year's Xen
Summit, IIRC.)

Others might disagree. Let's wait for other people to chime in.

> [1] http://lists.xen.org/archives/html/xen-devel/2015-01/msg01118.html
> 
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> ---
>  drivers/net/xen-netback/common.h  |    1 -
>  drivers/net/xen-netback/netback.c |  106 ++-----------------------------------
>  2 files changed, 3 insertions(+), 104 deletions(-)

Love the diffstat!

Wei.


* Re: [Xen-devel] [RFC PATCHv1 net-next] xen-netback: always fully coalesce guest Rx packets
  2015-01-13 14:30 ` Wei Liu
  2015-01-19 17:36   ` David Vrabel
@ 2015-01-19 17:36   ` David Vrabel
  2015-01-20 11:21     ` Ian Campbell
                       ` (3 more replies)
  1 sibling, 4 replies; 13+ messages in thread
From: David Vrabel @ 2015-01-19 17:36 UTC (permalink / raw)
  To: Wei Liu, David Vrabel; +Cc: netdev, Ian Campbell, xen-devel, Jonathan Davies

On 13/01/15 14:30, Wei Liu wrote:
> On Tue, Jan 13, 2015 at 02:05:17PM +0000, David Vrabel wrote:
>> Always fully coalesce guest Rx packets into the minimum number of ring
>> slots.  Reducing the number of slots per packet has significant
>> performance benefits (e.g., 7.2 Gbit/s to 11 Gbit/s in an off-host
>> receive test).
>>
> 
> Good number.
> 
>> However, this does increase the number of grant ops per packet which
>> decreases performance with some workloads (intrahost VM to VM)
> 
> Do you have figures before and after this change?

Some better (more rigorous) results from Jonathan Davies show no
regressions with full coalescing, even without the grant copy
optimization, and a big improvement to single-stream receive.

                         baseline    Full coalesce
Interhost aggregate      24 Gb/s     24 Gb/s
Interhost VM receive      7.2 Gb/s   11 Gb/s
Intrahost single stream  14 Gb/s     14 Gb/s
Intrahost aggregate      34 Gb/s     34 Gb/s

We do not measure the performance of dom0 to guest traffic but my ad-hoc
measurements suggest this may be 5-10% slower.  I don't think this is a
very important use case though.

So...

>> /unless/ grant copy has been optimized for adjacent ops with the same
>> source or destination (see "grant-table: defer releasing pages
>> acquired in a grant copy"[1]).
>>
>> Do we need to retain the existing path and make the always coalesce
>> path conditional on a suitable version of Xen?

...I think the answer to this is no.

>> ---
>>  drivers/net/xen-netback/common.h  |    1 -
>>  drivers/net/xen-netback/netback.c |  106 ++-----------------------------------
>>  2 files changed, 3 insertions(+), 104 deletions(-)
> 
> Love the diffstat!

Yes, it's always nice when you delete code and it goes faster... :)

David


* Re: [Xen-devel] [RFC PATCHv1 net-next] xen-netback: always fully coalesce guest Rx packets
  2015-01-19 17:36   ` [Xen-devel] " David Vrabel
@ 2015-01-20 11:21     ` Ian Campbell
  2015-01-20 11:21     ` Ian Campbell
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 13+ messages in thread
From: Ian Campbell @ 2015-01-20 11:21 UTC (permalink / raw)
  To: David Vrabel; +Cc: Wei Liu, netdev, xen-devel, Jonathan Davies

On Mon, 2015-01-19 at 17:36 +0000, David Vrabel wrote:
> On 13/01/15 14:30, Wei Liu wrote:
> > On Tue, Jan 13, 2015 at 02:05:17PM +0000, David Vrabel wrote:
> >> Always fully coalesce guest Rx packets into the minimum number of ring
> >> slots.  Reducing the number of slots per packet has significant
> >> performance benefits (e.g., 7.2 Gbit/s to 11 Gbit/s in an off-host
> >> receive test).
> >>
> > 
> > Good number.
> > 
> >> However, this does increase the number of grant ops per packet which
> >> decreases performance with some workloads (intrahost VM to VM)
> > 
> > Do you have figures before and after this change?
> 
> Some better (more rigorous) results from Jonathan Davies show no
> regressions with full coalescing, even without the grant copy
> optimization, and a big improvement to single-stream receive.
> 
>                          baseline    Full coalesce
> Interhost aggregate      24 Gb/s     24 Gb/s
> Interhost VM receive      7.2 Gb/s   11 Gb/s
> Intrahost single stream  14 Gb/s     14 Gb/s
> Intrahost aggregate      34 Gb/s     34 Gb/s
> 
> We do not measure the performance of dom0 to guest traffic but my ad-hoc
> measurements suggest this may be 5-10% slower.  I don't think this is a
> very important use case though.

If you are updating your dom0 kernel to take advantage of this
improvement and you care about dom0->domU performance too, then also
updating your Xen at the same time is not a huge deal, I think. Or at
least I don't consider it a blocker for making progress (certainly not
progress of the order of 50% improvements!).

> So...
> 
> >> /unless/ grant copy has been optimized for adjacent ops with the same
> >> source or destination (see "grant-table: defer releasing pages
> >> acquired in a grant copy"[1]).
> >>
> >> Do we need to retain the existing path and make the always coalesce
> >> path conditional on a suitable version of Xen?
> 
> ...I think the answer to this is no.

Agreed.

> >> ---
> >>  drivers/net/xen-netback/common.h  |    1 -
> >>  drivers/net/xen-netback/netback.c |  106 ++-----------------------------------
> >>  2 files changed, 3 insertions(+), 104 deletions(-)
> > 
> > Love the diffstat!
> 
> Yes, it's always nice when you delete code and it goes faster... :)

Full-Ack to that ;-)

Ian.


* Re: [Xen-devel] [RFC PATCHv1 net-next] xen-netback: always fully coalesce guest Rx packets
  2015-01-19 17:36   ` [Xen-devel] " David Vrabel
                       ` (2 preceding siblings ...)
  2015-01-20 11:23     ` Ian Campbell
@ 2015-01-20 11:23     ` Ian Campbell
  3 siblings, 0 replies; 13+ messages in thread
From: Ian Campbell @ 2015-01-20 11:23 UTC (permalink / raw)
  To: David Vrabel; +Cc: Wei Liu, netdev, xen-devel, Jonathan Davies

On Mon, 2015-01-19 at 17:36 +0000, David Vrabel wrote:
> On 13/01/15 14:30, Wei Liu wrote:
> > On Tue, Jan 13, 2015 at 02:05:17PM +0000, David Vrabel wrote:
> >> Always fully coalesce guest Rx packets into the minimum number of ring
> >> slots.  Reducing the number of slots per packet has significant
> >> performance benefits (e.g., 7.2 Gbit/s to 11 Gbit/s in an off-host
> >> receive test).
> >>
> > 
> > Good number.
> > 
> >> However, this does increase the number of grant ops per packet which
> >> decreases performance with some workloads (intrahost VM to VM)
> > 
> > Do you have figures before and after this change?
> 
> Some better (more rigorous) results from Jonathan Davies show no
> regressions with full coalescing, even without the grant copy
> optimization, and a big improvement to single-stream receive.
> 
>                          baseline    Full coalesce
> Interhost aggregate      24 Gb/s     24 Gb/s
> Interhost VM receive      7.2 Gb/s   11 Gb/s
> Intrahost single stream  14 Gb/s     14 Gb/s
> Intrahost aggregate      34 Gb/s     34 Gb/s

Do you have a third column to hand, for the full-coalesce w/ grant copy
optimisation case? (Don't worry if not.)

Ian.


* Re: [RFC PATCHv1 net-next] xen-netback: always fully coalesce guest Rx packets
  2015-01-13 14:05 [RFC PATCHv1 net-next] xen-netback: always fully coalesce guest Rx packets David Vrabel
                   ` (2 preceding siblings ...)
  2015-01-20 11:34 ` Ian Campbell
@ 2015-01-20 11:34 ` Ian Campbell
  2015-01-20 11:38   ` David Vrabel
  2015-01-20 11:38   ` David Vrabel
  3 siblings, 2 replies; 13+ messages in thread
From: Ian Campbell @ 2015-01-20 11:34 UTC (permalink / raw)
  To: David Vrabel; +Cc: netdev, xen-devel, Wei Liu

On Tue, 2015-01-13 at 14:05 +0000, David Vrabel wrote:
> Always fully coalesce guest Rx packets into the minimum number of ring
> slots.  Reducing the number of slots per packet has significant
> performance benefits (e.g., 7.2 Gbit/s to 11 Gbit/s in an off-host
> receive test).
> 
> However, this does increase the number of grant ops per packet which
> decreases performance with some workloads (intrahost VM to VM)
> /unless/ grant copy has been optimized for adjacent ops with the same
> source or destination (see "grant-table: defer releasing pages
> acquired in a grant copy"[1]).
> 
> Do we need to retain the existing path and make the always coalesce
> path conditional on a suitable version of Xen?
> 
> [1] http://lists.xen.org/archives/html/xen-devel/2015-01/msg01118.html
> 
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>

Looks good to me, thanks. Based on that plus the conversation in the
other subthread:

Acked-by: Ian Campbell <ian.campbell@citrix.com>


* Re: [RFC PATCHv1 net-next] xen-netback: always fully coalesce guest Rx packets
  2015-01-20 11:34 ` Ian Campbell
@ 2015-01-20 11:38   ` David Vrabel
  2015-01-20 11:38   ` David Vrabel
  1 sibling, 0 replies; 13+ messages in thread
From: David Vrabel @ 2015-01-20 11:38 UTC (permalink / raw)
  To: Ian Campbell; +Cc: netdev, xen-devel, Wei Liu

On 20/01/15 11:34, Ian Campbell wrote:
> On Tue, 2015-01-13 at 14:05 +0000, David Vrabel wrote:
>> Always fully coalesce guest Rx packets into the minimum number of ring
>> slots.  Reducing the number of slots per packet has significant
>> performance benefits (e.g., 7.2 Gbit/s to 11 Gbit/s in an off-host
>> receive test).
>>
>> However, this does increase the number of grant ops per packet which
>> decreases performance with some workloads (intrahost VM to VM)
>> /unless/ grant copy has been optimized for adjacent ops with the same
>> source or destination (see "grant-table: defer releasing pages
>> acquired in a grant copy"[1]).
>>
>> Do we need to retain the existing path and make the always coalesce
>> path conditional on a suitable version of Xen?
>>
>> [1] http://lists.xen.org/archives/html/xen-devel/2015-01/msg01118.html
>>
>> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> 
> Looks good to me, thanks. Based on that plus the conversation in the
> other subthread:
> 
> Acked-by: Ian Campbell <ian.campbell@citrix.com>

Thanks.  I'll resend with a better commit message.

David

