From: Joao Martins <joao.m.martins@oracle.com>
To: Wei Liu <wei.liu2@citrix.com>
Cc: "xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>,
	Paul Durrant <paul.durrant@citrix.com>,
	Stefano Stabellini <sstabellini@kernel.org>,
	David Vrabel <david.vrabel@citrix.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>
Subject: Re: [RFC] netif: staging grants for requests
Date: Thu, 5 Jan 2017 20:27:07 +0000	[thread overview]
Message-ID: <586EAC1B.2000905@oracle.com> (raw)
In-Reply-To: <20170104135456.GM13806@citrix.com>

On 01/04/2017 01:54 PM, Wei Liu wrote:
> Hey!
Hey!

> Thanks for writing this detailed document!
Thanks a lot for the review and comments!

> 
> On Wed, Dec 14, 2016 at 06:11:12PM +0000, Joao Martins wrote:
>> Hey,
>>
>> Back in the Xen hackathon '16 networking session a couple of ideas were
>> brought up. One of them was about exploring permanently mapped grants between
>> xen-netback/xen-netfront.
>>
>> I started experimenting and came up with a sort of design document (in pandoc)
>> describing what I would like to propose. This is meant as a seed for discussion
>> and a request for input on whether this is a good direction. Of course, I am
>> willing to try alternatives that we come up with beyond the contents of the
>> spec, or any other suggested changes ;)
>>
>> Any comments or feedback is welcome!
>>
>> Cheers,
>> Joao
>>
>> ---
>> % Staging grants for network I/O requests
>> % Joao Martins <<joao.m.martins@oracle.com>>
>> % Revision 1
>>
>> \clearpage
>>
>> --------------------------------------------------------------------
>> Status: **Experimental**
>>
>> Architecture(s): x86 and ARM
>>
> 
> Any.
OK.

> 
>> Component(s): Guest
>>
>> Hardware: Intel and AMD
> 
> No need to specify this.
OK.

> 
>> --------------------------------------------------------------------
>>
>> # Background and Motivation
>>
> 
> I skimmed through the middle -- I think your description of transmissions
> in both directions is accurate.
> 
> The proposal to replace some steps with explicit memcpy is also
> sensible.
Glad to hear that!
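
To make that part a bit more concrete, below is a minimal sketch of the backend
TX fast path I have in mind, assuming a staging area whose grants the frontend
shares once at connect time and the backend keeps mapped. All names are
illustrative, not the actual xen-netback code:

```c
/*
 * Illustrative only -- not the real xen-netback code. The staging area
 * is a region the frontend grants once at connect time and the backend
 * keeps mapped, so small payloads can be pulled out with a plain memcpy
 * instead of a GNTTABOP_copy per request.
 */
#include <stdint.h>
#include <string.h>
#include <stddef.h>

struct staging_area {
    uint8_t *base;      /* permanently granted + mapped region */
    size_t   slot_size; /* negotiated data-len, e.g. 256 bytes per ring slot */
};

/*
 * Pull a small TX payload for ring slot @idx out of the staging area.
 * Returns 0 on success, -1 if the payload does not fit and the backend
 * must fall back to grant copy/map of the original gref.
 */
static int staging_pull(const struct staging_area *st, unsigned int idx,
                        void *dst, size_t len)
{
    if (len > st->slot_size)
        return -1;

    memcpy(dst, st->base + (size_t)idx * st->slot_size, len);
    return 0;
}
```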

> 
>> \clearpage
>>
>> ## Performance
>>
>> Numbers that give a rough idea of the performance benefits of this extension.
>> These are Guest <-> Dom0 measurements that exercise the communication between
>> backend and frontend, excluding other bottlenecks in the datapath (the software
>> switch).
>>
>> ```
>> # grant copy
>> Guest TX (1vcpu,  64b, UDP in pps):  1 506 170 pps
>> Guest TX (4vcpu,  64b, UDP in pps):  4 988 563 pps
>> Guest TX (1vcpu, 256b, UDP in pps):  1 295 001 pps
>> Guest TX (4vcpu, 256b, UDP in pps):  4 249 211 pps
>>
>> # grant copy + grant map (see next subsection)
>> Guest TX (1vcpu, 260b, UDP in pps):    577 782 pps
>> Guest TX (4vcpu, 260b, UDP in pps):  1 218 273 pps
>>
>> # drop at the guest network stack
>> Guest RX (1vcpu,  64b, UDP in pps):  1 549 630 pps
>> Guest RX (4vcpu,  64b, UDP in pps):  2 870 947 pps
>> ```
>>
>> With this extension:
>> ```
>> # memcpy
>> data-len=256 TX (1vcpu,  64b, UDP in pps):  3 759 012 pps
>> data-len=256 TX (4vcpu,  64b, UDP in pps): 12 416 436 pps
> 
> This basically means we can almost get line rate on a 10Gb link.
> 
> It is already a good result. I'm interested in knowing if there is
> any possibility to approach 40 or 100 Gb/s?
Certainly: with bulk transfer we can already saturate a 40 Gbit/s NIC, sending
out from a guest to an external host. I also got ~80 Gbit/s, but that was between
guests on the same host (some time ago, back in Xen 4.7). 100 Gbit/s is also on
my radar.
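
For reference, the standard Ethernet framing math: a 64-byte frame occupies 84
bytes on the wire (frame + preamble + inter-frame gap), so 10 Gbit/s line rate
is 10^9 * 10 / (84 * 8) ~= 14.88 Mpps; the 12.4 Mpps above is roughly 83% of
that, while 40 Gbit/s would need ~59.5 Mpps and 100 Gbit/s ~148.8 Mpps.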

The problem comes with smaller packets <= MTU (and request/response workloads
with small payloads), and that is where we lack performance. Especially for
workloads with very small packets, Linux has a hard time saturating those NICs
(with XDP now rising to the challenge); I think only DPDK is able to at this
point [*].

[*] Section 7.1,
https://download.01.org/packet-processing/ONPS2.1/Intel_ONP_Release_2.1_Performance_Test_Report_Rev1.0.pdf

> It would be good if we design this extension with higher goals in mind.
Totally agree!

>> data-len=256 TX (1vcpu, 256b, UDP in pps):  3 248 392 pps
>> data-len=256 TX (4vcpu, 256b, UDP in pps): 11 165 355 pps
>>
>> # memcpy + grant map (see next subsection)
>> data-len=256 TX (1vcpu, 260b, UDP in pps):    588 428 pps
>> data-len=256 TX (4vcpu, 260b, UDP in pps):  1 668 044 pps
>>
>> # (drop at the guest network stack)
>> data-len=256 RX (1vcpu,  64b, UDP in pps):  3 285 362 pps
>> data-len=256 RX (4vcpu,  64b, UDP in pps): 11 761 847 pps
>>
>> # (drop with guest XDP_DROP prog)
>> data-len=256 RX (1vcpu,  64b, UDP in pps):  9 466 591 pps
>> data-len=256 RX (4vcpu,  64b, UDP in pps): 33 006 157 pps
>> ```
>>
>> Latency measurements (netperf TCP_RR request size 1 and response size 1):
>> ```
>> 24 KTps vs 28 KTps
>> 39 KTps vs 50 KTps (with kernel busy poll)
>> ```
>>
>> TCP bulk transfer measurements aren't showing a representative increase in
>> maximum throughput (sometimes ~10%), but rather fewer retransmissions and a
>> more stable throughput. This is probably because of a slight decrease in RTT
>> (i.e. the receiver acknowledging data quicker). I am currently exploring other
>> data list sizes and will probably have a better idea of the effects of this.
>>
>> ## Linux grant copy vs map remark
>>
>> Based on the numbers above there's a sudden 2x performance drop when we switch
>> from grant copy to also grant-mapping the `gref`: 1 295 001 vs 577 782 pps for
>> 256- and 260-byte packets respectively. This is all the more visible when the
>> grant copy is replaced with memcpy in this extension (3 248 392 vs 588 428).
>> While there have been discussions about avoiding the TLB flush on unmap, one
>> could wonder what the threshold of that improvement would be. Chances are that
>> this is the least of our concerns on a fully populated host (or an
>> oversubscribed one). Would it be worth experimenting with increasing the copy
>> threshold beyond the header?
>>
> 
> Yes, it would be interesting to see more data points and provide a
> sensible default. But I think this is a secondary goal because a "sensible
> default" can change over time and across environments.
Indeed; I am experimenting with more data points and other workloads to add here.
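
FWIW, the experiment I was alluding to is essentially a tunable split point:
memcpy everything up to a threshold and only grant-map whatever tail is left.
A rough sketch, with a hypothetical helper rather than the real xen-netback
logic:

```c
#include <stddef.h>

/*
 * Hypothetical helper, illustrative only: given a request of @len bytes
 * and a copy threshold (today the header size, possibly more), decide
 * how many bytes get copied and how many get grant-mapped.
 */
static void split_copy_map(size_t len, size_t copy_threshold,
                           size_t *copy_len, size_t *map_len)
{
    *copy_len = len < copy_threshold ? len : copy_threshold;
    *map_len  = len - *copy_len;
}
```

With copy_threshold = 256, for example, a 260-byte packet would be 256 bytes
copied plus 4 bytes mapped, which is exactly the case where the 2x drop shows
up above.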

>> \clearpage
>>
>> # History
>>
>> A table of changes to the document, in chronological order.
>>
>> ------------------------------------------------------------------------
>> Date       Revision Version  Notes
>> ---------- -------- -------- -------------------------------------------
>> 2016-12-14 1        Xen 4.9  Initial version.
>> ---------- -------- -------- -------------------------------------------
