All of lore.kernel.org
 help / color / mirror / Atom feed
From: Wei Liu <wei.liu2@citrix.com>
To: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>,
	ian.campbell@citrix.com, xen-devel@lists.xen.org,
	annie li <annie.li@oracle.com>,
	andrew.bennieston@citrix.com
Subject: Re: Interesting observation with network event notification and batching
Date: Mon, 1 Jul 2013 15:39:19 +0100	[thread overview]
Message-ID: <20130701143919.GG7483@zion.uk.xensource.com> (raw)
In-Reply-To: <alpine.DEB.2.02.1307011522460.4525@kaball.uk.xensource.com>

On Mon, Jul 01, 2013 at 03:29:45PM +0100, Stefano Stabellini wrote:
> On Mon, 1 Jul 2013, Wei Liu wrote:
> > On Mon, Jul 01, 2013 at 03:48:38PM +0800, annie li wrote:
> > > 
> > > On 2013-6-29 0:15, Wei Liu wrote:
> > > >Hi all,
> > > >
> > > >After collecting more stats and comparing copying / mapping cases, I now
> > > >have some more interesting finds, which might contradict what I said
> > > >before.
> > > >
> > > >I tuned the runes I used for benchmark to make sure iperf and netperf
> > > >generate large packets (~64K). Here are the runes I use:
> > > >
> > > >   iperf -c 10.80.237.127 -t 5 -l 131072 -w 128k (see note)
> > > >   netperf -H 10.80.237.127 -l10 -f m -- -s 131072 -S 131072
> > > >
> > > >                           COPY                    MAP
> > > >iperf    Tput:             6.5Gb/s             14Gb/s (was 2.5Gb/s)
> > > 
> > > So with default iperf setting, copy is about 7.9G, and map is about
> > > 2.5G? How about the result of netperf without large packets?
> > > 
> > 
> > First question, yes.
> > 
> > Second question, 5.8Gb/s. And I believe for the copying scheme without
> > large packet the throuput is more or less the same.
> > 
> > > >          PPI               2.90                  1.07
> > > >          SPI               37.75                 13.69
> > > >          PPN               2.90                  1.07
> > > >          SPN               37.75                 13.69
> > > >          tx_count           31808                174769
> > > 
> > > Seems interrupt count does not affect the performance at all with -l
> > > 131072 -w 128k.
> > > 
> > 
> > Right.
> > 
> > > >          nr_napi_schedule   31805                174697
> > > >          total_packets      92354                187408
> > > >          total_reqs         1200793              2392614
> > > >
> > > >netperf  Tput:            5.8Gb/s             10.5Gb/s
> > > >          PPI               2.13                   1.00
> > > >          SPI               36.70                  16.73
> > > >          PPN               2.13                   1.31
> > > >          SPN               36.70                  16.75
> > > >          tx_count           57635                205599
> > > >          nr_napi_schedule   57633                205311
> > > >          total_packets      122800               270254
> > > >          total_reqs         2115068              3439751
> > > >
> > > >   PPI: packets processed per interrupt
> > > >   SPI: slots processed per interrupt
> > > >   PPN: packets processed per napi schedule
> > > >   SPN: slots processed per napi schedule
> > > >   tx_count: interrupt count
> > > >   total_reqs: total slots used during test
> > > >
> > > >* Notification and batching
> > > >
> > > >Is notification and batching really a problem? I'm not so sure now. My
> > > >first thought when I didn't measure PPI / PPN / SPI / SPN in copying
> > > >case was that "in that case netback *must* have better batching" which
> > > >turned out not very true -- copying mode makes netback slower, however
> > > >the batching gained is not hugh.
> > > >
> > > >Ideally we still want to batch as much as possible. Possible way
> > > >includes playing with the 'weight' parameter in NAPI. But as the figures
> > > >show batching seems not to be very important for throughput, at least
> > > >for now. If the NAPI framework and netfront / netback are doing their
> > > >jobs as designed we might not need to worry about this now.
> > > >
> > > >Andrew, do you have any thought on this? You found out that NAPI didn't 
> > > >scale well with multi-threaded iperf in DomU, do you have any handle how
> > > >that can happen?
> > > >
> > > >* Thoughts on zero-copy TX
> > > >
> > > >With this hack we are able to achieve 10Gb/s single stream, which is
> > > >good. But, with classic XenoLinux kernel which has zero copy TX we
> > > >didn't able to achieve this.  I also developed another zero copy netback
> > > >prototype one year ago with Ian's out-of-tree skb frag destructor patch
> > > >series. That prototype couldn't achieve 10Gb/s either (IIRC the
> > > >performance was more or less the same as copying mode, about 6~7Gb/s).
> > > >
> > > >My hack maps all necessary pages permantently, there is no unmap, we
> > > >skip lots of page table manipulation and TLB flushes. So my basic
> > > >conclusion is that page table manipulation and TLB flushes do incur
> > > >heavy performance penalty.
> > > >
> > > >This hack can be upstreamed in no way. If we're to re-introduce
> > > >zero-copy TX, we would need to implement some sort of lazy flushing
> > > >mechanism. I haven't thought this through. Presumably this mechanism
> > > >would also benefit blk somehow? I'm not sure yet.
> > > >
> > > >Could persistent mapping (with the to-be-developed reclaim / MRU list
> > > >mechanism) be useful here? So that we can unify blk and net drivers?
> > > >
> > > >* Changes required to introduce zero-copy TX
> > > >
> > > >1. SKB frag destructor series: to track life cycle of SKB frags. This is
> > > >not yet upstreamed.
> > > 
> > > Are you mentioning this one http://old-list-archives.xen.org/archives/html/xen-devel/2011-06/msg01711.html?
> > > 
> > > <http://old-list-archives.xen.org/archives/html/xen-devel/2011-06/msg01711.html>
> > > 
> > 
> > Yes. But I believe there's been several versions posted. The link you
> > have is not the latest version.
> > 
> > > >
> > > >2. Mechanism to negotiate max slots frontend can use: mapping requires
> > > >backend's MAX_SKB_FRAGS >= frontend's MAX_SKB_FRAGS.
> > > >
> > > >3. Lazy flushing mechanism or persistent grants: ???
> > > 
> > > I did some test with persistent grants before, it did not show
> > > better performance than grant copy. But I was using the default
> > > params of netperf, and not tried large packet size. Your results
> > > reminds me that maybe persistent grants would get similar results
> > > with larger packet size too.
> > > 
> > 
> > "No better performance" -- that's because both mechanisms are copying?
> > However I presume persistent grant can scale better? From an earlier
> > email last week, I read that copying is done by the guest so that this
> > mechanism scales much better than hypervisor copying in blk's case.
> 
> Yes, I always expected persistent grants to be faster then
> gnttab_copy but I was very surprised by the difference in performances:
> 
> http://marc.info/?l=xen-devel&m=137234605929944
> 
> I think it's worth trying persistent grants on PV network, although it's
> very unlikely that they are going to improve the throughput by 5 Gb/s.
> 

I think it can improve aggregated throughput, however its not likely to
improve single stream throughput.

> Also once we have both PV block and network using persistent grants,
> we might incur the grant table limit, see this email:
> 
> http://marc.info/?l=xen-devel&m=137183474618974

Yes, indeed.

  reply	other threads:[~2013-07-01 14:39 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-06-12 10:14 Interesting observation with network event notification and batching Wei Liu
2013-06-14 18:53 ` Konrad Rzeszutek Wilk
2013-06-16  9:54   ` Wei Liu
2013-06-17  9:38     ` Ian Campbell
2013-06-17  9:56       ` Andrew Bennieston
2013-06-17 10:46         ` Wei Liu
2013-06-17 10:56           ` Andrew Bennieston
2013-06-17 11:08             ` Ian Campbell
2013-06-17 11:55               ` Andrew Bennieston
2013-06-17 10:06       ` Jan Beulich
2013-06-17 10:16         ` Ian Campbell
2013-06-17 10:35       ` Wei Liu
2013-06-17 11:34         ` annie li
2013-06-16 12:46   ` Wei Liu
2013-06-28 16:15 ` Wei Liu
2013-07-01  7:48   ` annie li
2013-07-01  8:54     ` Wei Liu
2013-07-01 14:29       ` Stefano Stabellini
2013-07-01 14:39         ` Wei Liu [this message]
2013-07-01 14:54           ` Stefano Stabellini
2013-07-01 15:59       ` annie li
2013-07-01 16:06         ` Wei Liu
2013-07-01 16:53           ` Andrew Bennieston
2013-07-01 17:55             ` Wei Liu
2013-07-03 15:18             ` Wei Liu
2013-07-01 14:19     ` Stefano Stabellini
2013-07-01 15:59       ` annie li

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130701143919.GG7483@zion.uk.xensource.com \
    --to=wei.liu2@citrix.com \
    --cc=andrew.bennieston@citrix.com \
    --cc=annie.li@oracle.com \
    --cc=ian.campbell@citrix.com \
    --cc=stefano.stabellini@eu.citrix.com \
    --cc=xen-devel@lists.xen.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.