From: Stefano Stabellini
Subject: Re: Interesting observation with network event notification and batching
Date: Mon, 1 Jul 2013 15:19:48 +0100
In-Reply-To: <51D13456.1040609@oracle.com>
To: annie li
Cc: Wei Liu, ian.campbell@citrix.com, stefano.stabellini@eu.citrix.com,
    xen-devel@lists.xen.org, andrew.bennieston@citrix.com
List-Id: xen-devel@lists.xenproject.org

Could you please use plain text emails in the future?

On Mon, 1 Jul 2013, annie li wrote:
> On 2013-6-29 0:15, Wei Liu wrote:
>
> Hi all,
>
> After collecting more stats and comparing the copying / mapping cases, I
> now have some more interesting findings, which might contradict what I
> said before.
>
> I tuned the runes I use for benchmarking to make sure iperf and netperf
> generate large packets (~64K). Here are the runes I use:
>
>   iperf -c 10.80.237.127 -t 5 -l 131072 -w 128k    (see note)
>   netperf -H 10.80.237.127 -l10 -f m -- -s 131072 -S 131072
>
>                           COPY        MAP
> iperf   Tput:             6.5Gb/s     14Gb/s (was 2.5Gb/s)
>
> So with the default iperf settings, copy is about 7.9G and map is about
> 2.5G? How about the netperf result without large packets?
>
>         PPI               2.90        1.07
>         SPI               37.75       13.69
>         PPN               2.90        1.07
>         SPN               37.75       13.69
>         tx_count          31808       174769
>
> It seems the interrupt count does not affect the performance at all with
> -l 131072 -w 128k.
>
>         nr_napi_schedule  31805       174697
>         total_packets     92354       187408
>         total_reqs        1200793     2392614
>
> netperf Tput:             5.8Gb/s     10.5Gb/s
>         PPI               2.13        1.00
>         SPI               36.70       16.73
>         PPN               2.13        1.31
>         SPN               36.70       16.75
>         tx_count          57635       205599
>         nr_napi_schedule  57633       205311
>         total_packets     122800      270254
>         total_reqs        2115068     3439751
>
> PPI: packets processed per interrupt
> SPI: slots processed per interrupt
> PPN: packets processed per NAPI schedule
> SPN: slots processed per NAPI schedule
> tx_count: interrupt count
> total_reqs: total slots used during the test
>
> * Notification and batching
>
> Are notification and batching really a problem? I'm not so sure now. My
> first thought, before I measured PPI / PPN / SPI / SPN in the copying
> case, was that "in that case netback *must* batch better", which turned
> out not to be quite true -- copying mode makes netback slower, but the
> batching gained is not huge.
>
> Ideally we still want to batch as much as possible. One possibility is
> to play with the 'weight' parameter in NAPI. But as the figures show,
> batching does not seem to be very important for throughput, at least
> for now. If the NAPI framework and netfront / netback are doing their
> jobs as designed, we might not need to worry about this for the moment.
>
> Andrew, do you have any thoughts on this? You found that NAPI didn't
> scale well with multi-threaded iperf in DomU; do you have any idea how
> that can happen?
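As a point of reference for the 'weight' discussion above: in the 3.x-era
kernels the weight is simply the last argument to netif_napi_add(), and it
caps the budget handed to the poll callback. The sketch below is the generic
driver shape, not the actual netfront/netback code; pending_work(),
consume_one_packet(), my_register_napi() and MY_NAPI_WEIGHT are placeholders
made up for illustration.

    /*
     * Illustrative only: how NAPI's 'weight' bounds per-poll batching.
     * Generic driver shape with the 3.x-era netif_napi_add() signature;
     * not the real netfront/netback code.
     */
    #include <linux/netdevice.h>

    #define MY_NAPI_WEIGHT 64       /* common default; raising it allows bigger batches */

    /* Placeholders standing in for the driver's real ring handling. */
    static bool pending_work(struct napi_struct *napi)        { return false; }
    static void consume_one_packet(struct napi_struct *napi)  { }

    static int my_poll(struct napi_struct *napi, int budget)
    {
            int work_done = 0;

            /*
             * 'budget' is capped by the weight passed to netif_napi_add().
             * A bigger weight lets more packets (and hence more slots) be
             * handled per NAPI schedule, i.e. higher PPN/SPN.
             */
            while (work_done < budget && pending_work(napi)) {
                    consume_one_packet(napi);
                    work_done++;
            }

            if (work_done < budget) {
                    /*
                     * Out of work: stop polling and re-enable the interrupt
                     * (for netfront/netback, the event channel).
                     */
                    napi_complete(napi);
            }

            return work_done;
    }

    static void my_register_napi(struct net_device *dev, struct napi_struct *napi)
    {
            netif_napi_add(dev, napi, my_poll, MY_NAPI_WEIGHT);
    }

Note that a larger weight only raises the ceiling on PPN/SPN; if the ring has
little queued work per schedule the batches stay small, which would be
consistent with the figures above.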
> * Thoughts on zero-copy TX
>
> With this hack we are able to achieve 10Gb/s with a single stream, which
> is good. But with the classic XenoLinux kernel, which has zero-copy TX,
> we were not able to achieve this. I also developed another zero-copy
> netback prototype one year ago with Ian's out-of-tree skb frag
> destructor patch series. That prototype couldn't achieve 10Gb/s either
> (IIRC the performance was more or less the same as copying mode, about
> 6~7Gb/s).
>
> My hack maps all the necessary pages permanently; there is no unmap, so
> we skip lots of page table manipulation and TLB flushes. My basic
> conclusion is therefore that page table manipulation and TLB flushes do
> incur a heavy performance penalty.
>
> There is no way this hack can be upstreamed. If we are to re-introduce
> zero-copy TX, we would need to implement some sort of lazy flushing
> mechanism. I haven't thought this through. Presumably this mechanism
> would also benefit blk somehow? I'm not sure yet.
>
> Could persistent mapping (with the to-be-developed reclaim / MRU list
> mechanism) be useful here, so that we can unify the blk and net drivers?
>
> * Changes required to introduce zero-copy TX
>
> 1. SKB frag destructor series: to track the life cycle of SKB frags.
>    This is not yet upstreamed.
>
> Are you mentioning this one:
> http://old-list-archives.xen.org/archives/html/xen-devel/2011-06/msg01711.html?
>
> 2. Mechanism to negotiate the max slots the frontend can use: mapping
>    requires the backend's MAX_SKB_FRAGS >= the frontend's MAX_SKB_FRAGS.
>
> 3. Lazy flushing mechanism or persistent grants: ???
>
> I did some tests with persistent grants before, and they did not show
> better performance than grant copy. But I was using the default netperf
> parameters and had not tried large packet sizes. Your results remind me
> that maybe persistent grants would show similar results with larger
> packet sizes too.
>
> Thanks
> Annie
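Regarding point 2 in Wei's list, the max-slots negotiation could presumably
follow the usual xenstore feature-flag handshake. A minimal sketch, assuming
a made-up key name ("feature-max-tx-slots"), a made-up example value and
made-up helper names; only xenbus_printf(), xenbus_scanf() and XBT_NIL are
real xenbus interfaces.

    /*
     * Illustrative only: negotiating a per-packet slot limit over xenstore.
     * Key name, value and helpers are invented for this sketch.
     */
    #include <xen/xenbus.h>

    #define MY_DEFAULT_TX_SLOTS 18          /* example value only */

    /* Backend side: advertise how many slots per packet it is willing to map. */
    static int backend_advertise_slots(struct xenbus_device *dev)
    {
            return xenbus_printf(XBT_NIL, dev->nodename,
                                 "feature-max-tx-slots", "%u", MY_DEFAULT_TX_SLOTS);
    }

    /* Frontend side: read the backend's limit and clamp itself to it. */
    static unsigned int frontend_read_slots(struct xenbus_device *dev)
    {
            unsigned int max_slots;

            if (xenbus_scanf(XBT_NIL, dev->otherend,
                             "feature-max-tx-slots", "%u", &max_slots) != 1)
                    max_slots = MY_DEFAULT_TX_SLOTS;    /* key absent: assume legacy limit */

            return max_slots;
    }

If the backend does not advertise the key, the frontend simply falls back to
the legacy slot limit, so older backends would keep working unchanged.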