From: Konrad Rzeszutek Wilk
Subject: Re: Interesting observation with network event notification and batching
Date: Fri, 14 Jun 2013 14:53:03 -0400
Message-ID: <20130614185303.GC21280@phenom.dumpdata.com>
In-Reply-To: <20130612101451.GF2765@zion.uk.xensource.com>
References: <20130612101451.GF2765@zion.uk.xensource.com>
To: Wei Liu
Cc: annie.li@oracle.com, stefano.stabellini@eu.citrix.com, andrew.bennieston@citrix.com, ian.campbell@citrix.com, xen-devel@lists.xen.org
List-Id: xen-devel@lists.xenproject.org

On Wed, Jun 12, 2013 at 11:14:51AM +0100, Wei Liu wrote:
> Hi all
>
> I'm hacking on netback trying to identify whether TLB flushes cause a
> heavy performance penalty on the Tx path. The hack is quite nasty (you
> would not want to know, trust me).
>
> Basically what is doesn't is, 1) alter network protocol to pass along

You probably meant: "what it does"?

> mfns instead of grant references, 2) when the backend sees a new mfn,
> map it RO and cache it in its own address space.
>
> With this hack we now have some sort of zero-copy Tx path. The backend
> doesn't need to issue any grant copy / map operation any more. When it
> sees a new packet in the ring, it just needs to pick up the pages in
> its own address space, assemble the packet from those pages and pass
> it on to the network stack.

Uh, so I am not sure I understand the RO part. If dom0 is mapping it,
won't that trigger a PTE update? And doesn't somebody (either the guest
or the initial domain) do a grant mapping to let the hypervisor know it
is OK to map a grant?

Or is dom0 actually permitted to map the MFN of any guest without using
the grants? In which case you are then using _PAGE_IOMAP somewhere and
setting up vmap entries with the MFNs that point to the foreign
domain - I think?

> In theory this should boost performance, but in practice it is the
> other way around. This hack makes Xen networking more than 50% slower
> than before (OMG). Further investigation shows that with this hack the
> batching ability is gone. Before this hack, netback batches around 64
> slots in one

That is quite interesting.

> interrupt event; however, after this hack it only batches 3 slots in
> one interrupt event -- that's no batching at all, because we can expect
> one packet to occupy 3 slots.

Right.

> Time to have some figures (iperf from DomU to Dom0).
>
> Before the hack, doing grant copy, throughput: 7.9 Gb/s, average slots
> per batch 64.
>
> After the hack, throughput: 2.5 Gb/s, average slots per batch 3.
>
> After the hack, adding 64 HYPERVISOR_xen_version calls (each just does
> a context switch into the hypervisor) in the Tx path, throughput:
> 3.2 Gb/s, average slots per batch 6.
>
> After the hack, adding 256 such calls, throughput: 5.2 Gb/s, average
> slots per batch 26.
>
> After the hack, adding 512 such calls, throughput: 7.9 Gb/s, average
> slots per batch 26.
>
> After the hack, adding 768 such calls, throughput: 5.6 Gb/s, average
> slots per batch 25.
>
> After the hack, adding 1024 such calls, throughput: 4.4 Gb/s, average
> slots per batch 25.

How do you get it to do more HYPERVISOR_xen_version calls? Did you just
add a for (i = 1024; i > 0; i--) hypervisor_yield(); loop in netback?
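
Just so we are talking about the same mechanics, here is a minimal
sketch of the kind of delay loop I am imagining in the netback Tx path.
It is only my guess at what "adds in N HYPERVISOR_xen_version" means;
the function name and where it would be called from are made up, not
your actual patch:

/* Hypothetical delay loop in the netback Tx path (a guess, not the
 * actual patch): issue N dummy hypercalls so the frontend gets a
 * chance to queue more slots before we start consuming the ring.
 */
#include <asm/xen/hypercall.h>
#include <xen/interface/version.h>

#define DUMMY_HYPERCALLS 1024   /* 64 / 256 / 512 / 768 / 1024 in the tests */

static void netbk_dummy_hypercalls(void)
{
        int i;

        for (i = DUMMY_HYPERCALLS; i > 0; i--)
                /* XENVER_version is a harmless query; the only point
                 * here is the context switch into the hypervisor. */
                HYPERVISOR_xen_version(XENVER_version, NULL);
}

If that is all it is, then the extra hypercalls are really just a delay
loop, and the throughput gain would come purely from giving the
frontend time to queue more slots before netback drains the ring.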
> Average slots per batch is calculated as follows:
> 1. count total_slots processed from start of day
> 2. count tx_count, which is the number of times the tx_action function
>    gets invoked
> 3. avg_slots_per_tx = total_slots / tx_count
>
> The counter-intuitive figures imply that there is something wrong with
> the current batching mechanism. Probably we need to fine-tune the
> batching behavior for network and play with the event pointers in the
> ring (actually I'm looking into it now). It would be good to have some
> input on this.

I am still unsure I understand how your changes would incur more of the
yields.

> Konrad, IIRC you once mentioned you discovered something with event
> notification, what's that?

The observations were bizarre. I naively expected the number of
physical NIC interrupts to be around the same as the VIF's, or less.
And I figured that the number of interrupts would be constant
regardless of the size of the packets. In other words
#packets == #interrupts.

In reality the number of interrupts the VIF took was about the same,
while for the NIC it would fluctuate. (I can't remember the details.)

But it was odd and I didn't dig deeper to figure out what was
happening, or whether for the VIF we could do something so that
#packets != #interrupts - hopefully some mechanism to adjust things so
that the number of interrupts per packet would be lower (hand waving
here).
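
On the "event pointers in the ring" that Wei mentions above: my
(untested) reading of xen/interface/io/ring.h is that the relevant knob
on the Tx side is the shared ring's req_event field.
RING_FINAL_CHECK_FOR_REQUESTS() sets it to req_cons + 1, i.e. "send me
an event for the very next slot", which would explain why batching
collapses once the backend drains the ring faster than the frontend
fills it. A rough sketch of the alternative - the function name and the
BATCH value are made up:

#include <asm/barrier.h>
#include <xen/interface/io/netif.h>
#include <xen/interface/io/ring.h>

#define BATCH 32        /* made-up value, would need tuning */

/* Sketch only: instead of asking for an event on the very next
 * request, ask the frontend to notify us only after BATCH more
 * requests have been queued, trading latency for batching. */
static int netbk_tx_work_todo_batched(struct xen_netif_tx_back_ring *ring)
{
        int work_to_do;

        work_to_do = RING_HAS_UNCONSUMED_REQUESTS(ring);
        if (work_to_do)
                return work_to_do;

        /* Push the event pointer out past the next BATCH requests. */
        ring->sring->req_event = ring->req_cons + BATCH;

        /* Barrier and re-check, as the stock macro does, to close the
         * race with requests queued while we moved the pointer. */
        mb();
        return RING_HAS_UNCONSUMED_REQUESTS(ring);
}

The obvious catch is that if the frontend queues fewer than BATCH slots
and then goes idle, nothing ever kicks the backend, so this would need
a timer or some other fallback before it could be more than an
experiment.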