Re: [for-4.9] Re: HVM guest performance regression

From: Stefano Stabellini <sstabellini@kernel.org>
To: Juergen Gross <jgross@suse.com>
Cc: xen-devel <xen-devel@lists.xenproject.org>,
	Stefano Stabellini <sstabellini@kernel.org>,
	Ian Jackson <ian.jackson@eu.citrix.com>,
	Wei Liu <wei.liu2@citrix.com>
Subject: Re: [for-4.9] Re: HVM guest performance regression
Date: Tue, 6 Jun 2017 12:08:58 -0700 (PDT)
Message-ID: <alpine.DEB.2.10.1706061202290.15791@sstabellini-ThinkPad-X260>
In-Reply-To: <5de5e464-ae54-30b4-7a97-0a2dcbf91416@suse.com>

On Tue, 6 Jun 2017, Juergen Gross wrote:
> On 06/06/17 18:39, Stefano Stabellini wrote:
> > On Tue, 6 Jun 2017, Juergen Gross wrote:
> >> On 26/05/17 21:01, Stefano Stabellini wrote:
> >>> On Fri, 26 May 2017, Juergen Gross wrote:
> >>>> On 26/05/17 18:19, Ian Jackson wrote:
> >>>>> Juergen Gross writes ("HVM guest performance regression"):
> >>>>>> While looking for the cause of a performance regression of HVM guests
> >>>>>> under Xen 4.7 compared to 4.5, I found it to be commit
> >>>>>> c26f92b8fce3c9df17f7ef035b54d97cbe931c7a ("libxl: remove freemem_slack")
> >>>>>> in Xen 4.6.
> >>>>>>
> >>>>>> The problem occurs when dom0 has to be ballooned down to start the
> >>>>>> guest. With the above commit, the performance of some micro-benchmarks
> >>>>>> dropped by about a factor of 2.
> >>>>>>
> >>>>>> Interestingly, the guest's performance depends on the amount of free
> >>>>>> memory available at guest creation time. When there was barely enough
> >>>>>> memory available to start the guest, performance remains low even if
> >>>>>> memory is freed later.
> >>>>>>
> >>>>>> I'd like to suggest we either revert the commit or add some other
> >>>>>> mechanism that tries to keep some free memory in reserve when starting
> >>>>>> a domain.
> >>>>>
> >>>>> Oh, dear.  The memory accounting swamp again.  Clearly we are not
> >>>>> going to drain that swamp now, but I don't like regressions.
> >>>>>
> >>>>> I am not opposed to reverting that commit.  I was a bit iffy about it
> >>>>> at the time; and according to the removal commit message, it was
> >>>>> basically removed because it was a piece of cargo cult for which we
> >>>>> had no justification in any of our records.
> >>>>>
> >>>>> Indeed I think fixing this is a candidate for 4.9.
> >>>>>
> >>>>> Do you know the mechanism by which the freemem slack helps?  I think
> >>>>> that would be a prerequisite for reverting this.  That way we can have
> >>>>> an understanding of why we are doing things, rather than just
> >>>>> flailing at random...
> >>>>
> >>>> I wish I understood it.
> >>>>
> >>>> One candidate would be 2M/1G pages being possible with enough free
> >>>> memory, but I haven't proved this yet. I can try disabling big pages
> >>>> in the hypervisor.
> >>>
> >>> Right, if I had to bet, I would put my money on superpage shattering
> >>> being the cause of the problem.
> >>
> >> Seems you would have lost your money...
> >>
> >> Meanwhile I've found a way to get the "good" performance in the micro
> >> benchmark. Unfortunately this requires switching off the PV interfaces
> >> in the HVM guest via the "xen_nopv" kernel boot parameter.
> >>
> >> I have verified that PV spinlocks are not to blame (via the
> >> "xen_nopvspin" kernel boot parameter). Switching to the TSC clocksource
> >> in the running system doesn't help either.
> > 
> > What about xen_hvm_exit_mmap (an optimization for shadow pagetables) and
> > xen_hvm_smp_init (PV IPI)?
> 
> xen_hvm_exit_mmap isn't active (the kernel issued a message saying
> so).
> 
> >> Unfortunately the kernel no longer seems to be functional when I try
> >> to tweak it not to use the PVHVM enhancements.
> > 
> > I guess you are not talking about regular PV drivers like netfront and
> > blkfront, right?
> 
> The plan was to be able to use PV drivers without having to use PV
> callbacks and PV timers. This isn't possible right now.

I think the code to handle that scenario was gradually removed over time
to simplify the code base.

> >> I'm wondering now whether there have ever been any benchmarks proving
> >> that PVHVM really is faster than non-PVHVM. My findings seem to suggest
> >> there might be a huge performance gap with PVHVM. OTOH this might depend
> >> on hardware and other factors.
> >>
> >> Stefano, didn't you do the PVHVM stuff back in 2010? Do you have any
> >> data from then regarding performance figures?
> > 
> > Yes, I still have these slides:
> > 
> > https://www.slideshare.net/xen_com_mgr/linux-pv-on-hvm
> 
> Thanks. So you measured the overall package, not the individual items
> like callbacks, timers and time source? I'm asking because I'm beginning
> to believe that some of them are slower than their non-PV variants.

There isn't much left in terms of individual optimizations: you have
already tried switching the clocksource and disabling PV spinlocks.
xen_hvm_exit_mmap is not used. Only the following are left (you might
want to double-check that I haven't missed anything):

1) PV IPI
2) PV suspend/resume
3) vector callback
4) interrupt remapping

2) is not on the hot path.
I did individual measurements of 3) at some point and it was a clear win.
Slide 14 shows the individual measurements of 4).

Only 1) is left to check, as far as I can tell.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel