From: "Jan Beulich" <JBeulich@suse.com>
To: Juergen Gross <jgross@suse.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>,
	Stefano Stabellini <sstabellini@kernel.org>,
	Wei Liu <wei.liu2@citrix.com>,
	xen-devel <xen-devel@lists.xenproject.org>
Subject: Re: [for-4.9] Re: HVM guest performance regression
Date: Tue, 30 May 2017 01:24:10 -0600
Message-ID: <592D3A3A020000780015D787@prv-mh.provo.novell.com>
In-Reply-To: <8be5f350-ad53-d74c-50fc-7ca71b6cdc3c@suse.com>

>>> On 29.05.17 at 21:05, <jgross@suse.com> wrote:
> Creating the domains with
> 
> xl -vvv create ...
> 
> showed the numbers of superpages and normal pages allocated for the
> domain.
> 
> The following allocation pattern resulted in a slow domain:
> 
> xc: detail: PHYSICAL MEMORY ALLOCATION:
> xc: detail:   4KB PAGES: 0x0000000000000600
> xc: detail:   2MB PAGES: 0x00000000000003f9
> xc: detail:   1GB PAGES: 0x0000000000000000
> 
> And this one was fast:
> 
> xc: detail: PHYSICAL MEMORY ALLOCATION:
> xc: detail:   4KB PAGES: 0x0000000000000400
> xc: detail:   2MB PAGES: 0x00000000000003fa
> xc: detail:   1GB PAGES: 0x0000000000000000
> 
> I ballooned dom0 down in small steps to be able to create those
> test cases.
> 
> I believe the main reason is that some data needed by the benchmark
> is located near the end of domain memory, resulting in a rather high
> TLB miss rate when not all (or nearly all) of the memory is available
> in the form of 2MB pages.

Did you double-check this by creating some other (persistent)
process prior to running your benchmark? I find it rather
unlikely that you would consistently see space from the top of
guest RAM allocated to your test, unless it consumes all RAM
that's available at the time it runs (but then I'd consider it
quite likely that the overhead of using the few smaller pages
would be mostly hidden in the noise).

Or are you suspecting some crucial kernel structures to live
there?

>>> What makes the whole problem even more mysterious is that the
>>> regression was first detected with SLE12 SP3 (guest and dom0, Xen 4.9
>>> and Linux 4.4) against older systems (guest and dom0). While trying
>>> to find out whether the guest or the Xen version is the culprit I
>>> found that the old guest (based on kernel 3.12) showed the mentioned
>>> performance drop with the above commit. The new guest (based on kernel
>>> 4.4) shows the same bad performance regardless of the Xen version or
>>> the amount of free memory. I haven't yet found the Linux kernel commit
>>> responsible for that performance drop.
> 
> And this might be a result of the different memory usage of more
> recent kernels: I suspect the critical data is now at the very end of
> the domain's memory. As there are always some pages allocated in 4kB
> chunks, the last pages of the domain will never be part of a 2MB page.

But if the OS allocates large pages internally for relevant data
structures, those obviously won't come from that necessarily 4k-
mapped tail range.

> Looking at meminit_hvm() in libxc, which does the allocation of the
> memory, I realized it is somewhat sub-optimal: shouldn't it try to
> allocate the largest pages first and the smaller pages later?

Indeed this seems sub-optimal, yet the net effect isn't that
dramatic (at least for sufficiently large guests): There may be up
to two unnecessarily shattered 1G pages and at most one 2M
one afaict.
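
Just to make sure we're talking about the same thing, here is a rough
sketch (purely illustrative and hypothetical: populate() merely stands
in for the populate-physmap call, and none of this is the actual
meminit_hvm() code) of the ordering I understand you to be suggesting:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Placeholder for the populate-physmap hypercall wrapper. */
static void populate(uint64_t gfn, unsigned int order)
{
    printf("populate gfn %#" PRIx64 ", order %u\n", gfn, order);
}

/* Extent orders to try, in units of 4k pages: 1G, 2M, 4k. */
static const unsigned int orders[] = { 18, 9, 0 };

/*
 * Cover [start, end) (4k frame numbers) with the largest possible
 * extents first; the unaligned head and tail fall through to the
 * next smaller order.
 */
static void populate_range(uint64_t start, uint64_t end, unsigned int i)
{
    const uint64_t step = 1ULL << orders[i];
    uint64_t lo = (start + step - 1) & ~(step - 1);   /* round up */
    uint64_t hi = end & ~(step - 1);                  /* round down */

    if ( lo >= hi )              /* nothing coverable at this order */
        lo = hi = start;         /* -> pass the whole range down */
    else
        for ( uint64_t gfn = lo; gfn < hi; gfn += step )
            populate(gfn, orders[i]);

    if ( i + 1 < sizeof(orders) / sizeof(orders[0]) )
    {
        populate_range(start, lo, i + 1);   /* unaligned head */
        populate_range(hi, end, i + 1);     /* unaligned tail */
    }
}

int main(void)
{
    /* E.g. RAM from 1M up to the hole at 0xF0000000. */
    populate_range(0x100, 0xf0000, 0);
    return 0;
}

A real implementation would of course also have to fall back to the
next smaller order when an allocation fails, and to skip any holes in
the guest's physmap.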

> Would it be possible to make memory holes larger sometimes to avoid
> having to use 4kB pages (with the exception of the first 2MB of the
> domain, of course)?

Which holes are you thinking about here? The pre-determined one
is at 0xF0000000 (i.e. it is 2M-aligned already), and without
pass-through devices with large BARs hvmloader won't do any
relocation of RAM. Granted, when it does, doing so in chunks
larger than 64k may be advantageous. To have any effect, that
would require hypervisor-side changes though, as
xenmem_add_to_physmap() acts on individual 4k pages right now.
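
For illustration, what hvmloader does today amounts to a per-page
loop along these lines (a sketch only: move_page() is merely a
stand-in for the XENMEM_add_to_physmap operation, and relocate_ram()
is a hypothetical helper):

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in for XENMEM_add_to_physmap, which today moves exactly one
 * 4k frame per invocation. */
static void move_page(uint64_t dst_gfn, uint64_t src_gfn)
{
    printf("move gfn %#" PRIx64 " -> %#" PRIx64 "\n", src_gfn, dst_gfn);
}

/* Relocate nr 4k frames from src to dst one frame at a time; doing
 * this in larger chunks on the hvmloader side alone therefore gains
 * nothing without hypervisor-side support for larger extents. */
static void relocate_ram(uint64_t dst, uint64_t src, uint64_t nr)
{
    for ( uint64_t i = 0; i < nr; i++ )
        move_page(dst + i, src + i);
}

int main(void)
{
    /* E.g. move 256M of RAM from just below the hole to above 4G. */
    relocate_ram(0x100000, 0xe0000, 0x10000);
    return 0;
}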

> Maybe it would even make sense to be able to tweak the allocation
> pattern depending on the guest type: preferring large pages either
> at the top or at the bottom of the domain's physical address space.

Why would top and bottom be better candidates for using large
pages than the middle part of address space? Any such heuristic
would surely need tailoring to the guest OS in order to not
adversely affect some while helping others.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
