From: "Jan Beulich" <JBeulich@suse.com>
To: Juergen Gross <jgross@suse.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>,
	Stefano Stabellini <sstabellini@kernel.org>,
	Wei Liu <wei.liu2@citrix.com>,
	xen-devel <xen-devel@lists.xenproject.org>
Subject: Re: [for-4.9] Re: HVM guest performance regression
Date: Tue, 30 May 2017 01:24:10 -0600
Message-ID: <592D3A3A020000780015D787@prv-mh.provo.novell.com>
In-Reply-To: <8be5f350-ad53-d74c-50fc-7ca71b6cdc3c@suse.com>

>>> On 29.05.17 at 21:05, <jgross@suse.com> wrote:
> Creating the domains with
> 
> xl -vvv create ...
> 
> showed the numbers of superpages and normal pages allocated for the
> domain.
> 
> The following allocation pattern resulted in a slow domain:
> 
> xc: detail: PHYSICAL MEMORY ALLOCATION:
> xc: detail:   4KB PAGES: 0x0000000000000600
> xc: detail:   2MB PAGES: 0x00000000000003f9
> xc: detail:   1GB PAGES: 0x0000000000000000
> 
> And this one was fast:
> 
> xc: detail: PHYSICAL MEMORY ALLOCATION:
> xc: detail:   4KB PAGES: 0x0000000000000400
> xc: detail:   2MB PAGES: 0x00000000000003fa
> xc: detail:   1GB PAGES: 0x0000000000000000
> 
> I ballooned dom0 down in small steps to be able to create those
> test cases.
> 
> I believe the main reason is that some data needed by the benchmark
> is located near the end of domain memory, resulting in a rather high
> TLB miss rate when not all (or nearly all) of the memory is available
> in the form of 2MB pages.

Did you double-check this by creating some other (persistent)
process prior to running your benchmark? I find it rather
unlikely that you would consistently see space from the top of
guest RAM allocated to your test, unless it consumes all RAM
that's available at the time it runs (but then I'd consider it
quite likely that the overhead of using the few smaller pages
would be mostly hidden in the noise).

Or are you suspecting some crucial kernel structures to live
there?

>>> What makes the whole problem even more mysterious is that the
>>> regression was first detected with SLE12 SP3 (guest and dom0, Xen 4.9
>>> and Linux 4.4) against older systems (guest and dom0). While trying
>>> to find out whether the guest or the Xen version is the culprit I
>>> found that the old guest (based on kernel 3.12) showed the mentioned
>>> performance drop with the above commit. The new guest (based on kernel
>>> 4.4) shows the same bad performance regardless of the Xen version or
>>> the amount of free memory. I haven't yet found the Linux kernel commit
>>> responsible for that performance drop.
> 
> And this might be a result of the different memory usage of more
> recent kernels: I suspect the critical data is now at the very end of
> the domain's memory. As there are always some pages allocated in 4kB
> chunks, the last pages of the domain will never be part of a 2MB page.

But if the OS allocates large pages internally for relevant data
structures, those obviously won't come from that necessarily 4k-
mapped tail range.

> Looking at meminit_hvm() in libxc, which does the allocation of the
> memory, I realized it is somewhat sub-optimal: shouldn't it try to
> allocate the largest pages first and the smaller pages later?

Indeed this seems sub-optimal, yet the net effect isn't that
dramatic (at least for sufficiently large guests): There may be up
to two unnecessarily shattered 1G pages and at most one 2M
one afaict.
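
Just to make sure we're talking about the same thing, here is a rough
sketch (purely illustrative and hypothetical: populate() merely stands
in for the populate-physmap call, and none of this is the actual
meminit_hvm() code) of the ordering I understand you to be suggesting:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Placeholder for the populate-physmap hypercall wrapper. */
static void populate(uint64_t gfn, unsigned int order)
{
    printf("populate gfn %#" PRIx64 ", order %u\n", gfn, order);
}

/* Extent orders to try, in units of 4k pages: 1G, 2M, 4k. */
static const unsigned int orders[] = { 18, 9, 0 };

/*
 * Cover [start, end) (4k frame numbers) with the largest possible
 * extents first; the unaligned head and tail fall through to the
 * next smaller order.
 */
static void populate_range(uint64_t start, uint64_t end, unsigned int i)
{
    const uint64_t step = 1ULL << orders[i];
    uint64_t lo = (start + step - 1) & ~(step - 1);   /* round up */
    uint64_t hi = end & ~(step - 1);                  /* round down */

    if ( lo >= hi )              /* nothing coverable at this order */
        lo = hi = start;         /* -> pass the whole range down */
    else
        for ( uint64_t gfn = lo; gfn < hi; gfn += step )
            populate(gfn, orders[i]);

    if ( i + 1 < sizeof(orders) / sizeof(orders[0]) )
    {
        populate_range(start, lo, i + 1);   /* unaligned head */
        populate_range(hi, end, i + 1);     /* unaligned tail */
    }
}

int main(void)
{
    /* E.g. RAM from 1M up to the hole at 0xF0000000. */
    populate_range(0x100, 0xf0000, 0);
    return 0;
}

A real implementation would of course also have to fall back to the
next smaller order when an allocation fails, and to skip any holes in
the guest's physmap.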

> Would it be possible to make memory holes larger sometimes to avoid
> having to use 4kB pages (with the exception of the first 2MB of the
> domain, of course)?

Which holes are you thinking about here? The pre-determined one
is at 0xF0000000 (i.e. it is 2M-aligned already), and without
pass-through devices with large BARs hvmloader won't do any
relocation of RAM. Granted, when it does, doing so in chunks
larger than 64k may be advantageous. To have any effect, that
would require hypervisor-side changes though, as
xenmem_add_to_physmap() acts on individual 4k pages right now.
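
For illustration, what hvmloader does today amounts to a per-page
loop along these lines (a sketch only: move_page() is merely a
stand-in for the XENMEM_add_to_physmap operation, and relocate_ram()
is a hypothetical helper):

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in for XENMEM_add_to_physmap, which today moves exactly one
 * 4k frame per invocation. */
static void move_page(uint64_t dst_gfn, uint64_t src_gfn)
{
    printf("move gfn %#" PRIx64 " -> %#" PRIx64 "\n", src_gfn, dst_gfn);
}

/* Relocate nr 4k frames from src to dst one frame at a time; doing
 * this in larger chunks on the hvmloader side alone therefore gains
 * nothing without hypervisor-side support for larger extents. */
static void relocate_ram(uint64_t dst, uint64_t src, uint64_t nr)
{
    for ( uint64_t i = 0; i < nr; i++ )
        move_page(dst + i, src + i);
}

int main(void)
{
    /* E.g. move 256M of RAM from just below the hole to above 4G. */
    relocate_ram(0x100000, 0xe0000, 0x10000);
    return 0;
}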

> Maybe it would even make sense to be able to tweak the allocation
> pattern depending on the guest type: preferring large pages either
> at the top or at the bottom of the domain's physical address space.

Why would top and bottom be better candidates for using large
pages than the middle part of address space? Any such heuristic
would surely need tailoring to the guest OS in order to not
adversely affect some while helping others.

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel
