From: Joao Martins <joao.m.martins@oracle.com>
To: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Dario Faggioli <dario.faggioli@citrix.com>,
Xen-devel <xen-devel@lists.xen.org>
Subject: Re: DESIGN v2: CPUID part 3
Date: Wed, 2 Aug 2017 11:34:05 +0100 [thread overview]
Message-ID: <d537291c-8de2-1597-bb5f-78338dfbf703@oracle.com> (raw)
In-Reply-To: <89da4ab7-a0e1-52c6-3003-7b68e7b6eedb@citrix.com>
On 08/01/2017 07:34 PM, Andrew Cooper wrote:
> On 31/07/2017 20:49, Konrad Rzeszutek Wilk wrote:
>> On Wed, Jul 05, 2017 at 02:22:00PM +0100, Joao Martins wrote:
>>> On 07/05/2017 12:16 PM, Andrew Cooper wrote:
>>>> On 05/07/17 10:46, Joao Martins wrote:
>>>>> Hey Andrew,
>>>>>
>>>>> On 07/04/2017 03:55 PM, Andrew Cooper wrote:
>>>>>
>>>>>> (RFC: Decide exactly where to fit this. XEN_DOMCTL_max_vcpus perhaps?)
>>>>>> The toolstack shall also have a mechanism to explicitly select topology
>>>>>> configuration for the guest, which primarily affects the virtual APIC ID
>>>>>> layout, and has a knock on effect for the APIC ID of the virtual IO-APIC.
>>>>>> Xen's auditing shall ensure that guests observe values consistent with the
>>>>>> guarantees made by the vendor manuals.
>>>>>>
>>>>> Why choose max_vcpus domctl?
>>>> Despite its name, the max_vcpus hypercall is the one which allocates all
>>>> the vcpus in the hypervisor. I don't want there to be any opportunity
>>>> for vcpus to exist but no topology information to have been provided.
>>>>
>>> /nods
>>>
>>> So then doing this at vcpus allocation we would need to pass an additional CPU
>>> topology argument on the max_vcpus hypercall? Otherwise it's sort of guess work
>>> wrt sockets, cores, threads ... no?
>> Andrew, thoughts on this and the one below?
>
> Urgh sorry. I've been distracted with some high priority interrupts (of
> the non-maskable variety).
>
> So, bad news is that the CPUID and MSR policy handling has become
> substantially more complicated and entwined than I had first planned. A
> change in either of the data alters the auditing of the other, so I am
> leaning towards implementing everything with a single set hypercall (as
> this is the only way to get a plausibly-consistent set of data).
>
> The good news is that I don't think we actually need any changes to the
> XEN_DOMCTL_max_vcpus. I now think there is sufficient expressibility in
> the static cpuid policy to work.
>
Awesome!
>>> There could be other uses too on passing this info to Xen, say e.g. the
>>> scheduler knowing the guest CPU topology it would allow better selection of
>>> core+sibling pair such that it could match cache/cpu topology passed on the
>>> guest (for unpinned SMT guests).
>
> I remain to be convinced (i.e. with some real performance numbers) that
> the added complexity in the scheduler for that logic is a benefit in the
> general case.
>
The suggestion above was a simple extension to struct domain (e.g. cores/threads
counts, or a struct cpu_topology field) - nothing too disruptive, I think.
But I cannot really argue for it, as it was just an idea I found interesting (I
have no numbers to fully support it). We just happened to see guests
under-perform when a simple range of cpus was used for affinity, with some
vcpus ending up scheduled on the same core+sibling pair IIRC; hence I (perhaps
naively) imagined there could be value in further scheduler enlightenment, e.g.
"gang-scheduling", where a core+sibling pair is always scheduled together. I
spoke to Dario (CC'ed) at the summit about whether CPU topology could have
value for the scheduler - there might be, but it remains to be explored once
we're able to pass a cpu topology to the guest. (In the past he seemed
enthusiastic about the topology idea [0], hence I assumed it was in the context
of the schedulers.)
[0] https://lists.xenproject.org/archives/html/xen-devel/2016-02/msg03850.html
> In practice, customers are either running very specific and dedicated
> workloads (at which point pinning is used and there is no
> oversubscription, and exposing the actual SMT topology is a good thing),
>
/nods
> or customers are running general workloads with no pinning (or perhaps
> cpupool-numa-split) with a moderate amount of oversubscription (at which
> point exposing SMT is a bad move).
>
Given the scale you folks invest in over-subscription (1000 VMs), I wonder what
moderate means here :P
> Counterintuitively, exposing NUMA in general oversubscribed scenarios is
> terrible for net system performance. What happens in practice is that
> VMs which see NUMA spend their idle cycles trying to balance their own
> userspace processes, rather than yielding to the hypervisor so another
> guest can get a go.
>
Interesting to know - perhaps vNUMA is only well placed for performance cases
where I/O topology and/or memory locality matter, or when going for bigger
guests - provided that the corresponding CPU topology is also exposed.
Joao