From: Elena Ufimtseva <ufimtseva@gmail.com>
To: Wei Liu <wei.liu2@citrix.com>
Cc: Keir Fraser <keir@xen.org>,
	Ian Campbell <Ian.Campbell@citrix.com>,
	Stefano Stabellini <stefano.stabellini@eu.citrix.com>,
	George Dunlap <george.dunlap@eu.citrix.com>,
	Matt Wilson <msw@linux.com>,
	Dario Faggioli <dario.faggioli@citrix.com>,
	Li Yechen <lccycc123@gmail.com>,
	Ian Jackson <ian.jackson@eu.citrix.com>,
	"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>,
	Jan Beulich <JBeulich@suse.com>
Subject: Re: [PATCH v6 00/10] vnuma introduction
Date: Sun, 20 Jul 2014 10:57:44 -0400	[thread overview]
Message-ID: <CAEr7rXgtv=3xP3CRgKr_7z-HjKMj4Bb0PQ+kd9JAnAMXa1XDsw@mail.gmail.com> (raw)
In-Reply-To: <20140718114834.GI7142@zion.uk.xensource.com>



On Fri, Jul 18, 2014 at 7:48 AM, Wei Liu <wei.liu2@citrix.com> wrote:

> On Fri, Jul 18, 2014 at 12:13:36PM +0200, Dario Faggioli wrote:
> > On ven, 2014-07-18 at 10:53 +0100, Wei Liu wrote:
> > > Hi! Another new series!
> > >
> > :-)
> >
> > > On Fri, Jul 18, 2014 at 01:49:59AM -0400, Elena Ufimtseva wrote:
> >
> > > > The workaround is to specify cpuid in config file and not use SMT.
> > > > But soon I will come up with some other acceptable solution.
> > > >
> > >
> > For Elena, workaround like what?
>

In the workaround I used, I adjusted the vcpus' cache-related cpuid settings in
the config file (as we have HT/SMT turned on on the host), so the guest does not
see the shared-cache topology.
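
Roughly, such an override has this shape in the xl config file (a sketch of the
libxl cpuid syntax only, to show the idea; it is not the exact set of leaves and
bits I tweaked):

    # illustration: start from the host's CPUID values and hide
    # hyperthreading by clearing the HTT feature bit (leaf 1, EDX[28])
    cpuid = "host,htt=0"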



>  >
> > > I've also encountered this. I suspect that even if you disable SMT with
> > > cpuid in the config file, the cpu topology in the guest might still be wrong.
> > >
> > Can I ask why?
> >
>
> Because for a PV guest (currently) the guest kernel sees the real "ID"s
> for a cpu. See those "ID"s I change in my hacky patch.
>

Yep, that's what I see as well.

>
> > > What do hwloc-ls and lscpu show? Do you see any weird topology like one
> > > core belongs to one node while three belong to another?
> > >
> > Yep, that would be interesting to see.
> >
> > >  (I suspect not
> > > because your vcpus are already pinned to a specific node)
> > >
> > Sorry, I'm not sure I follow here... Are you saying that things probably
> > work ok, but (only) because of pinning?
>
> Yes, given that you derive numa memory allocation from cpu pinning, or
> use a combination of cpu pinning, vcpu-to-vnode map and vnode-to-pnode
> map; in those cases those IDs might reflect the right topology.
>
> >
> > I may be missing something here, but would it be possible to at least
> > try to make sure that the virtual topology and the topology-related
> > content of CPUID actually agree? And I mean doing it automatically (if
>
> This is what I'm doing in my hack. :-)
>
> > only one of the two is specified) and to either error or warn if that is
> > not possible (if both are specified and they disagree)?
> >
> > I admit I'm not a CPUID expert, but I always thought this could be a
> > good solution...
> >
> > > What I did was to manipulate various "id"s in the Linux kernel, so that I
> > > create a topology with a 1 core : 1 cpu : 1 socket mapping.
> > >
> > And how does this topology map/interact with the virtual topology we want
> > the guest to have?
> >
>
> Say you have a two-node guest with 4 vcpus: you now have two sockets
> per node, each socket has one cpu, and each cpu has one core.
>
> Node 0:
>   Socket 0:
>     CPU0:
>       Core 0
>   Socket 1:
>     CPU 1:
>       Core 1
> Node 1:
>   Socket 2:
>     CPU 2:
>       Core 2
>   Socket 3:
>     CPU 3:
>       Core 3
>
> > > In that case the
> > > guest scheduler won't be able to make any assumptions about individual CPUs
> > > sharing caches with each other.
> > >
> > And, apart from SMT, what topology does the guest see then?
> >
>
> See above.
>
> > In any case, if this only alters SMT-ness (where "alter"="disable"), I
> > think that is fine too. What I'm failing to see is whether and why
> > this approach is more powerful than manipulating CPUID from the config file.
> >
> > I'm insisting because, if they were equivalent in terms of results, I
> > think it's easier, cleaner and more correct to deal with CPUID in xl and
> > libxl (automatically or semi-automatically).
> >
>
> SMT is just one aspect of the story that easily surfaces.
>
> In my opinion, if we don't manually create some kind of topology for the
> guest, the guest might end up with something weird. For example, if you
> have a 2-node, 4-socket, 8-cpu, 8-core system, you might have
>
> Node 0:
>   Socket 0
>     CPU0
>   Socket 1
>     CPU1
> Node 1:
>   Socket 2
>     CPU 3
>     CPU 4
>
> which all stems from the guest having knowledge of the real CPU "ID"s.
>
> And this topology is just wrong; it might only be valid at guest
> creation time. Xen is free to schedule vcpus on different pcpus, so the guest
> scheduler will make wrong decisions based on erroneous information.
>
> That's why I chose a 1 core : 1 cpu : 1 socket mapping, so that the
> guest makes no assumptions about cache sharing etc. It's suboptimal but
> should provide predictable average performance. What do you think?
>

Running lstopo with vNUMA enabled in a guest with 4 vnodes and 8 vcpus:
root@heatpipe:~# lstopo

Machine (7806MB) + L3 L#0 (7806MB 10MB) + L2 L#0 (7806MB 256KB) + L1d L#0 (7806MB 32KB) + L1i L#0 (7806MB 32KB)
  NUMANode L#0 (P#0 1933MB) + Socket L#0
    Core L#0 + PU L#0 (P#0)
    Core L#1 + PU L#1 (P#4)
  NUMANode L#1 (P#1 1967MB) + Socket L#1
    Core L#2 + PU L#2 (P#1)
    Core L#3 + PU L#3 (P#5)
  NUMANode L#2 (P#2 1969MB) + Socket L#2
    Core L#4 + PU L#4 (P#2)
    Core L#5 + PU L#5 (P#6)
  NUMANode L#3 (P#3 1936MB) + Socket L#3
    Core L#6 + PU L#6 (P#3)
    Core L#7 + PU L#7 (P#7)

Basically, L2 and L1 are shared between nodes :)

I have manipulated the cache-sharing options in cpuid before, but I agree with
Wei that it is just part of the problem.
Along with the number of logical processors (if HT is enabled), I guess we need
to construct APIC IDs (if that is not done yet; I could not find it), and the
cache-sharing CPUID leaves may be needed as well, taking pinning into account
if it is set.

As it's described here:
https://software.intel.com/en-us/articles/methods-to-utilize-intels-hyper-threading-technology-with-linux

"The Initial APIC ID is composed of the physical processor's ID and the
logical processor's ID within the physical processor. The least significant
bits of the APIC ID are used to identify the logical processors within a
single physical processor."
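
So, as a rough illustration of that scheme (the bit widths here are just an
example; the real ones come from CPUID leaves 1 and 4), with 2 logical
processors per package the initial APIC IDs would be composed roughly as:

    APIC ID = (package_id << 1) | logical_id

    package 0, logical 0  ->  0
    package 0, logical 1  ->  1
    package 1, logical 0  ->  2
    package 1, logical 1  ->  3

and that is the kind of thing we would need to construct consistently with the
virtual topology (and with pinning, if it is set).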




>
> Wei.
>



-- 
Elena




