From: Dario Faggioli <dario.faggioli@citrix.com>
To: Wei Liu <wei.liu2@citrix.com>
Cc: keir@xen.org, Ian.Campbell@citrix.com,
	stefano.stabellini@eu.citrix.com, george.dunlap@eu.citrix.com,
	msw@linux.com, lccycc123@gmail.com, ian.jackson@eu.citrix.com,
	xen-devel@lists.xen.org, JBeulich@suse.com,
	Elena Ufimtseva <ufimtseva@gmail.com>
Subject: Re: [PATCH v6 00/10] vnuma introduction
Date: Tue, 22 Jul 2014 17:06:37 +0200
Message-ID: <1406041597.17850.74.camel@Solace>
In-Reply-To: <20140722144846.GB6448@zion.uk.xensource.com>


On Tue, 2014-07-22 at 15:48 +0100, Wei Liu wrote:
> On Tue, Jul 22, 2014 at 04:03:44PM +0200, Dario Faggioli wrote:

> > I mean, even right now, PV guests see completely random cache-sharing
> > topology, and that does (at least potentially) affect performance, as
> > the guest scheduler will make incorrect/inconsistent assumptions.
> > 
> 
> Correct. It's just that it might be more obvious to see the problem with
> vNUMA.
> 
Yep.

> > > Yes, given that you derive numa memory allocation from cpu pinning or
> > > use combination of cpu pinning, vcpu to vnode map and vnode to pnode
> > > map, in those cases those IDs might reflect the right topology.
> > > 
> > Well, pinning does (should?) not always happen, as a consequence of a
> > virtual topology being used.
> > 
> 
> That's true. I was just referring to the current status of the patch
> series. AIUI that's how it is implemented now, not necessarily the way
> it has to be.
> 
Ok.

> > With the following guest configuration, in terms of vcpu pinning:
> > 
> > 1) 2 vCPUs ==> same pCPUs
> 
> 4 vcpus, I think.
> 
> > root@benny:~# xl vcpu-list 
> > Name                                ID  VCPU   CPU State   Time(s) CPU Affinity
> > debian.guest.osstest                 9     0    0   -b-       2.7  0
> > debian.guest.osstest                 9     1    0   -b-       5.2  0
> > debian.guest.osstest                 9     2    7   -b-       2.4  7
> > debian.guest.osstest                 9     3    7   -b-       4.4  7
> > 
What I meant by "2 vCPUs" was that I was putting 2 vCPUs of the guest
(0 and 1) on the same pCPU (0), and the other 2 (2 and 3) on another
(7).

That should have resulted in a guest topology where at least the lowest
cache levels are not shared between the two pairs, but that is not what
happens.
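For reference, this is more or less what that pinning looks like in the
guest config file (just a minimal sketch, with made-up name/memory
values, assuming the per-vCPU list form of cpus=):

name   = "debian.guest.osstest"
memory = 512
vcpus  = 4
# vCPUs 0 and 1 --> pCPU 0; vCPUs 2 and 3 --> pCPU 7
cpus   = ["0", "0", "7", "7"]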

> > 2) no SMT
> > root@benny:~# xl vcpu-list 
> > Name                                ID  VCPU   CPU State   Time(s) CPU Affinity
> > debian.guest.osstest                11     0    0   -b-       0.6  0
> > debian.guest.osstest                11     1    2   -b-       0.4  2
> > debian.guest.osstest                11     2    4   -b-       1.5  4
> > debian.guest.osstest                11     3    6   -b-       0.5  6
> > 
> > 3) Random
> > root@benny:~# xl vcpu-list 
> > Name                                ID  VCPU   CPU State   Time(s) CPU Affinity
> > debian.guest.osstest                12     0    3   -b-       1.6  all
> > debian.guest.osstest                12     1    1   -b-       1.4  all
> > debian.guest.osstest                12     2    5   -b-       2.4  all
> > debian.guest.osstest                12     3    7   -b-       1.5  all
> > 
> > 4) yes SMT
> > root@benny:~# xl vcpu-list
> > Name                                ID  VCPU   CPU State   Time(s) CPU Affinity
> > debian.guest.osstest                14     0    1   -b-       1.0  1
> > debian.guest.osstest                14     1    2   -b-       1.8  2
> > debian.guest.osstest                14     2    6   -b-       1.1  6
> > debian.guest.osstest                14     3    7   -b-       0.8  7
> > 
> > And, in *all* these 4 cases, here's what I see:
> > 
> > root@debian:~# cat /sys/devices/system/cpu/cpu*/topology/core_siblings_list
> > 0-3
> > 0-3
> > 0-3
> > 0-3
> > 
> > root@debian:~# cat /sys/devices/system/cpu/cpu*/topology/thread_siblings_list
> > 0-3
> > 0-3
> > 0-3
> > 0-3
> > 
> > root@debian:~# lstopo
> > Machine (488MB) + Socket L#0 + L3 L#0 (8192KB) + L2 L#0 (256KB) + L1 L#0 (32KB) + Core L#0
> >   PU L#0 (P#0)
> >   PU L#1 (P#1)
> >   PU L#2 (P#2)
> >   PU L#3 (P#3)
> > 
> 
> I wouldn't be surprised if the guest builds up a wrong topology, as
> what real "ID"s it sees depends very much on which pCPUs you pick.
> 
Exactly, but if I pin all the guest vCPUs to specific host pCPUs from
the very beginning (with the pinning specified in the config file, which
is what I'm doing), I should be able to control that...
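(To be clear, by "from the very beginning" I mean the pinning is set via
cpus= in the config file at domain creation time, rather than applied
later at runtime with something like the following, so the vCPUs never
get to run anywhere else:

root@benny:~# xl vcpu-pin debian.guest.osstest 0 0
)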

> Have you tried pinning vcpus to pcpus [0, 1, 2, 3]? That way you should
> be able to see the same topology as the one you saw in Dom0?
> 
Well, at least some of the examples above should have shown some
non-shared cache levels already. Anyway, here it comes:

root@benny:~# xl vcpu-list 
Name                                ID  VCPU   CPU State   Time(s) CPU Affinity
debian.guest.osstest                15     0    0   -b-       1.8  0
debian.guest.osstest                15     1    1   -b-       0.7  1
debian.guest.osstest                15     2    2   -b-       0.6  2
debian.guest.osstest                15     3    3   -b-       0.7  3

root@debian:~# hwloc-ls --of console
Machine (488MB) + Socket L#0 + L3 L#0 (8192KB) + L2 L#0 (256KB) + L1 L#0 (32KB) + Core L#0
  PU L#0 (P#0)
  PU L#1 (P#1)
  PU L#2 (P#2)
  PU L#3 (P#3)

root@debian:~# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    4
Core(s) per socket:    1
Socket(s):             1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 60
Stepping:              3
CPU MHz:               3591.780
BogoMIPS:              7183.56
Hypervisor vendor:     Xen
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              8192K

So, no, that is not giving the same result as in Dom0. :-(
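(For the record, here is a quick way to dump the raw IDs the guest
kernel derived its topology from, in case someone wants to compare them
against Dom0's; just a diagnostic sketch, not output from the runs
above:

root@debian:~# grep -E 'processor|apicid' /proc/cpuinfo
root@debian:~# grep . /sys/devices/system/cpu/cpu*/topology/{core_id,physical_package_id}
)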

> > This is not the case for dom0 where (I booted with dom0_max_vcpus=4 on
> > the xen command line) I see this:
> > 
> 
> I guess this is because you're basically picking pcpu 0-3 for Dom0. It
> doesn't matter if you pin them or not.
> 
That makes total sense and, in fact, I was not surprised by Dom0
looking like this... What I am surprised about is not being able to get
a similar topology for the guest, no matter how I pin it... :-/
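(FWIW, the Dom0 side can be made even more deterministic by also pinning
its vCPUs from the Xen command line; e.g., assuming a GRUB2 multiboot
entry:

multiboot /boot/xen.gz dom0_max_vcpus=4 dom0_vcpus_pin

where dom0_vcpus_pin keeps each Dom0 vCPU i on pCPU i.)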

Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)

