linux-kernel.vger.kernel.org archive mirror
* Topology updates and NUMA-level sched domains
@ 2015-04-06 21:45 Nishanth Aravamudan
  2015-04-07 10:21 ` Peter Zijlstra
  0 siblings, 1 reply; 13+ messages in thread
From: Nishanth Aravamudan @ 2015-04-06 21:45 UTC
  To: Peter Zijlstra
  Cc: Ingo Molnar, linux-kernel, Srikar Dronamraju, Boqun Feng,
	Anshuman Khandual, linuxppc-dev

Hi Peter,

As I think you are well aware, Power has some odd NUMA topologies (and
changes to those topologies) at run-time. In particular, we can see
the following topology at boot:

Node 0: all CPUs
Node 7: no CPUs

Then we get a notification from the hypervisor that a core (or two) has
moved from node 0 to node 7. This results in the:

[   64.496687] BUG: arch topology borken
[   64.496689]      the CPU domain not a subset of the NUMA domain

messages for each moved CPU. I think this is because when we first came
up, we degraded (elided altogether?) the NUMA domain for node 7, as it
had no CPUs:

[    0.305823] CPU0 attaching sched-domain:
[    0.305831]  domain 0: span 0-7 level SIBLING
[    0.305834]   groups: 0 (cpu_power = 146) 1 (cpu_power = 146) 2
(cpu_power = 146) 3 (cpu_power = 146) 4 (cpu_power = 146) 5 (cpu_power =
146) 6 (cpu_power = 146) 7 (cpu_power = 146)
[    0.305854]   domain 1: span 0-79 level CPU
[    0.305856]    groups: 0-7 (cpu_power = 1168) 8-15 (cpu_power = 1168)
16-23 (cpu_power = 1168) 24-31 (cpu_power = 1168) 32-39 (cpu_power =
1168) 40-47 (cpu_power = 1168) 48-55 (cpu_power = 1168) 56-63 (cpu_power
= 1168) 64-71 (cpu_power = 1168) 72-79 (cpu_power = 1168)

For the CPUs that moved, we get the following after the update:

[   64.505819] CPU8 attaching sched-domain:
[   64.505821]  domain 0: span 8-15 level SIBLING
[   64.505823]   groups: 8 (cpu_power = 147) 9 (cpu_power = 147) 10
(cpu_power = 147) 11 (cpu_power = 146) 12 (cpu_power = 147) 13
(cpu_power = 147) 14 (cpu_power = 146) 15 (cpu_power = 147)
[   64.505842]   domain 1: span 8-23,72-79 level CPU
[   64.505845]    groups: 8-15 (cpu_power = 1174) 16-23 (cpu_power =
1175) 72-79 (cpu_power = 1176)

while the non-modified CPUs report, correctly:

[   64.497186] CPU0 attaching sched-domain:
[   64.497189]  domain 0: span 0-7 level SIBLING
[   64.497192]   groups: 0 (cpu_power = 147) 1 (cpu_power = 147) 2
(cpu_power = 146) 3 (cpu_power = 147) 4 (cpu_power = 147) 5 (cpu_power =
147) 6 (cpu_power = 147) 7 (cpu_power = 146)
[   64.497213]   domain 1: span 0-7,24-71 level CPU
[   64.497215]    groups: 0-7 (cpu_power = 1174) 24-31 (cpu_power =
1173) 32-39 (cpu_power = 1176) 40-47 (cpu_power = 1175) 48-55 (cpu_power
= 1176) 56-63 (cpu_power = 1175) 64-71 (cpu_power = 1174)
[   64.497234]    domain 2: span 0-79 level NUMA
[   64.497236]     groups: 0-7,24-71 (cpu_power = 8223) 8-23,72-79
(cpu_power = 3525)

It seems like we might need something like this (HORRIBLE HACK, I know,
just to get discussion started):

@@ -6958,6 +6960,10 @@ void partition_sched_domains(int ndoms_new, cpumask_var_t doms_new[],
 
        /* Let architecture update cpu core mappings. */
        new_topology = arch_update_cpu_topology();
+       /* Update NUMA topology lists */
+       if (new_topology) {
+               sched_init_numa();
+       }
 
        n = doms_new ? ndoms_new : 0;

or a re-init API (which won't try to reallocate various bits), because
the topology could be completely different now (e.g.,
sched_domains_numa_distance will also be inaccurate now).  Really, a
topology update on Power (not sure about s390x, but those are the only
two archs that return a positive value from arch_update_cpu_topology()
right now, afaics) is a lot like a hotplug event, and we need to
re-initialize any dependent structures.

I'm just sending out feelers, as it seems we can limp by with the above
warning, but that is less than ideal. Any help or insight you could
provide would be greatly appreciated!

-Nish



end of thread, other threads:[~2015-04-10 20:31 UTC | newest]

Thread overview: 13+ messages
2015-04-06 21:45 Topology updates and NUMA-level sched domains Nishanth Aravamudan
2015-04-07 10:21 ` Peter Zijlstra
2015-04-07 17:14   ` Nishanth Aravamudan
2015-04-07 19:41     ` Peter Zijlstra
2015-04-08 10:32       ` Brice Goglin
2015-04-08 10:52         ` Peter Zijlstra
2015-04-09 22:40           ` Nishanth Aravamudan
2015-04-09 22:37         ` Nishanth Aravamudan
2015-04-09 22:29       ` Nishanth Aravamudan
2015-04-10  8:31         ` Peter Zijlstra
2015-04-10  9:08           ` Peter Zijlstra
2015-04-10 19:50             ` Nishanth Aravamudan
2015-04-10 20:30           ` Nishanth Aravamudan
