All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3] powerpc/topology: Check at boot for topology updates
@ 2018-08-10  6:56 Srikar Dronamraju
  2018-08-10 11:42 ` Michael Ellerman
  0 siblings, 1 reply; 4+ messages in thread
From: Srikar Dronamraju @ 2018-08-10  6:56 UTC (permalink / raw)
  To: linuxppc-dev, Michael Ellerman
  Cc: Michael Bringmann, Manjunatha H R, Srikar Dronamraju, Anshuman Khandual

On a shared lpar, Phyp will not update the cpu associativity at boot
time. Just after the boot system does recognize itself as a shared lpar and
trigger a request for correct cpu associativity. But by then the scheduler
would have already created/destroyed its sched domains.

This causes
- Broken load balance across Nodes causing islands of cores.
- Performance degradation esp if the system is lightly loaded
- dmesg to wrongly report all cpus to be in Node 0.
- Messages in dmesg saying borken topology.
- With commit 051f3ca02e46 ("sched/topology: Introduce NUMA identity
  node sched domain"), can cause rcu stalls at boot up.

>From a scheduler maintainer's perspective, moving cpus from one node to
another or creating more numa levels after boot is not appropriate
without some notification to the user space.
https://lore.kernel.org/lkml/20150406214558.GA38501@linux.vnet.ibm.com/T/#u

The sched_domains_numa_masks table which is used to generate cpumasks is
only created at boot time just before creating sched domains and never
updated.  Hence, its better to get the topology correct before the sched
domains are created.

For example on 64 core Power 8 shared lpar, dmesg reports

[    2.088360] Brought up 512 CPUs
[    2.088368] Node 0 CPUs: 0-511
[    2.088371] Node 1 CPUs:
[    2.088373] Node 2 CPUs:
[    2.088375] Node 3 CPUs:
[    2.088376] Node 4 CPUs:
[    2.088378] Node 5 CPUs:
[    2.088380] Node 6 CPUs:
[    2.088382] Node 7 CPUs:
[    2.088386] Node 8 CPUs:
[    2.088388] Node 9 CPUs:
[    2.088390] Node 10 CPUs:
[    2.088392] Node 11 CPUs:
...
[    3.916091] BUG: arch topology borken
[    3.916103]      the DIE domain not a subset of the NUMA domain
[    3.916105] BUG: arch topology borken
[    3.916106]      the DIE domain not a subset of the NUMA domain
...

numactl/lscpu output will still be correct with cores spreading across
all nodes.

Socket(s):             64
NUMA node(s):          12
Model:                 2.0 (pvr 004d 0200)
Model name:            POWER8 (architected), altivec supported
Hypervisor vendor:     pHyp
Virtualization type:   para
L1d cache:             64K
L1i cache:             32K
NUMA node0 CPU(s): 0-7,32-39,64-71,96-103,176-183,272-279,368-375,464-471
NUMA node1 CPU(s): 8-15,40-47,72-79,104-111,184-191,280-287,376-383,472-479
NUMA node2 CPU(s): 16-23,48-55,80-87,112-119,192-199,288-295,384-391,480-487
NUMA node3 CPU(s): 24-31,56-63,88-95,120-127,200-207,296-303,392-399,488-495
NUMA node4 CPU(s):     208-215,304-311,400-407,496-503
NUMA node5 CPU(s):     168-175,264-271,360-367,456-463
NUMA node6 CPU(s):     128-135,224-231,320-327,416-423
NUMA node7 CPU(s):     136-143,232-239,328-335,424-431
NUMA node8 CPU(s):     216-223,312-319,408-415,504-511
NUMA node9 CPU(s):     144-151,240-247,336-343,432-439
NUMA node10 CPU(s):    152-159,248-255,344-351,440-447
NUMA node11 CPU(s):    160-167,256-263,352-359,448-455

Currently on this lpar, the scheduler detects 2 levels of Numa and
created numa sched domains for all cpus, but it finds a single DIE
domain consisting of all cpus. Hence it deletes all numa sched domains.

To address this, split the topology update init, such that the first
part detects vphn/prrn soon after cpus are setup and force updates
topology just before scheduler creates sched domain.

With the fix, dmesg reports

[    0.491336] numa: Node 0 CPUs: 0-7 32-39 64-71 96-103 176-183 272-279 368-375 464-471
[    0.491351] numa: Node 1 CPUs: 8-15 40-47 72-79 104-111 184-191 280-287 376-383 472-479
[    0.491359] numa: Node 2 CPUs: 16-23 48-55 80-87 112-119 192-199 288-295 384-391 480-487
[    0.491366] numa: Node 3 CPUs: 24-31 56-63 88-95 120-127 200-207 296-303 392-399 488-495
[    0.491374] numa: Node 4 CPUs: 208-215 304-311 400-407 496-503
[    0.491379] numa: Node 5 CPUs: 168-175 264-271 360-367 456-463
[    0.491384] numa: Node 6 CPUs: 128-135 224-231 320-327 416-423
[    0.491389] numa: Node 7 CPUs: 136-143 232-239 328-335 424-431
[    0.491394] numa: Node 8 CPUs: 216-223 312-319 408-415 504-511
[    0.491399] numa: Node 9 CPUs: 144-151 240-247 336-343 432-439
[    0.491404] numa: Node 10 CPUs: 152-159 248-255 344-351 440-447
[    0.491409] numa: Node 11 CPUs: 160-167 256-263 352-359 448-455

and lscpu would also report

Socket(s):             64
NUMA node(s):          12
Model:                 2.0 (pvr 004d 0200)
Model name:            POWER8 (architected), altivec supported
Hypervisor vendor:     pHyp
Virtualization type:   para
L1d cache:             64K
L1i cache:             32K
NUMA node0 CPU(s): 0-7,32-39,64-71,96-103,176-183,272-279,368-375,464-471
NUMA node1 CPU(s): 8-15,40-47,72-79,104-111,184-191,280-287,376-383,472-479
NUMA node2 CPU(s): 16-23,48-55,80-87,112-119,192-199,288-295,384-391,480-487
NUMA node3 CPU(s): 24-31,56-63,88-95,120-127,200-207,296-303,392-399,488-495
NUMA node4 CPU(s):     208-215,304-311,400-407,496-503
NUMA node5 CPU(s):     168-175,264-271,360-367,456-463
NUMA node6 CPU(s):     128-135,224-231,320-327,416-423
NUMA node7 CPU(s):     136-143,232-239,328-335,424-431
NUMA node8 CPU(s):     216-223,312-319,408-415,504-511
NUMA node9 CPU(s):     144-151,240-247,336-343,432-439
NUMA node10 CPU(s):    152-159,248-255,344-351,440-447
NUMA node11 CPU(s):    160-167,256-263,352-359,448-455

Previous attempt to solve this problem
https://patchwork.ozlabs.org/patch/530090/

Reported-by: Manjunatha H R <manjuhr1@in.ibm.com>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
Changelog v1->v2
Fix compile warnings and checkpatch issues.
Changelog v2->v3
Fix compile warnings on !CONFIG_SMP

 arch/powerpc/include/asm/topology.h |  5 +++++
 arch/powerpc/kernel/smp.c           |  6 ++++++
 arch/powerpc/mm/numa.c              | 22 ++++++++++++++--------
 3 files changed, 25 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/include/asm/topology.h b/arch/powerpc/include/asm/topology.h
index 16b077801a5f..70f2d2285ba7 100644
--- a/arch/powerpc/include/asm/topology.h
+++ b/arch/powerpc/include/asm/topology.h
@@ -92,6 +92,7 @@ extern int stop_topology_update(void);
 extern int prrn_is_enabled(void);
 extern int find_and_online_cpu_nid(int cpu);
 extern int timed_topology_update(int nsecs);
+extern void __init check_topology_updates(void);
 #else
 static inline int start_topology_update(void)
 {
@@ -113,6 +114,10 @@ static inline int timed_topology_update(int nsecs)
 {
 	return 0;
 }
+
+#ifdef CONFIG_SMP
+static inline void check_topology_updates(void) {}
+#endif
 #endif /* CONFIG_NUMA && CONFIG_PPC_SPLPAR */
 
 #include <asm-generic/topology.h>
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 4794d6b4f4d2..2aa0ffd954c9 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -1156,6 +1156,12 @@ void __init smp_cpus_done(unsigned int max_cpus)
 	if (smp_ops && smp_ops->bringup_done)
 		smp_ops->bringup_done();
 
+	/*
+	 * On a shared LPAR, associativity needs to be requested.
+	 * Hence, check for numa topology updates before dumping
+	 * cpu topology
+	 */
+	check_topology_updates();
 	dump_numa_cpu_topology();
 
 	/*
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 0c7e05d89244..32c13a208589 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -1515,6 +1515,7 @@ int start_topology_update(void)
 		   lppaca_shared_proc(get_lppaca())) {
 		if (!vphn_enabled) {
 			vphn_enabled = 1;
+			topology_update_needed = 1;
 			setup_cpu_associativity_change_counters();
 			timer_setup(&topology_timer, topology_timer_fn,
 				    TIMER_DEFERRABLE);
@@ -1551,6 +1552,19 @@ int prrn_is_enabled(void)
 	return prrn_enabled;
 }
 
+void __init check_topology_updates(void)
+{
+	/* Do not poll for changes if disabled at boot */
+	if (topology_updates_enabled)
+		start_topology_update();
+
+	if (topology_update_needed) {
+		bitmap_fill(cpumask_bits(&cpu_associativity_changes_mask),
+			    nr_cpumask_bits);
+		numa_update_cpu_topology(false);
+	}
+}
+
 static int topology_read(struct seq_file *file, void *v)
 {
 	if (vphn_enabled || prrn_enabled)
@@ -1597,10 +1611,6 @@ static const struct file_operations topology_ops = {
 
 static int topology_update_init(void)
 {
-	/* Do not poll for changes if disabled at boot */
-	if (topology_updates_enabled)
-		start_topology_update();
-
 	if (vphn_enabled)
 		topology_schedule_update();
 
@@ -1608,10 +1618,6 @@ static int topology_update_init(void)
 		return -ENOMEM;
 
 	topology_inited = 1;
-	if (topology_update_needed)
-		bitmap_fill(cpumask_bits(&cpu_associativity_changes_mask),
-					nr_cpumask_bits);
-
 	return 0;
 }
 device_initcall(topology_update_init);
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH v3] powerpc/topology: Check at boot for topology updates
  2018-08-10  6:56 [PATCH v3] powerpc/topology: Check at boot for topology updates Srikar Dronamraju
@ 2018-08-10 11:42 ` Michael Ellerman
  2018-08-10 13:21   ` Srikar Dronamraju
  0 siblings, 1 reply; 4+ messages in thread
From: Michael Ellerman @ 2018-08-10 11:42 UTC (permalink / raw)
  To: Srikar Dronamraju, linuxppc-dev
  Cc: Michael Bringmann, Manjunatha H R, Srikar Dronamraju, Anshuman Khandual

Hi Srikar,

Thanks for debugging this.

Srikar Dronamraju <srikar@linux.vnet.ibm.com> writes:
> diff --git a/arch/powerpc/include/asm/topology.h b/arch/powerpc/include/asm/topology.h
> index 16b077801a5f..70f2d2285ba7 100644
> --- a/arch/powerpc/include/asm/topology.h
> +++ b/arch/powerpc/include/asm/topology.h
> @@ -113,6 +114,10 @@ static inline int timed_topology_update(int nsecs)
>  {
>  	return 0;
>  }
> +
> +#ifdef CONFIG_SMP
> +static inline void check_topology_updates(void) {}
> +#endif

I realise that's only called from smp.c but it doesn't really need to be
inside #ifdef CONFIG_SMP, it's harmless to have an empty version for
SMP=n.

> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> index 4794d6b4f4d2..2aa0ffd954c9 100644
> --- a/arch/powerpc/kernel/smp.c
> +++ b/arch/powerpc/kernel/smp.c
> @@ -1156,6 +1156,12 @@ void __init smp_cpus_done(unsigned int max_cpus)
>  	if (smp_ops && smp_ops->bringup_done)
>  		smp_ops->bringup_done();
>  
> +	/*
> +	 * On a shared LPAR, associativity needs to be requested.
> +	 * Hence, check for numa topology updates before dumping
> +	 * cpu topology
> +	 */
> +	check_topology_updates();

I get that you're calling it here because you want to get it correct
before we do the dump below, but we could move the dump later also.

You mention we need to do the check before the sched domains are
initialised, but where exactly does that happen?

> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index 0c7e05d89244..32c13a208589 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -1515,6 +1515,7 @@ int start_topology_update(void)
>  		   lppaca_shared_proc(get_lppaca())) {
>  		if (!vphn_enabled) {
>  			vphn_enabled = 1;
> +			topology_update_needed = 1;

I really don't understand what topology_update_needed is for.
We set it here.

>  			setup_cpu_associativity_change_counters();
>  			timer_setup(&topology_timer, topology_timer_fn,
>  				    TIMER_DEFERRABLE);
> @@ -1551,6 +1552,19 @@ int prrn_is_enabled(void)
>  	return prrn_enabled;
>  }
>  
> +void __init check_topology_updates(void)
> +{
> +	/* Do not poll for changes if disabled at boot */
> +	if (topology_updates_enabled)
> +		start_topology_update();

Here we call start_topology_update(), which might set
topology_update_needed to 1.

> +
> +	if (topology_update_needed) {

And then we check it here?
Why is this logic not *in* start_topology_update() ?

> +		bitmap_fill(cpumask_bits(&cpu_associativity_changes_mask),
> +			    nr_cpumask_bits);
> +		numa_update_cpu_topology(false);
> +	}
> +}

I'm not really clear on what check_topology_update() is responsible for
doing vs topology_update_init(). Why do we need both?

> @@ -1608,10 +1618,6 @@ static int topology_update_init(void)
>  		return -ENOMEM;
>  
>  	topology_inited = 1;

What does that mean? Didn't we already do the init earlier?

> -	if (topology_update_needed)
> -		bitmap_fill(cpumask_bits(&cpu_associativity_changes_mask),
> -					nr_cpumask_bits);
> -
>  	return 0;
>  }
>  device_initcall(topology_update_init);


cheers

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v3] powerpc/topology: Check at boot for topology updates
  2018-08-10 11:42 ` Michael Ellerman
@ 2018-08-10 13:21   ` Srikar Dronamraju
  2018-08-10 14:50     ` Michael Bringmann
  0 siblings, 1 reply; 4+ messages in thread
From: Srikar Dronamraju @ 2018-08-10 13:21 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: linuxppc-dev, Michael Bringmann, Manjunatha H R, Anshuman Khandual

* Michael Ellerman <mpe@ellerman.id.au> [2018-08-10 21:42:28]:

> Srikar Dronamraju <srikar@linux.vnet.ibm.com> writes:
> > diff --git a/arch/powerpc/include/asm/topology.h b/arch/powerpc/include/asm/topology.h
> > index 16b077801a5f..70f2d2285ba7 100644
> > --- a/arch/powerpc/include/asm/topology.h
> > +++ b/arch/powerpc/include/asm/topology.h
> > @@ -113,6 +114,10 @@ static inline int timed_topology_update(int nsecs)
> >  {
> >  	return 0;
> >  }
> > +
> > +#ifdef CONFIG_SMP
> > +static inline void check_topology_updates(void) {}
> > +#endif
> 
> I realise that's only called from smp.c but it doesn't really need to be
> inside #ifdef CONFIG_SMP, it's harmless to have an empty version for
> SMP=n.
> 

I did that just to fix
https://openpower.xyz/job/snowpatch/job/snowpatch-linux-sparse/577//


> > diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> > index 4794d6b4f4d2..2aa0ffd954c9 100644
> > --- a/arch/powerpc/kernel/smp.c
> > +++ b/arch/powerpc/kernel/smp.c
> > @@ -1156,6 +1156,12 @@ void __init smp_cpus_done(unsigned int max_cpus)
> >  	if (smp_ops && smp_ops->bringup_done)
> >  		smp_ops->bringup_done();
> >  
> > +	/*
> > +	 * On a shared LPAR, associativity needs to be requested.
> > +	 * Hence, check for numa topology updates before dumping
> > +	 * cpu topology
> > +	 */
> > +	check_topology_updates();
> 
> I get that you're calling it here because you want to get it correct
> before we do the dump below, but we could move the dump later also.
> 
> You mention we need to do the check before the sched domains are
> initialised, but where exactly does that happen?

Almost immediately after smp_cpus_done, 

kernel_init_freeable() calls smp_init() and sched_init_smp() back to
back.

smp_init() calls smp_cpus_done()
sched_init_smp() calls sched_init_numa() and sched_init_domains()

sched_init_numa() initialises sched_domains_numa_masks which is the
cpumask that matters the most in our discussions.
sched_init_domains initialised the sched_domain.

So I dont think we can move any later.

> 
> > diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> > index 0c7e05d89244..32c13a208589 100644
> > --- a/arch/powerpc/mm/numa.c
> > +++ b/arch/powerpc/mm/numa.c
> > @@ -1515,6 +1515,7 @@ int start_topology_update(void)
> >  		   lppaca_shared_proc(get_lppaca())) {
> >  		if (!vphn_enabled) {
> >  			vphn_enabled = 1;
> > +			topology_update_needed = 1;
> 
> I really don't understand what topology_update_needed is for.
> We set it here.

topology_update_needed is even set by numa_update_cpu_topology which
mighet get called from rtasd.

> 
> >  			setup_cpu_associativity_change_counters();
> >  			timer_setup(&topology_timer, topology_timer_fn,
> >  				    TIMER_DEFERRABLE);
> > @@ -1551,6 +1552,19 @@ int prrn_is_enabled(void)
> >  	return prrn_enabled;
> >  }
> >  
> > +void __init check_topology_updates(void)
> > +{
> > +	/* Do not poll for changes if disabled at boot */
> > +	if (topology_updates_enabled)
> > +		start_topology_update();
> 
> Here we call start_topology_update(), which might set
> topology_update_needed to 1.
> 
> > +
> > +	if (topology_update_needed) {
> 
> And then we check it here?
> Why is this logic not *in* start_topology_update() ?

I have split topology_update_init into two parts.  First part being
check_topology_update() and the next part being topology_update_init().

By the time the current topology_update_init() gets called,
sched_domains are already created. So we need to move this part before.

However the scheduled topology updates because of vphn can wait, atleast
that how the current code is written. topology_inited indicates if the
scheduled topology updates have been setup.

> 
> > +		bitmap_fill(cpumask_bits(&cpu_associativity_changes_mask),
> > +			    nr_cpumask_bits);
> > +		numa_update_cpu_topology(false);
> > +	}
> > +}
> 
> I'm not really clear on what check_topology_update() is responsible for
> doing vs topology_update_init(). Why do we need both?

I am okay to move the complete topology_update_init above and delete the
device_initcall. But then we would be moving the polling of vphn events
to a little bit earlier stage in the initialisation.

> 
> > @@ -1608,10 +1618,6 @@ static int topology_update_init(void)
> >  		return -ENOMEM;
> >  
> >  	topology_inited = 1;
> 
> What does that mean? Didn't we already do the init earlier?

If there is an explicit request for topology update via
numa_update_cpu_topology() even before we looked at the feature flags
and set vphn_enabled/prrn_enabled, then we would defer handle it till
the feature flags are checked. topology_inited helps us in achieving
this.

> 
> > -	if (topology_update_needed)
> > -		bitmap_fill(cpumask_bits(&cpu_associativity_changes_mask),
> > -					nr_cpumask_bits);
> > -
> >  	return 0;
> >  }
> >  device_initcall(topology_update_init);

Thanks for the review.

-- 
Thanks and Regards
Srikar

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v3] powerpc/topology: Check at boot for topology updates
  2018-08-10 13:21   ` Srikar Dronamraju
@ 2018-08-10 14:50     ` Michael Bringmann
  0 siblings, 0 replies; 4+ messages in thread
From: Michael Bringmann @ 2018-08-10 14:50 UTC (permalink / raw)
  To: Srikar Dronamraju, Michael Ellerman
  Cc: linuxppc-dev, Manjunatha H R, Anshuman Khandual

On 08/10/2018 08:21 AM, Srikar Dronamraju wrote:
> * Michael Ellerman <mpe@ellerman.id.au> [2018-08-10 21:42:28]:
>
>> Srikar Dronamraju <srikar@linux.vnet.ibm.com> writes:
>>> diff --git a/arch/powerpc/include/asm/topology.h b/arch/powerpc/include=
/asm/topology.h
>>> index 16b077801a5f..70f2d2285ba7 100644
>>> --- a/arch/powerpc/include/asm/topology.h
>>> +++ b/arch/powerpc/include/asm/topology.h
>>> @@ -113,6 +114,10 @@ static inline int timed_topology_update(int nsecs)
>>>  {
>>>  	return 0;
>>>  }
>>> +
>>> +#ifdef CONFIG_SMP
>>> +static inline void check_topology_updates(void) {}
>>> +#endif
>> I realise that's only called from smp.c but it doesn't really need to be
>> inside #ifdef CONFIG_SMP, it's harmless to have an empty version for
>> SMP=3Dn.
>>
> I did that just to fix
> https://openpower.xyz/job/snowpatch/job/snowpatch-linux-sparse/577//
>
>
>>> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
>>> index 4794d6b4f4d2..2aa0ffd954c9 100644
>>> --- a/arch/powerpc/kernel/smp.c
>>> +++ b/arch/powerpc/kernel/smp.c
>>> @@ -1156,6 +1156,12 @@ void __init smp_cpus_done(unsigned int max_cpus)
>>>  	if (smp_ops && smp_ops->bringup_done)
>>>  		smp_ops->bringup_done();
>>>=20=20
>>> +	/*
>>> +	 * On a shared LPAR, associativity needs to be requested.
>>> +	 * Hence, check for numa topology updates before dumping
>>> +	 * cpu topology
>>> +	 */
>>> +	check_topology_updates();
>> I get that you're calling it here because you want to get it correct
>> before we do the dump below, but we could move the dump later also.
>>
>> You mention we need to do the check before the sched domains are
>> initialised, but where exactly does that happen?
> Almost immediately after smp_cpus_done,=20
>
> kernel_init_freeable() calls smp_init() and sched_init_smp() back to
> back.
>
> smp_init() calls smp_cpus_done()
> sched_init_smp() calls sched_init_numa() and sched_init_domains()
>
> sched_init_numa() initialises sched_domains_numa_masks which is the
> cpumask that matters the most in our discussions.
> sched_init_domains initialised the sched_domain.
>
> So I dont think we can move any later.
>
>>> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
>>> index 0c7e05d89244..32c13a208589 100644
>>> --- a/arch/powerpc/mm/numa.c
>>> +++ b/arch/powerpc/mm/numa.c
>>> @@ -1515,6 +1515,7 @@ int start_topology_update(void)
>>>  		   lppaca_shared_proc(get_lppaca())) {
>>>  		if (!vphn_enabled) {
>>>  			vphn_enabled =3D 1;
>>> +			topology_update_needed =3D 1;
>> I really don't understand what topology_update_needed is for.
>> We set it here.
> topology_update_needed is even set by numa_update_cpu_topology which
> mighet get called from rtasd.
Yes.=C2=A0 While looking at problems with Shared CPU Topology updates
for NUMA last year, I observed that numa_update_cpu_topology
was being called before topology_update_init(), and thus, the next
call to numa_update_cpu_topology did not correctly initialize the
topology based upon VPHN.
>
>>>  			setup_cpu_associativity_change_counters();
>>>  			timer_setup(&topology_timer, topology_timer_fn,
>>>  				    TIMER_DEFERRABLE);
>>> @@ -1551,6 +1552,19 @@ int prrn_is_enabled(void)
>>>  	return prrn_enabled;
>>>  }
>>>=20=20
>>> +void __init check_topology_updates(void)
>>> +{
>>> +	/* Do not poll for changes if disabled at boot */
>>> +	if (topology_updates_enabled)
>>> +		start_topology_update();
>> Here we call start_topology_update(), which might set
>> topology_update_needed to 1.
>>
>>> +
>>> +	if (topology_update_needed) {
>> And then we check it here?
>> Why is this logic not *in* start_topology_update() ?
> I have split topology_update_init into two parts.  First part being
> check_topology_update() and the next part being topology_update_init().
>
> By the time the current topology_update_init() gets called,
> sched_domains are already created. So we need to move this part before.
>
> However the scheduled topology updates because of vphn can wait, atleast
> that how the current code is written. topology_inited indicates if the
> scheduled topology updates have been setup.
>
>>> +		bitmap_fill(cpumask_bits(&cpu_associativity_changes_mask),
>>> +			    nr_cpumask_bits);
>>> +		numa_update_cpu_topology(false);
>>> +	}
>>> +}
>> I'm not really clear on what check_topology_update() is responsible for
>> doing vs topology_update_init(). Why do we need both?
> I am okay to move the complete topology_update_init above and delete the
> device_initcall. But then we would be moving the polling of vphn events
> to a little bit earlier stage in the initialisation.
>
>>> @@ -1608,10 +1618,6 @@ static int topology_update_init(void)
>>>  		return -ENOMEM;
>>>=20=20
>>>  	topology_inited =3D 1;
>> What does that mean? Didn't we already do the init earlier?
> If there is an explicit request for topology update via
> numa_update_cpu_topology() even before we looked at the feature flags
> and set vphn_enabled/prrn_enabled, then we would defer handle it till
> the feature flags are checked. topology_inited helps us in achieving
> this.
Yes.
>
>>> -	if (topology_update_needed)
>>> -		bitmap_fill(cpumask_bits(&cpu_associativity_changes_mask),
>>> -					nr_cpumask_bits);
>>> -
>>>  	return 0;
>>>  }
>>>  device_initcall(topology_update_init);
> Thanks for the review.
>
Looks good.

--=20
----------
Michael W. Bringmann
Linux Technology Center
IBM Corporation
Tie-Line  363-5196
External: (512) 286-5196
Cell:       (512) 466-0650
mbringm@us.ibm.com

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2018-08-10 14:50 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-08-10  6:56 [PATCH v3] powerpc/topology: Check at boot for topology updates Srikar Dronamraju
2018-08-10 11:42 ` Michael Ellerman
2018-08-10 13:21   ` Srikar Dronamraju
2018-08-10 14:50     ` Michael Bringmann

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.