linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/2] sched/topology: Allow archs to override cpu_smt_mask
@ 2020-08-04  3:33 Srikar Dronamraju
  2020-08-04  3:33 ` [PATCH 2/2] powerpc/topology: Override cpu_smt_mask Srikar Dronamraju
  2020-08-04 10:45 ` [PATCH 1/2] sched/topology: Allow archs to override cpu_smt_mask peterz
  0 siblings, 2 replies; 13+ messages in thread
From: Srikar Dronamraju @ 2020-08-04  3:33 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra
  Cc: LKML, Mel Gorman, Rik van Riel, Srikar Dronamraju,
	Thomas Gleixner, Valentin Schneider, Vincent Guittot,
	Dietmar Eggemann, linuxppc-dev, Michael Ellerman,
	Michael Neuling, Gautham R Shenoy, Vaidyanathan Srinivasan

cpu_smt_mask tracks topology_sibling_cpumask. This would be good for
most architectures. One of the users of cpu_smt_mask(), would be to
identify idle-cores. On Power9, a pair of cores can be presented by the
firmware as a big-core for backward compatibility reasons.

In order to maintain userspace backward compatibility with previous
versions of processor, (since Power8 had SMT8 cores), Power9 onwards there
is option to the firmware to advertise a pair of SMT4 cores as a fused
cores (referred to as the big_core mode in the Linux Kernel). On Power9
this pair shares the L2 cache as well. However, from the scheduler's point
of view, a core should be determined by SMT4. The load-balancer already
does this. Hence allow PowerPc architecture to override the default
cpu_smt_mask() to point to the SMT4 cores in a big_core mode.

Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
Cc: LKML <linux-kernel@vger.kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Valentin Schneider <valentin.schneider@arm.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
 include/linux/topology.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/topology.h b/include/linux/topology.h
index 608fa4aadf0e..ad03df1cc266 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -198,7 +198,7 @@ static inline int cpu_to_mem(int cpu)
 #define topology_die_cpumask(cpu)		cpumask_of(cpu)
 #endif

-#ifdef CONFIG_SCHED_SMT
+#if defined(CONFIG_SCHED_SMT) && !defined(cpu_smt_mask)
 static inline const struct cpumask *cpu_smt_mask(int cpu)
 {
 	return topology_sibling_cpumask(cpu);
-- 
2.18.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/2] powerpc/topology: Override cpu_smt_mask
  2020-08-04  3:33 [PATCH 1/2] sched/topology: Allow archs to override cpu_smt_mask Srikar Dronamraju
@ 2020-08-04  3:33 ` Srikar Dronamraju
  2020-08-04 10:46   ` peterz
  2020-08-04 10:45 ` [PATCH 1/2] sched/topology: Allow archs to override cpu_smt_mask peterz
  1 sibling, 1 reply; 13+ messages in thread
From: Srikar Dronamraju @ 2020-08-04  3:33 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra
  Cc: LKML, Mel Gorman, Rik van Riel, Srikar Dronamraju,
	Thomas Gleixner, Valentin Schneider, Vincent Guittot,
	Dietmar Eggemann, linuxppc-dev, Michael Ellerman,
	Michael Neuling, Gautham R Shenoy, Vaidyanathan Srinivasan

On Power9 a pair of cores can be presented by the firmware as a big-core
for backward compatibility reasons, with 4 threads per (small) core and 8
threads per big-core. cpu_smt_mask() should generally point to the cpu mask
of the (small)core.

In order to maintain userspace backward compatibility (with Power8 chips in
case of Power9) in enterprise Linux systems, the topology_sibling_cpumask
has to be set to big-core. Hence override the default cpu_smt_mask() to be
powerpc specific allowing for better scheduling behaviour on Power.

Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
Cc: LKML <linux-kernel@vger.kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Valentin Schneider <valentin.schneider@arm.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/cputhreads.h |  1 -
 arch/powerpc/include/asm/smp.h        | 13 +++++++++++++
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/cputhreads.h b/arch/powerpc/include/asm/cputhreads.h
index deb99fd6e060..98c8bd155bf9 100644
--- a/arch/powerpc/include/asm/cputhreads.h
+++ b/arch/powerpc/include/asm/cputhreads.h
@@ -23,7 +23,6 @@
 extern int threads_per_core;
 extern int threads_per_subcore;
 extern int threads_shift;
-extern bool has_big_cores;
 extern cpumask_t threads_core_mask;
 #else
 #define threads_per_core	1
diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
index 9cd0765633c5..d4bc28accb28 100644
--- a/arch/powerpc/include/asm/smp.h
+++ b/arch/powerpc/include/asm/smp.h
@@ -131,6 +131,19 @@ static inline struct cpumask *cpu_smallcore_mask(int cpu)
 
 extern int cpu_to_core_id(int cpu);
 
+extern bool has_big_cores;
+
+#define cpu_smt_mask cpu_smt_mask
+#ifdef CONFIG_SCHED_SMT
+static inline const struct cpumask *cpu_smt_mask(int cpu)
+{
+	if (has_big_cores)
+		return per_cpu(cpu_smallcore_map, cpu);
+
+	return per_cpu(cpu_sibling_map, cpu);
+}
+#endif /* CONFIG_SCHED_SMT */
+
 /* Since OpenPIC has only 4 IPIs, we use slightly different message numbers.
  *
  * Make sure this matches openpic_request_IPIs in open_pic.c, or what shows up
-- 
2.18.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] sched/topology: Allow archs to override cpu_smt_mask
  2020-08-04  3:33 [PATCH 1/2] sched/topology: Allow archs to override cpu_smt_mask Srikar Dronamraju
  2020-08-04  3:33 ` [PATCH 2/2] powerpc/topology: Override cpu_smt_mask Srikar Dronamraju
@ 2020-08-04 10:45 ` peterz
  2020-08-04 12:10   ` Srikar Dronamraju
  1 sibling, 1 reply; 13+ messages in thread
From: peterz @ 2020-08-04 10:45 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Ingo Molnar, LKML, Mel Gorman, Rik van Riel, Thomas Gleixner,
	Valentin Schneider, Vincent Guittot, Dietmar Eggemann,
	linuxppc-dev, Michael Ellerman, Michael Neuling,
	Gautham R Shenoy, Vaidyanathan Srinivasan

On Tue, Aug 04, 2020 at 09:03:06AM +0530, Srikar Dronamraju wrote:
> cpu_smt_mask tracks topology_sibling_cpumask. This would be good for
> most architectures. One of the users of cpu_smt_mask(), would be to
> identify idle-cores. On Power9, a pair of cores can be presented by the
> firmware as a big-core for backward compatibility reasons.
> 
> In order to maintain userspace backward compatibility with previous
> versions of processor, (since Power8 had SMT8 cores), Power9 onwards there
> is option to the firmware to advertise a pair of SMT4 cores as a fused
> cores (referred to as the big_core mode in the Linux Kernel). On Power9
> this pair shares the L2 cache as well. However, from the scheduler's point
> of view, a core should be determined by SMT4. The load-balancer already
> does this. Hence allow PowerPc architecture to override the default
> cpu_smt_mask() to point to the SMT4 cores in a big_core mode.

I'm utterly confused.

Why can't you set your regular siblings mask to the smt4 thing? Who
cares about the compat stuff, I thought that was an LPAR/AIX thing.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] powerpc/topology: Override cpu_smt_mask
  2020-08-04  3:33 ` [PATCH 2/2] powerpc/topology: Override cpu_smt_mask Srikar Dronamraju
@ 2020-08-04 10:46   ` peterz
  2020-08-04 11:02     ` Valentin Schneider
  0 siblings, 1 reply; 13+ messages in thread
From: peterz @ 2020-08-04 10:46 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Ingo Molnar, LKML, Mel Gorman, Rik van Riel, Thomas Gleixner,
	Valentin Schneider, Vincent Guittot, Dietmar Eggemann,
	linuxppc-dev, Michael Ellerman, Michael Neuling,
	Gautham R Shenoy, Vaidyanathan Srinivasan

On Tue, Aug 04, 2020 at 09:03:07AM +0530, Srikar Dronamraju wrote:
> On Power9 a pair of cores can be presented by the firmware as a big-core
> for backward compatibility reasons, with 4 threads per (small) core and 8
> threads per big-core. cpu_smt_mask() should generally point to the cpu mask
> of the (small)core.
> 
> In order to maintain userspace backward compatibility (with Power8 chips in
> case of Power9) in enterprise Linux systems, the topology_sibling_cpumask
> has to be set to big-core. Hence override the default cpu_smt_mask() to be
> powerpc specific allowing for better scheduling behaviour on Power.

Why does Linux userspace care about this?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] powerpc/topology: Override cpu_smt_mask
  2020-08-04 10:46   ` peterz
@ 2020-08-04 11:02     ` Valentin Schneider
  0 siblings, 0 replies; 13+ messages in thread
From: Valentin Schneider @ 2020-08-04 11:02 UTC (permalink / raw)
  To: peterz
  Cc: Srikar Dronamraju, Ingo Molnar, LKML, Mel Gorman, Rik van Riel,
	Thomas Gleixner, Vincent Guittot, Dietmar Eggemann, linuxppc-dev,
	Michael Ellerman, Michael Neuling, Gautham R Shenoy,
	Vaidyanathan Srinivasan


On 04/08/20 11:46, peterz@infradead.org wrote:
> On Tue, Aug 04, 2020 at 09:03:07AM +0530, Srikar Dronamraju wrote:
>> On Power9 a pair of cores can be presented by the firmware as a big-core
>> for backward compatibility reasons, with 4 threads per (small) core and 8
>> threads per big-core. cpu_smt_mask() should generally point to the cpu mask
>> of the (small)core.
>>
>> In order to maintain userspace backward compatibility (with Power8 chips in
>> case of Power9) in enterprise Linux systems, the topology_sibling_cpumask
>> has to be set to big-core. Hence override the default cpu_smt_mask() to be
>> powerpc specific allowing for better scheduling behaviour on Power.
>
> Why does Linux userspace care about this?

Ditto; from [1], a core contains CPUs that all share the same L1 (and capacity,
as per SD_SHARE_CPUCAPACITY). So IMO it makes perfect sense to have a first
domain spanning L1, and its parent spanning L2 - that means
topology_sibling_cpumask *itself* should span a single core rather than a
pair.

[1]: https://lkml.kernel.org/r/jhjr1sviswg.mognet@arm.com

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] sched/topology: Allow archs to override cpu_smt_mask
  2020-08-04 10:45 ` [PATCH 1/2] sched/topology: Allow archs to override cpu_smt_mask peterz
@ 2020-08-04 12:10   ` Srikar Dronamraju
  2020-08-04 12:47     ` peterz
  0 siblings, 1 reply; 13+ messages in thread
From: Srikar Dronamraju @ 2020-08-04 12:10 UTC (permalink / raw)
  To: peterz
  Cc: Ingo Molnar, LKML, Mel Gorman, Rik van Riel, Thomas Gleixner,
	Valentin Schneider, Vincent Guittot, Dietmar Eggemann,
	linuxppc-dev, Michael Ellerman, Michael Neuling,
	Gautham R Shenoy, Vaidyanathan Srinivasan

* peterz@infradead.org <peterz@infradead.org> [2020-08-04 12:45:20]:

> On Tue, Aug 04, 2020 at 09:03:06AM +0530, Srikar Dronamraju wrote:
> > cpu_smt_mask tracks topology_sibling_cpumask. This would be good for
> > most architectures. One of the users of cpu_smt_mask(), would be to
> > identify idle-cores. On Power9, a pair of cores can be presented by the
> > firmware as a big-core for backward compatibility reasons.
> > 
> > In order to maintain userspace backward compatibility with previous
> > versions of processor, (since Power8 had SMT8 cores), Power9 onwards there
> > is option to the firmware to advertise a pair of SMT4 cores as a fused
> > cores (referred to as the big_core mode in the Linux Kernel). On Power9
> > this pair shares the L2 cache as well. However, from the scheduler's point
> > of view, a core should be determined by SMT4. The load-balancer already
> > does this. Hence allow PowerPc architecture to override the default
> > cpu_smt_mask() to point to the SMT4 cores in a big_core mode.
> 
> I'm utterly confused.
> 
> Why can't you set your regular siblings mask to the smt4 thing? Who
> cares about the compat stuff, I thought that was an LPAR/AIX thing.

There are no technical challenges to set the sibling mask to SMT4.
This is for Linux running on PowerVM. When these Power9 boxes are sold /
marketed as X core boxes (where X stand for SMT8 cores).  Since on PowerVM
world, everything is in SMT8 mode, the device tree properties still mark
the system to be running on 8 thread core. There are a number of utilities
like ppc64_cpu that directly read from device-tree. They would get core
count and thread count which is SMT8 based.

If the sibling_mask is set to small core, then same user when looking at
output from lscpu and other utilities that look at sysfs will start seeing
2x number of cores to what he had provisioned and what the utilities from
the device-tree show. This can gets users confused.

So to keep the device-tree properties, utilities depending on device-tree,
sysfs and utilities depending on sysfs on the same page, userspace are only
exposed as SMT8.

-- 
Thanks and Regards
Srikar Dronamraju

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] sched/topology: Allow archs to override cpu_smt_mask
  2020-08-04 12:10   ` Srikar Dronamraju
@ 2020-08-04 12:47     ` peterz
  2020-08-06  5:32       ` Michael Ellerman
  0 siblings, 1 reply; 13+ messages in thread
From: peterz @ 2020-08-04 12:47 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Ingo Molnar, LKML, Mel Gorman, Rik van Riel, Thomas Gleixner,
	Valentin Schneider, Vincent Guittot, Dietmar Eggemann,
	linuxppc-dev, Michael Ellerman, Michael Neuling,
	Gautham R Shenoy, Vaidyanathan Srinivasan

On Tue, Aug 04, 2020 at 05:40:07PM +0530, Srikar Dronamraju wrote:
> * peterz@infradead.org <peterz@infradead.org> [2020-08-04 12:45:20]:
> 
> > On Tue, Aug 04, 2020 at 09:03:06AM +0530, Srikar Dronamraju wrote:
> > > cpu_smt_mask tracks topology_sibling_cpumask. This would be good for
> > > most architectures. One of the users of cpu_smt_mask(), would be to
> > > identify idle-cores. On Power9, a pair of cores can be presented by the
> > > firmware as a big-core for backward compatibility reasons.
> > > 
> > > In order to maintain userspace backward compatibility with previous
> > > versions of processor, (since Power8 had SMT8 cores), Power9 onwards there
> > > is option to the firmware to advertise a pair of SMT4 cores as a fused
> > > cores (referred to as the big_core mode in the Linux Kernel). On Power9
> > > this pair shares the L2 cache as well. However, from the scheduler's point
> > > of view, a core should be determined by SMT4. The load-balancer already
> > > does this. Hence allow PowerPc architecture to override the default
> > > cpu_smt_mask() to point to the SMT4 cores in a big_core mode.
> > 
> > I'm utterly confused.
> > 
> > Why can't you set your regular siblings mask to the smt4 thing? Who
> > cares about the compat stuff, I thought that was an LPAR/AIX thing.
> 
> There are no technical challenges to set the sibling mask to SMT4.
> This is for Linux running on PowerVM. When these Power9 boxes are sold /
> marketed as X core boxes (where X stand for SMT8 cores).  Since on PowerVM
> world, everything is in SMT8 mode, the device tree properties still mark
> the system to be running on 8 thread core. There are a number of utilities
> like ppc64_cpu that directly read from device-tree. They would get core
> count and thread count which is SMT8 based.
> 
> If the sibling_mask is set to small core, then same user when looking at

FWIW, I find the small/big core naming utterly confusing vs the
big/little naming from ARM. When you say small, I'm thinking of
asymmetric cores, not SMT4/SMT8.

> output from lscpu and other utilities that look at sysfs will start seeing
> 2x number of cores to what he had provisioned and what the utilities from
> the device-tree show. This can gets users confused.

One will report SMT8 and the other SMT4, right? So only users that
cannot read will be confused, but if you can't read, confusion is
guaranteed anyway.

Also, by exposing the true (SMT4) topology to userspace, userspace
applications could behave better -- for those few that actually parse
the topology information.

> So to keep the device-tree properties, utilities depending on device-tree,
> sysfs and utilities depending on sysfs on the same page, userspace are only
> exposed as SMT8.

I'm not convinced it makes sense to lie to userspace just to accomodate
a few users that cannot read.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] sched/topology: Allow archs to override cpu_smt_mask
  2020-08-04 12:47     ` peterz
@ 2020-08-06  5:32       ` Michael Ellerman
  2020-08-06  8:54         ` peterz
  0 siblings, 1 reply; 13+ messages in thread
From: Michael Ellerman @ 2020-08-06  5:32 UTC (permalink / raw)
  To: peterz, Srikar Dronamraju
  Cc: Ingo Molnar, LKML, Mel Gorman, Rik van Riel, Thomas Gleixner,
	Valentin Schneider, Vincent Guittot, Dietmar Eggemann,
	linuxppc-dev, Michael Neuling, Gautham R Shenoy,
	Vaidyanathan Srinivasan

peterz@infradead.org writes:
> On Tue, Aug 04, 2020 at 05:40:07PM +0530, Srikar Dronamraju wrote:
>> * peterz@infradead.org <peterz@infradead.org> [2020-08-04 12:45:20]:
>> 
>> > On Tue, Aug 04, 2020 at 09:03:06AM +0530, Srikar Dronamraju wrote:
>> > > cpu_smt_mask tracks topology_sibling_cpumask. This would be good for
>> > > most architectures. One of the users of cpu_smt_mask(), would be to
>> > > identify idle-cores. On Power9, a pair of cores can be presented by the
>> > > firmware as a big-core for backward compatibility reasons.
>> > > 
>> > > In order to maintain userspace backward compatibility with previous
>> > > versions of processor, (since Power8 had SMT8 cores), Power9 onwards there
>> > > is option to the firmware to advertise a pair of SMT4 cores as a fused
>> > > cores (referred to as the big_core mode in the Linux Kernel). On Power9
>> > > this pair shares the L2 cache as well. However, from the scheduler's point
>> > > of view, a core should be determined by SMT4. The load-balancer already
>> > > does this. Hence allow PowerPc architecture to override the default
>> > > cpu_smt_mask() to point to the SMT4 cores in a big_core mode.
>> > 
>> > I'm utterly confused.
>> > 
>> > Why can't you set your regular siblings mask to the smt4 thing? Who
>> > cares about the compat stuff, I thought that was an LPAR/AIX thing.

To be clear this stuff is all for when we're running on the PowerVM machines,
ie. as LPARs.

That brings with it a bunch of problems, such as existing software that
has been developed/configured for Power8 and expects to see SMT8.

We also allow LPARs to be live migrated from Power8 to Power9 (and back), so
maintaining the illusion of SMT8 is considered a requirement to make that work.

>> There are no technical challenges to set the sibling mask to SMT4.
>> This is for Linux running on PowerVM. When these Power9 boxes are sold /
>> marketed as X core boxes (where X stand for SMT8 cores).  Since on PowerVM
>> world, everything is in SMT8 mode, the device tree properties still mark
>> the system to be running on 8 thread core. There are a number of utilities
>> like ppc64_cpu that directly read from device-tree. They would get core
>> count and thread count which is SMT8 based.
>> 
>> If the sibling_mask is set to small core, then same user when looking at
>
> FWIW, I find the small/big core naming utterly confusing vs the
> big/little naming from ARM. When you say small, I'm thinking of
> asymmetric cores, not SMT4/SMT8.

Yeah I agree the naming is confusing.

Let's call them "SMT4 cores" and "SMT8 cores"?

>> output from lscpu and other utilities that look at sysfs will start seeing
>> 2x number of cores to what he had provisioned and what the utilities from
>> the device-tree show. This can gets users confused.
>
> One will report SMT8 and the other SMT4, right? So only users that
> cannot read will be confused, but if you can't read, confusion is
> guaranteed anyway.

It's partly users, but also software that would see different values depending
on where it looks.

> Also, by exposing the true (SMT4) topology to userspace, userspace
> applications could behave better -- for those few that actually parse
> the topology information.

Agreed, though as you say there aren't that many that actually use the low-level
topology information.

>> So to keep the device-tree properties, utilities depending on device-tree,
>> sysfs and utilities depending on sysfs on the same page, userspace are only
>> exposed as SMT8.
>
> I'm not convinced it makes sense to lie to userspace just to accomodate
> a few users that cannot read.

The problem is we are already lying to userspace, because firmware lies to us.

ie. the firmware on these systems shows us an SMT8 core, and so current kernels
show SMT8 to userspace. I don't think we can realistically change that fact now,
as these systems are already out in the field.

What this patch tries to do is undo some of the mess, and at least give the
scheduler the right information.

cheers

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] sched/topology: Allow archs to override cpu_smt_mask
  2020-08-06  5:32       ` Michael Ellerman
@ 2020-08-06  8:54         ` peterz
  2020-08-06 12:25           ` Michael Ellerman
  2020-08-06 12:53           ` Srikar Dronamraju
  0 siblings, 2 replies; 13+ messages in thread
From: peterz @ 2020-08-06  8:54 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Srikar Dronamraju, Ingo Molnar, LKML, Mel Gorman, Rik van Riel,
	Thomas Gleixner, Valentin Schneider, Vincent Guittot,
	Dietmar Eggemann, linuxppc-dev, Michael Neuling,
	Gautham R Shenoy, Vaidyanathan Srinivasan

On Thu, Aug 06, 2020 at 03:32:25PM +1000, Michael Ellerman wrote:

> That brings with it a bunch of problems, such as existing software that
> has been developed/configured for Power8 and expects to see SMT8.
> 
> We also allow LPARs to be live migrated from Power8 to Power9 (and back), so
> maintaining the illusion of SMT8 is considered a requirement to make that work.

So how does that work if the kernel booted on P9 and demuxed the SMT8
into 2xSMT4? If you migrate that state onto a P8 with actual SMT8 you're
toast again.

> Yeah I agree the naming is confusing.
> 
> Let's call them "SMT4 cores" and "SMT8 cores"?

Works for me, thanks!

> The problem is we are already lying to userspace, because firmware lies to us.
> 
> ie. the firmware on these systems shows us an SMT8 core, and so current kernels
> show SMT8 to userspace. I don't think we can realistically change that fact now,
> as these systems are already out in the field.
> 
> What this patch tries to do is undo some of the mess, and at least give the
> scheduler the right information.

What a mess... I think it depends on what you do with that P9 to P8
migration case. Does it make sense to have a "p8_compat" boot arg for
the case where you want LPAR migration back onto P8 systems -- in which
case it simply takes the firmware's word as gospel and doesn't untangle
things, because it can actually land on a P8.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] sched/topology: Allow archs to override cpu_smt_mask
  2020-08-06  8:54         ` peterz
@ 2020-08-06 12:25           ` Michael Ellerman
  2020-08-06 13:15             ` peterz
  2020-08-06 12:53           ` Srikar Dronamraju
  1 sibling, 1 reply; 13+ messages in thread
From: Michael Ellerman @ 2020-08-06 12:25 UTC (permalink / raw)
  To: peterz
  Cc: Srikar Dronamraju, Ingo Molnar, LKML, Mel Gorman, Rik van Riel,
	Thomas Gleixner, Valentin Schneider, Vincent Guittot,
	Dietmar Eggemann, linuxppc-dev, Michael Neuling,
	Gautham R Shenoy, Vaidyanathan Srinivasan

peterz@infradead.org writes:
> On Thu, Aug 06, 2020 at 03:32:25PM +1000, Michael Ellerman wrote:
>
>> That brings with it a bunch of problems, such as existing software that
>> has been developed/configured for Power8 and expects to see SMT8.
>> 
>> We also allow LPARs to be live migrated from Power8 to Power9 (and back), so
>> maintaining the illusion of SMT8 is considered a requirement to make that work.
>
> So how does that work if the kernel booted on P9 and demuxed the SMT8
> into 2xSMT4? If you migrate that state onto a P8 with actual SMT8 you're
> toast again.

The SMT mask would be inaccurate on the P8, rather than the current case
where it's inaccurate on the P9.

Which would be our preference, because the backward migration case is
not common AIUI.

Or am I missing a reason we'd be even more toast than that?

Under PowerVM the kernel does know it's being migrated, so we could
actually update the mask, but I'm not sure if that's really feasible.

>> Yeah I agree the naming is confusing.
>> 
>> Let's call them "SMT4 cores" and "SMT8 cores"?
>
> Works for me, thanks!
>
>> The problem is we are already lying to userspace, because firmware lies to us.
>> 
>> ie. the firmware on these systems shows us an SMT8 core, and so current kernels
>> show SMT8 to userspace. I don't think we can realistically change that fact now,
>> as these systems are already out in the field.
>> 
>> What this patch tries to do is undo some of the mess, and at least give the
>> scheduler the right information.
>
> What a mess... I think it depends on what you do with that P9 to P8
> migration case. Does it make sense to have a "p8_compat" boot arg for
> the case where you want LPAR migration back onto P8 systems -- in which
> case it simply takes the firmware's word as gospel and doesn't untangle
> things, because it can actually land on a P8.

We already get told by firmware that we're running in "p8 compat" mode,
because we have to pretend to userspace that it's running on a P8. So we
could use that as a signal to leave things alone.

But my understanding is most LPARs don't get migrated back and forth,
they'll start life on a P8 and only get migrated to a P9 once when the
customer gets a P9. They might then run for a long time (months to
years) on the P9 in P8 compat mode, not because they ever want to
migrate back to a real P8, but because the software in the LPAR is still
expecting to be on a P8.

I'm not a real expert on all the Enterprisey stuff though, so someone
else might be able to give us a better picture.

But the point of mentioning the migration stuff was mainly just to
explain why we feel we need to present SMT8 to userspace even on P9.

cheers

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] sched/topology: Allow archs to override cpu_smt_mask
  2020-08-06  8:54         ` peterz
  2020-08-06 12:25           ` Michael Ellerman
@ 2020-08-06 12:53           ` Srikar Dronamraju
  1 sibling, 0 replies; 13+ messages in thread
From: Srikar Dronamraju @ 2020-08-06 12:53 UTC (permalink / raw)
  To: peterz
  Cc: Michael Ellerman, Ingo Molnar, LKML, Mel Gorman, Rik van Riel,
	Thomas Gleixner, Valentin Schneider, Vincent Guittot,
	Dietmar Eggemann, linuxppc-dev, Michael Neuling,
	Gautham R Shenoy, Vaidyanathan Srinivasan

* peterz@infradead.org <peterz@infradead.org> [2020-08-06 10:54:29]:

> On Thu, Aug 06, 2020 at 03:32:25PM +1000, Michael Ellerman wrote:
> 
> > That brings with it a bunch of problems, such as existing software that
> > has been developed/configured for Power8 and expects to see SMT8.
> > 
> > We also allow LPARs to be live migrated from Power8 to Power9 (and back), so
> > maintaining the illusion of SMT8 is considered a requirement to make that work.
> 
> So how does that work if the kernel booted on P9 and demuxed the SMT8
> into 2xSMT4? If you migrate that state onto a P8 with actual SMT8 you're
> toast again.
> 

To add to what Michael already said, the reason we don't expose the demux of
SMT8 into 2xSMT4 to userspace, is to make the userspace believe they are on
a SMT8. When the kernel is live migrated from P8 to P9, till the time of reboot
they would only have the older P8 topology. After reboot the kernel topology
would change, but the userspace is made to believe that they are running on
SMT8 core by way of keeping the sibling_cpumask at SMT8 core level.

-- 
Thanks and Regards
Srikar Dronamraju

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] sched/topology: Allow archs to override cpu_smt_mask
  2020-08-06 12:25           ` Michael Ellerman
@ 2020-08-06 13:15             ` peterz
  2020-08-06 14:09               ` Srikar Dronamraju
  0 siblings, 1 reply; 13+ messages in thread
From: peterz @ 2020-08-06 13:15 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Srikar Dronamraju, Ingo Molnar, LKML, Mel Gorman, Rik van Riel,
	Thomas Gleixner, Valentin Schneider, Vincent Guittot,
	Dietmar Eggemann, linuxppc-dev, Michael Neuling,
	Gautham R Shenoy, Vaidyanathan Srinivasan

On Thu, Aug 06, 2020 at 10:25:12PM +1000, Michael Ellerman wrote:
> peterz@infradead.org writes:
> > On Thu, Aug 06, 2020 at 03:32:25PM +1000, Michael Ellerman wrote:
> >
> >> That brings with it a bunch of problems, such as existing software that
> >> has been developed/configured for Power8 and expects to see SMT8.
> >> 
> >> We also allow LPARs to be live migrated from Power8 to Power9 (and back), so
> >> maintaining the illusion of SMT8 is considered a requirement to make that work.
> >
> > So how does that work if the kernel booted on P9 and demuxed the SMT8
> > into 2xSMT4? If you migrate that state onto a P8 with actual SMT8 you're
> > toast again.
> 
> The SMT mask would be inaccurate on the P8, rather than the current case
> where it's inaccurate on the P9.
> 
> Which would be our preference, because the backward migration case is
> not common AIUI.
> 
> Or am I missing a reason we'd be even more toast than that?

Well, the scheduler might do a wee bit funny. We just had a patch that
increase load-balancing opportunities between SMT siblings because they
all share L1 anyway.

But yeah, nothing terminal.

> Under PowerVM the kernel does know it's being migrated, so we could
> actually update the mask, but I'm not sure if that's really feasible.

As long as you get a notification, rebuilding the sched domains isn't
terribly hard to do, there's more code that does that.

> >> Yeah I agree the naming is confusing.
> >> 
> >> Let's call them "SMT4 cores" and "SMT8 cores"?
> >
> > Works for me, thanks!
> >
> >> The problem is we are already lying to userspace, because firmware lies to us.
> >> 
> >> ie. the firmware on these systems shows us an SMT8 core, and so current kernels
> >> show SMT8 to userspace. I don't think we can realistically change that fact now,
> >> as these systems are already out in the field.
> >> 
> >> What this patch tries to do is undo some of the mess, and at least give the
> >> scheduler the right information.
> >
> > What a mess... I think it depends on what you do with that P9 to P8
> > migration case. Does it make sense to have a "p8_compat" boot arg for
> > the case where you want LPAR migration back onto P8 systems -- in which
> > case it simply takes the firmware's word as gospel and doesn't untangle
> > things, because it can actually land on a P8.
> 
> We already get told by firmware that we're running in "p8 compat" mode,
> because we have to pretend to userspace that it's running on a P8. So we
> could use that as a signal to leave things alone.
> 
> But my understanding is most LPARs don't get migrated back and forth,
> they'll start life on a P8 and only get migrated to a P9 once when the
> customer gets a P9. They might then run for a long time (months to
> years) on the P9 in P8 compat mode, not because they ever want to
> migrate back to a real P8, but because the software in the LPAR is still
> expecting to be on a P8.
> 
> I'm not a real expert on all the Enterprisey stuff though, so someone
> else might be able to give us a better picture.
> 
> But the point of mentioning the migration stuff was mainly just to
> explain why we feel we need to present SMT8 to userspace even on P9.

OK, fair enough. The patch wasn't particularly onerous, I was just
wondering why etc..

The case of starting on a P8 and being migrated to a P9 makes sense to
me; in that case you'd like to rebuild your sched domains, but can't go
about changing user visible topolofy information.

I suppose:

Acked-by; Peter Zijlstra (Intel) <peterz@infradead.org>

An updated Changelog that recaps some of this discussion might also be
nice.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] sched/topology: Allow archs to override cpu_smt_mask
  2020-08-06 13:15             ` peterz
@ 2020-08-06 14:09               ` Srikar Dronamraju
  0 siblings, 0 replies; 13+ messages in thread
From: Srikar Dronamraju @ 2020-08-06 14:09 UTC (permalink / raw)
  To: peterz
  Cc: Michael Ellerman, Ingo Molnar, LKML, Mel Gorman, Rik van Riel,
	Thomas Gleixner, Valentin Schneider, Vincent Guittot,
	Dietmar Eggemann, linuxppc-dev, Michael Neuling,
	Gautham R Shenoy, Vaidyanathan Srinivasan

* peterz@infradead.org <peterz@infradead.org> [2020-08-06 15:15:47]:

> > But my understanding is most LPARs don't get migrated back and forth,
> > they'll start life on a P8 and only get migrated to a P9 once when the
> > customer gets a P9. They might then run for a long time (months to
> > years) on the P9 in P8 compat mode, not because they ever want to
> > migrate back to a real P8, but because the software in the LPAR is still
> > expecting to be on a P8.
> > 
> > I'm not a real expert on all the Enterprisey stuff though, so someone
> > else might be able to give us a better picture.
> > 
> > But the point of mentioning the migration stuff was mainly just to
> > explain why we feel we need to present SMT8 to userspace even on P9.
> 
> OK, fair enough. The patch wasn't particularly onerous, I was just
> wondering why etc..
> 
> The case of starting on a P8 and being migrated to a P9 makes sense to
> me; in that case you'd like to rebuild your sched domains, but can't go
> about changing user visible topolofy information.
> 
> I suppose:
> 
> Acked-by; Peter Zijlstra (Intel) <peterz@infradead.org>
> 
> An updated Changelog that recaps some of this discussion might also be
> nice.

Okay, will surely do the needful.

-- 
Thanks and Regards
Srikar Dronamraju

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2020-08-06 17:29 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-04  3:33 [PATCH 1/2] sched/topology: Allow archs to override cpu_smt_mask Srikar Dronamraju
2020-08-04  3:33 ` [PATCH 2/2] powerpc/topology: Override cpu_smt_mask Srikar Dronamraju
2020-08-04 10:46   ` peterz
2020-08-04 11:02     ` Valentin Schneider
2020-08-04 10:45 ` [PATCH 1/2] sched/topology: Allow archs to override cpu_smt_mask peterz
2020-08-04 12:10   ` Srikar Dronamraju
2020-08-04 12:47     ` peterz
2020-08-06  5:32       ` Michael Ellerman
2020-08-06  8:54         ` peterz
2020-08-06 12:25           ` Michael Ellerman
2020-08-06 13:15             ` peterz
2020-08-06 14:09               ` Srikar Dronamraju
2020-08-06 12:53           ` Srikar Dronamraju

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).