linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/1] arm64: smp: Skip MC sched domain on SoCs with no LLC
@ 2022-03-01  0:28 Darren Hart
  2022-03-01  0:29 ` [PATCH 1/1] " Darren Hart
  0 siblings, 1 reply; 9+ messages in thread
From: Darren Hart @ 2022-03-01  0:28 UTC (permalink / raw)
  To: LKML, Linux Arm
  Cc: Catalin Marinas, Will Deacon, Peter Zijlstra, Vincent Guittot,
	Barry Song, Valentin Schneider, D . Scott Phillips,
	Ilkka Koskinen

After the feedback from Vincent and Barry, I wrote a version of this which
performed the test in topology.c and added an ACPI pptt API to detect ancestry
of cpu topology nodes. In the end, the new API felt forced and was inconsistent
with the other APIs, and the detection loop was pretty ugly with a lot corner
case cleanup. It seemed fragile and not very extensible.

This version follows the approach taken by other architectures: wait until smp
cpus are online and take advantage of the stored cpu_topology structure to make
sched domain topology decisions.

This adds a single point to test and decide on which sched domain topology
should be used for an arm64 system without impacting the scheduler code, acpi
code, or the construction of the cpu_topology structures. I think this is a
cleaner and less invasive solution which properly encapsulates the sched domain
topology decision in a way that is easy to maintain and extend over time.

Vincent did mention in an earlier version that by relying on cpumasks and
removing the MC sched domain, I didn't address possible future architectures
which may have clusters without shared cache and an MC level with shared cache.
In this version, I limit the removal of the MC level to topologies where the
coregroup weight is 1 (so it doesn't impact such future architectures). I also
think that by encapsulating the sched domain topology decision into a single
function after smp boot, it will be simple enough for someone to support such a
topology if and when it exists, allowing us to create a minimal solution to the
immediate problem of BUGing on the current topology.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Barry Song <song.bao.hua@hisilicon.com>
Cc: Valentin Schneider <valentin.schneider@arm.com>
Cc: D. Scott Phillips <scott@os.amperecomputing.com>
Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>

Darren Hart (1):
  arm64: smp: Skip MC sched domain on SoCs with no LLC

 arch/arm64/kernel/smp.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

-- 
2.31.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/1] arm64: smp: Skip MC sched domain on SoCs with no LLC
  2022-03-01  0:28 [PATCH 0/1] arm64: smp: Skip MC sched domain on SoCs with no LLC Darren Hart
@ 2022-03-01  0:29 ` Darren Hart
  2022-03-02  9:32   ` Vincent Guittot
  0 siblings, 1 reply; 9+ messages in thread
From: Darren Hart @ 2022-03-01  0:29 UTC (permalink / raw)
  To: LKML, Linux Arm
  Cc: Catalin Marinas, Will Deacon, Peter Zijlstra, Vincent Guittot,
	Barry Song, Valentin Schneider, D . Scott Phillips,
	Ilkka Koskinen, stable

Ampere Altra defines CPU clusters in the ACPI PPTT. They share a Snoop
Control Unit, but have no shared CPU-side last level cache.

cpu_coregroup_mask() will return a cpumask with weight 1, while
cpu_clustergroup_mask() will return a cpumask with weight 2.

As a result, build_sched_domain() will BUG() once per CPU with:

BUG: arch topology borken
     the CLS domain not a subset of the MC domain

The MC level cpumask is then extended to that of the CLS child, and is
later removed entirely as redundant. This sched domain topology is an
improvement over previous topologies, or those built without
SCHED_CLUSTER, particularly for certain latency sensitive workloads.
With the current scheduler model and heuristics, this is a desirable
default topology for Ampere Altra and Altra Max system.

Introduce an alternate sched domain topology for arm64 without the MC
level and test for llc_sibling weight 1 across all CPUs to enable it.

Do this in arch/arm64/kernel/smp.c (as opposed to
arch/arm64/kernel/topology.c) as all the CPU sibling maps are now
populated and we avoid needing to extend the drivers/acpi/pptt.c API to
detect the cluster level being above the cpu llc level. This is
consistent with other architectures and provides a readily extensible
mechanism for other alternate topologies.

The final sched domain topology for a 2 socket Ampere Altra system is
unchanged with or without CONFIG_SCHED_CLUSTER, and the BUG is avoided:

For CPU0:

CONFIG_SCHED_CLUSTER=y
CLS  [0-1]
DIE  [0-79]
NUMA [0-159]

CONFIG_SCHED_CLUSTER is not set
DIE  [0-79]
NUMA [0-159]

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Barry Song <song.bao.hua@hisilicon.com>
Cc: Valentin Schneider <valentin.schneider@arm.com>
Cc: D. Scott Phillips <scott@os.amperecomputing.com>
Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Cc: <stable@vger.kernel.org> # 5.16.x
Signed-off-by: Darren Hart <darren@os.amperecomputing.com>
---
 arch/arm64/kernel/smp.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 27df5c1e6baa..3597e75645e1 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -433,6 +433,33 @@ static void __init hyp_mode_check(void)
 	}
 }
 
+static struct sched_domain_topology_level arm64_no_mc_topology[] = {
+#ifdef CONFIG_SCHED_SMT
+	{ cpu_smt_mask, cpu_smt_flags, SD_INIT_NAME(SMT) },
+#endif
+
+#ifdef CONFIG_SCHED_CLUSTER
+	{ cpu_clustergroup_mask, cpu_cluster_flags, SD_INIT_NAME(CLS) },
+#endif
+
+	{ cpu_cpu_mask, SD_INIT_NAME(DIE) },
+	{ NULL, },
+};
+
+static void __init update_sched_domain_topology(void)
+{
+	int cpu;
+
+	for_each_possible_cpu(cpu) {
+		if (cpu_topology[cpu].llc_id != -1 &&
+		    cpumask_weight(&cpu_topology[cpu].llc_sibling) > 1)
+			return;
+	}
+
+	pr_info("No LLC siblings, using No MC sched domains topology\n");
+	set_sched_topology(arm64_no_mc_topology);
+}
+
 void __init smp_cpus_done(unsigned int max_cpus)
 {
 	pr_info("SMP: Total of %d processors activated.\n", num_online_cpus());
@@ -440,6 +467,7 @@ void __init smp_cpus_done(unsigned int max_cpus)
 	hyp_mode_check();
 	apply_alternatives_all();
 	mark_linear_text_alias_ro();
+	update_sched_domain_topology();
 }
 
 void __init smp_prepare_boot_cpu(void)
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/1] arm64: smp: Skip MC sched domain on SoCs with no LLC
  2022-03-01  0:29 ` [PATCH 1/1] " Darren Hart
@ 2022-03-02  9:32   ` Vincent Guittot
  2022-03-03  2:18     ` Darren Hart
  0 siblings, 1 reply; 9+ messages in thread
From: Vincent Guittot @ 2022-03-02  9:32 UTC (permalink / raw)
  To: Darren Hart
  Cc: LKML, Linux Arm, Catalin Marinas, Will Deacon, Peter Zijlstra,
	Barry Song, Valentin Schneider, D . Scott Phillips,
	Ilkka Koskinen, stable

On Tue, 1 Mar 2022 at 01:35, Darren Hart <darren@os.amperecomputing.com> wrote:
>
> Ampere Altra defines CPU clusters in the ACPI PPTT. They share a Snoop
> Control Unit, but have no shared CPU-side last level cache.
>
> cpu_coregroup_mask() will return a cpumask with weight 1, while
> cpu_clustergroup_mask() will return a cpumask with weight 2.
>
> As a result, build_sched_domain() will BUG() once per CPU with:
>
> BUG: arch topology borken
>      the CLS domain not a subset of the MC domain
>
> The MC level cpumask is then extended to that of the CLS child, and is
> later removed entirely as redundant. This sched domain topology is an
> improvement over previous topologies, or those built without
> SCHED_CLUSTER, particularly for certain latency sensitive workloads.
> With the current scheduler model and heuristics, this is a desirable
> default topology for Ampere Altra and Altra Max system.
>
> Introduce an alternate sched domain topology for arm64 without the MC
> level and test for llc_sibling weight 1 across all CPUs to enable it.
>
> Do this in arch/arm64/kernel/smp.c (as opposed to
> arch/arm64/kernel/topology.c) as all the CPU sibling maps are now
> populated and we avoid needing to extend the drivers/acpi/pptt.c API to
> detect the cluster level being above the cpu llc level. This is
> consistent with other architectures and provides a readily extensible
> mechanism for other alternate topologies.
>
> The final sched domain topology for a 2 socket Ampere Altra system is
> unchanged with or without CONFIG_SCHED_CLUSTER, and the BUG is avoided:
>
> For CPU0:
>
> CONFIG_SCHED_CLUSTER=y
> CLS  [0-1]
> DIE  [0-79]
> NUMA [0-159]
>
> CONFIG_SCHED_CLUSTER is not set
> DIE  [0-79]
> NUMA [0-159]
>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Vincent Guittot <vincent.guittot@linaro.org>
> Cc: Barry Song <song.bao.hua@hisilicon.com>
> Cc: Valentin Schneider <valentin.schneider@arm.com>
> Cc: D. Scott Phillips <scott@os.amperecomputing.com>
> Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
> Cc: <stable@vger.kernel.org> # 5.16.x
> Signed-off-by: Darren Hart <darren@os.amperecomputing.com>
> ---
>  arch/arm64/kernel/smp.c | 28 ++++++++++++++++++++++++++++
>  1 file changed, 28 insertions(+)
>
> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index 27df5c1e6baa..3597e75645e1 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -433,6 +433,33 @@ static void __init hyp_mode_check(void)
>         }
>  }
>
> +static struct sched_domain_topology_level arm64_no_mc_topology[] = {
> +#ifdef CONFIG_SCHED_SMT
> +       { cpu_smt_mask, cpu_smt_flags, SD_INIT_NAME(SMT) },
> +#endif
> +
> +#ifdef CONFIG_SCHED_CLUSTER
> +       { cpu_clustergroup_mask, cpu_cluster_flags, SD_INIT_NAME(CLS) },
> +#endif
> +
> +       { cpu_cpu_mask, SD_INIT_NAME(DIE) },
> +       { NULL, },
> +};
> +
> +static void __init update_sched_domain_topology(void)
> +{
> +       int cpu;
> +
> +       for_each_possible_cpu(cpu) {
> +               if (cpu_topology[cpu].llc_id != -1 &&

Have you tested it with a non-acpi system ? AFAICT, llc_id is only set
by ACPI system and  llc_id == -1 for others like DT based system

> +                   cpumask_weight(&cpu_topology[cpu].llc_sibling) > 1)
> +                       return;
> +       }
> +
> +       pr_info("No LLC siblings, using No MC sched domains topology\n");
> +       set_sched_topology(arm64_no_mc_topology);
> +}
> +
>  void __init smp_cpus_done(unsigned int max_cpus)
>  {
>         pr_info("SMP: Total of %d processors activated.\n", num_online_cpus());
> @@ -440,6 +467,7 @@ void __init smp_cpus_done(unsigned int max_cpus)
>         hyp_mode_check();
>         apply_alternatives_all();
>         mark_linear_text_alias_ro();
> +       update_sched_domain_topology();
>  }
>
>  void __init smp_prepare_boot_cpu(void)
> --
> 2.31.1
>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/1] arm64: smp: Skip MC sched domain on SoCs with no LLC
  2022-03-02  9:32   ` Vincent Guittot
@ 2022-03-03  2:18     ` Darren Hart
  2022-03-03  5:36       ` Barry Song
  2022-03-03  8:08       ` Vincent Guittot
  0 siblings, 2 replies; 9+ messages in thread
From: Darren Hart @ 2022-03-03  2:18 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: LKML, Linux Arm, Catalin Marinas, Will Deacon, Peter Zijlstra,
	Barry Song, Valentin Schneider, D . Scott Phillips,
	Ilkka Koskinen, stable

On Wed, Mar 02, 2022 at 10:32:06AM +0100, Vincent Guittot wrote:
> On Tue, 1 Mar 2022 at 01:35, Darren Hart <darren@os.amperecomputing.com> wrote:
> >
> > Ampere Altra defines CPU clusters in the ACPI PPTT. They share a Snoop
> > Control Unit, but have no shared CPU-side last level cache.
> >
> > cpu_coregroup_mask() will return a cpumask with weight 1, while
> > cpu_clustergroup_mask() will return a cpumask with weight 2.
> >
> > As a result, build_sched_domain() will BUG() once per CPU with:
> >
> > BUG: arch topology borken
> >      the CLS domain not a subset of the MC domain
> >
> > The MC level cpumask is then extended to that of the CLS child, and is
> > later removed entirely as redundant. This sched domain topology is an
> > improvement over previous topologies, or those built without
> > SCHED_CLUSTER, particularly for certain latency sensitive workloads.
> > With the current scheduler model and heuristics, this is a desirable
> > default topology for Ampere Altra and Altra Max system.
> >
> > Introduce an alternate sched domain topology for arm64 without the MC
> > level and test for llc_sibling weight 1 across all CPUs to enable it.
> >
> > Do this in arch/arm64/kernel/smp.c (as opposed to
> > arch/arm64/kernel/topology.c) as all the CPU sibling maps are now
> > populated and we avoid needing to extend the drivers/acpi/pptt.c API to
> > detect the cluster level being above the cpu llc level. This is
> > consistent with other architectures and provides a readily extensible
> > mechanism for other alternate topologies.
> >
> > The final sched domain topology for a 2 socket Ampere Altra system is
> > unchanged with or without CONFIG_SCHED_CLUSTER, and the BUG is avoided:
> >
> > For CPU0:
> >
> > CONFIG_SCHED_CLUSTER=y
> > CLS  [0-1]
> > DIE  [0-79]
> > NUMA [0-159]
> >
> > CONFIG_SCHED_CLUSTER is not set
> > DIE  [0-79]
> > NUMA [0-159]
> >
> > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > Cc: Will Deacon <will@kernel.org>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Vincent Guittot <vincent.guittot@linaro.org>
> > Cc: Barry Song <song.bao.hua@hisilicon.com>
> > Cc: Valentin Schneider <valentin.schneider@arm.com>
> > Cc: D. Scott Phillips <scott@os.amperecomputing.com>
> > Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
> > Cc: <stable@vger.kernel.org> # 5.16.x
> > Signed-off-by: Darren Hart <darren@os.amperecomputing.com>
> > ---
> >  arch/arm64/kernel/smp.c | 28 ++++++++++++++++++++++++++++
> >  1 file changed, 28 insertions(+)
> >
> > diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> > index 27df5c1e6baa..3597e75645e1 100644
> > --- a/arch/arm64/kernel/smp.c
> > +++ b/arch/arm64/kernel/smp.c
> > @@ -433,6 +433,33 @@ static void __init hyp_mode_check(void)
> >         }
> >  }
> >
> > +static struct sched_domain_topology_level arm64_no_mc_topology[] = {
> > +#ifdef CONFIG_SCHED_SMT
> > +       { cpu_smt_mask, cpu_smt_flags, SD_INIT_NAME(SMT) },
> > +#endif
> > +
> > +#ifdef CONFIG_SCHED_CLUSTER
> > +       { cpu_clustergroup_mask, cpu_cluster_flags, SD_INIT_NAME(CLS) },
> > +#endif
> > +
> > +       { cpu_cpu_mask, SD_INIT_NAME(DIE) },
> > +       { NULL, },
> > +};
> > +
> > +static void __init update_sched_domain_topology(void)
> > +{
> > +       int cpu;
> > +
> > +       for_each_possible_cpu(cpu) {
> > +               if (cpu_topology[cpu].llc_id != -1 &&
> 
> Have you tested it with a non-acpi system ? AFAICT, llc_id is only set
> by ACPI system and  llc_id == -1 for others like DT based system
> 
> > +                   cpumask_weight(&cpu_topology[cpu].llc_sibling) > 1)
> > +                       return;
> > +       }

Hi Vincent,

I did not have a non-acpi system to test, no. You're right of course,
llc_id is only set by ACPI systems on arm64. We could wrap this in a
CONFIG_ACPI ifdef (or IS_ENABLED), but I think this would be preferable:

+       for_each_possible_cpu(cpu) {
+               if (cpu_topology[cpu].llc_id == -1 ||
+                   cpumask_weight(&cpu_topology[cpu].llc_sibling) > 1)
+                       return;
+       }

Quickly tested on Altra successfully. Would appreciate anyone with non-acpi
arm64 systems who can test and verify this behaves as intended. I will ask
around tomorrow as well to see what I may have access to.

Thanks,

> > +
> > +       pr_info("No LLC siblings, using No MC sched domains topology\n");
> > +       set_sched_topology(arm64_no_mc_topology);
> > +}
> > +
> >  void __init smp_cpus_done(unsigned int max_cpus)
> >  {
> >         pr_info("SMP: Total of %d processors activated.\n", num_online_cpus());
> > @@ -440,6 +467,7 @@ void __init smp_cpus_done(unsigned int max_cpus)
> >         hyp_mode_check();
> >         apply_alternatives_all();
> >         mark_linear_text_alias_ro();
> > +       update_sched_domain_topology();
> >  }
> >
> >  void __init smp_prepare_boot_cpu(void)
> > --
> > 2.31.1
> >

-- 
Darren Hart
Ampere Computing / OS and Kernel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/1] arm64: smp: Skip MC sched domain on SoCs with no LLC
  2022-03-03  2:18     ` Darren Hart
@ 2022-03-03  5:36       ` Barry Song
  2022-03-03 16:35         ` Darren Hart
  2022-03-03  8:08       ` Vincent Guittot
  1 sibling, 1 reply; 9+ messages in thread
From: Barry Song @ 2022-03-03  5:36 UTC (permalink / raw)
  To: Darren Hart
  Cc: Vincent Guittot, LKML, Linux Arm, Catalin Marinas, Will Deacon,
	Peter Zijlstra, Barry Song, Valentin Schneider,
	D . Scott Phillips, Ilkka Koskinen, stable

On Thu, Mar 3, 2022 at 3:22 PM Darren Hart
<darren@os.amperecomputing.com> wrote:
>
> On Wed, Mar 02, 2022 at 10:32:06AM +0100, Vincent Guittot wrote:
> > On Tue, 1 Mar 2022 at 01:35, Darren Hart <darren@os.amperecomputing.com> wrote:
> > >
> > > Ampere Altra defines CPU clusters in the ACPI PPTT. They share a Snoop
> > > Control Unit, but have no shared CPU-side last level cache.
> > >
> > > cpu_coregroup_mask() will return a cpumask with weight 1, while
> > > cpu_clustergroup_mask() will return a cpumask with weight 2.
> > >
> > > As a result, build_sched_domain() will BUG() once per CPU with:
> > >
> > > BUG: arch topology borken
> > >      the CLS domain not a subset of the MC domain
> > >
> > > The MC level cpumask is then extended to that of the CLS child, and is
> > > later removed entirely as redundant. This sched domain topology is an
> > > improvement over previous topologies, or those built without
> > > SCHED_CLUSTER, particularly for certain latency sensitive workloads.
> > > With the current scheduler model and heuristics, this is a desirable
> > > default topology for Ampere Altra and Altra Max system.
> > >
> > > Introduce an alternate sched domain topology for arm64 without the MC
> > > level and test for llc_sibling weight 1 across all CPUs to enable it.
> > >
> > > Do this in arch/arm64/kernel/smp.c (as opposed to
> > > arch/arm64/kernel/topology.c) as all the CPU sibling maps are now
> > > populated and we avoid needing to extend the drivers/acpi/pptt.c API to
> > > detect the cluster level being above the cpu llc level. This is
> > > consistent with other architectures and provides a readily extensible
> > > mechanism for other alternate topologies.
> > >
> > > The final sched domain topology for a 2 socket Ampere Altra system is
> > > unchanged with or without CONFIG_SCHED_CLUSTER, and the BUG is avoided:
> > >
> > > For CPU0:
> > >
> > > CONFIG_SCHED_CLUSTER=y
> > > CLS  [0-1]
> > > DIE  [0-79]
> > > NUMA [0-159]
> > >
> > > CONFIG_SCHED_CLUSTER is not set
> > > DIE  [0-79]
> > > NUMA [0-159]
> > >
> > > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > > Cc: Will Deacon <will@kernel.org>
> > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > Cc: Vincent Guittot <vincent.guittot@linaro.org>
> > > Cc: Barry Song <song.bao.hua@hisilicon.com>
> > > Cc: Valentin Schneider <valentin.schneider@arm.com>
> > > Cc: D. Scott Phillips <scott@os.amperecomputing.com>
> > > Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
> > > Cc: <stable@vger.kernel.org> # 5.16.x
> > > Signed-off-by: Darren Hart <darren@os.amperecomputing.com>
> > > ---
> > >  arch/arm64/kernel/smp.c | 28 ++++++++++++++++++++++++++++
> > >  1 file changed, 28 insertions(+)
> > >
> > > diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> > > index 27df5c1e6baa..3597e75645e1 100644
> > > --- a/arch/arm64/kernel/smp.c
> > > +++ b/arch/arm64/kernel/smp.c
> > > @@ -433,6 +433,33 @@ static void __init hyp_mode_check(void)
> > >         }
> > >  }
> > >
> > > +static struct sched_domain_topology_level arm64_no_mc_topology[] = {
> > > +#ifdef CONFIG_SCHED_SMT
> > > +       { cpu_smt_mask, cpu_smt_flags, SD_INIT_NAME(SMT) },
> > > +#endif
> > > +
> > > +#ifdef CONFIG_SCHED_CLUSTER
> > > +       { cpu_clustergroup_mask, cpu_cluster_flags, SD_INIT_NAME(CLS) },
> > > +#endif
> > > +
> > > +       { cpu_cpu_mask, SD_INIT_NAME(DIE) },
> > > +       { NULL, },
> > > +};
> > > +
> > > +static void __init update_sched_domain_topology(void)
> > > +{
> > > +       int cpu;
> > > +
> > > +       for_each_possible_cpu(cpu) {
> > > +               if (cpu_topology[cpu].llc_id != -1 &&
> >
> > Have you tested it with a non-acpi system ? AFAICT, llc_id is only set
> > by ACPI system and  llc_id == -1 for others like DT based system
> >
> > > +                   cpumask_weight(&cpu_topology[cpu].llc_sibling) > 1)
> > > +                       return;
> > > +       }
>
> Hi Vincent,
>
> I did not have a non-acpi system to test, no. You're right of course,
> llc_id is only set by ACPI systems on arm64. We could wrap this in a
> CONFIG_ACPI ifdef (or IS_ENABLED), but I think this would be preferable:
>
> +       for_each_possible_cpu(cpu) {
> +               if (cpu_topology[cpu].llc_id == -1 ||
> +                   cpumask_weight(&cpu_topology[cpu].llc_sibling) > 1)
> +                       return;
> +       }
>
> Quickly tested on Altra successfully. Would appreciate anyone with non-acpi
> arm64 systems who can test and verify this behaves as intended. I will ask
> around tomorrow as well to see what I may have access to.

I wonder if we can fix it by this

diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
index 976154140f0b..551655ccd0eb 100644
--- a/drivers/base/arch_topology.c
+++ b/drivers/base/arch_topology.c
@@ -627,6 +627,13 @@ const struct cpumask *cpu_coregroup_mask(int cpu)
                if (cpumask_subset(&cpu_topology[cpu].llc_sibling, core_mask))
                        core_mask = &cpu_topology[cpu].llc_sibling;
        }
+       /*
+        * Some machines have no LLC but have clusters, we let MC = CLUSTER
+        * as MC should always be after CLUSTER. But anyway, the MC domain
+        * will be removed
+        */
+       if (cpumask_subset(core_mask, &cpu_topology[cpu].cluster_sibling))
+               core_mask = &cpu_topology[cpu].cluster_sibling;

        return core_mask;
 }

as it can make all kinds of topologies happy -  symmetric and asymmetric.

>
> Thanks,
>
> > > +
> > > +       pr_info("No LLC siblings, using No MC sched domains topology\n");
> > > +       set_sched_topology(arm64_no_mc_topology);
> > > +}
> > > +
> > >  void __init smp_cpus_done(unsigned int max_cpus)
> > >  {
> > >         pr_info("SMP: Total of %d processors activated.\n", num_online_cpus());
> > > @@ -440,6 +467,7 @@ void __init smp_cpus_done(unsigned int max_cpus)
> > >         hyp_mode_check();
> > >         apply_alternatives_all();
> > >         mark_linear_text_alias_ro();
> > > +       update_sched_domain_topology();
> > >  }
> > >
> > >  void __init smp_prepare_boot_cpu(void)
> > > --
> > > 2.31.1
> > >
>
> --
> Darren Hart
> Ampere Computing / OS and Kernel

Thanks
Barry

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/1] arm64: smp: Skip MC sched domain on SoCs with no LLC
  2022-03-03  2:18     ` Darren Hart
  2022-03-03  5:36       ` Barry Song
@ 2022-03-03  8:08       ` Vincent Guittot
  2022-03-03 16:02         ` Darren Hart
  1 sibling, 1 reply; 9+ messages in thread
From: Vincent Guittot @ 2022-03-03  8:08 UTC (permalink / raw)
  To: Darren Hart
  Cc: LKML, Linux Arm, Catalin Marinas, Will Deacon, Peter Zijlstra,
	Barry Song, Valentin Schneider, D . Scott Phillips,
	Ilkka Koskinen, stable

On Thu, 3 Mar 2022 at 03:18, Darren Hart <darren@os.amperecomputing.com> wrote:
>
> On Wed, Mar 02, 2022 at 10:32:06AM +0100, Vincent Guittot wrote:
> > On Tue, 1 Mar 2022 at 01:35, Darren Hart <darren@os.amperecomputing.com> wrote:
> > >
> > > Ampere Altra defines CPU clusters in the ACPI PPTT. They share a Snoop
> > > Control Unit, but have no shared CPU-side last level cache.
> > >
> > > cpu_coregroup_mask() will return a cpumask with weight 1, while
> > > cpu_clustergroup_mask() will return a cpumask with weight 2.
> > >
> > > As a result, build_sched_domain() will BUG() once per CPU with:
> > >
> > > BUG: arch topology borken
> > >      the CLS domain not a subset of the MC domain
> > >
> > > The MC level cpumask is then extended to that of the CLS child, and is
> > > later removed entirely as redundant. This sched domain topology is an
> > > improvement over previous topologies, or those built without
> > > SCHED_CLUSTER, particularly for certain latency sensitive workloads.
> > > With the current scheduler model and heuristics, this is a desirable
> > > default topology for Ampere Altra and Altra Max system.
> > >
> > > Introduce an alternate sched domain topology for arm64 without the MC
> > > level and test for llc_sibling weight 1 across all CPUs to enable it.
> > >
> > > Do this in arch/arm64/kernel/smp.c (as opposed to
> > > arch/arm64/kernel/topology.c) as all the CPU sibling maps are now
> > > populated and we avoid needing to extend the drivers/acpi/pptt.c API to
> > > detect the cluster level being above the cpu llc level. This is
> > > consistent with other architectures and provides a readily extensible
> > > mechanism for other alternate topologies.
> > >
> > > The final sched domain topology for a 2 socket Ampere Altra system is
> > > unchanged with or without CONFIG_SCHED_CLUSTER, and the BUG is avoided:
> > >
> > > For CPU0:
> > >
> > > CONFIG_SCHED_CLUSTER=y
> > > CLS  [0-1]
> > > DIE  [0-79]
> > > NUMA [0-159]
> > >
> > > CONFIG_SCHED_CLUSTER is not set
> > > DIE  [0-79]
> > > NUMA [0-159]
> > >
> > > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > > Cc: Will Deacon <will@kernel.org>
> > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > Cc: Vincent Guittot <vincent.guittot@linaro.org>
> > > Cc: Barry Song <song.bao.hua@hisilicon.com>
> > > Cc: Valentin Schneider <valentin.schneider@arm.com>
> > > Cc: D. Scott Phillips <scott@os.amperecomputing.com>
> > > Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
> > > Cc: <stable@vger.kernel.org> # 5.16.x
> > > Signed-off-by: Darren Hart <darren@os.amperecomputing.com>
> > > ---
> > >  arch/arm64/kernel/smp.c | 28 ++++++++++++++++++++++++++++
> > >  1 file changed, 28 insertions(+)
> > >
> > > diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> > > index 27df5c1e6baa..3597e75645e1 100644
> > > --- a/arch/arm64/kernel/smp.c
> > > +++ b/arch/arm64/kernel/smp.c
> > > @@ -433,6 +433,33 @@ static void __init hyp_mode_check(void)
> > >         }
> > >  }
> > >
> > > +static struct sched_domain_topology_level arm64_no_mc_topology[] = {
> > > +#ifdef CONFIG_SCHED_SMT
> > > +       { cpu_smt_mask, cpu_smt_flags, SD_INIT_NAME(SMT) },
> > > +#endif
> > > +
> > > +#ifdef CONFIG_SCHED_CLUSTER
> > > +       { cpu_clustergroup_mask, cpu_cluster_flags, SD_INIT_NAME(CLS) },
> > > +#endif
> > > +
> > > +       { cpu_cpu_mask, SD_INIT_NAME(DIE) },
> > > +       { NULL, },
> > > +};
> > > +
> > > +static void __init update_sched_domain_topology(void)
> > > +{
> > > +       int cpu;
> > > +
> > > +       for_each_possible_cpu(cpu) {
> > > +               if (cpu_topology[cpu].llc_id != -1 &&
> >
> > Have you tested it with a non-acpi system ? AFAICT, llc_id is only set
> > by ACPI system and  llc_id == -1 for others like DT based system
> >
> > > +                   cpumask_weight(&cpu_topology[cpu].llc_sibling) > 1)
> > > +                       return;
> > > +       }
>
> Hi Vincent,
>
> I did not have a non-acpi system to test, no. You're right of course,
> llc_id is only set by ACPI systems on arm64. We could wrap this in a
> CONFIG_ACPI ifdef (or IS_ENABLED), but I think this would be preferable:
>
> +       for_each_possible_cpu(cpu) {
> +               if (cpu_topology[cpu].llc_id == -1 ||
> +                   cpumask_weight(&cpu_topology[cpu].llc_sibling) > 1)
> +                       return;
> +       }

This works.
Also , do you really need to loop on all possible cpus ? Would it be
enough to check only the 1st cpu ?
You won't be able to support a mixed topology so all cpus have the
same kind of topology i.e either cluster before or cluster before the
MC level


>
> Quickly tested on Altra successfully. Would appreciate anyone with non-acpi
> arm64 systems who can test and verify this behaves as intended. I will ask
> around tomorrow as well to see what I may have access to.
>
> Thanks,
>
> > > +
> > > +       pr_info("No LLC siblings, using No MC sched domains topology\n");
> > > +       set_sched_topology(arm64_no_mc_topology);
> > > +}
> > > +
> > >  void __init smp_cpus_done(unsigned int max_cpus)
> > >  {
> > >         pr_info("SMP: Total of %d processors activated.\n", num_online_cpus());
> > > @@ -440,6 +467,7 @@ void __init smp_cpus_done(unsigned int max_cpus)
> > >         hyp_mode_check();
> > >         apply_alternatives_all();
> > >         mark_linear_text_alias_ro();
> > > +       update_sched_domain_topology();
> > >  }
> > >
> > >  void __init smp_prepare_boot_cpu(void)
> > > --
> > > 2.31.1
> > >
>
> --
> Darren Hart
> Ampere Computing / OS and Kernel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/1] arm64: smp: Skip MC sched domain on SoCs with no LLC
  2022-03-03  8:08       ` Vincent Guittot
@ 2022-03-03 16:02         ` Darren Hart
  0 siblings, 0 replies; 9+ messages in thread
From: Darren Hart @ 2022-03-03 16:02 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: LKML, Linux Arm, Catalin Marinas, Will Deacon, Peter Zijlstra,
	Barry Song, Valentin Schneider, D . Scott Phillips,
	Ilkka Koskinen, stable

On Thu, Mar 03, 2022 at 09:08:38AM +0100, Vincent Guittot wrote:
> On Thu, 3 Mar 2022 at 03:18, Darren Hart <darren@os.amperecomputing.com> wrote:
> >
> > On Wed, Mar 02, 2022 at 10:32:06AM +0100, Vincent Guittot wrote:
> > > On Tue, 1 Mar 2022 at 01:35, Darren Hart <darren@os.amperecomputing.com> wrote:
> > > >
> > > > Ampere Altra defines CPU clusters in the ACPI PPTT. They share a Snoop
> > > > Control Unit, but have no shared CPU-side last level cache.
> > > >
> > > > cpu_coregroup_mask() will return a cpumask with weight 1, while
> > > > cpu_clustergroup_mask() will return a cpumask with weight 2.
> > > >
> > > > As a result, build_sched_domain() will BUG() once per CPU with:
> > > >
> > > > BUG: arch topology borken
> > > >      the CLS domain not a subset of the MC domain
> > > >
> > > > The MC level cpumask is then extended to that of the CLS child, and is
> > > > later removed entirely as redundant. This sched domain topology is an
> > > > improvement over previous topologies, or those built without
> > > > SCHED_CLUSTER, particularly for certain latency sensitive workloads.
> > > > With the current scheduler model and heuristics, this is a desirable
> > > > default topology for Ampere Altra and Altra Max system.
> > > >
> > > > Introduce an alternate sched domain topology for arm64 without the MC
> > > > level and test for llc_sibling weight 1 across all CPUs to enable it.
> > > >
> > > > Do this in arch/arm64/kernel/smp.c (as opposed to
> > > > arch/arm64/kernel/topology.c) as all the CPU sibling maps are now
> > > > populated and we avoid needing to extend the drivers/acpi/pptt.c API to
> > > > detect the cluster level being above the cpu llc level. This is
> > > > consistent with other architectures and provides a readily extensible
> > > > mechanism for other alternate topologies.
> > > >
> > > > The final sched domain topology for a 2 socket Ampere Altra system is
> > > > unchanged with or without CONFIG_SCHED_CLUSTER, and the BUG is avoided:
> > > >
> > > > For CPU0:
> > > >
> > > > CONFIG_SCHED_CLUSTER=y
> > > > CLS  [0-1]
> > > > DIE  [0-79]
> > > > NUMA [0-159]
> > > >
> > > > CONFIG_SCHED_CLUSTER is not set
> > > > DIE  [0-79]
> > > > NUMA [0-159]
> > > >
> > > > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > > > Cc: Will Deacon <will@kernel.org>
> > > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > > Cc: Vincent Guittot <vincent.guittot@linaro.org>
> > > > Cc: Barry Song <song.bao.hua@hisilicon.com>
> > > > Cc: Valentin Schneider <valentin.schneider@arm.com>
> > > > Cc: D. Scott Phillips <scott@os.amperecomputing.com>
> > > > Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
> > > > Cc: <stable@vger.kernel.org> # 5.16.x
> > > > Signed-off-by: Darren Hart <darren@os.amperecomputing.com>
> > > > ---
> > > >  arch/arm64/kernel/smp.c | 28 ++++++++++++++++++++++++++++
> > > >  1 file changed, 28 insertions(+)
> > > >
> > > > diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> > > > index 27df5c1e6baa..3597e75645e1 100644
> > > > --- a/arch/arm64/kernel/smp.c
> > > > +++ b/arch/arm64/kernel/smp.c
> > > > @@ -433,6 +433,33 @@ static void __init hyp_mode_check(void)
> > > >         }
> > > >  }
> > > >
> > > > +static struct sched_domain_topology_level arm64_no_mc_topology[] = {
> > > > +#ifdef CONFIG_SCHED_SMT
> > > > +       { cpu_smt_mask, cpu_smt_flags, SD_INIT_NAME(SMT) },
> > > > +#endif
> > > > +
> > > > +#ifdef CONFIG_SCHED_CLUSTER
> > > > +       { cpu_clustergroup_mask, cpu_cluster_flags, SD_INIT_NAME(CLS) },
> > > > +#endif
> > > > +
> > > > +       { cpu_cpu_mask, SD_INIT_NAME(DIE) },
> > > > +       { NULL, },
> > > > +};
> > > > +
> > > > +static void __init update_sched_domain_topology(void)
> > > > +{
> > > > +       int cpu;
> > > > +
> > > > +       for_each_possible_cpu(cpu) {
> > > > +               if (cpu_topology[cpu].llc_id != -1 &&
> > >
> > > Have you tested it with a non-acpi system ? AFAICT, llc_id is only set
> > > by ACPI system and  llc_id == -1 for others like DT based system
> > >
> > > > +                   cpumask_weight(&cpu_topology[cpu].llc_sibling) > 1)
> > > > +                       return;
> > > > +       }
> >
> > Hi Vincent,
> >
> > I did not have a non-acpi system to test, no. You're right of course,
> > llc_id is only set by ACPI systems on arm64. We could wrap this in a
> > CONFIG_ACPI ifdef (or IS_ENABLED), but I think this would be preferable:
> >
> > +       for_each_possible_cpu(cpu) {
> > +               if (cpu_topology[cpu].llc_id == -1 ||
> > +                   cpumask_weight(&cpu_topology[cpu].llc_sibling) > 1)
> > +                       return;
> > +       }
> 
> This works.
> Also , do you really need to loop on all possible cpus ? Would it be
> enough to check only the 1st cpu ?
> You won't be able to support a mixed topology so all cpus have the
> same kind of topology i.e either cluster before or cluster before the
> MC level

My intention here is to restrict the use of of the new topology to a very
specific architecture where the problem is known to manifest, and avoid
introducing any unexpected change to other systems.

For other systems, they will break on the first loop, so the loop is also
minimal impact.

As for supporting a mixed topology, my intention was again to not make any
statement about the existance or viability of such systems. If they would break
before, they would still break. If a new topology is needed for them, this
provides a easily modifiable location to do that.

If the consensus is we don't need the loop, this simplifies my specific use case
at the cost of applying to a broader set (but only hypothetically I believe) of
topologies. So no objection to dropping the loop.

Will, do you have a preference? Lean toward targeted change and minimal impact,
or lean toward simpler implementation with slightly broader impact?

Thanks,

> 
> 
> >
> > Quickly tested on Altra successfully. Would appreciate anyone with non-acpi
> > arm64 systems who can test and verify this behaves as intended. I will ask
> > around tomorrow as well to see what I may have access to.
> >
> > Thanks,
> >
> > > > +
> > > > +       pr_info("No LLC siblings, using No MC sched domains topology\n");
> > > > +       set_sched_topology(arm64_no_mc_topology);
> > > > +}
> > > > +
> > > >  void __init smp_cpus_done(unsigned int max_cpus)
> > > >  {
> > > >         pr_info("SMP: Total of %d processors activated.\n", num_online_cpus());
> > > > @@ -440,6 +467,7 @@ void __init smp_cpus_done(unsigned int max_cpus)
> > > >         hyp_mode_check();
> > > >         apply_alternatives_all();
> > > >         mark_linear_text_alias_ro();
> > > > +       update_sched_domain_topology();
> > > >  }
> > > >
> > > >  void __init smp_prepare_boot_cpu(void)
> > > > --
> > > > 2.31.1
> > > >
> >
> > --
> > Darren Hart
> > Ampere Computing / OS and Kernel

-- 
Darren Hart
Ampere Computing / OS and Kernel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/1] arm64: smp: Skip MC sched domain on SoCs with no LLC
  2022-03-03  5:36       ` Barry Song
@ 2022-03-03 16:35         ` Darren Hart
  2022-03-03 21:43           ` Barry Song
  0 siblings, 1 reply; 9+ messages in thread
From: Darren Hart @ 2022-03-03 16:35 UTC (permalink / raw)
  To: Barry Song, Sudeep Holla, Greg Kroah-Hartman, Rafael J. Wysocki
  Cc: Vincent Guittot, LKML, Linux Arm, Catalin Marinas, Will Deacon,
	Peter Zijlstra, Barry Song, Valentin Schneider,
	D . Scott Phillips, Ilkka Koskinen, stable

On Thu, Mar 03, 2022 at 06:36:30PM +1300, Barry Song wrote:
> On Thu, Mar 3, 2022 at 3:22 PM Darren Hart
> <darren@os.amperecomputing.com> wrote:
> >
> > On Wed, Mar 02, 2022 at 10:32:06AM +0100, Vincent Guittot wrote:
> > > On Tue, 1 Mar 2022 at 01:35, Darren Hart <darren@os.amperecomputing.com> wrote:
> > > >
> > > > Ampere Altra defines CPU clusters in the ACPI PPTT. They share a Snoop
> > > > Control Unit, but have no shared CPU-side last level cache.
> > > >
> > > > cpu_coregroup_mask() will return a cpumask with weight 1, while
> > > > cpu_clustergroup_mask() will return a cpumask with weight 2.
> > > >
> > > > As a result, build_sched_domain() will BUG() once per CPU with:
> > > >
> > > > BUG: arch topology borken
> > > >      the CLS domain not a subset of the MC domain
> > > >
> > > > The MC level cpumask is then extended to that of the CLS child, and is
> > > > later removed entirely as redundant. This sched domain topology is an
> > > > improvement over previous topologies, or those built without
> > > > SCHED_CLUSTER, particularly for certain latency sensitive workloads.
> > > > With the current scheduler model and heuristics, this is a desirable
> > > > default topology for Ampere Altra and Altra Max system.
> > > >
> > > > Introduce an alternate sched domain topology for arm64 without the MC
> > > > level and test for llc_sibling weight 1 across all CPUs to enable it.
> > > >
> > > > Do this in arch/arm64/kernel/smp.c (as opposed to
> > > > arch/arm64/kernel/topology.c) as all the CPU sibling maps are now
> > > > populated and we avoid needing to extend the drivers/acpi/pptt.c API to
> > > > detect the cluster level being above the cpu llc level. This is
> > > > consistent with other architectures and provides a readily extensible
> > > > mechanism for other alternate topologies.
> > > >
> > > > The final sched domain topology for a 2 socket Ampere Altra system is
> > > > unchanged with or without CONFIG_SCHED_CLUSTER, and the BUG is avoided:
> > > >
> > > > For CPU0:
> > > >
> > > > CONFIG_SCHED_CLUSTER=y
> > > > CLS  [0-1]
> > > > DIE  [0-79]
> > > > NUMA [0-159]
> > > >
> > > > CONFIG_SCHED_CLUSTER is not set
> > > > DIE  [0-79]
> > > > NUMA [0-159]
> > > >
> > > > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > > > Cc: Will Deacon <will@kernel.org>
> > > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > > Cc: Vincent Guittot <vincent.guittot@linaro.org>
> > > > Cc: Barry Song <song.bao.hua@hisilicon.com>
> > > > Cc: Valentin Schneider <valentin.schneider@arm.com>
> > > > Cc: D. Scott Phillips <scott@os.amperecomputing.com>
> > > > Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
> > > > Cc: <stable@vger.kernel.org> # 5.16.x
> > > > Signed-off-by: Darren Hart <darren@os.amperecomputing.com>
> > > > ---
> > > >  arch/arm64/kernel/smp.c | 28 ++++++++++++++++++++++++++++
> > > >  1 file changed, 28 insertions(+)
> > > >
> > > > diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> > > > index 27df5c1e6baa..3597e75645e1 100644
> > > > --- a/arch/arm64/kernel/smp.c
> > > > +++ b/arch/arm64/kernel/smp.c
> > > > @@ -433,6 +433,33 @@ static void __init hyp_mode_check(void)
> > > >         }
> > > >  }
> > > >
> > > > +static struct sched_domain_topology_level arm64_no_mc_topology[] = {
> > > > +#ifdef CONFIG_SCHED_SMT
> > > > +       { cpu_smt_mask, cpu_smt_flags, SD_INIT_NAME(SMT) },
> > > > +#endif
> > > > +
> > > > +#ifdef CONFIG_SCHED_CLUSTER
> > > > +       { cpu_clustergroup_mask, cpu_cluster_flags, SD_INIT_NAME(CLS) },
> > > > +#endif
> > > > +
> > > > +       { cpu_cpu_mask, SD_INIT_NAME(DIE) },
> > > > +       { NULL, },
> > > > +};
> > > > +
> > > > +static void __init update_sched_domain_topology(void)
> > > > +{
> > > > +       int cpu;
> > > > +
> > > > +       for_each_possible_cpu(cpu) {
> > > > +               if (cpu_topology[cpu].llc_id != -1 &&
> > >
> > > Have you tested it with a non-acpi system ? AFAICT, llc_id is only set
> > > by ACPI system and  llc_id == -1 for others like DT based system
> > >
> > > > +                   cpumask_weight(&cpu_topology[cpu].llc_sibling) > 1)
> > > > +                       return;
> > > > +       }
> >
> > Hi Vincent,
> >
> > I did not have a non-acpi system to test, no. You're right of course,
> > llc_id is only set by ACPI systems on arm64. We could wrap this in a
> > CONFIG_ACPI ifdef (or IS_ENABLED), but I think this would be preferable:
> >
> > +       for_each_possible_cpu(cpu) {
> > +               if (cpu_topology[cpu].llc_id == -1 ||
> > +                   cpumask_weight(&cpu_topology[cpu].llc_sibling) > 1)
> > +                       return;
> > +       }
> >
> > Quickly tested on Altra successfully. Would appreciate anyone with non-acpi
> > arm64 systems who can test and verify this behaves as intended. I will ask
> > around tomorrow as well to see what I may have access to.
> 
> I wonder if we can fix it by this
> 
> diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
> index 976154140f0b..551655ccd0eb 100644
> --- a/drivers/base/arch_topology.c
> +++ b/drivers/base/arch_topology.c
> @@ -627,6 +627,13 @@ const struct cpumask *cpu_coregroup_mask(int cpu)
>                 if (cpumask_subset(&cpu_topology[cpu].llc_sibling, core_mask))
>                         core_mask = &cpu_topology[cpu].llc_sibling;
>         }
> +       /*
> +        * Some machines have no LLC but have clusters, we let MC = CLUSTER
> +        * as MC should always be after CLUSTER. But anyway, the MC domain
> +        * will be removed
> +        */
> +       if (cpumask_subset(core_mask, &cpu_topology[cpu].cluster_sibling))
> +               core_mask = &cpu_topology[cpu].cluster_sibling;
> 
>         return core_mask;
>  }
> 
> as it can make all kinds of topologies happy -  symmetric and asymmetric.
> 

Hah. Full circle. Yes, this works, and it's basically what we'd started
with internally. I ended up exploring various paths here to avoid a
"band aid" and to target the fix and minimize impact. That said, after
digging through the acpi, topology, smp, and sched domains code... I
don't think this approach is a band aid and it's a very minimal
solution. The only downside I can think of is masking a potential
topology bug and not catching it in the scheduler - that seems very
unlikely. I'm perfectly happy with this solution as well.

Will D, would you prefer this approach?

+Sudeep, Greg, and Rafael,

Are you OK with this approach?

If so, we can drop my arm64 specific new topology patch and I can send a
version of this one out (suggested-by Barry of course), unless you'd
prefer to send it Barry?

Thanks,

> >
> > Thanks,
> >
> > > > +
> > > > +       pr_info("No LLC siblings, using No MC sched domains topology\n");
> > > > +       set_sched_topology(arm64_no_mc_topology);
> > > > +}
> > > > +
> > > >  void __init smp_cpus_done(unsigned int max_cpus)
> > > >  {
> > > >         pr_info("SMP: Total of %d processors activated.\n", num_online_cpus());
> > > > @@ -440,6 +467,7 @@ void __init smp_cpus_done(unsigned int max_cpus)
> > > >         hyp_mode_check();
> > > >         apply_alternatives_all();
> > > >         mark_linear_text_alias_ro();
> > > > +       update_sched_domain_topology();
> > > >  }
> > > >
> > > >  void __init smp_prepare_boot_cpu(void)
> > > > --
> > > > 2.31.1
> > > >
> >
> > --
> > Darren Hart
> > Ampere Computing / OS and Kernel
> 
> Thanks
> Barry

-- 
Darren Hart
Ampere Computing / OS and Kernel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/1] arm64: smp: Skip MC sched domain on SoCs with no LLC
  2022-03-03 16:35         ` Darren Hart
@ 2022-03-03 21:43           ` Barry Song
  0 siblings, 0 replies; 9+ messages in thread
From: Barry Song @ 2022-03-03 21:43 UTC (permalink / raw)
  To: Darren Hart
  Cc: Sudeep Holla, Greg Kroah-Hartman, Rafael J. Wysocki,
	Vincent Guittot, LKML, Linux Arm, Catalin Marinas, Will Deacon,
	Peter Zijlstra, Barry Song, Valentin Schneider,
	D . Scott Phillips, Ilkka Koskinen, stable

On Fri, Mar 4, 2022 at 5:35 AM Darren Hart
<darren@os.amperecomputing.com> wrote:
>
> On Thu, Mar 03, 2022 at 06:36:30PM +1300, Barry Song wrote:
> > On Thu, Mar 3, 2022 at 3:22 PM Darren Hart
> > <darren@os.amperecomputing.com> wrote:
> > >
> > > On Wed, Mar 02, 2022 at 10:32:06AM +0100, Vincent Guittot wrote:
> > > > On Tue, 1 Mar 2022 at 01:35, Darren Hart <darren@os.amperecomputing.com> wrote:
> > > > >
> > > > > Ampere Altra defines CPU clusters in the ACPI PPTT. They share a Snoop
> > > > > Control Unit, but have no shared CPU-side last level cache.
> > > > >
> > > > > cpu_coregroup_mask() will return a cpumask with weight 1, while
> > > > > cpu_clustergroup_mask() will return a cpumask with weight 2.
> > > > >
> > > > > As a result, build_sched_domain() will BUG() once per CPU with:
> > > > >
> > > > > BUG: arch topology borken
> > > > >      the CLS domain not a subset of the MC domain
> > > > >
> > > > > The MC level cpumask is then extended to that of the CLS child, and is
> > > > > later removed entirely as redundant. This sched domain topology is an
> > > > > improvement over previous topologies, or those built without
> > > > > SCHED_CLUSTER, particularly for certain latency sensitive workloads.
> > > > > With the current scheduler model and heuristics, this is a desirable
> > > > > default topology for Ampere Altra and Altra Max system.
> > > > >
> > > > > Introduce an alternate sched domain topology for arm64 without the MC
> > > > > level and test for llc_sibling weight 1 across all CPUs to enable it.
> > > > >
> > > > > Do this in arch/arm64/kernel/smp.c (as opposed to
> > > > > arch/arm64/kernel/topology.c) as all the CPU sibling maps are now
> > > > > populated and we avoid needing to extend the drivers/acpi/pptt.c API to
> > > > > detect the cluster level being above the cpu llc level. This is
> > > > > consistent with other architectures and provides a readily extensible
> > > > > mechanism for other alternate topologies.
> > > > >
> > > > > The final sched domain topology for a 2 socket Ampere Altra system is
> > > > > unchanged with or without CONFIG_SCHED_CLUSTER, and the BUG is avoided:
> > > > >
> > > > > For CPU0:
> > > > >
> > > > > CONFIG_SCHED_CLUSTER=y
> > > > > CLS  [0-1]
> > > > > DIE  [0-79]
> > > > > NUMA [0-159]
> > > > >
> > > > > CONFIG_SCHED_CLUSTER is not set
> > > > > DIE  [0-79]
> > > > > NUMA [0-159]
> > > > >
> > > > > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > > > > Cc: Will Deacon <will@kernel.org>
> > > > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > > > Cc: Vincent Guittot <vincent.guittot@linaro.org>
> > > > > Cc: Barry Song <song.bao.hua@hisilicon.com>
> > > > > Cc: Valentin Schneider <valentin.schneider@arm.com>
> > > > > Cc: D. Scott Phillips <scott@os.amperecomputing.com>
> > > > > Cc: Ilkka Koskinen <ilkka@os.amperecomputing.com>
> > > > > Cc: <stable@vger.kernel.org> # 5.16.x
> > > > > Signed-off-by: Darren Hart <darren@os.amperecomputing.com>
> > > > > ---
> > > > >  arch/arm64/kernel/smp.c | 28 ++++++++++++++++++++++++++++
> > > > >  1 file changed, 28 insertions(+)
> > > > >
> > > > > diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> > > > > index 27df5c1e6baa..3597e75645e1 100644
> > > > > --- a/arch/arm64/kernel/smp.c
> > > > > +++ b/arch/arm64/kernel/smp.c
> > > > > @@ -433,6 +433,33 @@ static void __init hyp_mode_check(void)
> > > > >         }
> > > > >  }
> > > > >
> > > > > +static struct sched_domain_topology_level arm64_no_mc_topology[] = {
> > > > > +#ifdef CONFIG_SCHED_SMT
> > > > > +       { cpu_smt_mask, cpu_smt_flags, SD_INIT_NAME(SMT) },
> > > > > +#endif
> > > > > +
> > > > > +#ifdef CONFIG_SCHED_CLUSTER
> > > > > +       { cpu_clustergroup_mask, cpu_cluster_flags, SD_INIT_NAME(CLS) },
> > > > > +#endif
> > > > > +
> > > > > +       { cpu_cpu_mask, SD_INIT_NAME(DIE) },
> > > > > +       { NULL, },
> > > > > +};
> > > > > +
> > > > > +static void __init update_sched_domain_topology(void)
> > > > > +{
> > > > > +       int cpu;
> > > > > +
> > > > > +       for_each_possible_cpu(cpu) {
> > > > > +               if (cpu_topology[cpu].llc_id != -1 &&
> > > >
> > > > Have you tested it with a non-acpi system ? AFAICT, llc_id is only set
> > > > by ACPI system and  llc_id == -1 for others like DT based system
> > > >
> > > > > +                   cpumask_weight(&cpu_topology[cpu].llc_sibling) > 1)
> > > > > +                       return;
> > > > > +       }
> > >
> > > Hi Vincent,
> > >
> > > I did not have a non-acpi system to test, no. You're right of course,
> > > llc_id is only set by ACPI systems on arm64. We could wrap this in a
> > > CONFIG_ACPI ifdef (or IS_ENABLED), but I think this would be preferable:
> > >
> > > +       for_each_possible_cpu(cpu) {
> > > +               if (cpu_topology[cpu].llc_id == -1 ||
> > > +                   cpumask_weight(&cpu_topology[cpu].llc_sibling) > 1)
> > > +                       return;
> > > +       }
> > >
> > > Quickly tested on Altra successfully. Would appreciate anyone with non-acpi
> > > arm64 systems who can test and verify this behaves as intended. I will ask
> > > around tomorrow as well to see what I may have access to.
> >
> > I wonder if we can fix it by this
> >
> > diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
> > index 976154140f0b..551655ccd0eb 100644
> > --- a/drivers/base/arch_topology.c
> > +++ b/drivers/base/arch_topology.c
> > @@ -627,6 +627,13 @@ const struct cpumask *cpu_coregroup_mask(int cpu)
> >                 if (cpumask_subset(&cpu_topology[cpu].llc_sibling, core_mask))
> >                         core_mask = &cpu_topology[cpu].llc_sibling;
> >         }
> > +       /*
> > +        * Some machines have no LLC but have clusters, we let MC = CLUSTER
> > +        * as MC should always be after CLUSTER. But anyway, the MC domain
> > +        * will be removed
> > +        */
> > +       if (cpumask_subset(core_mask, &cpu_topology[cpu].cluster_sibling))
> > +               core_mask = &cpu_topology[cpu].cluster_sibling;
> >
> >         return core_mask;
> >  }
> >
> > as it can make all kinds of topologies happy -  symmetric and asymmetric.
> >
>
> Hah. Full circle. Yes, this works, and it's basically what we'd started
> with internally. I ended up exploring various paths here to avoid a
> "band aid" and to target the fix and minimize impact. That said, after
> digging through the acpi, topology, smp, and sched domains code... I
> don't think this approach is a band aid and it's a very minimal
> solution. The only downside I can think of is masking a potential
> topology bug and not catching it in the scheduler - that seems very
> unlikely. I'm perfectly happy with this solution as well.
>
> Will D, would you prefer this approach?
>
> +Sudeep, Greg, and Rafael,
>
> Are you OK with this approach?
>
> If so, we can drop my arm64 specific new topology patch and I can send a
> version of this one out (suggested-by Barry of course), unless you'd
> prefer to send it Barry?

i prefer this solution as anyway, cpu_coregroup_mask() is a combination
of a couple of masks. we are just putting cluster_mask into consideration.

const struct cpumask *cpu_coregroup_mask(int cpu)
{
        const cpumask_t *core_mask = cpumask_of_node(cpu_to_node(cpu));

        /* Find the smaller of NUMA, core or LLC siblings */
        if (cpumask_subset(&cpu_topology[cpu].core_sibling, core_mask)) {
                /* not numa in package, lets use the package siblings */
                core_mask = &cpu_topology[cpu].core_sibling;
        }
        if (cpu_topology[cpu].llc_id != -1) {
                if (cpumask_subset(&cpu_topology[cpu].llc_sibling, core_mask))
                        core_mask = &cpu_topology[cpu].llc_sibling;
        }

        return core_mask;
}

we are also making this machine happy:

CPU0-CPU1 have cluster but have no LLC.
CPU2-CPU3 have both cluster and LLC.

for cpu0 and cpu1, they are getting cluster level only, mc will be dropped.
for cpu2 and cpu3, they are getting both cluster and mc.

Please feel free to send, Darren. And i feel wake_affine series will
finally have to handle your exception.

>
> Thanks,
>
> > >
> > > Thanks,
> > >
> > > > > +
> > > > > +       pr_info("No LLC siblings, using No MC sched domains topology\n");
> > > > > +       set_sched_topology(arm64_no_mc_topology);
> > > > > +}
> > > > > +
> > > > >  void __init smp_cpus_done(unsigned int max_cpus)
> > > > >  {
> > > > >         pr_info("SMP: Total of %d processors activated.\n", num_online_cpus());
> > > > > @@ -440,6 +467,7 @@ void __init smp_cpus_done(unsigned int max_cpus)
> > > > >         hyp_mode_check();
> > > > >         apply_alternatives_all();
> > > > >         mark_linear_text_alias_ro();
> > > > > +       update_sched_domain_topology();
> > > > >  }
> > > > >
> > > > >  void __init smp_prepare_boot_cpu(void)
> > > > > --
> > > > > 2.31.1
> > > > >
> > >
> > > --
> > > Darren Hart
> > > Ampere Computing / OS and Kernel
> >
> > Thanks
> > Barry
>
> --
> Darren Hart
> Ampere Computing / OS and Kernel

Thanks
Barry

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2022-03-03 21:43 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-01  0:28 [PATCH 0/1] arm64: smp: Skip MC sched domain on SoCs with no LLC Darren Hart
2022-03-01  0:29 ` [PATCH 1/1] " Darren Hart
2022-03-02  9:32   ` Vincent Guittot
2022-03-03  2:18     ` Darren Hart
2022-03-03  5:36       ` Barry Song
2022-03-03 16:35         ` Darren Hart
2022-03-03 21:43           ` Barry Song
2022-03-03  8:08       ` Vincent Guittot
2022-03-03 16:02         ` Darren Hart

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).