linux-kernel.vger.kernel.org archive mirror
* [PATCH] x86, sched: Allow NUMA nodes to share an LLC on Intel platforms
@ 2021-02-09 22:39 Alison Schofield
  2021-02-09 23:09 ` Luck, Tony
  2021-02-10  8:05 ` Peter Zijlstra
  0 siblings, 2 replies; 10+ messages in thread
From: Alison Schofield @ 2021-02-09 22:39 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov
  Cc: Alison Schofield, x86, linux-kernel, Dave Hansen, Tony Luck,
	Tim Chen, H. Peter Anvin, Peter Zijlstra, David Rientjes,
	Igor Mammedov, Prarit Bhargava, brice.goglin

Commit 1340ccfa9a9a ("x86,sched: Allow topologies where NUMA nodes
share an LLC") added a vendor and model specific check to skip the
topology_sane() check for Intel's Sky Lake Server CPUs where NUMA
nodes shared an LLC.

This topology is no longer a quirk for Intel CPUs as Ice Lake and
Sapphire Rapids CPUs exhibit the same topology. Rather than maintain
the quirk list, define a synthetic flag that directs the scheduler
to allow this topology without warning for all Intel CPUs when NUMA
is configured.

Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: brice.goglin@gmail.com
---
 arch/x86/include/asm/cpufeatures.h |  1 +
 arch/x86/kernel/cpu/intel.c        | 15 +++++++++++++++
 arch/x86/kernel/smpboot.c          | 23 ++---------------------
 3 files changed, 18 insertions(+), 21 deletions(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 84b887825f12..bec74b90d3d6 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -421,5 +421,6 @@
 #define X86_BUG_TAA			X86_BUG(22) /* CPU is affected by TSX Async Abort(TAA) */
 #define X86_BUG_ITLB_MULTIHIT		X86_BUG(23) /* CPU may incur MCE during certain page attribute changes */
 #define X86_BUG_SRBDS			X86_BUG(24) /* CPU may leak RNG bits if not mitigated */
+#define X86_BUG_NUMA_SHARES_LLC		X86_BUG(25) /* CPU may enumerate an LLC shared by multiple NUMA nodes */
 
 #endif /* _ASM_X86_CPUFEATURES_H */
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 816fdbec795a..027348261080 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -719,6 +719,21 @@ static void init_intel(struct cpuinfo_x86 *c)
 		tsx_disable();
 
 	split_lock_init();
+
+	/*
+	 * Set X86_BUG_NUMA_SHARES_LLC to allow topologies where NUMA
+	 * nodes share an LLC. In Sub-NUMA Clustering mode Intel CPUs
+	 * may enumerate an LLC as shared by multiple NUMA nodes. The
+	 * LLC is shared for off-package data access but private to
+	 * the NUMA node for on-package access. This topology first
+	 * appeared in SKYLAKE_X. It was treated as a quirk and allowed.
+	 * This topology reappeared in ICELAKE_X and SAPPHIRERAPIDS_X.
+	 * Rather than maintain a list of quirk CPUs, allow this topology
+	 * on all Intel CPUs with NUMA configured. When this X86_BUG is
+	 * set, the scheduler accepts this topology without warning.
+	 */
+	if (IS_ENABLED(CONFIG_NUMA))
+		set_cpu_bug(c, X86_BUG_NUMA_SHARES_LLC);
 }
 
 #ifdef CONFIG_X86_32
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 117e24fbfd8a..7d05c3552795 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -458,26 +458,6 @@ static bool match_smt(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
 	return false;
 }
 
-/*
- * Define snc_cpu[] for SNC (Sub-NUMA Cluster) CPUs.
- *
- * These are Intel CPUs that enumerate an LLC that is shared by
- * multiple NUMA nodes. The LLC on these systems is shared for
- * off-package data access but private to the NUMA node (half
- * of the package) for on-package access.
- *
- * CPUID (the source of the information about the LLC) can only
- * enumerate the cache as being shared *or* unshared, but not
- * this particular configuration. The CPU in this case enumerates
- * the cache to be shared across the entire package (spanning both
- * NUMA nodes).
- */
-
-static const struct x86_cpu_id snc_cpu[] = {
-	X86_MATCH_INTEL_FAM6_MODEL(SKYLAKE_X, NULL),
-	{}
-};
-
 static bool match_llc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
 {
 	int cpu1 = c->cpu_index, cpu2 = o->cpu_index;
@@ -495,7 +475,8 @@ static bool match_llc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
 	 * means 'c' does not share the LLC of 'o'. This will be
 	 * reflected to userspace.
 	 */
-	if (!topology_same_node(c, o) && x86_match_cpu(snc_cpu))
+	if (!topology_same_node(c, o) &&
+	    boot_cpu_has_bug(X86_BUG_NUMA_SHARES_LLC))
 		return false;
 
 	return topology_sane(c, o, "llc");
-- 
2.20.1



* RE: [PATCH] x86, sched: Allow NUMA nodes to share an LLC on Intel platforms
  2021-02-09 22:39 [PATCH] x86, sched: Allow NUMA nodes to share an LLC on Intel platforms Alison Schofield
@ 2021-02-09 23:09 ` Luck, Tony
  2021-02-10  8:10   ` Peter Zijlstra
  2021-02-10  8:05 ` Peter Zijlstra
  1 sibling, 1 reply; 10+ messages in thread
From: Luck, Tony @ 2021-02-09 23:09 UTC (permalink / raw)
  To: Schofield, Alison, Thomas Gleixner, Ingo Molnar, Borislav Petkov
  Cc: x86, linux-kernel, Dave Hansen, Tim Chen, H. Peter Anvin,
	Peter Zijlstra, David Rientjes, Igor Mammedov, Prarit Bhargava,
	brice.goglin

> +#define X86_BUG_NUMA_SHARES_LLC		X86_BUG(25) /* CPU may enumerate an LLC shared by multiple NUMA nodes */

During internal review I wondered why this is a "BUG" rather than a "FEATURE" bit.

Apparently, the suggestion for "BUG" came from earlier community discussions.

Historically it may have seemed reasonable to say that a cache cannot span
NUMA domains. But with more and more things moving off the motherboard
and into the socket, this doesn't seem too weird now.

-Tony



* Re: [PATCH] x86, sched: Allow NUMA nodes to share an LLC on Intel platforms
  2021-02-09 22:39 [PATCH] x86, sched: Allow NUMA nodes to share an LLC on Intel platforms Alison Schofield
  2021-02-09 23:09 ` Luck, Tony
@ 2021-02-10  8:05 ` Peter Zijlstra
  2021-02-10 15:22   ` Dave Hansen
  1 sibling, 1 reply; 10+ messages in thread
From: Peter Zijlstra @ 2021-02-10  8:05 UTC (permalink / raw)
  To: Alison Schofield
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86, linux-kernel,
	Dave Hansen, Tony Luck, Tim Chen, H. Peter Anvin, David Rientjes,
	Igor Mammedov, Prarit Bhargava, brice.goglin

On Tue, Feb 09, 2021 at 02:39:43PM -0800, Alison Schofield wrote:

> Commit 1340ccfa9a9a ("x86,sched: Allow topologies where NUMA nodes
> share an LLC") added a vendor and model specific check to skip the
> topology_sane() check for Intel's Sky Lake Server CPUs where NUMA
> nodes shared an LLC.
> 
> This topology is no longer a quirk for Intel CPUs as Ice Lake and
> Sapphire Rapids CPUs exhibit the same topology. Rather than maintain
> the quirk list, define a synthetic flag that directs the scheduler
> to allow this topology without warning for all Intel CPUs when NUMA
> is configured.

Hurmph, I still think it's daft.

> diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
> index 816fdbec795a..027348261080 100644
> --- a/arch/x86/kernel/cpu/intel.c
> +++ b/arch/x86/kernel/cpu/intel.c
> @@ -719,6 +719,21 @@ static void init_intel(struct cpuinfo_x86 *c)
>  		tsx_disable();
>  
>  	split_lock_init();
> +
> +	/*
> +	 * Set X86_BUG_NUMA_SHARES_LLC to allow topologies where NUMA
> +	 * nodes share an LLC. In Sub-NUMA Clustering mode Intel CPUs
> +	 * may enumerate an LLC as shared by multiple NUMA nodes. The
> +	 * LLC is shared for off-package data access but private to
> +	 * the NUMA node for on-package access. This topology first
> +	 * appeared in SKYLAKE_X. It was treated as a quirk and allowed.
> +	 * This topology reappeared in ICELAKE_X and SAPPHIRERAPIDS_X.
> +	 * Rather than maintain a list of quirk CPUs, allow this topology
> +	 * on all Intel CPUs with NUMA configured. When this X86_BUG is
> +	 * set, the scheduler accepts this topology without warning.
> +	 */
> +	if (IS_ENABLED(CONFIG_NUMA))
> +		set_cpu_bug(c, X86_BUG_NUMA_SHARES_LLC);
>  }

This seems wrong too, it shouldn't be allowed pre SKX. And ideally only
be allowed when SNC is enabled.

Please make this more specific than: all Intel CPUs. Of course, since you
all knew this was an issue, you could've made it discoverable
_somewhere_ :-(


* Re: [PATCH] x86, sched: Allow NUMA nodes to share an LLC on Intel platforms
  2021-02-09 23:09 ` Luck, Tony
@ 2021-02-10  8:10   ` Peter Zijlstra
  2021-02-10 17:41     ` Dave Hansen
  0 siblings, 1 reply; 10+ messages in thread
From: Peter Zijlstra @ 2021-02-10  8:10 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Schofield, Alison, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	x86, linux-kernel, Dave Hansen, Tim Chen, H. Peter Anvin,
	David Rientjes, Igor Mammedov, Prarit Bhargava, brice.goglin

On Tue, Feb 09, 2021 at 11:09:27PM +0000, Luck, Tony wrote:
> > +#define X86_BUG_NUMA_SHARES_LLC		X86_BUG(25) /* CPU may enumerate an LLC shared by multiple NUMA nodes */
> 
> During internal review I wondered why this is a "BUG" rather than a "FEATURE" bit.
> 
> Apparently, the suggestion for "BUG" came from earlier community discussions.
> 
> Historically it may have seemed reasonable to say that a cache cannot span
> NUMA domains. But with more and more things moving off the motherboard
> and into the socket, this doesn't seem too weird now.

If you look at the details this SNC LLC span doesn't behave quite right
either.

It really isn't a regular cache, but behaves a bit like a mash-up of the
s390 book caches and a normal LLC.

Did anybody play with adding the book domain to these SNC
configurations? Can we detect SNC other than by this quirk?


* Re: [PATCH] x86, sched: Allow NUMA nodes to share an LLC on Intel platforms
  2021-02-10  8:05 ` Peter Zijlstra
@ 2021-02-10 15:22   ` Dave Hansen
  2021-02-10 19:38     ` Peter Zijlstra
  0 siblings, 1 reply; 10+ messages in thread
From: Dave Hansen @ 2021-02-10 15:22 UTC (permalink / raw)
  To: Peter Zijlstra, Alison Schofield
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86, linux-kernel,
	Dave Hansen, Tony Luck, Tim Chen, H. Peter Anvin, David Rientjes,
	Igor Mammedov, Prarit Bhargava, brice.goglin

On 2/10/21 12:05 AM, Peter Zijlstra wrote:
>> +	if (IS_ENABLED(CONFIG_NUMA))
>> +		set_cpu_bug(c, X86_BUG_NUMA_SHARES_LLC);
>>  }
> This seems wrong too, it shouldn't be allowed pre SKX. And ideally only
> be allowed when SNC is enabled.

Originally, this just added a few more models to the list of CPUs with
SNC.  I was hoping for something a bit more durable that we wouldn't
have to go back and poke at every year or two.

> Please make this more specific than: all Intel CPUs. Of course, since you
> all knew this was an issue, you could've made it discoverable
> _somewhere_ :-(

You're totally right, of course.  The hardware could enumerate SNC as a
feature explicitly somewhere.  But, that's a little silly because all of
the information that it's enumerating about the CPU caches and NUMA
nodes present is *correct*.  The secondary information would
only be for the CPU to say, "yeah, I'm really sure about that other stuff".

I think this sanity check has outlived its usefulness.


* Re: [PATCH] x86, sched: Allow NUMA nodes to share an LLC on Intel platforms
  2021-02-10  8:10   ` Peter Zijlstra
@ 2021-02-10 17:41     ` Dave Hansen
  0 siblings, 0 replies; 10+ messages in thread
From: Dave Hansen @ 2021-02-10 17:41 UTC (permalink / raw)
  To: Peter Zijlstra, Luck, Tony
  Cc: Schofield, Alison, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	x86, linux-kernel, Dave Hansen, Tim Chen, H. Peter Anvin,
	David Rientjes, Igor Mammedov, Prarit Bhargava, brice.goglin

On 2/10/21 12:10 AM, Peter Zijlstra wrote:
> On Tue, Feb 09, 2021 at 11:09:27PM +0000, Luck, Tony wrote:
>>> +#define X86_BUG_NUMA_SHARES_LLC		X86_BUG(25) /* CPU may enumerate an LLC shared by multiple NUMA nodes */
>>
>> During internal review I wondered why this is a "BUG" rather than a "FEATURE" bit.
>>
>> Apparently, the suggestion for "BUG" came from earlier community discussions.
>>
>> Historically it may have seemed reasonable to say that a cache cannot span
>> NUMA domains. But with more and more things moving off the motherboard
>> and into the socket, this doesn't seem too weird now.
> 
> If you look at the details this SNC LLC span doesn't behave quite right
> either.

Yes, the rules are weird.  I came to the conclusion that there's no
precise way to enumerate these rules with the existing CPUID-based cache
enumeration.

I can send you my powerpoint slides. ;)

> It really isn't a regular cache, but behaves a bit like a mash-up of the
> s390 book caches and a normal LLC.
> 
> Did anybody play with adding the book domain to these SNC
> configurations?

Nope.  Probably mostly because we don't have a great way of generating it.

For those playing along at home, I think Peter is talking about this:

static struct sched_domain_topology_level s390_topology[] = {
        { cpu_thread_mask, cpu_smt_flags, SD_INIT_NAME(SMT) },
        { cpu_coregroup_mask, cpu_core_flags, SD_INIT_NAME(MC) },
        { cpu_book_mask, SD_INIT_NAME(BOOK) },
        { cpu_drawer_mask, SD_INIT_NAME(DRAWER) },
        { cpu_cpu_mask, SD_INIT_NAME(DIE) },
        { NULL, },
};

From arch/s390/kernel/topology.c
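A book-like level for SNC would amount to an extra mask in an x86 equivalent of that table. Purely as a sketch (cpu_snc_mask and the SNC entry are hypothetical; no such table exists upstream):

```c
/* Hypothetical: an x86 topology table with a book-like SNC level
 * between MC and DIE. cpu_snc_mask does not exist upstream; it would
 * have to return the per-NUMA-node CPU mask. */
static struct sched_domain_topology_level x86_snc_topology[] = {
        { cpu_smt_mask,       cpu_smt_flags,  SD_INIT_NAME(SMT) },
        { cpu_coregroup_mask, cpu_core_flags, SD_INIT_NAME(MC) },
        { cpu_snc_mask,                       SD_INIT_NAME(SNC) },  /* book-like */
        { cpu_cpu_mask,                       SD_INIT_NAME(DIE) },
        { NULL, },
};
```

cpu_snc_mask would presumably group cores by node, mirroring how s390's cpu_book_mask groups cores that share book-level resources.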

> Can we detect SNC other than by this quirk?

I'm sure there's _a_ way, but nothing that's architectural.  The kernel
has literally been given all the information about the topology that it
needs from the CPU and the firmware.  The problem is that that
information resembles garbage that the kernel has been presented with in
the past.

I guess you're saying that it would be nice to have some other bit of
info that the kernel can use to boost its confidence that the
hardware/bios are being sane.


* Re: [PATCH] x86, sched: Allow NUMA nodes to share an LLC on Intel platforms
  2021-02-10 15:22   ` Dave Hansen
@ 2021-02-10 19:38     ` Peter Zijlstra
  2021-02-10 22:11       ` Alison Schofield
  0 siblings, 1 reply; 10+ messages in thread
From: Peter Zijlstra @ 2021-02-10 19:38 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Alison Schofield, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	x86, linux-kernel, Dave Hansen, Tony Luck, Tim Chen,
	H. Peter Anvin, David Rientjes, Igor Mammedov, Prarit Bhargava,
	brice.goglin

On Wed, Feb 10, 2021 at 07:22:03AM -0800, Dave Hansen wrote:
> On 2/10/21 12:05 AM, Peter Zijlstra wrote:
> >> +	if (IS_ENABLED(CONFIG_NUMA))
> >> +		set_cpu_bug(c, X86_BUG_NUMA_SHARES_LLC);
> >>  }
> > This seems wrong too, it shouldn't be allowed pre SKX. And ideally only
> > be allowed when SNC is enabled.
> 
> Originally, this just added a few more models to the list of CPUs with
> SNC.  I was hoping for something a bit more durable that we wouldn't
> have to go back and poke at every year or two.

It's not like we don't have to update a gazillion FMS tables for each
new instance anyway :-(

> > Please make this more specific than: all Intel CPUs. Of course, since you
> > all knew this was an issue, you could've made it discoverable
> > _somewhere_ :-(
> 
> You're totally right, of course.  The hardware could enumerate SNC as a
> feature explicitly somewhere.  But, that's a little silly because all of
> the information that it's enumerating about the CPU caches and NUMA
> nodes present is *correct*.  The secondary information would
> only be for the CPU to say, "yeah, I'm really sure about that other stuff".
> 
> I think this sanity check has outlived its usefulness.

Maybe BIOS monkeys got better, but I'm not sure I trust it all.

So SNC is all on-package, do all those nodes have the same pkg id? That
is, I'm trying to find something to restrict topological madness.


diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 88cd0064d1f8..de1010dd0bba 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -458,6 +458,26 @@ static bool match_smt(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
 	return false;
 }
 
+static bool match_die(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
+{
+	if ((c->phys_proc_id == o->phys_proc_id) &&
+		(c->cpu_die_id == o->cpu_die_id))
+		return true;
+	return false;
+}
+
+/*
+ * Unlike the other levels, we do not enforce keeping a
+ * multicore group inside a NUMA node.  If this happens, we will
+ * discard the MC level of the topology later.
+ */
+static bool match_pkg(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
+{
+	if (c->phys_proc_id == o->phys_proc_id)
+		return true;
+	return false;
+}
+
 /*
  * Define snc_cpu[] for SNC (Sub-NUMA Cluster) CPUs.
  *
@@ -495,33 +515,12 @@ static bool match_llc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
 	 * means 'c' does not share the LLC of 'o'. This will be
 	 * reflected to userspace.
 	 */
-	if (!topology_same_node(c, o) && x86_match_cpu(snc_cpu))
+	if (!topology_same_node(c, o) && x86_match_cpu(snc_cpu) && match_pkg(c, o))
 		return false;
 
 	return topology_sane(c, o, "llc");
 }
 
-/*
- * Unlike the other levels, we do not enforce keeping a
- * multicore group inside a NUMA node.  If this happens, we will
- * discard the MC level of the topology later.
- */
-static bool match_pkg(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
-{
-	if (c->phys_proc_id == o->phys_proc_id)
-		return true;
-	return false;
-}
-
-static bool match_die(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
-{
-	if ((c->phys_proc_id == o->phys_proc_id) &&
-		(c->cpu_die_id == o->cpu_die_id))
-		return true;
-	return false;
-}
-
-
 #if defined(CONFIG_SCHED_SMT) || defined(CONFIG_SCHED_MC)
 static inline int x86_sched_itmt_flags(void)
 {


* Re: [PATCH] x86, sched: Allow NUMA nodes to share an LLC on Intel platforms
  2021-02-10 19:38     ` Peter Zijlstra
@ 2021-02-10 22:11       ` Alison Schofield
  2021-02-16 11:29         ` Peter Zijlstra
  0 siblings, 1 reply; 10+ messages in thread
From: Alison Schofield @ 2021-02-10 22:11 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Dave Hansen, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	linux-kernel, Dave Hansen, Tony Luck, Tim Chen, H. Peter Anvin,
	David Rientjes, Igor Mammedov, Prarit Bhargava, brice.goglin

On Wed, Feb 10, 2021 at 08:38:42PM +0100, Peter Zijlstra wrote:
> On Wed, Feb 10, 2021 at 07:22:03AM -0800, Dave Hansen wrote:
> > On 2/10/21 12:05 AM, Peter Zijlstra wrote:
> > >> +	if (IS_ENABLED(CONFIG_NUMA))
> > >> +		set_cpu_bug(c, X86_BUG_NUMA_SHARES_LLC);
> > >>  }
> > > This seems wrong too, it shouldn't be allowed pre SKX. And ideally only
> > > be allowed when SNC is enabled.
> > 
> > Originally, this just added a few more models to the list of CPUs with
> > SNC.  I was hoping for something a bit more durable that we wouldn't
> > have to go back and poke at every year or two.
> 
> It's not like we don't have to update a gazillion FMS tables for each
> new instance anyway :-(
> 
> > > Please make this more specific than: all Intel CPUs. Of course, since you
> > > all knew this was an issue, you could've made it discoverable
> > > _somewhere_ :-(
> > 
> > You're totally right, of course.  The hardware could enumerate SNC as a
> > feature explicitly somewhere.  But, that's a little silly because all of
> > the information that it's enumerating about the CPU caches and NUMA
> > nodes present is *correct*.  The secondary information would
> > only be for the CPU to say, "yeah, I'm really sure about that other stuff".
> > 
> > I think this sanity check has outlived its usefulness.
> 
> Maybe BIOS monkeys got better, but I'm not sure I trust it all.
> 
> So SNC is all on-package, do all those nodes have the same pkg id? That
> is, I'm trying to find something to restrict topological madness.
> 
> 
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index 88cd0064d1f8..de1010dd0bba 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -458,6 +458,26 @@ static bool match_smt(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
>  	return false;
>  }
>  
> +static bool match_die(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
> +{
> +	if ((c->phys_proc_id == o->phys_proc_id) &&
> +		(c->cpu_die_id == o->cpu_die_id))
> +		return true;
> +	return false;
> +}
> +
> +/*
> + * Unlike the other levels, we do not enforce keeping a
> + * multicore group inside a NUMA node.  If this happens, we will
> + * discard the MC level of the topology later.
> + */
> +static bool match_pkg(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
> +{
> +	if (c->phys_proc_id == o->phys_proc_id)
> +		return true;
> +	return false;
> +}
> +
>  /*
>   * Define snc_cpu[] for SNC (Sub-NUMA Cluster) CPUs.
>   *
> @@ -495,33 +515,12 @@ static bool match_llc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
>  	 * means 'c' does not share the LLC of 'o'. This will be
>  	 * reflected to userspace.
>  	 */
> -	if (!topology_same_node(c, o) && x86_match_cpu(snc_cpu))
> +	if (!topology_same_node(c, o) && x86_match_cpu(snc_cpu) && match_pkg(c, o))
>  		return false;
>  
>  	return topology_sane(c, o, "llc");
>  }
>  

This is equivalent to determining if x86_has_numa_in_package.
Do you think there is an opportunity to set x86_has_numa_in_package
earlier, and use it here and in set_cpu_sibling_map()?

With that additional info (match_pkg()) how about -

Instead of this:
-       if (!topology_same_node(c, o) && x86_match_cpu(snc_cpu))
+       if (!topology_same_node(c, o) && x86_match_cpu(snc_cpu) && match_pkg(c, o))

Do this:

-       if (!topology_same_node(c, o) && x86_match_cpu(snc_cpu))
+       if (!topology_same_node(c, o) && match_pkg(c, o))


Looking at Commit 316ad248307f ("sched/x86: Rewrite set_cpu_sibling_map()")
which reworked topology WARNINGs, the intent was to "make sure to
only warn when the check changes the end result"

This check doesn't change the end result. It returns false directly
and if it were bypassed completely, it would still return false with
a WARNING.

If we add that additional match_pkg() check, is removing the WARNING for
all cases possible?


-snip


* Re: [PATCH] x86, sched: Allow NUMA nodes to share an LLC on Intel platforms
  2021-02-10 22:11       ` Alison Schofield
@ 2021-02-16 11:29         ` Peter Zijlstra
  2021-02-16 19:53           ` Alison Schofield
  0 siblings, 1 reply; 10+ messages in thread
From: Peter Zijlstra @ 2021-02-16 11:29 UTC (permalink / raw)
  To: Alison Schofield
  Cc: Dave Hansen, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	linux-kernel, Dave Hansen, Tony Luck, Tim Chen, H. Peter Anvin,
	David Rientjes, Igor Mammedov, Prarit Bhargava, brice.goglin

On Wed, Feb 10, 2021 at 02:11:34PM -0800, Alison Schofield wrote:

> This is equivalent to determining if x86_has_numa_in_package.
> Do you think there is an opportunity to set x86_has_numa_in_package
> earlier, and use it here and in set_cpu_sibling_map()?

Sure. Not sure that's actually clearer though.

> With that additional info (match_pkg()) how about -
> 
> Instead of this:
> -       if (!topology_same_node(c, o) && x86_match_cpu(snc_cpu))
> +       if (!topology_same_node(c, o) && x86_match_cpu(snc_cpu) && match_pkg(c, o))
> 
> Do this:
> 
> -       if (!topology_same_node(c, o) && x86_match_cpu(snc_cpu))
> +       if (!topology_same_node(c, o) && match_pkg(c, o))
> 
> 
> Looking at Commit 316ad248307f ("sched/x86: Rewrite set_cpu_sibling_map()")
> which reworked topology WARNINGs, the intent was to "make sure to
> only warn when the check changes the end result"
> 
> This check doesn't change the end result. It returns false directly
> and if it were bypassed completely, it would still return false with
> a WARNING.

I'm not following the argument, we explicitly *do* modify the end result
for those SNC caches. Also, by doing what you propose, we fail to get a
warning if/when AMD decides to do 'funny' things.

Suppose AMD too thinks this is a swell idea, but they have subtly
different cache behaviour (just for giggles), then it all goes
undetected, which would be bad.

> If we add that additional match_pkg() check, is removing the WARNING for
> all cases possible?

How many parts had that Intel Cluster-on-Die thing? Seeing how all the
new parts have the SNC crud, that seems like a finite list.

Wikipedia seems to suggest haswell and broadwell were the only ones
with COD on, skylake and later have the SNC.

How's something like this then (needs splitting into multiple patches I
suppose):

---
 arch/x86/kernel/smpboot.c | 76 +++++++++++++++++++++++++----------------------
 1 file changed, 41 insertions(+), 35 deletions(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 117e24fbfd8a..cfe23badf9a3 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -458,8 +458,31 @@ static bool match_smt(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
 	return false;
 }
 
+static bool match_die(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
+{
+	if ((c->phys_proc_id == o->phys_proc_id) &&
+		(c->cpu_die_id == o->cpu_die_id))
+		return true;
+	return false;
+}
+
+/*
+ * Unlike the other levels, we do not enforce keeping a
+ * multicore group inside a NUMA node.  If this happens, we will
+ * discard the MC level of the topology later.
+ */
+static bool match_pkg(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
+{
+	if (c->phys_proc_id == o->phys_proc_id)
+		return true;
+	return false;
+}
+
 /*
- * Define snc_cpu[] for SNC (Sub-NUMA Cluster) CPUs.
+ * Define intel_cod_cpu[] for Intel COD (Cluster-on-Die) CPUs.
+ *
+ * Any Intel CPU that has multiple nodes per package and doesn't match this
+ * will have the newer SNC (Sub-NUMA Cluster).
  *
  * These are Intel CPUs that enumerate an LLC that is shared by
  * multiple NUMA nodes. The LLC on these systems is shared for
@@ -473,14 +496,18 @@ static bool match_smt(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
  * NUMA nodes).
  */
 
-static const struct x86_cpu_id snc_cpu[] = {
-	X86_MATCH_INTEL_FAM6_MODEL(SKYLAKE_X, NULL),
+static const struct x86_cpu_id intel_cod_cpu[] = {
+	X86_MATCH_INTEL_FAM6_MODEL(HASWELL_X, 0),	/* COD */
+	X86_MATCH_INTEL_FAM6_MODEL(BROADWELL_X, 0),	/* COD */
+	X86_MATCH_INTEL_FAM6_MODEL(ANY, 1),		/* SNC */
 	{}
 };
 
 static bool match_llc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
 {
+	const struct x86_cpu_id *id = x86_match_cpu(intel_cod_cpu);
 	int cpu1 = c->cpu_index, cpu2 = o->cpu_index;
+	bool intel_snc = id && id->driver_data;
 
 	/* Do not match if we do not have a valid APICID for cpu: */
 	if (per_cpu(cpu_llc_id, cpu1) == BAD_APICID)
@@ -495,32 +522,12 @@ static bool match_llc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
 	 * means 'c' does not share the LLC of 'o'. This will be
 	 * reflected to userspace.
 	 */
-	if (!topology_same_node(c, o) && x86_match_cpu(snc_cpu))
+	if (match_pkg(c, o) && !topology_same_node(c, o) && intel_snc)
 		return false;
 
 	return topology_sane(c, o, "llc");
 }
 
-/*
- * Unlike the other levels, we do not enforce keeping a
- * multicore group inside a NUMA node.  If this happens, we will
- * discard the MC level of the topology later.
- */
-static bool match_pkg(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
-{
-	if (c->phys_proc_id == o->phys_proc_id)
-		return true;
-	return false;
-}
-
-static bool match_die(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
-{
-	if ((c->phys_proc_id == o->phys_proc_id) &&
-		(c->cpu_die_id == o->cpu_die_id))
-		return true;
-	return false;
-}
-
 
 #if defined(CONFIG_SCHED_SMT) || defined(CONFIG_SCHED_MC)
 static inline int x86_sched_itmt_flags(void)
@@ -592,14 +599,23 @@ void set_cpu_sibling_map(int cpu)
 	for_each_cpu(i, cpu_sibling_setup_mask) {
 		o = &cpu_data(i);
 
+		if (match_pkg(c, o) && !topology_same_node(c, o))
+			x86_has_numa_in_package = true;
+
 		if ((i == cpu) || (has_smt && match_smt(c, o)))
 			link_mask(topology_sibling_cpumask, cpu, i);
 
 		if ((i == cpu) || (has_mp && match_llc(c, o)))
 			link_mask(cpu_llc_shared_mask, cpu, i);
 
+		if ((i == cpu) || (has_mp && match_die(c, o)))
+			link_mask(topology_die_cpumask, cpu, i);
 	}
 
+	threads = cpumask_weight(topology_sibling_cpumask(cpu));
+	if (threads > __max_smt_threads)
+		__max_smt_threads = threads;
+
 	/*
 	 * This needs a separate iteration over the cpus because we rely on all
 	 * topology_sibling_cpumask links to be set-up.
@@ -613,8 +629,7 @@ void set_cpu_sibling_map(int cpu)
 			/*
 			 *  Does this new cpu bringup a new core?
 			 */
-			if (cpumask_weight(
-			    topology_sibling_cpumask(cpu)) == 1) {
+			if (threads == 1) {
 				/*
 				 * for each core in package, increment
 				 * the booted_cores for this new cpu
@@ -631,16 +646,7 @@ void set_cpu_sibling_map(int cpu)
 			} else if (i != cpu && !c->booted_cores)
 				c->booted_cores = cpu_data(i).booted_cores;
 		}
-		if (match_pkg(c, o) && !topology_same_node(c, o))
-			x86_has_numa_in_package = true;
-
-		if ((i == cpu) || (has_mp && match_die(c, o)))
-			link_mask(topology_die_cpumask, cpu, i);
 	}
-
-	threads = cpumask_weight(topology_sibling_cpumask(cpu));
-	if (threads > __max_smt_threads)
-		__max_smt_threads = threads;
 }
 
 /* maps the cpu to the sched domain representing multi-core */


* Re: [PATCH] x86, sched: Allow NUMA nodes to share an LLC on Intel platforms
  2021-02-16 11:29         ` Peter Zijlstra
@ 2021-02-16 19:53           ` Alison Schofield
  0 siblings, 0 replies; 10+ messages in thread
From: Alison Schofield @ 2021-02-16 19:53 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Dave Hansen, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	linux-kernel, Dave Hansen, Tony Luck, Tim Chen, H. Peter Anvin,
	David Rientjes, Igor Mammedov, Prarit Bhargava, brice.goglin

On Tue, Feb 16, 2021 at 12:29:02PM +0100, Peter Zijlstra wrote:
> On Wed, Feb 10, 2021 at 02:11:34PM -0800, Alison Schofield wrote:
> 
> > This is equivalent to determining if x86_has_numa_in_package.
> > Do you think there is an opportunity to set x86_has_numa_in_package
> > earlier, and use it here and in set_cpu_sibling_map()?
> 
> Sure. Not sure that's actually clearer though.
> 
> > With that additional info (match_pkg()) how about -
> > 
> > Instead of this:
> > -       if (!topology_same_node(c, o) && x86_match_cpu(snc_cpu))
> > +       if (!topology_same_node(c, o) && x86_match_cpu(snc_cpu) && match_pkg(c, o))
> > 
> > Do this:
> > 
> > -       if (!topology_same_node(c, o) && x86_match_cpu(snc_cpu))
> > +       if (!topology_same_node(c, o) && match_pkg(c, o))
> > 
> > 
> > Looking at Commit 316ad248307f ("sched/x86: Rewrite set_cpu_sibling_map()")
> > which reworked topology WARNINGs, the intent was to "make sure to
> > only warn when the check changes the end result"
> > 
> > This check doesn't change the end result. It returns false directly
> > and if it were bypassed completely, it would still return false with
> > a WARNING.
> 
> I'm not following the argument, we explicitly *do* modify the end result
> for those SNC caches. Also, by doing what you propose, we fail to get a
> warning if/when AMD decides to do 'funny' things.

This might be beating a dead horse (me==dead horse) but how do we
modify the end result? That is, I see false returned either way.
Am I missing another code path that considers this info?

> 
> Suppose AMD too thinks this is a swell idea, but they have subtly
> different cache behaviour (just for giggles), then it all goes
> undetected, which would be bad.
>
Understood.

> > If we add that additional match_pkg() check, does that remove the
> > WARNING for all possible cases?
> 
> How many parts had that Intel Cluster-on-Die thing? Seeing how all the
> new parts have the SNC crud, that seems like a finite list.
> 
> Wikipedia seems to suggest Haswell and Broadwell were the only ones
> with COD on; Skylake and later have the SNC.
> 
> How's something like this then (needs splitting into multiple patches I
> suppose):
> 

This is interesting. Flipping the check to COD instead of SNC would mean
we don't have to revisit this for future SNC CPUs. I don't think it's
worth the risk now (more code changes, more platforms to test).


The list exists. We can add to the list and it makes us think about the
SNC topologies as each new CPU is announced. That is a good nuisance.

Thanks for showing how the COD case would work.


> ---
>  arch/x86/kernel/smpboot.c | 76 +++++++++++++++++++++++++----------------------
>  1 file changed, 41 insertions(+), 35 deletions(-)
> 
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index 117e24fbfd8a..cfe23badf9a3 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -458,8 +458,31 @@ static bool match_smt(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
>  	return false;
>  }
>  
> +static bool match_die(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
> +{
> +	if ((c->phys_proc_id == o->phys_proc_id) &&
> +		(c->cpu_die_id == o->cpu_die_id))
> +		return true;
> +	return false;
> +}
> +
> +/*
> + * Unlike the other levels, we do not enforce keeping a
> + * multicore group inside a NUMA node.  If this happens, we will
> + * discard the MC level of the topology later.
> + */
> +static bool match_pkg(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
> +{
> +	if (c->phys_proc_id == o->phys_proc_id)
> +		return true;
> +	return false;
> +}
> +
>  /*
> - * Define snc_cpu[] for SNC (Sub-NUMA Cluster) CPUs.
> + * Define intel_cod_cpu[] for Intel COD (Cluster-on-Die) CPUs.
> + *
> + * Any Intel CPU that has multiple nodes per package and doesn't match this
> + * will have the newer SNC (Sub-NUMA Cluster).
>   *
>   * These are Intel CPUs that enumerate an LLC that is shared by
>   * multiple NUMA nodes. The LLC on these systems is shared for
> @@ -473,14 +496,18 @@ static bool match_smt(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
>   * NUMA nodes).
>   */
>  
> -static const struct x86_cpu_id snc_cpu[] = {
> -	X86_MATCH_INTEL_FAM6_MODEL(SKYLAKE_X, NULL),
> +static const struct x86_cpu_id intel_cod_cpu[] = {
> +	X86_MATCH_INTEL_FAM6_MODEL(HASWELL_X, 0),	/* COD */
> +	X86_MATCH_INTEL_FAM6_MODEL(BROADWELL_X, 0),	/* COD */
> +	X86_MATCH_INTEL_FAM6_MODEL(ANY, 1),		/* SNC */
>  	{}
>  };
>  
>  static bool match_llc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
>  {
> +	const struct x86_cpu_id *id = x86_match_cpu(intel_cod_cpu);
>  	int cpu1 = c->cpu_index, cpu2 = o->cpu_index;
> +	bool intel_snc = id && id->driver_data;
>  
>  	/* Do not match if we do not have a valid APICID for cpu: */
>  	if (per_cpu(cpu_llc_id, cpu1) == BAD_APICID)
> @@ -495,32 +522,12 @@ static bool match_llc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
>  	 * means 'c' does not share the LLC of 'o'. This will be
>  	 * reflected to userspace.
>  	 */
> -	if (!topology_same_node(c, o) && x86_match_cpu(snc_cpu))
> +	if (match_pkg(c, o) && !topology_same_node(c, o) && intel_snc)
>  		return false;
>  
>  	return topology_sane(c, o, "llc");
>  }
>  
> -/*
> - * Unlike the other levels, we do not enforce keeping a
> - * multicore group inside a NUMA node.  If this happens, we will
> - * discard the MC level of the topology later.
> - */
> -static bool match_pkg(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
> -{
> -	if (c->phys_proc_id == o->phys_proc_id)
> -		return true;
> -	return false;
> -}
> -
> -static bool match_die(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
> -{
> -	if ((c->phys_proc_id == o->phys_proc_id) &&
> -		(c->cpu_die_id == o->cpu_die_id))
> -		return true;
> -	return false;
> -}
> -
>  
>  #if defined(CONFIG_SCHED_SMT) || defined(CONFIG_SCHED_MC)
>  static inline int x86_sched_itmt_flags(void)
> @@ -592,14 +599,23 @@ void set_cpu_sibling_map(int cpu)
>  	for_each_cpu(i, cpu_sibling_setup_mask) {
>  		o = &cpu_data(i);
>  
> +		if (match_pkg(c, o) && !topology_same_node(c, o))
> +			x86_has_numa_in_package = true;
> +
>  		if ((i == cpu) || (has_smt && match_smt(c, o)))
>  			link_mask(topology_sibling_cpumask, cpu, i);
>  
>  		if ((i == cpu) || (has_mp && match_llc(c, o)))
>  			link_mask(cpu_llc_shared_mask, cpu, i);
>  
> +		if ((i == cpu) || (has_mp && match_die(c, o)))
> +			link_mask(topology_die_cpumask, cpu, i);
>  	}
>  
> +	threads = cpumask_weight(topology_sibling_cpumask(cpu));
> +	if (threads > __max_smt_threads)
> +		__max_smt_threads = threads;
> +
>  	/*
>  	 * This needs a separate iteration over the cpus because we rely on all
>  	 * topology_sibling_cpumask links to be set-up.
> @@ -613,8 +629,7 @@ void set_cpu_sibling_map(int cpu)
>  			/*
>  			 *  Does this new cpu bringup a new core?
>  			 */
> -			if (cpumask_weight(
> -			    topology_sibling_cpumask(cpu)) == 1) {
> +			if (threads == 1) {
>  				/*
>  				 * for each core in package, increment
>  				 * the booted_cores for this new cpu
> @@ -631,16 +646,7 @@ void set_cpu_sibling_map(int cpu)
>  			} else if (i != cpu && !c->booted_cores)
>  				c->booted_cores = cpu_data(i).booted_cores;
>  		}
> -		if (match_pkg(c, o) && !topology_same_node(c, o))
> -			x86_has_numa_in_package = true;
> -
> -		if ((i == cpu) || (has_mp && match_die(c, o)))
> -			link_mask(topology_die_cpumask, cpu, i);
>  	}
> -
> -	threads = cpumask_weight(topology_sibling_cpumask(cpu));
> -	if (threads > __max_smt_threads)
> -		__max_smt_threads = threads;
>  }
>  
>  /* maps the cpu to the sched domain representing multi-core */


Thread overview: 10+ messages
2021-02-09 22:39 [PATCH] x86, sched: Allow NUMA nodes to share an LLC on Intel platforms Alison Schofield
2021-02-09 23:09 ` Luck, Tony
2021-02-10  8:10   ` Peter Zijlstra
2021-02-10 17:41     ` Dave Hansen
2021-02-10  8:05 ` Peter Zijlstra
2021-02-10 15:22   ` Dave Hansen
2021-02-10 19:38     ` Peter Zijlstra
2021-02-10 22:11       ` Alison Schofield
2021-02-16 11:29         ` Peter Zijlstra
2021-02-16 19:53           ` Alison Schofield
