* [PATCH v2] x86,sched: allow topologies where NUMA nodes share an LLC
@ 2018-03-22 20:49 Alison Schofield
  2018-03-22 23:20 ` Peter Zijlstra
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Alison Schofield @ 2018-03-22 20:49 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar
  Cc: Dave Hansen, Luck, Tony, Tim Chen, H. Peter Anvin,
	Borislav Petkov, Peter Zijlstra, David Rientjes, Igor Mammedov,
	Prarit Bhargava, brice.goglin, x86, linux-kernel

From: Alison Schofield <alison.schofield@intel.com>

Intel's Skylake Server CPUs have a different LLC topology than previous
generations. When in Sub-NUMA-Clustering (SNC) mode, the package is
divided into two "slices", each containing half the cores, half the LLC,
and one memory controller, and each slice is enumerated to Linux as a
NUMA node. This is similar to how the cores and LLC were arranged
for the Cluster-On-Die (CoD) feature.

CoD allowed the same cache line to be present in each half of the LLC.
But, with SNC, each line is only ever present in *one* slice. This
means that the portion of the LLC *available* to a CPU depends on the
data being accessed:

    Remote socket: entire package LLC is shared
    Local socket->local slice: data goes into local slice LLC
    Local socket->remote slice: data goes into remote-slice LLC. Slightly
                                higher latency than local slice LLC.

The biggest implication of this is that a process accessing all
NUMA-local memory sees only half the LLC capacity.

The CPU describes its cache hierarchy with the CPUID instruction. One
of the CPUID leaves enumerates the "logical processors sharing this
cache". This information is used for scheduling decisions so that tasks
move more freely between CPUs sharing the cache.

But, the CPUID for the SNC configuration discussed above enumerates
the LLC as being shared by the entire package. This is not 100%
precise because the entire cache is not usable by all accesses. But,
it *is* the way the hardware enumerates itself, and this is not likely
to change.
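
For reference, here is a minimal userspace sketch of reading that CPUID
data (leaf 0x4, where EAX[25:14] holds the maximum number of addressable
logical processor IDs sharing the cache, minus one). It is illustrative
only and uses the __cpuid_count() macro from GCC's <cpuid.h>:

	#include <stdio.h>
	#include <cpuid.h>

	int main(void)
	{
		unsigned int eax, ebx, ecx, edx, subleaf;

		for (subleaf = 0; ; subleaf++) {
			__cpuid_count(0x4, subleaf, eax, ebx, ecx, edx);
			if ((eax & 0x1f) == 0)	/* cache type 0: no more caches */
				break;
			/* EAX[7:5] = cache level, EAX[25:14] + 1 = sharing CPUs */
			printf("L%u cache: shared by up to %u logical CPUs\n",
			       (eax >> 5) & 0x7, ((eax >> 14) & 0xfff) + 1);
		}
		return 0;
	}

On the SNC systems described here, the LLC entry reports a sharing count
that spans the whole package.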

This breaks the topology_sane() check in the smpboot.c code because
this topology is considered not-sane. To fix this, add a vendor and
model specific check to never call topology_sane() for these systems.
Also, just like "Cluster-on-Die", we throw out the "coregroup"
sched_domain_topology_level and use NUMA information from the SRAT
alone.

This is OK at least on the hardware we are immediately concerned about
because the LLC sharing happens at both the slice and at the package
level, which are also NUMA boundaries.

This patch eliminates a warning that looks like this:

	sched: CPU #3's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.

Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Luck, Tony <tony.luck@intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: brice.goglin@gmail.com
Cc: Ingo Molnar <mingo@kernel.org>
---

Changes in v2:

 * Add vendor check (Intel) where we only had a model check (Skylake_X).
   Considered the suggestion of adding a new flag here but thought that
   to be overkill for this usage.

 * Return false, instead of true, from match_llc() per reviewer suggestion.
   That also cleaned up a topology broken bug message in sched_domain().

 * Updated the comments around the match_llc() return value, continuing to
   note that the return value doesn't actually matter because we are throwing
   away coregroups for scheduling.

 * Changed submitter. I'm following up on this patch originally submitted
   by Dave Hansen.


 arch/x86/kernel/smpboot.c | 52 ++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 42 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index ff99e2b..cffd181 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -77,6 +77,7 @@
 #include <asm/i8259.h>
 #include <asm/misc.h>
 #include <asm/qspinlock.h>
+#include <asm/intel-family.h>
 
 /* Number of siblings per CPU package */
 int smp_num_siblings = 1;
@@ -390,15 +391,52 @@ static bool match_smt(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
 	return false;
 }
 
+/*
+ * Set if a package/die has multiple NUMA nodes inside.
+ * AMD Magny-Cours, Intel Cluster-on-Die, and Intel
+ * Sub-NUMA Clustering have this.
+ */
+static bool x86_has_numa_in_package;
+
 static bool match_llc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
 {
 	int cpu1 = c->cpu_index, cpu2 = o->cpu_index;
 
-	if (per_cpu(cpu_llc_id, cpu1) != BAD_APICID &&
-	    per_cpu(cpu_llc_id, cpu1) == per_cpu(cpu_llc_id, cpu2))
-		return topology_sane(c, o, "llc");
+	/* Do not match if we do not have a valid APICID for cpu: */
+	if (per_cpu(cpu_llc_id, cpu1) == BAD_APICID)
+		return false;
 
-	return false;
+	/* Do not match if LLC id does not match: */
+	if (per_cpu(cpu_llc_id, cpu1) != per_cpu(cpu_llc_id, cpu2))
+		return false;
+
+	/*
+	 * Some Intel CPUs enumerate an LLC that is shared by
+	 * multiple NUMA nodes.  The LLC on these systems is
+	 * shared for off-package data access but private to the
+	 * NUMA node (half of the package) for on-package access.
+	 *
+	 * CPUID can only enumerate the cache as being shared *or*
+	 * unshared, but not this particular configuration.  The
+	 * CPU in this case enumerates the cache to be shared
+	 * across the entire package (spanning both NUMA nodes).
+	 */
+	if (!topology_same_node(c, o) &&
+	    (c->x86_vendor == X86_VENDOR_INTEL &&
+	     c->x86_model == INTEL_FAM6_SKYLAKE_X)) {
+		/* Use NUMA instead of coregroups for scheduling: */
+		x86_has_numa_in_package = true;
+
+		/*
+		 * Return value doesn't actually matter because we
+		 * are throwing away coregroups for scheduling anyway.
+		 * Return false to bypass topology broken bug messages
+		 * and fixups in sched_domain().
+		 */
+		return false;
+	}
+
+	return topology_sane(c, o, "llc");
 }
 
 /*
@@ -454,12 +492,6 @@ static struct sched_domain_topology_level x86_topology[] = {
 	{ NULL, },
 };
 
-/*
- * Set if a package/die has multiple NUMA nodes inside.
- * AMD Magny-Cours and Intel Cluster-on-Die have this.
- */
-static bool x86_has_numa_in_package;
-
 void set_cpu_sibling_map(int cpu)
 {
 	bool has_smt = smp_num_siblings > 1;
-- 
2.7.4

* Re: [PATCH v2] x86,sched: allow topologies where NUMA nodes share an LLC
  2018-03-22 20:49 [PATCH v2] x86,sched: allow topologies where NUMA nodes share an LLC Alison Schofield
@ 2018-03-22 23:20 ` Peter Zijlstra
  2018-03-22 23:55   ` Dave Hansen
  2018-03-22 23:30 ` Luck, Tony
  2018-03-23  0:42 ` Tim Chen
  2 siblings, 1 reply; 8+ messages in thread
From: Peter Zijlstra @ 2018-03-22 23:20 UTC (permalink / raw)
  To: Alison Schofield
  Cc: Thomas Gleixner, Ingo Molnar, Dave Hansen, Luck, Tony, Tim Chen,
	H. Peter Anvin, Borislav Petkov, David Rientjes, Igor Mammedov,
	Prarit Bhargava, brice.goglin, x86, linux-kernel

On Thu, Mar 22, 2018 at 01:49:22PM -0700, Alison Schofield wrote:
> +	/*
> +	 * Some Intel CPUs enumerate an LLC that is shared by
> +	 * multiple NUMA nodes.  The LLC on these systems is
> +	 * shared for off-package data access but private to the
> +	 * NUMA node (half of the package) for on-package access.
> +	 *
> +	 * CPUID can only enumerate the cache as being shared *or*
> +	 * unshared, but not this particular configuration.  The
> +	 * CPU in this case enumerates the cache to be shared
> +	 * across the entire package (spanning both NUMA nodes).
> +	 */
> +	if (!topology_same_node(c, o) &&
> +	    (c->x86_vendor == X86_VENDOR_INTEL &&
> +	     c->x86_model == INTEL_FAM6_SKYLAKE_X)) {
> +		/* Use NUMA instead of coregroups for scheduling: */
> +		x86_has_numa_in_package = true;
> +
> +		/*
> +		 * Return value doesn't actually matter because we
> +		 * are throwing away coregroups for scheduling anyway.
> +		 * Return false to bypass topology broken bug messages
> +		 * and fixups in sched_domain().
> +		 */
> +		return false;

IIRC that return value _does_ matter because the resulting mask still
ends up user visible in sysfs.

IIRC I went over this with dhansen a week or so ago, but I cannot now
recall what we settled on as being the right return value and for what
reason.

* Re: [PATCH v2] x86,sched: allow topologies where NUMA nodes share an LLC
  2018-03-22 20:49 [PATCH v2] x86,sched: allow topologies where NUMA nodes share an LLC Alison Schofield
  2018-03-22 23:20 ` Peter Zijlstra
@ 2018-03-22 23:30 ` Luck, Tony
  2018-03-26 22:19   ` Alison Schofield
  2018-03-23  0:42 ` Tim Chen
  2 siblings, 1 reply; 8+ messages in thread
From: Luck, Tony @ 2018-03-22 23:30 UTC (permalink / raw)
  To: Alison Schofield
  Cc: Thomas Gleixner, Ingo Molnar, Dave Hansen, Luck, Tim Chen,
	H. Peter Anvin, Borislav Petkov, Peter Zijlstra, David Rientjes,
	Igor Mammedov, Prarit Bhargava, brice.goglin, x86, linux-kernel

On Thu, Mar 22, 2018 at 01:49:22PM -0700, Alison Schofield wrote:
> +	if (!topology_same_node(c, o) &&
> +	    (c->x86_vendor == X86_VENDOR_INTEL &&
> +	     c->x86_model == INTEL_FAM6_SKYLAKE_X)) {

Maybe make life easier in the future to add more models
to the list by using x86_match_cpu() here?

-Tony

* Re: [PATCH v2] x86,sched: allow topologies where NUMA nodes share an LLC
  2018-03-22 23:20 ` Peter Zijlstra
@ 2018-03-22 23:55   ` Dave Hansen
  0 siblings, 0 replies; 8+ messages in thread
From: Dave Hansen @ 2018-03-22 23:55 UTC (permalink / raw)
  To: Peter Zijlstra, Alison Schofield
  Cc: Thomas Gleixner, Ingo Molnar, Luck, Tony, Tim Chen,
	H. Peter Anvin, Borislav Petkov, David Rientjes, Igor Mammedov,
	Prarit Bhargava, brice.goglin, x86, linux-kernel

On 03/22/2018 04:20 PM, Peter Zijlstra wrote:
>> +		/*
>> +		 * Return value doesn't actually matter because we
>> +		 * are throwing away coregroups for scheduling anyway.
>> +		 * Return false to bypass topology broken bug messages
>> +		 * and fixups in sched_domain().
>> +		 */
>> +		return false;
> IIRC that return value _does_ matter because the resulting mask still
> ends up user visible in sysfs.
> 
> IIRC I went over with this dhansen a week or so ago, but I cannot now
> recall what we settled on as being the right return value and for what
> reason.

IIRC, a 'return true' keeps the topology information and a 'return
false' throws it away.  We chose 'false' here because without it some
other warnings showed up in another part of the code:

BUG: arch topology borken the MC domain not a subset of the NODE domain

I think the right thing to do is just remove the first sentence of the
comment.

* Re: [PATCH v2] x86,sched: allow topologies where NUMA nodes share an LLC
  2018-03-22 20:49 [PATCH v2] x86,sched: allow topologies where NUMA nodes share an LLC Alison Schofield
  2018-03-22 23:20 ` Peter Zijlstra
  2018-03-22 23:30 ` Luck, Tony
@ 2018-03-23  0:42 ` Tim Chen
  2018-03-24  0:17   ` Alison Schofield
  2 siblings, 1 reply; 8+ messages in thread
From: Tim Chen @ 2018-03-23  0:42 UTC (permalink / raw)
  To: Alison Schofield, Thomas Gleixner, Ingo Molnar
  Cc: Dave Hansen, Luck, Tony, H. Peter Anvin, Borislav Petkov,
	Peter Zijlstra, David Rientjes, Igor Mammedov, Prarit Bhargava,
	brice.goglin, x86, linux-kernel

On 03/22/2018 01:49 PM, Alison Schofield wrote:
>
> +	 */
> +	if (!topology_same_node(c, o) &&
> +	    (c->x86_vendor == X86_VENDOR_INTEL &&
> +	     c->x86_model == INTEL_FAM6_SKYLAKE_X)) {
> +		/* Use NUMA instead of coregroups for scheduling: */
> +		x86_has_numa_in_package = true;

x86_has_numa_in_package will only be set to true for SKYLAKE in the above?

This boolean probably should be set for (!topology_same_node(c, o) && match_die(c, o)) and not
dependent on cpu family.  Only the return value should depend on cpu family.
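
Something along these lines (untested sketch of what I mean):

	if (!topology_same_node(c, o) && match_die(c, o))
		x86_has_numa_in_package = true;

	if (!topology_same_node(c, o) &&
	    c->x86_vendor == X86_VENDOR_INTEL &&
	    c->x86_model == INTEL_FAM6_SKYLAKE_X)
		return false;

	return topology_sane(c, o, "llc");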

Tim


> +
> +		/*
> +		 * Return value doesn't actually matter because we
> +		 * are throwing away coregroups for scheduling anyway.
> +		 * Return false to bypass topology broken bug messages
> +		 * and fixups in sched_domain().
> +		 */
> +		return false;
> +	}
> +
> +	return topology_sane(c, o, "llc");
>  }
>  
>  /*
> @@ -454,12 +492,6 @@ static struct sched_domain_topology_level x86_topology[] = {
>  	{ NULL, },
>  };
>  
> -/*
> - * Set if a package/die has multiple NUMA nodes inside.
> - * AMD Magny-Cours and Intel Cluster-on-Die have this.
> - */
> -static bool x86_has_numa_in_package;
> -
>  void set_cpu_sibling_map(int cpu)
>  {
>  	bool has_smt = smp_num_siblings > 1;
> 

* Re: [PATCH v2] x86,sched: allow topologies where NUMA nodes share an LLC
  2018-03-23  0:42 ` Tim Chen
@ 2018-03-24  0:17   ` Alison Schofield
  0 siblings, 0 replies; 8+ messages in thread
From: Alison Schofield @ 2018-03-24  0:17 UTC (permalink / raw)
  To: Tim Chen
  Cc: Thomas Gleixner, Ingo Molnar, Dave Hansen, Tony Luck,
	H. Peter Anvin, Borislav Petkov, Peter Zijlstra, David Rientjes,
	Igor Mammedov, Prarit Bhargava, brice.goglin, x86, linux-kernel

On Thu, Mar 22, 2018 at 05:42:41PM -0700, Tim Chen wrote:
> On 03/22/2018 01:49 PM, Alison Schofield wrote:
> >
> > +	 */
> > +	if (!topology_same_node(c, o) &&
> > +	    (c->x86_vendor == X86_VENDOR_INTEL &&
> > +	     c->x86_model == INTEL_FAM6_SKYLAKE_X)) {
> > +		/* Use NUMA instead of coregroups for scheduling: */
> > +		x86_has_numa_in_package = true;
> 
> x86_has_numa_in_package will only be set true for SKYLAKE in the above? 
> 
> This boolean probably should be set for (!topology_same_node(c, o) && match_die(c, o)) and not
> dependent on cpu family.  Only the return value should depend on cpu family.
> 
> Tim

Tim,

x86_has_numa_in_package is being set in set_cpu_sibling_map() with the
same criteria you describe: (!topology_same_node(c, o) && match_die(c, o)).
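
For reference, the relevant bit in set_cpu_sibling_map() is roughly
(paraphrasing, inside its loop over cpu_sibling_setup_mask):

	if (match_die(c, o) && !topology_same_node(c, o))
		x86_has_numa_in_package = true;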

Skylake in SNC mode takes that path and it gets set correctly, so I'm
thinking the setting of it in match_llc() is redundant.

The intent of the patch is to skip the topology_sane() check to avoid
the warning message it spits out at boot time. 

This has me wondering, aside from it being redundant, whether it may be
incorrect: should we set that boolean based on vendor/model alone in
match_llc()? I guess I need to understand what happens with a Skylake that
doesn't have SNC turned on. I can try that configuration next.

Thanks for reviewing!
alisons
> 
> 
> > +
> > +		/*
> > +		 * Return value doesn't actually matter because we
> > +		 * are throwing away coregroups for scheduling anyway.
> > +		 * Return false to bypass topology broken bug messages
> > +		 * and fixups in sched_domain().
> > +		 */
> > +		return false;
> > +	}
> > +
> > +	return topology_sane(c, o, "llc");
> >  }
> >  
> >  /*
> > @@ -454,12 +492,6 @@ static struct sched_domain_topology_level x86_topology[] = {
> >  	{ NULL, },
> >  };
> >  
> > -/*
> > - * Set if a package/die has multiple NUMA nodes inside.
> > - * AMD Magny-Cours and Intel Cluster-on-Die have this.
> > - */
> > -static bool x86_has_numa_in_package;
> > -
> >  void set_cpu_sibling_map(int cpu)
> >  {
> >  	bool has_smt = smp_num_siblings > 1;
> > 
> 

* Re: [PATCH v2] x86,sched: allow topologies where NUMA nodes share an LLC
  2018-03-22 23:30 ` Luck, Tony
@ 2018-03-26 22:19   ` Alison Schofield
  2018-03-26 22:45     ` Luck, Tony
  0 siblings, 1 reply; 8+ messages in thread
From: Alison Schofield @ 2018-03-26 22:19 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Thomas Gleixner, Ingo Molnar, Dave Hansen, Luck, Tim Chen,
	H. Peter Anvin, Borislav Petkov, Peter Zijlstra, David Rientjes,
	Igor Mammedov, Prarit Bhargava, brice.goglin, x86, linux-kernel

On Thu, Mar 22, 2018 at 04:30:29PM -0700, Luck, Tony wrote:
> On Thu, Mar 22, 2018 at 01:49:22PM -0700, Alison Schofield wrote:
> > +	if (!topology_same_node(c, o) &&
> > +	    (c->x86_vendor == X86_VENDOR_INTEL &&
> > +	     c->x86_model == INTEL_FAM6_SKYLAKE_X)) {
> 
> Maybe make life easier in the future to add more models
> to the list by using x86_match_cpu() here?
> 
> -Tony

Tony - 
 Am I on the right track below?

Define like this:
static const __initconst struct x86_cpu_id snc_cpu[] = {
	{ X86_VENDOR_INTEL, 6, INTEL_FAM6_SKYLAKE_X },
	{}
};

Use like this:
if (!topology_same_node(c, o) && x86_match_cpu(snc_cpu)) {

Thanks,
alisons

* Re: [PATCH v2] x86,sched: allow topologies where NUMA nodes share an LLC
  2018-03-26 22:19   ` Alison Schofield
@ 2018-03-26 22:45     ` Luck, Tony
  0 siblings, 0 replies; 8+ messages in thread
From: Luck, Tony @ 2018-03-26 22:45 UTC (permalink / raw)
  To: Alison Schofield
  Cc: Thomas Gleixner, Ingo Molnar, Dave Hansen, Luck, Tim Chen,
	H. Peter Anvin, Borislav Petkov, Peter Zijlstra, David Rientjes,
	Igor Mammedov, Prarit Bhargava, brice.goglin, x86, linux-kernel

On Mon, Mar 26, 2018 at 03:19:48PM -0700, Alison Schofield wrote:
> On Thu, Mar 22, 2018 at 04:30:29PM -0700, Luck, Tony wrote:
> > On Thu, Mar 22, 2018 at 01:49:22PM -0700, Alison Schofield wrote:
> > > +	if (!topology_same_node(c, o) &&
> > > +	    (c->x86_vendor == X86_VENDOR_INTEL &&
> > > +	     c->x86_model == INTEL_FAM6_SKYLAKE_X)) {
> > 
> > Maybe make life easier in the future to add more models
> > to the list by using x86_match_cpu() here?
> > 
> > -Tony
> 
> Tony - 
>  Am I on the right track below?
> 
> Define like this:
> static const __initconst struct x86_cpu_id snc_cpu[] = {
> 	{ X86_VENDOR_INTEL, 6, INTEL_FAM6_SKYLAKE_X },
> 	{}
> };
> 
> Use like this:
> if (!topology_same_node(c, o) && x86_match_cpu(snc_cpu)) {

Alison,

Exactly right.  You can decide how much of the comment
that was before the "if (!topology_same_node(c, o) ..."
can be moved to before the definition of snc_cpu[]. I'm
too lazy to go back to the original patch to read it, but
I suspect most/all of it would be better describing the
data structure than cluttering up the code that uses it.
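
Putting the two together, an untested sketch of how match_llc() might
end up (just your v2 hunk plus x86_match_cpu(); x86_match_cpu() and
struct x86_cpu_id come via <asm/cpu_device_id.h>):

	/*
	 * These CPUs enumerate an LLC that is shared by multiple NUMA
	 * nodes: shared for off-package access, but private to the
	 * NUMA node (half of the package) for on-package access.
	 */
	static const struct x86_cpu_id snc_cpu[] = {
		{ X86_VENDOR_INTEL, 6, INTEL_FAM6_SKYLAKE_X },
		{}
	};

	static bool match_llc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
	{
		int cpu1 = c->cpu_index, cpu2 = o->cpu_index;

		/* Do not match if we do not have a valid APICID for cpu: */
		if (per_cpu(cpu_llc_id, cpu1) == BAD_APICID)
			return false;

		/* Do not match if LLC id does not match: */
		if (per_cpu(cpu_llc_id, cpu1) != per_cpu(cpu_llc_id, cpu2))
			return false;

		/* Allow the SNC topology without the topology_sane() warning: */
		if (!topology_same_node(c, o) && x86_match_cpu(snc_cpu))
			return false;

		return topology_sane(c, o, "llc");
	}

(I left out the x86_has_numa_in_package assignment here since, per the
other subthread, set_cpu_sibling_map() already sets it.)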

-Tony
