linux-kernel.vger.kernel.org archive mirror
* [PATCH v3 0/7] sched: Instrument sched domain flags
@ 2020-07-01 19:06 Valentin Schneider
  2020-07-01 19:06 ` [PATCH v3 1/7] sched/topology: Split out SD_* flags declaration to its own file Valentin Schneider
                   ` (6 more replies)
  0 siblings, 7 replies; 27+ messages in thread
From: Valentin Schneider @ 2020-07-01 19:06 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, peterz, vincent.guittot, dietmar.eggemann, morten.rasmussen

Hi,

I've repeatedly stared at an SD flag and asked myself "how should that be
set up in the domain hierarchy anyway?". I figured that if we formalize our
flags zoology a bit, we could also do some runtime assertions on them -
this is what this series is all about.

Patches
=======

The idea is to associate the flags with metaflags that describe how they
should be set in a sched domain hierarchy ("if this SD has it, all its {parents,
children} have it") or how they behave wrt degeneration - details are in the
comments and commit logs.

The good thing is that the debugging bits go away when CONFIG_SCHED_DEBUG isn't
set. The bad thing is that this replaces SD_* flags definitions with some
unsavoury macros. This is mainly because I wanted to avoid having to duplicate
work between declaring the flags and declaring their metaflags.

o Patches 1-2 set up the SD flag definition groundwork
o Patch 3 enables the hierarchy debugging
o Patch 4 is a derelict SD flag removal
o Patches 5-7 tweak the SD degeneration to leverage metaflags as well (suggested
  by Peter).

  Note that this builds a compile-time mask using the single most horrendous
  macro I have ever presented to the public eye. It might end up being much
  better to explicitly build the mask in topology.h, but to be honest I felt
  like I had to show that monster around out of morbid fascination.
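
In case the overall shape is hard to picture without reading the patches, below
is a standalone userspace toy of the pattern (three made-up TOY_* flags, with a
list macro standing in for the re-includable sd_flags.h header; none of this is
the actual kernel code, just a sketch of the idea):

/* sd_flags_toy.c - standalone sketch, not the kernel code */
#include <stdio.h>

#define BIT(nr)                 (1U << (nr))
#define SDF_SHARED_CHILD        0x1
#define SDF_SHARED_PARENT       0x2
#define SDF_NEEDS_GROUPS        0x4

/* One list of (name, bit, metaflags), standing in for sd_flags.h */
#define TOY_SD_FLAGS                                                         \
        SD_FLAG(TOY_BALANCE_NEWIDLE, 0, SDF_SHARED_CHILD | SDF_NEEDS_GROUPS) \
        SD_FLAG(TOY_WAKE_AFFINE,     1, SDF_SHARED_CHILD)                    \
        SD_FLAG(TOY_SERIALIZE,       2, SDF_SHARED_PARENT)

/* Expansion 1: generate the flag values themselves */
#define SD_FLAG(name, idx, mflags) static const unsigned int name = BIT(idx);
TOY_SD_FLAGS
#undef SD_FLAG

/* Expansion 2: compile-time mask of all flags carrying SDF_NEEDS_GROUPS */
#define SD_FLAG(name, idx, mflags) \
        (BIT(idx) * (((mflags) & SDF_NEEDS_GROUPS) / SDF_NEEDS_GROUPS)) |
static const unsigned int TOY_DEGENERATE_GROUPS_MASK =
        TOY_SD_FLAGS
        0;
#undef SD_FLAG

int main(void)
{
        printf("flags: %#x %#x %#x, needs-groups mask: %#x\n",
               TOY_BALANCE_NEWIDLE, TOY_WAKE_AFFINE, TOY_SERIALIZE,
               TOY_DEGENERATE_GROUPS_MASK);
        /* prints: flags: 0x1 0x2 0x4, needs-groups mask: 0x1 */
        return 0;
}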

Revisions
=========

v2 -> v3
--------

o Reworded comment for SD_OVERLAP (it's about the groups, not the domains)

o Added more flags to the SD degeneration mask
o Added generation of an SD flag mask for the degeneration functions (Peter)

RFC -> v2
---------

o Rebased on top of tip/sched/core
o Aligned wording of comments between flags
o Rectified some flag descriptions (Morten)
o Added removal of SD_SHARE_POWERDOMAIN (Morten)

Valentin Schneider (7):
  sched/topology: Split out SD_* flags declaration to its own file
  sched/topology: Define and assign sched_domain flag metadata
  sched/topology: Verify SD_* flags setup when sched_debug is on
  arm, sched/topology: Remove SD_SHARE_POWERDOMAIN
  sched/topology: Add more flags to the SD degeneration mask
  sched/topology: Introduce SD metaflag for flags needing > 1 groups
  sched/topology: Use prebuilt SD flag degeneration mask

 arch/arm/kernel/topology.c     |   2 +-
 include/linux/sched/sd_flags.h | 153 +++++++++++++++++++++++++++++++++
 include/linux/sched/topology.h |  36 +++++---
 kernel/sched/topology.c        |  40 +++++----
 4 files changed, 197 insertions(+), 34 deletions(-)
 create mode 100644 include/linux/sched/sd_flags.h

--
2.27.0


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH v3 1/7] sched/topology: Split out SD_* flags declaration to its own file
  2020-07-01 19:06 [PATCH v3 0/7] sched: Instrument sched domain flags Valentin Schneider
@ 2020-07-01 19:06 ` Valentin Schneider
  2020-07-01 19:06 ` [PATCH v3 2/7] sched/topology: Define and assign sched_domain flag metadata Valentin Schneider
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 27+ messages in thread
From: Valentin Schneider @ 2020-07-01 19:06 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, peterz, vincent.guittot, dietmar.eggemann, morten.rasmussen

To associate the SD flags with some metadata, we need some more structure
in the way they are declared.

Rather than shove that into a free-standing macro list, move the declaration
into a separate file that can be re-imported with different SD_FLAG
definitions, inspired by what we do with syscalls and unistd.h.

No change in functionality.

Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
---
 include/linux/sched/sd_flags.h | 33 +++++++++++++++++++++++++++++++++
 include/linux/sched/topology.h | 18 ++++--------------
 2 files changed, 37 insertions(+), 14 deletions(-)
 create mode 100644 include/linux/sched/sd_flags.h

diff --git a/include/linux/sched/sd_flags.h b/include/linux/sched/sd_flags.h
new file mode 100644
index 000000000000..685bbe736945
--- /dev/null
+++ b/include/linux/sched/sd_flags.h
@@ -0,0 +1,33 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * sched-domains (multiprocessor balancing) flag declarations.
+ */
+
+/* Balance when about to become idle */
+SD_FLAG(SD_BALANCE_NEWIDLE,     0)
+/* Balance on exec */
+SD_FLAG(SD_BALANCE_EXEC,        1)
+/* Balance on fork, clone */
+SD_FLAG(SD_BALANCE_FORK,        2)
+/* Balance on wakeup */
+SD_FLAG(SD_BALANCE_WAKE,        3)
+/* Wake task to waking CPU */
+SD_FLAG(SD_WAKE_AFFINE,         4)
+/* Domain members have different CPU capacities */
+SD_FLAG(SD_ASYM_CPUCAPACITY,    5)
+/* Domain members share CPU capacity */
+SD_FLAG(SD_SHARE_CPUCAPACITY,   6)
+/* Domain members share power domain */
+SD_FLAG(SD_SHARE_POWERDOMAIN,   7)
+/* Domain members share CPU pkg resources */
+SD_FLAG(SD_SHARE_PKG_RESOURCES, 8)
+/* Only a single load balancing instance */
+SD_FLAG(SD_SERIALIZE,           9)
+/* Place busy groups earlier in the domain */
+SD_FLAG(SD_ASYM_PACKING,        10)
+/* Prefer to place tasks in a sibling domain */
+SD_FLAG(SD_PREFER_SIBLING,      11)
+/* sched_domains of this level overlap */
+SD_FLAG(SD_OVERLAP,             12)
+/* cross-node balancing */
+SD_FLAG(SD_NUMA,                13)
diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index fb11091129b3..f274eca282de 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -11,20 +11,10 @@
  */
 #ifdef CONFIG_SMP
 
-#define SD_BALANCE_NEWIDLE	0x0001	/* Balance when about to become idle */
-#define SD_BALANCE_EXEC		0x0002	/* Balance on exec */
-#define SD_BALANCE_FORK		0x0004	/* Balance on fork, clone */
-#define SD_BALANCE_WAKE		0x0008  /* Balance on wakeup */
-#define SD_WAKE_AFFINE		0x0010	/* Wake task to waking CPU */
-#define SD_ASYM_CPUCAPACITY	0x0020  /* Domain members have different CPU capacities */
-#define SD_SHARE_CPUCAPACITY	0x0040	/* Domain members share CPU capacity */
-#define SD_SHARE_POWERDOMAIN	0x0080	/* Domain members share power domain */
-#define SD_SHARE_PKG_RESOURCES	0x0100	/* Domain members share CPU pkg resources */
-#define SD_SERIALIZE		0x0200	/* Only a single load balancing instance */
-#define SD_ASYM_PACKING		0x0400  /* Place busy groups earlier in the domain */
-#define SD_PREFER_SIBLING	0x0800	/* Prefer to place tasks in a sibling domain */
-#define SD_OVERLAP		0x1000	/* sched_domains of this level overlap */
-#define SD_NUMA			0x2000	/* cross-node balancing */
+/* Generate SD_FOO = VALUE */
+#define SD_FLAG(name, idx) static const unsigned int name = BIT(idx);
+#include <linux/sched/sd_flags.h>
+#undef SD_FLAG
 
 #ifdef CONFIG_SCHED_SMT
 static inline int cpu_smt_flags(void)
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v3 2/7] sched/topology: Define and assign sched_domain flag metadata
  2020-07-01 19:06 [PATCH v3 0/7] sched: Instrument sched domain flags Valentin Schneider
  2020-07-01 19:06 ` [PATCH v3 1/7] sched/topology: Split out SD_* flags declaration to its own file Valentin Schneider
@ 2020-07-01 19:06 ` Valentin Schneider
  2020-07-02 12:15   ` Quentin Perret
  2020-07-01 19:06 ` [PATCH v3 3/7] sched/topology: Verify SD_* flags setup when sched_debug is on Valentin Schneider
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 27+ messages in thread
From: Valentin Schneider @ 2020-07-01 19:06 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, peterz, vincent.guittot, dietmar.eggemann, morten.rasmussen

There are some expectations regarding how sched domain flags should be laid
out, but none of them are checked or asserted in
sched_domain_debug_one(). After staring at said flags for a while, I've
come to realize they all (except *one*) fall into one of two categories:

- Shared with children: those flags are set from the base CPU domain
  upwards. Any domain that has it set will have it set in its children. It
  hints at "some property holds true / some behaviour is enabled until this
  level".

- Shared with parents: those flags are set from the topmost domain
  downwards. Any domain that has it set will have it set in its parents. It
  hints at "some property isn't visible / some behaviour is disabled until
  this level".

The odd one out is SD_PREFER_SIBLING, which is cleared below levels with
SD_ASYM_CPUCAPACITY. The change was introduced by commit

  9c63e84db29b ("sched/core: Disable SD_PREFER_SIBLING on asymmetric CPU capacity domains")

as it could break misfit migration on some systems. In light of this, we
might want to change it back to make it fit one of the two categories and
fix the issue another way.

Tweak the sched_domain flag declaration to assign each flag an expected
layout, and include the rationale for each flag "meta type" assignment as a
comment. Consolidate the flag metadata into an array; the index of a flag's
metadata can easily be found with log2(flag), IOW __ffs(flag).
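
As a userspace illustration of that lookup (ffs() from <strings.h> standing in
for the kernel's __ffs(), and a two-entry toy table; this snippet is only for
illustration and is not part of the patch):

#include <stdio.h>
#include <strings.h>    /* ffs() */

#define BIT(nr) (1U << (nr))

/* Toy stand-in for the generated sd_flag_debug[] table */
static const char *toy_flag_name[] = {
        [0] = "SD_BALANCE_NEWIDLE",
        [5] = "SD_ASYM_CPUCAPACITY",
};

int main(void)
{
        unsigned int flag = BIT(5);             /* SD_ASYM_CPUCAPACITY */
        unsigned int idx = ffs(flag) - 1;       /* log2(flag) == 5 */

        printf("%s\n", toy_flag_name[idx]);     /* SD_ASYM_CPUCAPACITY */
        return 0;
}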

Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
---
 include/linux/sched/sd_flags.h | 164 +++++++++++++++++++++++++++------
 include/linux/sched/topology.h |  13 ++-
 2 files changed, 148 insertions(+), 29 deletions(-)

diff --git a/include/linux/sched/sd_flags.h b/include/linux/sched/sd_flags.h
index 685bbe736945..b5a11df0afe4 100644
--- a/include/linux/sched/sd_flags.h
+++ b/include/linux/sched/sd_flags.h
@@ -3,31 +3,139 @@
  * sched-domains (multiprocessor balancing) flag declarations.
  */
 
-/* Balance when about to become idle */
-SD_FLAG(SD_BALANCE_NEWIDLE,     0)
-/* Balance on exec */
-SD_FLAG(SD_BALANCE_EXEC,        1)
-/* Balance on fork, clone */
-SD_FLAG(SD_BALANCE_FORK,        2)
-/* Balance on wakeup */
-SD_FLAG(SD_BALANCE_WAKE,        3)
-/* Wake task to waking CPU */
-SD_FLAG(SD_WAKE_AFFINE,         4)
-/* Domain members have different CPU capacities */
-SD_FLAG(SD_ASYM_CPUCAPACITY,    5)
-/* Domain members share CPU capacity */
-SD_FLAG(SD_SHARE_CPUCAPACITY,   6)
-/* Domain members share power domain */
-SD_FLAG(SD_SHARE_POWERDOMAIN,   7)
-/* Domain members share CPU pkg resources */
-SD_FLAG(SD_SHARE_PKG_RESOURCES, 8)
-/* Only a single load balancing instance */
-SD_FLAG(SD_SERIALIZE,           9)
-/* Place busy groups earlier in the domain */
-SD_FLAG(SD_ASYM_PACKING,        10)
-/* Prefer to place tasks in a sibling domain */
-SD_FLAG(SD_PREFER_SIBLING,      11)
-/* sched_domains of this level overlap */
-SD_FLAG(SD_OVERLAP,             12)
-/* cross-node balancing */
-SD_FLAG(SD_NUMA,                13)
+#ifndef SD_FLAG
+#define SD_FLAG(x, y, z)
+#endif
+
+/*
+ * Expected flag uses
+ *
+ * SHARED_CHILD: These flags are meant to be set from the base domain upwards.
+ * If a domain has this flag set, all of its children should have it set. This
+ * is usually because the flag describes some shared resource (all CPUs in that
+ * domain share the same foobar), or because they are tied to a scheduling
+ * behaviour that we want to disable at some point in the hierarchy for
+ * scalability reasons.
+ *
+ * In those cases it doesn't make sense to have the flag set for a domain but
+ * not have it in (some of) its children: sched domains ALWAYS span their child
+ * domains, so operations done with parent domains will cover CPUs in the lower
+ * child domains.
+ *
+ *
+ * SHARED_PARENT: These flags are meant to be set from the highest domain
+ * downwards. If a domain has this flag set, all of its parents should have it
+ * set. This is usually for topology properties that start to appear above a
+ * certain level (e.g. domain starts spanning CPUs outside of the base CPU's
+ * socket).
+ */
+#define SDF_SHARED_CHILD 0x1
+#define SDF_SHARED_PARENT 0x2
+
+/*
+ * Balance when about to become idle
+ *
+ * SHARED_CHILD: Set from the base domain up to cpuset.sched_relax_domain_level.
+ */
+SD_FLAG(SD_BALANCE_NEWIDLE,     0, SDF_SHARED_CHILD)
+
+/*
+ * Balance on exec
+ *
+ * SHARED_CHILD: Set from the base domain up to the NUMA reclaim level.
+ */
+SD_FLAG(SD_BALANCE_EXEC,        1, SDF_SHARED_CHILD)
+
+/*
+ * Balance on fork, clone
+ *
+ * SHARED_CHILD: Set from the base domain up to the NUMA reclaim level.
+ */
+SD_FLAG(SD_BALANCE_FORK,        2, SDF_SHARED_CHILD)
+
+/*
+ * Balance on wakeup
+ *
+ * SHARED_CHILD: Set from the base domain up to cpuset.sched_relax_domain_level.
+ */
+SD_FLAG(SD_BALANCE_WAKE,        3, SDF_SHARED_CHILD)
+
+/*
+ * Consider waking task on waking CPU.
+ *
+ * SHARED_CHILD: Set from the base domain up to the NUMA reclaim level.
+ */
+SD_FLAG(SD_WAKE_AFFINE,         4, SDF_SHARED_CHILD)
+
+/*
+ * Domain members have different CPU capacities
+ *
+ * SHARED_PARENT: Set from the topmost domain down to the first domain where
+ * asymmetry is detected.
+ */
+SD_FLAG(SD_ASYM_CPUCAPACITY,    5, SDF_SHARED_PARENT)
+
+/*
+ * Domain members share CPU capacity (i.e. SMT)
+ *
+ * SHARED_CHILD: Set from the base domain up until spanned CPUs no longer share
+ * CPU capacity.
+ */
+SD_FLAG(SD_SHARE_CPUCAPACITY,   6, SDF_SHARED_CHILD)
+
+/*
+ * Domain members share power domain
+ *
+ * SHARED_CHILD: Set from the base domain up until spanned CPUs no longer share
+ * the same power domain.
+ */
+SD_FLAG(SD_SHARE_POWERDOMAIN,   7, SDF_SHARED_CHILD)
+
+/*
+ * Domain members share CPU package resources (i.e. caches)
+ *
+ * SHARED_CHILD: Set from the base domain up until spanned CPUs no longer share
+ * the same cache(s).
+ */
+SD_FLAG(SD_SHARE_PKG_RESOURCES, 8, SDF_SHARED_CHILD)
+
+/*
+ * Only a single load balancing instance
+ *
+ * SHARED_PARENT: Set for all NUMA levels above NODE. Could be set from a
+ * different level upwards, but it doesn't change that if a domain has this flag
+ * set, then all of its parents need to have it too (otherwise the serialization
+ * doesn't make sense).
+ */
+SD_FLAG(SD_SERIALIZE,           9, SDF_SHARED_PARENT)
+
+/*
+ * Place busy tasks earlier in the domain
+ *
+ * SHARED_CHILD: Usually set on the SMT level. Technically could be set further
+ * up, but currently assumed to be set from the base domain upwards (see
+ * update_top_cache_domain()).
+ */
+SD_FLAG(SD_ASYM_PACKING,        10, SDF_SHARED_CHILD)
+
+/*
+ * Prefer to place tasks in a sibling domain
+ *
+ * Set up until domains start spanning NUMA nodes. Close to being a SHARED_CHILD
+ * flag, but cleared below domains with SD_ASYM_CPUCAPACITY.
+ */
+SD_FLAG(SD_PREFER_SIBLING,      11, 0)
+
+/*
+ * sched_groups of this level overlap
+ *
+ * SHARED_PARENT: Set for all NUMA levels above NODE.
+ */
+SD_FLAG(SD_OVERLAP,             12, SDF_SHARED_PARENT)
+
+/*
+ * cross-node balancing
+ *
+ * SHARED_PARENT: Set for all NUMA levels above NODE.
+ */
+SD_FLAG(SD_NUMA,                13, SDF_SHARED_PARENT)
diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index f274eca282de..fce64dae09af 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -12,10 +12,21 @@
 #ifdef CONFIG_SMP
 
 /* Generate SD_FOO = VALUE */
-#define SD_FLAG(name, idx) static const unsigned int name = BIT(idx);
+#define SD_FLAG(name, idx, mflags) static const unsigned int name = BIT(idx);
 #include <linux/sched/sd_flags.h>
 #undef SD_FLAG
 
+#ifdef CONFIG_SCHED_DEBUG
+#define SD_FLAG(_name, idx, mflags) [idx] = {.meta_flags = mflags, .name = #_name},
+static const struct {
+	unsigned int meta_flags;
+	char *name;
+} sd_flag_debug[] = {
+#include <linux/sched/sd_flags.h>
+};
+#undef SD_FLAG
+#endif
+
 #ifdef CONFIG_SCHED_SMT
 static inline int cpu_smt_flags(void)
 {
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v3 3/7] sched/topology: Verify SD_* flags setup when sched_debug is on
  2020-07-01 19:06 [PATCH v3 0/7] sched: Instrument sched domain flags Valentin Schneider
  2020-07-01 19:06 ` [PATCH v3 1/7] sched/topology: Split out SD_* flags declaration to its own file Valentin Schneider
  2020-07-01 19:06 ` [PATCH v3 2/7] sched/topology: Define and assign sched_domain flag metadata Valentin Schneider
@ 2020-07-01 19:06 ` Valentin Schneider
  2020-07-02 14:20   ` Peter Zijlstra
  2020-07-01 19:06 ` [PATCH v3 4/7] arm, sched/topology: Remove SD_SHARE_POWERDOMAIN Valentin Schneider
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 27+ messages in thread
From: Valentin Schneider @ 2020-07-01 19:06 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, peterz, vincent.guittot, dietmar.eggemann, morten.rasmussen

Now that we have some description of what we expect the flags layout to
be, we can use that to assert at runtime that the actual layout is sane.

Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
---
 kernel/sched/topology.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 9079d865a935..b3f891db16a7 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -29,6 +29,7 @@ static int sched_domain_debug_one(struct sched_domain *sd, int cpu, int level,
 				  struct cpumask *groupmask)
 {
 	struct sched_group *group = sd->groups;
+	int flags = sd->flags;
 
 	cpumask_clear(groupmask);
 
@@ -43,6 +44,22 @@ static int sched_domain_debug_one(struct sched_domain *sd, int cpu, int level,
 		printk(KERN_ERR "ERROR: domain->groups does not contain CPU%d\n", cpu);
 	}
 
+	for (; flags; flags &= flags - 1) {
+		unsigned int idx = __ffs(flags);
+		unsigned int flag = BIT(idx);
+		unsigned int meta_flags = sd_flag_debug[idx].meta_flags;
+
+		if ((meta_flags & SDF_SHARED_CHILD) && sd->child &&
+		    !(sd->child->flags & flag))
+			printk(KERN_ERR "ERROR: flag %s set here but not in child\n",
+			       sd_flag_debug[idx].name);
+
+		if ((meta_flags & SDF_SHARED_PARENT) && sd->parent &&
+		    !(sd->parent->flags & flag))
+			printk(KERN_ERR "ERROR: flag %s set here but not in parent\n",
+			       sd_flag_debug[idx].name);
+	}
+
 	printk(KERN_DEBUG "%*s groups:", level + 1, "");
 	do {
 		if (!group) {
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v3 4/7] arm, sched/topology: Remove SD_SHARE_POWERDOMAIN
  2020-07-01 19:06 [PATCH v3 0/7] sched: Instrument sched domain flags Valentin Schneider
                   ` (2 preceding siblings ...)
  2020-07-01 19:06 ` [PATCH v3 3/7] sched/topology: Verify SD_* flags setup when sched_debug is on Valentin Schneider
@ 2020-07-01 19:06 ` Valentin Schneider
  2020-07-02 16:44   ` Dietmar Eggemann
  2020-07-01 19:06 ` [PATCH v3 5/7] sched/topology: Add more flags to the SD degeneration mask Valentin Schneider
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 27+ messages in thread
From: Valentin Schneider @ 2020-07-01 19:06 UTC (permalink / raw)
  To: linux-kernel
  Cc: Morten Rasmussen, mingo, peterz, vincent.guittot, dietmar.eggemann

This flag was introduced in 2014 by commit

  d77b3ed5c9f8 ("sched: Add a new SD_SHARE_POWERDOMAIN for sched_domain")

but AFAIA it was never leveraged by the scheduler. The closest thing I can
think of is EAS caring about frequency domains, and it does that by
leveraging performance domains.

Remove the flag.

Suggested-by: Morten Rasmussen <morten.rasmussen@arm.com>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
---
 arch/arm/kernel/topology.c     |  2 +-
 include/linux/sched/sd_flags.h | 20 ++++++--------------
 kernel/sched/topology.c        | 10 +++-------
 3 files changed, 10 insertions(+), 22 deletions(-)

diff --git a/arch/arm/kernel/topology.c b/arch/arm/kernel/topology.c
index b5adaf744630..353f3ee660e4 100644
--- a/arch/arm/kernel/topology.c
+++ b/arch/arm/kernel/topology.c
@@ -243,7 +243,7 @@ void store_cpu_topology(unsigned int cpuid)
 
 static inline int cpu_corepower_flags(void)
 {
-	return SD_SHARE_PKG_RESOURCES  | SD_SHARE_POWERDOMAIN;
+	return SD_SHARE_PKG_RESOURCES;
 }
 
 static struct sched_domain_topology_level arm_topology[] = {
diff --git a/include/linux/sched/sd_flags.h b/include/linux/sched/sd_flags.h
index b5a11df0afe4..c0003b252d48 100644
--- a/include/linux/sched/sd_flags.h
+++ b/include/linux/sched/sd_flags.h
@@ -83,21 +83,13 @@ SD_FLAG(SD_ASYM_CPUCAPACITY,    5, SDF_SHARED_PARENT)
  */
 SD_FLAG(SD_SHARE_CPUCAPACITY,   6, SDF_SHARED_CHILD)
 
-/*
- * Domain members share power domain
- *
- * SHARED_CHILD: Set from the base domain up until spanned CPUs no longer share
- * the same power domain.
- */
-SD_FLAG(SD_SHARE_POWERDOMAIN,   7, SDF_SHARED_CHILD)
-
 /*
  * Domain members share CPU package resources (i.e. caches)
  *
  * SHARED_CHILD: Set from the base domain up until spanned CPUs no longer share
  * the same cache(s).
  */
-SD_FLAG(SD_SHARE_PKG_RESOURCES, 8, SDF_SHARED_CHILD)
+SD_FLAG(SD_SHARE_PKG_RESOURCES, 7, SDF_SHARED_CHILD)
 
 /*
  * Only a single load balancing instance
@@ -107,7 +99,7 @@ SD_FLAG(SD_SHARE_PKG_RESOURCES, 8, SDF_SHARED_CHILD)
  * set, then all of its parents need to have it too (otherwise the serialization
  * doesn't make sense).
  */
-SD_FLAG(SD_SERIALIZE,           9, SDF_SHARED_PARENT)
+SD_FLAG(SD_SERIALIZE,           8, SDF_SHARED_PARENT)
 
 /*
  * Place busy tasks earlier in the domain
@@ -116,7 +108,7 @@ SD_FLAG(SD_SERIALIZE,           9, SDF_SHARED_PARENT)
  * up, but currently assumed to be set from the base domain upwards (see
  * update_top_cache_domain()).
  */
-SD_FLAG(SD_ASYM_PACKING,        10, SDF_SHARED_CHILD)
+SD_FLAG(SD_ASYM_PACKING,        9, SDF_SHARED_CHILD)
 
 /*
  * Prefer to place tasks in a sibling domain
@@ -124,18 +116,18 @@ SD_FLAG(SD_ASYM_PACKING,        10, SDF_SHARED_CHILD)
  * Set up until domains start spanning NUMA nodes. Close to being a SHARED_CHILD
  * flag, but cleared below domains with SD_ASYM_CPUCAPACITY.
  */
-SD_FLAG(SD_PREFER_SIBLING,      11, 0)
+SD_FLAG(SD_PREFER_SIBLING,      10, 0)
 
 /*
  * sched_groups of this level overlap
  *
  * SHARED_PARENT: Set for all NUMA levels above NODE.
  */
-SD_FLAG(SD_OVERLAP,             12, SDF_SHARED_PARENT)
+SD_FLAG(SD_OVERLAP,             11, SDF_SHARED_PARENT)
 
 /*
  * cross-node balancing
  *
  * SHARED_PARENT: Set for all NUMA levels above NODE.
  */
-SD_FLAG(SD_NUMA,                13, SDF_SHARED_PARENT)
+SD_FLAG(SD_NUMA,                12, SDF_SHARED_PARENT)
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index b3f891db16a7..6047d491abe9 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -165,8 +165,7 @@ static int sd_degenerate(struct sched_domain *sd)
 			 SD_BALANCE_EXEC |
 			 SD_SHARE_CPUCAPACITY |
 			 SD_ASYM_CPUCAPACITY |
-			 SD_SHARE_PKG_RESOURCES |
-			 SD_SHARE_POWERDOMAIN)) {
+			 SD_SHARE_PKG_RESOURCES)) {
 		if (sd->groups != sd->groups->next)
 			return 0;
 	}
@@ -197,8 +196,7 @@ sd_parent_degenerate(struct sched_domain *sd, struct sched_domain *parent)
 			    SD_ASYM_CPUCAPACITY |
 			    SD_SHARE_CPUCAPACITY |
 			    SD_SHARE_PKG_RESOURCES |
-			    SD_PREFER_SIBLING |
-			    SD_SHARE_POWERDOMAIN);
+			    SD_PREFER_SIBLING);
 		if (nr_node_ids == 1)
 			pflags &= ~SD_SERIALIZE;
 	}
@@ -1309,7 +1307,6 @@ int __read_mostly		node_reclaim_distance = RECLAIM_DISTANCE;
  *   SD_SHARE_CPUCAPACITY   - describes SMT topologies
  *   SD_SHARE_PKG_RESOURCES - describes shared caches
  *   SD_NUMA                - describes NUMA topologies
- *   SD_SHARE_POWERDOMAIN   - describes shared power domain
  *
  * Odd one out, which beside describing the topology has a quirk also
  * prescribes the desired behaviour that goes along with it:
@@ -1320,8 +1317,7 @@ int __read_mostly		node_reclaim_distance = RECLAIM_DISTANCE;
 	(SD_SHARE_CPUCAPACITY	|	\
 	 SD_SHARE_PKG_RESOURCES |	\
 	 SD_NUMA		|	\
-	 SD_ASYM_PACKING	|	\
-	 SD_SHARE_POWERDOMAIN)
+	 SD_ASYM_PACKING)
 
 static struct sched_domain *
 sd_init(struct sched_domain_topology_level *tl,
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v3 5/7] sched/topology: Add more flags to the SD degeneration mask
  2020-07-01 19:06 [PATCH v3 0/7] sched: Instrument sched domain flags Valentin Schneider
                   ` (3 preceding siblings ...)
  2020-07-01 19:06 ` [PATCH v3 4/7] arm, sched/topology: Remove SD_SHARE_POWERDOMAIN Valentin Schneider
@ 2020-07-01 19:06 ` Valentin Schneider
  2020-07-02 18:28   ` Dietmar Eggemann
  2020-07-01 19:06 ` [PATCH v3 6/7] sched/topology: Introduce SD metaflag for flags needing > 1 groups Valentin Schneider
  2020-07-01 19:06 ` [PATCH v3 7/7] sched/topology: Use prebuilt SD flag degeneration mask Valentin Schneider
  6 siblings, 1 reply; 27+ messages in thread
From: Valentin Schneider @ 2020-07-01 19:06 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, peterz, vincent.guittot, dietmar.eggemann, morten.rasmussen

I don't think it is going to change much in practice, but we were missing
those:

o SD_BALANCE_WAKE: Used just like the other SD_BALANCE_* flags, so also
  needs > 1 group.
o SD_ASYM_PACKING: Hinges on load balancing (periodic / wakeup), thus needs
  > 1 group to happen.
o SD_OVERLAP: Describes domains with overlapping groups; can't have
  overlaps with a single group.

SD_PREFER_SIBLING is as always the odd one out: we currently consider it
in sd_parent_degenerate() but not in sd_degenerate(). It too hinges on load
balancing, and thus won't have any effect when set on a domain with a
single group. Add it to the sd_degenerate() groups mask.

Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
---
 kernel/sched/topology.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 6047d491abe9..fe393b295444 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -161,11 +161,15 @@ static int sd_degenerate(struct sched_domain *sd)
 
 	/* Following flags need at least 2 groups */
 	if (sd->flags & (SD_BALANCE_NEWIDLE |
+			 SD_BALANCE_WAKE |
 			 SD_BALANCE_FORK |
 			 SD_BALANCE_EXEC |
+			 SD_ASYM_PACKING |
 			 SD_SHARE_CPUCAPACITY |
 			 SD_ASYM_CPUCAPACITY |
-			 SD_SHARE_PKG_RESOURCES)) {
+			 SD_SHARE_PKG_RESOURCES |
+			 SD_OVERLAP |
+			 SD_PREFER_SIBLING)) {
 		if (sd->groups != sd->groups->next)
 			return 0;
 	}
@@ -191,11 +195,14 @@ sd_parent_degenerate(struct sched_domain *sd, struct sched_domain *parent)
 	/* Flags needing groups don't count if only 1 group in parent */
 	if (parent->groups == parent->groups->next) {
 		pflags &= ~(SD_BALANCE_NEWIDLE |
+			    SD_BALANCE_WAKE |
 			    SD_BALANCE_FORK |
 			    SD_BALANCE_EXEC |
+			    SD_ASYM_PACKING |
 			    SD_ASYM_CPUCAPACITY |
 			    SD_SHARE_CPUCAPACITY |
 			    SD_SHARE_PKG_RESOURCES |
+			    SD_OVERLAP |
 			    SD_PREFER_SIBLING);
 		if (nr_node_ids == 1)
 			pflags &= ~SD_SERIALIZE;
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v3 6/7] sched/topology: Introduce SD metaflag for flags needing > 1 groups
  2020-07-01 19:06 [PATCH v3 0/7] sched: Instrument sched domain flags Valentin Schneider
                   ` (4 preceding siblings ...)
  2020-07-01 19:06 ` [PATCH v3 5/7] sched/topology: Add more flags to the SD degeneration mask Valentin Schneider
@ 2020-07-01 19:06 ` Valentin Schneider
  2020-07-02 18:29   ` Dietmar Eggemann
  2020-07-13 12:39   ` Peter Zijlstra
  2020-07-01 19:06 ` [PATCH v3 7/7] sched/topology: Use prebuilt SD flag degeneration mask Valentin Schneider
  6 siblings, 2 replies; 27+ messages in thread
From: Valentin Schneider @ 2020-07-01 19:06 UTC (permalink / raw)
  To: linux-kernel
  Cc: Peter Zijlstra, mingo, vincent.guittot, dietmar.eggemann,
	morten.rasmussen

In preparation for cleaning up the sd_degenerate*() functions, mark flags
that are only relevant when their associated domain has more than a single
group. With this, build a compile-time mask of those SD flags.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
---
 include/linux/sched/sd_flags.h | 60 ++++++++++++++++++++++------------
 include/linux/sched/topology.h |  7 ++++
 2 files changed, 47 insertions(+), 20 deletions(-)

diff --git a/include/linux/sched/sd_flags.h b/include/linux/sched/sd_flags.h
index c0003b252d48..408832193a94 100644
--- a/include/linux/sched/sd_flags.h
+++ b/include/linux/sched/sd_flags.h
@@ -8,7 +8,7 @@
 #endif
 
 /*
- * Expected flag uses
+ * Hierarchical metaflags
  *
  * SHARED_CHILD: These flags are meant to be set from the base domain upwards.
  * If a domain has this flag set, all of its children should have it set. This
@@ -29,36 +29,50 @@
  * certain level (e.g. domain starts spanning CPUs outside of the base CPU's
  * socket).
  */
-#define SDF_SHARED_CHILD 0x1
-#define SDF_SHARED_PARENT 0x2
+#define SDF_SHARED_CHILD       0x1
+#define SDF_SHARED_PARENT      0x2
+
+/*
+ * Behavioural metaflags
+ *
+ * NEEDS_GROUPS: These flags are only relevant if the domain they are set on has
+ * more than one group. This is usually for balancing flags (load balancing
+ * involves equalizing a metric between groups), or for flags describing some
+ * shared resource (which would be shared between groups).
+ */
+#define SDF_NEEDS_GROUPS       0x4
 
 /*
  * Balance when about to become idle
  *
  * SHARED_CHILD: Set from the base domain up to cpuset.sched_relax_domain_level.
+ * NEEDS_GROUPS: Load balancing flag.
  */
-SD_FLAG(SD_BALANCE_NEWIDLE,     0, SDF_SHARED_CHILD)
+SD_FLAG(SD_BALANCE_NEWIDLE,     0, SDF_SHARED_CHILD | SDF_NEEDS_GROUPS)
 
 /*
  * Balance on exec
  *
  * SHARED_CHILD: Set from the base domain up to the NUMA reclaim level.
+ * NEEDS_GROUPS: Load balancing flag.
  */
-SD_FLAG(SD_BALANCE_EXEC,        1, SDF_SHARED_CHILD)
+SD_FLAG(SD_BALANCE_EXEC,        1, SDF_SHARED_CHILD | SDF_NEEDS_GROUPS)
 
 /*
  * Balance on fork, clone
  *
  * SHARED_CHILD: Set from the base domain up to the NUMA reclaim level.
+ * NEEDS_GROUPS: Load balancing flag.
  */
-SD_FLAG(SD_BALANCE_FORK,        2, SDF_SHARED_CHILD)
+SD_FLAG(SD_BALANCE_FORK,        2, SDF_SHARED_CHILD | SDF_NEEDS_GROUPS)
 
 /*
  * Balance on wakeup
  *
  * SHARED_CHILD: Set from the base domain up to cpuset.sched_relax_domain_level.
+ * NEEDS_GROUPS: Load balancing flag.
  */
-SD_FLAG(SD_BALANCE_WAKE,        3, SDF_SHARED_CHILD)
+SD_FLAG(SD_BALANCE_WAKE,        3, SDF_SHARED_CHILD | SDF_NEEDS_GROUPS)
 
 /*
  * Consider waking task on waking CPU.
@@ -71,33 +85,36 @@ SD_FLAG(SD_WAKE_AFFINE,         4, SDF_SHARED_CHILD)
  * Domain members have different CPU capacities
  *
  * SHARED_PARENT: Set from the topmost domain down to the first domain where
- * asymmetry is detected.
+ *                asymmetry is detected.
+ * NEEDS_GROUPS: Per-CPU capacity is asymmetric between groups.
  */
-SD_FLAG(SD_ASYM_CPUCAPACITY,    5, SDF_SHARED_PARENT)
+SD_FLAG(SD_ASYM_CPUCAPACITY,    5, SDF_SHARED_PARENT | SDF_NEEDS_GROUPS)
 
 /*
  * Domain members share CPU capacity (i.e. SMT)
  *
  * SHARED_CHILD: Set from the base domain up until spanned CPUs no longer share
- * CPU capacity.
+ *               CPU capacity.
+ * NEEDS_GROUPS: Capacity is shared between groups.
  */
-SD_FLAG(SD_SHARE_CPUCAPACITY,   6, SDF_SHARED_CHILD)
+SD_FLAG(SD_SHARE_CPUCAPACITY,   6, SDF_SHARED_CHILD | SDF_NEEDS_GROUPS)
 
 /*
  * Domain members share CPU package resources (i.e. caches)
  *
  * SHARED_CHILD: Set from the base domain up until spanned CPUs no longer share
- * the same cache(s).
+ *               the same cache(s).
+ * NEEDS_GROUPS: Caches are shared between groups.
  */
-SD_FLAG(SD_SHARE_PKG_RESOURCES, 7, SDF_SHARED_CHILD)
+SD_FLAG(SD_SHARE_PKG_RESOURCES, 7, SDF_SHARED_CHILD | SDF_NEEDS_GROUPS)
 
 /*
  * Only a single load balancing instance
  *
  * SHARED_PARENT: Set for all NUMA levels above NODE. Could be set from a
- * different level upwards, but it doesn't change that if a domain has this flag
- * set, then all of its parents need to have it too (otherwise the serialization
- * doesn't make sense).
+ *                different level upwards, but it doesn't change that if a
+ *                domain has this flag set, then all of its parents need to have
+ *                it too (otherwise the serialization doesn't make sense).
  */
 SD_FLAG(SD_SERIALIZE,           8, SDF_SHARED_PARENT)
 
@@ -105,16 +122,18 @@ SD_FLAG(SD_SERIALIZE,           8, SDF_SHARED_PARENT)
  * Place busy tasks earlier in the domain
  *
  * SHARED_CHILD: Usually set on the SMT level. Technically could be set further
- * up, but currently assumed to be set from the base domain upwards (see
- * update_top_cache_domain()).
+ *               up, but currently assumed to be set from the base domain
+ *               upwards (see update_top_cache_domain()).
  */
-SD_FLAG(SD_ASYM_PACKING,        9, SDF_SHARED_CHILD)
+SD_FLAG(SD_ASYM_PACKING,        9, SDF_SHARED_CHILD | SDF_NEEDS_GROUPS)
 
 /*
  * Prefer to place tasks in a sibling domain
  *
  * Set up until domains start spanning NUMA nodes. Close to being a SHARED_CHILD
  * flag, but cleared below domains with SD_ASYM_CPUCAPACITY.
+ *
+ * NEEDS_GROUPS: Load balancing flag.
  */
 SD_FLAG(SD_PREFER_SIBLING,      10, 0)
 
@@ -122,8 +141,9 @@ SD_FLAG(SD_PREFER_SIBLING,      10, 0)
  * sched_groups of this level overlap
  *
  * SHARED_PARENT: Set for all NUMA levels above NODE.
+ * NEEDS_GROUPS: Overlaps can only exist with more than one group.
  */
-SD_FLAG(SD_OVERLAP,             11, SDF_SHARED_PARENT)
+SD_FLAG(SD_OVERLAP,             11, SDF_SHARED_PARENT | SDF_NEEDS_GROUPS)
 
 /*
  * cross-node balancing
diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index fce64dae09af..a6761d5d66dd 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -16,6 +16,13 @@
 #include <linux/sched/sd_flags.h>
 #undef SD_FLAG
 
+/* Generate a mask of SD flags with the SDF_NEEDS_GROUPS metaflag */
+#define SD_FLAG(name, idx, mflags) (BIT(idx) * (((mflags) & SDF_NEEDS_GROUPS) / SDF_NEEDS_GROUPS)) |
+static const int SD_DEGENERATE_GROUPS_MASK =
+#include<linux/sched/sd_flags.h>
+0;
+#undef SD_FLAG
+
 #ifdef CONFIG_SCHED_DEBUG
 #define SD_FLAG(_name, idx, mflags) [idx] = {.meta_flags = mflags, .name = #_name},
 static const struct {
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v3 7/7] sched/topology: Use prebuilt SD flag degeneration mask
  2020-07-01 19:06 [PATCH v3 0/7] sched: Instrument sched domain flags Valentin Schneider
                   ` (5 preceding siblings ...)
  2020-07-01 19:06 ` [PATCH v3 6/7] sched/topology: Introduce SD metaflag for flags needing > 1 groups Valentin Schneider
@ 2020-07-01 19:06 ` Valentin Schneider
  2020-07-13 12:55   ` Peter Zijlstra
  6 siblings, 1 reply; 27+ messages in thread
From: Valentin Schneider @ 2020-07-01 19:06 UTC (permalink / raw)
  To: linux-kernel
  Cc: Peter Zijlstra, mingo, vincent.guittot, dietmar.eggemann,
	morten.rasmussen

Leverage SD_DEGENERATE_GROUPS_MASK in sd_degenerate() and
sd_parent_degenerate().

Note that this changes sd_degenerate() somewhat: I'm using the negation of
SD_DEGENERATE_GROUPS_MASK as the mask of flags not requiring groups, which
is equivalent to:

SD_WAKE_AFFINE | SD_SERIALIZE | SD_NUMA

whereas the current mask for that is simply

SD_WAKE_AFFINE

I played with a few toy NUMA topologies on QEMU and couldn't cause a
different degeneration than what mainline does currently. If that is deemed
too risky, we can go back to using SD_WAKE_AFFINE explicitly.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
---
 kernel/sched/topology.c | 24 +++---------------------
 1 file changed, 3 insertions(+), 21 deletions(-)

diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index fe393b295444..a135a0c99b16 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -160,22 +160,13 @@ static int sd_degenerate(struct sched_domain *sd)
 		return 1;
 
 	/* Following flags need at least 2 groups */
-	if (sd->flags & (SD_BALANCE_NEWIDLE |
-			 SD_BALANCE_WAKE |
-			 SD_BALANCE_FORK |
-			 SD_BALANCE_EXEC |
-			 SD_ASYM_PACKING |
-			 SD_SHARE_CPUCAPACITY |
-			 SD_ASYM_CPUCAPACITY |
-			 SD_SHARE_PKG_RESOURCES |
-			 SD_OVERLAP |
-			 SD_PREFER_SIBLING)) {
+	if (sd->flags & SD_DEGENERATE_GROUPS_MASK) {
 		if (sd->groups != sd->groups->next)
 			return 0;
 	}
 
 	/* Following flags don't use groups */
-	if (sd->flags & (SD_WAKE_AFFINE))
+	if (sd->flags & ~SD_DEGENERATE_GROUPS_MASK)
 		return 0;
 
 	return 1;
@@ -194,16 +185,7 @@ sd_parent_degenerate(struct sched_domain *sd, struct sched_domain *parent)
 
 	/* Flags needing groups don't count if only 1 group in parent */
 	if (parent->groups == parent->groups->next) {
-		pflags &= ~(SD_BALANCE_NEWIDLE |
-			    SD_BALANCE_WAKE |
-			    SD_BALANCE_FORK |
-			    SD_BALANCE_EXEC |
-			    SD_ASYM_PACKING |
-			    SD_ASYM_CPUCAPACITY |
-			    SD_SHARE_CPUCAPACITY |
-			    SD_SHARE_PKG_RESOURCES |
-			    SD_OVERLAP |
-			    SD_PREFER_SIBLING);
+		pflags &= ~SD_DEGENERATE_GROUPS_MASK;
 		if (nr_node_ids == 1)
 			pflags &= ~SD_SERIALIZE;
 	}
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 2/7] sched/topology: Define and assign sched_domain flag metadata
  2020-07-01 19:06 ` [PATCH v3 2/7] sched/topology: Define and assign sched_domain flag metadata Valentin Schneider
@ 2020-07-02 12:15   ` Quentin Perret
  2020-07-02 14:31     ` Valentin Schneider
  0 siblings, 1 reply; 27+ messages in thread
From: Quentin Perret @ 2020-07-02 12:15 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: linux-kernel, mingo, peterz, vincent.guittot, dietmar.eggemann,
	morten.rasmussen

Hey Valentin,

On Wednesday 01 Jul 2020 at 20:06:50 (+0100), Valentin Schneider wrote:
> +/*
> + * Domain members have different CPU capacities
> + *
> + * SHARED_PARENT: Set from the topmost domain down to the first domain where
> + * asymmetry is detected.
> + */
> +SD_FLAG(SD_ASYM_CPUCAPACITY,    5, SDF_SHARED_PARENT)

Probably not a huge deal, but I don't think this is _exactly_ how
SD_ASYM_CPUCAPACITY was defined originally, nor how the topology
detection code deals with it (IIRC).

That is, IIRC Morten defined SD_ASYM_CPUCAPACITY as the _lowest_ domain
at which all CPU capacities are visible. On all real systems I can think
of, this domain also happens to be the topmost domain, so that might not
be a huge deal and we can probably change that definition to the one you
suggest. But we should perhaps make the matching changes to
asym_cpu_capacity_level()/build_sched_domains() if we're going down that
path?

Thanks,
Quentin

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 3/7] sched/topology: Verify SD_* flags setup when sched_debug is on
  2020-07-01 19:06 ` [PATCH v3 3/7] sched/topology: Verify SD_* flags setup when sched_debug is on Valentin Schneider
@ 2020-07-02 14:20   ` Peter Zijlstra
  2020-07-02 14:32     ` Valentin Schneider
  0 siblings, 1 reply; 27+ messages in thread
From: Peter Zijlstra @ 2020-07-02 14:20 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: linux-kernel, mingo, vincent.guittot, dietmar.eggemann, morten.rasmussen

On Wed, Jul 01, 2020 at 08:06:51PM +0100, Valentin Schneider wrote:

> @@ -29,6 +29,7 @@ static int sched_domain_debug_one(struct sched_domain *sd, int cpu, int level,
>  				  struct cpumask *groupmask)
>  {
>  	struct sched_group *group = sd->groups;
> +	int flags = sd->flags;

	unsigned long flags = sd->flags;

>  
>  	cpumask_clear(groupmask);
>  
> @@ -43,6 +44,22 @@ static int sched_domain_debug_one(struct sched_domain *sd, int cpu, int level,
>  		printk(KERN_ERR "ERROR: domain->groups does not contain CPU%d\n", cpu);
>  	}
>  
> +	for (; flags; flags &= flags - 1) {
> +		unsigned int idx = __ffs(flags);

	for_each_set_bit(idx, &flags, SD_MAX_BIT) {

Yes, it's a bit yuck, but far more readable imo.
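
IOW, something like (completely untested; SD_MAX_BIT is a stand-in for however
the number of flags ends up being exposed):

        /* walk each set SD_* flag and check it against child/parent */
        unsigned long flags = sd->flags;
        unsigned int idx;

        for_each_set_bit(idx, &flags, SD_MAX_BIT) {     /* SD_MAX_BIT: flag count */
                unsigned int flag = BIT(idx);
                unsigned int meta_flags = sd_flag_debug[idx].meta_flags;

                if ((meta_flags & SDF_SHARED_CHILD) && sd->child &&
                    !(sd->child->flags & flag))
                        printk(KERN_ERR "ERROR: flag %s set here but not in child\n",
                               sd_flag_debug[idx].name);

                if ((meta_flags & SDF_SHARED_PARENT) && sd->parent &&
                    !(sd->parent->flags & flag))
                        printk(KERN_ERR "ERROR: flag %s set here but not in parent\n",
                               sd_flag_debug[idx].name);
        }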

> +		unsigned int flag = BIT(idx);
> +		unsigned int meta_flags = sd_flag_debug[idx].meta_flags;
> +
> +		if ((meta_flags & SDF_SHARED_CHILD) && sd->child &&
> +		    !(sd->child->flags & flag))
> +			printk(KERN_ERR "ERROR: flag %s set here but not in child\n",
> +			       sd_flag_debug[idx].name);
> +
> +		if ((meta_flags & SDF_SHARED_PARENT) && sd->parent &&
> +		    !(sd->parent->flags & flag))
> +			printk(KERN_ERR "ERROR: flag %s set here but not in parent\n",
> +			       sd_flag_debug[idx].name);
> +	}
> +
>  	printk(KERN_DEBUG "%*s groups:", level + 1, "");
>  	do {
>  		if (!group) {
> -- 
> 2.27.0
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 2/7] sched/topology: Define and assign sched_domain flag metadata
  2020-07-02 12:15   ` Quentin Perret
@ 2020-07-02 14:31     ` Valentin Schneider
  2020-07-02 15:45       ` Quentin Perret
  0 siblings, 1 reply; 27+ messages in thread
From: Valentin Schneider @ 2020-07-02 14:31 UTC (permalink / raw)
  To: Quentin Perret
  Cc: linux-kernel, mingo, peterz, vincent.guittot, dietmar.eggemann,
	morten.rasmussen


Hi Quentin,

Thanks for having a look!

On 02/07/20 13:15, Quentin Perret wrote:
> Hey Valentin,
>
> On Wednesday 01 Jul 2020 at 20:06:50 (+0100), Valentin Schneider wrote:
>> +/*
>> + * Domain members have different CPU capacities
>> + *
>> + * SHARED_PARENT: Set from the topmost domain down to the first domain where
>> + * asymmetry is detected.
>> + */
>> +SD_FLAG(SD_ASYM_CPUCAPACITY,    5, SDF_SHARED_PARENT)
>
> Probably not a huge deal, but I don't think this is _exactly_ how
> SD_ASYM_CPUCAPACITY was defined originally, nor how the topology
> detection code deals with it (IIRC).
>
> That is, IIRC Morten defined SD_ASYM_CPUCAPACITY as the _lowest_ domain
> at which all CPU capacities are visible. On all real systems I can think
> of, this domain also happens to be the topmost domain, so that might not
> be a huge deal and we can probably change that definition to the one you
> suggest. But we should perhaps make the matching changes to
> asym_cpu_capacity_level()/build_sched_domains() if we're going down that
> path?
>

You're spot on, I was trying to sweep some of it under the carpet...

There is an "interesting" quirk of asym_cpu_capacity_level() in that it does
something slightly different than what it says on the tin: it detects
the lowest topology level where *the biggest* CPU capacity is visible by
all CPUs. That works just fine on big.LITTLE, but there are questionable
DynamIQ topologies that could hit some issues.

Consider:

DIE [                   ]
MC  [             ][    ] <- sd_asym_cpucapacity
     0   1   2   3  4  5
     L   L   B   B  B  B

asym_cpu_capacity_level() would pick MC as the asymmetric topology level,
and you can argue either way: it should be DIE, because that's where CPUs 4
and 5 can see a LITTLE, or it should be MC, at least for CPUs 0-3 because
there they see all CPU capacities.

I have a plan on how to fix that, but I haven't been made aware of any
"real" topology that would seriously break there. The moment one does, this
will surface up to the top of my todo-list.

In the meantime, we can make it match the SDF_SHARED_PARENT semantics, and
this actually fixes an issue with solo big CPU clusters (which I
anecdotally found out while first writing this series, and forgot to
include):

--->8
From: Valentin Schneider <valentin.schneider@arm.com>
Date: Wed, 16 Oct 2019 18:12:12 +0100
Subject: [PATCH 1/1] sched/topology: Propagate SD_ASYM_CPUCAPACITY upwards

We currently set this flag *only* on domains whose topology level exactly
matches the level where we detect asymmetry (as returned by
asym_cpu_capacity_level()). This is rather problematic.

Say there are two clusters in the system, one with a lone big CPU and the
other with a mix of big and LITTLE CPUs:

DIE [                ]
MC  [             ][ ]
     0   1   2   3  4
     L   L   B   B  B

asym_cpu_capacity_level() will figure out that the MC level is the one
where all CPUs can see a CPU of max capacity, and we will thus set
SD_ASYM_CPUCAPACITY at MC level for all CPUs.

That lone big CPU will degenerate its MC domain, since it would be alone in
there, and will end up with just a DIE domain. Since the flag was only set
at MC, this CPU ends up not seeing any SD with the flag set, which is
broken.

Rather than clearing dflags at every topology level, clear it before
entering the topology level loop. This will properly propagate upwards
flags that are set starting from a certain level.

Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
---
 kernel/sched/topology.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index b5667a273bf6..549268249645 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1965,11 +1965,10 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *att
        /* Set up domains for CPUs specified by the cpu_map: */
        for_each_cpu(i, cpu_map) {
                struct sched_domain_topology_level *tl;
+		int dflags = 0;

                sd = NULL;
                for_each_sd_topology(tl) {
-			int dflags = 0;
-
                        if (tl == tl_asym) {
                                dflags |= SD_ASYM_CPUCAPACITY;
                                has_asym = true;
--
2.27.0

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 3/7] sched/topology: Verify SD_* flags setup when sched_debug is on
  2020-07-02 14:20   ` Peter Zijlstra
@ 2020-07-02 14:32     ` Valentin Schneider
  0 siblings, 0 replies; 27+ messages in thread
From: Valentin Schneider @ 2020-07-02 14:32 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, mingo, vincent.guittot, dietmar.eggemann, morten.rasmussen


On 02/07/20 15:20, Peter Zijlstra wrote:
> On Wed, Jul 01, 2020 at 08:06:51PM +0100, Valentin Schneider wrote:
>
>> @@ -29,6 +29,7 @@ static int sched_domain_debug_one(struct sched_domain *sd, int cpu, int level,
>>                                struct cpumask *groupmask)
>>  {
>>      struct sched_group *group = sd->groups;
>> +	int flags = sd->flags;
>
>       unsigned long flags = sd->flags;
>
>>
>>      cpumask_clear(groupmask);
>>
>> @@ -43,6 +44,22 @@ static int sched_domain_debug_one(struct sched_domain *sd, int cpu, int level,
>>              printk(KERN_ERR "ERROR: domain->groups does not contain CPU%d\n", cpu);
>>      }
>>
>> +	for (; flags; flags &= flags - 1) {
>> +		unsigned int idx = __ffs(flags);
>
>       for_each_set_bit(idx, &flags, SD_MAX_BIT) {
>
> Yes, it's a bit yuck, but far more readable imo.
>

Much better than homegrown stuff, thanks!

>> +		unsigned int flag = BIT(idx);
>> +		unsigned int meta_flags = sd_flag_debug[idx].meta_flags;
>> +
>> +		if ((meta_flags & SDF_SHARED_CHILD) && sd->child &&
>> +		    !(sd->child->flags & flag))
>> +			printk(KERN_ERR "ERROR: flag %s set here but not in child\n",
>> +			       sd_flag_debug[idx].name);
>> +
>> +		if ((meta_flags & SDF_SHARED_PARENT) && sd->parent &&
>> +		    !(sd->parent->flags & flag))
>> +			printk(KERN_ERR "ERROR: flag %s set here but not in parent\n",
>> +			       sd_flag_debug[idx].name);
>> +	}
>> +
>>      printk(KERN_DEBUG "%*s groups:", level + 1, "");
>>      do {
>>              if (!group) {
>> --
>> 2.27.0
>>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 2/7] sched/topology: Define and assign sched_domain flag metadata
  2020-07-02 14:31     ` Valentin Schneider
@ 2020-07-02 15:45       ` Quentin Perret
  2020-07-02 16:25         ` Valentin Schneider
  0 siblings, 1 reply; 27+ messages in thread
From: Quentin Perret @ 2020-07-02 15:45 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: linux-kernel, mingo, peterz, vincent.guittot, dietmar.eggemann,
	morten.rasmussen

On Thursday 02 Jul 2020 at 15:31:07 (+0100), Valentin Schneider wrote:
> There is an "interesting" quirk of asym_cpu_capacity_level() in that it does
> something slightly different than what it says on the tin: it detects
> the lowest topology level where *the biggest* CPU capacity is visible by
> all CPUs. That works just fine on big.LITTLE, but there are questionable
> DynamIQ topologies that could hit some issues.
> 
> Consider:
> 
> DIE [                   ]
> MC  [             ][    ] <- sd_asym_cpucapacity
>      0   1   2   3  4  5
>      L   L   B   B  B  B
> 
> asym_cpu_capacity_level() would pick MC as the asymmetric topology level,
> and you can argue either way: it should be DIE, because that's where CPUs 4
> and 5 can see a LITTLE, or it should be MC, at least for CPUs 0-3 because
> there they see all CPU capacities.

Right, I am not looking forward to these topologies...

> I have a plan on how to fix that, but I haven't been made aware of any
> "real" topology that would seriously break there. The moment one does, this
> will surface up to the top of my todo-list.
> 
> In the meantime, we can make it match the SDF_SHARED_PARENT semantics, and
> this actually fixes an issue with solo big CPU clusters (which I
> anecdotally found out while first writing this series, and forgot to
> include):
> 
> --->8
> From: Valentin Schneider <valentin.schneider@arm.com>
> Date: Wed, 16 Oct 2019 18:12:12 +0100
> Subject: [PATCH 1/1] sched/topology: Propagate SD_ASYM_CPUCAPACITY upwards
> 
> We currently set this flag *only* on domains whose topology level exactly
> matches the level where we detect asymmetry (as returned by
> asym_cpu_capacity_level()). This is rather problematic.
> 
> Say there are two clusters in the system, one with a lone big CPU and the
> other with a mix of big and LITTLE CPUs:
> 
> DIE [                ]
> MC  [             ][ ]
>      0   1   2   3  4
>      L   L   B   B  B
> 
> asym_cpu_capacity_level() will figure out that the MC level is the one
> where all CPUs can see a CPU of max capacity, and we will thus set
> SD_ASYM_CPUCAPACITY at MC level for all CPUs.
> 
> That lone big CPU will degenerate its MC domain, since it would be alone in
> there, and will end up with just a DIE domain. Since the flag was only set
> at MC, this CPU ends up not seeing any SD with the flag set, which is
> broken.

+1

> Rather than clearing dflags at every topology level, clear it before
> entering the topology level loop. This will properly propagate upwards
> flags that are set starting from a certain level.

I'm feeling a bit nervous about that asymmetry -- in your example
select_idle_capacity() on, say, CPU3 will see fewer CPUs than on CPU4.
So, you might get fun side-effects where all tasks migrated to CPUs 0-3
will be 'stuck' there while CPU 4 stays mostly idle.

I have a few ideas to avoid that (e.g. looking at the rd span in
select_idle_capacity() instead of sd_asym_cpucapacity) but all this is
theoretical, so I'm happy to wait for a real platform to be released
before we worry too much about it.

In the meantime:

Reviewed-by: Quentin Perret <qperret@google.com>
> Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
> ---
>  kernel/sched/topology.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index b5667a273bf6..549268249645 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -1965,11 +1965,10 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *att
>         /* Set up domains for CPUs specified by the cpu_map: */
>         for_each_cpu(i, cpu_map) {
>                 struct sched_domain_topology_level *tl;
> +		int dflags = 0;
> 
>                 sd = NULL;
>                 for_each_sd_topology(tl) {
> -			int dflags = 0;
> -
>                         if (tl == tl_asym) {
>                                 dflags |= SD_ASYM_CPUCAPACITY;
>                                 has_asym = true;
> --
> 2.27.0

Thanks,
Quentin

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 2/7] sched/topology: Define and assign sched_domain flag metadata
  2020-07-02 15:45       ` Quentin Perret
@ 2020-07-02 16:25         ` Valentin Schneider
  2020-07-02 16:37           ` Quentin Perret
  0 siblings, 1 reply; 27+ messages in thread
From: Valentin Schneider @ 2020-07-02 16:25 UTC (permalink / raw)
  To: Quentin Perret
  Cc: linux-kernel, mingo, peterz, vincent.guittot, dietmar.eggemann,
	morten.rasmussen


On 02/07/20 16:45, Quentin Perret wrote:
> On Thursday 02 Jul 2020 at 15:31:07 (+0100), Valentin Schneider wrote:
>> There an "interesting" quirk of asym_cpu_capacity_level() in that it does
>> something slightly different than what it says on the tin: it detects
>> the lowest topology level where *the biggest* CPU capacity is visible by
>> all CPUs. That works just fine on big.LITTLE, but there are questionable
>> DynamIQ topologies that could hit some issues.
>>
>> Consider:
>>
>> DIE [                   ]
>> MC  [             ][    ] <- sd_asym_cpucapacity
>>      0   1   2   3  4  5
>>      L   L   B   B  B  B
>>
>> asym_cpu_capacity_level() would pick MC as the asymmetric topology level,
>> and you can argue either way: it should be DIE, because that's where CPUs 4
>> and 5 can see a LITTLE, or it should be MC, at least for CPUs 0-3 because
>> there they see all CPU capacities.
>
> Right, I am not looking forward to these topologies...

I'll try my best to prevent those from seeing the light of day, but you
know how this works...

>> Say there are two clusters in the system, one with a lone big CPU and the
>> other with a mix of big and LITTLE CPUs:
>>
>> DIE [                ]
>> MC  [             ][ ]
>>      0   1   2   3  4
>>      L   L   B   B  B
>>
>> asym_cpu_capacity_level() will figure out that the MC level is the one
>> where all CPUs can see a CPU of max capacity, and we will thus set
>> SD_ASYM_CPUCAPACITY at MC level for all CPUs.
>>
>> That lone big CPU will degenerate its MC domain, since it would be alone in
>> there, and will end up with just a DIE domain. Since the flag was only set
>> at MC, this CPU ends up not seeing any SD with the flag set, which is
>> broken.
>
> +1
>
>> Rather than clearing dflags at every topology level, clear it before
>> entering the topology level loop. This will properly propagate upwards
>> flags that are set starting from a certain level.
>
> I'm feeling a bit nervous about that asymmetry -- in your example
> select_idle_capacity() on, say, CPU3 will see less CPUs than on CPU4.
> So, you might get fun side-effects where all task migrated to CPUs 0-3
> will be 'stuck' there while CPU 4 stays mostly idle.
>

It's actually pretty close to what happens with the LLC domain on SMP -
select_idle_sibling() doesn't look outside of it. The wake_affine() stuff
might steer the task towards a different LLC, but that's about it for
wakeups. We rely on load balancing (fork/exec, newidle, nohz and periodic)
to spread this further - and we would here too.

It gets "funny" for EAS when we aren't overutilized and thus can't rely on
load balancing; at least misfit ought to still work. It *is* a weird
topology, for sure.

> I have a few ideas to avoid that (e.g. looking at the rd span in
> select_idle_capacity() instead of sd_asym_cpucapacity) but all this is
> theoretical, so I'm happy to wait for a real platform to be released
> before we worry too much about it.
>
> In the meantime:
>
> Reviewed-by: Quentin Perret <qperret@google.com>

Thanks!

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 2/7] sched/topology: Define and assign sched_domain flag metadata
  2020-07-02 16:25         ` Valentin Schneider
@ 2020-07-02 16:37           ` Quentin Perret
  2020-07-02 16:49             ` Valentin Schneider
  0 siblings, 1 reply; 27+ messages in thread
From: Quentin Perret @ 2020-07-02 16:37 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: linux-kernel, mingo, peterz, vincent.guittot, dietmar.eggemann,
	morten.rasmussen

On Thursday 02 Jul 2020 at 17:25:41 (+0100), Valentin Schneider wrote:
> It's actually pretty close to what happens with the LLC domain on SMP -
> select_idle_sibling() doesn't look outside of it. The wake_affine() stuff
> might steer the task towards a different LLC, but that's about it for
> wakeups. We rely on load balancing (fork/exec, newidle, nohz and periodic)
> to spread this further - and we would here too.

Sure, but on SMP the search space in select_idle_sibling is always
consistent -- you search within the LLC. With the fix you suggested,
CPUs 0-3 will search within their LLCs, while CPU4 searches the entire
system, which creates an imbalanced mess IMO.

For affine wake-ups, you could migrate from CPU4 -> CPU0-3, but CPU0-3
to CPU4 is not possible, so this asymmetry is almost guaranteed to
actively create imbalance. And sure, the periodic load balancer ought to
fix it, but really wake-up balance and periodic load balance should be
pushing in the same direction and not fighting against each other.

Anyways, enough bikeshedding for today, I'll try and have a look at the
rest of the series :)

Cheers,
Quentin

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 4/7] arm, sched/topology: Remove SD_SHARE_POWERDOMAIN
  2020-07-01 19:06 ` [PATCH v3 4/7] arm, sched/topology: Remove SD_SHARE_POWERDOMAIN Valentin Schneider
@ 2020-07-02 16:44   ` Dietmar Eggemann
  2020-07-02 18:46     ` Valentin Schneider
  0 siblings, 1 reply; 27+ messages in thread
From: Dietmar Eggemann @ 2020-07-02 16:44 UTC (permalink / raw)
  To: Valentin Schneider, linux-kernel
  Cc: Morten Rasmussen, mingo, peterz, vincent.guittot

On 01/07/2020 21:06, Valentin Schneider wrote:
> This flag was introduced in 2014 by commit
> 
>   d77b3ed5c9f8 ("sched: Add a new SD_SHARE_POWERDOMAIN for sched_domain")
> 
> but AFAIA it was never leveraged by the scheduler. The closest thing I can
> think of is EAS caring about frequency domains, and it does that by
> leveraging performance domains.

... and even this was purely out of tree (SD_SHARE_CAP_STATES).

> Remove the flag.
> 
> Suggested-by: Morten Rasmussen <morten.rasmussen@arm.com>
> Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
> ---
>  arch/arm/kernel/topology.c     |  2 +-
>  include/linux/sched/sd_flags.h | 20 ++++++--------------
>  kernel/sched/topology.c        | 10 +++-------
>  3 files changed, 10 insertions(+), 22 deletions(-)
> 
> diff --git a/arch/arm/kernel/topology.c b/arch/arm/kernel/topology.c
> index b5adaf744630..353f3ee660e4 100644
> --- a/arch/arm/kernel/topology.c
> +++ b/arch/arm/kernel/topology.c
> @@ -243,7 +243,7 @@ void store_cpu_topology(unsigned int cpuid)
>  
>  static inline int cpu_corepower_flags(void)
>  {
> -	return SD_SHARE_PKG_RESOURCES  | SD_SHARE_POWERDOMAIN;
> +	return SD_SHARE_PKG_RESOURCES;
>  }
>  
>  static struct sched_domain_topology_level arm_topology[] = {

I guess with SD_SHARE_POWERDOMAIN gone, arch arm can even use the default_topology[]:

diff --git a/arch/arm/kernel/topology.c b/arch/arm/kernel/topology.c
index b5adaf744630..87dd193165cc 100644
--- a/arch/arm/kernel/topology.c
+++ b/arch/arm/kernel/topology.c
@@ -241,20 +241,6 @@ void store_cpu_topology(unsigned int cpuid)
        update_siblings_masks(cpuid);
 }
 
-static inline int cpu_corepower_flags(void)
-{
-       return SD_SHARE_PKG_RESOURCES  | SD_SHARE_POWERDOMAIN;
-}
-
-static struct sched_domain_topology_level arm_topology[] = {
-#ifdef CONFIG_SCHED_MC
-       { cpu_corepower_mask, cpu_corepower_flags, SD_INIT_NAME(GMC) },
-       { cpu_coregroup_mask, cpu_core_flags, SD_INIT_NAME(MC) },
-#endif
-       { cpu_cpu_mask, SD_INIT_NAME(DIE) },
-       { NULL, },
-};
-
 /*
  * init_cpu_topology is called at boot when only one cpu is running
  * which prevent simultaneous write access to cpu_topology array
@@ -265,7 +251,4 @@ void __init init_cpu_topology(void)
        smp_wmb();
 
        parse_dt_topology();
-
-       /* Set scheduler topology descriptor */
-       set_sched_topology(arm_topology);
 }

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 2/7] sched/topology: Define and assign sched_domain flag metadata
  2020-07-02 16:37           ` Quentin Perret
@ 2020-07-02 16:49             ` Valentin Schneider
  0 siblings, 0 replies; 27+ messages in thread
From: Valentin Schneider @ 2020-07-02 16:49 UTC (permalink / raw)
  To: Quentin Perret
  Cc: linux-kernel, mingo, peterz, vincent.guittot, dietmar.eggemann,
	morten.rasmussen


On 02/07/20 17:37, Quentin Perret wrote:
> On Thursday 02 Jul 2020 at 17:25:41 (+0100), Valentin Schneider wrote:
>> It's actually pretty close to what happens with the LLC domain on SMP -
>> select_idle_sibling() doesn't look outside of it. The wake_affine() stuff
>> might steer the task towards a different LLC, but that's about it for
>> wakeups. We rely on load balancing (fork/exec, newidle, nohz and periodic)
>> to spread this further - and we would here too.
>
> Sure, but on SMP the search space in select_idle_sibling is always
> consistent -- you search within the LLC. With the fix you suggested,
> CPUs 0-3 will search within their LLCs, while CPU4 searches the entire
> system, which creates an imbalanced mess IMO.
>

Yeah, it is a mess.

> For affine wake-ups, you could migrate from CPU4 -> CPU0-3, but CPU0-3
> to CPU4 is not possible

AIU the wake_affine bits, you get to steer the wakeup towards the waking
CPU. So if the task previously ran on CPUs 0-3, wake_affine can make CPU4
(the waking CPU) the target, so it would become a possible candidate.

But as you say, this thing is still an ugly asymmetric mess.

> so this asymmetry is almost guaranteed to
> actively create imbalance. And sure, the periodic load balancer ought to
> fix it, but really wake-up balance and periodic load balance should be
> pushing in the same direction and not fighting against each other.
>
> Anyways, enough bikeshedding for today, I'll try and have a look at the
> rest of the series :)
>

Thanks!

> Cheers,
> Quentin

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 5/7] sched/topology: Add more flags to the SD degeneration mask
  2020-07-01 19:06 ` [PATCH v3 5/7] sched/topology: Add more flags to the SD degeneration mask Valentin Schneider
@ 2020-07-02 18:28   ` Dietmar Eggemann
  0 siblings, 0 replies; 27+ messages in thread
From: Dietmar Eggemann @ 2020-07-02 18:28 UTC (permalink / raw)
  To: Valentin Schneider, linux-kernel
  Cc: mingo, peterz, vincent.guittot, morten.rasmussen

On 01/07/2020 21:06, Valentin Schneider wrote:
> I don't think it is going to change much in practice, but we were missing
> those:
> 
> o SD_BALANCE_WAKE: Used just like the other SD_BALANCE_* flags, so also
>   needs > 1 group.
> o SD_ASYM_PACKING: Hinges on load balancing (periodic / wakeup), thus needs
>   > 1 group to happen
> o SD_OVERLAP: Describes domains with overlapping groups; can't have
>   overlaps with a single group.
> 
> SD_PREFER_SIBLING is as always the odd one out: we currently consider it
> in sd_parent_degenerate() but not in sd_degenerate(). It too hinges on load
> balancing, and thus won't have any effect when set on a domain with a
> single group. Add it to the sd_degenerate() groups mask.
> 
> Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
> ---
>  kernel/sched/topology.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index 6047d491abe9..fe393b295444 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -161,11 +161,15 @@ static int sd_degenerate(struct sched_domain *sd)
>  
>  	/* Following flags need at least 2 groups */
>  	if (sd->flags & (SD_BALANCE_NEWIDLE |
> +			 SD_BALANCE_WAKE |
>  			 SD_BALANCE_FORK |
>  			 SD_BALANCE_EXEC |
> +			 SD_ASYM_PACKING |
>  			 SD_SHARE_CPUCAPACITY |
>  			 SD_ASYM_CPUCAPACITY |
> -			 SD_SHARE_PKG_RESOURCES)) {
> +			 SD_SHARE_PKG_RESOURCES |
> +			 SD_OVERLAP |
> +			 SD_PREFER_SIBLING)) {

Why do you add SD_PREFER_SIBLING here? It doesn't have SDF_NEEDS_GROUPS
set in [PATCH v3 6/7].

$ cat ./include/linux/sched/sd_flags.h | grep ^SD_FLAG
SD_FLAG(SD_BALANCE_NEWIDLE,     0, SDF_SHARED_CHILD | SDF_NEEDS_GROUPS)
SD_FLAG(SD_BALANCE_EXEC,        1, SDF_SHARED_CHILD | SDF_NEEDS_GROUPS)
SD_FLAG(SD_BALANCE_FORK,        2, SDF_SHARED_CHILD | SDF_NEEDS_GROUPS)
SD_FLAG(SD_BALANCE_WAKE,        3, SDF_SHARED_CHILD | SDF_NEEDS_GROUPS)
SD_FLAG(SD_WAKE_AFFINE,         4, SDF_SHARED_CHILD)
SD_FLAG(SD_ASYM_CPUCAPACITY,    5, SDF_SHARED_PARENT | SDF_NEEDS_GROUPS)
SD_FLAG(SD_SHARE_CPUCAPACITY,   6, SDF_SHARED_CHILD | SDF_NEEDS_GROUPS)
SD_FLAG(SD_SHARE_PKG_RESOURCES, 7, SDF_SHARED_CHILD | SDF_NEEDS_GROUPS)
SD_FLAG(SD_SERIALIZE,           8, SDF_SHARED_PARENT)
SD_FLAG(SD_ASYM_PACKING,        9, SDF_SHARED_CHILD | SDF_NEEDS_GROUPS)
SD_FLAG(SD_PREFER_SIBLING,      10, 0)                           <-- !!!
SD_FLAG(SD_OVERLAP,             11, SDF_SHARED_PARENT |SDF_NEEDS_GROUPS)
SD_FLAG(SD_NUMA,                12, SDF_SHARED_PARENT)

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 6/7] sched/topology: Introduce SD metaflag for flags needing > 1 groups
  2020-07-01 19:06 ` [PATCH v3 6/7] sched/topology: Introduce SD metaflag for flags needing > 1 groups Valentin Schneider
@ 2020-07-02 18:29   ` Dietmar Eggemann
  2020-07-02 18:46     ` Valentin Schneider
  2020-07-13 12:39   ` Peter Zijlstra
  1 sibling, 1 reply; 27+ messages in thread
From: Dietmar Eggemann @ 2020-07-02 18:29 UTC (permalink / raw)
  To: Valentin Schneider, linux-kernel
  Cc: Peter Zijlstra, mingo, vincent.guittot, morten.rasmussen

On 01/07/2020 21:06, Valentin Schneider wrote:

[...]

> @@ -105,16 +122,18 @@ SD_FLAG(SD_SERIALIZE,           8, SDF_SHARED_PARENT)
>   * Place busy tasks earlier in the domain
>   *
>   * SHARED_CHILD: Usually set on the SMT level. Technically could be set further
> - * up, but currently assumed to be set from the base domain upwards (see
> - * update_top_cache_domain()).
> + *               up, but currently assumed to be set from the base domain
> + *               upwards (see update_top_cache_domain()).
>   */
> -SD_FLAG(SD_ASYM_PACKING,        9, SDF_SHARED_CHILD)
> +SD_FLAG(SD_ASYM_PACKING,        9, SDF_SHARED_CHILD | SDF_NEEDS_GROUPS)
>  
>  /*
>   * Prefer to place tasks in a sibling domain
>   *
>   * Set up until domains start spanning NUMA nodes. Close to being a SHARED_CHILD
>   * flag, but cleared below domains with SD_ASYM_CPUCAPACITY.
> + *
> + * NEEDS_GROUPS: Load balancing flag.
>   */
>  SD_FLAG(SD_PREFER_SIBLING,      10, 0)

Related to my comment in [PATCH v3 5/7], maybe you wanted to add
SDF_NEEDS_GROUPS for SD_PREFER_SIBLING as well? This comment
'NEEDS_GROUPS: Load balancing flag.' makes me wonder.

Currently, SD_PREFER_SIBLING isn't in SD_DEGENERATE_GROUPS_MASK=0xaef.
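
Decomposing 0xaef against the bit indices in the table above shows exactly
which flags made it into the mask:

  0xaef = BIT(0)  SD_BALANCE_NEWIDLE
        | BIT(1)  SD_BALANCE_EXEC
        | BIT(2)  SD_BALANCE_FORK
        | BIT(3)  SD_BALANCE_WAKE
        | BIT(5)  SD_ASYM_CPUCAPACITY
        | BIT(6)  SD_SHARE_CPUCAPACITY
        | BIT(7)  SD_SHARE_PKG_RESOURCES
        | BIT(9)  SD_ASYM_PACKING
        | BIT(11) SD_OVERLAP

i.e. bit 10 (SD_PREFER_SIBLING) is indeed absent.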

[...]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 4/7] arm, sched/topology: Remove SD_SHARE_POWERDOMAIN
  2020-07-02 16:44   ` Dietmar Eggemann
@ 2020-07-02 18:46     ` Valentin Schneider
  0 siblings, 0 replies; 27+ messages in thread
From: Valentin Schneider @ 2020-07-02 18:46 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: linux-kernel, Morten Rasmussen, mingo, peterz, vincent.guittot


On 02/07/20 17:44, Dietmar Eggemann wrote:
> On 01/07/2020 21:06, Valentin Schneider wrote:
>> This flag was introduced in 2014 by commit
>>
>>   d77b3ed5c9f8 ("sched: Add a new SD_SHARE_POWERDOMAIN for sched_domain")
>>
>> but AFAIA it was never leveraged by the scheduler. The closest thing I can
>> think of is EAS caring about frequency domains, and it does that by
>> leveraging performance domains.
>
> ... and even this was purely out of tree (SD_SHARE_CAP_STATES).
>
>> Remove the flag.
>>
>> Suggested-by: Morten Rasmussen <morten.rasmussen@arm.com>
>> Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
>> ---
>>  arch/arm/kernel/topology.c     |  2 +-
>>  include/linux/sched/sd_flags.h | 20 ++++++--------------
>>  kernel/sched/topology.c        | 10 +++-------
>>  3 files changed, 10 insertions(+), 22 deletions(-)
>>
>> diff --git a/arch/arm/kernel/topology.c b/arch/arm/kernel/topology.c
>> index b5adaf744630..353f3ee660e4 100644
>> --- a/arch/arm/kernel/topology.c
>> +++ b/arch/arm/kernel/topology.c
>> @@ -243,7 +243,7 @@ void store_cpu_topology(unsigned int cpuid)
>>
>>  static inline int cpu_corepower_flags(void)
>>  {
>> -	return SD_SHARE_PKG_RESOURCES  | SD_SHARE_POWERDOMAIN;
>> +	return SD_SHARE_PKG_RESOURCES;
>>  }
>>
>>  static struct sched_domain_topology_level arm_topology[] = {
>
> I guess with SD_SHARE_POWERDOMAIN gone, arch arm can even use the default_topology[]:

That does look like it! I never noticed we declared this GMC topology
level. Given it uses the thread_sibling mask, and that no (upstream) arm DT
uses the thread topology binding, I guess that makes sense.

>
> diff --git a/arch/arm/kernel/topology.c b/arch/arm/kernel/topology.c
> index b5adaf744630..87dd193165cc 100644
> --- a/arch/arm/kernel/topology.c
> +++ b/arch/arm/kernel/topology.c
> @@ -241,20 +241,6 @@ void store_cpu_topology(unsigned int cpuid)
>         update_siblings_masks(cpuid);
>  }
>
> -static inline int cpu_corepower_flags(void)
> -{
> -       return SD_SHARE_PKG_RESOURCES  | SD_SHARE_POWERDOMAIN;
> -}
> -
> -static struct sched_domain_topology_level arm_topology[] = {
> -#ifdef CONFIG_SCHED_MC
> -       { cpu_corepower_mask, cpu_corepower_flags, SD_INIT_NAME(GMC) },
> -       { cpu_coregroup_mask, cpu_core_flags, SD_INIT_NAME(MC) },
> -#endif
> -       { cpu_cpu_mask, SD_INIT_NAME(DIE) },
> -       { NULL, },
> -};
> -
>  /*
>   * init_cpu_topology is called at boot when only one cpu is running
>   * which prevent simultaneous write access to cpu_topology array
> @@ -265,7 +251,4 @@ void __init init_cpu_topology(void)
>         smp_wmb();
>
>         parse_dt_topology();
> -
> -       /* Set scheduler topology descriptor */
> -       set_sched_topology(arm_topology);
>  }

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 6/7] sched/topology: Introduce SD metaflag for flags needing > 1 groups
  2020-07-02 18:29   ` Dietmar Eggemann
@ 2020-07-02 18:46     ` Valentin Schneider
  0 siblings, 0 replies; 27+ messages in thread
From: Valentin Schneider @ 2020-07-02 18:46 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: linux-kernel, Peter Zijlstra, mingo, vincent.guittot, morten.rasmussen


On 02/07/20 19:29, Dietmar Eggemann wrote:
> On 01/07/2020 21:06, Valentin Schneider wrote:
>
> [...]
>
>> @@ -105,16 +122,18 @@ SD_FLAG(SD_SERIALIZE,           8, SDF_SHARED_PARENT)
>>   * Place busy tasks earlier in the domain
>>   *
>>   * SHARED_CHILD: Usually set on the SMT level. Technically could be set further
>> - * up, but currently assumed to be set from the base domain upwards (see
>> - * update_top_cache_domain()).
>> + *               up, but currently assumed to be set from the base domain
>> + *               upwards (see update_top_cache_domain()).
>>   */
>> -SD_FLAG(SD_ASYM_PACKING,        9, SDF_SHARED_CHILD)
>> +SD_FLAG(SD_ASYM_PACKING,        9, SDF_SHARED_CHILD | SDF_NEEDS_GROUPS)
>>
>>  /*
>>   * Prefer to place tasks in a sibling domain
>>   *
>>   * Set up until domains start spanning NUMA nodes. Close to being a SHARED_CHILD
>>   * flag, but cleared below domains with SD_ASYM_CPUCAPACITY.
>> + *
>> + * NEEDS_GROUPS: Load balancing flag.
>>   */
>>  SD_FLAG(SD_PREFER_SIBLING,      10, 0)
>
> Related to my comment in [PATCH v3 5/7], maybe you wanted to add
> SDF_NEEDS_GROUPS for SD_PREFER_SIBLING as well? This comment
> 'NEEDS_GROUPS: Load balancing flag.' makes me wonder.
>
> Currently, SD_PREFER_SIBLING isn't in SD_DEGENERATE_GROUPS_MASK=0xaef.
>

You're right, that's a fail from my end. Thanks (and sorry)!

> [...]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 6/7] sched/topology: Introduce SD metaflag for flags needing > 1 groups
  2020-07-01 19:06 ` [PATCH v3 6/7] sched/topology: Introduce SD metaflag for flags needing > 1 groups Valentin Schneider
  2020-07-02 18:29   ` Dietmar Eggemann
@ 2020-07-13 12:39   ` Peter Zijlstra
  2020-07-13 13:25     ` Valentin Schneider
  1 sibling, 1 reply; 27+ messages in thread
From: Peter Zijlstra @ 2020-07-13 12:39 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: linux-kernel, mingo, vincent.guittot, dietmar.eggemann, morten.rasmussen

On Wed, Jul 01, 2020 at 08:06:54PM +0100, Valentin Schneider wrote:
> +/* Generate a mask of SD flags with the SDF_NEEDS_GROUPS metaflag */
> +#define SD_FLAG(name, idx, mflags) (BIT(idx) * (((mflags) & SDF_NEEDS_GROUPS) / SDF_NEEDS_GROUPS)) |

#define SD_FLAGS(name, idx, mflags) (!!((mflags) & SDF_NEEDS_GROUPS) * BIT(idx))

> +static const int SD_DEGENERATE_GROUPS_MASK =
> +#include<linux/sched/sd_flags.h>
> +0;
> +#undef SD_FLAG
> +
>  #ifdef CONFIG_SCHED_DEBUG
>  #define SD_FLAG(_name, idx, mflags) [idx] = {.meta_flags = mflags, .name = #_name},
>  static const struct {
> -- 
> 2.27.0
> 
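
As an aside, a self-contained toy rendition of the trick above (plain C,
not the kernel header: SD_FLAG_TABLE, the metaflag values and the truncated
flag list are made up for illustration). The same table is expanded once to
generate the flag values and once more to build the compile-time
SDF_NEEDS_GROUPS mask:

#include <stdio.h>

#define BIT(nr)			(1U << (nr))	/* local stand-in for the kernel's BIT() */

/* Illustrative metaflag values */
#define SDF_SHARED_CHILD	0x1
#define SDF_SHARED_PARENT	0x2
#define SDF_NEEDS_GROUPS	0x4

/* Stand-in for include/linux/sched/sd_flags.h: X(name, bit index, metaflags) */
#define SD_FLAG_TABLE(X)						  \
	X(SD_BALANCE_NEWIDLE,  0, SDF_SHARED_CHILD | SDF_NEEDS_GROUPS)	  \
	X(SD_WAKE_AFFINE,      4, SDF_SHARED_CHILD)			  \
	X(SD_SERIALIZE,        8, SDF_SHARED_PARENT)			  \
	X(SD_NUMA,            12, SDF_SHARED_PARENT)

/* Expansion #1: the flag values themselves */
#define SD_FLAG(name, idx, mflags) name = BIT(idx),
enum { SD_FLAG_TABLE(SD_FLAG) };
#undef SD_FLAG

/* Expansion #2: compile-time mask of every flag tagged SDF_NEEDS_GROUPS */
#define SD_FLAG(name, idx, mflags) \
	(BIT(idx) * (((mflags) & SDF_NEEDS_GROUPS) / SDF_NEEDS_GROUPS)) |
static const unsigned int SD_DEGENERATE_GROUPS_MASK = SD_FLAG_TABLE(SD_FLAG) 0;
#undef SD_FLAG

int main(void)
{
	printf("mask = %#x, contains SD_BALANCE_NEWIDLE: %d\n",
	       SD_DEGENERATE_GROUPS_MASK,
	       !!(SD_DEGENERATE_GROUPS_MASK & SD_BALANCE_NEWIDLE));
	return 0;
}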

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 7/7] sched/topology: Use prebuilt SD flag degeneration mask
  2020-07-01 19:06 ` [PATCH v3 7/7] sched/topology: Use prebuilt SD flag degeneration mask Valentin Schneider
@ 2020-07-13 12:55   ` Peter Zijlstra
  2020-07-13 13:28     ` Valentin Schneider
  0 siblings, 1 reply; 27+ messages in thread
From: Peter Zijlstra @ 2020-07-13 12:55 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: linux-kernel, mingo, vincent.guittot, dietmar.eggemann, morten.rasmussen

On Wed, Jul 01, 2020 at 08:06:55PM +0100, Valentin Schneider wrote:
> Leverage SD_DEGENERATE_GROUPS_MASK in sd_degenerate() and
> sd_degenerate_parent().
> 
> Note that this changes sd_degenerate() somewhat: I'm using the negation of
> SD_DEGENERATE_GROUPS_MASK as the mask of flags not requiring groups, which
> is equivalent to:
> 
> SD_WAKE_AFFINE | SD_SERIALIZE | SD_NUMA
> 
> whereas the current mask for that is simply
> 
> SD_WAKE_AFFINE
> 
> I played with a few toy NUMA topologies on QEMU and couldn't cause a
> different degeneration than what mainline does currently. If that is deemed
> too risky, we can go back to using SD_WAKE_AFFINE explicitly.

Arguably SD_SERIALIZE needs groups, note how we're only having that
effective for machines with at least 2 nodes. It's a bit shit how we end
up there, but IIRC that's what it ends up as.

SD_NUMA is descriptive, and not marking a group as degenerate because
it has SD_NUMA seems a bit silly. But then, it would be the top domain
and would survive anyway?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 6/7] sched/topology: Introduce SD metaflag for flags needing > 1 groups
  2020-07-13 12:39   ` Peter Zijlstra
@ 2020-07-13 13:25     ` Valentin Schneider
  0 siblings, 0 replies; 27+ messages in thread
From: Valentin Schneider @ 2020-07-13 13:25 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, mingo, vincent.guittot, dietmar.eggemann, morten.rasmussen


On 13/07/20 13:39, Peter Zijlstra wrote:
> On Wed, Jul 01, 2020 at 08:06:54PM +0100, Valentin Schneider wrote:
>> +/* Generate a mask of SD flags with the SDF_NEEDS_GROUPS metaflag */
>> +#define SD_FLAG(name, idx, mflags) (BIT(idx) * (((mflags) & SDF_NEEDS_GROUPS) / SDF_NEEDS_GROUPS)) |
>
> #define SD_FLAGS(name, idx, mflags) (!!((mflags) & SDF_NEEDS_GROUPS) * BIT(idx))
>

Was paranoid about really getting a 1 or a 0, but AFAICT the standard
does agree with you!

>> +static const int SD_DEGENERATE_GROUPS_MASK =
>> +#include<linux/sched/sd_flags.h>
>> +0;
>> +#undef SD_FLAG
>> +
>>  #ifdef CONFIG_SCHED_DEBUG
>>  #define SD_FLAG(_name, idx, mflags) [idx] = {.meta_flags = mflags, .name = #_name},
>>  static const struct {
>> --
>> 2.27.0
>>
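
A minimal C11 check of that point, outside the tree (the MASK_BIT_* names
and the metaflag value are made up for the sketch): the standard defines !x
as yielding exactly 0 or 1, so both spellings keep or drop the bit
identically.

#define BIT(nr)			(1U << (nr))
#define SDF_NEEDS_GROUPS	0x4

#define MASK_BIT_DIV(idx, mflags) \
	(BIT(idx) * (((mflags) & SDF_NEEDS_GROUPS) / SDF_NEEDS_GROUPS))
#define MASK_BIT_NOTNOT(idx, mflags) \
	(!!((mflags) & SDF_NEEDS_GROUPS) * BIT(idx))

/* metaflags with and without SDF_NEEDS_GROUPS set */
_Static_assert(MASK_BIT_DIV(9, 0x5) == MASK_BIT_NOTNOT(9, 0x5), "flag kept");
_Static_assert(MASK_BIT_DIV(9, 0x1) == MASK_BIT_NOTNOT(9, 0x1), "flag dropped");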

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 7/7] sched/topology: Use prebuilt SD flag degeneration mask
  2020-07-13 12:55   ` Peter Zijlstra
@ 2020-07-13 13:28     ` Valentin Schneider
  2020-07-13 13:43       ` Peter Zijlstra
  0 siblings, 1 reply; 27+ messages in thread
From: Valentin Schneider @ 2020-07-13 13:28 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, mingo, vincent.guittot, dietmar.eggemann, morten.rasmussen


On 13/07/20 13:55, Peter Zijlstra wrote:
> On Wed, Jul 01, 2020 at 08:06:55PM +0100, Valentin Schneider wrote:
>> Leverage SD_DEGENERATE_GROUPS_MASK in sd_degenerate() and
>> sd_degenerate_parent().
>>
>> Note that this changes sd_degenerate() somewhat: I'm using the negation of
>> SD_DEGENERATE_GROUPS_MASK as the mask of flags not requiring groups, which
>> is equivalent to:
>>
>> SD_WAKE_AFFINE | SD_SERIALIZE | SD_NUMA
>>
>> whereas the current mask for that is simply
>>
>> SD_WAKE_AFFINE
>>
>> I played with a few toy NUMA topologies on QEMU and couldn't cause a
>> different degeneration than what mainline does currently. If that is deemed
>> too risky, we can go back to using SD_WAKE_AFFINE explicitly.
>
> Arguably SD_SERIALIZE needs groups, note how we're only having that
> effective for machines with at least 2 nodes. It's a bit shit how we end
> up there, but IIRC that's what it ends up as.
>

Right, AFAICT we get SD_SERIALIZE wherever we have SD_NUMA, which is any
level above NODE.

> SD_NUMA is descriptive, and not marking a group as degenerate because
> it has SD_NUMA seems a bit silly.

It does, although we can still degenerate it, see below.

> But then, it would be the top domain
> and would survive anyway?

So from what I've tested we still get rid of those via
sd_parent_degenerate(): child and parent have the same flags and same span,
so parent goes out.

That happens in the middle of the NUMA topology levels on that borked
topology with weird distances, aka

  node distances:
  node   0   1   2   3
    0:  10  12  20  22
    1:  12  10  22  24
    2:  20  22  10  12
    3:  22  24  12  10

which ought to look something like (+local distance to end result)

      2      10      2
  1 <---> 0 <---> 2 <---> 3

We end up with the following NUMA levels (i.e. deduplicated distances)
  NUMA (<= 12)
  NUMA (<= 20)
  NUMA (<= 22)
  NUMA (<= 24)

For e.g. any CPU of node1, NUMA(<=20) is gonna have the same span as
NUMA(<=12), so we'll degenerate it.
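
To make that concrete, a small standalone sketch (plain C, nothing from the
kernel tree; node_span() and the hard-coded tables are illustrative only)
that derives each node's span per NUMA level from the distance matrix above
and marks levels whose span matches their child's:

#include <stdio.h>

#define NR_NODES 4

/* The distance table quoted above */
static const int dist[NR_NODES][NR_NODES] = {
	{ 10, 12, 20, 22 },
	{ 12, 10, 22, 24 },
	{ 20, 22, 10, 12 },
	{ 22, 24, 12, 10 },
};

/* Deduplicated remote distances, i.e. the NUMA levels */
static const int levels[] = { 12, 20, 22, 24 };

static unsigned int node_span(int node, int max_dist)
{
	unsigned int span = 0;

	for (int n = 0; n < NR_NODES; n++)
		if (dist[node][n] <= max_dist)
			span |= 1U << n;

	return span;
}

int main(void)
{
	for (int node = 0; node < NR_NODES; node++) {
		/* NODE level: just the node itself */
		unsigned int prev = 1U << node;

		for (unsigned int l = 0; l < sizeof(levels) / sizeof(levels[0]); l++) {
			unsigned int span = node_span(node, levels[l]);

			printf("node%d NUMA(<=%d): span=%#x%s\n",
			       node, levels[l], span,
			       span == prev ? " (same span as child -> degenerates)" : "");
			prev = span;
		}
	}
	return 0;
}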

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 7/7] sched/topology: Use prebuilt SD flag degeneration mask
  2020-07-13 13:28     ` Valentin Schneider
@ 2020-07-13 13:43       ` Peter Zijlstra
  2020-07-13 13:52         ` Valentin Schneider
  0 siblings, 1 reply; 27+ messages in thread
From: Peter Zijlstra @ 2020-07-13 13:43 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: linux-kernel, mingo, vincent.guittot, dietmar.eggemann, morten.rasmussen

On Mon, Jul 13, 2020 at 02:28:29PM +0100, Valentin Schneider wrote:
> 
> On 13/07/20 13:55, Peter Zijlstra wrote:
> > On Wed, Jul 01, 2020 at 08:06:55PM +0100, Valentin Schneider wrote:
> >> Leverage SD_DEGENERATE_GROUPS_MASK in sd_degenerate() and
> >> sd_degenerate_parent().
> >>
> >> Note that this changes sd_degenerate() somewhat: I'm using the negation of
> >> SD_DEGENERATE_GROUPS_MASK as the mask of flags not requiring groups, which
> >> is equivalent to:
> >>
> >> SD_WAKE_AFFINE | SD_SERIALIZE | SD_NUMA
> >>
> >> whereas the current mask for that is simply
> >>
> >> SD_WAKE_AFFINE
> >>
> >> I played with a few toy NUMA topologies on QEMU and couldn't cause a
> >> different degeneration than what mainline does currently. If that is deemed
> >> too risky, we can go back to using SD_WAKE_AFFINE explicitly.
> >
> > Arguably SD_SERIALIZE needs groups, note how we're only having that
> > effective for machines with at least 2 nodes. It's a bit shit how we end
> > up there, but IIRC that's what it ends up as.
> >
> 
> Right, AFAICT we get SD_SERIALIZE wherever we have SD_NUMA, which is any
> level above NODE.

Oh, right, I forgot we have NODE, d'0h. But in that case these lines:

	if (nr_node_ids == 1)
		pflags &= ~SD_SERIALIZE;

are dead code, right?

> > SD_NUMA is descriptive, and not marking a group as degenerate because
> > it has SD_NUMA seems a bit silly.
> 
> It does, although we can still degenerate it, see below.
> 
> > But then, it would be the top domain
> > and would survive anyway?
> 
> So from what I've tested we still get rid of those via
> sd_parent_degenerate(): child and parent have the same flags and same span,
> so parent goes out.
> 
> That happens in the middle of the NUMA topology levels on that borked
> topology with weird distances, aka
> 
>   node distances:
>   node   0   1   2   3
>     0:  10  12  20  22
>     1:  12  10  22  24
>     2:  20  22  10  12
>     3:  22  24  12  10
> 
> which ought to look something like (+local distance to end result)
> 
>       2      10      2
>   1 <---> 0 <---> 2 <---> 3
> 
> We end up with the following NUMA levels (i.e. deduplicated distances)
>   NUMA (<= 12)
>   NUMA (<= 20)
>   NUMA (<= 22)
>   NUMA (<= 24)
> 
> For e.g. any CPU of node1, NUMA(<=20) is gonna have the same span as
> NUMA(<=12), so we'll degenerate it.

Man, that's horrible :-) OK, fair enough, keep it as is, we'll see what
if anything breaks.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v3 7/7] sched/topology: Use prebuilt SD flag degeneration mask
  2020-07-13 13:43       ` Peter Zijlstra
@ 2020-07-13 13:52         ` Valentin Schneider
  0 siblings, 0 replies; 27+ messages in thread
From: Valentin Schneider @ 2020-07-13 13:52 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, mingo, vincent.guittot, dietmar.eggemann, morten.rasmussen


On 13/07/20 14:43, Peter Zijlstra wrote:
> On Mon, Jul 13, 2020 at 02:28:29PM +0100, Valentin Schneider wrote:
>>
>> On 13/07/20 13:55, Peter Zijlstra wrote:
>> > On Wed, Jul 01, 2020 at 08:06:55PM +0100, Valentin Schneider wrote:
>> >> Leverage SD_DEGENERATE_GROUPS_MASK in sd_degenerate() and
>> >> sd_degenerate_parent().
>> >>
>> >> Note that this changes sd_degenerate() somewhat: I'm using the negation of
>> >> SD_DEGENERATE_GROUPS_MASK as the mask of flags not requiring groups, which
>> >> is equivalent to:
>> >>
>> >> SD_WAKE_AFFINE | SD_SERIALIZE | SD_NUMA
>> >>
>> >> whereas the current mask for that is simply
>> >>
>> >> SD_WAKE_AFFINE
>> >>
>> >> I played with a few toy NUMA topologies on QEMU and couldn't cause a
>> >> different degeneration than what mainline does currently. If that is deemed
>> >> too risky, we can go back to using SD_WAKE_AFFINE explicitly.
>> >
>> > Arguably SD_SERIALIZE needs groups, note how we're only having that
>> > effective for machines with at least 2 nodes. It's a bit shit how we end
>> > up there, but IIRC that's what it ends up as.
>> >
>>
>> Right, AFAICT we get SD_SERIALIZE wherever we have SD_NUMA, which is any
>> level above NODE.
>
> Oh, right, I forgot we have NODE, d'0h. But in that case these lines:
>
>       if (nr_node_ids == 1)
>               pflags &= ~SD_SERIALIZE;
>
> are dead code, right?
>

Uuuuh right, that does sound like something made obsolete by having NODE;
we didn't have it back then:

5436499e6098 ("sched: fix sd_parent_degenerate on non-numa smp machine")

I'll fold that in (and test it, just to be sure :-)).

>> > SD_NUMA is descriptive, and not marking a group as degenerate because
>> > it has SD_NUMA seems a bit silly.
>>
>> It does, although we can still degenerate it, see below.
>>
>> > But then, it would be the top domain
>> > and would survive anyway?
>>
>> So from what I've tested we still get rid of those via
>> sd_parent_degenerate(): child and parent have the same flags and same span,
>> so parent goes out.
>>
>> That happens in the middle of the NUMA topology levels on that borked
>> topology with weird distances, aka
>>
>>   node distances:
>>   node   0   1   2   3
>>     0:  10  12  20  22
>>     1:  12  10  22  24
>>     2:  20  22  10  12
>>     3:  22  24  12  10
>>
>> which ought to look something like (+local distance to end result)
>>
>>       2      10      2
>>   1 <---> 0 <---> 2 <---> 3
>>
>> We end up with the following NUMA levels (i.e. deduplicated distances)
>>   NUMA (<= 12)
>>   NUMA (<= 20)
>>   NUMA (<= 22)
>>   NUMA (<= 24)
>>
>> For e.g. any CPU of node1, NUMA(<=20) is gonna have the same span as
>> NUMA(<=12), so we'll degenerate it.
>
> Man, that's horrible :-)

It is :(

> OK, fair enough, keep it as is, we'll see what
> if anything breaks.

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread

Thread overview: 27+ messages
2020-07-01 19:06 [PATCH v3 0/7] sched: Instrument sched domain flags Valentin Schneider
2020-07-01 19:06 ` [PATCH v3 1/7] sched/topology: Split out SD_* flags declaration to its own file Valentin Schneider
2020-07-01 19:06 ` [PATCH v3 2/7] sched/topology: Define and assign sched_domain flag metadata Valentin Schneider
2020-07-02 12:15   ` Quentin Perret
2020-07-02 14:31     ` Valentin Schneider
2020-07-02 15:45       ` Quentin Perret
2020-07-02 16:25         ` Valentin Schneider
2020-07-02 16:37           ` Quentin Perret
2020-07-02 16:49             ` Valentin Schneider
2020-07-01 19:06 ` [PATCH v3 3/7] sched/topology: Verify SD_* flags setup when sched_debug is on Valentin Schneider
2020-07-02 14:20   ` Peter Zijlstra
2020-07-02 14:32     ` Valentin Schneider
2020-07-01 19:06 ` [PATCH v3 4/7] arm, sched/topology: Remove SD_SHARE_POWERDOMAIN Valentin Schneider
2020-07-02 16:44   ` Dietmar Eggemann
2020-07-02 18:46     ` Valentin Schneider
2020-07-01 19:06 ` [PATCH v3 5/7] sched/topology: Add more flags to the SD degeneration mask Valentin Schneider
2020-07-02 18:28   ` Dietmar Eggemann
2020-07-01 19:06 ` [PATCH v3 6/7] sched/topology: Introduce SD metaflag for flags needing > 1 groups Valentin Schneider
2020-07-02 18:29   ` Dietmar Eggemann
2020-07-02 18:46     ` Valentin Schneider
2020-07-13 12:39   ` Peter Zijlstra
2020-07-13 13:25     ` Valentin Schneider
2020-07-01 19:06 ` [PATCH v3 7/7] sched/topology: Use prebuilt SD flag degeneration mask Valentin Schneider
2020-07-13 12:55   ` Peter Zijlstra
2020-07-13 13:28     ` Valentin Schneider
2020-07-13 13:43       ` Peter Zijlstra
2020-07-13 13:52         ` Valentin Schneider
