* [PATCH v4 00/10] sched: Instrument sched domain flags
@ 2020-07-31 11:54 Valentin Schneider
  2020-07-31 11:54 ` [PATCH v4 01/10] ARM, sched/topology: Remove SD_SHARE_POWERDOMAIN Valentin Schneider
                   ` (10 more replies)
  0 siblings, 11 replies; 21+ messages in thread
From: Valentin Schneider @ 2020-07-31 11:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, peterz, vincent.guittot, dietmar.eggemann,
	morten.rasmussen, Quentin Perret

Hi,

I've repeatedly stared at an SD flag and asked myself "how should that be
set up in the domain hierarchy anyway?". I figured that if we formalize our
flags zoology a bit, we could also do some runtime assertions on them -
this is what this series is all about.

Patches
=======

The idea is to associate the flags with metaflags that describe how they
should be set in a sched domain hierarchy ("if this SD has it, all its {parents,
children} have it") or how they behave wrt degeneration - details are in the
comments and commit logs. 

The good thing is that the debugging bits go away when CONFIG_SCHED_DEBUG isn't
set. The bad thing is that this replaces SD_* flags definitions with some
unsavoury macros. This is mainly because I wanted to avoid having to duplicate
work between declaring the flags and declaring their metaflags.
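
For the curious, the "unsavoury macros" boil down to an X-macro include:
sd_flags.h only *invokes* an SD_FLAG() macro, and each consumer defines
SD_FLAG before including the file. A minimal sketch of the pattern (this
mirrors patch 4; BIT() is the kernel's bit helper):

  /* include/linux/sched/sd_flags.h: declarations only */
  SD_FLAG(SD_BALANCE_NEWIDLE, 0)
  SD_FLAG(SD_BALANCE_EXEC,    1)
  /* ... */

  /* consumer: generate SD_FOO = BIT(idx) constants */
  #define SD_FLAG(name, idx) static const unsigned int name = BIT(idx);
  #include <linux/sched/sd_flags.h>
  #undef SD_FLAG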

o Patches 1-3 are topology cleanups / fixes
o Patches 4-6 instrument SD flags and add assertions
o Patches 7-10 leverage the instrumentation to factorize domain degeneration

Revisions
=========

v3 -> v4
--------

o Reordered the series to have fixes / cleanups first

o Added SD_ASYM_CPUCAPACITY propagation (Quentin)
o Made ARM revert back to the default sched topology (Dietmar)
o Removed SD_SERIALIZE degeneration special case (Peter)

o Made SD_NUMA and SD_SERIALIZE have SDF_NEEDS_GROUPS

  As discussed on v3, I thought this wasn't required, but thinking some more
  about it there can be cases where that changes the current behaviour. For
  instance, in the following wacky triangle:

      0\ 30
      | \
  20  |  2
      | /
      1/ 30

  there are two unique distances, thus two NUMA topology levels; however, the
  first one for node 2 would have the same span as its child domain and thus
  should be degenerated. If we don't give SD_NUMA and SD_SERIALIZE
  SDF_NEEDS_GROUPS, this domain wouldn't be degenerated since its child
  *doesn't* have either SD_NUMA or SD_SERIALIZE (it's the first NUMA domain),
  and we'd have this weird NUMA domain lingering with a single group.
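
  For reference, the node_distance() table implied by the diagram (assuming
  the conventional local distance of 10) would be:

      node   0    1    2
        0:  10   20   30
        1:  20   10   30
        2:  30   30   10

  i.e. two unique remote distances (20 and 30), hence the two NUMA levels.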

v2 -> v3
--------

o Reworded comment for SD_OVERLAP (it's about the groups, not the domains)

o Added more flags to the SD degeneration mask
o Added generation of an SD flag mask for the degeneration functions (Peter)

RFC -> v2
---------

o Rebased on top of tip/sched/core
o Aligned wording of comments between flags
o Rectified some flag descriptions (Morten)
o Added removal of SD_SHARE_POWERDOMAIN (Morten)

Valentin Schneider (10):
  ARM, sched/topology: Remove SD_SHARE_POWERDOMAIN
  ARM: Revert back to default scheduler topology.
  sched/topology: Propagate SD_ASYM_CPUCAPACITY upwards
  sched/topology: Split out SD_* flags declaration to its own file
  sched/topology: Define and assign sched_domain flag metadata
  sched/topology: Verify SD_* flags setup when sched_debug is on
  sched/topology: Add more flags to the SD degeneration mask
  sched/topology: Remove SD_SERIALIZE degeneration special case
  sched/topology: Introduce SD metaflag for flags needing > 1 groups
  sched/topology: Use prebuilt SD flag degeneration mask

 arch/arm/kernel/topology.c     |  26 ------
 include/linux/sched/sd_flags.h | 156 +++++++++++++++++++++++++++++++++
 include/linux/sched/topology.h |  36 +++++---
 kernel/sched/topology.c        |  54 ++++++------
 4 files changed, 204 insertions(+), 68 deletions(-)
 create mode 100644 include/linux/sched/sd_flags.h

--
2.27.0



* [PATCH v4 01/10] ARM, sched/topology: Remove SD_SHARE_POWERDOMAIN
  2020-07-31 11:54 [PATCH v4 00/10] sched: Instrument sched domain flags Valentin Schneider
@ 2020-07-31 11:54 ` Valentin Schneider
  2020-07-31 11:54 ` [PATCH v4 02/10] ARM: Revert back to default scheduler topology Valentin Schneider
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 21+ messages in thread
From: Valentin Schneider @ 2020-07-31 11:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: Morten Rasmussen, mingo, peterz, vincent.guittot,
	dietmar.eggemann, Quentin Perret

This flag was introduced in 2014 by commit

  d77b3ed5c9f8 ("sched: Add a new SD_SHARE_POWERDOMAIN for sched_domain")

but AFAIA it was never leveraged by the scheduler. The closest thing I can
think of is EAS caring about frequency domains, and it does that by
leveraging performance domains.

Remove the flag.

Suggested-by: Morten Rasmussen <morten.rasmussen@arm.com>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
---
 arch/arm/kernel/topology.c     |  2 +-
 include/linux/sched/topology.h | 13 ++++++-------
 kernel/sched/topology.c        | 10 +++-------
 3 files changed, 10 insertions(+), 15 deletions(-)

diff --git a/arch/arm/kernel/topology.c b/arch/arm/kernel/topology.c
index b5adaf744630..353f3ee660e4 100644
--- a/arch/arm/kernel/topology.c
+++ b/arch/arm/kernel/topology.c
@@ -243,7 +243,7 @@ void store_cpu_topology(unsigned int cpuid)
 
 static inline int cpu_corepower_flags(void)
 {
-	return SD_SHARE_PKG_RESOURCES  | SD_SHARE_POWERDOMAIN;
+	return SD_SHARE_PKG_RESOURCES;
 }
 
 static struct sched_domain_topology_level arm_topology[] = {
diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index 764222d637b7..c88249bed095 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -18,13 +18,12 @@
 #define SD_WAKE_AFFINE		0x0010	/* Wake task to waking CPU */
 #define SD_ASYM_CPUCAPACITY	0x0020  /* Domain members have different CPU capacities */
 #define SD_SHARE_CPUCAPACITY	0x0040	/* Domain members share CPU capacity */
-#define SD_SHARE_POWERDOMAIN	0x0080	/* Domain members share power domain */
-#define SD_SHARE_PKG_RESOURCES	0x0100	/* Domain members share CPU pkg resources */
-#define SD_SERIALIZE		0x0200	/* Only a single load balancing instance */
-#define SD_ASYM_PACKING		0x0400  /* Place busy groups earlier in the domain */
-#define SD_PREFER_SIBLING	0x0800	/* Prefer to place tasks in a sibling domain */
-#define SD_OVERLAP		0x1000	/* sched_domains of this level overlap */
-#define SD_NUMA			0x2000	/* cross-node balancing */
+#define SD_SHARE_PKG_RESOURCES	0x0080	/* Domain members share CPU pkg resources */
+#define SD_SERIALIZE		0x0100	/* Only a single load balancing instance */
+#define SD_ASYM_PACKING		0x0200  /* Place busy groups earlier in the domain */
+#define SD_PREFER_SIBLING	0x0400	/* Prefer to place tasks in a sibling domain */
+#define SD_OVERLAP		0x0800	/* sched_domains of this level overlap */
+#define SD_NUMA			0x1000	/* cross-node balancing */
 
 #ifdef CONFIG_SCHED_SMT
 static inline int cpu_smt_flags(void)
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 9079d865a935..865fff3ef20a 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -148,8 +148,7 @@ static int sd_degenerate(struct sched_domain *sd)
 			 SD_BALANCE_EXEC |
 			 SD_SHARE_CPUCAPACITY |
 			 SD_ASYM_CPUCAPACITY |
-			 SD_SHARE_PKG_RESOURCES |
-			 SD_SHARE_POWERDOMAIN)) {
+			 SD_SHARE_PKG_RESOURCES)) {
 		if (sd->groups != sd->groups->next)
 			return 0;
 	}
@@ -180,8 +179,7 @@ sd_parent_degenerate(struct sched_domain *sd, struct sched_domain *parent)
 			    SD_ASYM_CPUCAPACITY |
 			    SD_SHARE_CPUCAPACITY |
 			    SD_SHARE_PKG_RESOURCES |
-			    SD_PREFER_SIBLING |
-			    SD_SHARE_POWERDOMAIN);
+			    SD_PREFER_SIBLING);
 		if (nr_node_ids == 1)
 			pflags &= ~SD_SERIALIZE;
 	}
@@ -1292,7 +1290,6 @@ int __read_mostly		node_reclaim_distance = RECLAIM_DISTANCE;
  *   SD_SHARE_CPUCAPACITY   - describes SMT topologies
  *   SD_SHARE_PKG_RESOURCES - describes shared caches
  *   SD_NUMA                - describes NUMA topologies
- *   SD_SHARE_POWERDOMAIN   - describes shared power domain
  *
  * Odd one out, which beside describing the topology has a quirk also
  * prescribes the desired behaviour that goes along with it:
@@ -1303,8 +1300,7 @@ int __read_mostly		node_reclaim_distance = RECLAIM_DISTANCE;
 	(SD_SHARE_CPUCAPACITY	|	\
 	 SD_SHARE_PKG_RESOURCES |	\
 	 SD_NUMA		|	\
-	 SD_ASYM_PACKING	|	\
-	 SD_SHARE_POWERDOMAIN)
+	 SD_ASYM_PACKING)
 
 static struct sched_domain *
 sd_init(struct sched_domain_topology_level *tl,
-- 
2.27.0



* [PATCH v4 02/10] ARM: Revert back to default scheduler topology.
  2020-07-31 11:54 [PATCH v4 00/10] sched: Instrument sched domain flags Valentin Schneider
  2020-07-31 11:54 ` [PATCH v4 01/10] ARM, sched/topology: Remove SD_SHARE_POWERDOMAIN Valentin Schneider
@ 2020-07-31 11:54 ` Valentin Schneider
  2020-07-31 11:54 ` [PATCH v4 03/10] sched/topology: Propagate SD_ASYM_CPUCAPACITY upwards Valentin Schneider
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 21+ messages in thread
From: Valentin Schneider @ 2020-07-31 11:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: Dietmar Eggemann, mingo, peterz, vincent.guittot,
	morten.rasmussen, Quentin Perret

The ARM-specific GMC level is meant to be built using the thread sibling
mask, but no devicetree in arch/arm/boot/dts uses the 'thread' cpu-map
binding. With SD_SHARE_POWERDOMAIN gone, this topology level can be
removed, at which point ARM no longer benefits from having a custom defined
topology table.

Delete the GMC topology level by making ARM use the default scheduler
topology table. This essentially reverts commit

  fb2aa85564f4 ("sched, ARM: Create a dedicated scheduler topology table")

Suggested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
---
 arch/arm/kernel/topology.c | 26 --------------------------
 1 file changed, 26 deletions(-)

diff --git a/arch/arm/kernel/topology.c b/arch/arm/kernel/topology.c
index 353f3ee660e4..ef0058de432b 100644
--- a/arch/arm/kernel/topology.c
+++ b/arch/arm/kernel/topology.c
@@ -177,15 +177,6 @@ static inline void parse_dt_topology(void) {}
 static inline void update_cpu_capacity(unsigned int cpuid) {}
 #endif
 
-/*
- * The current assumption is that we can power gate each core independently.
- * This will be superseded by DT binding once available.
- */
-const struct cpumask *cpu_corepower_mask(int cpu)
-{
-	return &cpu_topology[cpu].thread_sibling;
-}
-
 /*
  * store_cpu_topology is called at boot when only one cpu is running
  * and with the mutex cpu_hotplug.lock locked, when several cpus have booted,
@@ -241,20 +232,6 @@ void store_cpu_topology(unsigned int cpuid)
 	update_siblings_masks(cpuid);
 }
 
-static inline int cpu_corepower_flags(void)
-{
-	return SD_SHARE_PKG_RESOURCES;
-}
-
-static struct sched_domain_topology_level arm_topology[] = {
-#ifdef CONFIG_SCHED_MC
-	{ cpu_corepower_mask, cpu_corepower_flags, SD_INIT_NAME(GMC) },
-	{ cpu_coregroup_mask, cpu_core_flags, SD_INIT_NAME(MC) },
-#endif
-	{ cpu_cpu_mask, SD_INIT_NAME(DIE) },
-	{ NULL, },
-};
-
 /*
  * init_cpu_topology is called at boot when only one cpu is running
  * which prevent simultaneous write access to cpu_topology array
@@ -265,7 +242,4 @@ void __init init_cpu_topology(void)
 	smp_wmb();
 
 	parse_dt_topology();
-
-	/* Set scheduler topology descriptor */
-	set_sched_topology(arm_topology);
 }
-- 
2.27.0



* [PATCH v4 03/10] sched/topology: Propagate SD_ASYM_CPUCAPACITY upwards
  2020-07-31 11:54 [PATCH v4 00/10] sched: Instrument sched domain flags Valentin Schneider
  2020-07-31 11:54 ` [PATCH v4 01/10] ARM, sched/topology: Remove SD_SHARE_POWERDOMAIN Valentin Schneider
  2020-07-31 11:54 ` [PATCH v4 02/10] ARM: Revert back to default scheduler topology Valentin Schneider
@ 2020-07-31 11:54 ` Valentin Schneider
  2020-08-06 14:20   ` Ingo Molnar
  2020-07-31 11:54 ` [PATCH v4 04/10] sched/topology: Split out SD_* flags declaration to its own file Valentin Schneider
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 21+ messages in thread
From: Valentin Schneider @ 2020-07-31 11:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: Quentin Perret, mingo, peterz, vincent.guittot, dietmar.eggemann,
	morten.rasmussen

We currently set this flag *only* on domains whose topology level exactly
matches the level where we detect asymmetry (as returned by
asym_cpu_capacity_level()). This is rather problematic.

Say there are two clusters in the system, one with a lone big CPU and the
other with a mix of big and LITTLE CPUs (as is allowed by DynamIQ):

DIE [                ]
MC  [             ][ ]
     0   1   2   3  4
     L   L   B   B  B

asym_cpu_capacity_level() will figure out that the MC level is the one
where all CPUs can see a CPU of max capacity, and we will thus set
SD_ASYM_CPUCAPACITY at MC level for all CPUs.

That lone big CPU will degenerate its MC domain, since it would be alone in
there, and will end up with just a DIE domain. Since the flag was only set
at MC, this CPU ends up not seeing any SD with the flag set, which is
broken.

Rather than clearing dflags at every topology level, clear it before
entering the topology level loop. This will properly propagate upwards
flags that are set starting from a certain level.

Reviewed-by: Quentin Perret <qperret@google.com>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
---
 kernel/sched/topology.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 865fff3ef20a..42b89668e1e4 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1985,11 +1985,10 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *att
 	/* Set up domains for CPUs specified by the cpu_map: */
 	for_each_cpu(i, cpu_map) {
 		struct sched_domain_topology_level *tl;
+		int dflags = 0;
 
 		sd = NULL;
 		for_each_sd_topology(tl) {
-			int dflags = 0;
-
 			if (tl == tl_asym) {
 				dflags |= SD_ASYM_CPUCAPACITY;
 				has_asym = true;
-- 
2.27.0



* [PATCH v4 04/10] sched/topology: Split out SD_* flags declaration to its own file
  2020-07-31 11:54 [PATCH v4 00/10] sched: Instrument sched domain flags Valentin Schneider
                   ` (2 preceding siblings ...)
  2020-07-31 11:54 ` [PATCH v4 03/10] sched/topology: Propagate SD_ASYM_CPUCAPACITY upwards Valentin Schneider
@ 2020-07-31 11:54 ` Valentin Schneider
  2020-07-31 11:54 ` [PATCH v4 05/10] sched/topology: Define and assign sched_domain flag metadata Valentin Schneider
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 21+ messages in thread
From: Valentin Schneider @ 2020-07-31 11:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, peterz, vincent.guittot, dietmar.eggemann,
	morten.rasmussen, Quentin Perret

To associate the SD flags with some metadata, we need some more structure
in the way they are declared.

Rather than shove that into a free-standing macro list, move the declarations
into a separate file that can be re-imported with different SD_FLAG
definitions. This is inspired by what is done with the syscall
table (see uapi/asm/unistd.h and sys_call_table).

No change in functionality.

Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
---
 include/linux/sched/sd_flags.h | 31 +++++++++++++++++++++++++++++++
 include/linux/sched/topology.h | 17 ++++-------------
 2 files changed, 35 insertions(+), 13 deletions(-)
 create mode 100644 include/linux/sched/sd_flags.h

diff --git a/include/linux/sched/sd_flags.h b/include/linux/sched/sd_flags.h
new file mode 100644
index 000000000000..c313291fe8bd
--- /dev/null
+++ b/include/linux/sched/sd_flags.h
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * sched-domains (multiprocessor balancing) flag declarations.
+ */
+
+/* Balance when about to become idle */
+SD_FLAG(SD_BALANCE_NEWIDLE,     0)
+/* Balance on exec */
+SD_FLAG(SD_BALANCE_EXEC,        1)
+/* Balance on fork, clone */
+SD_FLAG(SD_BALANCE_FORK,        2)
+/* Balance on wakeup */
+SD_FLAG(SD_BALANCE_WAKE,        3)
+/* Wake task to waking CPU */
+SD_FLAG(SD_WAKE_AFFINE,         4)
+/* Domain members have different CPU capacities */
+SD_FLAG(SD_ASYM_CPUCAPACITY,    5)
+/* Domain members share CPU capacity */
+SD_FLAG(SD_SHARE_CPUCAPACITY,   6)
+/* Domain members share CPU pkg resources */
+SD_FLAG(SD_SHARE_PKG_RESOURCES, 7)
+/* Only a single load balancing instance */
+SD_FLAG(SD_SERIALIZE,           8)
+/* Place busy groups earlier in the domain */
+SD_FLAG(SD_ASYM_PACKING,        9)
+/* Prefer to place tasks in a sibling domain */
+SD_FLAG(SD_PREFER_SIBLING,      10)
+/* sched_domains of this level overlap */
+SD_FLAG(SD_OVERLAP,             11)
+/* cross-node balancing */
+SD_FLAG(SD_NUMA,                12)
diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index c88249bed095..cf5f16fa2610 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -11,19 +11,10 @@
  */
 #ifdef CONFIG_SMP
 
-#define SD_BALANCE_NEWIDLE	0x0001	/* Balance when about to become idle */
-#define SD_BALANCE_EXEC		0x0002	/* Balance on exec */
-#define SD_BALANCE_FORK		0x0004	/* Balance on fork, clone */
-#define SD_BALANCE_WAKE		0x0008  /* Balance on wakeup */
-#define SD_WAKE_AFFINE		0x0010	/* Wake task to waking CPU */
-#define SD_ASYM_CPUCAPACITY	0x0020  /* Domain members have different CPU capacities */
-#define SD_SHARE_CPUCAPACITY	0x0040	/* Domain members share CPU capacity */
-#define SD_SHARE_PKG_RESOURCES	0x0080	/* Domain members share CPU pkg resources */
-#define SD_SERIALIZE		0x0100	/* Only a single load balancing instance */
-#define SD_ASYM_PACKING		0x0200  /* Place busy groups earlier in the domain */
-#define SD_PREFER_SIBLING	0x0400	/* Prefer to place tasks in a sibling domain */
-#define SD_OVERLAP		0x0800	/* sched_domains of this level overlap */
-#define SD_NUMA			0x1000	/* cross-node balancing */
+/* Generate SD_FOO = VALUE */
+#define SD_FLAG(name, idx) static const unsigned int name = BIT(idx);
+#include <linux/sched/sd_flags.h>
+#undef SD_FLAG
 
 #ifdef CONFIG_SCHED_SMT
 static inline int cpu_smt_flags(void)
-- 
2.27.0



* [PATCH v4 05/10] sched/topology: Define and assign sched_domain flag metadata
  2020-07-31 11:54 [PATCH v4 00/10] sched: Instrument sched domain flags Valentin Schneider
                   ` (3 preceding siblings ...)
  2020-07-31 11:54 ` [PATCH v4 04/10] sched/topology: Split out SD_* flags declaration to its own file Valentin Schneider
@ 2020-07-31 11:54 ` Valentin Schneider
  2020-08-04 11:08   ` peterz
  2020-08-06 14:07   ` Ingo Molnar
  2020-07-31 11:54 ` [PATCH v4 06/10] sched/topology: Verify SD_* flags setup when sched_debug is on Valentin Schneider
                   ` (5 subsequent siblings)
  10 siblings, 2 replies; 21+ messages in thread
From: Valentin Schneider @ 2020-07-31 11:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, peterz, vincent.guittot, dietmar.eggemann,
	morten.rasmussen, Quentin Perret

There are some expectations regarding how sched domain flags should be laid
out, but none of them are checked or asserted in
sched_domain_debug_one(). After staring at said flags for a while, I've
come to realize they all (except *one*) fall into one of two categories:

- Shared with children: those flags are set from the base CPU domain
  upwards. Any domain that has it set will have it set in its children. It
  hints at "some property holds true / some behaviour is enabled until this
  level".

- Shared with parents: those flags are set from the topmost domain
  downwards. Any domain that has it set will have it set in its parents. It
  hints at "some property isn't visible / some behaviour is disabled until
  this level".

The odd one out is SD_PREFER_SIBLING, which is cleared below levels with
SD_ASYM_CPUCAPACITY. The change was introduced by commit

  9c63e84db29b ("sched/core: Disable SD_PREFER_SIBLING on asymmetric CPU capacity domains")

as it could break misfit migration on some systems. In light of this, we
might want to change it back to make it fit one of the two categories and
fix the issue another way.

Tweak the sched_domain flag declarations to assign each flag an expected
layout, and include the rationale for each flag "meta type" assignment as a
comment. Consolidate the flag metadata into an array; the index of a flag's
metadata can easily be found with log2(flag), IOW __ffs(flag).
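
As a usage sketch (names as introduced by this patch; __ffs() is the
kernel's find-first-set-bit helper), looking up a flag's metadata goes:

  unsigned int idx = __ffs(SD_NUMA);           /* SD_NUMA == BIT(12), so idx == 12 */
  unsigned int meta = sd_flag_debug[idx].meta_flags;
  const char *name = sd_flag_debug[idx].name;  /* "SD_NUMA" */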

Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
---
 include/linux/sched/sd_flags.h | 154 +++++++++++++++++++++++++++------
 include/linux/sched/topology.h |  13 ++-
 2 files changed, 140 insertions(+), 27 deletions(-)

diff --git a/include/linux/sched/sd_flags.h b/include/linux/sched/sd_flags.h
index c313291fe8bd..9d0fb8924181 100644
--- a/include/linux/sched/sd_flags.h
+++ b/include/linux/sched/sd_flags.h
@@ -3,29 +3,131 @@
  * sched-domains (multiprocessor balancing) flag declarations.
  */
 
-/* Balance when about to become idle */
-SD_FLAG(SD_BALANCE_NEWIDLE,     0)
-/* Balance on exec */
-SD_FLAG(SD_BALANCE_EXEC,        1)
-/* Balance on fork, clone */
-SD_FLAG(SD_BALANCE_FORK,        2)
-/* Balance on wakeup */
-SD_FLAG(SD_BALANCE_WAKE,        3)
-/* Wake task to waking CPU */
-SD_FLAG(SD_WAKE_AFFINE,         4)
-/* Domain members have different CPU capacities */
-SD_FLAG(SD_ASYM_CPUCAPACITY,    5)
-/* Domain members share CPU capacity */
-SD_FLAG(SD_SHARE_CPUCAPACITY,   6)
-/* Domain members share CPU pkg resources */
-SD_FLAG(SD_SHARE_PKG_RESOURCES, 7)
-/* Only a single load balancing instance */
-SD_FLAG(SD_SERIALIZE,           8)
-/* Place busy groups earlier in the domain */
-SD_FLAG(SD_ASYM_PACKING,        9)
-/* Prefer to place tasks in a sibling domain */
-SD_FLAG(SD_PREFER_SIBLING,      10)
-/* sched_domains of this level overlap */
-SD_FLAG(SD_OVERLAP,             11)
-/* cross-node balancing */
-SD_FLAG(SD_NUMA,                12)
+#ifndef SD_FLAG
+#define SD_FLAG(x, y, z)
+#endif
+
+/*
+ * Expected flag uses
+ *
+ * SHARED_CHILD: These flags are meant to be set from the base domain upwards.
+ * If a domain has this flag set, all of its children should have it set. This
+ * is usually because the flag describes some shared resource (all CPUs in that
+ * domain share the same foobar), or because they are tied to a scheduling
+ * behaviour that we want to disable at some point in the hierarchy for
+ * scalability reasons.
+ *
+ * In those cases it doesn't make sense to have the flag set for a domain but
+ * not have it in (some of) its children: sched domains ALWAYS span their child
+ * domains, so operations done with parent domains will cover CPUs in the lower
+ * child domains.
+ *
+ *
+ * SHARED_PARENT: These flags are meant to be set from the highest domain
+ * downwards. If a domain has this flag set, all of its parents should have it
+ * set. This is usually for topology properties that start to appear above a
+ * certain level (e.g. domain starts spanning CPUs outside of the base CPU's
+ * socket).
+ */
+#define SDF_SHARED_CHILD 0x1
+#define SDF_SHARED_PARENT 0x2
+
+/*
+ * Balance when about to become idle
+ *
+ * SHARED_CHILD: Set from the base domain up to cpuset.sched_relax_domain_level.
+ */
+SD_FLAG(SD_BALANCE_NEWIDLE,     0, SDF_SHARED_CHILD)
+
+/*
+ * Balance on exec
+ *
+ * SHARED_CHILD: Set from the base domain up to the NUMA reclaim level.
+ */
+SD_FLAG(SD_BALANCE_EXEC,        1, SDF_SHARED_CHILD)
+
+/*
+ * Balance on fork, clone
+ *
+ * SHARED_CHILD: Set from the base domain up to the NUMA reclaim level.
+ */
+SD_FLAG(SD_BALANCE_FORK,        2, SDF_SHARED_CHILD)
+
+/*
+ * Balance on wakeup
+ *
+ * SHARED_CHILD: Set from the base domain up to cpuset.sched_relax_domain_level.
+ */
+SD_FLAG(SD_BALANCE_WAKE,        3, SDF_SHARED_CHILD)
+
+/*
+ * Consider waking task on waking CPU.
+ *
+ * SHARED_CHILD: Set from the base domain up to the NUMA reclaim level.
+ */
+SD_FLAG(SD_WAKE_AFFINE,         4, SDF_SHARED_CHILD)
+
+/*
+ * Domain members have different CPU capacities
+ *
+ * SHARED_PARENT: Set from the topmost domain down to the first domain where
+ * asymmetry is detected.
+ */
+SD_FLAG(SD_ASYM_CPUCAPACITY,    5, SDF_SHARED_PARENT)
+
+/*
+ * Domain members share CPU capacity (i.e. SMT)
+ *
+ * SHARED_CHILD: Set from the base domain up until spanned CPUs no longer share
+ * CPU capacity.
+ */
+SD_FLAG(SD_SHARE_CPUCAPACITY,   6, SDF_SHARED_CHILD)
+
+/*
+ * Domain members share CPU package resources (i.e. caches)
+ *
+ * SHARED_CHILD: Set from the base domain up until spanned CPUs no longer share
+ * the same cache(s).
+ */
+SD_FLAG(SD_SHARE_PKG_RESOURCES, 7, SDF_SHARED_CHILD)
+
+/*
+ * Only a single load balancing instance
+ *
+ * SHARED_PARENT: Set for all NUMA levels above NODE. Could be set from a
+ * different level upwards, but it doesn't change that if a domain has this flag
+ * set, then all of its parents need to have it too (otherwise the serialization
+ * doesn't make sense).
+ */
+SD_FLAG(SD_SERIALIZE,           8, SDF_SHARED_PARENT)
+
+/*
+ * Place busy tasks earlier in the domain
+ *
+ * SHARED_CHILD: Usually set on the SMT level. Technically could be set further
+ * up, but currently assumed to be set from the base domain upwards (see
+ * update_top_cache_domain()).
+ */
+SD_FLAG(SD_ASYM_PACKING,        9, SDF_SHARED_CHILD)
+
+/*
+ * Prefer to place tasks in a sibling domain
+ *
+ * Set up until domains start spanning NUMA nodes. Close to being a SHARED_CHILD
+ * flag, but cleared below domains with SD_ASYM_CPUCAPACITY.
+ */
+SD_FLAG(SD_PREFER_SIBLING,      10, 0)
+
+/*
+ * sched_groups of this level overlap
+ *
+ * SHARED_PARENT: Set for all NUMA levels above NODE.
+ */
+SD_FLAG(SD_OVERLAP,             11, SDF_SHARED_PARENT)
+
+/*
+ * cross-node balancing
+ *
+ * SHARED_PARENT: Set for all NUMA levels above NODE.
+ */
+SD_FLAG(SD_NUMA,                12, SDF_SHARED_PARENT)
diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index cf5f16fa2610..99b16d4c03f2 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -12,10 +12,21 @@
 #ifdef CONFIG_SMP
 
 /* Generate SD_FOO = VALUE */
-#define SD_FLAG(name, idx) static const unsigned int name = BIT(idx);
+#define SD_FLAG(name, idx, mflags) static const unsigned int name = BIT(idx);
 #include <linux/sched/sd_flags.h>
 #undef SD_FLAG
 
+#ifdef CONFIG_SCHED_DEBUG
+#define SD_FLAG(_name, idx, mflags) [idx] = {.meta_flags = mflags, .name = #_name},
+static const struct {
+	unsigned int meta_flags;
+	char *name;
+} sd_flag_debug[] = {
+#include <linux/sched/sd_flags.h>
+};
+#undef SD_FLAG
+#endif
+
 #ifdef CONFIG_SCHED_SMT
 static inline int cpu_smt_flags(void)
 {
-- 
2.27.0



* [PATCH v4 06/10] sched/topology: Verify SD_* flags setup when sched_debug is on
  2020-07-31 11:54 [PATCH v4 00/10] sched: Instrument sched domain flags Valentin Schneider
                   ` (4 preceding siblings ...)
  2020-07-31 11:54 ` [PATCH v4 05/10] sched/topology: Define and assign sched_domain flag metadata Valentin Schneider
@ 2020-07-31 11:54 ` Valentin Schneider
  2020-07-31 11:54 ` [PATCH v4 07/10] sched/topology: Add more flags to the SD degeneration mask Valentin Schneider
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 21+ messages in thread
From: Valentin Schneider @ 2020-07-31 11:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, peterz, vincent.guittot, dietmar.eggemann,
	morten.rasmussen, Quentin Perret

Now that we have some description of what we expect the flags layout to
be, we can use that to assert at runtime that the actual layout is sane.
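
With sched_debug enabled, a layout violation then shows up in dmesg along
these lines (sample output constructed from the printk formats below, not
from an actual run):

  ERROR: flag SD_SHARE_PKG_RESOURCES set here but not in child
  ERROR: flag SD_ASYM_CPUCAPACITY set here but not in parent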

Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
---
 kernel/sched/topology.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 42b89668e1e4..cda02507aebf 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -29,6 +29,8 @@ static int sched_domain_debug_one(struct sched_domain *sd, int cpu, int level,
 				  struct cpumask *groupmask)
 {
 	struct sched_group *group = sd->groups;
+	unsigned long flags = sd->flags;
+	unsigned int idx;
 
 	cpumask_clear(groupmask);
 
@@ -43,6 +45,21 @@ static int sched_domain_debug_one(struct sched_domain *sd, int cpu, int level,
 		printk(KERN_ERR "ERROR: domain->groups does not contain CPU%d\n", cpu);
 	}
 
+	for_each_set_bit(idx, &flags, BITS_PER_TYPE(typeof(flags))) {
+		unsigned int flag = BIT(idx);
+		unsigned int meta_flags = sd_flag_debug[idx].meta_flags;
+
+		if ((meta_flags & SDF_SHARED_CHILD) && sd->child &&
+		    !(sd->child->flags & flag))
+			printk(KERN_ERR "ERROR: flag %s set here but not in child\n",
+			       sd_flag_debug[idx].name);
+
+		if ((meta_flags & SDF_SHARED_PARENT) && sd->parent &&
+		    !(sd->parent->flags & flag))
+			printk(KERN_ERR "ERROR: flag %s set here but not in parent\n",
+			       sd_flag_debug[idx].name);
+	}
+
 	printk(KERN_DEBUG "%*s groups:", level + 1, "");
 	do {
 		if (!group) {
-- 
2.27.0



* [PATCH v4 07/10] sched/topology: Add more flags to the SD degeneration mask
  2020-07-31 11:54 [PATCH v4 00/10] sched: Instrument sched domain flags Valentin Schneider
                   ` (5 preceding siblings ...)
  2020-07-31 11:54 ` [PATCH v4 06/10] sched/topology: Verify SD_* flags setup when sched_debug is on Valentin Schneider
@ 2020-07-31 11:54 ` Valentin Schneider
  2020-08-06 13:57   ` Ingo Molnar
  2020-07-31 11:55 ` [PATCH v4 08/10] sched/topology: Remove SD_SERIALIZE degeneration special case Valentin Schneider
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 21+ messages in thread
From: Valentin Schneider @ 2020-07-31 11:54 UTC (permalink / raw)
  To: linux-kernel
  Cc: mingo, peterz, vincent.guittot, dietmar.eggemann,
	morten.rasmussen, Quentin Perret

I don't think it is going to change much in practice, but we were missing
those:

o SD_BALANCE_WAKE: Used just like the other SD_BALANCE_* flags, so also
  needs > 1 group.
o SD_ASYM_PACKING: Hinges on load balancing (periodic / wakeup), thus needs
  > 1 group to happen.
o SD_OVERLAP: Describes domains with overlapping groups; can't have
  overlaps with a single group.

SD_PREFER_SIBLING is as always the odd one out: we currently consider it
in sd_parent_degenerate() but not in sd_degenerate(). It too hinges on load
balancing, and thus won't have any effect when set on a domain with a
single group. Add it to the sd_degenerate() groups mask.

Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
---
 kernel/sched/topology.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index cda02507aebf..93a7ff52335b 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -161,11 +161,15 @@ static int sd_degenerate(struct sched_domain *sd)
 
 	/* Following flags need at least 2 groups */
 	if (sd->flags & (SD_BALANCE_NEWIDLE |
+			 SD_BALANCE_WAKE |
 			 SD_BALANCE_FORK |
 			 SD_BALANCE_EXEC |
+			 SD_ASYM_PACKING |
 			 SD_SHARE_CPUCAPACITY |
 			 SD_ASYM_CPUCAPACITY |
-			 SD_SHARE_PKG_RESOURCES)) {
+			 SD_SHARE_PKG_RESOURCES |
+			 SD_OVERLAP |
+			 SD_PREFER_SIBLING)) {
 		if (sd->groups != sd->groups->next)
 			return 0;
 	}
@@ -191,11 +195,14 @@ sd_parent_degenerate(struct sched_domain *sd, struct sched_domain *parent)
 	/* Flags needing groups don't count if only 1 group in parent */
 	if (parent->groups == parent->groups->next) {
 		pflags &= ~(SD_BALANCE_NEWIDLE |
+			    SD_BALANCE_WAKE |
 			    SD_BALANCE_FORK |
 			    SD_BALANCE_EXEC |
+			    SD_ASYM_PACKING |
 			    SD_ASYM_CPUCAPACITY |
 			    SD_SHARE_CPUCAPACITY |
 			    SD_SHARE_PKG_RESOURCES |
+			    SD_OVERLAP |
 			    SD_PREFER_SIBLING);
 		if (nr_node_ids == 1)
 			pflags &= ~SD_SERIALIZE;
-- 
2.27.0



* [PATCH v4 08/10] sched/topology: Remove SD_SERIALIZE degeneration special case
  2020-07-31 11:54 [PATCH v4 00/10] sched: Instrument sched domain flags Valentin Schneider
                   ` (6 preceding siblings ...)
  2020-07-31 11:54 ` [PATCH v4 07/10] sched/topology: Add more flags to the SD degeneration mask Valentin Schneider
@ 2020-07-31 11:55 ` Valentin Schneider
  2020-07-31 11:55 ` [PATCH v4 09/10] sched/topology: Introduce SD metaflag for flags needing > 1 groups Valentin Schneider
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 21+ messages in thread
From: Valentin Schneider @ 2020-07-31 11:55 UTC (permalink / raw)
  To: linux-kernel
  Cc: Peter Zijlstra, mingo, vincent.guittot, dietmar.eggemann,
	morten.rasmussen, Quentin Perret

If there is only a single NUMA node in the system, the only NUMA topology
level generated will be NODE (identity distance), which
doesn't have SD_SERIALIZE.

This means we don't need this special case in sd_parent_degenerate(), as
having the NODE level "naturally" covers it. Thus, remove it.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
---
 kernel/sched/topology.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 93a7ff52335b..c6ecc395c76c 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -204,8 +204,6 @@ sd_parent_degenerate(struct sched_domain *sd, struct sched_domain *parent)
 			    SD_SHARE_PKG_RESOURCES |
 			    SD_OVERLAP |
 			    SD_PREFER_SIBLING);
-		if (nr_node_ids == 1)
-			pflags &= ~SD_SERIALIZE;
 	}
 	if (~cflags & pflags)
 		return 0;
-- 
2.27.0



* [PATCH v4 09/10] sched/topology: Introduce SD metaflag for flags needing > 1 groups
  2020-07-31 11:54 [PATCH v4 00/10] sched: Instrument sched domain flags Valentin Schneider
                   ` (7 preceding siblings ...)
  2020-07-31 11:55 ` [PATCH v4 08/10] sched/topology: Remove SD_SERIALIZE degeneration special case Valentin Schneider
@ 2020-07-31 11:55 ` Valentin Schneider
  2020-07-31 11:55 ` [PATCH v4 10/10] sched/topology: Use prebuilt SD flag degeneration mask Valentin Schneider
  2020-08-06 11:25 ` [PATCH v4 00/10] sched: Instrument sched domain flags Dietmar Eggemann
  10 siblings, 0 replies; 21+ messages in thread
From: Valentin Schneider @ 2020-07-31 11:55 UTC (permalink / raw)
  To: linux-kernel
  Cc: Peter Zijlstra, mingo, vincent.guittot, dietmar.eggemann,
	morten.rasmussen, Quentin Perret

In preparation for cleaning up the sd_degenerate*() functions, mark flags
that are only relevant when their associated domain has more than a single
group. With this, build a compile-time mask of those SD flags.
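
Spelled out, the generator (added to topology.h below) expands to roughly:

  static const unsigned int SD_DEGENERATE_GROUPS_MASK =
      (BIT(0) * 1) |  /* SD_BALANCE_NEWIDLE: has SDF_NEEDS_GROUPS */
      ...
      (BIT(4) * 0) |  /* SD_WAKE_AFFINE: lacks SDF_NEEDS_GROUPS, contributes nothing */
      ...
      0;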

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
---
 include/linux/sched/sd_flags.h | 69 ++++++++++++++++++++++------------
 include/linux/sched/topology.h |  7 ++++
 2 files changed, 53 insertions(+), 23 deletions(-)

diff --git a/include/linux/sched/sd_flags.h b/include/linux/sched/sd_flags.h
index 9d0fb8924181..6c70af066995 100644
--- a/include/linux/sched/sd_flags.h
+++ b/include/linux/sched/sd_flags.h
@@ -8,7 +8,7 @@
 #endif
 
 /*
- * Expected flag uses
+ * Hierarchical metaflags
  *
  * SHARED_CHILD: These flags are meant to be set from the base domain upwards.
  * If a domain has this flag set, all of its children should have it set. This
@@ -29,36 +29,50 @@
  * certain level (e.g. domain starts spanning CPUs outside of the base CPU's
  * socket).
  */
-#define SDF_SHARED_CHILD 0x1
-#define SDF_SHARED_PARENT 0x2
+#define SDF_SHARED_CHILD       0x1
+#define SDF_SHARED_PARENT      0x2
+
+/*
+ * Behavioural metaflags
+ *
+ * NEEDS_GROUPS: These flags are only relevant if the domain they are set on has
+ * more than one group. This is usually for balancing flags (load balancing
+ * involves equalizing a metric between groups), or for flags describing some
+ * shared resource (which would be shared between groups).
+ */
+#define SDF_NEEDS_GROUPS       0x4
 
 /*
  * Balance when about to become idle
  *
  * SHARED_CHILD: Set from the base domain up to cpuset.sched_relax_domain_level.
+ * NEEDS_GROUPS: Load balancing flag.
  */
-SD_FLAG(SD_BALANCE_NEWIDLE,     0, SDF_SHARED_CHILD)
+SD_FLAG(SD_BALANCE_NEWIDLE,     0, SDF_SHARED_CHILD | SDF_NEEDS_GROUPS)
 
 /*
  * Balance on exec
  *
  * SHARED_CHILD: Set from the base domain up to the NUMA reclaim level.
+ * NEEDS_GROUPS: Load balancing flag.
  */
-SD_FLAG(SD_BALANCE_EXEC,        1, SDF_SHARED_CHILD)
+SD_FLAG(SD_BALANCE_EXEC,        1, SDF_SHARED_CHILD | SDF_NEEDS_GROUPS)
 
 /*
  * Balance on fork, clone
  *
  * SHARED_CHILD: Set from the base domain up to the NUMA reclaim level.
+ * NEEDS_GROUPS: Load balancing flag.
  */
-SD_FLAG(SD_BALANCE_FORK,        2, SDF_SHARED_CHILD)
+SD_FLAG(SD_BALANCE_FORK,        2, SDF_SHARED_CHILD | SDF_NEEDS_GROUPS)
 
 /*
  * Balance on wakeup
  *
  * SHARED_CHILD: Set from the base domain up to cpuset.sched_relax_domain_level.
+ * NEEDS_GROUPS: Load balancing flag.
  */
-SD_FLAG(SD_BALANCE_WAKE,        3, SDF_SHARED_CHILD)
+SD_FLAG(SD_BALANCE_WAKE,        3, SDF_SHARED_CHILD | SDF_NEEDS_GROUPS)
 
 /*
  * Consider waking task on waking CPU.
@@ -71,63 +85,72 @@ SD_FLAG(SD_WAKE_AFFINE,         4, SDF_SHARED_CHILD)
  * Domain members have different CPU capacities
  *
  * SHARED_PARENT: Set from the topmost domain down to the first domain where
- * asymmetry is detected.
+ *                asymmetry is detected.
+ * NEEDS_GROUPS: Per-CPU capacity is asymmetric between groups.
  */
-SD_FLAG(SD_ASYM_CPUCAPACITY,    5, SDF_SHARED_PARENT)
+SD_FLAG(SD_ASYM_CPUCAPACITY,    5, SDF_SHARED_PARENT | SDF_NEEDS_GROUPS)
 
 /*
  * Domain members share CPU capacity (i.e. SMT)
  *
  * SHARED_CHILD: Set from the base domain up until spanned CPUs no longer share
- * CPU capacity.
+ *               CPU capacity.
+ * NEEDS_GROUPS: Capacity is shared between groups.
  */
-SD_FLAG(SD_SHARE_CPUCAPACITY,   6, SDF_SHARED_CHILD)
+SD_FLAG(SD_SHARE_CPUCAPACITY,   6, SDF_SHARED_CHILD | SDF_NEEDS_GROUPS)
 
 /*
  * Domain members share CPU package resources (i.e. caches)
  *
  * SHARED_CHILD: Set from the base domain up until spanned CPUs no longer share
- * the same cache(s).
+ *               the same cache(s).
+ * NEEDS_GROUPS: Caches are shared between groups.
  */
-SD_FLAG(SD_SHARE_PKG_RESOURCES, 7, SDF_SHARED_CHILD)
+SD_FLAG(SD_SHARE_PKG_RESOURCES, 7, SDF_SHARED_CHILD | SDF_NEEDS_GROUPS)
 
 /*
  * Only a single load balancing instance
  *
  * SHARED_PARENT: Set for all NUMA levels above NODE. Could be set from a
- * different level upwards, but it doesn't change that if a domain has this flag
- * set, then all of its parents need to have it too (otherwise the serialization
- * doesn't make sense).
+ *                different level upwards, but it doesn't change that if a
+ *                domain has this flag set, then all of its parents need to have
+ *                it too (otherwise the serialization doesn't make sense).
+ * NEEDS_GROUPS: No point in preserving domain if it has a single group.
  */
-SD_FLAG(SD_SERIALIZE,           8, SDF_SHARED_PARENT)
+SD_FLAG(SD_SERIALIZE,           8, SDF_SHARED_PARENT | SDF_NEEDS_GROUPS)
 
 /*
  * Place busy tasks earlier in the domain
  *
  * SHARED_CHILD: Usually set on the SMT level. Technically could be set further
- * up, but currently assumed to be set from the base domain upwards (see
- * update_top_cache_domain()).
+ *               up, but currently assumed to be set from the base domain
+ *               upwards (see update_top_cache_domain()).
+ * NEEDS_GROUPS: Load balancing flag.
  */
-SD_FLAG(SD_ASYM_PACKING,        9, SDF_SHARED_CHILD)
+SD_FLAG(SD_ASYM_PACKING,        9, SDF_SHARED_CHILD | SDF_NEEDS_GROUPS)
 
 /*
  * Prefer to place tasks in a sibling domain
  *
  * Set up until domains start spanning NUMA nodes. Close to being a SHARED_CHILD
  * flag, but cleared below domains with SD_ASYM_CPUCAPACITY.
+ *
+ * NEEDS_GROUPS: Load balancing flag.
  */
-SD_FLAG(SD_PREFER_SIBLING,      10, 0)
+SD_FLAG(SD_PREFER_SIBLING,      10, SDF_NEEDS_GROUPS)
 
 /*
  * sched_groups of this level overlap
  *
  * SHARED_PARENT: Set for all NUMA levels above NODE.
+ * NEEDS_GROUPS: Overlaps can only exist with more than one group.
  */
-SD_FLAG(SD_OVERLAP,             11, SDF_SHARED_PARENT)
+SD_FLAG(SD_OVERLAP,             11, SDF_SHARED_PARENT | SDF_NEEDS_GROUPS)
 
 /*
  * cross-node balancing
  *
  * SHARED_PARENT: Set for all NUMA levels above NODE.
+ * NEEDS_GROUPS: No point in preserving domain if it has a single group.
  */
-SD_FLAG(SD_NUMA,                12, SDF_SHARED_PARENT)
+SD_FLAG(SD_NUMA,                12, SDF_SHARED_PARENT | SDF_NEEDS_GROUPS)
diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index 99b16d4c03f2..74e6ec32a413 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -16,6 +16,13 @@
 #include <linux/sched/sd_flags.h>
 #undef SD_FLAG
 
+/* Generate a mask of SD flags with the SDF_NEEDS_GROUPS metaflag */
+#define SD_FLAG(name, idx, mflags) (BIT(idx) * !!((mflags) & SDF_NEEDS_GROUPS)) |
+static const unsigned int SD_DEGENERATE_GROUPS_MASK =
+#include <linux/sched/sd_flags.h>
+0;
+#undef SD_FLAG
+
 #ifdef CONFIG_SCHED_DEBUG
 #define SD_FLAG(_name, idx, mflags) [idx] = {.meta_flags = mflags, .name = #_name},
 static const struct {
-- 
2.27.0



* [PATCH v4 10/10] sched/topology: Use prebuilt SD flag degeneration mask
  2020-07-31 11:54 [PATCH v4 00/10] sched: Instrument sched domain flags Valentin Schneider
                   ` (8 preceding siblings ...)
  2020-07-31 11:55 ` [PATCH v4 09/10] sched/topology: Introduce SD metaflag for flags needing > 1 groups Valentin Schneider
@ 2020-07-31 11:55 ` Valentin Schneider
  2020-08-06 11:25 ` [PATCH v4 00/10] sched: Instrument sched domain flags Dietmar Eggemann
  10 siblings, 0 replies; 21+ messages in thread
From: Valentin Schneider @ 2020-07-31 11:55 UTC (permalink / raw)
  To: linux-kernel
  Cc: Peter Zijlstra, mingo, vincent.guittot, dietmar.eggemann,
	morten.rasmussen, Quentin Perret

Leverage SD_DEGENERATE_GROUPS_MASK in sd_degenerate() and
sd_parent_degenerate(). The negation of SD_DEGENERATE_GROUPS_MASK is used
as the mask of flags not requiring groups; since SD_WAKE_AFFINE is now the
only flag without SDF_NEEDS_GROUPS, that negation is equivalent to
SD_WAKE_AFFINE, IOW the same as what we have now.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
---
 kernel/sched/topology.c | 33 +++++++--------------------------
 1 file changed, 7 insertions(+), 26 deletions(-)

diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index c6ecc395c76c..0cc5ce5cd034 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -160,22 +160,12 @@ static int sd_degenerate(struct sched_domain *sd)
 		return 1;
 
 	/* Following flags need at least 2 groups */
-	if (sd->flags & (SD_BALANCE_NEWIDLE |
-			 SD_BALANCE_WAKE |
-			 SD_BALANCE_FORK |
-			 SD_BALANCE_EXEC |
-			 SD_ASYM_PACKING |
-			 SD_SHARE_CPUCAPACITY |
-			 SD_ASYM_CPUCAPACITY |
-			 SD_SHARE_PKG_RESOURCES |
-			 SD_OVERLAP |
-			 SD_PREFER_SIBLING)) {
-		if (sd->groups != sd->groups->next)
-			return 0;
-	}
+	if ((sd->flags & SD_DEGENERATE_GROUPS_MASK) &&
+	    (sd->groups != sd->groups->next))
+		return 0;
 
 	/* Following flags don't use groups */
-	if (sd->flags & (SD_WAKE_AFFINE))
+	if (sd->flags & ~SD_DEGENERATE_GROUPS_MASK)
 		return 0;
 
 	return 1;
@@ -193,18 +183,9 @@ sd_parent_degenerate(struct sched_domain *sd, struct sched_domain *parent)
 		return 0;
 
 	/* Flags needing groups don't count if only 1 group in parent */
-	if (parent->groups == parent->groups->next) {
-		pflags &= ~(SD_BALANCE_NEWIDLE |
-			    SD_BALANCE_WAKE |
-			    SD_BALANCE_FORK |
-			    SD_BALANCE_EXEC |
-			    SD_ASYM_PACKING |
-			    SD_ASYM_CPUCAPACITY |
-			    SD_SHARE_CPUCAPACITY |
-			    SD_SHARE_PKG_RESOURCES |
-			    SD_OVERLAP |
-			    SD_PREFER_SIBLING);
-	}
+	if (parent->groups == parent->groups->next)
+		pflags &= ~SD_DEGENERATE_GROUPS_MASK;
+
 	if (~cflags & pflags)
 		return 0;
 
-- 
2.27.0



* Re: [PATCH v4 05/10] sched/topology: Define and assign sched_domain flag metadata
  2020-07-31 11:54 ` [PATCH v4 05/10] sched/topology: Define and assign sched_domain flag metadata Valentin Schneider
@ 2020-08-04 11:08   ` peterz
  2020-08-04 11:12     ` Valentin Schneider
  2020-08-06 14:07   ` Ingo Molnar
  1 sibling, 1 reply; 21+ messages in thread
From: peterz @ 2020-08-04 11:08 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: linux-kernel, mingo, vincent.guittot, dietmar.eggemann,
	morten.rasmussen, Quentin Perret

On Fri, Jul 31, 2020 at 12:54:57PM +0100, Valentin Schneider wrote:
> +/*
> + * Domain members share CPU capacity (i.e. SMT)
> + *
> + * SHARED_CHILD: Set from the base domain up until spanned CPUs no longer share
> + * CPU capacity.
> + */
> +SD_FLAG(SD_SHARE_CPUCAPACITY,   6, SDF_SHARED_CHILD)
> +
> +/*
> + * Domain members share CPU package resources (i.e. caches)
> + *
> + * SHARED_CHILD: Set from the base domain up until spanned CPUs no longer share
> + * the same cache(s).
> + */
> +SD_FLAG(SD_SHARE_PKG_RESOURCES, 7, SDF_SHARED_CHILD)

We should probably rename these to SD_SHARE_CORE / SD_SHARE_CACHE or
something, but let me first go through this series (and hopefully apply)
before we go make more changes.


.. onwards!


* Re: [PATCH v4 05/10] sched/topology: Define and assign sched_domain flag metadata
  2020-08-04 11:08   ` peterz
@ 2020-08-04 11:12     ` Valentin Schneider
  0 siblings, 0 replies; 21+ messages in thread
From: Valentin Schneider @ 2020-08-04 11:12 UTC (permalink / raw)
  To: peterz
  Cc: linux-kernel, mingo, vincent.guittot, dietmar.eggemann,
	morten.rasmussen, Quentin Perret


On 04/08/20 12:08, peterz@infradead.org wrote:
> On Fri, Jul 31, 2020 at 12:54:57PM +0100, Valentin Schneider wrote:
>> +/*
>> + * Domain members share CPU capacity (i.e. SMT)
>> + *
>> + * SHARED_CHILD: Set from the base domain up until spanned CPUs no longer share
>> + * CPU capacity.
>> + */
>> +SD_FLAG(SD_SHARE_CPUCAPACITY,   6, SDF_SHARED_CHILD)
>> +
>> +/*
>> + * Domain members share CPU package resources (i.e. caches)
>> + *
>> + * SHARED_CHILD: Set from the base domain up until spanned CPUs no longer share
>> + * the same cache(s).
>> + */
>> +SD_FLAG(SD_SHARE_PKG_RESOURCES, 7, SDF_SHARED_CHILD)
>
> We should probably rename these to SD_SHARE_CORE / SD_SHARE_CACHE or
> something,

SD_SHARE_CACHE sounds good, given how it's used (LLC stuff).

> but let me first go through this series (and hopefully apply)
> before we go make more changes.
>
>
> .. onwards!

Thanks!


* Re: [PATCH v4 00/10] sched: Instrument sched domain flags
  2020-07-31 11:54 [PATCH v4 00/10] sched: Instrument sched domain flags Valentin Schneider
                   ` (9 preceding siblings ...)
  2020-07-31 11:55 ` [PATCH v4 10/10] sched/topology: Use prebuilt SD flag degeneration mask Valentin Schneider
@ 2020-08-06 11:25 ` Dietmar Eggemann
  10 siblings, 0 replies; 21+ messages in thread
From: Dietmar Eggemann @ 2020-08-06 11:25 UTC (permalink / raw)
  To: Valentin Schneider, linux-kernel
  Cc: mingo, peterz, vincent.guittot, morten.rasmussen, Quentin Perret

On 31/07/2020 13:54, Valentin Schneider wrote:
> Hi,
> 
> I've repeatedly stared at an SD flag and asked myself "how should that be
> set up in the domain hierarchy anyway?". I figured that if we formalize our
> flags zoology a bit, we could also do some runtime assertions on them -
> this is what this series is all about.
> 
> Patches
> =======
> 
> The idea is to associate the flags with metaflags that describe how they
> should be set in a sched domain hierarchy ("if this SD has it, all its {parents,
> children} have it") or how they behave wrt degeneration - details are in the
> comments and commit logs. 
> 
> The good thing is that the debugging bits go away when CONFIG_SCHED_DEBUG isn't
> set. The bad thing is that this replaces SD_* flags definitions with some
> unsavoury macros. This is mainly because I wanted to avoid having to duplicate
> work between declaring the flags and declaring their metaflags.
> 
> o Patches 1-3 are topology cleanups / fixes
> o Patches 4-6 instrument SD flags and add assertions
> o Patches 7-10 leverage the instrumentation to factorize domain degeneration
> 
> Revisions
> =========
> 
> v3 -> v4
> --------
> 
> o Reordered the series to have fixes / cleanups first
> 
> o Added SD_ASYM_CPUCAPACITY propagation (Quentin)
> o Made ARM revert back to the default sched topology (Dietmar)
> o Removed SD_SERIALIZE degeneration special case (Peter)
> 
> o Made SD_NUMA and SD_SERIALIZE have SDF_NEEDS_GROUPS
> 
>   As discussed on v3, I thought this wasn't required, but thinking some more
>   about it there can be cases where that changes the current behaviour. For
>   instance, in the following wacky triangle:
> 
>       0\ 30
>       | \
>   20  |  2
>       | /
>       1/ 30
> 
>   there are two unique distances, thus two NUMA topology levels; however, the
>   first one for node 2 would have the same span as its child domain and thus
>   should be degenerated. If we don't give SD_NUMA and SD_SERIALIZE
>   SDF_NEEDS_GROUPS, this domain wouldn't be degenerated since its child
>   *doesn't* have either SD_NUMA or SD_SERIALIZE (it's the first NUMA domain),
>   and we'd have this weird NUMA domain lingering with a single group.

LGTM.

Tested on Arm & Arm64 dual-cluster big.LITTLE (so only
default_topology[]) with CONFIG_SCHED_MC=y for the following cases:

(1) normal bring-up
(2) CPU hp all but one CPU of one cluster
(3) CPU hp entire cluster

Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com>


* Re: [PATCH v4 07/10] sched/topology: Add more flags to the SD degeneration mask
  2020-07-31 11:54 ` [PATCH v4 07/10] sched/topology: Add more flags to the SD degeneration mask Valentin Schneider
@ 2020-08-06 13:57   ` Ingo Molnar
  0 siblings, 0 replies; 21+ messages in thread
From: Ingo Molnar @ 2020-08-06 13:57 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: linux-kernel, peterz, vincent.guittot, dietmar.eggemann,
	morten.rasmussen, Quentin Perret


* Valentin Schneider <valentin.schneider@arm.com> wrote:

> I don't think it is going to change much in practice, but we were missing
> those:
> 
> o SD_BALANCE_WAKE: Used just like the other SD_BALANCE_* flags, so also
>   needs > 1 group.
> o SD_ASYM_PACKING: Hinges on load balancing (periodic / wakeup), thus needs
>   > 1 group to happen.
> o SD_OVERLAP: Describes domains with overlapping groups; can't have
>   overlaps with a single group.
> 
> SD_PREFER_SIBLING is as always the odd one out: we currently consider it
> in sd_parent_degenerate() but not in sd_degenerate(). It too hinges on load
> balancing, and thus won't have any effect when set on a domain with a
> single group. Add it to the sd_degenerate() groups mask.

Would be nice to add these one by one, just in case there's a 
performance or boot regression with any of them.

Thanks,

	Ingo


* Re: [PATCH v4 05/10] sched/topology: Define and assign sched_domain flag metadata
  2020-07-31 11:54 ` [PATCH v4 05/10] sched/topology: Define and assign sched_domain flag metadata Valentin Schneider
  2020-08-04 11:08   ` peterz
@ 2020-08-06 14:07   ` Ingo Molnar
  2020-08-06 16:18     ` Valentin Schneider
  1 sibling, 1 reply; 21+ messages in thread
From: Ingo Molnar @ 2020-08-06 14:07 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: linux-kernel, peterz, vincent.guittot, dietmar.eggemann,
	morten.rasmussen, Quentin Perret


* Valentin Schneider <valentin.schneider@arm.com> wrote:

> +#ifndef SD_FLAG
> +#define SD_FLAG(x, y, z)
> +#endif

AFAICS there's not a single use of sd_flags.h that doesn't come with 
its own SD_FLAG definition, so I suppose this should be:

#ifndef SD_FLAG
# error "Should not happen."
#endif

?

Also, some nits:

> +/*
> + * Expected flag uses
> + *
> + * SHARED_CHILD: These flags are meant to be set from the base domain upwards.
> + * If a domain has this flag set, all of its children should have it set. This
> + * is usually because the flag describes some shared resource (all CPUs in that
> + * domain share the same foobar), or because they are tied to a scheduling
> + * behaviour that we want to disable at some point in the hierarchy for
> + * scalability reasons.

s/foobar/resource

?

> +/*
> + * cross-node balancing
> + *
> + * SHARED_PARENT: Set for all NUMA levels above NODE.
> + */
> +SD_FLAG(SD_NUMA,                12, SDF_SHARED_PARENT)

s/cross-node/Cross-node

BTW., is there any particular reason why these need to be defines with 
a manual enumeration of flag values - couldn't we generate 
auto-enumerated C enums instead or so?

> +#ifdef CONFIG_SCHED_DEBUG
> +#define SD_FLAG(_name, idx, mflags) [idx] = {.meta_flags = mflags, .name = #_name},

s/{./{ .
s/e}/e }

Thanks,

	Ingo


* Re: [PATCH v4 03/10] sched/topology: Propagate SD_ASYM_CPUCAPACITY upwards
  2020-07-31 11:54 ` [PATCH v4 03/10] sched/topology: Propagate SD_ASYM_CPUCAPACITY upwards Valentin Schneider
@ 2020-08-06 14:20   ` Ingo Molnar
  2020-08-06 16:19     ` Valentin Schneider
  0 siblings, 1 reply; 21+ messages in thread
From: Ingo Molnar @ 2020-08-06 14:20 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: linux-kernel, Quentin Perret, peterz, vincent.guittot,
	dietmar.eggemann, morten.rasmussen


* Valentin Schneider <valentin.schneider@arm.com> wrote:

> We currently set this flag *only* on domains whose topology level exactly
> matches the level where we detect asymmetry (as returned by
> asym_cpu_capacity_level()). This is rather problematic.
> 
> Say there are two clusters in the system, one with a lone big CPU and the
> other with a mix of big and LITTLE CPUs (as is allowed by DynamIQ):
> 
> DIE [                ]
> MC  [             ][ ]
>      0   1   2   3  4
>      L   L   B   B  B
> 
> asym_cpu_capacity_level() will figure out that the MC level is the one
> where all CPUs can see a CPU of max capacity, and we will thus set
> SD_ASYM_CPUCAPACITY at MC level for all CPUs.
> 
> That lone big CPU will degenerate its MC domain, since it would be alone in
> there, and will end up with just a DIE domain. Since the flag was only set
> at MC, this CPU ends up not seeing any SD with the flag set, which is
> broken.
> 
> Rather than clearing dflags at every topology level, clear it before
> entering the topology level loop. This will properly propagate upwards
> flags that are set starting from a certain level.
> 
> Reviewed-by: Quentin Perret <qperret@google.com>
> Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
> ---
>  kernel/sched/topology.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index 865fff3ef20a..42b89668e1e4 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -1985,11 +1985,10 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *att
>  	/* Set up domains for CPUs specified by the cpu_map: */
>  	for_each_cpu(i, cpu_map) {
>  		struct sched_domain_topology_level *tl;
> +		int dflags = 0;
>  
>  		sd = NULL;
>  		for_each_sd_topology(tl) {
> -			int dflags = 0;
> -
>  			if (tl == tl_asym) {
>  				dflags |= SD_ASYM_CPUCAPACITY;
>  				has_asym = true;

I'd suggest ordering all patches with potential side effects at the 
end, to make them easier to bisect.

I.e. I'd reorder this series to do:

 - Obviously correct renamings & cleanups

 - Convert the code over to the new instrumented sd-flags method. This 
   will presumably spew a few warnings for problems the new debugging 
   checks catch in existing topologies.

 - Do all the behavioral changes and fixes like this patch, even if we 
   think that they have no serious side effects.

In that sense it might make sense to move the two ARM patches to the 
later stage as well - but I suppose it's OK to do those two first.

Nice series otherwise; these new checks look really useful and have 
already caught bugs.

Thanks,

	Ingo

* Re: [PATCH v4 05/10] sched/topology: Define and assign sched_domain flag metadata
  2020-08-06 14:07   ` Ingo Molnar
@ 2020-08-06 16:18     ` Valentin Schneider
  2020-08-08  0:19       ` Valentin Schneider
  0 siblings, 1 reply; 21+ messages in thread
From: Valentin Schneider @ 2020-08-06 16:18 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, peterz, vincent.guittot, dietmar.eggemann,
	morten.rasmussen, Quentin Perret


On 06/08/20 15:07, Ingo Molnar wrote:
> * Valentin Schneider <valentin.schneider@arm.com> wrote:
>
>> +#ifndef SD_FLAG
>> +#define SD_FLAG(x, y, z)
>> +#endif
>
> AFAICS there's not a single use of sd_flags.h that doesn't come with
> its own SD_FLAG definition, so I suppose this should be:
>
> #ifndef SD_FLAG
> # error "Should not happen."
> #endif
>
> ?
>

I can give this a try; for context, I copied uapi/asm-generic/unistd.h
without thinking too much, and that does:

  #ifndef __SYSCALL
  #define __SYSCALL(x, y)
  #endif

> Also, some nits:
>
>> +/*
>> + * Expected flag uses
>> + *
>> + * SHARED_CHILD: These flags are meant to be set from the base domain upwards.
>> + * If a domain has this flag set, all of its children should have it set. This
>> + * is usually because the flag describes some shared resource (all CPUs in that
>> + * domain share the same foobar), or because they are tied to a scheduling
>> + * behaviour that we want to disable at some point in the hierarchy for
>> + * scalability reasons.
>
> s/foobar/resource
>
> ?
>

That's better indeed; I think that's a remnant of when I was listing a lot
of things that could be shared.

>> +/*
>> + * cross-node balancing
>> + *
>> + * SHARED_PARENT: Set for all NUMA levels above NODE.
>> + */
>> +SD_FLAG(SD_NUMA,                12, SDF_SHARED_PARENT)
>
> s/cross-node/Cross-node
>
> BTW., is there any particular reason why these need to be defines with
> a manual enumeration of flag values - couldn't we generate
> auto-enumerated C enums instead or so?
>

I remember exploring a few different options there, but it's been a while
already. I think one of the reasons I kept some form of explicit assignment
is to avoid making reading

  /proc/sys/kernel/sched_domain/cpu*/domain*/flags

more convoluted than it is now (i.e. you have to go fetch the
bit -> name translation in the source code).
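
To illustrate (flags value made up here), that file currently reads as
a bare decimal:

  $ cat /proc/sys/kernel/sched_domain/cpu0/domain0/flags
  4143

so a scheme that shuffles the bit values around would make decoding
that number even more of a treasure hunt.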

In the grand scheme of things I'd actually like to have this file output
the names of the flags rather than their values (since I now save them when
SCHED_DEBUG=y), but I didn't find a simple way to hack the existing SD ctl
table (sd_alloc_ctl_domain_table() and co) into doing this.


Now as to making this fully automagic, I *think* I could do something like
having a first enum to set up an ordering:

  #define SD_FLAG(name, ...) __##name,
  enum {
    #include <linux/sched/sd_flags.h>
  };

A second one to have powers of 2:

  #define SD_FLAG(name, ...) name = 1 << __##name,
  enum {
    #include <linux/sched/sd_flags.h>
  };
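
Spelling the two passes out with a hypothetical two-entry sd_flags.h
(this assumes the header carries no include guard, and that SD_FLAG is
#undef'd and redefined between the two includes):

  /* sd_flags.h - hypothetical excerpt: */
  SD_FLAG(SD_BALANCE_NEWIDLE, SDF_SHARED_CHILD)
  SD_FLAG(SD_NUMA,            SDF_SHARED_PARENT)

  /* First pass expands to plain ordinals: */
  enum { __SD_BALANCE_NEWIDLE, __SD_NUMA };       /* 0, 1 */

  /* Second pass expands to the actual flag values: */
  enum {
          SD_BALANCE_NEWIDLE = 1 << __SD_BALANCE_NEWIDLE,  /* 0x1 */
          SD_NUMA            = 1 << __SD_NUMA,             /* 0x2 */
  };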

And finally the metadata array assignment might be doable with:

  #define SD_FLAG(_name, mflags) [__##_name] = { .meta_flags = mflags, .name = #_name },

Or, if there is a way somehow to directly get powers of 2 out of an enum:

  #define SD_FLAG(_name, mflags) [_ffs(_name)] = { .meta_flags = mflags, .name = #_name },

>> +#ifdef CONFIG_SCHED_DEBUG
>> +#define SD_FLAG(_name, idx, mflags) [idx] = {.meta_flags = mflags, .name = #_name},
>
> s/{./{ .
> s/e}/e }
>
> Thanks,
>
>       Ingo

* Re: [PATCH v4 03/10] sched/topology: Propagate SD_ASYM_CPUCAPACITY upwards
  2020-08-06 14:20   ` Ingo Molnar
@ 2020-08-06 16:19     ` Valentin Schneider
  2020-08-06 16:46       ` Ingo Molnar
  0 siblings, 1 reply; 21+ messages in thread
From: Valentin Schneider @ 2020-08-06 16:19 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Quentin Perret, peterz, vincent.guittot,
	dietmar.eggemann, morten.rasmussen


On 06/08/20 15:20, Ingo Molnar wrote:
> * Valentin Schneider <valentin.schneider@arm.com> wrote:
>
>> We currently set this flag *only* on domains whose topology level exactly
>> matches the level where we detect asymmetry (as returned by
>> asym_cpu_capacity_level()). This is rather problematic.
>>
>> Say there are two clusters in the system, one with a lone big CPU and the
>> other with a mix of big and LITTLE CPUs (as is allowed by DynamIQ):
>>
>> DIE [                ]
>> MC  [             ][ ]
>>      0   1   2   3  4
>>      L   L   B   B  B
>>
>> asym_cpu_capacity_level() will figure out that the MC level is the one
>> where all CPUs can see a CPU of max capacity, and we will thus set
>> SD_ASYM_CPUCAPACITY at MC level for all CPUs.
>>
>> That lone big CPU will degenerate its MC domain, since it would be alone in
>> there, and will end up with just a DIE domain. Since the flag was only set
>> at MC, this CPU ends up not seeing any SD with the flag set, which is
>> broken.
>>
>> Rather than clearing dflags at every topology level, clear it before
>> entering the topology level loop. This will properly propagate upwards
>> flags that are set starting from a certain level.
>>
>> Reviewed-by: Quentin Perret <qperret@google.com>
>> Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
>> ---
>>  kernel/sched/topology.c | 3 +--
>>  1 file changed, 1 insertion(+), 2 deletions(-)
>>
>> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
>> index 865fff3ef20a..42b89668e1e4 100644
>> --- a/kernel/sched/topology.c
>> +++ b/kernel/sched/topology.c
>> @@ -1985,11 +1985,10 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *att
>>      /* Set up domains for CPUs specified by the cpu_map: */
>>      for_each_cpu(i, cpu_map) {
>>              struct sched_domain_topology_level *tl;
>> +		int dflags = 0;
>>
>>              sd = NULL;
>>              for_each_sd_topology(tl) {
>> -			int dflags = 0;
>> -
>>                      if (tl == tl_asym) {
>>                              dflags |= SD_ASYM_CPUCAPACITY;
>>                              has_asym = true;
>
> I'd suggest ordering all patches with potential side effects at the
> end, to make them easier to bisect.
>
> I.e. I'd reorder this series to do:
>
>  - Obviously correct renamings & cleanups
>
>  - Convert the code over to the new instrumented sd-flags method. This
>    will presumably spew a few warnings for problems the new debugging
>    checks catch in existing topologies.
>
>  - Do all the behavioral changes and fixes like this patch, even if we
>    think that they have no serious side effects.
>
> In that sense it might make sense to move the two ARM patches to the
> later stage as well - but I suppose it's OK to do those two first.
>

This does sound sensible; I can shuffle this around for v5.

FWIW the reason I had this very patch before the instrumentation is that
IMO it really wants to be propagated and could thus directly be tagged with
SDF_SHARED_PARENT when the instrumentation hits. It's a minor thing, but
having it after the instrumentation means that I'll first have to tag it
without any hierarchical metaflag, and then tag it with SDF_SHARED_PARENT
in the propagation fix.

If that sounds fine to you, I'll do just that.

> Nice series otherwise; these new checks look really useful and have
> already caught bugs.
>

Thanks!

> Thanks,
>
>       Ingo

* Re: [PATCH v4 03/10] sched/topology: Propagate SD_ASYM_CPUCAPACITY upwards
  2020-08-06 16:19     ` Valentin Schneider
@ 2020-08-06 16:46       ` Ingo Molnar
  0 siblings, 0 replies; 21+ messages in thread
From: Ingo Molnar @ 2020-08-06 16:46 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: linux-kernel, Quentin Perret, peterz, vincent.guittot,
	dietmar.eggemann, morten.rasmussen


* Valentin Schneider <valentin.schneider@arm.com> wrote:

> This does sound sensible; I can shuffle this around for v5.

Thanks!

> FWIW the reason I had this very patch before the instrumentation is that
> IMO it really wants to be propagated and could thus directly be tagged with
> SDF_SHARED_PARENT when the instrumentation hits. It's a minor thing, but
> having it after the instrumentation means that I'll first have to tag it
> without any hierarchical metaflag, and then tag it with SDF_SHARED_PARENT
> in the propagation fix.
> 
> If that sounds fine to you, I'll do just that.

Sounds good to me!

	Ingo

* Re: [PATCH v4 05/10] sched/topology: Define and assign sched_domain flag metadata
  2020-08-06 16:18     ` Valentin Schneider
@ 2020-08-08  0:19       ` Valentin Schneider
  0 siblings, 0 replies; 21+ messages in thread
From: Valentin Schneider @ 2020-08-08  0:19 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, peterz, vincent.guittot, dietmar.eggemann,
	morten.rasmussen, Quentin Perret


On 06/08/20 17:18, Valentin Schneider wrote:
> In the grand scheme of things I'd actually like to have this file output
> the names of the flags rather than their values (since I now save them when
> SCHED_DEBUG=y), but I didn't find a simple way to hack the existing SD ctl
> table (sd_alloc_ctl_domain_table() and co) into doing this.
>

I "just" had to spend some more time grokking how the whole ctl
proc_handler thing is supposed to work; I now have a mostly working
solution, e.g. on my Juno:

  $ cat /proc/sys/kernel/sched_domain/cpu0/domain*/flags
  SD_BALANCE_NEWIDLE SD_BALANCE_EXEC SD_BALANCE_FORK SD_WAKE_AFFINE SD_SHARE_PKG_RESOURCES
  SD_BALANCE_NEWIDLE SD_BALANCE_EXEC SD_BALANCE_FORK SD_WAKE_AFFINE SD_ASYM_CPUCAPACITY SD_PREFER_SIBLING

I'll clean that up and go for the automagic ordering I previously
described.
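
The gist of it is something like the below - a sketch only, where
sd_flag_debug[] and __SD_FLAG_CNT stand in for whatever the metadata
array and the flag count end up being called in the next version:

  /*
   * Sketch: render a sched_domain flags word as a list of flag names.
   * sd_flag_debug[] / __SD_FLAG_CNT are assumed names for the
   * SCHED_DEBUG metadata array and the number of SD flags.
   */
  static size_t sd_flags_to_names(unsigned long flags, char *buf, size_t len)
  {
          size_t off = 0;
          int idx;

          for_each_set_bit(idx, &flags, __SD_FLAG_CNT)
                  off += scnprintf(buf + off, len - off, "%s ",
                                   sd_flag_debug[idx].name);

          return off;
  }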

>
> Now as to making this fully automagic, I *think* I could do something like
> having a first enum to set up an ordering:
>
>   #define SD_FLAG(name, ...) __##name,
>   enum {
>     #include <linux/sched/sd_flags.h>
>   };
>
> A second one to have powers of 2:
>
>   #define SD_FLAG(name, ...) name = 1 << __##name,
>   enum {
>     #include <linux/sched/sd_flags.h>
>   };
>
> And finally the metadata array assignment might be doable with:
>
>   #define SD_FLAG(_name, mflags) [__##_name] = { .meta_flags = mflags, .name = #_name },
>
> Or, if there is a way somehow to directly get powers of 2 out of an enum:
>
>   #define SD_FLAG(_name, mflags) [_ffs(_name)] = { .meta_flags = mflags, .name = #_name },
>
>>> +#ifdef CONFIG_SCHED_DEBUG
>>> +#define SD_FLAG(_name, idx, mflags) [idx] = {.meta_flags = mflags, .name = #_name},
>>
>> s/{./{ .
>> s/e}/e }
>>
>> Thanks,
>>
>>       Ingo

Thread overview: 21+ messages
2020-07-31 11:54 [PATCH v4 00/10] sched: Instrument sched domain flags Valentin Schneider
2020-07-31 11:54 ` [PATCH v4 01/10] ARM, sched/topology: Remove SD_SHARE_POWERDOMAIN Valentin Schneider
2020-07-31 11:54 ` [PATCH v4 02/10] ARM: Revert back to default scheduler topology Valentin Schneider
2020-07-31 11:54 ` [PATCH v4 03/10] sched/topology: Propagate SD_ASYM_CPUCAPACITY upwards Valentin Schneider
2020-08-06 14:20   ` Ingo Molnar
2020-08-06 16:19     ` Valentin Schneider
2020-08-06 16:46       ` Ingo Molnar
2020-07-31 11:54 ` [PATCH v4 04/10] sched/topology: Split out SD_* flags declaration to its own file Valentin Schneider
2020-07-31 11:54 ` [PATCH v4 05/10] sched/topology: Define and assign sched_domain flag metadata Valentin Schneider
2020-08-04 11:08   ` peterz
2020-08-04 11:12     ` Valentin Schneider
2020-08-06 14:07   ` Ingo Molnar
2020-08-06 16:18     ` Valentin Schneider
2020-08-08  0:19       ` Valentin Schneider
2020-07-31 11:54 ` [PATCH v4 06/10] sched/topology: Verify SD_* flags setup when sched_debug is on Valentin Schneider
2020-07-31 11:54 ` [PATCH v4 07/10] sched/topology: Add more flags to the SD degeneration mask Valentin Schneider
2020-08-06 13:57   ` Ingo Molnar
2020-07-31 11:55 ` [PATCH v4 08/10] sched/topology: Remove SD_SERIALIZE degeneration special case Valentin Schneider
2020-07-31 11:55 ` [PATCH v4 09/10] sched/topology: Introduce SD metaflag for flags needing > 1 groups Valentin Schneider
2020-07-31 11:55 ` [PATCH v4 10/10] sched/topology: Use prebuilt SD flag degeneration mask Valentin Schneider
2020-08-06 11:25 ` [PATCH v4 00/10] sched: Instrument sched domain flags Dietmar Eggemann
