* [RFC PATCH 0/8] change scheduler domain hierarchy set-up
@ 2013-12-13 12:11 dietmar.eggemann
  2013-12-13 12:11 ` [RFC PATCH 1/8] sched: arch interface for scheduler domain setup dietmar.eggemann
                   ` (8 more replies)
  0 siblings, 9 replies; 17+ messages in thread
From: dietmar.eggemann @ 2013-12-13 12:11 UTC (permalink / raw)
  To: peterz, mingo, vincent.guittot, morten.rasmussen, chris.redpath
  Cc: linux-kernel, dietmar.eggemann

From: Dietmar Eggemann <dietmar.eggemann@arm.com>

This patch-set cleans up the scheduler domain level initialization code.
It is based on Peter Zijlstra's idea of using a single scheduler domain
init function, sketched here: https://lkml.org/lkml/2013/11/5/239

What the patch-set tries to achieve:

1) Let the arch define the conventional scheduler domain hierarchy
   (defined here as all levels except the NUMA levels).  For each
   scheduler domain level, the arch specifies a pointer to the getter
   function of the corresponding cpu mask as well as the topology-related
   scheduler domain flags.

2) Unify the set-up code for conventional and NUMA scheduler domains.
   All scheduler domain topology levels are now allocated in the same
   function and the scheduler does not rely on a default scheduler
   domain topology array any more.  All scheduler domains now use a
   common initialization function which makes the existing SD_FOO_INIT
   macros redundant.

3) The arch is no longer limited to the existing scheduler domain levels
   (SMT, MC, BOOK, CPU) but can easily define additional levels.

4) Prepare the mechanics for providing additional topology-related data
   (e.g. energy information) to the scheduler.

Current limitations:

1) The arch interface for scheduler domain set-up is only implemented for
   the ARM and x86 archs.  It has been tested on an ARM TC2 (2 clusters,
   one with 2 Cortex A15 and the other with 3 Cortex A7) and on an Intel
   i5-520M (2 cores with 2 threads each) platform.

2) For other archs it has only been compile-tested for certain
   configurations (powerpc: chroma_defconfig, mips: ip27_defconfig,
   s390: defconfig, tile: tilegx_defconfig).  Linking these kernels
   fails because the arch interface for scheduler domain set-up is not
   implemented there (undefined reference to arch_sched_domain_info).

3) It does not delete the arch-specific SD_FOO_INIT macros for the ia64,
   metag, s390 and tile archs.

4) It does not delete the arch_sd_sibling_asym_packing function, which
   will be redundant once the arch interface for scheduler domain set-up
   has been implemented for the powerpc arch.

5) There is no default set-up any more.  Each arch has to define an
   arch_sched_domain_info array, a requirement which might not be
   desirable.

6) It still has to be specified what happens when an arch provides an
   arch_sched_domain_info array with only a { NULL, } entry.

The patch-set is against v3.13-rc3.

I am restricting the discussion to the scheduler community for now and
will cc the arch maintainers later if some level of agreement over these
patches can be reached.

Dietmar Eggemann (8):
  sched: arch interface for scheduler domain setup
  arm: implement arch interface for scheduler domain setup
  x86: implement arch interface for scheduler domain setup
  sched: allocate the entire topology array dynamically
  sched: introduce common topology level init function
  sched: replace for_each_sd_topology with explicit for loop
  sched: replace topology level init func ptr with sd_init
  sched: remove scheduler domain naming

 arch/arm/kernel/topology.c |    8 +
 arch/x86/kernel/topology.c |   12 ++
 include/linux/sched.h      |    3 -
 include/linux/topology.h   |  156 +++++-------------
 kernel/sched/core.c        |  380 +++++++++++++++++++++-----------------------
 kernel/sched/sched.h       |   19 +++
 6 files changed, 258 insertions(+), 320 deletions(-)

-- 
1.7.9.5




* [RFC PATCH 1/8] sched: arch interface for scheduler domain setup
  2013-12-13 12:11 [RFC PATCH 0/8] change scheduler domain hierarchy set-up dietmar.eggemann
@ 2013-12-13 12:11 ` dietmar.eggemann
  2013-12-13 12:11 ` [RFC PATCH 2/8] arm: implement " dietmar.eggemann
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 17+ messages in thread
From: dietmar.eggemann @ 2013-12-13 12:11 UTC (permalink / raw)
  To: peterz, mingo, vincent.guittot, morten.rasmussen, chris.redpath
  Cc: linux-kernel, dietmar.eggemann

From: Dietmar Eggemann <dietmar.eggemann@arm.com>

This patch defines an arch interface to provide the number of scheduler
domain levels, the pointer to the function returning the cpu mask and the
topology flags for each scheduler domain level.
The cpu mask getter functions for the SMT and CPU levels as well as the
function pointer type sched_domain_mask_f are moved from the scheduler
code into the topology header file.
The arch has to provide the arch_sched_domain_info array with an entry for
each scheduler domain level specifying the pointer to the cpu mask getter
function and the topology flags.  The array is terminated by an entry
whose mask pointer is NULL.
This patch covers all scheduler domain levels except NUMA levels.
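
The table layout described above can be tried out in user space.  The
following is only a minimal sketch, not kernel code: struct cpumask is a
stand-in, the EX_SD_* values are made-up flag bits, and every ex_* name
is hypothetical.  It shows how a NULL-terminated table is counted and
that a table holding only the { NULL, } terminator yields zero levels.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's struct cpumask; illustration only. */
struct cpumask { unsigned long bits; };

typedef const struct cpumask *(*sched_domain_mask_f)(int cpu);

typedef struct {
	sched_domain_mask_f mask;
	unsigned int flags;
} arch_sched_domain_info_t;

/* Made-up flag bits; the real SD_* values are defined by the kernel. */
#define EX_SD_SHARE_PKG_RESOURCES 0x1u
#define EX_SD_PREFER_SIBLING      0x2u

static const struct cpumask ex_all_cpus = { 0xfUL };

/* Hypothetical mask getters that ignore the cpu argument. */
static const struct cpumask *ex_coregroup_mask(int cpu)
{
	(void)cpu;
	return &ex_all_cpus;
}

static const struct cpumask *ex_cpu_mask(int cpu)
{
	(void)cpu;
	return &ex_all_cpus;
}

/* Bottom-up table, terminated by an entry with a NULL mask pointer. */
static const arch_sched_domain_info_t ex_info[] = {
	{ ex_coregroup_mask, EX_SD_SHARE_PKG_RESOURCES },
	{ ex_cpu_mask, EX_SD_PREFER_SIBLING },
	{ NULL, 0 },
};

/* A table that contains nothing but the terminator. */
static const arch_sched_domain_info_t ex_empty[] = {
	{ NULL, 0 },
};

/* Count entries up to, but not including, the terminator. */
static unsigned int ex_sd_levels(const arch_sched_domain_info_t *info)
{
	unsigned int i;

	for (i = 0; info[i].mask; i++)
		;

	return i;
}

/* Bounds-checked accessors, unlike the unchecked ones in the patch. */
static sched_domain_mask_f
ex_sd_mask(const arch_sched_domain_info_t *info, unsigned int i)
{
	assert(i < ex_sd_levels(info));
	return info[i].mask;
}

static unsigned int
ex_sd_flags(const arch_sched_domain_info_t *info, unsigned int i)
{
	assert(i < ex_sd_levels(info));
	return info[i].flags;
}
```

With this layout the level count is derived from the table itself, so an
arch adds a level by adding one entry before the terminator.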

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
---
 include/linux/topology.h |   41 +++++++++++++++++++++++++++++++++++++++++
 kernel/sched/core.c      |   13 -------------
 2 files changed, 41 insertions(+), 13 deletions(-)

diff --git a/include/linux/topology.h b/include/linux/topology.h
index 12ae6ce997d6..d147f63c2b1f 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -295,4 +295,45 @@ static inline int cpu_to_mem(int cpu)
 #define topology_core_cpumask(cpu)		cpumask_of(cpu)
 #endif
 
+#ifdef CONFIG_SCHED_SMT
+static inline const struct cpumask *cpu_smt_mask(int cpu)
+{
+	return topology_thread_cpumask(cpu);
+}
+#endif
+
+static inline const struct cpumask *cpu_cpu_mask(int cpu)
+{
+	return cpumask_of_node(cpu_to_node(cpu));
+}
+
+typedef const struct cpumask *(*sched_domain_mask_f)(int cpu);
+
+typedef struct {
+	sched_domain_mask_f mask;
+	unsigned int flags;
+} arch_sched_domain_info_t;
+
+extern arch_sched_domain_info_t arch_sched_domain_info[];
+
+static inline unsigned int arch_sd_levels(void)
+{
+	unsigned int i;
+
+	for (i = 0; arch_sched_domain_info[i].mask; i++)
+		;
+
+	return i;
+}
+
+static inline sched_domain_mask_f arch_sd_mask(unsigned int i)
+{
+	return arch_sched_domain_info[i].mask;
+}
+
+static inline unsigned int arch_sd_flags(unsigned int i)
+{
+	return arch_sched_domain_info[i].flags;
+}
+
 #endif /* _LINUX_TOPOLOGY_H */
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index e85cda20ab2b..fe21f7efb2ee 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4987,11 +4987,6 @@ static int __init isolated_cpu_setup(char *str)
 
 __setup("isolcpus=", isolated_cpu_setup);
 
-static const struct cpumask *cpu_cpu_mask(int cpu)
-{
-	return cpumask_of_node(cpu_to_node(cpu));
-}
-
 struct sd_data {
 	struct sched_domain **__percpu sd;
 	struct sched_group **__percpu sg;
@@ -5013,7 +5008,6 @@ enum s_alloc {
 struct sched_domain_topology_level;
 
 typedef struct sched_domain *(*sched_domain_init_f)(struct sched_domain_topology_level *tl, int cpu);
-typedef const struct cpumask *(*sched_domain_mask_f)(int cpu);
 
 #define SDTL_OVERLAP	0x01
 
@@ -5367,13 +5361,6 @@ static void claim_allocations(int cpu, struct sched_domain *sd)
 		*per_cpu_ptr(sdd->sgp, cpu) = NULL;
 }
 
-#ifdef CONFIG_SCHED_SMT
-static const struct cpumask *cpu_smt_mask(int cpu)
-{
-	return topology_thread_cpumask(cpu);
-}
-#endif
-
 /*
  * Topology list, bottom-up.
  */
-- 
1.7.9.5




* [RFC PATCH 2/8] arm: implement arch interface for scheduler domain setup
  2013-12-13 12:11 [RFC PATCH 0/8] change scheduler domain hierarchy set-up dietmar.eggemann
  2013-12-13 12:11 ` [RFC PATCH 1/8] sched: arch interface for scheduler domain setup dietmar.eggemann
@ 2013-12-13 12:11 ` dietmar.eggemann
  2013-12-13 12:11 ` [RFC PATCH 3/8] x86: " dietmar.eggemann
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 17+ messages in thread
From: dietmar.eggemann @ 2013-12-13 12:11 UTC (permalink / raw)
  To: peterz, mingo, vincent.guittot, morten.rasmussen, chris.redpath
  Cc: linux-kernel, dietmar.eggemann

From: Dietmar Eggemann <dietmar.eggemann@arm.com>

This patch provides the arch_sched_domain_info array for the ARM arch.

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
---
 arch/arm/kernel/topology.c |    8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/arm/kernel/topology.c b/arch/arm/kernel/topology.c
index 85a87370f144..0be2685743ff 100644
--- a/arch/arm/kernel/topology.c
+++ b/arch/arm/kernel/topology.c
@@ -25,6 +25,14 @@
 #include <asm/cputype.h>
 #include <asm/topology.h>
 
+arch_sched_domain_info_t arch_sched_domain_info[] = {
+#ifdef CONFIG_SCHED_MC
+		{ cpu_coregroup_mask, SD_SHARE_PKG_RESOURCES },
+#endif
+		{ cpu_cpu_mask, SD_PREFER_SIBLING },
+		{ NULL, },
+};
+
 /*
  * cpu power scale management
  */
-- 
1.7.9.5




* [RFC PATCH 3/8] x86: implement arch interface for scheduler domain setup
  2013-12-13 12:11 [RFC PATCH 0/8] change scheduler domain hierarchy set-up dietmar.eggemann
  2013-12-13 12:11 ` [RFC PATCH 1/8] sched: arch interface for scheduler domain setup dietmar.eggemann
  2013-12-13 12:11 ` [RFC PATCH 2/8] arm: implement " dietmar.eggemann
@ 2013-12-13 12:11 ` dietmar.eggemann
  2013-12-13 12:11 ` [RFC PATCH 4/8] sched: allocate the entire topology array dynamically dietmar.eggemann
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 17+ messages in thread
From: dietmar.eggemann @ 2013-12-13 12:11 UTC (permalink / raw)
  To: peterz, mingo, vincent.guittot, morten.rasmussen, chris.redpath
  Cc: linux-kernel, dietmar.eggemann

From: Dietmar Eggemann <dietmar.eggemann@arm.com>

This patch provides the arch_sched_domain_info array for the x86 arch.

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
---
 arch/x86/kernel/topology.c |   12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/arch/x86/kernel/topology.c b/arch/x86/kernel/topology.c
index 649b010da00b..b9ddd4b50265 100644
--- a/arch/x86/kernel/topology.c
+++ b/arch/x86/kernel/topology.c
@@ -30,11 +30,23 @@
 #include <linux/mmzone.h>
 #include <linux/init.h>
 #include <linux/smp.h>
+#include <linux/sched.h>
 #include <linux/irq.h>
 #include <asm/cpu.h>
 
 static DEFINE_PER_CPU(struct x86_cpu, cpu_devices);
 
+arch_sched_domain_info_t arch_sched_domain_info[] = {
+#ifdef CONFIG_SCHED_SMT
+		{ cpu_smt_mask, SD_SHARE_CPUPOWER | SD_SHARE_PKG_RESOURCES },
+#endif
+#ifdef CONFIG_SCHED_MC
+		{ cpu_coregroup_mask, SD_SHARE_PKG_RESOURCES },
+#endif
+		{ cpu_cpu_mask, SD_PREFER_SIBLING },
+		{ NULL, },
+};
+
 #ifdef CONFIG_HOTPLUG_CPU
 
 #ifdef CONFIG_BOOTPARAM_HOTPLUG_CPU0
-- 
1.7.9.5




* [RFC PATCH 4/8] sched: allocate the entire topology array dynamically
  2013-12-13 12:11 [RFC PATCH 0/8] change scheduler domain hierarchy set-up dietmar.eggemann
                   ` (2 preceding siblings ...)
  2013-12-13 12:11 ` [RFC PATCH 3/8] x86: " dietmar.eggemann
@ 2013-12-13 12:11 ` dietmar.eggemann
  2013-12-13 12:11 ` [RFC PATCH 5/8] sched: introduce common topology level init function dietmar.eggemann
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 17+ messages in thread
From: dietmar.eggemann @ 2013-12-13 12:11 UTC (permalink / raw)
  To: peterz, mingo, vincent.guittot, morten.rasmussen, chris.redpath
  Cc: linux-kernel, dietmar.eggemann

From: Dietmar Eggemann <dietmar.eggemann@arm.com>

This patch prepares the scheduler domain level set-up code so that it no
longer has to rely on default_topology[].
The NUMA-specific function sched_init_numa() is renamed to sched_alloc()
and is now used on all systems to allocate the memory for the
sched_domain_topology_level structures.

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
---
 kernel/sched/core.c |  134 ++++++++++++++++++++++++++-------------------------
 1 file changed, 68 insertions(+), 66 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index fe21f7efb2ee..b36a4edddc37 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5486,12 +5486,71 @@ static bool find_numa_distance(int distance)
 	return false;
 }
 
-static void sched_init_numa(void)
+static void sched_domains_numa_masks_set(int cpu)
+{
+	int i, j;
+	int node = cpu_to_node(cpu);
+
+	for (i = 0; i < sched_domains_numa_levels; i++) {
+		for (j = 0; j < nr_node_ids; j++) {
+			if (node_distance(j, node) <= sched_domains_numa_distance[i])
+				cpumask_set_cpu(cpu, sched_domains_numa_masks[i][j]);
+		}
+	}
+}
+
+static void sched_domains_numa_masks_clear(int cpu)
+{
+	int i, j;
+	for (i = 0; i < sched_domains_numa_levels; i++) {
+		for (j = 0; j < nr_node_ids; j++)
+			cpumask_clear_cpu(cpu, sched_domains_numa_masks[i][j]);
+	}
+}
+
+/*
+ * Update sched_domains_numa_masks[level][node] array when new cpus
+ * are onlined.
+ */
+static int sched_domains_numa_masks_update(struct notifier_block *nfb,
+					   unsigned long action,
+					   void *hcpu)
+{
+	int cpu = (long)hcpu;
+
+	switch (action & ~CPU_TASKS_FROZEN) {
+	case CPU_ONLINE:
+		sched_domains_numa_masks_set(cpu);
+		break;
+
+	case CPU_DEAD:
+		sched_domains_numa_masks_clear(cpu);
+		break;
+
+	default:
+		return NOTIFY_DONE;
+	}
+
+	return NOTIFY_OK;
+}
+#else
+static int sched_domains_numa_masks_update(struct notifier_block *nfb,
+					   unsigned long action,
+					   void *hcpu)
+{
+	return 0;
+}
+#endif /* CONFIG_NUMA */
+
+static void sched_alloc(void)
 {
-	int next_distance, curr_distance = node_distance(0, 0);
 	struct sched_domain_topology_level *tl;
 	int level = 0;
-	int i, j, k;
+	int i;
+
+#ifdef CONFIG_NUMA
+	int next_distance, curr_distance = node_distance(0, 0);
+	int j, k;
 
 	sched_domains_numa_distance = kzalloc(sizeof(int) * nr_node_ids, GFP_KERNEL);
 	if (!sched_domains_numa_distance)
@@ -5587,18 +5646,22 @@ static void sched_init_numa(void)
 			}
 		}
 	}
+#endif /* CONFIG_NUMA */
 
 	tl = kzalloc((ARRAY_SIZE(default_topology) + level) *
 			sizeof(struct sched_domain_topology_level), GFP_KERNEL);
 	if (!tl)
 		return;
 
+	sched_domain_topology = tl;
+
 	/*
 	 * Copy the default topology bits..
 	 */
 	for (i = 0; default_topology[i].init; i++)
 		tl[i] = default_topology[i];
 
+#ifdef CONFIG_NUMA
 	/*
 	 * .. and append 'j' levels of NUMA goodness.
 	 */
@@ -5611,70 +5674,9 @@ static void sched_init_numa(void)
 		};
 	}
 
-	sched_domain_topology = tl;
-
 	sched_domains_numa_levels = level;
-}
-
-static void sched_domains_numa_masks_set(int cpu)
-{
-	int i, j;
-	int node = cpu_to_node(cpu);
-
-	for (i = 0; i < sched_domains_numa_levels; i++) {
-		for (j = 0; j < nr_node_ids; j++) {
-			if (node_distance(j, node) <= sched_domains_numa_distance[i])
-				cpumask_set_cpu(cpu, sched_domains_numa_masks[i][j]);
-		}
-	}
-}
-
-static void sched_domains_numa_masks_clear(int cpu)
-{
-	int i, j;
-	for (i = 0; i < sched_domains_numa_levels; i++) {
-		for (j = 0; j < nr_node_ids; j++)
-			cpumask_clear_cpu(cpu, sched_domains_numa_masks[i][j]);
-	}
-}
-
-/*
- * Update sched_domains_numa_masks[level][node] array when new cpus
- * are onlined.
- */
-static int sched_domains_numa_masks_update(struct notifier_block *nfb,
-					   unsigned long action,
-					   void *hcpu)
-{
-	int cpu = (long)hcpu;
-
-	switch (action & ~CPU_TASKS_FROZEN) {
-	case CPU_ONLINE:
-		sched_domains_numa_masks_set(cpu);
-		break;
-
-	case CPU_DEAD:
-		sched_domains_numa_masks_clear(cpu);
-		break;
-
-	default:
-		return NOTIFY_DONE;
-	}
-
-	return NOTIFY_OK;
-}
-#else
-static inline void sched_init_numa(void)
-{
-}
-
-static int sched_domains_numa_masks_update(struct notifier_block *nfb,
-					   unsigned long action,
-					   void *hcpu)
-{
-	return 0;
-}
 #endif /* CONFIG_NUMA */
+}
 
 static int __sdt_alloc(const struct cpumask *cpu_map)
 {
@@ -6108,7 +6110,7 @@ void __init sched_init_smp(void)
 	alloc_cpumask_var(&non_isolated_cpus, GFP_KERNEL);
 	alloc_cpumask_var(&fallback_doms, GFP_KERNEL);
 
-	sched_init_numa();
+	sched_alloc();
 
 	/*
 	 * There's no userspace yet to cause hotplug operations; hence all the
-- 
1.7.9.5




* [RFC PATCH 5/8] sched: introduce common topology level init function
  2013-12-13 12:11 [RFC PATCH 0/8] change scheduler domain hierarchy set-up dietmar.eggemann
                   ` (3 preceding siblings ...)
  2013-12-13 12:11 ` [RFC PATCH 4/8] sched: allocate the entire topology array dynamically dietmar.eggemann
@ 2013-12-13 12:11 ` dietmar.eggemann
  2013-12-20 14:04   ` Peter Zijlstra
  2013-12-13 12:11 ` [RFC PATCH 6/8] sched: replace for_each_sd_topology with explicit for loop dietmar.eggemann
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 17+ messages in thread
From: dietmar.eggemann @ 2013-12-13 12:11 UTC (permalink / raw)
  To: peterz, mingo, vincent.guittot, morten.rasmussen, chris.redpath
  Cc: linux-kernel, dietmar.eggemann

From: Dietmar Eggemann <dietmar.eggemann@arm.com>

This patch introduces the common scheduler domain level init function
sd_init and the definition of the topology-related scheduler domain flags.
The sd_init function is based on an idea of Peter Zijlstra:
https://lkml.org/lkml/2013/11/5/239.
It should replace all default SD_FOO_INIT macros, the ones defined in
the archs and the sd_numa_init function.  The [min|max]_interval
and the balance_interval values are now calculated based on the cpu mask
weight.  Fine tuning of the scheduler domains is done based on topology
flags.
The topology information provided by the topology flags has to be
converted into scheduler behaviour, i.e. the various struct sched_domain
data members have to be tuned based on the topology flags.
The related if/else construct checks the flags in the following order:
the SD_SHARE_CPUPOWER flag indicates the SMT level, the
SD_SHARE_PKG_RESOURCES flag the MC level, the SD_NUMA flag one of the
NUMA levels, and the final else branch indicates the CPU level.  Since
the arch specifies the topology flags, we rely on a correctly configured
arch_sched_domain_info array here.
The sd_init function still calls arch_sd_sibling_asym_packing, which is
only used by the powerpc arch.  Once the SD_ASYM_PACKING flag is set
via the arch_sched_domain_info array, the arch_sd_sibling_asym_packing
function can be deleted.
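
The order of the flag checks matters because an SMT level carries both
SD_SHARE_CPUPOWER and SD_SHARE_PKG_RESOURCES.  The following user-space
sketch illustrates that order and the weight-based interval defaults.
It is not the sd_init code from this patch: the EX_SD_* bit values,
struct ex_domain and ex_sd_init are hypothetical, and only a few of the
tuned fields are modelled.

```c
#include <assert.h>

/* Made-up flag bits; the real SD_* values are defined by the kernel. */
#define EX_SD_SHARE_CPUPOWER      0x01u
#define EX_SD_SHARE_PKG_RESOURCES 0x02u
#define EX_SD_NUMA                0x04u
#define EX_SD_PREFER_SIBLING      0x08u

enum ex_level { EX_SMT, EX_MC, EX_NUMA, EX_CPU };

/* Reduced model of a scheduler domain; illustration only. */
struct ex_domain {
	unsigned long min_interval;
	unsigned long max_interval;
	unsigned long balance_interval;
	unsigned int imbalance_pct;
	unsigned int flags;
	enum ex_level level;
};

/*
 * Derive the defaults from the cpu mask weight, then classify the level
 * by testing the topology flags in a fixed order:
 * SMT first, then MC, then NUMA, and CPU as the fallback.
 */
static struct ex_domain ex_sd_init(unsigned int topo_flags, unsigned int weight)
{
	struct ex_domain sd = {
		.min_interval     = weight,
		.max_interval     = 2 * weight,
		.balance_interval = weight,
		.imbalance_pct    = 125,
		.flags            = topo_flags,
	};

	assert(weight > 0);

	if (sd.flags & EX_SD_SHARE_CPUPOWER) {
		/* Checked first: SMT also sets SHARE_PKG_RESOURCES. */
		sd.imbalance_pct = 110;
		sd.level = EX_SMT;
	} else if (sd.flags & EX_SD_SHARE_PKG_RESOURCES) {
		sd.level = EX_MC;
	} else if (sd.flags & EX_SD_NUMA) {
		sd.level = EX_NUMA;
	} else {
		sd.level = EX_CPU;
	}

	return sd;
}
```

If the SD_SHARE_PKG_RESOURCES test came first in such a chain, an SMT
level would be classified as MC, which is why the SMT test leads.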

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
---
 kernel/sched/core.c  |   86 ++++++++++++++++++++++++++++++++++++++++++++++++++
 kernel/sched/sched.h |   19 +++++++++++
 2 files changed, 105 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b36a4edddc37..37febb067bad 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5542,6 +5542,92 @@ static int sched_domains_numa_masks_update(struct notifier_block *nfb,
 }
 #endif /* CONFIG_NUMA */
 
+static struct sched_domain *
+sd_init(struct sched_domain_topology_level *tl, int cpu)
+{
+	struct sched_domain *sd = *per_cpu_ptr(tl->data.sd, cpu);
+	int sd_weight;
+
+#ifdef CONFIG_NUMA
+	/*
+	 * Ugly hack to pass state to sd_numa_mask()...
+	 */
+	sched_domains_curr_level = tl->numa_level;
+#endif
+
+	sd_weight = cpumask_weight(tl->mask(cpu));
+
+	if (WARN_ONCE((tl->flags & ~SDTL_OVERLAP) & ~TOPOLOGY_SD_FLAGS,
+			"wrong sd_flags in topology description\n"))
+		tl->flags &= TOPOLOGY_SD_FLAGS | SDTL_OVERLAP;
+
+	*sd = (struct sched_domain){
+				.min_interval  = sd_weight,
+				.max_interval  = 2*sd_weight,
+				.busy_factor   = 64,
+				.imbalance_pct = 125,
+
+				.flags =  1*SD_LOAD_BALANCE
+						| 1*SD_BALANCE_NEWIDLE
+						| 1*SD_BALANCE_EXEC
+						| 1*SD_BALANCE_FORK
+						| 1*SD_WAKE_AFFINE
+						,
+
+				.last_balance     = jiffies,
+				.balance_interval = sd_weight,
+	};
+
+	sd->flags |= (tl->flags & ~SDTL_OVERLAP);
+
+	/*
+	 * Convert topological properties into behaviour.
+	 */
+
+	if (sd->flags & SD_SHARE_CPUPOWER) {
+		sd->imbalance_pct = 110;
+		sd->smt_gain = 1178; /* ~15% */
+		SD_INIT_NAME(sd, SMT);
+	} else if (sd->flags & SD_SHARE_PKG_RESOURCES) {
+		sd->cache_nice_tries = 1;
+		sd->busy_idx = 2;
+
+		/*
+		 * Call SMT specific arch topology function.
+		 * This goes away once the powerpc arch uses
+		 * the new interface for scheduler domain
+		 * setup.
+		 */
+		sd->flags |= arch_sd_sibling_asym_packing();
+
+		SD_INIT_NAME(sd, MC);
+#ifdef CONFIG_NUMA
+	} else if (sd->flags & SD_NUMA) {
+		sd->busy_factor = 32;
+		sd->cache_nice_tries = 2;
+		sd->busy_idx = 3;
+		sd->idle_idx = 2;
+		sd->flags |= SD_SERIALIZE;
+		if (sched_domains_numa_distance[tl->numa_level]
+				> RECLAIM_DISTANCE) {
+			sd->flags &= ~(SD_BALANCE_EXEC |
+				       SD_BALANCE_FORK |
+				       SD_WAKE_AFFINE);
+		}
+		SD_INIT_NAME(sd, NUMA);
+#endif
+	} else {
+		sd->cache_nice_tries = 1;
+		sd->busy_idx = 2;
+		sd->idle_idx = 1;
+		SD_INIT_NAME(sd, CPU);
+	}
+
+	sd->private = &tl->data;
+
+	return sd;
+}
+
 static void sched_alloc(void)
 {
 	struct sched_domain_topology_level *tl;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 88c85b21d633..d4d7dbe716db 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1414,3 +1414,22 @@ static inline u64 irq_time_read(int cpu)
 }
 #endif /* CONFIG_64BIT */
 #endif /* CONFIG_IRQ_TIME_ACCOUNTING */
+
+/*
+ * SD_flags allowed in topology descriptions.
+ *
+ * SD_SHARE_CPUPOWER      - describes SMT topologies
+ * SD_SHARE_PKG_RESOURCES - describes shared caches
+ * SD_NUMA                - describes NUMA topologies
+ *
+ * Odd one out:
+ * SD_ASYM_PACKING        - describes SMT quirks
+ *
+ * SD_PREFER_SIBLING      - describes preference for sibling domain
+ */
+#define TOPOLOGY_SD_FLAGS         \
+	(SD_SHARE_CPUPOWER |      \
+	 SD_SHARE_PKG_RESOURCES | \
+	 SD_NUMA |                \
+	 SD_ASYM_PACKING |        \
+	 SD_PREFER_SIBLING)
-- 
1.7.9.5




* [RFC PATCH 6/8] sched: replace for_each_sd_topology with explicit for loop
  2013-12-13 12:11 [RFC PATCH 0/8] change scheduler domain hierarchy set-up dietmar.eggemann
                   ` (4 preceding siblings ...)
  2013-12-13 12:11 ` [RFC PATCH 5/8] sched: introduce common topology level init function dietmar.eggemann
@ 2013-12-13 12:11 ` dietmar.eggemann
  2013-12-13 12:11 ` [RFC PATCH 7/8] sched: replace topology level init func ptr with sd_init dietmar.eggemann
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 17+ messages in thread
From: dietmar.eggemann @ 2013-12-13 12:11 UTC (permalink / raw)
  To: peterz, mingo, vincent.guittot, morten.rasmussen, chris.redpath
  Cc: linux-kernel, dietmar.eggemann

From: Dietmar Eggemann <dietmar.eggemann@arm.com>

This patch prepares the scheduler domain level set-up code so that it no
longer has to rely on default_topology[].
The for_each_sd_topology macro is replaced with an explicit for loop
iterating over the sched_domain_topology array.  The patch introduces
the file-global variable sched_domains_levels to hold the number of
scheduler domain levels.

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
---
 kernel/sched/core.c |   27 ++++++++++++++-------------
 1 file changed, 14 insertions(+), 13 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 37febb067bad..897ff9222cab 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5379,9 +5379,7 @@ static struct sched_domain_topology_level default_topology[] = {
 };
 
 static struct sched_domain_topology_level *sched_domain_topology = default_topology;
-
-#define for_each_sd_topology(tl)			\
-	for (tl = sched_domain_topology; tl->init; tl++)
+static int sched_domains_levels;
 
 #ifdef CONFIG_NUMA
 
@@ -5734,7 +5732,9 @@ static void sched_alloc(void)
 	}
 #endif /* CONFIG_NUMA */
 
-	tl = kzalloc((ARRAY_SIZE(default_topology) + level) *
+	sched_domains_levels = arch_sd_levels();
+
+	tl = kzalloc((sched_domains_levels + level) *
 			sizeof(struct sched_domain_topology_level), GFP_KERNEL);
 	if (!tl)
 		return;
@@ -5761,15 +5761,16 @@ static void sched_alloc(void)
 	}
 
 	sched_domains_numa_levels = level;
+	sched_domains_levels += level;
 #endif /* CONFIG_NUMA */
 }
 
 static int __sdt_alloc(const struct cpumask *cpu_map)
 {
-	struct sched_domain_topology_level *tl;
-	int j;
+	int i, j;
 
-	for_each_sd_topology(tl) {
+	for (i = 0; i < sched_domains_levels; i++) {
+		struct sched_domain_topology_level *tl = &sched_domain_topology[i];
 		struct sd_data *sdd = &tl->data;
 
 		sdd->sd = alloc_percpu(struct sched_domain *);
@@ -5819,10 +5820,10 @@ static int __sdt_alloc(const struct cpumask *cpu_map)
 
 static void __sdt_free(const struct cpumask *cpu_map)
 {
-	struct sched_domain_topology_level *tl;
-	int j;
+	int i, j;
 
-	for_each_sd_topology(tl) {
+	for (i = 0; i < sched_domains_levels; i++) {
+		struct sched_domain_topology_level *tl = &sched_domain_topology[i];
 		struct sd_data *sdd = &tl->data;
 
 		for_each_cpu(j, cpu_map) {
@@ -5887,10 +5888,10 @@ static int build_sched_domains(const struct cpumask *cpu_map,
 
 	/* Set up domains for cpus specified by the cpu_map. */
 	for_each_cpu(i, cpu_map) {
-		struct sched_domain_topology_level *tl;
-
+		int y;
 		sd = NULL;
-		for_each_sd_topology(tl) {
+		for (y = 0; y < sched_domains_levels; y++) {
+			struct sched_domain_topology_level *tl = &sched_domain_topology[y];
 			sd = build_sched_domain(tl, cpu_map, attr, sd, i);
 			if (tl == sched_domain_topology)
 				*per_cpu_ptr(d.sd, i) = sd;
-- 
1.7.9.5




* [RFC PATCH 7/8] sched: replace topology level init func ptr with sd_init
  2013-12-13 12:11 [RFC PATCH 0/8] change scheduler domain hierarchy set-up dietmar.eggemann
                   ` (5 preceding siblings ...)
  2013-12-13 12:11 ` [RFC PATCH 6/8] sched: replace for_each_sd_topology with explicit for loop dietmar.eggemann
@ 2013-12-13 12:11 ` dietmar.eggemann
  2013-12-13 12:11 ` [RFC PATCH 8/8] sched: remove scheduler domain naming dietmar.eggemann
  2013-12-20 14:00 ` [RFC PATCH 0/8] change scheduler domain hierarchy set-up Peter Zijlstra
  8 siblings, 0 replies; 17+ messages in thread
From: dietmar.eggemann @ 2013-12-13 12:11 UTC (permalink / raw)
  To: peterz, mingo, vincent.guittot, morten.rasmussen, chris.redpath
  Cc: linux-kernel, dietmar.eggemann

From: Dietmar Eggemann <dietmar.eggemann@arm.com>

This patch replaces the call to the sched_domain_topology_level
initialization function pointer tl->init() with a call to sd_init() in
build_sched_domain().  It sets the cpu mask and the flags for each
conventional scheduler domain level (i.e. those without the SD_NUMA
topology flag) in sched_alloc() and adds SD_NUMA to all NUMA levels.
Since the sched_domain_topology_level initialization function pointer,
the default_topology[] array and the default SD_FOO_INIT macros are not
used any more, the patch deletes them.
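
The resulting array layout (arch-provided levels first, NUMA levels
appended, one explicit level count) can be sketched in user space as
follows.  This is only an illustration under simplifying assumptions,
not the sched_alloc() code: calloc stands in for kzalloc, mask getters
return a plain bitmask, the EX_SD_* values are made up and all ex_*
names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Made-up flag bits; the real SD_* values are defined by the kernel. */
#define EX_SD_SHARE_PKG_RESOURCES 0x02u
#define EX_SD_NUMA                0x04u
#define EX_SD_PREFER_SIBLING      0x08u

/* Simplified mask getter: a bitmask instead of a struct cpumask. */
typedef unsigned long (*ex_mask_f)(int cpu);

struct ex_arch_info {
	ex_mask_f mask;
	unsigned int flags;
};

struct ex_topology_level {
	ex_mask_f mask;
	unsigned int flags;
	int numa_level;
};

static unsigned long ex_coregroup_mask(int cpu) { (void)cpu; return 0x3UL; }
static unsigned long ex_cpu_mask(int cpu) { (void)cpu; return 0xfUL; }
static unsigned long ex_numa_mask(int cpu) { (void)cpu; return 0xffUL; }

/* Arch-provided table, terminated by a NULL mask pointer. */
static const struct ex_arch_info ex_arch[] = {
	{ ex_coregroup_mask, EX_SD_SHARE_PKG_RESOURCES },
	{ ex_cpu_mask, EX_SD_PREFER_SIBLING },
	{ NULL, 0 },
};

/*
 * Build one array holding the arch levels followed by numa_levels NUMA
 * levels.  The total is stored in *count, so later loops can iterate by
 * index and need no terminating entry.  Returns NULL if there are no
 * levels or the allocation fails; the caller frees the result.
 */
static struct ex_topology_level *
ex_build_topology(const struct ex_arch_info *arch, int numa_levels, int *count)
{
	struct ex_topology_level *tl;
	int arch_levels = 0;
	int total, i;

	assert(numa_levels >= 0);
	*count = 0;

	while (arch[arch_levels].mask)
		arch_levels++;

	total = arch_levels + numa_levels;
	if (total == 0)
		return NULL;

	tl = calloc((size_t)total, sizeof(*tl));
	if (!tl)
		return NULL;

	/* Conventional levels: take mask and flags from the arch table. */
	for (i = 0; i < arch_levels; i++) {
		tl[i].mask = arch[i].mask;
		tl[i].flags = arch[i].flags;
	}

	/* NUMA levels: common mask getter, tagged with the NUMA flag. */
	for (i = 0; i < numa_levels; i++) {
		tl[arch_levels + i].mask = ex_numa_mask;
		tl[arch_levels + i].flags = EX_SD_NUMA;
		tl[arch_levels + i].numa_level = i;
	}

	*count = total;
	return tl;
}
```

Because every level then carries only a mask getter and flags, a single
init function can treat conventional and NUMA levels alike.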

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
---
 include/linux/topology.h |  115 ----------------------------------------------
 kernel/sched/core.c      |  108 ++++---------------------------------------
 2 files changed, 10 insertions(+), 213 deletions(-)

diff --git a/include/linux/topology.h b/include/linux/topology.h
index d147f63c2b1f..57301fd96fdb 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -66,121 +66,6 @@ int arch_update_cpu_topology(void);
 #define PENALTY_FOR_NODE_WITH_CPUS	(1)
 #endif
 
-/*
- * Below are the 3 major initializers used in building sched_domains:
- * SD_SIBLING_INIT, for SMT domains
- * SD_CPU_INIT, for SMP domains
- *
- * Any architecture that cares to do any tuning to these values should do so
- * by defining their own arch-specific initializer in include/asm/topology.h.
- * A definition there will automagically override these default initializers
- * and allow arch-specific performance tuning of sched_domains.
- * (Only non-zero and non-null fields need be specified.)
- */
-
-#ifdef CONFIG_SCHED_SMT
-/* MCD - Do we really need this?  It is always on if CONFIG_SCHED_SMT is,
- * so can't we drop this in favor of CONFIG_SCHED_SMT?
- */
-#define ARCH_HAS_SCHED_WAKE_IDLE
-/* Common values for SMT siblings */
-#ifndef SD_SIBLING_INIT
-#define SD_SIBLING_INIT (struct sched_domain) {				\
-	.min_interval		= 1,					\
-	.max_interval		= 2,					\
-	.busy_factor		= 64,					\
-	.imbalance_pct		= 110,					\
-									\
-	.flags			= 1*SD_LOAD_BALANCE			\
-				| 1*SD_BALANCE_NEWIDLE			\
-				| 1*SD_BALANCE_EXEC			\
-				| 1*SD_BALANCE_FORK			\
-				| 0*SD_BALANCE_WAKE			\
-				| 1*SD_WAKE_AFFINE			\
-				| 1*SD_SHARE_CPUPOWER			\
-				| 1*SD_SHARE_PKG_RESOURCES		\
-				| 0*SD_SERIALIZE			\
-				| 0*SD_PREFER_SIBLING			\
-				| arch_sd_sibling_asym_packing()	\
-				,					\
-	.last_balance		= jiffies,				\
-	.balance_interval	= 1,					\
-	.smt_gain		= 1178,	/* 15% */			\
-	.max_newidle_lb_cost	= 0,					\
-	.next_decay_max_lb_cost	= jiffies,				\
-}
-#endif
-#endif /* CONFIG_SCHED_SMT */
-
-#ifdef CONFIG_SCHED_MC
-/* Common values for MC siblings. for now mostly derived from SD_CPU_INIT */
-#ifndef SD_MC_INIT
-#define SD_MC_INIT (struct sched_domain) {				\
-	.min_interval		= 1,					\
-	.max_interval		= 4,					\
-	.busy_factor		= 64,					\
-	.imbalance_pct		= 125,					\
-	.cache_nice_tries	= 1,					\
-	.busy_idx		= 2,					\
-	.wake_idx		= 0,					\
-	.forkexec_idx		= 0,					\
-									\
-	.flags			= 1*SD_LOAD_BALANCE			\
-				| 1*SD_BALANCE_NEWIDLE			\
-				| 1*SD_BALANCE_EXEC			\
-				| 1*SD_BALANCE_FORK			\
-				| 0*SD_BALANCE_WAKE			\
-				| 1*SD_WAKE_AFFINE			\
-				| 0*SD_SHARE_CPUPOWER			\
-				| 1*SD_SHARE_PKG_RESOURCES		\
-				| 0*SD_SERIALIZE			\
-				,					\
-	.last_balance		= jiffies,				\
-	.balance_interval	= 1,					\
-	.max_newidle_lb_cost	= 0,					\
-	.next_decay_max_lb_cost	= jiffies,				\
-}
-#endif
-#endif /* CONFIG_SCHED_MC */
-
-/* Common values for CPUs */
-#ifndef SD_CPU_INIT
-#define SD_CPU_INIT (struct sched_domain) {				\
-	.min_interval		= 1,					\
-	.max_interval		= 4,					\
-	.busy_factor		= 64,					\
-	.imbalance_pct		= 125,					\
-	.cache_nice_tries	= 1,					\
-	.busy_idx		= 2,					\
-	.idle_idx		= 1,					\
-	.newidle_idx		= 0,					\
-	.wake_idx		= 0,					\
-	.forkexec_idx		= 0,					\
-									\
-	.flags			= 1*SD_LOAD_BALANCE			\
-				| 1*SD_BALANCE_NEWIDLE			\
-				| 1*SD_BALANCE_EXEC			\
-				| 1*SD_BALANCE_FORK			\
-				| 0*SD_BALANCE_WAKE			\
-				| 1*SD_WAKE_AFFINE			\
-				| 0*SD_SHARE_CPUPOWER			\
-				| 0*SD_SHARE_PKG_RESOURCES		\
-				| 0*SD_SERIALIZE			\
-				| 1*SD_PREFER_SIBLING			\
-				,					\
-	.last_balance		= jiffies,				\
-	.balance_interval	= 1,					\
-	.max_newidle_lb_cost	= 0,					\
-	.next_decay_max_lb_cost	= jiffies,				\
-}
-#endif
-
-#ifdef CONFIG_SCHED_BOOK
-#ifndef SD_BOOK_INIT
-#error Please define an appropriate SD_BOOK_INIT in include/asm/topology.h!!!
-#endif
-#endif /* CONFIG_SCHED_BOOK */
-
 #ifdef CONFIG_USE_PERCPU_NUMA_NODE_ID
 DECLARE_PER_CPU(int, numa_node);
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 897ff9222cab..3bb8e3e2e58a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5005,14 +5005,9 @@ enum s_alloc {
 	sa_none,
 };
 
-struct sched_domain_topology_level;
-
-typedef struct sched_domain *(*sched_domain_init_f)(struct sched_domain_topology_level *tl, int cpu);
-
 #define SDTL_OVERLAP	0x01
 
 struct sched_domain_topology_level {
-	sched_domain_init_f init;
 	sched_domain_mask_f mask;
 	int		    flags;
 	int		    numa_level;
@@ -5252,28 +5247,6 @@ int __weak arch_sd_sibling_asym_packing(void)
 # define SD_INIT_NAME(sd, type)		do { } while (0)
 #endif
 
-#define SD_INIT_FUNC(type)						\
-static noinline struct sched_domain *					\
-sd_init_##type(struct sched_domain_topology_level *tl, int cpu) 	\
-{									\
-	struct sched_domain *sd = *per_cpu_ptr(tl->data.sd, cpu);	\
-	*sd = SD_##type##_INIT;						\
-	SD_INIT_NAME(sd, type);						\
-	sd->private = &tl->data;					\
-	return sd;							\
-}
-
-SD_INIT_FUNC(CPU)
-#ifdef CONFIG_SCHED_SMT
- SD_INIT_FUNC(SIBLING)
-#endif
-#ifdef CONFIG_SCHED_MC
- SD_INIT_FUNC(MC)
-#endif
-#ifdef CONFIG_SCHED_BOOK
- SD_INIT_FUNC(BOOK)
-#endif
-
 static int default_relax_domain_level = -1;
 int sched_domain_level_max;
 
@@ -5361,24 +5334,7 @@ static void claim_allocations(int cpu, struct sched_domain *sd)
 		*per_cpu_ptr(sdd->sgp, cpu) = NULL;
 }
 
-/*
- * Topology list, bottom-up.
- */
-static struct sched_domain_topology_level default_topology[] = {
-#ifdef CONFIG_SCHED_SMT
-	{ sd_init_SIBLING, cpu_smt_mask, },
-#endif
-#ifdef CONFIG_SCHED_MC
-	{ sd_init_MC, cpu_coregroup_mask, },
-#endif
-#ifdef CONFIG_SCHED_BOOK
-	{ sd_init_BOOK, cpu_book_mask, },
-#endif
-	{ sd_init_CPU, cpu_cpu_mask, },
-	{ NULL, },
-};
-
-static struct sched_domain_topology_level *sched_domain_topology = default_topology;
+static struct sched_domain_topology_level *sched_domain_topology;
 static int sched_domains_levels;
 
 #ifdef CONFIG_NUMA
@@ -5396,53 +5352,6 @@ static inline int sd_local_flags(int level)
 	return SD_BALANCE_EXEC | SD_BALANCE_FORK | SD_WAKE_AFFINE;
 }
 
-static struct sched_domain *
-sd_numa_init(struct sched_domain_topology_level *tl, int cpu)
-{
-	struct sched_domain *sd = *per_cpu_ptr(tl->data.sd, cpu);
-	int level = tl->numa_level;
-	int sd_weight = cpumask_weight(
-			sched_domains_numa_masks[level][cpu_to_node(cpu)]);
-
-	*sd = (struct sched_domain){
-		.min_interval		= sd_weight,
-		.max_interval		= 2*sd_weight,
-		.busy_factor		= 32,
-		.imbalance_pct		= 125,
-		.cache_nice_tries	= 2,
-		.busy_idx		= 3,
-		.idle_idx		= 2,
-		.newidle_idx		= 0,
-		.wake_idx		= 0,
-		.forkexec_idx		= 0,
-
-		.flags			= 1*SD_LOAD_BALANCE
-					| 1*SD_BALANCE_NEWIDLE
-					| 0*SD_BALANCE_EXEC
-					| 0*SD_BALANCE_FORK
-					| 0*SD_BALANCE_WAKE
-					| 0*SD_WAKE_AFFINE
-					| 0*SD_SHARE_CPUPOWER
-					| 0*SD_SHARE_PKG_RESOURCES
-					| 1*SD_SERIALIZE
-					| 0*SD_PREFER_SIBLING
-					| 1*SD_NUMA
-					| sd_local_flags(level)
-					,
-		.last_balance		= jiffies,
-		.balance_interval	= sd_weight,
-	};
-	SD_INIT_NAME(sd, NUMA);
-	sd->private = &tl->data;
-
-	/*
-	 * Ugly hack to pass state to sd_numa_mask()...
-	 */
-	sched_domains_curr_level = tl->numa_level;
-
-	return sd;
-}
-
 static const struct cpumask *sd_numa_mask(int cpu)
 {
 	return sched_domains_numa_masks[sched_domains_curr_level][cpu_to_node(cpu)];
@@ -5742,10 +5651,14 @@ static void sched_alloc(void)
 	sched_domain_topology = tl;
 
 	/*
-	 * Copy the default topology bits..
+	 * Setup non NUMA levels
 	 */
-	for (i = 0; default_topology[i].init; i++)
-		tl[i] = default_topology[i];
+	for (i = 0; i < sched_domains_levels; i++) {
+		tl[i] = (struct sched_domain_topology_level) {
+			.mask = arch_sd_mask(i),
+			.flags = arch_sd_flags(i),
+		};
+	}
 
 #ifdef CONFIG_NUMA
 	/*
@@ -5753,9 +5666,8 @@ static void sched_alloc(void)
 	 */
 	for (j = 0; j < level; i++, j++) {
 		tl[i] = (struct sched_domain_topology_level){
-			.init = sd_numa_init,
 			.mask = sd_numa_mask,
-			.flags = SDTL_OVERLAP,
+			.flags = SD_NUMA | SDTL_OVERLAP,
 			.numa_level = j,
 		};
 	}
@@ -5854,7 +5766,7 @@ struct sched_domain *build_sched_domain(struct sched_domain_topology_level *tl,
 		const struct cpumask *cpu_map, struct sched_domain_attr *attr,
 		struct sched_domain *child, int cpu)
 {
-	struct sched_domain *sd = tl->init(tl, cpu);
+	struct sched_domain *sd = sd_init(tl, cpu);
 	if (!sd)
 		return child;
 
-- 
1.7.9.5




* [RFC PATCH 8/8] sched: remove scheduler domain naming
  2013-12-13 12:11 [RFC PATCH 0/8] change scheduler domain hierarchy set-up dietmar.eggemann
                   ` (6 preceding siblings ...)
  2013-12-13 12:11 ` [RFC PATCH 7/8] sched: replace topology level init func ptr with sd_init dietmar.eggemann
@ 2013-12-13 12:11 ` dietmar.eggemann
  2013-12-20 14:08   ` Peter Zijlstra
  2013-12-20 14:00 ` [RFC PATCH 0/8] change scheduler domain hierarchy set-up Peter Zijlstra
  8 siblings, 1 reply; 17+ messages in thread
From: dietmar.eggemann @ 2013-12-13 12:11 UTC (permalink / raw)
  To: peterz, mingo, vincent.guittot, morten.rasmussen, chris.redpath
  Cc: linux-kernel, dietmar.eggemann

From: Dietmar Eggemann <dietmar.eggemann@arm.com>

Once the arch is allowed to define the layout of the conventional
scheduler domain topology levels (i.e. the ones without the SD_NUMA
topology flag), it is no longer feasible for the scheduler to name these
levels.  Therefore, this patch gets rid of the sched_domain structure
member 'name' and the corresponding SD_INIT_NAME macro.  It was only used
when CONFIG_SCHED_DEBUG was set anyway.

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
---
 include/linux/sched.h |    3 ---
 kernel/sched/core.c   |   20 ++++----------------
 2 files changed, 4 insertions(+), 19 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 768b037dfacb..511700ddd7f7 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -866,9 +866,6 @@ struct sched_domain {
 	unsigned int ttwu_move_affine;
 	unsigned int ttwu_move_balance;
 #endif
-#ifdef CONFIG_SCHED_DEBUG
-	char *name;
-#endif
 	union {
 		void *private;		/* used during construction */
 		struct rcu_head rcu;	/* used during destruction */
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 3bb8e3e2e58a..e4f6a184333a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4302,7 +4302,7 @@ set_table_entry(struct ctl_table *entry,
 static struct ctl_table *
 sd_alloc_ctl_domain_table(struct sched_domain *sd)
 {
-	struct ctl_table *table = sd_alloc_ctl_entry(13);
+	struct ctl_table *table = sd_alloc_ctl_entry(12);
 
 	if (table == NULL)
 		return NULL;
@@ -4330,9 +4330,7 @@ sd_alloc_ctl_domain_table(struct sched_domain *sd)
 		sizeof(int), 0644, proc_dointvec_minmax, false);
 	set_table_entry(&table[10], "flags", &sd->flags,
 		sizeof(int), 0644, proc_dointvec_minmax, false);
-	set_table_entry(&table[11], "name", sd->name,
-		CORENAME_MAX_SIZE, 0444, proc_dostring, false);
-	/* &table[12] is terminator */
+	/* &table[11] is terminator */
 
 	return table;
 }
@@ -4573,7 +4571,7 @@ static int sched_domain_debug_one(struct sched_domain *sd, int cpu, int level,
 	cpulist_scnprintf(str, sizeof(str), sched_domain_span(sd));
 	cpumask_clear(groupmask);
 
-	printk(KERN_DEBUG "%*s domain %d: ", level, "", level);
+	printk(KERN_DEBUG "%*s domain level %d: ", level, "", level);
 
 	if (!(sd->flags & SD_LOAD_BALANCE)) {
 		printk("does not load-balance\n");
@@ -4583,7 +4581,7 @@ static int sched_domain_debug_one(struct sched_domain *sd, int cpu, int level,
 		return -1;
 	}
 
-	printk(KERN_CONT "span %s level %s\n", str, sd->name);
+	printk(KERN_CONT "span %s\n", str);
 
 	if (!cpumask_test_cpu(cpu, sched_domain_span(sd))) {
 		printk(KERN_ERR "ERROR: domain->span does not contain "
@@ -5241,12 +5239,6 @@ int __weak arch_sd_sibling_asym_packing(void)
  * Non-inlined to reduce accumulated stack pressure in build_sched_domains()
  */
 
-#ifdef CONFIG_SCHED_DEBUG
-# define SD_INIT_NAME(sd, type)		sd->name = #type
-#else
-# define SD_INIT_NAME(sd, type)		do { } while (0)
-#endif
-
 static int default_relax_domain_level = -1;
 int sched_domain_level_max;
 
@@ -5494,7 +5486,6 @@ sd_init(struct sched_domain_topology_level *tl, int cpu)
 	if (sd->flags & SD_SHARE_CPUPOWER) {
 		sd->imbalance_pct = 110;
 		sd->smt_gain = 1178; /* ~15% */
-		SD_INIT_NAME(sd, SMT);
 	} else if (sd->flags & SD_SHARE_PKG_RESOURCES) {
 		sd->cache_nice_tries = 1;
 		sd->busy_idx = 2;
@@ -5507,7 +5498,6 @@ sd_init(struct sched_domain_topology_level *tl, int cpu)
 		 */
 		sd->flags |= arch_sd_sibling_asym_packing();
 
-		SD_INIT_NAME(sd, MC);
 #ifdef CONFIG_NUMA
 	} else if (sd->flags & SD_NUMA) {
 		sd->busy_factor = 32,
@@ -5521,13 +5511,11 @@ sd_init(struct sched_domain_topology_level *tl, int cpu)
 				       SD_BALANCE_FORK |
 				       SD_WAKE_AFFINE);
 		}
-		SD_INIT_NAME(sd, NUMA);
 #endif
 	} else {
 		sd->cache_nice_tries = 1;
 		sd->busy_idx = 2;
 		sd->idle_idx = 1;
-		SD_INIT_NAME(sd, CPU);
 	}
 
 	sd->private = &tl->data;
-- 
1.7.9.5




* Re: [RFC PATCH 0/8] change scheduler domain hierarchy set-up
  2013-12-13 12:11 [RFC PATCH 0/8] change scheduler domain hierarchy set-up dietmar.eggemann
                   ` (7 preceding siblings ...)
  2013-12-13 12:11 ` [RFC PATCH 8/8] sched: remove scheduler domain naming dietmar.eggemann
@ 2013-12-20 14:00 ` Peter Zijlstra
  2014-01-06 18:40   ` Dietmar Eggemann
  8 siblings, 1 reply; 17+ messages in thread
From: Peter Zijlstra @ 2013-12-20 14:00 UTC (permalink / raw)
  To: dietmar.eggemann
  Cc: mingo, vincent.guittot, morten.rasmussen, chris.redpath, linux-kernel

On Fri, Dec 13, 2013 at 12:11:20PM +0000, dietmar.eggemann@arm.com wrote:
> From: Dietmar Eggemann <dietmar.eggemann@arm.com>
> 
> This patch-set cleans up the scheduler domain level initialization code.
> It is based on the idea of Peter Zijlstra to use a single scheduler domain
> init function sketched here: https://lkml.org/lkml/2013/11/5/239 
> 
> What does the patch-set try to achieve:
> 
> 1) Let the arch define the conventional (here defined to all levels except
>    the NUMA levels) scheduler domain hierarchy.  The arch specifies per
>    scheduler domain the pointer to the getter function of the
>    corresponding cpu mask as well as the topology related scheduler
>    domain flags.
> 
> 2) Unify the set-up code for conventional and NUMA scheduler domains.
>    All scheduler domain topology levels are now allocated in the same
>    function and the scheduler does not rely on a default scheduler
>    domain topology array any more.  All scheduler domains now use a
>    common initialization function which makes the existing SD_FOO_INIT
>    macros redundant.

Yeah, still a tad confused on what you did there, need to look in more
detail.

> 3) The arch is no longer limited to the existing scheduler domain levels
>    (SMT, MC, BOOK, CPU) but can easily define additional levels.
> 
> 4) Prepare the mechanics to make it easier to integrate the provision of
>    additional topology related data (e.g. energy information) to the
>    scheduler.

Right, I was hoping you'd have a little more on that, but we'll get
there I suppose ;-)

> Current limitations:
> 
> 1) The arch interface for scheduler domain set-up is only implemented for
>    the ARM and the x86 arch and tested on an ARM TC2 (2 clusters, one with
>    2 Cortex A15 and the other with 3 Cortex A7) and an Intel i5-520M (2
>    cores with 2 threads each) platform.   
> 
> 2) For other archs it has only been compile tested for certain
>    configurations (powerpc: chroma_defconfig, mips: ip27_defconfig,
>    s390: defconfig, tile: tilegx_defconfig).  Obviously, linking these
>    kernels doesn't succeed due to the missing arch interface for
>    scheduler domain set-up implementation (undefined reference to
>    arch_sched_domain_info).
> 
> 3) It does not delete the arch specific SD_FOO_INIT macros for ia64,
>    metag, s390 and tile arch.
> 
> 4) It does not delete the arch_sd_sibling_asym_packing function which
>    will be redundant once the arch interface for scheduler domain set-up
>    has been implemented for powerpc arch.
> 
> 5) There is no default set-up any more.  Each arch has to define a
>    arch_sched_domain_info array, a circumstance which might not be
>    desirable.

Yeah, that's sad, I think we want to keep the default thing to limit the
amount of pointless duplication for all archs that are not special.

Also, like you point out above, breaking all archs isn't nice :-)

> 6) It has to be specified what happens when an arch specifies an
>    arch_sched_domain_info array with only a { NULL, } entry.

Crash hard on boot :-) Although I suppose since it's all compile time
constants we could try and be smart and make the build fail somehow.
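
One way the build could be made to fail is a compile-time size check on
the array. The following is only a sketch under assumptions: struct
demo_sd_info, demo_sd_info[] and the helpers are hypothetical stand-ins
for arch_sched_domain_info, whose real type is not shown in this thread,
and static_assert is the plain C11 facility rather than a kernel helper.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct cpumask;
typedef const struct cpumask *(*demo_mask_f)(int cpu);

struct demo_sd_info {
	demo_mask_f mask;	/* getter for the cpu mask of this level */
	int flags;		/* topology related SD_* flags */
};

/* Dummy getter; a real one would return the mask for @cpu. */
static const struct cpumask *demo_cpu_mask(int cpu)
{
	(void)cpu;
	return NULL;
}

static const struct demo_sd_info demo_sd_info[] = {
	{ demo_cpu_mask, 0 },
	{ NULL, 0 },		/* terminator */
};

#define DEMO_ARRAY_LEN(a)	(sizeof(a) / sizeof((a)[0]))

/* The build fails if the array holds nothing but the terminator. */
static_assert(DEMO_ARRAY_LEN(demo_sd_info) >= 2,
	      "topology array needs at least one level plus the terminator");

/* Count the levels in front of the terminator. */
static int demo_sd_levels(void)
{
	int n = 0;

	while (demo_sd_info[n].mask)
		n++;
	return n;
}
```

Note that this only catches an array that is too short; it would not
notice a NULL mask in the first slot of a longer array.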

The one thing I do dislike is that you mixed SDTL_flags and SD_flags
into a single variable. Don't do that; it's bound to collide and give
weird results at some point, and it's not like any of these structures
are space critical in any way, shape or form.
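
To illustrate the collision, here is a minimal sketch with the two flag
spaces kept in separate members. SDTL_OVERLAP is 0x01 as in the patch
context; the SD_LOAD_BALANCE and SD_NUMA values are assumptions made for
the example, and struct demo_topology_level with its sd_flags member is
hypothetical, not the layout of any posted patch.

```c
#include <assert.h>

#define SDTL_OVERLAP	0x01	/* topology level flag, as in the patch */
#define SD_LOAD_BALANCE	0x0001	/* assumed value of the sched domain flag */
#define SD_NUMA		0x4000	/* assumed value of the sched domain flag */

/* In this sketch the two flag spaces share bit 0. */
static_assert((SDTL_OVERLAP & SD_LOAD_BALANCE) != 0,
	      "the example relies on the two values overlapping");

struct demo_topology_level {
	int flags;	/* SDTL_* flags only */
	int sd_flags;	/* SD_* flags only */
	int numa_level;
};

/* Single variable: an SD_* bit is misread as SDTL_OVERLAP. */
static int demo_mixed_is_overlap(int mixed_flags)
{
	return !!(mixed_flags & SDTL_OVERLAP);
}

/* Separate members: only real SDTL_* bits are looked at. */
static int demo_split_is_overlap(const struct demo_topology_level *tl)
{
	return !!(tl->flags & SDTL_OVERLAP);
}

/* A level that load-balances but was never marked as overlapping. */
static const struct demo_topology_level demo_level = {
	.flags		= 0,
	.sd_flags	= SD_LOAD_BALANCE | SD_NUMA,
	.numa_level	= 0,
};
```

With the mixed variable, demo_mixed_is_overlap(SD_LOAD_BALANCE | SD_NUMA)
reports an overlap that was never requested, while
demo_split_is_overlap(&demo_level) does not.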


* Re: [RFC PATCH 5/8] sched: introduce common topology level init function
  2013-12-13 12:11 ` [RFC PATCH 5/8] sched: introduce common topology level init function dietmar.eggemann
@ 2013-12-20 14:04   ` Peter Zijlstra
  2014-01-06 18:41     ` Dietmar Eggemann
  0 siblings, 1 reply; 17+ messages in thread
From: Peter Zijlstra @ 2013-12-20 14:04 UTC (permalink / raw)
  To: dietmar.eggemann
  Cc: mingo, vincent.guittot, morten.rasmussen, chris.redpath, linux-kernel

> +/*
> + * SD_flags allowed in topology descriptions.
> + *
> + * SD_SHARE_CPUPOWER      - describes SMT topologies
> + * SD_SHARE_PKG_RESOURCES - describes shared caches
> + * SD_NUMA                - describes NUMA topologies
> + *
> + * Odd one out:
> + * SD_ASYM_PACKING        - describes SMT quirks
> + *
> + * SD_PREFER_SIBLING      - describes preference for sibling domain
> + */
> +#define TOPOLOGY_SD_FLAGS         \
> +	(SD_SHARE_CPUPOWER |      \
> +	 SD_SHARE_PKG_RESOURCES | \
> +	 SD_NUMA |                \
> +	 SD_ASYM_PACKING |        \
> +	 SD_PREFER_SIBLING)

See, SD_PREFER_SIBLING is behavioural, exactly the kind of thing we want
to keep out of this mask.
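
A sketch of what the mask could look like with the behavioural flag
left out, plus a check that rejects anything outside the mask. The flag
values are assumptions made for illustration, and DEMO_TOPOLOGY_SD_FLAGS
and demo_topology_flags_valid() are hypothetical names.

```c
#include <assert.h>

/* Flag values assumed for illustration only. */
#define SD_SHARE_CPUPOWER	0x0080
#define SD_SHARE_PKG_RESOURCES	0x0200
#define SD_ASYM_PACKING		0x0800
#define SD_PREFER_SIBLING	0x1000
#define SD_NUMA			0x4000

/* Only flags that describe the topology; no behavioural flags. */
#define DEMO_TOPOLOGY_SD_FLAGS		\
	(SD_SHARE_CPUPOWER |		\
	 SD_SHARE_PKG_RESOURCES |	\
	 SD_NUMA |			\
	 SD_ASYM_PACKING)

static_assert(!(DEMO_TOPOLOGY_SD_FLAGS & SD_PREFER_SIBLING),
	      "behavioural flags must stay out of the topology mask");

/* Return 1 if @flags contains nothing but topology flags, else 0. */
static int demo_topology_flags_valid(int flags)
{
	return !(flags & ~DEMO_TOPOLOGY_SD_FLAGS);
}
```

A topology description that tried to pass SD_PREFER_SIBLING would then
be rejected, and the scheduler would stay free to set that flag itself.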


* Re: [RFC PATCH 8/8] sched: remove scheduler domain naming
  2013-12-13 12:11 ` [RFC PATCH 8/8] sched: remove scheduler domain naming dietmar.eggemann
@ 2013-12-20 14:08   ` Peter Zijlstra
  2014-01-06 18:41     ` Dietmar Eggemann
  0 siblings, 1 reply; 17+ messages in thread
From: Peter Zijlstra @ 2013-12-20 14:08 UTC (permalink / raw)
  To: dietmar.eggemann
  Cc: mingo, vincent.guittot, morten.rasmussen, chris.redpath, linux-kernel

On Fri, Dec 13, 2013 at 12:11:28PM +0000, dietmar.eggemann@arm.com wrote:
> From: Dietmar Eggemann <dietmar.eggemann@arm.com>
> 
> In case the arch is allowed to define the conventional scheduler domain
> topology level (i.e. the one without SD_NUMA topology flag) layout, it is
> not feasible any more for the scheduler to name these levels.  Therefore,
> this patch gets rid of the sched_domain_topology_level structure
> member 'name' and the corresponding SD_INIT_NAME macro.  It was only used
> when CONFIG_SCHED_DEBUG was set anyway.

Right, so for debug purposes it might be convenient to keep it; we could
simply put it in the topology array, something like:

 { cpu_smt_mask, SD_SHARE_CPUPOWER | SD_SHARE_PKG_RESOURCES, SD_NAME(smt) },

which would still allow us to make it go away on !debug, but does
provide us with a nice label to print for the debug topology prints.

Alternatively we could do something like:

#define SD_mask(name, flags) \
	{ cpu_##name##_mask, (flags), .name = #name }

to further reduce typing.
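
For reference, a self-contained sketch of how such an SD_NAME() helper
could compile away on !debug. It is an assumption about the shape of the
idea, not code from any posted patch: struct demo_topology_level, the
demo_* functions and the flag values are all hypothetical, and
CONFIG_SCHED_DEBUG is simply defined by hand here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: pretend this is a debug build; drop the define for !debug. */
#define CONFIG_SCHED_DEBUG 1

/* Flag values assumed for illustration only. */
#define SD_SHARE_CPUPOWER	0x0080
#define SD_SHARE_PKG_RESOURCES	0x0200

struct cpumask;
typedef const struct cpumask *(*demo_mask_f)(int cpu);

struct demo_topology_level {
	demo_mask_f mask;
	int sd_flags;
#ifdef CONFIG_SCHED_DEBUG
	const char *name;
#endif
};

#ifdef CONFIG_SCHED_DEBUG
# define SD_NAME(n)	.name = #n
#else
# define SD_NAME(n)	/* expands to nothing, the member does not exist */
#endif

/* Dummy mask getters; real ones would return the mask for @cpu. */
static const struct cpumask *demo_smt_mask(int cpu) { (void)cpu; return NULL; }
static const struct cpumask *demo_cpu_mask(int cpu) { (void)cpu; return NULL; }

static const struct demo_topology_level demo_topology[] = {
	{ demo_smt_mask, SD_SHARE_CPUPOWER | SD_SHARE_PKG_RESOURCES, SD_NAME(SMT) },
	{ demo_cpu_mask, 0, SD_NAME(CPU) },
	{ NULL, 0 },	/* terminator */
};

/* Label for debug prints; empty string when names are compiled out. */
static const char *demo_level_name(int level)
{
#ifdef CONFIG_SCHED_DEBUG
	return demo_topology[level].name;
#else
	(void)level;
	return "";
#endif
}

/* Return 1 if the label of @level equals @want, else 0. */
static int demo_level_name_is(int level, const char *want)
{
	return strcmp(demo_level_name(level), want) == 0;
}
```

The trailing comma left behind when SD_NAME() expands to nothing is
still a valid initializer list, so the array needs no #ifdef of its own.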


* Re: [RFC PATCH 0/8] change scheduler domain hierarchy set-up
  2013-12-20 14:00 ` [RFC PATCH 0/8] change scheduler domain hierarchy set-up Peter Zijlstra
@ 2014-01-06 18:40   ` Dietmar Eggemann
  0 siblings, 0 replies; 17+ messages in thread
From: Dietmar Eggemann @ 2014-01-06 18:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, vincent.guittot, Morten Rasmussen, Chris Redpath, linux-kernel

On 20/12/13 14:00, Peter Zijlstra wrote:
> On Fri, Dec 13, 2013 at 12:11:20PM +0000, dietmar.eggemann@arm.com wrote:
>> From: Dietmar Eggemann <dietmar.eggemann@arm.com>
>>
>> This patch-set cleans up the scheduler domain level initialization code.
>> It is based on the idea of Peter Zijlstra to use a single scheduler domain
>> init function sketched here: https://lkml.org/lkml/2013/11/5/239 
>>
>> What does the patch-set try to achieve:
>>
>> 1) Let the arch define the conventional (here defined to all levels except
>>    the NUMA levels) scheduler domain hierarchy.  The arch specifies per
>>    scheduler domain the pointer to the getter function of the
>>    corresponding cpu mask as well as the topology related scheduler
>>    domain flags.
>>
>> 2) Unify the set-up code for conventional and NUMA scheduler domains.
>>    All scheduler domain topology levels are now allocated in the same
>>    function and the scheduler does not rely on a default scheduler
>>    domain topology array any more.  All scheduler domains now use a
>>    common initialization function which makes the existing SD_FOO_INIT
>>    macros redundant.
> 
> Yeah, still a tad confused on what you did there, need to look in more
> detail.

I will come up with a V2 of this patch set, so there is no need to
review this bit. I just thought that we could unify the existing code in
the sched_init_numa() function with the conventional sched domain set-up.


> 
>> 3) The arch is no longer limited to the existing scheduler domain levels
>>    (SMT, MC, BOOK, CPU) but can easily define additional levels.
>>
>> 4) Prepare the mechanics to make it easier to integrate the provision of
>>    additional topology related data (e.g. energy information) to the
>>    scheduler.
> 
> Right, I was hoping you'd have a little more on that, but we'll get
> there I suppose ;-)

I still think that sd_energy will be an (optional) additional column in
xxx_topology[] (besides the cpu mask func ptr and the topology flags).
We're currently deriving something like this, via an appropriate energy
model, from the use-cases described in Morten's email-set 'Energy-aware
scheduling use-cases and scheduler issues', sent out on linux-pm on
20/12/13.
So far, the current patch set is more of a preparation and clean-up
exercise for this.

> 
>> Current limitations:
>>
>> 1) The arch interface for scheduler domain set-up is only implemented for
>>    the ARM and the x86 arch and tested on an ARM TC2 (2 clusters, one with
>>    2 Cortex A15 and the other with 3 Cortex A7) and an Intel i5-520M (2
>>    cores with 2 threads each) platform.   
>>
>> 2) For other archs it has only been compile tested for certain
>>    configurations (powerpc: chroma_defconfig, mips: ip27_defconfig,
>>    s390: defconfig, tile: tilegx_defconfig).  Obviously, linking these
>>    kernels doesn't succeed due to the missing arch interface for
>>    scheduler domain set-up implementation (undefined reference to
>>    arch_sched_domain_info).
>>
>> 3) It does not delete the arch specific SD_FOO_INIT macros for ia64,
>>    metag, s390 and tile arch.
>>
>> 4) It does not delete the arch_sd_sibling_asym_packing function which
>>    will be redundant once the arch interface for scheduler domain set-up
>>    has been implemented for powerpc arch.
>>
>> 5) There is no default set-up any more.  Each arch has to define a
>>    arch_sched_domain_info array, a circumstance which might not be
>>    desirable.
> 
> Yeah, that's sad, I think we want to keep the default thing to limit the
> amount of pointless duplication for all archs that are not special.
> 
> Also, like you point out above, breaking all archs isn't nice :-)
>

Understood. V2 patch set will have a default set-up again.


>> 6) It has to be specified what happens when an arch specifies an
>>    arch_sched_domain_info array with only a { NULL, } entry.
> 
> Crash hard on boot :-) Although I suppose since it's all compile time
> constants we could try and be smart and make the build fail somehow.
> 
> The one thing I do dislike is that you mixed SDTL_flags and SD_flags
> into a single variable. Don't do that; it's bound to collide and give
> weird results at some point, and it's not like any of these structures
> are space critical in any way, shape or form.

Understood. Will change that.

-- Dietmar
> 




* Re: [RFC PATCH 5/8] sched: introduce common topology level init function
  2013-12-20 14:04   ` Peter Zijlstra
@ 2014-01-06 18:41     ` Dietmar Eggemann
  0 siblings, 0 replies; 17+ messages in thread
From: Dietmar Eggemann @ 2014-01-06 18:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, vincent.guittot, Morten Rasmussen, Chris Redpath, linux-kernel

On 20/12/13 14:04, Peter Zijlstra wrote:
>> +/*
>> + * SD_flags allowed in topology descriptions.
>> + *
>> + * SD_SHARE_CPUPOWER      - describes SMT topologies
>> + * SD_SHARE_PKG_RESOURCES - describes shared caches
>> + * SD_NUMA                - describes NUMA topologies
>> + *
>> + * Odd one out:
>> + * SD_ASYM_PACKING        - describes SMT quirks
>> + *
>> + * SD_PREFER_SIBLING      - describes preference for sibling domain
>> + */
>> +#define TOPOLOGY_SD_FLAGS         \
>> +	(SD_SHARE_CPUPOWER |      \
>> +	 SD_SHARE_PKG_RESOURCES | \
>> +	 SD_NUMA |                \
>> +	 SD_ASYM_PACKING |        \
>> +	 SD_PREFER_SIBLING)
> 
> See, SD_PREFER_SIBLING is behavioural, exactly the kind of thing we want
> to keep out of this mask.

Understood.

Since this flag is only set for the CPU level, it will only affect the
first NUMA level, because it is sd->child that has to have this flag set.
Unfortunately, I don't have a NUMA system to test this on.

AFAICS, we get some level of packing in sd->parent when this flag is set.
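
The parent/child relationship described above can be sketched as
follows. This is a simplified model under assumptions, not the load
balancer code: struct demo_sd, demo_prefer_sibling() and the flag value
are hypothetical, and only the "look at the child's flag" step is shown.

```c
#include <assert.h>
#include <stddef.h>

#define SD_PREFER_SIBLING	0x1000	/* value assumed for illustration */

/* Hypothetical, stripped-down stand-in for struct sched_domain. */
struct demo_sd {
	struct demo_sd *parent;
	struct demo_sd *child;
	int flags;
};

/* Two-level hierarchy: the CPU level sits below the first NUMA level. */
static struct demo_sd demo_numa;	/* tentative definition, completed below */

static struct demo_sd demo_cpu = {
	.parent	= &demo_numa,
	.child	= NULL,
	.flags	= SD_PREFER_SIBLING,
};

static struct demo_sd demo_numa = {
	.parent	= NULL,
	.child	= &demo_cpu,
	.flags	= 0,
};

/*
 * A domain prefers to spread load to its siblings when its child
 * carries SD_PREFER_SIBLING; the domain's own flag is not consulted.
 */
static int demo_prefer_sibling(const struct demo_sd *sd)
{
	return sd->child && (sd->child->flags & SD_PREFER_SIBLING);
}
```

In this model the flag set on the CPU level changes the behaviour of the
NUMA level above it, while the CPU level itself, having no child, is
unaffected.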

-- Dietmar

> 




* Re: [RFC PATCH 8/8] sched: remove scheduler domain naming
  2013-12-20 14:08   ` Peter Zijlstra
@ 2014-01-06 18:41     ` Dietmar Eggemann
  2014-01-07 10:22       ` Peter Zijlstra
  0 siblings, 1 reply; 17+ messages in thread
From: Dietmar Eggemann @ 2014-01-06 18:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, vincent.guittot, Morten Rasmussen, Chris Redpath, linux-kernel

On 20/12/13 14:08, Peter Zijlstra wrote:
> On Fri, Dec 13, 2013 at 12:11:28PM +0000, dietmar.eggemann@arm.com wrote:
>> From: Dietmar Eggemann <dietmar.eggemann@arm.com>
>>
>> In case the arch is allowed to define the conventional scheduler domain
>> topology level (i.e. the one without SD_NUMA topology flag) layout, it is
>> not feasible any more for the scheduler to name these levels.  Therefore,
>> this patch gets rid of the sched_domain_topology_level structure
>> member 'name' and the corresponding SD_INIT_NAME macro.  It was only used
>> when CONFIG_SCHED_DEBUG was set anyway.
> 
> Right, so for debug purposes it might be convenient to keep it; we could
> simply put it in the topology array, something like:
> 
>  { cpu_smt_mask, SD_SHARE_CPUPOWER | SD_SHARE_PKG_RESOURCES, SD_NAME(smt) },
> 
> which would still allow us to make it go away on !debug, but does
> provide us with a nice label to print for the debug topology prints.

I will incorporate this idea in my V2 patch set.

> 
> Alternatively we could do something like:
> 
> #define SD_mask(name, flags) \
> 	{ cpu_##name##_mask, (flags), .name = #name }
> 
> to further reduce typing.

But the MC level cpu mask func ptr is called cpu_coregroup_mask.

-- Dietmar

> 




* Re: [RFC PATCH 8/8] sched: remove scheduler domain naming
  2014-01-06 18:41     ` Dietmar Eggemann
@ 2014-01-07 10:22       ` Peter Zijlstra
  2014-01-07 14:33         ` Dietmar Eggemann
  0 siblings, 1 reply; 17+ messages in thread
From: Peter Zijlstra @ 2014-01-07 10:22 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: mingo, vincent.guittot, Morten Rasmussen, Chris Redpath, linux-kernel

On Mon, Jan 06, 2014 at 06:41:25PM +0000, Dietmar Eggemann wrote:
> But the MC level cpu mask func ptr is called cpu_coregroup_mask.

Nothing a bit of sed won't cure very quickly indeed :-)


* Re: [RFC PATCH 8/8] sched: remove scheduler domain naming
  2014-01-07 10:22       ` Peter Zijlstra
@ 2014-01-07 14:33         ` Dietmar Eggemann
  0 siblings, 0 replies; 17+ messages in thread
From: Dietmar Eggemann @ 2014-01-07 14:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, vincent.guittot, Morten Rasmussen, Chris Redpath, linux-kernel

On 07/01/14 10:22, Peter Zijlstra wrote:
> On Mon, Jan 06, 2014 at 06:41:25PM +0000, Dietmar Eggemann wrote:
>> But the MC level cpu mask func ptr is called cpu_coregroup_mask.
> 
> Nothing a bit of sed won't cure very quickly indeed :-)

True, but that would mean that if two adjacent levels use the same
cpu_foo_mask function, then their debug name would have to be the same.
AFAICS, sd_parent_degenerate() allows adjacent levels to have the same
cpu mask if the flags are different. Probably not a big deal.

> 




Thread overview: 17+ messages
2013-12-13 12:11 [RFC PATCH 0/8] change scheduler domain hierarchy set-up dietmar.eggemann
2013-12-13 12:11 ` [RFC PATCH 1/8] sched: arch interface for scheduler domain setup dietmar.eggemann
2013-12-13 12:11 ` [RFC PATCH 2/8] arm: implement " dietmar.eggemann
2013-12-13 12:11 ` [RFC PATCH 3/8] x86: " dietmar.eggemann
2013-12-13 12:11 ` [RFC PATCH 4/8] sched: allocate the entire topology array dynamically dietmar.eggemann
2013-12-13 12:11 ` [RFC PATCH 5/8] sched: introduce common topology level init function dietmar.eggemann
2013-12-20 14:04   ` Peter Zijlstra
2014-01-06 18:41     ` Dietmar Eggemann
2013-12-13 12:11 ` [RFC PATCH 6/8] sched: replace for_each_sd_topology with explicit for loop dietmar.eggemann
2013-12-13 12:11 ` [RFC PATCH 7/8] sched: replace topology level init func ptr with sd_init dietmar.eggemann
2013-12-13 12:11 ` [RFC PATCH 8/8] sched: remove scheduler domain naming dietmar.eggemann
2013-12-20 14:08   ` Peter Zijlstra
2014-01-06 18:41     ` Dietmar Eggemann
2014-01-07 10:22       ` Peter Zijlstra
2014-01-07 14:33         ` Dietmar Eggemann
2013-12-20 14:00 ` [RFC PATCH 0/8] change scheduler domain hierarchy set-up Peter Zijlstra
2014-01-06 18:40   ` Dietmar Eggemann
