* [PATCH v2 0/7] rework sched_domain topology description
@ 2014-03-18 17:56 ` Vincent Guittot
  0 siblings, 0 replies; 55+ messages in thread
From: Vincent Guittot @ 2014-03-18 17:56 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel, dietmar.eggemann, preeti, tony.luck,
	fenghua.yu, schwidefsky, james.hogan, cmetcalf, benh, linux,
	linux-arm-kernel
  Cc: linaro-kernel, Vincent Guittot

This patchset was previously part of the larger tasks packing patchset [1].
I have split the latter into (at least) 3 patchsets to make things easier:
- configuration of the sched_domain topology (this patchset)
- update and consolidation of cpu_power
- tasks packing algorithm

Based on Peter Z's proposal [2][3], this patchset modifies the way the
sched_domain levels are configured, in order to let architectures add specific
levels like the current BOOK level or the proposed power-gating level for the
ARM architecture.

[1] https://lkml.org/lkml/2013/10/18/121
[2] https://lkml.org/lkml/2013/11/5/239
[3] https://lkml.org/lkml/2013/11/5/449

Changes since v1:
- move sched_domains_curr_level back under #ifdef CONFIG_NUMA
- use a function pointer to set flags instead of a plain value
- add the list of tunable flags to the commit message of patch 2
- add the SD_SHARE_POWERDOMAIN flag for powerpc's SMT level

Vincent Guittot (7):
  sched: remove unused SCHED_INIT_NODE
  sched: rework of sched_domain topology definition
  sched: s390: create a dedicated topology table
  sched: powerpc: create a dedicated topology table
  sched: add a new SD_SHARE_POWERDOMAIN for sched_domain
  sched: ARM: create a dedicated scheduler topology table
  sched: powerpc: Add SD_SHARE_POWERDOMAIN for SMT level

 arch/arm/kernel/topology.c        |  26 ++++
 arch/ia64/include/asm/topology.h  |  24 ----
 arch/metag/include/asm/topology.h |  27 -----
 arch/powerpc/kernel/smp.c         |  31 +++--
 arch/s390/include/asm/topology.h  |  13 +-
 arch/s390/kernel/topology.c       |  20 ++++
 arch/tile/include/asm/topology.h  |  33 ------
 include/linux/sched.h             |  49 +++++++-
 include/linux/topology.h          | 128 ++------------------
 kernel/sched/core.c               | 243 +++++++++++++++++++-------------------
 10 files changed, 254 insertions(+), 340 deletions(-)

-- 
1.9.0


* [PATCH v2 1/7] sched: remove unused SCHED_INIT_NODE
  2014-03-18 17:56 ` Vincent Guittot
@ 2014-03-18 17:56   ` Vincent Guittot
  -1 siblings, 0 replies; 55+ messages in thread
From: Vincent Guittot @ 2014-03-18 17:56 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel, dietmar.eggemann, preeti, tony.luck,
	fenghua.yu, schwidefsky, james.hogan, cmetcalf, benh, linux,
	linux-arm-kernel
  Cc: linaro-kernel, Vincent Guittot

Not used since the new NUMA scheduler init sequence.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 arch/metag/include/asm/topology.h | 27 ---------------------------
 1 file changed, 27 deletions(-)

diff --git a/arch/metag/include/asm/topology.h b/arch/metag/include/asm/topology.h
index 8e9c0b3..e95f874 100644
--- a/arch/metag/include/asm/topology.h
+++ b/arch/metag/include/asm/topology.h
@@ -3,33 +3,6 @@
 
 #ifdef CONFIG_NUMA
 
-/* sched_domains SD_NODE_INIT for Meta machines */
-#define SD_NODE_INIT (struct sched_domain) {		\
-	.parent			= NULL,			\
-	.child			= NULL,			\
-	.groups			= NULL,			\
-	.min_interval		= 8,			\
-	.max_interval		= 32,			\
-	.busy_factor		= 32,			\
-	.imbalance_pct		= 125,			\
-	.cache_nice_tries	= 2,			\
-	.busy_idx		= 3,			\
-	.idle_idx		= 2,			\
-	.newidle_idx		= 0,			\
-	.wake_idx		= 0,			\
-	.forkexec_idx		= 0,			\
-	.flags			= SD_LOAD_BALANCE	\
-				| SD_BALANCE_FORK	\
-				| SD_BALANCE_EXEC	\
-				| SD_BALANCE_NEWIDLE	\
-				| SD_SERIALIZE,		\
-	.last_balance		= jiffies,		\
-	.balance_interval	= 1,			\
-	.nr_balance_failed	= 0,			\
-	.max_newidle_lb_cost	= 0,			\
-	.next_decay_max_lb_cost	= jiffies,		\
-}
-
 #define cpu_to_node(cpu)	((void)(cpu), 0)
 #define parent_node(node)	((void)(node), 0)
 
-- 
1.9.0



* [PATCH v2 2/7] sched: rework of sched_domain topology definition
  2014-03-18 17:56 ` Vincent Guittot
@ 2014-03-18 17:56   ` Vincent Guittot
  -1 siblings, 0 replies; 55+ messages in thread
From: Vincent Guittot @ 2014-03-18 17:56 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel, dietmar.eggemann, preeti, tony.luck,
	fenghua.yu, schwidefsky, james.hogan, cmetcalf, benh, linux,
	linux-arm-kernel
  Cc: linaro-kernel, Vincent Guittot

We replace the old way to configure the scheduler topology with a new method
which enables a platform to declare additional levels (if needed).

We still have a default topology table definition that can be used by platforms
that don't want more levels than the SMT, MC, CPU and NUMA ones. This table can
be overridden by an arch which wants to add new levels where load balancing
makes sense, like the BOOK or power-gating levels.

For each level, we need a function pointer that returns the cpumask for each
cpu, a function pointer that returns the flags for the level, and a name. Only
flags that describe the topology can be set by an architecture. The current
topology flags are:
 SD_SHARE_CPUPOWER
 SD_SHARE_PKG_RESOURCES
 SD_NUMA
 SD_ASYM_PACKING

Each level must be a subset of the next one. The sched_domain build sequence
takes care of removing useless levels, such as those with a single CPU and
those with the same CPU span and no more relevant load-balancing information
than their child.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 arch/ia64/include/asm/topology.h |  24 ----
 arch/s390/include/asm/topology.h |   2 -
 arch/tile/include/asm/topology.h |  33 ------
 include/linux/sched.h            |  48 ++++++++
 include/linux/topology.h         | 128 +++------------------
 kernel/sched/core.c              | 235 ++++++++++++++++++++-------------------
 6 files changed, 183 insertions(+), 287 deletions(-)

diff --git a/arch/ia64/include/asm/topology.h b/arch/ia64/include/asm/topology.h
index 5cb55a1..3202aa7 100644
--- a/arch/ia64/include/asm/topology.h
+++ b/arch/ia64/include/asm/topology.h
@@ -46,30 +46,6 @@
 
 void build_cpu_to_node_map(void);
 
-#define SD_CPU_INIT (struct sched_domain) {		\
-	.parent			= NULL,			\
-	.child			= NULL,			\
-	.groups			= NULL,			\
-	.min_interval		= 1,			\
-	.max_interval		= 4,			\
-	.busy_factor		= 64,			\
-	.imbalance_pct		= 125,			\
-	.cache_nice_tries	= 2,			\
-	.busy_idx		= 2,			\
-	.idle_idx		= 1,			\
-	.newidle_idx		= 0,			\
-	.wake_idx		= 0,			\
-	.forkexec_idx		= 0,			\
-	.flags			= SD_LOAD_BALANCE	\
-				| SD_BALANCE_NEWIDLE	\
-				| SD_BALANCE_EXEC	\
-				| SD_BALANCE_FORK	\
-				| SD_WAKE_AFFINE,	\
-	.last_balance		= jiffies,		\
-	.balance_interval	= 1,			\
-	.nr_balance_failed	= 0,			\
-}
-
 #endif /* CONFIG_NUMA */
 
 #ifdef CONFIG_SMP
diff --git a/arch/s390/include/asm/topology.h b/arch/s390/include/asm/topology.h
index 05425b1..07763bd 100644
--- a/arch/s390/include/asm/topology.h
+++ b/arch/s390/include/asm/topology.h
@@ -64,8 +64,6 @@ static inline void s390_init_cpu_topology(void)
 };
 #endif
 
-#define SD_BOOK_INIT	SD_CPU_INIT
-
 #include <asm-generic/topology.h>
 
 #endif /* _ASM_S390_TOPOLOGY_H */
diff --git a/arch/tile/include/asm/topology.h b/arch/tile/include/asm/topology.h
index d15c0d8..9383118 100644
--- a/arch/tile/include/asm/topology.h
+++ b/arch/tile/include/asm/topology.h
@@ -44,39 +44,6 @@ static inline const struct cpumask *cpumask_of_node(int node)
 /* For now, use numa node -1 for global allocation. */
 #define pcibus_to_node(bus)		((void)(bus), -1)
 
-/*
- * TILE architecture has many cores integrated in one processor, so we need
- * setup bigger balance_interval for both CPU/NODE scheduling domains to
- * reduce process scheduling costs.
- */
-
-/* sched_domains SD_CPU_INIT for TILE architecture */
-#define SD_CPU_INIT (struct sched_domain) {				\
-	.min_interval		= 4,					\
-	.max_interval		= 128,					\
-	.busy_factor		= 64,					\
-	.imbalance_pct		= 125,					\
-	.cache_nice_tries	= 1,					\
-	.busy_idx		= 2,					\
-	.idle_idx		= 1,					\
-	.newidle_idx		= 0,					\
-	.wake_idx		= 0,					\
-	.forkexec_idx		= 0,					\
-									\
-	.flags			= 1*SD_LOAD_BALANCE			\
-				| 1*SD_BALANCE_NEWIDLE			\
-				| 1*SD_BALANCE_EXEC			\
-				| 1*SD_BALANCE_FORK			\
-				| 0*SD_BALANCE_WAKE			\
-				| 0*SD_WAKE_AFFINE			\
-				| 0*SD_SHARE_CPUPOWER			\
-				| 0*SD_SHARE_PKG_RESOURCES		\
-				| 0*SD_SERIALIZE			\
-				,					\
-	.last_balance		= jiffies,				\
-	.balance_interval	= 32,					\
-}
-
 /* By definition, we create nodes based on online memory. */
 #define node_has_online_mem(nid) 1
 
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 825ed83..4db592a 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -870,6 +870,20 @@ enum cpu_idle_type {
 
 extern int __weak arch_sd_sibiling_asym_packing(void);
 
+#ifdef CONFIG_SCHED_SMT
+static inline const int cpu_smt_flags(void)
+{
+	return SD_SHARE_CPUPOWER | SD_SHARE_PKG_RESOURCES;
+}
+#endif
+
+#ifdef CONFIG_SCHED_MC
+static inline const int cpu_core_flags(void)
+{
+	return SD_SHARE_PKG_RESOURCES;
+}
+#endif
+
 struct sched_domain_attr {
 	int relax_domain_level;
 };
@@ -976,6 +990,38 @@ void free_sched_domains(cpumask_var_t doms[], unsigned int ndoms);
 
 bool cpus_share_cache(int this_cpu, int that_cpu);
 
+typedef const struct cpumask *(*sched_domain_mask_f)(int cpu);
+typedef const int (*sched_domain_flags_f)(void);
+
+#define SDTL_OVERLAP	0x01
+
+struct sd_data {
+	struct sched_domain **__percpu sd;
+	struct sched_group **__percpu sg;
+	struct sched_group_power **__percpu sgp;
+};
+
+struct sched_domain_topology_level {
+	sched_domain_mask_f mask;
+	sched_domain_flags_f sd_flags;
+	int		    flags;
+	int		    numa_level;
+	struct sd_data      data;
+#ifdef CONFIG_SCHED_DEBUG
+	char                *name;
+#endif
+};
+
+extern struct sched_domain_topology_level *sched_domain_topology;
+
+extern void set_sched_topology(struct sched_domain_topology_level *tl);
+
+#ifdef CONFIG_SCHED_DEBUG
+# define SD_INIT_NAME(type)		.name = #type
+#else
+# define SD_INIT_NAME(type)
+#endif
+
 #else /* CONFIG_SMP */
 
 struct sched_domain_attr;
@@ -991,6 +1037,8 @@ static inline bool cpus_share_cache(int this_cpu, int that_cpu)
 	return true;
 }
 
+static inline void set_sched_topology(struct sched_domain_topology_level *tl) { }
+
 #endif	/* !CONFIG_SMP */
 
 
diff --git a/include/linux/topology.h b/include/linux/topology.h
index 12ae6ce..3a9db05 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -66,121 +66,6 @@ int arch_update_cpu_topology(void);
 #define PENALTY_FOR_NODE_WITH_CPUS	(1)
 #endif
 
-/*
- * Below are the 3 major initializers used in building sched_domains:
- * SD_SIBLING_INIT, for SMT domains
- * SD_CPU_INIT, for SMP domains
- *
- * Any architecture that cares to do any tuning to these values should do so
- * by defining their own arch-specific initializer in include/asm/topology.h.
- * A definition there will automagically override these default initializers
- * and allow arch-specific performance tuning of sched_domains.
- * (Only non-zero and non-null fields need be specified.)
- */
-
-#ifdef CONFIG_SCHED_SMT
-/* MCD - Do we really need this?  It is always on if CONFIG_SCHED_SMT is,
- * so can't we drop this in favor of CONFIG_SCHED_SMT?
- */
-#define ARCH_HAS_SCHED_WAKE_IDLE
-/* Common values for SMT siblings */
-#ifndef SD_SIBLING_INIT
-#define SD_SIBLING_INIT (struct sched_domain) {				\
-	.min_interval		= 1,					\
-	.max_interval		= 2,					\
-	.busy_factor		= 64,					\
-	.imbalance_pct		= 110,					\
-									\
-	.flags			= 1*SD_LOAD_BALANCE			\
-				| 1*SD_BALANCE_NEWIDLE			\
-				| 1*SD_BALANCE_EXEC			\
-				| 1*SD_BALANCE_FORK			\
-				| 0*SD_BALANCE_WAKE			\
-				| 1*SD_WAKE_AFFINE			\
-				| 1*SD_SHARE_CPUPOWER			\
-				| 1*SD_SHARE_PKG_RESOURCES		\
-				| 0*SD_SERIALIZE			\
-				| 0*SD_PREFER_SIBLING			\
-				| arch_sd_sibling_asym_packing()	\
-				,					\
-	.last_balance		= jiffies,				\
-	.balance_interval	= 1,					\
-	.smt_gain		= 1178,	/* 15% */			\
-	.max_newidle_lb_cost	= 0,					\
-	.next_decay_max_lb_cost	= jiffies,				\
-}
-#endif
-#endif /* CONFIG_SCHED_SMT */
-
-#ifdef CONFIG_SCHED_MC
-/* Common values for MC siblings. for now mostly derived from SD_CPU_INIT */
-#ifndef SD_MC_INIT
-#define SD_MC_INIT (struct sched_domain) {				\
-	.min_interval		= 1,					\
-	.max_interval		= 4,					\
-	.busy_factor		= 64,					\
-	.imbalance_pct		= 125,					\
-	.cache_nice_tries	= 1,					\
-	.busy_idx		= 2,					\
-	.wake_idx		= 0,					\
-	.forkexec_idx		= 0,					\
-									\
-	.flags			= 1*SD_LOAD_BALANCE			\
-				| 1*SD_BALANCE_NEWIDLE			\
-				| 1*SD_BALANCE_EXEC			\
-				| 1*SD_BALANCE_FORK			\
-				| 0*SD_BALANCE_WAKE			\
-				| 1*SD_WAKE_AFFINE			\
-				| 0*SD_SHARE_CPUPOWER			\
-				| 1*SD_SHARE_PKG_RESOURCES		\
-				| 0*SD_SERIALIZE			\
-				,					\
-	.last_balance		= jiffies,				\
-	.balance_interval	= 1,					\
-	.max_newidle_lb_cost	= 0,					\
-	.next_decay_max_lb_cost	= jiffies,				\
-}
-#endif
-#endif /* CONFIG_SCHED_MC */
-
-/* Common values for CPUs */
-#ifndef SD_CPU_INIT
-#define SD_CPU_INIT (struct sched_domain) {				\
-	.min_interval		= 1,					\
-	.max_interval		= 4,					\
-	.busy_factor		= 64,					\
-	.imbalance_pct		= 125,					\
-	.cache_nice_tries	= 1,					\
-	.busy_idx		= 2,					\
-	.idle_idx		= 1,					\
-	.newidle_idx		= 0,					\
-	.wake_idx		= 0,					\
-	.forkexec_idx		= 0,					\
-									\
-	.flags			= 1*SD_LOAD_BALANCE			\
-				| 1*SD_BALANCE_NEWIDLE			\
-				| 1*SD_BALANCE_EXEC			\
-				| 1*SD_BALANCE_FORK			\
-				| 0*SD_BALANCE_WAKE			\
-				| 1*SD_WAKE_AFFINE			\
-				| 0*SD_SHARE_CPUPOWER			\
-				| 0*SD_SHARE_PKG_RESOURCES		\
-				| 0*SD_SERIALIZE			\
-				| 1*SD_PREFER_SIBLING			\
-				,					\
-	.last_balance		= jiffies,				\
-	.balance_interval	= 1,					\
-	.max_newidle_lb_cost	= 0,					\
-	.next_decay_max_lb_cost	= jiffies,				\
-}
-#endif
-
-#ifdef CONFIG_SCHED_BOOK
-#ifndef SD_BOOK_INIT
-#error Please define an appropriate SD_BOOK_INIT in include/asm/topology.h!!!
-#endif
-#endif /* CONFIG_SCHED_BOOK */
-
 #ifdef CONFIG_USE_PERCPU_NUMA_NODE_ID
 DECLARE_PER_CPU(int, numa_node);
 
@@ -295,4 +180,17 @@ static inline int cpu_to_mem(int cpu)
 #define topology_core_cpumask(cpu)		cpumask_of(cpu)
 #endif
 
+#ifdef CONFIG_SCHED_SMT
+static inline const struct cpumask *cpu_smt_mask(int cpu)
+{
+	return topology_thread_cpumask(cpu);
+}
+#endif
+
+static inline const struct cpumask *cpu_cpu_mask(int cpu)
+{
+	return cpumask_of_node(cpu_to_node(cpu));
+}
+
+
 #endif /* _LINUX_TOPOLOGY_H */
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ae365aa..3397bcb 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5603,17 +5603,6 @@ static int __init isolated_cpu_setup(char *str)
 
 __setup("isolcpus=", isolated_cpu_setup);
 
-static const struct cpumask *cpu_cpu_mask(int cpu)
-{
-	return cpumask_of_node(cpu_to_node(cpu));
-}
-
-struct sd_data {
-	struct sched_domain **__percpu sd;
-	struct sched_group **__percpu sg;
-	struct sched_group_power **__percpu sgp;
-};
-
 struct s_data {
 	struct sched_domain ** __percpu sd;
 	struct root_domain	*rd;
@@ -5626,21 +5615,6 @@ enum s_alloc {
 	sa_none,
 };
 
-struct sched_domain_topology_level;
-
-typedef struct sched_domain *(*sched_domain_init_f)(struct sched_domain_topology_level *tl, int cpu);
-typedef const struct cpumask *(*sched_domain_mask_f)(int cpu);
-
-#define SDTL_OVERLAP	0x01
-
-struct sched_domain_topology_level {
-	sched_domain_init_f init;
-	sched_domain_mask_f mask;
-	int		    flags;
-	int		    numa_level;
-	struct sd_data      data;
-};
-
 /*
  * Build an iteration mask that can exclude certain CPUs from the upwards
  * domain traversal.
@@ -5869,34 +5843,6 @@ int __weak arch_sd_sibling_asym_packing(void)
  * Non-inlined to reduce accumulated stack pressure in build_sched_domains()
  */
 
-#ifdef CONFIG_SCHED_DEBUG
-# define SD_INIT_NAME(sd, type)		sd->name = #type
-#else
-# define SD_INIT_NAME(sd, type)		do { } while (0)
-#endif
-
-#define SD_INIT_FUNC(type)						\
-static noinline struct sched_domain *					\
-sd_init_##type(struct sched_domain_topology_level *tl, int cpu) 	\
-{									\
-	struct sched_domain *sd = *per_cpu_ptr(tl->data.sd, cpu);	\
-	*sd = SD_##type##_INIT;						\
-	SD_INIT_NAME(sd, type);						\
-	sd->private = &tl->data;					\
-	return sd;							\
-}
-
-SD_INIT_FUNC(CPU)
-#ifdef CONFIG_SCHED_SMT
- SD_INIT_FUNC(SIBLING)
-#endif
-#ifdef CONFIG_SCHED_MC
- SD_INIT_FUNC(MC)
-#endif
-#ifdef CONFIG_SCHED_BOOK
- SD_INIT_FUNC(BOOK)
-#endif
-
 static int default_relax_domain_level = -1;
 int sched_domain_level_max;
 
@@ -5984,97 +5930,156 @@ static void claim_allocations(int cpu, struct sched_domain *sd)
 		*per_cpu_ptr(sdd->sgp, cpu) = NULL;
 }
 
-#ifdef CONFIG_SCHED_SMT
-static const struct cpumask *cpu_smt_mask(int cpu)
-{
-	return topology_thread_cpumask(cpu);
-}
-#endif
-
-/*
- * Topology list, bottom-up.
- */
-static struct sched_domain_topology_level default_topology[] = {
-#ifdef CONFIG_SCHED_SMT
-	{ sd_init_SIBLING, cpu_smt_mask, },
-#endif
-#ifdef CONFIG_SCHED_MC
-	{ sd_init_MC, cpu_coregroup_mask, },
-#endif
-#ifdef CONFIG_SCHED_BOOK
-	{ sd_init_BOOK, cpu_book_mask, },
-#endif
-	{ sd_init_CPU, cpu_cpu_mask, },
-	{ NULL, },
-};
-
-static struct sched_domain_topology_level *sched_domain_topology = default_topology;
-
-#define for_each_sd_topology(tl)			\
-	for (tl = sched_domain_topology; tl->init; tl++)
-
 #ifdef CONFIG_NUMA
-
 static int sched_domains_numa_levels;
 static int *sched_domains_numa_distance;
 static struct cpumask ***sched_domains_numa_masks;
 static int sched_domains_curr_level;
+#endif
 
-static inline int sd_local_flags(int level)
-{
-	if (sched_domains_numa_distance[level] > RECLAIM_DISTANCE)
-		return 0;
-
-	return SD_BALANCE_EXEC | SD_BALANCE_FORK | SD_WAKE_AFFINE;
-}
+/*
+ * SD_flags allowed in topology descriptions.
+ *
+ * SD_SHARE_CPUPOWER      - describes SMT topologies
+ * SD_SHARE_PKG_RESOURCES - describes shared caches
+ * SD_NUMA                - describes NUMA topologies
+ *
+ * Odd one out:
+ * SD_ASYM_PACKING        - describes SMT quirks
+ */
+#define TOPOLOGY_SD_FLAGS		\
+	(SD_SHARE_CPUPOWER |		\
+	 SD_SHARE_PKG_RESOURCES |	\
+	 SD_NUMA |			\
+	 SD_ASYM_PACKING)
 
 static struct sched_domain *
-sd_numa_init(struct sched_domain_topology_level *tl, int cpu)
+sd_init(struct sched_domain_topology_level *tl, int cpu)
 {
 	struct sched_domain *sd = *per_cpu_ptr(tl->data.sd, cpu);
-	int level = tl->numa_level;
-	int sd_weight = cpumask_weight(
-			sched_domains_numa_masks[level][cpu_to_node(cpu)]);
+	int sd_weight, sd_flags = 0;
+
+#ifdef CONFIG_NUMA
+	/*
+	 * Ugly hack to pass state to sd_numa_mask()...
+	 */
+	sched_domains_curr_level = tl->numa_level;
+#endif
+
+	sd_weight = cpumask_weight(tl->mask(cpu));
+
+	if (tl->sd_flags)
+		sd_flags = (*tl->sd_flags)();
+	if (WARN_ONCE(sd_flags & ~TOPOLOGY_SD_FLAGS,
+			"wrong sd_flags in topology description\n"))
+		sd_flags &= ~TOPOLOGY_SD_FLAGS;
 
 	*sd = (struct sched_domain){
 		.min_interval		= sd_weight,
 		.max_interval		= 2*sd_weight,
 		.busy_factor		= 32,
 		.imbalance_pct		= 125,
-		.cache_nice_tries	= 2,
-		.busy_idx		= 3,
-		.idle_idx		= 2,
+
+		.cache_nice_tries	= 0,
+		.busy_idx		= 0,
+		.idle_idx		= 0,
 		.newidle_idx		= 0,
 		.wake_idx		= 0,
 		.forkexec_idx		= 0,
 
 		.flags			= 1*SD_LOAD_BALANCE
 					| 1*SD_BALANCE_NEWIDLE
-					| 0*SD_BALANCE_EXEC
-					| 0*SD_BALANCE_FORK
+					| 1*SD_BALANCE_EXEC
+					| 1*SD_BALANCE_FORK
 					| 0*SD_BALANCE_WAKE
-					| 0*SD_WAKE_AFFINE
+					| 1*SD_WAKE_AFFINE
 					| 0*SD_SHARE_CPUPOWER
 					| 0*SD_SHARE_PKG_RESOURCES
-					| 1*SD_SERIALIZE
+					| 0*SD_SERIALIZE
 					| 0*SD_PREFER_SIBLING
-					| 1*SD_NUMA
-					| sd_local_flags(level)
+					| 0*SD_NUMA
+					| sd_flags
 					,
+
 		.last_balance		= jiffies,
 		.balance_interval	= sd_weight,
+		.smt_gain		= 0,
+		.max_newidle_lb_cost	= 0,
+		.next_decay_max_lb_cost	= jiffies,
+#ifdef CONFIG_SCHED_DEBUG
+		.name			= tl->name,
+#endif
 	};
-	SD_INIT_NAME(sd, NUMA);
-	sd->private = &tl->data;
 
 	/*
-	 * Ugly hack to pass state to sd_numa_mask()...
+	 * Convert topological properties into behaviour.
 	 */
-	sched_domains_curr_level = tl->numa_level;
+
+	if (sd->flags & SD_SHARE_CPUPOWER) {
+		sd->imbalance_pct = 110;
+		sd->smt_gain = 1178; /* ~15% */
+		sd->flags |= arch_sd_sibling_asym_packing();
+
+	} else if (sd->flags & SD_SHARE_PKG_RESOURCES) {
+		sd->imbalance_pct = 117;
+		sd->cache_nice_tries = 1;
+		sd->busy_idx = 2;
+
+#ifdef CONFIG_NUMA
+	} else if (sd->flags & SD_NUMA) {
+		sd->cache_nice_tries = 2;
+		sd->busy_idx = 3;
+		sd->idle_idx = 2;
+
+		sd->flags |= SD_SERIALIZE;
+		if (sched_domains_numa_distance[tl->numa_level] > RECLAIM_DISTANCE) {
+			sd->flags &= ~(SD_BALANCE_EXEC |
+				       SD_BALANCE_FORK |
+				       SD_WAKE_AFFINE);
+		}
+
+#endif
+	} else {
+		sd->flags |= SD_PREFER_SIBLING;
+		sd->cache_nice_tries = 1;
+		sd->busy_idx = 2;
+		sd->idle_idx = 1;
+	}
+
+	sd->private = &tl->data;
 
 	return sd;
 }
 
+/*
+ * Topology list, bottom-up.
+ */
+static struct sched_domain_topology_level default_topology[] = {
+#ifdef CONFIG_SCHED_SMT
+	{ cpu_smt_mask, cpu_smt_flags, SD_INIT_NAME(SMT) },
+#endif
+#ifdef CONFIG_SCHED_MC
+	{ cpu_coregroup_mask, cpu_core_flags, SD_INIT_NAME(MC) },
+#endif
+#ifdef CONFIG_SCHED_BOOK
+	{ cpu_book_mask, SD_INIT_NAME(BOOK) },
+#endif
+	{ cpu_cpu_mask, SD_INIT_NAME(DIE) },
+	{ NULL, },
+};
+
+struct sched_domain_topology_level *sched_domain_topology = default_topology;
+
+#define for_each_sd_topology(tl)			\
+	for (tl = sched_domain_topology; tl->mask; tl++)
+
+void set_sched_topology(struct sched_domain_topology_level *tl)
+{
+	sched_domain_topology = tl;
+}
+
+#ifdef CONFIG_NUMA
+
 static const struct cpumask *sd_numa_mask(int cpu)
 {
 	return sched_domains_numa_masks[sched_domains_curr_level][cpu_to_node(cpu)];
@@ -6218,7 +6223,10 @@ static void sched_init_numa(void)
 		}
 	}
 
-	tl = kzalloc((ARRAY_SIZE(default_topology) + level) *
+	/* Compute default topology size */
+	for (i = 0; sched_domain_topology[i].mask; i++);
+
+	tl = kzalloc((i + level) *
 			sizeof(struct sched_domain_topology_level), GFP_KERNEL);
 	if (!tl)
 		return;
@@ -6226,18 +6234,19 @@ static void sched_init_numa(void)
 	/*
 	 * Copy the default topology bits..
 	 */
-	for (i = 0; default_topology[i].init; i++)
-		tl[i] = default_topology[i];
+	for (i = 0; sched_domain_topology[i].mask; i++)
+		tl[i] = sched_domain_topology[i];
 
 	/*
 	 * .. and append 'j' levels of NUMA goodness.
 	 */
 	for (j = 0; j < level; i++, j++) {
 		tl[i] = (struct sched_domain_topology_level){
-			.init = sd_numa_init,
 			.mask = sd_numa_mask,
+			.sd_flags = SD_NUMA,
 			.flags = SDTL_OVERLAP,
 			.numa_level = j,
+			SD_INIT_NAME(NUMA)
 		};
 	}
 
@@ -6395,7 +6404,7 @@ struct sched_domain *build_sched_domain(struct sched_domain_topology_level *tl,
 		const struct cpumask *cpu_map, struct sched_domain_attr *attr,
 		struct sched_domain *child, int cpu)
 {
-	struct sched_domain *sd = tl->init(tl, cpu);
+	struct sched_domain *sd = sd_init(tl, cpu);
 	if (!sd)
 		return child;
 
-- 
1.9.0


- */
-#define ARCH_HAS_SCHED_WAKE_IDLE
-/* Common values for SMT siblings */
-#ifndef SD_SIBLING_INIT
-#define SD_SIBLING_INIT (struct sched_domain) {				\
-	.min_interval		= 1,					\
-	.max_interval		= 2,					\
-	.busy_factor		= 64,					\
-	.imbalance_pct		= 110,					\
-									\
-	.flags			= 1*SD_LOAD_BALANCE			\
-				| 1*SD_BALANCE_NEWIDLE			\
-				| 1*SD_BALANCE_EXEC			\
-				| 1*SD_BALANCE_FORK			\
-				| 0*SD_BALANCE_WAKE			\
-				| 1*SD_WAKE_AFFINE			\
-				| 1*SD_SHARE_CPUPOWER			\
-				| 1*SD_SHARE_PKG_RESOURCES		\
-				| 0*SD_SERIALIZE			\
-				| 0*SD_PREFER_SIBLING			\
-				| arch_sd_sibling_asym_packing()	\
-				,					\
-	.last_balance		= jiffies,				\
-	.balance_interval	= 1,					\
-	.smt_gain		= 1178,	/* 15% */			\
-	.max_newidle_lb_cost	= 0,					\
-	.next_decay_max_lb_cost	= jiffies,				\
-}
-#endif
-#endif /* CONFIG_SCHED_SMT */
-
-#ifdef CONFIG_SCHED_MC
-/* Common values for MC siblings. for now mostly derived from SD_CPU_INIT */
-#ifndef SD_MC_INIT
-#define SD_MC_INIT (struct sched_domain) {				\
-	.min_interval		= 1,					\
-	.max_interval		= 4,					\
-	.busy_factor		= 64,					\
-	.imbalance_pct		= 125,					\
-	.cache_nice_tries	= 1,					\
-	.busy_idx		= 2,					\
-	.wake_idx		= 0,					\
-	.forkexec_idx		= 0,					\
-									\
-	.flags			= 1*SD_LOAD_BALANCE			\
-				| 1*SD_BALANCE_NEWIDLE			\
-				| 1*SD_BALANCE_EXEC			\
-				| 1*SD_BALANCE_FORK			\
-				| 0*SD_BALANCE_WAKE			\
-				| 1*SD_WAKE_AFFINE			\
-				| 0*SD_SHARE_CPUPOWER			\
-				| 1*SD_SHARE_PKG_RESOURCES		\
-				| 0*SD_SERIALIZE			\
-				,					\
-	.last_balance		= jiffies,				\
-	.balance_interval	= 1,					\
-	.max_newidle_lb_cost	= 0,					\
-	.next_decay_max_lb_cost	= jiffies,				\
-}
-#endif
-#endif /* CONFIG_SCHED_MC */
-
-/* Common values for CPUs */
-#ifndef SD_CPU_INIT
-#define SD_CPU_INIT (struct sched_domain) {				\
-	.min_interval		= 1,					\
-	.max_interval		= 4,					\
-	.busy_factor		= 64,					\
-	.imbalance_pct		= 125,					\
-	.cache_nice_tries	= 1,					\
-	.busy_idx		= 2,					\
-	.idle_idx		= 1,					\
-	.newidle_idx		= 0,					\
-	.wake_idx		= 0,					\
-	.forkexec_idx		= 0,					\
-									\
-	.flags			= 1*SD_LOAD_BALANCE			\
-				| 1*SD_BALANCE_NEWIDLE			\
-				| 1*SD_BALANCE_EXEC			\
-				| 1*SD_BALANCE_FORK			\
-				| 0*SD_BALANCE_WAKE			\
-				| 1*SD_WAKE_AFFINE			\
-				| 0*SD_SHARE_CPUPOWER			\
-				| 0*SD_SHARE_PKG_RESOURCES		\
-				| 0*SD_SERIALIZE			\
-				| 1*SD_PREFER_SIBLING			\
-				,					\
-	.last_balance		= jiffies,				\
-	.balance_interval	= 1,					\
-	.max_newidle_lb_cost	= 0,					\
-	.next_decay_max_lb_cost	= jiffies,				\
-}
-#endif
-
-#ifdef CONFIG_SCHED_BOOK
-#ifndef SD_BOOK_INIT
-#error Please define an appropriate SD_BOOK_INIT in include/asm/topology.h!!!
-#endif
-#endif /* CONFIG_SCHED_BOOK */
-
 #ifdef CONFIG_USE_PERCPU_NUMA_NODE_ID
 DECLARE_PER_CPU(int, numa_node);
 
@@ -295,4 +180,17 @@ static inline int cpu_to_mem(int cpu)
 #define topology_core_cpumask(cpu)		cpumask_of(cpu)
 #endif
 
+#ifdef CONFIG_SCHED_SMT
+static inline const struct cpumask *cpu_smt_mask(int cpu)
+{
+	return topology_thread_cpumask(cpu);
+}
+#endif
+
+static inline const struct cpumask *cpu_cpu_mask(int cpu)
+{
+	return cpumask_of_node(cpu_to_node(cpu));
+}
+
+
 #endif /* _LINUX_TOPOLOGY_H */
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ae365aa..3397bcb 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5603,17 +5603,6 @@ static int __init isolated_cpu_setup(char *str)
 
 __setup("isolcpus=", isolated_cpu_setup);
 
-static const struct cpumask *cpu_cpu_mask(int cpu)
-{
-	return cpumask_of_node(cpu_to_node(cpu));
-}
-
-struct sd_data {
-	struct sched_domain **__percpu sd;
-	struct sched_group **__percpu sg;
-	struct sched_group_power **__percpu sgp;
-};
-
 struct s_data {
 	struct sched_domain ** __percpu sd;
 	struct root_domain	*rd;
@@ -5626,21 +5615,6 @@ enum s_alloc {
 	sa_none,
 };
 
-struct sched_domain_topology_level;
-
-typedef struct sched_domain *(*sched_domain_init_f)(struct sched_domain_topology_level *tl, int cpu);
-typedef const struct cpumask *(*sched_domain_mask_f)(int cpu);
-
-#define SDTL_OVERLAP	0x01
-
-struct sched_domain_topology_level {
-	sched_domain_init_f init;
-	sched_domain_mask_f mask;
-	int		    flags;
-	int		    numa_level;
-	struct sd_data      data;
-};
-
 /*
  * Build an iteration mask that can exclude certain CPUs from the upwards
  * domain traversal.
@@ -5869,34 +5843,6 @@ int __weak arch_sd_sibling_asym_packing(void)
  * Non-inlined to reduce accumulated stack pressure in build_sched_domains()
  */
 
-#ifdef CONFIG_SCHED_DEBUG
-# define SD_INIT_NAME(sd, type)		sd->name = #type
-#else
-# define SD_INIT_NAME(sd, type)		do { } while (0)
-#endif
-
-#define SD_INIT_FUNC(type)						\
-static noinline struct sched_domain *					\
-sd_init_##type(struct sched_domain_topology_level *tl, int cpu) 	\
-{									\
-	struct sched_domain *sd = *per_cpu_ptr(tl->data.sd, cpu);	\
-	*sd = SD_##type##_INIT;						\
-	SD_INIT_NAME(sd, type);						\
-	sd->private = &tl->data;					\
-	return sd;							\
-}
-
-SD_INIT_FUNC(CPU)
-#ifdef CONFIG_SCHED_SMT
- SD_INIT_FUNC(SIBLING)
-#endif
-#ifdef CONFIG_SCHED_MC
- SD_INIT_FUNC(MC)
-#endif
-#ifdef CONFIG_SCHED_BOOK
- SD_INIT_FUNC(BOOK)
-#endif
-
 static int default_relax_domain_level = -1;
 int sched_domain_level_max;
 
@@ -5984,97 +5930,156 @@ static void claim_allocations(int cpu, struct sched_domain *sd)
 		*per_cpu_ptr(sdd->sgp, cpu) = NULL;
 }
 
-#ifdef CONFIG_SCHED_SMT
-static const struct cpumask *cpu_smt_mask(int cpu)
-{
-	return topology_thread_cpumask(cpu);
-}
-#endif
-
-/*
- * Topology list, bottom-up.
- */
-static struct sched_domain_topology_level default_topology[] = {
-#ifdef CONFIG_SCHED_SMT
-	{ sd_init_SIBLING, cpu_smt_mask, },
-#endif
-#ifdef CONFIG_SCHED_MC
-	{ sd_init_MC, cpu_coregroup_mask, },
-#endif
-#ifdef CONFIG_SCHED_BOOK
-	{ sd_init_BOOK, cpu_book_mask, },
-#endif
-	{ sd_init_CPU, cpu_cpu_mask, },
-	{ NULL, },
-};
-
-static struct sched_domain_topology_level *sched_domain_topology = default_topology;
-
-#define for_each_sd_topology(tl)			\
-	for (tl = sched_domain_topology; tl->init; tl++)
-
 #ifdef CONFIG_NUMA
-
 static int sched_domains_numa_levels;
 static int *sched_domains_numa_distance;
 static struct cpumask ***sched_domains_numa_masks;
 static int sched_domains_curr_level;
+#endif
 
-static inline int sd_local_flags(int level)
-{
-	if (sched_domains_numa_distance[level] > RECLAIM_DISTANCE)
-		return 0;
-
-	return SD_BALANCE_EXEC | SD_BALANCE_FORK | SD_WAKE_AFFINE;
-}
+/*
+ * SD_flags allowed in topology descriptions.
+ *
+ * SD_SHARE_CPUPOWER      - describes SMT topologies
+ * SD_SHARE_PKG_RESOURCES - describes shared caches
+ * SD_NUMA                - describes NUMA topologies
+ *
+ * Odd one out:
+ * SD_ASYM_PACKING        - describes SMT quirks
+ */
+#define TOPOLOGY_SD_FLAGS		\
+	(SD_SHARE_CPUPOWER |		\
+	 SD_SHARE_PKG_RESOURCES |	\
+	 SD_NUMA |			\
+	 SD_ASYM_PACKING)
 
 static struct sched_domain *
-sd_numa_init(struct sched_domain_topology_level *tl, int cpu)
+sd_init(struct sched_domain_topology_level *tl, int cpu)
 {
 	struct sched_domain *sd = *per_cpu_ptr(tl->data.sd, cpu);
-	int level = tl->numa_level;
-	int sd_weight = cpumask_weight(
-			sched_domains_numa_masks[level][cpu_to_node(cpu)]);
+	int sd_weight, sd_flags = 0;
+
+#ifdef CONFIG_NUMA
+	/*
+	 * Ugly hack to pass state to sd_numa_mask()...
+	 */
+	sched_domains_curr_level = tl->numa_level;
+#endif
+
+	sd_weight = cpumask_weight(tl->mask(cpu));
+
+	if (tl->sd_flags)
+		sd_flags = (*tl->sd_flags)();
+	if (WARN_ONCE(sd_flags & ~TOPOLOGY_SD_FLAGS,
+			"wrong sd_flags in topology description\n"))
+		sd_flags &= ~TOPOLOGY_SD_FLAGS;
 
 	*sd = (struct sched_domain){
 		.min_interval		= sd_weight,
 		.max_interval		= 2*sd_weight,
 		.busy_factor		= 32,
 		.imbalance_pct		= 125,
-		.cache_nice_tries	= 2,
-		.busy_idx		= 3,
-		.idle_idx		= 2,
+
+		.cache_nice_tries	= 0,
+		.busy_idx		= 0,
+		.idle_idx		= 0,
 		.newidle_idx		= 0,
 		.wake_idx		= 0,
 		.forkexec_idx		= 0,
 
 		.flags			= 1*SD_LOAD_BALANCE
 					| 1*SD_BALANCE_NEWIDLE
-					| 0*SD_BALANCE_EXEC
-					| 0*SD_BALANCE_FORK
+					| 1*SD_BALANCE_EXEC
+					| 1*SD_BALANCE_FORK
 					| 0*SD_BALANCE_WAKE
-					| 0*SD_WAKE_AFFINE
+					| 1*SD_WAKE_AFFINE
 					| 0*SD_SHARE_CPUPOWER
 					| 0*SD_SHARE_PKG_RESOURCES
-					| 1*SD_SERIALIZE
+					| 0*SD_SERIALIZE
 					| 0*SD_PREFER_SIBLING
-					| 1*SD_NUMA
-					| sd_local_flags(level)
+					| 0*SD_NUMA
+					| sd_flags
 					,
+
 		.last_balance		= jiffies,
 		.balance_interval	= sd_weight,
+		.smt_gain		= 0,
+		.max_newidle_lb_cost	= 0,
+		.next_decay_max_lb_cost	= jiffies,
+#ifdef CONFIG_SCHED_DEBUG
+		.name			= tl->name,
+#endif
 	};
-	SD_INIT_NAME(sd, NUMA);
-	sd->private = &tl->data;
 
 	/*
-	 * Ugly hack to pass state to sd_numa_mask()...
+	 * Convert topological properties into behaviour.
 	 */
-	sched_domains_curr_level = tl->numa_level;
+
+	if (sd->flags & SD_SHARE_CPUPOWER) {
+		sd->imbalance_pct = 110;
+		sd->smt_gain = 1178; /* ~15% */
+		sd->flags |= arch_sd_sibling_asym_packing();
+
+	} else if (sd->flags & SD_SHARE_PKG_RESOURCES) {
+		sd->imbalance_pct = 117;
+		sd->cache_nice_tries = 1;
+		sd->busy_idx = 2;
+
+#ifdef CONFIG_NUMA
+	} else if (sd->flags & SD_NUMA) {
+		sd->cache_nice_tries = 2;
+		sd->busy_idx = 3;
+		sd->idle_idx = 2;
+
+		sd->flags |= SD_SERIALIZE;
+		if (sched_domains_numa_distance[tl->numa_level] > RECLAIM_DISTANCE) {
+			sd->flags &= ~(SD_BALANCE_EXEC |
+				       SD_BALANCE_FORK |
+				       SD_WAKE_AFFINE);
+		}
+
+#endif
+	} else {
+		sd->flags |= SD_PREFER_SIBLING;
+		sd->cache_nice_tries = 1;
+		sd->busy_idx = 2;
+		sd->idle_idx = 1;
+	}
+
+	sd->private = &tl->data;
 
 	return sd;
 }
 
+/*
+ * Topology list, bottom-up.
+ */
+static struct sched_domain_topology_level default_topology[] = {
+#ifdef CONFIG_SCHED_SMT
+	{ cpu_smt_mask, cpu_smt_flags, SD_INIT_NAME(SMT) },
+#endif
+#ifdef CONFIG_SCHED_MC
+	{ cpu_coregroup_mask, cpu_core_flags, SD_INIT_NAME(MC) },
+#endif
+#ifdef CONFIG_SCHED_BOOK
+	{ cpu_book_mask, SD_INIT_NAME(BOOK) },
+#endif
+	{ cpu_cpu_mask, SD_INIT_NAME(DIE) },
+	{ NULL, },
+};
+
+struct sched_domain_topology_level *sched_domain_topology = default_topology;
+
+#define for_each_sd_topology(tl)			\
+	for (tl = sched_domain_topology; tl->mask; tl++)
+
+void set_sched_topology(struct sched_domain_topology_level *tl)
+{
+	sched_domain_topology = tl;
+}
+
+#ifdef CONFIG_NUMA
+
 static const struct cpumask *sd_numa_mask(int cpu)
 {
 	return sched_domains_numa_masks[sched_domains_curr_level][cpu_to_node(cpu)];
@@ -6218,7 +6223,10 @@ static void sched_init_numa(void)
 		}
 	}
 
-	tl = kzalloc((ARRAY_SIZE(default_topology) + level) *
+	/* Compute default topology size */
+	for (i = 0; sched_domain_topology[i].mask; i++);
+
+	tl = kzalloc((i + level) *
 			sizeof(struct sched_domain_topology_level), GFP_KERNEL);
 	if (!tl)
 		return;
@@ -6226,18 +6234,19 @@ static void sched_init_numa(void)
 	/*
 	 * Copy the default topology bits..
 	 */
-	for (i = 0; default_topology[i].init; i++)
-		tl[i] = default_topology[i];
+	for (i = 0; sched_domain_topology[i].mask; i++)
+		tl[i] = sched_domain_topology[i];
 
 	/*
 	 * .. and append 'j' levels of NUMA goodness.
 	 */
 	for (j = 0; j < level; i++, j++) {
 		tl[i] = (struct sched_domain_topology_level){
-			.init = sd_numa_init,
 			.mask = sd_numa_mask,
+			.sd_flags = SD_NUMA,
 			.flags = SDTL_OVERLAP,
 			.numa_level = j,
+			SD_INIT_NAME(NUMA)
 		};
 	}
 
@@ -6395,7 +6404,7 @@ struct sched_domain *build_sched_domain(struct sched_domain_topology_level *tl,
 		const struct cpumask *cpu_map, struct sched_domain_attr *attr,
 		struct sched_domain *child, int cpu)
 {
-	struct sched_domain *sd = tl->init(tl, cpu);
+	struct sched_domain *sd = sd_init(tl, cpu);
 	if (!sd)
 		return child;
 
-- 
1.9.0

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v2 3/7] sched: s390: create a dedicated topology table
  2014-03-18 17:56 ` Vincent Guittot
@ 2014-03-18 17:56   ` Vincent Guittot
  -1 siblings, 0 replies; 55+ messages in thread
From: Vincent Guittot @ 2014-03-18 17:56 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel, dietmar.eggemann, preeti, tony.luck,
	fenghua.yu, schwidefsky, james.hogan, cmetcalf, benh, linux,
	linux-arm-kernel
  Cc: linaro-kernel, Vincent Guittot

The BOOK level is only relevant for s390, so we create a dedicated topology table
with the BOOK level and remove it from the default table.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 arch/s390/include/asm/topology.h | 11 +----------
 arch/s390/kernel/topology.c      | 20 ++++++++++++++++++++
 kernel/sched/core.c              |  3 ---
 3 files changed, 21 insertions(+), 13 deletions(-)

diff --git a/arch/s390/include/asm/topology.h b/arch/s390/include/asm/topology.h
index 07763bd..56af530 100644
--- a/arch/s390/include/asm/topology.h
+++ b/arch/s390/include/asm/topology.h
@@ -26,21 +26,12 @@ extern struct cpu_topology_s390 cpu_topology[NR_CPUS];
 
 #define mc_capable() 1
 
-static inline const struct cpumask *cpu_coregroup_mask(int cpu)
-{
-	return &cpu_topology[cpu].core_mask;
-}
-
-static inline const struct cpumask *cpu_book_mask(int cpu)
-{
-	return &cpu_topology[cpu].book_mask;
-}
-
 int topology_cpu_init(struct cpu *);
 int topology_set_cpu_management(int fc);
 void topology_schedule_update(void);
 void store_topology(struct sysinfo_15_1_x *info);
 void topology_expect_change(void);
+const struct cpumask *cpu_coregroup_mask(int cpu);
 
 #else /* CONFIG_SCHED_BOOK */
 
diff --git a/arch/s390/kernel/topology.c b/arch/s390/kernel/topology.c
index 4b2e3e3..ceddd77 100644
--- a/arch/s390/kernel/topology.c
+++ b/arch/s390/kernel/topology.c
@@ -443,6 +443,23 @@ int topology_cpu_init(struct cpu *cpu)
 	return sysfs_create_group(&cpu->dev.kobj, &topology_cpu_attr_group);
 }
 
+const struct cpumask *cpu_coregroup_mask(int cpu)
+{
+	return &cpu_topology[cpu].core_mask;
+}
+
+static const struct cpumask *cpu_book_mask(int cpu)
+{
+	return &cpu_topology[cpu].book_mask;
+}
+
+static struct sched_domain_topology_level s390_topology[] = {
+	{ cpu_coregroup_mask, cpu_core_flags, SD_INIT_NAME(MC) },
+	{ cpu_book_mask, SD_INIT_NAME(BOOK) },
+	{ cpu_cpu_mask, SD_INIT_NAME(DIE) },
+	{ NULL, },
+};
+
 static int __init topology_init(void)
 {
 	if (!MACHINE_HAS_TOPOLOGY) {
@@ -452,6 +469,9 @@ static int __init topology_init(void)
 	set_topology_timer();
 out:
 	update_cpu_masks();
+
+	set_sched_topology(s390_topology);
+
 	return device_create_file(cpu_subsys.dev_root, &dev_attr_dispatching);
 }
 device_initcall(topology_init);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 3397bcb..f2bfa76 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6061,9 +6061,6 @@ static struct sched_domain_topology_level default_topology[] = {
 #ifdef CONFIG_SCHED_MC
 	{ cpu_coregroup_mask, cpu_core_flags, SD_INIT_NAME(MC) },
 #endif
-#ifdef CONFIG_SCHED_BOOK
-	{ cpu_book_mask, SD_INIT_NAME(BOOK) },
-#endif
 	{ cpu_cpu_mask, SD_INIT_NAME(DIE) },
 	{ NULL, },
 };
-- 
1.9.0


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v2 4/7] sched: powerpc: create a dedicated topology table
  2014-03-18 17:56 ` Vincent Guittot
@ 2014-03-18 17:56   ` Vincent Guittot
  -1 siblings, 0 replies; 55+ messages in thread
From: Vincent Guittot @ 2014-03-18 17:56 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel, dietmar.eggemann, preeti, tony.luck,
	fenghua.yu, schwidefsky, james.hogan, cmetcalf, benh, linux,
	linux-arm-kernel
  Cc: linaro-kernel, Vincent Guittot

Create a dedicated topology table for handling the asymmetric SMT feature of powerpc.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 arch/powerpc/kernel/smp.c | 31 +++++++++++++++++++++++--------
 include/linux/sched.h     |  2 --
 kernel/sched/core.c       |  6 ------
 3 files changed, 23 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index ac2621a..c9cade5 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -755,6 +755,28 @@ int setup_profiling_timer(unsigned int multiplier)
 	return 0;
 }
 
+#ifdef CONFIG_SCHED_SMT
+/* Topology flags for CPUs with an asymmetric SMT dependency */
+static const int powerpc_smt_flags(void)
+{
+	int flags = SD_SHARE_CPUPOWER | SD_SHARE_PKG_RESOURCES;
+
+	if (cpu_has_feature(CPU_FTR_ASYM_SMT)) {
+		printk_once(KERN_INFO "Enabling Asymmetric SMT scheduling\n");
+		flags |= SD_ASYM_PACKING;
+	}
+	return flags;
+}
+#endif
+
+static struct sched_domain_topology_level powerpc_topology[] = {
+#ifdef CONFIG_SCHED_SMT
+	{ cpu_smt_mask, powerpc_smt_flags, SD_INIT_NAME(SMT) },
+#endif
+	{ cpu_cpu_mask, SD_INIT_NAME(DIE) },
+	{ NULL, },
+};
+
 void __init smp_cpus_done(unsigned int max_cpus)
 {
 	cpumask_var_t old_mask;
@@ -779,15 +801,8 @@ void __init smp_cpus_done(unsigned int max_cpus)
 
 	dump_numa_cpu_topology();
 
-}
+	set_sched_topology(powerpc_topology);
 
-int arch_sd_sibling_asym_packing(void)
-{
-	if (cpu_has_feature(CPU_FTR_ASYM_SMT)) {
-		printk_once(KERN_INFO "Enabling Asymmetric SMT scheduling\n");
-		return SD_ASYM_PACKING;
-	}
-	return 0;
 }
 
 #ifdef CONFIG_HOTPLUG_CPU
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 4db592a..6479de4 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -868,8 +868,6 @@ enum cpu_idle_type {
 #define SD_OVERLAP		0x2000	/* sched_domains of this level overlap */
 #define SD_NUMA			0x4000	/* cross-node balancing */
 
-extern int __weak arch_sd_sibiling_asym_packing(void);
-
 #ifdef CONFIG_SCHED_SMT
 static inline const int cpu_smt_flags(void)
 {
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f2bfa76..0b51ee3 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5833,11 +5833,6 @@ static void init_sched_groups_power(int cpu, struct sched_domain *sd)
 	atomic_set(&sg->sgp->nr_busy_cpus, sg->group_weight);
 }
 
-int __weak arch_sd_sibling_asym_packing(void)
-{
-       return 0*SD_ASYM_PACKING;
-}
-
 /*
  * Initializers for schedule domains
  * Non-inlined to reduce accumulated stack pressure in build_sched_domains()
@@ -6018,7 +6013,6 @@ sd_init(struct sched_domain_topology_level *tl, int cpu)
 	if (sd->flags & SD_SHARE_CPUPOWER) {
 		sd->imbalance_pct = 110;
 		sd->smt_gain = 1178; /* ~15% */
-		sd->flags |= arch_sd_sibling_asym_packing();
 
 	} else if (sd->flags & SD_SHARE_PKG_RESOURCES) {
 		sd->imbalance_pct = 117;
-- 
1.9.0


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v2 5/7] sched: add a new SD_SHARE_POWERDOMAIN for sched_domain
  2014-03-18 17:56 ` Vincent Guittot
@ 2014-03-18 17:56   ` Vincent Guittot
  -1 siblings, 0 replies; 55+ messages in thread
From: Vincent Guittot @ 2014-03-18 17:56 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel, dietmar.eggemann, preeti, tony.luck,
	fenghua.yu, schwidefsky, james.hogan, cmetcalf, benh, linux,
	linux-arm-kernel
  Cc: linaro-kernel, Vincent Guittot

A new flag, SD_SHARE_POWERDOMAIN, is created to reflect whether groups of CPUs
in a sched_domain level can reach different power states or not. As an example,
the flag should be cleared at the CPU level if groups of cores can be power
gated independently. This information can be used to add a load-balancing level
between groups of CPUs that can power gate independently. The default behavior
of the scheduler is to spread tasks across CPUs and groups of CPUs, so the flag
is set in all sched_domains.
This flag is part of the topology flags that can be set by the arch.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 include/linux/sched.h | 1 +
 kernel/sched/core.c   | 9 ++++++---
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 6479de4..7048369 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -861,6 +861,7 @@ enum cpu_idle_type {
 #define SD_BALANCE_WAKE		0x0010  /* Balance on wakeup */
 #define SD_WAKE_AFFINE		0x0020	/* Wake task to waking CPU */
 #define SD_SHARE_CPUPOWER	0x0080	/* Domain members share cpu power */
+#define SD_SHARE_POWERDOMAIN	0x0100	/* Domain members share power domain */
 #define SD_SHARE_PKG_RESOURCES	0x0200	/* Domain members share cpu pkg resources */
 #define SD_SERIALIZE		0x0400	/* Only a single load balancing instance */
 #define SD_ASYM_PACKING		0x0800  /* Place busy groups earlier in the domain */
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 0b51ee3..224ec3b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5298,7 +5298,8 @@ static int sd_degenerate(struct sched_domain *sd)
 			 SD_BALANCE_FORK |
 			 SD_BALANCE_EXEC |
 			 SD_SHARE_CPUPOWER |
-			 SD_SHARE_PKG_RESOURCES)) {
+			 SD_SHARE_PKG_RESOURCES |
+			 SD_SHARE_POWERDOMAIN)) {
 		if (sd->groups != sd->groups->next)
 			return 0;
 	}
@@ -5329,7 +5330,8 @@ sd_parent_degenerate(struct sched_domain *sd, struct sched_domain *parent)
 				SD_BALANCE_EXEC |
 				SD_SHARE_CPUPOWER |
 				SD_SHARE_PKG_RESOURCES |
-				SD_PREFER_SIBLING);
+				SD_PREFER_SIBLING |
+				SD_SHARE_POWERDOMAIN);
 		if (nr_node_ids == 1)
 			pflags &= ~SD_SERIALIZE;
 	}
@@ -5946,7 +5948,8 @@ static int sched_domains_curr_level;
 	(SD_SHARE_CPUPOWER |		\
 	 SD_SHARE_PKG_RESOURCES |	\
 	 SD_NUMA |			\
-	 SD_ASYM_PACKING)
+	 SD_ASYM_PACKING |		\
+	 SD_SHARE_POWERDOMAIN)
 
 static struct sched_domain *
 sd_init(struct sched_domain_topology_level *tl, int cpu)
-- 
1.9.0


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v2 6/7] sched: ARM: create a dedicated scheduler topology table
  2014-03-18 17:56 ` Vincent Guittot
@ 2014-03-18 17:56   ` Vincent Guittot
  -1 siblings, 0 replies; 55+ messages in thread
From: Vincent Guittot @ 2014-03-18 17:56 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel, dietmar.eggemann, preeti, tony.luck,
	fenghua.yu, schwidefsky, james.hogan, cmetcalf, benh, linux,
	linux-arm-kernel
  Cc: linaro-kernel, Vincent Guittot

Create a dedicated topology table for ARM which creates a new level to
differentiate CPUs that can power gate independently from others.

The patch gives an example of how to add a domain that will take advantage of
SD_SHARE_POWERDOMAIN.
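As an aside, a toy model (plain bitmask spans standing in for real cpumasks, an
assumption for illustration) of why the extra GMC level is harmless on platforms
without per-core power gating: the sched_domain build drops a level whose span and
flags add nothing over its child, in the spirit of sd_degenerate() in
kernel/sched/core.c:

```c
/* Toy model, not kernel code: a level is represented by the bitmask of
 * CPUs it spans plus its topology flags. */
struct level {
	unsigned int span;	/* bitmask of CPUs covered */
	int flags;		/* topology flags for the level */
};

/* A parent level collapses into its child when it spans the same CPUs
 * and carries no flags the child does not already have. */
static int level_degenerates(struct level child, struct level parent)
{
	return parent.span == child.span &&
	       (parent.flags & ~child.flags) == 0;
}
```

With a GMC level spanning one core pair (0x3) under an MC level spanning the
whole cluster (0xf), the extra level survives; if cpu_corepower_mask ends up
identical to cpu_coregroup_mask, the redundant level is simply dropped.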

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 arch/arm/kernel/topology.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/arch/arm/kernel/topology.c b/arch/arm/kernel/topology.c
index 0bc94b1..71e1fec 100644
--- a/arch/arm/kernel/topology.c
+++ b/arch/arm/kernel/topology.c
@@ -185,6 +185,15 @@ const struct cpumask *cpu_coregroup_mask(int cpu)
 	return &cpu_topology[cpu].core_sibling;
 }
 
+/*
+ * The current assumption is that we can power gate each core independently.
+ * This will be superseded by DT binding once available.
+ */
+const struct cpumask *cpu_corepower_mask(int cpu)
+{
+	return &cpu_topology[cpu].thread_sibling;
+}
+
 static void update_siblings_masks(unsigned int cpuid)
 {
 	struct cputopo_arm *cpu_topo, *cpuid_topo = &cpu_topology[cpuid];
@@ -266,6 +275,20 @@ void store_cpu_topology(unsigned int cpuid)
 		cpu_topology[cpuid].socket_id, mpidr);
 }
 
+static inline const int cpu_corepower_flags(void)
+{
+	return SD_SHARE_PKG_RESOURCES  | SD_SHARE_POWERDOMAIN;
+}
+
+static struct sched_domain_topology_level arm_topology[] = {
+#ifdef CONFIG_SCHED_MC
+	{ cpu_corepower_mask, cpu_corepower_flags, SD_INIT_NAME(GMC) },
+	{ cpu_coregroup_mask, cpu_core_flags, SD_INIT_NAME(MC) },
+#endif
+	{ cpu_cpu_mask, SD_INIT_NAME(DIE) },
+	{ NULL, },
+};
+
 /*
  * init_cpu_topology is called at boot when only one cpu is running
  * which prevent simultaneous write access to cpu_topology array
@@ -289,4 +312,7 @@ void __init init_cpu_topology(void)
 	smp_wmb();
 
 	parse_dt_topology();
+
+	/* Set scheduler topology descriptor */
+	set_sched_topology(arm_topology);
 }
-- 
1.9.0


^ permalink raw reply related	[flat|nested] 55+ messages in thread


* [PATCH v2 7/7] sched: powerpc: Add SD_SHARE_POWERDOMAIN for SMT level
  2014-03-18 17:56 ` Vincent Guittot
@ 2014-03-18 17:56   ` Vincent Guittot
  -1 siblings, 0 replies; 55+ messages in thread
From: Vincent Guittot @ 2014-03-18 17:56 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel, dietmar.eggemann, preeti, tony.luck,
	fenghua.yu, schwidefsky, james.hogan, cmetcalf, benh, linux,
	linux-arm-kernel
  Cc: linaro-kernel, Vincent Guittot

Set the power domain dependency at the SMT level of Power8 but keep the flag
clear at the CPU level. The goal is to consolidate tasks on the threads of a
core up to a certain level, as described in the link below:
https://lkml.org/lkml/2014/3/12/16

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 arch/powerpc/kernel/smp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index c9cade5..fbbac3c 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -759,7 +759,7 @@ int setup_profiling_timer(unsigned int multiplier)
 /* cpumask of CPUs with asymetric SMT dependancy */
 static const int powerpc_smt_flags(void)
 {
-	int flags = SD_SHARE_CPUPOWER | SD_SHARE_PKG_RESOURCES;
+	int flags = SD_SHARE_CPUPOWER | SD_SHARE_PKG_RESOURCES | SD_SHARE_POWERDOMAIN;
 
 	if (cpu_has_feature(CPU_FTR_ASYM_SMT)) {
 		printk_once(KERN_INFO "Enabling Asymmetric SMT scheduling\n");
-- 
1.9.0


^ permalink raw reply related	[flat|nested] 55+ messages in thread


* Re: [PATCH v2 2/7] sched: rework of sched_domain topology definition
  2014-03-18 17:56   ` Vincent Guittot
@ 2014-03-19  6:01     ` Preeti U Murthy
  -1 siblings, 0 replies; 55+ messages in thread
From: Preeti U Murthy @ 2014-03-19  6:01 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: peterz, mingo, linux-kernel, dietmar.eggemann, tony.luck,
	fenghua.yu, schwidefsky, james.hogan, cmetcalf, benh, linux,
	linux-arm-kernel, linaro-kernel

On 03/18/2014 11:26 PM, Vincent Guittot wrote:
> We replace the old way to configure the scheduler topology with a new method
> which enables a platform to declare additional levels (if needed).
> 
> We still have a default topology table definition that can be used by platforms
> that don't want more levels than the SMT, MC, CPU and NUMA ones. This table can
> be overwritten by an arch which wants to add a new level where load balancing
> makes sense, like a BOOK or power-gating level.
> 
> For each level, we need a function pointer that returns the cpumask for each cpu,
> a function pointer that returns the flags for the level, and a name. Only flags
> that describe the topology can be set by an architecture. The current topology
> flags are:
>  SD_SHARE_CPUPOWER
>  SD_SHARE_PKG_RESOURCES
>  SD_NUMA
>  SD_ASYM_PACKING
> 
> Then, each level must be a subset of the next one. The build sequence of the
> sched_domain will take care of removing useless levels, like those with 1 CPU
> and those with the same CPU span and no more relevant information for load
> balancing than their child.
> 
> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> ---
>  arch/ia64/include/asm/topology.h |  24 ----
>  arch/s390/include/asm/topology.h |   2 -
>  arch/tile/include/asm/topology.h |  33 ------
>  include/linux/sched.h            |  48 ++++++++
>  include/linux/topology.h         | 128 +++------------------
>  kernel/sched/core.c              | 235 ++++++++++++++++++++-------------------
>  6 files changed, 183 insertions(+), 287 deletions(-)
> 
> diff --git a/arch/ia64/include/asm/topology.h b/arch/ia64/include/asm/topology.h
> index 5cb55a1..3202aa7 100644
> --- a/arch/ia64/include/asm/topology.h
> +++ b/arch/ia64/include/asm/topology.h
> @@ -46,30 +46,6 @@
> 
>  void build_cpu_to_node_map(void);
> 
> -#define SD_CPU_INIT (struct sched_domain) {		\
> -	.parent			= NULL,			\
> -	.child			= NULL,			\
> -	.groups			= NULL,			\
> -	.min_interval		= 1,			\
> -	.max_interval		= 4,			\
> -	.busy_factor		= 64,			\
> -	.imbalance_pct		= 125,			\
> -	.cache_nice_tries	= 2,			\
> -	.busy_idx		= 2,			\
> -	.idle_idx		= 1,			\
> -	.newidle_idx		= 0,			\
> -	.wake_idx		= 0,			\
> -	.forkexec_idx		= 0,			\
> -	.flags			= SD_LOAD_BALANCE	\
> -				| SD_BALANCE_NEWIDLE	\
> -				| SD_BALANCE_EXEC	\
> -				| SD_BALANCE_FORK	\
> -				| SD_WAKE_AFFINE,	\
> -	.last_balance		= jiffies,		\
> -	.balance_interval	= 1,			\
> -	.nr_balance_failed	= 0,			\
> -}
> -
>  #endif /* CONFIG_NUMA */
> 
>  #ifdef CONFIG_SMP
> diff --git a/arch/s390/include/asm/topology.h b/arch/s390/include/asm/topology.h
> index 05425b1..07763bd 100644
> --- a/arch/s390/include/asm/topology.h
> +++ b/arch/s390/include/asm/topology.h
> @@ -64,8 +64,6 @@ static inline void s390_init_cpu_topology(void)
>  };
>  #endif
> 
> -#define SD_BOOK_INIT	SD_CPU_INIT
> -
>  #include <asm-generic/topology.h>
> 
>  #endif /* _ASM_S390_TOPOLOGY_H */
> diff --git a/arch/tile/include/asm/topology.h b/arch/tile/include/asm/topology.h
> index d15c0d8..9383118 100644
> --- a/arch/tile/include/asm/topology.h
> +++ b/arch/tile/include/asm/topology.h
> @@ -44,39 +44,6 @@ static inline const struct cpumask *cpumask_of_node(int node)
>  /* For now, use numa node -1 for global allocation. */
>  #define pcibus_to_node(bus)		((void)(bus), -1)
> 
> -/*
> - * TILE architecture has many cores integrated in one processor, so we need
> - * setup bigger balance_interval for both CPU/NODE scheduling domains to
> - * reduce process scheduling costs.
> - */
> -
> -/* sched_domains SD_CPU_INIT for TILE architecture */
> -#define SD_CPU_INIT (struct sched_domain) {				\
> -	.min_interval		= 4,					\
> -	.max_interval		= 128,					\
> -	.busy_factor		= 64,					\
> -	.imbalance_pct		= 125,					\
> -	.cache_nice_tries	= 1,					\
> -	.busy_idx		= 2,					\
> -	.idle_idx		= 1,					\
> -	.newidle_idx		= 0,					\
> -	.wake_idx		= 0,					\
> -	.forkexec_idx		= 0,					\
> -									\
> -	.flags			= 1*SD_LOAD_BALANCE			\
> -				| 1*SD_BALANCE_NEWIDLE			\
> -				| 1*SD_BALANCE_EXEC			\
> -				| 1*SD_BALANCE_FORK			\
> -				| 0*SD_BALANCE_WAKE			\
> -				| 0*SD_WAKE_AFFINE			\
> -				| 0*SD_SHARE_CPUPOWER			\
> -				| 0*SD_SHARE_PKG_RESOURCES		\
> -				| 0*SD_SERIALIZE			\
> -				,					\
> -	.last_balance		= jiffies,				\
> -	.balance_interval	= 32,					\
> -}
> -
>  /* By definition, we create nodes based on online memory. */
>  #define node_has_online_mem(nid) 1
> 
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 825ed83..4db592a 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -870,6 +870,20 @@ enum cpu_idle_type {
> 
>  extern int __weak arch_sd_sibiling_asym_packing(void);
> 
> +#ifdef CONFIG_SCHED_SMT
> +static inline const int cpu_smt_flags(void)
> +{
> +	return SD_SHARE_CPUPOWER | SD_SHARE_PKG_RESOURCES;
> +}
> +#endif
> +
> +#ifdef CONFIG_SCHED_MC
> +static inline const int cpu_core_flags(void)
> +{
> +	return SD_SHARE_PKG_RESOURCES;
> +}
> +#endif
> +
>  struct sched_domain_attr {
>  	int relax_domain_level;
>  };
> @@ -976,6 +990,38 @@ void free_sched_domains(cpumask_var_t doms[], unsigned int ndoms);
> 
>  bool cpus_share_cache(int this_cpu, int that_cpu);
> 
> +typedef const struct cpumask *(*sched_domain_mask_f)(int cpu);
> +typedef const int (*sched_domain_flags_f)(void);
> +
> +#define SDTL_OVERLAP	0x01
> +
> +struct sd_data {
> +	struct sched_domain **__percpu sd;
> +	struct sched_group **__percpu sg;
> +	struct sched_group_power **__percpu sgp;
> +};
> +
> +struct sched_domain_topology_level {
> +	sched_domain_mask_f mask;
> +	sched_domain_flags_f sd_flags;
> +	int		    flags;
> +	int		    numa_level;
> +	struct sd_data      data;
> +#ifdef CONFIG_SCHED_DEBUG
> +	char                *name;
> +#endif
> +};
> +
> +extern struct sched_domain_topology_level *sched_domain_topology;
> +
> +extern void set_sched_topology(struct sched_domain_topology_level *tl);
> +
> +#ifdef CONFIG_SCHED_DEBUG
> +# define SD_INIT_NAME(type)		.name = #type
> +#else
> +# define SD_INIT_NAME(type)
> +#endif
> +
>  #else /* CONFIG_SMP */
> 
>  struct sched_domain_attr;
> @@ -991,6 +1037,8 @@ static inline bool cpus_share_cache(int this_cpu, int that_cpu)
>  	return true;
>  }
> 
> +static inline void set_sched_topology(struct sched_domain_topology_level *tl) { }
> +
>  #endif	/* !CONFIG_SMP */
> 
> 
> diff --git a/include/linux/topology.h b/include/linux/topology.h
> index 12ae6ce..3a9db05 100644
> --- a/include/linux/topology.h
> +++ b/include/linux/topology.h
> @@ -66,121 +66,6 @@ int arch_update_cpu_topology(void);
>  #define PENALTY_FOR_NODE_WITH_CPUS	(1)
>  #endif
> 
> -/*
> - * Below are the 3 major initializers used in building sched_domains:
> - * SD_SIBLING_INIT, for SMT domains
> - * SD_CPU_INIT, for SMP domains
> - *
> - * Any architecture that cares to do any tuning to these values should do so
> - * by defining their own arch-specific initializer in include/asm/topology.h.
> - * A definition there will automagically override these default initializers
> - * and allow arch-specific performance tuning of sched_domains.
> - * (Only non-zero and non-null fields need be specified.)
> - */
> -
> -#ifdef CONFIG_SCHED_SMT
> -/* MCD - Do we really need this?  It is always on if CONFIG_SCHED_SMT is,
> - * so can't we drop this in favor of CONFIG_SCHED_SMT?
> - */
> -#define ARCH_HAS_SCHED_WAKE_IDLE
> -/* Common values for SMT siblings */
> -#ifndef SD_SIBLING_INIT
> -#define SD_SIBLING_INIT (struct sched_domain) {				\
> -	.min_interval		= 1,					\
> -	.max_interval		= 2,					\
> -	.busy_factor		= 64,					\
> -	.imbalance_pct		= 110,					\
> -									\
> -	.flags			= 1*SD_LOAD_BALANCE			\
> -				| 1*SD_BALANCE_NEWIDLE			\
> -				| 1*SD_BALANCE_EXEC			\
> -				| 1*SD_BALANCE_FORK			\
> -				| 0*SD_BALANCE_WAKE			\
> -				| 1*SD_WAKE_AFFINE			\
> -				| 1*SD_SHARE_CPUPOWER			\
> -				| 1*SD_SHARE_PKG_RESOURCES		\
> -				| 0*SD_SERIALIZE			\
> -				| 0*SD_PREFER_SIBLING			\
> -				| arch_sd_sibling_asym_packing()	\
> -				,					\
> -	.last_balance		= jiffies,				\
> -	.balance_interval	= 1,					\
> -	.smt_gain		= 1178,	/* 15% */			\
> -	.max_newidle_lb_cost	= 0,					\
> -	.next_decay_max_lb_cost	= jiffies,				\
> -}
> -#endif
> -#endif /* CONFIG_SCHED_SMT */
> -
> -#ifdef CONFIG_SCHED_MC
> -/* Common values for MC siblings. for now mostly derived from SD_CPU_INIT */
> -#ifndef SD_MC_INIT
> -#define SD_MC_INIT (struct sched_domain) {				\
> -	.min_interval		= 1,					\
> -	.max_interval		= 4,					\
> -	.busy_factor		= 64,					\
> -	.imbalance_pct		= 125,					\
> -	.cache_nice_tries	= 1,					\
> -	.busy_idx		= 2,					\
> -	.wake_idx		= 0,					\
> -	.forkexec_idx		= 0,					\
> -									\
> -	.flags			= 1*SD_LOAD_BALANCE			\
> -				| 1*SD_BALANCE_NEWIDLE			\
> -				| 1*SD_BALANCE_EXEC			\
> -				| 1*SD_BALANCE_FORK			\
> -				| 0*SD_BALANCE_WAKE			\
> -				| 1*SD_WAKE_AFFINE			\
> -				| 0*SD_SHARE_CPUPOWER			\
> -				| 1*SD_SHARE_PKG_RESOURCES		\
> -				| 0*SD_SERIALIZE			\
> -				,					\
> -	.last_balance		= jiffies,				\
> -	.balance_interval	= 1,					\
> -	.max_newidle_lb_cost	= 0,					\
> -	.next_decay_max_lb_cost	= jiffies,				\
> -}
> -#endif
> -#endif /* CONFIG_SCHED_MC */
> -
> -/* Common values for CPUs */
> -#ifndef SD_CPU_INIT
> -#define SD_CPU_INIT (struct sched_domain) {				\
> -	.min_interval		= 1,					\
> -	.max_interval		= 4,					\
> -	.busy_factor		= 64,					\
> -	.imbalance_pct		= 125,					\
> -	.cache_nice_tries	= 1,					\
> -	.busy_idx		= 2,					\
> -	.idle_idx		= 1,					\
> -	.newidle_idx		= 0,					\
> -	.wake_idx		= 0,					\
> -	.forkexec_idx		= 0,					\
> -									\
> -	.flags			= 1*SD_LOAD_BALANCE			\
> -				| 1*SD_BALANCE_NEWIDLE			\
> -				| 1*SD_BALANCE_EXEC			\
> -				| 1*SD_BALANCE_FORK			\
> -				| 0*SD_BALANCE_WAKE			\
> -				| 1*SD_WAKE_AFFINE			\
> -				| 0*SD_SHARE_CPUPOWER			\
> -				| 0*SD_SHARE_PKG_RESOURCES		\
> -				| 0*SD_SERIALIZE			\
> -				| 1*SD_PREFER_SIBLING			\
> -				,					\
> -	.last_balance		= jiffies,				\
> -	.balance_interval	= 1,					\
> -	.max_newidle_lb_cost	= 0,					\
> -	.next_decay_max_lb_cost	= jiffies,				\
> -}
> -#endif
> -
> -#ifdef CONFIG_SCHED_BOOK
> -#ifndef SD_BOOK_INIT
> -#error Please define an appropriate SD_BOOK_INIT in include/asm/topology.h!!!
> -#endif
> -#endif /* CONFIG_SCHED_BOOK */
> -
>  #ifdef CONFIG_USE_PERCPU_NUMA_NODE_ID
>  DECLARE_PER_CPU(int, numa_node);
> 
> @@ -295,4 +180,17 @@ static inline int cpu_to_mem(int cpu)
>  #define topology_core_cpumask(cpu)		cpumask_of(cpu)
>  #endif
> 
> +#ifdef CONFIG_SCHED_SMT
> +static inline const struct cpumask *cpu_smt_mask(int cpu)
> +{
> +	return topology_thread_cpumask(cpu);
> +}
> +#endif
> +
> +static inline const struct cpumask *cpu_cpu_mask(int cpu)
> +{
> +	return cpumask_of_node(cpu_to_node(cpu));
> +}
> +
> +
>  #endif /* _LINUX_TOPOLOGY_H */
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index ae365aa..3397bcb 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5603,17 +5603,6 @@ static int __init isolated_cpu_setup(char *str)
> 
>  __setup("isolcpus=", isolated_cpu_setup);
> 
> -static const struct cpumask *cpu_cpu_mask(int cpu)
> -{
> -	return cpumask_of_node(cpu_to_node(cpu));
> -}
> -
> -struct sd_data {
> -	struct sched_domain **__percpu sd;
> -	struct sched_group **__percpu sg;
> -	struct sched_group_power **__percpu sgp;
> -};
> -
>  struct s_data {
>  	struct sched_domain ** __percpu sd;
>  	struct root_domain	*rd;
> @@ -5626,21 +5615,6 @@ enum s_alloc {
>  	sa_none,
>  };
> 
> -struct sched_domain_topology_level;
> -
> -typedef struct sched_domain *(*sched_domain_init_f)(struct sched_domain_topology_level *tl, int cpu);
> -typedef const struct cpumask *(*sched_domain_mask_f)(int cpu);
> -
> -#define SDTL_OVERLAP	0x01
> -
> -struct sched_domain_topology_level {
> -	sched_domain_init_f init;
> -	sched_domain_mask_f mask;
> -	int		    flags;
> -	int		    numa_level;
> -	struct sd_data      data;
> -};
> -
>  /*
>   * Build an iteration mask that can exclude certain CPUs from the upwards
>   * domain traversal.
> @@ -5869,34 +5843,6 @@ int __weak arch_sd_sibling_asym_packing(void)
>   * Non-inlined to reduce accumulated stack pressure in build_sched_domains()
>   */
> 
> -#ifdef CONFIG_SCHED_DEBUG
> -# define SD_INIT_NAME(sd, type)		sd->name = #type
> -#else
> -# define SD_INIT_NAME(sd, type)		do { } while (0)
> -#endif
> -
> -#define SD_INIT_FUNC(type)						\
> -static noinline struct sched_domain *					\
> -sd_init_##type(struct sched_domain_topology_level *tl, int cpu) 	\
> -{									\
> -	struct sched_domain *sd = *per_cpu_ptr(tl->data.sd, cpu);	\
> -	*sd = SD_##type##_INIT;						\
> -	SD_INIT_NAME(sd, type);						\
> -	sd->private = &tl->data;					\
> -	return sd;							\
> -}
> -
> -SD_INIT_FUNC(CPU)
> -#ifdef CONFIG_SCHED_SMT
> - SD_INIT_FUNC(SIBLING)
> -#endif
> -#ifdef CONFIG_SCHED_MC
> - SD_INIT_FUNC(MC)
> -#endif
> -#ifdef CONFIG_SCHED_BOOK
> - SD_INIT_FUNC(BOOK)
> -#endif
> -
>  static int default_relax_domain_level = -1;
>  int sched_domain_level_max;
> 
> @@ -5984,97 +5930,156 @@ static void claim_allocations(int cpu, struct sched_domain *sd)
>  		*per_cpu_ptr(sdd->sgp, cpu) = NULL;
>  }
> 
> -#ifdef CONFIG_SCHED_SMT
> -static const struct cpumask *cpu_smt_mask(int cpu)
> -{
> -	return topology_thread_cpumask(cpu);
> -}
> -#endif
> -
> -/*
> - * Topology list, bottom-up.
> - */
> -static struct sched_domain_topology_level default_topology[] = {
> -#ifdef CONFIG_SCHED_SMT
> -	{ sd_init_SIBLING, cpu_smt_mask, },
> -#endif
> -#ifdef CONFIG_SCHED_MC
> -	{ sd_init_MC, cpu_coregroup_mask, },
> -#endif
> -#ifdef CONFIG_SCHED_BOOK
> -	{ sd_init_BOOK, cpu_book_mask, },
> -#endif
> -	{ sd_init_CPU, cpu_cpu_mask, },
> -	{ NULL, },
> -};
> -
> -static struct sched_domain_topology_level *sched_domain_topology = default_topology;
> -
> -#define for_each_sd_topology(tl)			\
> -	for (tl = sched_domain_topology; tl->init; tl++)
> -
>  #ifdef CONFIG_NUMA
> -
>  static int sched_domains_numa_levels;
>  static int *sched_domains_numa_distance;
>  static struct cpumask ***sched_domains_numa_masks;
>  static int sched_domains_curr_level;
> +#endif
> 
> -static inline int sd_local_flags(int level)
> -{
> -	if (sched_domains_numa_distance[level] > RECLAIM_DISTANCE)
> -		return 0;
> -
> -	return SD_BALANCE_EXEC | SD_BALANCE_FORK | SD_WAKE_AFFINE;
> -}
> +/*
> + * SD_flags allowed in topology descriptions.
> + *
> + * SD_SHARE_CPUPOWER      - describes SMT topologies
> + * SD_SHARE_PKG_RESOURCES - describes shared caches
> + * SD_NUMA                - describes NUMA topologies
> + *
> + * Odd one out:
> + * SD_ASYM_PACKING        - describes SMT quirks
> + */
> +#define TOPOLOGY_SD_FLAGS		\
> +	(SD_SHARE_CPUPOWER |		\
> +	 SD_SHARE_PKG_RESOURCES |	\
> +	 SD_NUMA |			\
> +	 SD_ASYM_PACKING)
> 
>  static struct sched_domain *
> -sd_numa_init(struct sched_domain_topology_level *tl, int cpu)
> +sd_init(struct sched_domain_topology_level *tl, int cpu)
>  {
>  	struct sched_domain *sd = *per_cpu_ptr(tl->data.sd, cpu);
> -	int level = tl->numa_level;
> -	int sd_weight = cpumask_weight(
> -			sched_domains_numa_masks[level][cpu_to_node(cpu)]);
> +	int sd_weight, sd_flags = 0;
> +
> +#ifdef CONFIG_NUMA
> +	/*
> +	 * Ugly hack to pass state to sd_numa_mask()...
> +	 */
> +	sched_domains_curr_level = tl->numa_level;
> +#endif
> +
> +	sd_weight = cpumask_weight(tl->mask(cpu));
> +
> +	if (tl->sd_flags)
> +		sd_flags = (*tl->sd_flags)();
> +	if (WARN_ONCE(sd_flags & ~TOPOLOGY_SD_FLAGS,
> +			"wrong sd_flags in topology description\n"))
> +		sd_flags &= ~TOPOLOGY_SD_FLAGS;
> 
>  	*sd = (struct sched_domain){
>  		.min_interval		= sd_weight,
>  		.max_interval		= 2*sd_weight,
>  		.busy_factor		= 32,
>  		.imbalance_pct		= 125,
> -		.cache_nice_tries	= 2,
> -		.busy_idx		= 3,
> -		.idle_idx		= 2,
> +
> +		.cache_nice_tries	= 0,
> +		.busy_idx		= 0,
> +		.idle_idx		= 0,
>  		.newidle_idx		= 0,
>  		.wake_idx		= 0,
>  		.forkexec_idx		= 0,
> 
>  		.flags			= 1*SD_LOAD_BALANCE
>  					| 1*SD_BALANCE_NEWIDLE
> -					| 0*SD_BALANCE_EXEC
> -					| 0*SD_BALANCE_FORK
> +					| 1*SD_BALANCE_EXEC
> +					| 1*SD_BALANCE_FORK
>  					| 0*SD_BALANCE_WAKE
> -					| 0*SD_WAKE_AFFINE
> +					| 1*SD_WAKE_AFFINE
>  					| 0*SD_SHARE_CPUPOWER
>  					| 0*SD_SHARE_PKG_RESOURCES
> -					| 1*SD_SERIALIZE
> +					| 0*SD_SERIALIZE
>  					| 0*SD_PREFER_SIBLING
> -					| 1*SD_NUMA
> -					| sd_local_flags(level)
> +					| 0*SD_NUMA
> +					| sd_flags
>  					,
> +
>  		.last_balance		= jiffies,
>  		.balance_interval	= sd_weight,
> +		.smt_gain		= 0,
> +		.max_newidle_lb_cost	= 0,
> +		.next_decay_max_lb_cost	= jiffies,
> +#ifdef CONFIG_SCHED_DEBUG
> +		.name			= tl->name,
> +#endif
>  	};
> -	SD_INIT_NAME(sd, NUMA);
> -	sd->private = &tl->data;
> 
>  	/*
> -	 * Ugly hack to pass state to sd_numa_mask()...
> +	 * Convert topological properties into behaviour.
>  	 */
> -	sched_domains_curr_level = tl->numa_level;
> +
> +	if (sd->flags & SD_SHARE_CPUPOWER) {
> +		sd->imbalance_pct = 110;
> +		sd->smt_gain = 1178; /* ~15% */
> +		sd->flags |= arch_sd_sibling_asym_packing();
> +
> +	} else if (sd->flags & SD_SHARE_PKG_RESOURCES) {
> +		sd->imbalance_pct = 117;
> +		sd->cache_nice_tries = 1;
> +		sd->busy_idx = 2;
> +
> +#ifdef CONFIG_NUMA
> +	} else if (sd->flags & SD_NUMA) {
> +		sd->cache_nice_tries = 2;
> +		sd->busy_idx = 3;
> +		sd->idle_idx = 2;
> +
> +		sd->flags |= SD_SERIALIZE;
> +		if (sched_domains_numa_distance[tl->numa_level] > RECLAIM_DISTANCE) {
> +			sd->flags &= ~(SD_BALANCE_EXEC |
> +				       SD_BALANCE_FORK |
> +				       SD_WAKE_AFFINE);
> +		}
> +
> +#endif
> +	} else {
> +		sd->flags |= SD_PREFER_SIBLING;
> +		sd->cache_nice_tries = 1;
> +		sd->busy_idx = 2;
> +		sd->idle_idx = 1;
> +	}
> +
> +	sd->private = &tl->data;
> 
>  	return sd;
>  }
> 
> +/*
> + * Topology list, bottom-up.
> + */
> +static struct sched_domain_topology_level default_topology[] = {
> +#ifdef CONFIG_SCHED_SMT
> +	{ cpu_smt_mask, cpu_smt_flags, SD_INIT_NAME(SMT) },
> +#endif
> +#ifdef CONFIG_SCHED_MC
> +	{ cpu_coregroup_mask, cpu_core_flags, SD_INIT_NAME(MC) },
> +#endif
> +#ifdef CONFIG_SCHED_BOOK
> +	{ cpu_book_mask, SD_INIT_NAME(BOOK) },
> +#endif
> +	{ cpu_cpu_mask, SD_INIT_NAME(DIE) },
> +	{ NULL, },
> +};
> +
> +struct sched_domain_topology_level *sched_domain_topology = default_topology;
> +
> +#define for_each_sd_topology(tl)			\
> +	for (tl = sched_domain_topology; tl->mask; tl++)
> +
> +void set_sched_topology(struct sched_domain_topology_level *tl)
> +{
> +	sched_domain_topology = tl;
> +}
> +
> +#ifdef CONFIG_NUMA
> +
>  static const struct cpumask *sd_numa_mask(int cpu)
>  {
>  	return sched_domains_numa_masks[sched_domains_curr_level][cpu_to_node(cpu)];
> @@ -6218,7 +6223,10 @@ static void sched_init_numa(void)
>  		}
>  	}
> 
> -	tl = kzalloc((ARRAY_SIZE(default_topology) + level) *
> +	/* Compute default topology size */
> +	for (i = 0; sched_domain_topology[i].mask; i++);
> +
> +	tl = kzalloc((i + level) *
>  			sizeof(struct sched_domain_topology_level), GFP_KERNEL);
>  	if (!tl)
>  		return;
> @@ -6226,18 +6234,19 @@ static void sched_init_numa(void)
>  	/*
>  	 * Copy the default topology bits..
>  	 */
> -	for (i = 0; default_topology[i].init; i++)
> -		tl[i] = default_topology[i];
> +	for (i = 0; sched_domain_topology[i].mask; i++)
> +		tl[i] = sched_domain_topology[i];
> 
>  	/*
>  	 * .. and append 'j' levels of NUMA goodness.
>  	 */
>  	for (j = 0; j < level; i++, j++) {
>  		tl[i] = (struct sched_domain_topology_level){
> -			.init = sd_numa_init,
>  			.mask = sd_numa_mask,
> +			.sd_flags = SD_NUMA,
>  			.flags = SDTL_OVERLAP,
>  			.numa_level = j,
> +			SD_INIT_NAME(NUMA)
>  		};
>  	}
> 
> @@ -6395,7 +6404,7 @@ struct sched_domain *build_sched_domain(struct sched_domain_topology_level *tl,
>  		const struct cpumask *cpu_map, struct sched_domain_attr *attr,
>  		struct sched_domain *child, int cpu)
>  {
> -	struct sched_domain *sd = tl->init(tl, cpu);
> +	struct sched_domain *sd = sd_init(tl, cpu);
>  	if (!sd)
>  		return child;
> 
Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>


^ permalink raw reply	[flat|nested] 55+ messages in thread

> +	struct sched_group **__percpu sg;
> +	struct sched_group_power **__percpu sgp;
> +};
> +
> +struct sched_domain_topology_level {
> +	sched_domain_mask_f mask;
> +	sched_domain_flags_f sd_flags;
> +	int		    flags;
> +	int		    numa_level;
> +	struct sd_data      data;
> +#ifdef CONFIG_SCHED_DEBUG
> +	char                *name;
> +#endif
> +};
> +
> +extern struct sched_domain_topology_level *sched_domain_topology;
> +
> +extern void set_sched_topology(struct sched_domain_topology_level *tl);
> +
> +#ifdef CONFIG_SCHED_DEBUG
> +# define SD_INIT_NAME(type)		.name = #type
> +#else
> +# define SD_INIT_NAME(type)
> +#endif
> +
>  #else /* CONFIG_SMP */
> 
>  struct sched_domain_attr;
> @@ -991,6 +1037,8 @@ static inline bool cpus_share_cache(int this_cpu, int that_cpu)
>  	return true;
>  }
> 
> +static inline void set_sched_topology(struct sched_domain_topology_level *tl) { }
> +
>  #endif	/* !CONFIG_SMP */
> 
> 
> diff --git a/include/linux/topology.h b/include/linux/topology.h
> index 12ae6ce..3a9db05 100644
> --- a/include/linux/topology.h
> +++ b/include/linux/topology.h
> @@ -66,121 +66,6 @@ int arch_update_cpu_topology(void);
>  #define PENALTY_FOR_NODE_WITH_CPUS	(1)
>  #endif
> 
> -/*
> - * Below are the 3 major initializers used in building sched_domains:
> - * SD_SIBLING_INIT, for SMT domains
> - * SD_CPU_INIT, for SMP domains
> - *
> - * Any architecture that cares to do any tuning to these values should do so
> - * by defining their own arch-specific initializer in include/asm/topology.h.
> - * A definition there will automagically override these default initializers
> - * and allow arch-specific performance tuning of sched_domains.
> - * (Only non-zero and non-null fields need be specified.)
> - */
> -
> -#ifdef CONFIG_SCHED_SMT
> -/* MCD - Do we really need this?  It is always on if CONFIG_SCHED_SMT is,
> - * so can't we drop this in favor of CONFIG_SCHED_SMT?
> - */
> -#define ARCH_HAS_SCHED_WAKE_IDLE
> -/* Common values for SMT siblings */
> -#ifndef SD_SIBLING_INIT
> -#define SD_SIBLING_INIT (struct sched_domain) {				\
> -	.min_interval		= 1,					\
> -	.max_interval		= 2,					\
> -	.busy_factor		= 64,					\
> -	.imbalance_pct		= 110,					\
> -									\
> -	.flags			= 1*SD_LOAD_BALANCE			\
> -				| 1*SD_BALANCE_NEWIDLE			\
> -				| 1*SD_BALANCE_EXEC			\
> -				| 1*SD_BALANCE_FORK			\
> -				| 0*SD_BALANCE_WAKE			\
> -				| 1*SD_WAKE_AFFINE			\
> -				| 1*SD_SHARE_CPUPOWER			\
> -				| 1*SD_SHARE_PKG_RESOURCES		\
> -				| 0*SD_SERIALIZE			\
> -				| 0*SD_PREFER_SIBLING			\
> -				| arch_sd_sibling_asym_packing()	\
> -				,					\
> -	.last_balance		= jiffies,				\
> -	.balance_interval	= 1,					\
> -	.smt_gain		= 1178,	/* 15% */			\
> -	.max_newidle_lb_cost	= 0,					\
> -	.next_decay_max_lb_cost	= jiffies,				\
> -}
> -#endif
> -#endif /* CONFIG_SCHED_SMT */
> -
> -#ifdef CONFIG_SCHED_MC
> -/* Common values for MC siblings. for now mostly derived from SD_CPU_INIT */
> -#ifndef SD_MC_INIT
> -#define SD_MC_INIT (struct sched_domain) {				\
> -	.min_interval		= 1,					\
> -	.max_interval		= 4,					\
> -	.busy_factor		= 64,					\
> -	.imbalance_pct		= 125,					\
> -	.cache_nice_tries	= 1,					\
> -	.busy_idx		= 2,					\
> -	.wake_idx		= 0,					\
> -	.forkexec_idx		= 0,					\
> -									\
> -	.flags			= 1*SD_LOAD_BALANCE			\
> -				| 1*SD_BALANCE_NEWIDLE			\
> -				| 1*SD_BALANCE_EXEC			\
> -				| 1*SD_BALANCE_FORK			\
> -				| 0*SD_BALANCE_WAKE			\
> -				| 1*SD_WAKE_AFFINE			\
> -				| 0*SD_SHARE_CPUPOWER			\
> -				| 1*SD_SHARE_PKG_RESOURCES		\
> -				| 0*SD_SERIALIZE			\
> -				,					\
> -	.last_balance		= jiffies,				\
> -	.balance_interval	= 1,					\
> -	.max_newidle_lb_cost	= 0,					\
> -	.next_decay_max_lb_cost	= jiffies,				\
> -}
> -#endif
> -#endif /* CONFIG_SCHED_MC */
> -
> -/* Common values for CPUs */
> -#ifndef SD_CPU_INIT
> -#define SD_CPU_INIT (struct sched_domain) {				\
> -	.min_interval		= 1,					\
> -	.max_interval		= 4,					\
> -	.busy_factor		= 64,					\
> -	.imbalance_pct		= 125,					\
> -	.cache_nice_tries	= 1,					\
> -	.busy_idx		= 2,					\
> -	.idle_idx		= 1,					\
> -	.newidle_idx		= 0,					\
> -	.wake_idx		= 0,					\
> -	.forkexec_idx		= 0,					\
> -									\
> -	.flags			= 1*SD_LOAD_BALANCE			\
> -				| 1*SD_BALANCE_NEWIDLE			\
> -				| 1*SD_BALANCE_EXEC			\
> -				| 1*SD_BALANCE_FORK			\
> -				| 0*SD_BALANCE_WAKE			\
> -				| 1*SD_WAKE_AFFINE			\
> -				| 0*SD_SHARE_CPUPOWER			\
> -				| 0*SD_SHARE_PKG_RESOURCES		\
> -				| 0*SD_SERIALIZE			\
> -				| 1*SD_PREFER_SIBLING			\
> -				,					\
> -	.last_balance		= jiffies,				\
> -	.balance_interval	= 1,					\
> -	.max_newidle_lb_cost	= 0,					\
> -	.next_decay_max_lb_cost	= jiffies,				\
> -}
> -#endif
> -
> -#ifdef CONFIG_SCHED_BOOK
> -#ifndef SD_BOOK_INIT
> -#error Please define an appropriate SD_BOOK_INIT in include/asm/topology.h!!!
> -#endif
> -#endif /* CONFIG_SCHED_BOOK */
> -
>  #ifdef CONFIG_USE_PERCPU_NUMA_NODE_ID
>  DECLARE_PER_CPU(int, numa_node);
> 
> @@ -295,4 +180,17 @@ static inline int cpu_to_mem(int cpu)
>  #define topology_core_cpumask(cpu)		cpumask_of(cpu)
>  #endif
> 
> +#ifdef CONFIG_SCHED_SMT
> +static inline const struct cpumask *cpu_smt_mask(int cpu)
> +{
> +	return topology_thread_cpumask(cpu);
> +}
> +#endif
> +
> +static inline const struct cpumask *cpu_cpu_mask(int cpu)
> +{
> +	return cpumask_of_node(cpu_to_node(cpu));
> +}
> +
> +
>  #endif /* _LINUX_TOPOLOGY_H */
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index ae365aa..3397bcb 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5603,17 +5603,6 @@ static int __init isolated_cpu_setup(char *str)
> 
>  __setup("isolcpus=", isolated_cpu_setup);
> 
> -static const struct cpumask *cpu_cpu_mask(int cpu)
> -{
> -	return cpumask_of_node(cpu_to_node(cpu));
> -}
> -
> -struct sd_data {
> -	struct sched_domain **__percpu sd;
> -	struct sched_group **__percpu sg;
> -	struct sched_group_power **__percpu sgp;
> -};
> -
>  struct s_data {
>  	struct sched_domain ** __percpu sd;
>  	struct root_domain	*rd;
> @@ -5626,21 +5615,6 @@ enum s_alloc {
>  	sa_none,
>  };
> 
> -struct sched_domain_topology_level;
> -
> -typedef struct sched_domain *(*sched_domain_init_f)(struct sched_domain_topology_level *tl, int cpu);
> -typedef const struct cpumask *(*sched_domain_mask_f)(int cpu);
> -
> -#define SDTL_OVERLAP	0x01
> -
> -struct sched_domain_topology_level {
> -	sched_domain_init_f init;
> -	sched_domain_mask_f mask;
> -	int		    flags;
> -	int		    numa_level;
> -	struct sd_data      data;
> -};
> -
>  /*
>   * Build an iteration mask that can exclude certain CPUs from the upwards
>   * domain traversal.
> @@ -5869,34 +5843,6 @@ int __weak arch_sd_sibling_asym_packing(void)
>   * Non-inlined to reduce accumulated stack pressure in build_sched_domains()
>   */
> 
> -#ifdef CONFIG_SCHED_DEBUG
> -# define SD_INIT_NAME(sd, type)		sd->name = #type
> -#else
> -# define SD_INIT_NAME(sd, type)		do { } while (0)
> -#endif
> -
> -#define SD_INIT_FUNC(type)						\
> -static noinline struct sched_domain *					\
> -sd_init_##type(struct sched_domain_topology_level *tl, int cpu) 	\
> -{									\
> -	struct sched_domain *sd = *per_cpu_ptr(tl->data.sd, cpu);	\
> -	*sd = SD_##type##_INIT;						\
> -	SD_INIT_NAME(sd, type);						\
> -	sd->private = &tl->data;					\
> -	return sd;							\
> -}
> -
> -SD_INIT_FUNC(CPU)
> -#ifdef CONFIG_SCHED_SMT
> - SD_INIT_FUNC(SIBLING)
> -#endif
> -#ifdef CONFIG_SCHED_MC
> - SD_INIT_FUNC(MC)
> -#endif
> -#ifdef CONFIG_SCHED_BOOK
> - SD_INIT_FUNC(BOOK)
> -#endif
> -
>  static int default_relax_domain_level = -1;
>  int sched_domain_level_max;
> 
> @@ -5984,97 +5930,156 @@ static void claim_allocations(int cpu, struct sched_domain *sd)
>  		*per_cpu_ptr(sdd->sgp, cpu) = NULL;
>  }
> 
> -#ifdef CONFIG_SCHED_SMT
> -static const struct cpumask *cpu_smt_mask(int cpu)
> -{
> -	return topology_thread_cpumask(cpu);
> -}
> -#endif
> -
> -/*
> - * Topology list, bottom-up.
> - */
> -static struct sched_domain_topology_level default_topology[] = {
> -#ifdef CONFIG_SCHED_SMT
> -	{ sd_init_SIBLING, cpu_smt_mask, },
> -#endif
> -#ifdef CONFIG_SCHED_MC
> -	{ sd_init_MC, cpu_coregroup_mask, },
> -#endif
> -#ifdef CONFIG_SCHED_BOOK
> -	{ sd_init_BOOK, cpu_book_mask, },
> -#endif
> -	{ sd_init_CPU, cpu_cpu_mask, },
> -	{ NULL, },
> -};
> -
> -static struct sched_domain_topology_level *sched_domain_topology = default_topology;
> -
> -#define for_each_sd_topology(tl)			\
> -	for (tl = sched_domain_topology; tl->init; tl++)
> -
>  #ifdef CONFIG_NUMA
> -
>  static int sched_domains_numa_levels;
>  static int *sched_domains_numa_distance;
>  static struct cpumask ***sched_domains_numa_masks;
>  static int sched_domains_curr_level;
> +#endif
> 
> -static inline int sd_local_flags(int level)
> -{
> -	if (sched_domains_numa_distance[level] > RECLAIM_DISTANCE)
> -		return 0;
> -
> -	return SD_BALANCE_EXEC | SD_BALANCE_FORK | SD_WAKE_AFFINE;
> -}
> +/*
> + * SD_flags allowed in topology descriptions.
> + *
> + * SD_SHARE_CPUPOWER      - describes SMT topologies
> + * SD_SHARE_PKG_RESOURCES - describes shared caches
> + * SD_NUMA                - describes NUMA topologies
> + *
> + * Odd one out:
> + * SD_ASYM_PACKING        - describes SMT quirks
> + */
> +#define TOPOLOGY_SD_FLAGS		\
> +	(SD_SHARE_CPUPOWER |		\
> +	 SD_SHARE_PKG_RESOURCES |	\
> +	 SD_NUMA |			\
> +	 SD_ASYM_PACKING)
> 
>  static struct sched_domain *
> -sd_numa_init(struct sched_domain_topology_level *tl, int cpu)
> +sd_init(struct sched_domain_topology_level *tl, int cpu)
>  {
>  	struct sched_domain *sd = *per_cpu_ptr(tl->data.sd, cpu);
> -	int level = tl->numa_level;
> -	int sd_weight = cpumask_weight(
> -			sched_domains_numa_masks[level][cpu_to_node(cpu)]);
> +	int sd_weight, sd_flags = 0;
> +
> +#ifdef CONFIG_NUMA
> +	/*
> +	 * Ugly hack to pass state to sd_numa_mask()...
> +	 */
> +	sched_domains_curr_level = tl->numa_level;
> +#endif
> +
> +	sd_weight = cpumask_weight(tl->mask(cpu));
> +
> +	if (tl->sd_flags)
> +		sd_flags = (*tl->sd_flags)();
> +	if (WARN_ONCE(sd_flags & ~TOPOLOGY_SD_FLAGS,
> +			"wrong sd_flags in topology description\n"))
> +		sd_flags &= ~TOPOLOGY_SD_FLAGS;
> 
>  	*sd = (struct sched_domain){
>  		.min_interval		= sd_weight,
>  		.max_interval		= 2*sd_weight,
>  		.busy_factor		= 32,
>  		.imbalance_pct		= 125,
> -		.cache_nice_tries	= 2,
> -		.busy_idx		= 3,
> -		.idle_idx		= 2,
> +
> +		.cache_nice_tries	= 0,
> +		.busy_idx		= 0,
> +		.idle_idx		= 0,
>  		.newidle_idx		= 0,
>  		.wake_idx		= 0,
>  		.forkexec_idx		= 0,
> 
>  		.flags			= 1*SD_LOAD_BALANCE
>  					| 1*SD_BALANCE_NEWIDLE
> -					| 0*SD_BALANCE_EXEC
> -					| 0*SD_BALANCE_FORK
> +					| 1*SD_BALANCE_EXEC
> +					| 1*SD_BALANCE_FORK
>  					| 0*SD_BALANCE_WAKE
> -					| 0*SD_WAKE_AFFINE
> +					| 1*SD_WAKE_AFFINE
>  					| 0*SD_SHARE_CPUPOWER
>  					| 0*SD_SHARE_PKG_RESOURCES
> -					| 1*SD_SERIALIZE
> +					| 0*SD_SERIALIZE
>  					| 0*SD_PREFER_SIBLING
> -					| 1*SD_NUMA
> -					| sd_local_flags(level)
> +					| 0*SD_NUMA
> +					| sd_flags
>  					,
> +
>  		.last_balance		= jiffies,
>  		.balance_interval	= sd_weight,
> +		.smt_gain		= 0,
> +		.max_newidle_lb_cost	= 0,
> +		.next_decay_max_lb_cost	= jiffies,
> +#ifdef CONFIG_SCHED_DEBUG
> +		.name			= tl->name,
> +#endif
>  	};
> -	SD_INIT_NAME(sd, NUMA);
> -	sd->private = &tl->data;
> 
>  	/*
> -	 * Ugly hack to pass state to sd_numa_mask()...
> +	 * Convert topological properties into behaviour.
>  	 */
> -	sched_domains_curr_level = tl->numa_level;
> +
> +	if (sd->flags & SD_SHARE_CPUPOWER) {
> +		sd->imbalance_pct = 110;
> +		sd->smt_gain = 1178; /* ~15% */
> +		sd->flags |= arch_sd_sibling_asym_packing();
> +
> +	} else if (sd->flags & SD_SHARE_PKG_RESOURCES) {
> +		sd->imbalance_pct = 117;
> +		sd->cache_nice_tries = 1;
> +		sd->busy_idx = 2;
> +
> +#ifdef CONFIG_NUMA
> +	} else if (sd->flags & SD_NUMA) {
> +		sd->cache_nice_tries = 2;
> +		sd->busy_idx = 3;
> +		sd->idle_idx = 2;
> +
> +		sd->flags |= SD_SERIALIZE;
> +		if (sched_domains_numa_distance[tl->numa_level] > RECLAIM_DISTANCE) {
> +			sd->flags &= ~(SD_BALANCE_EXEC |
> +				       SD_BALANCE_FORK |
> +				       SD_WAKE_AFFINE);
> +		}
> +
> +#endif
> +	} else {
> +		sd->flags |= SD_PREFER_SIBLING;
> +		sd->cache_nice_tries = 1;
> +		sd->busy_idx = 2;
> +		sd->idle_idx = 1;
> +	}
> +
> +	sd->private = &tl->data;
> 
>  	return sd;
>  }
> 
> +/*
> + * Topology list, bottom-up.
> + */
> +static struct sched_domain_topology_level default_topology[] = {
> +#ifdef CONFIG_SCHED_SMT
> +	{ cpu_smt_mask, cpu_smt_flags, SD_INIT_NAME(SMT) },
> +#endif
> +#ifdef CONFIG_SCHED_MC
> +	{ cpu_coregroup_mask, cpu_core_flags, SD_INIT_NAME(MC) },
> +#endif
> +#ifdef CONFIG_SCHED_BOOK
> +	{ cpu_book_mask, SD_INIT_NAME(BOOK) },
> +#endif
> +	{ cpu_cpu_mask, SD_INIT_NAME(DIE) },
> +	{ NULL, },
> +};
> +
> +struct sched_domain_topology_level *sched_domain_topology = default_topology;
> +
> +#define for_each_sd_topology(tl)			\
> +	for (tl = sched_domain_topology; tl->mask; tl++)
> +
> +void set_sched_topology(struct sched_domain_topology_level *tl)
> +{
> +	sched_domain_topology = tl;
> +}
> +
> +#ifdef CONFIG_NUMA
> +
>  static const struct cpumask *sd_numa_mask(int cpu)
>  {
>  	return sched_domains_numa_masks[sched_domains_curr_level][cpu_to_node(cpu)];
> @@ -6218,7 +6223,10 @@ static void sched_init_numa(void)
>  		}
>  	}
> 
> -	tl = kzalloc((ARRAY_SIZE(default_topology) + level) *
> +	/* Compute default topology size */
> +	for (i = 0; sched_domain_topology[i].mask; i++);
> +
> +	tl = kzalloc((i + level) *
>  			sizeof(struct sched_domain_topology_level), GFP_KERNEL);
>  	if (!tl)
>  		return;
> @@ -6226,18 +6234,19 @@ static void sched_init_numa(void)
>  	/*
>  	 * Copy the default topology bits..
>  	 */
> -	for (i = 0; default_topology[i].init; i++)
> -		tl[i] = default_topology[i];
> +	for (i = 0; sched_domain_topology[i].mask; i++)
> +		tl[i] = sched_domain_topology[i];
> 
>  	/*
>  	 * .. and append 'j' levels of NUMA goodness.
>  	 */
>  	for (j = 0; j < level; i++, j++) {
>  		tl[i] = (struct sched_domain_topology_level){
> -			.init = sd_numa_init,
>  			.mask = sd_numa_mask,
> +			.sd_flags = SD_NUMA,
>  			.flags = SDTL_OVERLAP,
>  			.numa_level = j,
> +			SD_INIT_NAME(NUMA)
>  		};
>  	}
> 
> @@ -6395,7 +6404,7 @@ struct sched_domain *build_sched_domain(struct sched_domain_topology_level *tl,
>  		const struct cpumask *cpu_map, struct sched_domain_attr *attr,
>  		struct sched_domain *child, int cpu)
>  {
> -	struct sched_domain *sd = tl->init(tl, cpu);
> +	struct sched_domain *sd = sd_init(tl, cpu);
>  	if (!sd)
>  		return child;
> 
Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v2 4/7] sched: powerpc: create a dedicated topology table
  2014-03-18 17:56   ` Vincent Guittot
@ 2014-03-19  6:04     ` Preeti U Murthy
  -1 siblings, 0 replies; 55+ messages in thread
From: Preeti U Murthy @ 2014-03-19  6:04 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: peterz, mingo, linux-kernel, dietmar.eggemann, tony.luck,
	fenghua.yu, schwidefsky, james.hogan, cmetcalf, benh, linux,
	linux-arm-kernel, linaro-kernel

On 03/18/2014 11:26 PM, Vincent Guittot wrote:
> Create a dedicated topology table for handling the asymmetric feature of powerpc.
> 
> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> ---
>  arch/powerpc/kernel/smp.c | 31 +++++++++++++++++++++++--------
>  include/linux/sched.h     |  2 --
>  kernel/sched/core.c       |  6 ------
>  3 files changed, 23 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> index ac2621a..c9cade5 100644
> --- a/arch/powerpc/kernel/smp.c
> +++ b/arch/powerpc/kernel/smp.c
> @@ -755,6 +755,28 @@ int setup_profiling_timer(unsigned int multiplier)
>  	return 0;
>  }
> 
> +#ifdef CONFIG_SCHED_SMT
> +/* cpumask of CPUs with asymetric SMT dependancy */
> +static const int powerpc_smt_flags(void)
> +{
> +	int flags = SD_SHARE_CPUPOWER | SD_SHARE_PKG_RESOURCES;
> +
> +	if (cpu_has_feature(CPU_FTR_ASYM_SMT)) {
> +		printk_once(KERN_INFO "Enabling Asymmetric SMT scheduling\n");
> +		flags |= SD_ASYM_PACKING;
> +	}
> +	return flags;
> +}
> +#endif
> +
> +static struct sched_domain_topology_level powerpc_topology[] = {
> +#ifdef CONFIG_SCHED_SMT
> +	{ cpu_smt_mask, powerpc_smt_flags, SD_INIT_NAME(SMT) },
> +#endif
> +	{ cpu_cpu_mask, SD_INIT_NAME(DIE) },
> +	{ NULL, },
> +};
> +
>  void __init smp_cpus_done(unsigned int max_cpus)
>  {
>  	cpumask_var_t old_mask;
> @@ -779,15 +801,8 @@ void __init smp_cpus_done(unsigned int max_cpus)
> 
>  	dump_numa_cpu_topology();
> 
> -}
> +	set_sched_topology(powerpc_topology);
> 
> -int arch_sd_sibling_asym_packing(void)
> -{
> -	if (cpu_has_feature(CPU_FTR_ASYM_SMT)) {
> -		printk_once(KERN_INFO "Enabling Asymmetric SMT scheduling\n");
> -		return SD_ASYM_PACKING;
> -	}
> -	return 0;
>  }
> 
>  #ifdef CONFIG_HOTPLUG_CPU
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 4db592a..6479de4 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -868,8 +868,6 @@ enum cpu_idle_type {
>  #define SD_OVERLAP		0x2000	/* sched_domains of this level overlap */
>  #define SD_NUMA			0x4000	/* cross-node balancing */
> 
> -extern int __weak arch_sd_sibiling_asym_packing(void);
> -
>  #ifdef CONFIG_SCHED_SMT
>  static inline const int cpu_smt_flags(void)
>  {
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index f2bfa76..0b51ee3 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5833,11 +5833,6 @@ static void init_sched_groups_power(int cpu, struct sched_domain *sd)
>  	atomic_set(&sg->sgp->nr_busy_cpus, sg->group_weight);
>  }
> 
> -int __weak arch_sd_sibling_asym_packing(void)
> -{
> -       return 0*SD_ASYM_PACKING;
> -}
> -
>  /*
>   * Initializers for schedule domains
>   * Non-inlined to reduce accumulated stack pressure in build_sched_domains()
> @@ -6018,7 +6013,6 @@ sd_init(struct sched_domain_topology_level *tl, int cpu)
>  	if (sd->flags & SD_SHARE_CPUPOWER) {
>  		sd->imbalance_pct = 110;
>  		sd->smt_gain = 1178; /* ~15% */
> -		sd->flags |= arch_sd_sibling_asym_packing();
> 
>  	} else if (sd->flags & SD_SHARE_PKG_RESOURCES) {
>  		sd->imbalance_pct = 117;
> 
Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v2 5/7] sched: add a new SD_SHARE_POWERDOMAIN for sched_domain
  2014-03-18 17:56   ` Vincent Guittot
@ 2014-03-19  6:21     ` Preeti U Murthy
  -1 siblings, 0 replies; 55+ messages in thread
From: Preeti U Murthy @ 2014-03-19  6:21 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: peterz, mingo, linux-kernel, dietmar.eggemann, tony.luck,
	fenghua.yu, schwidefsky, james.hogan, cmetcalf, benh, linux,
	linux-arm-kernel, linaro-kernel

Hi Vincent,

On 03/18/2014 11:26 PM, Vincent Guittot wrote:
> A new flag, SD_SHARE_POWERDOMAIN, is created to reflect whether groups of CPUs
> in a sched_domain level can reach different power states or not. As an example,
> the flag should be cleared at the CPU level if groups of cores can be power-gated
> independently. This information can be used to add a load-balancing level between
> groups of CPUs that can power-gate independently. The default behavior of the
> scheduler is to spread tasks across CPUs and groups of CPUs, so the flag is set
> in all sched_domains.

I don't see this flag being set either in sd_init() or in
default_topology[]. Shouldn't the default_topology[] flag-setting
routines set this flag at every sched_domain level along with the other
topology flags, unless the arch wants to override it?

Regards
Preeti U Murthy
> This flag is part of the topology flags that can be set by arch.
> 
> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> ---
>  include/linux/sched.h | 1 +
>  kernel/sched/core.c   | 9 ++++++---
>  2 files changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 6479de4..7048369 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -861,6 +861,7 @@ enum cpu_idle_type {
>  #define SD_BALANCE_WAKE		0x0010  /* Balance on wakeup */
>  #define SD_WAKE_AFFINE		0x0020	/* Wake task to waking CPU */
>  #define SD_SHARE_CPUPOWER	0x0080	/* Domain members share cpu power */
> +#define SD_SHARE_POWERDOMAIN	0x0100	/* Domain members share power domain */
>  #define SD_SHARE_PKG_RESOURCES	0x0200	/* Domain members share cpu pkg resources */
>  #define SD_SERIALIZE		0x0400	/* Only a single load balancing instance */
>  #define SD_ASYM_PACKING		0x0800  /* Place busy groups earlier in the domain */
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 0b51ee3..224ec3b 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5298,7 +5298,8 @@ static int sd_degenerate(struct sched_domain *sd)
>  			 SD_BALANCE_FORK |
>  			 SD_BALANCE_EXEC |
>  			 SD_SHARE_CPUPOWER |
> -			 SD_SHARE_PKG_RESOURCES)) {
> +			 SD_SHARE_PKG_RESOURCES |
> +			 SD_SHARE_POWERDOMAIN)) {
>  		if (sd->groups != sd->groups->next)
>  			return 0;
>  	}
> @@ -5329,7 +5330,8 @@ sd_parent_degenerate(struct sched_domain *sd, struct sched_domain *parent)
>  				SD_BALANCE_EXEC |
>  				SD_SHARE_CPUPOWER |
>  				SD_SHARE_PKG_RESOURCES |
> -				SD_PREFER_SIBLING);
> +				SD_PREFER_SIBLING |
> +				SD_SHARE_POWERDOMAIN);
>  		if (nr_node_ids == 1)
>  			pflags &= ~SD_SERIALIZE;
>  	}
> @@ -5946,7 +5948,8 @@ static int sched_domains_curr_level;
>  	(SD_SHARE_CPUPOWER |		\
>  	 SD_SHARE_PKG_RESOURCES |	\
>  	 SD_NUMA |			\
> -	 SD_ASYM_PACKING)
> +	 SD_ASYM_PACKING |		\
> +	 SD_SHARE_POWERDOMAIN)
> 
>  static struct sched_domain *
>  sd_init(struct sched_domain_topology_level *tl, int cpu)
> 


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v2 5/7] sched: add a new SD_SHARE_POWERDOMAIN for sched_domain
  2014-03-19  6:21     ` Preeti U Murthy
@ 2014-03-19  9:52       ` Vincent Guittot
  -1 siblings, 0 replies; 55+ messages in thread
From: Vincent Guittot @ 2014-03-19  9:52 UTC (permalink / raw)
  To: Preeti U Murthy
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Dietmar Eggemann,
	tony.luck, fenghua.yu, schwidefsky, james.hogan, cmetcalf,
	Benjamin Herrenschmidt, Russell King - ARM Linux, LAK,
	linaro-kernel

On 19 March 2014 07:21, Preeti U Murthy <preeti@linux.vnet.ibm.com> wrote:
> Hi Vincent,
>
> On 03/18/2014 11:26 PM, Vincent Guittot wrote:
>> A new flag SD_SHARE_POWERDOMAIN is created to reflect whether groups of CPUs
>> in a sched_domain level can or not reach different power state. As an example,
>> the flag should be cleared at CPU level if groups of cores can be power gated
>> independently. This information can be used to add load balancing level between
>> group of CPUs than can power gate independantly. The default behavior of the
>> scheduler is to spread tasks across CPUs and groups of CPUs so the flag is set
>> into all sched_domains.
>
> I don't see this flag being set either in sd_init() or in
> default_topology[]. Should not the default_topology[] flag setting
> routines set this flag at every level of sched domain along with other
> topology flags, unless the arch wants to override it?

Hi Preeti

I have chosen not to add it to the default table for the moment because
the scheduler's behavior is unchanged. It will be added with the
patchset that takes advantage of this flag in the load-balance
decision.

Regards,
Vincent

>
> Regards
> Preeti U Murthy
>> This flag is part of the topology flags that can be set by arch.
>>
>> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
>> ---
>>  include/linux/sched.h | 1 +
>>  kernel/sched/core.c   | 9 ++++++---
>>  2 files changed, 7 insertions(+), 3 deletions(-)
>>
>> diff --git a/include/linux/sched.h b/include/linux/sched.h
>> index 6479de4..7048369 100644
>> --- a/include/linux/sched.h
>> +++ b/include/linux/sched.h
>> @@ -861,6 +861,7 @@ enum cpu_idle_type {
>>  #define SD_BALANCE_WAKE              0x0010  /* Balance on wakeup */
>>  #define SD_WAKE_AFFINE               0x0020  /* Wake task to waking CPU */
>>  #define SD_SHARE_CPUPOWER    0x0080  /* Domain members share cpu power */
>> +#define SD_SHARE_POWERDOMAIN 0x0100  /* Domain members share power domain */
>>  #define SD_SHARE_PKG_RESOURCES       0x0200  /* Domain members share cpu pkg resources */
>>  #define SD_SERIALIZE         0x0400  /* Only a single load balancing instance */
>>  #define SD_ASYM_PACKING              0x0800  /* Place busy groups earlier in the domain */
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 0b51ee3..224ec3b 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -5298,7 +5298,8 @@ static int sd_degenerate(struct sched_domain *sd)
>>                        SD_BALANCE_FORK |
>>                        SD_BALANCE_EXEC |
>>                        SD_SHARE_CPUPOWER |
>> -                      SD_SHARE_PKG_RESOURCES)) {
>> +                      SD_SHARE_PKG_RESOURCES |
>> +                      SD_SHARE_POWERDOMAIN)) {
>>               if (sd->groups != sd->groups->next)
>>                       return 0;
>>       }
>> @@ -5329,7 +5330,8 @@ sd_parent_degenerate(struct sched_domain *sd, struct sched_domain *parent)
>>                               SD_BALANCE_EXEC |
>>                               SD_SHARE_CPUPOWER |
>>                               SD_SHARE_PKG_RESOURCES |
>> -                             SD_PREFER_SIBLING);
>> +                             SD_PREFER_SIBLING |
>> +                             SD_SHARE_POWERDOMAIN);
>>               if (nr_node_ids == 1)
>>                       pflags &= ~SD_SERIALIZE;
>>       }
>> @@ -5946,7 +5948,8 @@ static int sched_domains_curr_level;
>>       (SD_SHARE_CPUPOWER |            \
>>        SD_SHARE_PKG_RESOURCES |       \
>>        SD_NUMA |                      \
>> -      SD_ASYM_PACKING)
>> +      SD_ASYM_PACKING |              \
>> +      SD_SHARE_POWERDOMAIN)
>>
>>  static struct sched_domain *
>>  sd_init(struct sched_domain_topology_level *tl, int cpu)
>>
>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v2 5/7] sched: add a new SD_SHARE_POWERDOMAIN for sched_domain
  2014-03-19  9:52       ` Vincent Guittot
@ 2014-03-19 11:05         ` Preeti U Murthy
  -1 siblings, 0 replies; 55+ messages in thread
From: Preeti U Murthy @ 2014-03-19 11:05 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Dietmar Eggemann,
	tony.luck, fenghua.yu, schwidefsky, james.hogan, cmetcalf,
	Benjamin Herrenschmidt, Russell King - ARM Linux, LAK,
	linaro-kernel

On 03/19/2014 03:22 PM, Vincent Guittot wrote:
> On 19 March 2014 07:21, Preeti U Murthy <preeti@linux.vnet.ibm.com> wrote:
>> Hi Vincent,
>>
>> On 03/18/2014 11:26 PM, Vincent Guittot wrote:
>>> A new flag, SD_SHARE_POWERDOMAIN, is created to reflect whether groups of CPUs
>>> in a sched_domain level can reach different power states or not. As an example,
>>> the flag should be cleared at CPU level if groups of cores can be power gated
>>> independently. This information can be used to add a load balancing level
>>> between groups of CPUs that can power gate independently. The default behavior
>>> of the scheduler is to spread tasks across CPUs and groups of CPUs, so the flag
>>> is set in all sched_domains.
>>
>> I don't see this flag being set either in sd_init() or in
>> default_topology[]. Should not the default_topology[] flag setting
>> routines set this flag at every level of sched domain along with other
>> topology flags, unless the arch wants to override it?
> 
> Hi Preeti
> 
> I have chosen not to add it to the default table for the moment because
> the scheduler behavior is not changed. It will be added with the
> patchset that takes advantage of this flag in the load balance
> decision.

Ok, if you are looking at setting this flag in the default topology table,
then [patch 7/7]: sched: powerpc: Add SD_SHARE_POWERDOMAIN for SMT level
looks good to me. Please add my Reviewed-by to this patch.

However, if you are looking at initializing this flag as set by default
in sd_init(), then the archs will have to clear the flag, rather than
set it, in their respective topology tables for the sched domains whose
groups are power gated. In which case [patch 7/7] would be incorrect.
   But wait, I see that you mention that the topology level flags are
left to the archs to set if required. So I am assuming you will not set
the SD_SHARE_POWERDOMAIN flag in sd_init(), right?

Regards
Preeti U Murthy
> 
> Regards,
> Vincent
> 
>>
>> Regards
>> Preeti U Murthy
>>> This flag is part of the topology flags that can be set by arch.
>>>
>>> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
>>> ---
>>>  include/linux/sched.h | 1 +
>>>  kernel/sched/core.c   | 9 ++++++---
>>>  2 files changed, 7 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/include/linux/sched.h b/include/linux/sched.h
>>> index 6479de4..7048369 100644
>>> --- a/include/linux/sched.h
>>> +++ b/include/linux/sched.h
>>> @@ -861,6 +861,7 @@ enum cpu_idle_type {
>>>  #define SD_BALANCE_WAKE              0x0010  /* Balance on wakeup */
>>>  #define SD_WAKE_AFFINE               0x0020  /* Wake task to waking CPU */
>>>  #define SD_SHARE_CPUPOWER    0x0080  /* Domain members share cpu power */
>>> +#define SD_SHARE_POWERDOMAIN 0x0100  /* Domain members share power domain */
>>>  #define SD_SHARE_PKG_RESOURCES       0x0200  /* Domain members share cpu pkg resources */
>>>  #define SD_SERIALIZE         0x0400  /* Only a single load balancing instance */
>>>  #define SD_ASYM_PACKING              0x0800  /* Place busy groups earlier in the domain */
>>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>>> index 0b51ee3..224ec3b 100644
>>> --- a/kernel/sched/core.c
>>> +++ b/kernel/sched/core.c
>>> @@ -5298,7 +5298,8 @@ static int sd_degenerate(struct sched_domain *sd)
>>>                        SD_BALANCE_FORK |
>>>                        SD_BALANCE_EXEC |
>>>                        SD_SHARE_CPUPOWER |
>>> -                      SD_SHARE_PKG_RESOURCES)) {
>>> +                      SD_SHARE_PKG_RESOURCES |
>>> +                      SD_SHARE_POWERDOMAIN)) {
>>>               if (sd->groups != sd->groups->next)
>>>                       return 0;
>>>       }
>>> @@ -5329,7 +5330,8 @@ sd_parent_degenerate(struct sched_domain *sd, struct sched_domain *parent)
>>>                               SD_BALANCE_EXEC |
>>>                               SD_SHARE_CPUPOWER |
>>>                               SD_SHARE_PKG_RESOURCES |
>>> -                             SD_PREFER_SIBLING);
>>> +                             SD_PREFER_SIBLING |
>>> +                             SD_SHARE_POWERDOMAIN);
>>>               if (nr_node_ids == 1)
>>>                       pflags &= ~SD_SERIALIZE;
>>>       }
>>> @@ -5946,7 +5948,8 @@ static int sched_domains_curr_level;
>>>       (SD_SHARE_CPUPOWER |            \
>>>        SD_SHARE_PKG_RESOURCES |       \
>>>        SD_NUMA |                      \
>>> -      SD_ASYM_PACKING)
>>> +      SD_ASYM_PACKING |              \
>>> +      SD_SHARE_POWERDOMAIN)
>>>
>>>  static struct sched_domain *
>>>  sd_init(struct sched_domain_topology_level *tl, int cpu)
>>>
>>
> 


^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v2 1/7] sched: remove unused SCHED_INIT_NODE
  2014-03-18 17:56   ` Vincent Guittot
  (?)
@ 2014-03-19 11:07     ` James Hogan
  -1 siblings, 0 replies; 55+ messages in thread
From: James Hogan @ 2014-03-19 11:07 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: peterz, mingo, linux-kernel, dietmar.eggemann, preeti, tony.luck,
	fenghua.yu, schwidefsky, cmetcalf, benh, linux, linux-arm-kernel,
	linaro-kernel, linux-metag

[-- Attachment #1: Type: text/plain, Size: 1531 bytes --]

On 18/03/14 17:56, Vincent Guittot wrote:
> Not used since the new NUMA scheduler init sequence.
> 
> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>

Applied to metag tree for v3.15,

Thanks for spotting it.

Cheers
James

> ---
>  arch/metag/include/asm/topology.h | 27 ---------------------------
>  1 file changed, 27 deletions(-)
> 
> diff --git a/arch/metag/include/asm/topology.h b/arch/metag/include/asm/topology.h
> index 8e9c0b3..e95f874 100644
> --- a/arch/metag/include/asm/topology.h
> +++ b/arch/metag/include/asm/topology.h
> @@ -3,33 +3,6 @@
>  
>  #ifdef CONFIG_NUMA
>  
> -/* sched_domains SD_NODE_INIT for Meta machines */
> -#define SD_NODE_INIT (struct sched_domain) {		\
> -	.parent			= NULL,			\
> -	.child			= NULL,			\
> -	.groups			= NULL,			\
> -	.min_interval		= 8,			\
> -	.max_interval		= 32,			\
> -	.busy_factor		= 32,			\
> -	.imbalance_pct		= 125,			\
> -	.cache_nice_tries	= 2,			\
> -	.busy_idx		= 3,			\
> -	.idle_idx		= 2,			\
> -	.newidle_idx		= 0,			\
> -	.wake_idx		= 0,			\
> -	.forkexec_idx		= 0,			\
> -	.flags			= SD_LOAD_BALANCE	\
> -				| SD_BALANCE_FORK	\
> -				| SD_BALANCE_EXEC	\
> -				| SD_BALANCE_NEWIDLE	\
> -				| SD_SERIALIZE,		\
> -	.last_balance		= jiffies,		\
> -	.balance_interval	= 1,			\
> -	.nr_balance_failed	= 0,			\
> -	.max_newidle_lb_cost	= 0,			\
> -	.next_decay_max_lb_cost	= jiffies,		\
> -}
> -
>  #define cpu_to_node(cpu)	((void)(cpu), 0)
>  #define parent_node(node)	((void)(node), 0)
>  
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v2 2/7] sched: rework of sched_domain topology definition
  2014-03-18 17:56   ` Vincent Guittot
@ 2014-03-19 11:27     ` Dietmar Eggemann
  -1 siblings, 0 replies; 55+ messages in thread
From: Dietmar Eggemann @ 2014-03-19 11:27 UTC (permalink / raw)
  To: Vincent Guittot, peterz, mingo, linux-kernel, preeti, tony.luck,
	fenghua.yu, schwidefsky, james.hogan, cmetcalf, benh, linux,
	linux-arm-kernel
  Cc: linaro-kernel

On 18/03/14 17:56, Vincent Guittot wrote:
> We replace the old way to configure the scheduler topology with a new method
> which enables a platform to declare additional levels (if needed).
> 
> We still have a default topology table definition that can be used by platforms
> that don't want more levels than the SMT, MC, CPU and NUMA ones. This table can
> be overridden by an arch which wants to add new levels where load balancing
> makes sense, like BOOK or power gating levels.
> 
> For each level, we need a function pointer that returns the cpumask for each
> cpu, a function pointer that returns the flags for the level, and a name. Only
> flags that describe the topology can be set by an architecture. The current
> topology flags are:
>  SD_SHARE_CPUPOWER
>  SD_SHARE_PKG_RESOURCES
>  SD_NUMA
>  SD_ASYM_PACKING
> 
> Then, each level must be a subset of the next one. The build sequence of the
> sched_domain will take care of removing useless levels, like those with only
> 1 CPU and those with the same CPU span and no more relevant information for
> load balancing than their child.
> 
> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> ---
>  arch/ia64/include/asm/topology.h |  24 ----
>  arch/s390/include/asm/topology.h |   2 -
>  arch/tile/include/asm/topology.h |  33 ------
>  include/linux/sched.h            |  48 ++++++++
>  include/linux/topology.h         | 128 +++------------------
>  kernel/sched/core.c              | 235 ++++++++++++++++++++-------------------
>  6 files changed, 183 insertions(+), 287 deletions(-)
> 
> diff --git a/arch/ia64/include/asm/topology.h b/arch/ia64/include/asm/topology.h
> index 5cb55a1..3202aa7 100644
> --- a/arch/ia64/include/asm/topology.h
> +++ b/arch/ia64/include/asm/topology.h
> @@ -46,30 +46,6 @@
> 
>  void build_cpu_to_node_map(void);
> 
> -#define SD_CPU_INIT (struct sched_domain) {            \
> -       .parent                 = NULL,                 \
> -       .child                  = NULL,                 \
> -       .groups                 = NULL,                 \
> -       .min_interval           = 1,                    \
> -       .max_interval           = 4,                    \
> -       .busy_factor            = 64,                   \
> -       .imbalance_pct          = 125,                  \
> -       .cache_nice_tries       = 2,                    \
> -       .busy_idx               = 2,                    \
> -       .idle_idx               = 1,                    \
> -       .newidle_idx            = 0,                    \
> -       .wake_idx               = 0,                    \
> -       .forkexec_idx           = 0,                    \
> -       .flags                  = SD_LOAD_BALANCE       \
> -                               | SD_BALANCE_NEWIDLE    \
> -                               | SD_BALANCE_EXEC       \
> -                               | SD_BALANCE_FORK       \
> -                               | SD_WAKE_AFFINE,       \
> -       .last_balance           = jiffies,              \
> -       .balance_interval       = 1,                    \
> -       .nr_balance_failed      = 0,                    \
> -}
> -
>  #endif /* CONFIG_NUMA */
> 
>  #ifdef CONFIG_SMP
> diff --git a/arch/s390/include/asm/topology.h b/arch/s390/include/asm/topology.h
> index 05425b1..07763bd 100644
> --- a/arch/s390/include/asm/topology.h
> +++ b/arch/s390/include/asm/topology.h
> @@ -64,8 +64,6 @@ static inline void s390_init_cpu_topology(void)
>  };
>  #endif
> 
> -#define SD_BOOK_INIT   SD_CPU_INIT
> -
>  #include <asm-generic/topology.h>
> 
>  #endif /* _ASM_S390_TOPOLOGY_H */
> diff --git a/arch/tile/include/asm/topology.h b/arch/tile/include/asm/topology.h
> index d15c0d8..9383118 100644
> --- a/arch/tile/include/asm/topology.h
> +++ b/arch/tile/include/asm/topology.h
> @@ -44,39 +44,6 @@ static inline const struct cpumask *cpumask_of_node(int node)
>  /* For now, use numa node -1 for global allocation. */
>  #define pcibus_to_node(bus)            ((void)(bus), -1)
> 
> -/*
> - * TILE architecture has many cores integrated in one processor, so we need
> - * setup bigger balance_interval for both CPU/NODE scheduling domains to
> - * reduce process scheduling costs.
> - */
> -
> -/* sched_domains SD_CPU_INIT for TILE architecture */
> -#define SD_CPU_INIT (struct sched_domain) {                            \
> -       .min_interval           = 4,                                    \
> -       .max_interval           = 128,                                  \
> -       .busy_factor            = 64,                                   \
> -       .imbalance_pct          = 125,                                  \
> -       .cache_nice_tries       = 1,                                    \
> -       .busy_idx               = 2,                                    \
> -       .idle_idx               = 1,                                    \
> -       .newidle_idx            = 0,                                    \
> -       .wake_idx               = 0,                                    \
> -       .forkexec_idx           = 0,                                    \
> -                                                                       \
> -       .flags                  = 1*SD_LOAD_BALANCE                     \
> -                               | 1*SD_BALANCE_NEWIDLE                  \
> -                               | 1*SD_BALANCE_EXEC                     \
> -                               | 1*SD_BALANCE_FORK                     \
> -                               | 0*SD_BALANCE_WAKE                     \
> -                               | 0*SD_WAKE_AFFINE                      \
> -                               | 0*SD_SHARE_CPUPOWER                   \
> -                               | 0*SD_SHARE_PKG_RESOURCES              \
> -                               | 0*SD_SERIALIZE                        \
> -                               ,                                       \
> -       .last_balance           = jiffies,                              \
> -       .balance_interval       = 32,                                   \
> -}
> -
>  /* By definition, we create nodes based on online memory. */
>  #define node_has_online_mem(nid) 1
> 
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 825ed83..4db592a 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -870,6 +870,20 @@ enum cpu_idle_type {
> 
>  extern int __weak arch_sd_sibiling_asym_packing(void);
> 
> +#ifdef CONFIG_SCHED_SMT
> +static inline const int cpu_smt_flags(void)
> +{
> +       return SD_SHARE_CPUPOWER | SD_SHARE_PKG_RESOURCES;
> +}
> +#endif
> +
> +#ifdef CONFIG_SCHED_MC
> +static inline const int cpu_core_flags(void)
> +{
> +       return SD_SHARE_PKG_RESOURCES;
> +}
> +#endif
> +
>  struct sched_domain_attr {
>         int relax_domain_level;
>  };
> @@ -976,6 +990,38 @@ void free_sched_domains(cpumask_var_t doms[], unsigned int ndoms);
> 
>  bool cpus_share_cache(int this_cpu, int that_cpu);
> 
> +typedef const struct cpumask *(*sched_domain_mask_f)(int cpu);
> +typedef const int (*sched_domain_flags_f)(void);
> +
> +#define SDTL_OVERLAP   0x01
> +
> +struct sd_data {
> +       struct sched_domain **__percpu sd;
> +       struct sched_group **__percpu sg;
> +       struct sched_group_power **__percpu sgp;
> +};
> +
> +struct sched_domain_topology_level {
> +       sched_domain_mask_f mask;
> +       sched_domain_flags_f sd_flags;
> +       int                 flags;
> +       int                 numa_level;
> +       struct sd_data      data;
> +#ifdef CONFIG_SCHED_DEBUG
> +       char                *name;
> +#endif
> +};
> +
> +extern struct sched_domain_topology_level *sched_domain_topology;
> +
> +extern void set_sched_topology(struct sched_domain_topology_level *tl);
> +
> +#ifdef CONFIG_SCHED_DEBUG
> +# define SD_INIT_NAME(type)            .name = #type
> +#else
> +# define SD_INIT_NAME(type)
> +#endif
> +
>  #else /* CONFIG_SMP */
> 
>  struct sched_domain_attr;
> @@ -991,6 +1037,8 @@ static inline bool cpus_share_cache(int this_cpu, int that_cpu)
>         return true;
>  }
> 
> +static inline void set_sched_topology(struct sched_domain_topology_level *tl) { }
> +
>  #endif /* !CONFIG_SMP */
> 
> 
> diff --git a/include/linux/topology.h b/include/linux/topology.h
> index 12ae6ce..3a9db05 100644
> --- a/include/linux/topology.h
> +++ b/include/linux/topology.h
> @@ -66,121 +66,6 @@ int arch_update_cpu_topology(void);
>  #define PENALTY_FOR_NODE_WITH_CPUS     (1)
>  #endif
> 
> -/*
> - * Below are the 3 major initializers used in building sched_domains:
> - * SD_SIBLING_INIT, for SMT domains
> - * SD_CPU_INIT, for SMP domains
> - *
> - * Any architecture that cares to do any tuning to these values should do so
> - * by defining their own arch-specific initializer in include/asm/topology.h.
> - * A definition there will automagically override these default initializers
> - * and allow arch-specific performance tuning of sched_domains.
> - * (Only non-zero and non-null fields need be specified.)
> - */
> -
> -#ifdef CONFIG_SCHED_SMT
> -/* MCD - Do we really need this?  It is always on if CONFIG_SCHED_SMT is,
> - * so can't we drop this in favor of CONFIG_SCHED_SMT?
> - */
> -#define ARCH_HAS_SCHED_WAKE_IDLE
> -/* Common values for SMT siblings */
> -#ifndef SD_SIBLING_INIT
> -#define SD_SIBLING_INIT (struct sched_domain) {                                \
> -       .min_interval           = 1,                                    \
> -       .max_interval           = 2,                                    \
> -       .busy_factor            = 64,                                   \
> -       .imbalance_pct          = 110,                                  \
> -                                                                       \
> -       .flags                  = 1*SD_LOAD_BALANCE                     \
> -                               | 1*SD_BALANCE_NEWIDLE                  \
> -                               | 1*SD_BALANCE_EXEC                     \
> -                               | 1*SD_BALANCE_FORK                     \
> -                               | 0*SD_BALANCE_WAKE                     \
> -                               | 1*SD_WAKE_AFFINE                      \
> -                               | 1*SD_SHARE_CPUPOWER                   \
> -                               | 1*SD_SHARE_PKG_RESOURCES              \
> -                               | 0*SD_SERIALIZE                        \
> -                               | 0*SD_PREFER_SIBLING                   \
> -                               | arch_sd_sibling_asym_packing()        \
> -                               ,                                       \
> -       .last_balance           = jiffies,                              \
> -       .balance_interval       = 1,                                    \
> -       .smt_gain               = 1178, /* 15% */                       \
> -       .max_newidle_lb_cost    = 0,                                    \
> -       .next_decay_max_lb_cost = jiffies,                              \
> -}
> -#endif
> -#endif /* CONFIG_SCHED_SMT */
> -
> -#ifdef CONFIG_SCHED_MC
> -/* Common values for MC siblings. for now mostly derived from SD_CPU_INIT */
> -#ifndef SD_MC_INIT
> -#define SD_MC_INIT (struct sched_domain) {                             \
> -       .min_interval           = 1,                                    \
> -       .max_interval           = 4,                                    \
> -       .busy_factor            = 64,                                   \
> -       .imbalance_pct          = 125,                                  \
> -       .cache_nice_tries       = 1,                                    \
> -       .busy_idx               = 2,                                    \
> -       .wake_idx               = 0,                                    \
> -       .forkexec_idx           = 0,                                    \
> -                                                                       \
> -       .flags                  = 1*SD_LOAD_BALANCE                     \
> -                               | 1*SD_BALANCE_NEWIDLE                  \
> -                               | 1*SD_BALANCE_EXEC                     \
> -                               | 1*SD_BALANCE_FORK                     \
> -                               | 0*SD_BALANCE_WAKE                     \
> -                               | 1*SD_WAKE_AFFINE                      \
> -                               | 0*SD_SHARE_CPUPOWER                   \
> -                               | 1*SD_SHARE_PKG_RESOURCES              \
> -                               | 0*SD_SERIALIZE                        \
> -                               ,                                       \
> -       .last_balance           = jiffies,                              \
> -       .balance_interval       = 1,                                    \
> -       .max_newidle_lb_cost    = 0,                                    \
> -       .next_decay_max_lb_cost = jiffies,                              \
> -}
> -#endif
> -#endif /* CONFIG_SCHED_MC */
> -
> -/* Common values for CPUs */
> -#ifndef SD_CPU_INIT
> -#define SD_CPU_INIT (struct sched_domain) {                            \
> -       .min_interval           = 1,                                    \
> -       .max_interval           = 4,                                    \
> -       .busy_factor            = 64,                                   \
> -       .imbalance_pct          = 125,                                  \
> -       .cache_nice_tries       = 1,                                    \
> -       .busy_idx               = 2,                                    \
> -       .idle_idx               = 1,                                    \
> -       .newidle_idx            = 0,                                    \
> -       .wake_idx               = 0,                                    \
> -       .forkexec_idx           = 0,                                    \
> -                                                                       \
> -       .flags                  = 1*SD_LOAD_BALANCE                     \
> -                               | 1*SD_BALANCE_NEWIDLE                  \
> -                               | 1*SD_BALANCE_EXEC                     \
> -                               | 1*SD_BALANCE_FORK                     \
> -                               | 0*SD_BALANCE_WAKE                     \
> -                               | 1*SD_WAKE_AFFINE                      \
> -                               | 0*SD_SHARE_CPUPOWER                   \
> -                               | 0*SD_SHARE_PKG_RESOURCES              \
> -                               | 0*SD_SERIALIZE                        \
> -                               | 1*SD_PREFER_SIBLING                   \
> -                               ,                                       \
> -       .last_balance           = jiffies,                              \
> -       .balance_interval       = 1,                                    \
> -       .max_newidle_lb_cost    = 0,                                    \
> -       .next_decay_max_lb_cost = jiffies,                              \
> -}
> -#endif
> -
> -#ifdef CONFIG_SCHED_BOOK
> -#ifndef SD_BOOK_INIT
> -#error Please define an appropriate SD_BOOK_INIT in include/asm/topology.h!!!
> -#endif
> -#endif /* CONFIG_SCHED_BOOK */
> -
>  #ifdef CONFIG_USE_PERCPU_NUMA_NODE_ID
>  DECLARE_PER_CPU(int, numa_node);
> 
> @@ -295,4 +180,17 @@ static inline int cpu_to_mem(int cpu)
>  #define topology_core_cpumask(cpu)             cpumask_of(cpu)
>  #endif
> 
> +#ifdef CONFIG_SCHED_SMT
> +static inline const struct cpumask *cpu_smt_mask(int cpu)
> +{
> +       return topology_thread_cpumask(cpu);
> +}
> +#endif
> +
> +static inline const struct cpumask *cpu_cpu_mask(int cpu)
> +{
> +       return cpumask_of_node(cpu_to_node(cpu));
> +}
> +
> +
>  #endif /* _LINUX_TOPOLOGY_H */
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index ae365aa..3397bcb 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5603,17 +5603,6 @@ static int __init isolated_cpu_setup(char *str)
> 
>  __setup("isolcpus=", isolated_cpu_setup);
> 
> -static const struct cpumask *cpu_cpu_mask(int cpu)
> -{
> -       return cpumask_of_node(cpu_to_node(cpu));
> -}
> -
> -struct sd_data {
> -       struct sched_domain **__percpu sd;
> -       struct sched_group **__percpu sg;
> -       struct sched_group_power **__percpu sgp;
> -};
> -
>  struct s_data {
>         struct sched_domain ** __percpu sd;
>         struct root_domain      *rd;
> @@ -5626,21 +5615,6 @@ enum s_alloc {
>         sa_none,
>  };
> 
> -struct sched_domain_topology_level;
> -
> -typedef struct sched_domain *(*sched_domain_init_f)(struct sched_domain_topology_level *tl, int cpu);
> -typedef const struct cpumask *(*sched_domain_mask_f)(int cpu);
> -
> -#define SDTL_OVERLAP   0x01
> -
> -struct sched_domain_topology_level {
> -       sched_domain_init_f init;
> -       sched_domain_mask_f mask;
> -       int                 flags;
> -       int                 numa_level;
> -       struct sd_data      data;
> -};
> -
>  /*
>   * Build an iteration mask that can exclude certain CPUs from the upwards
>   * domain traversal.
> @@ -5869,34 +5843,6 @@ int __weak arch_sd_sibling_asym_packing(void)
>   * Non-inlined to reduce accumulated stack pressure in build_sched_domains()
>   */
> 
> -#ifdef CONFIG_SCHED_DEBUG
> -# define SD_INIT_NAME(sd, type)                sd->name = #type
> -#else
> -# define SD_INIT_NAME(sd, type)                do { } while (0)
> -#endif
> -
> -#define SD_INIT_FUNC(type)                                             \
> -static noinline struct sched_domain *                                  \
> -sd_init_##type(struct sched_domain_topology_level *tl, int cpu)        \
> -{                                                                      \
> -       struct sched_domain *sd = *per_cpu_ptr(tl->data.sd, cpu);       \
> -       *sd = SD_##type##_INIT;                                         \
> -       SD_INIT_NAME(sd, type);                                         \
> -       sd->private = &tl->data;                                        \
> -       return sd;                                                      \
> -}
> -
> -SD_INIT_FUNC(CPU)
> -#ifdef CONFIG_SCHED_SMT
> - SD_INIT_FUNC(SIBLING)
> -#endif
> -#ifdef CONFIG_SCHED_MC
> - SD_INIT_FUNC(MC)
> -#endif
> -#ifdef CONFIG_SCHED_BOOK
> - SD_INIT_FUNC(BOOK)
> -#endif
> -
>  static int default_relax_domain_level = -1;
>  int sched_domain_level_max;
> 
> @@ -5984,97 +5930,156 @@ static void claim_allocations(int cpu, struct sched_domain *sd)
>                 *per_cpu_ptr(sdd->sgp, cpu) = NULL;
>  }
> 
> -#ifdef CONFIG_SCHED_SMT
> -static const struct cpumask *cpu_smt_mask(int cpu)
> -{
> -       return topology_thread_cpumask(cpu);
> -}
> -#endif
> -
> -/*
> - * Topology list, bottom-up.
> - */
> -static struct sched_domain_topology_level default_topology[] = {
> -#ifdef CONFIG_SCHED_SMT
> -       { sd_init_SIBLING, cpu_smt_mask, },
> -#endif
> -#ifdef CONFIG_SCHED_MC
> -       { sd_init_MC, cpu_coregroup_mask, },
> -#endif
> -#ifdef CONFIG_SCHED_BOOK
> -       { sd_init_BOOK, cpu_book_mask, },
> -#endif
> -       { sd_init_CPU, cpu_cpu_mask, },
> -       { NULL, },
> -};
> -
> -static struct sched_domain_topology_level *sched_domain_topology = default_topology;
> -
> -#define for_each_sd_topology(tl)                       \
> -       for (tl = sched_domain_topology; tl->init; tl++)
> -
>  #ifdef CONFIG_NUMA
> -
>  static int sched_domains_numa_levels;
>  static int *sched_domains_numa_distance;
>  static struct cpumask ***sched_domains_numa_masks;
>  static int sched_domains_curr_level;
> +#endif
> 
> -static inline int sd_local_flags(int level)
> -{
> -       if (sched_domains_numa_distance[level] > RECLAIM_DISTANCE)
> -               return 0;
> -
> -       return SD_BALANCE_EXEC | SD_BALANCE_FORK | SD_WAKE_AFFINE;
> -}
> +/*
> + * SD_flags allowed in topology descriptions.
> + *
> + * SD_SHARE_CPUPOWER      - describes SMT topologies
> + * SD_SHARE_PKG_RESOURCES - describes shared caches
> + * SD_NUMA                - describes NUMA topologies
> + *
> + * Odd one out:
> + * SD_ASYM_PACKING        - describes SMT quirks
> + */
> +#define TOPOLOGY_SD_FLAGS              \
> +       (SD_SHARE_CPUPOWER |            \
> +        SD_SHARE_PKG_RESOURCES |       \
> +        SD_NUMA |                      \
> +        SD_ASYM_PACKING)
> 
>  static struct sched_domain *
> -sd_numa_init(struct sched_domain_topology_level *tl, int cpu)
> +sd_init(struct sched_domain_topology_level *tl, int cpu)
>  {
>         struct sched_domain *sd = *per_cpu_ptr(tl->data.sd, cpu);
> -       int level = tl->numa_level;
> -       int sd_weight = cpumask_weight(
> -                       sched_domains_numa_masks[level][cpu_to_node(cpu)]);
> +       int sd_weight, sd_flags = 0;
> +
> +#ifdef CONFIG_NUMA
> +       /*
> +        * Ugly hack to pass state to sd_numa_mask()...
> +        */
> +       sched_domains_curr_level = tl->numa_level;
> +#endif
> +
> +       sd_weight = cpumask_weight(tl->mask(cpu));
> +
> +       if (tl->sd_flags)
> +               sd_flags = (*tl->sd_flags)();
> +       if (WARN_ONCE(sd_flags & ~TOPOLOGY_SD_FLAGS,
> +                       "wrong sd_flags in topology description\n"))
> +               sd_flags &= ~TOPOLOGY_SD_FLAGS;
> 
>         *sd = (struct sched_domain){
>                 .min_interval           = sd_weight,
>                 .max_interval           = 2*sd_weight,
>                 .busy_factor            = 32,
>                 .imbalance_pct          = 125,
> -               .cache_nice_tries       = 2,
> -               .busy_idx               = 3,
> -               .idle_idx               = 2,
> +
> +               .cache_nice_tries       = 0,
> +               .busy_idx               = 0,
> +               .idle_idx               = 0,
>                 .newidle_idx            = 0,
>                 .wake_idx               = 0,
>                 .forkexec_idx           = 0,
> 
>                 .flags                  = 1*SD_LOAD_BALANCE
>                                         | 1*SD_BALANCE_NEWIDLE
> -                                       | 0*SD_BALANCE_EXEC
> -                                       | 0*SD_BALANCE_FORK
> +                                       | 1*SD_BALANCE_EXEC
> +                                       | 1*SD_BALANCE_FORK
>                                         | 0*SD_BALANCE_WAKE
> -                                       | 0*SD_WAKE_AFFINE
> +                                       | 1*SD_WAKE_AFFINE
>                                         | 0*SD_SHARE_CPUPOWER
>                                         | 0*SD_SHARE_PKG_RESOURCES
> -                                       | 1*SD_SERIALIZE
> +                                       | 0*SD_SERIALIZE
>                                         | 0*SD_PREFER_SIBLING
> -                                       | 1*SD_NUMA
> -                                       | sd_local_flags(level)
> +                                       | 0*SD_NUMA
> +                                       | sd_flags
>                                         ,
> +
>                 .last_balance           = jiffies,
>                 .balance_interval       = sd_weight,
> +               .smt_gain               = 0,
> +               .max_newidle_lb_cost    = 0,
> +               .next_decay_max_lb_cost = jiffies,
> +#ifdef CONFIG_SCHED_DEBUG
> +               .name                   = tl->name,
> +#endif
>         };
> -       SD_INIT_NAME(sd, NUMA);
> -       sd->private = &tl->data;
> 
>         /*
> -        * Ugly hack to pass state to sd_numa_mask()...
> +        * Convert topological properties into behaviour.
>          */
> -       sched_domains_curr_level = tl->numa_level;
> +
> +       if (sd->flags & SD_SHARE_CPUPOWER) {
> +               sd->imbalance_pct = 110;
> +               sd->smt_gain = 1178; /* ~15% */
> +               sd->flags |= arch_sd_sibling_asym_packing();
> +
> +       } else if (sd->flags & SD_SHARE_PKG_RESOURCES) {
> +               sd->imbalance_pct = 117;
> +               sd->cache_nice_tries = 1;
> +               sd->busy_idx = 2;
> +
> +#ifdef CONFIG_NUMA
> +       } else if (sd->flags & SD_NUMA) {
> +               sd->cache_nice_tries = 2;
> +               sd->busy_idx = 3;
> +               sd->idle_idx = 2;
> +
> +               sd->flags |= SD_SERIALIZE;
> +               if (sched_domains_numa_distance[tl->numa_level] > RECLAIM_DISTANCE) {
> +                       sd->flags &= ~(SD_BALANCE_EXEC |
> +                                      SD_BALANCE_FORK |
> +                                      SD_WAKE_AFFINE);
> +               }
> +
> +#endif
> +       } else {
> +               sd->flags |= SD_PREFER_SIBLING;
> +               sd->cache_nice_tries = 1;
> +               sd->busy_idx = 2;
> +               sd->idle_idx = 1;
> +       }

This 'if ... else' chain is still a weak point from a robustness
perspective:

On TC2 w/ the following change in cpu_corepower_mask()

 const struct cpumask *cpu_corepower_mask(int cpu)
 {
-       return &cpu_topology[cpu].thread_sibling;
+       return cpu_topology[cpu].socket_id ?
&cpu_topology[cpu].thread_sibling :
+                       &cpu_topology[cpu].core_sibling;
 }

I get a sane set-up:

root@linaro-developer:~# cat /proc/sys/kernel/sched_domain/cpu*/domain*/name
GMC
DIE
GMC
DIE
MC
DIE
MC
DIE
MC
DIE
root@linaro-developer:~# cat /proc/sys/kernel/sched_domain/cpu*/domain*/flags
815
4143
815
4143
559
4143
559
4143
559
4143

w/ 815 (0x32F : SD_LOAD_BALANCE SD_BALANCE_NEWIDLE SD_BALANCE_EXEC
SD_BALANCE_FORK SD_WAKE_AFFINE *SD_SHARE_POWERDOMAIN*
SD_SHARE_PKG_RESOURCES)

w/ 559 (0x22F : SD_LOAD_BALANCE SD_BALANCE_NEWIDLE SD_BALANCE_EXEC
SD_BALANCE_FORK SD_WAKE_AFFINE SD_SHARE_PKG_RESOURCES)

But when I introduce the following error into the arch specific
cpu_corepower_flags() function

 static inline const int cpu_corepower_flags(void)
 {
-       return SD_SHARE_PKG_RESOURCES  | SD_SHARE_POWERDOMAIN;
+       return SD_SHARE_POWERDOMAIN;
 }

the GMC-related sd's for CPU0 and CPU1 are initialized like DIE in
sd_init(), resulting in this wrong set-up w/o any warning/error message:

root@linaro-developer:~# cat /proc/sys/kernel/sched_domain/cpu*/domain*/name
GMC
DIE
GMC
DIE
MC
DIE
MC
DIE
MC
DIE
root@linaro-developer:~# cat /proc/sys/kernel/sched_domain/cpu*/domain*/flags
4399
4143
4399
4143
559
4143
559
4143
559
4143

w/ 4399 (0x112F : SD_LOAD_BALANCE SD_BALANCE_NEWIDLE SD_BALANCE_EXEC
SD_BALANCE_FORK SD_WAKE_AFFINE *SD_SHARE_POWERDOMAIN* SD_PREFER_SIBLING)

Is there a way to enforce that MC and GMC must have
SD_SHARE_PKG_RESOURCES set, so that this can't happen unnoticed?

-- Dietmar

> +
> +       sd->private = &tl->data;
> 
>         return sd;
>  }
> 
> +/*
> + * Topology list, bottom-up.
> + */
> +static struct sched_domain_topology_level default_topology[] = {
> +#ifdef CONFIG_SCHED_SMT
> +       { cpu_smt_mask, cpu_smt_flags, SD_INIT_NAME(SMT) },
> +#endif
> +#ifdef CONFIG_SCHED_MC
> +       { cpu_coregroup_mask, cpu_core_flags, SD_INIT_NAME(MC) },
> +#endif
> +#ifdef CONFIG_SCHED_BOOK
> +       { cpu_book_mask, SD_INIT_NAME(BOOK) },
> +#endif
> +       { cpu_cpu_mask, SD_INIT_NAME(DIE) },
> +       { NULL, },
> +};
> +
> +struct sched_domain_topology_level *sched_domain_topology = default_topology;
> +
> +#define for_each_sd_topology(tl)                       \
> +       for (tl = sched_domain_topology; tl->mask; tl++)
> +
> +void set_sched_topology(struct sched_domain_topology_level *tl)
> +{
> +       sched_domain_topology = tl;
> +}
> +
> +#ifdef CONFIG_NUMA
> +
>  static const struct cpumask *sd_numa_mask(int cpu)
>  {
>         return sched_domains_numa_masks[sched_domains_curr_level][cpu_to_node(cpu)];
> @@ -6218,7 +6223,10 @@ static void sched_init_numa(void)
>                 }
>         }
> 
> -       tl = kzalloc((ARRAY_SIZE(default_topology) + level) *
> +       /* Compute default topology size */
> +       for (i = 0; sched_domain_topology[i].mask; i++);
> +
> +       tl = kzalloc((i + level) *
>                         sizeof(struct sched_domain_topology_level), GFP_KERNEL);
>         if (!tl)
>                 return;
> @@ -6226,18 +6234,19 @@ static void sched_init_numa(void)
>         /*
>          * Copy the default topology bits..
>          */
> -       for (i = 0; default_topology[i].init; i++)
> -               tl[i] = default_topology[i];
> +       for (i = 0; sched_domain_topology[i].mask; i++)
> +               tl[i] = sched_domain_topology[i];
> 
>         /*
>          * .. and append 'j' levels of NUMA goodness.
>          */
>         for (j = 0; j < level; i++, j++) {
>                 tl[i] = (struct sched_domain_topology_level){
> -                       .init = sd_numa_init,
>                         .mask = sd_numa_mask,
> +                       .sd_flags = SD_NUMA,
>                         .flags = SDTL_OVERLAP,
>                         .numa_level = j,
> +                       SD_INIT_NAME(NUMA)
>                 };
>         }
> 
> @@ -6395,7 +6404,7 @@ struct sched_domain *build_sched_domain(struct sched_domain_topology_level *tl,
>                 const struct cpumask *cpu_map, struct sched_domain_attr *attr,
>                 struct sched_domain *child, int cpu)
>  {
> -       struct sched_domain *sd = tl->init(tl, cpu);
> +       struct sched_domain *sd = sd_init(tl, cpu);
>         if (!sd)
>                 return child;
> 
> --
> 1.9.0
> 
> 



^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH v2 2/7] sched: rework of sched_domain topology definition
@ 2014-03-19 11:27     ` Dietmar Eggemann
  0 siblings, 0 replies; 55+ messages in thread
From: Dietmar Eggemann @ 2014-03-19 11:27 UTC (permalink / raw)
  To: linux-arm-kernel

On 18/03/14 17:56, Vincent Guittot wrote:
> We replace the old way to configure the scheduler topology with a new method
> which enables a platform to declare additional levels (if needed).
> 
> We still have a default topology table definition that can be used by platforms
> that don't need more levels than the SMT, MC, CPU and NUMA ones. This table can
> be overridden by an arch which wants to add new levels where load balancing
> makes sense, like a BOOK or power-gating level.
> 
> For each level, we need a function pointer that returns the cpumask for each
> cpu, a function pointer that returns the flags for the level, and a name. Only
> flags that describe the topology can be set by an architecture. The current
> topology flags are:
>  SD_SHARE_CPUPOWER
>  SD_SHARE_PKG_RESOURCES
>  SD_NUMA
>  SD_ASYM_PACKING
> 
> Each level must be a subset of the next one. The sched_domain build sequence
> takes care of removing useless levels, like those with only 1 CPU and those
> with the same CPU span and no more relevant load-balancing information than
> their child.
> 
> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> ---
>  arch/ia64/include/asm/topology.h |  24 ----
>  arch/s390/include/asm/topology.h |   2 -
>  arch/tile/include/asm/topology.h |  33 ------
>  include/linux/sched.h            |  48 ++++++++
>  include/linux/topology.h         | 128 +++------------------
>  kernel/sched/core.c              | 235 ++++++++++++++++++++-------------------
>  6 files changed, 183 insertions(+), 287 deletions(-)
> 
> diff --git a/arch/ia64/include/asm/topology.h b/arch/ia64/include/asm/topology.h
> index 5cb55a1..3202aa7 100644
> --- a/arch/ia64/include/asm/topology.h
> +++ b/arch/ia64/include/asm/topology.h
> @@ -46,30 +46,6 @@
> 
>  void build_cpu_to_node_map(void);
> 
> -#define SD_CPU_INIT (struct sched_domain) {            \
> -       .parent                 = NULL,                 \
> -       .child                  = NULL,                 \
> -       .groups                 = NULL,                 \
> -       .min_interval           = 1,                    \
> -       .max_interval           = 4,                    \
> -       .busy_factor            = 64,                   \
> -       .imbalance_pct          = 125,                  \
> -       .cache_nice_tries       = 2,                    \
> -       .busy_idx               = 2,                    \
> -       .idle_idx               = 1,                    \
> -       .newidle_idx            = 0,                    \
> -       .wake_idx               = 0,                    \
> -       .forkexec_idx           = 0,                    \
> -       .flags                  = SD_LOAD_BALANCE       \
> -                               | SD_BALANCE_NEWIDLE    \
> -                               | SD_BALANCE_EXEC       \
> -                               | SD_BALANCE_FORK       \
> -                               | SD_WAKE_AFFINE,       \
> -       .last_balance           = jiffies,              \
> -       .balance_interval       = 1,                    \
> -       .nr_balance_failed      = 0,                    \
> -}
> -
>  #endif /* CONFIG_NUMA */
> 
>  #ifdef CONFIG_SMP
> diff --git a/arch/s390/include/asm/topology.h b/arch/s390/include/asm/topology.h
> index 05425b1..07763bd 100644
> --- a/arch/s390/include/asm/topology.h
> +++ b/arch/s390/include/asm/topology.h
> @@ -64,8 +64,6 @@ static inline void s390_init_cpu_topology(void)
>  };
>  #endif
> 
> -#define SD_BOOK_INIT   SD_CPU_INIT
> -
>  #include <asm-generic/topology.h>
> 
>  #endif /* _ASM_S390_TOPOLOGY_H */
> diff --git a/arch/tile/include/asm/topology.h b/arch/tile/include/asm/topology.h
> index d15c0d8..9383118 100644
> --- a/arch/tile/include/asm/topology.h
> +++ b/arch/tile/include/asm/topology.h
> @@ -44,39 +44,6 @@ static inline const struct cpumask *cpumask_of_node(int node)
>  /* For now, use numa node -1 for global allocation. */
>  #define pcibus_to_node(bus)            ((void)(bus), -1)
> 
> -/*
> - * TILE architecture has many cores integrated in one processor, so we need
> - * setup bigger balance_interval for both CPU/NODE scheduling domains to
> - * reduce process scheduling costs.
> - */
> -
> -/* sched_domains SD_CPU_INIT for TILE architecture */
> -#define SD_CPU_INIT (struct sched_domain) {                            \
> -       .min_interval           = 4,                                    \
> -       .max_interval           = 128,                                  \
> -       .busy_factor            = 64,                                   \
> -       .imbalance_pct          = 125,                                  \
> -       .cache_nice_tries       = 1,                                    \
> -       .busy_idx               = 2,                                    \
> -       .idle_idx               = 1,                                    \
> -       .newidle_idx            = 0,                                    \
> -       .wake_idx               = 0,                                    \
> -       .forkexec_idx           = 0,                                    \
> -                                                                       \
> -       .flags                  = 1*SD_LOAD_BALANCE                     \
> -                               | 1*SD_BALANCE_NEWIDLE                  \
> -                               | 1*SD_BALANCE_EXEC                     \
> -                               | 1*SD_BALANCE_FORK                     \
> -                               | 0*SD_BALANCE_WAKE                     \
> -                               | 0*SD_WAKE_AFFINE                      \
> -                               | 0*SD_SHARE_CPUPOWER                   \
> -                               | 0*SD_SHARE_PKG_RESOURCES              \
> -                               | 0*SD_SERIALIZE                        \
> -                               ,                                       \
> -       .last_balance           = jiffies,                              \
> -       .balance_interval       = 32,                                   \
> -}
> -
>  /* By definition, we create nodes based on online memory. */
>  #define node_has_online_mem(nid) 1
> 
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 825ed83..4db592a 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -870,6 +870,20 @@ enum cpu_idle_type {
> 
>  extern int __weak arch_sd_sibiling_asym_packing(void);
> 
> +#ifdef CONFIG_SCHED_SMT
> +static inline const int cpu_smt_flags(void)
> +{
> +       return SD_SHARE_CPUPOWER | SD_SHARE_PKG_RESOURCES;
> +}
> +#endif
> +
> +#ifdef CONFIG_SCHED_MC
> +static inline const int cpu_core_flags(void)
> +{
> +       return SD_SHARE_PKG_RESOURCES;
> +}
> +#endif
> +
>  struct sched_domain_attr {
>         int relax_domain_level;
>  };
> @@ -976,6 +990,38 @@ void free_sched_domains(cpumask_var_t doms[], unsigned int ndoms);
> 
>  bool cpus_share_cache(int this_cpu, int that_cpu);
> 
> +typedef const struct cpumask *(*sched_domain_mask_f)(int cpu);
> +typedef const int (*sched_domain_flags_f)(void);
> +
> +#define SDTL_OVERLAP   0x01
> +
> +struct sd_data {
> +       struct sched_domain **__percpu sd;
> +       struct sched_group **__percpu sg;
> +       struct sched_group_power **__percpu sgp;
> +};
> +
> +struct sched_domain_topology_level {
> +       sched_domain_mask_f mask;
> +       sched_domain_flags_f sd_flags;
> +       int                 flags;
> +       int                 numa_level;
> +       struct sd_data      data;
> +#ifdef CONFIG_SCHED_DEBUG
> +       char                *name;
> +#endif
> +};
> +
> +extern struct sched_domain_topology_level *sched_domain_topology;
> +
> +extern void set_sched_topology(struct sched_domain_topology_level *tl);
> +
> +#ifdef CONFIG_SCHED_DEBUG
> +# define SD_INIT_NAME(type)            .name = #type
> +#else
> +# define SD_INIT_NAME(type)
> +#endif
> +
>  #else /* CONFIG_SMP */
> 
>  struct sched_domain_attr;
> @@ -991,6 +1037,8 @@ static inline bool cpus_share_cache(int this_cpu, int that_cpu)
>         return true;
>  }
> 
> +static inline void set_sched_topology(struct sched_domain_topology_level *tl) { }
> +
>  #endif /* !CONFIG_SMP */
> 
> 
> diff --git a/include/linux/topology.h b/include/linux/topology.h
> index 12ae6ce..3a9db05 100644
> --- a/include/linux/topology.h
> +++ b/include/linux/topology.h
> @@ -66,121 +66,6 @@ int arch_update_cpu_topology(void);
>  #define PENALTY_FOR_NODE_WITH_CPUS     (1)
>  #endif
> 
> -/*
> - * Below are the 3 major initializers used in building sched_domains:
> - * SD_SIBLING_INIT, for SMT domains
> - * SD_CPU_INIT, for SMP domains
> - *
> - * Any architecture that cares to do any tuning to these values should do so
> - * by defining their own arch-specific initializer in include/asm/topology.h.
> - * A definition there will automagically override these default initializers
> - * and allow arch-specific performance tuning of sched_domains.
> - * (Only non-zero and non-null fields need be specified.)
> - */
> -
> -#ifdef CONFIG_SCHED_SMT
> -/* MCD - Do we really need this?  It is always on if CONFIG_SCHED_SMT is,
> - * so can't we drop this in favor of CONFIG_SCHED_SMT?
> - */
> -#define ARCH_HAS_SCHED_WAKE_IDLE
> -/* Common values for SMT siblings */
> -#ifndef SD_SIBLING_INIT
> -#define SD_SIBLING_INIT (struct sched_domain) {                                \
> -       .min_interval           = 1,                                    \
> -       .max_interval           = 2,                                    \
> -       .busy_factor            = 64,                                   \
> -       .imbalance_pct          = 110,                                  \
> -                                                                       \
> -       .flags                  = 1*SD_LOAD_BALANCE                     \
> -                               | 1*SD_BALANCE_NEWIDLE                  \
> -                               | 1*SD_BALANCE_EXEC                     \
> -                               | 1*SD_BALANCE_FORK                     \
> -                               | 0*SD_BALANCE_WAKE                     \
> -                               | 1*SD_WAKE_AFFINE                      \
> -                               | 1*SD_SHARE_CPUPOWER                   \
> -                               | 1*SD_SHARE_PKG_RESOURCES              \
> -                               | 0*SD_SERIALIZE                        \
> -                               | 0*SD_PREFER_SIBLING                   \
> -                               | arch_sd_sibling_asym_packing()        \
> -                               ,                                       \
> -       .last_balance           = jiffies,                              \
> -       .balance_interval       = 1,                                    \
> -       .smt_gain               = 1178, /* 15% */                       \
> -       .max_newidle_lb_cost    = 0,                                    \
> -       .next_decay_max_lb_cost = jiffies,                              \
> -}
> -#endif
> -#endif /* CONFIG_SCHED_SMT */
> -
> -#ifdef CONFIG_SCHED_MC
> -/* Common values for MC siblings. for now mostly derived from SD_CPU_INIT */
> -#ifndef SD_MC_INIT
> -#define SD_MC_INIT (struct sched_domain) {                             \
> -       .min_interval           = 1,                                    \
> -       .max_interval           = 4,                                    \
> -       .busy_factor            = 64,                                   \
> -       .imbalance_pct          = 125,                                  \
> -       .cache_nice_tries       = 1,                                    \
> -       .busy_idx               = 2,                                    \
> -       .wake_idx               = 0,                                    \
> -       .forkexec_idx           = 0,                                    \
> -                                                                       \
> -       .flags                  = 1*SD_LOAD_BALANCE                     \
> -                               | 1*SD_BALANCE_NEWIDLE                  \
> -                               | 1*SD_BALANCE_EXEC                     \
> -                               | 1*SD_BALANCE_FORK                     \
> -                               | 0*SD_BALANCE_WAKE                     \
> -                               | 1*SD_WAKE_AFFINE                      \
> -                               | 0*SD_SHARE_CPUPOWER                   \
> -                               | 1*SD_SHARE_PKG_RESOURCES              \
> -                               | 0*SD_SERIALIZE                        \
> -                               ,                                       \
> -       .last_balance           = jiffies,                              \
> -       .balance_interval       = 1,                                    \
> -       .max_newidle_lb_cost    = 0,                                    \
> -       .next_decay_max_lb_cost = jiffies,                              \
> -}
> -#endif
> -#endif /* CONFIG_SCHED_MC */
> -
> -/* Common values for CPUs */
> -#ifndef SD_CPU_INIT
> -#define SD_CPU_INIT (struct sched_domain) {                            \
> -       .min_interval           = 1,                                    \
> -       .max_interval           = 4,                                    \
> -       .busy_factor            = 64,                                   \
> -       .imbalance_pct          = 125,                                  \
> -       .cache_nice_tries       = 1,                                    \
> -       .busy_idx               = 2,                                    \
> -       .idle_idx               = 1,                                    \
> -       .newidle_idx            = 0,                                    \
> -       .wake_idx               = 0,                                    \
> -       .forkexec_idx           = 0,                                    \
> -                                                                       \
> -       .flags                  = 1*SD_LOAD_BALANCE                     \
> -                               | 1*SD_BALANCE_NEWIDLE                  \
> -                               | 1*SD_BALANCE_EXEC                     \
> -                               | 1*SD_BALANCE_FORK                     \
> -                               | 0*SD_BALANCE_WAKE                     \
> -                               | 1*SD_WAKE_AFFINE                      \
> -                               | 0*SD_SHARE_CPUPOWER                   \
> -                               | 0*SD_SHARE_PKG_RESOURCES              \
> -                               | 0*SD_SERIALIZE                        \
> -                               | 1*SD_PREFER_SIBLING                   \
> -                               ,                                       \
> -       .last_balance           = jiffies,                              \
> -       .balance_interval       = 1,                                    \
> -       .max_newidle_lb_cost    = 0,                                    \
> -       .next_decay_max_lb_cost = jiffies,                              \
> -}
> -#endif
> -
> -#ifdef CONFIG_SCHED_BOOK
> -#ifndef SD_BOOK_INIT
> -#error Please define an appropriate SD_BOOK_INIT in include/asm/topology.h!!!
> -#endif
> -#endif /* CONFIG_SCHED_BOOK */
> -
>  #ifdef CONFIG_USE_PERCPU_NUMA_NODE_ID
>  DECLARE_PER_CPU(int, numa_node);
> 
> @@ -295,4 +180,17 @@ static inline int cpu_to_mem(int cpu)
>  #define topology_core_cpumask(cpu)             cpumask_of(cpu)
>  #endif
> 
> +#ifdef CONFIG_SCHED_SMT
> +static inline const struct cpumask *cpu_smt_mask(int cpu)
> +{
> +       return topology_thread_cpumask(cpu);
> +}
> +#endif
> +
> +static inline const struct cpumask *cpu_cpu_mask(int cpu)
> +{
> +       return cpumask_of_node(cpu_to_node(cpu));
> +}
> +
> +
>  #endif /* _LINUX_TOPOLOGY_H */
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index ae365aa..3397bcb 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5603,17 +5603,6 @@ static int __init isolated_cpu_setup(char *str)
> 
>  __setup("isolcpus=", isolated_cpu_setup);
> 
> -static const struct cpumask *cpu_cpu_mask(int cpu)
> -{
> -       return cpumask_of_node(cpu_to_node(cpu));
> -}
> -
> -struct sd_data {
> -       struct sched_domain **__percpu sd;
> -       struct sched_group **__percpu sg;
> -       struct sched_group_power **__percpu sgp;
> -};
> -
>  struct s_data {
>         struct sched_domain ** __percpu sd;
>         struct root_domain      *rd;
> @@ -5626,21 +5615,6 @@ enum s_alloc {
>         sa_none,
>  };
> 
> -struct sched_domain_topology_level;
> -
> -typedef struct sched_domain *(*sched_domain_init_f)(struct sched_domain_topology_level *tl, int cpu);
> -typedef const struct cpumask *(*sched_domain_mask_f)(int cpu);
> -
> -#define SDTL_OVERLAP   0x01
> -
> -struct sched_domain_topology_level {
> -       sched_domain_init_f init;
> -       sched_domain_mask_f mask;
> -       int                 flags;
> -       int                 numa_level;
> -       struct sd_data      data;
> -};
> -
>  /*
>   * Build an iteration mask that can exclude certain CPUs from the upwards
>   * domain traversal.
> @@ -5869,34 +5843,6 @@ int __weak arch_sd_sibling_asym_packing(void)
>   * Non-inlined to reduce accumulated stack pressure in build_sched_domains()
>   */
> 
> -#ifdef CONFIG_SCHED_DEBUG
> -# define SD_INIT_NAME(sd, type)                sd->name = #type
> -#else
> -# define SD_INIT_NAME(sd, type)                do { } while (0)
> -#endif
> -
> -#define SD_INIT_FUNC(type)                                             \
> -static noinline struct sched_domain *                                  \
> -sd_init_##type(struct sched_domain_topology_level *tl, int cpu)        \
> -{                                                                      \
> -       struct sched_domain *sd = *per_cpu_ptr(tl->data.sd, cpu);       \
> -       *sd = SD_##type##_INIT;                                         \
> -       SD_INIT_NAME(sd, type);                                         \
> -       sd->private = &tl->data;                                        \
> -       return sd;                                                      \
> -}
> -
> -SD_INIT_FUNC(CPU)
> -#ifdef CONFIG_SCHED_SMT
> - SD_INIT_FUNC(SIBLING)
> -#endif
> -#ifdef CONFIG_SCHED_MC
> - SD_INIT_FUNC(MC)
> -#endif
> -#ifdef CONFIG_SCHED_BOOK
> - SD_INIT_FUNC(BOOK)
> -#endif
> -
>  static int default_relax_domain_level = -1;
>  int sched_domain_level_max;
> 
> @@ -5984,97 +5930,156 @@ static void claim_allocations(int cpu, struct sched_domain *sd)
>                 *per_cpu_ptr(sdd->sgp, cpu) = NULL;
>  }
> 
> -#ifdef CONFIG_SCHED_SMT
> -static const struct cpumask *cpu_smt_mask(int cpu)
> -{
> -       return topology_thread_cpumask(cpu);
> -}
> -#endif
> -
> -/*
> - * Topology list, bottom-up.
> - */
> -static struct sched_domain_topology_level default_topology[] = {
> -#ifdef CONFIG_SCHED_SMT
> -       { sd_init_SIBLING, cpu_smt_mask, },
> -#endif
> -#ifdef CONFIG_SCHED_MC
> -       { sd_init_MC, cpu_coregroup_mask, },
> -#endif
> -#ifdef CONFIG_SCHED_BOOK
> -       { sd_init_BOOK, cpu_book_mask, },
> -#endif
> -       { sd_init_CPU, cpu_cpu_mask, },
> -       { NULL, },
> -};
> -
> -static struct sched_domain_topology_level *sched_domain_topology = default_topology;
> -
> -#define for_each_sd_topology(tl)                       \
> -       for (tl = sched_domain_topology; tl->init; tl++)
> -
>  #ifdef CONFIG_NUMA
> -
>  static int sched_domains_numa_levels;
>  static int *sched_domains_numa_distance;
>  static struct cpumask ***sched_domains_numa_masks;
>  static int sched_domains_curr_level;
> +#endif
> 
> -static inline int sd_local_flags(int level)
> -{
> -       if (sched_domains_numa_distance[level] > RECLAIM_DISTANCE)
> -               return 0;
> -
> -       return SD_BALANCE_EXEC | SD_BALANCE_FORK | SD_WAKE_AFFINE;
> -}
> +/*
> + * SD_flags allowed in topology descriptions.
> + *
> + * SD_SHARE_CPUPOWER      - describes SMT topologies
> + * SD_SHARE_PKG_RESOURCES - describes shared caches
> + * SD_NUMA                - describes NUMA topologies
> + *
> + * Odd one out:
> + * SD_ASYM_PACKING        - describes SMT quirks
> + */
> +#define TOPOLOGY_SD_FLAGS              \
> +       (SD_SHARE_CPUPOWER |            \
> +        SD_SHARE_PKG_RESOURCES |       \
> +        SD_NUMA |                      \
> +        SD_ASYM_PACKING)
> 
>  static struct sched_domain *
> -sd_numa_init(struct sched_domain_topology_level *tl, int cpu)
> +sd_init(struct sched_domain_topology_level *tl, int cpu)
>  {
>         struct sched_domain *sd = *per_cpu_ptr(tl->data.sd, cpu);
> -       int level = tl->numa_level;
> -       int sd_weight = cpumask_weight(
> -                       sched_domains_numa_masks[level][cpu_to_node(cpu)]);
> +       int sd_weight, sd_flags = 0;
> +
> +#ifdef CONFIG_NUMA
> +       /*
> +        * Ugly hack to pass state to sd_numa_mask()...
> +        */
> +       sched_domains_curr_level = tl->numa_level;
> +#endif
> +
> +       sd_weight = cpumask_weight(tl->mask(cpu));
> +
> +       if (tl->sd_flags)
> +               sd_flags = (*tl->sd_flags)();
> +       if (WARN_ONCE(sd_flags & ~TOPOLOGY_SD_FLAGS,
> +                       "wrong sd_flags in topology description\n"))
> +               sd_flags &= ~TOPOLOGY_SD_FLAGS;
> 
>         *sd = (struct sched_domain){
>                 .min_interval           = sd_weight,
>                 .max_interval           = 2*sd_weight,
>                 .busy_factor            = 32,
>                 .imbalance_pct          = 125,
> -               .cache_nice_tries       = 2,
> -               .busy_idx               = 3,
> -               .idle_idx               = 2,
> +
> +               .cache_nice_tries       = 0,
> +               .busy_idx               = 0,
> +               .idle_idx               = 0,
>                 .newidle_idx            = 0,
>                 .wake_idx               = 0,
>                 .forkexec_idx           = 0,
> 
>                 .flags                  = 1*SD_LOAD_BALANCE
>                                         | 1*SD_BALANCE_NEWIDLE
> -                                       | 0*SD_BALANCE_EXEC
> -                                       | 0*SD_BALANCE_FORK
> +                                       | 1*SD_BALANCE_EXEC
> +                                       | 1*SD_BALANCE_FORK
>                                         | 0*SD_BALANCE_WAKE
> -                                       | 0*SD_WAKE_AFFINE
> +                                       | 1*SD_WAKE_AFFINE
>                                         | 0*SD_SHARE_CPUPOWER
>                                         | 0*SD_SHARE_PKG_RESOURCES
> -                                       | 1*SD_SERIALIZE
> +                                       | 0*SD_SERIALIZE
>                                         | 0*SD_PREFER_SIBLING
> -                                       | 1*SD_NUMA
> -                                       | sd_local_flags(level)
> +                                       | 0*SD_NUMA
> +                                       | sd_flags
>                                         ,
> +
>                 .last_balance           = jiffies,
>                 .balance_interval       = sd_weight,
> +               .smt_gain               = 0,
> +               .max_newidle_lb_cost    = 0,
> +               .next_decay_max_lb_cost = jiffies,
> +#ifdef CONFIG_SCHED_DEBUG
> +               .name                   = tl->name,
> +#endif
>         };
> -       SD_INIT_NAME(sd, NUMA);
> -       sd->private = &tl->data;
> 
>         /*
> -        * Ugly hack to pass state to sd_numa_mask()...
> +        * Convert topological properties into behaviour.
>          */
> -       sched_domains_curr_level = tl->numa_level;
> +
> +       if (sd->flags & SD_SHARE_CPUPOWER) {
> +               sd->imbalance_pct = 110;
> +               sd->smt_gain = 1178; /* ~15% */
> +               sd->flags |= arch_sd_sibling_asym_packing();
> +
> +       } else if (sd->flags & SD_SHARE_PKG_RESOURCES) {
> +               sd->imbalance_pct = 117;
> +               sd->cache_nice_tries = 1;
> +               sd->busy_idx = 2;
> +
> +#ifdef CONFIG_NUMA
> +       } else if (sd->flags & SD_NUMA) {
> +               sd->cache_nice_tries = 2;
> +               sd->busy_idx = 3;
> +               sd->idle_idx = 2;
> +
> +               sd->flags |= SD_SERIALIZE;
> +               if (sched_domains_numa_distance[tl->numa_level] > RECLAIM_DISTANCE) {
> +                       sd->flags &= ~(SD_BALANCE_EXEC |
> +                                      SD_BALANCE_FORK |
> +                                      SD_WAKE_AFFINE);
> +               }
> +
> +#endif
> +       } else {
> +               sd->flags |= SD_PREFER_SIBLING;
> +               sd->cache_nice_tries = 1;
> +               sd->busy_idx = 2;
> +               sd->idle_idx = 1;
> +       }

This 'if ... else' chain is still a weak point when it comes to
making the code robust:

On TC2 w/ the following change in cpu_corepower_mask()

 const struct cpumask *cpu_corepower_mask(int cpu)
 {
-       return &cpu_topology[cpu].thread_sibling;
+       return cpu_topology[cpu].socket_id ?
&cpu_topology[cpu].thread_sibling :
+                       &cpu_topology[cpu].core_sibling;
 }

I get a sane set-up:

root@linaro-developer:~# cat /proc/sys/kernel/sched_domain/cpu*/domain*/name
GMC
DIE
GMC
DIE
MC
DIE
MC
DIE
MC
DIE
root@linaro-developer:~# cat /proc/sys/kernel/sched_domain/cpu*/domain*/flags
815
4143
815
4143
559
4143
559
4143
559
4143

w/ 815 (0x32F : SD_LOAD_BALANCE SD_BALANCE_NEWIDLE SD_BALANCE_EXEC
SD_BALANCE_FORK SD_WAKE_AFFINE *SD_SHARE_POWERDOMAIN*
SD_SHARE_PKG_RESOURCES)

w/ 559 (0x22F : SD_LOAD_BALANCE SD_BALANCE_NEWIDLE SD_BALANCE_EXEC
SD_BALANCE_FORK SD_WAKE_AFFINE SD_SHARE_PKG_RESOURCES)

But when I introduce the following error into the arch specific
cpu_corepower_flags() function

 static inline const int cpu_corepower_flags(void)
 {
-       return SD_SHARE_PKG_RESOURCES  | SD_SHARE_POWERDOMAIN;
+       return SD_SHARE_POWERDOMAIN;
 }

the GMC-related sd's for CPU0,1 are initialized with the DIE-level
defaults in sd_init(), resulting in this wrong set-up w/o any
warning/error message:

root@linaro-developer:~# cat /proc/sys/kernel/sched_domain/cpu*/domain*/name
GMC
DIE
GMC
DIE
MC
DIE
MC
DIE
MC
DIE
root@linaro-developer:~# cat /proc/sys/kernel/sched_domain/cpu*/domain*/flags
4399
4143
4399
4143
559
4143
559
4143
559
4143

w/ 4399 (0x112f : SD_LOAD_BALANCE SD_BALANCE_NEWIDLE SD_BALANCE_EXEC
SD_BALANCE_FORK SD_WAKE_AFFINE *SD_SHARE_POWERDOMAIN* SD_PREFER_SIBLING)

Is there a way to enforce that MC and GMC must have
SD_SHARE_PKG_RESOURCES set, so that this can't happen unnoticed?

-- Dietmar

> +
> +       sd->private = &tl->data;
> 
>         return sd;
>  }
> 
> +/*
> + * Topology list, bottom-up.
> + */
> +static struct sched_domain_topology_level default_topology[] = {
> +#ifdef CONFIG_SCHED_SMT
> +       { cpu_smt_mask, cpu_smt_flags, SD_INIT_NAME(SMT) },
> +#endif
> +#ifdef CONFIG_SCHED_MC
> +       { cpu_coregroup_mask, cpu_core_flags, SD_INIT_NAME(MC) },
> +#endif
> +#ifdef CONFIG_SCHED_BOOK
> +       { cpu_book_mask, SD_INIT_NAME(BOOK) },
> +#endif
> +       { cpu_cpu_mask, SD_INIT_NAME(DIE) },
> +       { NULL, },
> +};
> +
> +struct sched_domain_topology_level *sched_domain_topology = default_topology;
> +
> +#define for_each_sd_topology(tl)                       \
> +       for (tl = sched_domain_topology; tl->mask; tl++)
> +
> +void set_sched_topology(struct sched_domain_topology_level *tl)
> +{
> +       sched_domain_topology = tl;
> +}
> +
> +#ifdef CONFIG_NUMA
> +
>  static const struct cpumask *sd_numa_mask(int cpu)
>  {
>         return sched_domains_numa_masks[sched_domains_curr_level][cpu_to_node(cpu)];
> @@ -6218,7 +6223,10 @@ static void sched_init_numa(void)
>                 }
>         }
> 
> -       tl = kzalloc((ARRAY_SIZE(default_topology) + level) *
> +       /* Compute default topology size */
> +       for (i = 0; sched_domain_topology[i].mask; i++);
> +
> +       tl = kzalloc((i + level) *
>                         sizeof(struct sched_domain_topology_level), GFP_KERNEL);
>         if (!tl)
>                 return;
> @@ -6226,18 +6234,19 @@ static void sched_init_numa(void)
>         /*
>          * Copy the default topology bits..
>          */
> -       for (i = 0; default_topology[i].init; i++)
> -               tl[i] = default_topology[i];
> +       for (i = 0; sched_domain_topology[i].mask; i++)
> +               tl[i] = sched_domain_topology[i];
> 
>         /*
>          * .. and append 'j' levels of NUMA goodness.
>          */
>         for (j = 0; j < level; i++, j++) {
>                 tl[i] = (struct sched_domain_topology_level){
> -                       .init = sd_numa_init,
>                         .mask = sd_numa_mask,
> +                       .sd_flags = SD_NUMA,
>                         .flags = SDTL_OVERLAP,
>                         .numa_level = j,
> +                       SD_INIT_NAME(NUMA)
>                 };
>         }
> 
> @@ -6395,7 +6404,7 @@ struct sched_domain *build_sched_domain(struct sched_domain_topology_level *tl,
>                 const struct cpumask *cpu_map, struct sched_domain_attr *attr,
>                 struct sched_domain *child, int cpu)
>  {
> -       struct sched_domain *sd = tl->init(tl, cpu);
> +       struct sched_domain *sd = sd_init(tl, cpu);
>         if (!sd)
>                 return child;
> 
> --
> 1.9.0
> 
> 

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v2 5/7] sched: add a new SD_SHARE_POWERDOMAIN for sched_domain
  2014-03-18 17:56   ` Vincent Guittot
@ 2014-03-19 11:59     ` Peter Zijlstra
  -1 siblings, 0 replies; 55+ messages in thread
From: Peter Zijlstra @ 2014-03-19 11:59 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: mingo, linux-kernel, dietmar.eggemann, preeti, tony.luck,
	fenghua.yu, schwidefsky, james.hogan, cmetcalf, benh, linux,
	linux-arm-kernel, linaro-kernel

On Tue, Mar 18, 2014 at 06:56:47PM +0100, Vincent Guittot wrote:


> @@ -5946,7 +5948,8 @@ static int sched_domains_curr_level;
>  	(SD_SHARE_CPUPOWER |		\
>  	 SD_SHARE_PKG_RESOURCES |	\
>  	 SD_NUMA |			\
> -	 SD_ASYM_PACKING)
> +	 SD_ASYM_PACKING |		\
> +	 SD_SHARE_POWERDOMAIN)

You forgot to update the pretty comment above that :-)

^ permalink raw reply	[flat|nested] 55+ messages in thread


* Re: [PATCH v2 5/7] sched: add a new SD_SHARE_POWERDOMAIN for sched_domain
  2014-03-18 17:56   ` Vincent Guittot
@ 2014-03-19 12:01     ` Peter Zijlstra
  -1 siblings, 0 replies; 55+ messages in thread
From: Peter Zijlstra @ 2014-03-19 12:01 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: mingo, linux-kernel, dietmar.eggemann, preeti, tony.luck,
	fenghua.yu, schwidefsky, james.hogan, cmetcalf, benh, linux,
	linux-arm-kernel, linaro-kernel

On Tue, Mar 18, 2014 at 06:56:47PM +0100, Vincent Guittot wrote:
> A new flag SD_SHARE_POWERDOMAIN is created to reflect whether groups of CPUs
> in a sched_domain level can or cannot reach different power states. As an example,
> the flag should be cleared at CPU level if groups of cores can be power gated
> independently. This information can be used to add a load balancing level between
> groups of CPUs that can power gate independently. The default behavior of the
> scheduler is to spread tasks across CPUs and groups of CPUs

> so the flag is set in all sched_domains.

I suppose that is the part Preeti stumbled over; as did I. It's not set
at all.

^ permalink raw reply	[flat|nested] 55+ messages in thread


* Re: [PATCH v2 7/7] sched: powerpc: Add SD_SHARE_POWERDOMAIN for SMT level
  2014-03-18 17:56   ` Vincent Guittot
@ 2014-03-19 12:05     ` Peter Zijlstra
  -1 siblings, 0 replies; 55+ messages in thread
From: Peter Zijlstra @ 2014-03-19 12:05 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: mingo, linux-kernel, dietmar.eggemann, preeti, tony.luck,
	fenghua.yu, schwidefsky, james.hogan, cmetcalf, benh, linux,
	linux-arm-kernel, linaro-kernel

On Tue, Mar 18, 2014 at 06:56:49PM +0100, Vincent Guittot wrote:
> Set the power domain dependency at SMT level of Power8 but keep the flag
> clear at CPU level. The goal is to consolidate tasks on the threads of a
core up to a level as described in the link below:
> https://lkml.org/lkml/2014/3/12/16

I've not yet looked at the link; but it's best to avoid references and
have a Changelog that can stand on its own.

^ permalink raw reply	[flat|nested] 55+ messages in thread


* Re: [PATCH v2 5/7] sched: add a new SD_SHARE_POWERDOMAIN for sched_domain
  2014-03-19 11:05         ` Preeti U Murthy
@ 2014-03-19 12:26           ` Vincent Guittot
  -1 siblings, 0 replies; 55+ messages in thread
From: Vincent Guittot @ 2014-03-19 12:26 UTC (permalink / raw)
  To: Preeti U Murthy
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Dietmar Eggemann,
	tony.luck, fenghua.yu, schwidefsky, james.hogan, cmetcalf,
	Benjamin Herrenschmidt, Russell King - ARM Linux, LAK,
	linaro-kernel

On 19 March 2014 12:05, Preeti U Murthy <preeti@linux.vnet.ibm.com> wrote:
> On 03/19/2014 03:22 PM, Vincent Guittot wrote:
>> On 19 March 2014 07:21, Preeti U Murthy <preeti@linux.vnet.ibm.com> wrote:
>>> Hi Vincent,
>>>
>>> On 03/18/2014 11:26 PM, Vincent Guittot wrote:
>>>> A new flag SD_SHARE_POWERDOMAIN is created to reflect whether groups of CPUs
>>>> in a sched_domain level can or cannot reach different power states. As an example,
>>>> the flag should be cleared at CPU level if groups of cores can be power gated
>>>> independently. This information can be used to add a load balancing level between
>>>> groups of CPUs that can power gate independently. The default behavior of the
>>>> scheduler is to spread tasks across CPUs and groups of CPUs so the flag is set
>>>> in all sched_domains.
>>>
>>> I don't see this flag being set either in sd_init() or in
>>> default_topology[]. Should not the default_topology[] flag setting
>>> routines set this flag at every level of sched domain along with other
>>> topology flags, unless the arch wants to override it?
>>
>> Hi Preeti
>>
>> I have made the choice to not add it in the default table for the
>> moment because the scheduler behavior is not changed. It will be added
>> with patchset that will take advantage of this flag in the load
>> balance decision.
>
> Ok if you are looking at setting this flag in the default topology table
> then [patch 7/7]:sched: powerpc: Add SD_SHARE_POWERDOMAIN for SMT level
> looks good to me. Please add my Reviewed-by to this patch.
>
> However if you are looking at initializing this flag as being set by
> default in sd_init() then the archs will have to revert the flag, rather
> than set it in their respective topology tables for the sched domains
> which have their groups power gated. In which case the    [patch 7/7]
> would be incorrect.
>    But wait, I see that you mention that the topology level flags are
> left to the archs to set if required. So I am assuming you will not set
> the SD_SHARE_POWERDOMAIN flag in sd_init(), right?

Yes, it will not be set in sd_init() but via the per-level sd_flags
function pointer in the topology table.

Vincent

>
> Regards
> Preeti U Murthy
>>
>> Regards,
>> Vincent
>>
>>>
>>> Regards
>>> Preeti U Murthy
>>>> This flag is part of the topology flags that can be set by arch.
>>>>
>>>> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
>>>> ---
>>>>  include/linux/sched.h | 1 +
>>>>  kernel/sched/core.c   | 9 ++++++---
>>>>  2 files changed, 7 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/include/linux/sched.h b/include/linux/sched.h
>>>> index 6479de4..7048369 100644
>>>> --- a/include/linux/sched.h
>>>> +++ b/include/linux/sched.h
>>>> @@ -861,6 +861,7 @@ enum cpu_idle_type {
>>>>  #define SD_BALANCE_WAKE              0x0010  /* Balance on wakeup */
>>>>  #define SD_WAKE_AFFINE               0x0020  /* Wake task to waking CPU */
>>>>  #define SD_SHARE_CPUPOWER    0x0080  /* Domain members share cpu power */
>>>> +#define SD_SHARE_POWERDOMAIN 0x0100  /* Domain members share power domain */
>>>>  #define SD_SHARE_PKG_RESOURCES       0x0200  /* Domain members share cpu pkg resources */
>>>>  #define SD_SERIALIZE         0x0400  /* Only a single load balancing instance */
>>>>  #define SD_ASYM_PACKING              0x0800  /* Place busy groups earlier in the domain */
>>>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>>>> index 0b51ee3..224ec3b 100644
>>>> --- a/kernel/sched/core.c
>>>> +++ b/kernel/sched/core.c
>>>> @@ -5298,7 +5298,8 @@ static int sd_degenerate(struct sched_domain *sd)
>>>>                        SD_BALANCE_FORK |
>>>>                        SD_BALANCE_EXEC |
>>>>                        SD_SHARE_CPUPOWER |
>>>> -                      SD_SHARE_PKG_RESOURCES)) {
>>>> +                      SD_SHARE_PKG_RESOURCES |
>>>> +                      SD_SHARE_POWERDOMAIN)) {
>>>>               if (sd->groups != sd->groups->next)
>>>>                       return 0;
>>>>       }
>>>> @@ -5329,7 +5330,8 @@ sd_parent_degenerate(struct sched_domain *sd, struct sched_domain *parent)
>>>>                               SD_BALANCE_EXEC |
>>>>                               SD_SHARE_CPUPOWER |
>>>>                               SD_SHARE_PKG_RESOURCES |
>>>> -                             SD_PREFER_SIBLING);
>>>> +                             SD_PREFER_SIBLING |
>>>> +                             SD_SHARE_POWERDOMAIN);
>>>>               if (nr_node_ids == 1)
>>>>                       pflags &= ~SD_SERIALIZE;
>>>>       }
>>>> @@ -5946,7 +5948,8 @@ static int sched_domains_curr_level;
>>>>       (SD_SHARE_CPUPOWER |            \
>>>>        SD_SHARE_PKG_RESOURCES |       \
>>>>        SD_NUMA |                      \
>>>> -      SD_ASYM_PACKING)
>>>> +      SD_ASYM_PACKING |              \
>>>> +      SD_SHARE_POWERDOMAIN)
>>>>
>>>>  static struct sched_domain *
>>>>  sd_init(struct sched_domain_topology_level *tl, int cpu)
>>>>
>>>
>>
>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH v2 5/7] sched: add a new SD_SHARE_POWERDOMAIN for sched_domain
@ 2014-03-19 12:26           ` Vincent Guittot
  0 siblings, 0 replies; 55+ messages in thread
From: Vincent Guittot @ 2014-03-19 12:26 UTC (permalink / raw)
  To: linux-arm-kernel

On 19 March 2014 12:05, Preeti U Murthy <preeti@linux.vnet.ibm.com> wrote:
> On 03/19/2014 03:22 PM, Vincent Guittot wrote:
>> On 19 March 2014 07:21, Preeti U Murthy <preeti@linux.vnet.ibm.com> wrote:
>>> Hi Vincent,
>>>
>>> On 03/18/2014 11:26 PM, Vincent Guittot wrote:
>>>> A new flag SD_SHARE_POWERDOMAIN is created to reflect whether groups of CPUs
>>>> in a sched_domain level can or not reach different power state. As an example,
>>>> the flag should be cleared at CPU level if groups of cores can be power gated
>>>> independently. This information can be used to add load balancing level between
>>>> group of CPUs than can power gate independantly. The default behavior of the
>>>> scheduler is to spread tasks across CPUs and groups of CPUs so the flag is set
>>>> into all sched_domains.
>>>
>>> I don't see this flag being set either in sd_init() or in
>>> default_topology[]. Should not the default_topology[] flag setting
>>> routines set this flag at every level of sched domain along with other
>>> topology flags, unless the arch wants to override it?
>>
>> Hi Preeti
>>
>> I have made the choice not to add it in the default table for the
>> moment because the scheduler behavior is not changed. It will be added
>> with the patchset that will take advantage of this flag in the load
>> balance decision.
>
> OK, if you are looking at setting this flag in the default topology table,
> then [patch 7/7] "sched: powerpc: Add SD_SHARE_POWERDOMAIN for SMT level"
> looks good to me. Please add my Reviewed-by to this patch.
>
> However, if you are looking at initializing this flag as being set by
> default in sd_init(), then the archs will have to clear the flag, rather
> than set it, in their respective topology tables for the sched domains
> which have their groups power gated. In which case [patch 7/7]
> would be incorrect.
>    But wait, I see that you mention that the topology level flags are
> left to the archs to set if required. So I am assuming you will not set
> the SD_SHARE_POWERDOMAIN flag in sd_init(), right?

Yes, it will not be set in sd_init() but via the function pointer in the topology table.

Vincent

>
> Regards
> Preeti U Murthy
>>
>> Regards,
>> Vincent
>>
>>>
>>> Regards
>>> Preeti U Murthy
>>>> This flag is part of the topology flags that can be set by arch.
>>>>
>>>> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
>>>> ---
>>>>  include/linux/sched.h | 1 +
>>>>  kernel/sched/core.c   | 9 ++++++---
>>>>  2 files changed, 7 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/include/linux/sched.h b/include/linux/sched.h
>>>> index 6479de4..7048369 100644
>>>> --- a/include/linux/sched.h
>>>> +++ b/include/linux/sched.h
>>>> @@ -861,6 +861,7 @@ enum cpu_idle_type {
>>>>  #define SD_BALANCE_WAKE              0x0010  /* Balance on wakeup */
>>>>  #define SD_WAKE_AFFINE               0x0020  /* Wake task to waking CPU */
>>>>  #define SD_SHARE_CPUPOWER    0x0080  /* Domain members share cpu power */
>>>> +#define SD_SHARE_POWERDOMAIN 0x0100  /* Domain members share power domain */
>>>>  #define SD_SHARE_PKG_RESOURCES       0x0200  /* Domain members share cpu pkg resources */
>>>>  #define SD_SERIALIZE         0x0400  /* Only a single load balancing instance */
>>>>  #define SD_ASYM_PACKING              0x0800  /* Place busy groups earlier in the domain */
>>>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>>>> index 0b51ee3..224ec3b 100644
>>>> --- a/kernel/sched/core.c
>>>> +++ b/kernel/sched/core.c
>>>> @@ -5298,7 +5298,8 @@ static int sd_degenerate(struct sched_domain *sd)
>>>>                        SD_BALANCE_FORK |
>>>>                        SD_BALANCE_EXEC |
>>>>                        SD_SHARE_CPUPOWER |
>>>> -                      SD_SHARE_PKG_RESOURCES)) {
>>>> +                      SD_SHARE_PKG_RESOURCES |
>>>> +                      SD_SHARE_POWERDOMAIN)) {
>>>>               if (sd->groups != sd->groups->next)
>>>>                       return 0;
>>>>       }
>>>> @@ -5329,7 +5330,8 @@ sd_parent_degenerate(struct sched_domain *sd, struct sched_domain *parent)
>>>>                               SD_BALANCE_EXEC |
>>>>                               SD_SHARE_CPUPOWER |
>>>>                               SD_SHARE_PKG_RESOURCES |
>>>> -                             SD_PREFER_SIBLING);
>>>> +                             SD_PREFER_SIBLING |
>>>> +                             SD_SHARE_POWERDOMAIN);
>>>>               if (nr_node_ids == 1)
>>>>                       pflags &= ~SD_SERIALIZE;
>>>>       }
>>>> @@ -5946,7 +5948,8 @@ static int sched_domains_curr_level;
>>>>       (SD_SHARE_CPUPOWER |            \
>>>>        SD_SHARE_PKG_RESOURCES |       \
>>>>        SD_NUMA |                      \
>>>> -      SD_ASYM_PACKING)
>>>> +      SD_ASYM_PACKING |              \
>>>> +      SD_SHARE_POWERDOMAIN)
>>>>
>>>>  static struct sched_domain *
>>>>  sd_init(struct sched_domain_topology_level *tl, int cpu)
>>>>
>>>
>>
>


* Re: [PATCH v2 5/7] sched: add a new SD_SHARE_POWERDOMAIN for sched_domain
  2014-03-19 11:59     ` Peter Zijlstra
@ 2014-03-19 12:28       ` Vincent Guittot
  -1 siblings, 0 replies; 55+ messages in thread
From: Vincent Guittot @ 2014-03-19 12:28 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: fenghua.yu, linaro-kernel, tony.luck, Russell King - ARM Linux,
	Benjamin Herrenschmidt, linux-kernel, cmetcalf, Dietmar Eggemann,
	schwidefsky, james.hogan, Preeti U Murthy, Ingo Molnar, LAK

On 19 March 2014 12:59, Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, Mar 18, 2014 at 06:56:47PM +0100, Vincent Guittot wrote:
>
>
>> @@ -5946,7 +5948,8 @@ static int sched_domains_curr_level;
>>       (SD_SHARE_CPUPOWER |            \
>>        SD_SHARE_PKG_RESOURCES |       \
>>        SD_NUMA |                      \
>> -      SD_ASYM_PACKING)
>> +      SD_ASYM_PACKING |              \
>> +      SD_SHARE_POWERDOMAIN)
>
> You forgot to update the pretty comment above that :-)

Yes, I'm going to update it.

>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel



* Re: [PATCH v2 5/7] sched: add a new SD_SHARE_POWERDOMAIN for sched_domain
  2014-03-19 12:01     ` Peter Zijlstra
@ 2014-03-19 12:29       ` Vincent Guittot
  -1 siblings, 0 replies; 55+ messages in thread
From: Vincent Guittot @ 2014-03-19 12:29 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: fenghua.yu, linaro-kernel, tony.luck, Russell King - ARM Linux,
	Benjamin Herrenschmidt, linux-kernel, cmetcalf, Dietmar Eggemann,
	schwidefsky, james.hogan, Preeti U Murthy, Ingo Molnar, LAK

On 19 March 2014 13:01, Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, Mar 18, 2014 at 06:56:47PM +0100, Vincent Guittot wrote:
>> A new flag SD_SHARE_POWERDOMAIN is created to reflect whether groups of CPUs
>> in a sched_domain level can reach different power states or not. As an example,
>> the flag should be cleared at CPU level if groups of cores can be power gated
>> independently. This information can be used to add a load balancing level
>> between groups of CPUs that can power gate independently. The default behavior
>> of the scheduler is to spread tasks across CPUs and groups of CPUs
>
>> so the flag is set in all sched_domains.
>
> I suppose that is the part Preeti stumbled over; as did I. It's not set
> at all.

Yes, it's confusing. I'm going to update the commit message.




* Re: [PATCH v2 7/7] sched: powerpc: Add SD_SHARE_POWERDOMAIN for SMT level
  2014-03-19 12:05     ` Peter Zijlstra
@ 2014-03-19 12:30       ` Vincent Guittot
  -1 siblings, 0 replies; 55+ messages in thread
From: Vincent Guittot @ 2014-03-19 12:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: fenghua.yu, linaro-kernel, tony.luck, Russell King - ARM Linux,
	Benjamin Herrenschmidt, linux-kernel, cmetcalf, Dietmar Eggemann,
	schwidefsky, james.hogan, Preeti U Murthy, Ingo Molnar, LAK

On 19 March 2014 13:05, Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, Mar 18, 2014 at 06:56:49PM +0100, Vincent Guittot wrote:
>> Set the power domain dependency at the SMT level of Power8 but keep the flag
>> clear at the CPU level. The goal is to consolidate tasks on the threads of a
>> core up to a level as described in the link below:
>> https://lkml.org/lkml/2014/3/12/16
>
> I've not yet looked at the link; but it's best to avoid references and
> have a Changelog that can stand on its own.

I can copy/paste Preeti's explanation in the commit message




* Re: [PATCH v2 2/7] sched: rework of sched_domain topology definition
  2014-03-19 11:27     ` Dietmar Eggemann
@ 2014-03-19 12:41       ` Peter Zijlstra
  -1 siblings, 0 replies; 55+ messages in thread
From: Peter Zijlstra @ 2014-03-19 12:41 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: Vincent Guittot, mingo, linux-kernel, preeti, tony.luck,
	fenghua.yu, schwidefsky, james.hogan, cmetcalf, benh, linux,
	linux-arm-kernel, linaro-kernel


The keyboard deity gave us delete, please apply graciously when replying
to large emails.

On Wed, Mar 19, 2014 at 11:27:12AM +0000, Dietmar Eggemann wrote:
> On 18/03/14 17:56, Vincent Guittot wrote:
> > +       if (sd->flags & SD_SHARE_CPUPOWER) {
> > +               sd->imbalance_pct = 110;
> > +               sd->smt_gain = 1178; /* ~15% */
> > +               sd->flags |= arch_sd_sibling_asym_packing();
> > +
> > +       } else if (sd->flags & SD_SHARE_PKG_RESOURCES) {
> > +               sd->imbalance_pct = 117;
> > +               sd->cache_nice_tries = 1;
> > +               sd->busy_idx = 2;
> > +
> > +#ifdef CONFIG_NUMA
> > +       } else if (sd->flags & SD_NUMA) {
> > +               sd->cache_nice_tries = 2;
> > +               sd->busy_idx = 3;
> > +               sd->idle_idx = 2;
> > +
> > +               sd->flags |= SD_SERIALIZE;
> > +               if (sched_domains_numa_distance[tl->numa_level] > RECLAIM_DISTANCE) {
> > +                       sd->flags &= ~(SD_BALANCE_EXEC |
> > +                                      SD_BALANCE_FORK |
> > +                                      SD_WAKE_AFFINE);
> > +               }
> > +
> > +#endif
> > +       } else {
> > +               sd->flags |= SD_PREFER_SIBLING;
> > +               sd->cache_nice_tries = 1;
> > +               sd->busy_idx = 2;
> > +               sd->idle_idx = 1;
> > +       }
> 
> This 'if ... else statement' is still a weak point from the perspective
> of making the code robust:

<snip>

> Is there a way to check that MC and GMC have to have
> SD_SHARE_PKG_RESOURCES set so that this can't happen unnoticed?

So from the core code's perspective those names mean less than nothing.
It's just a string to carry along for us meat-bags. The string isn't even
there when !SCHED_DEBUG.

So from this code's POV you told it it had a domain without PKGSHARE,
and that's fine.

That said, yeah, the thing isn't the prettiest piece of code. But it has
the big advantage of being the one place where we convert topology into
behaviour.



* Re: [PATCH v2 2/7] sched: rework of sched_domain topology definition
  2014-03-19 12:41       ` Peter Zijlstra
@ 2014-03-19 13:33         ` Vincent Guittot
  -1 siblings, 0 replies; 55+ messages in thread
From: Vincent Guittot @ 2014-03-19 13:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Dietmar Eggemann, mingo, linux-kernel, preeti, tony.luck,
	fenghua.yu, schwidefsky, james.hogan, cmetcalf, benh, linux,
	linux-arm-kernel, linaro-kernel

On 19 March 2014 13:41, Peter Zijlstra <peterz@infradead.org> wrote:
>
> The keyboard deity gave us delete, please apply graciously when replying
> to large emails.
>
> On Wed, Mar 19, 2014 at 11:27:12AM +0000, Dietmar Eggemann wrote:
>> On 18/03/14 17:56, Vincent Guittot wrote:
>> > +       if (sd->flags & SD_SHARE_CPUPOWER) {
>> > +               sd->imbalance_pct = 110;
>> > +               sd->smt_gain = 1178; /* ~15% */
>> > +               sd->flags |= arch_sd_sibling_asym_packing();
>> > +
>> > +       } else if (sd->flags & SD_SHARE_PKG_RESOURCES) {
>> > +               sd->imbalance_pct = 117;
>> > +               sd->cache_nice_tries = 1;
>> > +               sd->busy_idx = 2;
>> > +
>> > +#ifdef CONFIG_NUMA
>> > +       } else if (sd->flags & SD_NUMA) {
>> > +               sd->cache_nice_tries = 2;
>> > +               sd->busy_idx = 3;
>> > +               sd->idle_idx = 2;
>> > +
>> > +               sd->flags |= SD_SERIALIZE;
>> > +               if (sched_domains_numa_distance[tl->numa_level] > RECLAIM_DISTANCE) {
>> > +                       sd->flags &= ~(SD_BALANCE_EXEC |
>> > +                                      SD_BALANCE_FORK |
>> > +                                      SD_WAKE_AFFINE);
>> > +               }
>> > +
>> > +#endif
>> > +       } else {
>> > +               sd->flags |= SD_PREFER_SIBLING;
>> > +               sd->cache_nice_tries = 1;
>> > +               sd->busy_idx = 2;
>> > +               sd->idle_idx = 1;
>> > +       }
>>
>> This 'if ... else statement' is still a weak point from the perspective
>> of making the code robust:
>
> <snip>
>
>> Is there a way to check that MC and GMC have to have
>> SD_SHARE_PKG_RESOURCES set so that this can't happen unnoticed?
>
> So from the core codes perspective those names mean less than nothing.
> Its just a string to carry along for us meat-bags. The string isn't even
> there when !SCHED_DEBUG.
>
> So from this codes POV you told it it had a domain without PKGSHARE,
> that's fine.
>
> That said; yeah the thing isn't the prettiest piece of code. But it has
> the big advantage of being the one place where we convert topology into
> behaviour.

We might add a check of the child in sd_init() to ensure that the child
has at least some properties of the current level.
I mean that if a level has the SD_SHARE_PKG_RESOURCES flag, its
child must also have it. The same goes for SD_SHARE_CPUPOWER and
SD_ASYM_PACKING.

So we could add something like the below in sd_init():

	int child_flags = SD_SHARE_PKG_RESOURCES | SD_SHARE_CPUPOWER |
			  SD_ASYM_PACKING;
	int flags = sd->flags & child_flags;

	if (sd->child)
		child_flags &= sd->child->flags;
	child_flags &= flags;
	if (flags != child_flags)
		pr_info("The topology description looks strange\n");

Vincent



* Re: [PATCH v2 2/7] sched: rework of sched_domain topology definition
  2014-03-19 12:41       ` Peter Zijlstra
@ 2014-03-19 13:46         ` Dietmar Eggemann
  -1 siblings, 0 replies; 55+ messages in thread
From: Dietmar Eggemann @ 2014-03-19 13:46 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Vincent Guittot, mingo, linux-kernel, preeti, tony.luck,
	fenghua.yu, schwidefsky, james.hogan, cmetcalf, benh, linux,
	linux-arm-kernel, linaro-kernel

On 19/03/14 12:41, Peter Zijlstra wrote:
> 
> The keyboard deity gave us delete, please apply graciously when replying
> to large emails.

Sorry about that, will do next time.

> 
> On Wed, Mar 19, 2014 at 11:27:12AM +0000, Dietmar Eggemann wrote:
>> On 18/03/14 17:56, Vincent Guittot wrote:
>>> +       if (sd->flags & SD_SHARE_CPUPOWER) {
>>> +               sd->imbalance_pct = 110;
>>> +               sd->smt_gain = 1178; /* ~15% */
>>> +               sd->flags |= arch_sd_sibling_asym_packing();
>>> +
>>> +       } else if (sd->flags & SD_SHARE_PKG_RESOURCES) {
>>> +               sd->imbalance_pct = 117;
>>> +               sd->cache_nice_tries = 1;
>>> +               sd->busy_idx = 2;
>>> +
>>> +#ifdef CONFIG_NUMA
>>> +       } else if (sd->flags & SD_NUMA) {
>>> +               sd->cache_nice_tries = 2;
>>> +               sd->busy_idx = 3;
>>> +               sd->idle_idx = 2;
>>> +
>>> +               sd->flags |= SD_SERIALIZE;
>>> +               if (sched_domains_numa_distance[tl->numa_level] > RECLAIM_DISTANCE) {
>>> +                       sd->flags &= ~(SD_BALANCE_EXEC |
>>> +                                      SD_BALANCE_FORK |
>>> +                                      SD_WAKE_AFFINE);
>>> +               }
>>> +
>>> +#endif
>>> +       } else {
>>> +               sd->flags |= SD_PREFER_SIBLING;
>>> +               sd->cache_nice_tries = 1;
>>> +               sd->busy_idx = 2;
>>> +               sd->idle_idx = 1;
>>> +       }
>>
>> This 'if ... else statement' is still a weak point from the perspective
>> of making the code robust:
> 
> <snip>
> 
>> Is there a way to check that MC and GMC have to have
>> SD_SHARE_PKG_RESOURCES set so that this can't happen unnoticed?
> 
> So from the core codes perspective those names mean less than nothing.
> Its just a string to carry along for us meat-bags. The string isn't even
> there when !SCHED_DEBUG.
> 
> So from this codes POV you told it it had a domain without PKGSHARE,
> that's fine.

I see your point. So what we want to avoid is to enable archs to create
different (per-CPU) set-ups inside a domain (as a specific set of CPUs
from the viewpoint of a CPU), but misconfiguration of the whole domain is
a different story. Got it!

> 
> That said; yeah the thing isn't the prettiest piece of code. But it has
> the big advantage of being the one place where we convert topology into
> behaviour.
> 





* Re: [PATCH v2 2/7] sched: rework of sched_domain topology definition
  2014-03-19 13:33         ` Vincent Guittot
@ 2014-03-19 15:22           ` Dietmar Eggemann
  -1 siblings, 0 replies; 55+ messages in thread
From: Dietmar Eggemann @ 2014-03-19 15:22 UTC (permalink / raw)
  To: Vincent Guittot, Peter Zijlstra
  Cc: mingo, linux-kernel, preeti, tony.luck, fenghua.yu, schwidefsky,
	james.hogan, cmetcalf, benh, linux, linux-arm-kernel,
	linaro-kernel

On 19/03/14 13:33, Vincent Guittot wrote:
[...]

>>> Is there a way to check that MC and GMC have to have
>>> SD_SHARE_PKG_RESOURCES set so that this can't happen unnoticed?
>>
>> So from the core codes perspective those names mean less than nothing.
>> Its just a string to carry along for us meat-bags. The string isn't even
>> there when !SCHED_DEBUG.
>>
>> So from this codes POV you told it it had a domain without PKGSHARE,
>> that's fine.
>>
>> That said; yeah the thing isn't the prettiest piece of code. But it has
>> the big advantage of being the one place where we convert topology into
>> behaviour.
> 
> We might add a check of the child in sd_init to ensure that the child
> has at least some properties of the current level.
> I mean that if a level has got the SD_SHARE_PKG_RESOURCES flag, its
> child must also have it. The same for SD_SHARE_CPUPOWER and
> SD_ASYM_PACKING.
> 
> so we can add something like the below in sd_init
> 
> child_flags = SD_SHARE_PKG_RESOURCES | SD_SHARE_CPUPOWER | SD_ASYM_PACKING
> flags = sd->flags & child_flags
> if (sd->child)
>    child_flags &= sd->child->flags
> child_flags &= flags
> if (flags != child_flags)
>     pr_info("The topology description looks strange \n");

I tried it with my faulty set-up on TC2 and I get the info message for
the GMC level for all CPUs in sd_init().

I had to pass a 'struct sched_domain *child' pointer into sd_init()
from build_sched_domain(), because inside sd_init() sd->child is always NULL.

So one of the requirements of this approach is that a child level like
GMC (which could potentially replace its parent level or is otherwise
destroyed itself) has to specify all flags of its parent level (MC)?

What about SD_NUMA in child_flags? SD_ASYM_PACKING is also a little bit
different from SD_SHARE_PKG_RESOURCES or SD_SHARE_CPUPOWER because it's
not used in the if ... else statement.

But I'm afraid this only works for this specific case of the MC/GMC
layers and is not scalable. If sd->child is a level that should not
potentially destroy itself or its parent, then you would get
false alarms. IMHO, sd_init() has no information about which pairs of
adjacent levels it should apply this check to and which not. Am I
missing something here?

-- Dietmar

> 
> Vincent
> 





* Re: [PATCH v2 2/7] sched: rework of sched_domain topology definition
  2014-03-19 15:22           ` Dietmar Eggemann
@ 2014-03-19 16:14             ` Vincent Guittot
  -1 siblings, 0 replies; 55+ messages in thread
From: Vincent Guittot @ 2014-03-19 16:14 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: Peter Zijlstra, mingo, linux-kernel, preeti, tony.luck,
	fenghua.yu, schwidefsky, james.hogan, cmetcalf, benh, linux,
	linux-arm-kernel, linaro-kernel

On 19 March 2014 16:22, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
> On 19/03/14 13:33, Vincent Guittot wrote:
> [...]
>
>>>> Is there a way to check that MC and GMC have to have
>>>> SD_SHARE_PKG_RESOURCES set so that this can't happen unnoticed?
>>>
>>> So from the core codes perspective those names mean less than nothing.
>>> Its just a string to carry along for us meat-bags. The string isn't even
>>> there when !SCHED_DEBUG.
>>>
>>> So from this codes POV you told it it had a domain without PKGSHARE,
>>> that's fine.
>>>
>>> That said; yeah the thing isn't the prettiest piece of code. But it has
>>> the big advantage of being the one place where we convert topology into
>>> behaviour.
>>
>> We might add a check of the child in sd_init to ensure that the child
>> has at least some properties of the current level.
>> I mean that if a level has got the SD_SHARE_PKG_RESOURCES flag, its
>> child must also have it. The same for SD_SHARE_CPUPOWER and
>> SD_ASYM_PACKING.
>>
>> so we can add something like the below in sd_init
>>
>> child_flags = SD_SHARE_PKG_RESOURCES | SD_SHARE_CPUPOWER | SD_ASYM_PACKING
>> flags = sd->flags & child_flags
>> if (sd->child)
>>    child_flags &= sd->child->flags
>> child_flags &= flags
>> if (flags != child_flags)
>>     pr_info("The topology description looks strange \n");
>
> I tried it with my faulty set-up on TC2 and I get the info message for
> the GMC level for all CPU's in sd_init.
>
> I had to pass an 'struct sched_domain *child' pointer into sd_init()
> from build_sched_domain() because inside sd_init() sd->child is always NULL.

Ah yes... the child is set after the call to sd_init(), so we don't have
access to it there.

>
> So one of the requirements of this approach is that a child level like
> GMC (which could potentially replace its parent level or otherwise is
> destroyed itself) has to specify all flags of its parent level (MC)?

Yes, among the 3 flags that I mention, because we have a simple
parent/child relation for these 3 flags.

>
> What about SD_NUMA in child_flags? SD_ASYM_PACKING is also a little bit

SD_NUMA doesn't follow the same rule

> different than SD_SHARE_PKG_RESOURCES or SD_SHARE_CPUPOWER because it's
> not used in the if ... else statement.

It's not a matter of being in an if ... else statement but more of a
topology dependency.

>
> But I'm afraid this only works for this specific case of the MC/GMC

This also works if a level with the SD_SHARE_CPUPOWER flag is declared in
the table after a level without the flag, which doesn't make sense
AFAIK.

> layer and is not scalable. If sd->child is a level for which you don't
> want to potentially destroy itself or its parent, then you would get
> false alarms. IMHO, sd_init() has no information for which pair of
> adjacent levels it should apply this check and for which not. Do I miss
> something here?

This check could apply to all levels.

Vincent

>
> -- Dietmar
>
>>
>> Vincent
>>
>
>

^ permalink raw reply	[flat|nested] 55+ messages in thread


end of thread, other threads:[~2014-03-19 16:14 UTC | newest]

Thread overview: 55+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-03-18 17:56 [PATCH v2 0/7] rework sched_domain topology description Vincent Guittot
2014-03-18 17:56 ` [PATCH v2 1/7] sched: remove unused SCHED_INIT_NODE Vincent Guittot
2014-03-19 11:07   ` James Hogan
2014-03-18 17:56 ` [PATCH v2 2/7] sched: rework of sched_domain topology definition Vincent Guittot
2014-03-19  6:01   ` Preeti U Murthy
2014-03-19 11:27   ` Dietmar Eggemann
2014-03-19 12:41     ` Peter Zijlstra
2014-03-19 13:33       ` Vincent Guittot
2014-03-19 15:22         ` Dietmar Eggemann
2014-03-19 16:14           ` Vincent Guittot
2014-03-19 13:46       ` Dietmar Eggemann
2014-03-18 17:56 ` [PATCH v2 3/7] sched: s390: create a dedicated topology table Vincent Guittot
2014-03-18 17:56 ` [PATCH v2 4/7] sched: powerpc: " Vincent Guittot
2014-03-19  6:04   ` Preeti U Murthy
2014-03-18 17:56 ` [PATCH v2 5/7] sched: add a new SD_SHARE_POWERDOMAIN for sched_domain Vincent Guittot
2014-03-19  6:21   ` Preeti U Murthy
2014-03-19  9:52     ` Vincent Guittot
2014-03-19 11:05       ` Preeti U Murthy
2014-03-19 12:26         ` Vincent Guittot
2014-03-19 11:59   ` Peter Zijlstra
2014-03-19 12:28     ` Vincent Guittot
2014-03-19 12:01   ` Peter Zijlstra
2014-03-19 12:29     ` Vincent Guittot
2014-03-18 17:56 ` [PATCH v2 6/7] sched: ARM: create a dedicated scheduler topology table Vincent Guittot
2014-03-18 17:56 ` [PATCH v2 7/7] sched: powerpc: Add SD_SHARE_POWERDOMAIN for SMT level Vincent Guittot
2014-03-19 12:05   ` Peter Zijlstra
2014-03-19 12:30     ` Vincent Guittot
