linux-kernel.vger.kernel.org archive mirror
* [PATCH v3 0/8] Support Intel® Turbo Boost Max Technology 3.0
@ 2016-09-08 22:26 Srinivas Pandruvada
  2016-09-08 22:26 ` [PATCH v3 1/8] sched, x86: Add SD_ASYM_PACKING flags to x86 cpu topology for ITMT Srinivas Pandruvada
                   ` (7 more replies)
  0 siblings, 8 replies; 23+ messages in thread
From: Srinivas Pandruvada @ 2016-09-08 22:26 UTC (permalink / raw)
  To: rjw, tglx, mingo, bp
  Cc: x86, linux-pm, linux-kernel, linux-acpi, peterz, tim.c.chen,
	Srinivas Pandruvada

v3:
- Fix a race when more than one program enables/disables ITMT at the same time
- Remove the group_priority_cpu macro to simplify the code
- Fix the compile error on ARM reported by the 0-day bot

v2:
- The patchset is split into two parts so that CPPC changes can be merged first
 1. Only ACPI CPPC changes (It is posted separately)
 2. ITMT changes (scheduler and Intel P-State)

- Changes in patch: sched,x86: Enable Turbo Boost Max Technology
 1. Use arch_update_cpu_topology to indicate that the sched domains need
    to be completely rebuilt when ITMT related sched domain flags change
 2. Enable ITMT scheduling by default on client (single node) platforms
    capable of ITMT
 3. Implement arch_asym_cpu_priority to provide the cpu priority
    value to scheduler for asym packing.
 4. Fix a compile bug for i386 architecture.

- Changes in patch: sched: Extend scheduler's asym packing
 1. Use arch_asym_cpu_priority() to provide the scheduler with the cpu
    priority value used for asym packing.

- Changes in acpi: bus: Enable HWP CPPC objects and
  acpi: bus: Set _OSC for diverse core support
  Minor code cleanup by removing #ifdef
- Changes in Kconfig for Intel P-State
  Avoid building the CPPC lib for i386 due to an issue reported by 0-day

- Feature is enabled by default for single socket systems

With Intel® Turbo Boost Max Technology 3.0 (ITMT), single-threaded performance
is optimized by identifying the processor's fastest core and running critical
workloads on it.
Refer to:
http://www.intel.com/content/www/us/en/architecture-and-technology/turbo-boost/turbo-boost-max-technology.html

This patchset consists of all the changes required to support the ITMT feature:
- Use CPPC information in the Intel P-State driver to get performance information
- Scheduler enhancements
- CPPC lib patches (split into a separate series)

The feature can be enabled at runtime by writing
# echo 1 > /proc/sys/kernel/sched_itmt_enabled
and disabled by writing
# echo 0 > /proc/sys/kernel/sched_itmt_enabled

Srinivas Pandruvada (3):
  acpi: bus: Enable HWP CPPC objects
  acpi: bus: Set _OSC for diverse core support
  cpufreq: intel_pstate: Use CPPC to get max performance

Tim Chen (5):
  sched, x86: Add SD_ASYM_PACKING flags to x86 cpu topology for ITMT
  sched: Extend scheduler's asym packing
  x86, cpu: provide a function topology_num_packages to enumerate
    #packages
  sched, x86: use arch_update_cpu_topology to indicate x86 need sched
    domain rebuild
  sched,x86: Enable Turbo Boost Max Technology

 arch/x86/Kconfig                |   9 +++
 arch/x86/include/asm/topology.h |  25 ++++++
 arch/x86/kernel/Makefile        |   1 +
 arch/x86/kernel/itmt.c          | 164 ++++++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/smpboot.c       |  92 +++++++++++++++++-----
 drivers/acpi/bus.c              |  10 +++
 drivers/cpufreq/Kconfig.x86     |   1 +
 drivers/cpufreq/intel_pstate.c  |  75 +++++++++++++++++-
 include/linux/acpi.h            |   1 +
 include/linux/sched.h           |   2 +
 kernel/sched/core.c             |  21 +++++
 kernel/sched/fair.c             |  35 ++++++---
 kernel/sched/sched.h            |   6 ++
 13 files changed, 410 insertions(+), 32 deletions(-)
 create mode 100644 arch/x86/kernel/itmt.c

-- 
2.7.4

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v3 1/8] sched, x86: Add SD_ASYM_PACKING flags to x86 cpu topology for ITMT
  2016-09-08 22:26 [PATCH v3 0/8] Support Intel® Turbo Boost Max Technology 3.0 Srinivas Pandruvada
@ 2016-09-08 22:26 ` Srinivas Pandruvada
  2016-09-10 13:10   ` Thomas Gleixner
  2016-09-10 13:13   ` Thomas Gleixner
  2016-09-08 22:26 ` [PATCH v3 2/8] sched: Extend scheduler's asym packing Srinivas Pandruvada
                   ` (6 subsequent siblings)
  7 siblings, 2 replies; 23+ messages in thread
From: Srinivas Pandruvada @ 2016-09-08 22:26 UTC (permalink / raw)
  To: rjw, tglx, mingo, bp
  Cc: x86, linux-pm, linux-kernel, linux-acpi, peterz, tim.c.chen,
	Srinivas Pandruvada

From: Tim Chen <tim.c.chen@linux.intel.com>

We use the ASYM_PACKING feature in the scheduler to move tasks to the more
capable cpus that can be boosted to higher frequencies. This is enabled by
Intel Turbo Boost Max Technology 3.0 (ITMT).  On such systems we mark the
sched domain topology level with the SD_ASYM_PACKING flag to indicate that
the scheduler can use the ASYM_PACKING feature to move load to the more
capable cpus.

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
---
 arch/x86/include/asm/topology.h |  4 +++
 arch/x86/kernel/smpboot.c       | 77 +++++++++++++++++++++++++++++++----------
 kernel/sched/core.c             |  3 ++
 3 files changed, 66 insertions(+), 18 deletions(-)

diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index cf75871..98b669d 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -146,4 +146,8 @@ struct pci_bus;
 int x86_pci_root_bus_node(int bus);
 void x86_pci_root_bus_resources(int bus, struct list_head *resources);
 
+#ifdef CONFIG_SCHED_ITMT
+extern unsigned int __read_mostly sysctl_sched_itmt_enabled;
+#endif /* CONFIG_SCHED_ITMT */
+
 #endif /* _ASM_X86_TOPOLOGY_H */
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 4296beb..3782bd4 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -109,6 +109,8 @@ static bool logical_packages_frozen __read_mostly;
 /* Maximum number of SMT threads on any online core */
 int __max_smt_threads __read_mostly;
 
+unsigned int __read_mostly sysctl_sched_itmt_enabled;
+
 static inline void smpboot_setup_warm_reset_vector(unsigned long start_eip)
 {
 	unsigned long flags;
@@ -471,31 +473,57 @@ static bool match_die(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
 	return false;
 }
 
-static struct sched_domain_topology_level numa_inside_package_topology[] = {
+#ifdef CONFIG_SCHED_ITMT
+static int x86_core_flags(void)
+{
+	int flags = cpu_core_flags();
+
+	if (sysctl_sched_itmt_enabled)
+		flags |= SD_ASYM_PACKING;
+
+	return flags;
+}
+
+static int x86_smt_flags(void)
+{
+	int flags = cpu_smt_flags();
+
+	if (sysctl_sched_itmt_enabled)
+		flags |= SD_ASYM_PACKING;
+
+	return flags;
+}
+#else
+#define x86_core_flags cpu_core_flags
+#define x86_smt_flags cpu_smt_flags
+#endif
+
+static struct sched_domain_topology_level x86_topology[] = {
+#ifdef CONFIG_SCHED_SMT
+	{ cpu_smt_mask, x86_smt_flags, SD_INIT_NAME(SMT) },
+#endif
+#ifdef CONFIG_SCHED_MC
+	{ cpu_coregroup_mask, x86_core_flags, SD_INIT_NAME(MC) },
+#endif
+	{ cpu_cpu_mask, SD_INIT_NAME(DIE) },
+	{ NULL, },
+};
+
+static struct sched_domain_topology_level x86_numa_in_package_topology[] = {
 #ifdef CONFIG_SCHED_SMT
-	{ cpu_smt_mask, cpu_smt_flags, SD_INIT_NAME(SMT) },
+	{ cpu_smt_mask, x86_smt_flags, SD_INIT_NAME(SMT) },
 #endif
 #ifdef CONFIG_SCHED_MC
-	{ cpu_coregroup_mask, cpu_core_flags, SD_INIT_NAME(MC) },
+	{ cpu_coregroup_mask, x86_core_flags, SD_INIT_NAME(MC) },
 #endif
 	{ NULL, },
 };
+
 /*
- * set_sched_topology() sets the topology internal to a CPU.  The
- * NUMA topologies are layered on top of it to build the full
- * system topology.
- *
- * If NUMA nodes are observed to occur within a CPU package, this
- * function should be called.  It forces the sched domain code to
- * only use the SMT level for the CPU portion of the topology.
- * This essentially falls back to relying on NUMA information
- * from the SRAT table to describe the entire system topology
- * (except for hyperthreads).
+ * Set if a package/die has multiple NUMA nodes inside.
+ * AMD Magny-Cours and Intel Cluster-on-Die have this.
  */
-static void primarily_use_numa_for_topology(void)
-{
-	set_sched_topology(numa_inside_package_topology);
-}
+static bool x86_has_numa_in_package;
 
 void set_cpu_sibling_map(int cpu)
 {
@@ -558,7 +586,7 @@ void set_cpu_sibling_map(int cpu)
 				c->booted_cores = cpu_data(i).booted_cores;
 		}
 		if (match_die(c, o) && !topology_same_node(c, o))
-			primarily_use_numa_for_topology();
+			x86_has_numa_in_package = true;
 	}
 
 	threads = cpumask_weight(topology_sibling_cpumask(cpu));
@@ -1304,6 +1332,16 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
 		zalloc_cpumask_var(&per_cpu(cpu_core_map, i), GFP_KERNEL);
 		zalloc_cpumask_var(&per_cpu(cpu_llc_shared_map, i), GFP_KERNEL);
 	}
+
+	/*
+	 * Set 'default' x86 topology, this matches default_topology() in that
+	 * it has NUMA nodes as a topology level. See also
+	 * native_smp_cpus_done().
+	 *
+	 * Must be done before set_cpu_sibling_map() is run.
+	 */
+	set_sched_topology(x86_topology);
+
 	set_cpu_sibling_map(0);
 
 	switch (smp_sanity_check(max_cpus)) {
@@ -1370,6 +1408,9 @@ void __init native_smp_cpus_done(unsigned int max_cpus)
 {
 	pr_debug("Boot done\n");
 
+	if (x86_has_numa_in_package)
+		set_sched_topology(x86_numa_in_package_topology);
+
 	nmi_selftest();
 	impress_friends();
 	setup_ioapic_dest();
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2a906f2..e86c4a5 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6487,6 +6487,9 @@ static struct sched_domain_topology_level *sched_domain_topology =
 
 void set_sched_topology(struct sched_domain_topology_level *tl)
 {
+	if (WARN_ON_ONCE(sched_smp_initialized))
+		return;
+
 	sched_domain_topology = tl;
 }
 
-- 
2.7.4


* [PATCH v3 2/8] sched: Extend scheduler's asym packing
  2016-09-08 22:26 [PATCH v3 0/8] Support Intel® Turbo Boost Max Technology 3.0 Srinivas Pandruvada
  2016-09-08 22:26 ` [PATCH v3 1/8] sched, x86: Add SD_ASYM_PACKING flags to x86 cpu topology for ITMT Srinivas Pandruvada
@ 2016-09-08 22:26 ` Srinivas Pandruvada
  2016-09-08 22:26 ` [PATCH v3 3/8] x86, cpu: provide a function topology_num_packages to enumerate #packages Srinivas Pandruvada
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 23+ messages in thread
From: Srinivas Pandruvada @ 2016-09-08 22:26 UTC (permalink / raw)
  To: rjw, tglx, mingo, bp
  Cc: x86, linux-pm, linux-kernel, linux-acpi, peterz, tim.c.chen,
	Srinivas Pandruvada

From: Tim Chen <tim.c.chen@linux.intel.com>

We generalize the scheduler's asym packing to provide an ordering
of the cpus beyond just the cpu number.  This allows the use of the
ASYM_PACKING scheduler machinery to move load to the preferred CPUs
in a sched domain. The preference is defined by the cpu priority
given by arch_asym_cpu_priority(cpu).

We also record the most preferred cpu in a sched group when we build
the group's capacity, for fast lookup of the preferred cpu during
load balancing.

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
---
 include/linux/sched.h |  2 ++
 kernel/sched/core.c   | 18 ++++++++++++++++++
 kernel/sched/fair.c   | 35 ++++++++++++++++++++++++-----------
 kernel/sched/sched.h  |  6 ++++++
 4 files changed, 50 insertions(+), 11 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index b0fa726..15d3f29 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1052,6 +1052,8 @@ static inline int cpu_numa_flags(void)
 }
 #endif
 
+int arch_asym_cpu_priority(int cpu);
+
 struct sched_domain_attr {
 	int relax_domain_level;
 };
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index e86c4a5..08135ca 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6237,7 +6237,25 @@ static void init_sched_groups_capacity(int cpu, struct sched_domain *sd)
 	WARN_ON(!sg);
 
 	do {
+		int cpu, max_cpu = -1, prev_cpu = -1;
+
 		sg->group_weight = cpumask_weight(sched_group_cpus(sg));
+
+		if (!(sd->flags & SD_ASYM_PACKING))
+			goto next;
+
+		for_each_cpu(cpu, sched_group_cpus(sg)) {
+			if (prev_cpu < 0) {
+				prev_cpu = cpu;
+				max_cpu = cpu;
+			} else {
+				if (sched_asym_prefer(cpu, max_cpu))
+					max_cpu = cpu;
+			}
+		}
+		sg->asym_prefer_cpu = max_cpu;
+
+next:
 		sg = sg->next;
 	} while (sg != sd->groups);
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 5d558cc..a37a390 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -100,6 +100,16 @@ const_debug unsigned int sysctl_sched_migration_cost = 500000UL;
  */
 unsigned int __read_mostly sysctl_sched_shares_window = 10000000UL;
 
+#ifdef CONFIG_SMP
+/*
+ * For asym packing, by default the lower numbered cpu has higher priority.
+ */
+int __weak arch_asym_cpu_priority(int cpu)
+{
+	return -cpu;
+}
+#endif
+
 #ifdef CONFIG_CFS_BANDWIDTH
 /*
  * Amount of runtime to allocate from global (tg) to local (per-cfs_rq) pool
@@ -6853,16 +6863,18 @@ static bool update_sd_pick_busiest(struct lb_env *env,
 	if (env->idle == CPU_NOT_IDLE)
 		return true;
 	/*
-	 * ASYM_PACKING needs to move all the work to the lowest
-	 * numbered CPUs in the group, therefore mark all groups
-	 * higher than ourself as busy.
+	 * ASYM_PACKING needs to move all the work to the highest
+	 * priority CPUs in the group, therefore mark all groups
+	 * of lower priority than ourself as busy.
 	 */
-	if (sgs->sum_nr_running && env->dst_cpu < group_first_cpu(sg)) {
+	if (sgs->sum_nr_running &&
+	    sched_asym_prefer(env->dst_cpu, sg->asym_prefer_cpu)) {
 		if (!sds->busiest)
 			return true;
 
-		/* Prefer to move from highest possible cpu's work */
-		if (group_first_cpu(sds->busiest) < group_first_cpu(sg))
+		/* Prefer to move from lowest priority cpu's work */
+		if (sched_asym_prefer(sds->busiest->asym_prefer_cpu,
+				      sg->asym_prefer_cpu))
 			return true;
 	}
 
@@ -7014,8 +7026,8 @@ static int check_asym_packing(struct lb_env *env, struct sd_lb_stats *sds)
 	if (!sds->busiest)
 		return 0;
 
-	busiest_cpu = group_first_cpu(sds->busiest);
-	if (env->dst_cpu > busiest_cpu)
+	busiest_cpu = sds->busiest->asym_prefer_cpu;
+	if (sched_asym_prefer(busiest_cpu, env->dst_cpu))
 		return 0;
 
 	env->imbalance = DIV_ROUND_CLOSEST(
@@ -7356,10 +7368,11 @@ static int need_active_balance(struct lb_env *env)
 
 		/*
 		 * ASYM_PACKING needs to force migrate tasks from busy but
-		 * higher numbered CPUs in order to pack all tasks in the
-		 * lowest numbered CPUs.
+		 * lower priority CPUs in order to pack all tasks in the
+		 * highest priority CPUs.
 		 */
-		if ((sd->flags & SD_ASYM_PACKING) && env->src_cpu > env->dst_cpu)
+		if ((sd->flags & SD_ASYM_PACKING) &&
+		    sched_asym_prefer(env->dst_cpu, env->src_cpu))
 			return 1;
 	}
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index b7fc1ce..3f3d04a 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -532,6 +532,11 @@ struct dl_rq {
 
 #ifdef CONFIG_SMP
 
+static inline bool sched_asym_prefer(int a, int b)
+{
+	return arch_asym_cpu_priority(a) > arch_asym_cpu_priority(b);
+}
+
 /*
  * We add the notion of a root-domain which will be used to define per-domain
  * variables. Each exclusive cpuset essentially defines an island domain by
@@ -884,6 +889,7 @@ struct sched_group {
 
 	unsigned int group_weight;
 	struct sched_group_capacity *sgc;
+	int asym_prefer_cpu;		/* cpu of highest priority in group */
 
 	/*
 	 * The CPUs this group covers.
-- 
2.7.4


* [PATCH v3 3/8] x86, cpu: provide a function topology_num_packages to enumerate #packages
  2016-09-08 22:26 [PATCH v3 0/8] Support Intel® Turbo Boost Max Technology 3.0 Srinivas Pandruvada
  2016-09-08 22:26 ` [PATCH v3 1/8] sched, x86: Add SD_ASYM_PACKING flags to x86 cpu topology for ITMT Srinivas Pandruvada
  2016-09-08 22:26 ` [PATCH v3 2/8] sched: Extend scheduler's asym packing Srinivas Pandruvada
@ 2016-09-08 22:26 ` Srinivas Pandruvada
  2016-09-10 13:28   ` Thomas Gleixner
  2016-09-08 22:26 ` [PATCH v3 4/8] sched, x86: use arch_update_cpu_topology to indicate x86 need sched domain rebuild Srinivas Pandruvada
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 23+ messages in thread
From: Srinivas Pandruvada @ 2016-09-08 22:26 UTC (permalink / raw)
  To: rjw, tglx, mingo, bp
  Cc: x86, linux-pm, linux-kernel, linux-acpi, peterz, tim.c.chen,
	Srinivas Pandruvada

From: Tim Chen <tim.c.chen@linux.intel.com>

We compute the number of active packages during boot and
topology updates.  Provide a function to export this info to
code that needs this topology information.

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
---
 arch/x86/include/asm/topology.h | 1 +
 arch/x86/kernel/smpboot.c       | 5 +++++
 2 files changed, 6 insertions(+)

diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index 98b669d..0bcf3b7 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -129,6 +129,7 @@ static inline int topology_max_smt_threads(void)
 }
 
 int topology_update_package_map(unsigned int apicid, unsigned int cpu);
+extern int topology_num_packages(void);
 extern int topology_phys_to_logical_pkg(unsigned int pkg);
 #else
 #define topology_max_packages()			(1)
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 3782bd4..292df31 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -297,6 +297,11 @@ found:
 	return 0;
 }
 
+int topology_num_packages(void)
+{
+	return logical_packages;
+}
+
 /**
  * topology_phys_to_logical_pkg - Map a physical package id to a logical
  *
-- 
2.7.4


* [PATCH v3 4/8] sched, x86: use arch_update_cpu_topology to indicate x86 need sched domain rebuild
  2016-09-08 22:26 [PATCH v3 0/8] Support Intel® Turbo Boost Max Technology 3.0 Srinivas Pandruvada
                   ` (2 preceding siblings ...)
  2016-09-08 22:26 ` [PATCH v3 3/8] x86, cpu: provide a function topology_num_packages to enumerate #packages Srinivas Pandruvada
@ 2016-09-08 22:26 ` Srinivas Pandruvada
  2016-09-10 16:20   ` Thomas Gleixner
  2016-09-08 22:26 ` [PATCH v3 5/8] sched,x86: Enable Turbo Boost Max Technology Srinivas Pandruvada
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 23+ messages in thread
From: Srinivas Pandruvada @ 2016-09-08 22:26 UTC (permalink / raw)
  To: rjw, tglx, mingo, bp
  Cc: x86, linux-pm, linux-kernel, linux-acpi, peterz, tim.c.chen,
	Srinivas Pandruvada

From: Tim Chen <tim.c.chen@linux.intel.com>

Provide x86 with an arch_update_cpu_topology() function. This function
allows us to indicate that a condition has been detected which requires
a complete rebuild of the x86 sched domains.  This is done by setting
the x86_topology_update flag.

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
---
 arch/x86/include/asm/topology.h |  2 ++
 arch/x86/kernel/smpboot.c       | 11 +++++++++++
 2 files changed, 13 insertions(+)

diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index 0bcf3b7..8d6df77 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -147,6 +147,8 @@ struct pci_bus;
 int x86_pci_root_bus_node(int bus);
 void x86_pci_root_bus_resources(int bus, struct list_head *resources);
 
+extern bool x86_topology_update;
+
 #ifdef CONFIG_SCHED_ITMT
 extern unsigned int __read_mostly sysctl_sched_itmt_enabled;
 #endif /* CONFIG_SCHED_ITMT */
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 292df31..737b9edf 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -110,6 +110,17 @@ static bool logical_packages_frozen __read_mostly;
 int __max_smt_threads __read_mostly;
 
 unsigned int __read_mostly sysctl_sched_itmt_enabled;
+/* Flag to indicate if a complete sched domain rebuild is required */
+bool x86_topology_update;
+
+int arch_update_cpu_topology(void)
+{
+	if (x86_topology_update) {
+		x86_topology_update = false;
+		return 1;
+	} else
+		return 0;
+}
 
 static inline void smpboot_setup_warm_reset_vector(unsigned long start_eip)
 {
-- 
2.7.4


* [PATCH v3 5/8] sched,x86: Enable Turbo Boost Max Technology
  2016-09-08 22:26 [PATCH v3 0/8] Support Intel® Turbo Boost Max Technology 3.0 Srinivas Pandruvada
                   ` (3 preceding siblings ...)
  2016-09-08 22:26 ` [PATCH v3 4/8] sched, x86: use arch_update_cpu_topology to indicate x86 need sched domain rebuild Srinivas Pandruvada
@ 2016-09-08 22:26 ` Srinivas Pandruvada
  2016-09-10 16:21   ` Thomas Gleixner
  2016-09-08 22:26 ` [PATCH v3 6/8] acpi: bus: Enable HWP CPPC objects Srinivas Pandruvada
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 23+ messages in thread
From: Srinivas Pandruvada @ 2016-09-08 22:26 UTC (permalink / raw)
  To: rjw, tglx, mingo, bp
  Cc: x86, linux-pm, linux-kernel, linux-acpi, peterz, tim.c.chen,
	Srinivas Pandruvada

From: Tim Chen <tim.c.chen@linux.intel.com>

Some Intel cores can be boosted to a higher turbo frequency than the
other cores on the same die.  So we prefer to run processes on them
rather than on the lower frequency cores for extra performance.

We extend the asym packing feature in the scheduler to support packing
tasks to the higher frequency cores at the core sched domain level.

We set up a core priority metric to abstract the core preferences based
on the maximum boost frequency.  The priority is instantiated such that
a core with a higher priority is favored over a core with a lower
priority when making scheduling decisions using ASYM_PACKING.  The
higher numbered smt threads are discounted in their priority so we will
not try to pack tasks onto all the threads of a favored core before
using other cpu cores.  The cpu with the highest priority in a
sched_group is recorded in sched_group->asym_prefer_cpu during
initialization to save lookups during load balancing.

A sysctl variable /proc/sys/kernel/sched_itmt_enabled is provided so
that scheduling based on favored cores can be turned on or off at run
time.

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
---
 arch/x86/Kconfig                |   9 +++
 arch/x86/include/asm/topology.h |  18 +++++
 arch/x86/kernel/Makefile        |   1 +
 arch/x86/kernel/itmt.c          | 164 ++++++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/smpboot.c       |   1 -
 5 files changed, 192 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/kernel/itmt.c

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 2a1f0ce..6dfb97d 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -927,6 +927,15 @@ config SCHED_MC
 	  making when dealing with multi-core CPU chips at a cost of slightly
 	  increased overhead in some places. If unsure say N here.
 
+config SCHED_ITMT
+	bool "Intel Turbo Boost Max Technology (ITMT) scheduler support"
+	depends on SCHED_MC && CPU_SUP_INTEL && X86_INTEL_PSTATE
+	---help---
+	  ITMT enabled scheduler support improves the CPU scheduler's decision
+	  to move tasks to cpu cores that can be boosted to a higher frequency
+	  than others. It will have better performance at a cost of slightly
+	  increased overhead in task migrations. If unsure say N here.
+
 source "kernel/Kconfig.preempt"
 
 config UP_LATE_INIT
diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index 8d6df77..ac86a0b 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -150,7 +150,25 @@ void x86_pci_root_bus_resources(int bus, struct list_head *resources);
 extern bool x86_topology_update;
 
 #ifdef CONFIG_SCHED_ITMT
+#include <asm/percpu.h>
+
+DECLARE_PER_CPU_READ_MOSTLY(int, sched_core_priority);
 extern unsigned int __read_mostly sysctl_sched_itmt_enabled;
+
+/* Interface to set priority of a cpu */
+void sched_set_itmt_core_prio(int prio, int core_cpu);
+
+/* Interface to notify scheduler that system supports ITMT */
+void set_sched_itmt(bool support_itmt);
+
+#else /* CONFIG_SCHED_ITMT */
+
+static inline void set_sched_itmt(bool support_itmt)
+{
+}
+static inline void sched_set_itmt_core_prio(int prio, int core_cpu)
+{
+}
 #endif /* CONFIG_SCHED_ITMT */
 
 #endif /* _ASM_X86_TOPOLOGY_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 0503f5b..2008335 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -124,6 +124,7 @@ obj-$(CONFIG_EFI)			+= sysfb_efi.o
 
 obj-$(CONFIG_PERF_EVENTS)		+= perf_regs.o
 obj-$(CONFIG_TRACING)			+= tracepoint.o
+obj-$(CONFIG_SCHED_ITMT)		+= itmt.o
 
 ###
 # 64 bit specific files
diff --git a/arch/x86/kernel/itmt.c b/arch/x86/kernel/itmt.c
new file mode 100644
index 0000000..93f9316
--- /dev/null
+++ b/arch/x86/kernel/itmt.c
@@ -0,0 +1,164 @@
+/*
+ * itmt.c: Functions and data structures for enabling
+ *	   scheduler to favor scheduling on cores that
+ *	   can be boosted to a higher frequency using
+ *	   Intel Turbo Boost Max Technology 3.0
+ *
+ * (C) Copyright 2016 Intel Corporation
+ * Author: Tim Chen <tim.c.chen@linux.intel.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; version 2
+ * of the License.
+ */
+
+#include <linux/sched.h>
+#include <linux/cpumask.h>
+#include <linux/cpuset.h>
+#include <asm/mutex.h>
+#include <linux/sched.h>
+#include <linux/sysctl.h>
+#include <linux/nodemask.h>
+
+DEFINE_PER_CPU_READ_MOSTLY(int, sched_core_priority);
+static DEFINE_MUTEX(itmt_update_mutex);
+
+static unsigned int zero;
+static unsigned int one = 1;
+
+/*
+ * Boolean to control whether we want to move processes to cpu capable
+ * of higher turbo frequency for cpus supporting Intel Turbo Boost Max
+ * Technology 3.0.
+ *
+ * It can be set via /proc/sys/kernel/sched_itmt_enabled
+ */
+unsigned int __read_mostly sysctl_sched_itmt_enabled;
+
+/*
+ * The pstate_driver calls set_sched_itmt to indicate if the system
+ * is ITMT capable.
+ */
+static bool __read_mostly sched_itmt_capable;
+
+int arch_asym_cpu_priority(int cpu)
+{
+	return per_cpu(sched_core_priority, cpu);
+}
+
+/* Called with itmt_update_mutex lock held */
+static void enable_sched_itmt(bool enable_itmt)
+{
+	sysctl_sched_itmt_enabled = enable_itmt;
+	x86_topology_update = true;
+	rebuild_sched_domains();
+}
+
+static int sched_itmt_update_handler(struct ctl_table *table, int write,
+			      void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+	int ret;
+
+	mutex_lock(&itmt_update_mutex);
+
+	ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
+
+	if (ret || !write) {
+		mutex_unlock(&itmt_update_mutex);
+		return ret;
+	}
+
+	enable_sched_itmt(sysctl_sched_itmt_enabled);
+
+	mutex_unlock(&itmt_update_mutex);
+
+	return ret;
+}
+
+static struct ctl_table itmt_kern_table[] = {
+	{
+		.procname	= "sched_itmt_enabled",
+		.data		= &sysctl_sched_itmt_enabled,
+		.maxlen		= sizeof(unsigned int),
+		.mode		= 0644,
+		.proc_handler	= sched_itmt_update_handler,
+		.extra1		= &zero,
+		.extra2		= &one,
+	},
+	{}
+};
+
+static struct ctl_table itmt_root_table[] = {
+	{
+		.procname	= "kernel",
+		.mode		= 0555,
+		.child		= itmt_kern_table,
+	},
+	{}
+};
+
+static struct ctl_table_header *itmt_sysctl_header;
+
+/*
+ * The boot code will find out the max boost frequency
+ * and call this function to set a priority proportional
+ * to the max boost frequency. CPU with higher boost
+ * frequency will receive higher priority.
+ */
+void sched_set_itmt_core_prio(int prio, int core_cpu)
+{
+	int cpu, i = 1;
+
+	for_each_cpu(cpu, topology_sibling_cpumask(core_cpu)) {
+		int smt_prio;
+
+		/*
+		 * Discount the priority of sibling so that we don't
+		 * pack all loads to the same core before using other cores.
+		 */
+		smt_prio = prio * smp_num_siblings / i;
+		i++;
+		per_cpu(sched_core_priority, cpu) = smt_prio;
+	}
+}
+
+/*
+ * During boot up, boot code will detect if the system
+ * is ITMT capable and call set_sched_itmt.
+ *
+ * This should be called after sched_set_itmt_core_prio
+ * has been called to set the cpus' priorities.
+ *
+ * This function should be called without cpu hot plug lock
+ * as we need to acquire the lock to rebuild sched domains
+ * later.
+ */
+void set_sched_itmt(bool itmt_capable)
+{
+	mutex_lock(&itmt_update_mutex);
+
+	if (itmt_capable != sched_itmt_capable) {
+
+		if (itmt_capable) {
+			itmt_sysctl_header =
+				register_sysctl_table(itmt_root_table);
+			/*
+			 * ITMT capability automatically enables ITMT
+			 * scheduling for client systems (single node).
+			 */
+			if (topology_num_packages() == 1)
+				sysctl_sched_itmt_enabled = 1;
+		} else {
+			if (itmt_sysctl_header)
+				unregister_sysctl_table(itmt_sysctl_header);
+			sysctl_sched_itmt_enabled = 0;
+		}
+
+		sched_itmt_capable = itmt_capable;
+		x86_topology_update = true;
+		rebuild_sched_domains();
+	}
+
+	mutex_unlock(&itmt_update_mutex);
+}
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 737b9edf..17f3ac7 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -109,7 +109,6 @@ static bool logical_packages_frozen __read_mostly;
 /* Maximum number of SMT threads on any online core */
 int __max_smt_threads __read_mostly;
 
-unsigned int __read_mostly sysctl_sched_itmt_enabled;
 /* Flag to indicate if a complete sched domain rebuild is required */
 bool x86_topology_update;
 
-- 
2.7.4


* [PATCH v3 6/8] acpi: bus: Enable HWP CPPC objects
  2016-09-08 22:26 [PATCH v3 0/8] Support Intel® Turbo Boost Max Technology 3.0 Srinivas Pandruvada
                   ` (4 preceding siblings ...)
  2016-09-08 22:26 ` [PATCH v3 5/8] sched,x86: Enable Turbo Boost Max Technology Srinivas Pandruvada
@ 2016-09-08 22:26 ` Srinivas Pandruvada
  2016-09-08 22:26 ` [PATCH v3 7/8] acpi: bus: Set _OSC for diverse core support Srinivas Pandruvada
  2016-09-08 22:26 ` [PATCH v3 8/8] cpufreq: intel_pstate: Use CPPC to get max performance Srinivas Pandruvada
  7 siblings, 0 replies; 23+ messages in thread
From: Srinivas Pandruvada @ 2016-09-08 22:26 UTC (permalink / raw)
  To: rjw, tglx, mingo, bp
  Cc: x86, linux-pm, linux-kernel, linux-acpi, peterz, tim.c.chen,
	Srinivas Pandruvada

We need to set the platform-wide _OSC bits to enable CPPC and CPPC
version 2.  If the platform supports CPPC, the BIOS exposes the CPPC
tables.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
---
 drivers/acpi/bus.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
index 85b7d07..61643a5 100644
--- a/drivers/acpi/bus.c
+++ b/drivers/acpi/bus.c
@@ -330,6 +330,13 @@ static void acpi_bus_osc_support(void)
 	capbuf[OSC_SUPPORT_DWORD] |= OSC_SB_HOTPLUG_OST_SUPPORT;
 	capbuf[OSC_SUPPORT_DWORD] |= OSC_SB_PCLPI_SUPPORT;
 
+#ifdef CONFIG_X86
+	if (boot_cpu_has(X86_FEATURE_HWP)) {
+		capbuf[OSC_SUPPORT_DWORD] |= OSC_SB_CPC_SUPPORT;
+		capbuf[OSC_SUPPORT_DWORD] |= OSC_SB_CPCV2_SUPPORT;
+	}
+#endif
+
 	if (!ghes_disable)
 		capbuf[OSC_SUPPORT_DWORD] |= OSC_SB_APEI_SUPPORT;
 	if (ACPI_FAILURE(acpi_get_handle(NULL, "\\_SB", &handle)))
-- 
2.7.4


* [PATCH v3 7/8] acpi: bus: Set _OSC for diverse core support
  2016-09-08 22:26 [PATCH v3 0/8] Support Intel® Turbo Boost Max Technology 3.0 Srinivas Pandruvada
                   ` (5 preceding siblings ...)
  2016-09-08 22:26 ` [PATCH v3 6/8] acpi: bus: Enable HWP CPPC objects Srinivas Pandruvada
@ 2016-09-08 22:26 ` Srinivas Pandruvada
  2016-09-08 22:26 ` [PATCH v3 8/8] cpufreq: intel_pstate: Use CPPC to get max performance Srinivas Pandruvada
  7 siblings, 0 replies; 23+ messages in thread
From: Srinivas Pandruvada @ 2016-09-08 22:26 UTC (permalink / raw)
  To: rjw, tglx, mingo, bp
  Cc: x86, linux-pm, linux-kernel, linux-acpi, peterz, tim.c.chen,
	Srinivas Pandruvada

Set the OSC_SB_CPC_DIVERSE_HIGH_SUPPORT (bit 12) to enable diverse
core support.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
---
 drivers/acpi/bus.c   | 3 +++
 include/linux/acpi.h | 1 +
 2 files changed, 4 insertions(+)

diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
index 61643a5..8ab6ec2 100644
--- a/drivers/acpi/bus.c
+++ b/drivers/acpi/bus.c
@@ -337,6 +337,9 @@ static void acpi_bus_osc_support(void)
 	}
 #endif
 
+	if (IS_ENABLED(CONFIG_SCHED_ITMT))
+		capbuf[OSC_SUPPORT_DWORD] |= OSC_SB_CPC_DIVERSE_HIGH_SUPPORT;
+
 	if (!ghes_disable)
 		capbuf[OSC_SUPPORT_DWORD] |= OSC_SB_APEI_SUPPORT;
 	if (ACPI_FAILURE(acpi_get_handle(NULL, "\\_SB", &handle)))
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index e746552..53841a2 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -462,6 +462,7 @@ acpi_status acpi_run_osc(acpi_handle handle, struct acpi_osc_context *context);
 #define OSC_SB_CPCV2_SUPPORT			0x00000040
 #define OSC_SB_PCLPI_SUPPORT			0x00000080
 #define OSC_SB_OSLPI_SUPPORT			0x00000100
+#define OSC_SB_CPC_DIVERSE_HIGH_SUPPORT		0x00001000
 
 extern bool osc_sb_apei_support_acked;
 extern bool osc_pc_lpi_support_confirmed;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v3 8/8] cpufreq: intel_pstate: Use CPPC to get max performance
  2016-09-08 22:26 [PATCH v3 0/8] Support Intel® Turbo Boost Max Technology 3.0 Srinivas Pandruvada
                   ` (6 preceding siblings ...)
  2016-09-08 22:26 ` [PATCH v3 7/8] acpi: bus: Set _OSC for diverse core support Srinivas Pandruvada
@ 2016-09-08 22:26 ` Srinivas Pandruvada
  2016-09-10 16:22   ` Thomas Gleixner
  7 siblings, 1 reply; 23+ messages in thread
From: Srinivas Pandruvada @ 2016-09-08 22:26 UTC (permalink / raw)
  To: rjw, tglx, mingo, bp
  Cc: x86, linux-pm, linux-kernel, linux-acpi, peterz, tim.c.chen,
	Srinivas Pandruvada

Use the ACPI cppc_lib interface to get the CPPC performance limits.
Once the CPPC limits of all online CPUs have been read, check whether
there is a difference in max performance. If there is a difference,
call the scheduler interface to update the per-CPU priorities. After
the priorities of all current CPUs have been updated, enable the ITMT
feature.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
---
 drivers/cpufreq/Kconfig.x86    |  1 +
 drivers/cpufreq/intel_pstate.c | 75 ++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 73 insertions(+), 3 deletions(-)

diff --git a/drivers/cpufreq/Kconfig.x86 b/drivers/cpufreq/Kconfig.x86
index adbd1de..3328c6b 100644
--- a/drivers/cpufreq/Kconfig.x86
+++ b/drivers/cpufreq/Kconfig.x86
@@ -6,6 +6,7 @@ config X86_INTEL_PSTATE
        bool "Intel P state control"
        depends on X86
        select ACPI_PROCESSOR if ACPI
+       select ACPI_CPPC_LIB if X86_64 && ACPI
        help
           This driver provides a P state for Intel core processors.
 	  The driver implements an internal governor and will become
diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index bdbe936..a0bf244 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -44,6 +44,7 @@
 
 #ifdef CONFIG_ACPI
 #include <acpi/processor.h>
+#include <acpi/cppc_acpi.h>
 #endif
 
 #define FRAC_BITS 8
@@ -193,6 +194,8 @@ struct _pid {
  * @sample:		Storage for storing last Sample data
  * @acpi_perf_data:	Stores ACPI perf information read from _PSS
  * @valid_pss_table:	Set to true for valid ACPI _PSS entries found
+ * @cppc_data:		Stores CPPC information for HWP capable CPUs
 + * @valid_cppc_table:	Set to true when valid CPPC entries are found
  *
  * This structure stores per CPU instance data for all CPUs.
  */
@@ -215,6 +218,8 @@ struct cpudata {
 #ifdef CONFIG_ACPI
 	struct acpi_processor_performance acpi_perf_data;
 	bool valid_pss_table;
+	struct cppc_cpudata *cppc_data;
+	bool valid_cppc_table;
 #endif
 };
 
@@ -361,6 +366,15 @@ static struct perf_limits *limits = &powersave_limits;
 #endif
 
 #ifdef CONFIG_ACPI
+static cpumask_t cppc_rd_cpu_mask;
+
+/* Call set_sched_itmt from a work function to be able to use hotplug locks */
+static void intel_pstate_sched_itmt_work_fn(struct work_struct *work)
+{
+	set_sched_itmt(true);
+}
+
+static DECLARE_WORK(sched_itmt_work, intel_pstate_sched_itmt_work_fn);
 
 static bool intel_pstate_get_ppc_enable_status(void)
 {
@@ -377,14 +391,63 @@ static void intel_pstate_init_acpi_perf_limits(struct cpufreq_policy *policy)
 	int ret;
 	int i;
 
-	if (hwp_active)
+	cpu = all_cpu_data[policy->cpu];
+
+	if (hwp_active) {
+		struct cppc_perf_caps *perf_caps;
+
+		cpu->cppc_data = kzalloc(sizeof(struct cppc_cpudata),
+					 GFP_KERNEL);
+		if (!cpu->cppc_data)
+			return;
+
+		perf_caps = &cpu->cppc_data->perf_caps;
+		ret = cppc_get_perf_caps(policy->cpu, perf_caps);
+		if (ret) {
+			kfree(cpu->cppc_data);
+			return;
+		}
+
+		cpu->valid_cppc_table = true;
+		pr_debug("cpu:%d H:0x%x N:0x%x L:0x%x\n", policy->cpu,
+			 perf_caps->highest_perf, perf_caps->nominal_perf,
+			 perf_caps->lowest_perf);
+
+		cpumask_set_cpu(policy->cpu, &cppc_rd_cpu_mask);
+		if (cpumask_subset(topology_core_cpumask(policy->cpu),
+				   &cppc_rd_cpu_mask)) {
+			int cpu_index;
+			int max_prio;
+			bool itmt_support = false;
+
+			cpu = all_cpu_data[0];
+			max_prio = cpu->cppc_data->perf_caps.highest_perf;
+			for_each_cpu(cpu_index, &cppc_rd_cpu_mask) {
+				cpu = all_cpu_data[cpu_index];
+				perf_caps = &cpu->cppc_data->perf_caps;
+				if (max_prio != perf_caps->highest_perf) {
+					itmt_support = true;
+					break;
+				}
+			}
+
+			if (!itmt_support)
+				return;
+
+			for_each_cpu(cpu_index, &cppc_rd_cpu_mask) {
+				cpu = all_cpu_data[cpu_index];
+				perf_caps = &cpu->cppc_data->perf_caps;
+				sched_set_itmt_core_prio(
+					perf_caps->highest_perf, cpu_index);
+			}
+			schedule_work(&sched_itmt_work);
+		}
 		return;
+	}
 
 	if (!intel_pstate_get_ppc_enable_status())
 		return;
 
-	cpu = all_cpu_data[policy->cpu];
-
 	ret = acpi_processor_register_performance(&cpu->acpi_perf_data,
 						  policy->cpu);
 	if (ret)
@@ -444,6 +507,12 @@ static void intel_pstate_exit_perf_limits(struct cpufreq_policy *policy)
 	struct cpudata *cpu;
 
 	cpu = all_cpu_data[policy->cpu];
+
+	if (cpu->valid_cppc_table) {
+		cpumask_clear_cpu(policy->cpu, &cppc_rd_cpu_mask);
+		kfree(cpu->cppc_data);
+	}
+
 	if (!cpu->valid_pss_table)
 		return;
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 1/8] sched, x86: Add SD_ASYM_PACKING flags to x86 cpu topology for ITMT
  2016-09-08 22:26 ` [PATCH v3 1/8] sched, x86: Add SD_ASYM_PACKING flags to x86 cpu topology for ITMT Srinivas Pandruvada
@ 2016-09-10 13:10   ` Thomas Gleixner
  2016-09-12 20:50     ` Tim Chen
  2016-09-10 13:13   ` Thomas Gleixner
  1 sibling, 1 reply; 23+ messages in thread
From: Thomas Gleixner @ 2016-09-10 13:10 UTC (permalink / raw)
  To: Srinivas Pandruvada
  Cc: rjw, mingo, bp, x86, linux-pm, linux-kernel, linux-acpi, peterz,
	tim.c.chen

On Thu, 8 Sep 2016, Srinivas Pandruvada wrote:
> From: Tim Chen <tim.c.chen@linux.intel.com>
> 
> We uses ASYM_PACKING feature in the scheduler to move tasks to more
> capable cpus that can be boosted to higher frequency. This is enabled by
> Intel Turbo Boost Max Technology 3.0 (ITMT).  We mark the sched domain
> topology level with SD_ASYM_PACKING flag for such systems to indicate
> scheduler can use the ASYM_PACKING feature to move load to the
> more capable cpus.

Sigh. This changelog does not tell anything about the nature of the patch,
the rationale for it etc. It's just a meaningless blurb.


> +unsigned int __read_mostly sysctl_sched_itmt_enabled;
> +
>  static inline void smpboot_setup_warm_reset_vector(unsigned long start_eip)
>  {
>  	unsigned long flags;
> @@ -471,31 +473,57 @@ static bool match_die(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
>  	return false;
>  }
>  
> -static struct sched_domain_topology_level numa_inside_package_topology[] = {
> +#ifdef CONFIG_SCHED_ITMT
> +static int x86_core_flags(void)
> +{
> +	int flags = cpu_core_flags();
> +
> +	if (sysctl_sched_itmt_enabled)
> +		flags |= SD_ASYM_PACKING;
> +
> +	return flags;
> +}
> +
> +static int x86_smt_flags(void)
> +{
> +	int flags = cpu_smt_flags();
> +
> +	if (sysctl_sched_itmt_enabled)
> +		flags |= SD_ASYM_PACKING;
> +
> +	return flags;
> +}
> +#else
> +#define x86_core_flags cpu_core_flags
> +#define x86_smt_flags cpu_smt_flags
> +#endif

No. We first rework the code so that the ITMT stuff can be added in a later
patch easily.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 1/8] sched, x86: Add SD_ASYM_PACKING flags to x86 cpu topology for ITMT
  2016-09-08 22:26 ` [PATCH v3 1/8] sched, x86: Add SD_ASYM_PACKING flags to x86 cpu topology for ITMT Srinivas Pandruvada
  2016-09-10 13:10   ` Thomas Gleixner
@ 2016-09-10 13:13   ` Thomas Gleixner
  1 sibling, 0 replies; 23+ messages in thread
From: Thomas Gleixner @ 2016-09-10 13:13 UTC (permalink / raw)
  To: Srinivas Pandruvada
  Cc: rjw, mingo, bp, x86, linux-pm, linux-kernel, linux-acpi, peterz,
	tim.c.chen

On Thu, 8 Sep 2016, Srinivas Pandruvada wrote:
>  
>  void set_sched_topology(struct sched_domain_topology_level *tl)
>  {
> +	if (WARN_ON_ONCE(sched_smp_initialized))
> +		return;
> +
>  	sched_domain_topology = tl;
>  }

This change has nothing to do with $subject. Separate patch please.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 3/8] x86, cpu: provide a function topology_num_packages to enumerate #packages
  2016-09-08 22:26 ` [PATCH v3 3/8] x86, cpu: provide a function topology_num_packages to enumerate #packages Srinivas Pandruvada
@ 2016-09-10 13:28   ` Thomas Gleixner
  2016-09-12 20:51     ` Tim Chen
  0 siblings, 1 reply; 23+ messages in thread
From: Thomas Gleixner @ 2016-09-10 13:28 UTC (permalink / raw)
  To: Srinivas Pandruvada
  Cc: rjw, mingo, bp, x86, linux-pm, linux-kernel, linux-acpi, peterz,
	tim.c.chen

On Thu, 8 Sep 2016, Srinivas Pandruvada wrote:

> From: Tim Chen <tim.c.chen@linux.intel.com>

$subject: x86, cpu: provide a function topology_num_packages  to enumerate #packages

- we switched to the prefix scheme 'x86/subsys'. Please use this.

- this is not related to x86/cpu. x86/topology is the proper prefix.

- Sentence after ':' starts with an uppercase letter.

- please make the subject line short and descriptive. 

  x86/topology: Provide topology_num_packages()

  is completely sufficient, because it's entirely clear that it is a
  function and the function name is self explaining.

> We compute the the number of active packages during boot and
> topology update.

We? We do not do anything..... and how is that information useful for the
reader?

> Provide a function to export this info for functions that need this
> topology info.

Well, it's obvious that a new function is going to be used by something
which needs it.

In changelogs/comments there is only one thing worse than superfluous
information: wrong information.

If you have nothing to say, then omit it instead of forcing the reader to
parse incoherent blurbs for nothing.

>  int topology_update_package_map(unsigned int apicid, unsigned int cpu);
> +extern int topology_num_packages(void);
>  extern int topology_phys_to_logical_pkg(unsigned int pkg);
>  #else
>  #define topology_max_packages()			(1)

stub function for the !SMP case is missing....

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 4/8] sched, x86: use arch_update_cpu_topology to indicate x86 need sched domain rebuild
  2016-09-08 22:26 ` [PATCH v3 4/8] sched, x86: use arch_update_cpu_topology to indicate x86 need sched domain rebuild Srinivas Pandruvada
@ 2016-09-10 16:20   ` Thomas Gleixner
  2016-09-12 21:00     ` Tim Chen
  0 siblings, 1 reply; 23+ messages in thread
From: Thomas Gleixner @ 2016-09-10 16:20 UTC (permalink / raw)
  To: Srinivas Pandruvada
  Cc: rjw, mingo, bp, x86, linux-pm, linux-kernel, linux-acpi, peterz,
	tim.c.chen

On Thu, 8 Sep 2016, Srinivas Pandruvada wrote:

How is this related to sched? And please stop writing lengthy sentences in
the subject line.

> From: Tim Chen <tim.c.chen@linux.intel.com>
> 
> Provides x86 with arch_update_cpu_topology function. This function
> allows us to indicate that a condition is detected that the sched
> domain of x86 needs a complete rebuild.

scheduler domains are not x86 specific .... 

> This is done by setting the x86_topology_update flag.

So without reading the patch I expect that the function sets the
x86_topology_update flag. Crap.

What you really want to say is:

  The scheduler calls arch_update_cpu_topology() to check whether the
  scheduler domains have to be rebuilt.

  So far x86 has no requirement for this, but the upcoming ITMT support
  makes this necessary.

  Request the rebuild when the x86 internal update flag is set.

Or something along these lines. Changelog is an important part of a patch,
really.

> +/* Flag to indicate if a complete sched domain rebuild is required */
> +bool x86_topology_update;
> +
> +int arch_update_cpu_topology(void)
> +{
> +	if (x86_topology_update) {
> +		x86_topology_update = false;
> +		return 1;
> +	} else
> +		return 0;

That lacks braces around the else path, but why aren't you just doing the
obvious:

	if (!x86_topology_update)
		return false;
	
	x86_topology_update = false;
	return true;

That would be too simple to read, right?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 5/8] sched,x86: Enable Turbo Boost Max Technology
  2016-09-08 22:26 ` [PATCH v3 5/8] sched,x86: Enable Turbo Boost Max Technology Srinivas Pandruvada
@ 2016-09-10 16:21   ` Thomas Gleixner
  2016-09-12 21:34     ` Tim Chen
  0 siblings, 1 reply; 23+ messages in thread
From: Thomas Gleixner @ 2016-09-10 16:21 UTC (permalink / raw)
  To: Srinivas Pandruvada
  Cc: rjw, mingo, bp, x86, linux-pm, linux-kernel, linux-acpi, peterz,
	tim.c.chen

On Thu, 8 Sep 2016, Srinivas Pandruvada wrote:
> +
> +DEFINE_PER_CPU_READ_MOSTLY(int, sched_core_priority);
> +static DEFINE_MUTEX(itmt_update_mutex);
> +
> +static unsigned int zero;
> +static unsigned int one = 1;

Please stick that close to the ctl_table so it gets obvious why we need
this. My first reaction to this was: WTF!

> +/*
> + * Boolean to control whether we want to move processes to cpu capable
> + * of higher turbo frequency for cpus supporting Intel Turbo Boost Max
> + * Technology 3.0.
> + *
> + * It can be set via /proc/sys/kernel/sched_itmt_enabled
> + */
> +unsigned int __read_mostly sysctl_sched_itmt_enabled;

Please put the sysctl into a separate patch and document the sysctl at the
proper place.

> +
> +/*
> + * The pstate_driver calls set_sched_itmt to indicate if the system
> + * is ITMT capable.
> + */
> +static bool __read_mostly sched_itmt_capable;
> +
> +int arch_asym_cpu_priority(int cpu)
> +{
> +	return per_cpu(sched_core_priority, cpu);
> +}
> +

The ordering of your functions is completely random and makes no sense
whatsoever.

> +/* Called with itmt_update_mutex lock held */
> +static void enable_sched_itmt(bool enable_itmt)
> +{
> +	sysctl_sched_itmt_enabled = enable_itmt;
> +	x86_topology_update = true;
> +	rebuild_sched_domains();

So you rebuild that even if the state of the sysctl did not change,

> +}
> +
> +static int sched_itmt_update_handler(struct ctl_table *table, int write,
> +			      void __user *buffer, size_t *lenp, loff_t *ppos)
> +{
> +	int ret;
> +
> +	mutex_lock(&itmt_update_mutex);
> +
> +	ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
> +
> +	if (ret || !write) {
> +		mutex_unlock(&itmt_update_mutex);
> +		return ret;
> +	}
> +
> +	enable_sched_itmt(sysctl_sched_itmt_enabled);

So you hand in sysctl_sched_itmt_enabled so that enable_sched_itmt() can
store that value in sysctl_sched_itmt_enabled. Brilliant stuff that.

> +static struct ctl_table_header *itmt_sysctl_header;
> +
> +/*
> + * The boot code will find out the max boost frequency
> + * and call this function to set a priority proportional
> + * to the max boost frequency. CPU with higher boost
> + * frequency will receive higher priority.
> + */
> +void sched_set_itmt_core_prio(int prio, int core_cpu)
> +{
> +	int cpu, i = 1;
> +
> +	for_each_cpu(cpu, topology_sibling_cpumask(core_cpu)) {
> +		int smt_prio;
> +
> +		/*
> +		 * Discount the priority of sibling so that we don't
> +		 * pack all loads to the same core before using other cores.

-EMAKESNOSENSE

Is this a winter or summer sale where we get a discount? 10% or 50% or what?

The math here is:

      smt_prio = prio * num_siblings / sibling_idx;

The purpose of this is to ensure that the siblings are moved to the end of
the priority chain and therefore only used when all other high priority
cpus are out of capacity.

> +		 */
> +		smt_prio = prio * smp_num_siblings / i;
> +		i++;
> +		per_cpu(sched_core_priority, cpu) = smt_prio;
> +	}
> +}
> +
> +/*
> + * During boot up, boot code will detect if the system
> + * is ITMT capable and call set_sched_itmt.

Groan. In the comment above the declaration of sched_itmt_capable you
write:

> + * The pstate_driver calls set_sched_itmt to indicate if the system
> + * is ITMT capable.

So what is calling this? Boot code or pstate driver or something else?

> + *
> + * This should be called after sched_set_itmt_core_prio
> + * has been called to set the cpus' priorities.

should? So the call order is optional and this is just a recommendation.

> + *
> + * This function should be called without cpu hot plug lock
> + * as we need to acquire the lock to rebuild sched domains
> + * later.

Ditto. Must not be called with cpu hotplug lock held.

> + */
> +void set_sched_itmt(bool itmt_capable)

This function can be called more than once and it can be called to
enable and disable the feature, right?

So again: instead of incoherent comments above the function describe the
calling conventions and what the function does proper.

> +{
> +	mutex_lock(&itmt_update_mutex);
> +
> +	if (itmt_capable != sched_itmt_capable) {

If you use a goto out_unlock here, then you spare one indentation level and
the annoying line breaks.

> +
> +		if (itmt_capable) {
> +			itmt_sysctl_header =
> +				register_sysctl_table(itmt_root_table);

What if this fails? Error handling is overrated ...

> +			/*
> +			 * ITMT capability automatically enables ITMT
> +			 * scheduling for client systems (single node).

Why? Just because it can and just because something thinks that it's a good
idea? There are server systems with single node as well....

> +			 */
> +			if (topology_num_packages() == 1)
> +				sysctl_sched_itmt_enabled = 1;
> +		} else {
> +			if (itmt_sysctl_header)
> +				unregister_sysctl_table(itmt_sysctl_header);

And what clears itmt_sysctl_header?

> +			sysctl_sched_itmt_enabled = 0;
> +		}
> +
> +		sched_itmt_capable = itmt_capable;
> +		x86_topology_update = true;
> +		rebuild_sched_domains();

So another place where you call that unconditionally. If the sysctl is
disabled then this is completely pointless.

What about a single function which is called from the sysctl write and from
this code, which does the update and rebuild only if something has changed?

> +	}
> +
> +	mutex_unlock(&itmt_update_mutex);
> +}
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index 737b9edf..17f3ac7 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -109,7 +109,6 @@ static bool logical_packages_frozen __read_mostly;
>  /* Maximum number of SMT threads on any online core */
>  int __max_smt_threads __read_mostly;
>  
> -unsigned int __read_mostly sysctl_sched_itmt_enabled;

So if you wouldn't have added this in the first place then you would not
have to remove it.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 8/8] cpufreq: intel_pstate: Use CPPC to get max performance
  2016-09-08 22:26 ` [PATCH v3 8/8] cpufreq: intel_pstate: Use CPPC to get max performance Srinivas Pandruvada
@ 2016-09-10 16:22   ` Thomas Gleixner
  2016-09-12 22:34     ` Srinivas Pandruvada
  0 siblings, 1 reply; 23+ messages in thread
From: Thomas Gleixner @ 2016-09-10 16:22 UTC (permalink / raw)
  To: Srinivas Pandruvada
  Cc: rjw, mingo, bp, x86, linux-pm, linux-kernel, linux-acpi, peterz,
	tim.c.chen

On Thu, 8 Sep 2016, Srinivas Pandruvada wrote:
> +
> +/* Call set_sched_itmt from a work function to be able to use hotplug locks */

And why can't the function below do that? Is the reader required to
figure that out himself?

> +static void intel_pstate_sched_itmt_work_fn(struct work_struct *work)
> +{
> +	set_sched_itmt(true);
> +}
> +
> +static DECLARE_WORK(sched_itmt_work, intel_pstate_sched_itmt_work_fn);
>  
>  static bool intel_pstate_get_ppc_enable_status(void)
>  {
> @@ -377,14 +391,63 @@ static void intel_pstate_init_acpi_perf_limits(struct cpufreq_policy *policy)
>  	int ret;
>  	int i;
>  
> -	if (hwp_active)
> +	cpu = all_cpu_data[policy->cpu];
> +
> +	if (hwp_active) {

Please split this out into a separate function.

> +		struct cppc_perf_caps *perf_caps;
> +
> +		cpu->cppc_data = kzalloc(sizeof(struct cppc_cpudata),
> +					 GFP_KERNEL);
> +		if (!cpu->cppc_data)
> +			return;
> +
> +		perf_caps = &cpu->cppc_data->perf_caps;
> +		ret = cppc_get_perf_caps(policy->cpu, perf_caps);
> +		if (ret) {
> +			kfree(cpu->cppc_data);
> +			return;
> +		}
> +
> +		cpu->valid_cppc_table = true;
> +		pr_debug("cpu:%d H:0x%x N:0x%x L:0x%x\n", policy->cpu,
> +			 perf_caps->highest_perf, perf_caps->nominal_perf,
> +			 perf_caps->lowest_perf);
> +
> +		cpumask_set_cpu(policy->cpu, &cppc_rd_cpu_mask);
> +		if (cpumask_subset(topology_core_cpumask(policy->cpu),
> +				   &cppc_rd_cpu_mask)) {
> +			int cpu_index;
> +			int max_prio;
> +			bool itmt_support = false;
> +
> +			cpu = all_cpu_data[0];
> +			max_prio = cpu->cppc_data->perf_caps.highest_perf;
> +			for_each_cpu(cpu_index, &cppc_rd_cpu_mask) {
> +				cpu = all_cpu_data[cpu_index];
> +				perf_caps = &cpu->cppc_data->perf_caps;
> +				if (max_prio != perf_caps->highest_perf) {
> +					itmt_support = true;
> +					break;
> +				}
> +			}

20 lines of magic logic w/o a useful comment. Darn, you yourself will have
forgotten the magic within 3 months ....

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 1/8] sched, x86: Add SD_ASYM_PACKING flags to x86 cpu topology for ITMT
  2016-09-10 13:10   ` Thomas Gleixner
@ 2016-09-12 20:50     ` Tim Chen
  2016-09-13  7:26       ` Peter Zijlstra
  0 siblings, 1 reply; 23+ messages in thread
From: Tim Chen @ 2016-09-12 20:50 UTC (permalink / raw)
  To: Thomas Gleixner, Srinivas Pandruvada
  Cc: rjw, mingo, bp, x86, linux-pm, linux-kernel, linux-acpi, peterz

On Sat, 2016-09-10 at 15:10 +0200, Thomas Gleixner wrote:
> On Thu, 8 Sep 2016, Srinivas Pandruvada wrote:
> > 
> > From: Tim Chen <tim.c.chen@linux.intel.com>
> > 
> > We uses ASYM_PACKING feature in the scheduler to move tasks to more
> > capable cpus that can be boosted to higher frequency. This is enabled by
> > Intel Turbo Boost Max Technology 3.0 (ITMT).  We mark the sched domain
> > topology level with SD_ASYM_PACKING flag for such systems to indicate
> > scheduler can use the ASYM_PACKING feature to move load to the
> > more capable cpus.
> Sigh. This changelog does not tell anything about the nature of the patch,
> the rationale for it etc. It's just a meaningless blurb.

Okay, I can add more details about ITMT.  Will also be clearer if
the ITMT patch (patch 5) comes before this one.



> > +}
> > +#else
> > +#define x86_core_flags cpu_core_flags
> > +#define x86_smt_flags cpu_smt_flags
> > +#endif
> No. We first rework the code so that the ITMT stuff can be added in a later
> patch easily.

I'll move this patch to come after current patch 5.

Thanks.

Tim

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 3/8] x86, cpu: provide a function topology_num_packages to enumerate #packages
  2016-09-10 13:28   ` Thomas Gleixner
@ 2016-09-12 20:51     ` Tim Chen
  0 siblings, 0 replies; 23+ messages in thread
From: Tim Chen @ 2016-09-12 20:51 UTC (permalink / raw)
  To: Thomas Gleixner, Srinivas Pandruvada
  Cc: rjw, mingo, bp, x86, linux-pm, linux-kernel, linux-acpi, peterz

On Sat, 2016-09-10 at 15:28 +0200, Thomas Gleixner wrote:
> On Thu, 8 Sep 2016, Srinivas Pandruvada wrote:
> 
> > 
> > From: Tim Chen <tim.c.chen@linux.intel.com>
> $subject: x86, cpu: provide a function topology_num_packages  to enumerate #packages
> 
> - we switched to the prefix scheme 'x86/subsys'. Please use this.
> 
> - this is not related to x86/cpu. x86/topology is the proper prefix.
> 
> - Sentence after ':' starts with an uppercase letter.
> 
> - please make the subject line short and descriptive. 
> 
>   x86/topology: Provide topology_num_packages()
> 
>   is completely sufficient, because it's entirely clear that it is a
>   function and the function name is self explaining.

Will do.

> 
> > 
> > We compute the the number of active packages during boot and
> > topology update.
> We? We do not do anything..... and how is that information useful for the
> reader?
> 
> > 
> > Provide a function to export this info for functions that need this
> > topology info.
> Well, it's obvious that a new function is going to be used by something
> which needs it.
> 
> In changelogs/comments there is only one thing worse than superfluous
> information: wrong information.
> 
> If you have nothing to say, then omit it instead of forcing the reader to
> parse incoherent blurbs for nothing.
> 
> > 
> >  int topology_update_package_map(unsigned int apicid, unsigned int cpu);
> > +extern int topology_num_packages(void);
> >  extern int topology_phys_to_logical_pkg(unsigned int pkg);
> >  #else
> >  #define topology_max_packages()			(1)
> stub function for the !SMP case is missing....
> 
> Thanks,
> 
> 	tglx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 4/8] sched, x86: use arch_update_cpu_topology to indicate x86 need sched domain rebuild
  2016-09-10 16:20   ` Thomas Gleixner
@ 2016-09-12 21:00     ` Tim Chen
  0 siblings, 0 replies; 23+ messages in thread
From: Tim Chen @ 2016-09-12 21:00 UTC (permalink / raw)
  To: Thomas Gleixner, Srinivas Pandruvada
  Cc: rjw, mingo, bp, x86, linux-pm, linux-kernel, linux-acpi, peterz

On Sat, 2016-09-10 at 18:20 +0200, Thomas Gleixner wrote:
> On Thu, 8 Sep 2016, Srinivas Pandruvada wrote:
> 
> How is this related to sched? And please stop writing lengthy sentences in
> the subject line.


> > From: Tim Chen <tim.c.chen@linux.intel.com>
> > 
> > Provides x86 with arch_update_cpu_topology function. This function
> > allows us to indicate that a condition is detected that the sched
> > domain of x86 needs a complete rebuild.
> scheduler domains are not x86 specific .... 


> > 
> > This is done by setting the x86_topology_update flag.
> So without reading the patch I expect that the function sets the
> x86_topology_update flag. Crap.
> 
> What you really want to say is:
> 
>   The scheduler calls arch_update_cpu_topology() to check whether the
>   scheduler domains have to be rebuilt.
> 
>   So far x86 has no requirement for this, but the upcoming ITMT support
>   makes this necessary.
> 
>   Request the rebuild when the x86 internal update flag is set.
> 
> Or something along these lines. Changelog is an important part of a patch,
> really.

Sure.  Will do.

> 
> > 
> > +/* Flag to indicate if a complete sched domain rebuild is required */
> > +bool x86_topology_update;
> > +
> > +int arch_update_cpu_topology(void)
> > +{
> > +	if (x86_topology_update) {
> > +		x86_topology_update = false;
> > +		return 1;
> > +	} else
> > +		return 0;
> That lacks braces around the else path, but why aren't you just doing the
> obvious:
> 
> 	if (!x86_topology_update)
> 		return false;
> 	
> 	x86_topology_update = false;
> 	return true;
> 
> That would be too simple to read, right?

Sure.  Will do.


> 
> Thanks,
> 
> 	tglx

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 5/8] sched,x86: Enable Turbo Boost Max Technology
  2016-09-10 16:21   ` Thomas Gleixner
@ 2016-09-12 21:34     ` Tim Chen
  0 siblings, 0 replies; 23+ messages in thread
From: Tim Chen @ 2016-09-12 21:34 UTC (permalink / raw)
  To: Thomas Gleixner, Srinivas Pandruvada
  Cc: rjw, mingo, bp, x86, linux-pm, linux-kernel, linux-acpi, peterz

On Sat, 2016-09-10 at 18:21 +0200, Thomas Gleixner wrote:
> On Thu, 8 Sep 2016, Srinivas Pandruvada wrote:
> > 
> > +
> > +DEFINE_PER_CPU_READ_MOSTLY(int, sched_core_priority);
> > +static DEFINE_MUTEX(itmt_update_mutex);
> > +
> > +static unsigned int zero;
> > +static unsigned int one = 1;
> Please stick that close to the ctl_table so it gets obvious why we need
> this. My first reaction to this was: WTF!

Will do.

> 
> > 
> > +/*
> > + * Boolean to control whether we want to move processes to cpu capable
> > + * of higher turbo frequency for cpus supporting Intel Turbo Boost Max
> > + * Technology 3.0.
> > + *
> > + * It can be set via /proc/sys/kernel/sched_itmt_enabled
> > + */
> > +unsigned int __read_mostly sysctl_sched_itmt_enabled;
> Please put the sysctl into a separate patch and document the sysctl at the
> proper place.
> 

Okay, I can move the sysctl related code to a separate patch.

> > 
> > +
> > +/*
> > + * The pstate_driver calls set_sched_itmt to indicate if the system
> > + * is ITMT capable.
> > + */
> > +static bool __read_mostly sched_itmt_capable;
> > +
> > +int arch_asym_cpu_priority(int cpu)
> > +{
> > +	return per_cpu(sched_core_priority, cpu);
> > +}
> > +
> The ordering of your functions is completely random and makes no sense
> whatsoever.

Okay, I'll put it at a more logical place.

> 
> > 
> > +/* Called with itmt_update_mutex lock held */
> > +static void enable_sched_itmt(bool enable_itmt)
> > +{
> > +	sysctl_sched_itmt_enabled = enable_itmt;
> > +	x86_topology_update = true;
> > +	rebuild_sched_domains();
> So you rebuild that even if the state of the sysctl did not change,

Will check for change in updated patch.

> 
> > 
> > +}
> > +
> > +static int sched_itmt_update_handler(struct ctl_table *table, int write,
> > +			      void __user *buffer, size_t *lenp, loff_t *ppos)
> > +{
> > +	int ret;
> > +
> > +	mutex_lock(&itmt_update_mutex);
> > +
> > +	ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
> > +
> > +	if (ret || !write) {
> > +		mutex_unlock(&itmt_update_mutex);
> > +		return ret;
> > +	}
> > +
> > +	enable_sched_itmt(sysctl_sched_itmt_enabled);
> So you hand in sysctl_sched_itmt_enabled so that enable_sched_itmt() can
> store that value in sysctl_sched_itmt_enabled. Brilliant stuff that.

Will merge enable_sched_itmt into sched_itmt_update_handler.

> 
> > 
> > +static struct ctl_table_header *itmt_sysctl_header;
> > +
> > +/*
> > + * The boot code will find out the max boost frequency
> > + * and call this function to set a priority proportional
> > + * to the max boost frequency. CPU with higher boost
> > + * frequency will receive higher priority.
> > + */
> > +void sched_set_itmt_core_prio(int prio, int core_cpu)
> > +{
> > +	int cpu, i = 1;
> > +
> > +	for_each_cpu(cpu, topology_sibling_cpumask(core_cpu)) {
> > +		int smt_prio;
> > +
> > +		/*
> > +		 * Discount the priority of sibling so that we don't
> > +		 * pack all loads to the same core before using other cores.
> -EMAKESNOSENSE
> 
> Is this a winter or summer sale where we get a discount? 10% or 50% or what?
> 
> The math here is:
> 
>       smt_prio = prio * num_siblings / sibling_idx;
> 
> The purpose of this is to ensure that the siblings are moved to the end of
> the priority chain and therefore only used when all other high priority
> cpus are out of capacity.

Will update comment here.

> 
> > 
> > +		 */
> > +		smt_prio = prio * smp_num_siblings / i;
> > +		i++;
> > +		per_cpu(sched_core_priority, cpu) = smt_prio;
> > +	}
> > +}
> > +
> > +/*
> > + * During boot up, boot code will detect if the system
> > + * is ITMT capable and call set_sched_itmt.
> Groan. In the comment above the declaration of sched_itmt_capable you
> write:
> 
> > 
> > + * The pstate_driver calls set_sched_itmt to indicate if the system
> > + * is ITMT capable.
> So what is calling this? Boot code or pstate driver or something else?

Will just say pstate driver.

> 
> > 
> > + *
> > + * This should be called after sched_set_itmt_core_prio
> > + * has been called to set the cpus' priorities.
> should? So the call order is optional and this is just a recommendation.
> 
> > 
> > + *
> > + * This function should be called without cpu hot plug lock
> > + * as we need to acquire the lock to rebuild sched domains
> > + * later.
> Ditto. Must not be called with cpu hotplug lock held.

Okay.

> 
> > 
> > + */
> > +void set_sched_itmt(bool itmt_capable)
> This function can be called more than once and it can be called to
> enable and disable the feature, right?

> So again: instead of incoherent comments above the function describe the
> calling conventions and what the function does proper.

Will add that.

> 
> > 
> > +{
> > +	mutex_lock(&itmt_update_mutex);
> > +
> > +	if (itmt_capable != sched_itmt_capable) {
> If you use a goto out_unlock here, then you spare one indentation level and
> the annoying line breaks.

Will do.

> 
> > 
> > +
> > +		if (itmt_capable) {
> > +			itmt_sysctl_header =
> > +				register_sysctl_table(itmt_root_table);
> What if this fails? Error handling is overrated ...

Will add code to disable ITMT entirely if we fail to register the
sysctl table.

> 
> > 
> > +			/*
> > +			 * ITMT capability automatically enables ITMT
> > +			 * scheduling for client systems (single node).
> Why? Just because it can and just because something thinks that it's a good
> idea? There are server systems with a single node as well....

Using a single node as the decision criterion is a compromise based on
guessing the likely usage scenario.  On a lightly loaded client system,
this feature will help boost performance.  But a very busy server system
is unlikely to operate in turbo mode, so the feature would not help
there.  So in a previous discussion thread with Ingo and Peter Z, we
decided to use the number of nodes as the criterion for turning this
feature on by default.

> 
> > 
> > +			 */
> > +			if (topology_num_packages() == 1)
> > +				sysctl_sched_itmt_enabled = 1;
> > +		} else {
> > +			if (itmt_sysctl_header)
> > +				unregister_sysctl_table(itmt_sysctl_header);
> And what clears itmt_sysctl_header ?
> 
> > 
> > +			sysctl_sched_itmt_enabled = 0;
> > +		}
> > +
> > +		sched_itmt_capable = itmt_capable;
> > +		x86_topology_update = true;
> > +		rebuild_sched_domains();
> So another place where you call that unconditionally. If the sysctl is
> disabled then this is completely pointless.

Okay, will only rebuild sched domains here when sysctl_sched_itmt_enabled is true.

> 
> What about a single function which is called from the sysctl write and from
> this code, which does the update and rebuild only if something has changed?

Sure.

> 
> > 
> > +	}
> > +
> > +	mutex_unlock(&itmt_update_mutex);
> > +}
> > diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> > index 737b9edf..17f3ac7 100644
> > --- a/arch/x86/kernel/smpboot.c
> > +++ b/arch/x86/kernel/smpboot.c
> > @@ -109,7 +109,6 @@ static bool logical_packages_frozen __read_mostly;
> >  /* Maximum number of SMT threads on any online core */
> >  int __max_smt_threads __read_mostly;
> >  
> > -unsigned int __read_mostly sysctl_sched_itmt_enabled;
> So if you hadn't added this in the first place then you would not
> have to remove it.
> 
> Thanks,
> 
> 	tglx
> 
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 8/8] cpufreq: intel_pstate: Use CPPC to get max performance
  2016-09-10 16:22   ` Thomas Gleixner
@ 2016-09-12 22:34     ` Srinivas Pandruvada
  0 siblings, 0 replies; 23+ messages in thread
From: Srinivas Pandruvada @ 2016-09-12 22:34 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: rjw, mingo, bp, x86, linux-pm, linux-kernel, linux-acpi, peterz,
	tim.c.chen

On Sat, 2016-09-10 at 18:22 +0200, Thomas Gleixner wrote:
> On Thu, 8 Sep 2016, Srinivas Pandruvada wrote:

[...]

> 
> 20 lines of magic logic w/o a useful comment. Darn, you yourself will
> have forgotten the magic within 3 months ....
> 
Valid points. Will take care in the next patchset.

Thanks,
Srinivas

* Re: [PATCH v3 1/8] sched, x86: Add SD_ASYM_PACKING flags to x86 cpu topology for ITMT
  2016-09-12 20:50     ` Tim Chen
@ 2016-09-13  7:26       ` Peter Zijlstra
  2016-09-19  9:11         ` Jiri Olsa
  0 siblings, 1 reply; 23+ messages in thread
From: Peter Zijlstra @ 2016-09-13  7:26 UTC (permalink / raw)
  To: Tim Chen
  Cc: Thomas Gleixner, Srinivas Pandruvada, rjw, mingo, bp, x86,
	linux-pm, linux-kernel, linux-acpi

On Mon, Sep 12, 2016 at 01:50:15PM -0700, Tim Chen wrote:
> > No. We first rework the code so that the IMT stuff can be added in a later
> > patch easily.
> 
> I'll move this patch to come after current patch 5.

Split the patch, one doing the topo cleanup, one adding the
x86_*_flags().

* Re: [PATCH v3 1/8] sched, x86: Add SD_ASYM_PACKING flags to x86 cpu topology for ITMT
  2016-09-13  7:26       ` Peter Zijlstra
@ 2016-09-19  9:11         ` Jiri Olsa
  2016-09-19 16:01           ` Tim Chen
  0 siblings, 1 reply; 23+ messages in thread
From: Jiri Olsa @ 2016-09-19  9:11 UTC (permalink / raw)
  To: Tim Chen, Peter Zijlstra, Thomas Gleixner, Srinivas Pandruvada,
	rjw, mingo, bp, x86, linux-pm, linux-kernel, linux-acpi

On Tue, Sep 13, 2016 at 09:26:13AM +0200, Peter Zijlstra wrote:
> On Mon, Sep 12, 2016 at 01:50:15PM -0700, Tim Chen wrote:
> > > No. We first rework the code so that the IMT stuff can be added in a later
> > > patch easily.
> > 
> > I'll move this patch to come after current patch 5.
> 
> Split the patch, one doing the topo cleanup, one adding the
> x86_*_flags().

hi,
could you please CC me on the next version,
I'm interested in that cleanup patch..

thanks a lot,
jirka

* Re: [PATCH v3 1/8] sched, x86: Add SD_ASYM_PACKING flags to x86 cpu topology for ITMT
  2016-09-19  9:11         ` Jiri Olsa
@ 2016-09-19 16:01           ` Tim Chen
  0 siblings, 0 replies; 23+ messages in thread
From: Tim Chen @ 2016-09-19 16:01 UTC (permalink / raw)
  To: Jiri Olsa, Peter Zijlstra, Thomas Gleixner, Srinivas Pandruvada,
	rjw, mingo, bp, x86, linux-pm, linux-kernel, linux-acpi

On Mon, 2016-09-19 at 11:11 +0200, Jiri Olsa wrote:
> On Tue, Sep 13, 2016 at 09:26:13AM +0200, Peter Zijlstra wrote:
> > 
> > On Mon, Sep 12, 2016 at 01:50:15PM -0700, Tim Chen wrote:
> > > 
> > > > 
> > > > No. We first rework the code so that the IMT stuff can be added in a later
> > > > patch easily.
> > > I'll move this patch to come after current patch 5.
> > Split the patch, one doing the topo cleanup, one adding the
> > x86_*_flags().
> hi,
> could you please CC me on the next version,
> I'm interested in that cleanup patch..

Sure. Will do.

Tim

end of thread, other threads:[~2016-09-19 16:01 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-08 22:26 [PATCH v3 0/8] Support Intel® Turbo Boost Max Technology 3.0 Srinivas Pandruvada
2016-09-08 22:26 ` [PATCH v3 1/8] sched, x86: Add SD_ASYM_PACKING flags to x86 cpu topology for ITMT Srinivas Pandruvada
2016-09-10 13:10   ` Thomas Gleixner
2016-09-12 20:50     ` Tim Chen
2016-09-13  7:26       ` Peter Zijlstra
2016-09-19  9:11         ` Jiri Olsa
2016-09-19 16:01           ` Tim Chen
2016-09-10 13:13   ` Thomas Gleixner
2016-09-08 22:26 ` [PATCH v3 2/8] sched: Extend scheduler's asym packing Srinivas Pandruvada
2016-09-08 22:26 ` [PATCH v3 3/8] x86, cpu: provide a function topology_num_packages to enumerate #packages Srinivas Pandruvada
2016-09-10 13:28   ` Thomas Gleixner
2016-09-12 20:51     ` Tim Chen
2016-09-08 22:26 ` [PATCH v3 4/8] sched, x86: use arch_update_cpu_topology to indicate x86 need sched domain rebuild Srinivas Pandruvada
2016-09-10 16:20   ` Thomas Gleixner
2016-09-12 21:00     ` Tim Chen
2016-09-08 22:26 ` [PATCH v3 5/8] sched,x86: Enable Turbo Boost Max Technology Srinivas Pandruvada
2016-09-10 16:21   ` Thomas Gleixner
2016-09-12 21:34     ` Tim Chen
2016-09-08 22:26 ` [PATCH v3 6/8] acpi: bus: Enable HWP CPPC objects Srinivas Pandruvada
2016-09-08 22:26 ` [PATCH v3 7/8] acpi: bus: Set _OSC for diverse core support Srinivas Pandruvada
2016-09-08 22:26 ` [PATCH v3 8/8] cpufreq: intel_pstate: Use CPPC to get max performance Srinivas Pandruvada
2016-09-10 16:22   ` Thomas Gleixner
2016-09-12 22:34     ` Srinivas Pandruvada

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).