linux-kernel.vger.kernel.org archive mirror
* [PATCH v8 0/8] Support Intel Turbo Boost Max Technology 3.0
@ 2016-11-22 20:23 Tim Chen
  2016-11-22 20:23 ` [PATCH v8 1/8] sched: Extend scheduler's asym packing Tim Chen
                   ` (7 more replies)
  0 siblings, 8 replies; 39+ messages in thread
From: Tim Chen @ 2016-11-22 20:23 UTC
  To: rjw, tglx, mingo, bp
  Cc: Tim Chen, x86, linux-pm, linux-kernel, linux-acpi, peterz, jolsa,
	Srinivas Pandruvada

With Intel Turbo Boost Max Technology 3.0 (ITMT), single-threaded
performance is optimized by identifying the processor's fastest
core and running critical workloads on it.

Refer to:
http://www.intel.com/content/www/us/en/architecture-and-technology/turbo-boost/turbo-boost-max-technology.html

This patchset consists of all the changes required to support the ITMT feature:
- Use CPPC information in the Intel P-State driver to get performance information
- Scheduler enhancements
- cppc lib patches (split into a separate series)

This feature can be enabled at runtime by writing
# echo 1 > /proc/sys/kernel/sched_itmt_enabled
and disabled at runtime by writing
# echo 0 > /proc/sys/kernel/sched_itmt_enabled
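For tools that prefer C to the shell, the same toggle can be sketched as plain file I/O on the sysctl. This is a minimal user-space sketch, not part of the series: the helper names are hypothetical, and the path is passed in so the helpers can be exercised against an ordinary file. Note that the sysctl file only exists once the platform driver has registered ITMT support.

```c
#include <stdio.h>

/*
 * Read the ITMT sysctl value from `path` (normally
 * /proc/sys/kernel/sched_itmt_enabled).
 * Returns 0 or 1 on success, -1 if the file is missing or unreadable.
 */
int read_itmt_enabled(const char *path)
{
	FILE *f = fopen(path, "r");
	int val;

	if (!f)
		return -1;
	if (fscanf(f, "%d", &val) != 1)
		val = -1;
	fclose(f);
	return val;
}

/*
 * Write 0 or 1 to `path`. Returns 0 on success, -1 on failure.
 * stdio buffers the write, so a kernel-side rejection (e.g. -EINVAL
 * from the sysctl handler) surfaces when the stream is flushed by fclose().
 */
int write_itmt_enabled(const char *path, int enable)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fprintf(f, "%d\n", enable ? 1 : 0) > 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

Writing to the real /proc path requires root privileges.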

Change Log:
v8:
1. Remove arch_asym_max_cpu_and function and implement the logic
explicitly in nohz_kick_needed. 

v7:
1. ITMT 3.0 is now also enabled by default on multi-socket systems that
support the feature; the single-socket restriction is removed.
2. Add proper ASYM_PACKING check for high priority idle cpu in
nohz_kick_needed.
3. Set itmt_sysctl_header to NULL after the sysctl table is
unregistered.
4. Minor code format cleanups.
5. Fix build issue on i386

v6:
- Split ITMT support control by OS into sched_set_itmt_support and
sched_clear_itmt_support functions.
- Return error when we set sysctl_sched_itmt_enabled, if system
does not support ITMT, or when we cannot create sysctl table entry.
- Minor code clean up and moving definition of sched_core_priority
to top of file.
- Rebase on v4.9-rc1

v5:
- Simplify intel_pstate for enabling ITMT feature
- Put x86_sched_itmt_flags related functions under proper
CONFIG_SCHED_MC/SMT flags
- Comment to note that rebuild_sched_domain is not needed after updating
CPU priorities.
- Define sysctl_sched_itmt_enabled to 0 when ITMT is not used in
arch/x86/include/asm/topology.h
- Dropped patch "Fix numa in package topology bug", as this is
already applied

v4:
- Split x86 multi-node numa topology bug fix and setting
of SD_ASYM flag for ITMT topology into 2 patches
- Split the sysctl changes for ITMT enablement and setting of ITMT
capability/core priorities into 2 patches.
- Avoid unnecessary rebuild of sched domains when ITMT sysctl or
capabilities are updated.
- Fix missing stub function for topology_max_packages for !SMP case.
- Rename set_sched_itmt() to sched_set_itmt_support().
- Various updates to itmt.c to eliminate goto and make logic tighter.
- Various change logs and comments update.
- intel_pstate: Split function to process cppc and enable ITMT
- intel_pstate: Just keep the cppc_perf information till we use CPPC for HWP

v3:
- Fix a race when more than one program is enabling/disabling ITMT
- Remove group_priority_cpu macro to simplify code.
- Fix a compile error on ARM reported by 0-day

v2:
- The patchset is split into two parts so that CPPC changes can be merged first
 1. Only ACPI CPPC changes (It is posted separately)
 2. ITMT changes (scheduler and Intel P-State)

- Changes in patch: sched,x86: Enable Turbo Boost Max Technology
 1. Use arch_update_cpu_topology to indicate need to completely
    rebuild sched domain when ITMT related sched domain flags change
 2. Enable ITMT scheduling by default on ITMT-capable client
    (single node) platforms
 3. Implement arch_asym_cpu_priority to provide the cpu priority
    value to scheduler for asym packing.
 4. Fix a compile bug for i386 architecture.

- Changes in patch: sched: Extend scheduler's asym packing
 1. Use arch_asym_cpu_priority() to provide cpu priority
    value used for asym packing to the scheduler.

- Changes in acpi: bus: Enable HWP CPPC objects and
  acpi: bus: Set _OSC for diverse core support
  Minor code cleanup by removing #ifdef
- Changes in Kconfig for Intel P-State
  Avoid building CPPC lib for i386 for issue reported by 0-day

- Feature is enabled by default for single socket systems


Rafael J. Wysocki (1):
  cpufreq: intel_pstate: Use CPPC to get max performance

Srinivas Pandruvada (2):
  acpi: bus: Enable HWP CPPC objects
  acpi: bus: Set _OSC for diverse core support

Tim Chen (5):
  sched: Extend scheduler's asym packing
  x86/topology: Define x86's arch_update_cpu_topology
  x86: Enable Intel Turbo Boost Max Technology 3.0
  x86/sysctl: Add sysctl for ITMT scheduling feature
  x86/sched: Add SD_ASYM_PACKING flags to x86 ITMT CPU

 arch/x86/Kconfig                |   9 ++
 arch/x86/include/asm/topology.h |  32 ++++++
 arch/x86/kernel/Makefile        |   1 +
 arch/x86/kernel/itmt.c          | 215 ++++++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/smpboot.c       |  39 +++++++-
 drivers/acpi/bus.c              |  10 ++
 drivers/cpufreq/Kconfig.x86     |   1 +
 drivers/cpufreq/intel_pstate.c  |  56 ++++++++++-
 include/linux/acpi.h            |   1 +
 include/linux/sched.h           |   4 +
 kernel/sched/core.c             |  15 +++
 kernel/sched/fair.c             |  54 ++++++----
 kernel/sched/sched.h            |   6 ++
 13 files changed, 421 insertions(+), 22 deletions(-)
 create mode 100644 arch/x86/kernel/itmt.c

-- 
2.5.5


* [PATCH v8 1/8] sched: Extend scheduler's asym packing
  2016-11-22 20:23 [PATCH v8 0/8] Support Intel Turbo Boost Max Technology 3.0 Tim Chen
@ 2016-11-22 20:23 ` Tim Chen
  2016-11-23 13:09   ` Peter Zijlstra
  2016-11-24 13:25   ` [tip:sched/core] " tip-bot for Tim Chen
  2016-11-22 20:23 ` [PATCH v8 2/8] x86/topology: Define x86's arch_update_cpu_topology Tim Chen
                   ` (6 subsequent siblings)
  7 siblings, 2 replies; 39+ messages in thread
From: Tim Chen @ 2016-11-22 20:23 UTC
  To: rjw, tglx, mingo, bp
  Cc: Tim Chen, x86, linux-pm, linux-kernel, linux-acpi, peterz, jolsa,
	Srinivas Pandruvada

We generalize the scheduler's asym packing to order CPUs by a priority
rather than just by CPU number.  This allows the ASYM_PACKING scheduler
machinery to move load to the preferred CPUs in a sched domain.  The
preference is defined by the CPU priority given by
arch_asym_cpu_priority(cpu); by default, lower-numbered CPUs have
higher priority, which preserves the previous behavior.

We also record the most preferred CPU in a sched group when we build
the group's capacity, for fast lookup of the preferred CPU during load
balancing.
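The three pieces of logic this patch introduces (the pairwise preference test, the per-group preferred CPU, and the nohz kick check) can be modeled in user space as below. This is a minimal sketch under stated assumptions: the CPU count, the priority table standing in for arch_asym_cpu_priority(), and all function names are hypothetical; only the comparison logic mirrors the diff.

```c
#include <stdbool.h>

#define NR_CPUS_DEMO 8	/* hypothetical CPU count for the model */

/* Hypothetical per-CPU priorities standing in for arch_asym_cpu_priority(). */
static int cpu_priority[NR_CPUS_DEMO];

/* Mirrors sched_asym_prefer(): true if CPU a is preferred over CPU b. */
static bool asym_prefer(int a, int b)
{
	return cpu_priority[a] > cpu_priority[b];
}

/*
 * Like the init_sched_groups_capacity() hunk: scan the group's CPUs in
 * ascending order and keep the most preferred one. Because the test is a
 * strict '>', ties go to the CPU seen first.
 */
static int group_prefer_cpu(const int *cpus, int n)
{
	int max_cpu = -1;

	for (int i = 0; i < n; i++)
		if (max_cpu < 0 || asym_prefer(cpus[i], max_cpu))
			max_cpu = cpus[i];
	return max_cpu;
}

/*
 * Like the new nohz_kick_needed() loop: request idle balancing if some
 * other idle CPU in the domain span is preferred over `cpu`.
 */
static bool should_kick(int cpu, const int *span, int n, const bool *idle)
{
	for (int i = 0; i < n; i++) {
		int c = span[i];

		if (c == cpu || !idle[c])
			continue;
		if (asym_prefer(c, cpu))
			return true;
	}
	return false;
}
```

Initializing every priority to -cpu reproduces the default weak arch_asym_cpu_priority(), so the model then packs toward the lowest-numbered CPU exactly as before the patch.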

Co-developed-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
---
 include/linux/sched.h |  4 ++++
 kernel/sched/core.c   | 15 ++++++++++++++
 kernel/sched/fair.c   | 54 +++++++++++++++++++++++++++++++++++----------------
 kernel/sched/sched.h  |  6 ++++++
 4 files changed, 62 insertions(+), 17 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 348f51b..ca02475 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1057,6 +1057,10 @@ static inline int cpu_numa_flags(void)
 }
 #endif
 
+int arch_asym_cpu_priority(int cpu);
+int arch_asym_max_cpu_and(const struct cpumask *mask1,
+			  const struct cpumask *mask2);
+
 struct sched_domain_attr {
 	int relax_domain_level;
 };
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 154fd68..d54c6e1 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6301,7 +6301,22 @@ static void init_sched_groups_capacity(int cpu, struct sched_domain *sd)
 	WARN_ON(!sg);
 
 	do {
+		int cpu, max_cpu = -1;
+
 		sg->group_weight = cpumask_weight(sched_group_cpus(sg));
+
+		if (!(sd->flags & SD_ASYM_PACKING))
+			goto next;
+
+		for_each_cpu(cpu, sched_group_cpus(sg)) {
+			if (max_cpu < 0)
+				max_cpu = cpu;
+			else if (sched_asym_prefer(cpu, max_cpu))
+				max_cpu = cpu;
+		}
+		sg->asym_prefer_cpu = max_cpu;
+
+next:
 		sg = sg->next;
 	} while (sg != sd->groups);
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c242944..af97dc8 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -100,6 +100,17 @@ const_debug unsigned int sysctl_sched_migration_cost = 500000UL;
  */
 unsigned int __read_mostly sysctl_sched_shares_window = 10000000UL;
 
+#ifdef CONFIG_SMP
+/*
+ * For asym packing, by default the lower numbered cpu has higher priority.
+ */
+int __weak arch_asym_cpu_priority(int cpu)
+{
+	return -cpu;
+}
+
+#endif
+
 #ifdef CONFIG_CFS_BANDWIDTH
 /*
  * Amount of runtime to allocate from global (tg) to local (per-cfs_rq) pool
@@ -7113,16 +7124,18 @@ static bool update_sd_pick_busiest(struct lb_env *env,
 	if (env->idle == CPU_NOT_IDLE)
 		return true;
 	/*
-	 * ASYM_PACKING needs to move all the work to the lowest
-	 * numbered CPUs in the group, therefore mark all groups
-	 * higher than ourself as busy.
+	 * ASYM_PACKING needs to move all the work to the highest
+	 * priority CPUs in the group, therefore mark all groups
+	 * of lower priority than ourself as busy.
 	 */
-	if (sgs->sum_nr_running && env->dst_cpu < group_first_cpu(sg)) {
+	if (sgs->sum_nr_running &&
+	    sched_asym_prefer(env->dst_cpu, sg->asym_prefer_cpu)) {
 		if (!sds->busiest)
 			return true;
 
-		/* Prefer to move from highest possible cpu's work */
-		if (group_first_cpu(sds->busiest) < group_first_cpu(sg))
+		/* Prefer to move from lowest priority cpu's work */
+		if (sched_asym_prefer(sds->busiest->asym_prefer_cpu,
+				      sg->asym_prefer_cpu))
 			return true;
 	}
 
@@ -7274,8 +7287,8 @@ static int check_asym_packing(struct lb_env *env, struct sd_lb_stats *sds)
 	if (!sds->busiest)
 		return 0;
 
-	busiest_cpu = group_first_cpu(sds->busiest);
-	if (env->dst_cpu > busiest_cpu)
+	busiest_cpu = sds->busiest->asym_prefer_cpu;
+	if (sched_asym_prefer(busiest_cpu, env->dst_cpu))
 		return 0;
 
 	env->imbalance = DIV_ROUND_CLOSEST(
@@ -7613,10 +7626,11 @@ static int need_active_balance(struct lb_env *env)
 
 		/*
 		 * ASYM_PACKING needs to force migrate tasks from busy but
-		 * higher numbered CPUs in order to pack all tasks in the
-		 * lowest numbered CPUs.
+		 * lower priority CPUs in order to pack all tasks in the
+		 * highest priority CPUs.
 		 */
-		if ((sd->flags & SD_ASYM_PACKING) && env->src_cpu > env->dst_cpu)
+		if ((sd->flags & SD_ASYM_PACKING) &&
+		    sched_asym_prefer(env->dst_cpu, env->src_cpu))
 			return 1;
 	}
 
@@ -8465,7 +8479,7 @@ static inline bool nohz_kick_needed(struct rq *rq)
 	unsigned long now = jiffies;
 	struct sched_domain_shared *sds;
 	struct sched_domain *sd;
-	int nr_busy, cpu = rq->cpu;
+	int nr_busy, i, cpu = rq->cpu;
 	bool kick = false;
 
 	if (unlikely(rq->idle_balance))
@@ -8516,12 +8530,18 @@ static inline bool nohz_kick_needed(struct rq *rq)
 	}
 
 	sd = rcu_dereference(per_cpu(sd_asym, cpu));
-	if (sd && (cpumask_first_and(nohz.idle_cpus_mask,
-				  sched_domain_span(sd)) < cpu)) {
-		kick = true;
-		goto unlock;
-	}
+	if (sd) {
+		for_each_cpu(i, sched_domain_span(sd)) {
+			if (i == cpu ||
+			    !cpumask_test_cpu(i, nohz.idle_cpus_mask))
+				continue;
 
+			if (sched_asym_prefer(i, cpu)) {
+				kick = true;
+				goto unlock;
+			}
+		}
+	}
 unlock:
 	rcu_read_unlock();
 	return kick;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 055f935..cd3d413 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -539,6 +539,11 @@ struct dl_rq {
 
 #ifdef CONFIG_SMP
 
+static inline bool sched_asym_prefer(int a, int b)
+{
+	return arch_asym_cpu_priority(a) > arch_asym_cpu_priority(b);
+}
+
 /*
  * We add the notion of a root-domain which will be used to define per-domain
  * variables. Each exclusive cpuset essentially defines an island domain by
@@ -905,6 +910,7 @@ struct sched_group {
 
 	unsigned int group_weight;
 	struct sched_group_capacity *sgc;
+	int asym_prefer_cpu;		/* cpu of highest priority in group */
 
 	/*
 	 * The CPUs this group covers.
-- 
2.5.5


* [PATCH v8 2/8] x86/topology: Define x86's arch_update_cpu_topology
  2016-11-22 20:23 [PATCH v8 0/8] Support Intel Turbo Boost Max Technology 3.0 Tim Chen
  2016-11-22 20:23 ` [PATCH v8 1/8] sched: Extend scheduler's asym packing Tim Chen
@ 2016-11-22 20:23 ` Tim Chen
  2016-11-24 19:52   ` [tip:x86/core] " tip-bot for Tim Chen
  2016-11-22 20:23 ` [PATCH v8 3/8] x86: Enable Intel Turbo Boost Max Technology 3.0 Tim Chen
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 39+ messages in thread
From: Tim Chen @ 2016-11-22 20:23 UTC
  To: rjw, tglx, mingo, bp
  Cc: Tim Chen, x86, linux-pm, linux-kernel, linux-acpi, peterz, jolsa,
	Srinivas Pandruvada

The scheduler calls arch_update_cpu_topology() to check whether the
scheduler domains have to be rebuilt.

So far x86 has no requirement for this, but the upcoming ITMT support
makes this necessary.

Request the rebuild when the x86 internal update flag is set.
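The flag has one-shot, test-and-clear semantics: a setter (the ITMT code later in the series) raises it, and the next call from the domain-rebuild path consumes it. The following is a user-space model with hypothetical names. Like the patch, it adds no locking of its own and relies on its callers for serialization.

```c
#include <stdbool.h>

/* Hypothetical stand-in for x86_topology_update. */
static bool topology_update;

/* Model of arch_update_cpu_topology(): report the flag, then clear it. */
static int update_cpu_topology(void)
{
	int retval = topology_update;

	topology_update = false;	/* consumed by this rebuild */
	return retval;
}
```

A second call without an intervening set returns 0, so only one full rebuild is requested per change.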

Suggested-by: Morten Rasmussen <morten.rasmussen@arm.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
---
 arch/x86/include/asm/topology.h |  1 +
 arch/x86/kernel/smpboot.c       | 11 +++++++++++
 2 files changed, 12 insertions(+)

diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index cf75871..a5ca88a 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -146,4 +146,5 @@ struct pci_bus;
 int x86_pci_root_bus_node(int bus);
 void x86_pci_root_bus_resources(int bus, struct list_head *resources);
 
+extern bool x86_topology_update;
 #endif /* _ASM_X86_TOPOLOGY_H */
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 42f5eb7..ac61ee7 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -109,6 +109,17 @@ static bool logical_packages_frozen __read_mostly;
 /* Maximum number of SMT threads on any online core */
 int __max_smt_threads __read_mostly;
 
+/* Flag to indicate if a complete sched domain rebuild is required */
+bool x86_topology_update;
+
+int arch_update_cpu_topology(void)
+{
+	int retval = x86_topology_update;
+
+	x86_topology_update = false;
+	return retval;
+}
+
 static inline void smpboot_setup_warm_reset_vector(unsigned long start_eip)
 {
 	unsigned long flags;
-- 
2.5.5


* [PATCH v8 3/8] x86: Enable Intel Turbo Boost Max Technology 3.0
  2016-11-22 20:23 [PATCH v8 0/8] Support Intel Turbo Boost Max Technology 3.0 Tim Chen
  2016-11-22 20:23 ` [PATCH v8 1/8] sched: Extend scheduler's asym packing Tim Chen
  2016-11-22 20:23 ` [PATCH v8 2/8] x86/topology: Define x86's arch_update_cpu_topology Tim Chen
@ 2016-11-22 20:23 ` Tim Chen
  2016-11-24 19:52   ` [tip:x86/core] " tip-bot for Tim Chen
  2016-11-22 20:23 ` [PATCH v8 4/8] x86/sysctl: Add sysctl for ITMT scheduling feature Tim Chen
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 39+ messages in thread
From: Tim Chen @ 2016-11-22 20:23 UTC
  To: rjw, tglx, mingo, bp
  Cc: Tim Chen, x86, linux-pm, linux-kernel, linux-acpi, peterz, jolsa,
	Srinivas Pandruvada

On platforms supporting Intel Turbo Boost Max Technology 3.0, the maximum
turbo frequencies of some cores in a CPU package may be higher than for
the other cores in the same package.  In that case, better performance
(and possibly lower energy consumption as well) can be achieved by
making the scheduler prefer to run tasks on the CPUs with higher max
turbo frequencies.

To that end, set up a core priority metric to abstract the core
preferences based on the maximum turbo frequency.  In that metric,
the cores with higher maximum turbo frequencies are higher-priority
than the other cores in the same package and that causes the scheduler
to favor them when making load-balancing decisions using the asymmetric
packing approach.  At the same time, the priority of SMT threads with a
higher CPU number is reduced so as to avoid scheduling tasks on all of
the threads that belong to a favored core before all of the other cores
have been given a task to run.

The priority metric will be initialized by the P-state driver with the
help of the sched_set_itmt_core_prio() function.  The P-state driver
will also determine whether or not ITMT is supported by the platform
and will call sched_set_itmt_support() to indicate that.
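The SMT demotion uses the formula from sched_set_itmt_core_prio() below: the i-th thread (1-based) of a core gets prio * smp_num_siblings / i. A small worked example, with hypothetical core priorities of 40 (core A) and 30 (core B) and two siblings per core: A's threads get 80 and 40, B's get 60 and 30, so the resulting order is A.t0, B.t0, A.t1, B.t1 and the favored core's second thread is used only after the other core's first thread. The sketch below only reproduces the arithmetic; note that with two siblings this ordering holds only while the favored core's priority is less than twice the other core's.

```c
/*
 * Same arithmetic as sched_set_itmt_core_prio(): thread `thread_idx`
 * (1-based, in sibling-mask order) of a core with priority `core_prio`.
 * Integer division, as in the kernel code.
 */
static int smt_prio(int core_prio, int num_siblings, int thread_idx)
{
	return core_prio * num_siblings / thread_idx;
}
```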

Co-developed-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Co-developed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
---
 arch/x86/Kconfig                |   9 ++++
 arch/x86/include/asm/topology.h |  28 +++++++++++
 arch/x86/kernel/Makefile        |   1 +
 arch/x86/kernel/itmt.c          | 109 ++++++++++++++++++++++++++++++++++++++++
 4 files changed, 147 insertions(+)
 create mode 100644 arch/x86/kernel/itmt.c

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index bada636..25950f0 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -939,6 +939,15 @@ config SCHED_MC
 	  making when dealing with multi-core CPU chips at a cost of slightly
 	  increased overhead in some places. If unsure say N here.
 
+config SCHED_ITMT
+	bool "Intel Turbo Boost Max Technology (ITMT) scheduler support"
+	depends on SCHED_MC && CPU_SUP_INTEL && X86_INTEL_PSTATE
+	---help---
+	  ITMT enabled scheduler support improves the CPU scheduler's decision
+	  to move tasks to a CPU core that can be boosted to a higher frequency
+	  than others. It will have better performance at a cost of slightly
+	  increased overhead in task migrations. If unsure say N here.
+
 source "kernel/Kconfig.preempt"
 
 config UP_LATE_INIT
diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index a5ca88a..8ace951 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -147,4 +147,32 @@ int x86_pci_root_bus_node(int bus);
 void x86_pci_root_bus_resources(int bus, struct list_head *resources);
 
 extern bool x86_topology_update;
+
+#ifdef CONFIG_SCHED_ITMT
+#include <asm/percpu.h>
+
+DECLARE_PER_CPU_READ_MOSTLY(int, sched_core_priority);
+
+/* Interface to set priority of a cpu */
+void sched_set_itmt_core_prio(int prio, int core_cpu);
+
+/* Interface to notify scheduler that system supports ITMT */
+void sched_set_itmt_support(void);
+
+/* Interface to notify scheduler that system revokes ITMT support */
+void sched_clear_itmt_support(void);
+
+#else /* CONFIG_SCHED_ITMT */
+
+static inline void sched_set_itmt_core_prio(int prio, int core_cpu)
+{
+}
+static inline void sched_set_itmt_support(void)
+{
+}
+static inline void sched_clear_itmt_support(void)
+{
+}
+#endif /* CONFIG_SCHED_ITMT */
+
 #endif /* _ASM_X86_TOPOLOGY_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 79076d7..bbd0ebc 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -123,6 +123,7 @@ obj-$(CONFIG_EFI)			+= sysfb_efi.o
 
 obj-$(CONFIG_PERF_EVENTS)		+= perf_regs.o
 obj-$(CONFIG_TRACING)			+= tracepoint.o
+obj-$(CONFIG_SCHED_ITMT)		+= itmt.o
 
 ifdef CONFIG_FRAME_POINTER
 obj-y					+= unwind_frame.o
diff --git a/arch/x86/kernel/itmt.c b/arch/x86/kernel/itmt.c
new file mode 100644
index 0000000..63c9b3e
--- /dev/null
+++ b/arch/x86/kernel/itmt.c
@@ -0,0 +1,109 @@
+/*
+ * itmt.c: Support Intel Turbo Boost Max Technology 3.0
+ *
+ * (C) Copyright 2016 Intel Corporation
+ * Author: Tim Chen <tim.c.chen@linux.intel.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; version 2
+ * of the License.
+ *
+ * On platforms supporting Intel Turbo Boost Max Technology 3.0 (ITMT),
+ * the maximum turbo frequencies of some cores in a CPU package may be
+ * higher than for the other cores in the same package.  In that case,
+ * better performance can be achieved by making the scheduler prefer
+ * to run tasks on the CPUs with higher max turbo frequencies.
+ *
+ * This file provides functions and data structures for enabling the
+ * scheduler to favor scheduling on cores that can be boosted to a higher
+ * frequency under ITMT.
+ */
+
+#include <linux/sched.h>
+#include <linux/cpumask.h>
+#include <linux/cpuset.h>
+#include <asm/mutex.h>
+#include <linux/sched.h>
+#include <linux/sysctl.h>
+#include <linux/nodemask.h>
+
+static DEFINE_MUTEX(itmt_update_mutex);
+DEFINE_PER_CPU_READ_MOSTLY(int, sched_core_priority);
+
+/* Boolean to track if system has ITMT capabilities */
+static bool __read_mostly sched_itmt_capable;
+
+/**
+ * sched_set_itmt_support() - Indicate platform supports ITMT
+ *
+ * This function is used by the OS to indicate to scheduler that the platform
+ * is capable of supporting the ITMT feature.
+ *
+ * The current scheme has the pstate driver detect whether the system
+ * is ITMT capable and call sched_set_itmt_support().
+ *
+ * This must be done only after sched_set_itmt_core_prio
+ * has been called to set the cpus' priorities.
+ */
+void sched_set_itmt_support(void)
+{
+	mutex_lock(&itmt_update_mutex);
+
+	sched_itmt_capable = true;
+
+	mutex_unlock(&itmt_update_mutex);
+}
+
+/**
+ * sched_clear_itmt_support() - Revoke platform's support of ITMT
+ *
+ * This function is used by the OS to indicate that it has
+ * revoked the platform's support of ITMT feature.
+ *
+ */
+void sched_clear_itmt_support(void)
+{
+	mutex_lock(&itmt_update_mutex);
+
+	sched_itmt_capable = false;
+
+	mutex_unlock(&itmt_update_mutex);
+}
+
+int arch_asym_cpu_priority(int cpu)
+{
+	return per_cpu(sched_core_priority, cpu);
+}
+
+/**
+ * sched_set_itmt_core_prio() - Set CPU priority based on ITMT
+ * @prio:	Priority of cpu core
+ * @core_cpu:	The cpu number associated with the core
+ *
+ * The pstate driver will find out the max boost frequency
+ * and call this function to set a priority proportional
+ * to the max boost frequency. CPU with higher boost
+ * frequency will receive higher priority.
+ *
+ * No need to rebuild sched domain after updating
+ * the CPU priorities. The sched domains have no
+ * dependency on CPU priorities.
+ */
+void sched_set_itmt_core_prio(int prio, int core_cpu)
+{
+	int cpu, i = 1;
+
+	for_each_cpu(cpu, topology_sibling_cpumask(core_cpu)) {
+		int smt_prio;
+
+		/*
+		 * Ensure that the siblings are moved to the end
+		 * of the priority chain and only used when
+		 * all other high priority cpus are out of capacity.
+		 */
+		smt_prio = prio * smp_num_siblings / i;
+		per_cpu(sched_core_priority, cpu) = smt_prio;
+		i++;
+	}
+}
-- 
2.5.5


* [PATCH v8 4/8] x86/sysctl: Add sysctl for ITMT scheduling feature
  2016-11-22 20:23 [PATCH v8 0/8] Support Intel Turbo Boost Max Technology 3.0 Tim Chen
                   ` (2 preceding siblings ...)
  2016-11-22 20:23 ` [PATCH v8 3/8] x86: Enable Intel Turbo Boost Max Technology 3.0 Tim Chen
@ 2016-11-22 20:23 ` Tim Chen
  2016-11-24 19:53   ` [tip:x86/core] " tip-bot for Tim Chen
  2016-11-28  8:56   ` [PATCH v8 4/8] " Borislav Petkov
  2016-11-22 20:23 ` [PATCH v8 5/8] x86/sched: Add SD_ASYM_PACKING flags to x86 ITMT CPU Tim Chen
                   ` (3 subsequent siblings)
  7 siblings, 2 replies; 39+ messages in thread
From: Tim Chen @ 2016-11-22 20:23 UTC
  To: rjw, tglx, mingo, bp
  Cc: Tim Chen, x86, linux-pm, linux-kernel, linux-acpi, peterz, jolsa,
	Srinivas Pandruvada

Intel Turbo Boost Max Technology 3.0 (ITMT) feature
allows some cores to be boosted to higher turbo
frequency than others.

Add /proc/sys/kernel/sched_itmt_enabled so the operator
can enable/disable scheduling that favors cores
with higher turbo boost frequency potential.

By default, the feature is turned on for any system
that is ITMT capable (the single-socket restriction
was removed in v7).

When the desired ITMT scheduling mode changes,
a rebuild of the sched domains is initiated
so the scheduler can set up sched domains with the
appropriate flags to enable/disable ITMT scheduling.
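The write path of the sysctl handler can be modeled as a small state machine: reject writes on a platform that is not ITMT capable, reject values outside 0..1 (what proc_dointvec_minmax() does with extra1/extra2), and rebuild the domains only when the value actually changes. This is a user-space sketch with hypothetical names; a counter stands in for setting x86_topology_update and calling rebuild_sched_domains(), and the mutex is omitted.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the state in itmt.c. */
static bool itmt_capable;
static unsigned int itmt_enabled;
static int rebuilds;	/* counts simulated rebuild_sched_domains() calls */

/* Model of a write of `val` through sched_itmt_update_handler(). */
static int itmt_write(unsigned int val)
{
	unsigned int old = itmt_enabled;

	if (!itmt_capable)
		return -EINVAL;
	if (val > 1)
		return -EINVAL;	/* out of range: value left unchanged */
	itmt_enabled = val;
	if (old != itmt_enabled)
		rebuilds++;	/* x86_topology_update = true; rebuild */
	return 0;
}
```

Writing the value that is already set is accepted but does not trigger a rebuild, which matches the `old_sysctl != sysctl_sched_itmt_enabled` check in the handler.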

Co-developed-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Co-developed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
---
 arch/x86/include/asm/topology.h |   7 ++-
 arch/x86/kernel/itmt.c          | 108 +++++++++++++++++++++++++++++++++++++++-
 2 files changed, 112 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index 8ace951..4813df5 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -152,23 +152,26 @@ extern bool x86_topology_update;
 #include <asm/percpu.h>
 
 DECLARE_PER_CPU_READ_MOSTLY(int, sched_core_priority);
+extern unsigned int __read_mostly sysctl_sched_itmt_enabled;
 
 /* Interface to set priority of a cpu */
 void sched_set_itmt_core_prio(int prio, int core_cpu);
 
 /* Interface to notify scheduler that system supports ITMT */
-void sched_set_itmt_support(void);
+int sched_set_itmt_support(void);
 
 /* Interface to notify scheduler that system revokes ITMT support */
 void sched_clear_itmt_support(void);
 
 #else /* CONFIG_SCHED_ITMT */
 
+#define sysctl_sched_itmt_enabled	0
 static inline void sched_set_itmt_core_prio(int prio, int core_cpu)
 {
 }
-static inline void sched_set_itmt_support(void)
+static inline int sched_set_itmt_support(void)
 {
+	return 0;
 }
 static inline void sched_clear_itmt_support(void)
 {
diff --git a/arch/x86/kernel/itmt.c b/arch/x86/kernel/itmt.c
index 63c9b3e..672fbf7 100644
--- a/arch/x86/kernel/itmt.c
+++ b/arch/x86/kernel/itmt.c
@@ -34,6 +34,68 @@ DEFINE_PER_CPU_READ_MOSTLY(int, sched_core_priority);
 /* Boolean to track if system has ITMT capabilities */
 static bool __read_mostly sched_itmt_capable;
 
+/*
+ * Boolean to control whether we want to move processes to cpu capable
+ * of higher turbo frequency for cpus supporting Intel Turbo Boost Max
+ * Technology 3.0.
+ *
+ * It can be set via /proc/sys/kernel/sched_itmt_enabled
+ */
+unsigned int __read_mostly sysctl_sched_itmt_enabled;
+
+static int sched_itmt_update_handler(struct ctl_table *table, int write,
+				     void __user *buffer, size_t *lenp,
+				     loff_t *ppos)
+{
+	unsigned int old_sysctl;
+	int ret;
+
+	mutex_lock(&itmt_update_mutex);
+
+	if (!sched_itmt_capable) {
+		mutex_unlock(&itmt_update_mutex);
+		return -EINVAL;
+	}
+
+	old_sysctl = sysctl_sched_itmt_enabled;
+	ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
+
+	if (!ret && write && old_sysctl != sysctl_sched_itmt_enabled) {
+		x86_topology_update = true;
+		rebuild_sched_domains();
+	}
+
+	mutex_unlock(&itmt_update_mutex);
+
+	return ret;
+}
+
+static unsigned int zero;
+static unsigned int one = 1;
+static struct ctl_table itmt_kern_table[] = {
+	{
+		.procname	= "sched_itmt_enabled",
+		.data		= &sysctl_sched_itmt_enabled,
+		.maxlen		= sizeof(unsigned int),
+		.mode		= 0644,
+		.proc_handler	= sched_itmt_update_handler,
+		.extra1		= &zero,
+		.extra2		= &one,
+	},
+	{}
+};
+
+static struct ctl_table itmt_root_table[] = {
+	{
+		.procname	= "kernel",
+		.mode		= 0555,
+		.child		= itmt_kern_table,
+	},
+	{}
+};
+
+static struct ctl_table_header *itmt_sysctl_header;
+
 /**
  * sched_set_itmt_support() - Indicate platform supports ITMT
  *
@@ -45,14 +107,39 @@ static bool __read_mostly sched_itmt_capable;
  *
  * This must be done only after sched_set_itmt_core_prio
  * has been called to set the cpus' priorities.
+ * It must not be called with the cpu hotplug lock
+ * held as we need to acquire the lock to rebuild sched domains
+ * later.
+ *
+ * Return: 0 on success
  */
-void sched_set_itmt_support(void)
+int sched_set_itmt_support(void)
 {
 	mutex_lock(&itmt_update_mutex);
 
+	if (sched_itmt_capable) {
+		mutex_unlock(&itmt_update_mutex);
+		return 0;
+	}
+
+	itmt_sysctl_header = register_sysctl_table(itmt_root_table);
+	if (!itmt_sysctl_header) {
+		mutex_unlock(&itmt_update_mutex);
+		return -ENOMEM;
+	}
+
 	sched_itmt_capable = true;
 
+	sysctl_sched_itmt_enabled = 1;
+
+	if (sysctl_sched_itmt_enabled) {
+		x86_topology_update = true;
+		rebuild_sched_domains();
+	}
+
 	mutex_unlock(&itmt_update_mutex);
+
+	return 0;
 }
 
 /**
@@ -61,13 +148,32 @@ void sched_set_itmt_support(void)
  * This function is used by the OS to indicate that it has
  * revoked the platform's support of ITMT feature.
  *
+ * It must not be called with the cpu hotplug lock
+ * held as we need to acquire the lock to rebuild sched domains
+ * later.
  */
 void sched_clear_itmt_support(void)
 {
 	mutex_lock(&itmt_update_mutex);
 
+	if (!sched_itmt_capable) {
+		mutex_unlock(&itmt_update_mutex);
+		return;
+	}
 	sched_itmt_capable = false;
 
+	if (itmt_sysctl_header) {
+		unregister_sysctl_table(itmt_sysctl_header);
+		itmt_sysctl_header = NULL;
+	}
+
+	if (sysctl_sched_itmt_enabled) {
+		/* disable sched_itmt if we are no longer ITMT capable */
+		sysctl_sched_itmt_enabled = 0;
+		x86_topology_update = true;
+		rebuild_sched_domains();
+	}
+
 	mutex_unlock(&itmt_update_mutex);
 }
 
-- 
2.5.5


* [PATCH v8 5/8] x86/sched: Add SD_ASYM_PACKING flags to x86 ITMT CPU
  2016-11-22 20:23 [PATCH v8 0/8] Support Intel Turbo Boost Max Technology 3.0 Tim Chen
                   ` (3 preceding siblings ...)
  2016-11-22 20:23 ` [PATCH v8 4/8] x86/sysctl: Add sysctl for ITMT scheduling feature Tim Chen
@ 2016-11-22 20:23 ` Tim Chen
  2016-11-24 19:53   ` [tip:x86/core] " tip-bot for Tim Chen
  2016-11-22 20:23 ` [PATCH v8 6/8] acpi: bus: Enable HWP CPPC objects Tim Chen
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 39+ messages in thread
From: Tim Chen @ 2016-11-22 20:23 UTC
  To: rjw, tglx, mingo, bp
  Cc: Tim Chen, x86, linux-pm, linux-kernel, linux-acpi, peterz, jolsa,
	Srinivas Pandruvada

Some Intel cores in a package can be boosted to a higher turbo frequency
with Intel Turbo Boost Max Technology 3.0 (ITMT). The scheduler can use
the asymmetric packing feature to move tasks to the more capable cores.

If ITMT is enabled, add the SD_ASYM_PACKING flag to the thread and core
sched domains to enable asymmetric packing.
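The flag functions are evaluated each time the sched domains are built, so changing sysctl_sched_itmt_enabled only takes effect at the next rebuild, which is why the sysctl patch forces one. The composition can be sketched as below. The DEMO_* flag values are hypothetical placeholders, not the kernel's SD_* constants; the base flags follow cpu_smt_flags() (shared CPU capacity and package resources) and cpu_core_flags() (shared package resources).

```c
/* Hypothetical flag values for illustration only. */
#define DEMO_SD_SHARE_CPUCAPACITY	0x1
#define DEMO_SD_SHARE_PKG_RESOURCES	0x2
#define DEMO_SD_ASYM_PACKING		0x4

/* Hypothetical stand-in for sysctl_sched_itmt_enabled. */
static unsigned int itmt_enabled;

/* Model of x86_sched_itmt_flags(). */
static int demo_itmt_flags(void)
{
	return itmt_enabled ? DEMO_SD_ASYM_PACKING : 0;
}

/* Model of x86_smt_flags(): base SMT flags plus the optional ITMT flag. */
static int demo_smt_flags(void)
{
	return DEMO_SD_SHARE_CPUCAPACITY | DEMO_SD_SHARE_PKG_RESOURCES |
	       demo_itmt_flags();
}

/* Model of x86_core_flags(): base MC flags plus the optional ITMT flag. */
static int demo_core_flags(void)
{
	return DEMO_SD_SHARE_PKG_RESOURCES | demo_itmt_flags();
}
```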

Co-developed-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
---
 arch/x86/kernel/smpboot.c | 28 ++++++++++++++++++++++++----
 1 file changed, 24 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index ac61ee7..4f13062 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -482,22 +482,42 @@ static bool match_die(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
 	return false;
 }
 
+#if defined(CONFIG_SCHED_SMT) || defined(CONFIG_SCHED_MC)
+static inline int x86_sched_itmt_flags(void)
+{
+	return sysctl_sched_itmt_enabled ? SD_ASYM_PACKING : 0;
+}
+
+#ifdef CONFIG_SCHED_MC
+static int x86_core_flags(void)
+{
+	return cpu_core_flags() | x86_sched_itmt_flags();
+}
+#endif
+#ifdef CONFIG_SCHED_SMT
+static int x86_smt_flags(void)
+{
+	return cpu_smt_flags() | x86_sched_itmt_flags();
+}
+#endif
+#endif
+
 static struct sched_domain_topology_level x86_numa_in_package_topology[] = {
 #ifdef CONFIG_SCHED_SMT
-	{ cpu_smt_mask, cpu_smt_flags, SD_INIT_NAME(SMT) },
+	{ cpu_smt_mask, x86_smt_flags, SD_INIT_NAME(SMT) },
 #endif
 #ifdef CONFIG_SCHED_MC
-	{ cpu_coregroup_mask, cpu_core_flags, SD_INIT_NAME(MC) },
+	{ cpu_coregroup_mask, x86_core_flags, SD_INIT_NAME(MC) },
 #endif
 	{ NULL, },
 };
 
 static struct sched_domain_topology_level x86_topology[] = {
 #ifdef CONFIG_SCHED_SMT
-	{ cpu_smt_mask, cpu_smt_flags, SD_INIT_NAME(SMT) },
+	{ cpu_smt_mask, x86_smt_flags, SD_INIT_NAME(SMT) },
 #endif
 #ifdef CONFIG_SCHED_MC
-	{ cpu_coregroup_mask, cpu_core_flags, SD_INIT_NAME(MC) },
+	{ cpu_coregroup_mask, x86_core_flags, SD_INIT_NAME(MC) },
 #endif
 	{ cpu_cpu_mask, SD_INIT_NAME(DIE) },
 	{ NULL, },
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v8 6/8] acpi: bus: Enable HWP CPPC objects
  2016-11-22 20:23 [PATCH v8 0/8] Support Intel Turbo Boost Max Technology 3.0 Tim Chen
                   ` (4 preceding siblings ...)
  2016-11-22 20:23 ` [PATCH v8 5/8] x86/sched: Add SD_ASYM_PACKING flags to x86 ITMT CPU Tim Chen
@ 2016-11-22 20:23 ` Tim Chen
  2016-11-24 19:54   ` [tip:x86/core] acpi/bus: " tip-bot for Srinivas Pandruvada
  2016-11-22 20:23 ` [PATCH v8 7/8] acpi: bus: Set _OSC for diverse core support Tim Chen
  2016-11-22 20:24 ` [PATCH v8 8/8] cpufreq: intel_pstate: Use CPPC to get max performance Tim Chen
  7 siblings, 1 reply; 39+ messages in thread
From: Tim Chen @ 2016-11-22 20:23 UTC (permalink / raw)
  To: rjw, tglx, mingo, bp
  Cc: Srinivas Pandruvada, x86, linux-pm, linux-kernel, linux-acpi,
	peterz, Tim Chen, jolsa

From: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>

Set the platform-wide _OSC bits to enable CPPC and CPPC version 2.
If the platform supports CPPC, the BIOS then exposes the CPPC tables.

The primary reason to enable CPPC support is to obtain the maximum
performance of each CPU, which is used to detect and enable Intel Turbo
Boost Max Technology 3.0 (ITMT).

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
---
 drivers/acpi/bus.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
index 56190d0..2f381ba 100644
--- a/drivers/acpi/bus.c
+++ b/drivers/acpi/bus.c
@@ -331,6 +331,13 @@ static void acpi_bus_osc_support(void)
 	capbuf[OSC_SUPPORT_DWORD] |= OSC_SB_HOTPLUG_OST_SUPPORT;
 	capbuf[OSC_SUPPORT_DWORD] |= OSC_SB_PCLPI_SUPPORT;
 
+#ifdef CONFIG_X86
+	if (boot_cpu_has(X86_FEATURE_HWP)) {
+		capbuf[OSC_SUPPORT_DWORD] |= OSC_SB_CPC_SUPPORT;
+		capbuf[OSC_SUPPORT_DWORD] |= OSC_SB_CPCV2_SUPPORT;
+	}
+#endif
+
 	if (!ghes_disable)
 		capbuf[OSC_SUPPORT_DWORD] |= OSC_SB_APEI_SUPPORT;
 	if (ACPI_FAILURE(acpi_get_handle(NULL, "\\_SB", &handle)))
-- 
2.5.5

* [PATCH v8 7/8] acpi: bus: Set _OSC for diverse core support
  2016-11-22 20:23 [PATCH v8 0/8] Support Intel Turbo Boost Max Technology 3.0 Tim Chen
                   ` (5 preceding siblings ...)
  2016-11-22 20:23 ` [PATCH v8 6/8] acpi: bus: Enable HWP CPPC objects Tim Chen
@ 2016-11-22 20:23 ` Tim Chen
  2016-11-24 19:54   ` [tip:x86/core] acpi/bus: " tip-bot for Srinivas Pandruvada
  2016-11-22 20:24 ` [PATCH v8 8/8] cpufreq: intel_pstate: Use CPPC to get max performance Tim Chen
  7 siblings, 1 reply; 39+ messages in thread
From: Tim Chen @ 2016-11-22 20:23 UTC (permalink / raw)
  To: rjw, tglx, mingo, bp
  Cc: Srinivas Pandruvada, x86, linux-pm, linux-kernel, linux-acpi,
	peterz, Tim Chen, jolsa

From: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>

Set OSC_SB_CPC_DIVERSE_HIGH_SUPPORT (bit 12) to enable diverse core
support.

This is required to inform the BIOS that the OS supports the Intel Turbo
Boost Max Technology 3.0 feature.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
---
 drivers/acpi/bus.c   | 3 +++
 include/linux/acpi.h | 1 +
 2 files changed, 4 insertions(+)

diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
index 2f381ba..806db0d 100644
--- a/drivers/acpi/bus.c
+++ b/drivers/acpi/bus.c
@@ -338,6 +338,9 @@ static void acpi_bus_osc_support(void)
 	}
 #endif
 
+	if (IS_ENABLED(CONFIG_SCHED_ITMT))
+		capbuf[OSC_SUPPORT_DWORD] |= OSC_SB_CPC_DIVERSE_HIGH_SUPPORT;
+
 	if (!ghes_disable)
 		capbuf[OSC_SUPPORT_DWORD] |= OSC_SB_APEI_SUPPORT;
 	if (ACPI_FAILURE(acpi_get_handle(NULL, "\\_SB", &handle)))
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 689a8b9..639fc63 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -469,6 +469,7 @@ acpi_status acpi_run_osc(acpi_handle handle, struct acpi_osc_context *context);
 #define OSC_SB_CPCV2_SUPPORT			0x00000040
 #define OSC_SB_PCLPI_SUPPORT			0x00000080
 #define OSC_SB_OSLPI_SUPPORT			0x00000100
+#define OSC_SB_CPC_DIVERSE_HIGH_SUPPORT		0x00001000
 
 extern bool osc_sb_apei_support_acked;
 extern bool osc_pc_lpi_support_confirmed;
-- 
2.5.5

* [PATCH v8 8/8] cpufreq: intel_pstate: Use CPPC to get max performance
  2016-11-22 20:23 [PATCH v8 0/8] Support Intel Turbo Boost Max Technology 3.0 Tim Chen
                   ` (6 preceding siblings ...)
  2016-11-22 20:23 ` [PATCH v8 7/8] acpi: bus: Set _OSC for diverse core support Tim Chen
@ 2016-11-22 20:24 ` Tim Chen
  2016-11-24 19:55   ` [tip:x86/core] cpufreq/intel_pstate: " tip-bot for Rafael J. Wysocki
  2016-12-07 19:06   ` [PATCH v8 8/8] cpufreq: intel_pstate: " Sebastian Andrzej Siewior
  7 siblings, 2 replies; 39+ messages in thread
From: Tim Chen @ 2016-11-22 20:24 UTC (permalink / raw)
  To: rjw, tglx, mingo, bp
  Cc: Rafael J. Wysocki, x86, linux-pm, linux-kernel, linux-acpi,
	peterz, Tim Chen, jolsa, Srinivas Pandruvada

From: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>

This change uses the ACPI cppc_lib interface to get the CPPC performance
limits and calls the scheduler interface to set each CPU's priority from
its highest performance. If the highest performance differs between
CPUs, the scheduler interface is called, only once, to enable the ITMT
feature.

Here sched_set_itmt_core_prio() is called to set the priorities and
sched_set_itmt_support() is called to enable the ITMT feature.

Co-developed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
---
 drivers/cpufreq/Kconfig.x86    |  1 +
 drivers/cpufreq/intel_pstate.c | 56 +++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 56 insertions(+), 1 deletion(-)

diff --git a/drivers/cpufreq/Kconfig.x86 b/drivers/cpufreq/Kconfig.x86
index adbd1de..c6d273b 100644
--- a/drivers/cpufreq/Kconfig.x86
+++ b/drivers/cpufreq/Kconfig.x86
@@ -6,6 +6,7 @@ config X86_INTEL_PSTATE
        bool "Intel P state control"
        depends on X86
        select ACPI_PROCESSOR if ACPI
+       select ACPI_CPPC_LIB if X86_64 && ACPI && SCHED_ITMT
        help
           This driver provides a P state for Intel core processors.
 	  The driver implements an internal governor and will become
diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index 4737520..e8dc42f 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -44,6 +44,7 @@
 
 #ifdef CONFIG_ACPI
 #include <acpi/processor.h>
+#include <acpi/cppc_acpi.h>
 #endif
 
 #define FRAC_BITS 8
@@ -379,14 +380,67 @@ static bool intel_pstate_get_ppc_enable_status(void)
 	return acpi_ppc;
 }
 
+#ifdef CONFIG_ACPI_CPPC_LIB
+
+/* The work item is needed to avoid CPU hotplug locking issues */
+static void intel_pstste_sched_itmt_work_fn(struct work_struct *work)
+{
+	sched_set_itmt_support();
+}
+
+static DECLARE_WORK(sched_itmt_work, intel_pstste_sched_itmt_work_fn);
+
+static void intel_pstate_set_itmt_prio(int cpu)
+{
+	struct cppc_perf_caps cppc_perf;
+	static u32 max_highest_perf = 0, min_highest_perf = U32_MAX;
+	int ret;
+
+	ret = cppc_get_perf_caps(cpu, &cppc_perf);
+	if (ret)
+		return;
+
+	/*
+	 * The priorities can be set regardless of whether or not
+	 * sched_set_itmt_support() has been called and it is valid to
+	 * update them at any time after it has been called.
+	 */
+	sched_set_itmt_core_prio(cppc_perf.highest_perf, cpu);
+
+	if (max_highest_perf <= min_highest_perf) {
+		if (cppc_perf.highest_perf > max_highest_perf)
+			max_highest_perf = cppc_perf.highest_perf;
+
+		if (cppc_perf.highest_perf < min_highest_perf)
+			min_highest_perf = cppc_perf.highest_perf;
+
+		if (max_highest_perf > min_highest_perf) {
+			/*
+			 * This code can be run during CPU online under the
+			 * CPU hotplug locks, so sched_set_itmt_support()
+			 * cannot be called from here.  Queue up a work item
+			 * to invoke it.
+			 */
+			schedule_work(&sched_itmt_work);
+		}
+	}
+}
+#else
+static void intel_pstate_set_itmt_prio(int cpu)
+{
+}
+#endif
+
 static void intel_pstate_init_acpi_perf_limits(struct cpufreq_policy *policy)
 {
 	struct cpudata *cpu;
 	int ret;
 	int i;
 
-	if (hwp_active)
+	if (hwp_active) {
+		intel_pstate_set_itmt_prio(policy->cpu);
 		return;
+	}
 
 	if (!intel_pstate_get_ppc_enable_status())
 		return;
-- 
2.5.5

* Re: [PATCH v8 1/8] sched: Extend scheduler's asym packing
  2016-11-22 20:23 ` [PATCH v8 1/8] sched: Extend scheduler's asym packing Tim Chen
@ 2016-11-23 13:09   ` Peter Zijlstra
  2016-11-23 17:32     ` Tim Chen
  2016-11-24 13:25   ` [tip:sched/core] " tip-bot for Tim Chen
  1 sibling, 1 reply; 39+ messages in thread
From: Peter Zijlstra @ 2016-11-23 13:09 UTC (permalink / raw)
  To: Tim Chen
  Cc: rjw, tglx, mingo, bp, x86, linux-pm, linux-kernel, linux-acpi,
	jolsa, Srinivas Pandruvada

On Tue, Nov 22, 2016 at 12:23:53PM -0800, Tim Chen wrote:
> We generalize the scheduler's asym packing to provide an ordering
> of the cpu beyond just the cpu number.  This allows the use of the
> ASYM_PACKING scheduler machinery to move loads to preferred CPU in a
> sched domain. The preference is defined with the cpu priority
> given by arch_asym_cpu_priority(cpu).
> 
> We also record the most preferred cpu in a sched group when
> we build the cpu's capacity for fast lookup of preferred cpu
> during load balancing.
> 
> Co-developed-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>

With the two little edits below:

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

> ---
>  include/linux/sched.h |  4 ++++
>  kernel/sched/core.c   | 15 ++++++++++++++
>  kernel/sched/fair.c   | 54 +++++++++++++++++++++++++++++++++++----------------
>  kernel/sched/sched.h  |  6 ++++++
>  4 files changed, 62 insertions(+), 17 deletions(-)
> 
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 348f51b..ca02475 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1057,6 +1057,10 @@ static inline int cpu_numa_flags(void)
>  }
>  #endif
>  
> +int arch_asym_cpu_priority(int cpu);

extern

> +int arch_asym_max_cpu_and(const struct cpumask *mask1,
> +			  const struct cpumask *mask2);
> +

And that needs to go too; that function no longer exists.

>  struct sched_domain_attr {
>  	int relax_domain_level;
>  };

* Re: [PATCH v8 1/8] sched: Extend scheduler's asym packing
  2016-11-23 13:09   ` Peter Zijlstra
@ 2016-11-23 17:32     ` Tim Chen
  0 siblings, 0 replies; 39+ messages in thread
From: Tim Chen @ 2016-11-23 17:32 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: rjw, tglx, mingo, bp, x86, linux-pm, linux-kernel, linux-acpi,
	jolsa, Srinivas Pandruvada

On Wed, Nov 23, 2016 at 02:09:11PM +0100, Peter Zijlstra wrote:
> 
> With the two little edits below:
> 
> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> 
> > ---
> >  include/linux/sched.h |  4 ++++
> >  kernel/sched/core.c   | 15 ++++++++++++++
> >  kernel/sched/fair.c   | 54 +++++++++++++++++++++++++++++++++++----------------
> >  kernel/sched/sched.h  |  6 ++++++
> >  4 files changed, 62 insertions(+), 17 deletions(-)
> > 
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index 348f51b..ca02475 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -1057,6 +1057,10 @@ static inline int cpu_numa_flags(void)
> >  }
> >  #endif
> >  
> > +int arch_asym_cpu_priority(int cpu);
> 
> extern
> 
> > +int arch_asym_max_cpu_and(const struct cpumask *mask1,
> > +			  const struct cpumask *mask2);
> > +
> 
> And that needs to go too; that function no longer exists.
> 
> >  struct sched_domain_attr {
> >  	int relax_domain_level;
> >  };

Thanks for catching that.  The updated patch is below.

Tim

---->8----

From: Tim Chen <tim.c.chen@linux.intel.com>
Subject: [PATCH v8 1/8] sched: Extend scheduler's asym packing
To: rjw@rjwysocki.net, tglx@linutronix.de, mingo@redhat.com, bp@suse.de
Cc: x86@kernel.org, linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org, peterz@infradead.org, Tim Chen <tim.c.chen@linux.intel.com>, jolsa@redhat.com, Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>

We generalize the scheduler's asym packing to provide an ordering
of CPUs beyond just the CPU number.  This allows the ASYM_PACKING
scheduler machinery to move load to the preferred CPUs in a sched
domain. The preference is defined by the CPU priority returned by
arch_asym_cpu_priority(cpu).

We also record the most preferred CPU in each sched group when
building the group's capacity, so that the preferred CPU can be
looked up quickly during load balancing.

Co-developed-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
---
 include/linux/sched.h |  2 ++
 kernel/sched/core.c   | 15 ++++++++++++++
 kernel/sched/fair.c   | 54 +++++++++++++++++++++++++++++++++++----------------
 kernel/sched/sched.h  |  6 ++++++
 4 files changed, 60 insertions(+), 17 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 348f51b..c75f778 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1057,6 +1057,8 @@ static inline int cpu_numa_flags(void)
 }
 #endif
 
+extern int arch_asym_cpu_priority(int cpu);
+
 struct sched_domain_attr {
 	int relax_domain_level;
 };
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 154fd68..d54c6e1 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6301,7 +6301,22 @@ static void init_sched_groups_capacity(int cpu, struct sched_domain *sd)
 	WARN_ON(!sg);
 
 	do {
+		int cpu, max_cpu = -1;
+
 		sg->group_weight = cpumask_weight(sched_group_cpus(sg));
+
+		if (!(sd->flags & SD_ASYM_PACKING))
+			goto next;
+
+		for_each_cpu(cpu, sched_group_cpus(sg)) {
+			if (max_cpu < 0)
+				max_cpu = cpu;
+			else if (sched_asym_prefer(cpu, max_cpu))
+				max_cpu = cpu;
+		}
+		sg->asym_prefer_cpu = max_cpu;
+
+next:
 		sg = sg->next;
 	} while (sg != sd->groups);
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c242944..af97dc8 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -100,6 +100,17 @@ const_debug unsigned int sysctl_sched_migration_cost = 500000UL;
  */
 unsigned int __read_mostly sysctl_sched_shares_window = 10000000UL;
 
+#ifdef CONFIG_SMP
+/*
+ * For asym packing, by default the lower numbered cpu has higher priority.
+ */
+int __weak arch_asym_cpu_priority(int cpu)
+{
+	return -cpu;
+}
+
+#endif
+
 #ifdef CONFIG_CFS_BANDWIDTH
 /*
  * Amount of runtime to allocate from global (tg) to local (per-cfs_rq) pool
@@ -7113,16 +7124,18 @@ static bool update_sd_pick_busiest(struct lb_env *env,
 	if (env->idle == CPU_NOT_IDLE)
 		return true;
 	/*
-	 * ASYM_PACKING needs to move all the work to the lowest
-	 * numbered CPUs in the group, therefore mark all groups
-	 * higher than ourself as busy.
+	 * ASYM_PACKING needs to move all the work to the highest
+	 * priority CPUs in the group, therefore mark all groups
+	 * of lower priority than ourself as busy.
 	 */
-	if (sgs->sum_nr_running && env->dst_cpu < group_first_cpu(sg)) {
+	if (sgs->sum_nr_running &&
+	    sched_asym_prefer(env->dst_cpu, sg->asym_prefer_cpu)) {
 		if (!sds->busiest)
 			return true;
 
-		/* Prefer to move from highest possible cpu's work */
-		if (group_first_cpu(sds->busiest) < group_first_cpu(sg))
+		/* Prefer to move from lowest priority cpu's work */
+		if (sched_asym_prefer(sds->busiest->asym_prefer_cpu,
+				      sg->asym_prefer_cpu))
 			return true;
 	}
 
@@ -7274,8 +7287,8 @@ static int check_asym_packing(struct lb_env *env, struct sd_lb_stats *sds)
 	if (!sds->busiest)
 		return 0;
 
-	busiest_cpu = group_first_cpu(sds->busiest);
-	if (env->dst_cpu > busiest_cpu)
+	busiest_cpu = sds->busiest->asym_prefer_cpu;
+	if (sched_asym_prefer(busiest_cpu, env->dst_cpu))
 		return 0;
 
 	env->imbalance = DIV_ROUND_CLOSEST(
@@ -7613,10 +7626,11 @@ static int need_active_balance(struct lb_env *env)
 
 		/*
 		 * ASYM_PACKING needs to force migrate tasks from busy but
-		 * higher numbered CPUs in order to pack all tasks in the
-		 * lowest numbered CPUs.
+		 * lower priority CPUs in order to pack all tasks in the
+		 * highest priority CPUs.
 		 */
-		if ((sd->flags & SD_ASYM_PACKING) && env->src_cpu > env->dst_cpu)
+		if ((sd->flags & SD_ASYM_PACKING) &&
+		    sched_asym_prefer(env->dst_cpu, env->src_cpu))
 			return 1;
 	}
 
@@ -8465,7 +8479,7 @@ static inline bool nohz_kick_needed(struct rq *rq)
 	unsigned long now = jiffies;
 	struct sched_domain_shared *sds;
 	struct sched_domain *sd;
-	int nr_busy, cpu = rq->cpu;
+	int nr_busy, i, cpu = rq->cpu;
 	bool kick = false;
 
 	if (unlikely(rq->idle_balance))
@@ -8516,12 +8530,18 @@ static inline bool nohz_kick_needed(struct rq *rq)
 	}
 
 	sd = rcu_dereference(per_cpu(sd_asym, cpu));
-	if (sd && (cpumask_first_and(nohz.idle_cpus_mask,
-				  sched_domain_span(sd)) < cpu)) {
-		kick = true;
-		goto unlock;
-	}
+	if (sd) {
+		for_each_cpu(i, sched_domain_span(sd)) {
+			if (i == cpu ||
+			    !cpumask_test_cpu(i, nohz.idle_cpus_mask))
+				continue;
 
+			if (sched_asym_prefer(i, cpu)) {
+				kick = true;
+				goto unlock;
+			}
+		}
+	}
 unlock:
 	rcu_read_unlock();
 	return kick;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 055f935..cd3d413 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -539,6 +539,11 @@ struct dl_rq {
 
 #ifdef CONFIG_SMP
 
+static inline bool sched_asym_prefer(int a, int b)
+{
+	return arch_asym_cpu_priority(a) > arch_asym_cpu_priority(b);
+}
+
 /*
  * We add the notion of a root-domain which will be used to define per-domain
  * variables. Each exclusive cpuset essentially defines an island domain by
@@ -905,6 +910,7 @@ struct sched_group {
 
 	unsigned int group_weight;
 	struct sched_group_capacity *sgc;
+	int asym_prefer_cpu;		/* cpu of highest priority in group */
 
 	/*
 	 * The CPUs this group covers.
-- 
2.5.5

* [tip:sched/core] sched: Extend scheduler's asym packing
  2016-11-22 20:23 ` [PATCH v8 1/8] sched: Extend scheduler's asym packing Tim Chen
  2016-11-23 13:09   ` Peter Zijlstra
@ 2016-11-24 13:25   ` tip-bot for Tim Chen
  1 sibling, 0 replies; 39+ messages in thread
From: tip-bot for Tim Chen @ 2016-11-24 13:25 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: srinivas.pandruvada, peterz, linux-kernel, mingo, tim.c.chen, hpa, tglx

Commit-ID:  afe06efdf07c12fd9370d5cce5383398cedf6c90
Gitweb:     http://git.kernel.org/tip/afe06efdf07c12fd9370d5cce5383398cedf6c90
Author:     Tim Chen <tim.c.chen@linux.intel.com>
AuthorDate: Tue, 22 Nov 2016 12:23:53 -0800
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Thu, 24 Nov 2016 14:09:46 +0100

sched: Extend scheduler's asym packing

We generalize the scheduler's asym packing to provide an ordering
of CPUs beyond just the CPU number.  This allows the ASYM_PACKING
scheduler machinery to move load to the preferred CPUs in a sched
domain. The preference is defined by the CPU priority returned by
arch_asym_cpu_priority(cpu).

We also record the most preferred CPU in each sched group when
building the group's capacity, so that the preferred CPU can be
looked up quickly during load balancing.

Co-developed-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: linux-pm@vger.kernel.org
Cc: jolsa@redhat.com
Cc: rjw@rjwysocki.net
Cc: linux-acpi@vger.kernel.org
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: bp@suse.de
Link: http://lkml.kernel.org/r/0e73ae12737dfaafa46c07066cc7c5d3f1675e46.1479844244.git.tim.c.chen@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 include/linux/sched.h |  2 ++
 kernel/sched/core.c   | 15 +++++++++++++++
 kernel/sched/fair.c   | 53 ++++++++++++++++++++++++++++++++++-----------------
 kernel/sched/sched.h  |  6 ++++++
 4 files changed, 59 insertions(+), 17 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 19abba0..fe9a499 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1077,6 +1077,8 @@ static inline int cpu_numa_flags(void)
 }
 #endif
 
+extern int arch_asym_cpu_priority(int cpu);
+
 struct sched_domain_attr {
 	int relax_domain_level;
 };
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index dc64bd7..393759b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6303,7 +6303,22 @@ static void init_sched_groups_capacity(int cpu, struct sched_domain *sd)
 	WARN_ON(!sg);
 
 	do {
+		int cpu, max_cpu = -1;
+
 		sg->group_weight = cpumask_weight(sched_group_cpus(sg));
+
+		if (!(sd->flags & SD_ASYM_PACKING))
+			goto next;
+
+		for_each_cpu(cpu, sched_group_cpus(sg)) {
+			if (max_cpu < 0)
+				max_cpu = cpu;
+			else if (sched_asym_prefer(cpu, max_cpu))
+				max_cpu = cpu;
+		}
+		sg->asym_prefer_cpu = max_cpu;
+
+next:
 		sg = sg->next;
 	} while (sg != sd->groups);
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index aa47589..18d9e75 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -97,6 +97,16 @@ unsigned int normalized_sysctl_sched_wakeup_granularity	= 1000000UL;
 
 const_debug unsigned int sysctl_sched_migration_cost	= 500000UL;
 
+#ifdef CONFIG_SMP
+/*
+ * For asym packing, by default the lower numbered cpu has higher priority.
+ */
+int __weak arch_asym_cpu_priority(int cpu)
+{
+	return -cpu;
+}
+#endif
+
 #ifdef CONFIG_CFS_BANDWIDTH
 /*
  * Amount of runtime to allocate from global (tg) to local (per-cfs_rq) pool
@@ -7388,16 +7398,18 @@ asym_packing:
 	if (env->idle == CPU_NOT_IDLE)
 		return true;
 	/*
-	 * ASYM_PACKING needs to move all the work to the lowest
-	 * numbered CPUs in the group, therefore mark all groups
-	 * higher than ourself as busy.
+	 * ASYM_PACKING needs to move all the work to the highest
+	 * priority CPUs in the group, therefore mark all groups
+	 * of lower priority than ourself as busy.
 	 */
-	if (sgs->sum_nr_running && env->dst_cpu < group_first_cpu(sg)) {
+	if (sgs->sum_nr_running &&
+	    sched_asym_prefer(env->dst_cpu, sg->asym_prefer_cpu)) {
 		if (!sds->busiest)
 			return true;
 
-		/* Prefer to move from highest possible cpu's work */
-		if (group_first_cpu(sds->busiest) < group_first_cpu(sg))
+		/* Prefer to move from lowest priority cpu's work */
+		if (sched_asym_prefer(sds->busiest->asym_prefer_cpu,
+				      sg->asym_prefer_cpu))
 			return true;
 	}
 
@@ -7549,8 +7561,8 @@ static int check_asym_packing(struct lb_env *env, struct sd_lb_stats *sds)
 	if (!sds->busiest)
 		return 0;
 
-	busiest_cpu = group_first_cpu(sds->busiest);
-	if (env->dst_cpu > busiest_cpu)
+	busiest_cpu = sds->busiest->asym_prefer_cpu;
+	if (sched_asym_prefer(busiest_cpu, env->dst_cpu))
 		return 0;
 
 	env->imbalance = DIV_ROUND_CLOSEST(
@@ -7888,10 +7900,11 @@ static int need_active_balance(struct lb_env *env)
 
 		/*
 		 * ASYM_PACKING needs to force migrate tasks from busy but
-		 * higher numbered CPUs in order to pack all tasks in the
-		 * lowest numbered CPUs.
+		 * lower priority CPUs in order to pack all tasks in the
+		 * highest priority CPUs.
 		 */
-		if ((sd->flags & SD_ASYM_PACKING) && env->src_cpu > env->dst_cpu)
+		if ((sd->flags & SD_ASYM_PACKING) &&
+		    sched_asym_prefer(env->dst_cpu, env->src_cpu))
 			return 1;
 	}
 
@@ -8740,7 +8753,7 @@ static inline bool nohz_kick_needed(struct rq *rq)
 	unsigned long now = jiffies;
 	struct sched_domain_shared *sds;
 	struct sched_domain *sd;
-	int nr_busy, cpu = rq->cpu;
+	int nr_busy, i, cpu = rq->cpu;
 	bool kick = false;
 
 	if (unlikely(rq->idle_balance))
@@ -8791,12 +8804,18 @@ static inline bool nohz_kick_needed(struct rq *rq)
 	}
 
 	sd = rcu_dereference(per_cpu(sd_asym, cpu));
-	if (sd && (cpumask_first_and(nohz.idle_cpus_mask,
-				  sched_domain_span(sd)) < cpu)) {
-		kick = true;
-		goto unlock;
-	}
+	if (sd) {
+		for_each_cpu(i, sched_domain_span(sd)) {
+			if (i == cpu ||
+			    !cpumask_test_cpu(i, nohz.idle_cpus_mask))
+				continue;
 
+			if (sched_asym_prefer(i, cpu)) {
+				kick = true;
+				goto unlock;
+			}
+		}
+	}
 unlock:
 	rcu_read_unlock();
 	return kick;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index d7e3931..7b34c78 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -540,6 +540,11 @@ struct dl_rq {
 
 #ifdef CONFIG_SMP
 
+static inline bool sched_asym_prefer(int a, int b)
+{
+	return arch_asym_cpu_priority(a) > arch_asym_cpu_priority(b);
+}
+
 /*
  * We add the notion of a root-domain which will be used to define per-domain
  * variables. Each exclusive cpuset essentially defines an island domain by
@@ -908,6 +913,7 @@ struct sched_group {
 
 	unsigned int group_weight;
 	struct sched_group_capacity *sgc;
+	int asym_prefer_cpu;		/* cpu of highest priority in group */
 
 	/*
 	 * The CPUs this group covers.

* [tip:x86/core] x86/topology: Define x86's arch_update_cpu_topology
  2016-11-22 20:23 ` [PATCH v8 2/8] x86/topology: Define x86's arch_update_cpu_topology Tim Chen
@ 2016-11-24 19:52   ` tip-bot for Tim Chen
  0 siblings, 0 replies; 39+ messages in thread
From: tip-bot for Tim Chen @ 2016-11-24 19:52 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: hpa, linux-kernel, mingo, morten.rasmussen, tim.c.chen, tglx,
	srinivas.pandruvada

Commit-ID:  7d25127cef44924f1013d119ba385095ca4b4a83
Gitweb:     http://git.kernel.org/tip/7d25127cef44924f1013d119ba385095ca4b4a83
Author:     Tim Chen <tim.c.chen@linux.intel.com>
AuthorDate: Tue, 22 Nov 2016 12:23:54 -0800
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Thu, 24 Nov 2016 20:44:19 +0100

x86/topology: Define x86's arch_update_cpu_topology

The scheduler calls arch_update_cpu_topology() to check whether the
scheduler domains have to be rebuilt.

So far x86 has no requirement for this, but the upcoming ITMT support
makes this necessary.

Request the rebuild when the x86 internal update flag is set.

Suggested-by: Morten Rasmussen <morten.rasmussen@arm.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Cc: linux-pm@vger.kernel.org
Cc: peterz@infradead.org
Cc: jolsa@redhat.com
Cc: rjw@rjwysocki.net
Cc: linux-acpi@vger.kernel.org
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: bp@suse.de
Link: http://lkml.kernel.org/r/bfbf5591276ec60b2af2da798adc1060df1e2a5f.1479844244.git.tim.c.chen@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 arch/x86/include/asm/topology.h |  1 +
 arch/x86/kernel/smpboot.c       | 11 +++++++++++
 2 files changed, 12 insertions(+)

diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index cf75871..a5ca88a 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -146,4 +146,5 @@ struct pci_bus;
 int x86_pci_root_bus_node(int bus);
 void x86_pci_root_bus_resources(int bus, struct list_head *resources);
 
+extern bool x86_topology_update;
 #endif /* _ASM_X86_TOPOLOGY_H */
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 42f5eb7..ac61ee7 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -109,6 +109,17 @@ static bool logical_packages_frozen __read_mostly;
 /* Maximum number of SMT threads on any online core */
 int __max_smt_threads __read_mostly;
 
+/* Flag to indicate if a complete sched domain rebuild is required */
+bool x86_topology_update;
+
+int arch_update_cpu_topology(void)
+{
+	int retval = x86_topology_update;
+
+	x86_topology_update = false;
+	return retval;
+}
+
 static inline void smpboot_setup_warm_reset_vector(unsigned long start_eip)
 {
 	unsigned long flags;

* [tip:x86/core] x86: Enable Intel Turbo Boost Max Technology 3.0
  2016-11-22 20:23 ` [PATCH v8 3/8] x86: Enable Intel Turbo Boost Max Technology 3.0 Tim Chen
@ 2016-11-24 19:52   ` tip-bot for Tim Chen
  2016-11-25  8:19     ` Ingo Molnar
  0 siblings, 1 reply; 39+ messages in thread
From: tip-bot for Tim Chen @ 2016-11-24 19:52 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, srinivas.pandruvada, hpa, mingo, tim.c.chen, tglx, peterz

Commit-ID:  5e76b2ab36b40ca33023e78725bdc69eafd63134
Gitweb:     http://git.kernel.org/tip/5e76b2ab36b40ca33023e78725bdc69eafd63134
Author:     Tim Chen <tim.c.chen@linux.intel.com>
AuthorDate: Tue, 22 Nov 2016 12:23:55 -0800
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Thu, 24 Nov 2016 20:44:19 +0100

x86: Enable Intel Turbo Boost Max Technology 3.0

On platforms supporting Intel Turbo Boost Max Technology 3.0, the maximum
turbo frequencies of some cores in a CPU package may be higher than for
the other cores in the same package.  In that case, better performance
(and possibly lower energy consumption as well) can be achieved by
making the scheduler prefer to run tasks on the CPUs with higher max
turbo frequencies.

To that end, set up a core priority metric to abstract the core
preferences based on the maximum turbo frequency.  In that metric,
the cores with higher maximum turbo frequencies are higher-priority
than the other cores in the same package and that causes the scheduler
to favor them when making load-balancing decisions using the asymmetric
packing approach.  At the same time, the priority of SMT threads with a
higher CPU number is reduced so as to avoid scheduling tasks on all of
the threads that belong to a favored core before all of the other cores
have been given a task to run.

The priority metric will be initialized by the P-state driver with the
help of the sched_set_itmt_core_prio() function.  The P-state driver
will also determine whether or not ITMT is supported by the platform
and will call sched_set_itmt_support() to indicate that.

Co-developed-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Co-developed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Cc: linux-pm@vger.kernel.org
Cc: peterz@infradead.org
Cc: jolsa@redhat.com
Cc: rjw@rjwysocki.net
Cc: linux-acpi@vger.kernel.org
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: bp@suse.de
Link: http://lkml.kernel.org/r/cd401ccdff88f88c8349314febdc25d51f7c48f7.1479844244.git.tim.c.chen@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 arch/x86/Kconfig                |   9 ++++
 arch/x86/include/asm/topology.h |  28 +++++++++++
 arch/x86/kernel/Makefile        |   1 +
 arch/x86/kernel/itmt.c          | 109 ++++++++++++++++++++++++++++++++++++++++
 4 files changed, 147 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index bada636..25950f0 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -939,6 +939,15 @@ config SCHED_MC
 	  making when dealing with multi-core CPU chips at a cost of slightly
 	  increased overhead in some places. If unsure say N here.
 
+config SCHED_ITMT
+	bool "Intel Turbo Boost Max Technology (ITMT) scheduler support"
+	depends on SCHED_MC && CPU_SUP_INTEL && X86_INTEL_PSTATE
+	---help---
+	  ITMT-enabled scheduler support improves the CPU scheduler's decision
+	  to move tasks to CPU cores that can be boosted to a higher frequency
+	  than others. This gives better performance at the cost of slightly
+	  increased overhead in task migrations. If unsure say N here.
+
 source "kernel/Kconfig.preempt"
 
 config UP_LATE_INIT
diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index a5ca88a..8ace951 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -147,4 +147,32 @@ int x86_pci_root_bus_node(int bus);
 void x86_pci_root_bus_resources(int bus, struct list_head *resources);
 
 extern bool x86_topology_update;
+
+#ifdef CONFIG_SCHED_ITMT
+#include <asm/percpu.h>
+
+DECLARE_PER_CPU_READ_MOSTLY(int, sched_core_priority);
+
+/* Interface to set priority of a cpu */
+void sched_set_itmt_core_prio(int prio, int core_cpu);
+
+/* Interface to notify scheduler that system supports ITMT */
+void sched_set_itmt_support(void);
+
+/* Interface to notify scheduler that system revokes ITMT support */
+void sched_clear_itmt_support(void);
+
+#else /* CONFIG_SCHED_ITMT */
+
+static inline void sched_set_itmt_core_prio(int prio, int core_cpu)
+{
+}
+static inline void sched_set_itmt_support(void)
+{
+}
+static inline void sched_clear_itmt_support(void)
+{
+}
+#endif /* CONFIG_SCHED_ITMT */
+
 #endif /* _ASM_X86_TOPOLOGY_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 79076d7..bbd0ebc 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -123,6 +123,7 @@ obj-$(CONFIG_EFI)			+= sysfb_efi.o
 
 obj-$(CONFIG_PERF_EVENTS)		+= perf_regs.o
 obj-$(CONFIG_TRACING)			+= tracepoint.o
+obj-$(CONFIG_SCHED_ITMT)		+= itmt.o
 
 ifdef CONFIG_FRAME_POINTER
 obj-y					+= unwind_frame.o
diff --git a/arch/x86/kernel/itmt.c b/arch/x86/kernel/itmt.c
new file mode 100644
index 0000000..63c9b3e
--- /dev/null
+++ b/arch/x86/kernel/itmt.c
@@ -0,0 +1,109 @@
+/*
+ * itmt.c: Support Intel Turbo Boost Max Technology 3.0
+ *
+ * (C) Copyright 2016 Intel Corporation
+ * Author: Tim Chen <tim.c.chen@linux.intel.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; version 2
+ * of the License.
+ *
+ * On platforms supporting Intel Turbo Boost Max Technology 3.0, (ITMT),
+ * the maximum turbo frequencies of some cores in a CPU package may be
+ * higher than for the other cores in the same package.  In that case,
+ * better performance can be achieved by making the scheduler prefer
+ * to run tasks on the CPUs with higher max turbo frequencies.
+ *
+ * This file provides functions and data structures for enabling the
+ * scheduler to favor scheduling on cores that can be boosted to a higher
+ * frequency under ITMT.
+ */
+
+#include <linux/sched.h>
+#include <linux/cpumask.h>
+#include <linux/cpuset.h>
+#include <asm/mutex.h>
+#include <linux/sched.h>
+#include <linux/sysctl.h>
+#include <linux/nodemask.h>
+
+static DEFINE_MUTEX(itmt_update_mutex);
+DEFINE_PER_CPU_READ_MOSTLY(int, sched_core_priority);
+
+/* Boolean to track if system has ITMT capabilities */
+static bool __read_mostly sched_itmt_capable;
+
+/**
+ * sched_set_itmt_support() - Indicate platform supports ITMT
+ *
+ * This function is used by the OS to indicate to the scheduler that the platform
+ * is capable of supporting the ITMT feature.
+ *
+ * The current scheme has the pstate driver detect if the system
+ * is ITMT capable and call sched_set_itmt_support.
+ *
+ * This must be done only after sched_set_itmt_core_prio
+ * has been called to set the cpus' priorities.
+ */
+void sched_set_itmt_support(void)
+{
+	mutex_lock(&itmt_update_mutex);
+
+	sched_itmt_capable = true;
+
+	mutex_unlock(&itmt_update_mutex);
+}
+
+/**
+ * sched_clear_itmt_support() - Revoke platform's support of ITMT
+ *
+ * This function is used by the OS to indicate that it has
+ * revoked the platform's support of ITMT feature.
+ *
+ */
+void sched_clear_itmt_support(void)
+{
+	mutex_lock(&itmt_update_mutex);
+
+	sched_itmt_capable = false;
+
+	mutex_unlock(&itmt_update_mutex);
+}
+
+int arch_asym_cpu_priority(int cpu)
+{
+	return per_cpu(sched_core_priority, cpu);
+}
+
+/**
+ * sched_set_itmt_core_prio() - Set CPU priority based on ITMT
+ * @prio:	Priority of cpu core
+ * @core_cpu:	The cpu number associated with the core
+ *
+ * The pstate driver will find out the max boost frequency
+ * and call this function to set a priority proportional
+ * to the max boost frequency. CPU with higher boost
+ * frequency will receive higher priority.
+ *
+ * No need to rebuild sched domain after updating
+ * the CPU priorities. The sched domains have no
+ * dependency on CPU priorities.
+ */
+void sched_set_itmt_core_prio(int prio, int core_cpu)
+{
+	int cpu, i = 1;
+
+	for_each_cpu(cpu, topology_sibling_cpumask(core_cpu)) {
+		int smt_prio;
+
+		/*
+		 * Ensure that the siblings are moved to the end
+		 * of the priority chain and only used when
+		 * all other high priority cpus are out of capacity.
+		 */
+		smt_prio = prio * smp_num_siblings / i;
+		per_cpu(sched_core_priority, cpu) = smt_prio;
+		i++;
+	}
+}

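The sibling loop in sched_set_itmt_core_prio() is easiest to follow with concrete numbers. The following is a minimal user-space sketch, not kernel code: it assumes the SMT siblings of a core are visited in order, passes the sibling count explicitly instead of reading smp_num_siblings, and uses hypothetical helper names (itmt_smt_prio(), itmt_fill_core()).

```c
/*
 * Hypothetical user-space model of the per-sibling priority computed in
 * sched_set_itmt_core_prio(): the i-th sibling (1-based) of a core with
 * base priority prio gets prio * nr_siblings / i.
 */
static int itmt_smt_prio(int prio, int nr_siblings, int i)
{
	return prio * nr_siblings / i;
}

/* Fill out[0..nr_siblings-1] with the priorities of one core's siblings. */
static void itmt_fill_core(int prio, int nr_siblings, int *out)
{
	for (int i = 1; i <= nr_siblings; i++)
		out[i - 1] = itmt_smt_prio(prio, nr_siblings, i);
}
```

With two siblings, a fast core with prio 42 yields 84 and 42, and a slower core with prio 40 yields 80 and 40. Sorted, both first siblings (84, 80) come before either second sibling (42, 40), which is the "moved to the end of the priority chain" effect the comment describes. This ordering holds as long as no core's base priority is at least twice another's.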
^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [tip:x86/core] x86/sysctl: Add sysctl for ITMT scheduling feature
  2016-11-22 20:23 ` [PATCH v8 4/8] x86/sysctl: Add sysctl for ITMT scheduling feature Tim Chen
@ 2016-11-24 19:53   ` tip-bot for Tim Chen
  2016-11-28  8:56   ` [PATCH v8 4/8] " Borislav Petkov
  1 sibling, 0 replies; 39+ messages in thread
From: tip-bot for Tim Chen @ 2016-11-24 19:53 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, srinivas.pandruvada, tglx, tim.c.chen, peterz, mingo, hpa

Commit-ID:  f9793e34952cda133caaa35738a4b46053331c96
Gitweb:     http://git.kernel.org/tip/f9793e34952cda133caaa35738a4b46053331c96
Author:     Tim Chen <tim.c.chen@linux.intel.com>
AuthorDate: Tue, 22 Nov 2016 12:23:56 -0800
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Thu, 24 Nov 2016 20:44:19 +0100

x86/sysctl: Add sysctl for ITMT scheduling feature

The Intel Turbo Boost Max Technology 3.0 (ITMT) feature
allows some cores to be boosted to a higher turbo
frequency than others.

Add /proc/sys/kernel/sched_itmt_enabled so that the operator
can enable or disable scheduling that favors cores
with higher turbo boost frequency potential.

By default, the feature is turned on for any system that
is ITMT capable; the earlier single-socket restriction was
dropped in v7 of the series.

When a change in ITMT scheduling operation is requested,
the sched domains are rebuilt so that the scheduler can
set the appropriate flag on them to enable or disable
ITMT scheduling operations.

Co-developed-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Co-developed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Cc: linux-pm@vger.kernel.org
Cc: peterz@infradead.org
Cc: jolsa@redhat.com
Cc: rjw@rjwysocki.net
Cc: linux-acpi@vger.kernel.org
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: bp@suse.de
Link: http://lkml.kernel.org/r/07cc62426a28bad57b01ab16bb903a9c84fa5421.1479844244.git.tim.c.chen@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 arch/x86/include/asm/topology.h |   7 ++-
 arch/x86/kernel/itmt.c          | 108 +++++++++++++++++++++++++++++++++++++++-
 2 files changed, 112 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index 8ace951..4813df5 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -152,23 +152,26 @@ extern bool x86_topology_update;
 #include <asm/percpu.h>
 
 DECLARE_PER_CPU_READ_MOSTLY(int, sched_core_priority);
+extern unsigned int __read_mostly sysctl_sched_itmt_enabled;
 
 /* Interface to set priority of a cpu */
 void sched_set_itmt_core_prio(int prio, int core_cpu);
 
 /* Interface to notify scheduler that system supports ITMT */
-void sched_set_itmt_support(void);
+int sched_set_itmt_support(void);
 
 /* Interface to notify scheduler that system revokes ITMT support */
 void sched_clear_itmt_support(void);
 
 #else /* CONFIG_SCHED_ITMT */
 
+#define sysctl_sched_itmt_enabled	0
 static inline void sched_set_itmt_core_prio(int prio, int core_cpu)
 {
 }
-static inline void sched_set_itmt_support(void)
+static inline int sched_set_itmt_support(void)
 {
+	return 0;
 }
 static inline void sched_clear_itmt_support(void)
 {
diff --git a/arch/x86/kernel/itmt.c b/arch/x86/kernel/itmt.c
index 63c9b3e..672fbf7 100644
--- a/arch/x86/kernel/itmt.c
+++ b/arch/x86/kernel/itmt.c
@@ -34,6 +34,68 @@ DEFINE_PER_CPU_READ_MOSTLY(int, sched_core_priority);
 /* Boolean to track if system has ITMT capabilities */
 static bool __read_mostly sched_itmt_capable;
 
+/*
+ * Boolean to control whether we want to move processes to cpu capable
+ * of higher turbo frequency for cpus supporting Intel Turbo Boost Max
+ * Technology 3.0.
+ *
+ * It can be set via /proc/sys/kernel/sched_itmt_enabled
+ */
+unsigned int __read_mostly sysctl_sched_itmt_enabled;
+
+static int sched_itmt_update_handler(struct ctl_table *table, int write,
+				     void __user *buffer, size_t *lenp,
+				     loff_t *ppos)
+{
+	unsigned int old_sysctl;
+	int ret;
+
+	mutex_lock(&itmt_update_mutex);
+
+	if (!sched_itmt_capable) {
+		mutex_unlock(&itmt_update_mutex);
+		return -EINVAL;
+	}
+
+	old_sysctl = sysctl_sched_itmt_enabled;
+	ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
+
+	if (!ret && write && old_sysctl != sysctl_sched_itmt_enabled) {
+		x86_topology_update = true;
+		rebuild_sched_domains();
+	}
+
+	mutex_unlock(&itmt_update_mutex);
+
+	return ret;
+}
+
+static unsigned int zero;
+static unsigned int one = 1;
+static struct ctl_table itmt_kern_table[] = {
+	{
+		.procname	= "sched_itmt_enabled",
+		.data		= &sysctl_sched_itmt_enabled,
+		.maxlen		= sizeof(unsigned int),
+		.mode		= 0644,
+		.proc_handler	= sched_itmt_update_handler,
+		.extra1		= &zero,
+		.extra2		= &one,
+	},
+	{}
+};
+
+static struct ctl_table itmt_root_table[] = {
+	{
+		.procname	= "kernel",
+		.mode		= 0555,
+		.child		= itmt_kern_table,
+	},
+	{}
+};
+
+static struct ctl_table_header *itmt_sysctl_header;
+
 /**
  * sched_set_itmt_support() - Indicate platform supports ITMT
  *
@@ -45,14 +107,39 @@ static bool __read_mostly sched_itmt_capable;
  *
  * This must be done only after sched_set_itmt_core_prio
  * has been called to set the cpus' priorities.
+ * It must not be called with cpu hot plug lock
+ * held as we need to acquire the lock to rebuild sched domains
+ * later.
+ *
+ * Return: 0 on success
  */
-void sched_set_itmt_support(void)
+int sched_set_itmt_support(void)
 {
 	mutex_lock(&itmt_update_mutex);
 
+	if (sched_itmt_capable) {
+		mutex_unlock(&itmt_update_mutex);
+		return 0;
+	}
+
+	itmt_sysctl_header = register_sysctl_table(itmt_root_table);
+	if (!itmt_sysctl_header) {
+		mutex_unlock(&itmt_update_mutex);
+		return -ENOMEM;
+	}
+
 	sched_itmt_capable = true;
 
+	sysctl_sched_itmt_enabled = 1;
+
+	if (sysctl_sched_itmt_enabled) {
+		x86_topology_update = true;
+		rebuild_sched_domains();
+	}
+
 	mutex_unlock(&itmt_update_mutex);
+
+	return 0;
 }
 
 /**
@@ -61,13 +148,32 @@ void sched_set_itmt_support(void)
  * This function is used by the OS to indicate that it has
  * revoked the platform's support of ITMT feature.
  *
+ * It must not be called with cpu hot plug lock
+ * held as we need to acquire the lock to rebuild sched domains
+ * later.
  */
 void sched_clear_itmt_support(void)
 {
 	mutex_lock(&itmt_update_mutex);
 
+	if (!sched_itmt_capable) {
+		mutex_unlock(&itmt_update_mutex);
+		return;
+	}
 	sched_itmt_capable = false;
 
+	if (itmt_sysctl_header) {
+		unregister_sysctl_table(itmt_sysctl_header);
+		itmt_sysctl_header = NULL;
+	}
+
+	if (sysctl_sched_itmt_enabled) {
+		/* disable sched_itmt if we are no longer ITMT capable */
+		sysctl_sched_itmt_enabled = 0;
+		x86_topology_update = true;
+		rebuild_sched_domains();
+	}
+
 	mutex_unlock(&itmt_update_mutex);
 }
 

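The interplay between sched_set_itmt_support(), sched_clear_itmt_support() and the sysctl handler can be modelled as a small state machine. The sketch below is a single-threaded user-space model under stated assumptions: itmt_update_mutex is omitted, sysctl (un)registration is reduced to a flag that always succeeds (the real code returns -ENOMEM if register_sysctl_table() fails), and rebuild_sched_domains() is replaced by a counter. All names (struct itmt_model, itmt_model_*) are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical single-threaded model of the state kept in itmt.c. */
struct itmt_model {
	bool capable;           /* models sched_itmt_capable */
	bool sysctl_registered; /* models itmt_sysctl_header != NULL */
	unsigned int enabled;   /* models sysctl_sched_itmt_enabled */
	int rebuilds;           /* counts rebuild_sched_domains() calls */
};

/* Models sched_set_itmt_support(): idempotent, turns ITMT on by default. */
static int itmt_model_set_support(struct itmt_model *m)
{
	if (m->capable)
		return 0;
	m->sysctl_registered = true;	/* registration assumed to succeed */
	m->capable = true;
	m->enabled = 1;
	m->rebuilds++;
	return 0;
}

/* Models sched_clear_itmt_support(): rebuild only if ITMT was enabled. */
static void itmt_model_clear_support(struct itmt_model *m)
{
	if (!m->capable)
		return;
	m->capable = false;
	m->sysctl_registered = false;
	if (m->enabled) {
		m->enabled = 0;
		m->rebuilds++;
	}
}

/*
 * Models a write to /proc/sys/kernel/sched_itmt_enabled. Values outside
 * [0, 1] are rejected with -EINVAL, as the extra1/extra2 bounds make
 * proc_dointvec_minmax() do; domains are rebuilt only on a real change.
 */
static int itmt_model_sysctl_write(struct itmt_model *m, int val)
{
	if (!m->capable)
		return -EINVAL;
	if (val < 0 || val > 1)
		return -EINVAL;
	if ((unsigned int)val != m->enabled) {
		m->enabled = (unsigned int)val;
		m->rebuilds++;
	}
	return 0;
}
```

In the real code the !capable branch of the handler is largely defensive, since the sysctl file only exists after sched_set_itmt_support() has registered it.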

* [tip:x86/core] x86/sched: Add SD_ASYM_PACKING flags to x86 ITMT CPU
  2016-11-22 20:23 ` [PATCH v8 5/8] x86/sched: Add SD_ASYM_PACKING flags to x86 ITMT CPU Tim Chen
@ 2016-11-24 19:53   ` tip-bot for Tim Chen
  0 siblings, 0 replies; 39+ messages in thread
From: tip-bot for Tim Chen @ 2016-11-24 19:53 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, tim.c.chen, mingo, peterz, srinivas.pandruvada, tglx, hpa

Commit-ID:  d3d37d850d1d77bd66bceb8326e6353d3314b270
Gitweb:     http://git.kernel.org/tip/d3d37d850d1d77bd66bceb8326e6353d3314b270
Author:     Tim Chen <tim.c.chen@linux.intel.com>
AuthorDate: Tue, 22 Nov 2016 12:23:57 -0800
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Thu, 24 Nov 2016 20:44:20 +0100

x86/sched: Add SD_ASYM_PACKING flags to x86 ITMT CPU

Some Intel cores in a package can be boosted to a higher turbo frequency
with ITMT 3.0 technology. The scheduler can use the asymmetric packing
feature to move tasks to the more capable cores.

If ITMT is enabled, add the SD_ASYM_PACKING flag to the thread (SMT) and
core (MC) sched domains to enable asymmetric packing.

Co-developed-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Cc: linux-pm@vger.kernel.org
Cc: peterz@infradead.org
Cc: jolsa@redhat.com
Cc: rjw@rjwysocki.net
Cc: linux-acpi@vger.kernel.org
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: bp@suse.de
Link: http://lkml.kernel.org/r/9bbb885bedbef4eb50e197305eb16b160cff0831.1479844244.git.tim.c.chen@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 arch/x86/kernel/smpboot.c | 28 ++++++++++++++++++++++++----
 1 file changed, 24 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index ac61ee7..4f13062 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -482,22 +482,42 @@ static bool match_die(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
 	return false;
 }
 
+#if defined(CONFIG_SCHED_SMT) || defined(CONFIG_SCHED_MC)
+static inline int x86_sched_itmt_flags(void)
+{
+	return sysctl_sched_itmt_enabled ? SD_ASYM_PACKING : 0;
+}
+
+#ifdef CONFIG_SCHED_MC
+static int x86_core_flags(void)
+{
+	return cpu_core_flags() | x86_sched_itmt_flags();
+}
+#endif
+#ifdef CONFIG_SCHED_SMT
+static int x86_smt_flags(void)
+{
+	return cpu_smt_flags() | x86_sched_itmt_flags();
+}
+#endif
+#endif
+
 static struct sched_domain_topology_level x86_numa_in_package_topology[] = {
 #ifdef CONFIG_SCHED_SMT
-	{ cpu_smt_mask, cpu_smt_flags, SD_INIT_NAME(SMT) },
+	{ cpu_smt_mask, x86_smt_flags, SD_INIT_NAME(SMT) },
 #endif
 #ifdef CONFIG_SCHED_MC
-	{ cpu_coregroup_mask, cpu_core_flags, SD_INIT_NAME(MC) },
+	{ cpu_coregroup_mask, x86_core_flags, SD_INIT_NAME(MC) },
 #endif
 	{ NULL, },
 };
 
 static struct sched_domain_topology_level x86_topology[] = {
 #ifdef CONFIG_SCHED_SMT
-	{ cpu_smt_mask, cpu_smt_flags, SD_INIT_NAME(SMT) },
+	{ cpu_smt_mask, x86_smt_flags, SD_INIT_NAME(SMT) },
 #endif
 #ifdef CONFIG_SCHED_MC
-	{ cpu_coregroup_mask, cpu_core_flags, SD_INIT_NAME(MC) },
+	{ cpu_coregroup_mask, x86_core_flags, SD_INIT_NAME(MC) },
 #endif
 	{ cpu_cpu_mask, SD_INIT_NAME(DIE) },
 	{ NULL, },

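The effect of the new flag callbacks can be illustrated in isolation. The sketch below is a user-space model under stated assumptions: it uses made-up flag values (SKETCH_SD_SHARE_PKG_RESOURCES, SKETCH_SD_ASYM_PACKING) instead of the real SD_* constants, a plain array in place of the sched_core_priority per-CPU variable read by arch_asym_cpu_priority(), and hypothetical helpers sketch_asym_prefer() and sketch_pick_idle() to show why the higher-priority CPU wins. It illustrates the idea, not the scheduler's load-balancing code.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for SD_* flag bits; the real values differ. */
#define SKETCH_SD_SHARE_PKG_RESOURCES	0x1
#define SKETCH_SD_ASYM_PACKING		0x2

static unsigned int sketch_itmt_enabled;	/* models sysctl_sched_itmt_enabled */

/* Models x86_core_flags(): base MC flags plus asym packing when enabled. */
static int sketch_core_flags(void)
{
	int flags = SKETCH_SD_SHARE_PKG_RESOURCES;

	return flags | (sketch_itmt_enabled ? SKETCH_SD_ASYM_PACKING : 0);
}

/* Models per-CPU priorities as returned by arch_asym_cpu_priority(). */
static const int sketch_prio[4] = { 84, 42, 80, 40 };

/* True if CPU a should be preferred over CPU b when packing tasks. */
static bool sketch_asym_prefer(int a, int b)
{
	return sketch_prio[a] > sketch_prio[b];
}

/* Pick the highest-priority CPU among those marked idle, or -1 if none. */
static int sketch_pick_idle(const bool *idle, int nr_cpus)
{
	int best = -1;

	for (int cpu = 0; cpu < nr_cpus; cpu++) {
		if (!idle[cpu])
			continue;
		if (best < 0 || sketch_asym_prefer(cpu, best))
			best = cpu;
	}
	return best;
}
```

Because the flag callbacks are only evaluated when the domains are built, flipping the sysctl has no effect on the flags until the sched domains are rebuilt, which is why the sysctl handler triggers a rebuild.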

* [tip:x86/core] acpi/bus: Enable HWP CPPC objects
  2016-11-22 20:23 ` [PATCH v8 6/8] acpi: bus: Enable HWP CPPC objects Tim Chen
@ 2016-11-24 19:54   ` tip-bot for Srinivas Pandruvada
  0 siblings, 0 replies; 39+ messages in thread
From: tip-bot for Srinivas Pandruvada @ 2016-11-24 19:54 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, tim.c.chen, srinivas.pandruvada, hpa, tglx, mingo

Commit-ID:  5c2832e91a3ed45f35531ae1c5afba8eac22c81f
Gitweb:     http://git.kernel.org/tip/5c2832e91a3ed45f35531ae1c5afba8eac22c81f
Author:     Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
AuthorDate: Tue, 22 Nov 2016 12:23:58 -0800
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Thu, 24 Nov 2016 20:44:20 +0100

acpi/bus: Enable HWP CPPC objects

Set the platform-wide _OSC bits to enable CPPC and CPPC version 2.
If the platform supports CPPC, the BIOS then exposes the CPPC tables.

The primary reason to enable CPPC support is to get the maximum
performance of each CPU to check and enable Intel Turbo Boost Max
Technology 3.0 (ITMT).

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Cc: linux-pm@vger.kernel.org
Cc: peterz@infradead.org
Cc: jolsa@redhat.com
Cc: rjw@rjwysocki.net
Cc: linux-acpi@vger.kernel.org
Cc: bp@suse.de
Link: http://lkml.kernel.org/r/a696f6b17843cee9a542482fae6abab087be9587.1479844244.git.tim.c.chen@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 drivers/acpi/bus.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
index 56190d0..2f381ba 100644
--- a/drivers/acpi/bus.c
+++ b/drivers/acpi/bus.c
@@ -331,6 +331,13 @@ static void acpi_bus_osc_support(void)
 	capbuf[OSC_SUPPORT_DWORD] |= OSC_SB_HOTPLUG_OST_SUPPORT;
 	capbuf[OSC_SUPPORT_DWORD] |= OSC_SB_PCLPI_SUPPORT;
 
+#ifdef CONFIG_X86
+	if (boot_cpu_has(X86_FEATURE_HWP)) {
+		capbuf[OSC_SUPPORT_DWORD] |= OSC_SB_CPC_SUPPORT;
+		capbuf[OSC_SUPPORT_DWORD] |= OSC_SB_CPCV2_SUPPORT;
+	}
+#endif
+
 	if (!ghes_disable)
 		capbuf[OSC_SUPPORT_DWORD] |= OSC_SB_APEI_SUPPORT;
 	if (ACPI_FAILURE(acpi_get_handle(NULL, "\\_SB", &handle)))

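The capability negotiation above amounts to OR-ing bits into the _OSC support dword before _OSC is evaluated. A minimal user-space sketch, assuming the bit values from include/linux/acpi.h (0x40 for OSC_SB_CPCV2_SUPPORT appears in the acpi.h hunk of this thread; 0x20 for OSC_SB_CPC_SUPPORT is assumed here), with a hypothetical has_hwp parameter standing in for boot_cpu_has(X86_FEATURE_HWP) and a hypothetical helper name:

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values modelled on include/linux/acpi.h (CPC value assumed 0x20). */
#define SKETCH_OSC_SB_CPC_SUPPORT	0x00000020u
#define SKETCH_OSC_SB_CPCV2_SUPPORT	0x00000040u

/* Hypothetical helper: add the CPPC bits to an _OSC support dword. */
static uint32_t sketch_osc_add_cppc(uint32_t support, bool has_hwp)
{
	if (has_hwp)
		support |= SKETCH_OSC_SB_CPC_SUPPORT | SKETCH_OSC_SB_CPCV2_SUPPORT;
	return support;
}
```

Starting from a dword that already has OSC_SB_PCLPI_SUPPORT (0x80) set, an HWP-capable CPU yields 0xE0; without HWP the dword is left unchanged.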

* [tip:x86/core] acpi/bus: Set _OSC for diverse core support
  2016-11-22 20:23 ` [PATCH v8 7/8] acpi: bus: Set _OSC for diverse core support Tim Chen
@ 2016-11-24 19:54   ` tip-bot for Srinivas Pandruvada
  0 siblings, 0 replies; 39+ messages in thread
From: tip-bot for Srinivas Pandruvada @ 2016-11-24 19:54 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: tglx, mingo, srinivas.pandruvada, linux-kernel, hpa, tim.c.chen

Commit-ID:  8b533a0eeefc5861cea57163dd3cec2798a77f6c
Gitweb:     http://git.kernel.org/tip/8b533a0eeefc5861cea57163dd3cec2798a77f6c
Author:     Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
AuthorDate: Tue, 22 Nov 2016 12:23:59 -0800
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Thu, 24 Nov 2016 20:44:20 +0100

acpi/bus: Set _OSC for diverse core support

Set the OSC_SB_CPC_DIVERSE_HIGH_SUPPORT bit (bit 12) to enable diverse
core support.

This is required to enable BIOS support for the Intel Turbo Boost Max
Technology 3.0 feature.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Cc: linux-pm@vger.kernel.org
Cc: peterz@infradead.org
Cc: jolsa@redhat.com
Cc: rjw@rjwysocki.net
Cc: linux-acpi@vger.kernel.org
Cc: bp@suse.de
Link: http://lkml.kernel.org/r/a023623a727e86040a1715797055f6402caefd7e.1479844244.git.tim.c.chen@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 drivers/acpi/bus.c   | 3 +++
 include/linux/acpi.h | 1 +
 2 files changed, 4 insertions(+)

diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
index 2f381ba..806db0d 100644
--- a/drivers/acpi/bus.c
+++ b/drivers/acpi/bus.c
@@ -338,6 +338,9 @@ static void acpi_bus_osc_support(void)
 	}
 #endif
 
+	if (IS_ENABLED(CONFIG_SCHED_ITMT))
+		capbuf[OSC_SUPPORT_DWORD] |= OSC_SB_CPC_DIVERSE_HIGH_SUPPORT;
+
 	if (!ghes_disable)
 		capbuf[OSC_SUPPORT_DWORD] |= OSC_SB_APEI_SUPPORT;
 	if (ACPI_FAILURE(acpi_get_handle(NULL, "\\_SB", &handle)))
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 61a3d90..0510237 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -469,6 +469,7 @@ acpi_status acpi_run_osc(acpi_handle handle, struct acpi_osc_context *context);
 #define OSC_SB_CPCV2_SUPPORT			0x00000040
 #define OSC_SB_PCLPI_SUPPORT			0x00000080
 #define OSC_SB_OSLPI_SUPPORT			0x00000100
+#define OSC_SB_CPC_DIVERSE_HIGH_SUPPORT		0x00001000
 
 extern bool osc_sb_apei_support_acked;
 extern bool osc_pc_lpi_support_confirmed;

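The commit message identifies OSC_SB_CPC_DIVERSE_HIGH_SUPPORT as bit 12, which is where the 0x00001000 constant comes from. The sketch below checks that and models the IS_ENABLED(CONFIG_SCHED_ITMT) condition with a plain boolean; the macro and helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 12 of the _OSC support dword, i.e. 0x00001000. */
#define SKETCH_OSC_SB_CPC_DIVERSE_HIGH_SUPPORT	(UINT32_C(1) << 12)

/* Hypothetical helper: request diverse-high support when ITMT is built in. */
static uint32_t sketch_osc_add_diverse_high(uint32_t support, bool sched_itmt)
{
	if (sched_itmt)
		support |= SKETCH_OSC_SB_CPC_DIVERSE_HIGH_SUPPORT;
	return support;
}

/* Hypothetical check of whether a support dword carries the bit. */
static bool sketch_osc_has_diverse_high(uint32_t dword)
{
	return (dword & SKETCH_OSC_SB_CPC_DIVERSE_HIGH_SUPPORT) != 0;
}
```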

* [tip:x86/core] cpufreq/intel_pstate: Use CPPC to get max performance
  2016-11-22 20:24 ` [PATCH v8 8/8] cpufreq: intel_pstate: Use CPPC to get max performance Tim Chen
@ 2016-11-24 19:55   ` tip-bot for Rafael J. Wysocki
  2016-12-07 19:06   ` [PATCH v8 8/8] cpufreq: intel_pstate: " Sebastian Andrzej Siewior
  1 sibling, 0 replies; 39+ messages in thread
From: tip-bot for Rafael J. Wysocki @ 2016-11-24 19:55 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, tim.c.chen, tglx, rafael.j.wysocki, hpa,
	srinivas.pandruvada, mingo

Commit-ID:  17669006adf64d35a74cb21e3c8dfb6fb8be689f
Gitweb:     http://git.kernel.org/tip/17669006adf64d35a74cb21e3c8dfb6fb8be689f
Author:     Rafael J. Wysocki <rafael.j.wysocki@intel.com>
AuthorDate: Tue, 22 Nov 2016 12:24:00 -0800
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Thu, 24 Nov 2016 20:44:20 +0100

cpufreq/intel_pstate: Use CPPC to get max performance

Use the ACPI cppc_lib interface to get the CPPC performance limits and update
the per-CPU priority for the ITMT scheduler. If the highest performance values
of the CPUs differ, the ITMT feature is enabled.

Co-developed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Cc: linux-pm@vger.kernel.org
Cc: peterz@infradead.org
Cc: jolsa@redhat.com
Cc: rjw@rjwysocki.net
Cc: linux-acpi@vger.kernel.org
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: bp@suse.de
Link: http://lkml.kernel.org/r/0998b98943bcdec7d1ddd4ff27358da555ea8e92.1479844244.git.tim.c.chen@linux.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 drivers/cpufreq/Kconfig.x86    |  1 +
 drivers/cpufreq/intel_pstate.c | 56 +++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 56 insertions(+), 1 deletion(-)

diff --git a/drivers/cpufreq/Kconfig.x86 b/drivers/cpufreq/Kconfig.x86
index adbd1de..c6d273b 100644
--- a/drivers/cpufreq/Kconfig.x86
+++ b/drivers/cpufreq/Kconfig.x86
@@ -6,6 +6,7 @@ config X86_INTEL_PSTATE
        bool "Intel P state control"
        depends on X86
        select ACPI_PROCESSOR if ACPI
+       select ACPI_CPPC_LIB if X86_64 && ACPI && SCHED_ITMT
        help
           This driver provides a P state for Intel core processors.
 	  The driver implements an internal governor and will become
diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index 4737520..e8dc42f 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -44,6 +44,7 @@
 
 #ifdef CONFIG_ACPI
 #include <acpi/processor.h>
+#include <acpi/cppc_acpi.h>
 #endif
 
 #define FRAC_BITS 8
@@ -379,14 +380,67 @@ static bool intel_pstate_get_ppc_enable_status(void)
 	return acpi_ppc;
 }
 
+#ifdef CONFIG_ACPI_CPPC_LIB
+
+/* The work item is needed to avoid CPU hotplug locking issues */
+static void intel_pstste_sched_itmt_work_fn(struct work_struct *work)
+{
+	sched_set_itmt_support();
+}
+
+static DECLARE_WORK(sched_itmt_work, intel_pstste_sched_itmt_work_fn);
+
+static void intel_pstate_set_itmt_prio(int cpu)
+{
+	struct cppc_perf_caps cppc_perf;
+	static u32 max_highest_perf = 0, min_highest_perf = U32_MAX;
+	int ret;
+
+	ret = cppc_get_perf_caps(cpu, &cppc_perf);
+	if (ret)
+		return;
+
+	/*
+	 * The priorities can be set regardless of whether or not
+	 * sched_set_itmt_support() has been called and it is valid to
+	 * update them at any time after it has been called.
+	 */
+	sched_set_itmt_core_prio(cppc_perf.highest_perf, cpu);
+
+	if (max_highest_perf <= min_highest_perf) {
+		if (cppc_perf.highest_perf > max_highest_perf)
+			max_highest_perf = cppc_perf.highest_perf;
+
+		if (cppc_perf.highest_perf < min_highest_perf)
+			min_highest_perf = cppc_perf.highest_perf;
+
+		if (max_highest_perf > min_highest_perf) {
+			/*
+			 * This code can be run during CPU online under the
+			 * CPU hotplug locks, so sched_set_itmt_support()
+			 * cannot be called from here.  Queue up a work item
+			 * to invoke it.
+			 */
+			schedule_work(&sched_itmt_work);
+		}
+	}
+}
+#else
+static void intel_pstate_set_itmt_prio(int cpu)
+{
+}
+#endif
+
 static void intel_pstate_init_acpi_perf_limits(struct cpufreq_policy *policy)
 {
 	struct cpudata *cpu;
 	int ret;
 	int i;
 
-	if (hwp_active)
+	if (hwp_active) {
+		intel_pstate_set_itmt_prio(policy->cpu);
 		return;
+	}
 
 	if (!intel_pstate_get_ppc_enable_status())
 		return;

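The function-local static max/min bookkeeping in intel_pstate_set_itmt_prio() decides when ITMT gets switched on. The following user-space sketch mirrors that logic under stated assumptions: the statics are moved into a hypothetical struct sketch_perf_range, the CPPC highest_perf value is passed in directly instead of coming from cppc_get_perf_caps(), and queuing sched_itmt_work is reduced to returning true. The names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the static state in intel_pstate_set_itmt_prio(). */
struct sketch_perf_range {
	uint32_t max_highest_perf;	/* starts at 0 */
	uint32_t min_highest_perf;	/* starts at UINT32_MAX */
};

static void sketch_perf_range_init(struct sketch_perf_range *r)
{
	r->max_highest_perf = 0;
	r->min_highest_perf = UINT32_MAX;
}

/*
 * Feed one CPU's highest_perf. Returns true exactly once: for the first
 * CPU whose value makes max exceed min, i.e. where the driver would queue
 * the work item that calls sched_set_itmt_support().
 */
static bool sketch_perf_range_update(struct sketch_perf_range *r,
				     uint32_t highest_perf)
{
	if (r->max_highest_perf > r->min_highest_perf)
		return false;	/* a difference was already found */

	if (highest_perf > r->max_highest_perf)
		r->max_highest_perf = highest_perf;
	if (highest_perf < r->min_highest_perf)
		r->min_highest_perf = highest_perf;

	return r->max_highest_perf > r->min_highest_perf;
}
```

Feeding 40, 40, 42, 38 returns true only for 42. Once a difference is seen the range stops updating, but every CPU still gets its priority, because the driver calls sched_set_itmt_core_prio() before this check.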

* Re: [tip:x86/core] x86: Enable Intel Turbo Boost Max Technology 3.0
  2016-11-24 19:52   ` [tip:x86/core] " tip-bot for Tim Chen
@ 2016-11-25  8:19     ` Ingo Molnar
  2016-11-25  8:39       ` Peter Zijlstra
  2016-11-25 19:06       ` Thomas Gleixner
  0 siblings, 2 replies; 39+ messages in thread
From: Ingo Molnar @ 2016-11-25  8:19 UTC (permalink / raw)
  To: peterz, tglx, tim.c.chen, hpa, srinivas.pandruvada, linux-kernel
  Cc: linux-tip-commits


* tip-bot for Tim Chen <tipbot@zytor.com> wrote:

> Commit-ID:  5e76b2ab36b40ca33023e78725bdc69eafd63134
> Gitweb:     http://git.kernel.org/tip/5e76b2ab36b40ca33023e78725bdc69eafd63134
> Author:     Tim Chen <tim.c.chen@linux.intel.com>
> AuthorDate: Tue, 22 Nov 2016 12:23:55 -0800
> Committer:  Thomas Gleixner <tglx@linutronix.de>
> CommitDate: Thu, 24 Nov 2016 20:44:19 +0100
> 
> x86: Enable Intel Turbo Boost Max Technology 3.0

This patch doesn't build:

Note that this patch has to be redone anyway, as it won't even build:

> +#include <linux/sched.h>
> +#include <linux/cpumask.h>
> +#include <linux/cpuset.h>
> +#include <asm/mutex.h>
> +#include <linux/sched.h>
> +#include <linux/sysctl.h>
> +#include <linux/nodemask.h>

arch/x86/kernel/itmt.c:26:23: fatal error: asm/mutex.h: No such file or directory

> +config SCHED_ITMT
> +	bool "Intel Turbo Boost Max Technology (ITMT) scheduler support"
> +	depends on SCHED_MC && CPU_SUP_INTEL && X86_INTEL_PSTATE
> +	---help---
> +	  ITMT enabled scheduler support improves the CPU scheduler's decision
> +	  to move tasks to cpu core that can be boosted to a higher frequency
> +	  than others. It will have better performance at a cost of slightly
> +	  increased overhead in task migrations. If unsure say N here.

Argh, so the 'itmt' name really sucks as well - could we please make it something 
more obvious - like SCHED_INTEL_TURBO or so - and similarly rename the file as 
well?

The sched_intel_turbo.c file could thus host all things related to scheduler 
support of turbo frequencies - it shouldn't be named after the Intel acronym of 
the day...

Thanks,

	Ingo

* Re: [tip:x86/core] x86: Enable Intel Turbo Boost Max Technology 3.0
  2016-11-25  8:19     ` Ingo Molnar
@ 2016-11-25  8:39       ` Peter Zijlstra
  2016-11-25 19:06       ` Thomas Gleixner
  1 sibling, 0 replies; 39+ messages in thread
From: Peter Zijlstra @ 2016-11-25  8:39 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: tglx, tim.c.chen, hpa, srinivas.pandruvada, linux-kernel,
	linux-tip-commits

On Fri, Nov 25, 2016 at 09:19:47AM +0100, Ingo Molnar wrote:
> 
> * tip-bot for Tim Chen <tipbot@zytor.com> wrote:
> 
> > Commit-ID:  5e76b2ab36b40ca33023e78725bdc69eafd63134
> > Gitweb:     http://git.kernel.org/tip/5e76b2ab36b40ca33023e78725bdc69eafd63134
> > Author:     Tim Chen <tim.c.chen@linux.intel.com>
> > AuthorDate: Tue, 22 Nov 2016 12:23:55 -0800
> > Committer:  Thomas Gleixner <tglx@linutronix.de>
> > CommitDate: Thu, 24 Nov 2016 20:44:19 +0100
> > 
> > x86: Enable Intel Turbo Boost Max Technology 3.0
> 
> This patch doesn't build:
> 
> Note that this patch has to be redone anyway, as it won't even build:
> 
> > +#include <linux/sched.h>
> > +#include <linux/cpumask.h>
> > +#include <linux/cpuset.h>
> > +#include <asm/mutex.h>
> > +#include <linux/sched.h>
> > +#include <linux/sysctl.h>
> > +#include <linux/nodemask.h>
> 
> arch/x86/kernel/itmt.c:26:23: fatal error: asm/mutex.h: No such file or directory

Hehe, indeed, we killed that dead in the locking branch. Weird include
to have anyway.

* Re: [tip:x86/core] x86: Enable Intel Turbo Boost Max Technology 3.0
  2016-11-25  8:19     ` Ingo Molnar
  2016-11-25  8:39       ` Peter Zijlstra
@ 2016-11-25 19:06       ` Thomas Gleixner
  2016-11-28  8:51         ` Ingo Molnar
  1 sibling, 1 reply; 39+ messages in thread
From: Thomas Gleixner @ 2016-11-25 19:06 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: peterz, tim.c.chen, hpa, srinivas.pandruvada, linux-kernel,
	linux-tip-commits

On Fri, 25 Nov 2016, Ingo Molnar wrote:

> 
> * tip-bot for Tim Chen <tipbot@zytor.com> wrote:
> 
> > Commit-ID:  5e76b2ab36b40ca33023e78725bdc69eafd63134
> > Gitweb:     http://git.kernel.org/tip/5e76b2ab36b40ca33023e78725bdc69eafd63134
> > Author:     Tim Chen <tim.c.chen@linux.intel.com>
> > AuthorDate: Tue, 22 Nov 2016 12:23:55 -0800
> > Committer:  Thomas Gleixner <tglx@linutronix.de>
> > CommitDate: Thu, 24 Nov 2016 20:44:19 +0100
> > 
> > x86: Enable Intel Turbo Boost Max Technology 3.0
> 
> This patch doesn't build:
> 
> Note that this patch has to be redone anyway, as it won't even build:

The branch where I merged it to builds fine. 

Though, yes I missed the asm/mutex.h include which obviously should be
linux/mutex.h

> > +#include <linux/sched.h>
> > +#include <linux/cpumask.h>
> > +#include <linux/cpuset.h>
> > +#include <asm/mutex.h>
> > +#include <linux/sched.h>
> > +#include <linux/sysctl.h>
> > +#include <linux/nodemask.h>
> 
> arch/x86/kernel/itmt.c:26:23: fatal error: asm/mutex.h: No such file or directory
> 
> > +config SCHED_ITMT
> > +	bool "Intel Turbo Boost Max Technology (ITMT) scheduler support"
> > +	depends on SCHED_MC && CPU_SUP_INTEL && X86_INTEL_PSTATE
> > +	---help---
> > +	  ITMT enabled scheduler support improves the CPU scheduler's decision
> > +	  to move tasks to cpu core that can be boosted to a higher frequency
> > +	  than others. It will have better performance at a cost of slightly
> > +	  increased overhead in task migrations. If unsure say N here.
> 
> Argh, so the 'itmt' name really sucks as well - could we please make it something 
> more obvious - like SCHED_INTEL_TURBO or so - and similarly rename the file as 
> well?
>
> The sched_intel_turbo.c file could thus host all things related to scheduler 
> support of turbo frequencies - it shouldn't be named after the Intel acronym of 
> the day...

It would be nice to come up with such nitpicks during review. This thing
went through 8 iterations, but nothing came up and I didn't mind the itmt
naming.

Thanks,

	tglx

* Re: [tip:x86/core] x86: Enable Intel Turbo Boost Max Technology 3.0
  2016-11-25 19:06       ` Thomas Gleixner
@ 2016-11-28  8:51         ` Ingo Molnar
  2016-11-28 17:35           ` Tim Chen
  0 siblings, 1 reply; 39+ messages in thread
From: Ingo Molnar @ 2016-11-28  8:51 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: peterz, tim.c.chen, hpa, srinivas.pandruvada, linux-kernel,
	linux-tip-commits


* Thomas Gleixner <tglx@linutronix.de> wrote:

> On Fri, 25 Nov 2016, Ingo Molnar wrote:
> 
> > 
> > * tip-bot for Tim Chen <tipbot@zytor.com> wrote:
> > 
> > > Commit-ID:  5e76b2ab36b40ca33023e78725bdc69eafd63134
> > > Gitweb:     http://git.kernel.org/tip/5e76b2ab36b40ca33023e78725bdc69eafd63134
> > > Author:     Tim Chen <tim.c.chen@linux.intel.com>
> > > AuthorDate: Tue, 22 Nov 2016 12:23:55 -0800
> > > Committer:  Thomas Gleixner <tglx@linutronix.de>
> > > CommitDate: Thu, 24 Nov 2016 20:44:19 +0100
> > > 
> > > x86: Enable Intel Turbo Boost Max Technology 3.0
> > 
> > This patch doesn't build:
> > 
> > Note that this patch has to be redone anyway, as it won't even build:
> 
> The branch where I merged it to builds fine. 

Indeed you are right - asm/mutex.h is gone in locking/core, so this is a semantic 
merge conflict, not a build failure.

> Though, yes I missed the asm/mutex.h include which obviously should be
> linux/mutex.h
> 
> > > +#include <linux/sched.h>
> > > +#include <linux/cpumask.h>
> > > +#include <linux/cpuset.h>
> > > +#include <asm/mutex.h>
> > > +#include <linux/sched.h>
> > > +#include <linux/sysctl.h>
> > > +#include <linux/nodemask.h>
> > 
> > arch/x86/kernel/itmt.c:26:23: fatal error: asm/mutex.h: No such file or directory
> > 
> > > +config SCHED_ITMT
> > > +	bool "Intel Turbo Boost Max Technology (ITMT) scheduler support"
> > > +	depends on SCHED_MC && CPU_SUP_INTEL && X86_INTEL_PSTATE
> > > +	---help---
> > > +	  ITMT enabled scheduler support improves the CPU scheduler's decision
> > > +	  to move tasks to cpu core that can be boosted to a higher frequency
> > > +	  than others. It will have better performance at a cost of slightly
> > > +	  increased overhead in task migrations. If unsure say N here.
> > 
> > Argh, so the 'itmt' name really sucks as well - could we please make it something 
> > more obvious - like SCHED_INTEL_TURBO or so - and similarly rename the file as 
> > well?
> >
> > The sched_intel_turbo.c file could thus host all things related to scheduler 
> > support of turbo frequencies - it shouldn't be named after the Intel acronym of 
> > the day...
> 
> It would be nice to come up with such nitpicks during review. This thing went 
> through 8 iterations, but nothing came up and I didn't mind the itmt naming.

Yeah, so I had to NAK an early iteration and didn't get around to doing a really 
detailed review yet - and after (falsely) thinking it had a build failure I got 
overly worked up about the bad naming: my bad and apologies!

So the code looks good to me but the naming still sucks a bit - I'm fine with 
having the commits re-merged as-is and renaming the Kconfig variable to something 
more expressive: I've done this in tip:sched/core and have fixed the asm/mutex.h 
thing as well.

Wrt. improving the naming:

Firstly, popular tech news has coined the 'Turbo Boost Max' technology 'TBM' (TBM2 
and TBM3) as the natural acronym of the Intel feature - not 'ITMT'. So to anyone 
except people well aware of Intel acronyms the term 'ITMT' will be pretty 
meaningless.

Does something more generic like SCHED_MC_PRIO (as an extension to SCHED_MC) work 
with everyone? Intel Turbo Max 3.0 is the current (only) implementation of it, but 
I don't think the technology will stop at that stage as dies are getting larger 
but thinner.

I also think the Kconfig text is somewhat misleading and the default-disabled 
status is counterproductive:

+config SCHED_ITMT
+       bool "Intel Turbo Boost Max Technology (ITMT) scheduler support"
+       depends on SCHED_MC && CPU_SUP_INTEL && X86_INTEL_PSTATE
+       ---help---
+         ITMT enabled scheduler support improves the CPU scheduler's decision
+         to move tasks to cpu core that can be boosted to a higher frequency
+         than others. It will have better performance at a cost of slightly
+         increased overhead in task migrations. If unsure say N here.

... the extra cost of smarter CPU selection is IMHO overwhelmed by the negative 
effects of not knowing about core frequency ordering, on most workloads.

A better default would be default-y I believe (that is what we do for CPU hardware 
enablement typically), and a better description would be something like:

+config SCHED_MC_PRIO
+       bool "CPU core priorities scheduler support"
+       depends on SCHED_MC && CPU_SUP_INTEL && X86_INTEL_PSTATE
+	default y
+       ---help---
+       Intel Turbo Boost Max 3.0 enabled CPUs have a core ordering determined at 
+	manufacturing time, which allows certain cores to reach higher turbo
+	frequencies (when running single threaded workloads) than others.
+
+	Enabling this kernel feature teaches the scheduler about the TBM3 priority
+	order of the CPU cores and adjusts the scheduler's CPU selection logic 
+	accordingly, so that higher overall system performance can be achieved.
+
+	This feature will have no effect on CPUs without this feature.
+
+	If unsure say Y here.
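The help text above describes the mechanism only in prose. The following is a minimal userspace sketch of the idea: among idle CPUs, prefer the one with the highest core priority. It is not the kernel's implementation, which works through the scheduler's asym-packing load balancing (patch 1/8). struct cpu_model, NR_CPUS_MODEL and pick_idle_cpu_by_prio() are hypothetical names made up for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical upper bound on CPUs for this model. */
#define NR_CPUS_MODEL 8

/*
 * Hypothetical per-CPU state: a higher prio means the core can reach a
 * higher turbo frequency; idle says whether it can take a task right now.
 */
struct cpu_model {
	int prio;
	bool idle;
};

/*
 * Return the idle CPU with the highest priority, or -1 if no CPU is idle.
 * On equal priority the lower-numbered CPU wins.
 */
static int pick_idle_cpu_by_prio(const struct cpu_model *cpus, int nr)
{
	int best = -1;

	assert(nr <= NR_CPUS_MODEL);
	for (int cpu = 0; cpu < nr; cpu++) {
		if (!cpus[cpu].idle)
			continue;
		if (best < 0 || cpus[cpu].prio > cpus[best].prio)
			best = cpu;
	}
	return best;
}
```

With priorities {10, 30, 20, 20} and CPU 1 busy, the sketch picks CPU 2; once CPU 1 goes idle it picks CPU 1.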

If/when other architectures make use of this the Kconfig entry can be moved into 
the scheduler Kconfig - but for the time being it can stay in arch/x86/.

Another variant would be to eliminate the Kconfig option altogether and make it a 
natural feature of SCHED_MC (like it is in the core scheduler).

Thanks,

	Ingo

* Re: [PATCH v8 4/8] x86/sysctl: Add sysctl for ITMT scheduling feature
  2016-11-22 20:23 ` [PATCH v8 4/8] x86/sysctl: Add sysctl for ITMT scheduling feature Tim Chen
  2016-11-24 19:53   ` [tip:x86/core] " tip-bot for Tim Chen
@ 2016-11-28  8:56   ` Borislav Petkov
  2016-11-29 17:30     ` Tim Chen
  1 sibling, 1 reply; 39+ messages in thread
From: Borislav Petkov @ 2016-11-28  8:56 UTC (permalink / raw)
  To: Tim Chen
  Cc: rjw, tglx, mingo, x86, linux-pm, linux-kernel, linux-acpi,
	peterz, jolsa, Srinivas Pandruvada

On Tue, Nov 22, 2016 at 12:23:56PM -0800, Tim Chen wrote:
> Intel Turbo Boost Max Technology 3.0 (ITMT) feature
> allows some cores to be boosted to higher turbo
> frequency than others.
> 
> Add /proc/sys/kernel/sched_itmt_enabled so operator
> can enable/disable scheduling of tasks that favor cores
> with higher turbo boost frequency potential.
> 
> By default, system that is ITMT capable and single
> socket has this feature turned on.  It is more likely
> to be lightly loaded and operates in Turbo range.
> 
> When there is a change in the ITMT scheduling operation
> desired, a rebuild of the sched domain is initiated
> so the scheduler can set up sched domains with appropriate
> flag to enable/disable ITMT scheduling operations.
> 
> Co-developed-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Co-developed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
> ---
>  arch/x86/include/asm/topology.h |   7 ++-
>  arch/x86/kernel/itmt.c          | 108 +++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 112 insertions(+), 3 deletions(-)

...

> +/*
> + * Boolean to control whether we want to move processes to cpu capable
> + * of higher turbo frequency for cpus supporting Intel Turbo Boost Max
> + * Technology 3.0.
> + *
> + * It can be set via /proc/sys/kernel/sched_itmt_enabled
> + */
> +unsigned int __read_mostly sysctl_sched_itmt_enabled;

Err, can we not have the boolean in the name itself?

I.e., have it called sysctl_sched_itmt and 1 means enabled and 0
disabled? I.e., the classic thing. :)

Ditto for the sysctl name?

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

* Re: [tip:x86/core] x86: Enable Intel Turbo Boost Max Technology 3.0
  2016-11-28  8:51         ` Ingo Molnar
@ 2016-11-28 17:35           ` Tim Chen
  2016-11-28 23:22             ` Rafael J. Wysocki
  2016-11-29  7:11             ` Ingo Molnar
  0 siblings, 2 replies; 39+ messages in thread
From: Tim Chen @ 2016-11-28 17:35 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner
  Cc: peterz, hpa, srinivas.pandruvada, linux-kernel,
	linux-tip-commits, Rafael J. Wysocki

On Mon, 2016-11-28 at 09:51 +0100, Ingo Molnar wrote:
> * Thomas Gleixner <tglx@linutronix.de> wrote:
> 
> > 
> > 
> > > 
> > > > 
> > > > +#include <linux/sched.h>
> > > > +#include <linux/cpumask.h>
> > > > +#include <linux/cpuset.h>
> > > > +#include <asm/mutex.h>
> > > > +#include <linux/sched.h>
> > > > +#include <linux/sysctl.h>
> > > > +#include <linux/nodemask.h>
> > > arch/x86/kernel/itmt.c:26:23: fatal error: asm/mutex.h: No such file or directory
> > > 
> > > > 
> > > > +config SCHED_ITMT
> > > > +	bool "Intel Turbo Boost Max Technology (ITMT) scheduler support"
> > > > +	depends on SCHED_MC && CPU_SUP_INTEL && X86_INTEL_PSTATE
> > > > +	---help---
> > > > +	  ITMT enabled scheduler support improves the CPU scheduler's decision
> > > > +	  to move tasks to cpu core that can be boosted to a higher frequency
> > > > +	  than others. It will have better performance at a cost of slightly
> > > > +	  increased overhead in task migrations. If unsure say N here.
> > > Argh, so the 'itmt' name really sucks as well - could we please make it something 
> > > more obvious - like SCHED_INTEL_TURBO or so - and similarly rename the file as 
> > > well?
> > > 
> > > The sched_intel_turbo.c file could thus host all things related to scheduler 
> > > support of turbo frequencies - it shouldn't be named after the Intel acronym of 
> > > the day...
> > It would be nice to come up with such nitpicks during review. This thing went 
> > through 8 iterations, but nothing came up and I didn't mind the itmt naming.
> Yeah, so I had to NAK an early iteration and didn't get around to doing a really 
> detailed review yet - and after (falsely) thinking it had a build failure I got 
> overly worked up about the bad naming: my bad and apologies!
> 
> So the code looks good to me but the naming still sucks a bit - I'm fine with 
> having the commits re-merged as-is and renaming the Kconfig variable to something 
> more expressive: I've done this in tip:sched/core and have fixed the asm/mutex.h 
> thing as well.
> 
> Wrt. improving the naming:
> 
> Firstly, popular tech news has coined the 'Turbo Boost Max' technology 'TBM' (TBM2 
> and TBM3) as the natural acronym of the Intel feature - not 'ITMT'. So to anyone 
> except people well aware of Intel acronyms the term 'ITMT' will be pretty 
> meaningless.
> 
> Does something more generic like SCHED_MC_PRIO (as an extension to SCHED_MC) work 
> with everyone? Intel Turbo Max 3.0 is the current (only) implementation of it, but 
> I don't think the technology will stop at that stage as dies are getting larger 
> but thinner.
> 
> I also think the Kconfig text is somewhat misleading and the default-disabled 
> status is counterproductive:
> 
> +config SCHED_ITMT
> +       bool "Intel Turbo Boost Max Technology (ITMT) scheduler support"
> +       depends on SCHED_MC && CPU_SUP_INTEL && X86_INTEL_PSTATE
> +       ---help---
> +         ITMT enabled scheduler support improves the CPU scheduler's decision
> +         to move tasks to cpu core that can be boosted to a higher frequency
> +         than others. It will have better performance at a cost of slightly
> +         increased overhead in task migrations. If unsure say N here.
> 
> ... the extra cost of smarter CPU selection is IMHO overwhelmed by the negative 
> effects of not knowing about core frequency ordering, on most workloads.
> 
> A better default would be default-y I believe (that is what we do for CPU hardware 
> enablement typically), and a better description would be something like:
> 
> +config SCHED_MC_PRIO
> +       bool "CPU core priorities scheduler support"
> +       depends on SCHED_MC && CPU_SUP_INTEL && X86_INTEL_PSTATE
> +	default y
> +       ---help---
> +       Intel Turbo Boost Max 3.0 enabled CPUs have a core ordering determined at 
> +	manufacturing time, which allows certain cores to reach higher turbo
> +	frequencies (when running single threaded workloads) than others.
> +
> +	Enabling this kernel feature teaches the scheduler about the TBM3 priority
> +	order of the CPU cores and adjusts the scheduler's CPU selection logic 
> +	accordingly, so that higher overall system performance can be achieved.
> +
> +	This feature will have no effect on CPUs without this feature.
> +
> +	If unsure say Y here.
> 
> If/when other architectures make use of this the Kconfig entry can be moved into 
> the scheduler Kconfig - but for the time being it can stay in arch/x86/.
> 
> Another variant would be to eliminate the Kconfig option altogether and make it a 
> natural feature of SCHED_MC (like it is in the core scheduler).
> 

I am fine with renaming SCHED_ITMT to SCHED_MC_PRIO.  Patches 7 and 8, which
Rafael merged into his tree, also use SCHED_ITMT, so they will need to be
updated if we rename it.

Thanks.

Tim

* Re: [tip:x86/core] x86: Enable Intel Turbo Boost Max Technology 3.0
  2016-11-28 17:35           ` Tim Chen
@ 2016-11-28 23:22             ` Rafael J. Wysocki
  2016-11-29  7:11             ` Ingo Molnar
  1 sibling, 0 replies; 39+ messages in thread
From: Rafael J. Wysocki @ 2016-11-28 23:22 UTC (permalink / raw)
  To: Tim Chen
  Cc: Ingo Molnar, Thomas Gleixner, peterz, hpa, srinivas.pandruvada,
	linux-kernel, linux-tip-commits, Rafael J. Wysocki

On Monday, November 28, 2016 09:35:58 AM Tim Chen wrote:
> On Mon, 2016-11-28 at 09:51 +0100, Ingo Molnar wrote:
> > * Thomas Gleixner <tglx@linutronix.de> wrote:
> > > 
> > >
> > > > > +#include <linux/sched.h>
> > > > > +#include <linux/cpumask.h>
> > > > > +#include <linux/cpuset.h>
> > > > > +#include <asm/mutex.h>
> > > > > +#include <linux/sched.h>
> > > > > +#include <linux/sysctl.h>
> > > > > +#include <linux/nodemask.h>
> > > > 
> > > > arch/x86/kernel/itmt.c:26:23: fatal error: asm/mutex.h: No such file
> > > > or directory
> > > > 
> > > > > +config SCHED_ITMT
> > > > > +	bool "Intel Turbo Boost Max Technology (ITMT) scheduler support"
> > > > > +	depends on SCHED_MC && CPU_SUP_INTEL && X86_INTEL_PSTATE
> > > > > +	---help---
> > > > > +	  ITMT enabled scheduler support improves the CPU scheduler's decision
> > > > > +	  to move tasks to cpu core that can be boosted to a higher frequency
> > > > > +	  than others. It will have better performance at a cost of slightly
> > > > > +	  increased overhead in task migrations. If unsure say N here.
> > > > 
> > > > Argh, so the 'itmt' name really sucks as well - could we please make
> > > > it something more obvious - like SCHED_INTEL_TURBO or so - and
> > > > similarly rename the file as well?
> > > > 
> > > > The sched_intel_turbo.c file could thus host all things related to
> > > > scheduler support of turbo frequencies - it shouldn't be named after
> > > > the Intel acronym of the day...
> > > 
> > > It would be nice to come up with such nitpicks during review. This thing
> > > went through 8 iterations, but nothing came up and I didn't mind the
> > > itmt naming.
> > 
> > Yeah, so I had to NAK an early iteration and didn't get around to doing a
> > really  detailed review yet - and after (falsely) thinking it had a build
> > failure I got overly worked up about the bad naming: my bad and
> > apologies!
> > 
> > So the code looks good to me but the naming still sucks a bit - I'm fine
> > with  having the commits re-merged as-is and renaming the Kconfig
> > variable to something more expressive: I've done this in tip:sched/core
> > and have fixed the asm/mutex.h thing as well.
> > 
> > Wrt. improving the naming:
> > 
> > Firstly, popular tech news has coined the 'Turbo Boost Max' technology
> > 'TBM' (TBM2  and TBM3) as the natural acronym of the Intel feature - not
> > 'ITMT'. So to anyone except people well aware of Intel acronyms the term
> > 'ITMT' will be pretty meaningless.
> > 
> > Does something more generic like SCHED_MC_PRIO (as an extension to
> > SCHED_MC) work  with everyone? Intel Turbo Max 3.0 is the current (only)
> > implementation of it, but I don't think the technology will stop at that
> > stage as dies are getting larger but thinner.
> > 
> > I also think the Kconfig text is somewhat misleading and the
> > default-disabled  status is counterproductive:
> > 
> > +config SCHED_ITMT
> > +       bool "Intel Turbo Boost Max Technology (ITMT) scheduler support"
> > +       depends on SCHED_MC && CPU_SUP_INTEL && X86_INTEL_PSTATE
> > +       ---help---
> > +         ITMT enabled scheduler support improves the CPU scheduler's decision
> > +         to move tasks to cpu core that can be boosted to a higher frequency
> > +         than others. It will have better performance at a cost of slightly
> > +         increased overhead in task migrations. If unsure say N here.
> > 
> > ... the extra cost of smarter CPU selection is IMHO overwhelmed by the
> > negative  effects of not knowing about core frequency ordering, on most
> > workloads.
> > 
> > A better default would be default-y I believe (that is what we do for CPU
> > hardware  enablement typically), and a better description would be
> > something like:
> > 
> > +config SCHED_MC_PRIO
> > +       bool "CPU core priorities scheduler support"
> > +       depends on SCHED_MC && CPU_SUP_INTEL && X86_INTEL_PSTATE
> > +	default y
> > +       ---help---
> > +       Intel Turbo Boost Max 3.0 enabled CPUs have a core ordering determined at
> > +	manufacturing time, which allows certain cores to reach higher turbo
> > +	frequencies (when running single threaded workloads) than others.
> > +
> > +	Enabling this kernel feature teaches the scheduler about the TBM3 priority
> > +	order of the CPU cores and adjusts the scheduler's CPU selection logic
> > +	accordingly, so that higher overall system performance can be achieved.
> > +
> > +	This feature will have no effect on CPUs without this feature.
> > +
> > +	If unsure say Y here.
> > 
> > If/when other architectures make use of this the Kconfig entry can be
> > moved into  the scheduler Kconfig - but for the time being it can stay in
> > arch/x86/.
> > 
> > Another variant would be to eliminate the Kconfig option altogether and
> > make it a  natural feature of SCHED_MC (like it is in the core
> > scheduler).
> 
> I am fine with renaming SCHED_ITMT to SCHED_MC_PRIO.  Patches 7 and 8, which
> Rafael merged into his tree, also use SCHED_ITMT, so they will need to be
> updated if we rename it.

No, I haven't.  They are in tip AFAICS.

Thanks,
Rafael

* Re: [tip:x86/core] x86: Enable Intel Turbo Boost Max Technology 3.0
  2016-11-28 17:35           ` Tim Chen
  2016-11-28 23:22             ` Rafael J. Wysocki
@ 2016-11-29  7:11             ` Ingo Molnar
  2016-11-29 18:45               ` Tim Chen
  1 sibling, 1 reply; 39+ messages in thread
From: Ingo Molnar @ 2016-11-29  7:11 UTC (permalink / raw)
  To: Tim Chen
  Cc: Thomas Gleixner, peterz, hpa, srinivas.pandruvada, linux-kernel,
	linux-tip-commits, Rafael J. Wysocki


* Tim Chen <tim.c.chen@linux.intel.com> wrote:

> > +	If unsure say Y here.
> > 
> > If/when other architectures make use of this the Kconfig entry can be moved into 
> > the scheduler Kconfig - but for the time being it can stay in arch/x86/.
> > 
> > Another variant would be to eliminate the Kconfig option altogether and make it a 
> > natural feature of SCHED_MC (like it is in the core scheduler).
> > 
> 
> I am fine with renaming SCHED_ITMT to SCHED_MC_PRIO.

Ok, could you please send a delta patch on top of tip:sched/core that does this 
and the other improvements?

Thanks,

	Ingo

* Re: [PATCH v8 4/8] x86/sysctl: Add sysctl for ITMT scheduling feature
  2016-11-28  8:56   ` [PATCH v8 4/8] " Borislav Petkov
@ 2016-11-29 17:30     ` Tim Chen
  2016-11-29 17:51       ` Borislav Petkov
  0 siblings, 1 reply; 39+ messages in thread
From: Tim Chen @ 2016-11-29 17:30 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: rjw, tglx, mingo, x86, linux-pm, linux-kernel, linux-acpi,
	peterz, jolsa, Srinivas Pandruvada

On Mon, 2016-11-28 at 09:56 +0100, Borislav Petkov wrote:
> On Tue, Nov 22, 2016 at 12:23:56PM -0800, Tim Chen wrote:
> > 
> > Intel Turbo Boost Max Technology 3.0 (ITMT) feature
> > allows some cores to be boosted to higher turbo
> > frequency than others.
> > 
> > Add /proc/sys/kernel/sched_itmt_enabled so operator
> > can enable/disable scheduling of tasks that favor cores
> > with higher turbo boost frequency potential.
> > 
> > By default, system that is ITMT capable and single
> > socket has this feature turned on.  It is more likely
> > to be lightly loaded and operates in Turbo range.
> > 
> > When there is a change in the ITMT scheduling operation
> > desired, a rebuild of the sched domain is initiated
> > so the scheduler can set up sched domains with appropriate
> > flag to enable/disable ITMT scheduling operations.
> > 
> > Co-developed-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > Co-developed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> > Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
> > ---
> >  arch/x86/include/asm/topology.h |   7 ++-
> >  arch/x86/kernel/itmt.c          | 108 +++++++++++++++++++++++++++++++++++++++-
> >  2 files changed, 112 insertions(+), 3 deletions(-)
> ...
> 
> > 
> > +/*
> > + * Boolean to control whether we want to move processes to cpu capable
> > + * of higher turbo frequency for cpus supporting Intel Turbo Boost Max
> > + * Technology 3.0.
> > + *
> > + * It can be set via /proc/sys/kernel/sched_itmt_enabled
> > + */
> > +unsigned int __read_mostly sysctl_sched_itmt_enabled;
> Err, can we not have the boolean in the name itself?
> 
> I.e., have it called sysctl_sched_itmt and 1 means enabled and 0
> disabled? I.e., the classic thing. :)
> 
> Ditto for the sysctl name?

I am following the convention in /proc/sys/kernel/sched_autogroup_enabled
and sysctl_sched_autogroup_enabled that's also in /proc/sys/kernel.

Thanks.

Tim

* Re: [PATCH v8 4/8] x86/sysctl: Add sysctl for ITMT scheduling feature
  2016-11-29 17:30     ` Tim Chen
@ 2016-11-29 17:51       ` Borislav Petkov
  0 siblings, 0 replies; 39+ messages in thread
From: Borislav Petkov @ 2016-11-29 17:51 UTC (permalink / raw)
  To: Tim Chen
  Cc: rjw, tglx, mingo, x86, linux-pm, linux-kernel, linux-acpi,
	peterz, jolsa, Srinivas Pandruvada

On Tue, Nov 29, 2016 at 09:30:49AM -0800, Tim Chen wrote:
> I am following the convention in /proc/sys/kernel/sched_autogroup_enabled
> and sysctl_sched_autogroup_enabled that's also in /proc/sys/kernel.

That's hardly a convention to me:

$ ls -l /proc/sys/kernel/*abled
-rw-r--r-- 1 root root 0 Nov 29 18:47 /proc/sys/kernel/ftrace_enabled
-rw-r--r-- 1 root root 0 Nov 29 18:47 /proc/sys/kernel/kexec_load_disabled
-rw-r--r-- 1 root root 0 Nov 29 18:47 /proc/sys/kernel/modules_disabled

enabled, disabled, ...

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

* Re: [tip:x86/core] x86: Enable Intel Turbo Boost Max Technology 3.0
  2016-11-29  7:11             ` Ingo Molnar
@ 2016-11-29 18:45               ` Tim Chen
  0 siblings, 0 replies; 39+ messages in thread
From: Tim Chen @ 2016-11-29 18:45 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Gleixner, peterz, hpa, srinivas.pandruvada, linux-kernel,
	linux-tip-commits, Rafael J. Wysocki

On Tue, 2016-11-29 at 08:11 +0100, Ingo Molnar wrote:
> * Tim Chen <tim.c.chen@linux.intel.com> wrote:
> 
> > 
> > > 
> > > +	If unsure say Y here.
> > > 
> > > If/when other architectures make use of this the Kconfig entry can be moved into 
> > > the scheduler Kconfig - but for the time being it can stay in arch/x86/.
> > > 
> > > Another variant would be to eliminate the Kconfig option altogether and make it a 
> > > natural feature of SCHED_MC (like it is in the core scheduler).
> > > 
> > I am fine with renaming SCHED_ITMT to SCHED_MC_PRIO.
> Ok, could you please send a delta patch on top of tip:sched/core that does this 
> and the other improvements?
> 

I have sent a delta patch in a separate mail for this change.

Thanks.

Tim

* Re: [PATCH v8 8/8] cpufreq: intel_pstate: Use CPPC to get max performance
  2016-11-22 20:24 ` [PATCH v8 8/8] cpufreq: intel_pstate: Use CPPC to get max performance Tim Chen
  2016-11-24 19:55   ` [tip:x86/core] cpufreq/intel_pstate: " tip-bot for Rafael J. Wysocki
@ 2016-12-07 19:06   ` Sebastian Andrzej Siewior
  2016-12-07 23:12     ` Tim Chen
  1 sibling, 1 reply; 39+ messages in thread
From: Sebastian Andrzej Siewior @ 2016-12-07 19:06 UTC (permalink / raw)
  To: Tim Chen
  Cc: rjw, tglx, mingo, bp, Rafael J. Wysocki, x86, linux-pm,
	linux-kernel, linux-acpi, peterz, jolsa, Srinivas Pandruvada

On 2016-11-22 12:24:00 [-0800], Tim Chen wrote:
> From: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
> 
> This change uses acpi cppc_lib interface to get CPPC performance limits
> and calls scheduler interface to update per cpu highest priority. If
> there is a difference in highest performance of each CPUs, call scheduler
> interface to enable ITMT feature for only one time.
> 
> Here sched_set_itmt_core_prio() is called to set priorities and
> sched_set_itmt_support() is called to enable ITMT feature.
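The quoted commit message sums up the driver-side logic in a sentence. Below is a simplified userspace model of it, assuming all CPUs' CPPC highest-performance values are available at once (the real driver may gather them per CPU and differs in detail): each CPU's value becomes its priority, and ITMT is enabled only once, and only if the values are not all equal. core_prio[] and itmt_support_calls stand in for sched_set_itmt_core_prio() and sched_set_itmt_support(); every name here is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4

/* Stand-ins for the two scheduler calls named in the commit message. */
static int core_prio[MODEL_NR_CPUS];  /* models sched_set_itmt_core_prio() */
static int itmt_support_calls;        /* counts sched_set_itmt_support() calls */
static bool itmt_requested;           /* ensures ITMT is enabled only once */

/* True if at least one CPU reports a different highest_perf than CPU 0. */
static bool highest_perf_differs(const int *highest_perf, int nr)
{
	for (int cpu = 1; cpu < nr; cpu++)
		if (highest_perf[cpu] != highest_perf[0])
			return true;
	return false;
}

/* Record each CPU's highest_perf as its priority; enable ITMT once if they differ. */
static void update_itmt_model(const int *highest_perf, int nr)
{
	assert(nr > 0 && nr <= MODEL_NR_CPUS);
	for (int cpu = 0; cpu < nr; cpu++)
		core_prio[cpu] = highest_perf[cpu];

	if (!itmt_requested && highest_perf_differs(highest_perf, nr)) {
		itmt_requested = true;
		itmt_support_calls++;
	}
}
```

Identical values leave ITMT off; a later call with differing values enables it, and repeated calls do not enable it again.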

First I hit a crash which I bisected down to de966cf4a4fa ("sched/x86: Change
CONFIG_SCHED_ITMT to CONFIG_SCHED_MC_PRIO") because that commit enabled the
option by default.
Then I ran another bisect round and ended up here, with the same backtrace:
|BUG: unable to handle kernel NULL pointer dereference at           (null)
|IP: [<ffffffff812aab6e>] acpi_cppc_processor_exit+0x40/0x60
|PGD 0 [    0.577616]
|Oops: 0000 [#1] SMP
|Modules linked in:
|CPU: 3 PID: 1 Comm: swapper/0 Not tainted 4.9.0-rc6-00146-g17669006adf6 #51
|task: ffff88003f878000 task.stack: ffffc90000008000
|RIP: 0010:[<ffffffff812aab6e>]  [<ffffffff812aab6e>] acpi_cppc_processor_exit+0x40/0x60
|RSP: 0000:ffffc9000000bd48  EFLAGS: 00010296
|RAX: 00000000000137e0 RBX: 0000000000000000 RCX: 0000000000000001
|RDX: ffff88003fc00000 RSI: 0000000000000000 RDI: ffff88003fbca130
|RBP: ffffc9000000bd60 R08: 0000000000000514 R09: 0000000000000000
|R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000002
|R13: 0000000000000020 R14: ffffffff8167cb00 R15: 0000000000000000
|FS:  0000000000000000(0000) GS:ffff88003fcc0000(0000) knlGS:0000000000000000
|CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
|CR2: 0000000000000000 CR3: 0000000001618000 CR4: 00000000000406e0
|Stack:
| ffff88003f939848 ffff88003fbca130 0000000000000001 ffffc9000000bd80
| ffffffff812a4ccb ffff88003fc0cee8 0000000000000000 ffffc9000000bdb8
| ffffffff812dc20d ffff88003fc0cee8 ffffffff8167cb00 ffff88003fc0cf48
|Call Trace:
| [<ffffffff812a4ccb>] acpi_processor_stop+0xb2/0xc5
| [<ffffffff812dc20d>] driver_probe_device+0x14d/0x2f0
| [<ffffffff812dc41e>] __driver_attach+0x6e/0x90
| [<ffffffff812da234>] bus_for_each_dev+0x54/0x90
| [<ffffffff812dbbf9>] driver_attach+0x19/0x20
| [<ffffffff812db6a6>] bus_add_driver+0xe6/0x200
| [<ffffffff812dcb23>] driver_register+0x83/0xc0
| [<ffffffff816f050a>] acpi_processor_driver_init+0x20/0x94
| [<ffffffff81000487>] do_one_initcall+0x97/0x180
| [<ffffffff816ccf5c>] kernel_init_freeable+0x112/0x1a6
| [<ffffffff813a0fc9>] kernel_init+0x9/0xf0
| [<ffffffff813acf35>] ret_from_fork+0x25/0x30
|Code: 02 00 00 00 48 8b 14 d5 e0 c3 55 81 48 8b 1c 02 4c 8d 6b 20 eb 15 49 8b 7d 00 48 85 ff 74 05 e8 39 8c d9 ff 41 ff c4 49 83 c5 20 <44> 3b 23 72 e6 48 8d bb a0 02 00 00 e8 b1 6f f9 ff 48 89 df e8
|RIP  [<ffffffff812aab6e>] acpi_cppc_processor_exit+0x40/0x60
| RSP <ffffc9000000bd48>
|CR2: 0000000000000000
|---[ end trace 917a625107b09711 ]---

The attached patch fixes it. Could someone who has looked at this code
longer than I have confirm that this is fine, or fix it differently? It
makes the boot crash on a "default" KVM setup go away.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 drivers/acpi/cppc_acpi.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/acpi/cppc_acpi.c b/drivers/acpi/cppc_acpi.c
index d0d0504b7c89..93252e5374c5 100644
--- a/drivers/acpi/cppc_acpi.c
+++ b/drivers/acpi/cppc_acpi.c
@@ -803,6 +803,7 @@ int acpi_cppc_processor_probe(struct acpi_processor *pr)
 		if (addr)
 			iounmap(addr);
 	}
+	per_cpu(cpc_desc_ptr, pr->id) = NULL;
 	kfree(cpc_ptr);
 
 out_buf_free:
@@ -824,6 +825,8 @@ void acpi_cppc_processor_exit(struct acpi_processor *pr)
 	void __iomem *addr;
 
 	cpc_ptr = per_cpu(cpc_desc_ptr, pr->id);
+	if (!cpc_ptr)
+		return;
 
 	/* Free all the mapped sys mem areas for this CPU */
 	for (i = 2; i < cpc_ptr->num_entries; i++) {
-- 
2.11.0

* Re: [PATCH v8 8/8] cpufreq: intel_pstate: Use CPPC to get max performance
  2016-12-07 19:06   ` [PATCH v8 8/8] cpufreq: intel_pstate: " Sebastian Andrzej Siewior
@ 2016-12-07 23:12     ` Tim Chen
  2016-12-07 23:29       ` Rafael J. Wysocki
  0 siblings, 1 reply; 39+ messages in thread
From: Tim Chen @ 2016-12-07 23:12 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: rjw, tglx, mingo, bp, Rafael J. Wysocki, x86, linux-pm,
	linux-kernel, linux-acpi, peterz, jolsa, Srinivas Pandruvada

On Wed, 2016-12-07 at 20:06 +0100, Sebastian Andrzej Siewior wrote:
> 
> 
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
>  drivers/acpi/cppc_acpi.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/acpi/cppc_acpi.c b/drivers/acpi/cppc_acpi.c
> index d0d0504b7c89..93252e5374c5 100644
> --- a/drivers/acpi/cppc_acpi.c
> +++ b/drivers/acpi/cppc_acpi.c
> @@ -803,6 +803,7 @@ int acpi_cppc_processor_probe(struct acpi_processor *pr)
>  		if (addr)
>  			iounmap(addr);
>  	}
> +	per_cpu(cpc_desc_ptr, pr->id) = NULL;
>  	kfree(cpc_ptr);
>  
>  out_buf_free:
> @@ -824,6 +825,8 @@ void acpi_cppc_processor_exit(struct acpi_processor *pr)
>  	void __iomem *addr;
>  
>  	cpc_ptr = per_cpu(cpc_desc_ptr, pr->id);
> +	if (!cpc_ptr)
> +		return;

I agree that not handling null pointer here is a bug that should be fixed.
The cpc_ptr is checked at other places like acpi_get_psd_map.  
We could potentially have a null cpc_ptr say when
the parsing of CPC table failed. We should handle such cases gracefully.
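Tim's point can be illustrated with a small userspace model of the exit path: a per-CPU slot that probe never filled (for example after a CPC parsing failure) is simply skipped. The array cpc_desc_ptr_model[] stands in for the per-CPU cpc_desc_ptr variable; the struct and function names are hypothetical, and only num_entries mirrors a field the real exit loop reads.

```c
#include <assert.h>
#include <stdlib.h>

#define MODEL_NR_CPUS 4

struct cpc_desc_model {
	int num_entries;
};

/* Array standing in for the per-CPU cpc_desc_ptr variable. */
static struct cpc_desc_model *cpc_desc_ptr_model[MODEL_NR_CPUS];

/*
 * Exit path with the NULL check. Returns the number of entries that were
 * torn down, or 0 if this CPU never got a descriptor.
 */
static int cppc_exit_model(int cpu)
{
	struct cpc_desc_model *cpc_ptr;
	int released;

	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	cpc_ptr = cpc_desc_ptr_model[cpu];

	/* Probe failed or never ran for this CPU: nothing to tear down. */
	if (!cpc_ptr)
		return 0;

	/* Without the check above, this read faults on a NULL cpc_ptr. */
	released = cpc_ptr->num_entries;
	cpc_desc_ptr_model[cpu] = NULL;
	free(cpc_ptr);
	return released;
}
```

Calling the model twice for the same CPU is also safe, because the slot is cleared before the descriptor is freed.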

Tim

* Re: [PATCH v8 8/8] cpufreq: intel_pstate: Use CPPC to get max performance
  2016-12-07 23:12     ` Tim Chen
@ 2016-12-07 23:29       ` Rafael J. Wysocki
  2016-12-09 14:45         ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 39+ messages in thread
From: Rafael J. Wysocki @ 2016-12-07 23:29 UTC (permalink / raw)
  To: Tim Chen, Sebastian Andrzej Siewior
  Cc: tglx, mingo, bp, Rafael J. Wysocki, x86, linux-pm, linux-kernel,
	linux-acpi, peterz, jolsa, Srinivas Pandruvada

On Wednesday, December 07, 2016 03:12:53 PM Tim Chen wrote:
> On Wed, 2016-12-07 at 20:06 +0100, Sebastian Andrzej Siewior wrote:
> > 
> > 
> > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> > ---
> >  drivers/acpi/cppc_acpi.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/drivers/acpi/cppc_acpi.c b/drivers/acpi/cppc_acpi.c
> > index d0d0504b7c89..93252e5374c5 100644
> > --- a/drivers/acpi/cppc_acpi.c
> > +++ b/drivers/acpi/cppc_acpi.c
> > @@ -803,6 +803,7 @@ int acpi_cppc_processor_probe(struct acpi_processor *pr)
> >  		if (addr)
> >  			iounmap(addr);
> >  	}
> > +	per_cpu(cpc_desc_ptr, pr->id) = NULL;
> >  	kfree(cpc_ptr);
> >  
> >  out_buf_free:
> > @@ -824,6 +825,8 @@ void acpi_cppc_processor_exit(struct acpi_processor *pr)
> >  	void __iomem *addr;
> >  
> >  	cpc_ptr = per_cpu(cpc_desc_ptr, pr->id);
> > +	if (!cpc_ptr)
> > +		return;
> 
> I agree that not handling null pointer here is a bug that should be fixed.
> The cpc_ptr is checked at other places like acpi_get_psd_map.  
> We could potentially have a null cpc_ptr say when
> the parsing of CPC table failed. We should handle such cases gracefully.

Agreed, but the bug fixed by the first hunk is real too.  I'd fix it a bit
differently, though:

Tentatively-signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

---
 drivers/acpi/cppc_acpi.c |   12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

Index: linux-pm/drivers/acpi/cppc_acpi.c
===================================================================
--- linux-pm.orig/drivers/acpi/cppc_acpi.c
+++ linux-pm/drivers/acpi/cppc_acpi.c
@@ -776,9 +776,6 @@ int acpi_cppc_processor_probe(struct acp
 		init_waitqueue_head(&pcc_data.pcc_write_wait_q);
 	}
 
-	/* Plug PSD data into this CPUs CPC descriptor. */
-	per_cpu(cpc_desc_ptr, pr->id) = cpc_ptr;
-
 	/* Everything looks okay */
 	pr_debug("Parsed CPC struct for CPU: %d\n", pr->id);
 
@@ -789,10 +786,15 @@ int acpi_cppc_processor_probe(struct acp
 		goto out_free;
 	}
 
+	/* Plug PSD data into this CPUs CPC descriptor. */
+	per_cpu(cpc_desc_ptr, pr->id) = cpc_ptr;
+
 	ret = kobject_init_and_add(&cpc_ptr->kobj, &cppc_ktype, &cpu_dev->kobj,
 			"acpi_cppc");
-	if (ret)
+	if (ret) {
+		per_cpu(cpc_desc_ptr, pr->id) = NULL;
 		goto out_free;
+	}
 
 	kfree(output.pointer);
 	return 0;
@@ -826,6 +828,8 @@ void acpi_cppc_processor_exit(struct acp
 	void __iomem *addr;
 
 	cpc_ptr = per_cpu(cpc_desc_ptr, pr->id);
+	if (!cpc_ptr)
+		return;
 
 	/* Free all the mapped sys mem areas for this CPU */
 	for (i = 2; i < cpc_ptr->num_entries; i++) {

* Re: [PATCH v8 8/8] cpufreq: intel_pstate: Use CPPC to get max performance
  2016-12-07 23:29       ` Rafael J. Wysocki
@ 2016-12-09 14:45         ` Sebastian Andrzej Siewior
  2016-12-09 15:02           ` Rafael J. Wysocki
  0 siblings, 1 reply; 39+ messages in thread
From: Sebastian Andrzej Siewior @ 2016-12-09 14:45 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Tim Chen, tglx, mingo, bp, Rafael J. Wysocki, x86, linux-pm,
	linux-kernel, linux-acpi, peterz, jolsa, Srinivas Pandruvada

On 2016-12-08 00:29:29 [+0100], Rafael J. Wysocki wrote:
> Agreed, but the bug fixed by the first hunk is real too.  I'd fix it a bit
> differently, though:
> 
> Tentatively-signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Is there anything you want me to do here? The hunk in
acpi_cppc_processor_exit() is unchanged and is the one that led to the
crash. The other hunk I made (the one you changed) was something I
noticed while looking at the code - nothing that hit me directly.

Sebastian


* Re: [PATCH v8 8/8] cpufreq: intel_pstate: Use CPPC to get max performance
  2016-12-09 14:45         ` Sebastian Andrzej Siewior
@ 2016-12-09 15:02           ` Rafael J. Wysocki
  2016-12-09 23:52             ` [PATCH] ACPI / CPPC: Fix per-CPU pointers management Rafael J. Wysocki
  0 siblings, 1 reply; 39+ messages in thread
From: Rafael J. Wysocki @ 2016-12-09 15:02 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Rafael J. Wysocki, Tim Chen, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Rafael J. Wysocki, the arch/x86 maintainers,
	Linux PM, Linux Kernel Mailing List, ACPI Devel Maling List,
	Peter Zijlstra, jolsa, Srinivas Pandruvada

On Fri, Dec 9, 2016 at 3:45 PM, Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
> On 2016-12-08 00:29:29 [+0100], Rafael J. Wysocki wrote:
>> Agreed, but the bug fixed by the first hunk is real too.  I'd fix it a bit
>> differently, though:
>>
>> Tentatively-signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> Is there anything you want me to do here? The hunk in
> acpi_cppc_processor_exit() is unchanged and is the one that led to the
> crash. The other hunk I made (the one you changed) was something I
> noticed while looking at the code - nothing that hit me directly.

OK, thanks.

I'll add a changelog to the patch and resend it later today.

Thanks,
Rafael

* [PATCH] ACPI / CPPC: Fix per-CPU pointers management
  2016-12-09 15:02           ` Rafael J. Wysocki
@ 2016-12-09 23:52             ` Rafael J. Wysocki
  2016-12-10 18:51               ` Sebastian Andrzej Siewior
  2016-12-14  2:26               ` Rafael J. Wysocki
  0 siblings, 2 replies; 39+ messages in thread
From: Rafael J. Wysocki @ 2016-12-09 23:52 UTC (permalink / raw)
  To: Thomas Gleixner, ACPI Devel Maling List
  Cc: Rafael J. Wysocki, Sebastian Andrzej Siewior, Tim Chen,
	Ingo Molnar, Borislav Petkov, Rafael J. Wysocki,
	the arch/x86 maintainers, Linux PM, Linux Kernel Mailing List,
	Peter Zijlstra, jolsa, Srinivas Pandruvada

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Enabling ACPI CPPC on x86 causes a NULL pointer dereference to occur
(on boot on a "default" KVM setup) in acpi_cppc_processor_exit() due
to a missing check against NULL in there:

|BUG: unable to handle kernel NULL pointer dereference at           (null)
|IP: [<ffffffff812aab6e>] acpi_cppc_processor_exit+0x40/0x60
|PGD 0 [    0.577616]
|Oops: 0000 [#1] SMP
|Modules linked in:
|CPU: 3 PID: 1 Comm: swapper/0 Not tainted 4.9.0-rc6-00146-g17669006adf6 #51
|task: ffff88003f878000 task.stack: ffffc90000008000
|RIP: 0010:[<ffffffff812aab6e>]  [<ffffffff812aab6e>] acpi_cppc_processor_exit+0x40/0x60
|RSP: 0000:ffffc9000000bd48  EFLAGS: 00010296
|RAX: 00000000000137e0 RBX: 0000000000000000 RCX: 0000000000000001
|RDX: ffff88003fc00000 RSI: 0000000000000000 RDI: ffff88003fbca130
|RBP: ffffc9000000bd60 R08: 0000000000000514 R09: 0000000000000000
|R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000002
|R13: 0000000000000020 R14: ffffffff8167cb00 R15: 0000000000000000
|FS:  0000000000000000(0000) GS:ffff88003fcc0000(0000) knlGS:0000000000000000
|CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
|CR2: 0000000000000000 CR3: 0000000001618000 CR4: 00000000000406e0
|Stack:
| ffff88003f939848 ffff88003fbca130 0000000000000001 ffffc9000000bd80
| ffffffff812a4ccb ffff88003fc0cee8 0000000000000000 ffffc9000000bdb8
| ffffffff812dc20d ffff88003fc0cee8 ffffffff8167cb00 ffff88003fc0cf48
|Call Trace:
| [<ffffffff812a4ccb>] acpi_processor_stop+0xb2/0xc5
| [<ffffffff812dc20d>] driver_probe_device+0x14d/0x2f0
| [<ffffffff812dc41e>] __driver_attach+0x6e/0x90
| [<ffffffff812da234>] bus_for_each_dev+0x54/0x90
| [<ffffffff812dbbf9>] driver_attach+0x19/0x20
| [<ffffffff812db6a6>] bus_add_driver+0xe6/0x200
| [<ffffffff812dcb23>] driver_register+0x83/0xc0
| [<ffffffff816f050a>] acpi_processor_driver_init+0x20/0x94
| [<ffffffff81000487>] do_one_initcall+0x97/0x180
| [<ffffffff816ccf5c>] kernel_init_freeable+0x112/0x1a6
| [<ffffffff813a0fc9>] kernel_init+0x9/0xf0
| [<ffffffff813acf35>] ret_from_fork+0x25/0x30
|Code: 02 00 00 00 48 8b 14 d5 e0 c3 55 81 48 8b 1c 02 4c 8d 6b 20 eb 15 49 8b 7d 00 48 85 ff 74 05 e8 39 8c d9 ff 41 ff c4 49 83 c5 20 <44> 3b 23 72 e6 48 8d bb a0 02 00 00 e8 b1 6f f9 ff 48 89 df e8
|RIP  [<ffffffff812aab6e>] acpi_cppc_processor_exit+0x40/0x60
| RSP <ffffc9000000bd48>
|CR2: 0000000000000000

Fix that and while at it, fix a possible use-after-free scenario in
acpi_cppc_processor_probe() that can happen if the function returns
without cleaning up the per-CPU pointer set by it previously.

Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Original-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---

Hi Thomas,

The crash fixed by this is exposed by the ITMT (asymmetric packing) series
(which involves using ACPI CPPC on x86), so IMO it would be good to route it
through tip along with that series.

Thanks,
Rafael

---
 drivers/acpi/cppc_acpi.c |   12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

Index: linux-pm/drivers/acpi/cppc_acpi.c
===================================================================
--- linux-pm.orig/drivers/acpi/cppc_acpi.c
+++ linux-pm/drivers/acpi/cppc_acpi.c
@@ -776,9 +776,6 @@ int acpi_cppc_processor_probe(struct acp
 		init_waitqueue_head(&pcc_data.pcc_write_wait_q);
 	}
 
-	/* Plug PSD data into this CPUs CPC descriptor. */
-	per_cpu(cpc_desc_ptr, pr->id) = cpc_ptr;
-
 	/* Everything looks okay */
 	pr_debug("Parsed CPC struct for CPU: %d\n", pr->id);
 
@@ -789,10 +786,15 @@ int acpi_cppc_processor_probe(struct acp
 		goto out_free;
 	}
 
+	/* Plug PSD data into this CPUs CPC descriptor. */
+	per_cpu(cpc_desc_ptr, pr->id) = cpc_ptr;
+
 	ret = kobject_init_and_add(&cpc_ptr->kobj, &cppc_ktype, &cpu_dev->kobj,
 			"acpi_cppc");
-	if (ret)
+	if (ret) {
+		per_cpu(cpc_desc_ptr, pr->id) = NULL;
 		goto out_free;
+	}
 
 	kfree(output.pointer);
 	return 0;
@@ -826,6 +828,8 @@ void acpi_cppc_processor_exit(struct acp
 	void __iomem *addr;
 
 	cpc_ptr = per_cpu(cpc_desc_ptr, pr->id);
+	if (!cpc_ptr)
+		return;
 
 	/* Free all the mapped sys mem areas for this CPU */
 	for (i = 2; i < cpc_ptr->num_entries; i++) {


* Re: [PATCH] ACPI / CPPC: Fix per-CPU pointers management
  2016-12-09 23:52             ` [PATCH] ACPI / CPPC: Fix per-CPU pointers management Rafael J. Wysocki
@ 2016-12-10 18:51               ` Sebastian Andrzej Siewior
  2016-12-12  1:00                 ` Rafael J. Wysocki
  2016-12-14  2:26               ` Rafael J. Wysocki
  1 sibling, 1 reply; 39+ messages in thread
From: Sebastian Andrzej Siewior @ 2016-12-10 18:51 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Thomas Gleixner, ACPI Devel Maling List, Rafael J. Wysocki,
	Tim Chen, Ingo Molnar, Borislav Petkov, Rafael J. Wysocki,
	the arch/x86 maintainers, Linux PM, Linux Kernel Mailing List,
	Peter Zijlstra, jolsa, Srinivas Pandruvada

On 2016-12-10 00:52:28 [+0100], Rafael J. Wysocki wrote:
> Hi Thomas,
> 
> The crash fixed by this is exposed by the ITMT (asymmetric packing) series
> (which involves using ACPI CPPC on x86), so IMO it would be good to route it
> through tip along with that series.

can we get this merged into the original patch please? This is default y
and breaks bisecting.

> Thanks,
> Rafael

Sebastian

* Re: [PATCH] ACPI / CPPC: Fix per-CPU pointers management
  2016-12-10 18:51               ` Sebastian Andrzej Siewior
@ 2016-12-12  1:00                 ` Rafael J. Wysocki
  0 siblings, 0 replies; 39+ messages in thread
From: Rafael J. Wysocki @ 2016-12-12  1:00 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Rafael J. Wysocki, Thomas Gleixner, ACPI Devel Maling List,
	Rafael J. Wysocki, Tim Chen, Ingo Molnar, Borislav Petkov,
	Rafael J. Wysocki, the arch/x86 maintainers, Linux PM,
	Linux Kernel Mailing List, Peter Zijlstra, jolsa,
	Srinivas Pandruvada

On Sat, Dec 10, 2016 at 7:51 PM, Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
> On 2016-12-10 00:52:28 [+0100], Rafael J. Wysocki wrote:
>> Hi Thomas,
>>
>> The crash fixed by this is exposed by the ITMT (asymmetric packing) series
>> (which involves using ACPI CPPC on x86), so IMO it would be good to route it
>> through tip along with that series.
>
> can we get this merged into the original patch please?

Functionally, that patch has little to do with the fix, so I'd
rather not do that.

> This is default y  and breaks bisecting.

Instead, I would reorder the series to put the fix in front of that patch.

Thanks,
Rafael

* Re: [PATCH] ACPI / CPPC: Fix per-CPU pointers management
  2016-12-09 23:52             ` [PATCH] ACPI / CPPC: Fix per-CPU pointers management Rafael J. Wysocki
  2016-12-10 18:51               ` Sebastian Andrzej Siewior
@ 2016-12-14  2:26               ` Rafael J. Wysocki
  1 sibling, 0 replies; 39+ messages in thread
From: Rafael J. Wysocki @ 2016-12-14  2:26 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: ACPI Devel Maling List, Rafael J. Wysocki,
	Sebastian Andrzej Siewior, Tim Chen, Ingo Molnar,
	Borislav Petkov, Rafael J. Wysocki, the arch/x86 maintainers,
	Linux PM, Linux Kernel Mailing List, Peter Zijlstra, jolsa,
	Srinivas Pandruvada

On Saturday, December 10, 2016 12:52:28 AM Rafael J. Wysocki wrote:
> [...]
> 
> Hi Thomas,
> 
> The crash fixed by this is exposed by the ITMT (asymmetric packing) series
> (which involves using ACPI CPPC on x86), so IMO it would be good to route it
> through tip along with that series.

The problematic commit has gone in already, so I'll route the fix through
the ACPI tree.

Thanks,
Rafael

end of thread

Thread overview: 39+ messages
2016-11-22 20:23 [PATCH v8 0/8] Support Intel Turbo Boost Max Technology 3.0 Tim Chen
2016-11-22 20:23 ` [PATCH v8 1/8] sched: Extend scheduler's asym packing Tim Chen
2016-11-23 13:09   ` Peter Zijlstra
2016-11-23 17:32     ` Tim Chen
2016-11-24 13:25   ` [tip:sched/core] " tip-bot for Tim Chen
2016-11-22 20:23 ` [PATCH v8 2/8] x86/topology: Define x86's arch_update_cpu_topology Tim Chen
2016-11-24 19:52   ` [tip:x86/core] " tip-bot for Tim Chen
2016-11-22 20:23 ` [PATCH v8 3/8] x86: Enable Intel Turbo Boost Max Technology 3.0 Tim Chen
2016-11-24 19:52   ` [tip:x86/core] " tip-bot for Tim Chen
2016-11-25  8:19     ` Ingo Molnar
2016-11-25  8:39       ` Peter Zijlstra
2016-11-25 19:06       ` Thomas Gleixner
2016-11-28  8:51         ` Ingo Molnar
2016-11-28 17:35           ` Tim Chen
2016-11-28 23:22             ` Rafael J. Wysocki
2016-11-29  7:11             ` Ingo Molnar
2016-11-29 18:45               ` Tim Chen
2016-11-22 20:23 ` [PATCH v8 4/8] x86/sysctl: Add sysctl for ITMT scheduling feature Tim Chen
2016-11-24 19:53   ` [tip:x86/core] " tip-bot for Tim Chen
2016-11-28  8:56   ` [PATCH v8 4/8] " Borislav Petkov
2016-11-29 17:30     ` Tim Chen
2016-11-29 17:51       ` Borislav Petkov
2016-11-22 20:23 ` [PATCH v8 5/8] x86/sched: Add SD_ASYM_PACKING flags to x86 ITMT CPU Tim Chen
2016-11-24 19:53   ` [tip:x86/core] " tip-bot for Tim Chen
2016-11-22 20:23 ` [PATCH v8 6/8] acpi: bus: Enable HWP CPPC objects Tim Chen
2016-11-24 19:54   ` [tip:x86/core] acpi/bus: " tip-bot for Srinivas Pandruvada
2016-11-22 20:23 ` [PATCH v8 7/8] acpi: bus: Set _OSC for diverse core support Tim Chen
2016-11-24 19:54   ` [tip:x86/core] acpi/bus: " tip-bot for Srinivas Pandruvada
2016-11-22 20:24 ` [PATCH v8 8/8] cpufreq: intel_pstate: Use CPPC to get max performance Tim Chen
2016-11-24 19:55   ` [tip:x86/core] cpufreq/intel_pstate: " tip-bot for Rafael J. Wysocki
2016-12-07 19:06   ` [PATCH v8 8/8] cpufreq: intel_pstate: " Sebastian Andrzej Siewior
2016-12-07 23:12     ` Tim Chen
2016-12-07 23:29       ` Rafael J. Wysocki
2016-12-09 14:45         ` Sebastian Andrzej Siewior
2016-12-09 15:02           ` Rafael J. Wysocki
2016-12-09 23:52             ` [PATCH] ACPI / CPPC: Fix per-CPU pointers management Rafael J. Wysocki
2016-12-10 18:51               ` Sebastian Andrzej Siewior
2016-12-12  1:00                 ` Rafael J. Wysocki
2016-12-14  2:26               ` Rafael J. Wysocki