* [PATCH v4 00/10] Support Intel® Turbo Boost Max Technology 3.0
@ 2016-09-21 19:19 Srinivas Pandruvada
  2016-09-21 19:19 ` [PATCH v4 01/10] x86/topology: Fix numa in package topology bug Srinivas Pandruvada
                   ` (9 more replies)
  0 siblings, 10 replies; 24+ messages in thread
From: Srinivas Pandruvada @ 2016-09-21 19:19 UTC (permalink / raw)
  To: rjw, tglx, mingo, bp
  Cc: x86, linux-pm, linux-kernel, linux-acpi, peterz, tim.c.chen,
	jolsa, Srinivas Pandruvada

v4:
- Split the x86 multi-node NUMA topology bug fix and the setting
of the SD_ASYM flag for ITMT topology into 2 patches.
- Split the sysctl changes for ITMT enablement and the setting of ITMT
capability/core priorities into 2 patches.
- Avoid unnecessary rebuilds of sched domains when the ITMT sysctl or
capabilities are updated.
- Fix missing stub function for topology_max_packages() for the !SMP case.
- Rename set_sched_itmt() to sched_set_itmt_support().
- Various updates to itmt.c to eliminate goto and tighten the logic.
- Various changelog and comment updates.
- intel_pstate: Split the function to process CPPC and enable ITMT.
- intel_pstate: Keep the cppc_perf information only until we use CPPC for HWP.

v3:
- Fix a race when more than one program is enabling/disabling ITMT.
- Remove the group_priority_cpu macro to simplify the code.
- Fix the compile issue on ARM reported by 0-day.

v2:
- The patchset is split into two parts so that CPPC changes can be merged first
 1. Only ACPI CPPC changes (It is posted separately)
 2. ITMT changes (scheduler and Intel P-State)

- Changes in patch: sched,x86: Enable Turbo Boost Max Technology
 1. Use arch_update_cpu_topology to indicate the need to completely
    rebuild the sched domains when ITMT-related sched domain flags change
 2. Enable ITMT scheduling by default on client (single-node) platforms
    capable of ITMT
 3. Implement arch_asym_cpu_priority to provide the cpu priority
    value to the scheduler for asym packing.
 4. Fix a compile bug for the i386 architecture.

- Changes in patch: sched: Extend scheduler's asym packing
 1. Use arch_asym_cpu_priority() to provide cpu priority
    value used for asym packing to the scheduler.

- Changes in acpi: bus: Enable HWP CPPC objects and
  acpi: bus: Set _OSC for diverse core support
  Minor code cleanup by removing #ifdef
- Changes in Kconfig for Intel P-State
  Avoid building the CPPC lib for i386 due to an issue reported by 0-day

- Feature is enabled by default for single socket systems

With Intel® Turbo Boost Max Technology 3.0 (ITMT), single-threaded performance
is optimized by identifying the processor's fastest core and running critical
workloads on it.
Refer to:
http://www.intel.com/content/www/us/en/architecture-and-technology/turbo-boost/turbo-boost-max-technology.html

This patchset consists of all the changes required to support the ITMT feature:
- Use CPPC information in the Intel P-State driver to get performance information
- Scheduler enhancements
- CPPC lib patches (split into a separate series)

The feature can be enabled at runtime by writing
# echo 1 > /proc/sys/kernel/sched_itmt_enabled
and disabled at runtime by writing
# echo 0 > /proc/sys/kernel/sched_itmt_enabled

Srinivas Pandruvada (3):
  acpi: bus: Enable HWP CPPC objects
  acpi: bus: Set _OSC for diverse core support
  cpufreq: intel_pstate: Use CPPC to get max performance

Tim Chen (7):
  x86/topology: Fix numa in package topology bug
  sched: Extend scheduler's asym packing
  x86/topology: Provide topology_num_packages()
  x86/topology: Define x86's arch_update_cpu_topology
  x86: Enable Intel Turbo Boost Max Technology 3.0
  x86/sysctl: Add sysctl for ITMT scheduling feature
  x86/sched: Add SD_ASYM_PACKING flags to x86 ITMT CPU

 arch/x86/Kconfig                |   9 ++
 arch/x86/include/asm/topology.h |  27 ++++++
 arch/x86/kernel/Makefile        |   1 +
 arch/x86/kernel/itmt.c          | 185 ++++++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/smpboot.c       |  85 ++++++++++++++----
 drivers/acpi/bus.c              |  10 +++
 drivers/cpufreq/Kconfig.x86     |   1 +
 drivers/cpufreq/intel_pstate.c  | 103 +++++++++++++++++++++-
 include/linux/acpi.h            |   1 +
 include/linux/sched.h           |   2 +
 kernel/sched/core.c             |  21 +++++
 kernel/sched/fair.c             |  35 +++++---
 kernel/sched/sched.h            |   6 ++
 13 files changed, 456 insertions(+), 30 deletions(-)
 create mode 100644 arch/x86/kernel/itmt.c

-- 
2.7.4



* [PATCH v4 01/10] x86/topology: Fix numa in package topology bug
  2016-09-21 19:19 [PATCH v4 00/10] Support Intel® Turbo Boost Max Technology 3.0 Srinivas Pandruvada
@ 2016-09-21 19:19 ` Srinivas Pandruvada
  2016-09-30 11:55   ` [tip:sched/core] sched/core, x86/topology: Fix NUMA " tip-bot for Tim Chen
  2016-09-21 19:19 ` [PATCH v4 02/10] sched: Extend scheduler's asym packing Srinivas Pandruvada
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 24+ messages in thread
From: Srinivas Pandruvada @ 2016-09-21 19:19 UTC (permalink / raw)
  To: rjw, tglx, mingo, bp
  Cc: x86, linux-pm, linux-kernel, linux-acpi, peterz, tim.c.chen,
	jolsa, Srinivas Pandruvada

From: Tim Chen <tim.c.chen@linux.intel.com>

The current code can call set_cpu_sibling_map() and invoke
sched_set_topology() more than once (e.g. on CPU hotplug).  When this
happens after sched_init_smp() has been called, we lose the NUMA topology
extension made to sched_domain_topology in sched_init_numa().  This
results in an incorrect topology when the sched domain is rebuilt.

This patch fixes the bug and issues a warning if sched_set_topology()
is called after sched_init_smp().

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
---
 arch/x86/kernel/smpboot.c | 46 ++++++++++++++++++++++++++++++----------------
 kernel/sched/core.c       |  3 +++
 2 files changed, 33 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 4296beb..7137ec4 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -471,7 +471,7 @@ static bool match_die(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
 	return false;
 }
 
-static struct sched_domain_topology_level numa_inside_package_topology[] = {
+static struct sched_domain_topology_level x86_numa_in_package_topology[] = {
 #ifdef CONFIG_SCHED_SMT
 	{ cpu_smt_mask, cpu_smt_flags, SD_INIT_NAME(SMT) },
 #endif
@@ -480,22 +480,23 @@ static struct sched_domain_topology_level numa_inside_package_topology[] = {
 #endif
 	{ NULL, },
 };
+
+static struct sched_domain_topology_level x86_topology[] = {
+#ifdef CONFIG_SCHED_SMT
+	{ cpu_smt_mask, cpu_smt_flags, SD_INIT_NAME(SMT) },
+#endif
+#ifdef CONFIG_SCHED_MC
+	{ cpu_coregroup_mask, cpu_core_flags, SD_INIT_NAME(MC) },
+#endif
+	{ cpu_cpu_mask, SD_INIT_NAME(DIE) },
+	{ NULL, },
+};
+
 /*
- * set_sched_topology() sets the topology internal to a CPU.  The
- * NUMA topologies are layered on top of it to build the full
- * system topology.
- *
- * If NUMA nodes are observed to occur within a CPU package, this
- * function should be called.  It forces the sched domain code to
- * only use the SMT level for the CPU portion of the topology.
- * This essentially falls back to relying on NUMA information
- * from the SRAT table to describe the entire system topology
- * (except for hyperthreads).
+ * Set if a package/die has multiple NUMA nodes inside.
+ * AMD Magny-Cours and Intel Cluster-on-Die have this.
  */
-static void primarily_use_numa_for_topology(void)
-{
-	set_sched_topology(numa_inside_package_topology);
-}
+static bool x86_has_numa_in_package;
 
 void set_cpu_sibling_map(int cpu)
 {
@@ -558,7 +559,7 @@ void set_cpu_sibling_map(int cpu)
 				c->booted_cores = cpu_data(i).booted_cores;
 		}
 		if (match_die(c, o) && !topology_same_node(c, o))
-			primarily_use_numa_for_topology();
+			x86_has_numa_in_package = true;
 	}
 
 	threads = cpumask_weight(topology_sibling_cpumask(cpu));
@@ -1304,6 +1305,16 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
 		zalloc_cpumask_var(&per_cpu(cpu_core_map, i), GFP_KERNEL);
 		zalloc_cpumask_var(&per_cpu(cpu_llc_shared_map, i), GFP_KERNEL);
 	}
+
+	/*
+	 * Set 'default' x86 topology, this matches default_topology() in that
+	 * it has NUMA nodes as a topology level. See also
+	 * native_smp_cpus_done().
+	 *
+	 * Must be done before set_cpu_sibling_map() is run.
+	 */
+	set_sched_topology(x86_topology);
+
 	set_cpu_sibling_map(0);
 
 	switch (smp_sanity_check(max_cpus)) {
@@ -1370,6 +1381,9 @@ void __init native_smp_cpus_done(unsigned int max_cpus)
 {
 	pr_debug("Boot done\n");
 
+	if (x86_has_numa_in_package)
+		set_sched_topology(x86_numa_in_package_topology);
+
 	nmi_selftest();
 	impress_friends();
 	setup_ioapic_dest();
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2a906f2..e86c4a5 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6487,6 +6487,9 @@ static struct sched_domain_topology_level *sched_domain_topology =
 
 void set_sched_topology(struct sched_domain_topology_level *tl)
 {
+	if (WARN_ON_ONCE(sched_smp_initialized))
+		return;
+
 	sched_domain_topology = tl;
 }
 
-- 
2.7.4



* [PATCH v4 02/10] sched: Extend scheduler's asym packing
  2016-09-21 19:19 [PATCH v4 00/10] Support Intel® Turbo Boost Max Technology 3.0 Srinivas Pandruvada
  2016-09-21 19:19 ` [PATCH v4 01/10] x86/topology: Fix numa in package topology bug Srinivas Pandruvada
@ 2016-09-21 19:19 ` Srinivas Pandruvada
  2016-09-21 19:19 ` [PATCH v4 03/10] x86/topology: Provide topology_num_packages() Srinivas Pandruvada
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 24+ messages in thread
From: Srinivas Pandruvada @ 2016-09-21 19:19 UTC (permalink / raw)
  To: rjw, tglx, mingo, bp
  Cc: x86, linux-pm, linux-kernel, linux-acpi, peterz, tim.c.chen,
	jolsa, Srinivas Pandruvada

From: Tim Chen <tim.c.chen@linux.intel.com>

We generalize the scheduler's asym packing to provide an ordering
of the CPUs beyond just the CPU number.  This allows the use of the
ASYM_PACKING scheduler machinery to move load to the preferred CPUs
in a sched domain.  The preference is defined by the CPU priority
given by arch_asym_cpu_priority(cpu).

We also record the most preferred CPU in a sched group when
we build the group's capacity, for fast lookup of the preferred CPU
during load balancing.

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
---
 include/linux/sched.h |  2 ++
 kernel/sched/core.c   | 18 ++++++++++++++++++
 kernel/sched/fair.c   | 35 ++++++++++++++++++++++++-----------
 kernel/sched/sched.h  |  6 ++++++
 4 files changed, 50 insertions(+), 11 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 98fe95f..82ca1e4 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1052,6 +1052,8 @@ static inline int cpu_numa_flags(void)
 }
 #endif
 
+int arch_asym_cpu_priority(int cpu);
+
 struct sched_domain_attr {
 	int relax_domain_level;
 };
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index e86c4a5..08135ca 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6237,7 +6237,25 @@ static void init_sched_groups_capacity(int cpu, struct sched_domain *sd)
 	WARN_ON(!sg);
 
 	do {
+		int cpu, max_cpu = -1, prev_cpu = -1;
+
 		sg->group_weight = cpumask_weight(sched_group_cpus(sg));
+
+		if (!(sd->flags & SD_ASYM_PACKING))
+			goto next;
+
+		for_each_cpu(cpu, sched_group_cpus(sg)) {
+			if (prev_cpu < 0) {
+				prev_cpu = cpu;
+				max_cpu = cpu;
+			} else {
+				if (sched_asym_prefer(cpu, max_cpu))
+					max_cpu = cpu;
+			}
+		}
+		sg->asym_prefer_cpu = max_cpu;
+
+next:
 		sg = sg->next;
 	} while (sg != sd->groups);
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a5cd07b..bb96e1a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -100,6 +100,16 @@ const_debug unsigned int sysctl_sched_migration_cost = 500000UL;
  */
 unsigned int __read_mostly sysctl_sched_shares_window = 10000000UL;
 
+#ifdef CONFIG_SMP
+/*
+ * For asym packing, by default the lower numbered cpu has higher priority.
+ */
+int __weak arch_asym_cpu_priority(int cpu)
+{
+	return -cpu;
+}
+#endif
+
 #ifdef CONFIG_CFS_BANDWIDTH
 /*
  * Amount of runtime to allocate from global (tg) to local (per-cfs_rq) pool
@@ -6861,16 +6871,18 @@ static bool update_sd_pick_busiest(struct lb_env *env,
 	if (env->idle == CPU_NOT_IDLE)
 		return true;
 	/*
-	 * ASYM_PACKING needs to move all the work to the lowest
-	 * numbered CPUs in the group, therefore mark all groups
-	 * higher than ourself as busy.
+	 * ASYM_PACKING needs to move all the work to the highest
+	 * priority CPUs in the group, therefore mark all groups
+	 * of lower priority than ourself as busy.
 	 */
-	if (sgs->sum_nr_running && env->dst_cpu < group_first_cpu(sg)) {
+	if (sgs->sum_nr_running &&
+	    sched_asym_prefer(env->dst_cpu, sg->asym_prefer_cpu)) {
 		if (!sds->busiest)
 			return true;
 
-		/* Prefer to move from highest possible cpu's work */
-		if (group_first_cpu(sds->busiest) < group_first_cpu(sg))
+		/* Prefer to move from lowest priority cpu's work */
+		if (sched_asym_prefer(sds->busiest->asym_prefer_cpu,
+				      sg->asym_prefer_cpu))
 			return true;
 	}
 
@@ -7022,8 +7034,8 @@ static int check_asym_packing(struct lb_env *env, struct sd_lb_stats *sds)
 	if (!sds->busiest)
 		return 0;
 
-	busiest_cpu = group_first_cpu(sds->busiest);
-	if (env->dst_cpu > busiest_cpu)
+	busiest_cpu = sds->busiest->asym_prefer_cpu;
+	if (sched_asym_prefer(busiest_cpu, env->dst_cpu))
 		return 0;
 
 	env->imbalance = DIV_ROUND_CLOSEST(
@@ -7364,10 +7376,11 @@ static int need_active_balance(struct lb_env *env)
 
 		/*
 		 * ASYM_PACKING needs to force migrate tasks from busy but
-		 * higher numbered CPUs in order to pack all tasks in the
-		 * lowest numbered CPUs.
+		 * lower priority CPUs in order to pack all tasks in the
+		 * highest priority CPUs.
 		 */
-		if ((sd->flags & SD_ASYM_PACKING) && env->src_cpu > env->dst_cpu)
+		if ((sd->flags & SD_ASYM_PACKING) &&
+		    sched_asym_prefer(env->dst_cpu, env->src_cpu))
 			return 1;
 	}
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index b7fc1ce..3f3d04a 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -532,6 +532,11 @@ struct dl_rq {
 
 #ifdef CONFIG_SMP
 
+static inline bool sched_asym_prefer(int a, int b)
+{
+	return arch_asym_cpu_priority(a) > arch_asym_cpu_priority(b);
+}
+
 /*
  * We add the notion of a root-domain which will be used to define per-domain
  * variables. Each exclusive cpuset essentially defines an island domain by
@@ -884,6 +889,7 @@ struct sched_group {
 
 	unsigned int group_weight;
 	struct sched_group_capacity *sgc;
+	int asym_prefer_cpu;		/* cpu of highest priority in group */
 
 	/*
 	 * The CPUs this group covers.
-- 
2.7.4



* [PATCH v4 03/10] x86/topology: Provide topology_num_packages()
  2016-09-21 19:19 [PATCH v4 00/10] Support Intel® Turbo Boost Max Technology 3.0 Srinivas Pandruvada
  2016-09-21 19:19 ` [PATCH v4 01/10] x86/topology: Fix numa in package topology bug Srinivas Pandruvada
  2016-09-21 19:19 ` [PATCH v4 02/10] sched: Extend scheduler's asym packing Srinivas Pandruvada
@ 2016-09-21 19:19 ` Srinivas Pandruvada
  2016-09-21 19:19 ` [PATCH v4 04/10] x86/topology: Define x86's arch_update_cpu_topology Srinivas Pandruvada
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 24+ messages in thread
From: Srinivas Pandruvada @ 2016-09-21 19:19 UTC (permalink / raw)
  To: rjw, tglx, mingo, bp
  Cc: x86, linux-pm, linux-kernel, linux-acpi, peterz, tim.c.chen,
	jolsa, Srinivas Pandruvada

From: Tim Chen <tim.c.chen@linux.intel.com>

Return the number of CPU packages discovered.

This information is needed to determine the size of the platform and to
decide whether the Intel Turbo Boost Max Technology 3.0 (ITMT) feature
should be turned on by default.  The ITMT feature is most effective on
single-socket, client-like systems that use a small number of cores most
of the time.

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
---
 arch/x86/include/asm/topology.h | 3 +++
 arch/x86/kernel/smpboot.c       | 5 +++++
 2 files changed, 8 insertions(+)

diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index cf75871..3e95dfc 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -129,10 +129,13 @@ static inline int topology_max_smt_threads(void)
 }
 
 int topology_update_package_map(unsigned int apicid, unsigned int cpu);
+extern int topology_num_packages(void);
 extern int topology_phys_to_logical_pkg(unsigned int pkg);
 #else
 #define topology_max_packages()			(1)
 static inline int
+topology_num_packages(void) { return 1; }
+static inline int
 topology_update_package_map(unsigned int apicid, unsigned int cpu) { return 0; }
 static inline int topology_phys_to_logical_pkg(unsigned int pkg) { return 0; }
 static inline int topology_max_smt_threads(void) { return 1; }
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 7137ec4..6a763a2 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -295,6 +295,11 @@ found:
 	return 0;
 }
 
+int topology_num_packages(void)
+{
+	return logical_packages;
+}
+
 /**
  * topology_phys_to_logical_pkg - Map a physical package id to a logical
  *
-- 
2.7.4


* [PATCH v4 04/10] x86/topology: Define x86's arch_update_cpu_topology
  2016-09-21 19:19 [PATCH v4 00/10] Support Intel® Turbo Boost Max Technology 3.0 Srinivas Pandruvada
                   ` (2 preceding siblings ...)
  2016-09-21 19:19 ` [PATCH v4 03/10] x86/topology: Provide topology_num_packages() Srinivas Pandruvada
@ 2016-09-21 19:19 ` Srinivas Pandruvada
  2016-09-21 19:19 ` [PATCH v4 05/10] x86: Enable Intel Turbo Boost Max Technology 3.0 Srinivas Pandruvada
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 24+ messages in thread
From: Srinivas Pandruvada @ 2016-09-21 19:19 UTC (permalink / raw)
  To: rjw, tglx, mingo, bp
  Cc: x86, linux-pm, linux-kernel, linux-acpi, peterz, tim.c.chen,
	jolsa, Srinivas Pandruvada

From: Tim Chen <tim.c.chen@linux.intel.com>

The scheduler calls arch_update_cpu_topology() to check whether the
scheduler domains have to be rebuilt.

So far x86 has no requirement for this, but the upcoming ITMT support
makes this necessary.

Request the rebuild when the x86 internal update flag is set.

Suggested-by: Morten Rasmussen <morten.rasmussen@arm.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
---
 arch/x86/include/asm/topology.h |  1 +
 arch/x86/kernel/smpboot.c       | 11 +++++++++++
 2 files changed, 12 insertions(+)

diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index 3e95dfc..323f61f 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -149,4 +149,5 @@ struct pci_bus;
 int x86_pci_root_bus_node(int bus);
 void x86_pci_root_bus_resources(int bus, struct list_head *resources);
 
+extern bool x86_topology_update;
 #endif /* _ASM_X86_TOPOLOGY_H */
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 6a763a2..38901b3 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -109,6 +109,17 @@ static bool logical_packages_frozen __read_mostly;
 /* Maximum number of SMT threads on any online core */
 int __max_smt_threads __read_mostly;
 
+/* Flag to indicate if a complete sched domain rebuild is required */
+bool x86_topology_update;
+
+int arch_update_cpu_topology(void)
+{
+	int retval = x86_topology_update;
+
+	x86_topology_update = false;
+	return retval;
+}
+
 static inline void smpboot_setup_warm_reset_vector(unsigned long start_eip)
 {
 	unsigned long flags;
-- 
2.7.4


* [PATCH v4 05/10] x86: Enable Intel Turbo Boost Max Technology 3.0
  2016-09-21 19:19 [PATCH v4 00/10] Support Intel® Turbo Boost Max Technology 3.0 Srinivas Pandruvada
                   ` (3 preceding siblings ...)
  2016-09-21 19:19 ` [PATCH v4 04/10] x86/topology: Define x86's arch_update_cpu_topology Srinivas Pandruvada
@ 2016-09-21 19:19 ` Srinivas Pandruvada
  2016-09-21 19:19 ` [PATCH v4 06/10] x86/sysctl: Add sysctl for ITMT scheduling feature Srinivas Pandruvada
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 24+ messages in thread
From: Srinivas Pandruvada @ 2016-09-21 19:19 UTC (permalink / raw)
  To: rjw, tglx, mingo, bp
  Cc: x86, linux-pm, linux-kernel, linux-acpi, peterz, tim.c.chen,
	jolsa, Srinivas Pandruvada

From: Tim Chen <tim.c.chen@linux.intel.com>

On platforms supporting Intel Turbo Boost Max Technology 3.0, the maximum
turbo frequencies of some cores in a CPU package may be higher than for
the other cores in the same package.  In that case, better performance
(and possibly lower energy consumption as well) can be achieved by
making the scheduler prefer to run tasks on the CPUs with higher max
turbo frequencies.

To that end, set up a core priority metric to abstract the core
preferences based on the maximum turbo frequency.  In that metric,
the cores with higher maximum turbo frequencies are higher-priority
than the other cores in the same package and that causes the scheduler
to favor them when making load-balancing decisions using the asymmetric
packing approach.  At the same time, the priority of SMT threads with a
higher CPU number is reduced so as to avoid scheduling tasks on all of
the threads that belong to a favored core before all of the other cores
have been given a task to run.

The priority metric will be initialized by the P-state driver with the
help of the sched_set_itmt_core_prio() function.  The P-state driver
will also determine whether or not ITMT is supported by the platform
and will call sched_set_itmt_support() to indicate that.

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
---
 arch/x86/Kconfig                |  9 ++++
 arch/x86/include/asm/topology.h | 22 ++++++++++
 arch/x86/kernel/Makefile        |  1 +
 arch/x86/kernel/itmt.c          | 91 +++++++++++++++++++++++++++++++++++++++++
 4 files changed, 123 insertions(+)
 create mode 100644 arch/x86/kernel/itmt.c

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 2a1f0ce..6dfb97d 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -927,6 +927,15 @@ config SCHED_MC
 	  making when dealing with multi-core CPU chips at a cost of slightly
 	  increased overhead in some places. If unsure say N here.
 
+config SCHED_ITMT
+	bool "Intel Turbo Boost Max Technology (ITMT) scheduler support"
+	depends on SCHED_MC && CPU_SUP_INTEL && X86_INTEL_PSTATE
+	---help---
+	  ITMT enabled scheduler support improves the CPU scheduler's decision
+	  to move tasks to cpu core that can be boosted to a higher frequency
+	  than others. It will have better performance at a cost of slightly
+	  increased overhead in task migrations. If unsure say N here.
+
 source "kernel/Kconfig.preempt"
 
 config UP_LATE_INIT
diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index 323f61f..637d847 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -150,4 +150,26 @@ int x86_pci_root_bus_node(int bus);
 void x86_pci_root_bus_resources(int bus, struct list_head *resources);
 
 extern bool x86_topology_update;
+
+#ifdef CONFIG_SCHED_ITMT
+#include <asm/percpu.h>
+
+DECLARE_PER_CPU_READ_MOSTLY(int, sched_core_priority);
+
+/* Interface to set priority of a cpu */
+void sched_set_itmt_core_prio(int prio, int core_cpu);
+
+/* Interface to notify scheduler that system supports ITMT */
+void sched_set_itmt_support(bool itmt_supported);
+
+#else /* CONFIG_SCHED_ITMT */
+
+static inline void sched_set_itmt_core_prio(int prio, int core_cpu)
+{
+}
+static inline void sched_set_itmt_support(bool itmt_supported)
+{
+}
+#endif /* CONFIG_SCHED_ITMT */
+
 #endif /* _ASM_X86_TOPOLOGY_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 0503f5b..2008335 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -124,6 +124,7 @@ obj-$(CONFIG_EFI)			+= sysfb_efi.o
 
 obj-$(CONFIG_PERF_EVENTS)		+= perf_regs.o
 obj-$(CONFIG_TRACING)			+= tracepoint.o
+obj-$(CONFIG_SCHED_ITMT)		+= itmt.o
 
 ###
 # 64 bit specific files
diff --git a/arch/x86/kernel/itmt.c b/arch/x86/kernel/itmt.c
new file mode 100644
index 0000000..3e1636b
--- /dev/null
+++ b/arch/x86/kernel/itmt.c
@@ -0,0 +1,91 @@
+/*
+ * itmt.c: Support Intel Turbo Boost Max Technology 3.0
+ *
+ * (C) Copyright 2016 Intel Corporation
+ * Author: Tim Chen <tim.c.chen@linux.intel.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; version 2
+ * of the License.
+ *
+ * On platforms supporting Intel Turbo Boost Max Technology 3.0, (ITMT),
+ * the maximum turbo frequencies of some cores in a CPU package may be
+ * higher than for the other cores in the same package.  In that case,
+ * better performance can be achieved by making the scheduler prefer
+ * to run tasks on the CPUs with higher max turbo frequencies.
+ *
+ * This file provides functions and data structures for enabling the
+ * scheduler to favor scheduling on cores that can be boosted to a higher
+ * frequency under ITMT.
+ */
+
+#include <linux/sched.h>
+#include <linux/cpumask.h>
+#include <linux/cpuset.h>
+#include <asm/mutex.h>
+#include <linux/sched.h>
+#include <linux/sysctl.h>
+#include <linux/nodemask.h>
+
+static DEFINE_MUTEX(itmt_update_mutex);
+
+/* Boolean to track if system has ITMT capabilities */
+static bool __read_mostly sched_itmt_capable;
+
+/**
+ * sched_set_itmt_support - Indicate platform support ITMT
+ * @itmt_supported: indicate platform's CPU has ITMT capability
+ *
+ * This function is used by the OS to indicate to scheduler if the platform
+ * is capable of supporting the ITMT feature.
+ *
+ * The current scheme has the pstate driver detect if the system
+ * is ITMT capable and call this function.
+ *
+ * This must be done only after sched_set_itmt_core_prio
+ * has been called to set the cpus' priorities.
+ */
+void sched_set_itmt_support(bool itmt_supported)
+{
+	mutex_lock(&itmt_update_mutex);
+
+	if (itmt_supported != sched_itmt_capable)
+		sched_itmt_capable = itmt_supported;
+
+	mutex_unlock(&itmt_update_mutex);
+}
+
+DEFINE_PER_CPU_READ_MOSTLY(int, sched_core_priority);
+int arch_asym_cpu_priority(int cpu)
+{
+	return per_cpu(sched_core_priority, cpu);
+}
+
+/**
+ * sched_set_itmt_core_prio - Set CPU priority based on ITMT
+ * @prio: Priority of cpu core
+ * @core_cpu: The cpu number associated with the core
+ *
+ * The pstate driver will find out the max boost frequency
+ * and call this function to set a priority proportional
+ * to the max boost frequency. CPU with higher boost
+ * frequency will receive higher priority.
+ */
+void sched_set_itmt_core_prio(int prio, int core_cpu)
+{
+	int cpu, i = 1;
+
+	for_each_cpu(cpu, topology_sibling_cpumask(core_cpu)) {
+		int smt_prio;
+
+		/*
+		 * Ensure that the siblings are moved to the end
+		 * of the priority chain and only used when
+		 * all other high priority cpus are out of capacity.
+		 */
+		smt_prio = prio * smp_num_siblings / i;
+		i++;
+		per_cpu(sched_core_priority, cpu) = smt_prio;
+	}
+}
-- 
2.7.4



* [PATCH v4 06/10] x86/sysctl: Add sysctl for ITMT scheduling feature
  2016-09-21 19:19 [PATCH v4 00/10] Support Intel® Turbo Boost Max Technology 3.0 Srinivas Pandruvada
                   ` (4 preceding siblings ...)
  2016-09-21 19:19 ` [PATCH v4 05/10] x86: Enable Intel Turbo Boost Max Technology 3.0 Srinivas Pandruvada
@ 2016-09-21 19:19 ` Srinivas Pandruvada
  2016-09-21 19:19 ` [PATCH v4 07/10] x86/sched: Add SD_ASYM_PACKING flags to x86 ITMT CPU Srinivas Pandruvada
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 24+ messages in thread
From: Srinivas Pandruvada @ 2016-09-21 19:19 UTC (permalink / raw)
  To: rjw, tglx, mingo, bp
  Cc: x86, linux-pm, linux-kernel, linux-acpi, peterz, tim.c.chen,
	jolsa, Srinivas Pandruvada

From: Tim Chen <tim.c.chen@linux.intel.com>

The Intel Turbo Boost Max Technology 3.0 (ITMT) feature
allows some cores to be boosted to a higher turbo
frequency than others.

Add /proc/sys/kernel/sched_itmt_enabled so the operator
can enable/disable scheduling of tasks to favor cores
with higher turbo boost frequency potential.

By default, this feature is turned on for ITMT-capable
single-socket systems.  Such systems are more likely
to be lightly loaded and to operate in the turbo range.

When a change in the desired ITMT scheduling operation
occurs, a rebuild of the sched domains is initiated so
the scheduler can set up the sched domains with the
appropriate flags to enable/disable ITMT scheduling.

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
---
 arch/x86/include/asm/topology.h |  1 +
 arch/x86/kernel/itmt.c          | 98 ++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 97 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index 637d847..78e56f0 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -155,6 +155,7 @@ extern bool x86_topology_update;
 #include <asm/percpu.h>
 
 DECLARE_PER_CPU_READ_MOSTLY(int, sched_core_priority);
+extern unsigned int __read_mostly sysctl_sched_itmt_enabled;
 
 /* Interface to set priority of a cpu */
 void sched_set_itmt_core_prio(int prio, int core_cpu);
diff --git a/arch/x86/kernel/itmt.c b/arch/x86/kernel/itmt.c
index 3e1636b..e7c96fe 100644
--- a/arch/x86/kernel/itmt.c
+++ b/arch/x86/kernel/itmt.c
@@ -33,6 +33,67 @@ static DEFINE_MUTEX(itmt_update_mutex);
 /* Boolean to track if system has ITMT capabilities */
 static bool __read_mostly sched_itmt_capable;
 
+/*
+ * Boolean to control whether we want to move processes to cpu capable
+ * of higher turbo frequency for cpus supporting Intel Turbo Boost Max
+ * Technology 3.0.
+ *
+ * It can be set via /proc/sys/kernel/sched_itmt_enabled
+ */
+unsigned int __read_mostly sysctl_sched_itmt_enabled;
+
+static int sched_itmt_update_handler(struct ctl_table *table, int write,
+			      void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+	int ret;
+	unsigned int old_sysctl;
+
+	mutex_lock(&itmt_update_mutex);
+
+	if (!sched_itmt_capable) {
+		mutex_unlock(&itmt_update_mutex);
+		return 0;
+	}
+
+	old_sysctl = sysctl_sched_itmt_enabled;
+	ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
+
+	if (!ret && write && old_sysctl != sysctl_sched_itmt_enabled) {
+		x86_topology_update = true;
+		rebuild_sched_domains();
+	}
+
+	mutex_unlock(&itmt_update_mutex);
+
+	return ret;
+}
+
+static unsigned int zero;
+static unsigned int one = 1;
+static struct ctl_table itmt_kern_table[] = {
+	{
+		.procname	= "sched_itmt_enabled",
+		.data		= &sysctl_sched_itmt_enabled,
+		.maxlen		= sizeof(unsigned int),
+		.mode		= 0644,
+		.proc_handler	= sched_itmt_update_handler,
+		.extra1		= &zero,
+		.extra2		= &one,
+	},
+	{}
+};
+
+static struct ctl_table itmt_root_table[] = {
+	{
+		.procname	= "kernel",
+		.mode		= 0555,
+		.child		= itmt_kern_table,
+	},
+	{}
+};
+
+static struct ctl_table_header *itmt_sysctl_header;
+
 /**
  * sched_set_itmt_support - Indicate platform support ITMT
  * @itmt_supported: indicate platform's CPU has ITMT capability
@@ -45,13 +106,46 @@ static bool __read_mostly sched_itmt_capable;
  *
  * This must be done only after sched_set_itmt_core_prio
  * has been called to set the cpus' priorities.
+ *
+ * It must not be called with the CPU hotplug lock held, because
+ * we need to acquire that lock later in order to rebuild the
+ * sched domains.
  */
 void sched_set_itmt_support(bool itmt_supported)
 {
 	mutex_lock(&itmt_update_mutex);
 
-	if (itmt_supported != sched_itmt_capable)
-		sched_itmt_capable = itmt_supported;
+	if (itmt_supported == sched_itmt_capable) {
+		mutex_unlock(&itmt_update_mutex);
+		return;
+	}
+	sched_itmt_capable = itmt_supported;
+
+	if (itmt_supported) {
+		itmt_sysctl_header =
+			register_sysctl_table(itmt_root_table);
+		if (!itmt_sysctl_header) {
+			mutex_unlock(&itmt_update_mutex);
+			return;
+		}
+		/*
+		 * ITMT capability automatically enables ITMT
+		 * scheduling on small systems (single package).
+		 */
+		if (topology_num_packages() == 1)
+			sysctl_sched_itmt_enabled = 1;
+	} else {
+		if (itmt_sysctl_header)
+			unregister_sysctl_table(itmt_sysctl_header);
+	}
+
+	if (sysctl_sched_itmt_enabled) {
+		/* disable sched_itmt if we are no longer ITMT capable */
+		if (!itmt_supported)
+			sysctl_sched_itmt_enabled = 0;
+		x86_topology_update = true;
+		rebuild_sched_domains();
+	}
 
 	mutex_unlock(&itmt_update_mutex);
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v4 07/10] x86/sched: Add SD_ASYM_PACKING flags to x86 ITMT CPU
  2016-09-21 19:19 [PATCH v4 00/10] Support Intel® Turbo Boost Max Technology 3.0 Srinivas Pandruvada
                   ` (5 preceding siblings ...)
  2016-09-21 19:19 ` [PATCH v4 06/10] x86/sysctl: Add sysctl for ITMT scheduling feature Srinivas Pandruvada
@ 2016-09-21 19:19 ` Srinivas Pandruvada
  2016-09-21 19:58     ` kbuild test robot
                     ` (2 more replies)
  2016-09-21 19:19 ` [PATCH v4 08/10] acpi: bus: Enable HWP CPPC objects Srinivas Pandruvada
                   ` (2 subsequent siblings)
  9 siblings, 3 replies; 24+ messages in thread
From: Srinivas Pandruvada @ 2016-09-21 19:19 UTC (permalink / raw)
  To: rjw, tglx, mingo, bp
  Cc: x86, linux-pm, linux-kernel, linux-acpi, peterz, tim.c.chen,
	jolsa, Srinivas Pandruvada

From: Tim Chen <tim.c.chen@linux.intel.com>

With Intel Turbo Boost Max Technology 3.0 (ITMT), some cores in a
package can be boosted to a higher turbo frequency than others. The
scheduler can use the asymmetric packing feature to move tasks to the
more capable cores.

If ITMT is enabled, add the SD_ASYM_PACKING flag to the SMT and MC
sched domains to enable asymmetric packing.

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
---
 arch/x86/kernel/smpboot.c | 27 +++++++++++++++++++++++----
 1 file changed, 23 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 38901b3..46815e6 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -487,22 +487,41 @@ static bool match_die(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
 	return false;
 }
 
+#ifndef CONFIG_SCHED_ITMT
+#define sysctl_sched_itmt_enabled	0
+#endif
+
+static inline int x86_sched_itmt_flags(void)
+{
+	return sysctl_sched_itmt_enabled ? SD_ASYM_PACKING : 0;
+}
+
+static int x86_core_flags(void)
+{
+	return cpu_core_flags() | x86_sched_itmt_flags();
+}
+
+static int x86_smt_flags(void)
+{
+	return cpu_smt_flags() | x86_sched_itmt_flags();
+}
+
 static struct sched_domain_topology_level x86_numa_in_package_topology[] = {
 #ifdef CONFIG_SCHED_SMT
-	{ cpu_smt_mask, cpu_smt_flags, SD_INIT_NAME(SMT) },
+	{ cpu_smt_mask, x86_smt_flags, SD_INIT_NAME(SMT) },
 #endif
 #ifdef CONFIG_SCHED_MC
-	{ cpu_coregroup_mask, cpu_core_flags, SD_INIT_NAME(MC) },
+	{ cpu_coregroup_mask, x86_core_flags, SD_INIT_NAME(MC) },
 #endif
 	{ NULL, },
 };
 
 static struct sched_domain_topology_level x86_topology[] = {
 #ifdef CONFIG_SCHED_SMT
-	{ cpu_smt_mask, cpu_smt_flags, SD_INIT_NAME(SMT) },
+	{ cpu_smt_mask, x86_smt_flags, SD_INIT_NAME(SMT) },
 #endif
 #ifdef CONFIG_SCHED_MC
-	{ cpu_coregroup_mask, cpu_core_flags, SD_INIT_NAME(MC) },
+	{ cpu_coregroup_mask, x86_core_flags, SD_INIT_NAME(MC) },
 #endif
 	{ cpu_cpu_mask, SD_INIT_NAME(DIE) },
 	{ NULL, },
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v4 08/10] acpi: bus: Enable HWP CPPC objects
  2016-09-21 19:19 [PATCH v4 00/10] Support Intel® Turbo Boost Max Technology 3.0 Srinivas Pandruvada
                   ` (6 preceding siblings ...)
  2016-09-21 19:19 ` [PATCH v4 07/10] x86/sched: Add SD_ASYM_PACKING flags to x86 ITMT CPU Srinivas Pandruvada
@ 2016-09-21 19:19 ` Srinivas Pandruvada
  2016-09-21 19:19 ` [PATCH v4 09/10] acpi: bus: Set _OSC for diverse core support Srinivas Pandruvada
  2016-09-21 19:19 ` [PATCH v4 10/10] cpufreq: intel_pstate: Use CPPC to get max performance Srinivas Pandruvada
  9 siblings, 0 replies; 24+ messages in thread
From: Srinivas Pandruvada @ 2016-09-21 19:19 UTC (permalink / raw)
  To: rjw, tglx, mingo, bp
  Cc: x86, linux-pm, linux-kernel, linux-acpi, peterz, tim.c.chen,
	jolsa, Srinivas Pandruvada

Set the platform-wide _OSC bits to indicate OS support for CPPC and
CPPC version 2. If the platform supports CPPC, the BIOS then exposes
the CPPC tables.

The primary reason to enable CPPC support is to read the maximum
performance of each CPU, which is needed to detect and enable Intel
Turbo Boost Max Technology 3.0 (ITMT).

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
---
 drivers/acpi/bus.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
index 85b7d07..61643a5 100644
--- a/drivers/acpi/bus.c
+++ b/drivers/acpi/bus.c
@@ -330,6 +330,13 @@ static void acpi_bus_osc_support(void)
 	capbuf[OSC_SUPPORT_DWORD] |= OSC_SB_HOTPLUG_OST_SUPPORT;
 	capbuf[OSC_SUPPORT_DWORD] |= OSC_SB_PCLPI_SUPPORT;
 
+#ifdef CONFIG_X86
+	if (boot_cpu_has(X86_FEATURE_HWP)) {
+		capbuf[OSC_SUPPORT_DWORD] |= OSC_SB_CPC_SUPPORT;
+		capbuf[OSC_SUPPORT_DWORD] |= OSC_SB_CPCV2_SUPPORT;
+	}
+#endif
+
 	if (!ghes_disable)
 		capbuf[OSC_SUPPORT_DWORD] |= OSC_SB_APEI_SUPPORT;
 	if (ACPI_FAILURE(acpi_get_handle(NULL, "\\_SB", &handle)))
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v4 09/10] acpi: bus: Set _OSC for diverse core support
  2016-09-21 19:19 [PATCH v4 00/10] Support Intel® Turbo Boost Max Technology 3.0 Srinivas Pandruvada
                   ` (7 preceding siblings ...)
  2016-09-21 19:19 ` [PATCH v4 08/10] acpi: bus: Enable HWP CPPC objects Srinivas Pandruvada
@ 2016-09-21 19:19 ` Srinivas Pandruvada
  2016-09-21 19:19 ` [PATCH v4 10/10] cpufreq: intel_pstate: Use CPPC to get max performance Srinivas Pandruvada
  9 siblings, 0 replies; 24+ messages in thread
From: Srinivas Pandruvada @ 2016-09-21 19:19 UTC (permalink / raw)
  To: rjw, tglx, mingo, bp
  Cc: x86, linux-pm, linux-kernel, linux-acpi, peterz, tim.c.chen,
	jolsa, Srinivas Pandruvada

Set OSC_SB_CPC_DIVERSE_HIGH_SUPPORT (bit 12) to enable diverse core
support.

This is required to inform the BIOS that the OS supports the Intel
Turbo Boost Max Technology 3.0 feature.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
---
 drivers/acpi/bus.c   | 3 +++
 include/linux/acpi.h | 1 +
 2 files changed, 4 insertions(+)

diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
index 61643a5..8ab6ec2 100644
--- a/drivers/acpi/bus.c
+++ b/drivers/acpi/bus.c
@@ -337,6 +337,9 @@ static void acpi_bus_osc_support(void)
 	}
 #endif
 
+	if (IS_ENABLED(CONFIG_SCHED_ITMT))
+		capbuf[OSC_SUPPORT_DWORD] |= OSC_SB_CPC_DIVERSE_HIGH_SUPPORT;
+
 	if (!ghes_disable)
 		capbuf[OSC_SUPPORT_DWORD] |= OSC_SB_APEI_SUPPORT;
 	if (ACPI_FAILURE(acpi_get_handle(NULL, "\\_SB", &handle)))
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index e746552..53841a2 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -462,6 +462,7 @@ acpi_status acpi_run_osc(acpi_handle handle, struct acpi_osc_context *context);
 #define OSC_SB_CPCV2_SUPPORT			0x00000040
 #define OSC_SB_PCLPI_SUPPORT			0x00000080
 #define OSC_SB_OSLPI_SUPPORT			0x00000100
+#define OSC_SB_CPC_DIVERSE_HIGH_SUPPORT		0x00001000
 
 extern bool osc_sb_apei_support_acked;
 extern bool osc_pc_lpi_support_confirmed;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v4 10/10] cpufreq: intel_pstate: Use CPPC to get max performance
  2016-09-21 19:19 [PATCH v4 00/10] Support Intel® Turbo Boost Max Technology 3.0 Srinivas Pandruvada
                   ` (8 preceding siblings ...)
  2016-09-21 19:19 ` [PATCH v4 09/10] acpi: bus: Set _OSC for diverse core support Srinivas Pandruvada
@ 2016-09-21 19:19 ` Srinivas Pandruvada
  2016-09-21 20:30   ` Rafael J. Wysocki
  9 siblings, 1 reply; 24+ messages in thread
From: Srinivas Pandruvada @ 2016-09-21 19:19 UTC (permalink / raw)
  To: rjw, tglx, mingo, bp
  Cc: x86, linux-pm, linux-kernel, linux-acpi, peterz, tim.c.chen,
	jolsa, Srinivas Pandruvada

This change uses the ACPI cppc_lib interface to get the CPPC performance
limits. Once the CPPC limits of all online cores have been read, check
whether there is any difference in maximum performance between them. If
there is, the scheduler interface is called to update the per-CPU
priorities and enable the ITMT feature.

Here sched_set_itmt_core_prio() is called to set the priorities and
sched_set_itmt_support() is called to enable the ITMT feature.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
---
 drivers/cpufreq/Kconfig.x86    |   1 +
 drivers/cpufreq/intel_pstate.c | 103 ++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 103 insertions(+), 1 deletion(-)

diff --git a/drivers/cpufreq/Kconfig.x86 b/drivers/cpufreq/Kconfig.x86
index adbd1de..3328c6b 100644
--- a/drivers/cpufreq/Kconfig.x86
+++ b/drivers/cpufreq/Kconfig.x86
@@ -6,6 +6,7 @@ config X86_INTEL_PSTATE
        bool "Intel P state control"
        depends on X86
        select ACPI_PROCESSOR if ACPI
+       select ACPI_CPPC_LIB if X86_64 && ACPI
        help
           This driver provides a P state for Intel core processors.
 	  The driver implements an internal governor and will become
diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index c877e70..d226a64 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -44,6 +44,7 @@
 
 #ifdef CONFIG_ACPI
 #include <acpi/processor.h>
+#include <acpi/cppc_acpi.h>
 #endif
 
 #define FRAC_BITS 8
@@ -195,6 +196,7 @@ struct _pid {
  * @sample:		Storage for storing last Sample data
  * @acpi_perf_data:	Stores ACPI perf information read from _PSS
  * @valid_pss_table:	Set to true for valid ACPI _PSS entries found
+ * @cppc_perf:		Stores CPPC performance information
  *
  * This structure stores per CPU instance data for all CPUs.
  */
@@ -218,6 +220,7 @@ struct cpudata {
 #ifdef CONFIG_ACPI
 	struct acpi_processor_performance acpi_perf_data;
 	bool valid_pss_table;
+	struct cppc_perf_caps *cppc_perf;
 #endif
 	unsigned int iowait_boost;
 };
@@ -377,14 +380,105 @@ static bool intel_pstate_get_ppc_enable_status(void)
 	return acpi_ppc;
 }
 
+/* Mask of CPUs for which CPPC data has been read */
+static cpumask_t cppc_read_cpu_mask;
+
+/*
+ * Can't call sched_set_itmt_support() in hotcpu notifier callback path
+ * as this function uses hotplug locks in its path. So call from
+ * a work function.
+ */
+static void intel_pstate_sched_itmt_work_fn(struct work_struct *work)
+{
+	sched_set_itmt_support(true);
+}
+
+static DECLARE_WORK(sched_itmt_work, intel_pstate_sched_itmt_work_fn);
+
+static void intel_pstate_check_and_enable_itmt(int cpu)
+{
+	/*
+	 * To check whether there is any difference in maximum performance
+	 * between CPUs, we need to wait until the cpufreq core has handed
+	 * us CPPC data for all of them. If there is a difference in the
+	 * maximum performance, then we have ITMT support. In that case,
+	 * update the scheduler core priority for each CPU and enable the
+	 * ITMT feature.
+	 */
+	if (cpumask_subset(topology_core_cpumask(cpu), &cppc_read_cpu_mask)) {
+		int cpu_index;
+		int max_prio;
+		struct cpudata *cpu;
+		bool itmt_support = false;
+
+		cpu = all_cpu_data[cpumask_first(&cppc_read_cpu_mask)];
+		max_prio = cpu->cppc_perf->highest_perf;
+		for_each_cpu(cpu_index, &cppc_read_cpu_mask) {
+			cpu = all_cpu_data[cpu_index];
+			if (max_prio != cpu->cppc_perf->highest_perf) {
+				itmt_support = true;
+				break;
+			}
+		}
+
+		if (!itmt_support)
+			return;
+
+		for_each_cpu(cpu_index, &cppc_read_cpu_mask) {
+			cpu = all_cpu_data[cpu_index];
+			sched_set_itmt_core_prio(cpu->cppc_perf->highest_perf,
+						 cpu_index);
+		}
+		/*
+		 * Since this function is in the hotcpu notifier callback
+		 * path, submit a task to workqueue to call
+		 * sched_set_itmt_support().
+		 */
+		schedule_work(&sched_itmt_work);
+	}
+}
+
+/*
+ * Process ACPI CPPC information. Currently it is only used for enabling
+ * the ITMT feature. This driver still uses MSRs to manage HWP, not CPPC.
+ */
+static void intel_pstate_process_acpi_cppc(struct cpufreq_policy *policy)
+{
+	struct cpudata *cpu;
+	int ret;
+
+	cpu = all_cpu_data[policy->cpu];
+	cpu->cppc_perf = kzalloc(sizeof(struct cppc_perf_caps), GFP_KERNEL);
+	if (!cpu->cppc_perf)
+		return;
+
+	ret = cppc_get_perf_caps(policy->cpu, cpu->cppc_perf);
+	if (ret) {
+		kfree(cpu->cppc_perf);
+		cpu->cppc_perf = NULL;
+		return;
+	}
+
+	pr_debug("cpu:%d H:0x%x N:0x%x L:0x%x\n", policy->cpu,
+		 cpu->cppc_perf->highest_perf, cpu->cppc_perf->nominal_perf,
+		 cpu->cppc_perf->lowest_perf);
+
+	/* Mark that the CPPC data for the policy->cpu is read */
+	cpumask_set_cpu(policy->cpu, &cppc_read_cpu_mask);
+
+	intel_pstate_check_and_enable_itmt(policy->cpu);
+}
+
 static void intel_pstate_init_acpi_perf_limits(struct cpufreq_policy *policy)
 {
 	struct cpudata *cpu;
 	int ret;
 	int i;
 
-	if (hwp_active)
+	if (hwp_active) {
+		intel_pstate_process_acpi_cppc(policy);
 		return;
+	}
 
 	if (!intel_pstate_get_ppc_enable_status())
 		return;
@@ -450,6 +544,13 @@ static void intel_pstate_exit_perf_limits(struct cpufreq_policy *policy)
 	struct cpudata *cpu;
 
 	cpu = all_cpu_data[policy->cpu];
+
+	if (cpu->cppc_perf) {
+		cpumask_clear_cpu(policy->cpu, &cppc_read_cpu_mask);
+		kfree(cpu->cppc_perf);
+		cpu->cppc_perf = NULL;
+	}
+
 	if (!cpu->valid_pss_table)
 		return;
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 07/10] x86/sched: Add SD_ASYM_PACKING flags to x86 ITMT CPU
  2016-09-21 19:19 ` [PATCH v4 07/10] x86/sched: Add SD_ASYM_PACKING flags to x86 ITMT CPU Srinivas Pandruvada
@ 2016-09-21 19:58     ` kbuild test robot
  2016-09-21 20:27     ` kbuild test robot
  2016-09-21 20:33   ` Rafael J. Wysocki
  2 siblings, 0 replies; 24+ messages in thread
From: kbuild test robot @ 2016-09-21 19:58 UTC (permalink / raw)
  Cc: kbuild-all, rjw, tglx, mingo, bp, x86, linux-pm, linux-kernel,
	linux-acpi, peterz, tim.c.chen, jolsa, Srinivas Pandruvada

[-- Attachment #1: Type: text/plain, Size: 1941 bytes --]

Hi Tim,

[auto build test ERROR on pm/linux-next]
[also build test ERROR on v4.8-rc7 next-20160921]
[cannot apply to tip/x86/core tip/sched/core]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
[Suggest to use git(>=2.9.0) format-patch --base=<commit> (or --base=auto for convenience) to record what (public, well-known) commit your patch series was built on]
[Check https://git-scm.com/docs/git-format-patch for more information]

url:    https://github.com/0day-ci/linux/commits/Srinivas-Pandruvada/Support-Intel-Turbo-Boost-Max-Technology-3-0/20160922-032652
base:   https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
config: i386-randconfig-x010-201638 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   arch/x86/kernel/smpboot.c: In function 'x86_smt_flags':
>> arch/x86/kernel/smpboot.c:506:9: error: implicit declaration of function 'cpu_smt_flags' [-Werror=implicit-function-declaration]
     return cpu_smt_flags() | x86_sched_itmt_flags();
            ^~~~~~~~~~~~~
   At top level:
   arch/x86/kernel/smpboot.c:504:12: warning: 'x86_smt_flags' defined but not used [-Wunused-function]
    static int x86_smt_flags(void)
               ^~~~~~~~~~~~~
   cc1: some warnings being treated as errors

vim +/cpu_smt_flags +506 arch/x86/kernel/smpboot.c

   500	{
   501		return cpu_core_flags() | x86_sched_itmt_flags();
   502	}
   503	
   504	static int x86_smt_flags(void)
   505	{
 > 506		return cpu_smt_flags() | x86_sched_itmt_flags();
   507	}
   508	
   509	static struct sched_domain_topology_level x86_numa_in_package_topology[] = {

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 20905 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 07/10] x86/sched: Add SD_ASYM_PACKING flags to x86 ITMT CPU
  2016-09-21 19:19 ` [PATCH v4 07/10] x86/sched: Add SD_ASYM_PACKING flags to x86 ITMT CPU Srinivas Pandruvada
@ 2016-09-21 20:27     ` kbuild test robot
  2016-09-21 20:27     ` kbuild test robot
  2016-09-21 20:33   ` Rafael J. Wysocki
  2 siblings, 0 replies; 24+ messages in thread
From: kbuild test robot @ 2016-09-21 20:27 UTC (permalink / raw)
  Cc: kbuild-all, rjw, tglx, mingo, bp, x86, linux-pm, linux-kernel,
	linux-acpi, peterz, tim.c.chen, jolsa, Srinivas Pandruvada

[-- Attachment #1: Type: text/plain, Size: 2349 bytes --]

Hi Tim,

[auto build test ERROR on pm/linux-next]
[also build test ERROR on v4.8-rc7 next-20160921]
[cannot apply to tip/x86/core tip/sched/core]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
[Suggest to use git(>=2.9.0) format-patch --base=<commit> (or --base=auto for convenience) to record what (public, well-known) commit your patch series was built on]
[Check https://git-scm.com/docs/git-format-patch for more information]

url:    https://github.com/0day-ci/linux/commits/Srinivas-Pandruvada/Support-Intel-Turbo-Boost-Max-Technology-3-0/20160922-032652
base:   https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
config: i386-randconfig-x014-201638 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   arch/x86/kernel/smpboot.c: In function 'x86_core_flags':
>> arch/x86/kernel/smpboot.c:501:9: error: implicit declaration of function 'cpu_core_flags' [-Werror=implicit-function-declaration]
     return cpu_core_flags() | x86_sched_itmt_flags();
            ^~~~~~~~~~~~~~
   arch/x86/kernel/smpboot.c: In function 'x86_smt_flags':
   arch/x86/kernel/smpboot.c:506:9: error: implicit declaration of function 'cpu_smt_flags' [-Werror=implicit-function-declaration]
     return cpu_smt_flags() | x86_sched_itmt_flags();
            ^~~~~~~~~~~~~
   At top level:
   arch/x86/kernel/smpboot.c:504:12: warning: 'x86_smt_flags' defined but not used [-Wunused-function]
    static int x86_smt_flags(void)
               ^~~~~~~~~~~~~
   arch/x86/kernel/smpboot.c:499:12: warning: 'x86_core_flags' defined but not used [-Wunused-function]
    static int x86_core_flags(void)
               ^~~~~~~~~~~~~~
   cc1: some warnings being treated as errors

vim +/cpu_core_flags +501 arch/x86/kernel/smpboot.c

   495	{
   496		return sysctl_sched_itmt_enabled ? SD_ASYM_PACKING : 0;
   497	}
   498	
   499	static int x86_core_flags(void)
   500	{
 > 501		return cpu_core_flags() | x86_sched_itmt_flags();
   502	}
   503	
   504	static int x86_smt_flags(void)

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 24578 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 10/10] cpufreq: intel_pstate: Use CPPC to get max performance
  2016-09-21 19:19 ` [PATCH v4 10/10] cpufreq: intel_pstate: Use CPPC to get max performance Srinivas Pandruvada
@ 2016-09-21 20:30   ` Rafael J. Wysocki
  2016-09-22 18:50     ` Tim Chen
  0 siblings, 1 reply; 24+ messages in thread
From: Rafael J. Wysocki @ 2016-09-21 20:30 UTC (permalink / raw)
  To: Srinivas Pandruvada
  Cc: Rafael J. Wysocki, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	the arch/x86 maintainers, Linux PM, Linux Kernel Mailing List,
	ACPI Devel Maling List, Peter Zijlstra, tim.c.chen, jolsa

On Wed, Sep 21, 2016 at 9:19 PM, Srinivas Pandruvada
<srinivas.pandruvada@linux.intel.com> wrote:
> This change uses the ACPI cppc_lib interface to get the CPPC performance
> limits. Once the CPPC limits of all online cores have been read, check
> whether there is any difference in maximum performance between them. If
> there is, the scheduler interface is called to update the per-CPU
> priorities and enable the ITMT feature.
>
> Here sched_set_itmt_core_prio() is called to set the priorities and
> sched_set_itmt_support() is called to enable the ITMT feature.
>
> Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> ---
>  drivers/cpufreq/Kconfig.x86    |   1 +
>  drivers/cpufreq/intel_pstate.c | 103 ++++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 103 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/cpufreq/Kconfig.x86 b/drivers/cpufreq/Kconfig.x86
> index adbd1de..3328c6b 100644
> --- a/drivers/cpufreq/Kconfig.x86
> +++ b/drivers/cpufreq/Kconfig.x86
> @@ -6,6 +6,7 @@ config X86_INTEL_PSTATE
>         bool "Intel P state control"
>         depends on X86
>         select ACPI_PROCESSOR if ACPI
> +       select ACPI_CPPC_LIB if X86_64 && ACPI

Do we need to select CPPC here if SCHED_ITMT is unset?

>         help
>            This driver provides a P state for Intel core processors.
>           The driver implements an internal governor and will become
> diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
> index c877e70..d226a64 100644
> --- a/drivers/cpufreq/intel_pstate.c
> +++ b/drivers/cpufreq/intel_pstate.c
> @@ -44,6 +44,7 @@
>
>  #ifdef CONFIG_ACPI
>  #include <acpi/processor.h>
> +#include <acpi/cppc_acpi.h>
>  #endif
>
>  #define FRAC_BITS 8
> @@ -195,6 +196,7 @@ struct _pid {
>   * @sample:            Storage for storing last Sample data
>   * @acpi_perf_data:    Stores ACPI perf information read from _PSS
>   * @valid_pss_table:   Set to true for valid ACPI _PSS entries found
> + * @cppc_perf:         Stores CPPC performance information
>   *
>   * This structure stores per CPU instance data for all CPUs.
>   */
> @@ -218,6 +220,7 @@ struct cpudata {
>  #ifdef CONFIG_ACPI
>         struct acpi_processor_performance acpi_perf_data;
>         bool valid_pss_table;
> +       struct cppc_perf_caps *cppc_perf;
>  #endif
>         unsigned int iowait_boost;
>  };
> @@ -377,14 +380,105 @@ static bool intel_pstate_get_ppc_enable_status(void)
>         return acpi_ppc;
>  }
>

The new code below is only useful if CONFIG_SCHED_ITMT is set, so
maybe it's better to put it into a #ifdef block?

> +/* Mask of CPUs for which CPPC data has been read */
> +static cpumask_t cppc_read_cpu_mask;
> +
> +/*
> + * Can't call sched_set_itmt_support() in hotcpu notifier callback path
> + * as this function uses hotplug locks in its path. So call from
> + * a work function.
> + */
> +static void intel_pstate_sched_itmt_work_fn(struct work_struct *work)
> +{
> +       sched_set_itmt_support(true);
> +}
> +
> +static DECLARE_WORK(sched_itmt_work, intel_pstate_sched_itmt_work_fn);
> +
> +static void intel_pstate_check_and_enable_itmt(int cpu)
> +{
> +       /*
> +        * To check whether there is any difference in maximum performance
> +        * between CPUs, we need to wait until the cpufreq core has handed
> +        * us CPPC data for all of them. If there is a difference in the
> +        * maximum performance, then we have ITMT support. In that case,
> +        * update the scheduler core priority for each CPU and enable the
> +        * ITMT feature.
> +        */
> +       if (cpumask_subset(topology_core_cpumask(cpu), &cppc_read_cpu_mask)) {
> +               int cpu_index;
> +               int max_prio;
> +               struct cpudata *cpu;
> +               bool itmt_support = false;
> +
> +               cpu = all_cpu_data[cpumask_first(&cppc_read_cpu_mask)];
> +               max_prio = cpu->cppc_perf->highest_perf;
> +               for_each_cpu(cpu_index, &cppc_read_cpu_mask) {
> +                       cpu = all_cpu_data[cpu_index];
> +                       if (max_prio != cpu->cppc_perf->highest_perf) {
> +                               itmt_support = true;
> +                               break;
> +                       }
> +               }
> +
> +               if (!itmt_support)
> +                       return;
> +
> +               for_each_cpu(cpu_index, &cppc_read_cpu_mask) {
> +                       cpu = all_cpu_data[cpu_index];
> +                       sched_set_itmt_core_prio(cpu->cppc_perf->highest_perf,
> +                                                cpu_index);
> +               }

My current understanding is that we need to rebuild sched domains
after setting the priorities, so what if there are two CPU packages
and there are highest_perf differences in both, and we first enumerate
the first package entirely before getting to the second one?

In that case we'll schedule the work item after enumerating the first
package and it may rebuild the sched domains before all priorities are
set for the second package, may it not?

This seems to require some more consideration.

> +               /*
> +                * Since this function is in the hotcpu notifier callback
> +                * path, submit a task to workqueue to call
> +                * sched_set_itmt_support().
> +                */
> +               schedule_work(&sched_itmt_work);

It doesn't make sense to do this more than once IMO and what if we
attempt to schedule the work item again when it has been scheduled
once already?  Don't we need any protection here?

> +       }
> +}
> +
> +/*
> + * Process ACPI CPPC information. Currently it is only used to for enabling
> + * ITMT feature. This driver still uses MSRs to manage HWP, not CPPC.
> + */
> +static void intel_pstate_process_acpi_cppc(struct cpufreq_policy *policy)
> +{
> +       struct cpudata *cpu;
> +       int ret;
> +
> +       cpu = all_cpu_data[policy->cpu];
> +       cpu->cppc_perf = kzalloc(sizeof(struct cppc_perf_caps), GFP_KERNEL);
> +       if (!cpu->cppc_perf)
> +               return;
> +
> +       ret = cppc_get_perf_caps(policy->cpu, cpu->cppc_perf);
> +       if (ret) {
> +               kfree(cpu->cppc_perf);
> +               cpu->cppc_perf = NULL;
> +               return;
> +       }
> +
> +       pr_debug("cpu:%d H:0x%x N:0x%x L:0x%x\n", policy->cpu,
> +                cpu->cppc_perf->highest_perf, cpu->cppc_perf->nominal_perf,
> +                cpu->cppc_perf->lowest_perf);
> +
> +       /* Mark that the CPPC data for the policy->cpu is read */
> +       cpumask_set_cpu(policy->cpu, &cppc_read_cpu_mask);
> +
> +       intel_pstate_check_and_enable_itmt(policy->cpu);
> +}
> +
>  static void intel_pstate_init_acpi_perf_limits(struct cpufreq_policy *policy)
>  {
>         struct cpudata *cpu;
>         int ret;
>         int i;
>
> -       if (hwp_active)
> +       if (hwp_active) {
> +               intel_pstate_process_acpi_cppc(policy);
>                 return;
> +       }
>
>         if (!intel_pstate_get_ppc_enable_status())
>                 return;
> @@ -450,6 +544,13 @@ static void intel_pstate_exit_perf_limits(struct cpufreq_policy *policy)
>         struct cpudata *cpu;
>
>         cpu = all_cpu_data[policy->cpu];
> +
> +       if (cpu->cppc_perf) {
> +               cpumask_clear_cpu(policy->cpu, &cppc_read_cpu_mask);
> +               kfree(cpu->cppc_perf);
> +               cpu->cppc_perf = NULL;
> +       }
> +
>         if (!cpu->valid_pss_table)
>                 return;
>
> --

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v4 07/10] x86/sched: Add SD_ASYM_PACKING flags to x86 ITMT CPU
  2016-09-21 19:19 ` [PATCH v4 07/10] x86/sched: Add SD_ASYM_PACKING flags to x86 ITMT CPU Srinivas Pandruvada
  2016-09-21 19:58     ` kbuild test robot
  2016-09-21 20:27     ` kbuild test robot
@ 2016-09-21 20:33   ` Rafael J. Wysocki
  2016-09-22 19:40     ` Tim Chen
  2 siblings, 1 reply; 24+ messages in thread
From: Rafael J. Wysocki @ 2016-09-21 20:33 UTC (permalink / raw)
  To: Srinivas Pandruvada
  Cc: Rafael J. Wysocki, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	the arch/x86 maintainers, Linux PM, Linux Kernel Mailing List,
	ACPI Devel Mailing List, Peter Zijlstra, tim.c.chen, jolsa

On Wed, Sep 21, 2016 at 9:19 PM, Srinivas Pandruvada
<srinivas.pandruvada@linux.intel.com> wrote:
> From: Tim Chen <tim.c.chen@linux.intel.com>
>
> Some Intel cores in a package can be boosted to a higher turbo frequency
> with ITMT 3.0 technology. The scheduler can use the asymmetric packing
> feature to move tasks to the more capable cores.
>
> If ITMT is enabled, add SD_ASYM_PACKING flag to the thread and core
> sched domains to enable asymmetric packing.
>
> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> ---
>  arch/x86/kernel/smpboot.c | 27 +++++++++++++++++++++++----
>  1 file changed, 23 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index 38901b3..46815e6 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -487,22 +487,41 @@ static bool match_die(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
>         return false;
>  }
>
> +#ifndef CONFIG_SCHED_ITMT
> +#define sysctl_sched_itmt_enabled      0
> +#endif

I thought that would be done in the header where
sysctl_sched_itmt_enabled is declared (along with defining the stubs
for the sched_set_itmt_* functions).

Thanks,
Rafael


* Re: [PATCH v4 10/10] cpufreq: intel_pstate: Use CPPC to get max performance
  2016-09-21 20:30   ` Rafael J. Wysocki
@ 2016-09-22 18:50     ` Tim Chen
  2016-09-22 18:56       ` Thomas Gleixner
  2016-09-22 20:58       ` Rafael J. Wysocki
  0 siblings, 2 replies; 24+ messages in thread
From: Tim Chen @ 2016-09-22 18:50 UTC (permalink / raw)
  To: Rafael J. Wysocki, Srinivas Pandruvada
  Cc: Rafael J. Wysocki, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	the arch/x86 maintainers, Linux PM, Linux Kernel Mailing List,
	ACPI Devel Mailing List, Peter Zijlstra, jolsa

On Wed, 2016-09-21 at 22:30 +0200, Rafael J. Wysocki wrote:
> On Wed, Sep 21, 2016 at 9:19 PM, Srinivas Pandruvada
> <srinivas.pandruvada@linux.intel.com> wrote:
> > 
> > 
> > +
> > +static void intel_pstate_check_and_enable_itmt(int cpu)
> > +{
> > +       /*
> > +        * For checking whether there is any difference in the maximum
> > +        * performance for each CPU, need to wait till we have CPPC
> > +        * data from all CPUs called from the cpufreq core. If there is a
> > +        * difference in the maximum performance, then we have ITMT support.
> > +        * If ITMT is supported, update the scheduler core priority for each
> > +        * CPU and call to enable the ITMT feature.
> > +        */
> > +       if (cpumask_subset(topology_core_cpumask(cpu), &cppc_read_cpu_mask)) {
> > +               int cpu_index;
> > +               int max_prio;
> > +               struct cpudata *cpu;
> > +               bool itmt_support = false;
> > +
> > +               cpu = all_cpu_data[cpumask_first(&cppc_read_cpu_mask)];
> > +               max_prio = cpu->cppc_perf->highest_perf;
> > +               for_each_cpu(cpu_index, &cppc_read_cpu_mask) {
> > +                       cpu = all_cpu_data[cpu_index];
> > +                       if (max_prio != cpu->cppc_perf->highest_perf) {
> > +                               itmt_support = true;
> > +                               break;
> > +                       }
> > +               }
> > +
> > +               if (!itmt_support)
> > +                       return;
> > +
> > +               for_each_cpu(cpu_index, &cppc_read_cpu_mask) {
> > +                       cpu = all_cpu_data[cpu_index];
> > +                       sched_set_itmt_core_prio(cpu->cppc_perf->highest_perf,
> > +                                                cpu_index);
> > +               }
> My current understanding is that we need to rebuild sched domains
> after setting the priorities, 

No, that's not true.  We need to rebuild the sched domains only
when the sched domain flags are changed, not when we are changing
the priorities.  Only the sched domain flag is a property of
the sched domain. CPU priority values are not part of sched domain.

Morten had a similar question about whether we need to rebuild the sched
domains when we change CPU priorities, back when we first posted the patches.
Peter explained that it wasn't necessary.
http://lkml.iu.edu/hypermail/linux/kernel/1608.3/01753.html



> so what if there are two CPU packages
> and there are highest_perf differences in both, and we first enumerate
> the first package entirely before getting to the second one?
> 
> In that case we'll schedule the work item after enumerating the first
> package and it may rebuild the sched domains before all priorities are
> set for the second package, may it not?

That is not a problem.  For the second package, all the CPU priorities
are initialized to the same value.  So even if we start to do
asym_packing in the scheduler for the whole system,
all the CPUs in the second package are treated equally by the scheduler.
We will operate as if there is no favored core until we update the
priorities of the CPUs on the second package.

That said, we don't enable ITMT automatically for a 2-package system.
So the explicit sysctl command that enables ITMT and causes the sched
domain rebuild on a 2-package system is most likely to come after
we have discovered and set all the CPU priorities.
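Tim's point about equal priorities can be sketched in plain C (a toy
userspace model, not the scheduler's actual code; `asym_prio`,
`asym_prefer` and the `-cpu` fallback are illustrative stand-ins): when
priorities differ, the higher one wins; when they are all still at their
default, as on the not-yet-updated second package, the comparison falls
back to preferring the lower CPU number, i.e. no core is favored for
turbo reasons.

```c
#include <stdbool.h>

/* Toy model of the asym-packing CPU preference check. */
static int cpu_priority[64];  /* 0 means "not set yet" in this model */

/*
 * Default priority when none has been set: -cpu (a higher CPU number
 * means a lower priority), mimicking the pre-ITMT behavior where all
 * CPUs are effectively equal and ties break toward lower CPU numbers.
 */
static int asym_prio(int cpu)
{
	return cpu_priority[cpu] ? cpu_priority[cpu] : -cpu;
}

/* Return true if CPU a should be preferred over CPU b. */
static bool asym_prefer(int a, int b)
{
	return asym_prio(a) > asym_prio(b);
}
```

With no priorities set, `asym_prefer(0, 1)` is true only because of the
lower-CPU-number fallback; once `highest_perf` values are filled in,
the genuinely faster core wins.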

> 
> This seems to require some more consideration.
> 
> > 
> > +               /*
> > +                * Since this function is in the hotcpu notifier callback
> > +                * path, submit a task to workqueue to call
> > +                * sched_set_itmt_support().
> > +                */
> > +               schedule_work(&sched_itmt_work);
> It doesn't make sense to do this more than once IMO and what if we
> attempt to schedule the work item again when it has been scheduled
> once already?  Don't we need any protection here?

It is not a problem for sched_set_itmt_support() to be called more than
once.

First, the second call is ignored if sched_itmt_capable has already been
set to the same value by the previous sched_set_itmt_support() call.
Second, the update of sched_itmt_capable
is protected by the itmt_update_mutex.
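The idempotent, mutex-protected setter described above can be modelled in
userspace roughly as follows (the names mirror the patch but this is a
sketch, not the kernel code): the update is a no-op under the mutex when
the value is unchanged, so duplicate work items cause at most one sched
domain rebuild.

```c
#include <pthread.h>
#include <stdbool.h>

/* Userspace model of the idempotent, mutex-protected setter. */
static pthread_mutex_t itmt_update_mutex = PTHREAD_MUTEX_INITIALIZER;
static bool sched_itmt_capable;
static int domain_rebuilds;  /* stands in for rebuilding the sched domains */

void model_set_itmt_support(bool capable)
{
	pthread_mutex_lock(&itmt_update_mutex);
	if (sched_itmt_capable != capable) {
		sched_itmt_capable = capable;
		domain_rebuilds++;  /* only on an actual capability change */
	}
	pthread_mutex_unlock(&itmt_update_mutex);
}
```

Calling `model_set_itmt_support(true)` from several queued work items is
harmless: only the first call changes the flag and triggers a rebuild;
the rest return without side effects.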

Thanks.

Tim



* Re: [PATCH v4 10/10] cpufreq: intel_pstate: Use CPPC to get max performance
  2016-09-22 18:50     ` Tim Chen
@ 2016-09-22 18:56       ` Thomas Gleixner
  2016-09-22 19:01         ` Tim Chen
  2016-09-22 20:58       ` Rafael J. Wysocki
  1 sibling, 1 reply; 24+ messages in thread
From: Thomas Gleixner @ 2016-09-22 18:56 UTC (permalink / raw)
  To: Tim Chen
  Cc: Rafael J. Wysocki, Srinivas Pandruvada, Rafael J. Wysocki,
	Ingo Molnar, Borislav Petkov, the arch/x86 maintainers, Linux PM,
	Linux Kernel Mailing List, ACPI Devel Mailing List,
	Peter Zijlstra, jolsa


On Thu, 22 Sep 2016, Tim Chen wrote:
> On Wed, 2016-09-21 at 22:30 +0200, Rafael J. Wysocki wrote:
> > My current understanding is that we need to rebuild sched domains
> > after setting the priorities, 
> 
> No, that's not true.  We need to rebuild the sched domains only
> when the sched domain flags are changed, not when we are changing
> the priorities.  Only the sched domain flag is a property of
> the sched domain. CPU priority values are not part of sched domain.
> 
> Morten had similar question about whether we need to rebuild sched domain
> when we change cpu priorities when we first post the patches. 
> Peter has explained that it wasn't necessary.
> http://lkml.iu.edu/hypermail/linux/kernel/1608.3/01753.html

And why is there no explanation in the form of a comment in the code?

Thanks,

	tglx


* Re: [PATCH v4 10/10] cpufreq: intel_pstate: Use CPPC to get max performance
  2016-09-22 18:56       ` Thomas Gleixner
@ 2016-09-22 19:01         ` Tim Chen
  0 siblings, 0 replies; 24+ messages in thread
From: Tim Chen @ 2016-09-22 19:01 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Rafael J. Wysocki, Srinivas Pandruvada, Rafael J. Wysocki,
	Ingo Molnar, Borislav Petkov, the arch/x86 maintainers, Linux PM,
	Linux Kernel Mailing List, ACPI Devel Mailing List,
	Peter Zijlstra, jolsa

On Thu, 2016-09-22 at 20:56 +0200, Thomas Gleixner wrote:
> On Thu, 22 Sep 2016, Tim Chen wrote:
> > 
> > On Wed, 2016-09-21 at 22:30 +0200, Rafael J. Wysocki wrote:
> > > 
> > > My current understanding is that we need to rebuild sched domains
> > > after setting the priorities, 
> > No, that's not true.  We need to rebuild the sched domains only
> > when the sched domain flags are changed, not when we are changing
> > the priorities.  Only the sched domain flag is a property of
> > the sched domain. CPU priority values are not part of sched domain.
> > 
> > Morten had similar question about whether we need to rebuild sched domain
> > when we change cpu priorities when we first post the patches. 
> > Peter has explained that it wasn't necessary.
> > http://lkml.iu.edu/hypermail/linux/kernel/1608.3/01753.html
> And why is there no explanation in form of a comment in the code?

Sure, I'll add a comment.

Thanks.

Tim




* Re: [PATCH v4 07/10] x86/sched: Add SD_ASYM_PACKING flags to x86 ITMT CPU
  2016-09-21 20:33   ` Rafael J. Wysocki
@ 2016-09-22 19:40     ` Tim Chen
  0 siblings, 0 replies; 24+ messages in thread
From: Tim Chen @ 2016-09-22 19:40 UTC (permalink / raw)
  To: Rafael J. Wysocki, Srinivas Pandruvada
  Cc: Rafael J. Wysocki, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	the arch/x86 maintainers, Linux PM, Linux Kernel Mailing List,
	ACPI Devel Maling List, Peter Zijlstra, jolsa

On Wed, 2016-09-21 at 22:33 +0200, Rafael J. Wysocki wrote:
> On Wed, Sep 21, 2016 at 9:19 PM, Srinivas Pandruvada
> <srinivas.pandruvada@linux.intel.com> wrote:
> > 
> > From: Tim Chen <tim.c.chen@linux.intel.com>
> > 
> > Some Intel cores in a package can be boosted to a higher turbo frequency
> > with ITMT 3.0 technology. The scheduler can use the asymmetric packing
> > feature to move tasks to the more capable cores.
> > 
> > If ITMT is enabled, add SD_ASYM_PACKING flag to the thread and core
> > sched domains to enable asymmetric packing.
> > 
> > Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> > ---
> >  arch/x86/kernel/smpboot.c | 27 +++++++++++++++++++++++----
> >  1 file changed, 23 insertions(+), 4 deletions(-)
> > 
> > diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> > index 38901b3..46815e6 100644
> > --- a/arch/x86/kernel/smpboot.c
> > +++ b/arch/x86/kernel/smpboot.c
> > @@ -487,22 +487,41 @@ static bool match_die(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
> >         return false;
> >  }
> > 
> > +#ifndef CONFIG_SCHED_ITMT
> > +#define sysctl_sched_itmt_enabled      0
> > +#endif
> I thought that would be done in the header where
> sysctl_sched_itmt_enabled is declared (along with defining the stubs
> for the sched_set_itmt_* functions).

Sure. I will move it to arch/x86/include/asm/topology.h.

Tim


* Re: [PATCH v4 10/10] cpufreq: intel_pstate: Use CPPC to get max performance
  2016-09-22 18:50     ` Tim Chen
  2016-09-22 18:56       ` Thomas Gleixner
@ 2016-09-22 20:58       ` Rafael J. Wysocki
  2016-09-22 21:41         ` Tim Chen
  1 sibling, 1 reply; 24+ messages in thread
From: Rafael J. Wysocki @ 2016-09-22 20:58 UTC (permalink / raw)
  To: Tim Chen
  Cc: Rafael J. Wysocki, Srinivas Pandruvada, Rafael J. Wysocki,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	the arch/x86 maintainers, Linux PM, Linux Kernel Mailing List,
	ACPI Devel Mailing List, Peter Zijlstra, jolsa

On Thu, Sep 22, 2016 at 8:50 PM, Tim Chen <tim.c.chen@linux.intel.com> wrote:
> On Wed, 2016-09-21 at 22:30 +0200, Rafael J. Wysocki wrote:
>> On Wed, Sep 21, 2016 at 9:19 PM, Srinivas Pandruvada
>> <srinivas.pandruvada@linux.intel.com> wrote:
>> >
>> >
>> > +
>> > +static void intel_pstate_check_and_enable_itmt(int cpu)
>> > +{
>> > +       /*
>> > +        * For checking whether there is any difference in the maximum
>> > +        * performance for each CPU, need to wait till we have CPPC
>> > +        * data from all CPUs called from the cpufreq core. If there is a
>> > +        * difference in the maximum performance, then we have ITMT support.
>> > +        * If ITMT is supported, update the scheduler core priority for each
>> > +        * CPU and call to enable the ITMT feature.
>> > +        */
>> > +       if (cpumask_subset(topology_core_cpumask(cpu), &cppc_read_cpu_mask)) {
>> > +               int cpu_index;
>> > +               int max_prio;
>> > +               struct cpudata *cpu;
>> > +               bool itmt_support = false;
>> > +
>> > +               cpu = all_cpu_data[cpumask_first(&cppc_read_cpu_mask)];
>> > +               max_prio = cpu->cppc_perf->highest_perf;
>> > +               for_each_cpu(cpu_index, &cppc_read_cpu_mask) {
>> > +                       cpu = all_cpu_data[cpu_index];
>> > +                       if (max_prio != cpu->cppc_perf->highest_perf) {
>> > +                               itmt_support = true;
>> > +                               break;
>> > +                       }
>> > +               }
>> > +
>> > +               if (!itmt_support)
>> > +                       return;
>> > +
>> > +               for_each_cpu(cpu_index, &cppc_read_cpu_mask) {
>> > +                       cpu = all_cpu_data[cpu_index];
>> > +                       sched_set_itmt_core_prio(cpu->cppc_perf->highest_perf,
>> > +                                                cpu_index);
>> > +               }
>> My current understanding is that we need to rebuild sched domains
>> after setting the priorities,
>
> No, that's not true.  We need to rebuild the sched domains only
> when the sched domain flags are changed, not when we are changing
> the priorities.  Only the sched domain flag is a property of
> the sched domain. CPU priority values are not part of sched domain.
>
> Morten had similar question about whether we need to rebuild sched domain
> when we change cpu priorities when we first post the patches.
> Peter has explained that it wasn't necessary.
> http://lkml.iu.edu/hypermail/linux/kernel/1608.3/01753.html

So to me this means that sched domains need to be rebuilt in two cases
by the ITMT code:
(1) When the "ITMT capable" flag changes.
(2) When the sysctl setting changes.

In which case I'm not sure why intel_pstate_check_and_enable_itmt()
has to be so complicated.

It seems to only need to (a) set the priority for the current CPU and
(b) invoke sched_set_itmt_support() (via the work item) to set the
"ITMT capable" flag if it finds out that ITMT should be enabled.

And it may be better to enable ITMT at the _OSC exchange time (if the
platform acknowledges support).
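A minimal sketch of the simplified flow suggested here (userspace model;
`model_cppc_notify`, `schedule_itmt_work` and the one-shot flag are
hypothetical names, not the driver's API): each CPU's hook just records
that CPU's priority, and the enable work is queued once, the first time
a highest_perf difference is observed.

```c
#include <stdbool.h>

#define NCPUS 8

static int core_prio[NCPUS];      /* (a) per-CPU priority, set as CPUs appear */
static int first_prio = -1;       /* highest_perf of the first CPU enumerated */
static bool itmt_work_scheduled;  /* ensures the enable work is queued once */
static int work_items_queued;

static void schedule_itmt_work(void)
{
	work_items_queued++;  /* would be schedule_work(&sched_itmt_work) in-kernel */
}

void model_cppc_notify(int cpu, int highest_perf)
{
	core_prio[cpu] = highest_perf;  /* (a) set this CPU's core priority */
	if (first_prio < 0) {
		first_prio = highest_perf;
		return;
	}
	/* (b) queue the one-shot enable on the first observed difference */
	if (highest_perf != first_prio && !itmt_work_scheduled) {
		itmt_work_scheduled = true;
		schedule_itmt_work();
	}
}
```

Unlike the per-package subset check in the patch, this needs no cpumask
bookkeeping and never queues the work item twice; whether comparing
against the first enumerated CPU is sufficient in all topologies is an
assumption of the sketch.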

>> so what if there are two CPU packages
>> and there are highest_perf differences in both, and we first enumerate
>> the first package entirely before getting to the second one?
>>
>> In that case we'll schedule the work item after enumerating the first
>> package and it may rebuild the sched domains before all priorities are
>> set for the second package, may it not?
>
> That is not a problem.  For the second package, all the cpu priorities
> are initialized to the same value.  So even if we start to do
> asym_packing in the scheduler for the whole system,
> on the second package, all the cpus are treated equally by the scheduler.
> We will operate as if there is no favored core till we update the
> priorities of the cpu on the second package.

OK

But updating those priorities after we have set the "ITMT capable"
flag is not a problem?  Nobody is going to be confused and so on?

> That said, we don't enable ITMT automatically for 2 package system.
> So the explicit sysctl command to enable ITMT and cause the sched domain
> rebuild for 2 package system is most likely to come after
> we have discovered and set all the cpu priorities.

Right, but if that behavior is relied on, there should be a comment
about that in the code (and relying on it would be kind of fragile for
that matter).

>>
>> This seems to require some more consideration.
>>
>> >
>> > +               /*
>> > +                * Since this function is in the hotcpu notifier callback
>> > +                * path, submit a task to workqueue to call
>> > +                * sched_set_itmt_support().
>> > +                */
>> > +               schedule_work(&sched_itmt_work);
>> It doesn't make sense to do this more than once IMO and what if we
>> attempt to schedule the work item again when it has been scheduled
>> once already?  Don't we need any protection here?
>
> It is not a problem for sched_set_itmt_support to be called more than
> once.

While it is not incorrect, it also is not particularly useful to
schedule a work item just to find out later that it had nothing to do
to begin with.

Thanks,
Rafael


* Re: [PATCH v4 10/10] cpufreq: intel_pstate: Use CPPC to get max performance
  2016-09-22 20:58       ` Rafael J. Wysocki
@ 2016-09-22 21:41         ` Tim Chen
  0 siblings, 0 replies; 24+ messages in thread
From: Tim Chen @ 2016-09-22 21:41 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Srinivas Pandruvada, Rafael J. Wysocki, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, the arch/x86 maintainers, Linux PM,
	Linux Kernel Mailing List, ACPI Devel Mailing List,
	Peter Zijlstra, jolsa

On Thu, 2016-09-22 at 22:58 +0200, Rafael J. Wysocki wrote:
> 
> > > so what if there are two CPU packages
> > > and there are highest_perf differences in both, and we first enumerate
> > > the first package entirely before getting to the second one?
> > > 
> > > In that case we'll schedule the work item after enumerating the first
> > > package and it may rebuild the sched domains before all priorities are
> > > set for the second package, may it not?
> > That is not a problem.  For the second package, all the cpu priorities
> > are initialized to the same value.  So even if we start to do
> > asym_packing in the scheduler for the whole system,
> > on the second package, all the cpus are treated equally by the scheduler.
> > We will operate as if there is no favored core till we update the
> > priorities of the cpu on the second package.
> OK
> 
> But updating those priorities after we have set the "ITMT capable"
> flag is not a problem?  Nobody is going to be confused and so on?
> 

Not a problem.  The worst that could happen is that we schedule a job
to a CPU with a lower max turbo frequency while the priority updates are in
progress.

> > 
> > That said, we don't enable ITMT automatically for 2 package system.
> > So the explicit sysctl command to enable ITMT and cause the sched domain
> > rebuild for 2 package system is most likely to come after
> > we have discovered and set all the cpu priorities.
> Right, but if that behavior is relied on, there should be a comment
> about that in the code (and relying on it would be kind of fragile for
> that matter).

No, we don't rely on the behavior of not enabling ITMT automatically
for a 2-package system.  We could enable ITMT for a 2-package
system by default if we wanted to.  Then asym_packing would simply
treat the second package's CPUs as having equal priorities until they
have been set.

> 
> > 
> > > 
> > > 
> > > This seems to require some more consideration.
> > > 
> > > > 
> > > > 
> > > > +               /*
> > > > +                * Since this function is in the hotcpu notifier callback
> > > > +                * path, submit a task to workqueue to call
> > > > +                * sched_set_itmt_support().
> > > > +                */
> > > > +               schedule_work(&sched_itmt_work);
> > > It doesn't make sense to do this more than once IMO and what if we
> > > attempt to schedule the work item again when it has been scheduled
> > > once already?  Don't we need any protection here?
> > It is not a problem for sched_set_itmt_support to be called more than
> > once.
> While it is not incorrect, it also is not particularly useful to
> schedule a work item just to find out later that it had nothing to do
> to begin with.

Setting the ITMT capability is done once per socket during system boot, so
there is no performance impact and it should not be an issue.

Tim


* [tip:sched/core] sched/core, x86/topology: Fix NUMA in package topology bug
  2016-09-21 19:19 ` [PATCH v4 01/10] x86/topology: Fix numa in package topology bug Srinivas Pandruvada
@ 2016-09-30 11:55   ` tip-bot for Tim Chen
  0 siblings, 0 replies; 24+ messages in thread
From: tip-bot for Tim Chen @ 2016-09-30 11:55 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: efault, hpa, torvalds, linux-kernel, peterz, tglx,
	srinivas.pandruvada, mingo, tim.c.chen

Commit-ID:  8f37961cf22304fb286c7604d3a7f6104dcc1283
Gitweb:     http://git.kernel.org/tip/8f37961cf22304fb286c7604d3a7f6104dcc1283
Author:     Tim Chen <tim.c.chen@linux.intel.com>
AuthorDate: Wed, 21 Sep 2016 12:19:03 -0700
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Fri, 30 Sep 2016 10:53:18 +0200

sched/core, x86/topology: Fix NUMA in package topology bug

Current code can call set_cpu_sibling_map() and invoke sched_set_topology()
more than once (e.g. on CPU hot plug).  When this happens after
sched_init_smp() has been called, we lose the NUMA topology extension to
sched_domain_topology in sched_init_numa().  This results in incorrect
topology when the sched domain is rebuilt.

This patch fixes the bug and issues warning if we call sched_set_topology()
after sched_init_smp().

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: bp@suse.de
Cc: jolsa@redhat.com
Cc: rjw@rjwysocki.net
Link: http://lkml.kernel.org/r/1474485552-141429-2-git-send-email-srinivas.pandruvada@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/kernel/smpboot.c | 46 ++++++++++++++++++++++++++++++----------------
 kernel/sched/core.c       |  3 +++
 2 files changed, 33 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 4296beb..7137ec4 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -471,7 +471,7 @@ static bool match_die(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
 	return false;
 }
 
-static struct sched_domain_topology_level numa_inside_package_topology[] = {
+static struct sched_domain_topology_level x86_numa_in_package_topology[] = {
 #ifdef CONFIG_SCHED_SMT
 	{ cpu_smt_mask, cpu_smt_flags, SD_INIT_NAME(SMT) },
 #endif
@@ -480,22 +480,23 @@ static struct sched_domain_topology_level numa_inside_package_topology[] = {
 #endif
 	{ NULL, },
 };
+
+static struct sched_domain_topology_level x86_topology[] = {
+#ifdef CONFIG_SCHED_SMT
+	{ cpu_smt_mask, cpu_smt_flags, SD_INIT_NAME(SMT) },
+#endif
+#ifdef CONFIG_SCHED_MC
+	{ cpu_coregroup_mask, cpu_core_flags, SD_INIT_NAME(MC) },
+#endif
+	{ cpu_cpu_mask, SD_INIT_NAME(DIE) },
+	{ NULL, },
+};
+
 /*
- * set_sched_topology() sets the topology internal to a CPU.  The
- * NUMA topologies are layered on top of it to build the full
- * system topology.
- *
- * If NUMA nodes are observed to occur within a CPU package, this
- * function should be called.  It forces the sched domain code to
- * only use the SMT level for the CPU portion of the topology.
- * This essentially falls back to relying on NUMA information
- * from the SRAT table to describe the entire system topology
- * (except for hyperthreads).
+ * Set if a package/die has multiple NUMA nodes inside.
+ * AMD Magny-Cours and Intel Cluster-on-Die have this.
  */
-static void primarily_use_numa_for_topology(void)
-{
-	set_sched_topology(numa_inside_package_topology);
-}
+static bool x86_has_numa_in_package;
 
 void set_cpu_sibling_map(int cpu)
 {
@@ -558,7 +559,7 @@ void set_cpu_sibling_map(int cpu)
 				c->booted_cores = cpu_data(i).booted_cores;
 		}
 		if (match_die(c, o) && !topology_same_node(c, o))
-			primarily_use_numa_for_topology();
+			x86_has_numa_in_package = true;
 	}
 
 	threads = cpumask_weight(topology_sibling_cpumask(cpu));
@@ -1304,6 +1305,16 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
 		zalloc_cpumask_var(&per_cpu(cpu_core_map, i), GFP_KERNEL);
 		zalloc_cpumask_var(&per_cpu(cpu_llc_shared_map, i), GFP_KERNEL);
 	}
+
+	/*
+	 * Set 'default' x86 topology, this matches default_topology() in that
+	 * it has NUMA nodes as a topology level. See also
+	 * native_smp_cpus_done().
+	 *
+	 * Must be done before set_cpus_sibling_map() is ran.
+	 */
+	set_sched_topology(x86_topology);
+
 	set_cpu_sibling_map(0);
 
 	switch (smp_sanity_check(max_cpus)) {
@@ -1370,6 +1381,9 @@ void __init native_smp_cpus_done(unsigned int max_cpus)
 {
 	pr_debug("Boot done\n");
 
+	if (x86_has_numa_in_package)
+		set_sched_topology(x86_numa_in_package_topology);
+
 	nmi_selftest();
 	impress_friends();
 	setup_ioapic_dest();
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8bae0cd..af5d4d7 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6552,6 +6552,9 @@ static struct sched_domain_topology_level *sched_domain_topology =
 
 void set_sched_topology(struct sched_domain_topology_level *tl)
 {
+	if (WARN_ON_ONCE(sched_smp_initialized))
+		return;
+
 	sched_domain_topology = tl;
 }
 


end of thread, other threads:[~2016-09-30 11:56 UTC | newest]

Thread overview: 24+ messages
2016-09-21 19:19 [PATCH v4 00/10] Support Intel® Turbo Boost Max Technology 3.0 Srinivas Pandruvada
2016-09-21 19:19 ` [PATCH v4 01/10] x86/topology: Fix numa in package topology bug Srinivas Pandruvada
2016-09-30 11:55   ` [tip:sched/core] sched/core, x86/topology: Fix NUMA " tip-bot for Tim Chen
2016-09-21 19:19 ` [PATCH v4 02/10] sched: Extend scheduler's asym packing Srinivas Pandruvada
2016-09-21 19:19 ` [PATCH v4 03/10] x86/topology: Provide topology_num_packages() Srinivas Pandruvada
2016-09-21 19:19 ` [PATCH v4 04/10] x86/topology: Define x86's arch_update_cpu_topology Srinivas Pandruvada
2016-09-21 19:19 ` [PATCH v4 05/10] x86: Enable Intel Turbo Boost Max Technology 3.0 Srinivas Pandruvada
2016-09-21 19:19 ` [PATCH v4 06/10] x86/sysctl: Add sysctl for ITMT scheduling feature Srinivas Pandruvada
2016-09-21 19:19 ` [PATCH v4 07/10] x86/sched: Add SD_ASYM_PACKING flags to x86 ITMT CPU Srinivas Pandruvada
2016-09-21 19:58   ` kbuild test robot
2016-09-21 19:58     ` kbuild test robot
2016-09-21 20:27   ` kbuild test robot
2016-09-21 20:27     ` kbuild test robot
2016-09-21 20:33   ` Rafael J. Wysocki
2016-09-22 19:40     ` Tim Chen
2016-09-21 19:19 ` [PATCH v4 08/10] acpi: bus: Enable HWP CPPC objects Srinivas Pandruvada
2016-09-21 19:19 ` [PATCH v4 09/10] acpi: bus: Set _OSC for diverse core support Srinivas Pandruvada
2016-09-21 19:19 ` [PATCH v4 10/10] cpufreq: intel_pstate: Use CPPC to get max performance Srinivas Pandruvada
2016-09-21 20:30   ` Rafael J. Wysocki
2016-09-22 18:50     ` Tim Chen
2016-09-22 18:56       ` Thomas Gleixner
2016-09-22 19:01         ` Tim Chen
2016-09-22 20:58       ` Rafael J. Wysocki
2016-09-22 21:41         ` Tim Chen
