* [PATCH 00/11] Support Intel® Turbo Boost Max Technology 3.0
@ 2016-08-18 22:36 Srinivas Pandruvada
  2016-08-18 22:36 ` [PATCH 01/11] sched, cpuset: Add regenerate_sched_domains function to rebuild all sched domains Srinivas Pandruvada
                   ` (10 more replies)
  0 siblings, 11 replies; 33+ messages in thread
From: Srinivas Pandruvada @ 2016-08-18 22:36 UTC (permalink / raw)
  To: mingo, tglx, hpa, rjw, peterz
  Cc: x86, bp, sudeep.holla, ak, linux-acpi, linux-pm, alexey.klimov,
	viresh.kumar, akpm, linux-kernel, lenb, tim.c.chen,
	srinivas.pandruvada, paul.gortmaker, jpoimboe, mcgrof, jgross,
	robert.moore, dvyukov, jeyu

With Intel® Turbo Boost Max Technology 3.0 (ITMT), single-threaded performance is
optimized by identifying the processor's fastest core and running critical
workloads on it.
Refer to:
http://www.intel.com/content/www/us/en/architecture-and-technology/turbo-boost/turbo-boost-max-technology.html

This patchset consists of all the changes required to support the ITMT feature:
- Enhance CPPC ACPI lib to support x86
- Use CPPC information in Intel P-State driver to get performance information
- Scheduler enhancements

By default this feature is OFF; to turn it on:

# echo 1 > /proc/sys/kernel/sched_itmt_enabled


Srinivas Pandruvada (7):
  acpi: cppc: Allow build with ACPI_CPU_FREQ_PSS config
  acpi: cppc: Add integer read support
  acpi: cppc: Add support for function fixed hardware address
  acpi: cppc: Add prefix cppc to cpudata structure name
  acpi: bus: Enable HWP CPPC objects
  acpi: bus: Set _OSC for diverse core support
  cpufreq: intel_pstate: Use CPPC to get max performance

Tim Chen (4):
  sched, cpuset: Add regenerate_sched_domains function to rebuild all
    sched domains
  sched, x86: Add SD_ASYM_PACKING flags to x86 cpu topology for cpus
    supporting Intel Turbo Boost Max Technology
  sched: Extend scheduler's asym packing
  sched,x86: Enable Turbo Boost Max Technology

 arch/x86/Kconfig                |   9 +++
 arch/x86/include/asm/topology.h |  26 +++++++
 arch/x86/kernel/Makefile        |   1 +
 arch/x86/kernel/itmt.c          | 147 ++++++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/smpboot.c       |  77 ++++++++++++++++-----
 drivers/acpi/Kconfig            |   1 -
 drivers/acpi/bus.c              |   9 +++
 drivers/acpi/cppc_acpi.c        |  88 ++++++++++++++++++++----
 drivers/acpi/processor_driver.c |   5 +-
 drivers/cpufreq/Kconfig.x86     |   1 +
 drivers/cpufreq/cppc_cpufreq.c  |  14 ++--
 drivers/cpufreq/intel_pstate.c  |  75 +++++++++++++++++++-
 include/acpi/cppc_acpi.h        |   4 +-
 include/linux/acpi.h            |   1 +
 include/linux/cpuset.h          |   2 +
 include/linux/sched.h           |   3 +
 kernel/cpuset.c                 |  32 +++++++--
 kernel/sched/core.c             |  46 ++++++++++++-
 kernel/sched/fair.c             |  25 ++++---
 kernel/sched/sched.h            |  17 +++++
 20 files changed, 517 insertions(+), 66 deletions(-)
 create mode 100644 arch/x86/kernel/itmt.c

-- 
2.7.4

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 01/11] sched, cpuset: Add regenerate_sched_domains function to rebuild all sched domains
  2016-08-18 22:36 [PATCH 00/11] Support Intel® Turbo Boost Max Technology 3.0 Srinivas Pandruvada
@ 2016-08-18 22:36 ` Srinivas Pandruvada
  2016-08-22 13:52   ` Morten Rasmussen
  2016-08-18 22:36 ` [PATCH 02/11] sched, x86: Add SD_ASYM_PACKING flags to x86 cpu topology for cpus supporting Intel Turbo Boost Max Technology Srinivas Pandruvada
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 33+ messages in thread
From: Srinivas Pandruvada @ 2016-08-18 22:36 UTC (permalink / raw)
  To: mingo, tglx, hpa, rjw, peterz
  Cc: x86, bp, sudeep.holla, ak, linux-acpi, linux-pm, alexey.klimov,
	viresh.kumar, akpm, linux-kernel, lenb, tim.c.chen,
	srinivas.pandruvada, paul.gortmaker, jpoimboe, mcgrof, jgross,
	robert.moore, dvyukov, jeyu

From: Tim Chen <tim.c.chen@linux.intel.com>

The current rebuild_sched_domains() rebuilds the sched domains only
when the cpumask changes.  However, in some scenarios where only a
topology flag value changes, it will not rebuild the sched domains.

We create a regenerate_sched_domains() function that will always
rebuild all the sched domains, to take care of this scenario.
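
For illustration, a minimal hypothetical caller (not part of this patch)
showing when the new entry point is needed.  A feature that only toggles
a topology flag leaves every cpumask unchanged, so the existing
rebuild_sched_domains() would skip the rebuild:

	/* Hypothetical sketch: no cpumask changes, only a flag toggles */
	static void example_set_topology_flag(bool enable)
	{
		example_flag_enabled = enable;	/* read back later by an
						 * arch sd_flags callback */
		/*
		 * partition_sched_domains() compares cpumasks/attrs and
		 * would skip unchanged domains, so force a full rebuild:
		 */
		regenerate_sched_domains();
	}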

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
---
 include/linux/cpuset.h |  2 ++
 include/linux/sched.h  |  3 +++
 kernel/cpuset.c        | 32 +++++++++++++++++++++++++-------
 kernel/sched/core.c    | 25 ++++++++++++++++++++++---
 4 files changed, 52 insertions(+), 10 deletions(-)

diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
index bfc204e..9f948fa 100644
--- a/include/linux/cpuset.h
+++ b/include/linux/cpuset.h
@@ -102,6 +102,8 @@ extern int current_cpuset_is_being_rebound(void);
 
 extern void rebuild_sched_domains(void);
 
+extern void regenerate_sched_domains(void);
+
 extern void cpuset_print_current_mems_allowed(void);
 
 /*
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 62c68e5..3301959 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1152,6 +1152,9 @@ static inline struct cpumask *sched_domain_span(struct sched_domain *sd)
 extern void partition_sched_domains(int ndoms_new, cpumask_var_t doms_new[],
 				    struct sched_domain_attr *dattr_new);
 
+extern void regen_partition_sched_domains(int ndoms_new, cpumask_var_t doms_new[],
+				    struct sched_domain_attr *dattr_new);
+
 /* Allocate an array of sched domains, for partition_sched_domains(). */
 cpumask_var_t *alloc_sched_domains(unsigned int ndoms);
 void free_sched_domains(cpumask_var_t doms[], unsigned int ndoms);
diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index c7fd277..f6f7c17 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -794,10 +794,12 @@ done:
  * which has that flag enabled, or if any cpuset with a non-empty
  * 'cpus' is removed, then call this routine to rebuild the
  * scheduler's dynamic sched domains.
+ * If the rebuild_all flag is set, then we will always
+ * regenerate all the sched domains.
  *
  * Call with cpuset_mutex held.  Takes get_online_cpus().
  */
-static void rebuild_sched_domains_locked(void)
+static void rebuild_sched_domains_locked(bool rebuild_all)
 {
 	struct sched_domain_attr *attr;
 	cpumask_var_t *doms;
@@ -818,12 +820,17 @@ static void rebuild_sched_domains_locked(void)
 	ndoms = generate_sched_domains(&doms, &attr);
 
 	/* Have scheduler rebuild the domains */
-	partition_sched_domains(ndoms, doms, attr);
+	if (rebuild_all)
+		/* Will rebuild a complete set of all sched domains */
+		regen_partition_sched_domains(ndoms, doms, attr);
+	else
+		/* Rebuild only sched domains with changed cpu masks */
+		partition_sched_domains(ndoms, doms, attr);
 out:
 	put_online_cpus();
 }
 #else /* !CONFIG_SMP */
-static void rebuild_sched_domains_locked(void)
+static void rebuild_sched_domains_locked(bool rebuild_all)
 {
 }
 #endif /* CONFIG_SMP */
@@ -831,7 +838,18 @@ static void rebuild_sched_domains_locked(void)
 void rebuild_sched_domains(void)
 {
 	mutex_lock(&cpuset_mutex);
-	rebuild_sched_domains_locked();
+	rebuild_sched_domains_locked(false);
+	mutex_unlock(&cpuset_mutex);
+}
+
+/*
+ * Similar to rebuild_sched domains, but will force
+ * all sched domains to be always rebuilt.
+ */
+void regenerate_sched_domains(void)
+{
+	mutex_lock(&cpuset_mutex);
+	rebuild_sched_domains_locked(true);
 	mutex_unlock(&cpuset_mutex);
 }
 
@@ -919,7 +937,7 @@ static void update_cpumasks_hier(struct cpuset *cs, struct cpumask *new_cpus)
 	rcu_read_unlock();
 
 	if (need_rebuild_sched_domains)
-		rebuild_sched_domains_locked();
+		rebuild_sched_domains_locked(false);
 }
 
 /**
@@ -1267,7 +1285,7 @@ static int update_relax_domain_level(struct cpuset *cs, s64 val)
 		cs->relax_domain_level = val;
 		if (!cpumask_empty(cs->cpus_allowed) &&
 		    is_sched_load_balance(cs))
-			rebuild_sched_domains_locked();
+			rebuild_sched_domains_locked(true);
 	}
 
 	return 0;
@@ -1333,7 +1351,7 @@ static int update_flag(cpuset_flagbits_t bit, struct cpuset *cs,
 	spin_unlock_irq(&callback_lock);
 
 	if (!cpumask_empty(trialcs->cpus_allowed) && balance_flag_changed)
-		rebuild_sched_domains_locked();
+		rebuild_sched_domains_locked(false);
 
 	if (spread_flag_changed)
 		update_tasks_flags(cs);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2a906f2..ec752da 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7069,8 +7069,9 @@ static int dattrs_equal(struct sched_domain_attr *cur, int idx_cur,
  *
  * Call with hotplug lock held
  */
-void partition_sched_domains(int ndoms_new, cpumask_var_t doms_new[],
-			     struct sched_domain_attr *dattr_new)
+static void __partition_sched_domains(int ndoms_new, cpumask_var_t doms_new[],
+			     struct sched_domain_attr *dattr_new,
+			     int need_domain_rebuild)
 {
 	int i, j, n;
 	int new_topology;
@@ -7081,7 +7082,7 @@ void partition_sched_domains(int ndoms_new, cpumask_var_t doms_new[],
 	unregister_sched_domain_sysctl();
 
 	/* Let architecture update cpu core mappings. */
-	new_topology = arch_update_cpu_topology();
+	new_topology = arch_update_cpu_topology() | need_domain_rebuild;
 
 	n = doms_new ? ndoms_new : 0;
 
@@ -7132,6 +7133,24 @@ match2:
 	mutex_unlock(&sched_domains_mutex);
 }
 
+/*
+ * Generate sched domains only when the cpumask or domain attr changes
+ */
+void partition_sched_domains(int ndoms_new, cpumask_var_t doms_new[],
+			     struct sched_domain_attr *dattr_new)
+{
+	__partition_sched_domains(ndoms_new, doms_new, dattr_new, 0);
+}
+
+/*
+ * Generate new sched domains always
+ */
+void regen_partition_sched_domains(int ndoms_new, cpumask_var_t doms_new[],
+			     struct sched_domain_attr *dattr_new)
+{
+	__partition_sched_domains(ndoms_new, doms_new, dattr_new, 1);
+}
+
 static int num_cpus_frozen;	/* used to mark begin/end of suspend/resume */
 
 /*
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 02/11] sched, x86: Add SD_ASYM_PACKING flags to x86 cpu topology for cpus supporting Intel Turbo Boost Max Technology
  2016-08-18 22:36 [PATCH 00/11] Support Intel® Turbo Boost Max Technology 3.0 Srinivas Pandruvada
  2016-08-18 22:36 ` [PATCH 01/11] sched, cpuset: Add regenerate_sched_domains function to rebuild all sched domains Srinivas Pandruvada
@ 2016-08-18 22:36 ` Srinivas Pandruvada
  2016-08-18 22:36 ` [PATCH 03/11] sched: Extend scheduler's asym packing Srinivas Pandruvada
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 33+ messages in thread
From: Srinivas Pandruvada @ 2016-08-18 22:36 UTC (permalink / raw)
  To: mingo, tglx, hpa, rjw, peterz
  Cc: x86, bp, sudeep.holla, ak, linux-acpi, linux-pm, alexey.klimov,
	viresh.kumar, akpm, linux-kernel, lenb, tim.c.chen,
	srinivas.pandruvada, paul.gortmaker, jpoimboe, mcgrof, jgross,
	robert.moore, dvyukov, jeyu

From: Tim Chen <tim.c.chen@linux.intel.com>

We use the ASYM_PACKING feature in the scheduler to move tasks to the
more capable cpus that can be boosted to a higher frequency.  We mark
the sched domain topology level with the SD_ASYM_PACKING flag for such
systems.
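
Roughly, this is how the per-level flags callback is consumed when the
domains are built (a simplified sketch of the scheduler's sd_init(),
shown here only for context):

	int sd_flags = 0;

	if (tl->sd_flags)
		sd_flags = (*tl->sd_flags)();	/* x86_core_flags() /
						 * x86_smt_flags() may OR
						 * in SD_ASYM_PACKING here */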

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
---
 arch/x86/kernel/smpboot.c | 77 ++++++++++++++++++++++++++++++++++++-----------
 kernel/sched/core.c       |  3 ++
 2 files changed, 62 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 2a6e84a..255f64e 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -464,31 +464,59 @@ static bool match_die(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
 	return false;
 }
 
-static struct sched_domain_topology_level numa_inside_package_topology[] = {
+#ifdef CONFIG_SCHED_ITMT
+extern unsigned int sysctl_sched_itmt_enabled;
+
+static int x86_core_flags(void)
+{
+	int flags = cpu_core_flags();
+
+	if (sysctl_sched_itmt_enabled)
+		flags |= SD_ASYM_PACKING;
+
+	return flags;
+}
+
+static int x86_smt_flags(void)
+{
+	int flags = cpu_smt_flags();
+
+	if (sysctl_sched_itmt_enabled)
+		flags |= SD_ASYM_PACKING;
+
+	return flags;
+}
+#else
+#define x86_core_flags cpu_core_flags
+#define x86_smt_flags cpu_smt_flags
+#endif
+
+static struct sched_domain_topology_level x86_topology[] = {
+#ifdef CONFIG_SCHED_SMT
+	{ cpu_smt_mask, x86_smt_flags, SD_INIT_NAME(SMT) },
+#endif
+#ifdef CONFIG_SCHED_MC
+	{ cpu_coregroup_mask, x86_core_flags, SD_INIT_NAME(MC) },
+#endif
+	{ cpu_cpu_mask, SD_INIT_NAME(DIE) },
+	{ NULL, },
+};
+
+static struct sched_domain_topology_level x86_numa_in_package_topology[] = {
 #ifdef CONFIG_SCHED_SMT
-	{ cpu_smt_mask, cpu_smt_flags, SD_INIT_NAME(SMT) },
+	{ cpu_smt_mask, x86_smt_flags, SD_INIT_NAME(SMT) },
 #endif
 #ifdef CONFIG_SCHED_MC
-	{ cpu_coregroup_mask, cpu_core_flags, SD_INIT_NAME(MC) },
+	{ cpu_coregroup_mask, x86_core_flags, SD_INIT_NAME(MC) },
 #endif
 	{ NULL, },
 };
+
 /*
- * set_sched_topology() sets the topology internal to a CPU.  The
- * NUMA topologies are layered on top of it to build the full
- * system topology.
- *
- * If NUMA nodes are observed to occur within a CPU package, this
- * function should be called.  It forces the sched domain code to
- * only use the SMT level for the CPU portion of the topology.
- * This essentially falls back to relying on NUMA information
- * from the SRAT table to describe the entire system topology
- * (except for hyperthreads).
+ * Set if a package/die has multiple NUMA nodes inside.
+ * AMD Magny-Cours and Intel Cluster-on-Die have this.
  */
-static void primarily_use_numa_for_topology(void)
-{
-	set_sched_topology(numa_inside_package_topology);
-}
+static bool x86_has_numa_in_package = false;
 
 void set_cpu_sibling_map(int cpu)
 {
@@ -551,7 +579,7 @@ void set_cpu_sibling_map(int cpu)
 				c->booted_cores = cpu_data(i).booted_cores;
 		}
 		if (match_die(c, o) && !topology_same_node(c, o))
-			primarily_use_numa_for_topology();
+			x86_has_numa_in_package = true;
 	}
 
 	threads = cpumask_weight(topology_sibling_cpumask(cpu));
@@ -1297,6 +1325,16 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
 		zalloc_cpumask_var(&per_cpu(cpu_core_map, i), GFP_KERNEL);
 		zalloc_cpumask_var(&per_cpu(cpu_llc_shared_map, i), GFP_KERNEL);
 	}
+
+	/*
+	 * Set 'default' x86 topology, this matches default_topology() in that
+	 * it has NUMA nodes as a topology level. See also
+	 * native_smp_cpus_done().
+	 *
+	 * Must be done before set_cpu_sibling_map() is run.
+	 */
+	set_sched_topology(x86_topology);
+
 	set_cpu_sibling_map(0);
 
 	switch (smp_sanity_check(max_cpus)) {
@@ -1363,6 +1401,9 @@ void __init native_smp_cpus_done(unsigned int max_cpus)
 {
 	pr_debug("Boot done\n");
 
+	if (x86_has_numa_in_package)
+		set_sched_topology(x86_numa_in_package_topology);
+
 	nmi_selftest();
 	impress_friends();
 	setup_ioapic_dest();
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ec752da..342eca9 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6487,6 +6487,9 @@ static struct sched_domain_topology_level *sched_domain_topology =
 
 void set_sched_topology(struct sched_domain_topology_level *tl)
 {
+	if (WARN_ON_ONCE(sched_smp_initialized))
+		return;
+
 	sched_domain_topology = tl;
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 03/11] sched: Extend scheduler's asym packing
  2016-08-18 22:36 [PATCH 00/11] Support Intel® Turbo Boost Max Technology 3.0 Srinivas Pandruvada
  2016-08-18 22:36 ` [PATCH 01/11] sched, cpuset: Add regenerate_sched_domains function to rebuild all sched domains Srinivas Pandruvada
  2016-08-18 22:36 ` [PATCH 02/11] sched, x86: Add SD_ASYM_PACKING flags to x86 cpu topology for cpus supporting Intel Turbo Boost Max Technology Srinivas Pandruvada
@ 2016-08-18 22:36 ` Srinivas Pandruvada
  2016-08-25 11:22   ` Morten Rasmussen
  2016-08-18 22:36 ` [PATCH 04/11] sched,x86: Enable Turbo Boost Max Technology Srinivas Pandruvada
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 33+ messages in thread
From: Srinivas Pandruvada @ 2016-08-18 22:36 UTC (permalink / raw)
  To: mingo, tglx, hpa, rjw, peterz
  Cc: x86, bp, sudeep.holla, ak, linux-acpi, linux-pm, alexey.klimov,
	viresh.kumar, akpm, linux-kernel, lenb, tim.c.chen,
	srinivas.pandruvada, paul.gortmaker, jpoimboe, mcgrof, jgross,
	robert.moore, dvyukov, jeyu

From: Tim Chen <tim.c.chen@linux.intel.com>

We generalize the scheduler's asym packing to provide an ordering of
the cpus beyond just the cpu number.  This allows the use of the
ASYM_PACKING scheduler machinery to move loads to the preferred CPU in
a sched domain, based on a preference defined by the
sched_asym_prefer() function.

We also record the most preferred cpu in a sched group when we build
the group's capacity, for fast lookup of the preferred cpu during load
balancing.
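
A worked example with illustrative priority values (per-cpu priorities
as assigned by the later ITMT patches):

	/*
	 * Illustrative: an MC group of cpus 0-3 with per-cpu priorities
	 * 78, 72, 39 and 36.  The selection loop added to
	 * init_sched_groups_capacity() starts with max_cpu = cpu0 and
	 * never replaces it, since sched_asym_prefer(1, 0),
	 * sched_asym_prefer(2, 0) and sched_asym_prefer(3, 0) are all
	 * false.  sg->asym_prefer_cpu = 0 is thus recorded once at
	 * domain build time instead of being recomputed on every
	 * load-balance pass.
	 */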

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
---
 kernel/sched/core.c  | 18 ++++++++++++++++++
 kernel/sched/fair.c  | 25 ++++++++++++++-----------
 kernel/sched/sched.h | 17 +++++++++++++++++
 3 files changed, 49 insertions(+), 11 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 342eca9..2ca99a1 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6237,7 +6237,25 @@ static void init_sched_groups_capacity(int cpu, struct sched_domain *sd)
 	WARN_ON(!sg);
 
 	do {
+		int cpu, max_cpu = -1, prev_cpu = -1;
+
 		sg->group_weight = cpumask_weight(sched_group_cpus(sg));
+
+		if (!(sd->flags & SD_ASYM_PACKING))
+			goto next;
+
+		for_each_cpu(cpu, sched_group_cpus(sg)) {
+			if (prev_cpu < 0) {
+				prev_cpu = cpu;
+				max_cpu = cpu;
+			} else {
+				if (sched_asym_prefer(cpu, max_cpu))
+					max_cpu = cpu;
+			}
+		}
+		sg->asym_prefer_cpu = max_cpu;
+
+next:
 		sg = sg->next;
 	} while (sg != sd->groups);
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 039de34..37a30d6 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6862,16 +6862,18 @@ static bool update_sd_pick_busiest(struct lb_env *env,
 	if (env->idle == CPU_NOT_IDLE)
 		return true;
 	/*
-	 * ASYM_PACKING needs to move all the work to the lowest
-	 * numbered CPUs in the group, therefore mark all groups
-	 * higher than ourself as busy.
+	 * ASYM_PACKING needs to move all the work to the highest
+	 * priority CPUs in the group, therefore mark all groups
+	 * of lower priority than ourself as busy.
 	 */
-	if (sgs->sum_nr_running && env->dst_cpu < group_first_cpu(sg)) {
+	if (sgs->sum_nr_running &&
+	    sched_asym_prefer(env->dst_cpu, group_priority_cpu(sg))) {
 		if (!sds->busiest)
 			return true;
 
-		/* Prefer to move from highest possible cpu's work */
-		if (group_first_cpu(sds->busiest) < group_first_cpu(sg))
+		/* Prefer to move from lowest priority cpu's work */
+		if (sched_asym_prefer(group_priority_cpu(sds->busiest),
+				      group_priority_cpu(sg)))
 			return true;
 	}
 
@@ -7023,8 +7025,8 @@ static int check_asym_packing(struct lb_env *env, struct sd_lb_stats *sds)
 	if (!sds->busiest)
 		return 0;
 
-	busiest_cpu = group_first_cpu(sds->busiest);
-	if (env->dst_cpu > busiest_cpu)
+	busiest_cpu = group_priority_cpu(sds->busiest);
+	if (sched_asym_prefer(busiest_cpu, env->dst_cpu))
 		return 0;
 
 	env->imbalance = DIV_ROUND_CLOSEST(
@@ -7365,10 +7367,11 @@ static int need_active_balance(struct lb_env *env)
 
 		/*
 		 * ASYM_PACKING needs to force migrate tasks from busy but
-		 * higher numbered CPUs in order to pack all tasks in the
-		 * lowest numbered CPUs.
+		 * lower priority CPUs in order to pack all tasks in the
+		 * highest priority CPUs.
 		 */
-		if ((sd->flags & SD_ASYM_PACKING) && env->src_cpu > env->dst_cpu)
+		if ((sd->flags & SD_ASYM_PACKING) &&
+		    sched_asym_prefer(env->dst_cpu, env->src_cpu))
 			return 1;
 	}
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index c64fc51..75e1002 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -532,6 +532,22 @@ struct dl_rq {
 
 #ifdef CONFIG_SMP
 
+#ifndef sched_asym_prefer
+
+/* For default ASYM_PACKING, the lower numbered cpu is preferred */
+static inline bool sched_asym_prefer(int a, int b)
+{
+	return a < b;
+}
+
+#endif /* sched_asym_prefer */
+
+/*
+ * Return the preferred cpu in the group; in the default
+ * ASYM_PACKING case this is the lowest numbered cpu.
+ */
+#define group_priority_cpu(group) group->asym_prefer_cpu
+
 /*
  * We add the notion of a root-domain which will be used to define per-domain
  * variables. Each exclusive cpuset essentially defines an island domain by
@@ -884,6 +900,7 @@ struct sched_group {
 
 	unsigned int group_weight;
 	struct sched_group_capacity *sgc;
+	int asym_prefer_cpu;		/* cpu of highest priority in group */
 
 	/*
 	 * The CPUs this group covers.
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 04/11] sched,x86: Enable Turbo Boost Max Technology
  2016-08-18 22:36 [PATCH 00/11] Support Intel® Turbo Boost Max Technology 3.0 Srinivas Pandruvada
                   ` (2 preceding siblings ...)
  2016-08-18 22:36 ` [PATCH 03/11] sched: Extend scheduler's asym packing Srinivas Pandruvada
@ 2016-08-18 22:36 ` Srinivas Pandruvada
  2016-08-22  9:01   ` kbuild test robot
  2016-08-24 10:18   ` Ingo Molnar
  2016-08-18 22:36 ` [PATCH 05/11] acpi: cppc: Allow build with ACPI_CPU_FREQ_PSS config Srinivas Pandruvada
                   ` (6 subsequent siblings)
  10 siblings, 2 replies; 33+ messages in thread
From: Srinivas Pandruvada @ 2016-08-18 22:36 UTC (permalink / raw)
  To: mingo, tglx, hpa, rjw, peterz
  Cc: x86, bp, sudeep.holla, ak, linux-acpi, linux-pm, alexey.klimov,
	viresh.kumar, akpm, linux-kernel, lenb, tim.c.chen,
	srinivas.pandruvada, paul.gortmaker, jpoimboe, mcgrof, jgross,
	robert.moore, dvyukov, jeyu

From: Tim Chen <tim.c.chen@linux.intel.com>

Some Intel cores can be boosted to a higher turbo frequency than the
other cores on the same die.  So we prefer processes to be run on them
vs. the other lower frequency ones for extra performance.

We extend the asym packing feature in the scheduler to support packing
tasks onto the higher frequency cores at the core sched domain level.

We set up a core priority metric to abstract the core preferences based
on the maximum boost frequency.  The priority is instantiated such that
a core with a higher priority is favored over a core with a lower
priority when making scheduling decisions using ASYM_PACKING.  The smt
threads that are of higher number are discounted in their priority so
we will not try to pack tasks onto all the threads of a favored core
before using other cpu cores.  The cpu that is of the highest priority
in a sched_group is recorded in sched_group->asym_prefer_cpu during
initialization, to save lookups during load balancing.

A sysctl variable /proc/sys/kernel/sched_itmt_enabled is provided so
that scheduling based on favored cores can be turned on or off at run
time.
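
To make the SMT discount concrete, a worked example with illustrative
priority values, assuming two SMT threads per core:

	/*
	 * smt_prio = prio * smp_num_siblings / i   (i = 1, 2, ...)
	 *
	 * Favored core, prio 39:   thread 1: 39 * 2 / 1 = 78
	 *                          thread 2: 39 * 2 / 2 = 39
	 * Ordinary core, prio 36:  thread 1: 36 * 2 / 1 = 72
	 *                          thread 2: 36 * 2 / 2 = 36
	 *
	 * The favored core's second thread (39) still outranks the
	 * ordinary core's second thread (36), but ranks below the
	 * ordinary core's first thread (72), so tasks spread to idle
	 * cores before doubling up on the favored core's SMT threads.
	 */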

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
---
 arch/x86/Kconfig                |   9 +++
 arch/x86/include/asm/topology.h |  26 +++++++
 arch/x86/kernel/Makefile        |   1 +
 arch/x86/kernel/itmt.c          | 147 ++++++++++++++++++++++++++++++++++++++++
 4 files changed, 183 insertions(+)
 create mode 100644 arch/x86/kernel/itmt.c

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index c580d8c..c1d36db 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -928,6 +928,15 @@ config SCHED_MC
 	  making when dealing with multi-core CPU chips at a cost of slightly
 	  increased overhead in some places. If unsure say N here.
 
+config SCHED_ITMT
+	bool "Intel Turbo Boost Max Technology (ITMT) scheduler support"
+	depends on SCHED_MC && CPU_SUP_INTEL && X86_INTEL_PSTATE
+	---help---
+	  ITMT enabled scheduler support improves the CPU scheduler's decisions
+	  about moving tasks to cpu cores that can be boosted to a higher
+	  frequency than others. It gives better performance at a cost of
+	  slightly increased overhead in task migrations. If unsure say N here.
+
 source "kernel/Kconfig.preempt"
 
 config UP_LATE_INIT
diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index cf75871..f148843 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -146,4 +146,30 @@ struct pci_bus;
 int x86_pci_root_bus_node(int bus);
 void x86_pci_root_bus_resources(int bus, struct list_head *resources);
 
+#ifdef CONFIG_SCHED_ITMT
+DECLARE_PER_CPU_READ_MOSTLY(int, sched_core_priority);
+
+static inline bool sched_asym_prefer(int a, int b)
+{
+	return per_cpu(sched_core_priority, a) > per_cpu(sched_core_priority, b);
+}
+
+#define sched_asym_prefer sched_asym_prefer
+
+/* Interface to set priority of a cpu */
+void sched_set_itmt_core_prio(int prio, int core_cpu);
+
+/* Interface to notify scheduler that system supports ITMT */
+void set_sched_itmt(bool support_itmt);
+
+#else /* CONFIG_SCHED_ITMT */
+
+static inline void set_sched_itmt(bool support_itmt)
+{
+}
+static inline void sched_set_itmt_core_prio(int prio, int core_cpu)
+{
+}
+#endif /* CONFIG_SCHED_ITMT */
+
 #endif /* _ASM_X86_TOPOLOGY_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 0503f5b..2008335 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -124,6 +124,7 @@ obj-$(CONFIG_EFI)			+= sysfb_efi.o
 
 obj-$(CONFIG_PERF_EVENTS)		+= perf_regs.o
 obj-$(CONFIG_TRACING)			+= tracepoint.o
+obj-$(CONFIG_SCHED_ITMT)		+= itmt.o
 
 ###
 # 64 bit specific files
diff --git a/arch/x86/kernel/itmt.c b/arch/x86/kernel/itmt.c
new file mode 100644
index 0000000..a9470f5
--- /dev/null
+++ b/arch/x86/kernel/itmt.c
@@ -0,0 +1,147 @@
+/*
+ * itmt.c: Functions and data structures for enabling
+ * 	   scheduler to favor scheduling on cores that
+ *	   can be boosted to a higher frequency using
+ *	   Intel Turbo Boost Max Technology 3.0
+ *
+ * (C) Copyright 2016 Intel Corporation
+ * Author: Tim Chen <tim.c.chen@linux.intel.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; version 2
+ * of the License.
+ */
+
+#include <linux/sched.h>
+#include <linux/cpumask.h>
+#include <linux/cpuset.h>
+#include <linux/mutex.h>
+#include <linux/sysctl.h>
+
+DEFINE_PER_CPU_READ_MOSTLY(int, sched_core_priority);
+static DEFINE_MUTEX(itmt_update_mutex);
+
+static unsigned int zero = 0;
+static unsigned int one = 1;
+
+/*
+ * Boolean to control whether we want to move processes to cpu capable
+ * of higher turbo frequency for cpus supporting Intel Turbo Boost Max
+ * Technology 3.0.
+ *
+ * It can be set via /proc/sys/kernel/sched_itmt_enabled
+ */
+unsigned int __read_mostly sysctl_sched_itmt_enabled = 0;
+
+/*
+ * The pstate_driver calls set_sched_itmt to indicate if the system
+ * is ITMT capable.
+ */
+static bool __read_mostly sched_itmt_capable;
+
+static void enable_sched_itmt(bool enable_itmt)
+{
+	mutex_lock(&itmt_update_mutex);
+
+	sysctl_sched_itmt_enabled = enable_itmt;
+	regenerate_sched_domains();
+
+	mutex_unlock(&itmt_update_mutex);
+}
+
+static int sched_itmt_update_handler(struct ctl_table *table, int write,
+			      void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+	int ret;
+
+	ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
+
+	if (ret || !write)
+		return ret;
+
+	enable_sched_itmt(sysctl_sched_itmt_enabled);
+
+	return ret;
+}
+
+static struct ctl_table itmt_kern_table[] = {
+	{
+		.procname	= "sched_itmt_enabled",
+		.data		= &sysctl_sched_itmt_enabled,
+		.maxlen		= sizeof(unsigned int),
+		.mode		= 0644,
+		.proc_handler	= sched_itmt_update_handler,
+		.extra1		= &zero,
+		.extra2		= &one,
+	},
+	{}
+};
+
+static struct ctl_table itmt_root_table[] = {
+	{
+		.procname	= "kernel",
+		.mode		= 0555,
+		.child		= itmt_kern_table,
+	},
+	{}
+};
+
+static struct ctl_table_header *itmt_sysctl_header;
+
+/*
+ * The boot code will find out the max boost frequency
+ * and call this function to set a priority proportional
+ * to the max boost frequency. A CPU with a higher boost
+ * frequency will receive a higher priority.
+ */
+void sched_set_itmt_core_prio(int prio, int core_cpu)
+{
+	int cpu, i = 1;
+
+	for_each_cpu(cpu, topology_sibling_cpumask(core_cpu)) {
+		int smt_prio;
+
+		/*
+		 * Discount the priority of sibling so that we don't
+		 * pack all loads to the same core before using other cores.
+		 */
+		smt_prio = prio * smp_num_siblings / i;
+		i++;
+		per_cpu(sched_core_priority, cpu) = smt_prio;
+	}
+}
+
+/*
+ * During boot up, boot code will detect if the system
+ * is ITMT capable and call set_sched_itmt.
+ *
+ * This should be called after sched_set_itmt_core_prio
+ * has been called to set the cpus' priorities.
+ *
+ * This function must be called without the cpu hotplug lock held,
+ * as we need to acquire that lock to rebuild the sched domains
+ * later.
+ */
+void set_sched_itmt(bool itmt_capable)
+{
+	mutex_lock(&itmt_update_mutex);
+
+	if (itmt_capable != sched_itmt_capable) {
+
+		if (itmt_capable) {
+			itmt_sysctl_header =
+				register_sysctl_table(itmt_root_table);
+		} else {
+			if (itmt_sysctl_header)
+				unregister_sysctl_table(itmt_sysctl_header);
+			sysctl_sched_itmt_enabled = false;
+		}
+
+		sched_itmt_capable = itmt_capable;
+		regenerate_sched_domains();
+	}
+
+	mutex_unlock(&itmt_update_mutex);
+}
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 05/11] acpi: cppc: Allow build with ACPI_CPU_FREQ_PSS config
  2016-08-18 22:36 [PATCH 00/11] Support Intel® Turbo Boost Max Technology 3.0 Srinivas Pandruvada
                   ` (3 preceding siblings ...)
  2016-08-18 22:36 ` [PATCH 04/11] sched,x86: Enable Turbo Boost Max Technology Srinivas Pandruvada
@ 2016-08-18 22:36 ` Srinivas Pandruvada
  2016-08-20  0:46   ` Rafael J. Wysocki
  2016-08-18 22:36 ` [PATCH 06/11] acpi: cppc: Add integer read support Srinivas Pandruvada
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 33+ messages in thread
From: Srinivas Pandruvada @ 2016-08-18 22:36 UTC (permalink / raw)
  To: mingo, tglx, hpa, rjw, peterz
  Cc: x86, bp, sudeep.holla, ak, linux-acpi, linux-pm, alexey.klimov,
	viresh.kumar, akpm, linux-kernel, lenb, tim.c.chen,
	srinivas.pandruvada, paul.gortmaker, jpoimboe, mcgrof, jgross,
	robert.moore, dvyukov, jeyu

Some newer x86 platforms support both the _CPC and _PSS objects, so the
kernel config can have both ACPI_CPU_FREQ_PSS and ACPI_CPPC_LIB set.
Remove the restriction that ACPI_CPPC_LIB can only be built when
ACPI_CPU_FREQ_PSS is not defined.
Also, on legacy systems with only _PSS, we shouldn't bail out when
acpi_cppc_processor_probe() fails if ACPI_CPU_FREQ_PSS is also defined.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
---
 drivers/acpi/Kconfig            | 1 -
 drivers/acpi/processor_driver.c | 5 ++++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
index 445ce28..c6bb6aa 100644
--- a/drivers/acpi/Kconfig
+++ b/drivers/acpi/Kconfig
@@ -227,7 +227,6 @@ config ACPI_MCFG
 config ACPI_CPPC_LIB
 	bool
 	depends on ACPI_PROCESSOR
-	depends on !ACPI_CPU_FREQ_PSS
 	select MAILBOX
 	select PCC
 	help
diff --git a/drivers/acpi/processor_driver.c b/drivers/acpi/processor_driver.c
index 0553aee..0e0b629 100644
--- a/drivers/acpi/processor_driver.c
+++ b/drivers/acpi/processor_driver.c
@@ -245,8 +245,11 @@ static int __acpi_processor_start(struct acpi_device *device)
 		return 0;
 
 	result = acpi_cppc_processor_probe(pr);
-	if (result)
+	if (result) {
+#ifndef CONFIG_ACPI_CPU_FREQ_PSS
 		return -ENODEV;
+#endif
+	}
 
 	if (!cpuidle_get_driver() || cpuidle_get_driver() == &acpi_idle_driver)
 		acpi_processor_power_init(pr);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 06/11] acpi: cppc: Add integer read support
  2016-08-18 22:36 [PATCH 00/11] Support Intel® Turbo Boost Max Technology 3.0 Srinivas Pandruvada
                   ` (4 preceding siblings ...)
  2016-08-18 22:36 ` [PATCH 05/11] acpi: cppc: Allow build with ACPI_CPU_FREQ_PSS config Srinivas Pandruvada
@ 2016-08-18 22:36 ` Srinivas Pandruvada
  2016-08-18 22:36 ` [PATCH 07/11] acpi: cppc: Add support for function fixed hardware address Srinivas Pandruvada
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 33+ messages in thread
From: Srinivas Pandruvada @ 2016-08-18 22:36 UTC (permalink / raw)
  To: mingo, tglx, hpa, rjw, peterz
  Cc: x86, bp, sudeep.holla, ak, linux-acpi, linux-pm, alexey.klimov,
	viresh.kumar, akpm, linux-kernel, lenb, tim.c.chen,
	srinivas.pandruvada, paul.gortmaker, jpoimboe, mcgrof, jgross,
	robert.moore, dvyukov, jeyu

The _CPC performance limits can also be a simple integer, not just a
buffer with register information.  Add support for integer reads via
cpc_read().
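
For context, the register resource already carries both forms; a
simplified sketch of the struct (per the in-tree CPPC definitions):

	struct cpc_register_resource {
		acpi_object_type type;	/* ACPI_TYPE_INTEGER or buffer */
		union {
			struct cpc_reg reg;	/* buffer: GAS register */
			u64 int_value;		/* plain integer limit */
		} cpc_entry;
	};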

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
---
 drivers/acpi/cppc_acpi.c | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/drivers/acpi/cppc_acpi.c b/drivers/acpi/cppc_acpi.c
index 2e98173..34209f5 100644
--- a/drivers/acpi/cppc_acpi.c
+++ b/drivers/acpi/cppc_acpi.c
@@ -651,11 +651,18 @@ EXPORT_SYMBOL_GPL(acpi_cppc_processor_exit);
  * we can directly write to it.
  */
 
-static int cpc_read(struct cpc_reg *reg, u64 *val)
+static int cpc_read(struct cpc_register_resource *res, u64 *val)
 {
+	struct cpc_reg *reg = &res->cpc_entry.reg;
 	int ret_val = 0;
 
 	*val = 0;
+
+	if (res->type == ACPI_TYPE_INTEGER) {
+		*val = res->cpc_entry.int_value;
+		return 0;
+	}
+
 	if (reg->space_id == ACPI_ADR_SPACE_PLATFORM_COMM) {
 		void __iomem *vaddr = GET_PCC_VADDR(reg->address);
 
@@ -754,16 +761,16 @@ int cppc_get_perf_caps(int cpunum, struct cppc_perf_caps *perf_caps)
 		}
 	}
 
-	cpc_read(&highest_reg->cpc_entry.reg, &high);
+	cpc_read(highest_reg, &high);
 	perf_caps->highest_perf = high;
 
-	cpc_read(&lowest_reg->cpc_entry.reg, &low);
+	cpc_read(lowest_reg, &low);
 	perf_caps->lowest_perf = low;
 
-	cpc_read(&ref_perf->cpc_entry.reg, &ref);
+	cpc_read(ref_perf, &ref);
 	perf_caps->reference_perf = ref;
 
-	cpc_read(&nom_perf->cpc_entry.reg, &nom);
+	cpc_read(nom_perf, &nom);
 	perf_caps->nominal_perf = nom;
 
 	if (!ref)
@@ -812,8 +819,8 @@ int cppc_get_perf_ctrs(int cpunum, struct cppc_perf_fb_ctrs *perf_fb_ctrs)
 		}
 	}
 
-	cpc_read(&delivered_reg->cpc_entry.reg, &delivered);
-	cpc_read(&reference_reg->cpc_entry.reg, &reference);
+	cpc_read(delivered_reg, &delivered);
+	cpc_read(reference_reg, &reference);
 
 	if (!delivered || !reference) {
 		ret = -EFAULT;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 07/11] acpi: cppc: Add support for function fixed hardware address
  2016-08-18 22:36 [PATCH 00/11] Support Intel® Turbo Boost Max Technology 3.0 Srinivas Pandruvada
                   ` (5 preceding siblings ...)
  2016-08-18 22:36 ` [PATCH 06/11] acpi: cppc: Add integer read support Srinivas Pandruvada
@ 2016-08-18 22:36 ` Srinivas Pandruvada
  2016-08-20  0:49   ` Rafael J. Wysocki
  2016-08-18 22:36 ` [PATCH 08/11] acpi: cppc: Add prefix cppc to cpudata structure name Srinivas Pandruvada
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 33+ messages in thread
From: Srinivas Pandruvada @ 2016-08-18 22:36 UTC (permalink / raw)
  To: mingo, tglx, hpa, rjw, peterz
  Cc: x86, bp, sudeep.holla, ak, linux-acpi, linux-pm, alexey.klimov,
	viresh.kumar, akpm, linux-kernel, lenb, tim.c.chen,
	srinivas.pandruvada, paul.gortmaker, jpoimboe, mcgrof, jgross,
	robert.moore, dvyukov, jeyu

The CPPC registers can also be accessed via function fixed hardware
(FFH) addresses on x86. Add support by modifying cpc_read() and
cpc_write() to be able to read/write MSRs on x86 platforms. Also, with
this change, acpi_cppc_processor_probe() doesn't bail out if the space
id is not equal to PCC or the system memory address space.
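
A worked example of the mask arithmetic below, for an illustrative MSR
field at bit_offset 8 with bit_width 8:

	/*
	 * mask = GENMASK_ULL(8 + 8 - 1, 8) = GENMASK_ULL(15, 8) = 0xff00
	 *
	 * cpc_read_ffh():  *val = (msr & 0xff00) >> 8;   field extract
	 * cpc_write_ffh(): msr  = (msr & ~0xff00) |
	 *                         ((val << 8) & 0xff00); read-modify-write
	 */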

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
---
 drivers/acpi/cppc_acpi.c | 77 +++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 66 insertions(+), 11 deletions(-)

diff --git a/drivers/acpi/cppc_acpi.c b/drivers/acpi/cppc_acpi.c
index 34209f5..939fb5c 100644
--- a/drivers/acpi/cppc_acpi.c
+++ b/drivers/acpi/cppc_acpi.c
@@ -42,6 +42,10 @@
 #include <linux/ktime.h>
 
 #include <acpi/cppc_acpi.h>
+#ifdef CONFIG_X86
+#include <asm/msr.h>
+#endif
+
 /*
  * Lock to provide mutually exclusive access to the PCC
  * channel. e.g. When the remote updates the shared region
@@ -585,8 +589,9 @@ int acpi_cppc_processor_probe(struct acpi_processor *pr)
 					pr_debug("Mismatched PCC ids.\n");
 					goto out_free;
 				}
-			} else if (gas_t->space_id != ACPI_ADR_SPACE_SYSTEM_MEMORY) {
-				/* Support only PCC and SYS MEM type regs */
+			} else if (gas_t->space_id != ACPI_ADR_SPACE_SYSTEM_MEMORY &&
+				   gas_t->space_id != ACPI_ADR_SPACE_FIXED_HARDWARE) {
+				/* Support only PCC, FFH and SYS MEM type regs */
 				pr_debug("Unsupported register type: %d\n", gas_t->space_id);
 				goto out_free;
 			}
@@ -645,13 +650,59 @@ void acpi_cppc_processor_exit(struct acpi_processor *pr)
 }
 EXPORT_SYMBOL_GPL(acpi_cppc_processor_exit);
 
+#ifdef CONFIG_X86
+static int cpc_read_ffh(int cpunum, struct cpc_reg *reg, u64 *val)
+{
+	int err;
+
+	err = rdmsrl_on_cpu(cpunum, reg->address, val);
+	if (!err) {
+		u64 mask = GENMASK_ULL(reg->bit_offset + reg->bit_width - 1,
+				       reg->bit_offset);
+
+		*val &= mask;
+		*val >>= reg->bit_offset;
+	}
+	return err;
+}
+
+static int cpc_write_ffh(int cpunum, struct cpc_reg *reg, u64 val)
+{
+	u64 rd_val;
+	int err;
+
+	err = rdmsrl_on_cpu(cpunum, reg->address, &rd_val);
+	if (!err) {
+		u64 mask = GENMASK_ULL(reg->bit_offset + reg->bit_width - 1,
+				       reg->bit_offset);
+
+		val <<= reg->bit_offset;
+		val &= mask;
+		rd_val &= ~mask;
+		rd_val |= val;
+		err = wrmsrl_on_cpu(cpunum, reg->address, rd_val);
+	}
+	return err;
+}
+#else
+static int cpc_read_ffh(int cpunum, struct cpc_reg *reg, u64 *val)
+{
+	return -EINVAL;
+}
+static int cpc_write_ffh(int cpunum, struct cpc_reg *reg, u64 val)
+{
+	return -EINVAL;
+
+}
+#endif
+
 /*
  * Since cpc_read and cpc_write are called while holding pcc_lock, it should be
  * as fast as possible. We have already mapped the PCC subspace during init, so
  * we can directly write to it.
  */
 
-static int cpc_read(struct cpc_register_resource *res, u64 *val)
+static int cpc_read(int cpunum, struct cpc_register_resource *res, u64 *val)
 {
 	struct cpc_reg *reg = &res->cpc_entry.reg;
 	int ret_val = 0;
@@ -684,13 +735,15 @@ static int cpc_read(struct cpc_register_resource *res, u64 *val)
 				reg->bit_width);
 			ret_val = -EFAULT;
 		}
+	} else if (reg->space_id == ACPI_ADR_SPACE_FIXED_HARDWARE) {
+		ret_val = cpc_read_ffh(cpunum, reg, val);
 	} else
 		ret_val = acpi_os_read_memory((acpi_physical_address)reg->address,
 					val, reg->bit_width);
 	return ret_val;
 }
 
-static int cpc_write(struct cpc_reg *reg, u64 val)
+static int cpc_write(int cpunum, struct cpc_reg *reg, u64 val)
 {
 	int ret_val = 0;
 
@@ -716,6 +769,8 @@ static int cpc_write(struct cpc_reg *reg, u64 val)
 			ret_val = -EFAULT;
 			break;
 		}
+	} else if (reg->space_id == ACPI_ADR_SPACE_FIXED_HARDWARE) {
+		ret_val = cpc_write_ffh(cpunum, reg, val);
 	} else
 		ret_val = acpi_os_write_memory((acpi_physical_address)reg->address,
 				val, reg->bit_width);
@@ -761,16 +816,16 @@ int cppc_get_perf_caps(int cpunum, struct cppc_perf_caps *perf_caps)
 		}
 	}
 
-	cpc_read(highest_reg, &high);
+	cpc_read(cpunum, highest_reg, &high);
 	perf_caps->highest_perf = high;
 
-	cpc_read(lowest_reg, &low);
+	cpc_read(cpunum, lowest_reg, &low);
 	perf_caps->lowest_perf = low;
 
-	cpc_read(ref_perf, &ref);
+	cpc_read(cpunum, ref_perf, &ref);
 	perf_caps->reference_perf = ref;
 
-	cpc_read(nom_perf, &nom);
+	cpc_read(cpunum, nom_perf, &nom);
 	perf_caps->nominal_perf = nom;
 
 	if (!ref)
@@ -819,8 +874,8 @@ int cppc_get_perf_ctrs(int cpunum, struct cppc_perf_fb_ctrs *perf_fb_ctrs)
 		}
 	}
 
-	cpc_read(delivered_reg, &delivered);
-	cpc_read(reference_reg, &reference);
+	cpc_read(cpunum, delivered_reg, &delivered);
+	cpc_read(cpunum, reference_reg, &reference);
 
 	if (!delivered || !reference) {
 		ret = -EFAULT;
@@ -875,7 +930,7 @@ int cppc_set_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls)
 	 * Skip writing MIN/MAX until Linux knows how to come up with
 	 * useful values.
 	 */
-	cpc_write(&desired_reg->cpc_entry.reg, perf_ctrls->desired_perf);
+	cpc_write(cpu, &desired_reg->cpc_entry.reg, perf_ctrls->desired_perf);
 
 	/* Is this a PCC reg ?*/
 	if (desired_reg->cpc_entry.reg.space_id == ACPI_ADR_SPACE_PLATFORM_COMM) {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 08/11] acpi: cppc: Add prefix cppc to cpudata structure name
  2016-08-18 22:36 [PATCH 00/11] Support Intel® Turbo Boost Max Technology 3.0 Srinivas Pandruvada
                   ` (6 preceding siblings ...)
  2016-08-18 22:36 ` [PATCH 07/11] acpi: cppc: Add support for function fixed hardware address Srinivas Pandruvada
@ 2016-08-18 22:36 ` Srinivas Pandruvada
  2016-08-18 22:36 ` [PATCH 09/11] acpi: bus: Enable HWP CPPC objects Srinivas Pandruvada
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 33+ messages in thread
From: Srinivas Pandruvada @ 2016-08-18 22:36 UTC (permalink / raw)
  To: mingo, tglx, hpa, rjw, peterz
  Cc: x86, bp, sudeep.holla, ak, linux-acpi, linux-pm, alexey.klimov,
	viresh.kumar, akpm, linux-kernel, lenb, tim.c.chen,
	srinivas.pandruvada, paul.gortmaker, jpoimboe, mcgrof, jgross,
	robert.moore, dvyukov, jeyu

Since struct cpudata is defined in a header file, add the prefix cppc_
to make the name less generic.  Otherwise it causes compile issues when
a locally defined structure with the same name exists.
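
A hypothetical reduction of the clash this rename avoids:

	#include <acpi/cppc_acpi.h>	/* previously defined struct cpudata */

	struct cpudata {	/* a driver's own per-cpu container,
				 * e.g. the one in intel_pstate.c */
		int cpu;
		/* ... driver-local fields ... */
	};			/* error: redefinition of 'struct cpudata' */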

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
---
 drivers/acpi/cppc_acpi.c       |  4 ++--
 drivers/cpufreq/cppc_cpufreq.c | 14 +++++++-------
 include/acpi/cppc_acpi.h       |  4 ++--
 3 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/drivers/acpi/cppc_acpi.c b/drivers/acpi/cppc_acpi.c
index 939fb5c..5ca517d 100644
--- a/drivers/acpi/cppc_acpi.c
+++ b/drivers/acpi/cppc_acpi.c
@@ -276,13 +276,13 @@ end:
  *
  *	Return: 0 for success or negative value for err.
  */
-int acpi_get_psd_map(struct cpudata **all_cpu_data)
+int acpi_get_psd_map(struct cppc_cpudata **all_cpu_data)
 {
 	int count_target;
 	int retval = 0;
 	unsigned int i, j;
 	cpumask_var_t covered_cpus;
-	struct cpudata *pr, *match_pr;
+	struct cppc_cpudata *pr, *match_pr;
 	struct acpi_psd_package *pdomain;
 	struct acpi_psd_package *match_pdomain;
 	struct cpc_desc *cpc_ptr, *match_cpc_ptr;
diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
index 8882b8e..b1b3549 100644
--- a/drivers/cpufreq/cppc_cpufreq.c
+++ b/drivers/cpufreq/cppc_cpufreq.c
@@ -30,13 +30,13 @@
  * performance capabilities, desired performance level
  * requested etc.
  */
-static struct cpudata **all_cpu_data;
+static struct cppc_cpudata **all_cpu_data;
 
 static int cppc_cpufreq_set_target(struct cpufreq_policy *policy,
 		unsigned int target_freq,
 		unsigned int relation)
 {
-	struct cpudata *cpu;
+	struct cppc_cpudata *cpu;
 	struct cpufreq_freqs freqs;
 	int ret = 0;
 
@@ -66,7 +66,7 @@ static int cppc_verify_policy(struct cpufreq_policy *policy)
 static void cppc_cpufreq_stop_cpu(struct cpufreq_policy *policy)
 {
 	int cpu_num = policy->cpu;
-	struct cpudata *cpu = all_cpu_data[cpu_num];
+	struct cppc_cpudata *cpu = all_cpu_data[cpu_num];
 	int ret;
 
 	cpu->perf_ctrls.desired_perf = cpu->perf_caps.lowest_perf;
@@ -79,7 +79,7 @@ static void cppc_cpufreq_stop_cpu(struct cpufreq_policy *policy)
 
 static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
 {
-	struct cpudata *cpu;
+	struct cppc_cpudata *cpu;
 	unsigned int cpu_num = policy->cpu;
 	int ret = 0;
 
@@ -134,7 +134,7 @@ static struct cpufreq_driver cppc_cpufreq_driver = {
 static int __init cppc_cpufreq_init(void)
 {
 	int i, ret = 0;
-	struct cpudata *cpu;
+	struct cppc_cpudata *cpu;
 
 	if (acpi_disabled)
 		return -ENODEV;
@@ -144,7 +144,7 @@ static int __init cppc_cpufreq_init(void)
 		return -ENOMEM;
 
 	for_each_possible_cpu(i) {
-		all_cpu_data[i] = kzalloc(sizeof(struct cpudata), GFP_KERNEL);
+		all_cpu_data[i] = kzalloc(sizeof(struct cppc_cpudata), GFP_KERNEL);
 		if (!all_cpu_data[i])
 			goto out;
 
@@ -175,7 +175,7 @@ out:
 
 static void __exit cppc_cpufreq_exit(void)
 {
-	struct cpudata *cpu;
+	struct cppc_cpudata *cpu;
 	int i;
 
 	cpufreq_unregister_driver(&cppc_cpufreq_driver);
diff --git a/include/acpi/cppc_acpi.h b/include/acpi/cppc_acpi.h
index 284965c..b7816c2 100644
--- a/include/acpi/cppc_acpi.h
+++ b/include/acpi/cppc_acpi.h
@@ -114,7 +114,7 @@ struct cppc_perf_fb_ctrs {
 };
 
 /* Per CPU container for runtime CPPC management. */
-struct cpudata {
+struct cppc_cpudata {
 	int cpu;
 	struct cppc_perf_caps perf_caps;
 	struct cppc_perf_ctrls perf_ctrls;
@@ -127,6 +127,6 @@ struct cpudata {
 extern int cppc_get_perf_ctrs(int cpu, struct cppc_perf_fb_ctrs *perf_fb_ctrs);
 extern int cppc_set_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls);
 extern int cppc_get_perf_caps(int cpu, struct cppc_perf_caps *caps);
-extern int acpi_get_psd_map(struct cpudata **);
+extern int acpi_get_psd_map(struct cppc_cpudata **);
 
 #endif /* _CPPC_ACPI_H*/
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 09/11] acpi: bus: Enable HWP CPPC objects
  2016-08-18 22:36 [PATCH 00/11] Support Intel® Turbo Boost Max Technology 3.0 Srinivas Pandruvada
                   ` (7 preceding siblings ...)
  2016-08-18 22:36 ` [PATCH 08/11] acpi: cppc: Add prefix cppc to cpudata structure name Srinivas Pandruvada
@ 2016-08-18 22:36 ` Srinivas Pandruvada
  2016-08-20  0:49   ` Rafael J. Wysocki
  2016-08-18 22:36 ` [PATCH 10/11] acpi: bus: Set _OSC for diverse core support Srinivas Pandruvada
  2016-08-18 22:36 ` [PATCH 11/11] cpufreq: intel_pstate: Use CPPC to get max performance Srinivas Pandruvada
  10 siblings, 1 reply; 33+ messages in thread
From: Srinivas Pandruvada @ 2016-08-18 22:36 UTC (permalink / raw)
  To: mingo, tglx, hpa, rjw, peterz
  Cc: x86, bp, sudeep.holla, ak, linux-acpi, linux-pm, alexey.klimov,
	viresh.kumar, akpm, linux-kernel, lenb, tim.c.chen,
	srinivas.pandruvada, paul.gortmaker, jpoimboe, mcgrof, jgross,
	robert.moore, dvyukov, jeyu

We need to set the platform-wide _OSC bits to enable CPPC and CPPC
version 2.  If the platform supports CPPC, the BIOS exposes the CPPC
tables.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
---
 drivers/acpi/bus.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
index 85b7d07..61643a5 100644
--- a/drivers/acpi/bus.c
+++ b/drivers/acpi/bus.c
@@ -330,6 +330,13 @@ static void acpi_bus_osc_support(void)
 	capbuf[OSC_SUPPORT_DWORD] |= OSC_SB_HOTPLUG_OST_SUPPORT;
 	capbuf[OSC_SUPPORT_DWORD] |= OSC_SB_PCLPI_SUPPORT;
 
+#ifdef CONFIG_X86
+	if (boot_cpu_has(X86_FEATURE_HWP)) {
+		capbuf[OSC_SUPPORT_DWORD] |= OSC_SB_CPC_SUPPORT;
+		capbuf[OSC_SUPPORT_DWORD] |= OSC_SB_CPCV2_SUPPORT;
+	}
+#endif
+
 	if (!ghes_disable)
 		capbuf[OSC_SUPPORT_DWORD] |= OSC_SB_APEI_SUPPORT;
 	if (ACPI_FAILURE(acpi_get_handle(NULL, "\\_SB", &handle)))
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 10/11] acpi: bus: Set _OSC for diverse core support
  2016-08-18 22:36 [PATCH 00/11] Support Intel® Turbo Boost Max Technology 3.0 Srinivas Pandruvada
                   ` (8 preceding siblings ...)
  2016-08-18 22:36 ` [PATCH 09/11] acpi: bus: Enable HWP CPPC objects Srinivas Pandruvada
@ 2016-08-18 22:36 ` Srinivas Pandruvada
  2016-08-20  0:51   ` Rafael J. Wysocki
  2016-08-18 22:36 ` [PATCH 11/11] cpufreq: intel_pstate: Use CPPC to get max performance Srinivas Pandruvada
  10 siblings, 1 reply; 33+ messages in thread
From: Srinivas Pandruvada @ 2016-08-18 22:36 UTC (permalink / raw)
  To: mingo, tglx, hpa, rjw, peterz
  Cc: x86, bp, sudeep.holla, ak, linux-acpi, linux-pm, alexey.klimov,
	viresh.kumar, akpm, linux-kernel, lenb, tim.c.chen,
	srinivas.pandruvada, paul.gortmaker, jpoimboe, mcgrof, jgross,
	robert.moore, dvyukov, jeyu

Set OSC_SB_CPC_DIVERSE_HIGH_SUPPORT (bit 12) in the _OSC support word
to enable diverse core support.
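
A hypothetical sanity check (it would have to live inside a function,
and is not part of the patch) confirming that 0x00001000 is bit 12:

	BUILD_BUG_ON(OSC_SB_CPC_DIVERSE_HIGH_SUPPORT != BIT(12));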

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
---
 drivers/acpi/bus.c   | 4 +++-
 include/linux/acpi.h | 1 +
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
index 61643a5..fbd3b7c 100644
--- a/drivers/acpi/bus.c
+++ b/drivers/acpi/bus.c
@@ -336,7 +336,9 @@ static void acpi_bus_osc_support(void)
 		capbuf[OSC_SUPPORT_DWORD] |= OSC_SB_CPCV2_SUPPORT;
 	}
 #endif
-
+#ifdef CONFIG_SCHED_ITMT
+	capbuf[OSC_SUPPORT_DWORD] |= OSC_SB_CPC_DIVERSE_HIGH_SUPPORT;
+#endif
 	if (!ghes_disable)
 		capbuf[OSC_SUPPORT_DWORD] |= OSC_SB_APEI_SUPPORT;
 	if (ACPI_FAILURE(acpi_get_handle(NULL, "\\_SB", &handle)))
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 4d8452c..17f6e08 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -460,6 +460,7 @@ acpi_status acpi_run_osc(acpi_handle handle, struct acpi_osc_context *context);
 #define OSC_SB_CPCV2_SUPPORT			0x00000040
 #define OSC_SB_PCLPI_SUPPORT			0x00000080
 #define OSC_SB_OSLPI_SUPPORT			0x00000100
+#define OSC_SB_CPC_DIVERSE_HIGH_SUPPORT		0x00001000
 
 extern bool osc_sb_apei_support_acked;
 extern bool osc_pc_lpi_support_confirmed;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 11/11] cpufreq: intel_pstate: Use CPPC to get max performance
  2016-08-18 22:36 [PATCH 00/11] Support Intel® Turbo Boost Max Technology 3.0 Srinivas Pandruvada
                   ` (9 preceding siblings ...)
  2016-08-18 22:36 ` [PATCH 10/11] acpi: bus: Set _OSC for diverse core support Srinivas Pandruvada
@ 2016-08-18 22:36 ` Srinivas Pandruvada
  2016-08-22 11:59   ` kbuild test robot
  10 siblings, 1 reply; 33+ messages in thread
From: Srinivas Pandruvada @ 2016-08-18 22:36 UTC (permalink / raw)
  To: mingo, tglx, hpa, rjw, peterz
  Cc: x86, bp, sudeep.holla, ak, linux-acpi, linux-pm, alexey.klimov,
	viresh.kumar, akpm, linux-kernel, lenb, tim.c.chen,
	srinivas.pandruvada, paul.gortmaker, jpoimboe, mcgrof, jgross,
	robert.moore, dvyukov, jeyu

This change uses the ACPI cppc_lib interface to get the CPPC performance
limits.  Once the CPPC limits of all online cores have been read, we
first check whether there is a difference in max performance.  If there
is a difference, the scheduler interface is called to update the per-cpu
priorities.  After updating the priorities of all current cpus, the ITMT
feature is enabled.
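
An illustrative walk-through of that detection step (made-up _CPC
values):

	/*
	 * Four cores whose CPPC highest_perf reads back as
	 * 0x2c, 0x2c, 0x28, 0x28.  max_prio is taken from cpu0 (0x2c);
	 * the scan finds 0x28 != 0x2c, so itmt_support = true.  Each
	 * cpu then gets sched_set_itmt_core_prio(highest_perf, cpu),
	 * and set_sched_itmt(true) is deferred to a workqueue, since
	 * the work function must be able to take the hotplug lock to
	 * rebuild the sched domains.
	 */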

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
---
 drivers/cpufreq/Kconfig.x86    |  1 +
 drivers/cpufreq/intel_pstate.c | 75 ++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 73 insertions(+), 3 deletions(-)

diff --git a/drivers/cpufreq/Kconfig.x86 b/drivers/cpufreq/Kconfig.x86
index adbd1de..6c4f747 100644
--- a/drivers/cpufreq/Kconfig.x86
+++ b/drivers/cpufreq/Kconfig.x86
@@ -6,6 +6,7 @@ config X86_INTEL_PSTATE
        bool "Intel P state control"
        depends on X86
        select ACPI_PROCESSOR if ACPI
+       select ACPI_CPPC_LIB if ACPI
        help
           This driver provides a P state for Intel core processors.
 	  The driver implements an internal governor and will become
diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index be9eade..c51b9c7 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -44,6 +44,7 @@
 
 #ifdef CONFIG_ACPI
 #include <acpi/processor.h>
+#include <acpi/cppc_acpi.h>
 #endif
 
 #define FRAC_BITS 8
@@ -193,6 +194,8 @@ struct _pid {
  * @sample:		Storage for storing last Sample data
  * @acpi_perf_data:	Stores ACPI perf information read from _PSS
  * @valid_pss_table:	Set to true for valid ACPI _PSS entries found
+ * @cppc_data:		Stores CPPC information for HWP capable CPUs
+ * @valid_cppc_table:	Set to true when valid CPPC entries are found
  *
  * This structure stores per CPU instance data for all CPUs.
  */
@@ -215,6 +218,8 @@ struct cpudata {
 #ifdef CONFIG_ACPI
 	struct acpi_processor_performance acpi_perf_data;
 	bool valid_pss_table;
+	struct cppc_cpudata *cppc_data;
+	bool valid_cppc_table;
 #endif
 };
 
@@ -361,6 +366,15 @@ static struct perf_limits *limits = &powersave_limits;
 #endif
 
 #ifdef CONFIG_ACPI
+static cpumask_t cppc_rd_cpu_mask;
+
+/* Call set_sched_itmt from a work function to be able to use hotplug locks */
+static void intel_pstate_sched_itmt_work_fn(struct work_struct *work)
+{
+	set_sched_itmt(true);
+}
+
+static DECLARE_WORK(sched_itmt_work, intel_pstate_sched_itmt_work_fn);
 
 static bool intel_pstate_get_ppc_enable_status(void)
 {
@@ -377,14 +391,63 @@ static void intel_pstate_init_acpi_perf_limits(struct cpufreq_policy *policy)
 	int ret;
 	int i;
 
-	if (hwp_active)
+	cpu = all_cpu_data[policy->cpu];
+
+	if (hwp_active) {
+		struct cppc_perf_caps *perf_caps;
+
+		cpu->cppc_data = kzalloc(sizeof(struct cppc_cpudata),
+					 GFP_KERNEL);
+		if (!cpu->cppc_data)
+			return;
+
+		perf_caps = &cpu->cppc_data->perf_caps;
+		ret = cppc_get_perf_caps(policy->cpu, perf_caps);
+		if (ret) {
+			kfree(cpu->cppc_data);
+			return;
+		}
+
+		cpu->valid_cppc_table = true;
+		pr_debug("cpu:%d H:0x%x N:0x%x R:0x%x L:0x%x\n", policy->cpu,
+			 perf_caps->highest_perf, perf_caps->nominal_perf,
+			 perf_caps->reference_perf, perf_caps->lowest_perf);
+
+		cpumask_set_cpu(policy->cpu, &cppc_rd_cpu_mask);
+		if (cpumask_subset(topology_core_cpumask(policy->cpu),
+				   &cppc_rd_cpu_mask)) {
+			int cpu_index;
+			int max_prio;
+			bool itmt_support = false;
+
+			cpu = all_cpu_data[0];
+			max_prio = cpu->cppc_data->perf_caps.highest_perf;
+			for_each_cpu(cpu_index, &cppc_rd_cpu_mask) {
+				cpu = all_cpu_data[cpu_index];
+				perf_caps = &cpu->cppc_data->perf_caps;
+				if (max_prio != perf_caps->highest_perf) {
+					itmt_support = true;
+					break;
+				}
+			}
+
+			if (!itmt_support)
+				return;
+
+			for_each_cpu(cpu_index, &cppc_rd_cpu_mask) {
+				cpu = all_cpu_data[cpu_index];
+				perf_caps = &cpu->cppc_data->perf_caps;
+				sched_set_itmt_core_prio(
+					perf_caps->highest_perf, cpu_index);
+			}
+			schedule_work(&sched_itmt_work);
+		}
 		return;
+	}
 
 	if (!intel_pstate_get_ppc_enable_status())
 		return;
 
-	cpu = all_cpu_data[policy->cpu];
-
 	ret = acpi_processor_register_performance(&cpu->acpi_perf_data,
 						  policy->cpu);
 	if (ret)
@@ -444,6 +507,12 @@ static void intel_pstate_exit_perf_limits(struct cpufreq_policy *policy)
 	struct cpudata *cpu;
 
 	cpu = all_cpu_data[policy->cpu];
+
+	if (cpu->valid_cppc_table) {
+		cpumask_clear_cpu(policy->cpu, &cppc_rd_cpu_mask);
+		kfree(cpu->cppc_data);
+	}
+
 	if (!cpu->valid_pss_table)
 		return;
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH 05/11] acpi: cppc: Allow build with ACPI_CPU_FREQ_PSS config
  2016-08-18 22:36 ` [PATCH 05/11] acpi: cppc: Allow build with ACPI_CPU_FREQ_PSS config Srinivas Pandruvada
@ 2016-08-20  0:46   ` Rafael J. Wysocki
  0 siblings, 0 replies; 33+ messages in thread
From: Rafael J. Wysocki @ 2016-08-20  0:46 UTC (permalink / raw)
  To: Srinivas Pandruvada
  Cc: mingo, tglx, hpa, peterz, x86, bp, sudeep.holla, ak, linux-acpi,
	linux-pm, alexey.klimov, viresh.kumar, akpm, linux-kernel, lenb,
	tim.c.chen, paul.gortmaker, jpoimboe, mcgrof, jgross,
	robert.moore, dvyukov, jeyu

On Thursday, August 18, 2016 03:36:46 PM Srinivas Pandruvada wrote:
> Some newer x86 platforms support both the _CPC and _PSS objects, so the
> kernel config can have both ACPI_CPU_FREQ_PSS and ACPI_CPPC_LIB set.
> Remove the restriction that ACPI_CPPC_LIB can only be built when
> ACPI_CPU_FREQ_PSS is not defined.
> Also, on legacy systems with only _PSS, we shouldn't bail out when
> acpi_cppc_processor_probe() fails if ACPI_CPU_FREQ_PSS is also defined.
> 
> Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> ---
>  drivers/acpi/Kconfig            | 1 -
>  drivers/acpi/processor_driver.c | 5 ++++-
>  2 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
> index 445ce28..c6bb6aa 100644
> --- a/drivers/acpi/Kconfig
> +++ b/drivers/acpi/Kconfig
> @@ -227,7 +227,6 @@ config ACPI_MCFG
>  config ACPI_CPPC_LIB
>  	bool
>  	depends on ACPI_PROCESSOR
> -	depends on !ACPI_CPU_FREQ_PSS
>  	select MAILBOX
>  	select PCC
>  	help
> diff --git a/drivers/acpi/processor_driver.c b/drivers/acpi/processor_driver.c
> index 0553aee..0e0b629 100644
> --- a/drivers/acpi/processor_driver.c
> +++ b/drivers/acpi/processor_driver.c
> @@ -245,8 +245,11 @@ static int __acpi_processor_start(struct acpi_device *device)
>  		return 0;
>  
>  	result = acpi_cppc_processor_probe(pr);
> -	if (result)
> +	if (result) {
> +#ifndef CONFIG_ACPI_CPU_FREQ_PSS
>  		return -ENODEV;
> +#endif
> +	}

	if (result && !IS_ENABLED(CONFIG_ACPI_CPU_FREQ_PSS))
		return -ENODEV;

would look better.
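
(One nice property of IS_ENABLED() here is that the branch is still
compiled and type-checked in every configuration, with the dead branch
simply optimized away, unlike the #ifndef version.)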

>  
>  	if (!cpuidle_get_driver() || cpuidle_get_driver() == &acpi_idle_driver)
>  		acpi_processor_power_init(pr);
> 

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 07/11] acpi: cppc: Add support for function fixed hardware address
  2016-08-18 22:36 ` [PATCH 07/11] acpi: cppc: Add support for function fixed hardware address Srinivas Pandruvada
@ 2016-08-20  0:49   ` Rafael J. Wysocki
  0 siblings, 0 replies; 33+ messages in thread
From: Rafael J. Wysocki @ 2016-08-20  0:49 UTC (permalink / raw)
  To: Srinivas Pandruvada
  Cc: mingo, tglx, hpa, peterz, x86, bp, sudeep.holla, ak, linux-acpi,
	linux-pm, alexey.klimov, viresh.kumar, akpm, linux-kernel, lenb,
	tim.c.chen, paul.gortmaker, jpoimboe, mcgrof, jgross,
	robert.moore, dvyukov, jeyu

On Thursday, August 18, 2016 03:36:48 PM Srinivas Pandruvada wrote:
> The CPPC registers can also be accessed via function fixed hardware
> addresses on x86. Add support by modifying cpc_read and cpc_write
> to be able to read/write MSRs on the x86 platform. Also, with this
> change, acpi_cppc_processor_probe doesn't bail out if the space id is
> not equal to the PCC or memory address space.
> 
> Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> ---
>  drivers/acpi/cppc_acpi.c | 77 +++++++++++++++++++++++++++++++++++++++++-------
>  1 file changed, 66 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/acpi/cppc_acpi.c b/drivers/acpi/cppc_acpi.c
> index 34209f5..939fb5c 100644
> --- a/drivers/acpi/cppc_acpi.c
> +++ b/drivers/acpi/cppc_acpi.c
> @@ -42,6 +42,10 @@
>  #include <linux/ktime.h>
>  
>  #include <acpi/cppc_acpi.h>
> +#ifdef CONFIG_X86
> +#include <asm/msr.h>
> +#endif

Please figure out how to avoid this.

> +
>  /*
>   * Lock to provide mutually exclusive access to the PCC
>   * channel. e.g. When the remote updates the shared region
> @@ -585,8 +589,9 @@ int acpi_cppc_processor_probe(struct acpi_processor *pr)
>  					pr_debug("Mismatched PCC ids.\n");
>  					goto out_free;
>  				}
> -			} else if (gas_t->space_id != ACPI_ADR_SPACE_SYSTEM_MEMORY) {
> -				/* Support only PCC and SYS MEM type regs */
> +			} else if (gas_t->space_id != ACPI_ADR_SPACE_SYSTEM_MEMORY &&
> +				   gas_t->space_id != ACPI_ADR_SPACE_FIXED_HARDWARE) {
> +				/* Support only PCC, FFH and SYS MEM type regs */
>  				pr_debug("Unsupported register type: %d\n", gas_t->space_id);
>  				goto out_free;
>  			}
> @@ -645,13 +650,59 @@ void acpi_cppc_processor_exit(struct acpi_processor *pr)
>  }
>  EXPORT_SYMBOL_GPL(acpi_cppc_processor_exit);
>  
> +#ifdef CONFIG_X86
> +static int cpc_read_ffh(int cpunum, struct cpc_reg *reg, u64 *val)
> +{
> +	int err;
> +
> +	err = rdmsrl_on_cpu(cpunum, reg->address, val);
> +	if (!err) {
> +		u64 mask = GENMASK_ULL(reg->bit_offset + reg->bit_width - 1,
> +				       reg->bit_offset);
> +
> +		*val &= mask;
> +		*val >>= reg->bit_offset;
> +	}
> +	return err;
> +}
> +
> +static int cpc_write_ffh(int cpunum, struct cpc_reg *reg, u64 val)
> +{
> +	u64 rd_val;
> +	int err;
> +
> +	err = rdmsrl_on_cpu(cpunum, reg->address, &rd_val);
> +	if (!err) {
> +		u64 mask = GENMASK_ULL(reg->bit_offset + reg->bit_width - 1,
> +				       reg->bit_offset);
> +
> +		val <<= reg->bit_offset;
> +		val &= mask;
> +		rd_val &= ~mask;
> +		rd_val |= val;
> +		err = wrmsrl_on_cpu(cpunum, reg->address, rd_val);
> +	}
> +	return err;
> +}

The above really should go somewhere under arch/x86/.

> +#else
> +static int cpc_read_ffh(int cpunum, struct cpc_reg *reg, u64 *val)
> +{
> +	return -EINVAL;
> +}
> +static int cpc_write_ffh(int cpunum, struct cpc_reg *reg, u64 val)
> +{
> +	return -EINVAL;
> +
> +}

And I would define these as __weak functions.

Also, another return value, -ENOTSUPP for example, would be better IMO.
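
Something like this, perhaps (just a sketch of the __weak fallbacks being
suggested; the real x86 overrides would then live under arch/x86/):

/* Generic stubs; an arch that supports FFH register access overrides these. */
int __weak cpc_read_ffh(int cpunum, struct cpc_reg *reg, u64 *val)
{
	return -ENOTSUPP;
}

int __weak cpc_write_ffh(int cpunum, struct cpc_reg *reg, u64 val)
{
	return -ENOTSUPP;
}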

> +#endif
> +
>  /*
>   * Since cpc_read and cpc_write are called while holding pcc_lock, it should be
>   * as fast as possible. We have already mapped the PCC subspace during init, so
>   * we can directly write to it.
>   */
>  
> -static int cpc_read(struct cpc_register_resource *res, u64 *val)
> +static int cpc_read(int cpunum, struct cpc_register_resource *res, u64 *val)
>  {
>  	struct cpc_reg *reg = &res->cpc_entry.reg;
>  	int ret_val = 0;
> @@ -684,13 +735,15 @@ static int cpc_read(struct cpc_register_resource *res, u64 *val)
>  				reg->bit_width);
>  			ret_val = -EFAULT;
>  		}
> +	} else if (reg->space_id == ACPI_ADR_SPACE_FIXED_HARDWARE) {
> +		ret_val = cpc_read_ffh(cpunum, reg, val);
>  	} else
>  		ret_val = acpi_os_read_memory((acpi_physical_address)reg->address,
>  					val, reg->bit_width);
>  	return ret_val;
>  }
>  
> -static int cpc_write(struct cpc_reg *reg, u64 val)
> +static int cpc_write(int cpunum, struct cpc_reg *reg, u64 val)
>  {
>  	int ret_val = 0;
>  
> @@ -716,6 +769,8 @@ static int cpc_write(struct cpc_reg *reg, u64 val)
>  			ret_val = -EFAULT;
>  			break;
>  		}
> +	} else if (reg->space_id == ACPI_ADR_SPACE_FIXED_HARDWARE) {
> +		ret_val = cpc_write_ffh(cpunum, reg, val);
>  	} else
>  		ret_val = acpi_os_write_memory((acpi_physical_address)reg->address,
>  				val, reg->bit_width);
> @@ -761,16 +816,16 @@ int cppc_get_perf_caps(int cpunum, struct cppc_perf_caps *perf_caps)
>  		}
>  	}
>  
> -	cpc_read(highest_reg, &high);
> +	cpc_read(cpunum, highest_reg, &high);
>  	perf_caps->highest_perf = high;
>  
> -	cpc_read(lowest_reg, &low);
> +	cpc_read(cpunum, lowest_reg, &low);
>  	perf_caps->lowest_perf = low;
>  
> -	cpc_read(ref_perf, &ref);
> +	cpc_read(cpunum, ref_perf, &ref);
>  	perf_caps->reference_perf = ref;
>  
> -	cpc_read(nom_perf, &nom);
> +	cpc_read(cpunum, nom_perf, &nom);
>  	perf_caps->nominal_perf = nom;
>  
>  	if (!ref)
> @@ -819,8 +874,8 @@ int cppc_get_perf_ctrs(int cpunum, struct cppc_perf_fb_ctrs *perf_fb_ctrs)
>  		}
>  	}
>  
> -	cpc_read(delivered_reg, &delivered);
> -	cpc_read(reference_reg, &reference);
> +	cpc_read(cpunum, delivered_reg, &delivered);
> +	cpc_read(cpunum, reference_reg, &reference);
>  
>  	if (!delivered || !reference) {
>  		ret = -EFAULT;
> @@ -875,7 +930,7 @@ int cppc_set_perf(int cpu, struct cppc_perf_ctrls *perf_ctrls)
>  	 * Skip writing MIN/MAX until Linux knows how to come up with
>  	 * useful values.
>  	 */
> -	cpc_write(&desired_reg->cpc_entry.reg, perf_ctrls->desired_perf);
> +	cpc_write(cpu, &desired_reg->cpc_entry.reg, perf_ctrls->desired_perf);
>  
>  	/* Is this a PCC reg ?*/
>  	if (desired_reg->cpc_entry.reg.space_id == ACPI_ADR_SPACE_PLATFORM_COMM) {
> 

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 09/11] acpi: bus: Enable HWP CPPC objects
  2016-08-18 22:36 ` [PATCH 09/11] acpi: bus: Enable HWP CPPC objects Srinivas Pandruvada
@ 2016-08-20  0:49   ` Rafael J. Wysocki
  0 siblings, 0 replies; 33+ messages in thread
From: Rafael J. Wysocki @ 2016-08-20  0:49 UTC (permalink / raw)
  To: Srinivas Pandruvada
  Cc: mingo, tglx, hpa, peterz, x86, bp, sudeep.holla, ak, linux-acpi,
	linux-pm, alexey.klimov, viresh.kumar, akpm, linux-kernel, lenb,
	tim.c.chen, paul.gortmaker, jpoimboe, mcgrof, jgross,
	robert.moore, dvyukov, jeyu

On Thursday, August 18, 2016 03:36:50 PM Srinivas Pandruvada wrote:
> We need to set the platform-wide _OSC bits to enable CPPC and CPPC
> version 2. If the platform supports CPPC, then the BIOS exposes the
> CPPC tables.
> 
> Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> ---
>  drivers/acpi/bus.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
> index 85b7d07..61643a5 100644
> --- a/drivers/acpi/bus.c
> +++ b/drivers/acpi/bus.c
> @@ -330,6 +330,13 @@ static void acpi_bus_osc_support(void)
>  	capbuf[OSC_SUPPORT_DWORD] |= OSC_SB_HOTPLUG_OST_SUPPORT;
>  	capbuf[OSC_SUPPORT_DWORD] |= OSC_SB_PCLPI_SUPPORT;
>  
> +#ifdef CONFIG_X86
> +	if (boot_cpu_has(X86_FEATURE_HWP)) {
> +		capbuf[OSC_SUPPORT_DWORD] |= OSC_SB_CPC_SUPPORT;
> +		capbuf[OSC_SUPPORT_DWORD] |= OSC_SB_CPCV2_SUPPORT;
> +	}
> +#endif

Any chance to use IS_ENABLED() here too?
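
For illustration, that could look something like this (sketch only; note
that boot_cpu_has() and X86_FEATURE_HWP are only defined on x86, so they
would need to be visible or stubbed on other architectures for a plain
IS_ENABLED() check to compile here):

	if (IS_ENABLED(CONFIG_X86) && boot_cpu_has(X86_FEATURE_HWP)) {
		capbuf[OSC_SUPPORT_DWORD] |= OSC_SB_CPC_SUPPORT;
		capbuf[OSC_SUPPORT_DWORD] |= OSC_SB_CPCV2_SUPPORT;
	}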

> +
>  	if (!ghes_disable)
>  		capbuf[OSC_SUPPORT_DWORD] |= OSC_SB_APEI_SUPPORT;
>  	if (ACPI_FAILURE(acpi_get_handle(NULL, "\\_SB", &handle)))
> 

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 10/11] acpi: bus: Set _OSC for diverse core support
  2016-08-18 22:36 ` [PATCH 10/11] acpi: bus: Set _OSC for diverse core support Srinivas Pandruvada
@ 2016-08-20  0:51   ` Rafael J. Wysocki
  0 siblings, 0 replies; 33+ messages in thread
From: Rafael J. Wysocki @ 2016-08-20  0:51 UTC (permalink / raw)
  To: Srinivas Pandruvada
  Cc: mingo, tglx, hpa, peterz, x86, bp, sudeep.holla, ak, linux-acpi,
	linux-pm, alexey.klimov, viresh.kumar, akpm, linux-kernel, lenb,
	tim.c.chen, paul.gortmaker, jpoimboe, mcgrof, jgross,
	robert.moore, dvyukov, jeyu

On Thursday, August 18, 2016 03:36:51 PM Srinivas Pandruvada wrote:
> Set the OSC_SB_CPC_DIVERSE_HIGH_SUPPORT (bit 12) to enable diverse
> core support.
> 
> Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> ---
>  drivers/acpi/bus.c   | 4 +++-
>  include/linux/acpi.h | 1 +
>  2 files changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
> index 61643a5..fbd3b7c 100644
> --- a/drivers/acpi/bus.c
> +++ b/drivers/acpi/bus.c
> @@ -336,7 +336,9 @@ static void acpi_bus_osc_support(void)
>  		capbuf[OSC_SUPPORT_DWORD] |= OSC_SB_CPCV2_SUPPORT;
>  	}
>  #endif
> -
> +#ifdef CONFIG_SCHED_ITMT
> +	capbuf[OSC_SUPPORT_DWORD] |= OSC_SB_CPC_DIVERSE_HIGH_SUPPORT;
> +#endif

	if (IS_ENABLED(CONFIG_SCHED_ITMT))
		capbuf[OSC_SUPPORT_DWORD] |= OSC_SB_CPC_DIVERSE_HIGH_SUPPORT;

pretty please.

>  	if (!ghes_disable)
>  		capbuf[OSC_SUPPORT_DWORD] |= OSC_SB_APEI_SUPPORT;
>  	if (ACPI_FAILURE(acpi_get_handle(NULL, "\\_SB", &handle)))
> diff --git a/include/linux/acpi.h b/include/linux/acpi.h
> index 4d8452c..17f6e08 100644
> --- a/include/linux/acpi.h
> +++ b/include/linux/acpi.h
> @@ -460,6 +460,7 @@ acpi_status acpi_run_osc(acpi_handle handle, struct acpi_osc_context *context);
>  #define OSC_SB_CPCV2_SUPPORT			0x00000040
>  #define OSC_SB_PCLPI_SUPPORT			0x00000080
>  #define OSC_SB_OSLPI_SUPPORT			0x00000100
> +#define OSC_SB_CPC_DIVERSE_HIGH_SUPPORT		0x00001000
>  
>  extern bool osc_sb_apei_support_acked;
>  extern bool osc_pc_lpi_support_confirmed;
> 

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 04/11] sched,x86: Enable Turbo Boost Max Technology
  2016-08-18 22:36 ` [PATCH 04/11] sched,x86: Enable Turbo Boost Max Technology Srinivas Pandruvada
@ 2016-08-22  9:01   ` kbuild test robot
  2016-08-22 19:04     ` Tim Chen
  2016-08-24 10:18   ` Ingo Molnar
  1 sibling, 1 reply; 33+ messages in thread
From: kbuild test robot @ 2016-08-22  9:01 UTC (permalink / raw)
  To: Srinivas Pandruvada
  Cc: kbuild-all, mingo, tglx, hpa, rjw, peterz, x86, bp, sudeep.holla,
	ak, linux-acpi, linux-pm, alexey.klimov, viresh.kumar, akpm,
	linux-kernel, lenb, tim.c.chen, srinivas.pandruvada,
	paul.gortmaker, jpoimboe, mcgrof, jgross, robert.moore, dvyukov,
	jeyu

[-- Attachment #1: Type: text/plain, Size: 2735 bytes --]

Hi Tim,

[auto build test ERROR on pm/linux-next]
[also build test ERROR on v4.8-rc3 next-20160822]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
[Suggest to use git(>=2.9.0) format-patch --base=<commit> (or --base=auto for convenience) to record what (public, well-known) commit your patch series was built on]
[Check https://git-scm.com/docs/git-format-patch for more information]

url:    https://github.com/0day-ci/linux/commits/Srinivas-Pandruvada/Support-Intel-Turbo-Boost-Max-Technology-3-0/20160819-101955
base:   https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
config: i386-allyesconfig (attached as .config)
compiler: gcc-6 (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   In file included from arch/x86/include/asm/numa.h:6:0,
                    from arch/x86/include/asm/acpi.h:28,
                    from arch/x86/include/asm/fixmap.h:19,
                    from arch/x86/kernel/apic/apic_noop.c:18:
>> arch/x86/include/asm/topology.h:150:34: error: unknown type name 'sched_core_priority'
    DECLARE_PER_CPU_READ_MOSTLY(int, sched_core_priority);
                                     ^~~~~~~~~~~~~~~~~~~
   arch/x86/include/asm/topology.h: In function 'sched_asym_prefer':
>> arch/x86/include/asm/topology.h:154:9: error: implicit declaration of function 'per_cpu' [-Werror=implicit-function-declaration]
     return per_cpu(sched_core_priority, a) > per_cpu(sched_core_priority, b);
            ^~~~~~~
>> arch/x86/include/asm/topology.h:154:17: error: 'sched_core_priority' undeclared (first use in this function)
     return per_cpu(sched_core_priority, a) > per_cpu(sched_core_priority, b);
                    ^~~~~~~~~~~~~~~~~~~
   arch/x86/include/asm/topology.h:154:17: note: each undeclared identifier is reported only once for each function it appears in
   cc1: some warnings being treated as errors

vim +/sched_core_priority +150 arch/x86/include/asm/topology.h

   144	
   145	struct pci_bus;
   146	int x86_pci_root_bus_node(int bus);
   147	void x86_pci_root_bus_resources(int bus, struct list_head *resources);
   148	
   149	#ifdef CONFIG_SCHED_ITMT
 > 150	DECLARE_PER_CPU_READ_MOSTLY(int, sched_core_priority);
   151	
   152	static inline bool sched_asym_prefer(int a, int b)
   153	{
 > 154		return per_cpu(sched_core_priority, a) > per_cpu(sched_core_priority, b);
   155	}
   156	
   157	#define sched_asym_prefer sched_asym_prefer

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 55184 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 11/11] cpufreq: intel_pstate: Use CPPC to get max performance
  2016-08-18 22:36 ` [PATCH 11/11] cpufreq: intel_pstate: Use CPPC to get max performance Srinivas Pandruvada
@ 2016-08-22 11:59   ` kbuild test robot
  0 siblings, 0 replies; 33+ messages in thread
From: kbuild test robot @ 2016-08-22 11:59 UTC (permalink / raw)
  To: Srinivas Pandruvada
  Cc: kbuild-all, mingo, tglx, hpa, rjw, peterz, x86, bp, sudeep.holla,
	ak, linux-acpi, linux-pm, alexey.klimov, viresh.kumar, akpm,
	linux-kernel, lenb, tim.c.chen, srinivas.pandruvada,
	paul.gortmaker, jpoimboe, mcgrof, jgross, robert.moore, dvyukov,
	jeyu

[-- Attachment #1: Type: text/plain, Size: 4899 bytes --]

Hi Srinivas,

[auto build test ERROR on pm/linux-next]
[also build test ERROR on v4.8-rc3 next-20160822]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
[Suggest to use git(>=2.9.0) format-patch --base=<commit> (or --base=auto for convenience) to record what (public, well-known) commit your patch series was built on]
[Check https://git-scm.com/docs/git-format-patch for more information]

url:    https://github.com/0day-ci/linux/commits/Srinivas-Pandruvada/Support-Intel-Turbo-Boost-Max-Technology-3-0/20160819-101955
base:   https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
config: i386-randconfig-h0-08210914 (attached as .config)
compiler: gcc-6 (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   drivers/acpi/cppc_acpi.c: In function 'cpc_read':
>> drivers/acpi/cppc_acpi.c:731:11: error: implicit declaration of function 'readq_relaxed' [-Werror=implicit-function-declaration]
       *val = readq_relaxed(vaddr);
              ^~~~~~~~~~~~~
   drivers/acpi/cppc_acpi.c: In function 'cpc_write':
>> drivers/acpi/cppc_acpi.c:764:4: error: implicit declaration of function 'writeq_relaxed' [-Werror=implicit-function-declaration]
       writeq_relaxed(val, vaddr);
       ^~~~~~~~~~~~~~
   cc1: some warnings being treated as errors

vim +/readq_relaxed +731 drivers/acpi/cppc_acpi.c

beee23ae Prakash, Prashanth  2016-02-17  725  			*val = readw_relaxed(vaddr);
77e3d86f Prakash, Prashanth  2016-02-17  726  			break;
77e3d86f Prakash, Prashanth  2016-02-17  727  		case 32:
beee23ae Prakash, Prashanth  2016-02-17  728  			*val = readl_relaxed(vaddr);
77e3d86f Prakash, Prashanth  2016-02-17  729  			break;
77e3d86f Prakash, Prashanth  2016-02-17  730  		case 64:
beee23ae Prakash, Prashanth  2016-02-17 @731  			*val = readq_relaxed(vaddr);
77e3d86f Prakash, Prashanth  2016-02-17  732  			break;
77e3d86f Prakash, Prashanth  2016-02-17  733  		default:
77e3d86f Prakash, Prashanth  2016-02-17  734  			pr_debug("Error: Cannot read %u bit width from PCC\n",
77e3d86f Prakash, Prashanth  2016-02-17  735  				reg->bit_width);
77e3d86f Prakash, Prashanth  2016-02-17  736  			ret_val = -EFAULT;
77e3d86f Prakash, Prashanth  2016-02-17  737  		}
c39ec8bd Srinivas Pandruvada 2016-08-18  738  	} else if (reg->space_id == ACPI_ADR_SPACE_FIXED_HARDWARE) {
c39ec8bd Srinivas Pandruvada 2016-08-18  739  		ret_val = cpc_read_ffh(cpunum, reg, val);
77e3d86f Prakash, Prashanth  2016-02-17  740  	} else
77e3d86f Prakash, Prashanth  2016-02-17  741  		ret_val = acpi_os_read_memory((acpi_physical_address)reg->address,
337aadff Ashwin Chaugule     2015-10-02  742  					val, reg->bit_width);
77e3d86f Prakash, Prashanth  2016-02-17  743  	return ret_val;
337aadff Ashwin Chaugule     2015-10-02  744  }
337aadff Ashwin Chaugule     2015-10-02  745  
c39ec8bd Srinivas Pandruvada 2016-08-18  746  static int cpc_write(int cpunum, struct cpc_reg *reg, u64 val)
337aadff Ashwin Chaugule     2015-10-02  747  {
77e3d86f Prakash, Prashanth  2016-02-17  748  	int ret_val = 0;
77e3d86f Prakash, Prashanth  2016-02-17  749  
77e3d86f Prakash, Prashanth  2016-02-17  750  	if (reg->space_id == ACPI_ADR_SPACE_PLATFORM_COMM) {
77e3d86f Prakash, Prashanth  2016-02-17  751  		void __iomem *vaddr = GET_PCC_VADDR(reg->address);
337aadff Ashwin Chaugule     2015-10-02  752  
77e3d86f Prakash, Prashanth  2016-02-17  753  		switch (reg->bit_width) {
77e3d86f Prakash, Prashanth  2016-02-17  754  		case 8:
beee23ae Prakash, Prashanth  2016-02-17  755  			writeb_relaxed(val, vaddr);
77e3d86f Prakash, Prashanth  2016-02-17  756  			break;
77e3d86f Prakash, Prashanth  2016-02-17  757  		case 16:
beee23ae Prakash, Prashanth  2016-02-17  758  			writew_relaxed(val, vaddr);
77e3d86f Prakash, Prashanth  2016-02-17  759  			break;
77e3d86f Prakash, Prashanth  2016-02-17  760  		case 32:
beee23ae Prakash, Prashanth  2016-02-17  761  			writel_relaxed(val, vaddr);
77e3d86f Prakash, Prashanth  2016-02-17  762  			break;
77e3d86f Prakash, Prashanth  2016-02-17  763  		case 64:
beee23ae Prakash, Prashanth  2016-02-17 @764  			writeq_relaxed(val, vaddr);
77e3d86f Prakash, Prashanth  2016-02-17  765  			break;
77e3d86f Prakash, Prashanth  2016-02-17  766  		default:
77e3d86f Prakash, Prashanth  2016-02-17  767  			pr_debug("Error: Cannot write %u bit width to PCC\n",

:::::: The code at line 731 was first introduced by commit
:::::: beee23aebc6650609ef1547f6d813fa5065f74aa ACPI / CPPC: replace writeX/readX to PCC with relaxed version

:::::: TO: Prakash, Prashanth <pprakash@codeaurora.org>
:::::: CC: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 22933 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 01/11] sched, cpuset: Add regenerate_sched_domains function to rebuild all sched domains
  2016-08-18 22:36 ` [PATCH 01/11] sched, cpuset: Add regenerate_sched_domains function to rebuild all sched domains Srinivas Pandruvada
@ 2016-08-22 13:52   ` Morten Rasmussen
  2016-08-22 19:51     ` Tim Chen
  0 siblings, 1 reply; 33+ messages in thread
From: Morten Rasmussen @ 2016-08-22 13:52 UTC (permalink / raw)
  To: Srinivas Pandruvada
  Cc: mingo, tglx, hpa, rjw, peterz, x86, bp, sudeep.holla, ak,
	linux-acpi, linux-pm, alexey.klimov, viresh.kumar, akpm,
	linux-kernel, lenb, tim.c.chen, paul.gortmaker, jpoimboe, mcgrof,
	jgross, robert.moore, dvyukov, jeyu

On Thu, Aug 18, 2016 at 03:36:42PM -0700, Srinivas Pandruvada wrote:
> From: Tim Chen <tim.c.chen@linux.intel.com>
> 
> The current rebuild_sched_domains will only rebuild the sched domains
> when the cpumask changes.  However, in some scenarios, when only a
> topology flag value changes, it will not rebuild the sched domains.
> 
> We create a regenerate_sched_domains function that will always
> rebuild all the sched domains to take care of this scenario.

[...]

> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -7081,7 +7082,7 @@ void partition_sched_domains(int ndoms_new, cpumask_var_t doms_new[],
>  	unregister_sched_domain_sysctl();
>  
>  	/* Let architecture update cpu core mappings. */
> -	new_topology = arch_update_cpu_topology();
> +	new_topology = arch_update_cpu_topology() | need_domain_rebuild;

You can force rebuild_sched_domains() to rebuild the sched_domain
hierarchy by just implementing arch_update_cpu_topology(). Make it
return 1 when you want the hierarchy to be updated.

Implementing another forcing mechanism seems redundant. I must be
missing something?

I just did exactly that to set the SD_ASYM_CPUCAPACITY flag for
big.LITTLE platforms on arm/arm64 as we don't know if the flag should be
set until cpufreq has initialized.
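
For reference, a minimal sketch of that approach (the flag name is
hypothetical; arch code would set it when the core priorities change):

/* Set by arch code when ITMT state changes and a rebuild is wanted. */
static bool x86_topology_update;

int arch_update_cpu_topology(void)
{
	int retval = x86_topology_update;

	x86_topology_update = false;
	return retval;	/* non-zero makes partition_sched_domains() rebuild */
}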

Morten

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 04/11] sched,x86: Enable Turbo Boost Max Technology
  2016-08-22  9:01   ` kbuild test robot
@ 2016-08-22 19:04     ` Tim Chen
  0 siblings, 0 replies; 33+ messages in thread
From: Tim Chen @ 2016-08-22 19:04 UTC (permalink / raw)
  To: kbuild test robot
  Cc: Srinivas Pandruvada, kbuild-all, mingo, tglx, hpa, rjw, peterz,
	x86, bp, sudeep.holla, ak, linux-acpi, linux-pm, alexey.klimov,
	viresh.kumar, akpm, linux-kernel, lenb, paul.gortmaker, jpoimboe,
	mcgrof, jgross, robert.moore, dvyukov, jeyu

On Mon, Aug 22, 2016 at 05:01:15PM +0800, kbuild test robot wrote:
> Hi Tim,
> 
> [auto build test ERROR on pm/linux-next]
> [also build test ERROR on v4.8-rc3 next-20160822]
> [if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
> [Suggest to use git(>=2.9.0) format-patch --base=<commit> (or --base=auto for convenience) to record what (public, well-known) commit your patch series was built on]
> [Check https://git-scm.com/docs/git-format-patch for more information]
> 
> url:    https://github.com/0day-ci/linux/commits/Srinivas-Pandruvada/Support-Intel-Turbo-Boost-Max-Technology-3-0/20160819-101955
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
> config: i386-allyesconfig (attached as .config)
> compiler: gcc-6 (Debian 6.1.1-9) 6.1.1 20160705
> reproduce:
>         # save the attached .config to linux build tree
>         make ARCH=i386 
> 

Should be fixed by the patch below.  Will incorporate the fix in the
next rev of the patch series.

Tim

---

diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index f148843..bcf3e85 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -147,6 +147,8 @@ int x86_pci_root_bus_node(int bus);
 void x86_pci_root_bus_resources(int bus, struct list_head *resources);
 
 #ifdef CONFIG_SCHED_ITMT
+#include <asm/percpu.h>
+
 DECLARE_PER_CPU_READ_MOSTLY(int, sched_core_priority);
 
 static inline bool sched_asym_prefer(int a, int b)

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH 01/11] sched, cpuset: Add regenerate_sched_domains function to rebuild all sched domains
  2016-08-22 13:52   ` Morten Rasmussen
@ 2016-08-22 19:51     ` Tim Chen
  0 siblings, 0 replies; 33+ messages in thread
From: Tim Chen @ 2016-08-22 19:51 UTC (permalink / raw)
  To: Morten Rasmussen, Srinivas Pandruvada
  Cc: mingo, tglx, hpa, rjw, peterz, x86, bp, sudeep.holla, ak,
	linux-acpi, linux-pm, alexey.klimov, viresh.kumar, akpm,
	linux-kernel, lenb, paul.gortmaker, jpoimboe, mcgrof, jgross,
	robert.moore, dvyukov, jeyu

On Mon, 2016-08-22 at 14:52 +0100, Morten Rasmussen wrote:
> On Thu, Aug 18, 2016 at 03:36:42PM -0700, Srinivas Pandruvada wrote:
> > 
> > From: Tim Chen <tim.c.chen@linux.intel.com>
> > 
> > The current rebuild_sched_domains will only rebuild the sched domains
> > when the cpumask changes.  However, in some scenarios, when only a
> > topology flag value changes, it will not rebuild the sched domains.
> > 
> > We create a regenerate_sched_domains function that will always
> > rebuild all the sched domains to take care of this scenario.
> [...]
> 
> > 
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -7081,7 +7082,7 @@ void partition_sched_domains(int ndoms_new, cpumask_var_t doms_new[],
> >  	unregister_sched_domain_sysctl();
> >  
> >  	/* Let architecture update cpu core mappings. */
> > -	new_topology = arch_update_cpu_topology();
> > +	new_topology = arch_update_cpu_topology() | need_domain_rebuild;
> You can force rebuild_sched_domains() to rebuild the sched_domain
> hierarchy by just implementing arch_update_cpu_topology(). Make it
> return 1 when you want the hierarchy to be updated.
> 
> Implementing another forcing mechanism seems redundant. I must be
> missing something?

Sure, I'll take a look at using arch_update_cpu_topology. 

Thanks.

Tim

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 04/11] sched,x86: Enable Turbo Boost Max Technology
  2016-08-18 22:36 ` [PATCH 04/11] sched,x86: Enable Turbo Boost Max Technology Srinivas Pandruvada
  2016-08-22  9:01   ` kbuild test robot
@ 2016-08-24 10:18   ` Ingo Molnar
  2016-08-24 17:50     ` Tim Chen
  1 sibling, 1 reply; 33+ messages in thread
From: Ingo Molnar @ 2016-08-24 10:18 UTC (permalink / raw)
  To: Srinivas Pandruvada
  Cc: mingo, tglx, hpa, rjw, peterz, x86, bp, sudeep.holla, ak,
	linux-acpi, linux-pm, alexey.klimov, viresh.kumar, akpm,
	linux-kernel, lenb, tim.c.chen, paul.gortmaker, jpoimboe, mcgrof,
	jgross, robert.moore, dvyukov, jeyu, Peter Zijlstra


* Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> wrote:

> From: Tim Chen <tim.c.chen@linux.intel.com>
> 
> Some Intel cores can be boosted to a higher turbo frequency than the
> other cores on the same die.  So we prefer processes to run on them
> rather than on the other, lower-frequency cores for extra performance.
> 
> We extend the asym packing feature in the scheduler to support packing
> tasks to the higher frequency core at the core sched domain level.
> 
> We set up a core priority metric to abstract the core preferences based
> on the maximum boost frequency.  The priority is instantiated such that
> the core with a higher priority is favored over the core with lower
> priority when making scheduling decisions using ASYM_PACKING.  The SMT
> threads with higher numbers are discounted in their priority so that
> we will not try to pack tasks onto all the threads of a favored core
> before using other cpu cores.  The cpu with the highest priority
> in a sched_group is recorded in sched_group->asym_prefer_cpu during
> initialization to save a lookup during load balancing.
> 
> A sysctl variable /proc/sys/kernel/sched_itmt_enabled is provided so
> that scheduling based on favored cores can be turned on or off at run time.

> +/*
> + * Boolean to control whether we want to move processes to cpus capable
> + * of higher turbo frequency for cpus supporting Intel Turbo Boost Max
> + * Technology 3.0.
> + *
> + * It can be set via /proc/sys/kernel/sched_itmt_enabled
> + */
> +unsigned int __read_mostly sysctl_sched_itmt_enabled = 0;

Ugh, no.

We don't add features to the scheduler in the hope that they might or might not 
help. We either enable a new feature by default (and make damn sure it helps!),
or don't add the feature at all.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 04/11] sched,x86: Enable Turbo Boost Max Technology
  2016-08-24 10:18   ` Ingo Molnar
@ 2016-08-24 17:50     ` Tim Chen
  2016-08-24 18:08       ` Ingo Molnar
  0 siblings, 1 reply; 33+ messages in thread
From: Tim Chen @ 2016-08-24 17:50 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Srinivas Pandruvada, mingo, tglx, hpa, rjw, peterz, x86, bp,
	sudeep.holla, ak, linux-acpi, linux-pm, alexey.klimov,
	viresh.kumar, akpm, linux-kernel, lenb, paul.gortmaker, jpoimboe,
	mcgrof, jgross, robert.moore, dvyukov, jeyu, Peter Zijlstra

On Wed, Aug 24, 2016 at 12:18:53PM +0200, Ingo Molnar wrote:
> 
> * Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> wrote:
> 
> > From: Tim Chen <tim.c.chen@linux.intel.com>
> > 
> > Some Intel cores can be boosted to a higher turbo frequency than the
> > other cores on the same die.  So we prefer processes to run on them
> > rather than on the other, lower-frequency cores for extra performance.
> > 
> > We extend the asym packing feature in the scheduler to support packing
> > tasks to the higher frequency core at the core sched domain level.
> > 
> > We set up a core priority metric to abstract the core preferences based
> > on the maximum boost frequency.  The priority is instantiated such that
> > the core with a higher priority is favored over the core with lower
> > priority when making scheduling decisions using ASYM_PACKING.  The SMT
> > threads with higher numbers are discounted in their priority so that
> > we will not try to pack tasks onto all the threads of a favored core
> > before using other cpu cores.  The cpu with the highest priority
> > in a sched_group is recorded in sched_group->asym_prefer_cpu during
> > initialization to save a lookup during load balancing.
> > 
> > A sysctl variable /proc/sys/kernel/sched_itmt_enabled is provided so
> > that scheduling based on favored cores can be turned on or off at run time.
> 
> > +/*
> > + * Boolean to control whether we want to move processes to cpus capable
> > + * of higher turbo frequency for cpus supporting Intel Turbo Boost Max
> > + * Technology 3.0.
> > + *
> > + * It can be set via /proc/sys/kernel/sched_itmt_enabled
> > + */
> > +unsigned int __read_mostly sysctl_sched_itmt_enabled = 0;
> 
> Ugh, no.
> 
> We don't add features to the scheduler in the hope that they might or might not 
> help. We either enable a new feature by default (and make damn sure it helps!),
> or don't add the feature at all.
> 
> Thanks,
> 
> 	Ingo

Ingo,

This feature will be a clear benefit for client machines and a less
clear one on servers.

This feature is most beneficial to single-threaded workloads running on
a single socket that operates mostly in Turbo mode.  A client platform
like the Broadwell High End Desktop is the first one that supports it.
Enabling this feature by default on such platforms will be a win, as
they run single-threaded workloads much of the time (10%-15% performance
upside).

On the other hand, a heavily loaded server that rarely operates in Turbo
mode will benefit much less from this feature.  There is some overhead
incurred by migrating load to the favored cores.  Some server folks
have asked us to be cautious here and not to turn on ITMT scheduling
by default.  Even so, when the server is lightly loaded, this feature
can still be a win.  That said, this is forward looking, as we don't
have any server with this feature today.

So if we take the approach of enabling this feature by default only on
single-node systems (using that as the criterion for a client), would
that seem reasonable to you?
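
Concretely, the default could then be keyed off the node count at ITMT
setup time, e.g. (sketch only, reusing the sysctl variable from this
series):

	/* Treat single-node systems as clients and enable ITMT by default. */
	if (num_online_nodes() == 1)
		sysctl_sched_itmt_enabled = 1;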

Thanks.

Tim

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 04/11] sched,x86: Enable Turbo Boost Max Technology
  2016-08-24 17:50     ` Tim Chen
@ 2016-08-24 18:08       ` Ingo Molnar
  2016-08-24 18:22         ` Peter Zijlstra
  0 siblings, 1 reply; 33+ messages in thread
From: Ingo Molnar @ 2016-08-24 18:08 UTC (permalink / raw)
  To: Tim Chen
  Cc: Srinivas Pandruvada, mingo, tglx, hpa, rjw, peterz, x86, bp,
	sudeep.holla, ak, linux-acpi, linux-pm, alexey.klimov,
	viresh.kumar, akpm, linux-kernel, lenb, paul.gortmaker, jpoimboe,
	mcgrof, jgross, robert.moore, dvyukov, jeyu, Peter Zijlstra


* Tim Chen <tim.c.chen@linux.intel.com> wrote:

> Ingo,
> 
> This feature will be a clear benefit for client machines and a less
> clear one on servers.
> 
> This feature is most beneficial to single-threaded workloads running on
> a single socket that operates mostly in Turbo mode.  A client platform
> like the Broadwell High End Desktop is the first one that supports it.
> Enabling this feature by default on such platforms will be a win, as
> they run single-threaded workloads much of the time (10%-15% performance
> upside).
> 
> On the other hand, a heavily loaded server that rarely operates in Turbo
> mode will benefit much less from this feature.  There is some overhead
> incurred by migrating load to the favored cores.  Some server folks
> have asked us to be cautious here and not to turn on ITMT scheduling
> by default.  Even so, when the server is lightly loaded, this feature
> can still be a win.  That said, this is forward looking, as we don't
> have any server with this feature today.
> 
> So if we take the approach of enabling this feature by default only on
> single-node systems (using that as the criterion for a client), would
> that seem reasonable to you?

I suppose that would work. Peter, any objections to such an approach?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 04/11] sched,x86: Enable Turbo Boost Max Technology
  2016-08-24 18:08       ` Ingo Molnar
@ 2016-08-24 18:22         ` Peter Zijlstra
  0 siblings, 0 replies; 33+ messages in thread
From: Peter Zijlstra @ 2016-08-24 18:22 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Tim Chen, Srinivas Pandruvada, mingo, tglx, hpa, rjw, x86, bp,
	sudeep.holla, ak, linux-acpi, linux-pm, alexey.klimov,
	viresh.kumar, akpm, linux-kernel, lenb, paul.gortmaker, jpoimboe,
	mcgrof, jgross, robert.moore, dvyukov, jeyu

On Wed, Aug 24, 2016 at 08:08:54PM +0200, Ingo Molnar wrote:
> > 
> > So if we take the approach of enabling this feature by default only on
> > single-node systems (using that as the criterion for a client), would
> > that seem reasonable to you?
> 
> I suppose that would work. Peter, any objections to such an approach?

Works for me.

Thanks!

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 03/11] sched: Extend scheduler's asym packing
  2016-08-18 22:36 ` [PATCH 03/11] sched: Extend scheduler's asym packing Srinivas Pandruvada
@ 2016-08-25 11:22   ` Morten Rasmussen
  2016-08-25 11:45     ` Peter Zijlstra
  0 siblings, 1 reply; 33+ messages in thread
From: Morten Rasmussen @ 2016-08-25 11:22 UTC (permalink / raw)
  To: Srinivas Pandruvada
  Cc: mingo, tglx, hpa, rjw, peterz, x86, bp, sudeep.holla, ak,
	linux-acpi, linux-pm, alexey.klimov, viresh.kumar, akpm,
	linux-kernel, lenb, tim.c.chen, paul.gortmaker, jpoimboe, mcgrof,
	jgross, robert.moore, dvyukov, jeyu

On Thu, Aug 18, 2016 at 03:36:44PM -0700, Srinivas Pandruvada wrote:
> From: Tim Chen <tim.c.chen@linux.intel.com>
> 
> We generalize the scheduler's asym packing to provide an
> ordering of the cpus beyond just the cpu number.  This allows
> the use of the ASYM_PACKING scheduler machinery to move
> loads to the preferred CPU in a sched domain, based on a preference
> defined by the sched_asym_prefer function.
> 
> We also record the most preferred cpu in a sched group when
> we build the cpu capacities, for fast lookup of the preferred cpu
> during load balancing.
> 
> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>

[...]

> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index c64fc51..75e1002 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -532,6 +532,22 @@ struct dl_rq {
>  
>  #ifdef CONFIG_SMP
>  
> +#ifndef sched_asym_prefer
> +
> +/* For default ASYM_PACKING, lower numbered cpu is preferred */
> +static inline bool sched_asym_prefer(int a, int b)
> +{
> +	return a < b;
> +}
> +
> +#endif /* sched_asym_prefer */

Isn't this a very significant change in the interface between
architecture and the scheduler?

If I'm not mistaken, our current interface is quite strict when it comes
to information passed from the architecture into the scheduler. We allow
'topology' flags, but not behavioural flags, to be set by the
architecture, and the architecture can expose current and max cpu
capacities through the arch_scale_*() functions. For NUMA, we can expose
'distance' between nodes (and more?).

These are meant to describe the system topology to the scheduler, so it
can make better decisions on its own. sched_asym_prefer() is not only
affecting scheduler behaviour, it is handing off scheduling decisions to
architecture code. In essence it allows logic to be plugged into the
scheduler, although with a somewhat limited scope of impact.

Should this be seen as the architecture/scheduler interface being up for
revision, with us starting to allow architecture code to plug in
functions that affect scheduling behaviour?

I haven't reviewed the entire patch set in detail, but why can't the cpu
priority list be handed to the scheduler instead of moving scheduling
decisions out of the scheduler?

Isn't it possible to express the cpu 'priority' as different cpu
capacities instead? Without understanding the details of ITMT, it seems
to me that what you really have is different cpu compute capacities, and
that is what we have cpu capacity for.

Is the long-term intention to change the cpu priority order on the fly?
Otherwise I don't see why you would put the logic in architecture code.

Finally, the existing callback functions from the scheduler to
architecture code are prefixed with arch_. I think this one should do
the same, to make it clear that this function may be implemented
outside the generic scheduler code.

Thanks,
Morten

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 03/11] sched: Extend scheduler's asym packing
  2016-08-25 11:22   ` Morten Rasmussen
@ 2016-08-25 11:45     ` Peter Zijlstra
  2016-08-25 13:18       ` Morten Rasmussen
  0 siblings, 1 reply; 33+ messages in thread
From: Peter Zijlstra @ 2016-08-25 11:45 UTC (permalink / raw)
  To: Morten Rasmussen
  Cc: Srinivas Pandruvada, mingo, tglx, hpa, rjw, x86, bp,
	sudeep.holla, ak, linux-acpi, linux-pm, alexey.klimov,
	viresh.kumar, akpm, linux-kernel, lenb, tim.c.chen,
	paul.gortmaker, jpoimboe, mcgrof, jgross, robert.moore, dvyukov,
	jeyu

On Thu, Aug 25, 2016 at 12:22:52PM +0100, Morten Rasmussen wrote:
> I haven't reviewed the entire patch set in detail, but why can't the cpu
> priority list be handed to the scheduler instead of moving scheduling
> decisions out of the scheduler?

It basically does that. All that we allow here is the architecture to
override the default order of what is considered priority.

The default (as per Power7) is the naked cpu number, with lower cpu
numbers having higher priority than higher numbers.

This patch set allows the architecture to provide a less_than operator
(and through that a custom order).
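
In this series that override is the x86 one quoted in the kbuild report
earlier in the thread, i.e. (modulo the closing #endif):

#ifdef CONFIG_SCHED_ITMT
DECLARE_PER_CPU_READ_MOSTLY(int, sched_core_priority);

static inline bool sched_asym_prefer(int a, int b)
{
	return per_cpu(sched_core_priority, a) > per_cpu(sched_core_priority, b);
}

#define sched_asym_prefer sched_asym_prefer
#endif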

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 03/11] sched: Extend scheduler's asym packing
  2016-08-25 11:45     ` Peter Zijlstra
@ 2016-08-25 13:18       ` Morten Rasmussen
  2016-08-25 13:45         ` Peter Zijlstra
  0 siblings, 1 reply; 33+ messages in thread
From: Morten Rasmussen @ 2016-08-25 13:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Srinivas Pandruvada, mingo, tglx, hpa, rjw, x86, bp,
	sudeep.holla, ak, linux-acpi, linux-pm, alexey.klimov,
	viresh.kumar, akpm, linux-kernel, lenb, tim.c.chen,
	paul.gortmaker, jpoimboe, mcgrof, jgross, robert.moore, dvyukov,
	jeyu

On Thu, Aug 25, 2016 at 01:45:22PM +0200, Peter Zijlstra wrote:
> On Thu, Aug 25, 2016 at 12:22:52PM +0100, Morten Rasmussen wrote:
> > I haven't reviewed the entire patch set in detail, but why can't the cpu
> > priority list be handed to the scheduler instead of moving scheduling
> > decisions out of the scheduler?
> 
> It basically does that. All that we allow here is the architecture to
> override the default order of what is considered priority.
> 
> The default (as per Power7) is the naked cpu number, with lower cpu
> numbers having higher priority than higher numbers.
> 
> This patch set allows the architecture to provide a less_than operator
> (and through that a custom order).

But why not just pass the customized list into the scheduler? Seems
simpler?

A custom less_than operator opens up the potential for exploitation that
is much more complicated than what is shown in this patch set. Paired up
with the utilization signals, which are now exposed outside the
scheduler through cpufreq, you can start making complex load-balancing
decisions outside the scheduler using this new interface.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 03/11] sched: Extend scheduler's asym packing
  2016-08-25 13:18       ` Morten Rasmussen
@ 2016-08-25 13:45         ` Peter Zijlstra
  2016-08-26 10:39           ` Morten Rasmussen
  0 siblings, 1 reply; 33+ messages in thread
From: Peter Zijlstra @ 2016-08-25 13:45 UTC (permalink / raw)
  To: Morten Rasmussen
  Cc: Srinivas Pandruvada, mingo, tglx, hpa, rjw, x86, bp,
	sudeep.holla, ak, linux-acpi, linux-pm, alexey.klimov,
	viresh.kumar, akpm, linux-kernel, lenb, tim.c.chen,
	paul.gortmaker, jpoimboe, mcgrof, jgross, robert.moore, dvyukov,
	jeyu

On Thu, Aug 25, 2016 at 02:18:37PM +0100, Morten Rasmussen wrote:

> But why not just pass the customized list into the scheduler? Seems
> simpler?

Mostly because I didn't want to regress Power I suppose. The ITMT stuff
needs an extra load, whereas the Power stuff can use the CPU number we
already have.

Also, since we need an interface to pass in this custom list, I don't
see the distinction, you can do the same manipulation by constantly
updating the prio list.

But none of this stuff should be EXPORT'ed, so it's only available to the
core kernel, which greatly limits the potential for abuse. We can see
arch code just fine.

And if you spin a custom kernel, you can already wreck the load
balancer.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 03/11] sched: Extend scheduler's asym packing
  2016-08-25 13:45         ` Peter Zijlstra
@ 2016-08-26 10:39           ` Morten Rasmussen
  2016-08-26 12:42             ` Peter Zijlstra
  0 siblings, 1 reply; 33+ messages in thread
From: Morten Rasmussen @ 2016-08-26 10:39 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Srinivas Pandruvada, mingo, tglx, hpa, rjw, x86, bp,
	sudeep.holla, ak, linux-acpi, linux-pm, alexey.klimov,
	viresh.kumar, akpm, linux-kernel, lenb, tim.c.chen,
	paul.gortmaker, jpoimboe, mcgrof, jgross, robert.moore, dvyukov,
	jeyu

On Thu, Aug 25, 2016 at 03:45:03PM +0200, Peter Zijlstra wrote:
> On Thu, Aug 25, 2016 at 02:18:37PM +0100, Morten Rasmussen wrote:
> 
> > But why not just pass the customized list into the scheduler? Seems
> > simpler?
> 
> Mostly because I didn't want to regress Power I suppose. The ITMT stuff
> needs an extra load, whereas the Power stuff can use the CPU number we
> already have.

The customized list wouldn't have to be mandatory. You could easily
create a default list that would match current behaviour for Power.

To pass in a custom list of priorities you could either extend struct
sched_domain_topology_level to have another function pointer that
returns the cpu priority, or introduce an arch_cpu_priority() function.
Either of them could be used in the sched_domain hierarchy to set the
sched_group priority cpu, and if you add an rq->cpu_priority, the
asymmetric packing comparison would be a simple comparison between the
rq->cpu_priority of the two cpus in question.
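
That is, something like this (sketch, assuming a new rq field):

/* in struct rq */
	int			cpu_priority;

static inline bool sched_asym_prefer(int a, int b)
{
	return cpu_rq(a)->cpu_priority > cpu_rq(b)->cpu_priority;
}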

What is the 'extra load' needed for ITMT? Isn't it just a priority list,
or does the absolute priority value have a meaning? I only saw it used
for less_than comparison, maybe I missed it.

If you need to express the difference in compute capability, why not use
capacity?

> Also, since we need an interface to pass in this custom list, I don't
> see the distinction, you can do the same manipulation by constantly
> updating the prio list.

Sure, but the overhead of rebuilding the sched_domain hierarchy is huge
compared to just tweaking the result of the less_than operator that gets
called from the scheduler frequently. However, updating
group_priority_cpu() would require a rebuild too in this patch set.

> > But none of this stuff should be EXPORT'ed, so it's only available to the
> core kernel, which greatly limits the potential for abuse. We can see
> arch code just fine.

I don't see why it can't be wired up to be controlled by entities
outside arch code, e.g. cpufreq or the thermal framework, or even code
outside the kernel (firmware).

> And if you spin a custom kernel, you can already wreck the load
> balancer.

You can wreck any software where you have the source code and a compiler
:)

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 03/11] sched: Extend scheduler's asym packing
  2016-08-26 10:39           ` Morten Rasmussen
@ 2016-08-26 12:42             ` Peter Zijlstra
  2016-08-26 17:25               ` Tim Chen
  0 siblings, 1 reply; 33+ messages in thread
From: Peter Zijlstra @ 2016-08-26 12:42 UTC (permalink / raw)
  To: Morten Rasmussen
  Cc: Srinivas Pandruvada, mingo, tglx, hpa, rjw, x86, bp,
	sudeep.holla, ak, linux-acpi, linux-pm, alexey.klimov,
	viresh.kumar, akpm, linux-kernel, lenb, tim.c.chen,
	paul.gortmaker, jpoimboe, mcgrof, jgross, robert.moore, dvyukov,
	jeyu

On Fri, Aug 26, 2016 at 11:39:46AM +0100, Morten Rasmussen wrote:
> On Thu, Aug 25, 2016 at 03:45:03PM +0200, Peter Zijlstra wrote:
> > On Thu, Aug 25, 2016 at 02:18:37PM +0100, Morten Rasmussen wrote:
> > 
> > > But why not just pass the customized list into the scheduler? Seems
> > > simpler?
> > 
> > Mostly because I didn't want to regress Power I suppose. The ITMT stuff
> > needs an extra load, whereas the Power stuff can use the CPU number we
> > already have.
> 
> The customized list wouldn't have to be mandatory. You could easily
> create a default list that would match current behaviour for Power.

Sure, but then you have the extra load.. probably not an issue but
still.

> What is the 'extra load' needed for ITMT? Isn't it just a priority list,
> or does the absolute priority value have a meaning? I only saw it used
> for less_than comparison, maybe I missed it.

LOAD as in a memop, we need to go fetch the priority from wherever we
put it in memory, be it rq->cpu_priority or a percpu variable on its
own.

> If you need to express the difference in compute capability, why not use
> capacity?

Doesn't work; capacity is actually equal for these things.

Think of one core having more turbo range when thermals allow it. But
the moment you run multiple cores the thermal head-room dissipates and
they all end up running at more or less the same (lower) frequency.

All of this asym/prio stuff only matters when cores (Power) / packages
(Intel) are mostly idle.

On Power SMT0 can go faster than SMT7 when all other siblings are idle,
with ITMT some cores can go faster than others when the rest are idle.

I suppose we _could_ model it with a dynamic capacity value, but last
time I looked at that it made my head hurt.

> > Also, since we need an interface to pass in this custom list, I don't
> > see the distinction, you can do the same manipulation by constantly
> > updating the prio list.
> 
> Sure, but the overhead of rebuilding the sched_domain hierarchy is huge
> compared to just tweaking the result of the less_than operator that get
> called from the scheduler frequently. However, updating
> group_priority_cpu() would require a rebuild too in this patch set.

You don't actually need to rebuild the domains to change the priorities.
We only need to rebuild the domains when we add/remove SD_ASYM_PACKING.

Yes, the sched_group::asym_prefer_cpu thing is tedious, but you could
actually update that without a rebuild if one wanted.

Note that there's actually a semi useful use case for dynamically
updating the cpu priorities: core hopping.

  https://www.researchgate.net/publication/279915789_Evaluation_of_Core_Hopping_on_POWER7

Again, that's something only relevant to mostly idle packages.

> > But none of this stuff should be EXPORT'ed, so it's only available to the
> > core kernel, which greatly limits the potential for abuse. We can see
> > arch code just fine.
> 
> I don't see why it can't be wired up to be controlled by entities
> outside arch code, e.g. cpufreq or the thermal framework, or even code
> outside the kernel (firmware).

I suppose an arch could do that, but then we'd see that, wouldn't we?

The firmware and kernel would need to co-ordinate where the prio value
lives, which is not something trivially done. And even if the value
lives in rq->cpu_priority, it _could_ do that.


In any case, I don't feel too strongly about this, if you want to stick
the value in rq->cpu_priority and have Power use that we can do that I
suppose.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 03/11] sched: Extend scheduler's asym packing
  2016-08-26 12:42             ` Peter Zijlstra
@ 2016-08-26 17:25               ` Tim Chen
  2016-08-26 23:14                 ` Tim Chen
  0 siblings, 1 reply; 33+ messages in thread
From: Tim Chen @ 2016-08-26 17:25 UTC (permalink / raw)
  To: Peter Zijlstra, Morten Rasmussen
  Cc: Srinivas Pandruvada, mingo, tglx, hpa, rjw, x86, bp,
	sudeep.holla, ak, linux-acpi, linux-pm, alexey.klimov,
	viresh.kumar, akpm, linux-kernel, lenb, paul.gortmaker, jpoimboe,
	mcgrof, jgross, robert.moore, dvyukov, jeyu

On Fri, 2016-08-26 at 14:42 +0200, Peter Zijlstra wrote:
> On Fri, Aug 26, 2016 at 11:39:46AM +0100, Morten Rasmussen wrote:
> > 
> > On Thu, Aug 25, 2016 at 03:45:03PM +0200, Peter Zijlstra wrote:
> > > 
> > > On Thu, Aug 25, 2016 at 02:18:37PM +0100, Morten Rasmussen wrote:
> > > 
> > > > 
> > > > But why not just pass the customized list into the scheduler? Seems
> > > > simpler?
> > > Mostly because I didn't want to regress Power I suppose. The ITMT stuff
> > > needs an extra load, whereas the Power stuff can use the CPU number we
> > > already have.
> > The customized list wouldn't have to be mandatory. You could easily
> > create a default list that would match current behaviour for Power.
> Sure, but then you have the extra load.. probably not an issue but
> still.
> 
> > 
> > What is the 'extra load' needed for ITMT? Isn't it just a priority list,
> > or does the absolute priority value have a meaning? I only saw it used
> > for less_than comparison, maybe I missed it.
> LOAD as in a memop, we need to go fetch the priority from wherever we
> put it in memory, be it rq->cpu_priority or a percpu variable on its
> own.
> 
> > 
> > If you need to express the difference in compute capability, why not use
> > capacity?
> Doesn't work; capacity is actually equal for these things.
> 
> Think of one core having more turbo range when thermals allow it. But
> the moment you run multiple cores the thermal head-room dissipates and
> they all end up running at more or less the same (lower) frequency.
> 
> All of this asym/prio stuff only matters when cores (Power) / packages
> (Intel) are mostly idle.
> 
> On Power SMT0 can go faster than SMT7 when all other siblings are idle,
> with ITMT some cores can go faster than others when the rest are idle.
> 
> I suppose we _could_ model it with a dynamic capacity value, but last
> time I looked at that it made my head hurt.
> 
> > 
> > > 
> > > Also, since we need an interface to pass in this custom list, I don't
> > > see the distinction, you can do the same manipulation by constantly
> > > updating the prio list.
> > Sure, but the overhead of rebuilding the sched_domain hierarchy is huge
> > compared to just tweaking the result of the less_than operator that gets
> > called from the scheduler frequently. However, updating
> > group_priority_cpu() would require a rebuild too in this patch set.
> You don't actually need to rebuild the domains to change the priorities.
> We only need to rebuild the domains when we add/remove SD_ASYM_PACKING.
> 
> Yes, the sched_group::asym_prefer_cpu thing is tedious, but you could
> actually update that without a rebuild if one wanted.
> 
> Note that there's actually a semi useful use case for dynamically
> updating the cpu priorities: core hopping.
> 
>   https://www.researchgate.net/publication/279915789_Evaluation_of_Core_Hopping_on_POWER7
> 
> Again, that's something only relevant to mostly idle packages.
> 
> > 
> > > 
> > > But none of this stuff should be EXPORT'ed, so it's only available to the
> > > core kernel, which greatly limits the potential for abuse. We can see
> > > arch code just fine.
> > I don't see why it can't be wired up to be controlled by entities
> > outside arch code, e.g. cpufreq or the thermal framework, or even code
> > outside the kernel (firmware).
> I suppose an arch could do that, but then we'd see that, wouldn't we?
> 
> The firmware and kernel would need to co-ordinate where the prio value
> lives, which is not something trivially done. And even if the value
> lives in rq->cpu_priority, it _could_ do that.
> 
> 
> In any case, I don't feel too strongly about this, if you want to stick
> the value in rq->cpu_priority and have Power use that we can do that I
> suppose.

This will mean increasing the rq structure for powerpc.

I guess we'd need some compile flag to decide if this cpu_priority field
should be in rq.  Something like
CONFIG_SCHED_ITMT || ((CONFIG_PPC64 || CONFIG_PPC32) && CONFIG_SCHED_SMT)?

And I will need code for powerpc to instantiate rq->cpu_priority at boot.

This gets somewhat ugly.

I prefer the other alternative Morten suggested of
having an arch_cpu_asym_priority() function.  It is cleaner,
without increasing the size of the rq structure.

I can define the default so that a lower cpu number means higher priority:

int __weak arch_cpu_asym_priority(int cpu)
{
        return -cpu;
}

and then define it appropriately for x86 when ITMT is used.
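
The x86 version would then be something like (sketch, reusing the per-cpu
priority from this series):

int arch_cpu_asym_priority(int cpu)
{
	return per_cpu(sched_core_priority, cpu);
}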

Tim

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 03/11] sched: Extend scheduler's asym packing
  2016-08-26 17:25               ` Tim Chen
@ 2016-08-26 23:14                 ` Tim Chen
  0 siblings, 0 replies; 33+ messages in thread
From: Tim Chen @ 2016-08-26 23:14 UTC (permalink / raw)
  To: Peter Zijlstra, Morten Rasmussen
  Cc: Srinivas Pandruvada, mingo, tglx, hpa, rjw, x86, bp,
	sudeep.holla, ak, linux-acpi, linux-pm, alexey.klimov,
	viresh.kumar, akpm, linux-kernel, lenb, paul.gortmaker, jpoimboe,
	mcgrof, jgross, robert.moore, dvyukov, jeyu

On Fri, Aug 26, 2016 at 10:25:38AM -0700, Tim Chen wrote:
> On Fri, 2016-08-26 at 14:42 +0200, Peter Zijlstra wrote:
> > On Fri, Aug 26, 2016 at 11:39:46AM +0100, Morten Rasmussen wrote:
> > > 
> > > On Thu, Aug 25, 2016 at 03:45:03PM +0200, Peter Zijlstra wrote:
> > > > 
> > > > On Thu, Aug 25, 2016 at 02:18:37PM +0100, Morten Rasmussen wrote:
> > > > 
> > > > > 
> > > > > But why not just pass the customized list into the scheduler? Seems
> > > > > simpler?
> > > > Mostly because I didn't want to regress Power I suppose. The ITMT stuff
> > > > needs an extra load, whereas the Power stuff can use the CPU number we
> > > > already have.
> > > The customized list wouldn't have to be mandatory. You could easily
> > > create a default list that would match current behaviour for Power.
> > Sure, but then you have the extra load.. probably not an issue but
> > still.
> > 
> > > 
> > > What is the 'extra load' needed for ITMT? Isn't it just a priority list,
> > > or does the absolute priority value have a meaning? I only saw it used
> > > for less_than comparison, maybe I missed it.
> > LOAD as in a memop, we need to go fetch the priority from wherever we
> > put it in memory, be it rq->cpu_priority or a percpu variable on its
> > own.
> > 
> > > 
> > > If you need to express the difference in compute capability, why not use
> > > capacity?
> > Doesn't work, capacity is actually equal for these things.
> > 
> > Think of one core having more turbo range when thermals allow it. But
> > the moment you run multiple cores the thermal head-room dissipates and
> > they all end up running at more or less the same (lower) frequency.
> > 
> > All of this asym/prio stuff only matters when cores (Power) / packages
> > (Intel) are mostly idle.
> > 
> > On Power SMT0 can go faster than SMT7 when all other siblings are idle,
> > with ITMT some core can go faster than other when the rest is idle.
> > 
> > I suppose we _could_ model it with a dynamic capacity value, but last
> > time I looked at that it made my head hurt.
> > 
> > > 
> > > > 
> > > > Also, since we need an interface to pass in this custom list, I don't
> > > > see the distinction, you can do the same manipulation by constantly
> > > > updating the prio list.
> > > Sure, but the overhead of rebuilding the sched_domain hierarchy is huge
> > > compared to just tweaking the result of the less_than operator that gets
> > > called from the scheduler frequently. However, updating
> > > group_priority_cpu() would require a rebuild too in this patch set.
> > You don't actually need to rebuild the domains to change the priorities.
> > We only need to rebuild the domains when we add/remove SD_ASYM_PACKING.
> > 
> > Yes, the sched_group::asym_prefer_cpu thing is tedious, but you could
> > actually update that without a rebuild if one wanted.
> > 
> > Note that there's actually a semi useful use case for dynamically
> > updating the cpu priorities: core hopping.
> > 
> >   https://www.researchgate.net/publication/279915789_Evaluation_of_Core_Hopping_on_POWER7
> > 
> > Again, that's something only relevant to mostly idle packages.
> > 
> > > 
> > > > 
> > > > But none of this stuff should be EXPORT'ed, so it's only available to the
> > > > core kernel, which greatly limits the potential for abuse. We can see
> > > > arch code just fine.
> > > I don't see why it can't be wired up to be controlled by entities
> > > outside arch code, e.g. cpufreq or the thermal framework, or even code
> > > outside the kernel (firmware).
> > I suppose an arch could do that, but then we'd see that, wouldn't we?
> > 
> > The firmware and kernel would need to co-ordinate where the prio value
> > lives, which is not something trivially done. And even if the value
> > lives in rq->cpu_priority, it _could_ do that.
> > 
> > 
> > In any case, I don't feel too strongly about this, if you want to stick
> > the value in rq->cpu_priority and have Power use that we can do that I
> > suppose.
> 
> This will mean increasing the size of the rq structure for PowerPC.
> 
> I guess we'd need some compile flag to decide whether this cpu_priority
> field should be in rq. Something like
> CONFIG_SCHED_ITMT || ((CONFIG_PPC64 || CONFIG_PPC32) && CONFIG_SCHED_SMT)?
> 
> And I will need code for PowerPC to initialize rq->cpu_priority at boot.
> 
> This gets somewhat ugly.
> 
> I prefer the other alternative Morten suggested: an
> arch_cpu_asym_priority() function. It is cleaner and does not
> increase the size of the rq structure.
> 
> I can define a default where a lower-numbered cpu has higher priority:
> 
> int __weak arch_cpu_asym_priority(int cpu)
> {
>         return -cpu;
> }
> 
> and then define it appropriately for x86 when ITMT is used.
> 
> Tim
> 

Morten & Peter,

If the patch is updated as below to use arch_asym_cpu_priority,
will that be okay with you?

Tim

---cut---
Subject: sched: Extend scheduler's asym packing

We generalize the scheduler's asym packing to provide an ordering
of the cpus beyond just the cpu number.  This allows the use of the
ASYM_PACKING scheduler machinery to move load to the preferred CPU
in a sched domain. The preference is defined by the cpu priority
returned by arch_asym_cpu_priority(cpu).

We also record the most preferred cpu in a sched group when we
build the group's capacity, for fast lookup of the preferred cpu
during load balancing.

v2:
1. Use arch_asym_cpu_priority() to provide the cpu priority
value used by the scheduler for asym packing.

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
---
 include/linux/sched.h |  2 ++
 kernel/sched/core.c   | 18 ++++++++++++++++++
 kernel/sched/fair.c   | 35 ++++++++++++++++++++++++-----------
 kernel/sched/sched.h  | 12 ++++++++++++
 4 files changed, 56 insertions(+), 11 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 62c68e5..aeea288 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1052,6 +1052,8 @@ static inline int cpu_numa_flags(void)
 }
 #endif
 
+int arch_asym_cpu_priority(int cpu);
+
 struct sched_domain_attr {
 	int relax_domain_level;
 };
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index e86c4a5..08135ca 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6237,7 +6237,25 @@ static void init_sched_groups_capacity(int cpu, struct sched_domain *sd)
 	WARN_ON(!sg);
 
 	do {
+		int cpu, max_cpu = -1, prev_cpu = -1;
+
 		sg->group_weight = cpumask_weight(sched_group_cpus(sg));
+
+		if (!(sd->flags & SD_ASYM_PACKING))
+			goto next;
+
+		for_each_cpu(cpu, sched_group_cpus(sg)) {
+			if (prev_cpu < 0) {
+				prev_cpu = cpu;
+				max_cpu = cpu;
+			} else {
+				if (sched_asym_prefer(cpu, max_cpu))
+					max_cpu = cpu;
+			}
+		}
+		sg->asym_prefer_cpu = max_cpu;
+
+next:
 		sg = sg->next;
 	} while (sg != sd->groups);
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 039de34..4976b99 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -100,6 +100,16 @@ const_debug unsigned int sysctl_sched_migration_cost = 500000UL;
  */
 unsigned int __read_mostly sysctl_sched_shares_window = 10000000UL;
 
+#ifdef CONFIG_SMP
+/*
+ * For asym packing, by default the lower numbered cpu has higher priority.
+ */
+int __weak arch_asym_cpu_priority(int cpu)
+{
+	return -cpu;
+}
+#endif
+
 #ifdef CONFIG_CFS_BANDWIDTH
 /*
  * Amount of runtime to allocate from global (tg) to local (per-cfs_rq) pool
@@ -6862,16 +6872,18 @@ static bool update_sd_pick_busiest(struct lb_env *env,
 	if (env->idle == CPU_NOT_IDLE)
 		return true;
 	/*
-	 * ASYM_PACKING needs to move all the work to the lowest
-	 * numbered CPUs in the group, therefore mark all groups
-	 * higher than ourself as busy.
+	 * ASYM_PACKING needs to move all the work to the highest
+	 * priority CPUs in the group, therefore mark all groups
+	 * of lower priority than ourself as busy.
 	 */
-	if (sgs->sum_nr_running && env->dst_cpu < group_first_cpu(sg)) {
+	if (sgs->sum_nr_running &&
+	    sched_asym_prefer(env->dst_cpu, group_priority_cpu(sg))) {
 		if (!sds->busiest)
 			return true;
 
-		/* Prefer to move from highest possible cpu's work */
-		if (group_first_cpu(sds->busiest) < group_first_cpu(sg))
+		/* Prefer to move from lowest priority cpu's work */
+		if (sched_asym_prefer(group_priority_cpu(sds->busiest),
+				      group_priority_cpu(sg)))
 			return true;
 	}
 
@@ -7023,8 +7035,8 @@ static int check_asym_packing(struct lb_env *env, struct sd_lb_stats *sds)
 	if (!sds->busiest)
 		return 0;
 
-	busiest_cpu = group_first_cpu(sds->busiest);
-	if (env->dst_cpu > busiest_cpu)
+	busiest_cpu = group_priority_cpu(sds->busiest);
+	if (sched_asym_prefer(busiest_cpu, env->dst_cpu))
 		return 0;
 
 	env->imbalance = DIV_ROUND_CLOSEST(
@@ -7365,10 +7377,11 @@ static int need_active_balance(struct lb_env *env)
 
 		/*
 		 * ASYM_PACKING needs to force migrate tasks from busy but
-		 * higher numbered CPUs in order to pack all tasks in the
-		 * lowest numbered CPUs.
+		 * lower priority CPUs in order to pack all tasks in the
+		 * highest priority CPUs.
 		 */
-		if ((sd->flags & SD_ASYM_PACKING) && env->src_cpu > env->dst_cpu)
+		if ((sd->flags & SD_ASYM_PACKING) &&
+		    sched_asym_prefer(env->dst_cpu, env->src_cpu))
 			return 1;
 	}
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index c64fc51..cc2d35f 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -532,6 +532,17 @@ struct dl_rq {
 
 #ifdef CONFIG_SMP
 
+static inline bool sched_asym_prefer(int a, int b)
+{
+	return arch_asym_cpu_priority(a) > arch_asym_cpu_priority(b);
+}
+
+/*
+ * Return the cpu of highest priority in the group; with the default
+ * arch_asym_cpu_priority() this is the lowest numbered cpu.
+ */
+#define group_priority_cpu(group) group->asym_prefer_cpu
+
 /*
  * We add the notion of a root-domain which will be used to define per-domain
  * variables. Each exclusive cpuset essentially defines an island domain by
@@ -884,6 +895,7 @@ struct sched_group {
 
 	unsigned int group_weight;
 	struct sched_group_capacity *sgc;
+	int asym_prefer_cpu;		/* cpu of highest priority in group */
 
 	/*
 	 * The CPUs this group covers.
-- 
2.5.5
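
To make the new semantics concrete: with the default
arch_asym_cpu_priority() above, sched_asym_prefer(0, 2) compares
-0 > -2, which is true, so cpu 0 is still preferred over cpu 2 and
the old "pack towards the lowest numbered cpu" behaviour is
preserved; an arch override changes only the ordering, not the
packing logic.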

^ permalink raw reply related	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2016-08-26 23:15 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-08-18 22:36 [PATCH 00/11] Support Intel® Turbo Boost Max Technology 3.0 Srinivas Pandruvada
2016-08-18 22:36 ` [PATCH 01/11] sched, cpuset: Add regenerate_sched_domains function to rebuild all sched domains Srinivas Pandruvada
2016-08-22 13:52   ` Morten Rasmussen
2016-08-22 19:51     ` Tim Chen
2016-08-18 22:36 ` [PATCH 02/11] sched, x86: Add SD_ASYM_PACKING flags to x86 cpu topology for cpus supporting Intel Turbo Boost Max Technology Srinivas Pandruvada
2016-08-18 22:36 ` [PATCH 03/11] sched: Extend scheduler's asym packing Srinivas Pandruvada
2016-08-25 11:22   ` Morten Rasmussen
2016-08-25 11:45     ` Peter Zijlstra
2016-08-25 13:18       ` Morten Rasmussen
2016-08-25 13:45         ` Peter Zijlstra
2016-08-26 10:39           ` Morten Rasmussen
2016-08-26 12:42             ` Peter Zijlstra
2016-08-26 17:25               ` Tim Chen
2016-08-26 23:14                 ` Tim Chen
2016-08-18 22:36 ` [PATCH 04/11] sched,x86: Enable Turbo Boost Max Technology Srinivas Pandruvada
2016-08-22  9:01   ` kbuild test robot
2016-08-22 19:04     ` Tim Chen
2016-08-24 10:18   ` Ingo Molnar
2016-08-24 17:50     ` Tim Chen
2016-08-24 18:08       ` Ingo Molnar
2016-08-24 18:22         ` Peter Zijlstra
2016-08-18 22:36 ` [PATCH 05/11] acpi: cppc: Allow build with ACPI_CPU_FREQ_PSS config Srinivas Pandruvada
2016-08-20  0:46   ` Rafael J. Wysocki
2016-08-18 22:36 ` [PATCH 06/11] acpi: cpcc: Add integer read support Srinivas Pandruvada
2016-08-18 22:36 ` [PATCH 07/11] acpi: cppc: Add support for function fixed hardware address Srinivas Pandruvada
2016-08-20  0:49   ` Rafael J. Wysocki
2016-08-18 22:36 ` [PATCH 08/11] acpi: cppc: Add prefix cppc to cpudata structure name Srinivas Pandruvada
2016-08-18 22:36 ` [PATCH 09/11] acpi: bus: Enable HWP CPPC objects Srinivas Pandruvada
2016-08-20  0:49   ` Rafael J. Wysocki
2016-08-18 22:36 ` [PATCH 10/11] acpi: bus: Set _OSC for diverse core support Srinivas Pandruvada
2016-08-20  0:51   ` Rafael J. Wysocki
2016-08-18 22:36 ` [PATCH 11/11] cpufreq: intel_pstate: Use CPPC to get max performance Srinivas Pandruvada
2016-08-22 11:59   ` kbuild test robot
