* [RFC][PATCH 0/7] cpufreq / sched: cpufreq_update_util() flags and iowait boosting
@ 2016-07-31 23:31 Rafael J. Wysocki
  2016-07-31 23:34 ` [RFC][PATCH 1/7] cpufreq / sched: Make schedutil access utilization data directly Rafael J. Wysocki
                   ` (7 more replies)
  0 siblings, 8 replies; 51+ messages in thread
From: Rafael J. Wysocki @ 2016-07-31 23:31 UTC (permalink / raw)
  To: Linux PM list
  Cc: Peter Zijlstra, Srinivas Pandruvada, Viresh Kumar,
	Linux Kernel Mailing List, Steve Muckle, Juri Lelli, Ingo Molnar

Hi,

Admittedly, this hasn't been tested yet, so no promises and you have been
warned.  It builds, though (on x86-64 at least).

At this point I'm looking for general feedback mostly: does the direction make
sense or is there any reason why it can't work (that I'm not seeing), is it
acceptable and if not, how can it be improved?

Of course, if you don't like the details, please let me know too. :-)

This is based on Peter's suggestions and Srinivas's research.

The ultimate goal is to improve performance, in the schedutil governor, for
tasks that have been waiting on I/O, and to provide a better default P-state
selection algorithm for intel_pstate (which also involves taking iowait into
account).  The steps to get there are the following (a rough sketch of the
resulting interface is shown after the list):

[1] Drop the util and max arguments from cpufreq_update_util() and the ->func()
    callback in struct update_util_data and make the schedutil governor access
    the scheduler's utilization data directly (this one is originally from Peter,
    I did my best to avoid breaking it).
[2] Drop cpufreq_trigger_update() as it is the same as cpufreq_update_util()
    after [1].
[3] Pass rq to cpufreq_update_util() (instead of the time) and make it do the
    smp_processor_id() check.
[4] Add a flags argument to cpufreq_update_util() and the ->func() callback in
    struct update_util_data, update their users accordingly and use the flags
    to clean up the handling of util updates from the RT sched class a bit.
[5] Make enqueue_task_fair() pass a new "IO" flag to cpufreq_update_util()
    if p->in_iowait is set.
[6] Modify the schedutil governor to use the new "IO" flag for boosting CPU
    frequency temporarily (in order to improve performance for tasks that
    have been waiting on I/O).
[7] Add a new P-state selection algorithm, based on "busy fraction" computation
    and "IO" boosting, and use it by default for Core processors.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [RFC][PATCH 1/7] cpufreq / sched: Make schedutil access utilization data directly
  2016-07-31 23:31 [RFC][PATCH 0/7] cpufreq / sched: cpufreq_update_util() flags and iowait boosting Rafael J. Wysocki
@ 2016-07-31 23:34 ` Rafael J. Wysocki
  2016-08-01 19:28   ` Steve Muckle
  2016-07-31 23:35 ` [RFC][PATCH 2/7] cpufreq / sched: Drop cpufreq_trigger_update() Rafael J. Wysocki
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 51+ messages in thread
From: Rafael J. Wysocki @ 2016-07-31 23:34 UTC (permalink / raw)
  To: Linux PM list
  Cc: Peter Zijlstra, Srinivas Pandruvada, Viresh Kumar,
	Linux Kernel Mailing List, Steve Muckle, Juri Lelli, Ingo Molnar

From: Peter Zijlstra <peterz@infradead.org>

Since the schedutil governor is part of the scheduler proper, it can
access scheduler data directly.

This allows us to remove the util and max arguments of
cpufreq_update_util(), since only the schedutil governor will use
those, which leads to some text reduction:

  43595    1226      24   44845    af2d defconfig-build/kernel/sched/fair.o.pre
  42907    1226      24   44157    ac7d defconfig-build/kernel/sched/fair.o.post

Of course, we get more text in schedutil in return, but we can
benefit from not being tied to those two parameters by doing a very
coarse deadline reservation.
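
As a very rough sketch of what that reservation-based comparison amounts to
(using the names from the sugov_get_util() hunk below), the DL side is the
reserved bandwidth rather than a measured utilization:

	dl_util / dl_max   = dl_bw->total_bw / (dl_bw_cpus(cpu) << 20)
	cfs_util / cfs_max = min(cfs.avg.util_avg, cpu_capacity_orig) / cpu_capacity_orig

and whichever of the two util/max pairs implies the higher ratio is what
gets passed to get_next_freq().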

[ rjw: Subject/changelog + rebase, minor updates ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/cpufreq/cpufreq_governor.c |    3 --
 drivers/cpufreq/intel_pstate.c     |    3 --
 include/linux/sched.h              |    6 +----
 kernel/sched/core.c                |    4 +--
 kernel/sched/cpufreq.c             |    3 --
 kernel/sched/cpufreq_schedutil.c   |   40 +++++++++++++++++++++++++++++++++----
 kernel/sched/fair.c                |   11 +++-------
 kernel/sched/sched.h               |   17 +++++++--------
 8 files changed, 55 insertions(+), 32 deletions(-)

Index: linux-pm/drivers/cpufreq/cpufreq_governor.c
===================================================================
--- linux-pm.orig/drivers/cpufreq/cpufreq_governor.c
+++ linux-pm/drivers/cpufreq/cpufreq_governor.c
@@ -259,8 +259,7 @@ static void dbs_irq_work(struct irq_work
 	schedule_work_on(smp_processor_id(), &policy_dbs->work);
 }
 
-static void dbs_update_util_handler(struct update_util_data *data, u64 time,
-				    unsigned long util, unsigned long max)
+static void dbs_update_util_handler(struct update_util_data *data, u64 time)
 {
 	struct cpu_dbs_info *cdbs = container_of(data, struct cpu_dbs_info, update_util);
 	struct policy_dbs_info *policy_dbs = cdbs->policy_dbs;
Index: linux-pm/drivers/cpufreq/intel_pstate.c
===================================================================
--- linux-pm.orig/drivers/cpufreq/intel_pstate.c
+++ linux-pm/drivers/cpufreq/intel_pstate.c
@@ -1328,8 +1328,7 @@ static inline void intel_pstate_adjust_b
 		get_avg_frequency(cpu));
 }
 
-static void intel_pstate_update_util(struct update_util_data *data, u64 time,
-				     unsigned long util, unsigned long max)
+static void intel_pstate_update_util(struct update_util_data *data, u64 time)
 {
 	struct cpudata *cpu = container_of(data, struct cpudata, update_util);
 	u64 delta_ns = time - cpu->sample.time;
Index: linux-pm/include/linux/sched.h
===================================================================
--- linux-pm.orig/include/linux/sched.h
+++ linux-pm/include/linux/sched.h
@@ -3377,13 +3377,11 @@ static inline unsigned long rlimit_max(u
 
 #ifdef CONFIG_CPU_FREQ
 struct update_util_data {
-	void (*func)(struct update_util_data *data,
-		     u64 time, unsigned long util, unsigned long max);
+       void (*func)(struct update_util_data *data, u64 time);
 };
 
 void cpufreq_add_update_util_hook(int cpu, struct update_util_data *data,
-			void (*func)(struct update_util_data *data, u64 time,
-				     unsigned long util, unsigned long max));
+                       void (*func)(struct update_util_data *data, u64 time));
 void cpufreq_remove_update_util_hook(int cpu);
 #endif /* CONFIG_CPU_FREQ */
 
Index: linux-pm/kernel/sched/cpufreq.c
===================================================================
--- linux-pm.orig/kernel/sched/cpufreq.c
+++ linux-pm/kernel/sched/cpufreq.c
@@ -32,8 +32,7 @@ DEFINE_PER_CPU(struct update_util_data *
  * called or it will WARN() and return with no effect.
  */
 void cpufreq_add_update_util_hook(int cpu, struct update_util_data *data,
-			void (*func)(struct update_util_data *data, u64 time,
-				     unsigned long util, unsigned long max))
+			void (*func)(struct update_util_data *data, u64 time))
 {
 	if (WARN_ON(!data || !func))
 		return;
Index: linux-pm/kernel/sched/cpufreq_schedutil.c
===================================================================
--- linux-pm.orig/kernel/sched/cpufreq_schedutil.c
+++ linux-pm/kernel/sched/cpufreq_schedutil.c
@@ -144,17 +144,47 @@ static unsigned int get_next_freq(struct
 	return cpufreq_driver_resolve_freq(policy, freq);
 }
 
-static void sugov_update_single(struct update_util_data *hook, u64 time,
-				unsigned long util, unsigned long max)
+static void sugov_get_util(unsigned long *util, unsigned long *max)
+{
+	unsigned long dl_util, dl_max;
+	unsigned long cfs_util, cfs_max;
+	int cpu = smp_processor_id();
+	struct dl_bw *dl_bw = dl_bw_of(cpu);
+	struct rq *rq = this_rq();
+
+	if (rt_prio(current->prio)) {
+		*util = ULONG_MAX;
+		return;
+	}
+
+	dl_max = dl_bw_cpus(cpu) << 20;
+	dl_util = dl_bw->total_bw;
+
+	cfs_max = rq->cpu_capacity_orig;
+	cfs_util = min(rq->cfs.avg.util_avg, cfs_max);
+
+	if (cfs_util * dl_max > dl_util * cfs_max) {
+		*util = cfs_util;
+		*max  = cfs_max;
+	} else {
+		*util = dl_util;
+		*max  = dl_max;
+	}
+}
+
+static void sugov_update_single(struct update_util_data *hook, u64 time)
 {
 	struct sugov_cpu *sg_cpu = container_of(hook, struct sugov_cpu, update_util);
 	struct sugov_policy *sg_policy = sg_cpu->sg_policy;
 	struct cpufreq_policy *policy = sg_policy->policy;
+	unsigned long util, max;
 	unsigned int next_f;
 
 	if (!sugov_should_update_freq(sg_policy, time))
 		return;
 
+	sugov_get_util(&util, &max);
+
 	next_f = util == ULONG_MAX ? policy->cpuinfo.max_freq :
 			get_next_freq(sg_cpu, util, max);
 	sugov_update_commit(sg_policy, time, next_f);
@@ -206,13 +236,15 @@ static unsigned int sugov_next_freq_shar
 	return get_next_freq(sg_cpu, util, max);
 }
 
-static void sugov_update_shared(struct update_util_data *hook, u64 time,
-				unsigned long util, unsigned long max)
+static void sugov_update_shared(struct update_util_data *hook, u64 time)
 {
 	struct sugov_cpu *sg_cpu = container_of(hook, struct sugov_cpu, update_util);
 	struct sugov_policy *sg_policy = sg_cpu->sg_policy;
+	unsigned long util, max;
 	unsigned int next_f;
 
+	sugov_get_util(&util, &max);
+
 	raw_spin_lock(&sg_policy->update_lock);
 
 	sg_cpu->util = util;
Index: linux-pm/kernel/sched/fair.c
===================================================================
--- linux-pm.orig/kernel/sched/fair.c
+++ linux-pm/kernel/sched/fair.c
@@ -2870,11 +2870,8 @@ static inline u64 cfs_rq_clock_task(stru
 
 static inline void cfs_rq_util_change(struct cfs_rq *cfs_rq)
 {
-	struct rq *rq = rq_of(cfs_rq);
-	int cpu = cpu_of(rq);
-
-	if (cpu == smp_processor_id() && &rq->cfs == cfs_rq) {
-		unsigned long max = rq->cpu_capacity_orig;
+	if (&this_rq()->cfs == cfs_rq) {
+		struct rq *rq = rq_of(cfs_rq);
 
 		/*
 		 * There are a few boundary cases this might miss but it should
@@ -2892,8 +2889,8 @@ static inline void cfs_rq_util_change(st
 		 *
 		 * See cpu_util().
 		 */
-		cpufreq_update_util(rq_clock(rq),
-				    min(cfs_rq->avg.util_avg, max), max);
+		if (cpu_of(rq) == smp_processor_id())
+			cpufreq_update_util(rq_clock(rq));
 	}
 }
 
Index: linux-pm/kernel/sched/sched.h
===================================================================
--- linux-pm.orig/kernel/sched/sched.h
+++ linux-pm/kernel/sched/sched.h
@@ -190,6 +190,7 @@ static inline int dl_bandwidth_enabled(v
 }
 
 extern struct dl_bw *dl_bw_of(int i);
+extern int dl_bw_cpus(int i);
 
 struct dl_bw {
 	raw_spinlock_t lock;
@@ -1760,21 +1761,19 @@ DECLARE_PER_CPU(struct update_util_data
 /**
  * cpufreq_update_util - Take a note about CPU utilization changes.
  * @time: Current time.
- * @util: Current utilization.
- * @max: Utilization ceiling.
  *
  * This function is called by the scheduler on every invocation of
  * update_load_avg() on the CPU whose utilization is being updated.
  *
  * It can only be called from RCU-sched read-side critical sections.
  */
-static inline void cpufreq_update_util(u64 time, unsigned long util, unsigned long max)
+static inline void cpufreq_update_util(u64 time)
 {
-       struct update_util_data *data;
+	struct update_util_data *data;
 
-       data = rcu_dereference_sched(*this_cpu_ptr(&cpufreq_update_util_data));
-       if (data)
-               data->func(data, time, util, max);
+	data = rcu_dereference_sched(*this_cpu_ptr(&cpufreq_update_util_data));
+	if (data)
+		data->func(data, time);
 }
 
 /**
@@ -1795,10 +1794,10 @@ static inline void cpufreq_update_util(u
  */
 static inline void cpufreq_trigger_update(u64 time)
 {
-	cpufreq_update_util(time, ULONG_MAX, 0);
+	cpufreq_update_util(time);
 }
 #else
-static inline void cpufreq_update_util(u64 time, unsigned long util, unsigned long max) {}
+static inline void cpufreq_update_util(u64 time) {}
 static inline void cpufreq_trigger_update(u64 time) {}
 #endif /* CONFIG_CPU_FREQ */
 
Index: linux-pm/kernel/sched/core.c
===================================================================
--- linux-pm.orig/kernel/sched/core.c
+++ linux-pm/kernel/sched/core.c
@@ -2438,7 +2438,7 @@ inline struct dl_bw *dl_bw_of(int i)
 	return &cpu_rq(i)->rd->dl_bw;
 }
 
-static inline int dl_bw_cpus(int i)
+int dl_bw_cpus(int i)
 {
 	struct root_domain *rd = cpu_rq(i)->rd;
 	int cpus = 0;
@@ -2456,7 +2456,7 @@ inline struct dl_bw *dl_bw_of(int i)
 	return &cpu_rq(i)->dl.dl_bw;
 }
 
-static inline int dl_bw_cpus(int i)
+int dl_bw_cpus(int i)
 {
 	return 1;
 }

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [RFC][PATCH 2/7] cpufreq / sched: Drop cpufreq_trigger_update()
  2016-07-31 23:31 [RFC][PATCH 0/7] cpufreq / sched: cpufreq_update_util() flags and iowait boosting Rafael J. Wysocki
  2016-07-31 23:34 ` [RFC][PATCH 1/7] cpufreq / sched: Make schedutil access utilization data directly Rafael J. Wysocki
@ 2016-07-31 23:35 ` Rafael J. Wysocki
  2016-07-31 23:36 ` [RFC][PATCH 3/7] cpufreq / sched: Check cpu_of(rq) in cpufreq_update_util() Rafael J. Wysocki
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 51+ messages in thread
From: Rafael J. Wysocki @ 2016-07-31 23:35 UTC (permalink / raw)
  To: Linux PM list
  Cc: Peter Zijlstra, Srinivas Pandruvada, Viresh Kumar,
	Linux Kernel Mailing List, Steve Muckle, Juri Lelli, Ingo Molnar

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

After dropping the util and max arguments of cpufreq_update_util(),
the cpufreq_trigger_update() wrapper around it is pointless, so drop
it and call cpufreq_update_util() directly instead.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 kernel/sched/deadline.c |    4 ++--
 kernel/sched/rt.c       |    4 ++--
 kernel/sched/sched.h    |   22 ++++++----------------
 3 files changed, 10 insertions(+), 20 deletions(-)

Index: linux-pm/kernel/sched/deadline.c
===================================================================
--- linux-pm.orig/kernel/sched/deadline.c
+++ linux-pm/kernel/sched/deadline.c
@@ -732,9 +732,9 @@ static void update_curr_dl(struct rq *rq
 		return;
 	}
 
-	/* kick cpufreq (see the comment in linux/cpufreq.h). */
+	/* kick cpufreq (see the comment in kernel/sched/sched.h). */
 	if (cpu_of(rq) == smp_processor_id())
-		cpufreq_trigger_update(rq_clock(rq));
+		cpufreq_update_util(rq_clock(rq));
 
 	schedstat_set(curr->se.statistics.exec_max,
 		      max(curr->se.statistics.exec_max, delta_exec));
Index: linux-pm/kernel/sched/rt.c
===================================================================
--- linux-pm.orig/kernel/sched/rt.c
+++ linux-pm/kernel/sched/rt.c
@@ -957,9 +957,9 @@ static void update_curr_rt(struct rq *rq
 	if (unlikely((s64)delta_exec <= 0))
 		return;
 
-	/* Kick cpufreq (see the comment in linux/cpufreq.h). */
+	/* Kick cpufreq (see the comment in kernel/sched/sched.h). */
 	if (cpu_of(rq) == smp_processor_id())
-		cpufreq_trigger_update(rq_clock(rq));
+		cpufreq_update_util(rq_clock(rq));
 
 	schedstat_set(curr->se.statistics.exec_max,
 		      max(curr->se.statistics.exec_max, delta_exec));
Index: linux-pm/kernel/sched/sched.h
===================================================================
--- linux-pm.orig/kernel/sched/sched.h
+++ linux-pm/kernel/sched/sched.h
@@ -1766,19 +1766,6 @@ DECLARE_PER_CPU(struct update_util_data
  * update_load_avg() on the CPU whose utilization is being updated.
  *
  * It can only be called from RCU-sched read-side critical sections.
- */
-static inline void cpufreq_update_util(u64 time)
-{
-	struct update_util_data *data;
-
-	data = rcu_dereference_sched(*this_cpu_ptr(&cpufreq_update_util_data));
-	if (data)
-		data->func(data, time);
-}
-
-/**
- * cpufreq_trigger_update - Trigger CPU performance state evaluation if needed.
- * @time: Current time.
  *
  * The way cpufreq is currently arranged requires it to evaluate the CPU
  * performance state (frequency/voltage) on a regular basis to prevent it from
@@ -1792,13 +1779,16 @@ static inline void cpufreq_update_util(u
  * but that really is a band-aid.  Going forward it should be replaced with
  * solutions targeted more specifically at RT and DL tasks.
  */
-static inline void cpufreq_trigger_update(u64 time)
+static inline void cpufreq_update_util(u64 time)
 {
-	cpufreq_update_util(time);
+	struct update_util_data *data;
+
+	data = rcu_dereference_sched(*this_cpu_ptr(&cpufreq_update_util_data));
+	if (data)
+		data->func(data, time);
 }
 #else
 static inline void cpufreq_update_util(u64 time) {}
-static inline void cpufreq_trigger_update(u64 time) {}
 #endif /* CONFIG_CPU_FREQ */
 
 #ifdef arch_scale_freq_capacity

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [RFC][PATCH 3/7] cpufreq / sched: Check cpu_of(rq) in cpufreq_update_util()
  2016-07-31 23:31 [RFC][PATCH 0/7] cpufreq / sched: cpufreq_update_util() flags and iowait boosting Rafael J. Wysocki
  2016-07-31 23:34 ` [RFC][PATCH 1/7] cpufreq / sched: Make schedutil access utilization data directly Rafael J. Wysocki
  2016-07-31 23:35 ` [RFC][PATCH 2/7] cpufreq / sched: Drop cpufreq_trigger_update() Rafael J. Wysocki
@ 2016-07-31 23:36 ` Rafael J. Wysocki
  2016-08-01  7:29   ` Dominik Brodowski
  2016-07-31 23:36 ` [RFC][PATCH 4/7] cpufreq / sched: Add flags argument to cpufreq_update_util() Rafael J. Wysocki
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 51+ messages in thread
From: Rafael J. Wysocki @ 2016-07-31 23:36 UTC (permalink / raw)
  To: Linux PM list
  Cc: Peter Zijlstra, Srinivas Pandruvada, Viresh Kumar,
	Linux Kernel Mailing List, Steve Muckle, Juri Lelli, Ingo Molnar

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

All of the callers of cpufreq_update_util() check whether or not
cpu_of(rq) is equal to smp_processor_id() before calling it and pass
rq_clock(rq) to it as the time argument, so rework it to take a
runqueue pointer as the argument and move the cpu_of(rq) check and
the rq_clock(rq) evaluation into it.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 kernel/sched/deadline.c |    3 +--
 kernel/sched/fair.c     |    5 +----
 kernel/sched/rt.c       |    3 +--
 kernel/sched/sched.h    |   11 +++++++----
 4 files changed, 10 insertions(+), 12 deletions(-)

Index: linux-pm/kernel/sched/deadline.c
===================================================================
--- linux-pm.orig/kernel/sched/deadline.c
+++ linux-pm/kernel/sched/deadline.c
@@ -733,8 +733,7 @@ static void update_curr_dl(struct rq *rq
 	}
 
 	/* kick cpufreq (see the comment in kernel/sched/sched.h). */
-	if (cpu_of(rq) == smp_processor_id())
-		cpufreq_update_util(rq_clock(rq));
+	cpufreq_update_util(rq);
 
 	schedstat_set(curr->se.statistics.exec_max,
 		      max(curr->se.statistics.exec_max, delta_exec));
Index: linux-pm/kernel/sched/fair.c
===================================================================
--- linux-pm.orig/kernel/sched/fair.c
+++ linux-pm/kernel/sched/fair.c
@@ -2871,8 +2871,6 @@ static inline u64 cfs_rq_clock_task(stru
 static inline void cfs_rq_util_change(struct cfs_rq *cfs_rq)
 {
 	if (&this_rq()->cfs == cfs_rq) {
-		struct rq *rq = rq_of(cfs_rq);
-
 		/*
 		 * There are a few boundary cases this might miss but it should
 		 * get called often enough that that should (hopefully) not be
@@ -2889,8 +2887,7 @@ static inline void cfs_rq_util_change(st
 		 *
 		 * See cpu_util().
 		 */
-		if (cpu_of(rq) == smp_processor_id())
-			cpufreq_update_util(rq_clock(rq));
+		cpufreq_update_util(rq_of(cfs_rq));
 	}
 }
 
Index: linux-pm/kernel/sched/rt.c
===================================================================
--- linux-pm.orig/kernel/sched/rt.c
+++ linux-pm/kernel/sched/rt.c
@@ -958,8 +958,7 @@ static void update_curr_rt(struct rq *rq
 		return;
 
 	/* Kick cpufreq (see the comment in kernel/sched/sched.h). */
-	if (cpu_of(rq) == smp_processor_id())
-		cpufreq_update_util(rq_clock(rq));
+	cpufreq_update_util(rq);
 
 	schedstat_set(curr->se.statistics.exec_max,
 		      max(curr->se.statistics.exec_max, delta_exec));
Index: linux-pm/kernel/sched/sched.h
===================================================================
--- linux-pm.orig/kernel/sched/sched.h
+++ linux-pm/kernel/sched/sched.h
@@ -1760,7 +1760,7 @@ DECLARE_PER_CPU(struct update_util_data
 
 /**
  * cpufreq_update_util - Take a note about CPU utilization changes.
- * @time: Current time.
+ * @rq: Runqueue to carry out the update for.
  *
  * This function is called by the scheduler on every invocation of
  * update_load_avg() on the CPU whose utilization is being updated.
@@ -1779,16 +1779,19 @@ DECLARE_PER_CPU(struct update_util_data
  * but that really is a band-aid.  Going forward it should be replaced with
  * solutions targeted more specifically at RT and DL tasks.
  */
-static inline void cpufreq_update_util(u64 time)
+static inline void cpufreq_update_util(struct rq *rq)
 {
 	struct update_util_data *data;
 
+	if (cpu_of(rq) != smp_processor_id())
+		return;
+
 	data = rcu_dereference_sched(*this_cpu_ptr(&cpufreq_update_util_data));
 	if (data)
-		data->func(data, time);
+		data->func(data, rq_clock(rq));
 }
 #else
-static inline void cpufreq_update_util(u64 time) {}
+static inline void cpufreq_update_util(struct rq *rq) {}
 #endif /* CONFIG_CPU_FREQ */
 
 #ifdef arch_scale_freq_capacity

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [RFC][PATCH 4/7] cpufreq / sched: Add flags argument to cpufreq_update_util()
  2016-07-31 23:31 [RFC][PATCH 0/7] cpufreq / sched: cpufreq_update_util() flags and iowait boosting Rafael J. Wysocki
                   ` (2 preceding siblings ...)
  2016-07-31 23:36 ` [RFC][PATCH 3/7] cpufreq / sched: Check cpu_of(rq) in cpufreq_update_util() Rafael J. Wysocki
@ 2016-07-31 23:36 ` Rafael J. Wysocki
  2016-08-01  7:33   ` Dominik Brodowski
  2016-07-31 23:37 ` [RFC][PATCH 5/7] cpufreq / sched: UUF_IO flag to indicate iowait condition Rafael J. Wysocki
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 51+ messages in thread
From: Rafael J. Wysocki @ 2016-07-31 23:36 UTC (permalink / raw)
  To: Linux PM list
  Cc: Peter Zijlstra, Srinivas Pandruvada, Viresh Kumar,
	Linux Kernel Mailing List, Steve Muckle, Juri Lelli, Ingo Molnar

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

It is useful to know the reason why cpufreq_update_util() has just
been called, so add a flags argument to it and to the ->func()
callback in struct update_util_data.

Update all of the users of them accordingly and use the flags
argument for marking updates coming from the RT sched class.

With that, modify the schedutil governor to use the "RT" flag
instead of the special utilization value of ULONG_MAX to track
updates from the RT sched class.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/cpufreq/cpufreq_governor.c |    3 ++-
 drivers/cpufreq/intel_pstate.c     |    3 ++-
 include/linux/sched.h              |    7 +++++--
 kernel/sched/cpufreq.c             |    3 ++-
 kernel/sched/cpufreq_schedutil.c   |   37 +++++++++++++++++++++----------------
 kernel/sched/deadline.c            |    2 +-
 kernel/sched/fair.c                |    2 +-
 kernel/sched/rt.c                  |    2 +-
 kernel/sched/sched.h               |    7 ++++---
 9 files changed, 39 insertions(+), 27 deletions(-)

Index: linux-pm/include/linux/sched.h
===================================================================
--- linux-pm.orig/include/linux/sched.h
+++ linux-pm/include/linux/sched.h
@@ -3375,13 +3375,16 @@ static inline unsigned long rlimit_max(u
 	return task_rlimit_max(current, limit);
 }
 
+#define UUF_RT	0x01
+
 #ifdef CONFIG_CPU_FREQ
 struct update_util_data {
-       void (*func)(struct update_util_data *data, u64 time);
+       void (*func)(struct update_util_data *data, u64 time, unsigned int flags);
 };
 
 void cpufreq_add_update_util_hook(int cpu, struct update_util_data *data,
-                       void (*func)(struct update_util_data *data, u64 time));
+                       void (*func)(struct update_util_data *data, u64 time,
+				    unsigned int flags));
 void cpufreq_remove_update_util_hook(int cpu);
 #endif /* CONFIG_CPU_FREQ */
 
Index: linux-pm/drivers/cpufreq/intel_pstate.c
===================================================================
--- linux-pm.orig/drivers/cpufreq/intel_pstate.c
+++ linux-pm/drivers/cpufreq/intel_pstate.c
@@ -1328,7 +1328,8 @@ static inline void intel_pstate_adjust_b
 		get_avg_frequency(cpu));
 }
 
-static void intel_pstate_update_util(struct update_util_data *data, u64 time)
+static void intel_pstate_update_util(struct update_util_data *data, u64 time,
+				     unsigned int flags)
 {
 	struct cpudata *cpu = container_of(data, struct cpudata, update_util);
 	u64 delta_ns = time - cpu->sample.time;
Index: linux-pm/drivers/cpufreq/cpufreq_governor.c
===================================================================
--- linux-pm.orig/drivers/cpufreq/cpufreq_governor.c
+++ linux-pm/drivers/cpufreq/cpufreq_governor.c
@@ -259,7 +259,8 @@ static void dbs_irq_work(struct irq_work
 	schedule_work_on(smp_processor_id(), &policy_dbs->work);
 }
 
-static void dbs_update_util_handler(struct update_util_data *data, u64 time)
+static void dbs_update_util_handler(struct update_util_data *data, u64 time,
+				    unsigned int flags)
 {
 	struct cpu_dbs_info *cdbs = container_of(data, struct cpu_dbs_info, update_util);
 	struct policy_dbs_info *policy_dbs = cdbs->policy_dbs;
Index: linux-pm/kernel/sched/cpufreq.c
===================================================================
--- linux-pm.orig/kernel/sched/cpufreq.c
+++ linux-pm/kernel/sched/cpufreq.c
@@ -32,7 +32,8 @@ DEFINE_PER_CPU(struct update_util_data *
  * called or it will WARN() and return with no effect.
  */
 void cpufreq_add_update_util_hook(int cpu, struct update_util_data *data,
-			void (*func)(struct update_util_data *data, u64 time))
+			void (*func)(struct update_util_data *data, u64 time,
+				     unsigned int flags))
 {
 	if (WARN_ON(!data || !func))
 		return;
Index: linux-pm/kernel/sched/cpufreq_schedutil.c
===================================================================
--- linux-pm.orig/kernel/sched/cpufreq_schedutil.c
+++ linux-pm/kernel/sched/cpufreq_schedutil.c
@@ -53,6 +53,7 @@ struct sugov_cpu {
 	unsigned long util;
 	unsigned long max;
 	u64 last_update;
+	unsigned int flags;
 };
 
 static DEFINE_PER_CPU(struct sugov_cpu, sugov_cpu);
@@ -144,7 +145,8 @@ static unsigned int get_next_freq(struct
 	return cpufreq_driver_resolve_freq(policy, freq);
 }
 
-static void sugov_get_util(unsigned long *util, unsigned long *max)
+static void sugov_get_util(unsigned long *util, unsigned long *max,
+			   unsigned int flags)
 {
 	unsigned long dl_util, dl_max;
 	unsigned long cfs_util, cfs_max;
@@ -152,10 +154,8 @@ static void sugov_get_util(unsigned long
 	struct dl_bw *dl_bw = dl_bw_of(cpu);
 	struct rq *rq = this_rq();
 
-	if (rt_prio(current->prio)) {
-		*util = ULONG_MAX;
+	if (flags & UUF_RT)
 		return;
-	}
 
 	dl_max = dl_bw_cpus(cpu) << 20;
 	dl_util = dl_bw->total_bw;
@@ -172,7 +172,8 @@ static void sugov_get_util(unsigned long
 	}
 }
 
-static void sugov_update_single(struct update_util_data *hook, u64 time)
+static void sugov_update_single(struct update_util_data *hook, u64 time,
+				unsigned int flags)
 {
 	struct sugov_cpu *sg_cpu = container_of(hook, struct sugov_cpu, update_util);
 	struct sugov_policy *sg_policy = sg_cpu->sg_policy;
@@ -183,15 +184,16 @@ static void sugov_update_single(struct u
 	if (!sugov_should_update_freq(sg_policy, time))
 		return;
 
-	sugov_get_util(&util, &max);
+	sugov_get_util(&util, &max, flags);
 
-	next_f = util == ULONG_MAX ? policy->cpuinfo.max_freq :
-			get_next_freq(sg_cpu, util, max);
+	next_f = flags & UUF_RT ? policy->cpuinfo.max_freq :
+				  get_next_freq(sg_cpu, util, max);
 	sugov_update_commit(sg_policy, time, next_f);
 }
 
 static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu,
-					   unsigned long util, unsigned long max)
+					   unsigned long util, unsigned long max,
+					   unsigned int flags)
 {
 	struct sugov_policy *sg_policy = sg_cpu->sg_policy;
 	struct cpufreq_policy *policy = sg_policy->policy;
@@ -199,7 +201,7 @@ static unsigned int sugov_next_freq_shar
 	u64 last_freq_update_time = sg_policy->last_freq_update_time;
 	unsigned int j;
 
-	if (util == ULONG_MAX)
+	if (flags & UUF_RT)
 		return max_f;
 
 	for_each_cpu(j, policy->cpus) {
@@ -222,10 +224,10 @@ static unsigned int sugov_next_freq_shar
 		if (delta_ns > TICK_NSEC)
 			continue;
 
-		j_util = j_sg_cpu->util;
-		if (j_util == ULONG_MAX)
+		if (j_sg_cpu->flags & UUF_RT)
 			return max_f;
 
+		j_util = j_sg_cpu->util;
 		j_max = j_sg_cpu->max;
 		if (j_util * max > j_max * util) {
 			util = j_util;
@@ -236,23 +238,25 @@ static unsigned int sugov_next_freq_shar
 	return get_next_freq(sg_cpu, util, max);
 }
 
-static void sugov_update_shared(struct update_util_data *hook, u64 time)
+static void sugov_update_shared(struct update_util_data *hook, u64 time,
+				unsigned int flags)
 {
 	struct sugov_cpu *sg_cpu = container_of(hook, struct sugov_cpu, update_util);
 	struct sugov_policy *sg_policy = sg_cpu->sg_policy;
 	unsigned long util, max;
 	unsigned int next_f;
 
-	sugov_get_util(&util, &max);
+	sugov_get_util(&util, &max, flags);
 
 	raw_spin_lock(&sg_policy->update_lock);
 
 	sg_cpu->util = util;
 	sg_cpu->max = max;
+	sg_cpu->flags = flags;
 	sg_cpu->last_update = time;
 
 	if (sugov_should_update_freq(sg_policy, time)) {
-		next_f = sugov_next_freq_shared(sg_cpu, util, max);
+		next_f = sugov_next_freq_shared(sg_cpu, util, max, flags);
 		sugov_update_commit(sg_policy, time, next_f);
 	}
 
@@ -476,8 +480,9 @@ static int sugov_start(struct cpufreq_po
 
 		sg_cpu->sg_policy = sg_policy;
 		if (policy_is_shared(policy)) {
-			sg_cpu->util = ULONG_MAX;
+			sg_cpu->util = 0;
 			sg_cpu->max = 0;
+			sg_cpu->flags = UUF_RT;
 			sg_cpu->last_update = 0;
 			sg_cpu->cached_raw_freq = 0;
 			cpufreq_add_update_util_hook(cpu, &sg_cpu->update_util,
Index: linux-pm/kernel/sched/sched.h
===================================================================
--- linux-pm.orig/kernel/sched/sched.h
+++ linux-pm/kernel/sched/sched.h
@@ -1761,6 +1761,7 @@ DECLARE_PER_CPU(struct update_util_data
 /**
  * cpufreq_update_util - Take a note about CPU utilization changes.
  * @rq: Runqueue to carry out the update for.
+ * @flags: Update reason flags.
  *
  * This function is called by the scheduler on every invocation of
  * update_load_avg() on the CPU whose utilization is being updated.
@@ -1779,7 +1780,7 @@ DECLARE_PER_CPU(struct update_util_data
  * but that really is a band-aid.  Going forward it should be replaced with
  * solutions targeted more specifically at RT and DL tasks.
  */
-static inline void cpufreq_update_util(struct rq *rq)
+static inline void cpufreq_update_util(struct rq *rq, unsigned int flags)
 {
 	struct update_util_data *data;
 
@@ -1788,10 +1789,10 @@ static inline void cpufreq_update_util(s
 
 	data = rcu_dereference_sched(*this_cpu_ptr(&cpufreq_update_util_data));
 	if (data)
-		data->func(data, rq_clock(rq));
+		data->func(data, rq_clock(rq), flags);
 }
 #else
-static inline void cpufreq_update_util(struct rq *rq) {}
+static inline void cpufreq_update_util(struct rq *rq, unsigned int flags) {}
 #endif /* CONFIG_CPU_FREQ */
 
 #ifdef arch_scale_freq_capacity
Index: linux-pm/kernel/sched/deadline.c
===================================================================
--- linux-pm.orig/kernel/sched/deadline.c
+++ linux-pm/kernel/sched/deadline.c
@@ -733,7 +733,7 @@ static void update_curr_dl(struct rq *rq
 	}
 
 	/* kick cpufreq (see the comment in kernel/sched/sched.h). */
-	cpufreq_update_util(rq);
+	cpufreq_update_util(rq, 0);
 
 	schedstat_set(curr->se.statistics.exec_max,
 		      max(curr->se.statistics.exec_max, delta_exec));
Index: linux-pm/kernel/sched/fair.c
===================================================================
--- linux-pm.orig/kernel/sched/fair.c
+++ linux-pm/kernel/sched/fair.c
@@ -2887,7 +2887,7 @@ static inline void cfs_rq_util_change(st
 		 *
 		 * See cpu_util().
 		 */
-		cpufreq_update_util(rq_of(cfs_rq));
+		cpufreq_update_util(rq_of(cfs_rq), 0);
 	}
 }
 
Index: linux-pm/kernel/sched/rt.c
===================================================================
--- linux-pm.orig/kernel/sched/rt.c
+++ linux-pm/kernel/sched/rt.c
@@ -958,7 +958,7 @@ static void update_curr_rt(struct rq *rq
 		return;
 
 	/* Kick cpufreq (see the comment in kernel/sched/sched.h). */
-	cpufreq_update_util(rq);
+	cpufreq_update_util(rq, UUF_RT);
 
 	schedstat_set(curr->se.statistics.exec_max,
 		      max(curr->se.statistics.exec_max, delta_exec));

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [RFC][PATCH 5/7] cpufreq / sched: UUF_IO flag to indicate iowait condition
  2016-07-31 23:31 [RFC][PATCH 0/7] cpufreq / sched: cpufreq_update_util() flags and iowait boosting Rafael J. Wysocki
                   ` (3 preceding siblings ...)
  2016-07-31 23:36 ` [RFC][PATCH 4/7] cpufreq / sched: Add flags argument to cpufreq_update_util() Rafael J. Wysocki
@ 2016-07-31 23:37 ` Rafael J. Wysocki
  2016-08-02  1:22   ` Steve Muckle
  2016-07-31 23:37 ` [RFC][PATCH 6/7] cpufreq: schedutil: Add iowait boosting Rafael J. Wysocki
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 51+ messages in thread
From: Rafael J. Wysocki @ 2016-07-31 23:37 UTC (permalink / raw)
  To: Linux PM list
  Cc: Peter Zijlstra, Srinivas Pandruvada, Viresh Kumar,
	Linux Kernel Mailing List, Steve Muckle, Juri Lelli, Ingo Molnar

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Testing indicates that it is possible to improve performance
significantly without increasing energy consumption too much by
teaching cpufreq governors to bump up the CPU performance level if
the in_iowait flag is set for the task in enqueue_task_fair().

For this purpose, define a new cpufreq_update_util() flag
UUF_IO and modify enqueue_task_fair() to pass that flag to
cpufreq_update_util() in the in_iowait case.  That generally
requires cpufreq_update_util() to be called directly from there,
because update_load_avg() is not likely to be invoked in that
case.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 include/linux/sched.h |    1 +
 kernel/sched/fair.c   |    8 ++++++++
 2 files changed, 9 insertions(+)

Index: linux-pm/kernel/sched/fair.c
===================================================================
--- linux-pm.orig/kernel/sched/fair.c
+++ linux-pm/kernel/sched/fair.c
@@ -4459,6 +4459,14 @@ enqueue_task_fair(struct rq *rq, struct
 	struct cfs_rq *cfs_rq;
 	struct sched_entity *se = &p->se;
 
+	/*
+	 * If in_iowait is set, it is likely that the loops below will not
+	 * trigger any cpufreq utilization updates, so do it here explicitly
+	 * with the IO flag passed.
+	 */
+	if (p->in_iowait)
+		cpufreq_update_util(rq, UUF_IO);
+
 	for_each_sched_entity(se) {
 		if (se->on_rq)
 			break;
Index: linux-pm/include/linux/sched.h
===================================================================
--- linux-pm.orig/include/linux/sched.h
+++ linux-pm/include/linux/sched.h
@@ -3376,6 +3376,7 @@ static inline unsigned long rlimit_max(u
 }
 
 #define UUF_RT	0x01
+#define UUF_IO	0x02
 
 #ifdef CONFIG_CPU_FREQ
 struct update_util_data {

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [RFC][PATCH 6/7] cpufreq: schedutil: Add iowait boosting
  2016-07-31 23:31 [RFC][PATCH 0/7] cpufreq / sched: cpufreq_update_util() flags and iowait boosting Rafael J. Wysocki
                   ` (4 preceding siblings ...)
  2016-07-31 23:37 ` [RFC][PATCH 5/7] cpufreq / sched: UUF_IO flag to indicate iowait condition Rafael J. Wysocki
@ 2016-07-31 23:37 ` Rafael J. Wysocki
  2016-08-02  1:35   ` Steve Muckle
  2016-07-31 23:38 ` [RFC][PATCH 7/7] cpufreq: intel_pstate: Change P-state selection algorithm for Core Rafael J. Wysocki
  2016-08-01 15:26 ` [RFC][PATCH 0/7] cpufreq / sched: cpufreq_update_util() flags and iowait boosting Doug Smythies
  7 siblings, 1 reply; 51+ messages in thread
From: Rafael J. Wysocki @ 2016-07-31 23:37 UTC (permalink / raw)
  To: Linux PM list
  Cc: Peter Zijlstra, Srinivas Pandruvada, Viresh Kumar,
	Linux Kernel Mailing List, Steve Muckle, Juri Lelli, Ingo Molnar

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Modify the schedutil cpufreq governor to boost the CPU frequency
if the UUF_IO flag is passed to it via cpufreq_update_util().

If that happens, the frequency is set to the maximum during
the first update after receiving the UUF_IO flag and then the
boost is reduced by half during each following update.
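
As an illustration, the boost makes the governor see a utilization ratio of
1 (i.e. fully busy) on the first update after UUF_IO, 1/2 on the next one,
then 1/4 and so on, for as long as the decaying ratio is still higher than
the CPU's real util/max; the boost is also cleared outright if the CPU
appears to have been idle for more than a tick.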

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 kernel/sched/cpufreq_schedutil.c |   61 ++++++++++++++++++++++++++++++++++-----
 1 file changed, 54 insertions(+), 7 deletions(-)

Index: linux-pm/kernel/sched/cpufreq_schedutil.c
===================================================================
--- linux-pm.orig/kernel/sched/cpufreq_schedutil.c
+++ linux-pm/kernel/sched/cpufreq_schedutil.c
@@ -48,11 +48,13 @@ struct sugov_cpu {
 	struct sugov_policy *sg_policy;
 
 	unsigned int cached_raw_freq;
+	unsigned long iowait_boost;
+	unsigned long iowait_boost_max;
+	u64 last_update;
 
 	/* The fields below are only needed when sharing a policy. */
 	unsigned long util;
 	unsigned long max;
-	u64 last_update;
 	unsigned int flags;
 };
 
@@ -172,22 +174,58 @@ static void sugov_get_util(unsigned long
 	}
 }
 
+static void sugov_set_iowait_boost(struct sugov_cpu *sg_cpu, u64 time,
+				   unsigned int flags)
+{
+	if (flags & UUF_IO) {
+		sg_cpu->iowait_boost = sg_cpu->iowait_boost_max;
+	} else if (sg_cpu->iowait_boost) {
+		s64 delta_ns = time - sg_cpu->last_update;
+
+		/* Clear iowait_boost if the CPU appears to have been idle. */
+		if (delta_ns > TICK_NSEC)
+			sg_cpu->iowait_boost = 0;
+	}
+}
+
+static void sugov_iowait_boost(struct sugov_cpu *sg_cpu, unsigned long *util,
+			       unsigned long *max)
+{
+	unsigned long boost_util = sg_cpu->iowait_boost;
+	unsigned long boost_max = sg_cpu->iowait_boost_max;
+
+	if (!boost_util)
+		return;
+
+	if (*util * boost_max < *max * boost_util) {
+		*util = boost_util;
+		*max = boost_max;
+	}
+	sg_cpu->iowait_boost >>= 1;
+}
+
 static void sugov_update_single(struct update_util_data *hook, u64 time,
 				unsigned int flags)
 {
 	struct sugov_cpu *sg_cpu = container_of(hook, struct sugov_cpu, update_util);
 	struct sugov_policy *sg_policy = sg_cpu->sg_policy;
-	struct cpufreq_policy *policy = sg_policy->policy;
 	unsigned long util, max;
 	unsigned int next_f;
 
+	sugov_set_iowait_boost(sg_cpu, time, flags);
+	sg_cpu->last_update = time;
+
 	if (!sugov_should_update_freq(sg_policy, time))
 		return;
 
 	sugov_get_util(&util, &max, flags);
 
-	next_f = flags & UUF_RT ? policy->cpuinfo.max_freq :
-				  get_next_freq(sg_cpu, util, max);
+	if (flags & UUF_RT) {
+		next_f = sg_policy->policy->cpuinfo.max_freq;
+	} else {
+		sugov_iowait_boost(sg_cpu, &util, &max);
+		next_f = get_next_freq(sg_cpu, util, max);
+	}
 	sugov_update_commit(sg_policy, time, next_f);
 }
 
@@ -204,6 +242,8 @@ static unsigned int sugov_next_freq_shar
 	if (flags & UUF_RT)
 		return max_f;
 
+	sugov_iowait_boost(sg_cpu, &util, &max);
+
 	for_each_cpu(j, policy->cpus) {
 		struct sugov_cpu *j_sg_cpu;
 		unsigned long j_util, j_max;
@@ -218,12 +258,13 @@ static unsigned int sugov_next_freq_shar
 		 * frequency update and the time elapsed between the last update
 		 * of the CPU utilization and the last frequency update is long
 		 * enough, don't take the CPU into account as it probably is
-		 * idle now.
+		 * idle now (and clear iowait_boost for it).
 		 */
 		delta_ns = last_freq_update_time - j_sg_cpu->last_update;
-		if (delta_ns > TICK_NSEC)
+		if (delta_ns > TICK_NSEC) {
+			j_sg_cpu->iowait_boost = 0;
 			continue;
-
+		}
 		if (j_sg_cpu->flags & UUF_RT)
 			return max_f;
 
@@ -233,6 +274,8 @@ static unsigned int sugov_next_freq_shar
 			util = j_util;
 			max = j_max;
 		}
+
+		sugov_iowait_boost(j_sg_cpu, &util, &max);
 	}
 
 	return get_next_freq(sg_cpu, util, max);
@@ -253,6 +296,8 @@ static void sugov_update_shared(struct u
 	sg_cpu->util = util;
 	sg_cpu->max = max;
 	sg_cpu->flags = flags;
+
+	sugov_set_iowait_boost(sg_cpu, time, flags);
 	sg_cpu->last_update = time;
 
 	if (sugov_should_update_freq(sg_policy, time)) {
@@ -485,6 +530,8 @@ static int sugov_start(struct cpufreq_po
 			sg_cpu->flags = UUF_RT;
 			sg_cpu->last_update = 0;
 			sg_cpu->cached_raw_freq = 0;
+			sg_cpu->iowait_boost = 0;
+			sg_cpu->iowait_boost_max = policy->cpuinfo.max_freq;
 			cpufreq_add_update_util_hook(cpu, &sg_cpu->update_util,
 						     sugov_update_shared);
 		} else {

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [RFC][PATCH 7/7] cpufreq: intel_pstate: Change P-state selection algorithm for Core
  2016-07-31 23:31 [RFC][PATCH 0/7] cpufreq / sched: cpufreq_update_util() flags and iowait boosting Rafael J. Wysocki
                   ` (5 preceding siblings ...)
  2016-07-31 23:37 ` [RFC][PATCH 6/7] cpufreq: schedutil: Add iowait boosting Rafael J. Wysocki
@ 2016-07-31 23:38 ` Rafael J. Wysocki
  2016-08-04  4:18   ` Doug Smythies
                     ` (2 more replies)
  2016-08-01 15:26 ` [RFC][PATCH 0/7] cpufreq / sched: cpufreq_update_util() flags and iowait boosting Doug Smythies
  7 siblings, 3 replies; 51+ messages in thread
From: Rafael J. Wysocki @ 2016-07-31 23:38 UTC (permalink / raw)
  To: Linux PM list
  Cc: Peter Zijlstra, Srinivas Pandruvada, Viresh Kumar,
	Linux Kernel Mailing List, Steve Muckle, Juri Lelli, Ingo Molnar

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

The PID-based P-state selection algorithm used by intel_pstate for
Core processors is based on very weak foundations.  Namely, its
decisions are mostly based on the values of the APERF and MPERF
feedback registers and it only estimates the actual utilization to
check if it is not extremely low (in order to avoid getting stuck
in the highest P-state in that case).

Since it generally causes the CPU P-state to ramp up quickly, it
leads to satisfactory performance, but the metric used by it is only
really valid when the CPU changes P-states by itself (i.e. in the turbo
range) and if the P-state value set by the driver is treated by the
CPU as the upper limit on turbo P-states selected by it.

As a result, the only case when P-states are reduced by that
algorithm is when the CPU has just come out of idle, but in that
particular case it would have been better to bump up the P-state
instead.  That causes some benchmarks to behave erratically and
attempts to improve the situation lead to excessive energy
consumption, because they make the CPU stay in very high P-states
almost all the time.

Consequently, the only viable way to fix that is to replace the
erroneous algorithm entirely with a better one.

To that end, notice that setting the P-state proportional to the
actual CPU utilization (measured with the help of MPERF and TSC)
generally leads to reasonable behavior, but it does not reflect
the "performance boosting" nature of the current P-state
selection algorithm.  It may be made more similar to that
algorithm, though, by adding iowait boosting to it.

Specifically, if the P-state is bumped up to the maximum after
receiving the UUF_IO flag via cpufreq_update_util(), tasks that were
previously waiting on I/O will get the full capacity of the CPU as
soon as they are ready to process data again, which should lead to
the desired performance increase overall without sacrificing too
much energy.

For this reason, use the above approach for Core processors in
intel_pstate.
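
To illustrate the new formula with made-up numbers: on a CPU whose turbo
P-state is 32 (3.2 GHz with 100 MHz scaling), a busy fraction of 0.5 yields
fp_toint((32 + 32/4) * 0.5) = 20, i.e. a 2.0 GHz target, so the requested
P-state scales linearly with utilization plus 25% headroom; an UUF_IO update
forces the busy fraction to 1, which asks for 40 and is then clamped to the
allowed limits before being written to the hardware.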

Original-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/cpufreq/intel_pstate.c |   43 +++++++++++++++++++++++++++++++++++++++--
 include/linux/sched.h          |    3 ++
 kernel/sched/sched.h           |    3 --
 3 files changed, 44 insertions(+), 5 deletions(-)

Index: linux-pm/drivers/cpufreq/intel_pstate.c
===================================================================
--- linux-pm.orig/drivers/cpufreq/intel_pstate.c
+++ linux-pm/drivers/cpufreq/intel_pstate.c
@@ -181,6 +181,8 @@ struct _pid {
  * @cpu:		CPU number for this instance data
  * @update_util:	CPUFreq utility callback information
  * @update_util_set:	CPUFreq utility callback is set
+ * @iowait_boost:	iowait-related boost fraction
+ * @last_update:	Time of the last update.
  * @pstate:		Stores P state limits for this CPU
  * @vid:		Stores VID limits for this CPU
  * @pid:		Stores PID parameters for this CPU
@@ -206,6 +208,7 @@ struct cpudata {
 	struct vid_data vid;
 	struct _pid pid;
 
+	u64	last_update;
 	u64	last_sample_time;
 	u64	prev_aperf;
 	u64	prev_mperf;
@@ -216,6 +219,7 @@ struct cpudata {
 	struct acpi_processor_performance acpi_perf_data;
 	bool valid_pss_table;
 #endif
+	unsigned int iowait_boost;
 };
 
 static struct cpudata **all_cpu_data;
@@ -229,6 +233,7 @@ static struct cpudata **all_cpu_data;
  * @p_gain_pct:		PID proportional gain
  * @i_gain_pct:		PID integral gain
  * @d_gain_pct:		PID derivative gain
+ * @boost_iowait:	Whether or not to use iowait boosting.
  *
  * Stores per CPU model static PID configuration data.
  */
@@ -240,6 +245,7 @@ struct pstate_adjust_policy {
 	int p_gain_pct;
 	int d_gain_pct;
 	int i_gain_pct;
+	bool boost_iowait;
 };
 
 /**
@@ -277,6 +283,7 @@ struct cpu_defaults {
 	struct pstate_funcs funcs;
 };
 
+static inline int32_t get_target_pstate_default(struct cpudata *cpu);
 static inline int32_t get_target_pstate_use_performance(struct cpudata *cpu);
 static inline int32_t get_target_pstate_use_cpu_load(struct cpudata *cpu);
 
@@ -1017,6 +1024,7 @@ static struct cpu_defaults core_params =
 		.p_gain_pct = 20,
 		.d_gain_pct = 0,
 		.i_gain_pct = 0,
+		.boost_iowait = true,
 	},
 	.funcs = {
 		.get_max = core_get_max_pstate,
@@ -1025,7 +1033,7 @@ static struct cpu_defaults core_params =
 		.get_turbo = core_get_turbo_pstate,
 		.get_scaling = core_get_scaling,
 		.get_val = core_get_val,
-		.get_target_pstate = get_target_pstate_use_performance,
+		.get_target_pstate = get_target_pstate_default,
 	},
 };
 
@@ -1290,6 +1298,24 @@ static inline int32_t get_target_pstate_
 	return cpu->pstate.current_pstate - pid_calc(&cpu->pid, perf_scaled);
 }
 
+static inline int32_t get_target_pstate_default(struct cpudata *cpu)
+{
+	struct sample *sample = &cpu->sample;
+	int32_t busy_frac;
+	int pstate;
+
+	busy_frac = div_fp(sample->mperf, sample->tsc);
+	sample->busy_scaled = busy_frac * 100;
+
+	if (busy_frac < cpu->iowait_boost)
+		busy_frac = cpu->iowait_boost;
+
+	cpu->iowait_boost >>= 1;
+
+	pstate = cpu->pstate.turbo_pstate;
+	return fp_toint((pstate + (pstate >> 2)) * busy_frac);
+}
+
 static inline void intel_pstate_update_pstate(struct cpudata *cpu, int pstate)
 {
 	int max_perf, min_perf;
@@ -1332,8 +1358,21 @@ static void intel_pstate_update_util(str
 				     unsigned int flags)
 {
 	struct cpudata *cpu = container_of(data, struct cpudata, update_util);
-	u64 delta_ns = time - cpu->sample.time;
+	u64 delta_ns;
+
+	if (pid_params.boost_iowait) {
+		if (flags & UUF_IO) {
+			cpu->iowait_boost = int_tofp(1);
+		} else if (cpu->iowait_boost) {
+			/* Clear iowait_boost if the CPU may have been idle. */
+			delta_ns = time - cpu->last_update;
+			if (delta_ns > TICK_NSEC)
+				cpu->iowait_boost = 0;
+		}
+		cpu->last_update = time;
+	}
 
+	delta_ns = time - cpu->sample.time;
 	if ((s64)delta_ns >= pid_params.sample_rate_ns) {
 		bool sample_taken = intel_pstate_sample(cpu, time);
 

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 3/7] cpufreq / sched: Check cpu_of(rq) in cpufreq_update_util()
  2016-07-31 23:36 ` [RFC][PATCH 3/7] cpufreq / sched: Check cpu_of(rq) in cpufreq_update_util() Rafael J. Wysocki
@ 2016-08-01  7:29   ` Dominik Brodowski
  2016-08-01 14:57     ` Rafael J. Wysocki
  2016-08-01 19:48     ` Steve Muckle
  0 siblings, 2 replies; 51+ messages in thread
From: Dominik Brodowski @ 2016-08-01  7:29 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux PM list, Peter Zijlstra, Srinivas Pandruvada, Viresh Kumar,
	Linux Kernel Mailing List, Steve Muckle, Juri Lelli, Ingo Molnar

A small nitpick:

On Mon, Aug 01, 2016 at 01:36:01AM +0200, Rafael J. Wysocki wrote:
> --- linux-pm.orig/kernel/sched/sched.h
> +++ linux-pm/kernel/sched/sched.h
> @@ -1760,7 +1760,7 @@ DECLARE_PER_CPU(struct update_util_data
>  
>  /**
>   * cpufreq_update_util - Take a note about CPU utilization changes.
> - * @time: Current time.
> + * @rq: Runqueue to carry out the update for.
>   *
>   * This function is called by the scheduler on every invocation of
>   * update_load_avg() on the CPU whose utilization is being updated.

This comment seems to need an update due to the smp_processor_id() check
being moved into this function.

Best,
	Dominik

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 4/7] cpufreq / sched: Add flags argument to cpufreq_update_util()
  2016-07-31 23:36 ` [RFC][PATCH 4/7] cpufreq / sched: Add flags argument to cpufreq_update_util() Rafael J. Wysocki
@ 2016-08-01  7:33   ` Dominik Brodowski
  2016-08-01 14:57     ` Rafael J. Wysocki
  0 siblings, 1 reply; 51+ messages in thread
From: Dominik Brodowski @ 2016-08-01  7:33 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux PM list, Peter Zijlstra, Srinivas Pandruvada, Viresh Kumar,
	Linux Kernel Mailing List, Steve Muckle, Juri Lelli, Ingo Molnar

On Mon, Aug 01, 2016 at 01:36:46AM +0200, Rafael J. Wysocki wrote:
> +#define UUF_RT	0x01

What does UUF stand for?

Best
	Dominik

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 4/7] cpufreq / sched: Add flags argument to cpufreq_update_util()
  2016-08-01  7:33   ` Dominik Brodowski
@ 2016-08-01 14:57     ` Rafael J. Wysocki
  2016-08-01 19:59       ` Steve Muckle
  0 siblings, 1 reply; 51+ messages in thread
From: Rafael J. Wysocki @ 2016-08-01 14:57 UTC (permalink / raw)
  To: Dominik Brodowski
  Cc: Linux PM list, Peter Zijlstra, Srinivas Pandruvada, Viresh Kumar,
	Linux Kernel Mailing List, Steve Muckle, Juri Lelli, Ingo Molnar

On Monday, August 01, 2016 09:33:12 AM Dominik Brodowski wrote:
> On Mon, Aug 01, 2016 at 01:36:46AM +0200, Rafael J. Wysocki wrote:
> > +#define UUF_RT	0x01
> 
> What does UUF stand for?

"Utilization upadte flag".

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 3/7] cpufreq / sched: Check cpu_of(rq) in cpufreq_update_util()
  2016-08-01  7:29   ` Dominik Brodowski
@ 2016-08-01 14:57     ` Rafael J. Wysocki
  2016-08-01 19:48     ` Steve Muckle
  1 sibling, 0 replies; 51+ messages in thread
From: Rafael J. Wysocki @ 2016-08-01 14:57 UTC (permalink / raw)
  To: Dominik Brodowski
  Cc: Linux PM list, Peter Zijlstra, Srinivas Pandruvada, Viresh Kumar,
	Linux Kernel Mailing List, Steve Muckle, Juri Lelli, Ingo Molnar

On Monday, August 01, 2016 09:29:57 AM Dominik Brodowski wrote:
> A small nitpick:
> 
> On Mon, Aug 01, 2016 at 01:36:01AM +0200, Rafael J. Wysocki wrote:
> > --- linux-pm.orig/kernel/sched/sched.h
> > +++ linux-pm/kernel/sched/sched.h
> > @@ -1760,7 +1760,7 @@ DECLARE_PER_CPU(struct update_util_data
> >  
> >  /**
> >   * cpufreq_update_util - Take a note about CPU utilization changes.
> > - * @time: Current time.
> > + * @rq: Runqueue to carry out the update for.
> >   *
> >   * This function is called by the scheduler on every invocation of
> >   * update_load_avg() on the CPU whose utilization is being updated.
> 
> This comment seems to need an update due to the smp_processor_id() check
> being moved into this function.

Right.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 51+ messages in thread

* RE: [RFC][PATCH 0/7] cpufreq / sched: cpufreq_update_util() flags and iowait boosting
  2016-07-31 23:31 [RFC][PATCH 0/7] cpufreq / sched: cpufreq_update_util() flags and iowait boosting Rafael J. Wysocki
                   ` (6 preceding siblings ...)
  2016-07-31 23:38 ` [RFC][PATCH 7/7] cpufreq: intel_pstate: Change P-state selection algorithm for Core Rafael J. Wysocki
@ 2016-08-01 15:26 ` Doug Smythies
  2016-08-01 16:30   ` Rafael J. Wysocki
  7 siblings, 1 reply; 51+ messages in thread
From: Doug Smythies @ 2016-08-01 15:26 UTC (permalink / raw)
  To: 'Rafael J. Wysocki', 'Linux PM list'
  Cc: 'Peter Zijlstra', 'Srinivas Pandruvada',
	'Viresh Kumar', 'Linux Kernel Mailing List',
	'Steve Muckle', 'Juri Lelli',
	'Ingo Molnar'

On 2016.07.31 16:32 Rafael J. Wysocki wrote:

> Hi,
>
> Admittedly, this hasn't been tested yet, so no promises and you have been
> warned.  It builds, though (on x86-64 at least).

It would not build for me until I changed the kernel configuration file like so:

$ scripts/diffconfig .config .config_rjw_b_2
 CPU_FREQ_GOV_SCHEDUTIL m -> y

(the above is actually backwards; it is "y" that works.)

Otherwise I got:

Kernel: arch/x86/boot/bzImage is ready  (#86)
  MODPOST 3318 modules
ERROR: "dl_bw_cpus" [kernel/sched/cpufreq_schedutil.ko] undefined!
ERROR: "dl_bw_of" [kernel/sched/cpufreq_schedutil.ko] undefined!
ERROR: "runqueues" [kernel/sched/cpufreq_schedutil.ko] undefined!
scripts/Makefile.modpost:91: recipe for target '__modpost' failed
make[4]: *** [__modpost] Error 1
Makefile:1186: recipe for target 'modules' failed

... Doug

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 0/7] cpufreq / sched: cpufreq_update_util() flags and iowait boosting
  2016-08-01 15:26 ` [RFC][PATCH 0/7] cpufreq / sched: cpufreq_update_util() flags and iowait boosting Doug Smythies
@ 2016-08-01 16:30   ` Rafael J. Wysocki
  2016-08-08 11:08     ` Peter Zijlstra
  0 siblings, 1 reply; 51+ messages in thread
From: Rafael J. Wysocki @ 2016-08-01 16:30 UTC (permalink / raw)
  To: Doug Smythies
  Cc: 'Linux PM list', 'Peter Zijlstra',
	'Srinivas Pandruvada', 'Viresh Kumar',
	'Linux Kernel Mailing List', 'Steve Muckle',
	'Juri Lelli', 'Ingo Molnar'

On Monday, August 01, 2016 08:26:54 AM Doug Smythies wrote:
> On 2016.07.31 16:32 Rafael J. Wysocki wrote:
> 
> > Hi,
> >
> > Admittedly, this hasn't been tested yet, so no promises and you have been
> > warned.  It builds, though (on x86-64 at least).
> 
> It would not build for me until I changed the kernel configuration file like so:
> 
> $ scripts/diffconfig .config .config_rjw_b_2
>  CPU_FREQ_GOV_SCHEDUTIL m -> y
> 
> (the above is actually backwards, it is "y" that works.)
> 
> Otherwise I got:
> 
> Kernel: arch/x86/boot/bzImage is ready  (#86)
>   MODPOST 3318 modules
> ERROR: "dl_bw_cpus" [kernel/sched/cpufreq_schedutil.ko] undefined!
> ERROR: "dl_bw_of" [kernel/sched/cpufreq_schedutil.ko] undefined!
> ERROR: "runqueues" [kernel/sched/cpufreq_schedutil.ko] undefined!
> scripts/Makefile.modpost:91: recipe for target '__modpost' failed
> make[4]: *** [__modpost] Error 1
> Makefile:1186: recipe for target 'modules' failed

There are some export_symbol declarations missing, I'll fix that up.
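
(Only a guess at what that fixup might look like, based on the symbols
modpost complained about, e.g. in kernel/sched/core.c:

	EXPORT_SYMBOL_GPL(dl_bw_of);
	EXPORT_SYMBOL_GPL(dl_bw_cpus);
	EXPORT_PER_CPU_SYMBOL_GPL(runqueues);

or alternatively keeping CPU_FREQ_GOV_SCHEDUTIL built in, as in the
workaround above.)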

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 1/7] cpufreq / sched: Make schedutil access utilization data directly
  2016-07-31 23:34 ` [RFC][PATCH 1/7] cpufreq / sched: Make schedutil access utilization data directly Rafael J. Wysocki
@ 2016-08-01 19:28   ` Steve Muckle
  2016-08-01 23:46     ` Rafael J. Wysocki
  0 siblings, 1 reply; 51+ messages in thread
From: Steve Muckle @ 2016-08-01 19:28 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux PM list, Peter Zijlstra, Srinivas Pandruvada, Viresh Kumar,
	Linux Kernel Mailing List, Juri Lelli, Ingo Molnar

On Mon, Aug 01, 2016 at 01:34:36AM +0200, Rafael J. Wysocki wrote:
...
> Index: linux-pm/kernel/sched/cpufreq_schedutil.c
> ===================================================================
> --- linux-pm.orig/kernel/sched/cpufreq_schedutil.c
> +++ linux-pm/kernel/sched/cpufreq_schedutil.c
> @@ -144,17 +144,47 @@ static unsigned int get_next_freq(struct
>  	return cpufreq_driver_resolve_freq(policy, freq);
>  }
>  
> -static void sugov_update_single(struct update_util_data *hook, u64 time,
> -				unsigned long util, unsigned long max)
> +static void sugov_get_util(unsigned long *util, unsigned long *max)
> +{
> +	unsigned long dl_util, dl_max;
> +	unsigned long cfs_util, cfs_max;
> +	int cpu = smp_processor_id();
> +	struct dl_bw *dl_bw = dl_bw_of(cpu);
> +	struct rq *rq = this_rq();
> +
> +	if (rt_prio(current->prio)) {
> +		*util = ULONG_MAX;
> +		return;
> +	}
> +
> +	dl_max = dl_bw_cpus(cpu) << 20;
> +	dl_util = dl_bw->total_bw;
> +
> +	cfs_max = rq->cpu_capacity_orig;
> +	cfs_util = min(rq->cfs.avg.util_avg, cfs_max);
> +
> +	if (cfs_util * dl_max > dl_util * cfs_max) {
> +		*util = cfs_util;
> +		*max  = cfs_max;
> +	} else {
> +		*util = dl_util;
> +		*max  = dl_max;
> +	}
> +}

Last Friday I had put together a similar patch based on Peter's. I need
the flags field for the remote wakeup support. My previous plan,
installing a late callback in check_preempt_curr that gets requested
from the earlier existing CFS callback, was not working out since those
two events don't always match up 1:1.

Anyway one way that my patch differed was that I had used the flags
field to keep the behavior the same for both RT and DL. That happens
later on in this series for RT but the DL policy is modified as above.
Can the DL policy be left as-is and discussed/modified in a separate
series?

thanks,
Steve

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 3/7] cpufreq / sched: Check cpu_of(rq) in cpufreq_update_util()
  2016-08-01  7:29   ` Dominik Brodowski
  2016-08-01 14:57     ` Rafael J. Wysocki
@ 2016-08-01 19:48     ` Steve Muckle
  2016-08-01 23:43       ` Rafael J. Wysocki
  1 sibling, 1 reply; 51+ messages in thread
From: Steve Muckle @ 2016-08-01 19:48 UTC (permalink / raw)
  To: Dominik Brodowski
  Cc: Rafael J. Wysocki, Linux PM list, Peter Zijlstra,
	Srinivas Pandruvada, Viresh Kumar, Linux Kernel Mailing List,
	Juri Lelli, Ingo Molnar

On Mon, Aug 01, 2016 at 09:29:57AM +0200, Dominik Brodowski wrote:
> A small nitpick:
> 
> On Mon, Aug 01, 2016 at 01:36:01AM +0200, Rafael J. Wysocki wrote:
> > --- linux-pm.orig/kernel/sched/sched.h
> > +++ linux-pm/kernel/sched/sched.h
> > @@ -1760,7 +1760,7 @@ DECLARE_PER_CPU(struct update_util_data
> >  
> >  /**
> >   * cpufreq_update_util - Take a note about CPU utilization changes.
> > - * @time: Current time.
> > + * @rq: Runqueue to carry out the update for.
> >   *
> >   * This function is called by the scheduler on every invocation of
> >   * update_load_avg() on the CPU whose utilization is being updated.
> 
> This comment seems to need an update due to the smp_processor_id() check
> being moved into this function.

The callers of this have also changed - it is no longer called directly
by update_load_avg(), rather via cfs_rq_util_change() from several other
locations (I believe it was my patch that failed to update this
comment).

Could this be replaced with a more generic statement such as "called by
CFS in various paths?"

thanks,
Steve

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 4/7] cpufreq / sched: Add flags argument to cpufreq_update_util()
  2016-08-01 14:57     ` Rafael J. Wysocki
@ 2016-08-01 19:59       ` Steve Muckle
  2016-08-01 23:44         ` Rafael J. Wysocki
  0 siblings, 1 reply; 51+ messages in thread
From: Steve Muckle @ 2016-08-01 19:59 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Dominik Brodowski, Linux PM list, Peter Zijlstra,
	Srinivas Pandruvada, Viresh Kumar, Linux Kernel Mailing List,
	Steve Muckle, Juri Lelli, Ingo Molnar

On Mon, Aug 01, 2016 at 04:57:18PM +0200, Rafael J. Wysocki wrote:
> On Monday, August 01, 2016 09:33:12 AM Dominik Brodowski wrote:
> > On Mon, Aug 01, 2016 at 01:36:46AM +0200, Rafael J. Wysocki wrote:
> > > +#define UUF_RT	0x01
> > 
> > What does UUF stand for?
> 
> "Utilization upadte flag".

I had wondered the same - in my patchset I used CPUFREQ_SCHED_UPDATE_* for the
prefixes, though I guess some may object to the length.

thanks,
Steve

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 3/7] cpufreq / sched: Check cpu_of(rq) in cpufreq_update_util()
  2016-08-01 19:48     ` Steve Muckle
@ 2016-08-01 23:43       ` Rafael J. Wysocki
  0 siblings, 0 replies; 51+ messages in thread
From: Rafael J. Wysocki @ 2016-08-01 23:43 UTC (permalink / raw)
  To: Steve Muckle
  Cc: Dominik Brodowski, Linux PM list, Peter Zijlstra,
	Srinivas Pandruvada, Viresh Kumar, Linux Kernel Mailing List,
	Juri Lelli, Ingo Molnar

On Monday, August 01, 2016 12:48:18 PM Steve Muckle wrote:
> On Mon, Aug 01, 2016 at 09:29:57AM +0200, Dominik Brodowski wrote:
> > A small nitpick:
> > 
> > On Mon, Aug 01, 2016 at 01:36:01AM +0200, Rafael J. Wysocki wrote:
> > > --- linux-pm.orig/kernel/sched/sched.h
> > > +++ linux-pm/kernel/sched/sched.h
> > > @@ -1760,7 +1760,7 @@ DECLARE_PER_CPU(struct update_util_data
> > >  
> > >  /**
> > >   * cpufreq_update_util - Take a note about CPU utilization changes.
> > > - * @time: Current time.
> > > + * @rq: Runqueue to carry out the update for.
> > >   *
> > >   * This function is called by the scheduler on every invocation of
> > >   * update_load_avg() on the CPU whose utilization is being updated.
> > 
> > This comment seems to need an update due to the smp_processor_id() check
> > being moved into this function.
> 
> The callers of this have also changed - it is no longer called directly
> by update_load_avg(), rather via cfs_rq_util_change() from several other
> locations (I believe it was my patch that failed to update this
> comment).
> 
> Could this be replaced with a more generic statement such as "called by
> CFS in various paths?"

Good observation.

I'll modify that comment to match the code.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 4/7] cpufreq / sched: Add flags argument to cpufreq_update_util()
  2016-08-01 19:59       ` Steve Muckle
@ 2016-08-01 23:44         ` Rafael J. Wysocki
  2016-08-02  1:36           ` Steve Muckle
  0 siblings, 1 reply; 51+ messages in thread
From: Rafael J. Wysocki @ 2016-08-01 23:44 UTC (permalink / raw)
  To: Steve Muckle
  Cc: Dominik Brodowski, Linux PM list, Peter Zijlstra,
	Srinivas Pandruvada, Viresh Kumar, Linux Kernel Mailing List,
	Juri Lelli, Ingo Molnar

On Monday, August 01, 2016 12:59:30 PM Steve Muckle wrote:
> On Mon, Aug 01, 2016 at 04:57:18PM +0200, Rafael J. Wysocki wrote:
> > On Monday, August 01, 2016 09:33:12 AM Dominik Brodowski wrote:
> > > On Mon, Aug 01, 2016 at 01:36:46AM +0200, Rafael J. Wysocki wrote:
> > > > +#define UUF_RT	0x01
> > > 
> > > What does UUF stand for?
> > 
> > "Utilization upadte flag".
> 
> I had wondered the same - in my patchset I used CPUFREQ_SCHED_UPDATE_* for the
> prefixes, though I guess some may object to the length.

Well, OK.

I guess something like SCHED_CPUFREQ_RT etc would be sufficient?
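For concreteness, something along these lines perhaps (purely illustrative,
exact names still to be decided):

#define SCHED_CPUFREQ_RT	0x01
#define SCHED_CPUFREQ_IO	0x02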

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 1/7] cpufreq / sched: Make schedutil access utilization data directly
  2016-08-01 19:28   ` Steve Muckle
@ 2016-08-01 23:46     ` Rafael J. Wysocki
  2016-08-02 10:38       ` Juri Lelli
  2016-08-08 10:38       ` Peter Zijlstra
  0 siblings, 2 replies; 51+ messages in thread
From: Rafael J. Wysocki @ 2016-08-01 23:46 UTC (permalink / raw)
  To: Steve Muckle
  Cc: Linux PM list, Peter Zijlstra, Srinivas Pandruvada, Viresh Kumar,
	Linux Kernel Mailing List, Juri Lelli, Ingo Molnar

On Monday, August 01, 2016 12:28:50 PM Steve Muckle wrote:
> On Mon, Aug 01, 2016 at 01:34:36AM +0200, Rafael J. Wysocki wrote:
> ...
> > Index: linux-pm/kernel/sched/cpufreq_schedutil.c
> > ===================================================================
> > --- linux-pm.orig/kernel/sched/cpufreq_schedutil.c
> > +++ linux-pm/kernel/sched/cpufreq_schedutil.c
> > @@ -144,17 +144,47 @@ static unsigned int get_next_freq(struct
> >  	return cpufreq_driver_resolve_freq(policy, freq);
> >  }
> >  
> > -static void sugov_update_single(struct update_util_data *hook, u64 time,
> > -				unsigned long util, unsigned long max)
> > +static void sugov_get_util(unsigned long *util, unsigned long *max)
> > +{
> > +	unsigned long dl_util, dl_max;
> > +	unsigned long cfs_util, cfs_max;
> > +	int cpu = smp_processor_id();
> > +	struct dl_bw *dl_bw = dl_bw_of(cpu);
> > +	struct rq *rq = this_rq();
> > +
> > +	if (rt_prio(current->prio)) {
> > +		*util = ULONG_MAX;
> > +		return;
> > +	}
> > +
> > +	dl_max = dl_bw_cpus(cpu) << 20;
> > +	dl_util = dl_bw->total_bw;
> > +
> > +	cfs_max = rq->cpu_capacity_orig;
> > +	cfs_util = min(rq->cfs.avg.util_avg, cfs_max);
> > +
> > +	if (cfs_util * dl_max > dl_util * cfs_max) {
> > +		*util = cfs_util;
> > +		*max  = cfs_max;
> > +	} else {
> > +		*util = dl_util;
> > +		*max  = dl_max;
> > +	}
> > +}
> 
> Last Friday I had put together a similar patch based on Peter's. I need
> the flags field for the remote wakeup support. My previous plan,
> installing a late callback in check_preempt_curr that gets requested
> from the earlier existing CFS callback, was not working out since those
> two events don't always match up 1:1.
> 
> Anyway one way that my patch differed was that I had used the flags
> field to keep the behavior the same for both RT and DL. That happens
> later on in this series for RT but the DL policy is modified as above.
> Can the DL policy be left as-is and discussed/modified in a separate
> series?

No problem with that as far as I'm concerned, but in that case it won't be
a Peter's patch any more. :-)

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 5/7] cpufreq / sched: UUF_IO flag to indicate iowait condition
  2016-07-31 23:37 ` [RFC][PATCH 5/7] cpufreq / sched: UUF_IO flag to indicate iowait condition Rafael J. Wysocki
@ 2016-08-02  1:22   ` Steve Muckle
  2016-08-02  1:37     ` Rafael J. Wysocki
  0 siblings, 1 reply; 51+ messages in thread
From: Steve Muckle @ 2016-08-02  1:22 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux PM list, Peter Zijlstra, Srinivas Pandruvada, Viresh Kumar,
	Linux Kernel Mailing List, Steve Muckle, Juri Lelli, Ingo Molnar

On Mon, Aug 01, 2016 at 01:37:23AM +0200, Rafael J. Wysocki wrote:
...
> For this purpose, define a new cpufreq_update_util() flag
> UUF_IO and modify enqueue_task_fair() to pass that flag to
> cpufreq_update_util() in the in_iowait case.  That generally
> requires cpufreq_update_util() to be called directly from there,
> because update_load_avg() is not likely to be invoked in that
> case.

I didn't follow why the cpufreq hook won't likely be called if
in_iowait is set? AFAICS update_load_avg() gets called in the second loop
and calls update_cfs_rq_load_avg (triggers the hook).

> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>  include/linux/sched.h |    1 +
>  kernel/sched/fair.c   |    8 ++++++++
>  2 files changed, 9 insertions(+)
> 
> Index: linux-pm/kernel/sched/fair.c
> ===================================================================
> --- linux-pm.orig/kernel/sched/fair.c
> +++ linux-pm/kernel/sched/fair.c
> @@ -4459,6 +4459,14 @@ enqueue_task_fair(struct rq *rq, struct
>  	struct cfs_rq *cfs_rq;
>  	struct sched_entity *se = &p->se;
>  
> +	/*
> +	 * If in_iowait is set, it is likely that the loops below will not
> +	 * trigger any cpufreq utilization updates, so do it here explicitly
> +	 * with the IO flag passed.
> +	 */
> +	if (p->in_iowait)
> +		cpufreq_update_util(rq, UUF_IO);
> +
>  	for_each_sched_entity(se) {
>  		if (se->on_rq)
>  			break;
> Index: linux-pm/include/linux/sched.h
> ===================================================================
> --- linux-pm.orig/include/linux/sched.h
> +++ linux-pm/include/linux/sched.h
> @@ -3376,6 +3376,7 @@ static inline unsigned long rlimit_max(u
>  }
>  
>  #define UUF_RT	0x01
> +#define UUF_IO	0x02
>  
>  #ifdef CONFIG_CPU_FREQ
>  struct update_util_data {

thanks,
Steve

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 6/7] cpufreq: schedutil: Add iowait boosting
  2016-07-31 23:37 ` [RFC][PATCH 6/7] cpufreq: schedutil: Add iowait boosting Rafael J. Wysocki
@ 2016-08-02  1:35   ` Steve Muckle
  2016-08-02 23:03     ` Rafael J. Wysocki
  0 siblings, 1 reply; 51+ messages in thread
From: Steve Muckle @ 2016-08-02  1:35 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux PM list, Peter Zijlstra, Srinivas Pandruvada, Viresh Kumar,
	Linux Kernel Mailing List, Steve Muckle, Juri Lelli, Ingo Molnar

On Mon, Aug 01, 2016 at 01:37:59AM +0200, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> Modify the schedutil cpufreq governor to boost the CPU frequency
> if the UUF_IO flag is passed to it via cpufreq_update_util().
> 
> If that happens, the frequency is set to the maximum during
> the first update after receiving the UUF_IO flag and then the
> boost is reduced by half during each following update.

Were these changes to schedutil part of the positive test results
mentioned in patch 5? Or are those just from intel pstate?

I was nervous about the effect of this on power and tested a couple low
power usecases. The platform is the Hikey 96board (8 core ARM A53,
single CPUfreq domain) running AOSP Android and schedutil backported to
kernel 4.4. These tests run mp3 and mpeg4 playback for a little while,
recording total energy consumption during the test along with frequency
residency.

As the results below show I did not measure an appreciable effect - if
anything things may be slightly better with the patches.

The hardcoding of a non-tunable boosting scheme makes me nervous but
perhaps it could be revisited if some platform or configuration shows
a noticeable regression?

Testcase	Energy	/----- CPU frequency residency -----\
		(J)	208000	432000	729000	960000	1200000
mp3-before-1	26.822	47.27%	24.79%	16.23%	5.20%	6.52%
mp3-before-2	26.817	41.70%	28.75%	17.62%	5.17%	6.75%
mp3-before-3	26.65	42.48%	28.65%	17.25%	5.07%	6.55%
mp3-after-1	26.667	42.51%	27.38%	18.00%	5.40%	6.71%
mp3-after-2	26.777	48.37%	24.15%	15.68%	4.55%	7.25%
mp3-after-3	26.806	41.93%	27.71%	18.35%	4.78%	7.35%

mpeg4-before-1	26.024	18.41%	60.09%	13.16%	0.49%	7.85%
mpeg4-before-2	25.147	20.47%	64.80%	8.44%	1.37%	4.91%
mpeg4-before-3	25.007	19.18%	66.08%	10.01%	0.59%	4.22%
mpeg4-after-1	25.598	19.77%	61.33%	11.63%	0.79%	6.48%
mpeg4-after-2	25.18	22.31%	62.78%	8.83%	1.18%	4.90%
mpeg4-after-3	25.162	21.59%	64.88%	8.29%	0.49%	4.71%
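
For reference, the boosting scheme I'm referring to above is, as I read the
changelog, roughly the following (my paraphrase; sg_cpu->iowait_boost and the
use of policy->cpuinfo.max_freq are just how I picture it, not necessarily
what the patch actually does):

	if (flags & UUF_IO)
		sg_cpu->iowait_boost = policy->cpuinfo.max_freq;
	else if (sg_cpu->iowait_boost)
		sg_cpu->iowait_boost >>= 1;

	/* ... and the selected frequency is then never below the boost */
	next_freq = max(next_freq, sg_cpu->iowait_boost);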

thanks,
Steve

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 4/7] cpufreq / sched: Add flags argument to cpufreq_update_util()
  2016-08-01 23:44         ` Rafael J. Wysocki
@ 2016-08-02  1:36           ` Steve Muckle
  0 siblings, 0 replies; 51+ messages in thread
From: Steve Muckle @ 2016-08-02  1:36 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Steve Muckle, Dominik Brodowski, Linux PM list, Peter Zijlstra,
	Srinivas Pandruvada, Viresh Kumar, Linux Kernel Mailing List,
	Juri Lelli, Ingo Molnar

On Tue, Aug 02, 2016 at 01:44:41AM +0200, Rafael J. Wysocki wrote:
> On Monday, August 01, 2016 12:59:30 PM Steve Muckle wrote:
> > On Mon, Aug 01, 2016 at 04:57:18PM +0200, Rafael J. Wysocki wrote:
> > > On Monday, August 01, 2016 09:33:12 AM Dominik Brodowski wrote:
> > > > On Mon, Aug 01, 2016 at 01:36:46AM +0200, Rafael J. Wysocki wrote:
> > > > > +#define UUF_RT	0x01
> > > > 
> > > > What does UUF stand for?
> > > 
> > > "Utilization upadte flag".
> > 
> > I had wondered the same - in my patchset I used CPUFREQ_SCHED_UPDATE_* for the
> > prefixes, though I guess some may object to the length.
> 
> Well, OK.
> 
> I guess something like SCHED_CPUFREQ_RT etc would be sufficient?

Yeah I think that would work.

thanks,
Steve

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 5/7] cpufreq / sched: UUF_IO flag to indicate iowait condition
  2016-08-02  1:22   ` Steve Muckle
@ 2016-08-02  1:37     ` Rafael J. Wysocki
  2016-08-02 22:02       ` Steve Muckle
  0 siblings, 1 reply; 51+ messages in thread
From: Rafael J. Wysocki @ 2016-08-02  1:37 UTC (permalink / raw)
  To: Steve Muckle
  Cc: Rafael J. Wysocki, Linux PM list, Peter Zijlstra,
	Srinivas Pandruvada, Viresh Kumar, Linux Kernel Mailing List,
	Juri Lelli, Ingo Molnar

On Tue, Aug 2, 2016 at 3:22 AM, Steve Muckle <steve.muckle@linaro.org> wrote:
> On Mon, Aug 01, 2016 at 01:37:23AM +0200, Rafael J. Wysocki wrote:
> ...
>> For this purpose, define a new cpufreq_update_util() flag
>> UUF_IO and modify enqueue_task_fair() to pass that flag to
>> cpufreq_update_util() in the in_iowait case.  That generally
>> requires cpufreq_update_util() to be called directly from there,
>> because update_load_avg() is not likely to be invoked in that
>> case.
>
> I didn't follow why the cpufreq hook won't likely be called if
> in_iowait is set? AFAICS update_load_avg() gets called in the second loop
> and calls update_cfs_rq_load_avg (triggers the hook).

In practice it turns out that in the majority of cases when in_iowait
is set the second loop will not run.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 1/7] cpufreq / sched: Make schedutil access utilization data directly
  2016-08-01 23:46     ` Rafael J. Wysocki
@ 2016-08-02 10:38       ` Juri Lelli
  2016-08-02 14:28         ` Steve Muckle
  2016-08-08 10:38       ` Peter Zijlstra
  1 sibling, 1 reply; 51+ messages in thread
From: Juri Lelli @ 2016-08-02 10:38 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Steve Muckle, Linux PM list, Peter Zijlstra, Srinivas Pandruvada,
	Viresh Kumar, Linux Kernel Mailing List, Ingo Molnar

Hi,

On 02/08/16 01:46, Rafael J. Wysocki wrote:
> On Monday, August 01, 2016 12:28:50 PM Steve Muckle wrote:
> > On Mon, Aug 01, 2016 at 01:34:36AM +0200, Rafael J. Wysocki wrote:
> > ...
> > > Index: linux-pm/kernel/sched/cpufreq_schedutil.c
> > > ===================================================================
> > > --- linux-pm.orig/kernel/sched/cpufreq_schedutil.c
> > > +++ linux-pm/kernel/sched/cpufreq_schedutil.c
> > > @@ -144,17 +144,47 @@ static unsigned int get_next_freq(struct
> > >  	return cpufreq_driver_resolve_freq(policy, freq);
> > >  }
> > >  
> > > -static void sugov_update_single(struct update_util_data *hook, u64 time,
> > > -				unsigned long util, unsigned long max)
> > > +static void sugov_get_util(unsigned long *util, unsigned long *max)
> > > +{
> > > +	unsigned long dl_util, dl_max;
> > > +	unsigned long cfs_util, cfs_max;
> > > +	int cpu = smp_processor_id();
> > > +	struct dl_bw *dl_bw = dl_bw_of(cpu);
> > > +	struct rq *rq = this_rq();
> > > +
> > > +	if (rt_prio(current->prio)) {
> > > +		*util = ULONG_MAX;
> > > +		return;
> > > +	}
> > > +
> > > +	dl_max = dl_bw_cpus(cpu) << 20;
> > > +	dl_util = dl_bw->total_bw;
> > > +
> > > +	cfs_max = rq->cpu_capacity_orig;
> > > +	cfs_util = min(rq->cfs.avg.util_avg, cfs_max);
> > > +
> > > +	if (cfs_util * dl_max > dl_util * cfs_max) {
> > > +		*util = cfs_util;
> > > +		*max  = cfs_max;
> > > +	} else {
> > > +		*util = dl_util;
> > > +		*max  = dl_max;
> > > +	}
> > > +}
> > 
> > Last Friday I had put together a similar patch based on Peter's. I need
> > the flags field for the remote wakeup support. My previous plan,
> > installing a late callback in check_preempt_curr that gets requested
> > from the earlier existing CFS callback, was not working out since those
> > two events don't always match up 1:1.
> > 
> > Anyway one way that my patch differed was that I had used the flags
> > field to keep the behavior the same for both RT and DL.

Do you mean "go to max" policy for both, until proper policies will be
implemented in the future?

> That happens
> > later on in this series for RT but the DL policy is modified as above.
> > Can the DL policy be left as-is and discussed/modified in a separate
> > series?

Not that we want to start discussing this point now, if we postpone the
change for later, but I just wanted to point out a difference w.r.t.
what the schedfreq thing was doing: it used to sum contributions from
the different classes, instead of taking the max. We probably never
really discussed on the list what is the right thing to do, though.
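
(Roughly, the difference is between summing the class contributions after
rescaling them to a common max, e.g. something like

	*max  = cfs_max;
	*util = cfs_util + div64_u64((u64)dl_util * cfs_max, dl_max);

and the max-based selection in the hunk quoted above; this is only meant to
illustrate what schedfreq used to do, not actual code from it.)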

Best,

- Juri

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 1/7] cpufreq / sched: Make schedutil access utilization data directly
  2016-08-02 10:38       ` Juri Lelli
@ 2016-08-02 14:28         ` Steve Muckle
  2016-08-02 14:43           ` Juri Lelli
  0 siblings, 1 reply; 51+ messages in thread
From: Steve Muckle @ 2016-08-02 14:28 UTC (permalink / raw)
  To: Juri Lelli
  Cc: Rafael J. Wysocki, Steve Muckle, Linux PM list, Peter Zijlstra,
	Srinivas Pandruvada, Viresh Kumar, Linux Kernel Mailing List,
	Ingo Molnar

On Tue, Aug 02, 2016 at 11:38:17AM +0100, Juri Lelli wrote:
> > > Anyway one way that my patch differed was that I had used the flags
> > > field to keep the behavior the same for both RT and DL.
> 
> Do you mean "go to max" policy for both, until proper policies will be
> implemented in the future?

Yep.

> > That happens
> > > later on in this series for RT but the DL policy is modified as above.
> > > Can the DL policy be left as-is and discussed/modified in a separate
> > > series?
> 
> Not that we want to start discussing this point now, if we postpone the
> change for later, but I just wanted to point out a difference w.r.t.
> what the schedfreq thing was doing: it used to sum contributions from
> the different classes, instead of taking the max. We probably never
> really discussed on the list what is the right thing to do, though.

Yeah I figured that was worth deferring into its own patchset/thread.

cheers,
Steve

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 1/7] cpufreq / sched: Make schedutil access utilization data directly
  2016-08-02 14:28         ` Steve Muckle
@ 2016-08-02 14:43           ` Juri Lelli
  0 siblings, 0 replies; 51+ messages in thread
From: Juri Lelli @ 2016-08-02 14:43 UTC (permalink / raw)
  To: Steve Muckle
  Cc: Rafael J. Wysocki, Linux PM list, Peter Zijlstra,
	Srinivas Pandruvada, Viresh Kumar, Linux Kernel Mailing List,
	Ingo Molnar

On 02/08/16 07:28, Steve Muckle wrote:
> On Tue, Aug 02, 2016 at 11:38:17AM +0100, Juri Lelli wrote:
> > > > Anyway one way that my patch differed was that I had used the flags
> > > > field to keep the behavior the same for both RT and DL.
> > 
> > Do you mean "go to max" policy for both, until proper policies will be
> > implemented in the future?
> 
> Yep.
> 

OK, thanks for clarifying.

> > > That happens
> > > > later on in this series for RT but the DL policy is modified as above.
> > > > Can the DL policy be left as-is and discussed/modified in a separate
> > > > series?
> > 
> > Not that we want to start discussing this point now, if we postpone the
> > change for later, but I just wanted to point out a difference w.r.t.
> > what the schedfreq thing was doing: it used to sum contributions from
> > the different classes, instead of taking the max. We probably never
> > really discussed on the list what is the right thing to do, though.
> 
> Yeah I figured that was worth deferring into its own patchset/thread.
> 

Right. Makes sense to me to defer this point.

Best,

- Juri

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 5/7] cpufreq / sched: UUF_IO flag to indicate iowait condition
  2016-08-02  1:37     ` Rafael J. Wysocki
@ 2016-08-02 22:02       ` Steve Muckle
  2016-08-02 22:38         ` Rafael J. Wysocki
  0 siblings, 1 reply; 51+ messages in thread
From: Steve Muckle @ 2016-08-02 22:02 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Steve Muckle, Rafael J. Wysocki, Linux PM list, Peter Zijlstra,
	Srinivas Pandruvada, Viresh Kumar, Linux Kernel Mailing List,
	Juri Lelli, Ingo Molnar

On Tue, Aug 02, 2016 at 03:37:02AM +0200, Rafael J. Wysocki wrote:
> On Tue, Aug 2, 2016 at 3:22 AM, Steve Muckle <steve.muckle@linaro.org> wrote:
> > On Mon, Aug 01, 2016 at 01:37:23AM +0200, Rafael J. Wysocki wrote:
> > ...
> >> For this purpose, define a new cpufreq_update_util() flag
> >> UUF_IO and modify enqueue_task_fair() to pass that flag to
> >> cpufreq_update_util() in the in_iowait case.  That generally
> >> requires cpufreq_update_util() to be called directly from there,
> >> because update_load_avg() is not likely to be invoked in that
> >> case.
> >
> > I didn't follow why the cpufreq hook won't likely be called if
> > in_iowait is set? AFAICS update_load_avg() gets called in the second loop
> > and calls update_cfs_rq_load_avg (triggers the hook).
> 
> In practice it turns out that in the majority of cases when in_iowait
> is set the second loop will not run.

My understanding of enqueue_task_fair() is that the first loop walks up
the portion of the sched_entity hierarchy that needs to be enqueued, and
the second loop updates the rest of the hierarchy that was already
enqueued.

Even if the se corresponding to the root cfs_rq needs to be enqueued
(meaning the whole hierarchy is traversed in the first loop and the
second loop does nothing), enqueue_entity() on the root cfs_rq should
result in the cpufreq hook being called, via enqueue_entity() ->
enqueue_entity_load_avg() -> update_cfs_rq_load_avg().
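
For reference, the structure I have in mind looks roughly like this (heavily
simplified from enqueue_task_fair(); the real loops do more than shown):

	for_each_sched_entity(se) {
		if (se->on_rq)
			break;
		cfs_rq = cfs_rq_of(se);
		/* -> enqueue_entity_load_avg() -> update_cfs_rq_load_avg(),
		 * which is where the hook gets triggered */
		enqueue_entity(cfs_rq, se, flags);
		flags = ENQUEUE_WAKEUP;
	}

	for_each_sched_entity(se) {
		cfs_rq = cfs_rq_of(se);
		/* also reaches the hook via update_cfs_rq_load_avg() */
		update_load_avg(se, 1);
	}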

I'll keep looking to see if I've misunderstood something in here.

thanks,
Steve

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 5/7] cpufreq / sched: UUF_IO flag to indicate iowait condition
  2016-08-02 22:02       ` Steve Muckle
@ 2016-08-02 22:38         ` Rafael J. Wysocki
  2016-08-04  2:24           ` Steve Muckle
  0 siblings, 1 reply; 51+ messages in thread
From: Rafael J. Wysocki @ 2016-08-02 22:38 UTC (permalink / raw)
  To: Steve Muckle
  Cc: Rafael J. Wysocki, Rafael J. Wysocki, Linux PM list,
	Peter Zijlstra, Srinivas Pandruvada, Viresh Kumar,
	Linux Kernel Mailing List, Juri Lelli, Ingo Molnar

On Wed, Aug 3, 2016 at 12:02 AM, Steve Muckle <steve.muckle@linaro.org> wrote:
> On Tue, Aug 02, 2016 at 03:37:02AM +0200, Rafael J. Wysocki wrote:
>> On Tue, Aug 2, 2016 at 3:22 AM, Steve Muckle <steve.muckle@linaro.org> wrote:
>> > On Mon, Aug 01, 2016 at 01:37:23AM +0200, Rafael J. Wysocki wrote:
>> > ...
>> >> For this purpose, define a new cpufreq_update_util() flag
>> >> UUF_IO and modify enqueue_task_fair() to pass that flag to
>> >> cpufreq_update_util() in the in_iowait case.  That generally
>> >> requires cpufreq_update_util() to be called directly from there,
>> >> because update_load_avg() is not likely to be invoked in that
>> >> case.
>> >
>> > I didn't follow why the cpufreq hook won't likely be called if
>> > in_iowait is set? AFAICS update_load_avg() gets called in the second loop
>> > and calls update_cfs_rq_load_avg (triggers the hook).
>>
>> In practice it turns out that in the majority of cases when in_iowait
>> is set the second loop will not run.
>
> My understanding of enqueue_task_fair() is that the first loop walks up
> the portion of the sched_entity hierarchy that needs to be enqueued, and
> the second loop updates the rest of the hierarchy that was already
> enqueued.
>
> Even if the se corresponding to the root cfs_rq needs to be enqueued
> (meaning the whole hierarchy is traversed in the first loop and the
> second loop does nothing), enqueue_entity() on the root cfs_rq should
> result in the cpufreq hook being called, via enqueue_entity() ->
> enqueue_entity_load_avg() -> update_cfs_rq_load_avg().

But then it's rather difficult to pass the IO flag to this one, isn't it?

Essentially, the problem is to pass "IO" to cpufreq_update_util() when
p->in_iowait is set.

If you can find a clever way to do it without adding an extra call
site, that's fine by me, but in any case the extra
cpufreq_update_util() invocation should not be too expensive.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 6/7] cpufreq: schedutil: Add iowait boosting
  2016-08-02  1:35   ` Steve Muckle
@ 2016-08-02 23:03     ` Rafael J. Wysocki
  0 siblings, 0 replies; 51+ messages in thread
From: Rafael J. Wysocki @ 2016-08-02 23:03 UTC (permalink / raw)
  To: Steve Muckle
  Cc: Linux PM list, Peter Zijlstra, Srinivas Pandruvada, Viresh Kumar,
	Linux Kernel Mailing List, Juri Lelli, Ingo Molnar

On Monday, August 01, 2016 06:35:31 PM Steve Muckle wrote:
> On Mon, Aug 01, 2016 at 01:37:59AM +0200, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > 
> > Modify the schedutil cpufreq governor to boost the CPU frequency
> > if the UUF_IO flag is passed to it via cpufreq_update_util().
> > 
> > If that happens, the frequency is set to the maximum during
> > the first update after receiving the UUF_IO flag and then the
> > boost is reduced by half during each following update.
> 
> Were these changes to schedutil part of the positive test results
> mentioned in patch 5? Or are those just from intel pstate?
> 
> I was nervous about the effect of this on power and tested a couple low
> power usecases. The platform is the Hikey 96board (8 core ARM A53,
> single CPUfreq domain) running AOSP Android and schedutil backported to
> kernel 4.4. These tests run mp3 and mpeg4 playback for a little while,
> recording total energy consumption during the test along with frequency
> residency.
> 
> As the results below show I did not measure an appreciable effect - if
> anything things may be slightly better with the patches.
> 
> The hardcoding of a non-tunable boosting scheme makes me nervous but
> perhaps it could be revisited if some platform or configuration shows
> a noticeable regression?

That would be my approach. :-)

I'm not a big fan of tunables in general: only a few people actually set
them to anything different from the default, and yet they get a lot of focus
(even though what those people are after is sometimes a super-corner case).

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 5/7] cpufreq / sched: UUF_IO flag to indicate iowait condition
  2016-08-02 22:38         ` Rafael J. Wysocki
@ 2016-08-04  2:24           ` Steve Muckle
  2016-08-04 21:19             ` Rafael J. Wysocki
  0 siblings, 1 reply; 51+ messages in thread
From: Steve Muckle @ 2016-08-04  2:24 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Steve Muckle, Rafael J. Wysocki, Linux PM list,
	Srinivas Pandruvada, Viresh Kumar, Linux Kernel Mailing List,
	Juri Lelli, Peter Zijlstra, Ingo Molnar

On Wed, Aug 03, 2016 at 12:38:20AM +0200, Rafael J. Wysocki wrote:
> On Wed, Aug 3, 2016 at 12:02 AM, Steve Muckle <steve.muckle@linaro.org> wrote:
> > On Tue, Aug 02, 2016 at 03:37:02AM +0200, Rafael J. Wysocki wrote:
> >> On Tue, Aug 2, 2016 at 3:22 AM, Steve Muckle <steve.muckle@linaro.org> wrote:
> >> > On Mon, Aug 01, 2016 at 01:37:23AM +0200, Rafael J. Wysocki wrote:
> >> > ...
> >> >> For this purpose, define a new cpufreq_update_util() flag
> >> >> UUF_IO and modify enqueue_task_fair() to pass that flag to
> >> >> cpufreq_update_util() in the in_iowait case.  That generally
> >> >> requires cpufreq_update_util() to be called directly from there,
> >> >> because update_load_avg() is not likely to be invoked in that
> >> >> case.
> >> >
> >> > I didn't follow why the cpufreq hook won't likely be called if
> >> > in_iowait is set? AFAICS update_load_avg() gets called in the second loop
> >> > and calls update_cfs_rq_load_avg (triggers the hook).
> >>
> >> In practice it turns out that in the majority of cases when in_iowait
> >> is set the second loop will not run.
> >
> > My understanding of enqueue_task_fair() is that the first loop walks up
> > the portion of the sched_entity hierarchy that needs to be enqueued, and
> > the second loop updates the rest of the hierarchy that was already
> > enqueued.
> >
> > Even if the se corresponding to the root cfs_rq needs to be enqueued
> > (meaning the whole hierarchy is traversed in the first loop and the
> > second loop does nothing), enqueue_entity() on the root cfs_rq should
> > result in the cpufreq hook being called, via enqueue_entity() ->
> > enqueue_entity_load_avg() -> update_cfs_rq_load_avg().
> 
> But then it's rather difficult to pass the IO flag to this one, isn't it?
> 
> Essentially, the problem is to pass "IO" to cpufreq_update_util() when
> p->in_iowait is set.
> 
> If you can find a clever way to do it without adding an extra call
> site, that's fine by me, but in any case the extra
> cpufreq_update_util() invocation should not be too expensive.

I was under the impression that function pointer calls were more
expensive, and in the shared policy case there is a nontrivial amount of
code that is run in schedutil (including taking a spinlock) before we'd
see sugov_should_update_freq() return false and bail.

Agreed that getting knowledge of p->in_iowait down to the existing hook
is not easy. I spent some time fiddling with that. It seemed doable but
somewhat gross due to the required flag passing and modifications
to enqueue_entity, update_load_avg, etc. If it is decided that it is worth
pursuing I can keep working on it and post a draft.

But I also wonder if the hooks are in the best location.  They are
currently deep in the PELT code. This may make sense from a theoretical
standpoint, calling them whenever a root cfs_rq utilization changes, but
it also makes the hooks difficult to correlate (for policy purposes such
as this iowait change) with higher level logical events like a task
wakeup. Or load balance where we probably want to call the hook just
once after a load balance is complete.

This is also an issue for the remote wakeup case where I currently have
another invocation of the hook in check_preempt_curr(), so I can know if
preemption was triggered and skip a remote schedutil update in that case
to avoid a duplicate IPI.

It seems to me worth evaluating if a higher level set of hook locations
could be used. One possibility is higher up in CFS:
- enqueue_task_fair, dequeue_task_fair
- scheduler_tick
- active_load_balance_cpu_stop, load_balance

Though this wouldn't solve my issue with check_preempt_curr. That would
probably require going further up the stack to try_to_wake_up() etc. Not
yet sure what the other hook locations would be at that level.

thanks,
Steve

^ permalink raw reply	[flat|nested] 51+ messages in thread

* RE: [RFC][PATCH 7/7] cpufreq: intel_pstate: Change P-state selection algorithm for Core
  2016-07-31 23:38 ` [RFC][PATCH 7/7] cpufreq: intel_pstate: Change P-state selection algorithm for Core Rafael J. Wysocki
@ 2016-08-04  4:18   ` Doug Smythies
  2016-08-04  6:53   ` Doug Smythies
  2016-08-23  3:48   ` Wanpeng Li
  2 siblings, 0 replies; 51+ messages in thread
From: Doug Smythies @ 2016-08-04  4:18 UTC (permalink / raw)
  To: 'Rafael J. Wysocki'
  Cc: 'Peter Zijlstra', 'Srinivas Pandruvada',
	'Viresh Kumar', 'Linux Kernel Mailing List',
	'Steve Muckle', 'Juri Lelli',
	'Ingo Molnar', 'Linux PM list'

[-- Attachment #1: Type: text/plain, Size: 6419 bytes --]

Hi Rafael,

Hope this feedback and test results help.

On 2016.07.31 16:49 Rafael J. Wysocki wrote:

> The PID-base P-state selection algorithm used by intel_pstate for
> Core processors is based on very weak foundations.

Agree, very very much.

...[cut]...

> Consequently, the only viable way to fix that is to replace the
> erroneous algorithm entirely with a better one.

Agree, very very much.

> To that end, notice that setting the P-state proportional to the
> actual CPU utilization (measured with the help of MPERF and TSC)
> generally leads to reasonable behavior, but it does not reflect
> the "performance boosting" nature of the current P-state
> selection algorithm. It may be made more similar to that
> algorithm, though, by adding iowait boosting to it.

Which is good and does help a lot for the IO case, but issues
remain for the compute case.

...[cut]...

> +static inline int32_t get_target_pstate_default(struct cpudata *cpu)
> +{
> +	struct sample *sample = &cpu->sample;
> +	int32_t busy_frac;
> +	int pstate;
> +
> +	busy_frac = div_fp(sample->mperf, sample->tsc);
> +	sample->busy_scaled = busy_frac * 100;
> +
> +	if (busy_frac < cpu->iowait_boost)
> +		busy_frac = cpu->iowait_boost;
> +
> +	cpu->iowait_boost >>= 1;
> +
> +	pstate = cpu->pstate.turbo_pstate;
> +	return fp_toint((pstate + (pstate >> 2)) * busy_frac);
> +}
> +

The response curve is not normalized on the lower end to the minimum
pstate for the processor, meaning the overall response will vary
between processors as a function of minimum pstate.

The clamping at maximum pstate at about 80% load seems a little high
to me. In my various attempts to bring back the use of actual load, the best
results have always come from reaching the maximum pstate before 80% load.
Even the get_target_pstate_cpu_load people reach the max pstate faster, and they
are more about energy than performance.
What were the criteria for the decision here? Are test results available for review
and/or duplication by others?
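
To spell out where those numbers come from: the return value above works out to
(turbo_pstate + turbo_pstate/4) * busy_frac = 1.25 * turbo_pstate * busy_frac,
so the target only reaches turbo_pstate once busy_frac is at least 0.8 (the
~80% point), and as busy_frac goes to 0 the target goes to 0 rather than to the
processor's minimum pstate (any clamping to the min/max limits presumably
happens later, outside this function).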


Several tests were done with this patch set.
The patch set would not apply to kernel 4.7, but did apply fine to a 4.7+ kernel
(I did as of 7a66ecf) from a few days ago.

Test 1: Phoronix ffmpeg test (less time is better):
Reason: Because it suffers from rotating amongst CPUs in an odd way, challenging for CPU frequency scaling drivers.
This test tends to be an indicator of potential troubles with some games.
Criteria: (Dirk Brandewie): Must match or better acpi_cpufreq - ondemand.
With patch set: 15.8 Seconds average and 24.51 package watts.
Without patch set: 11.61 Seconds average and 27.59 watts.
Conclusion: Significant reduction in performance with proposed patch set.

Tests 2, 3, 4: Phoronix apache, kernel compile, and postmark tests.
Conclusion: All were similar with and without the patch set, with perhaps a slight
improvement in power consumption for the postmark test with the patch set.

Test 5: Random reads within a largish (50 gigabytes) file.
Reason: Because it was a test I used to use with other include or not include IOWAIT work.
Conclusion: no difference with and without the patch set, likely due to domination by 
long seek times (the file is on a normal disk, not an SSD).

Test 6: Sequential read of a largish (50 gigabytes) file.
Reason: Because it was a test I used to use with other include or not include IOWAIT work.
With patch set: 288.38 Seconds; 177.544 MB/Sec; 6.83 Watts.
Without patch set: 292.38 Seconds; 174.99 MB/Sec; 7.08 Watts.
Conclusion: Better performance and better power with the patch set.

Test 7: Compile the kernel 9 times.
Reason: Just because it was a very revealing test during the
"intel_pstate: Increase hold-off time before busyness is scaled"
discussion / thread(s).
Conclusion: no difference with and without the patch set.

Test 8: pipe-test between cores.
Reason: Just because it was so useful during the
"cross core scheduling frequency drop bisected to 0c313cb20732"
discussion / thread(s).
With patch set: 73.166 Sec; 3.6576 usec/loop; 2278.53 Joules.
Without Patch set: 74.444 Sec; 3.7205 usec/loop; 2338.79 Joules.
Conclusion: Slightly better performance and better energy with the patch set.

Test 9: Dougs_specpower simulator (20% load):
Time is fixed, less energy is better.
Reason: During the long
"[intel-pstate driver regression] processor frequency very high even if in idle"
and subsequent https://bugzilla.kernel.org/show_bug.cgi?id=115771
discussion / thread(s), some sort of test was needed to try to mimic what Srinivas
was getting on his fancy SpecPower test platform. So far at least, this test does that.
Only the 20% load case was created, because that was the biggest problem case back then.
With patch set: 4 tests at an average of 7197 Joules per test, relatively high CPU frequencies.
Without the patch set: 4 tests at an average of 5956 Joules per test, relatively low CPU frequencies.
Conclusion: 21% energy regression with the patch set.
Note: Newer processors might do better than my older i7-2600K.

Test 10: measure the frequency response curve, fixed work packet method,
75 hertz work / sleep frequency (all CPU, no IOWAIT):
Reason: To compare to some older data and observe overall.
png graph attached - might get stripped from the distribution lists.
Conclusions: Tends to oscillate, suggesting some sort of damping is needed.
However, any filtering tends to increase the step function load rise time
(see test 11 below, I think there is some wiggle room here).
See also graph which has: with and without patch set; performance mode (for reference);
Philippe Longepe's cpu_load method also with setpoint 40 (for reference); one of my previous
attempts at a load related patch set from quite sometime ago (for reference).

Test 11: Look at the step function load response. From no load to 100% on one CPU (CPU load only, no IO).
While there is a graph, it is not attached:
Conclusion: The step function response is greatly improved (virtually one sample time max).
It would probably be O.K. to slow it down a little with a filter so as to reduce the
tendency to oscillate under periodic load conditions (only up to a point, though: a
sufficiently low work/sleep frequency will always oscillate; see the graph for test 10).
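(By a filter I mean something as simple as, say, exponential smoothing of
busy_frac before the target pstate is computed; the last_busy_frac field here
is made up purely for illustration:

	busy_frac = (busy_frac + 3 * cpu->last_busy_frac) >> 2;
	cpu->last_busy_frac = busy_frac;

with the weighting being the knob to trade off against the step response.)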
 
Questions:
Is there a migration plan? i.e. will there be an attempt to merge the current cpu_load method
and this method into one method? Then possibly the PID controller could be eliminated.

... Doug


[-- Attachment #2: fixed_rjw_boost.png --]
[-- Type: image/png, Size: 106498 bytes --]

^ permalink raw reply	[flat|nested] 51+ messages in thread

* RE: [RFC][PATCH 7/7] cpufreq: intel_pstate: Change P-state selection algorithm for Core
  2016-07-31 23:38 ` [RFC][PATCH 7/7] cpufreq: intel_pstate: Change P-state selection algorithm for Core Rafael J. Wysocki
  2016-08-04  4:18   ` Doug Smythies
@ 2016-08-04  6:53   ` Doug Smythies
  2016-08-06  0:02     ` Rafael J. Wysocki
  2016-08-23  3:48   ` Wanpeng Li
  2 siblings, 1 reply; 51+ messages in thread
From: Doug Smythies @ 2016-08-04  6:53 UTC (permalink / raw)
  To: 'Rafael J. Wysocki'
  Cc: 'Peter Zijlstra', 'Srinivas Pandruvada',
	'Viresh Kumar', 'Linux Kernel Mailing List',
	'Steve Muckle', 'Juri Lelli',
	'Ingo Molnar', 'Linux PM list'

On 2016.08.03 21:19 Doug Smythies wrote:

Re-sending without the previously attached graph.

Hi Rafael,

Hope this feedback and test results help.

On 2016.07.31 16:49 Rafael J. Wysocki wrote:

> The PID-base P-state selection algorithm used by intel_pstate for
> Core processors is based on very weak foundations.

Agree, very very much.

...[cut]...

> Consequently, the only viable way to fix that is to replace the
> erroneous algorithm entirely with a better one.

Agree, very very much.

> To that end, notice that setting the P-state proportional to the
> actual CPU utilization (measured with the help of MPERF and TSC)
> generally leads to reasonable behavior, but it does not reflect
> the "performance boosting" nature of the current P-state
> selection algorithm. It may be made more similar to that
> algorithm, though, by adding iowait boosting to it.

Which is good and does help a lot for the IO case, but issues
remain for the compute case.

...[cut]...

> +static inline int32_t get_target_pstate_default(struct cpudata *cpu)
> +{
> +	struct sample *sample = &cpu->sample;
> +	int32_t busy_frac;
> +	int pstate;
> +
> +	busy_frac = div_fp(sample->mperf, sample->tsc);
> +	sample->busy_scaled = busy_frac * 100;
> +
> +	if (busy_frac < cpu->iowait_boost)
> +		busy_frac = cpu->iowait_boost;
> +
> +	cpu->iowait_boost >>= 1;
> +
> +	pstate = cpu->pstate.turbo_pstate;
> +	return fp_toint((pstate + (pstate >> 2)) * busy_frac);
> +}
> +

The response curve is not normalized on the lower end to the minimum
pstate for the processor, meaning the overall response will vary
between processors as a function of minimum pstate.

The clamping at maximum pstate at about 80% load seems a little high
to me. In my various attempts to bring back the use of actual load, the best
results have always come from reaching the maximum pstate before 80% load.
Even the get_target_pstate_cpu_load people reach the max pstate faster, and they
are more about energy than performance.
What were the criteria for the decision here? Are test results available for review
and/or duplication by others?


Several tests were done with this patch set.
The patch set would not apply to kernel 4.7, but did apply fine to a 4.7+ kernel
(I did as of 7a66ecf) from a few days ago.

Test 1: Phoronix ffmpeg test (less time is better):
Reason: Because it suffers from rotating amongst CPUs in an odd way, challenging for CPU frequency scaling drivers.
This test tends to be an indicator of potential troubles with some games.
Criteria: (Dirk Brandewie): Must match or better acpi_cpufreq - ondemand.
With patch set: 15.8 Seconds average and 24.51 package watts.
Without patch set: 11.61 Seconds average and 27.59 watts.
Conclusion: Significant reduction in performance with proposed patch set.

Tests 2, 3, 4: Phoronix apache, kernel compile, and postmark tests.
Conclusion: All were similar with and without the patch set, with perhaps a slight
improvement in power consumption for the postmark test with the patch set.

Test 5: Random reads within a largish (50 gigabytes) file.
Reason: Because it was a test I used to use with other include or not include IOWAIT work.
Conclusion: no difference with and without the patch set, likely due to domination by 
long seek times (the file is on a normal disk, not an SSD).

Test 6: Sequential read of a largish (50 gigabytes) file.
Reason: Because it was a test I used to use with other include or not include IOWAIT work.
With patch set: 288.38 Seconds; 177.544 MB/Sec; 6.83 Watts.
Without patch set: 292.38 Seconds; 174.99 MB/Sec; 7.08 Watts.
Conclusion: Better performance and better power with the patch set.

Test 7: Compile the kernel 9 times.
Reason: Just because it was a very revealing test during the
"intel_pstate: Increase hold-off time before busyness is scaled"
discussion / thread(s).
Conclusion: no difference with and without the patch set.

Test 8: pipe-test between cores.
Reason: Just because it was so useful during the
"cross core scheduling frequency drop bisected to 0c313cb20732"
discussion / thread(s).
With patch set: 73.166 Sec; 3.6576 usec/loop; 2278.53 Joules.
Without Patch set: 74.444 Sec; 3.7205 usec/loop; 2338.79 Joules.
Conclusion: Slightly better performance and better energy with the patch set.

Test 9: Dougs_specpower simulator (20% load):
Time is fixed, less energy is better.
Reason: During the long
"[intel-pstate driver regression] processor frequency very high even if in idle"
and subsequent https://bugzilla.kernel.org/show_bug.cgi?id=115771
discussion / thread(s), some sort of test was needed to try to mimic what Srinivas
was getting on his fancy SpecPower test platform. So far at least, this test does that.
Only the 20% load case was created, because that was the biggest problem case back then.
With patch set: 4 tests at an average of 7197 Joules per test, relatively high CPU frequencies.
Without the patch set: 4 tests at an average of 5956 Joules per test, relatively low CPU frequencies.
Conclusion: 21% energy regression with the patch set.
Note: Newer processors might do better than my older i7-2600K.

Test 10: measure the frequency response curve, fixed work packet method,
75 hertz work / sleep frequency (all CPU, no IOWAIT):
Reason: To compare to some older data and observe overall.
png graph attached - might get stripped from the distribution lists.
Conclusions: Tends to oscillate, suggesting some sort of damping is needed.
However, any filtering tends to increase the step function load rise time
(see test 11 below, I think there is some wiggle room here).
See also graph which has: with and without patch set; performance mode (for reference);
Philippe Longepe's cpu_load method also with setpoint 40 (for reference); one of my previous
attempts at a load related patch set from quite sometime ago (for reference).

Test 11: Look at the step function load response. From no load to 100% on one CPU (CPU load only, no IO).
While there is a graph, it is not attached:
Conclusion: The step function response is greatly improved (virtually one sample time max).
It would probably be O.K. to slow it down a little with a filter so as to reduce the
tendency to oscillate under periodic load conditions (only up to a point, though: a
sufficiently low work/sleep frequency will always oscillate; see the graph for test 10).
 
Questions:
Is there a migration plan? i.e. will there be an attempt to merge the current cpu_load method
and this method into one method? Then possibly the PID controller could be eliminated.

... Doug

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 5/7] cpufreq / sched: UUF_IO flag to indicate iowait condition
  2016-08-04  2:24           ` Steve Muckle
@ 2016-08-04 21:19             ` Rafael J. Wysocki
  2016-08-04 22:09               ` Steve Muckle
  0 siblings, 1 reply; 51+ messages in thread
From: Rafael J. Wysocki @ 2016-08-04 21:19 UTC (permalink / raw)
  To: Steve Muckle
  Cc: Rafael J. Wysocki, Linux PM list, Srinivas Pandruvada,
	Viresh Kumar, Linux Kernel Mailing List, Juri Lelli,
	Peter Zijlstra, Ingo Molnar

On Wednesday, August 03, 2016 07:24:18 PM Steve Muckle wrote:
> On Wed, Aug 03, 2016 at 12:38:20AM +0200, Rafael J. Wysocki wrote:
> > On Wed, Aug 3, 2016 at 12:02 AM, Steve Muckle <steve.muckle@linaro.org> wrote:
> > > On Tue, Aug 02, 2016 at 03:37:02AM +0200, Rafael J. Wysocki wrote:
> > >> On Tue, Aug 2, 2016 at 3:22 AM, Steve Muckle <steve.muckle@linaro.org> wrote:
> > >> > On Mon, Aug 01, 2016 at 01:37:23AM +0200, Rafael J. Wysocki wrote:
> > >> > ...
> > >> >> For this purpose, define a new cpufreq_update_util() flag
> > >> >> UUF_IO and modify enqueue_task_fair() to pass that flag to
> > >> >> cpufreq_update_util() in the in_iowait case.  That generally
> > >> >> requires cpufreq_update_util() to be called directly from there,
> > >> >> because update_load_avg() is not likely to be invoked in that
> > >> >> case.
> > >> >
> > >> > I didn't follow why the cpufreq hook won't likely be called if
> > >> > in_iowait is set? AFAICS update_load_avg() gets called in the second loop
> > >> > and calls update_cfs_rq_load_avg (triggers the hook).
> > >>
> > >> In practice it turns out that in the majority of cases when in_iowait
> > >> is set the second loop will not run.
> > >
> > > My understanding of enqueue_task_fair() is that the first loop walks up
> > > the portion of the sched_entity hierarchy that needs to be enqueued, and
> > > the second loop updates the rest of the hierarchy that was already
> > > enqueued.
> > >
> > > Even if the se corresponding to the root cfs_rq needs to be enqueued
> > > (meaning the whole hierarchy is traversed in the first loop and the
> > > second loop does nothing), enqueue_entity() on the root cfs_rq should
> > > result in the cpufreq hook being called, via enqueue_entity() ->
> > > enqueue_entity_load_avg() -> update_cfs_rq_load_avg().
> > 
> > But then it's rather difficult to pass the IO flag to this one, isn't it?
> > 
> > Essentially, the problem is to pass "IO" to cpufreq_update_util() when
> > p->in_iowait is set.
> > 
> > If you can find a clever way to do it without adding an extra call
> > site, that's fine by me, but in any case the extra
> > cpufreq_update_util() invocation should not be too expensive.
> 
> I was under the impression that function pointer calls were more
> expensive, and in the shared policy case there is a nontrivial amount of
> code that is run in schedutil (including taking a spinlock) before we'd
> see sugov_should_update_freq() return false and bail.

That's correct in principle, but we only do that if p->in_iowait is set,
which is somewhat special anyway and doesn't happen every time for sure.

So while there is overhead theoretically, I'm not even sure if it is measurable.

> Agreed that getting knowledge of p->in_iowait down to the existing hook
> is not easy. I spent some time fiddling with that. It seemed doable but
> somewhat gross due to the required flag passing and modifications
> to enqueue_entity, update_load_avg, etc. If it is decided that it is worth
> pursuing I can keep working on it and post a draft.

Well, that's a Peter's call. :-)

> But I also wonder if the hooks are in the best location.  They are
> currently deep in the PELT code. This may make sense from a theoretical
> standpoint, calling them whenever a root cfs_rq utilization changes, but
> it also makes the hooks difficult to correlate (for policy purposes such
> as this iowait change) with higher level logical events like a task
> wakeup. Or load balance where we probably want to call the hook just
> once after a load balance is complete.

I generally agree.  We still need to ensure that the hooks will be invoked
frequently enough, though, even if HZ is 100.

> This is also an issue for the remote wakeup case where I currently have
> another invocation of the hook in check_preempt_curr(), so I can know if
> preemption was triggered and skip a remote schedutil update in that case
> to avoid a duplicate IPI.
> 
> It seems to me worth evaluating if a higher level set of hook locations
> could be used. One possibility is higher up in CFS:
> - enqueue_task_fair, dequeue_task_fair
> - scheduler_tick
> - active_load_balance_cpu_stop, load_balance

Agreed, that's worth checking.

> Though this wouldn't solve my issue with check_preempt_curr. That would
> probably require going further up the stack to try_to_wake_up() etc. Not
> yet sure what the other hook locations would be at that level.

That's probably too far away from the root cfs_rq utilization changes IMO.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 5/7] cpufreq / sched: UUF_IO flag to indicate iowait condition
  2016-08-04 21:19             ` Rafael J. Wysocki
@ 2016-08-04 22:09               ` Steve Muckle
  2016-08-05 23:36                 ` Rafael J. Wysocki
  0 siblings, 1 reply; 51+ messages in thread
From: Steve Muckle @ 2016-08-04 22:09 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Steve Muckle, Rafael J. Wysocki, Linux PM list,
	Srinivas Pandruvada, Viresh Kumar, Linux Kernel Mailing List,
	Juri Lelli, Peter Zijlstra, Ingo Molnar

On Thu, Aug 04, 2016 at 11:19:00PM +0200, Rafael J. Wysocki wrote:
> On Wednesday, August 03, 2016 07:24:18 PM Steve Muckle wrote:
> > On Wed, Aug 03, 2016 at 12:38:20AM +0200, Rafael J. Wysocki wrote:
> > > On Wed, Aug 3, 2016 at 12:02 AM, Steve Muckle <steve.muckle@linaro.org> wrote:
> > > > On Tue, Aug 02, 2016 at 03:37:02AM +0200, Rafael J. Wysocki wrote:
> > > >> On Tue, Aug 2, 2016 at 3:22 AM, Steve Muckle <steve.muckle@linaro.org> wrote:
> > > >> > On Mon, Aug 01, 2016 at 01:37:23AM +0200, Rafael J. Wysocki wrote:
> > > >> > ...
> > > >> >> For this purpose, define a new cpufreq_update_util() flag
> > > >> >> UUF_IO and modify enqueue_task_fair() to pass that flag to
> > > >> >> cpufreq_update_util() in the in_iowait case.  That generally
> > > >> >> requires cpufreq_update_util() to be called directly from there,
> > > >> >> because update_load_avg() is not likely to be invoked in that
> > > >> >> case.
> > > >> >
> > > >> > I didn't follow why the cpufreq hook won't likely be called if
> > > >> > in_iowait is set? AFAICS update_load_avg() gets called in the second loop
> > > >> > and calls update_cfs_rq_load_avg (triggers the hook).
> > > >>
> > > >> In practice it turns out that in the majority of cases when in_iowait
> > > >> is set the second loop will not run.
> > > >
> > > > My understanding of enqueue_task_fair() is that the first loop walks up
> > > > the portion of the sched_entity hierarchy that needs to be enqueued, and
> > > > the second loop updates the rest of the hierarchy that was already
> > > > enqueued.
> > > >
> > > > Even if the se corresponding to the root cfs_rq needs to be enqueued
> > > > (meaning the whole hierarchy is traversed in the first loop and the
> > > > second loop does nothing), enqueue_entity() on the root cfs_rq should
> > > > result in the cpufreq hook being called, via enqueue_entity() ->
> > > > enqueue_entity_load_avg() -> update_cfs_rq_load_avg().
> > > 
> > > But then it's rather difficult to pass the IO flag to this one, isn't it?
> > > 
> > > Essentially, the problem is to pass "IO" to cpufreq_update_util() when
> > > p->in_iowait is set.
> > > 
> > > If you can find a clever way to do it without adding an extra call
> > > site, that's fine by me, but in any case the extra
> > > cpufreq_update_util() invocation should not be too expensive.
> > 
> > I was under the impression that function pointer calls were more
> > expensive, and in the shared policy case there is a nontrivial amount of
> > code that is run in schedutil (including taking a spinlock) before we'd
> > see sugov_should_update_freq() return false and bail.
> 
> That's correct in principle, but we only do that if p->in_iowait is set,
> which is somewhat special anyway and doesn't happen every time for sure.
> 
> So while there is overhead theoretically, I'm not even sure if it is measurable.

OK, my worry was that IO-heavy workloads might hammer this path, but I
don't know of any specifically, or how often this path can be taken.

> 
> > Agreed that getting knowledge of p->in_iowait down to the existing hook
> > is not easy. I spent some time fiddling with that. It seemed doable but
> > somewhat gross due to the required flag passing and modifications
> > to enqueue_entity, update_load_avg, etc. If it is decided that it is worth
> > pursuing I can keep working on it and post a draft.
> 
> Well, that's a Peter's call. :-)
> 
> > But I also wonder if the hooks are in the best location.  They are
> > currently deep in the PELT code. This may make sense from a theoretical
> > standpoint, calling them whenever a root cfs_rq utilization changes, but
> > it also makes the hooks difficult to correlate (for policy purposes such
> > as this iowait change) with higher level logical events like a task
> > wakeup. Or load balance where we probably want to call the hook just
> > once after a load balance is complete.
> 
> I generally agree.  We still need to ensure that the hooks will be invoked
> frequently enough, though, even if HZ is 100.
> 
> > This is also an issue for the remote wakeup case where I currently have
> > another invocation of the hook in check_preempt_curr(), so I can know if
> > preemption was triggered and skip a remote schedutil update in that case
> > to avoid a duplicate IPI.
> > 
> > It seems to me worth evaluating if a higher level set of hook locations
> > could be used. One possibility is higher up in CFS:
> > - enqueue_task_fair, dequeue_task_fair
> > - scheduler_tick
> > - active_load_balance_cpu_stop, load_balance
> 
> Agreed, that's worth checking.
> 
> > Though this wouldn't solve my issue with check_preempt_curr. That would
> > probably require going further up the stack to try_to_wake_up() etc. Not
> > yet sure what the other hook locations would be at that level.
> 
> That's probably too far away from the root cfs_rq utilization changes IMO.

Is your concern that the rate of hook calls would be decreased?

thanks,
Steve

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 5/7] cpufreq / sched: UUF_IO flag to indicate iowait condition
  2016-08-04 22:09               ` Steve Muckle
@ 2016-08-05 23:36                 ` Rafael J. Wysocki
  0 siblings, 0 replies; 51+ messages in thread
From: Rafael J. Wysocki @ 2016-08-05 23:36 UTC (permalink / raw)
  To: Steve Muckle
  Cc: Rafael J. Wysocki, Linux PM list, Srinivas Pandruvada,
	Viresh Kumar, Linux Kernel Mailing List, Juri Lelli,
	Peter Zijlstra, Ingo Molnar

On Thursday, August 04, 2016 03:09:08 PM Steve Muckle wrote:
> On Thu, Aug 04, 2016 at 11:19:00PM +0200, Rafael J. Wysocki wrote:

[cut]
 
> > > This is also an issue for the remote wakeup case where I currently have
> > > another invocation of the hook in check_preempt_curr(), so I can know if
> > > preemption was triggered and skip a remote schedutil update in that case
> > > to avoid a duplicate IPI.
> > > 
> > > It seems to me worth evaluating if a higher level set of hook locations
> > > could be used. One possibility is higher up in CFS:
> > > - enqueue_task_fair, dequeue_task_fair
> > > - scheduler_tick
> > > - active_load_balance_cpu_stop, load_balance
> > 
> > Agreed, that's worth checking.
> > 
> > > Though this wouldn't solve my issue with check_preempt_curr. That would
> > > probably require going further up the stack to try_to_wake_up() etc. Not
> > > yet sure what the other hook locations would be at that level.
> > 
> > That's probably too far away from the root cfs_rq utilization changes IMO.
> 
> Is your concern that the rate of hook calls would be decreased?

It might be decreased, but we might also end up using utilization values
that wouldn't reflect the current situation (e.g. if the hook is called before
update_load_avg(), the util value used by the governor may not be adequate).

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 7/7] cpufreq: intel_pstate: Change P-state selection algorithm for Core
  2016-08-04  6:53   ` Doug Smythies
@ 2016-08-06  0:02     ` Rafael J. Wysocki
  2016-08-09 17:16       ` Doug Smythies
  2016-08-13 15:59       ` Doug Smythies
  0 siblings, 2 replies; 51+ messages in thread
From: Rafael J. Wysocki @ 2016-08-06  0:02 UTC (permalink / raw)
  To: Doug Smythies
  Cc: 'Peter Zijlstra', 'Srinivas Pandruvada',
	'Viresh Kumar', 'Linux Kernel Mailing List',
	'Steve Muckle', 'Juri Lelli',
	'Ingo Molnar', 'Linux PM list'

On Wednesday, August 03, 2016 11:53:23 PM Doug Smythies wrote:
> On 2016.08.03 21:19 Doug Smythies wrote:
> 
> Re-sending without the previously attached graph.
> 
> Hi Rafael,
> 
> Hope this feedback and test results help.
> 
> On 2016.07.31 16:49 Rafael J. Wysocki wrote:
> 
> > The PID-base P-state selection algorithm used by intel_pstate for
> > Core processors is based on very weak foundations.
> 
> Agree, very very much.
> 
> ...[cut]...
> 
> > Consequently, the only viable way to fix that is to replace the
> > erroneous algorithm entirely with a better one.
> 
> Agree, very very much.
> 
> > To that end, notice that setting the P-state proportional to the
> > actual CPU utilization (measured with the help of MPERF and TSC)
> > generally leads to reasonable behavior, but it does not reflect
> > the "performance boosting" nature of the current P-state
> > selection algorithm. It may be made more similar to that
> > algorithm, though, by adding iowait boosting to it.
> 
> Which is good and does help a lot for the IO case, but issues
> remain for the compute case.
> 
> ...[cut]...
> 
> > +static inline int32_t get_target_pstate_default(struct cpudata *cpu)
> > +{
> > +	struct sample *sample = &cpu->sample;
> > +	int32_t busy_frac;
> > +	int pstate;
> > +
> > +	busy_frac = div_fp(sample->mperf, sample->tsc);
> > +	sample->busy_scaled = busy_frac * 100;
> > +
> > +	if (busy_frac < cpu->iowait_boost)
> > +		busy_frac = cpu->iowait_boost;
> > +
> > +	cpu->iowait_boost >>= 1;
> > +
> > +	pstate = cpu->pstate.turbo_pstate;
> > +	return fp_toint((pstate + (pstate >> 2)) * busy_frac);
> > +}
> > +
> 
> The response curve is not normalized on the lower end to the minimum
> pstate for the processor, meaning the overall response will vary
> between processors as a function of minimum pstate.

But that's OK IMO.

Mapping busy_frac = 0 to the minimum P-state would over-provision workloads
with small values of busy_frac.

> The clamping at maximum pstate at about 80% load seems a little high
> to me. Work I have done in various attempts to bring back the use of actual load
> has always ended up achieving maximum pstate before 80% load for best results.
> Even the get_target_pstate_cpu_load people reach the max pstate faster, and they
> are more about energy than performance.
> What were the criteria for the decision here? Are test results available for review
> and/or duplication by others?

This follows the coefficient used by the schedutil governor, but then the
metric is different, so quite possibly a different value may work better here.

We'll test other values before applying this for sure. :-)

> 
> Several tests were done with this patch set.
> The patch set would not apply to kernel 4.7, but did apply fine to a 4.7+ kernel
> (I did as of 7a66ecf) from a few days ago.
> 
> Test 1: Phoronix ffmpeg test (less time is better):
> Reason: Because it suffers from rotating amongst CPUs in an odd way, challenging for CPU frequency scaling drivers.
> This test tends to be an indicator of potential troubles with some games.
> Criteria: (Dirk Brandewie): Must match or better acpi_cpufreq - ondemand.
> With patch set: 15.8 Seconds average and 24.51 package watts.
> Without patch set: 11.61 Seconds average and 27.59 watts.
> Conclusion: Significant reduction in performance with proposed patch set.
> 
> Tests 2, 3, 4: Phoronix apache, kernel compile, and postmark tests.
> Conclusion: All were similar with and without the patch set, with perhaps a slight
> improvement in power consumption for the postmark test with the patch set.
> 
> Test 5: Random reads within a largish (50 gigabytes) file.
> Reason: Because it was a test I used in earlier work on whether or not to include IOWAIT.
> Conclusion: no difference with and without the patch set, likely due to domination by 
> long seek times (the file is on a normal disk, not an SSD).
> 
> Test 6: Sequential read of a largish (50 gigabytes) file.
> Reason: Because it was a test I used in earlier work on whether or not to include IOWAIT.
> With patch set: 288.38 Seconds; 177.544 MB/Sec; 6.83 Watts.
> Without patch set: 292.38 Seconds; 174.99 MB/Sec; 7.08 Watts.
> Conclusion: Better performance and better power with the patch set.
> 
> Test 7: Compile the kernel 9 times.
> Reason: Just because it was a very revealing test during the
> "intel_pstate: Increase hold-off time before busyness is scaled"
> discussion / thread(s).
> Conclusion: no difference with and without the patch set.
> 
> Test 8: pipe-test between cores.
> Reason: Just because it was so useful during the
> "cross core scheduling frequency drop bisected to 0c313cb20732"
> discussion / thread(s).
> With patch set: 73.166 Sec; 3.6576 usec/loop; 2278.53 Joules.
> Without Patch set: 74.444 Sec; 3.7205 usec/loop; 2338.79 Joules.
> Conclusion: Slightly better performance and better energy with the patch set.
> 
> Test 9: Dougs_specpower simulator (20% load):
> Time is fixed, less energy is better.
> Reason: During the long
> "[intel-pstate driver regression] processor frequency very high even if in idle"
> and subsequent https://bugzilla.kernel.org/show_bug.cgi?id=115771
> discussion / thread(s), some sort of test was needed to try to mimic what Srinivas
> was getting on his fancy SpecPower test platform. So far at least, this test does that.
> Only the 20% load case was created, because that was the biggest problem case back then.
> With patch set: 4 tests at an average of 7197 Joules per test, relatively high CPU frequencies.
> Without the patch set: 4 tests at an average of 5956 Joules per test, relatively low CPU frequencies.
> Conclusion: 21% energy regression with the patch set.
> Note: Newer processors might do better than my older i7-2600K.
> 
> Test 10: measure the frequency response curve, fixed work packet method,
> 75 hertz work / sleep frequency (all CPU, no IOWAIT):
> Reason: To compare to some older data and observe overall.
> png graph attached - might get stripped from the distribution lists.
> Conclusions: Tends to oscillate, suggesting some sort of damping is needed.
> However, any filtering tends to increase the step function load rise time
> (see test 11 below, I think there is some wiggle room here).
> See also graph which has: with and without patch set; performance mode (for reference);
> Philippe Longepe's cpu_load method also with setpoint 40 (for reference); one of my previous
> attempts at a load related patch set from quite sometime ago (for reference).
> 
> Test 11: Look at the step function load response. From no load to 100% on one CPU (CPU load only, no IO).
> While there is a graph, it is not attached:
> Conclusion: The step function response is greatly improved (virtually one sample time max).
> It would probably be O.K. to slow it down a little with a filter so as to reduce the
> tendency to oscillate under periodic load conditions (to a point, at least. A low enough frequency will
> always oscillate) (see the graph for test10).

All of the above is useful information, thanks for taking the time to do that
work!

> Questions:
> Is there a migration plan?

Not yet.  We have quite a lot of testing to do first.

> i.e. will there be an attempt to merge the current cpu_load method
> and this method into one method?

Quite possibly if the results are good enough.

> Then possibly the PID controller could be eliminated.

Right.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 1/7] cpufreq / sched: Make schedutil access utilization data directly
  2016-08-01 23:46     ` Rafael J. Wysocki
  2016-08-02 10:38       ` Juri Lelli
@ 2016-08-08 10:38       ` Peter Zijlstra
  1 sibling, 0 replies; 51+ messages in thread
From: Peter Zijlstra @ 2016-08-08 10:38 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Steve Muckle, Linux PM list, Srinivas Pandruvada, Viresh Kumar,
	Linux Kernel Mailing List, Juri Lelli, Ingo Molnar

On Tue, Aug 02, 2016 at 01:46:51AM +0200, Rafael J. Wysocki wrote:
> On Monday, August 01, 2016 12:28:50 PM Steve Muckle wrote:

> > Can the DL policy be left as-is and discussed/modified in a separate
> > series?
> 
> No problem with that as far as I'm concerned, but in that case it won't be
> a Peter's patch any more. :-)

No problem from my side though. I just did them together as an
illustration of the benefit etc.  Rip it up as you see fit ;-)

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 0/7] cpufreq / sched: cpufreq_update_util() flags and iowait boosting
  2016-08-01 16:30   ` Rafael J. Wysocki
@ 2016-08-08 11:08     ` Peter Zijlstra
  2016-08-08 13:01       ` Rafael J. Wysocki
  0 siblings, 1 reply; 51+ messages in thread
From: Peter Zijlstra @ 2016-08-08 11:08 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Doug Smythies, 'Linux PM list',
	'Srinivas Pandruvada', 'Viresh Kumar',
	'Linux Kernel Mailing List', 'Steve Muckle',
	'Juri Lelli', 'Ingo Molnar'

On Mon, Aug 01, 2016 at 06:30:27PM +0200, Rafael J. Wysocki wrote:
> >  CPU_FREQ_GOV_SCHEDUTIL m -> y

> > ERROR: "runqueues" [kernel/sched/cpufreq_schedutil.ko] undefined!

> There are some export_symbol declarations missing, I'll fix that up.

Do you really need that thing to be a module? I would really like to not
export runqueues. People have had 'crazy' ideas in the past.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 0/7] cpufreq / sched: cpufreq_update_util() flags and iowait boosting
  2016-08-08 11:08     ` Peter Zijlstra
@ 2016-08-08 13:01       ` Rafael J. Wysocki
  0 siblings, 0 replies; 51+ messages in thread
From: Rafael J. Wysocki @ 2016-08-08 13:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Doug Smythies, 'Linux PM list',
	'Srinivas Pandruvada', 'Viresh Kumar',
	'Linux Kernel Mailing List', 'Steve Muckle',
	'Juri Lelli', 'Ingo Molnar'

On Monday, August 08, 2016 01:08:22 PM Peter Zijlstra wrote:
> On Mon, Aug 01, 2016 at 06:30:27PM +0200, Rafael J. Wysocki wrote:
> > >  CPU_FREQ_GOV_SCHEDUTIL m -> y
> 
> > > ERROR: "runqueues" [kernel/sched/cpufreq_schedutil.ko] undefined!
> 
> > There are some export_symbol declarations missing, I'll fix that up.
> 
> Do you really need that thing to be a module? I would really like to not
> export runqueues. People have had 'crazy' ideas in the past.

It doesn't have to be a module, but it is today.

Well, I guess accessing scheduler data directly is a good enough reason for
making it non-modular. :-)

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 51+ messages in thread

* RE: [RFC][PATCH 7/7] cpufreq: intel_pstate: Change P-state selection algorithm for Core
  2016-08-06  0:02     ` Rafael J. Wysocki
@ 2016-08-09 17:16       ` Doug Smythies
  2016-08-13 15:59       ` Doug Smythies
  1 sibling, 0 replies; 51+ messages in thread
From: Doug Smythies @ 2016-08-09 17:16 UTC (permalink / raw)
  To: 'Rafael J. Wysocki'
  Cc: 'Peter Zijlstra', 'Srinivas Pandruvada',
	'Viresh Kumar', 'Linux Kernel Mailing List',
	'Steve Muckle', 'Juri Lelli',
	'Ingo Molnar', 'Linux PM list'

On 2016.08.05 17:02 Rafael J. Wysocki wrote:
> On 2016.08.03 21:19 Doug Smythies wrote:
>> On 2016.07.31 16:49 Rafael J. Wysocki wrote:
>>
>> 
>>> +static inline int32_t get_target_pstate_default(struct cpudata *cpu)
>>> +{
>>> +	struct sample *sample = &cpu->sample;
>>> +	int32_t busy_frac;
>>> +	int pstate;
>>> +
>>> +	busy_frac = div_fp(sample->mperf, sample->tsc);
>>> +	sample->busy_scaled = busy_frac * 100;
>>> +
>>> +	if (busy_frac < cpu->iowait_boost)
>>> +		busy_frac = cpu->iowait_boost;
>>> +
>>> +	cpu->iowait_boost >>= 1;
>>> +
>>> +	pstate = cpu->pstate.turbo_pstate;
>>> +	return fp_toint((pstate + (pstate >> 2)) * busy_frac);
>>> +}
>>> +
>> 
>> The response curve is not normalized on the lower end to the minimum
>> pstate for the processor, meaning the overall response will vary
>> between processors as a function of minimum pstate.

> But that's OK IMO.
>
> Mapping busy_frac = 0 to the minimum P-state would over-provision workloads
> with small values of busy_frac.

Agreed, mapping busy_frac = 0 to the minimum Pstate would be a bad thing to do.

However, that is not what I meant. I meant that the mapping of busy_frac = N to
the minimum pstate for the processor should give the same "N" (within granularity),
regardless of the processor.

Example, my processor, i7-2600K: max pstate = 38; min pstate = 16.
Load needed before moving off the minimum to pstate 17:  17 = (38 + 38/4) * load
Load = N = 35.8 %

Example, something like an i5-3337U (I think, I don't actually have one):
max pstate = 27; min pstate = 8.
Load needed before moving off the minimum to pstate 9: 9 = (27 + 27/4) * load
Load = N = 26.7 %

It was a couple of years ago, so I should re-do the sensitivity
analysis/testing, but I concluded that the performance / energy tradeoff
was somewhat sensitive to "N".
 
I am suggesting that the response curve, or transfer function,
needs to be normalized, for any processor, to:

  Max pstate |              __________
             |             /
             |            /
             |           /
             |          /
             |         /
   Min pstate| _______/
             |__________________________
              |       |     |          |
              0%      N     M         100%
                      CPU load

Currently M ~= 80%

One time (not re-based since kernel 4.3) I did have a proposed solution [1],
but it was expensive in terms of extra multiplies and divides.

[1]: http://marc.info/?l=linux-pm&m=142881187323474&w=2
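
For anyone who wants to reproduce the arithmetic above, here is a small
stand-alone sketch (ordinary floating point rather than the driver's
fixed-point helpers; the i5-3337U limits are my guess, as noted above,
and the driver's integer ">> 2" shifts the numbers very slightly).  It
prints N and M for the current 1.25 coefficient and for the 1.5 variant
tested further down:

/* gcc -O2 -o pstate_curve pstate_curve.c */
#include <stdio.h>

static void show(const char *name, int min_p, int turbo_p, double coeff)
{
	/* N: load at which coeff * turbo_p * load first exceeds min_p */
	double n = (double)(min_p + 1) / (coeff * turbo_p);
	/* M: load at which the formula reaches the turbo P-state */
	double m = 1.0 / coeff;

	printf("%-10s coeff %.2f: N = %4.1f %%  M = %4.1f %%\n",
	       name, coeff, 100.0 * n, 100.0 * m);
}

int main(void)
{
	show("i7-2600K", 16, 38, 1.25);	/* pstate + (pstate >> 2) */
	show("i7-2600K", 16, 38, 1.50);	/* pstate + (pstate >> 1) */
	show("i5-3337U",  8, 27, 1.25);
	show("i5-3337U",  8, 27, 1.50);
	return 0;
}

It reproduces the 35.8 % and 26.7 % values above, and the ~30 % / ~22 %
values mentioned for the ">> 1" change below.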

>> The clamping at maximum pstate at about 80% load seems a little high
>> to me. Work I have done in various attempts to bring back the use of actual load
>> has always ended up achieving maximum pstate before 80% load for best results.
>> Even the get_target_pstate_cpu_load people reach the max pstate faster, and they
>> are more about energy than performance.
>> What were the criteria for the decision here? Are test results available for review
>> and/or duplication by others?

> This follows the coefficient used by the schedutil governor, but then the
> metric is different, so quite possibly a different value may work better here.
> 
> We'll test other values before applying this for sure. :-)

I am now testing this change to the code (for M ~= 67%; N ~= 30% (my CPU); N ~= 22% (i5-3337U)):

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index 8b2bdb7..909d441 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -1313,7 +1313,7 @@ static inline int32_t get_target_pstate_default(struct cpudata *cpu)
        cpu->iowait_boost >>= 1;

        pstate = cpu->pstate.turbo_pstate;
-       return fp_toint((pstate + (pstate >> 2)) * busy_frac);
+       return fp_toint((pstate + (pstate >> 1)) * busy_frac);
 }

 static inline void intel_pstate_update_pstate(struct cpudata *cpu, int pstate)

> 
> Several tests were done with this patch set.
...[cut]...

>> Questions:
>> Is there a migration plan?
>
> Not yet.  We have quite a lot of testing to do first.
>
>> i.e. will there be an attempt to merge the current cpu_load method
>> and this method into one method?
>
> Quite possibly if the results are good enough.
>
>> Then possibly the PID controller could be eliminated.
>
> Right.

I think this change is important, and I'll help with it as best as I can.

... Doug

A related CPU frequency Vs. Load graph will be sent to Rafael and Srinivas off-list.

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* RE: [RFC][PATCH 7/7] cpufreq: intel_pstate: Change P-state selection algorithm for Core
  2016-08-06  0:02     ` Rafael J. Wysocki
  2016-08-09 17:16       ` Doug Smythies
@ 2016-08-13 15:59       ` Doug Smythies
  2016-08-19 14:47         ` Peter Zijlstra
  2016-08-22 18:53         ` Srinivas Pandruvada
  1 sibling, 2 replies; 51+ messages in thread
From: Doug Smythies @ 2016-08-13 15:59 UTC (permalink / raw)
  To: 'Rafael J. Wysocki'
  Cc: 'Peter Zijlstra', 'Srinivas Pandruvada',
	'Viresh Kumar', 'Linux Kernel Mailing List',
	'Steve Muckle', 'Juri Lelli',
	'Ingo Molnar', 'Linux PM list'

On 2016.08.05 17:02 Rafael J. Wysocki wrote:
>> On 2016.08.03 21:19 Doug Smythies wrote:
>>> On 2016.07.31 16:49 Rafael J. Wysocki wrote:
>>> 
>>> The PID-base P-state selection algorithm used by intel_pstate for
>>> Core processors is based on very weak foundations.
>>
>> ...[cut]...
>> 
>>> +static inline int32_t get_target_pstate_default(struct cpudata *cpu)
>>> +{
>>> +	struct sample *sample = &cpu->sample;
>>> +	int32_t busy_frac;
>>> +	int pstate;
>>> +
>>> +	busy_frac = div_fp(sample->mperf, sample->tsc);
>>> +	sample->busy_scaled = busy_frac * 100;
>>> +
>>> +	if (busy_frac < cpu->iowait_boost)
>>> +		busy_frac = cpu->iowait_boost;
>>> +
>>> +	cpu->iowait_boost >>= 1;
>>> +
>>> +	pstate = cpu->pstate.turbo_pstate;
>>> +	return fp_toint((pstate + (pstate >> 2)) * busy_frac);
>>> +}
>>> +
>>
My previous replies (and see below) have suggested that some filtering
is needed on the target pstate, otherwise, and dependent on the type of
workload, it tends to oscillate.

I added the IIR (Infinite Impulse Response) filter that I have suggested in the past:

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index c43ef55..262ec5f 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -98,6 +98,7 @@ static inline u64 div_ext_fp(u64 x, u64 y)
  * @tsc:               Difference of time stamp counter between last and
  *                     current sample
  * @time:              Current time from scheduler
+ * @target:            target pstate filtered.
  *
  * This structure is used in the cpudata structure to store performance sample
  * data for choosing next P State.
@@ -108,6 +109,7 @@ struct sample {
        u64 aperf;
        u64 mperf;
        u64 tsc;
+       u64 target;
        u64 time;
 };

@@ -1168,6 +1170,7 @@ static void intel_pstate_get_cpu_pstates(struct cpudata *cpu)
                pstate_funcs.get_vid(cpu);

        intel_pstate_set_min_pstate(cpu);
+       cpu->sample.target = int_tofp(cpu->pstate.min_pstate);
 }

 static inline void intel_pstate_calc_avg_perf(struct cpudata *cpu)
@@ -1301,8 +1304,10 @@ static inline int32_t get_target_pstate_use_performance(struct cpudata *cpu)
 static inline int32_t get_target_pstate_default(struct cpudata *cpu)
 {
        struct sample *sample = &cpu->sample;
+       int64_t scaled_gain, unfiltered_target;
        int32_t busy_frac;
        int pstate;
+       u64 duration_ns;

        busy_frac = div_fp(sample->mperf, sample->tsc);
        sample->busy_scaled = busy_frac * 100;
@@ -1313,7 +1318,74 @@ static inline int32_t get_target_pstate_default(struct cpudata *cpu)
        cpu->iowait_boost >>= 1;

        pstate = cpu->pstate.turbo_pstate;
-       return fp_toint((pstate + (pstate >> 2)) * busy_frac);
+       /* To Do: I think the above should be:
+        *
+        * if (limits.no_turbo || limits.turbo_disabled)
+        *      pstate = cpu->pstate.max_pstate;
+        * else
+        *      pstate = cpu->pstate.turbo_pstate;
+        *
+        * figure it out.
+        *
+        * no clamps. Pre-filter clamping was needed in past implementations.
+        * To Do: Is any pre-filter clamping needed here? */
+
+       unfiltered_target = (pstate + (pstate >> 2)) * busy_frac;
+
+       /*
+        * Idle check.
+        * We have a deferrable timer. Very long durations can be
+        * either due to long idle (C0 time near 0),
+        * or due to short idle times that spanned jiffy boundaries
+        * (C0 time not near zero).
+        *
+        * To Do: As of the utilization stuff, I do not think the
+        * spanning jiffy boundaries thing is true anymore.
+        * Check, and fix the comment.
+        *
+        * The very long durations are 0.4 seconds or more.
+        * Either way, a very long duration will effectively flush
+        * the IIR filter, otherwise falling edge load response times
+        * can be on the order of tens of seconds, because this driver
+        * runs very rarely. Furthermore, for higher periodic loads that
+        * just so happen to not be in the C0 state on jiffy boundaries,
+        * the long ago history should be forgotten.
+        * For cases of durations that are a few times the set sample
+        * period, increase the IIR filter gain so as to weight
+        * the current sample more appropriately.
+        *
+        * To Do: sample_time should be forced to be accurate. For
+        * example if the kernel is a 250 Hz kernel, then a
+        * sample_rate_ms of 10 should result in a sample_time of 12.
+        *
+        * To Do: Check that the IO Boost case is not filtered too much.
+        *        It might be that a filter by-pass is needed for the boost case.
+        *        However, the existing gain = f(duration) might be good enough.
+        */
+
+       duration_ns = cpu->sample.time - cpu->last_sample_time;
+
+       scaled_gain = div_u64(int_tofp(duration_ns) *
+               int_tofp(pid_params.p_gain_pct), int_tofp(pid_params.sample_rate_ns));
+       if (scaled_gain > int_tofp(100))
+               scaled_gain = int_tofp(100);
+       /*
+        * This code should not be required,
+        * but short duration times have been observed
+        * To Do: Check if this code is actually still needed. I don't think so.
+        */
+       if (scaled_gain < int_tofp(pid_params.p_gain_pct))
+               scaled_gain = int_tofp(pid_params.p_gain_pct);
+
+       /*
+        * Bandwidth limit the output. For now, re-task p_gain_pct for this purpose.
+        * Use a simple IIR (Infinite Impulse Response) filter.
+        */
+       cpu->sample.target = div_u64((int_tofp(100) - scaled_gain) *
+                       cpu->sample.target + scaled_gain *
+                       unfiltered_target, int_tofp(100));
+
+       return fp_toint(cpu->sample.target + (1 << (FRAC_BITS-1)));
 }

 static inline void intel_pstate_update_pstate(struct cpudata *cpu, int pstate)
@@ -1579,6 +1651,7 @@ static void intel_pstate_stop_cpu(struct cpufreq_policy *policy)
                return;

        intel_pstate_set_min_pstate(cpu);
+       cpu->sample.target = int_tofp(cpu->pstate.min_pstate);
 }

 static int intel_pstate_cpu_init(struct cpufreq_policy *policy)

The filter introduces a trade-off between step function load response time
and the tendency for the target pstate to oscillate.
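
To make that trade-off concrete, here is a tiny stand-alone sketch
(ordinary doubles rather than the driver's fixed point, and a constant
sample period is assumed, so the gain = f(duration) scaling above does
not come into play).  It applies the same first-order IIR update to a
0 -> 100% load step and counts samples until 90% of the step is reached:

#include <stdio.h>

/* target' = (1 - g) * target + g * input */
static int samples_to_90pct(double gain)
{
	double target = 0.0;
	int n = 0;

	while (target < 0.9) {
		target = (1.0 - gain) * target + gain * 1.0;
		n++;
	}
	return n;
}

int main(void)
{
	static const double gains[] = { 0.05, 0.10, 0.20, 0.30, 1.00 };
	unsigned int i;

	for (i = 0; i < sizeof(gains) / sizeof(gains[0]); i++)
		printf("gain = %3.0f %%: %2d samples to reach 90 %% of the step\n",
		       100.0 * gains[i], samples_to_90pct(gains[i]));
	return 0;
}

At a 10% gain that is roughly 20 sample periods to get within 10% of the
step, which lines up with the 10 to 15% gain giving a PID-like response
time mentioned under Test 11 below, while smaller gains trade response
time for more damping of the Test 10 oscillation.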

...[cut]...

>> Several tests were done with this patch set.
>> The patch set would not apply to kernel 4.7, but did apply fine to a 4.7+ kernel
>> (I did as of 7a66ecf) from a few days ago.
>>
>> Test 1: Phoronix ffmpeg test (less time is better):
>> Reason: Because it suffers from rotating amongst CPUs in an odd way, challenging for CPU frequency scaling drivers.
>> This test tends to be an indicator of potential troubles with some games.
>> Criteria: (Dirk Brandewie): Must match or better acpi_cpufreq - ondemand.
>> With patch set: 15.8 Seconds average and 24.51 package watts.
>> Without patch set: 11.61 Seconds average and 27.59 watts.
>> Conclusion: Significant reduction in performance with proposed patch set.

With the filter this becomes even worse at ~18 seconds.
I used to fix this by moving the response curve to the left.
I have not tested this:

+       unfiltered_target = (pstate + (pstate >> 1)) * busy_frac;

which moves the response curve left a little. I will test it.

...[cut]...

>> Test 9: Doug's_specpower simulator (20% load):
>> Time is fixed, less energy is better.
>> Reason: During the long
>> "[intel-pstate driver regression] processor frequency very high even if in idle"
>> and subsequent https://bugzilla.kernel.org/show_bug.cgi?id=115771
>> discussion / thread(s), some sort of test was needed to try to mimic what Srinivas
>> was getting on his fancy SpecPower test platform. So far at least, this test does that.
>> Only the 20% load case was created, because that was the biggest problem case back then.
>> With patch set: 4 tests at an average of 7197 Joules per test, relatively high CPU frequencies.
>> Without the patch set: 4 tests at an average of 5956 Joules per test, relatively low CPU frequencies.
>> Conclusion: 21% energy regression with the patch set.
>> Note: Newer processors might do better than my older i7-2600K.

Patch set + above and IIR gain = 10%: 5670 Joules.
Conclusion: Energy regression eliminated.

Other gains:

gain =  5%: 5342 Joules; Busy MHz: 2172
gain = 10%: 5670 Joules; Busy MHz: 2285
gain = 20%: 6126 Joules; Busy MHz: 2560
gain = 30%: 6426 Joules; Busy MHz: 2739
gain = 40%: 6674 Joules; Busy MHz: 2912
gain = 70%: 7109 Joules; Busy MHz: 3199

locked at minimum pstate (reference): 4653 Joules; Busy MHz: 1600
Performance mode (reference): 7808 Joules; Busy MHz: 3647

>> Test 10: measure the frequency response curve, fixed work packet method,
>> 75 hertz work / sleep frequency (all CPU, no IOWAIT):
>> Reason: To compare to some older data and observe overall.
>> png graph NOT attached.
>> Conclusions: Tends to oscillate, suggesting some sort of damping is needed.
>> However, any filtering tends to increase the step function load rise time
>> (see test 11 below, I think there is some wiggle room here).
>> See also graph which has: with and without patch set; performance mode (for reference);
>> Philippe Longepe's cpu_load method also with setpoint 40 (for reference); one of my previous
>> attempts at a load related patch set from quite some time ago (for reference).

As expected, the filter damps out the oscillation.
New graphs will be sent to Rafael and Srinivas off-list.

>> 
>> Test 11: Look at the step function load response. From no load to 100% on one CPU (CPU load only, no IO).
>> While there is a graph, it is not attached:
>> Conclusion: The step function response is greatly improved (virtually one sample time max).
>> It would probably be O.K. to slow it down a little with a filter so as to reduce the
>> tendency to oscillate under periodic load conditions (to a point, at least. A low enough frequency will
>> always oscillate) (see the graph for test10).

I haven't done this test yet, but from previous work, a gain setting of 10 to 15% gives a
load step function response time similar to the current PID based filter.

The other tests gave similar results with or without the filter.

... Doug

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 7/7] cpufreq: intel_pstate: Change P-state selection algorithm for Core
  2016-08-13 15:59       ` Doug Smythies
@ 2016-08-19 14:47         ` Peter Zijlstra
  2016-08-20  1:06           ` Rafael J. Wysocki
  2016-08-20  6:40           ` Doug Smythies
  2016-08-22 18:53         ` Srinivas Pandruvada
  1 sibling, 2 replies; 51+ messages in thread
From: Peter Zijlstra @ 2016-08-19 14:47 UTC (permalink / raw)
  To: Doug Smythies
  Cc: 'Rafael J. Wysocki', 'Srinivas Pandruvada',
	'Viresh Kumar', 'Linux Kernel Mailing List',
	'Steve Muckle', 'Juri Lelli',
	'Ingo Molnar', 'Linux PM list'

On Sat, Aug 13, 2016 at 08:59:01AM -0700, Doug Smythies wrote:
> My previous replies (and see below) have suggested that some filtering
> is needed on the target pstate, otherwise, and dependent on the type of
> workload, it tends to oscillate.
> 
> I added the IIR (Infinite Impulse Response) filter that I have suggested in the past:

One question though; why is this filter intel_pstate specific? Should we
not do this in generic code?

> diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
> index c43ef55..262ec5f 100644
> --- a/drivers/cpufreq/intel_pstate.c
> +++ b/drivers/cpufreq/intel_pstate.c
> @@ -1313,7 +1318,74 @@ static inline int32_t get_target_pstate_default(struct cpudata *cpu)
>         cpu->iowait_boost >>= 1;
> 
>         pstate = cpu->pstate.turbo_pstate;

> +       unfiltered_target = (pstate + (pstate >> 2)) * busy_frac;

> +       duration_ns = cpu->sample.time - cpu->last_sample_time;
> +
> +       scaled_gain = div_u64(int_tofp(duration_ns) *
> +               int_tofp(pid_params.p_gain_pct), int_tofp(pid_params.sample_rate_ns));

Drop int_to_fp() on one of the dividend terms and in the divisor. Same
end result since they divide away against one another but reduces the
risk of overflow.
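
Something like this as a quick stand-alone check of the cancellation
(FRAC_BITS and the helper are hand-copied stand-ins for the driver's
fixed-point code, the numbers are made up):

#include <stdio.h>
#include <stdint.h>

#define FRAC_BITS 8
#define int_tofp(x) ((uint64_t)(x) << FRAC_BITS)

int main(void)
{
	uint64_t duration_ns    = 12 * 1000 * 1000;	/* 12 ms since last sample */
	uint64_t p_gain_pct     = 20;
	uint64_t sample_rate_ns = 10 * 1000 * 1000;	/* 10 ms period */

	/* as posted: both dividend terms and the divisor scaled up */
	uint64_t g_old = int_tofp(duration_ns) * int_tofp(p_gain_pct)
				/ int_tofp(sample_rate_ns);
	/* suggested: scale once, the other two int_tofp()s cancel */
	uint64_t g_new = int_tofp(duration_ns) * p_gain_pct
				/ sample_rate_ns;

	/*
	 * Both print 6144 == int_tofp(24), i.e. a 24% gain before clamping,
	 * but the first form's intermediate product is 2^FRAC_BITS larger.
	 */
	printf("old: %llu new: %llu\n",
	       (unsigned long long)g_old, (unsigned long long)g_new);
	return 0;
}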

Also, sample_rate_ns, really!? A rate is in [1/s], should that thing be
called period_ns ?

> +       if (scaled_gain > int_tofp(100))
> +               scaled_gain = int_tofp(100);

> +       if (scaled_gain < int_tofp(pid_params.p_gain_pct))
> +               scaled_gain = int_tofp(pid_params.p_gain_pct);
> +
> +       /*
> +        * Bandwidth limit the output. For now, re-task p_gain_pct for this purpose.
> +        * Use a simple IIR (Infinite Impulse Response) filter.
> +        */
> +       cpu->sample.target = div_u64((int_tofp(100) - scaled_gain) *
> +                       cpu->sample.target + scaled_gain *
> +                       unfiltered_target, int_tofp(100));


Really hard to read that stuff, maybe cure with a comment:

	/*
	 *       g = dt*p / period
	 *
	 * target' = (1 - g)*target  +  g*input
	 */

> +
> +       return fp_toint(cpu->sample.target + (1 << (FRAC_BITS-1)));
>  }

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 7/7] cpufreq: intel_pstate: Change P-state selection algorithm for Core
  2016-08-19 14:47         ` Peter Zijlstra
@ 2016-08-20  1:06           ` Rafael J. Wysocki
  2016-08-20  6:40           ` Doug Smythies
  1 sibling, 0 replies; 51+ messages in thread
From: Rafael J. Wysocki @ 2016-08-20  1:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Doug Smythies, 'Srinivas Pandruvada',
	'Viresh Kumar', 'Linux Kernel Mailing List',
	'Steve Muckle', 'Juri Lelli',
	'Ingo Molnar', 'Linux PM list'

On Friday, August 19, 2016 04:47:29 PM Peter Zijlstra wrote:
> On Sat, Aug 13, 2016 at 08:59:01AM -0700, Doug Smythies wrote:
> > My previous replies (and see below) have suggested that some filtering
> > is needed on the target pstate, otherwise, and dependent on the type of
> > workload, it tends to oscillate.
> > 
> > I added the IIR (Infinite Impulse Response) filter that I have suggested in the past:
> 
> One question though; why is this filter intel_pstate specific? Should we
> not do this in generic code?

The intel_pstate algorithm is based on the feedback registers and I'm not sure
if the same effect appears if utilization is computed in a different way.

> > diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
> > index c43ef55..262ec5f 100644
> > --- a/drivers/cpufreq/intel_pstate.c
> > +++ b/drivers/cpufreq/intel_pstate.c
> > @@ -1313,7 +1318,74 @@ static inline int32_t get_target_pstate_default(struct cpudata *cpu)
> >         cpu->iowait_boost >>= 1;
> > 
> >         pstate = cpu->pstate.turbo_pstate;
> 
> > +       unfiltered_target = (pstate + (pstate >> 2)) * busy_frac;
> 
> > +       duration_ns = cpu->sample.time - cpu->last_sample_time;
> > +
> > +       scaled_gain = div_u64(int_tofp(duration_ns) *
> > +               int_tofp(pid_params.p_gain_pct), int_tofp(pid_params.sample_rate_ns));
> 
> Drop int_to_fp() on one of the dividend terms and in the divisor. Same
> end result since they divide away against one another but reduces the
> risk of overflow.
> 
> Also, sample_rate_ns, really!? A rate is in [1/s], should that thing be
> called period_ns ?

That's an old name that hasn't been changed for quite a while.  That said,
"period" or "interval" would be better.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 51+ messages in thread

* RE: [RFC][PATCH 7/7] cpufreq: intel_pstate: Change P-state selection algorithm for Core
  2016-08-19 14:47         ` Peter Zijlstra
  2016-08-20  1:06           ` Rafael J. Wysocki
@ 2016-08-20  6:40           ` Doug Smythies
  1 sibling, 0 replies; 51+ messages in thread
From: Doug Smythies @ 2016-08-20  6:40 UTC (permalink / raw)
  To: 'Peter Zijlstra'
  Cc: 'Rafael J. Wysocki', 'Srinivas Pandruvada',
	'Viresh Kumar', 'Linux Kernel Mailing List',
	'Steve Muckle', 'Juri Lelli',
	'Ingo Molnar', 'Linux PM list'

On 2016.08.19 07:47 Peter Zijlstra wrote:
> On Sat, Aug 13, 2016 at 08:59:01AM -0700, Doug Smythies wrote:
>> My previous replies (and see below) have suggested that some filtering
>> is needed on the target pstate, otherwise, and dependent on the type of
>> workload, it tends to oscillate.
>> 
>> I added the IIR (Infinite Impulse Response) filter that I have suggested in the past:
>
> One question though; why is this filter intel_pstate specific? Should we
> not do this in generic code?

I wouldn't know. I'm not familiar with the other CPU frequency scaling drivers
or what filtering, if any, they already have.

>> diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
>> index c43ef55..262ec5f 100644
>> --- a/drivers/cpufreq/intel_pstate.c
>> +++ b/drivers/cpufreq/intel_pstate.c
>> @@ -1313,7 +1318,74 @@ static inline int32_t get_target_pstate_default(struct cpudata *cpu)
>>         cpu->iowait_boost >>= 1;
>> 
>>         pstate = cpu->pstate.turbo_pstate;
>> +       unfiltered_target = (pstate + (pstate >> 2)) * busy_frac;
>> +       duration_ns = cpu->sample.time - cpu->last_sample_time;
>> +
>> +       scaled_gain = div_u64(int_tofp(duration_ns) *
>> +               int_tofp(pid_params.p_gain_pct), int_tofp(pid_params.sample_rate_ns));
>
> Drop int_to_fp() on one of the dividend terms and in the divisor. Same
> end result since they divide away against one another but reduces the
> risk of overflow.

Yes of course. Thanks.

> Also, sample_rate_ns, really!? A rate is in [1/s], should that thing be
> called period_ns ?

Agreed (strongly); however, as Rafael mentioned in his reply, this stuff
has been around for a long time, including the externally available:

/sys/kernel/debug/pstate_snb/sample_rate_ms

which may be referenced by some documentation and scripts (I have some).
Myself, I'd be O.K. with changing it all to "period".

>> +       if (scaled_gain > int_tofp(100))
>> +               scaled_gain = int_tofp(100);
>> +       if (scaled_gain < int_tofp(pid_params.p_gain_pct))
>> +               scaled_gain = int_tofp(pid_params.p_gain_pct);
>> +
>> +       /*
>> +        * Bandwidth limit the output. For now, re-task p_gain_pct for this purpose.
>> +        * Use a simple IIR (Infinite Impulse Response) filter.
>> +        */
>> +       cpu->sample.target = div_u64((int_tofp(100) - scaled_gain) *
>> +                       cpu->sample.target + scaled_gain *
>> +                       unfiltered_target, int_tofp(100));
>
> Really hard to read that stuff, maybe cure with a comment:
>
>	/*
>	 *       g = dt*p / period
>	 *
>	 * target' = (1 - g)*target  +  g*input
>	 */

Yes, O.K. I'll add more comments if this continues towards a formal
patch submission.

>> +
>> +       return fp_toint(cpu->sample.target + (1 << (FRAC_BITS-1)));
>>  }

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 7/7] cpufreq: intel_pstate: Change P-state selection algorithm for Core
  2016-08-13 15:59       ` Doug Smythies
  2016-08-19 14:47         ` Peter Zijlstra
@ 2016-08-22 18:53         ` Srinivas Pandruvada
  2016-08-22 22:53           ` Doug Smythies
  1 sibling, 1 reply; 51+ messages in thread
From: Srinivas Pandruvada @ 2016-08-22 18:53 UTC (permalink / raw)
  To: Doug Smythies, 'Rafael J. Wysocki'
  Cc: 'Peter Zijlstra', 'Viresh Kumar',
	'Linux Kernel Mailing List', 'Steve Muckle',
	'Juri Lelli', 'Ingo Molnar',
	'Linux PM list'

Hi Doug,

I am not able to apply this patch. Can you send it as a patch on top of
Rafael's RFC 7/7? Since the test takes a long time, I want to apply the
correct patch.

Thanks,
Srinivas

On Sat, 2016-08-13 at 08:59 -0700, Doug Smythies wrote:
> On 2016.08.05 17:02 Rafael J. Wysocki wrote:
> > 
> > > 
> > > On 2016.08.03 21:19 Doug Smythies wrote:
> > > > 
> > > > On 2016.07.31 16:49 Rafael J. Wysocki wrote:
> > > > 
> > > > The PID-base P-state selection algorithm used by intel_pstate
> > > > for
> > > > Core processors is based on very weak foundations.
> > > ...[cut]...
> > > 
> > > > 
> > > > +static inline int32_t get_target_pstate_default(struct cpudata
> > > > *cpu)
> > > > +{
> > > > +	struct sample *sample = &cpu->sample;
> > > > +	int32_t busy_frac;
> > > > +	int pstate;
> > > > +
> > > > +	busy_frac = div_fp(sample->mperf, sample->tsc);
> > > > +	sample->busy_scaled = busy_frac * 100;
> > > > +
> > > > +	if (busy_frac < cpu->iowait_boost)
> > > > +		busy_frac = cpu->iowait_boost;
> > > > +
> > > > +	cpu->iowait_boost >>= 1;
> > > > +
> > > > +	pstate = cpu->pstate.turbo_pstate;
> > > > +	return fp_toint((pstate + (pstate >> 2)) * busy_frac);
> > > > +}
> > > > +
> My previous replies (and see below) have suggested that some
> filtering
> is needed on the target pstate, otherwise, and dependent on the type
> of
> workload, it tends to oscillate.
> 
> I added the IIR (Infinite Impulse Response) filter that I have
> suggested in the past:
> 
> diff --git a/drivers/cpufreq/intel_pstate.c
> b/drivers/cpufreq/intel_pstate.c
> index c43ef55..262ec5f 100644
> --- a/drivers/cpufreq/intel_pstate.c
> +++ b/drivers/cpufreq/intel_pstate.c
> @@ -98,6 +98,7 @@ static inline u64 div_ext_fp(u64 x, u64 y)
>   * @tsc:               Difference of time stamp counter between last
> and
>   *                     current sample
>   * @time:              Current time from scheduler
> + * @target:            target pstate filtered.
>   *
>   * This structure is used in the cpudata structure to store
> performance sample
>   * data for choosing next P State.
> @@ -108,6 +109,7 @@ struct sample {
>         u64 aperf;
>         u64 mperf;
>         u64 tsc;
> +       u64 target;
>         u64 time;
>  };
> 
> @@ -1168,6 +1170,7 @@ static void intel_pstate_get_cpu_pstates(struct
> cpudata *cpu)
>                 pstate_funcs.get_vid(cpu);
> 
>         intel_pstate_set_min_pstate(cpu);
> +       cpu->sample.target = int_tofp(cpu->pstate.min_pstate);
>  }
> 
>  static inline void intel_pstate_calc_avg_perf(struct cpudata *cpu)
> @@ -1301,8 +1304,10 @@ static inline int32_t
> get_target_pstate_use_performance(struct cpudata *cpu)
>  static inline int32_t get_target_pstate_default(struct cpudata *cpu)
>  {
>         struct sample *sample = &cpu->sample;
> +       int64_t scaled_gain, unfiltered_target;
>         int32_t busy_frac;
>         int pstate;
> +       u64 duration_ns;
> 
>         busy_frac = div_fp(sample->mperf, sample->tsc);
>         sample->busy_scaled = busy_frac * 100;
> @@ -1313,7 +1318,74 @@ static inline int32_t
> get_target_pstate_default(struct cpudata *cpu)
>         cpu->iowait_boost >>= 1;
> 
>         pstate = cpu->pstate.turbo_pstate;
> -       return fp_toint((pstate + (pstate >> 2)) * busy_frac);
> +       /* To Do: I think the above should be:
> +        *
> +        * if (limits.no_turbo || limits.turbo_disabled)
> +        *      pstate = cpu->pstate.max_pstate;
> +        * else
> +        *      pstate = cpu->pstate.turbo_pstate;
> +        *
> +        * figure it out.
> +        *
> +        * no clamps. Pre-filter clamping was needed in past
> implementations.
> +        * To Do: Is any pre-filter clamping needed here? */
> +
> +       unfiltered_target = (pstate + (pstate >> 2)) * busy_frac;
> +
> +       /*
> +        * Idle check.
> +        * We have a deferrable timer. Very long durations can be
> +        * either due to long idle (C0 time near 0),
> +        * or due to short idle times that spanned jiffy boundaries
> +        * (C0 time not near zero).
> +        *
> +        * To Do: As of the utilization stuff, I do not think the
> +        * spanning jiffy boundaries thing is true anymore.
> +        * Check, and fix the comment.
> +        *
> +        * The very long durations are 0.4 seconds or more.
> +        * Either way, a very long duration will effectively flush
> +        * the IIR filter, otherwise falling edge load response times
> +        * can be on the order of tens of seconds, because this
> driver
> +        * runs very rarely. Furthermore, for higher periodic loads
> that
> +        * just so happen to not be in the C0 state on jiffy
> boundaries,
> +        * the long ago history should be forgotten.
> +        * For cases of durations that are a few times the set sample
> +        * period, increase the IIR filter gain so as to weight
> +        * the current sample more appropriately.
> +        *
> +        * To Do: sample_time should be forced to be accurate. For
> +        * example if the kernel is a 250 Hz kernel, then a
> +        * sample_rate_ms of 10 should result in a sample_time of 12.
> +        *
> +        * To Do: Check that the IO Boost case is not filtered too
> much.
> +        *        It might be that a filter by-pass is needed for the
> boost case.
> +        *        However, the existing gain = f(duration) might be
> good enough.
> +        */
> +
> +       duration_ns = cpu->sample.time - cpu->last_sample_time;
> +
> +       scaled_gain = div_u64(int_tofp(duration_ns) *
> +               int_tofp(pid_params.p_gain_pct),
> int_tofp(pid_params.sample_rate_ns));
> +       if (scaled_gain > int_tofp(100))
> +               scaled_gain = int_tofp(100);
> +       /*
> +        * This code should not be required,
> +        * but short duration times have been observed
> +        * To Do: Check if this code is actually still needed. I
> don't think so.
> +        */
> +       if (scaled_gain < int_tofp(pid_params.p_gain_pct))
> +               scaled_gain = int_tofp(pid_params.p_gain_pct);
> +
> +       /*
> +        * Bandwidth limit the output. For now, re-task p_gain_pct
> for this purpose.
> +        * Use a simple IIR (Infinite Impulse Response) filter.
> +        */
> +       cpu->sample.target = div_u64((int_tofp(100) - scaled_gain) *
> +                       cpu->sample.target + scaled_gain *
> +                       unfiltered_target, int_tofp(100));
> +
> +       return fp_toint(cpu->sample.target + (1 << (FRAC_BITS-1)));
>  }
> 
>  static inline void intel_pstate_update_pstate(struct cpudata *cpu,
> int pstate)
> @@ -1579,6 +1651,7 @@ static void intel_pstate_stop_cpu(struct
> cpufreq_policy *policy)
>                 return;
> 
>         intel_pstate_set_min_pstate(cpu);
> +       cpu->sample.target = int_tofp(cpu->pstate.min_pstate);
>  }
> 
>  static int intel_pstate_cpu_init(struct cpufreq_policy *policy)
> 
> The filter introduces a trade-off between step function load response
> time
> and the tendency for the target pstate to oscillate.
> 
> ...[cut]...
> 
> > 
> > > 
> > > Several tests were done with this patch set.
> > > The patch set would not apply to kernel 4.7, but did apply fine
> > > to a 4.7+ kernel
> > > (I did as of 7a66ecf) from a few days ago.
> > > 
> > > Test 1: Phoronix ffmpeg test (less time is better):
> > > Reason: Because it suffers from rotating amongst CPUs in an odd
> > > way, challenging for CPU frequency scaling drivers.
> > > This test tends to be an indicator of potential troubles with
> > > some games.
> > > Criteria: (Dirk Brandewie): Must match or better acpi_cpufreq -
> > > ondemand.
> > > With patch set: 15.8 Seconds average and 24.51 package watts.
> > > Without patch set: 11.61 Seconds average and 27.59 watts.
> > > Conclusion: Significant reduction in performance with proposed
> > > patch set.
> With the filter this becomes even worse at ~18 seconds.
> I used to fix this by moving the response curve to the left.
> I have not tested this:
> 
> +       unfiltered_target = (pstate + (pstate >> 1)) * busy_frac;
> 
> which moves the response curve left a little. I will test it.
> 
> ...[cut]...
> 
> > 
> > > 
> > > Test 9: Doug's_specpower simulator (20% load):
> > > Time is fixed, less energy is better.
> > > Reason: During the long
> > > "[intel-pstate driver regression] processor frequency very high
> > > even if in idle"
> > > and subsequent https://bugzilla.kernel.org/show_bug.cgi?id=115771
> > > discussion / thread(s), some sort of test was needed to try to
> > > mimic what Srinivas
> > > was getting on his fancy SpecPower test platform. So far at
> > > least, this test does that.
> > > Only the 20% load case was created, because that was the biggest
> > > problem case back then.
> > > With patch set: 4 tests at an average of 7197 Joules per test,
> > > relatively high CPU frequencies.
> > > Without the patch set: 4 tests at an average of 5956 Joules per
> > > test, relatively low CPU frequencies.
> > > Conclusion: 21% energy regression with the patch set.
> > > Note: Newer processors might do better than my older i7-2600K.
> Patch set + above and IIR gain = 10%: 5670 Joules.
> Conclusion: Energy regression eliminated.
> 
> Other gains:
> 
> gain =  5%: 5342 Joules; Busy MHz: 2172
> gain = 10%: 5670 Joules; Busy MHz: 2285
> gain = 20%: 6126 Joules; Busy MHz: 2560
> gain = 30%: 6426 Joules; Busy MHz: 2739
> gain = 40%: 6674 Joules; Busy MHz: 2912
> gain = 70%: 7109 Joules; Busy MHz: 3199
> 
> locked at minimum pstate (reference): 4653 Joules; Busy MHz: 1600
> Performance mode (reference): 7808 Joules; Busy MHz: 3647
> 
> > 
> > > 
> > > Test 10: measure the frequency response curve, fixed work packet
> > > method,
> > > 75 hertz work / sleep frequency (all CPU, no IOWAIT):
> > > Reason: To compare to some older data and observe overall.
> > > png graph NOT attached.
> > > Conclusions: Tends to oscillate, suggesting some sort of damping
> > > is needed.
> > > However, any filtering tends to increase the step function load
> > > rise time
> > > (see test 11 below, I think there is some wiggle room here).
> > > See also graph which has: with and without patch set; performance
> > > mode (for reference);
> > > Philippe Longepe's cpu_load method also with setpoint 40 (for
> > > reference); one of my previous
> > > attempts at a load related patch set from quite some time ago
> > > (for reference).
> As expected, the filter damps out the oscillation.
> New graphs will be sent to Rafael and Srinivas off-list.
> 
> > 
> > > 
> > > 
> > > Test 11: Look at the step function load response. From no load to
> > > 100% on one CPU (CPU load only, no IO).
> > > While there is a graph, it is not attached:
> > > Conclusion: The step function response is greatly improved
> > > (virtually one sample time max).
> > > It would probably be O.K. to slow it down a little with a filter
> > > so as to reduce the
> > > tendency to oscillate under periodic load conditions (to a point,
> > > at least. A low enough frequency will
> > > always oscillate) (see the graph for test10).
> I haven't done this test yet, but from previous work, a gain setting
> of 10 to 15% gives a
> load step function response time similar to the current PID based
> filter.
> 
> The other tests gave similar results with or without the filter.
> 
> ... Doug
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pm"
> in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 51+ messages in thread

* RE: [RFC][PATCH 7/7] cpufreq: intel_pstate: Change P-state selection algorithm for Core
  2016-08-22 18:53         ` Srinivas Pandruvada
@ 2016-08-22 22:53           ` Doug Smythies
  0 siblings, 0 replies; 51+ messages in thread
From: Doug Smythies @ 2016-08-22 22:53 UTC (permalink / raw)
  To: 'Srinivas Pandruvada', 'Rafael J. Wysocki'
  Cc: 'Peter Zijlstra', 'Viresh Kumar',
	'Linux Kernel Mailing List', 'Steve Muckle',
	'Juri Lelli', 'Ingo Molnar',
	'Linux PM list'

On 2016.08.22 11:54 Srinivas Pandruvada wrote:

> I am not able to apply this patch. Can you send as a patch on top of
> Rafael's RFC 7/7. Since test takes long time, I want to apply correct
> patch.

...[cut]...

O.K. I have included a couple of changes as per the feedback from Peter Zijlstra,
but all of my "To Do" notes are still there.

Coming shortly as patch 8 of 7 as per below:

doug@s15:~/temp-k-git/linux$ git log --oneline

305e905 cpufreq: intel_pstate: add iir filter to pstate.
a29457f cpufreq: intel_pstate: Change P-state selection algorithm for Core
75cf618 cpufreq: schedutil: Add iowait boosting
01b35b0 cpufreq / sched: UUF_IO flag to indicate iowait condition
81d0b42 cpufreq / sched: Add flags argument to cpufreq_update_util()
cb8f8e0 cpufreq / sched: Check cpu_of(rq) in cpufreq_update_util()
ffa049b cpufreq / sched: Drop cpufreq_trigger_update()
a3c19c3 cpufreq / sched: Make schedutil access utilization data directly
fa8410b Linux 4.8-rc3

... Doug

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 7/7] cpufreq: intel_pstate: Change P-state selection algorithm for Core
  2016-07-31 23:38 ` [RFC][PATCH 7/7] cpufreq: intel_pstate: Change P-state selection algorithm for Core Rafael J. Wysocki
  2016-08-04  4:18   ` Doug Smythies
  2016-08-04  6:53   ` Doug Smythies
@ 2016-08-23  3:48   ` Wanpeng Li
  2016-08-23  4:08     ` Srinivas Pandruvada
  2 siblings, 1 reply; 51+ messages in thread
From: Wanpeng Li @ 2016-08-23  3:48 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux PM list, Peter Zijlstra, Srinivas Pandruvada, Viresh Kumar,
	Linux Kernel Mailing List, Steve Muckle, Juri Lelli, Ingo Molnar

Hi Rafael,
2016-08-01 7:38 GMT+08:00 Rafael J. Wysocki <rjw@rjwysocki.net>:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> The PID-base P-state selection algorithm used by intel_pstate for
> Core processors is based on very weak foundations.  Namely, its
> decisions are mostly based on the values of the APERF and MPERF
> feedback registers and it only estimates the actual utilization to
> check if it is not extremely low (in order to avoid getting stuck
> in the highest P-state in that case).
>
> Since it generally causes the CPU P-state to ramp up quickly, it
> leads to satisfactory performance, but the metric used by it is only
> really valid when the CPU changes P-states by itself (ie. in the turbo
> range) and if the P-state value set by the driver is treated by the
> CPU as the upper limit on turbo P-states selected by it.

Do you mean the CPU will ignore the request value which is submitted by
the intel_pstate driver in the turbo range, but respect the upper limit on
the turbo P-states which is submitted by the intel_pstate driver?

Regards,
Wanpeng Li

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 7/7] cpufreq: intel_pstate: Change P-state selection algorithm for Core
  2016-08-23  3:48   ` Wanpeng Li
@ 2016-08-23  4:08     ` Srinivas Pandruvada
  2016-08-23  4:50       ` Wanpeng Li
  0 siblings, 1 reply; 51+ messages in thread
From: Srinivas Pandruvada @ 2016-08-23  4:08 UTC (permalink / raw)
  To: Wanpeng Li, Rafael J. Wysocki
  Cc: Linux PM list, Peter Zijlstra, Viresh Kumar,
	Linux Kernel Mailing List, Steve Muckle, Juri Lelli, Ingo Molnar

On Tue, 2016-08-23 at 11:48 +0800, Wanpeng Li wrote:
> Hi Rafael,
> 2016-08-01 7:38 GMT+08:00 Rafael J. Wysocki <rjw@rjwysocki.net>:
> > 
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > 
> > The PID-base P-state selection algorithm used by intel_pstate for
> > Core processors is based on very weak foundations.  Namely, its
> > decisions are mostly based on the values of the APERF and MPERF
> > feedback registers and it only estimates the actual utilization to
> > check if it is not extremely low (in order to avoid getting stuck
> > in the highest P-state in that case).
> > 
> > Since it generally causes the CPU P-state to ramp up quickly, it
> > leads to satisfactory performance, but the metric used by it is
> > only
> > really valid when the CPU changes P-states by itself (ie. in the
> > turbo
> > range) and if the P-state value set by the driver is treated by the
> > CPU as the upper limit on turbo P-states selected by it.
> 
> Do you mean CPU will ignore the request value which is submitted by
> intel_pstate driver in the turbo range, but respect the upper limit
> on
> the turbo P-states which is submitted by intel_pstate driver?
Any request for a P-state, even the upper limit, can be totally
ignored in the turbo range.

Thanks,
Srinivas

> 
> Regards,
> Wanpeng Li
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pm"
> in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 7/7] cpufreq: intel_pstate: Change P-state selection algorithm for Core
  2016-08-23  4:08     ` Srinivas Pandruvada
@ 2016-08-23  4:50       ` Wanpeng Li
  2016-08-23 17:30         ` Rafael J. Wysocki
  0 siblings, 1 reply; 51+ messages in thread
From: Wanpeng Li @ 2016-08-23  4:50 UTC (permalink / raw)
  To: Srinivas Pandruvada
  Cc: Rafael J. Wysocki, Linux PM list, Peter Zijlstra, Viresh Kumar,
	Linux Kernel Mailing List, Steve Muckle, Juri Lelli, Ingo Molnar

2016-08-23 12:08 GMT+08:00 Srinivas Pandruvada
<srinivas.pandruvada@linux.intel.com>:
> On Tue, 2016-08-23 at 11:48 +0800, Wanpeng Li wrote:
>> Hi Rafael,
>> 2016-08-01 7:38 GMT+08:00 Rafael J. Wysocki <rjw@rjwysocki.net>:
>> >
>> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>> >
>> > The PID-based P-state selection algorithm used by intel_pstate for
>> > Core processors is based on very weak foundations.  Namely, its
>> > decisions are mostly based on the values of the APERF and MPERF
>> > feedback registers and it only estimates the actual utilization to
>> > check if it is not extremely low (in order to avoid getting stuck
>> > in the highest P-state in that case).
>> >
>> > Since it generally causes the CPU P-state to ramp up quickly, it
>> > leads to satisfactory performance, but the metric used by it is
>> > only
>> > really valid when the CPU changes P-states by itself (ie. in the
>> > turbo
>> > range) and if the P-state value set by the driver is treated by the
>> > CPU as the upper limit on turbo P-states selected by it.
>>
>> Do you mean that the CPU will ignore the request value submitted by the
>> intel_pstate driver in the turbo range, but respect the upper limit on
>> the turbo P-states that the driver submits?
> Any request for a P-state, even at the upper limit, can be totally
> ignored in the turbo range.

How should I understand Rafael's description "if the P-state value set
by the driver is treated by the CPU as the upper limit on the turbo
P-states selected by it", if the upper limit will be totally ignored in
the turbo range?

Regards,
Wanpeng Li

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC][PATCH 7/7] cpufreq: intel_pstate: Change P-state selection algorithm for Core
  2016-08-23  4:50       ` Wanpeng Li
@ 2016-08-23 17:30         ` Rafael J. Wysocki
  0 siblings, 0 replies; 51+ messages in thread
From: Rafael J. Wysocki @ 2016-08-23 17:30 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: Srinivas Pandruvada, Linux PM list, Peter Zijlstra, Viresh Kumar,
	Linux Kernel Mailing List, Steve Muckle, Juri Lelli, Ingo Molnar

On Tuesday, August 23, 2016 12:50:14 PM Wanpeng Li wrote:
> 2016-08-23 12:08 GMT+08:00 Srinivas Pandruvada
> <srinivas.pandruvada@linux.intel.com>:
> > On Tue, 2016-08-23 at 11:48 +0800, Wanpeng Li wrote:
> >> Hi Rafael,
> >> 2016-08-01 7:38 GMT+08:00 Rafael J. Wysocki <rjw@rjwysocki.net>:
> >> >
> >> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >> >
> >> > The PID-based P-state selection algorithm used by intel_pstate for
> >> > Core processors is based on very weak foundations.  Namely, its
> >> > decisions are mostly based on the values of the APERF and MPERF
> >> > feedback registers and it only estimates the actual utilization to
> >> > check if it is not extremely low (in order to avoid getting stuck
> >> > in the highest P-state in that case).
> >> >
> >> > Since it generally causes the CPU P-state to ramp up quickly, it
> >> > leads to satisfactory performance, but the metric used by it is
> >> > only
> >> > really valid when the CPU changes P-states by itself (ie. in the
> >> > turbo
> >> > range) and if the P-state value set by the driver is treated by the
> >> > CPU as the upper limit on turbo P-states selected by it.
> >>
> >> Do you mean that the CPU will ignore the request value submitted by the
> >> intel_pstate driver in the turbo range, but respect the upper limit on
> >> the turbo P-states that the driver submits?
> > Any request for a P-state, even at the upper limit, can be totally
> > ignored in the turbo range.
> 
> How should I understand Rafael's description "if the P-state value set
> by the driver is treated by the CPU as the upper limit on the turbo
> P-states selected by it", if the upper limit will be totally ignored in
> the turbo range?

It wasn't ignored, though, by some older CPU models.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 51+ messages in thread

end of thread, other threads:[~2016-08-23 17:25 UTC | newest]

Thread overview: 51+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-31 23:31 [RFC][PATCH 0/7] cpufreq / sched: cpufreq_update_util() flags and iowait boosting Rafael J. Wysocki
2016-07-31 23:34 ` [RFC][PATCH 1/7] cpufreq / sched: Make schedutil access utilization data directly Rafael J. Wysocki
2016-08-01 19:28   ` Steve Muckle
2016-08-01 23:46     ` Rafael J. Wysocki
2016-08-02 10:38       ` Juri Lelli
2016-08-02 14:28         ` Steve Muckle
2016-08-02 14:43           ` Juri Lelli
2016-08-08 10:38       ` Peter Zijlstra
2016-07-31 23:35 ` [RFC][PATCH 2/7] cpufreq / sched: Drop cpufreq_trigger_update() Rafael J. Wysocki
2016-07-31 23:36 ` [RFC][PATCH 3/7] cpufreq / sched: Check cpu_of(rq) in cpufreq_update_util() Rafael J. Wysocki
2016-08-01  7:29   ` Dominik Brodowski
2016-08-01 14:57     ` Rafael J. Wysocki
2016-08-01 19:48     ` Steve Muckle
2016-08-01 23:43       ` Rafael J. Wysocki
2016-07-31 23:36 ` [RFC][PATCH 4/7] cpufreq / sched: Add flags argument to cpufreq_update_util() Rafael J. Wysocki
2016-08-01  7:33   ` Dominik Brodowski
2016-08-01 14:57     ` Rafael J. Wysocki
2016-08-01 19:59       ` Steve Muckle
2016-08-01 23:44         ` Rafael J. Wysocki
2016-08-02  1:36           ` Steve Muckle
2016-07-31 23:37 ` [RFC][PATCH 5/7] cpufreq / sched: UUF_IO flag to indicate iowait condition Rafael J. Wysocki
2016-08-02  1:22   ` Steve Muckle
2016-08-02  1:37     ` Rafael J. Wysocki
2016-08-02 22:02       ` Steve Muckle
2016-08-02 22:38         ` Rafael J. Wysocki
2016-08-04  2:24           ` Steve Muckle
2016-08-04 21:19             ` Rafael J. Wysocki
2016-08-04 22:09               ` Steve Muckle
2016-08-05 23:36                 ` Rafael J. Wysocki
2016-07-31 23:37 ` [RFC][PATCH 6/7] cpufreq: schedutil: Add iowait boosting Rafael J. Wysocki
2016-08-02  1:35   ` Steve Muckle
2016-08-02 23:03     ` Rafael J. Wysocki
2016-07-31 23:38 ` [RFC][PATCH 7/7] cpufreq: intel_pstate: Change P-state selection algorithm for Core Rafael J. Wysocki
2016-08-04  4:18   ` Doug Smythies
2016-08-04  6:53   ` Doug Smythies
2016-08-06  0:02     ` Rafael J. Wysocki
2016-08-09 17:16       ` Doug Smythies
2016-08-13 15:59       ` Doug Smythies
2016-08-19 14:47         ` Peter Zijlstra
2016-08-20  1:06           ` Rafael J. Wysocki
2016-08-20  6:40           ` Doug Smythies
2016-08-22 18:53         ` Srinivas Pandruvada
2016-08-22 22:53           ` Doug Smythies
2016-08-23  3:48   ` Wanpeng Li
2016-08-23  4:08     ` Srinivas Pandruvada
2016-08-23  4:50       ` Wanpeng Li
2016-08-23 17:30         ` Rafael J. Wysocki
2016-08-01 15:26 ` [RFC][PATCH 0/7] cpufreq / sched: cpufreq_update_util() flags and iowait boosting Doug Smythies
2016-08-01 16:30   ` Rafael J. Wysocki
2016-08-08 11:08     ` Peter Zijlstra
2016-08-08 13:01       ` Rafael J. Wysocki
