All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/5] cpufreq: schedutil: improve latency of response
@ 2016-05-09 21:20 Steve Muckle
  2016-05-09 21:20 ` [PATCH 1/5] sched: cpufreq: add cpu to update_util_data Steve Muckle
                   ` (4 more replies)
  0 siblings, 5 replies; 35+ messages in thread
From: Steve Muckle @ 2016-05-09 21:20 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Rafael J. Wysocki
  Cc: linux-kernel, linux-pm, Vincent Guittot, Morten Rasmussen,
	Dietmar Eggemann, Juri Lelli, Patrick Bellasi, Michael Turquette,
	Viresh Kumar, Srinivas Pandruvada, Len Brown

The cpufreq update hook is presently not called when the target CPU is not the
current CPU. This may cause a delay in adjusting the frequency of the target
CPU to accommodate new task load since the target CPU may not run the scheduler
hook until the next tick.

In order to test this series I wrote a small sample program that reproduces a
scenario susceptible to the problem above. I ran this on x86 since ARM
currently lacks both fast switch support and a high priority (RT/DL) slow path
switch context. With HZ set to 100 I was easily able to see an unnecessary
latency of 7ms in the target CPU frequency being re-evaluated and adjusted.
This latency is eliminated with this series.

This is the second version of a patch series initially posted here:
http://thread.gmane.org/gmane.linux.power-management.general/74977
The approach is quite different though. The dbs and intel_pstate governors are
not modified to take remote callbacks and the scheduler modifications are
rather different so as to avoid redundant IPIs.

Having the governor invoke the switch on a remote CPU means sending an IPI.
Unfortunately if the scheduler event also results in preemption, the scheduler
(and consequently the cpufreq hook) will run soon on the target CPU anyway. A
resched IPI may also be sent. In short it is desirable to not run the cpufreq
hook remotely if preemption will occur as a result of the scheduler event. To
achieve this, remote cpufreq callbacks are processed after preemption is
decided in check_preempt_curr().

Another optimization is to process callbacks from remote CPUs within a target
CPU's cpufreq policy on the remote CPU, since any CPU within a frequency domain
should be able to update the frequency there.

Legacy behavior of the cpufreq hook, where the callback only occurs on the
target CPU, is maintained via special usage of a new parameter to
cpufreq_add_update_util_hook(). That function now takes a fourth parameter,
cpumask_var_t *policy_cpus. If NULL then the hook is not called remotely.
Otherwise it is expected to point to the cpumask for the frequency domain so
the scheduler may differentiate between local and remote callbacks as above.
Clients of the hook other than schedutil are configured to only receive local
callbacks so their behavior does not change.

The above logic did not alone fix the original issue because currently,
schedutil does not assign the raw required frequency to a target-supported
frequency. This means that even small variations in raw required frequency will
result in an attempted frequency switch and a corresponding advance of the rate
limit timestamp, last_freq_update_time. If the actual target-supported
frequency is unchanged then this is not necessary; furthermore, schedutil
will now be rate limited causing any imminent and more substantial updates to
have to wait.

To address this the required target-supported frequency is now calculated in
schedutil. Also last_freq_update_time is not advanced if there is no change in
the requested target-supported frequency. I expect the mapping to
target-supported frequency to require discussion since acpu_cpufreq_fast_switch
also performs this mapping.

I attempted to look for general performance regression but results seemed to be
inconclusive (tried a couple passes each).

model name      : Intel(R) Core(TM) i7-3630QM CPU @ 2.40GHz
Performance counter stats for 'perf bench sched messaging -g 50 -l 5000' (30 runs):
baseline w/perf gov:
 20.847382649 seconds time elapsed ( +-  0.14% )
 20.701305631 seconds time elapsed ( +-  0.17% )
patched w/perf gov:
 20.665433861 seconds time elapsed ( +-  0.19% )
 20.606157659 seconds time elapsed ( +-  0.19% )
baseline w/sched gov:
 20.893686693 seconds time elapsed ( +-  0.12% )
 20.639752039 seconds time elapsed ( +-  0.12% )
patched w/sched gov:
 20.847845042 seconds time elapsed ( +-  0.14% )
 20.644003401 seconds time elapsed ( +-  0.14% )

Steve Muckle (5):
  sched: cpufreq: add cpu to update_util_data
  cpufreq: schedutil: support scheduler cpufreq callbacks on remote CPUs
  sched: cpufreq: call cpufreq hook from remote CPUs
  cpufreq: schedutil: map raw required frequency to CPU-supported
    frequency
  cpufreq: schedutil: do not update rate limit ts when freq is unchanged

 drivers/cpufreq/cpufreq_governor.c |   2 +-
 drivers/cpufreq/intel_pstate.c     |   2 +-
 include/linux/sched.h              |   5 +-
 kernel/sched/core.c                |   4 ++
 kernel/sched/cpufreq.c             |  14 ++++-
 kernel/sched/cpufreq_schedutil.c   | 105 +++++++++++++++++++++++++++---------
 kernel/sched/fair.c                |  40 +++++++-------
 kernel/sched/sched.h               | 106 +++++++++++++++++++++++++++++--------
 8 files changed, 206 insertions(+), 72 deletions(-)

-- 
2.4.10

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 1/5] sched: cpufreq: add cpu to update_util_data
  2016-05-09 21:20 [PATCH 0/5] cpufreq: schedutil: improve latency of response Steve Muckle
@ 2016-05-09 21:20 ` Steve Muckle
  2016-05-18 23:13   ` Rafael J. Wysocki
  2016-05-09 21:20 ` [PATCH 2/5] cpufreq: schedutil: support scheduler cpufreq callbacks on remote CPUs Steve Muckle
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 35+ messages in thread
From: Steve Muckle @ 2016-05-09 21:20 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Rafael J. Wysocki
  Cc: linux-kernel, linux-pm, Vincent Guittot, Morten Rasmussen,
	Dietmar Eggemann, Juri Lelli, Patrick Bellasi, Michael Turquette,
	Viresh Kumar, Srinivas Pandruvada, Len Brown

Upcoming support for scheduler cpufreq callbacks on remote wakeups
will require the client to know what the target CPU is that the
callback is being invoked for. Add this information into the callback
data structure.

Signed-off-by: Steve Muckle <smuckle@linaro.org>
---
 include/linux/sched.h  | 1 +
 kernel/sched/cpufreq.c | 1 +
 2 files changed, 2 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 8344e1947eec..81aba7dc5966 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -3238,6 +3238,7 @@ static inline unsigned long rlimit_max(unsigned int limit)
 struct update_util_data {
 	void (*func)(struct update_util_data *data,
 		     u64 time, unsigned long util, unsigned long max);
+	int cpu;
 };
 
 void cpufreq_add_update_util_hook(int cpu, struct update_util_data *data,
diff --git a/kernel/sched/cpufreq.c b/kernel/sched/cpufreq.c
index 1141954e73b4..d88a78ea805d 100644
--- a/kernel/sched/cpufreq.c
+++ b/kernel/sched/cpufreq.c
@@ -42,6 +42,7 @@ void cpufreq_add_update_util_hook(int cpu, struct update_util_data *data,
 		return;
 
 	data->func = func;
+	data->cpu = cpu;
 	rcu_assign_pointer(per_cpu(cpufreq_update_util_data, cpu), data);
 }
 EXPORT_SYMBOL_GPL(cpufreq_add_update_util_hook);
-- 
2.4.10

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 2/5] cpufreq: schedutil: support scheduler cpufreq callbacks on remote CPUs
  2016-05-09 21:20 [PATCH 0/5] cpufreq: schedutil: improve latency of response Steve Muckle
  2016-05-09 21:20 ` [PATCH 1/5] sched: cpufreq: add cpu to update_util_data Steve Muckle
@ 2016-05-09 21:20 ` Steve Muckle
  2016-05-18 23:24   ` Rafael J. Wysocki
  2016-05-09 21:20 ` [PATCH 3/5] sched: cpufreq: call cpufreq hook from " Steve Muckle
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 35+ messages in thread
From: Steve Muckle @ 2016-05-09 21:20 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Rafael J. Wysocki
  Cc: linux-kernel, linux-pm, Vincent Guittot, Morten Rasmussen,
	Dietmar Eggemann, Juri Lelli, Patrick Bellasi, Michael Turquette,
	Viresh Kumar, Srinivas Pandruvada, Len Brown

In preparation for the scheduler cpufreq callback happening on remote
CPUs, add support for this in schedutil.

Schedutil currently requires the callback occur on the CPU being
updated in order to support fast frequency switches. Remove this
limitation by checking for the current CPU being outside the target
CPU's cpufreq policy and if this is the case, enqueuing an irq_work on
the target CPU. The irq_work for schedutil is modified to carry out a
fast frequency switch if that is enabled for the policy.

If the callback occurs on a CPU within the target CPU's policy, the
transition is carried out on the local CPU.

Signed-off-by: Steve Muckle <smuckle@linaro.org>
---
 kernel/sched/cpufreq_schedutil.c | 86 ++++++++++++++++++++++++++++++----------
 1 file changed, 65 insertions(+), 21 deletions(-)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 154ae3a51e86..c81f9432f520 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -76,27 +76,61 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
 	return delta_ns >= sg_policy->freq_update_delay_ns;
 }
 
-static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time,
+static void sugov_fast_switch(struct sugov_policy *sg_policy, int cpu,
+			      unsigned int next_freq)
+{
+	struct cpufreq_policy *policy = sg_policy->policy;
+
+	next_freq = cpufreq_driver_fast_switch(policy, next_freq);
+	if (next_freq == CPUFREQ_ENTRY_INVALID)
+		return;
+
+	policy->cur = next_freq;
+	trace_cpu_frequency(next_freq, cpu);
+}
+
+#ifdef CONFIG_SMP
+static inline bool sugov_queue_remote_callback(struct sugov_policy *sg_policy,
+					 int cpu)
+{
+	struct cpufreq_policy *policy = sg_policy->policy;
+
+	if (!cpumask_test_cpu(smp_processor_id(), policy->cpus)) {
+		sg_policy->work_in_progress = true;
+		irq_work_queue_on(&sg_policy->irq_work, cpu);
+		return true;
+	}
+
+	return false;
+}
+#else
+static inline bool sugov_queue_remote_callback(struct sugov_policy *sg_policy,
+					 int cpu)
+{
+	return false;
+}
+#endif
+
+static void sugov_update_commit(struct sugov_cpu *sg_cpu, int cpu, u64 time,
 				unsigned int next_freq)
 {
+	struct sugov_policy *sg_policy = sg_cpu->sg_policy;
 	struct cpufreq_policy *policy = sg_policy->policy;
 
 	sg_policy->last_freq_update_time = time;
 
+	if (sg_policy->next_freq == next_freq) {
+		trace_cpu_frequency(policy->cur, cpu);
+		return;
+	}
+	sg_policy->next_freq = next_freq;
+
+	if (sugov_queue_remote_callback(sg_policy, cpu))
+		return;
+
 	if (policy->fast_switch_enabled) {
-		if (sg_policy->next_freq == next_freq) {
-			trace_cpu_frequency(policy->cur, smp_processor_id());
-			return;
-		}
-		sg_policy->next_freq = next_freq;
-		next_freq = cpufreq_driver_fast_switch(policy, next_freq);
-		if (next_freq == CPUFREQ_ENTRY_INVALID)
-			return;
-
-		policy->cur = next_freq;
-		trace_cpu_frequency(next_freq, smp_processor_id());
-	} else if (sg_policy->next_freq != next_freq) {
-		sg_policy->next_freq = next_freq;
+		sugov_fast_switch(sg_policy, cpu, next_freq);
+	} else {
 		sg_policy->work_in_progress = true;
 		irq_work_queue(&sg_policy->irq_work);
 	}
@@ -142,12 +176,13 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
 
 	next_f = util == ULONG_MAX ? policy->cpuinfo.max_freq :
 			get_next_freq(policy, util, max);
-	sugov_update_commit(sg_policy, time, next_f);
+	sugov_update_commit(sg_cpu, hook->cpu, time, next_f);
 }
 
-static unsigned int sugov_next_freq_shared(struct sugov_policy *sg_policy,
+static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu,
 					   unsigned long util, unsigned long max)
 {
+	struct sugov_policy *sg_policy = sg_cpu->sg_policy;
 	struct cpufreq_policy *policy = sg_policy->policy;
 	unsigned int max_f = policy->cpuinfo.max_freq;
 	u64 last_freq_update_time = sg_policy->last_freq_update_time;
@@ -161,10 +196,10 @@ static unsigned int sugov_next_freq_shared(struct sugov_policy *sg_policy,
 		unsigned long j_util, j_max;
 		s64 delta_ns;
 
-		if (j == smp_processor_id())
+		j_sg_cpu = &per_cpu(sugov_cpu, j);
+		if (j_sg_cpu == sg_cpu)
 			continue;
 
-		j_sg_cpu = &per_cpu(sugov_cpu, j);
 		/*
 		 * If the CPU utilization was last updated before the previous
 		 * frequency update and the time elapsed between the last update
@@ -204,8 +239,8 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
 	sg_cpu->last_update = time;
 
 	if (sugov_should_update_freq(sg_policy, time)) {
-		next_f = sugov_next_freq_shared(sg_policy, util, max);
-		sugov_update_commit(sg_policy, time, next_f);
+		next_f = sugov_next_freq_shared(sg_cpu, util, max);
+		sugov_update_commit(sg_cpu, hook->cpu, time, next_f);
 	}
 
 	raw_spin_unlock(&sg_policy->update_lock);
@@ -226,9 +261,18 @@ static void sugov_work(struct work_struct *work)
 static void sugov_irq_work(struct irq_work *irq_work)
 {
 	struct sugov_policy *sg_policy;
+	struct cpufreq_policy *policy;
 
 	sg_policy = container_of(irq_work, struct sugov_policy, irq_work);
-	schedule_work_on(smp_processor_id(), &sg_policy->work);
+	policy = sg_policy->policy;
+
+	if (policy->fast_switch_enabled) {
+		sugov_fast_switch(sg_policy, smp_processor_id(),
+				  sg_policy->next_freq);
+		sg_policy->work_in_progress = false;
+	} else {
+		schedule_work_on(smp_processor_id(), &sg_policy->work);
+	}
 }
 
 /************************** sysfs interface ************************/
-- 
2.4.10

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 3/5] sched: cpufreq: call cpufreq hook from remote CPUs
  2016-05-09 21:20 [PATCH 0/5] cpufreq: schedutil: improve latency of response Steve Muckle
  2016-05-09 21:20 ` [PATCH 1/5] sched: cpufreq: add cpu to update_util_data Steve Muckle
  2016-05-09 21:20 ` [PATCH 2/5] cpufreq: schedutil: support scheduler cpufreq callbacks on remote CPUs Steve Muckle
@ 2016-05-09 21:20 ` Steve Muckle
  2016-05-18 23:33   ` Rafael J. Wysocki
  2016-05-09 21:20 ` [PATCH 4/5] cpufreq: schedutil: map raw required frequency to CPU-supported frequency Steve Muckle
  2016-05-09 21:20 ` [PATCH 5/5] cpufreq: schedutil: do not update rate limit ts when freq is unchanged Steve Muckle
  4 siblings, 1 reply; 35+ messages in thread
From: Steve Muckle @ 2016-05-09 21:20 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Rafael J. Wysocki
  Cc: linux-kernel, linux-pm, Vincent Guittot, Morten Rasmussen,
	Dietmar Eggemann, Juri Lelli, Patrick Bellasi, Michael Turquette,
	Viresh Kumar, Srinivas Pandruvada, Len Brown

Without calling the cpufreq hook for a remote wakeup it is possible
for such a wakeup to go unnoticed by cpufreq on the target CPU for up
to a full tick. This can occur if the target CPU is running a
CPU-bound task and preemption does not occur. If preemption does occur
then the scheduler is expected to run soon on the target CPU anyway so
invoking the cpufreq hook on the remote wakeup is not required.

In order to avoid unnecessary interprocessor communication in the
governor for the preemption case, the existing hook (which happens
before preemption is decided) is only called when the target CPU
is within the current CPU's cpufreq policy. A new hook is added in
check_preempt_curr() to handle events outside the current CPU's
cpufreq policy where preemption does not happen.

Some governors may opt to not receive remote CPU callbacks. This
behavior is possible by providing NULL as the new policy_cpus
parameter to cpufreq_add_update_util_hook(). Callbacks will only be
issued in this case when the target CPU and the current CPU are the
same. Otherwise policy_cpus is used to determine what is a local
vs. remote callback.

Signed-off-by: Steve Muckle <smuckle@linaro.org>
---
 drivers/cpufreq/cpufreq_governor.c |   2 +-
 drivers/cpufreq/intel_pstate.c     |   2 +-
 include/linux/sched.h              |   4 +-
 kernel/sched/core.c                |   4 ++
 kernel/sched/cpufreq.c             |  13 ++++-
 kernel/sched/cpufreq_schedutil.c   |   6 ++-
 kernel/sched/fair.c                |  40 +++++++-------
 kernel/sched/sched.h               | 106 +++++++++++++++++++++++++++++--------
 8 files changed, 127 insertions(+), 50 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index 20f0a4e114d1..e127a7a22177 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -311,7 +311,7 @@ static void gov_set_update_util(struct policy_dbs_info *policy_dbs,
 		struct cpu_dbs_info *cdbs = &per_cpu(cpu_dbs, cpu);
 
 		cpufreq_add_update_util_hook(cpu, &cdbs->update_util,
-					     dbs_update_util_handler);
+					     dbs_update_util_handler, NULL);
 	}
 }
 
diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index 6c7cff13f0ed..9cf262ef23af 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -1266,7 +1266,7 @@ static void intel_pstate_set_update_util_hook(unsigned int cpu_num)
 	/* Prevent intel_pstate_update_util() from using stale data. */
 	cpu->sample.time = 0;
 	cpufreq_add_update_util_hook(cpu_num, &cpu->update_util,
-				     intel_pstate_update_util);
+				     intel_pstate_update_util, NULL);
 }
 
 static void intel_pstate_clear_update_util_hook(unsigned int cpu)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 81aba7dc5966..ce154518119a 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -3239,11 +3239,13 @@ struct update_util_data {
 	void (*func)(struct update_util_data *data,
 		     u64 time, unsigned long util, unsigned long max);
 	int cpu;
+	cpumask_var_t *policy_cpus;
 };
 
 void cpufreq_add_update_util_hook(int cpu, struct update_util_data *data,
 			void (*func)(struct update_util_data *data, u64 time,
-				     unsigned long util, unsigned long max));
+				     unsigned long util, unsigned long max),
+				  cpumask_var_t *policy_cpus);
 void cpufreq_remove_update_util_hook(int cpu);
 #endif /* CONFIG_CPU_FREQ */
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8b489fcac37b..fce6c0b43231 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -450,6 +450,8 @@ void resched_curr(struct rq *rq)
 
 	lockdep_assert_held(&rq->lock);
 
+	cpufreq_set_skip_cb(rq);
+
 	if (test_tsk_need_resched(curr))
 		return;
 
@@ -922,6 +924,8 @@ void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags)
 	 */
 	if (task_on_rq_queued(rq->curr) && test_tsk_need_resched(rq->curr))
 		rq_clock_skip_update(rq, true);
+
+	cpufreq_update_remote(rq);
 }
 
 #ifdef CONFIG_SMP
diff --git a/kernel/sched/cpufreq.c b/kernel/sched/cpufreq.c
index d88a78ea805d..2946d2096bf2 100644
--- a/kernel/sched/cpufreq.c
+++ b/kernel/sched/cpufreq.c
@@ -18,6 +18,7 @@ DEFINE_PER_CPU(struct update_util_data *, cpufreq_update_util_data);
  * @cpu: The CPU to set the pointer for.
  * @data: New pointer value.
  * @func: Callback function to set for the CPU.
+ * @policy_cpus: Pointer to cpumask for CPUs which share the same policy.
  *
  * Set and publish the update_util_data pointer for the given CPU.
  *
@@ -28,12 +29,21 @@ DEFINE_PER_CPU(struct update_util_data *, cpufreq_update_util_data);
  * passed to it as the first argument which allows the function to get to the
  * target update_util_data structure and its container.
  *
+ * If the callback function is designed to be run from CPUs outside the policy
+ * as well as those inside then the policy_cpus pointer should be set. This will
+ * cause these remote callbacks to be run as long as the associated scheduler
+ * event does not trigger preemption. If preemption does occur then it is
+ * assumed that the callback will happen soon enough on the target CPU as a
+ * result of the preemption scheduler activity there. If policy_cpus is set to
+ * NULL, then callbacks will only occur if the target CPU is the current CPU.
+ *
  * The update_util_data pointer of @cpu must be NULL when this function is
  * called or it will WARN() and return with no effect.
  */
 void cpufreq_add_update_util_hook(int cpu, struct update_util_data *data,
 			void (*func)(struct update_util_data *data, u64 time,
-				     unsigned long util, unsigned long max))
+				     unsigned long util, unsigned long max),
+				  cpumask_var_t *policy_cpus)
 {
 	if (WARN_ON(!data || !func))
 		return;
@@ -43,6 +53,7 @@ void cpufreq_add_update_util_hook(int cpu, struct update_util_data *data,
 
 	data->func = func;
 	data->cpu = cpu;
+	data->policy_cpus = policy_cpus;
 	rcu_assign_pointer(per_cpu(cpufreq_update_util_data, cpu), data);
 }
 EXPORT_SYMBOL_GPL(cpufreq_add_update_util_hook);
diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index c81f9432f520..6cb2ecc204ec 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -477,10 +477,12 @@ static int sugov_start(struct cpufreq_policy *policy)
 			sg_cpu->max = 0;
 			sg_cpu->last_update = 0;
 			cpufreq_add_update_util_hook(cpu, &sg_cpu->update_util,
-						     sugov_update_shared);
+						     sugov_update_shared,
+						     &policy->cpus);
 		} else {
 			cpufreq_add_update_util_hook(cpu, &sg_cpu->update_util,
-						     sugov_update_single);
+						     sugov_update_single,
+						     &policy->cpus);
 		}
 	}
 	return 0;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2bcc54bd10a7..2b7179cc7063 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2824,30 +2824,26 @@ static inline u64 cfs_rq_clock_task(struct cfs_rq *cfs_rq);
 static inline void cfs_rq_util_change(struct cfs_rq *cfs_rq)
 {
 	struct rq *rq = rq_of(cfs_rq);
-	int cpu = cpu_of(rq);
 
-	if (cpu == smp_processor_id() && &rq->cfs == cfs_rq) {
-		unsigned long max = rq->cpu_capacity_orig;
+	if (&rq->cfs != cfs_rq)
+		return;
 
-		/*
-		 * There are a few boundary cases this might miss but it should
-		 * get called often enough that that should (hopefully) not be
-		 * a real problem -- added to that it only calls on the local
-		 * CPU, so if we enqueue remotely we'll miss an update, but
-		 * the next tick/schedule should update.
-		 *
-		 * It will not get called when we go idle, because the idle
-		 * thread is a different class (!fair), nor will the utilization
-		 * number include things like RT tasks.
-		 *
-		 * As is, the util number is not freq-invariant (we'd have to
-		 * implement arch_scale_freq_capacity() for that).
-		 *
-		 * See cpu_util().
-		 */
-		cpufreq_update_util(rq_clock(rq),
-				    min(cfs_rq->avg.util_avg, max), max);
-	}
+	/*
+	 * There are a few boundary cases this might miss but it should
+	 * get called often enough that that should (hopefully) not be
+	 * a real problem. There is also a call to the hook after preemption
+	 * is checked.
+	 *
+	 * It will not get called when we go idle, because the idle
+	 * thread is a different class (!fair), nor will the utilization
+	 * number include things like RT tasks.
+	 *
+	 * As is, the util number is not freq-invariant (we'd have to
+	 * implement arch_scale_freq_capacity() for that).
+	 *
+	 * See cpu_util().
+	 */
+	cpufreq_update_util(rq);
 }
 
 /* Group cfs_rq's load_avg is used for task_h_load and update_cfs_share */
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 921d6e5d33b7..0ee080e791b9 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -702,6 +702,10 @@ struct rq {
 	/* Must be inspected within a rcu lock section */
 	struct cpuidle_state *idle_state;
 #endif
+
+#if defined(CONFIG_SMP) && defined(CONFIG_CPU_FREQ)
+	bool cpufreq_skip_cb;
+#endif
 };
 
 static inline int cpu_of(struct rq *rq)
@@ -1798,26 +1802,6 @@ static inline u64 irq_time_read(int cpu)
 DECLARE_PER_CPU(struct update_util_data *, cpufreq_update_util_data);
 
 /**
- * cpufreq_update_util - Take a note about CPU utilization changes.
- * @time: Current time.
- * @util: Current utilization.
- * @max: Utilization ceiling.
- *
- * This function is called by the scheduler on every invocation of
- * update_load_avg() on the CPU whose utilization is being updated.
- *
- * It can only be called from RCU-sched read-side critical sections.
- */
-static inline void cpufreq_update_util(u64 time, unsigned long util, unsigned long max)
-{
-       struct update_util_data *data;
-
-       data = rcu_dereference_sched(*this_cpu_ptr(&cpufreq_update_util_data));
-       if (data)
-               data->func(data, time, util, max);
-}
-
-/**
  * cpufreq_trigger_update - Trigger CPU performance state evaluation if needed.
  * @time: Current time.
  *
@@ -1835,13 +1819,91 @@ static inline void cpufreq_update_util(u64 time, unsigned long util, unsigned lo
  */
 static inline void cpufreq_trigger_update(u64 time)
 {
-	cpufreq_update_util(time, ULONG_MAX, 0);
+	struct update_util_data *data;
+
+	data = rcu_dereference_sched(*this_cpu_ptr(&cpufreq_update_util_data));
+	if (data)
+		data->func(data, time, ULONG_MAX, 0);
 }
+
 #else
-static inline void cpufreq_update_util(u64 time, unsigned long util, unsigned long max) {}
 static inline void cpufreq_trigger_update(u64 time) {}
 #endif /* CONFIG_CPU_FREQ */
 
+#if defined(CONFIG_CPU_FREQ) && defined(CONFIG_SMP)
+/**
+ * cpufreq_update_util - Take a note about CPU utilization changes.
+ * @rq: Runqueue of CPU to be updated.
+ *
+ * This function is called during scheduler events which cause a CPU's root
+ * cfs_rq utilization to be updated.
+*
+ * It can only be called from RCU-sched read-side critical sections.
+ */
+static inline void cpufreq_update_util(struct rq *rq)
+{
+	struct update_util_data *data;
+	unsigned long max;
+
+	data = rcu_dereference_sched(*this_cpu_ptr(&cpufreq_update_util_data));
+	if (!data)
+		return;
+
+	if (!data->policy_cpus && cpu_of(rq) != smp_processor_id())
+		return;
+
+	if (data->policy_cpus &&
+	    !cpumask_test_cpu(smp_processor_id(), *data->policy_cpus))
+		return;
+
+	max = rq->cpu_capacity_orig;
+	data->func(data, rq_clock(rq), min(rq->cfs.avg.util_avg, max), max);
+}
+
+/**
+ * cpufreq_update_remote - Process callbacks to CPUs in remote policies.
+ * @rq: Target runqueue.
+ *
+ * Remote cpufreq callbacks must be processed after preemption has been decided
+ * so that unnecessary IPIs may be avoided. Cpufreq callbacks to CPUs within the
+ * local policy are handled earlier.
+ */
+static inline void cpufreq_update_remote(struct rq *rq)
+{
+	struct update_util_data *data;
+	unsigned long max;
+	int cpu = smp_processor_id();
+	int target = cpu_of(rq);
+
+	if (rq->cpufreq_skip_cb) {
+		rq->cpufreq_skip_cb = false;
+		return;
+	}
+
+	if (target == cpu)
+		return;
+
+	data = rcu_dereference_sched(per_cpu(cpufreq_update_util_data, target));
+	if (!data || !data->policy_cpus)
+		return;
+
+	if (cpumask_test_cpu(cpu, *data->policy_cpus))
+		return;
+
+	max = rq->cpu_capacity_orig;
+	data->func(data, rq_clock(rq), min(rq->cfs.avg.util_avg, max), max);
+}
+
+static inline void cpufreq_set_skip_cb(struct rq *rq)
+{
+	rq->cpufreq_skip_cb = true;
+}
+#else
+static inline void cpufreq_update_util(struct rq *rq) {}
+static inline void cpufreq_update_remote(struct rq *rq) {}
+static inline void cpufreq_set_skip_cb(struct rq *rq) {}
+#endif /* CONFIG_SMP && CONFIG_CPU_FREQ */
+
 #ifdef arch_scale_freq_capacity
 #ifndef arch_scale_freq_invariant
 #define arch_scale_freq_invariant()	(true)
-- 
2.4.10

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 4/5] cpufreq: schedutil: map raw required frequency to CPU-supported frequency
  2016-05-09 21:20 [PATCH 0/5] cpufreq: schedutil: improve latency of response Steve Muckle
                   ` (2 preceding siblings ...)
  2016-05-09 21:20 ` [PATCH 3/5] sched: cpufreq: call cpufreq hook from " Steve Muckle
@ 2016-05-09 21:20 ` Steve Muckle
  2016-05-18 23:37   ` Rafael J. Wysocki
  2016-05-09 21:20 ` [PATCH 5/5] cpufreq: schedutil: do not update rate limit ts when freq is unchanged Steve Muckle
  4 siblings, 1 reply; 35+ messages in thread
From: Steve Muckle @ 2016-05-09 21:20 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Rafael J. Wysocki
  Cc: linux-kernel, linux-pm, Vincent Guittot, Morten Rasmussen,
	Dietmar Eggemann, Juri Lelli, Patrick Bellasi, Michael Turquette,
	Viresh Kumar, Srinivas Pandruvada, Len Brown

The mechanisms for remote CPU updates and slow-path frequency
transitions are relatively expensive - the former is an IPI while the
latter requires waking up a thread to do work. These activities should
be avoided if they are not necessary. To that end, calculate the
actual target-supported frequency required by the new utilization
value in schedutil. If it is the same as the previously requested
frequency then there is no need to continue with the update.

Signed-off-by: Steve Muckle <smuckle@linaro.org>
---
 kernel/sched/cpufreq_schedutil.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 6cb2ecc204ec..e185075fcb5c 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -153,14 +153,26 @@ static void sugov_update_commit(struct sugov_cpu *sg_cpu, int cpu, u64 time,
  * next_freq = C * curr_freq * util_raw / max
  *
  * Take C = 1.25 for the frequency tipping point at (util / max) = 0.8.
+ *
+ * The lowest target-supported frequency which is equal or greater than the raw
+ * next_freq (as calculated above) is returned, or the CPU's max_freq if such
+ * a target-supported frequency does not exist.
  */
 static unsigned int get_next_freq(struct cpufreq_policy *policy,
 				  unsigned long util, unsigned long max)
 {
+	struct cpufreq_frequency_table *entry;
 	unsigned int freq = arch_scale_freq_invariant() ?
 				policy->cpuinfo.max_freq : policy->cur;
+	unsigned int target_freq = UINT_MAX;
+
+	freq = (freq + (freq >> 2)) * util / max;
+
+	cpufreq_for_each_valid_entry(entry, policy->freq_table)
+		if (entry->frequency >= freq && entry->frequency < target_freq)
+			target_freq = entry->frequency;
 
-	return (freq + (freq >> 2)) * util / max;
+	return target_freq != UINT_MAX ? target_freq : policy->cpuinfo.max_freq;
 }
 
 static void sugov_update_single(struct update_util_data *hook, u64 time,
-- 
2.4.10

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 5/5] cpufreq: schedutil: do not update rate limit ts when freq is unchanged
  2016-05-09 21:20 [PATCH 0/5] cpufreq: schedutil: improve latency of response Steve Muckle
                   ` (3 preceding siblings ...)
  2016-05-09 21:20 ` [PATCH 4/5] cpufreq: schedutil: map raw required frequency to CPU-supported frequency Steve Muckle
@ 2016-05-09 21:20 ` Steve Muckle
  2016-05-18 23:44   ` Rafael J. Wysocki
  4 siblings, 1 reply; 35+ messages in thread
From: Steve Muckle @ 2016-05-09 21:20 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Rafael J. Wysocki
  Cc: linux-kernel, linux-pm, Vincent Guittot, Morten Rasmussen,
	Dietmar Eggemann, Juri Lelli, Patrick Bellasi, Michael Turquette,
	Viresh Kumar, Srinivas Pandruvada, Len Brown

The rate limit timestamp (last_freq_update_time) is currently advanced
anytime schedutil re-evaluates the policy regardless of whether the CPU
frequency is changed or not. This means that utilization updates which
have no effect can cause much more significant utilization updates
(which require a large increase or decrease in CPU frequency) to be
delayed due to rate limiting.

Instead only update the rate limiting timstamp when the requested
target-supported frequency changes. The rate limit will now apply to
the rate of CPU frequency changes rather than the rate of
re-evaluations of the policy frequency.

Signed-off-by: Steve Muckle <smuckle@linaro.org>
---
 kernel/sched/cpufreq_schedutil.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index e185075fcb5c..4d2907c8a142 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -117,12 +117,11 @@ static void sugov_update_commit(struct sugov_cpu *sg_cpu, int cpu, u64 time,
 	struct sugov_policy *sg_policy = sg_cpu->sg_policy;
 	struct cpufreq_policy *policy = sg_policy->policy;
 
-	sg_policy->last_freq_update_time = time;
-
 	if (sg_policy->next_freq == next_freq) {
 		trace_cpu_frequency(policy->cur, cpu);
 		return;
 	}
+	sg_policy->last_freq_update_time = time;
 	sg_policy->next_freq = next_freq;
 
 	if (sugov_queue_remote_callback(sg_policy, cpu))
-- 
2.4.10

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/5] sched: cpufreq: add cpu to update_util_data
  2016-05-09 21:20 ` [PATCH 1/5] sched: cpufreq: add cpu to update_util_data Steve Muckle
@ 2016-05-18 23:13   ` Rafael J. Wysocki
  0 siblings, 0 replies; 35+ messages in thread
From: Rafael J. Wysocki @ 2016-05-18 23:13 UTC (permalink / raw)
  To: Steve Muckle
  Cc: Peter Zijlstra, Ingo Molnar, Rafael J. Wysocki,
	Linux Kernel Mailing List, linux-pm, Vincent Guittot,
	Morten Rasmussen, Dietmar Eggemann, Juri Lelli, Patrick Bellasi,
	Michael Turquette, Viresh Kumar, Srinivas Pandruvada, Len Brown

On Mon, May 9, 2016 at 11:20 PM, Steve Muckle <steve.muckle@linaro.org> wrote:
> Upcoming support for scheduler cpufreq callbacks on remote wakeups
> will require the client to know what the target CPU is that the
> callback is being invoked for. Add this information into the callback
> data structure.
>
> Signed-off-by: Steve Muckle <smuckle@linaro.org>

LGTM

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/5] cpufreq: schedutil: support scheduler cpufreq callbacks on remote CPUs
  2016-05-09 21:20 ` [PATCH 2/5] cpufreq: schedutil: support scheduler cpufreq callbacks on remote CPUs Steve Muckle
@ 2016-05-18 23:24   ` Rafael J. Wysocki
  2016-05-19 18:40     ` Steve Muckle
  0 siblings, 1 reply; 35+ messages in thread
From: Rafael J. Wysocki @ 2016-05-18 23:24 UTC (permalink / raw)
  To: Steve Muckle
  Cc: Peter Zijlstra, Ingo Molnar, Rafael J. Wysocki,
	Linux Kernel Mailing List, linux-pm, Vincent Guittot,
	Morten Rasmussen, Dietmar Eggemann, Juri Lelli, Patrick Bellasi,
	Michael Turquette, Viresh Kumar, Srinivas Pandruvada, Len Brown

On Mon, May 9, 2016 at 11:20 PM, Steve Muckle <steve.muckle@linaro.org> wrote:
> In preparation for the scheduler cpufreq callback happening on remote
> CPUs, add support for this in schedutil.
>
> Schedutil currently requires the callback occur on the CPU being
> updated in order to support fast frequency switches. Remove this
> limitation by checking for the current CPU being outside the target
> CPU's cpufreq policy and if this is the case, enqueuing an irq_work on
> the target CPU. The irq_work for schedutil is modified to carry out a
> fast frequency switch if that is enabled for the policy.
>
> If the callback occurs on a CPU within the target CPU's policy, the
> transition is carried out on the local CPU.
>
> Signed-off-by: Steve Muckle <smuckle@linaro.org>
> ---
>  kernel/sched/cpufreq_schedutil.c | 86 ++++++++++++++++++++++++++++++----------
>  1 file changed, 65 insertions(+), 21 deletions(-)
>
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index 154ae3a51e86..c81f9432f520 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -76,27 +76,61 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
>         return delta_ns >= sg_policy->freq_update_delay_ns;
>  }
>
> -static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time,
> +static void sugov_fast_switch(struct sugov_policy *sg_policy, int cpu,
> +                             unsigned int next_freq)
> +{
> +       struct cpufreq_policy *policy = sg_policy->policy;
> +
> +       next_freq = cpufreq_driver_fast_switch(policy, next_freq);
> +       if (next_freq == CPUFREQ_ENTRY_INVALID)
> +               return;
> +
> +       policy->cur = next_freq;
> +       trace_cpu_frequency(next_freq, cpu);
> +}
> +
> +#ifdef CONFIG_SMP

schedutil depends on CONFIG_SMP now, so that's not necessary at least
for the time being.

> +static inline bool sugov_queue_remote_callback(struct sugov_policy *sg_policy,
> +                                        int cpu)
> +{
> +       struct cpufreq_policy *policy = sg_policy->policy;
> +
> +       if (!cpumask_test_cpu(smp_processor_id(), policy->cpus)) {

This check is overkill for policies that aren't shared (and we have a
special case for them already).

> +               sg_policy->work_in_progress = true;
> +               irq_work_queue_on(&sg_policy->irq_work, cpu);
> +               return true;
> +       }
> +
> +       return false;
> +}
> +#else
> +static inline bool sugov_queue_remote_callback(struct sugov_policy *sg_policy,
> +                                        int cpu)
> +{
> +       return false;
> +}
> +#endif
> +
> +static void sugov_update_commit(struct sugov_cpu *sg_cpu, int cpu, u64 time,

It looks like you might pass hook here instead of the sg_cpu, cpu pair.

>                                 unsigned int next_freq)
>  {
> +       struct sugov_policy *sg_policy = sg_cpu->sg_policy;
>         struct cpufreq_policy *policy = sg_policy->policy;
>
>         sg_policy->last_freq_update_time = time;
>
> +       if (sg_policy->next_freq == next_freq) {
> +               trace_cpu_frequency(policy->cur, cpu);
> +               return;
> +       }

There was a reason why I put the above under the fast_switch_enabled
branch and it was because this check/trace is not necessary otherwise.

> +       sg_policy->next_freq = next_freq;
> +
> +       if (sugov_queue_remote_callback(sg_policy, cpu))
> +               return;
> +
>         if (policy->fast_switch_enabled) {
> -               if (sg_policy->next_freq == next_freq) {
> -                       trace_cpu_frequency(policy->cur, smp_processor_id());
> -                       return;
> -               }
> -               sg_policy->next_freq = next_freq;
> -               next_freq = cpufreq_driver_fast_switch(policy, next_freq);
> -               if (next_freq == CPUFREQ_ENTRY_INVALID)
> -                       return;
> -
> -               policy->cur = next_freq;
> -               trace_cpu_frequency(next_freq, smp_processor_id());
> -       } else if (sg_policy->next_freq != next_freq) {
> -               sg_policy->next_freq = next_freq;
> +               sugov_fast_switch(sg_policy, cpu, next_freq);
> +       } else {
>                 sg_policy->work_in_progress = true;
>                 irq_work_queue(&sg_policy->irq_work);
>         }
> @@ -142,12 +176,13 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
>
>         next_f = util == ULONG_MAX ? policy->cpuinfo.max_freq :
>                         get_next_freq(policy, util, max);
> -       sugov_update_commit(sg_policy, time, next_f);
> +       sugov_update_commit(sg_cpu, hook->cpu, time, next_f);
>  }
>
> -static unsigned int sugov_next_freq_shared(struct sugov_policy *sg_policy,
> +static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu,
>                                            unsigned long util, unsigned long max)
>  {
> +       struct sugov_policy *sg_policy = sg_cpu->sg_policy;
>         struct cpufreq_policy *policy = sg_policy->policy;
>         unsigned int max_f = policy->cpuinfo.max_freq;
>         u64 last_freq_update_time = sg_policy->last_freq_update_time;
> @@ -161,10 +196,10 @@ static unsigned int sugov_next_freq_shared(struct sugov_policy *sg_policy,
>                 unsigned long j_util, j_max;
>                 s64 delta_ns;
>
> -               if (j == smp_processor_id())
> +               j_sg_cpu = &per_cpu(sugov_cpu, j);
> +               if (j_sg_cpu == sg_cpu)
>                         continue;
>
> -               j_sg_cpu = &per_cpu(sugov_cpu, j);
>                 /*
>                  * If the CPU utilization was last updated before the previous
>                  * frequency update and the time elapsed between the last update
> @@ -204,8 +239,8 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
>         sg_cpu->last_update = time;
>
>         if (sugov_should_update_freq(sg_policy, time)) {
> -               next_f = sugov_next_freq_shared(sg_policy, util, max);
> -               sugov_update_commit(sg_policy, time, next_f);
> +               next_f = sugov_next_freq_shared(sg_cpu, util, max);
> +               sugov_update_commit(sg_cpu, hook->cpu, time, next_f);
>         }
>
>         raw_spin_unlock(&sg_policy->update_lock);
> @@ -226,9 +261,18 @@ static void sugov_work(struct work_struct *work)
>  static void sugov_irq_work(struct irq_work *irq_work)
>  {
>         struct sugov_policy *sg_policy;
> +       struct cpufreq_policy *policy;
>
>         sg_policy = container_of(irq_work, struct sugov_policy, irq_work);
> -       schedule_work_on(smp_processor_id(), &sg_policy->work);
> +       policy = sg_policy->policy;
> +
> +       if (policy->fast_switch_enabled) {
> +               sugov_fast_switch(sg_policy, smp_processor_id(),
> +                                 sg_policy->next_freq);
> +               sg_policy->work_in_progress = false;
> +       } else {
> +               schedule_work_on(smp_processor_id(), &sg_policy->work);
> +       }
>  }
>
>  /************************** sysfs interface ************************/
> --
> 2.4.10
>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 3/5] sched: cpufreq: call cpufreq hook from remote CPUs
  2016-05-09 21:20 ` [PATCH 3/5] sched: cpufreq: call cpufreq hook from " Steve Muckle
@ 2016-05-18 23:33   ` Rafael J. Wysocki
  2016-05-19 12:00     ` Rafael J. Wysocki
  0 siblings, 1 reply; 35+ messages in thread
From: Rafael J. Wysocki @ 2016-05-18 23:33 UTC (permalink / raw)
  To: Steve Muckle
  Cc: Peter Zijlstra, Ingo Molnar, Rafael J. Wysocki,
	Linux Kernel Mailing List, linux-pm, Vincent Guittot,
	Morten Rasmussen, Dietmar Eggemann, Juri Lelli, Patrick Bellasi,
	Michael Turquette, Viresh Kumar, Srinivas Pandruvada, Len Brown

On Mon, May 9, 2016 at 11:20 PM, Steve Muckle <steve.muckle@linaro.org> wrote:
> Without calling the cpufreq hook for a remote wakeup it is possible
> for such a wakeup to go unnoticed by cpufreq on the target CPU for up
> to a full tick. This can occur if the target CPU is running a
> CPU-bound task and preemption does not occur. If preemption does occur
> then the scheduler is expected to run soon on the target CPU anyway so
> invoking the cpufreq hook on the remote wakeup is not required.
>
> In order to avoid unnecessary interprocessor communication in the
> governor for the preemption case, the existing hook (which happens
> before preemption is decided) is only called when the target CPU
> is within the current CPU's cpufreq policy. A new hook is added in
> check_preempt_curr() to handle events outside the current CPU's
> cpufreq policy where preemption does not happen.
>
> Some governors may opt to not receive remote CPU callbacks. This
> behavior is possible by providing NULL as the new policy_cpus
> parameter to cpufreq_add_update_util_hook(). Callbacks will only be
> issued in this case when the target CPU and the current CPU are the
> same. Otherwise policy_cpus is used to determine what is a local
> vs. remote callback.

I don't really like the extra complexity added by this patch.

It makes the code look fragile at some places and it only really
matters for schedutil and for the fast switch case in there.

Overall, it looks like a premature optimization to me.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 4/5] cpufreq: schedutil: map raw required frequency to CPU-supported frequency
  2016-05-09 21:20 ` [PATCH 4/5] cpufreq: schedutil: map raw required frequency to CPU-supported frequency Steve Muckle
@ 2016-05-18 23:37   ` Rafael J. Wysocki
  2016-05-19 19:35     ` Steve Muckle
  0 siblings, 1 reply; 35+ messages in thread
From: Rafael J. Wysocki @ 2016-05-18 23:37 UTC (permalink / raw)
  To: Steve Muckle
  Cc: Peter Zijlstra, Ingo Molnar, Rafael J. Wysocki,
	Linux Kernel Mailing List, linux-pm, Vincent Guittot,
	Morten Rasmussen, Dietmar Eggemann, Juri Lelli, Patrick Bellasi,
	Michael Turquette, Viresh Kumar, Srinivas Pandruvada, Len Brown

On Mon, May 9, 2016 at 11:20 PM, Steve Muckle <steve.muckle@linaro.org> wrote:
> The mechanisms for remote CPU updates and slow-path frequency
> transitions are relatively expensive - the former is an IPI while the
> latter requires waking up a thread to do work. These activities should
> be avoided if they are not necessary. To that end, calculate the
> actual target-supported frequency required by the new utilization
> value in schedutil. If it is the same as the previously requested
> frequency then there is no need to continue with the update.

Unless the max/min limits changed in the meantime, right?

>
> Signed-off-by: Steve Muckle <smuckle@linaro.org>
> ---
>  kernel/sched/cpufreq_schedutil.c | 14 +++++++++++++-
>  1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index 6cb2ecc204ec..e185075fcb5c 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -153,14 +153,26 @@ static void sugov_update_commit(struct sugov_cpu *sg_cpu, int cpu, u64 time,
>   * next_freq = C * curr_freq * util_raw / max
>   *
>   * Take C = 1.25 for the frequency tipping point at (util / max) = 0.8.
> + *
> + * The lowest target-supported frequency which is equal or greater than the raw
> + * next_freq (as calculated above) is returned, or the CPU's max_freq if such
> + * a target-supported frequency does not exist.
>   */
>  static unsigned int get_next_freq(struct cpufreq_policy *policy,
>                                   unsigned long util, unsigned long max)
>  {
> +       struct cpufreq_frequency_table *entry;
>         unsigned int freq = arch_scale_freq_invariant() ?
>                                 policy->cpuinfo.max_freq : policy->cur;
> +       unsigned int target_freq = UINT_MAX;
> +
> +       freq = (freq + (freq >> 2)) * util / max;
> +
> +       cpufreq_for_each_valid_entry(entry, policy->freq_table)
> +               if (entry->frequency >= freq && entry->frequency < target_freq)
> +                       target_freq = entry->frequency;

Please don't assume that every driver will have a frequency table.
That may not be the case in the future (and I'm not even sure about
the existing CPPC driver for that matter).

>
> -       return (freq + (freq >> 2)) * util / max;
> +       return target_freq != UINT_MAX ? target_freq : policy->cpuinfo.max_freq;
>  }
>
>  static void sugov_update_single(struct update_util_data *hook, u64 time,
> --
> 2.4.10
>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 5/5] cpufreq: schedutil: do not update rate limit ts when freq is unchanged
  2016-05-09 21:20 ` [PATCH 5/5] cpufreq: schedutil: do not update rate limit ts when freq is unchanged Steve Muckle
@ 2016-05-18 23:44   ` Rafael J. Wysocki
  2016-05-19 19:46     ` Steve Muckle
  0 siblings, 1 reply; 35+ messages in thread
From: Rafael J. Wysocki @ 2016-05-18 23:44 UTC (permalink / raw)
  To: Steve Muckle
  Cc: Peter Zijlstra, Ingo Molnar, Rafael J. Wysocki,
	Linux Kernel Mailing List, linux-pm, Vincent Guittot,
	Morten Rasmussen, Dietmar Eggemann, Juri Lelli, Patrick Bellasi,
	Michael Turquette, Viresh Kumar, Srinivas Pandruvada, Len Brown

On Mon, May 9, 2016 at 11:20 PM, Steve Muckle <steve.muckle@linaro.org> wrote:
> The rate limit timestamp (last_freq_update_time) is currently advanced
> anytime schedutil re-evaluates the policy regardless of whether the CPU
> frequency is changed or not. This means that utilization updates which
> have no effect can cause much more significant utilization updates
> (which require a large increase or decrease in CPU frequency) to be
> delayed due to rate limiting.
>
> Instead only update the rate limiting timstamp when the requested
> target-supported frequency changes. The rate limit will now apply to
> the rate of CPU frequency changes rather than the rate of
> re-evaluations of the policy frequency.
>
> Signed-off-by: Steve Muckle <smuckle@linaro.org>

I'm sort of divided here to be honest.

> ---
>  kernel/sched/cpufreq_schedutil.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index e185075fcb5c..4d2907c8a142 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -117,12 +117,11 @@ static void sugov_update_commit(struct sugov_cpu *sg_cpu, int cpu, u64 time,
>         struct sugov_policy *sg_policy = sg_cpu->sg_policy;
>         struct cpufreq_policy *policy = sg_policy->policy;
>
> -       sg_policy->last_freq_update_time = time;
> -
>         if (sg_policy->next_freq == next_freq) {
>                 trace_cpu_frequency(policy->cur, cpu);

You should at least rate limit the trace_cpu_frequency() thing here if
you don't want to advance the last update time I think, or you may
easily end up with the trace buffer flooded by irrelevant stuff.

>                 return;
>         }
> +       sg_policy->last_freq_update_time = time;
>         sg_policy->next_freq = next_freq;
>
>         if (sugov_queue_remote_callback(sg_policy, cpu))
> --
> 2.4.10
>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 3/5] sched: cpufreq: call cpufreq hook from remote CPUs
  2016-05-18 23:33   ` Rafael J. Wysocki
@ 2016-05-19 12:00     ` Rafael J. Wysocki
  2016-05-19 19:19       ` Steve Muckle
  0 siblings, 1 reply; 35+ messages in thread
From: Rafael J. Wysocki @ 2016-05-19 12:00 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Steve Muckle, Peter Zijlstra, Ingo Molnar,
	Linux Kernel Mailing List, linux-pm, Vincent Guittot,
	Morten Rasmussen, Dietmar Eggemann, Juri Lelli, Patrick Bellasi,
	Michael Turquette, Viresh Kumar, Srinivas Pandruvada, Len Brown

On Thu, May 19, 2016 at 1:33 AM, Rafael J. Wysocki <rafael@kernel.org> wrote:
> On Mon, May 9, 2016 at 11:20 PM, Steve Muckle <steve.muckle@linaro.org> wrote:
>> Without calling the cpufreq hook for a remote wakeup it is possible
>> for such a wakeup to go unnoticed by cpufreq on the target CPU for up
>> to a full tick. This can occur if the target CPU is running a
>> CPU-bound task and preemption does not occur. If preemption does occur
>> then the scheduler is expected to run soon on the target CPU anyway so
>> invoking the cpufreq hook on the remote wakeup is not required.
>>
>> In order to avoid unnecessary interprocessor communication in the
>> governor for the preemption case, the existing hook (which happens
>> before preemption is decided) is only called when the target CPU
>> is within the current CPU's cpufreq policy. A new hook is added in
>> check_preempt_curr() to handle events outside the current CPU's
>> cpufreq policy where preemption does not happen.
>>
>> Some governors may opt to not receive remote CPU callbacks. This
>> behavior is possible by providing NULL as the new policy_cpus
>> parameter to cpufreq_add_update_util_hook(). Callbacks will only be
>> issued in this case when the target CPU and the current CPU are the
>> same. Otherwise policy_cpus is used to determine what is a local
>> vs. remote callback.
>
> I don't really like the extra complexity added by this patch.
>
> It makes the code look fragile at some places and it only really
> matters for schedutil and for the fast switch case in there.
>
> Overall, it looks like a premature optimization to me.

In particular, the dance with checking the policy CPUs from the
scheduler is questionable (the scheduler might be interested in this
information for other purposes too and hooking it up in an ad-hoc way
just for cpufreq doesn't seem to be appropriate from that perspective)
and also doesn't seem to be necessary.

You can check if the current CPU is a policy one from the governor and
if that is the case, simply run the frequency update on it directly
without any IPI (because if both the target CPU and the current CPU
belong to the same policy, it doesn't matter which one of them will
run the update).

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/5] cpufreq: schedutil: support scheduler cpufreq callbacks on remote CPUs
  2016-05-18 23:24   ` Rafael J. Wysocki
@ 2016-05-19 18:40     ` Steve Muckle
  2016-05-19 20:55       ` Rafael J. Wysocki
  0 siblings, 1 reply; 35+ messages in thread
From: Steve Muckle @ 2016-05-19 18:40 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Steve Muckle, Peter Zijlstra, Ingo Molnar,
	Linux Kernel Mailing List, linux-pm, Vincent Guittot,
	Morten Rasmussen, Dietmar Eggemann, Juri Lelli, Patrick Bellasi,
	Michael Turquette, Viresh Kumar, Srinivas Pandruvada, Len Brown

On Thu, May 19, 2016 at 01:24:41AM +0200, Rafael J. Wysocki wrote:
> On Mon, May 9, 2016 at 11:20 PM, Steve Muckle <steve.muckle@linaro.org> wrote:
> > In preparation for the scheduler cpufreq callback happening on remote
> > CPUs, add support for this in schedutil.
> >
> > Schedutil currently requires the callback occur on the CPU being
> > updated in order to support fast frequency switches. Remove this
> > limitation by checking for the current CPU being outside the target
> > CPU's cpufreq policy and if this is the case, enqueuing an irq_work on
> > the target CPU. The irq_work for schedutil is modified to carry out a
> > fast frequency switch if that is enabled for the policy.
> >
> > If the callback occurs on a CPU within the target CPU's policy, the
> > transition is carried out on the local CPU.
> >
> > Signed-off-by: Steve Muckle <smuckle@linaro.org>
> > ---
> >  kernel/sched/cpufreq_schedutil.c | 86 ++++++++++++++++++++++++++++++----------
> >  1 file changed, 65 insertions(+), 21 deletions(-)
> >
> > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > index 154ae3a51e86..c81f9432f520 100644
> > --- a/kernel/sched/cpufreq_schedutil.c
> > +++ b/kernel/sched/cpufreq_schedutil.c
> > @@ -76,27 +76,61 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
> >         return delta_ns >= sg_policy->freq_update_delay_ns;
> >  }
> >
> > -static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time,
> > +static void sugov_fast_switch(struct sugov_policy *sg_policy, int cpu,
> > +                             unsigned int next_freq)
> > +{
> > +       struct cpufreq_policy *policy = sg_policy->policy;
> > +
> > +       next_freq = cpufreq_driver_fast_switch(policy, next_freq);
> > +       if (next_freq == CPUFREQ_ENTRY_INVALID)
> > +               return;
> > +
> > +       policy->cur = next_freq;
> > +       trace_cpu_frequency(next_freq, cpu);
> > +}
> > +
> > +#ifdef CONFIG_SMP
> 
> schedutil depends on CONFIG_SMP now, so that's not necessary at least
> for the time being.

Will remove.

> 
> > +static inline bool sugov_queue_remote_callback(struct sugov_policy *sg_policy,
> > +                                        int cpu)
> > +{
> > +       struct cpufreq_policy *policy = sg_policy->policy;
> > +
> > +       if (!cpumask_test_cpu(smp_processor_id(), policy->cpus)) {
> 
> This check is overkill for policies that aren't shared (and we have a
> special case for them already).

I don't see why it is overkill - regardless of whether the policy is
shared, we need to determine whether or not we are running on one of the
CPUs (or in the case of a non-shared policy, the single CPU) within that
policy to know whether we can immediately change the frequency in this
context or a remote call is required.

> > +               sg_policy->work_in_progress = true;
> > +               irq_work_queue_on(&sg_policy->irq_work, cpu);
> > +               return true;
> > +       }
> > +
> > +       return false;
> > +}
> > +#else
> > +static inline bool sugov_queue_remote_callback(struct sugov_policy *sg_policy,
> > +                                        int cpu)
> > +{
> > +       return false;
> > +}
> > +#endif
> > +
> > +static void sugov_update_commit(struct sugov_cpu *sg_cpu, int cpu, u64 time,
> 
> It looks like you might pass hook here instead of the sg_cpu, cpu pair.

I can do that but it means having to do the comtainer_of operation
again. Strictly speaking this seems slightly less efficient than passing
the above values which are already available in the callers.

> 
> >                                 unsigned int next_freq)
> >  {
> > +       struct sugov_policy *sg_policy = sg_cpu->sg_policy;
> >         struct cpufreq_policy *policy = sg_policy->policy;
> >
> >         sg_policy->last_freq_update_time = time;
> >
> > +       if (sg_policy->next_freq == next_freq) {
> > +               trace_cpu_frequency(policy->cur, cpu);
> > +               return;
> > +       }
> 
> There was a reason why I put the above under the fast_switch_enabled
> branch and it was because this check/trace is not necessary otherwise.

I remember asking about this tracepoint earlier. You had said it was
required because powertop would not work without it (reporting the CPU
as idle in certain situations).

I'm not sure why that is only true for the fast switch enabled case but
it seems like an odd inconsistency for the governor to trace unchanged
frequencies when fast switches are enabled but not otherwise. It'd be
useful I think for profiling and tuning if the tracing was consistent.

This behavioral change is admittedly not part of the purpose of the
patch and could be split out if needbe.

thanks,
Steve

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 3/5] sched: cpufreq: call cpufreq hook from remote CPUs
  2016-05-19 12:00     ` Rafael J. Wysocki
@ 2016-05-19 19:19       ` Steve Muckle
  2016-05-19 21:06         ` Rafael J. Wysocki
  0 siblings, 1 reply; 35+ messages in thread
From: Steve Muckle @ 2016-05-19 19:19 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Steve Muckle, Peter Zijlstra, Ingo Molnar,
	Linux Kernel Mailing List, linux-pm, Vincent Guittot,
	Morten Rasmussen, Dietmar Eggemann, Juri Lelli, Patrick Bellasi,
	Michael Turquette, Viresh Kumar, Srinivas Pandruvada, Len Brown

On Thu, May 19, 2016 at 02:00:54PM +0200, Rafael J. Wysocki wrote:
> On Thu, May 19, 2016 at 1:33 AM, Rafael J. Wysocki <rafael@kernel.org> wrote:
> > On Mon, May 9, 2016 at 11:20 PM, Steve Muckle <steve.muckle@linaro.org> wrote:
> >> Without calling the cpufreq hook for a remote wakeup it is possible
> >> for such a wakeup to go unnoticed by cpufreq on the target CPU for up
> >> to a full tick. This can occur if the target CPU is running a
> >> CPU-bound task and preemption does not occur. If preemption does occur
> >> then the scheduler is expected to run soon on the target CPU anyway so
> >> invoking the cpufreq hook on the remote wakeup is not required.
> >>
> >> In order to avoid unnecessary interprocessor communication in the
> >> governor for the preemption case, the existing hook (which happens
> >> before preemption is decided) is only called when the target CPU
> >> is within the current CPU's cpufreq policy. A new hook is added in
> >> check_preempt_curr() to handle events outside the current CPU's
> >> cpufreq policy where preemption does not happen.
> >>
> >> Some governors may opt to not receive remote CPU callbacks. This
> >> behavior is possible by providing NULL as the new policy_cpus
> >> parameter to cpufreq_add_update_util_hook(). Callbacks will only be
> >> issued in this case when the target CPU and the current CPU are the
> >> same. Otherwise policy_cpus is used to determine what is a local
> >> vs. remote callback.
> >
> > I don't really like the extra complexity added by this patch.
> >
> > It makes the code look fragile at some places 

Perhaps I can improve these, can you point them out?

> > and it only really matters for schedutil 

True. As we are trying to create an integrated scheduler+CPU frequency
control solution I think some of this is to be expected, and should be
worthwhile since this is hopefully the future and will eventually
replace the need for the other governors.

> > and for the fast switch case in there.

Once there is a high priority context for the slow path I expect it to
benefit from this as well.

> >
> > Overall, it looks like a premature optimization to me.

Are you referring to this new approach of avoiding duplicate IPIs, or
supporting updates on remote wakeups overall?

The duplicate IPI thing is admittedly not required for the problem I'm
looking to solve but I figured at least some people would be concerned
about it.  If folks can live with it for now then I can go back to the
simpler approach I had in my first posting.
 
> In particular, the dance with checking the policy CPUs from the
> scheduler is questionable (the scheduler might be interested in this
> information for other purposes too and hooking it up in an ad-hoc way
> just for cpufreq doesn't seem to be appropriate from that perspective)
> and also doesn't seem to be necessary.
> 
> You can check if the current CPU is a policy one from the governor and
> if that is the case, simply run the frequency update on it directly
> without any IPI (because if both the target CPU and the current CPU
> belong to the same policy, it doesn't matter which one of them will
> run the update).

The checking of policy CPUs from the scheduler was done so as to
minimize the number of calls to the hook, given their expense.

In the case of a remote update the hook has to run (or not) after it is
known whether preemption will occur so we don't do needless work or
IPIs. If the policy CPUs aren't known in the scheduler then the early
hook will always need to be called along with an indication that it is
the early hook being called. If it turns out to be a remote update it
could then be deferred to the later hook, which would only be called
when a remote update has been deferred and preemption has not occurred.

This means two hook inovcations for a remote non-preempting wakeup
though instead of one.  Perhaps this is a good middle ground on code
churn vs. optimization though.

thanks,
Steve

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 4/5] cpufreq: schedutil: map raw required frequency to CPU-supported frequency
  2016-05-18 23:37   ` Rafael J. Wysocki
@ 2016-05-19 19:35     ` Steve Muckle
  2016-05-19 21:07       ` Rafael J. Wysocki
  0 siblings, 1 reply; 35+ messages in thread
From: Steve Muckle @ 2016-05-19 19:35 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Steve Muckle, Peter Zijlstra, Ingo Molnar,
	Linux Kernel Mailing List, linux-pm, Vincent Guittot,
	Morten Rasmussen, Dietmar Eggemann, Juri Lelli, Patrick Bellasi,
	Michael Turquette, Viresh Kumar, Srinivas Pandruvada, Len Brown

On Thu, May 19, 2016 at 01:37:40AM +0200, Rafael J. Wysocki wrote:
> On Mon, May 9, 2016 at 11:20 PM, Steve Muckle <steve.muckle@linaro.org> wrote:
> > The mechanisms for remote CPU updates and slow-path frequency
> > transitions are relatively expensive - the former is an IPI while the
> > latter requires waking up a thread to do work. These activities should
> > be avoided if they are not necessary. To that end, calculate the
> > actual target-supported frequency required by the new utilization
> > value in schedutil. If it is the same as the previously requested
> > frequency then there is no need to continue with the update.
> 
> Unless the max/min limits changed in the meantime, right?

Right, I'll amend the commit text. The functionality is correct AFAICS.

> >
> > Signed-off-by: Steve Muckle <smuckle@linaro.org>
> > ---
> >  kernel/sched/cpufreq_schedutil.c | 14 +++++++++++++-
> >  1 file changed, 13 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > index 6cb2ecc204ec..e185075fcb5c 100644
> > --- a/kernel/sched/cpufreq_schedutil.c
> > +++ b/kernel/sched/cpufreq_schedutil.c
> > @@ -153,14 +153,26 @@ static void sugov_update_commit(struct sugov_cpu *sg_cpu, int cpu, u64 time,
> >   * next_freq = C * curr_freq * util_raw / max
> >   *
> >   * Take C = 1.25 for the frequency tipping point at (util / max) = 0.8.
> > + *
> > + * The lowest target-supported frequency which is equal or greater than the raw
> > + * next_freq (as calculated above) is returned, or the CPU's max_freq if such
> > + * a target-supported frequency does not exist.
> >   */
> >  static unsigned int get_next_freq(struct cpufreq_policy *policy,
> >                                   unsigned long util, unsigned long max)
> >  {
> > +       struct cpufreq_frequency_table *entry;
> >         unsigned int freq = arch_scale_freq_invariant() ?
> >                                 policy->cpuinfo.max_freq : policy->cur;
> > +       unsigned int target_freq = UINT_MAX;
> > +
> > +       freq = (freq + (freq >> 2)) * util / max;
> > +
> > +       cpufreq_for_each_valid_entry(entry, policy->freq_table)
> > +               if (entry->frequency >= freq && entry->frequency < target_freq)
> > +                       target_freq = entry->frequency;
> 
> Please don't assume that every driver will have a frequency table.
> That may not be the case in the future (and I'm not even sure about
> the existing CPPC driver for that matter).

For platforms without a frequency table I guess we can just continue
with the current behavior, passing in the raw calculated frequency. I'll
make this change.

At some point I imagine those platforms will want to somehow achieve
similar behavior to avoid very small transitions that do not result in
real benefit. Maybe some sort of threshold % in the schedutil down the
road.

thanks,
Steve

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 5/5] cpufreq: schedutil: do not update rate limit ts when freq is unchanged
  2016-05-18 23:44   ` Rafael J. Wysocki
@ 2016-05-19 19:46     ` Steve Muckle
  2016-05-19 21:15       ` Rafael J. Wysocki
  0 siblings, 1 reply; 35+ messages in thread
From: Steve Muckle @ 2016-05-19 19:46 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Steve Muckle, Peter Zijlstra, Ingo Molnar,
	Linux Kernel Mailing List, linux-pm, Vincent Guittot,
	Morten Rasmussen, Dietmar Eggemann, Juri Lelli, Patrick Bellasi,
	Michael Turquette, Viresh Kumar, Srinivas Pandruvada, Len Brown

On Thu, May 19, 2016 at 01:44:36AM +0200, Rafael J. Wysocki wrote:
> On Mon, May 9, 2016 at 11:20 PM, Steve Muckle <steve.muckle@linaro.org> wrote:
> > The rate limit timestamp (last_freq_update_time) is currently advanced
> > anytime schedutil re-evaluates the policy regardless of whether the CPU
> > frequency is changed or not. This means that utilization updates which
> > have no effect can cause much more significant utilization updates
> > (which require a large increase or decrease in CPU frequency) to be
> > delayed due to rate limiting.
> >
> > Instead only update the rate limiting timstamp when the requested
> > target-supported frequency changes. The rate limit will now apply to
> > the rate of CPU frequency changes rather than the rate of
> > re-evaluations of the policy frequency.
> >
> > Signed-off-by: Steve Muckle <smuckle@linaro.org>
> 
> I'm sort of divided here to be honest.

It is true that this means we'll do more frequency re-evaluations, they
will occur until an actual frequency change is requested.

But the way it stands now, with a system's typical background activity
there are so many minor events that it is very common for throttling to
be in effect, causing major events to be ignored. 
> 
> > ---
> >  kernel/sched/cpufreq_schedutil.c | 3 +--
> >  1 file changed, 1 insertion(+), 2 deletions(-)
> >
> > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > index e185075fcb5c..4d2907c8a142 100644
> > --- a/kernel/sched/cpufreq_schedutil.c
> > +++ b/kernel/sched/cpufreq_schedutil.c
> > @@ -117,12 +117,11 @@ static void sugov_update_commit(struct sugov_cpu *sg_cpu, int cpu, u64 time,
> >         struct sugov_policy *sg_policy = sg_cpu->sg_policy;
> >         struct cpufreq_policy *policy = sg_policy->policy;
> >
> > -       sg_policy->last_freq_update_time = time;
> > -
> >         if (sg_policy->next_freq == next_freq) {
> >                 trace_cpu_frequency(policy->cur, cpu);
> 
> You should at least rate limit the trace_cpu_frequency() thing here if
> you don't want to advance the last update time I think, or you may
> easily end up with the trace buffer flooded by irrelevant stuff.

Going back to the reason this tracepoint exists, is it known why
powertop thinks the CPU is idle when this tracepoint is removed? Maybe
it's possible to get rid of this tracepoint altogether.

Thanks for reviewing the series.

thanks,
Steve

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/5] cpufreq: schedutil: support scheduler cpufreq callbacks on remote CPUs
  2016-05-19 18:40     ` Steve Muckle
@ 2016-05-19 20:55       ` Rafael J. Wysocki
  2016-05-19 22:59         ` Steve Muckle
  0 siblings, 1 reply; 35+ messages in thread
From: Rafael J. Wysocki @ 2016-05-19 20:55 UTC (permalink / raw)
  To: Steve Muckle
  Cc: Rafael J. Wysocki, Peter Zijlstra, Ingo Molnar,
	Linux Kernel Mailing List, linux-pm, Vincent Guittot,
	Morten Rasmussen, Dietmar Eggemann, Juri Lelli, Patrick Bellasi,
	Michael Turquette, Viresh Kumar, Srinivas Pandruvada, Len Brown

On Thu, May 19, 2016 at 8:40 PM, Steve Muckle <steve.muckle@linaro.org> wrote:
> On Thu, May 19, 2016 at 01:24:41AM +0200, Rafael J. Wysocki wrote:
>> On Mon, May 9, 2016 at 11:20 PM, Steve Muckle <steve.muckle@linaro.org> wrote:
>> > In preparation for the scheduler cpufreq callback happening on remote
>> > CPUs, add support for this in schedutil.
>> >
>> > Schedutil currently requires the callback occur on the CPU being
>> > updated in order to support fast frequency switches. Remove this
>> > limitation by checking for the current CPU being outside the target
>> > CPU's cpufreq policy and if this is the case, enqueuing an irq_work on
>> > the target CPU. The irq_work for schedutil is modified to carry out a
>> > fast frequency switch if that is enabled for the policy.
>> >
>> > If the callback occurs on a CPU within the target CPU's policy, the
>> > transition is carried out on the local CPU.
>> >
>> > Signed-off-by: Steve Muckle <smuckle@linaro.org>
>> > ---
>> >  kernel/sched/cpufreq_schedutil.c | 86 ++++++++++++++++++++++++++++++----------
>> >  1 file changed, 65 insertions(+), 21 deletions(-)
>> >
>> > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
>> > index 154ae3a51e86..c81f9432f520 100644
>> > --- a/kernel/sched/cpufreq_schedutil.c
>> > +++ b/kernel/sched/cpufreq_schedutil.c
>> > @@ -76,27 +76,61 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
>> >         return delta_ns >= sg_policy->freq_update_delay_ns;
>> >  }
>> >
>> > -static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time,
>> > +static void sugov_fast_switch(struct sugov_policy *sg_policy, int cpu,
>> > +                             unsigned int next_freq)
>> > +{
>> > +       struct cpufreq_policy *policy = sg_policy->policy;
>> > +
>> > +       next_freq = cpufreq_driver_fast_switch(policy, next_freq);
>> > +       if (next_freq == CPUFREQ_ENTRY_INVALID)
>> > +               return;
>> > +
>> > +       policy->cur = next_freq;
>> > +       trace_cpu_frequency(next_freq, cpu);
>> > +}
>> > +
>> > +#ifdef CONFIG_SMP
>>
>> schedutil depends on CONFIG_SMP now, so that's not necessary at least
>> for the time being.
>
> Will remove.
>
>>
>> > +static inline bool sugov_queue_remote_callback(struct sugov_policy *sg_policy,
>> > +                                        int cpu)
>> > +{
>> > +       struct cpufreq_policy *policy = sg_policy->policy;
>> > +
>> > +       if (!cpumask_test_cpu(smp_processor_id(), policy->cpus)) {
>>
>> This check is overkill for policies that aren't shared (and we have a
>> special case for them already).
>
> I don't see why it is overkill -

Because it requires more computation, memory accesses etc than simply
comparing smp_processor_id() with cpu.

> regardless of whether the policy is
> shared, we need to determine whether or not we are running on one of the
> CPUs (or in the case of a non-shared policy, the single CPU) within that
> policy to know whether we can immediately change the frequency in this
> context or a remote call is required.
>
>> > +               sg_policy->work_in_progress = true;
>> > +               irq_work_queue_on(&sg_policy->irq_work, cpu);
>> > +               return true;
>> > +       }
>> > +
>> > +       return false;
>> > +}
>> > +#else
>> > +static inline bool sugov_queue_remote_callback(struct sugov_policy *sg_policy,
>> > +                                        int cpu)
>> > +{
>> > +       return false;
>> > +}
>> > +#endif
>> > +
>> > +static void sugov_update_commit(struct sugov_cpu *sg_cpu, int cpu, u64 time,
>>
>> It looks like you might pass hook here instead of the sg_cpu, cpu pair.
>
> I can do that but it means having to do the comtainer_of operation
> again. Strictly speaking this seems slightly less efficient than passing
> the above values which are already available in the callers.

Well, it seems a bit odd to pass two things referring to the same CPU,
but then I don't care that much.

>> >                                 unsigned int next_freq)
>> >  {
>> > +       struct sugov_policy *sg_policy = sg_cpu->sg_policy;
>> >         struct cpufreq_policy *policy = sg_policy->policy;
>> >
>> >         sg_policy->last_freq_update_time = time;
>> >
>> > +       if (sg_policy->next_freq == next_freq) {
>> > +               trace_cpu_frequency(policy->cur, cpu);
>> > +               return;
>> > +       }
>>
>> There was a reason why I put the above under the fast_switch_enabled
>> branch and it was because this check/trace is not necessary otherwise.
>
> I remember asking about this tracepoint earlier. You had said it was
> required because powertop would not work without it (reporting the CPU
> as idle in certain situations).
>
> I'm not sure why that is only true for the fast switch enabled case

Because in the other case cpufreq stats are used by powertop and then
this problem is not visible.

> but it seems like an odd inconsistency for the governor to trace unchanged
> frequencies when fast switches are enabled but not otherwise. It'd be
> useful I think for profiling and tuning if the tracing was consistent.

Well, fair enough.

> This behavioral change is admittedly not part of the purpose of the
> patch and could be split out if needbe.

No need to split IMO, but it might be prudent to mention that change
in behavior in the changelog.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 3/5] sched: cpufreq: call cpufreq hook from remote CPUs
  2016-05-19 19:19       ` Steve Muckle
@ 2016-05-19 21:06         ` Rafael J. Wysocki
  2016-05-19 23:04           ` Steve Muckle
  0 siblings, 1 reply; 35+ messages in thread
From: Rafael J. Wysocki @ 2016-05-19 21:06 UTC (permalink / raw)
  To: Steve Muckle
  Cc: Rafael J. Wysocki, Peter Zijlstra, Ingo Molnar,
	Linux Kernel Mailing List, linux-pm, Vincent Guittot,
	Morten Rasmussen, Dietmar Eggemann, Juri Lelli, Patrick Bellasi,
	Michael Turquette, Viresh Kumar, Srinivas Pandruvada, Len Brown

On Thu, May 19, 2016 at 9:19 PM, Steve Muckle <steve.muckle@linaro.org> wrote:
> On Thu, May 19, 2016 at 02:00:54PM +0200, Rafael J. Wysocki wrote:
>> On Thu, May 19, 2016 at 1:33 AM, Rafael J. Wysocki <rafael@kernel.org> wrote:
>> > On Mon, May 9, 2016 at 11:20 PM, Steve Muckle <steve.muckle@linaro.org> wrote:
>> >> Without calling the cpufreq hook for a remote wakeup it is possible
>> >> for such a wakeup to go unnoticed by cpufreq on the target CPU for up
>> >> to a full tick. This can occur if the target CPU is running a
>> >> CPU-bound task and preemption does not occur. If preemption does occur
>> >> then the scheduler is expected to run soon on the target CPU anyway so
>> >> invoking the cpufreq hook on the remote wakeup is not required.
>> >>
>> >> In order to avoid unnecessary interprocessor communication in the
>> >> governor for the preemption case, the existing hook (which happens
>> >> before preemption is decided) is only called when the target CPU
>> >> is within the current CPU's cpufreq policy. A new hook is added in
>> >> check_preempt_curr() to handle events outside the current CPU's
>> >> cpufreq policy where preemption does not happen.
>> >>
>> >> Some governors may opt to not receive remote CPU callbacks. This
>> >> behavior is possible by providing NULL as the new policy_cpus
>> >> parameter to cpufreq_add_update_util_hook(). Callbacks will only be
>> >> issued in this case when the target CPU and the current CPU are the
>> >> same. Otherwise policy_cpus is used to determine what is a local
>> >> vs. remote callback.
>> >
>> > I don't really like the extra complexity added by this patch.
>> >
>> > It makes the code look fragile at some places
>
> Perhaps I can improve these, can you point them out?
>
>> > and it only really matters for schedutil
>
> True. As we are trying to create an integrated scheduler+CPU frequency
> control solution I think some of this is to be expected, and should be
> worthwhile since this is hopefully the future and will eventually
> replace the need for the other governors.
>
>> > and for the fast switch case in there.
>
> Once there is a high priority context for the slow path I expect it to
> benefit from this as well.

I don't think that saving an occasional IPI would matter for that case
overall, though.

>> >
>> > Overall, it looks like a premature optimization to me.
>
> Are you referring to this new approach of avoiding duplicate IPIs,

Yes.

> or supporting updates on remote wakeups overall?

No.  I already said I would be fine with that if done properly.

> The duplicate IPI thing is admittedly not required for the problem I'm
> looking to solve but I figured at least some people would be concerned
> about it.

Avoiding IPIs that aren't necessary is fine by me in general, but
doing that at the scheduler level doesn't seem to be necessary as I
said.

> If folks can live with it for now then I can go back to the
> simpler approach I had in my first posting.

Please at least avoid introducing internal cpufreq concepts into the
scheduler uncleanly.

>> In particular, the dance with checking the policy CPUs from the
>> scheduler is questionable (the scheduler might be interested in this
>> information for other purposes too and hooking it up in an ad-hoc way
>> just for cpufreq doesn't seem to be appropriate from that perspective)
>> and also doesn't seem to be necessary.
>>
>> You can check if the current CPU is a policy one from the governor and
>> if that is the case, simply run the frequency update on it directly
>> without any IPI (because if both the target CPU and the current CPU
>> belong to the same policy, it doesn't matter which one of them will
>> run the update).
>
> The checking of policy CPUs from the scheduler was done so as to
> minimize the number of calls to the hook, given their expense.

But policy CPUs is entirely an internal cpufreq concept and adding
that to the scheduler kind of via a kitchen door doesn't look good to
me.

> In the case of a remote update the hook has to run (or not) after it is
> known whether preemption will occur so we don't do needless work or
> IPIs. If the policy CPUs aren't known in the scheduler then the early
> hook will always need to be called along with an indication that it is
> the early hook being called. If it turns out to be a remote update it
> could then be deferred to the later hook, which would only be called
> when a remote update has been deferred and preemption has not occurred.
>
> This means two hook inovcations for a remote non-preempting wakeup
> though instead of one.  Perhaps this is a good middle ground on code
> churn vs. optimization though.

I would think so.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 4/5] cpufreq: schedutil: map raw required frequency to CPU-supported frequency
  2016-05-19 19:35     ` Steve Muckle
@ 2016-05-19 21:07       ` Rafael J. Wysocki
  0 siblings, 0 replies; 35+ messages in thread
From: Rafael J. Wysocki @ 2016-05-19 21:07 UTC (permalink / raw)
  To: Steve Muckle
  Cc: Rafael J. Wysocki, Peter Zijlstra, Ingo Molnar,
	Linux Kernel Mailing List, linux-pm, Vincent Guittot,
	Morten Rasmussen, Dietmar Eggemann, Juri Lelli, Patrick Bellasi,
	Michael Turquette, Viresh Kumar, Srinivas Pandruvada, Len Brown

On Thu, May 19, 2016 at 9:35 PM, Steve Muckle <steve.muckle@linaro.org> wrote:
> On Thu, May 19, 2016 at 01:37:40AM +0200, Rafael J. Wysocki wrote:
>> On Mon, May 9, 2016 at 11:20 PM, Steve Muckle <steve.muckle@linaro.org> wrote:
>> > The mechanisms for remote CPU updates and slow-path frequency
>> > transitions are relatively expensive - the former is an IPI while the
>> > latter requires waking up a thread to do work. These activities should
>> > be avoided if they are not necessary. To that end, calculate the
>> > actual target-supported frequency required by the new utilization
>> > value in schedutil. If it is the same as the previously requested
>> > frequency then there is no need to continue with the update.
>>
>> Unless the max/min limits changed in the meantime, right?
>
> Right, I'll amend the commit text. The functionality is correct AFAICS.
>
>> >
>> > Signed-off-by: Steve Muckle <smuckle@linaro.org>
>> > ---
>> >  kernel/sched/cpufreq_schedutil.c | 14 +++++++++++++-
>> >  1 file changed, 13 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
>> > index 6cb2ecc204ec..e185075fcb5c 100644
>> > --- a/kernel/sched/cpufreq_schedutil.c
>> > +++ b/kernel/sched/cpufreq_schedutil.c
>> > @@ -153,14 +153,26 @@ static void sugov_update_commit(struct sugov_cpu *sg_cpu, int cpu, u64 time,
>> >   * next_freq = C * curr_freq * util_raw / max
>> >   *
>> >   * Take C = 1.25 for the frequency tipping point at (util / max) = 0.8.
>> > + *
>> > + * The lowest target-supported frequency which is equal or greater than the raw
>> > + * next_freq (as calculated above) is returned, or the CPU's max_freq if such
>> > + * a target-supported frequency does not exist.
>> >   */
>> >  static unsigned int get_next_freq(struct cpufreq_policy *policy,
>> >                                   unsigned long util, unsigned long max)
>> >  {
>> > +       struct cpufreq_frequency_table *entry;
>> >         unsigned int freq = arch_scale_freq_invariant() ?
>> >                                 policy->cpuinfo.max_freq : policy->cur;
>> > +       unsigned int target_freq = UINT_MAX;
>> > +
>> > +       freq = (freq + (freq >> 2)) * util / max;
>> > +
>> > +       cpufreq_for_each_valid_entry(entry, policy->freq_table)
>> > +               if (entry->frequency >= freq && entry->frequency < target_freq)
>> > +                       target_freq = entry->frequency;
>>
>> Please don't assume that every driver will have a frequency table.
>> That may not be the case in the future (and I'm not even sure about
>> the existing CPPC driver for that matter).
>
> For platforms without a frequency table I guess we can just continue
> with the current behavior, passing in the raw calculated frequency. I'll
> make this change.
>
> At some point I imagine those platforms will want to somehow achieve
> similar behavior to avoid very small transitions that do not result in
> real benefit. Maybe some sort of threshold % in the schedutil down the
> road.

So honestly, I'd like to defer this particular optimization for the time being.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 5/5] cpufreq: schedutil: do not update rate limit ts when freq is unchanged
  2016-05-19 19:46     ` Steve Muckle
@ 2016-05-19 21:15       ` Rafael J. Wysocki
  2016-05-19 23:34         ` Steve Muckle
  0 siblings, 1 reply; 35+ messages in thread
From: Rafael J. Wysocki @ 2016-05-19 21:15 UTC (permalink / raw)
  To: Steve Muckle
  Cc: Rafael J. Wysocki, Peter Zijlstra, Ingo Molnar,
	Linux Kernel Mailing List, linux-pm, Vincent Guittot,
	Morten Rasmussen, Dietmar Eggemann, Juri Lelli, Patrick Bellasi,
	Michael Turquette, Viresh Kumar, Srinivas Pandruvada, Len Brown

On Thu, May 19, 2016 at 9:46 PM, Steve Muckle <steve.muckle@linaro.org> wrote:
> On Thu, May 19, 2016 at 01:44:36AM +0200, Rafael J. Wysocki wrote:
>> On Mon, May 9, 2016 at 11:20 PM, Steve Muckle <steve.muckle@linaro.org> wrote:
>> > The rate limit timestamp (last_freq_update_time) is currently advanced
>> > anytime schedutil re-evaluates the policy regardless of whether the CPU
>> > frequency is changed or not. This means that utilization updates which
>> > have no effect can cause much more significant utilization updates
>> > (which require a large increase or decrease in CPU frequency) to be
>> > delayed due to rate limiting.
>> >
>> > Instead only update the rate limiting timstamp when the requested
>> > target-supported frequency changes. The rate limit will now apply to
>> > the rate of CPU frequency changes rather than the rate of
>> > re-evaluations of the policy frequency.
>> >
>> > Signed-off-by: Steve Muckle <smuckle@linaro.org>
>>
>> I'm sort of divided here to be honest.
>
> It is true that this means we'll do more frequency re-evaluations, they
> will occur until an actual frequency change is requested.
>
> But the way it stands now, with a system's typical background activity
> there are so many minor events that it is very common for throttling to
> be in effect, causing major events to be ignored.
>>
>> > ---
>> >  kernel/sched/cpufreq_schedutil.c | 3 +--
>> >  1 file changed, 1 insertion(+), 2 deletions(-)
>> >
>> > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
>> > index e185075fcb5c..4d2907c8a142 100644
>> > --- a/kernel/sched/cpufreq_schedutil.c
>> > +++ b/kernel/sched/cpufreq_schedutil.c
>> > @@ -117,12 +117,11 @@ static void sugov_update_commit(struct sugov_cpu *sg_cpu, int cpu, u64 time,
>> >         struct sugov_policy *sg_policy = sg_cpu->sg_policy;
>> >         struct cpufreq_policy *policy = sg_policy->policy;
>> >
>> > -       sg_policy->last_freq_update_time = time;
>> > -
>> >         if (sg_policy->next_freq == next_freq) {
>> >                 trace_cpu_frequency(policy->cur, cpu);
>>
>> You should at least rate limit the trace_cpu_frequency() thing here if
>> you don't want to advance the last update time I think, or you may
>> easily end up with the trace buffer flooded by irrelevant stuff.
>
> Going back to the reason this tracepoint exists, is it known why
> powertop thinks the CPU is idle when this tracepoint is removed? Maybe
> it's possible to get rid of this tracepoint altogether.

I'm not sure ATM.  It seems to go by the time stamps and declare idle
if it doesn't see updates for long enough.

I was hoping to be able to make cpufreq stats usable for the fast
switch case, but that appears to mean some major surgery in there.

But anyway this change again seems to be an optimization that might be
done later to me.

I guess there are many things that might be optimized in schedutil,
but I'd prefer to address one item at a time, maybe going after the
ones that appear most relevant first?

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/5] cpufreq: schedutil: support scheduler cpufreq callbacks on remote CPUs
  2016-05-19 20:55       ` Rafael J. Wysocki
@ 2016-05-19 22:59         ` Steve Muckle
  2016-05-19 23:14           ` Rafael J. Wysocki
  0 siblings, 1 reply; 35+ messages in thread
From: Steve Muckle @ 2016-05-19 22:59 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Steve Muckle, Peter Zijlstra, Ingo Molnar,
	Linux Kernel Mailing List, linux-pm, Vincent Guittot,
	Morten Rasmussen, Dietmar Eggemann, Juri Lelli, Patrick Bellasi,
	Michael Turquette, Viresh Kumar, Srinivas Pandruvada, Len Brown

On Thu, May 19, 2016 at 10:55:23PM +0200, Rafael J. Wysocki wrote:
> >> > +static inline bool sugov_queue_remote_callback(struct sugov_policy *sg_policy,
> >> > +                                        int cpu)
> >> > +{
> >> > +       struct cpufreq_policy *policy = sg_policy->policy;
> >> > +
> >> > +       if (!cpumask_test_cpu(smp_processor_id(), policy->cpus)) {
> >>
> >> This check is overkill for policies that aren't shared (and we have a
> >> special case for them already).
> >
> > I don't see why it is overkill -
> 
> Because it requires more computation, memory accesses etc than simply
> comparing smp_processor_id() with cpu.

Do you have a preference on how to restructure this? Otherwise I'll
create a second version of sugov_update_commit, factoring out as much of
it as I can into two inline sub-functions. 

...
> 
> > but it seems like an odd inconsistency for the governor to trace unchanged
> > frequencies when fast switches are enabled but not otherwise. It'd be
> > useful I think for profiling and tuning if the tracing was consistent.
> 
> Well, fair enough.
> 
> > This behavioral change is admittedly not part of the purpose of the
> > patch and could be split out if needbe.
> 
> No need to split IMO, but it might be prudent to mention that change
> in behavior in the changelog.

Will do.

thanks,
Steve

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 3/5] sched: cpufreq: call cpufreq hook from remote CPUs
  2016-05-19 21:06         ` Rafael J. Wysocki
@ 2016-05-19 23:04           ` Steve Muckle
  2016-05-21 19:46             ` Steve Muckle
  0 siblings, 1 reply; 35+ messages in thread
From: Steve Muckle @ 2016-05-19 23:04 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Steve Muckle, Peter Zijlstra, Ingo Molnar,
	Linux Kernel Mailing List, linux-pm, Vincent Guittot,
	Morten Rasmussen, Dietmar Eggemann, Juri Lelli, Patrick Bellasi,
	Michael Turquette, Viresh Kumar, Srinivas Pandruvada, Len Brown

On Thu, May 19, 2016 at 11:06:14PM +0200, Rafael J. Wysocki wrote:
> > In the case of a remote update the hook has to run (or not) after it is
> > known whether preemption will occur so we don't do needless work or
> > IPIs. If the policy CPUs aren't known in the scheduler then the early
> > hook will always need to be called along with an indication that it is
> > the early hook being called. If it turns out to be a remote update it
> > could then be deferred to the later hook, which would only be called
> > when a remote update has been deferred and preemption has not occurred.
> >
> > This means two hook inovcations for a remote non-preempting wakeup
> > though instead of one.  Perhaps this is a good middle ground on code
> > churn vs. optimization though.
> 
> I would think so.

Ok, I will pursue this approach.

thanks,
Steve

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/5] cpufreq: schedutil: support scheduler cpufreq callbacks on remote CPUs
  2016-05-19 22:59         ` Steve Muckle
@ 2016-05-19 23:14           ` Rafael J. Wysocki
  0 siblings, 0 replies; 35+ messages in thread
From: Rafael J. Wysocki @ 2016-05-19 23:14 UTC (permalink / raw)
  To: Steve Muckle
  Cc: Rafael J. Wysocki, Peter Zijlstra, Ingo Molnar,
	Linux Kernel Mailing List, linux-pm, Vincent Guittot,
	Morten Rasmussen, Dietmar Eggemann, Juri Lelli, Patrick Bellasi,
	Michael Turquette, Viresh Kumar, Srinivas Pandruvada, Len Brown

On Fri, May 20, 2016 at 12:59 AM, Steve Muckle <steve.muckle@linaro.org> wrote:
> On Thu, May 19, 2016 at 10:55:23PM +0200, Rafael J. Wysocki wrote:
>> >> > +static inline bool sugov_queue_remote_callback(struct sugov_policy *sg_policy,
>> >> > +                                        int cpu)
>> >> > +{
>> >> > +       struct cpufreq_policy *policy = sg_policy->policy;
>> >> > +
>> >> > +       if (!cpumask_test_cpu(smp_processor_id(), policy->cpus)) {
>> >>
>> >> This check is overkill for policies that aren't shared (and we have a
>> >> special case for them already).
>> >
>> > I don't see why it is overkill -
>>
>> Because it requires more computation, memory accesses etc than simply
>> comparing smp_processor_id() with cpu.
>
> Do you have a preference on how to restructure this?

Not really.

> Otherwise I'll create a second version of sugov_update_commit, factoring out as much of
> it as I can into two inline sub-functions.

I guess in that case it might be better to fold the
sugov_update_commit() code into its callers and then factor out common
stuff into sub-functions called from there.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 5/5] cpufreq: schedutil: do not update rate limit ts when freq is unchanged
  2016-05-19 21:15       ` Rafael J. Wysocki
@ 2016-05-19 23:34         ` Steve Muckle
  2016-05-20  0:24           ` Rafael J. Wysocki
  0 siblings, 1 reply; 35+ messages in thread
From: Steve Muckle @ 2016-05-19 23:34 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Steve Muckle, Peter Zijlstra, Ingo Molnar,
	Linux Kernel Mailing List, linux-pm, Vincent Guittot,
	Morten Rasmussen, Dietmar Eggemann, Juri Lelli, Patrick Bellasi,
	Michael Turquette, Viresh Kumar, Srinivas Pandruvada, Len Brown

On Thu, May 19, 2016 at 11:15:52PM +0200, Rafael J. Wysocki wrote:
> But anyway this change again seems to be an optimization that might be
> done later to me.
> 
> I guess there are many things that might be optimized in schedutil,
> but I'd prefer to address one item at a time, maybe going after the
> ones that appear most relevant first?

Calling the last two patches in this series optimizations is a stretch
IMO. Issuing frequency change requests that result in the same
target-supported frequency is clearly unnecessary and ends up delaying
more urgent frequency changes, which I think is more like a bug. These
patches are also needed in conjunction with the first three to address
the remote wakeup delay.

Are there specific items you want to see addressed before these patches
could go in? I'm aware of the RT/DL support that needs improving, though
that should be orthogonal.

Also if it helps, I can provide a test case and/or traces to show the
need for the last two patches.

thanks,
Steve

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 5/5] cpufreq: schedutil: do not update rate limit ts when freq is unchanged
  2016-05-19 23:34         ` Steve Muckle
@ 2016-05-20  0:24           ` Rafael J. Wysocki
  2016-05-20  0:37             ` Rafael J. Wysocki
  2016-05-20  0:37             ` Steve Muckle
  0 siblings, 2 replies; 35+ messages in thread
From: Rafael J. Wysocki @ 2016-05-20  0:24 UTC (permalink / raw)
  To: Steve Muckle
  Cc: Rafael J. Wysocki, Peter Zijlstra, Ingo Molnar,
	Linux Kernel Mailing List, linux-pm, Vincent Guittot,
	Morten Rasmussen, Dietmar Eggemann, Juri Lelli, Patrick Bellasi,
	Michael Turquette, Viresh Kumar, Srinivas Pandruvada, Len Brown

On Fri, May 20, 2016 at 1:34 AM, Steve Muckle <steve.muckle@linaro.org> wrote:
> On Thu, May 19, 2016 at 11:15:52PM +0200, Rafael J. Wysocki wrote:
>> But anyway this change again seems to be an optimization that might be
>> done later to me.
>>
>> I guess there are many things that might be optimized in schedutil,
>> but I'd prefer to address one item at a time, maybe going after the
>> ones that appear most relevant first?
>
> Calling the last two patches in this series optimizations is a stretch
> IMO. Issuing frequency change requests that result in the same
> target-supported frequency is clearly unnecessary and ends up delaying
> more urgent frequency changes, which I think is more like a bug.

The [4/5] is pulling stuff where it doesn't belong.  Simple as that.
Frequency tables don't belong in schedutil, so don't pull them in
there.

If you want to do that cleanly, add a call to the driver that will
tell you what frequency would be selected by it if it were given a
particular target.

I actually do agree with the direction of it and the [5/5], but I
don't like cutting corners. :-)

> These patches are also needed in conjunction with the first three to address
> the remote wakeup delay.

Well, does this mean that without the [4-5/5] the rest of the series
doesn't provide as much benefit as initially expected?

> Are there specific items you want to see addressed before these patches could go in?

Do you mean in addition to what I already said in my comments?

> I'm aware of the RT/DL support that needs improving, though
> that should be orthogonal.
>
> Also if it helps, I can provide a test case and/or traces to show the
> need for the last two patches.

Yes, that will help.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 5/5] cpufreq: schedutil: do not update rate limit ts when freq is unchanged
  2016-05-20  0:24           ` Rafael J. Wysocki
@ 2016-05-20  0:37             ` Rafael J. Wysocki
  2016-05-20  0:40               ` Steve Muckle
  2016-05-20  0:37             ` Steve Muckle
  1 sibling, 1 reply; 35+ messages in thread
From: Rafael J. Wysocki @ 2016-05-20  0:37 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Steve Muckle, Peter Zijlstra, Ingo Molnar,
	Linux Kernel Mailing List, linux-pm, Vincent Guittot,
	Morten Rasmussen, Dietmar Eggemann, Juri Lelli, Patrick Bellasi,
	Michael Turquette, Viresh Kumar, Srinivas Pandruvada, Len Brown

On Fri, May 20, 2016 at 2:24 AM, Rafael J. Wysocki <rafael@kernel.org> wrote:
> On Fri, May 20, 2016 at 1:34 AM, Steve Muckle <steve.muckle@linaro.org> wrote:
>> On Thu, May 19, 2016 at 11:15:52PM +0200, Rafael J. Wysocki wrote:
>>> But anyway this change again seems to be an optimization that might be
>>> done later to me.
>>>
>>> I guess there are many things that might be optimized in schedutil,
>>> but I'd prefer to address one item at a time, maybe going after the
>>> ones that appear most relevant first?
>>
>> Calling the last two patches in this series optimizations is a stretch
>> IMO. Issuing frequency change requests that result in the same
>> target-supported frequency is clearly unnecessary and ends up delaying
>> more urgent frequency changes, which I think is more like a bug.
>
> The [4/5] is pulling stuff where it doesn't belong.  Simple as that.
> Frequency tables don't belong in schedutil, so don't pull them in
> there.
>
> If you want to do that cleanly, add a call to the driver that will
> tell you what frequency would be selected by it if it were given a
> particular target.

Also I think that it would be good to avoid walking the frequency
table twice in case we end up wanting to update the frequency after
all.  With the [4/5] we'd do it once in get_next_freq() and then once
more in cpufreq_driver_fast_switch(), for example, and walking the
frequency table may be more expensive that doing the switch in the
first place.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 5/5] cpufreq: schedutil: do not update rate limit ts when freq is unchanged
  2016-05-20  0:24           ` Rafael J. Wysocki
  2016-05-20  0:37             ` Rafael J. Wysocki
@ 2016-05-20  0:37             ` Steve Muckle
  2016-05-20  0:55               ` Rafael J. Wysocki
  1 sibling, 1 reply; 35+ messages in thread
From: Steve Muckle @ 2016-05-20  0:37 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Steve Muckle, Peter Zijlstra, Ingo Molnar,
	Linux Kernel Mailing List, linux-pm, Vincent Guittot,
	Morten Rasmussen, Dietmar Eggemann, Juri Lelli, Patrick Bellasi,
	Michael Turquette, Viresh Kumar, Srinivas Pandruvada, Len Brown

On Fri, May 20, 2016 at 02:24:19AM +0200, Rafael J. Wysocki wrote:
> On Fri, May 20, 2016 at 1:34 AM, Steve Muckle <steve.muckle@linaro.org> wrote:
> > On Thu, May 19, 2016 at 11:15:52PM +0200, Rafael J. Wysocki wrote:
> >> But anyway this change again seems to be an optimization that might be
> >> done later to me.
> >>
> >> I guess there are many things that might be optimized in schedutil,
> >> but I'd prefer to address one item at a time, maybe going after the
> >> ones that appear most relevant first?
> >
> > Calling the last two patches in this series optimizations is a stretch
> > IMO. Issuing frequency change requests that result in the same
> > target-supported frequency is clearly unnecessary and ends up delaying
> > more urgent frequency changes, which I think is more like a bug.
> 
> The [4/5] is pulling stuff where it doesn't belong.  Simple as that.
> Frequency tables don't belong in schedutil, so don't pull them in
> there.
> 
> If you want to do that cleanly, add a call to the driver that will
> tell you what frequency would be selected by it if it were given a
> particular target.
> 
> I actually do agree with the direction of it and the [5/5], but I
> don't like cutting corners. :-)

Okay sure, makes sense and I'll work on a change to do that.

> 
> > These patches are also needed in conjunction with the first three to address
> > the remote wakeup delay.
> 
> Well, does this mean that without the [4-5/5] the rest of the series
> doesn't provide as much benefit as initially expected?

At least in my testing the last two patches were required to show the
improvement with the unit test I had developed. This is because all the
minor system activity would keep pushing the rate limit out so when the
remote cb came in it had no effect.

So the last two patches aren't a hard requirement to theoretically solve
the problem but in practice I don't think a benefit will be seen without
them.

> > Are there specific items you want to see addressed before these patches could go in?
> 
> Do you mean in addition to what I already said in my comments?

I was referring to the other possible optimizations/changes you
mentioned in schedutil that might serve as cause to defer these patches.

> 
> > I'm aware of the RT/DL support that needs improving, though
> > that should be orthogonal.
> >
> > Also if it helps, I can provide a test case and/or traces to show the
> > need for the last two patches.
> 
> Yes, that will help.

Okay I'll include a pointer to my unit test program and some trace data
in my next post.

thanks,
Steve

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 5/5] cpufreq: schedutil: do not update rate limit ts when freq is unchanged
  2016-05-20  0:37             ` Rafael J. Wysocki
@ 2016-05-20  0:40               ` Steve Muckle
  2016-05-20  0:46                 ` Rafael J. Wysocki
  0 siblings, 1 reply; 35+ messages in thread
From: Steve Muckle @ 2016-05-20  0:40 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Steve Muckle, Peter Zijlstra, Ingo Molnar,
	Linux Kernel Mailing List, linux-pm, Vincent Guittot,
	Morten Rasmussen, Dietmar Eggemann, Juri Lelli, Patrick Bellasi,
	Michael Turquette, Viresh Kumar, Srinivas Pandruvada, Len Brown

On Fri, May 20, 2016 at 02:37:17AM +0200, Rafael J. Wysocki wrote:
> Also I think that it would be good to avoid walking the frequency
> table twice in case we end up wanting to update the frequency after
> all.  With the [4/5] we'd do it once in get_next_freq() and then once
> more in cpufreq_driver_fast_switch(), for example, and walking the
> frequency table may be more expensive that doing the switch in the
> first place.

If a driver API is added to return the platform frequency associated
with a target frequency, what do you think about requiring the
fast_switch API to take a target-supported frequency?

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 5/5] cpufreq: schedutil: do not update rate limit ts when freq is unchanged
  2016-05-20  0:40               ` Steve Muckle
@ 2016-05-20  0:46                 ` Rafael J. Wysocki
  2016-05-20 11:39                   ` Rafael J. Wysocki
  0 siblings, 1 reply; 35+ messages in thread
From: Rafael J. Wysocki @ 2016-05-20  0:46 UTC (permalink / raw)
  To: Steve Muckle
  Cc: Rafael J. Wysocki, Peter Zijlstra, Ingo Molnar,
	Linux Kernel Mailing List, linux-pm, Vincent Guittot,
	Morten Rasmussen, Dietmar Eggemann, Juri Lelli, Patrick Bellasi,
	Michael Turquette, Viresh Kumar, Srinivas Pandruvada, Len Brown

On Fri, May 20, 2016 at 2:40 AM, Steve Muckle <steve.muckle@linaro.org> wrote:
> On Fri, May 20, 2016 at 02:37:17AM +0200, Rafael J. Wysocki wrote:
>> Also I think that it would be good to avoid walking the frequency
>> table twice in case we end up wanting to update the frequency after
>> all.  With the [4/5] we'd do it once in get_next_freq() and then once
>> more in cpufreq_driver_fast_switch(), for example, and walking the
>> frequency table may be more expensive that doing the switch in the
>> first place.
>
> If a driver API is added to return the platform frequency associated
> with a target frequency, what do you think about requiring the
> fast_switch API to take a target-supported frequency?

That doesn't help much, because it generally would need to find a
table entry corresponding to it anyway, to find the actual command
value to write to a register, for example.

But the driver could be smart and cache the value returned from the
new callback along with the command value associated with it.  If
invoked with that particular frequency, it would use the cached
command.  Otherwise, it would walk the table.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 5/5] cpufreq: schedutil: do not update rate limit ts when freq is unchanged
  2016-05-20  0:37             ` Steve Muckle
@ 2016-05-20  0:55               ` Rafael J. Wysocki
  0 siblings, 0 replies; 35+ messages in thread
From: Rafael J. Wysocki @ 2016-05-20  0:55 UTC (permalink / raw)
  To: Steve Muckle
  Cc: Rafael J. Wysocki, Peter Zijlstra, Ingo Molnar,
	Linux Kernel Mailing List, linux-pm, Vincent Guittot,
	Morten Rasmussen, Dietmar Eggemann, Juri Lelli, Patrick Bellasi,
	Michael Turquette, Viresh Kumar, Srinivas Pandruvada, Len Brown

On Fri, May 20, 2016 at 2:37 AM, Steve Muckle <steve.muckle@linaro.org> wrote:
> On Fri, May 20, 2016 at 02:24:19AM +0200, Rafael J. Wysocki wrote:
>> On Fri, May 20, 2016 at 1:34 AM, Steve Muckle <steve.muckle@linaro.org> wrote:
>> > On Thu, May 19, 2016 at 11:15:52PM +0200, Rafael J. Wysocki wrote:
>> >> But anyway this change again seems to be an optimization that might be
>> >> done later to me.
>> >>
>> >> I guess there are many things that might be optimized in schedutil,
>> >> but I'd prefer to address one item at a time, maybe going after the
>> >> ones that appear most relevant first?
>> >
>> > Calling the last two patches in this series optimizations is a stretch
>> > IMO. Issuing frequency change requests that result in the same
>> > target-supported frequency is clearly unnecessary and ends up delaying
>> > more urgent frequency changes, which I think is more like a bug.
>>
>> The [4/5] is pulling stuff where it doesn't belong.  Simple as that.
>> Frequency tables don't belong in schedutil, so don't pull them in
>> there.
>>
>> If you want to do that cleanly, add a call to the driver that will
>> tell you what frequency would be selected by it if it were given a
>> particular target.
>>
>> I actually do agree with the direction of it and the [5/5], but I
>> don't like cutting corners. :-)
>
> Okay sure, makes sense and I'll work on a change to do that.
>
>>
>> > These patches are also needed in conjunction with the first three to address
>> > the remote wakeup delay.
>>
>> Well, does this mean that without the [4-5/5] the rest of the series
>> doesn't provide as much benefit as initially expected?
>
> At least in my testing the last two patches were required to show the
> improvement with the unit test I had developed. This is because all the
> minor system activity would keep pushing the rate limit out so when the
> remote cb came in it had no effect.
>
> So the last two patches aren't a hard requirement to theoretically solve
> the problem but in practice I don't think a benefit will be seen without
> them.

Fair enough.

>> > Are there specific items you want to see addressed before these patches could go in?
>>
>> Do you mean in addition to what I already said in my comments?
>
> I was referring to the other possible optimizations/changes you
> mentioned in schedutil that might serve as cause to defer these patches.

That is sort of orthogonal, but we need to be able to pass hints from
the scheduler via cpufreq_update_util(), for a couple of reasons, and
maybe use those to trigger immediate updates ignoring the rate limit
in some cases.

But to avoid extending the list of arguments to pass to
cpufreq_update_util(), it would be good to teach schedutil to read the
utilization data directly first (Peter had a patch to do that some
time ago, but it would need to be rebased on top of the current code).

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 5/5] cpufreq: schedutil: do not update rate limit ts when freq is unchanged
  2016-05-20  0:46                 ` Rafael J. Wysocki
@ 2016-05-20 11:39                   ` Rafael J. Wysocki
  2016-05-20 11:54                     ` Rafael J. Wysocki
  0 siblings, 1 reply; 35+ messages in thread
From: Rafael J. Wysocki @ 2016-05-20 11:39 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Steve Muckle, Peter Zijlstra, Ingo Molnar,
	Linux Kernel Mailing List, linux-pm, Vincent Guittot,
	Morten Rasmussen, Dietmar Eggemann, Juri Lelli, Patrick Bellasi,
	Michael Turquette, Viresh Kumar, Srinivas Pandruvada, Len Brown

On Fri, May 20, 2016 at 2:46 AM, Rafael J. Wysocki <rafael@kernel.org> wrote:
> On Fri, May 20, 2016 at 2:40 AM, Steve Muckle <steve.muckle@linaro.org> wrote:
>> On Fri, May 20, 2016 at 02:37:17AM +0200, Rafael J. Wysocki wrote:
>>> Also I think that it would be good to avoid walking the frequency
>>> table twice in case we end up wanting to update the frequency after
>>> all.  With the [4/5] we'd do it once in get_next_freq() and then once
>>> more in cpufreq_driver_fast_switch(), for example, and walking the
>>> frequency table may be more expensive that doing the switch in the
>>> first place.
>>
>> If a driver API is added to return the platform frequency associated
>> with a target frequency, what do you think about requiring the
>> fast_switch API to take a target-supported frequency?
>
> That doesn't help much, because it generally would need to find a
> table entry corresponding to it anyway, to find the actual command
> value to write to a register, for example.
>
> But the driver could be smart and cache the value returned from the
> new callback along with the command value associated with it.  If
> invoked with that particular frequency, it would use the cached
> command.  Otherwise, it would walk the table.

It also makes sense to save both the "raw" value computed by
get_next_freq() and the corresponding "driver" value, because if the
current "raw" value is equal to the previous "raw" value, it shouldn't
be necessary to walk the frequency table at all (as the previous
"driver" value would then be equal to the current "driver" value too).

So maybe the "driver" value should only be checked after the "raw"
value check in sugov_update_commit() or equivalent?

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 5/5] cpufreq: schedutil: do not update rate limit ts when freq is unchanged
  2016-05-20 11:39                   ` Rafael J. Wysocki
@ 2016-05-20 11:54                     ` Rafael J. Wysocki
  2016-05-20 11:59                       ` Rafael J. Wysocki
  0 siblings, 1 reply; 35+ messages in thread
From: Rafael J. Wysocki @ 2016-05-20 11:54 UTC (permalink / raw)
  To: Steve Muckle
  Cc: Peter Zijlstra, Ingo Molnar, Linux Kernel Mailing List, linux-pm,
	Vincent Guittot, Morten Rasmussen, Dietmar Eggemann, Juri Lelli,
	Patrick Bellasi, Michael Turquette, Viresh Kumar,
	Srinivas Pandruvada, Len Brown

On Fri, May 20, 2016 at 1:39 PM, Rafael J. Wysocki <rafael@kernel.org> wrote:
> On Fri, May 20, 2016 at 2:46 AM, Rafael J. Wysocki <rafael@kernel.org> wrote:
>> On Fri, May 20, 2016 at 2:40 AM, Steve Muckle <steve.muckle@linaro.org> wrote:
>>> On Fri, May 20, 2016 at 02:37:17AM +0200, Rafael J. Wysocki wrote:
>>>> Also I think that it would be good to avoid walking the frequency
>>>> table twice in case we end up wanting to update the frequency after
>>>> all.  With the [4/5] we'd do it once in get_next_freq() and then once
>>>> more in cpufreq_driver_fast_switch(), for example, and walking the
>>>> frequency table may be more expensive that doing the switch in the
>>>> first place.
>>>
>>> If a driver API is added to return the platform frequency associated
>>> with a target frequency, what do you think about requiring the
>>> fast_switch API to take a target-supported frequency?
>>
>> That doesn't help much, because it generally would need to find a
>> table entry corresponding to it anyway, to find the actual command
>> value to write to a register, for example.
>>
>> But the driver could be smart and cache the value returned from the
>> new callback along with the command value associated with it.  If
>> invoked with that particular frequency, it would use the cached
>> command.  Otherwise, it would walk the table.
>
> It also makes sense to save both the "raw" value computed by
> get_next_freq() and the corresponding "driver" value, because if the
> current "raw" value is equal to the previous "raw" value, it shouldn't
> be necessary to walk the frequency table at all (as the previous
> "driver" value would then be equal to the current "driver" value too).
>
> So maybe the "driver" value should only be checked after the "raw"
> value check in sugov_update_commit() or equivalent?

Moreover, you need to be careful about policy->min/max changes,
because both cpufreq_driver_fast_switch() and
__cpufreq_driver_target() clamp the target frequency between those and
if they change in the meantime, you may end up having to use a
different frequency at the driver level even if you get the same "raw"
value as last time.

It looks like we don't do the right thing here in the current code even ...

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 5/5] cpufreq: schedutil: do not update rate limit ts when freq is unchanged
  2016-05-20 11:54                     ` Rafael J. Wysocki
@ 2016-05-20 11:59                       ` Rafael J. Wysocki
  0 siblings, 0 replies; 35+ messages in thread
From: Rafael J. Wysocki @ 2016-05-20 11:59 UTC (permalink / raw)
  To: Steve Muckle
  Cc: Peter Zijlstra, Ingo Molnar, Linux Kernel Mailing List, linux-pm,
	Vincent Guittot, Morten Rasmussen, Dietmar Eggemann, Juri Lelli,
	Patrick Bellasi, Michael Turquette, Viresh Kumar,
	Srinivas Pandruvada, Len Brown

On Fri, May 20, 2016 at 1:54 PM, Rafael J. Wysocki <rafael@kernel.org> wrote:
> On Fri, May 20, 2016 at 1:39 PM, Rafael J. Wysocki <rafael@kernel.org> wrote:
>> On Fri, May 20, 2016 at 2:46 AM, Rafael J. Wysocki <rafael@kernel.org> wrote:
>>> On Fri, May 20, 2016 at 2:40 AM, Steve Muckle <steve.muckle@linaro.org> wrote:
>>>> On Fri, May 20, 2016 at 02:37:17AM +0200, Rafael J. Wysocki wrote:
>>>>> Also I think that it would be good to avoid walking the frequency
>>>>> table twice in case we end up wanting to update the frequency after
>>>>> all.  With the [4/5] we'd do it once in get_next_freq() and then once
>>>>> more in cpufreq_driver_fast_switch(), for example, and walking the
>>>>> frequency table may be more expensive that doing the switch in the
>>>>> first place.
>>>>
>>>> If a driver API is added to return the platform frequency associated
>>>> with a target frequency, what do you think about requiring the
>>>> fast_switch API to take a target-supported frequency?
>>>
>>> That doesn't help much, because it generally would need to find a
>>> table entry corresponding to it anyway, to find the actual command
>>> value to write to a register, for example.
>>>
>>> But the driver could be smart and cache the value returned from the
>>> new callback along with the command value associated with it.  If
>>> invoked with that particular frequency, it would use the cached
>>> command.  Otherwise, it would walk the table.
>>
>> It also makes sense to save both the "raw" value computed by
>> get_next_freq() and the corresponding "driver" value, because if the
>> current "raw" value is equal to the previous "raw" value, it shouldn't
>> be necessary to walk the frequency table at all (as the previous
>> "driver" value would then be equal to the current "driver" value too).
>>
>> So maybe the "driver" value should only be checked after the "raw"
>> value check in sugov_update_commit() or equivalent?
>
> Moreover, you need to be careful about policy->min/max changes,
> because both cpufreq_driver_fast_switch() and
> __cpufreq_driver_target() clamp the target frequency between those and
> if they change in the meantime, you may end up having to use a
> different frequency at the driver level even if you get the same "raw"
> value as last time.
>
> It looks like we don't do the right thing here in the current code even ...

Scratch that, sorry.  We'll get the "limits" notification and the
need_freq_update thing will cause next_freq to become zero then.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 3/5] sched: cpufreq: call cpufreq hook from remote CPUs
  2016-05-19 23:04           ` Steve Muckle
@ 2016-05-21 19:46             ` Steve Muckle
  2016-06-01 20:09               ` Steve Muckle
  0 siblings, 1 reply; 35+ messages in thread
From: Steve Muckle @ 2016-05-21 19:46 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List, linux-pm,
	Vincent Guittot, Morten Rasmussen, Dietmar Eggemann, Juri Lelli,
	Patrick Bellasi, Michael Turquette, Viresh Kumar,
	Srinivas Pandruvada, Len Brown

Hi Peter, Ingo,

On Thu, May 19, 2016 at 04:04:19PM -0700, Steve Muckle wrote:
> On Thu, May 19, 2016 at 11:06:14PM +0200, Rafael J. Wysocki wrote:
> > > In the case of a remote update the hook has to run (or not) after it is
> > > known whether preemption will occur so we don't do needless work or
> > > IPIs. If the policy CPUs aren't known in the scheduler then the early
> > > hook will always need to be called along with an indication that it is
> > > the early hook being called. If it turns out to be a remote update it
> > > could then be deferred to the later hook, which would only be called
> > > when a remote update has been deferred and preemption has not occurred.
> > >
> > > This means two hook inovcations for a remote non-preempting wakeup
> > > though instead of one.  Perhaps this is a good middle ground on code
> > > churn vs. optimization though.
> > 
> > I would think so.
> 
> Ok, I will pursue this approach.

I'd like to get your opinion here before proceeding further...

To catch you up and summarize, I'm trying to implement support for
calling the scheduler cpufreq callback during remote wakeups.  Currently
the scheduler cpufreq callback is only called when the target CPU is the
current CPU. If a remote wakeup does not result in preemption, the CPU
frequency may currently not be adjusted appropriately for up to a tick,
when we are guaranteed to call the hook again.

Invoking schedutil promptly for the target CPU in this situation means
sending an IPI if the current CPU is not in the target CPU's frequency
domain. This is because often a cpufreq driver must run on a CPU within
the frequency domain it is bound to.  But the catch is that we should
not do this and incur the overhead of an IPI if preemption will occur,
as in that case the scheduler (and schedutil) will run soon on the
target CPU anyway, potentially as a result of the scheduler sending its
own IPI.

I figured this unnecessary overhead would be unacceptable and so have
been working on an approach to avoid it. Unfortunately the current hooks
happen before the preemption decision is made. My current implementation
sets a flag if schedutil sees a remote wakeup and then bails. There's a
test to call the hook again at the end of check_preempt_curr() if the flag
is set.  The flag is cleared by resched_curr() as that means preemption
will happen on the target CPU. The flag currently lives at the end of
the rq struct. I could move it into the update_util_data hook structure
or elsewhere, but that would mean accessing another per-cpu item in
hot scheduler paths.

Thoughts? Note the current implementation described above differs a bit
from the last posting in this thread, per discussion with Rafael.

thanks,
Steve

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 3/5] sched: cpufreq: call cpufreq hook from remote CPUs
  2016-05-21 19:46             ` Steve Muckle
@ 2016-06-01 20:09               ` Steve Muckle
  0 siblings, 0 replies; 35+ messages in thread
From: Steve Muckle @ 2016-06-01 20:09 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Rafael J. Wysocki, Linux Kernel Mailing List, linux-pm,
	Vincent Guittot, Morten Rasmussen, Dietmar Eggemann, Juri Lelli,
	Patrick Bellasi, Michael Turquette, Viresh Kumar,
	Srinivas Pandruvada, Len Brown

On Sat, May 21, 2016 at 12:46:06PM -0700, Steve Muckle wrote:
> Hi Peter, Ingo,

Hi Peter/Ingo would appreciate any thoughts you may have on the issue
below.

thanks,
Steve

> 
> On Thu, May 19, 2016 at 04:04:19PM -0700, Steve Muckle wrote:
> > On Thu, May 19, 2016 at 11:06:14PM +0200, Rafael J. Wysocki wrote:
> > > > In the case of a remote update the hook has to run (or not) after it is
> > > > known whether preemption will occur so we don't do needless work or
> > > > IPIs. If the policy CPUs aren't known in the scheduler then the early
> > > > hook will always need to be called along with an indication that it is
> > > > the early hook being called. If it turns out to be a remote update it
> > > > could then be deferred to the later hook, which would only be called
> > > > when a remote update has been deferred and preemption has not occurred.
> > > >
> > > > This means two hook inovcations for a remote non-preempting wakeup
> > > > though instead of one.  Perhaps this is a good middle ground on code
> > > > churn vs. optimization though.
> > > 
> > > I would think so.
> > 
> > Ok, I will pursue this approach.
> 
> I'd like to get your opinion here before proceeding further...
> 
> To catch you up and summarize, I'm trying to implement support for
> calling the scheduler cpufreq callback during remote wakeups.  Currently
> the scheduler cpufreq callback is only called when the target CPU is the
> current CPU. If a remote wakeup does not result in preemption, the CPU
> frequency may currently not be adjusted appropriately for up to a tick,
> when we are guaranteed to call the hook again.
> 
> Invoking schedutil promptly for the target CPU in this situation means
> sending an IPI if the current CPU is not in the target CPU's frequency
> domain. This is because often a cpufreq driver must run on a CPU within
> the frequency domain it is bound to.  But the catch is that we should
> not do this and incur the overhead of an IPI if preemption will occur,
> as in that case the scheduler (and schedutil) will run soon on the
> target CPU anyway, potentially as a result of the scheduler sending its
> own IPI.
> 
> I figured this unnecessary overhead would be unacceptable and so have
> been working on an approach to avoid it. Unfortunately the current hooks
> happen before the preemption decision is made. My current implementation
> sets a flag if schedutil sees a remote wakeup and then bails. There's a
> test to call the hook again at the end of check_preempt_curr() if the flag
> is set.  The flag is cleared by resched_curr() as that means preemption
> will happen on the target CPU. The flag currently lives at the end of
> the rq struct. I could move it into the update_util_data hook structure
> or elsewhere, but that would mean accessing another per-cpu item in
> hot scheduler paths.
> 
> Thoughts? Note the current implementation described above differs a bit
> from the last posting in this thread, per discussion with Rafael.
> 
> thanks,
> Steve

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2016-06-01 20:09 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-09 21:20 [PATCH 0/5] cpufreq: schedutil: improve latency of response Steve Muckle
2016-05-09 21:20 ` [PATCH 1/5] sched: cpufreq: add cpu to update_util_data Steve Muckle
2016-05-18 23:13   ` Rafael J. Wysocki
2016-05-09 21:20 ` [PATCH 2/5] cpufreq: schedutil: support scheduler cpufreq callbacks on remote CPUs Steve Muckle
2016-05-18 23:24   ` Rafael J. Wysocki
2016-05-19 18:40     ` Steve Muckle
2016-05-19 20:55       ` Rafael J. Wysocki
2016-05-19 22:59         ` Steve Muckle
2016-05-19 23:14           ` Rafael J. Wysocki
2016-05-09 21:20 ` [PATCH 3/5] sched: cpufreq: call cpufreq hook from " Steve Muckle
2016-05-18 23:33   ` Rafael J. Wysocki
2016-05-19 12:00     ` Rafael J. Wysocki
2016-05-19 19:19       ` Steve Muckle
2016-05-19 21:06         ` Rafael J. Wysocki
2016-05-19 23:04           ` Steve Muckle
2016-05-21 19:46             ` Steve Muckle
2016-06-01 20:09               ` Steve Muckle
2016-05-09 21:20 ` [PATCH 4/5] cpufreq: schedutil: map raw required frequency to CPU-supported frequency Steve Muckle
2016-05-18 23:37   ` Rafael J. Wysocki
2016-05-19 19:35     ` Steve Muckle
2016-05-19 21:07       ` Rafael J. Wysocki
2016-05-09 21:20 ` [PATCH 5/5] cpufreq: schedutil: do not update rate limit ts when freq is unchanged Steve Muckle
2016-05-18 23:44   ` Rafael J. Wysocki
2016-05-19 19:46     ` Steve Muckle
2016-05-19 21:15       ` Rafael J. Wysocki
2016-05-19 23:34         ` Steve Muckle
2016-05-20  0:24           ` Rafael J. Wysocki
2016-05-20  0:37             ` Rafael J. Wysocki
2016-05-20  0:40               ` Steve Muckle
2016-05-20  0:46                 ` Rafael J. Wysocki
2016-05-20 11:39                   ` Rafael J. Wysocki
2016-05-20 11:54                     ` Rafael J. Wysocki
2016-05-20 11:59                       ` Rafael J. Wysocki
2016-05-20  0:37             ` Steve Muckle
2016-05-20  0:55               ` Rafael J. Wysocki

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.