All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH V3 0/5] CPUFreq: governors: further cleanups
@ 2015-10-13  8:09 Viresh Kumar
  2015-10-13  8:09   ` Viresh Kumar
                   ` (4 more replies)
  0 siblings, 5 replies; 31+ messages in thread
From: Viresh Kumar @ 2015-10-13  8:09 UTC (permalink / raw)
  To: Rafael Wysocki; +Cc: linaro-kernel, linux-pm, Viresh Kumar

Hi Rafael,

Here is the third version based on the review comments you gave. I have
tried to resolve most of them, and it looks much better now. Thanks for
your comments.

As I am traveling this week, don't have access to hardware to retest the
series. But I am quite sure it should work just fine, as there weren't
lots of updates from how the final code looked earlier. Anyway, I have
pushed this for the build bot sometime back and it will let us know of
any obvious issues.

V2->V3:
- Few got merged already, and are dropped now
- Patches are reordered a bit to make them more sensible
- gov_queue_work() isn't modified at all with the mask of CPUs, as you
  suggested earlier.
- Some minor commit/logs updated.

V1->V2:
- Dropped 2/10 from V1 as it wasn't required
- 3/10 saw some changes due to above patch being dropped
- 7/10 changed a bit as we check for pending work items by looking at
  shared->policy, rather than calling delayed_work_pending. We wanted to
  check if governor is operational or not and the new check is enough
  for that.

Viresh Kumar (5):
  cpufreq: ondemand: Drop unnecessary locks from update_sampling_rate()
  cpufreq: ondemand: update sampling rate immediately
  cpufreq: ondemand: queue work for policy->cpus together
  cpufreq: governor: Quit work-handlers early if governor is stopped
  cpufreq: Get rid of ->governor_enabled and its lock

 drivers/cpufreq/cpufreq.c          | 24 ------------------
 drivers/cpufreq/cpufreq_governor.c | 33 +++++++++++++++++--------
 drivers/cpufreq/cpufreq_ondemand.c | 50 ++++++++++++--------------------------
 include/linux/cpufreq.h            |  1 -
 4 files changed, 39 insertions(+), 69 deletions(-)

-- 
2.4.0


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH V3 1/5] cpufreq: ondemand: Drop unnecessary locks from update_sampling_rate()
  2015-10-13  8:09 [PATCH V3 0/5] CPUFreq: governors: further cleanups Viresh Kumar
@ 2015-10-13  8:09   ` Viresh Kumar
  2015-10-13  8:09   ` Viresh Kumar
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 31+ messages in thread
From: Viresh Kumar @ 2015-10-13  8:09 UTC (permalink / raw)
  To: Rafael Wysocki
  Cc: linaro-kernel, linux-pm, Viresh Kumar, Preeti U Murthy, open list

'timer_mutex' is required to sync work-handlers of policy->cpus.
update_sampling_rate() is just canceling the works and queuing them
again. This isn't protecting anything at all in update_sampling_rate()
and is not gonna be of any use.

Even if a work-handler is already running for a CPU,
cancel_delayed_work_sync() will wait for it to finish.

Drop these unnecessary locks.

Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/cpufreq_ondemand.c | 10 +---------
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
index 1fa9088c84a8..03ac6ce54042 100644
--- a/drivers/cpufreq/cpufreq_ondemand.c
+++ b/drivers/cpufreq/cpufreq_ondemand.c
@@ -267,27 +267,19 @@ static void update_sampling_rate(struct dbs_data *dbs_data,
 		dbs_info = &per_cpu(od_cpu_dbs_info, cpu);
 		cpufreq_cpu_put(policy);
 
-		mutex_lock(&dbs_info->cdbs.shared->timer_mutex);
-
-		if (!delayed_work_pending(&dbs_info->cdbs.dwork)) {
-			mutex_unlock(&dbs_info->cdbs.shared->timer_mutex);
+		if (!delayed_work_pending(&dbs_info->cdbs.dwork))
 			continue;
-		}
 
 		next_sampling = jiffies + usecs_to_jiffies(new_rate);
 		appointed_at = dbs_info->cdbs.dwork.timer.expires;
 
 		if (time_before(next_sampling, appointed_at)) {
-
-			mutex_unlock(&dbs_info->cdbs.shared->timer_mutex);
 			cancel_delayed_work_sync(&dbs_info->cdbs.dwork);
-			mutex_lock(&dbs_info->cdbs.shared->timer_mutex);
 
 			gov_queue_work(dbs_data, policy,
 				       usecs_to_jiffies(new_rate), true);
 
 		}
-		mutex_unlock(&dbs_info->cdbs.shared->timer_mutex);
 	}
 }
 
-- 
2.4.0


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH V3 1/5] cpufreq: ondemand: Drop unnecessary locks from update_sampling_rate()
@ 2015-10-13  8:09   ` Viresh Kumar
  0 siblings, 0 replies; 31+ messages in thread
From: Viresh Kumar @ 2015-10-13  8:09 UTC (permalink / raw)
  To: Rafael Wysocki
  Cc: linaro-kernel, linux-pm, Viresh Kumar, Preeti U Murthy, open list

'timer_mutex' is required to sync work-handlers of policy->cpus.
update_sampling_rate() is just canceling the works and queuing them
again. This isn't protecting anything at all in update_sampling_rate()
and is not gonna be of any use.

Even if a work-handler is already running for a CPU,
cancel_delayed_work_sync() will wait for it to finish.

Drop these unnecessary locks.

Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/cpufreq_ondemand.c | 10 +---------
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
index 1fa9088c84a8..03ac6ce54042 100644
--- a/drivers/cpufreq/cpufreq_ondemand.c
+++ b/drivers/cpufreq/cpufreq_ondemand.c
@@ -267,27 +267,19 @@ static void update_sampling_rate(struct dbs_data *dbs_data,
 		dbs_info = &per_cpu(od_cpu_dbs_info, cpu);
 		cpufreq_cpu_put(policy);
 
-		mutex_lock(&dbs_info->cdbs.shared->timer_mutex);
-
-		if (!delayed_work_pending(&dbs_info->cdbs.dwork)) {
-			mutex_unlock(&dbs_info->cdbs.shared->timer_mutex);
+		if (!delayed_work_pending(&dbs_info->cdbs.dwork))
 			continue;
-		}
 
 		next_sampling = jiffies + usecs_to_jiffies(new_rate);
 		appointed_at = dbs_info->cdbs.dwork.timer.expires;
 
 		if (time_before(next_sampling, appointed_at)) {
-
-			mutex_unlock(&dbs_info->cdbs.shared->timer_mutex);
 			cancel_delayed_work_sync(&dbs_info->cdbs.dwork);
-			mutex_lock(&dbs_info->cdbs.shared->timer_mutex);
 
 			gov_queue_work(dbs_data, policy,
 				       usecs_to_jiffies(new_rate), true);
 
 		}
-		mutex_unlock(&dbs_info->cdbs.shared->timer_mutex);
 	}
 }
 
-- 
2.4.0

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH V3 2/5] cpufreq: ondemand: update sampling rate immediately
  2015-10-13  8:09 [PATCH V3 0/5] CPUFreq: governors: further cleanups Viresh Kumar
@ 2015-10-13  8:09   ` Viresh Kumar
  2015-10-13  8:09   ` Viresh Kumar
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 31+ messages in thread
From: Viresh Kumar @ 2015-10-13  8:09 UTC (permalink / raw)
  To: Rafael Wysocki; +Cc: linaro-kernel, linux-pm, Viresh Kumar, open list

We are immediately updating sampling rate for already queued-works, only
if the new expiry is lesser than the old one.

But what about the case, where the user doesn't want frequent events and
want to increase sampling time immediately? Shouldn't we cancel the
works (and so their interrupts) on all policy->cpus (which might occur
very shortly).

This patch removes this special case and simplifies code by immediately
updating the expiry.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/cpufreq_ondemand.c | 25 ++++---------------------
 1 file changed, 4 insertions(+), 21 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
index 03ac6ce54042..bf0511a9735c 100644
--- a/drivers/cpufreq/cpufreq_ondemand.c
+++ b/drivers/cpufreq/cpufreq_ondemand.c
@@ -231,17 +231,8 @@ static unsigned int od_dbs_timer(struct cpu_dbs_info *cdbs,
 static struct common_dbs_data od_dbs_cdata;
 
 /**
- * update_sampling_rate - update sampling rate effective immediately if needed.
+ * update_sampling_rate - update sampling rate immediately.
  * @new_rate: new sampling rate
- *
- * If new rate is smaller than the old, simply updating
- * dbs_tuners_int.sampling_rate might not be appropriate. For example, if the
- * original sampling_rate was 1 second and the requested new sampling rate is 10
- * ms because the user needs immediate reaction from ondemand governor, but not
- * sure if higher frequency will be required or not, then, the governor may
- * change the sampling rate too late; up to 1 second later. Thus, if we are
- * reducing the sampling rate, we need to make the new value effective
- * immediately.
  */
 static void update_sampling_rate(struct dbs_data *dbs_data,
 		unsigned int new_rate)
@@ -255,7 +246,6 @@ static void update_sampling_rate(struct dbs_data *dbs_data,
 	for_each_online_cpu(cpu) {
 		struct cpufreq_policy *policy;
 		struct od_cpu_dbs_info_s *dbs_info;
-		unsigned long next_sampling, appointed_at;
 
 		policy = cpufreq_cpu_get(cpu);
 		if (!policy)
@@ -270,16 +260,9 @@ static void update_sampling_rate(struct dbs_data *dbs_data,
 		if (!delayed_work_pending(&dbs_info->cdbs.dwork))
 			continue;
 
-		next_sampling = jiffies + usecs_to_jiffies(new_rate);
-		appointed_at = dbs_info->cdbs.dwork.timer.expires;
-
-		if (time_before(next_sampling, appointed_at)) {
-			cancel_delayed_work_sync(&dbs_info->cdbs.dwork);
-
-			gov_queue_work(dbs_data, policy,
-				       usecs_to_jiffies(new_rate), true);
-
-		}
+		cancel_delayed_work_sync(&dbs_info->cdbs.dwork);
+		gov_queue_work(dbs_data, policy, usecs_to_jiffies(new_rate),
+			       true);
 	}
 }
 
-- 
2.4.0


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH V3 2/5] cpufreq: ondemand: update sampling rate immediately
@ 2015-10-13  8:09   ` Viresh Kumar
  0 siblings, 0 replies; 31+ messages in thread
From: Viresh Kumar @ 2015-10-13  8:09 UTC (permalink / raw)
  To: Rafael Wysocki; +Cc: linaro-kernel, linux-pm, Viresh Kumar, open list

We are immediately updating sampling rate for already queued-works, only
if the new expiry is lesser than the old one.

But what about the case, where the user doesn't want frequent events and
want to increase sampling time immediately? Shouldn't we cancel the
works (and so their interrupts) on all policy->cpus (which might occur
very shortly).

This patch removes this special case and simplifies code by immediately
updating the expiry.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/cpufreq_ondemand.c | 25 ++++---------------------
 1 file changed, 4 insertions(+), 21 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
index 03ac6ce54042..bf0511a9735c 100644
--- a/drivers/cpufreq/cpufreq_ondemand.c
+++ b/drivers/cpufreq/cpufreq_ondemand.c
@@ -231,17 +231,8 @@ static unsigned int od_dbs_timer(struct cpu_dbs_info *cdbs,
 static struct common_dbs_data od_dbs_cdata;
 
 /**
- * update_sampling_rate - update sampling rate effective immediately if needed.
+ * update_sampling_rate - update sampling rate immediately.
  * @new_rate: new sampling rate
- *
- * If new rate is smaller than the old, simply updating
- * dbs_tuners_int.sampling_rate might not be appropriate. For example, if the
- * original sampling_rate was 1 second and the requested new sampling rate is 10
- * ms because the user needs immediate reaction from ondemand governor, but not
- * sure if higher frequency will be required or not, then, the governor may
- * change the sampling rate too late; up to 1 second later. Thus, if we are
- * reducing the sampling rate, we need to make the new value effective
- * immediately.
  */
 static void update_sampling_rate(struct dbs_data *dbs_data,
 		unsigned int new_rate)
@@ -255,7 +246,6 @@ static void update_sampling_rate(struct dbs_data *dbs_data,
 	for_each_online_cpu(cpu) {
 		struct cpufreq_policy *policy;
 		struct od_cpu_dbs_info_s *dbs_info;
-		unsigned long next_sampling, appointed_at;
 
 		policy = cpufreq_cpu_get(cpu);
 		if (!policy)
@@ -270,16 +260,9 @@ static void update_sampling_rate(struct dbs_data *dbs_data,
 		if (!delayed_work_pending(&dbs_info->cdbs.dwork))
 			continue;
 
-		next_sampling = jiffies + usecs_to_jiffies(new_rate);
-		appointed_at = dbs_info->cdbs.dwork.timer.expires;
-
-		if (time_before(next_sampling, appointed_at)) {
-			cancel_delayed_work_sync(&dbs_info->cdbs.dwork);
-
-			gov_queue_work(dbs_data, policy,
-				       usecs_to_jiffies(new_rate), true);
-
-		}
+		cancel_delayed_work_sync(&dbs_info->cdbs.dwork);
+		gov_queue_work(dbs_data, policy, usecs_to_jiffies(new_rate),
+			       true);
 	}
 }
 
-- 
2.4.0

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH V3 3/5] cpufreq: ondemand: queue work for policy->cpus together
  2015-10-13  8:09 [PATCH V3 0/5] CPUFreq: governors: further cleanups Viresh Kumar
@ 2015-10-13  8:09   ` Viresh Kumar
  2015-10-13  8:09   ` Viresh Kumar
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 31+ messages in thread
From: Viresh Kumar @ 2015-10-13  8:09 UTC (permalink / raw)
  To: Rafael Wysocki; +Cc: linaro-kernel, linux-pm, Viresh Kumar, open list

Currently update_sampling_rate() runs over each online CPU and
cancels/queues work on it. Its very inefficient for the case where a
single policy manages multiple CPUs, as they can be processed together.

Also drop the unnecessary cancel_delayed_work_sync() as we are doing a
mod_delayed_work_on() in gov_queue_work(), which will take care of
pending works for us.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/cpufreq_ondemand.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
index bf0511a9735c..4677c7e9d534 100644
--- a/drivers/cpufreq/cpufreq_ondemand.c
+++ b/drivers/cpufreq/cpufreq_ondemand.c
@@ -238,29 +238,36 @@ static void update_sampling_rate(struct dbs_data *dbs_data,
 		unsigned int new_rate)
 {
 	struct od_dbs_tuners *od_tuners = dbs_data->tuners;
+	struct cpufreq_policy *policy;
+	struct od_cpu_dbs_info_s *dbs_info;
+	struct cpumask cpumask;
 	int cpu;
 
+	cpumask_copy(&cpumask, cpu_online_mask);
+
 	od_tuners->sampling_rate = new_rate = max(new_rate,
 			dbs_data->min_sampling_rate);
 
-	for_each_online_cpu(cpu) {
-		struct cpufreq_policy *policy;
-		struct od_cpu_dbs_info_s *dbs_info;
-
+	for_each_cpu(cpu, &cpumask) {
 		policy = cpufreq_cpu_get(cpu);
 		if (!policy)
 			continue;
+
+		/* clear all CPUs of this policy */
+		cpumask_andnot(&cpumask, &cpumask, policy->cpus);
+
 		if (policy->governor != &cpufreq_gov_ondemand) {
 			cpufreq_cpu_put(policy);
 			continue;
 		}
+
 		dbs_info = &per_cpu(od_cpu_dbs_info, cpu);
 		cpufreq_cpu_put(policy);
 
-		if (!delayed_work_pending(&dbs_info->cdbs.dwork))
+		/* Make sure the work is not canceled on policy->cpus */
+		if (!dbs_info->cdbs.shared->policy)
 			continue;
 
-		cancel_delayed_work_sync(&dbs_info->cdbs.dwork);
 		gov_queue_work(dbs_data, policy, usecs_to_jiffies(new_rate),
 			       true);
 	}
-- 
2.4.0


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH V3 3/5] cpufreq: ondemand: queue work for policy->cpus together
@ 2015-10-13  8:09   ` Viresh Kumar
  0 siblings, 0 replies; 31+ messages in thread
From: Viresh Kumar @ 2015-10-13  8:09 UTC (permalink / raw)
  To: Rafael Wysocki; +Cc: linaro-kernel, linux-pm, Viresh Kumar, open list

Currently update_sampling_rate() runs over each online CPU and
cancels/queues work on it. Its very inefficient for the case where a
single policy manages multiple CPUs, as they can be processed together.

Also drop the unnecessary cancel_delayed_work_sync() as we are doing a
mod_delayed_work_on() in gov_queue_work(), which will take care of
pending works for us.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/cpufreq_ondemand.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
index bf0511a9735c..4677c7e9d534 100644
--- a/drivers/cpufreq/cpufreq_ondemand.c
+++ b/drivers/cpufreq/cpufreq_ondemand.c
@@ -238,29 +238,36 @@ static void update_sampling_rate(struct dbs_data *dbs_data,
 		unsigned int new_rate)
 {
 	struct od_dbs_tuners *od_tuners = dbs_data->tuners;
+	struct cpufreq_policy *policy;
+	struct od_cpu_dbs_info_s *dbs_info;
+	struct cpumask cpumask;
 	int cpu;
 
+	cpumask_copy(&cpumask, cpu_online_mask);
+
 	od_tuners->sampling_rate = new_rate = max(new_rate,
 			dbs_data->min_sampling_rate);
 
-	for_each_online_cpu(cpu) {
-		struct cpufreq_policy *policy;
-		struct od_cpu_dbs_info_s *dbs_info;
-
+	for_each_cpu(cpu, &cpumask) {
 		policy = cpufreq_cpu_get(cpu);
 		if (!policy)
 			continue;
+
+		/* clear all CPUs of this policy */
+		cpumask_andnot(&cpumask, &cpumask, policy->cpus);
+
 		if (policy->governor != &cpufreq_gov_ondemand) {
 			cpufreq_cpu_put(policy);
 			continue;
 		}
+
 		dbs_info = &per_cpu(od_cpu_dbs_info, cpu);
 		cpufreq_cpu_put(policy);
 
-		if (!delayed_work_pending(&dbs_info->cdbs.dwork))
+		/* Make sure the work is not canceled on policy->cpus */
+		if (!dbs_info->cdbs.shared->policy)
 			continue;
 
-		cancel_delayed_work_sync(&dbs_info->cdbs.dwork);
 		gov_queue_work(dbs_data, policy, usecs_to_jiffies(new_rate),
 			       true);
 	}
-- 
2.4.0

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH V3 4/5] cpufreq: governor: Quit work-handlers early if governor is stopped
  2015-10-13  8:09 [PATCH V3 0/5] CPUFreq: governors: further cleanups Viresh Kumar
@ 2015-10-13  8:09   ` Viresh Kumar
  2015-10-13  8:09   ` Viresh Kumar
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 31+ messages in thread
From: Viresh Kumar @ 2015-10-13  8:09 UTC (permalink / raw)
  To: Rafael Wysocki; +Cc: linaro-kernel, linux-pm, Viresh Kumar, open list

cpufreq_governor_lock is abused by using it outside of cpufreq core,
i.e. in cpufreq-governors. But we didn't had a better solution to the
problem (described later) at that point of time, and following was the
only acceptable solution:

6f1e4efd882e ("cpufreq: Fix timer/workqueue corruption by protecting
reading governor_enabled")

The cpufreq governor core is fixed against possible races now and things
are in much better shape.

The original problem:

When a CPU is hot unplugged, we cancel delayed works for all
policy->cpus via gov_cancel_work(). If the work is already running on
any CPU, the workqueue code will wait for the work to finish, to prevent
the work items from re-queuing themselves.

This works most of the time, except for the case where the work handler
determines that it should adjust the delay for all other CPUs, that the
policy is managing. When this happens, the canceling CPU will cancel its
own work but can queue up the works on other policy->cpus.

For example, consider CPU 0-4 in a policy and we called
gov_cancel_work() for them. Workqueue core canceled the works for 0-3
and is waiting for the handler to finish on CPU4. At that time, handler
on CPU4 can restart works on CPU 0-3 again. Which makes 0-3 run works,
which the governor core thinks are canceled.

To fix that in a different (non-hacky) way, set set shared->policy to
false before trying to cancel the work. It should be updated within
timer_mutex, which will prevent the work-handlers to start. Once the
work-handlers finds that we are already trying to stop the governor, it
will exit early. And that will prevent queuing of works again as well.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/cpufreq_governor.c | 33 +++++++++++++++++++++++----------
 1 file changed, 23 insertions(+), 10 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index 750626d8fb03..931424ca96d9 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -171,10 +171,6 @@ void gov_queue_work(struct dbs_data *dbs_data, struct cpufreq_policy *policy,
 {
 	int i;
 
-	mutex_lock(&cpufreq_governor_lock);
-	if (!policy->governor_enabled)
-		goto out_unlock;
-
 	if (!all_cpus) {
 		/*
 		 * Use raw_smp_processor_id() to avoid preemptible warnings.
@@ -188,9 +184,6 @@ void gov_queue_work(struct dbs_data *dbs_data, struct cpufreq_policy *policy,
 		for_each_cpu(i, policy->cpus)
 			__gov_queue_work(i, dbs_data, delay);
 	}
-
-out_unlock:
-	mutex_unlock(&cpufreq_governor_lock);
 }
 EXPORT_SYMBOL_GPL(gov_queue_work);
 
@@ -229,13 +222,24 @@ static void dbs_timer(struct work_struct *work)
 	struct cpu_dbs_info *cdbs = container_of(work, struct cpu_dbs_info,
 						 dwork.work);
 	struct cpu_common_dbs_info *shared = cdbs->shared;
-	struct cpufreq_policy *policy = shared->policy;
-	struct dbs_data *dbs_data = policy->governor_data;
+	struct cpufreq_policy *policy;
+	struct dbs_data *dbs_data;
 	unsigned int sampling_rate, delay;
 	bool modify_all = true;
 
 	mutex_lock(&shared->timer_mutex);
 
+	policy = shared->policy;
+
+	/*
+	 * Governor might already be disabled and there is no point continuing
+	 * with the work-handler.
+	 */
+	if (!policy)
+		goto unlock;
+
+	dbs_data = policy->governor_data;
+
 	if (dbs_data->cdata->governor == GOV_CONSERVATIVE) {
 		struct cs_dbs_tuners *cs_tuners = dbs_data->tuners;
 
@@ -252,6 +256,7 @@ static void dbs_timer(struct work_struct *work)
 	delay = dbs_data->cdata->gov_dbs_timer(cdbs, dbs_data, modify_all);
 	gov_queue_work(dbs_data, policy, delay, modify_all);
 
+unlock:
 	mutex_unlock(&shared->timer_mutex);
 }
 
@@ -488,9 +493,17 @@ static int cpufreq_governor_stop(struct cpufreq_policy *policy,
 	if (!shared || !shared->policy)
 		return -EBUSY;
 
+	/*
+	 * Work-handler must see this updated, as it should not proceed any
+	 * further after governor is disabled. And so timer_mutex is taken while
+	 * updating this value.
+	 */
+	mutex_lock(&shared->timer_mutex);
+	shared->policy = NULL;
+	mutex_unlock(&shared->timer_mutex);
+
 	gov_cancel_work(dbs_data, policy);
 
-	shared->policy = NULL;
 	mutex_destroy(&shared->timer_mutex);
 	return 0;
 }
-- 
2.4.0


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH V3 4/5] cpufreq: governor: Quit work-handlers early if governor is stopped
@ 2015-10-13  8:09   ` Viresh Kumar
  0 siblings, 0 replies; 31+ messages in thread
From: Viresh Kumar @ 2015-10-13  8:09 UTC (permalink / raw)
  To: Rafael Wysocki; +Cc: linaro-kernel, linux-pm, Viresh Kumar, open list

cpufreq_governor_lock is abused by using it outside of cpufreq core,
i.e. in cpufreq-governors. But we didn't had a better solution to the
problem (described later) at that point of time, and following was the
only acceptable solution:

6f1e4efd882e ("cpufreq: Fix timer/workqueue corruption by protecting
reading governor_enabled")

The cpufreq governor core is fixed against possible races now and things
are in much better shape.

The original problem:

When a CPU is hot unplugged, we cancel delayed works for all
policy->cpus via gov_cancel_work(). If the work is already running on
any CPU, the workqueue code will wait for the work to finish, to prevent
the work items from re-queuing themselves.

This works most of the time, except for the case where the work handler
determines that it should adjust the delay for all other CPUs, that the
policy is managing. When this happens, the canceling CPU will cancel its
own work but can queue up the works on other policy->cpus.

For example, consider CPU 0-4 in a policy and we called
gov_cancel_work() for them. Workqueue core canceled the works for 0-3
and is waiting for the handler to finish on CPU4. At that time, handler
on CPU4 can restart works on CPU 0-3 again. Which makes 0-3 run works,
which the governor core thinks are canceled.

To fix that in a different (non-hacky) way, set set shared->policy to
false before trying to cancel the work. It should be updated within
timer_mutex, which will prevent the work-handlers to start. Once the
work-handlers finds that we are already trying to stop the governor, it
will exit early. And that will prevent queuing of works again as well.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/cpufreq_governor.c | 33 +++++++++++++++++++++++----------
 1 file changed, 23 insertions(+), 10 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index 750626d8fb03..931424ca96d9 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -171,10 +171,6 @@ void gov_queue_work(struct dbs_data *dbs_data, struct cpufreq_policy *policy,
 {
 	int i;
 
-	mutex_lock(&cpufreq_governor_lock);
-	if (!policy->governor_enabled)
-		goto out_unlock;
-
 	if (!all_cpus) {
 		/*
 		 * Use raw_smp_processor_id() to avoid preemptible warnings.
@@ -188,9 +184,6 @@ void gov_queue_work(struct dbs_data *dbs_data, struct cpufreq_policy *policy,
 		for_each_cpu(i, policy->cpus)
 			__gov_queue_work(i, dbs_data, delay);
 	}
-
-out_unlock:
-	mutex_unlock(&cpufreq_governor_lock);
 }
 EXPORT_SYMBOL_GPL(gov_queue_work);
 
@@ -229,13 +222,24 @@ static void dbs_timer(struct work_struct *work)
 	struct cpu_dbs_info *cdbs = container_of(work, struct cpu_dbs_info,
 						 dwork.work);
 	struct cpu_common_dbs_info *shared = cdbs->shared;
-	struct cpufreq_policy *policy = shared->policy;
-	struct dbs_data *dbs_data = policy->governor_data;
+	struct cpufreq_policy *policy;
+	struct dbs_data *dbs_data;
 	unsigned int sampling_rate, delay;
 	bool modify_all = true;
 
 	mutex_lock(&shared->timer_mutex);
 
+	policy = shared->policy;
+
+	/*
+	 * Governor might already be disabled and there is no point continuing
+	 * with the work-handler.
+	 */
+	if (!policy)
+		goto unlock;
+
+	dbs_data = policy->governor_data;
+
 	if (dbs_data->cdata->governor == GOV_CONSERVATIVE) {
 		struct cs_dbs_tuners *cs_tuners = dbs_data->tuners;
 
@@ -252,6 +256,7 @@ static void dbs_timer(struct work_struct *work)
 	delay = dbs_data->cdata->gov_dbs_timer(cdbs, dbs_data, modify_all);
 	gov_queue_work(dbs_data, policy, delay, modify_all);
 
+unlock:
 	mutex_unlock(&shared->timer_mutex);
 }
 
@@ -488,9 +493,17 @@ static int cpufreq_governor_stop(struct cpufreq_policy *policy,
 	if (!shared || !shared->policy)
 		return -EBUSY;
 
+	/*
+	 * Work-handler must see this updated, as it should not proceed any
+	 * further after governor is disabled. And so timer_mutex is taken while
+	 * updating this value.
+	 */
+	mutex_lock(&shared->timer_mutex);
+	shared->policy = NULL;
+	mutex_unlock(&shared->timer_mutex);
+
 	gov_cancel_work(dbs_data, policy);
 
-	shared->policy = NULL;
 	mutex_destroy(&shared->timer_mutex);
 	return 0;
 }
-- 
2.4.0

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH V3 5/5] cpufreq: Get rid of ->governor_enabled and its lock
  2015-10-13  8:09 [PATCH V3 0/5] CPUFreq: governors: further cleanups Viresh Kumar
@ 2015-10-13  8:09   ` Viresh Kumar
  2015-10-13  8:09   ` Viresh Kumar
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 31+ messages in thread
From: Viresh Kumar @ 2015-10-13  8:09 UTC (permalink / raw)
  To: Rafael Wysocki; +Cc: linaro-kernel, linux-pm, Viresh Kumar, open list

Invalid state-transitions is verified by governor core now and there is
no need to replicate that in cpufreq core.

Stop verifying the same in cpufreq core. That will get rid of
policy->governor_enabled and cpufreq_governor_lock.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/cpufreq.c | 24 ------------------------
 include/linux/cpufreq.h   |  1 -
 2 files changed, 25 deletions(-)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 25c4c15103a0..0c89b09672e4 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -102,7 +102,6 @@ static LIST_HEAD(cpufreq_governor_list);
 static struct cpufreq_driver *cpufreq_driver;
 static DEFINE_PER_CPU(struct cpufreq_policy *, cpufreq_cpu_data);
 static DEFINE_RWLOCK(cpufreq_driver_lock);
-DEFINE_MUTEX(cpufreq_governor_lock);
 
 /* Flag to suspend/resume CPUFreq governors */
 static bool cpufreq_suspended;
@@ -2035,21 +2034,6 @@ static int __cpufreq_governor(struct cpufreq_policy *policy,
 
 	pr_debug("%s: for CPU %u, event %u\n", __func__, policy->cpu, event);
 
-	mutex_lock(&cpufreq_governor_lock);
-	if ((policy->governor_enabled && event == CPUFREQ_GOV_START)
-	    || (!policy->governor_enabled
-	    && (event == CPUFREQ_GOV_LIMITS || event == CPUFREQ_GOV_STOP))) {
-		mutex_unlock(&cpufreq_governor_lock);
-		return -EBUSY;
-	}
-
-	if (event == CPUFREQ_GOV_STOP)
-		policy->governor_enabled = false;
-	else if (event == CPUFREQ_GOV_START)
-		policy->governor_enabled = true;
-
-	mutex_unlock(&cpufreq_governor_lock);
-
 	ret = policy->governor->governor(policy, event);
 
 	if (!ret) {
@@ -2057,14 +2041,6 @@ static int __cpufreq_governor(struct cpufreq_policy *policy,
 			policy->governor->initialized++;
 		else if (event == CPUFREQ_GOV_POLICY_EXIT)
 			policy->governor->initialized--;
-	} else {
-		/* Restore original values */
-		mutex_lock(&cpufreq_governor_lock);
-		if (event == CPUFREQ_GOV_STOP)
-			policy->governor_enabled = true;
-		else if (event == CPUFREQ_GOV_START)
-			policy->governor_enabled = false;
-		mutex_unlock(&cpufreq_governor_lock);
 	}
 
 	if (((event == CPUFREQ_GOV_POLICY_INIT) && ret) ||
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index dca22de98d94..d211da0763b7 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -80,7 +80,6 @@ struct cpufreq_policy {
 	unsigned int		policy; /* see above */
 	struct cpufreq_governor	*governor; /* see below */
 	void			*governor_data;
-	bool			governor_enabled; /* governor start/stop flag */
 	char			last_governor[CPUFREQ_NAME_LEN]; /* last governor used */
 
 	struct work_struct	update; /* if update_policy() needs to be
-- 
2.4.0


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH V3 5/5] cpufreq: Get rid of ->governor_enabled and its lock
@ 2015-10-13  8:09   ` Viresh Kumar
  0 siblings, 0 replies; 31+ messages in thread
From: Viresh Kumar @ 2015-10-13  8:09 UTC (permalink / raw)
  To: Rafael Wysocki; +Cc: linaro-kernel, linux-pm, Viresh Kumar, open list

Invalid state-transitions is verified by governor core now and there is
no need to replicate that in cpufreq core.

Stop verifying the same in cpufreq core. That will get rid of
policy->governor_enabled and cpufreq_governor_lock.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/cpufreq.c | 24 ------------------------
 include/linux/cpufreq.h   |  1 -
 2 files changed, 25 deletions(-)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 25c4c15103a0..0c89b09672e4 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -102,7 +102,6 @@ static LIST_HEAD(cpufreq_governor_list);
 static struct cpufreq_driver *cpufreq_driver;
 static DEFINE_PER_CPU(struct cpufreq_policy *, cpufreq_cpu_data);
 static DEFINE_RWLOCK(cpufreq_driver_lock);
-DEFINE_MUTEX(cpufreq_governor_lock);
 
 /* Flag to suspend/resume CPUFreq governors */
 static bool cpufreq_suspended;
@@ -2035,21 +2034,6 @@ static int __cpufreq_governor(struct cpufreq_policy *policy,
 
 	pr_debug("%s: for CPU %u, event %u\n", __func__, policy->cpu, event);
 
-	mutex_lock(&cpufreq_governor_lock);
-	if ((policy->governor_enabled && event == CPUFREQ_GOV_START)
-	    || (!policy->governor_enabled
-	    && (event == CPUFREQ_GOV_LIMITS || event == CPUFREQ_GOV_STOP))) {
-		mutex_unlock(&cpufreq_governor_lock);
-		return -EBUSY;
-	}
-
-	if (event == CPUFREQ_GOV_STOP)
-		policy->governor_enabled = false;
-	else if (event == CPUFREQ_GOV_START)
-		policy->governor_enabled = true;
-
-	mutex_unlock(&cpufreq_governor_lock);
-
 	ret = policy->governor->governor(policy, event);
 
 	if (!ret) {
@@ -2057,14 +2041,6 @@ static int __cpufreq_governor(struct cpufreq_policy *policy,
 			policy->governor->initialized++;
 		else if (event == CPUFREQ_GOV_POLICY_EXIT)
 			policy->governor->initialized--;
-	} else {
-		/* Restore original values */
-		mutex_lock(&cpufreq_governor_lock);
-		if (event == CPUFREQ_GOV_STOP)
-			policy->governor_enabled = true;
-		else if (event == CPUFREQ_GOV_START)
-			policy->governor_enabled = false;
-		mutex_unlock(&cpufreq_governor_lock);
 	}
 
 	if (((event == CPUFREQ_GOV_POLICY_INIT) && ret) ||
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index dca22de98d94..d211da0763b7 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -80,7 +80,6 @@ struct cpufreq_policy {
 	unsigned int		policy; /* see above */
 	struct cpufreq_governor	*governor; /* see below */
 	void			*governor_data;
-	bool			governor_enabled; /* governor start/stop flag */
 	char			last_governor[CPUFREQ_NAME_LEN]; /* last governor used */
 
 	struct work_struct	update; /* if update_policy() needs to be
-- 
2.4.0

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH V3 1/5] cpufreq: ondemand: Drop unnecessary locks from update_sampling_rate()
  2015-10-13  8:09   ` Viresh Kumar
  (?)
@ 2015-10-28  4:05   ` Rafael J. Wysocki
  2015-10-28  4:44     ` Viresh Kumar
  -1 siblings, 1 reply; 31+ messages in thread
From: Rafael J. Wysocki @ 2015-10-28  4:05 UTC (permalink / raw)
  To: Viresh Kumar; +Cc: linaro-kernel, linux-pm, Preeti U Murthy, open list

On Tuesday, October 13, 2015 01:39:01 PM Viresh Kumar wrote:
> 'timer_mutex' is required to sync work-handlers of policy->cpus.
> update_sampling_rate() is just canceling the works and queuing them
> again. This isn't protecting anything at all in update_sampling_rate()
> and is not gonna be of any use.
> 
> Even if a work-handler is already running for a CPU,
> cancel_delayed_work_sync() will wait for it to finish.
> 
> Drop these unnecessary locks.
> 
> Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

I'm queuing this up for 4.4, although I think that the changelog is not right.

While at it, what are the race conditions the lock is protecting against?

Thanks,
Rafael


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH V3 1/5] cpufreq: ondemand: Drop unnecessary locks from update_sampling_rate()
  2015-10-28  4:05   ` Rafael J. Wysocki
@ 2015-10-28  4:44     ` Viresh Kumar
  2015-10-28  5:54       ` Rafael J. Wysocki
  0 siblings, 1 reply; 31+ messages in thread
From: Viresh Kumar @ 2015-10-28  4:44 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: linaro-kernel, linux-pm, Preeti U Murthy, open list

On 28-10-15, 05:05, Rafael J. Wysocki wrote:
> On Tuesday, October 13, 2015 01:39:01 PM Viresh Kumar wrote:
> > 'timer_mutex' is required to sync work-handlers of policy->cpus.
> > update_sampling_rate() is just canceling the works and queuing them
> > again. This isn't protecting anything at all in update_sampling_rate()
> > and is not gonna be of any use.
> > 
> > Even if a work-handler is already running for a CPU,
> > cancel_delayed_work_sync() will wait for it to finish.
> > 
> > Drop these unnecessary locks.
> > 
> > Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
> > Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> 
> I'm queuing this up for 4.4, although I think that the changelog is not right.
> 
> While at it, what are the race conditions the lock is protecting against?

In cases where a single policy controls multiple CPUs, a timer is
queued for every cpu present in policy->cpus. When we reach the timer
handler (which can be on multiple CPUs together) on any CPU, we trace
CPU load for all policy->cpus and update the frequency accordingly.

The lock is for protecting multiple CPUs to do the same thing
together, as only its required to be done by a single CPU. Once any
CPUs handler has completed, it updates the last update time and drops
the mutex. At that point of time, other blocked handler (if any) check
the last update time and return early.

And then there are enough minute things that can go wrong if multiple
CPUs do the load evaluation and freq-update at the same time, apart
from it being an time wasting effort.

And so I still think that the commit log isn't that bad. The
timer_mutex lock isn't required in other parts of the governor, they
are just for synchronizing the work-handlers of CPUs belonging to the
same policy.

-- 
viresh

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH V3 1/5] cpufreq: ondemand: Drop unnecessary locks from update_sampling_rate()
  2015-10-28  4:44     ` Viresh Kumar
@ 2015-10-28  5:54       ` Rafael J. Wysocki
  2015-10-28  6:43         ` Viresh Kumar
  0 siblings, 1 reply; 31+ messages in thread
From: Rafael J. Wysocki @ 2015-10-28  5:54 UTC (permalink / raw)
  To: Viresh Kumar; +Cc: linaro-kernel, linux-pm, Preeti U Murthy, open list

On Wednesday, October 28, 2015 10:14:51 AM Viresh Kumar wrote:
> On 28-10-15, 05:05, Rafael J. Wysocki wrote:
> > On Tuesday, October 13, 2015 01:39:01 PM Viresh Kumar wrote:
> > > 'timer_mutex' is required to sync work-handlers of policy->cpus.
> > > update_sampling_rate() is just canceling the works and queuing them
> > > again. This isn't protecting anything at all in update_sampling_rate()
> > > and is not gonna be of any use.
> > > 
> > > Even if a work-handler is already running for a CPU,
> > > cancel_delayed_work_sync() will wait for it to finish.
> > > 
> > > Drop these unnecessary locks.
> > > 
> > > Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
> > > Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> > 
> > I'm queuing this up for 4.4, although I think that the changelog is not right.
> > 
> > While at it, what are the race conditions the lock is protecting against?
> 
> In cases where a single policy controls multiple CPUs, a timer is
> queued for every cpu present in policy->cpus. When we reach the timer
> handler (which can be on multiple CPUs together) on any CPU, we trace
> CPU load for all policy->cpus and update the frequency accordingly.

That would be in dbs_timer(), right?

> The lock is for protecting multiple CPUs to do the same thing
> together, as only its required to be done by a single CPU. Once any
> CPUs handler has completed, it updates the last update time and drops
> the mutex. At that point of time, other blocked handler (if any) check
> the last update time and return early.

Well, that would mean we only needed to hold the lock around the
need_load_eval() evaluation in dbs_timer() if I'm not mistaken.

We also should acquire it around updates of the sampling rate, which
essentially is set_sampling_rate().

Is there any reason to acquire it in cpufreq_governor_limits(), then,
for example?

> And then there are enough minute things that can go wrong if multiple
> CPUs do the load evaluation and freq-update at the same time, apart
> from it being an time wasting effort.
> 
> And so I still think that the commit log isn't that bad. The
> timer_mutex lock isn't required in other parts of the governor, they
> are just for synchronizing the work-handlers of CPUs belonging to the
> same policy.

I agree that it doesn't serve any purpose in the piece of code you're
removing it from (which is why I agree with the patch), but the changelog
is incomplete and confusing.

Thanks,
Rafael


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH V3 2/5] cpufreq: ondemand: update sampling rate immediately
  2015-10-13  8:09   ` Viresh Kumar
  (?)
@ 2015-10-28  6:28   ` Rafael J. Wysocki
  2015-10-28  9:31     ` Viresh Kumar
  -1 siblings, 1 reply; 31+ messages in thread
From: Rafael J. Wysocki @ 2015-10-28  6:28 UTC (permalink / raw)
  To: Viresh Kumar; +Cc: linaro-kernel, linux-pm, open list

On Tuesday, October 13, 2015 01:39:02 PM Viresh Kumar wrote:
> We are immediately updating sampling rate for already queued-works, only
> if the new expiry is lesser than the old one.
> 
> But what about the case, where the user doesn't want frequent events and
> want to increase sampling time immediately? Shouldn't we cancel the
> works (and so their interrupts) on all policy->cpus (which might occur
> very shortly).
> 
> This patch removes this special case and simplifies code by immediately
> updating the expiry.

The changelog is a complete disaster. :-/

Your argument seems to be that it should be OK to do the
cancel_delayed_work_sync()/gov_queue_work() combo in all cases, because
even if the new rate is greater than the old one, the user may actually
want it to take effect immediately and it shouldn't hurt to skip the next
sample anyway in that case.

Is this really the case, though?  What about the old rate is 1s, the new one
is 2s and the timer is just about to expire?  Won't the canceling effectively
move the next sample 3s away from the previous one which may not be desirable?

The current code just allows the timer to expire, unless that would prevent
the new rate from taking effect for too long, which seems perfectly reasonable
to me.

All that seems to be racy with respect to the delayed work execution, but that's
a different problem.

> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> ---
>  drivers/cpufreq/cpufreq_ondemand.c | 25 ++++---------------------
>  1 file changed, 4 insertions(+), 21 deletions(-)
> 
> diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
> index 03ac6ce54042..bf0511a9735c 100644
> --- a/drivers/cpufreq/cpufreq_ondemand.c
> +++ b/drivers/cpufreq/cpufreq_ondemand.c
> @@ -231,17 +231,8 @@ static unsigned int od_dbs_timer(struct cpu_dbs_info *cdbs,
>  static struct common_dbs_data od_dbs_cdata;
>  
>  /**
> - * update_sampling_rate - update sampling rate effective immediately if needed.
> + * update_sampling_rate - update sampling rate immediately.
>   * @new_rate: new sampling rate
> - *
> - * If new rate is smaller than the old, simply updating
> - * dbs_tuners_int.sampling_rate might not be appropriate. For example, if the
> - * original sampling_rate was 1 second and the requested new sampling rate is 10
> - * ms because the user needs immediate reaction from ondemand governor, but not
> - * sure if higher frequency will be required or not, then, the governor may
> - * change the sampling rate too late; up to 1 second later. Thus, if we are
> - * reducing the sampling rate, we need to make the new value effective
> - * immediately.
>   */
>  static void update_sampling_rate(struct dbs_data *dbs_data,
>  		unsigned int new_rate)
> @@ -255,7 +246,6 @@ static void update_sampling_rate(struct dbs_data *dbs_data,
>  	for_each_online_cpu(cpu) {
>  		struct cpufreq_policy *policy;
>  		struct od_cpu_dbs_info_s *dbs_info;
> -		unsigned long next_sampling, appointed_at;
>  
>  		policy = cpufreq_cpu_get(cpu);
>  		if (!policy)
> @@ -270,16 +260,9 @@ static void update_sampling_rate(struct dbs_data *dbs_data,
>  		if (!delayed_work_pending(&dbs_info->cdbs.dwork))
>  			continue;
>  
> -		next_sampling = jiffies + usecs_to_jiffies(new_rate);
> -		appointed_at = dbs_info->cdbs.dwork.timer.expires;
> -
> -		if (time_before(next_sampling, appointed_at)) {
> -			cancel_delayed_work_sync(&dbs_info->cdbs.dwork);
> -
> -			gov_queue_work(dbs_data, policy,
> -				       usecs_to_jiffies(new_rate), true);
> -
> -		}
> +		cancel_delayed_work_sync(&dbs_info->cdbs.dwork);
> +		gov_queue_work(dbs_data, policy, usecs_to_jiffies(new_rate),
> +			       true);
>  	}
>  }

Thanks,
Rafael


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH V3 3/5] cpufreq: ondemand: queue work for policy->cpus together
  2015-10-13  8:09   ` Viresh Kumar
  (?)
@ 2015-10-28  6:38   ` Rafael J. Wysocki
  2015-10-28  6:46     ` Viresh Kumar
  -1 siblings, 1 reply; 31+ messages in thread
From: Rafael J. Wysocki @ 2015-10-28  6:38 UTC (permalink / raw)
  To: Viresh Kumar; +Cc: linaro-kernel, linux-pm, open list

On Tuesday, October 13, 2015 01:39:03 PM Viresh Kumar wrote:
> Currently update_sampling_rate() runs over each online CPU and
> cancels/queues work on it. Its very inefficient for the case where a
> single policy manages multiple CPUs, as they can be processed together.

In the case of one policy object shared between multiple CPUs, I'm
wondering why we don't use a single delayed work function for all of them
in the first place.  That would address the problem at the source instead
of dealing with the symptoms.

> Also drop the unnecessary cancel_delayed_work_sync() as we are doing a
> mod_delayed_work_on() in gov_queue_work(), which will take care of
> pending works for us.

I'd prefer a separate patch for that if poss.

Thanks,
Rafael


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH V3 1/5] cpufreq: ondemand: Drop unnecessary locks from update_sampling_rate()
  2015-10-28  5:54       ` Rafael J. Wysocki
@ 2015-10-28  6:43         ` Viresh Kumar
  2015-10-28  7:46           ` Rafael J. Wysocki
  0 siblings, 1 reply; 31+ messages in thread
From: Viresh Kumar @ 2015-10-28  6:43 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: linaro-kernel, linux-pm, Preeti U Murthy, open list

On 28-10-15, 06:54, Rafael J. Wysocki wrote:
> On Wednesday, October 28, 2015 10:14:51 AM Viresh Kumar wrote:
> > In cases where a single policy controls multiple CPUs, a timer is
> > queued for every cpu present in policy->cpus. When we reach the timer
> > handler (which can be on multiple CPUs together) on any CPU, we trace
> > CPU load for all policy->cpus and update the frequency accordingly.
> 
> That would be in dbs_timer(), right?

Yeah, and we already do stuff from within the mutex there.

> > The lock is for protecting multiple CPUs to do the same thing
> > together, as only its required to be done by a single CPU. Once any
> > CPUs handler has completed, it updates the last update time and drops
> > the mutex. At that point of time, other blocked handler (if any) check
> > the last update time and return early.
> 
> Well, that would mean we only needed to hold the lock around the
> need_load_eval() evaluation in dbs_timer() if I'm not mistaken.

Actually yeah, but then the fourth patch of this series uses the
timer_mutex to fix a long standing problem (which was fixed by hacking
the code earlier). And so we need to take the lock for the entire
dbs_timer() routine.

> We also should acquire it around updates of the sampling rate, which
> essentially is set_sampling_rate().

Why? In the worst case we may schedule the next timer for the earlier
sampling rate. But do we care that much for that race, that we want to
add locks here as well ?

> Is there any reason to acquire it in cpufreq_governor_limits(), then,
> for example?

Yeah, we are calling dbs_check_cpu(dbs_data, cpu) from that path,
which will reevaluate the load.

-- 
viresh

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH V3 3/5] cpufreq: ondemand: queue work for policy->cpus together
  2015-10-28  6:38   ` Rafael J. Wysocki
@ 2015-10-28  6:46     ` Viresh Kumar
  2015-10-28  7:33       ` Rafael J. Wysocki
  0 siblings, 1 reply; 31+ messages in thread
From: Viresh Kumar @ 2015-10-28  6:46 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: linaro-kernel, linux-pm, open list

On 28-10-15, 07:38, Rafael J. Wysocki wrote:
> On Tuesday, October 13, 2015 01:39:03 PM Viresh Kumar wrote:
> > Currently update_sampling_rate() runs over each online CPU and
> > cancels/queues work on it. Its very inefficient for the case where a
> > single policy manages multiple CPUs, as they can be processed together.
> 
> In the case of one policy object shared between multiple CPUs, I'm
> wondering why we don't use a single delayed work function for all of them
> in the first place.  That would address the problem at the source instead
> of dealing with the symptoms.

That's what we had long back. The problem is that the timers queued
for cpufreq are deferrable and if the CPU, on which the timer is
queued, goes idle, then the governor would halt. And there can be
other CPUs in the policy->cpus group which are still running.

> > Also drop the unnecessary cancel_delayed_work_sync() as we are doing a
> > mod_delayed_work_on() in gov_queue_work(), which will take care of
> > pending works for us.
> 
> I'd prefer a separate patch for that if poss.

okay.

-- 
viresh

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH V3 4/5] cpufreq: governor: Quit work-handlers early if governor is stopped
  2015-10-13  8:09   ` Viresh Kumar
  (?)
@ 2015-10-28  7:10   ` Rafael J. Wysocki
  2015-10-28  8:25     ` Viresh Kumar
  -1 siblings, 1 reply; 31+ messages in thread
From: Rafael J. Wysocki @ 2015-10-28  7:10 UTC (permalink / raw)
  To: Viresh Kumar; +Cc: linaro-kernel, linux-pm, open list

On Tuesday, October 13, 2015 01:39:04 PM Viresh Kumar wrote:
> cpufreq_governor_lock is abused by using it outside of cpufreq core,
> i.e. in cpufreq-governors. But we didn't had a better solution to the
> problem (described later) at that point of time, and following was the
> only acceptable solution:
> 
> 6f1e4efd882e ("cpufreq: Fix timer/workqueue corruption by protecting
> reading governor_enabled")
> 
> The cpufreq governor core is fixed against possible races now and things
> are in much better shape.
> 
> The original problem:
> 
> When a CPU is hot unplugged, we cancel delayed works for all
> policy->cpus via gov_cancel_work(). If the work is already running on
> any CPU, the workqueue code will wait for the work to finish, to prevent
> the work items from re-queuing themselves.
> 
> This works most of the time, except for the case where the work handler
> determines that it should adjust the delay for all other CPUs, that the
> policy is managing. When this happens, the canceling CPU will cancel its
> own work but can queue up the works on other policy->cpus.
> 
> For example, consider CPU 0-4 in a policy and we called
> gov_cancel_work() for them. Workqueue core canceled the works for 0-3
> and is waiting for the handler to finish on CPU4. At that time, handler
> on CPU4 can restart works on CPU 0-3 again. Which makes 0-3 run works,
> which the governor core thinks are canceled.
> 
> To fix that in a different (non-hacky) way, set set shared->policy to
> false before trying to cancel the work. It should be updated within
> timer_mutex, which will prevent the work-handlers to start. Once the
> work-handlers finds that we are already trying to stop the governor, it
> will exit early. And that will prevent queuing of works again as well.
> 
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

I have a hard time figuring out what the patch is supposed to achieve from
the above.

Do we eventually want to get rid of cpufreq_governor_lock and that's why we're
doing this?

> ---
>  drivers/cpufreq/cpufreq_governor.c | 33 +++++++++++++++++++++++----------
>  1 file changed, 23 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
> index 750626d8fb03..931424ca96d9 100644
> --- a/drivers/cpufreq/cpufreq_governor.c
> +++ b/drivers/cpufreq/cpufreq_governor.c
> @@ -171,10 +171,6 @@ void gov_queue_work(struct dbs_data *dbs_data, struct cpufreq_policy *policy,
>  {
>  	int i;
>  
> -	mutex_lock(&cpufreq_governor_lock);
> -	if (!policy->governor_enabled)
> -		goto out_unlock;
> -
>  	if (!all_cpus) {
>  		/*
>  		 * Use raw_smp_processor_id() to avoid preemptible warnings.
> @@ -188,9 +184,6 @@ void gov_queue_work(struct dbs_data *dbs_data, struct cpufreq_policy *policy,
>  		for_each_cpu(i, policy->cpus)
>  			__gov_queue_work(i, dbs_data, delay);
>  	}
> -
> -out_unlock:
> -	mutex_unlock(&cpufreq_governor_lock);
>  }
>  EXPORT_SYMBOL_GPL(gov_queue_work);
>  
> @@ -229,13 +222,24 @@ static void dbs_timer(struct work_struct *work)
>  	struct cpu_dbs_info *cdbs = container_of(work, struct cpu_dbs_info,
>  						 dwork.work);
>  	struct cpu_common_dbs_info *shared = cdbs->shared;
> -	struct cpufreq_policy *policy = shared->policy;
> -	struct dbs_data *dbs_data = policy->governor_data;
> +	struct cpufreq_policy *policy;
> +	struct dbs_data *dbs_data;
>  	unsigned int sampling_rate, delay;
>  	bool modify_all = true;
>  
>  	mutex_lock(&shared->timer_mutex);
>  
> +	policy = shared->policy;
> +
> +	/*
> +	 * Governor might already be disabled and there is no point continuing
> +	 * with the work-handler.
> +	 */
> +	if (!policy)
> +		goto unlock;
> +
> +	dbs_data = policy->governor_data;
> +
>  	if (dbs_data->cdata->governor == GOV_CONSERVATIVE) {
>  		struct cs_dbs_tuners *cs_tuners = dbs_data->tuners;
>  
> @@ -252,6 +256,7 @@ static void dbs_timer(struct work_struct *work)
>  	delay = dbs_data->cdata->gov_dbs_timer(cdbs, dbs_data, modify_all);
>  	gov_queue_work(dbs_data, policy, delay, modify_all);
>  
> +unlock:
>  	mutex_unlock(&shared->timer_mutex);
>  }
>  
> @@ -488,9 +493,17 @@ static int cpufreq_governor_stop(struct cpufreq_policy *policy,
>  	if (!shared || !shared->policy)
>  		return -EBUSY;
>  
> +	/*
> +	 * Work-handler must see this updated, as it should not proceed any
> +	 * further after governor is disabled. And so timer_mutex is taken while
> +	 * updating this value.
> +	 */
> +	mutex_lock(&shared->timer_mutex);
> +	shared->policy = NULL;
> +	mutex_unlock(&shared->timer_mutex);

So this assumes that dbs_timer() will acquire the mutex and see that
shared->policy is now NULL, so it will bail out immediately, but ->

> +
>  	gov_cancel_work(dbs_data, policy);
>  
> -	shared->policy = NULL;
>  	mutex_destroy(&shared->timer_mutex);

-> the mutex is destroyed here, so what the guarantee that the mutex will
still be around when dbs_timer() runs?

>  	return 0;
>  }
> 

Thanks,
Rafael


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH V3 3/5] cpufreq: ondemand: queue work for policy->cpus together
  2015-10-28  6:46     ` Viresh Kumar
@ 2015-10-28  7:33       ` Rafael J. Wysocki
  2015-10-28  8:34         ` Viresh Kumar
  0 siblings, 1 reply; 31+ messages in thread
From: Rafael J. Wysocki @ 2015-10-28  7:33 UTC (permalink / raw)
  To: Viresh Kumar; +Cc: linaro-kernel, linux-pm, open list

On Wednesday, October 28, 2015 12:16:35 PM Viresh Kumar wrote:
> On 28-10-15, 07:38, Rafael J. Wysocki wrote:
> > On Tuesday, October 13, 2015 01:39:03 PM Viresh Kumar wrote:
> > > Currently update_sampling_rate() runs over each online CPU and
> > > cancels/queues work on it. Its very inefficient for the case where a
> > > single policy manages multiple CPUs, as they can be processed together.
> > 
> > In the case of one policy object shared between multiple CPUs, I'm
> > wondering why we don't use a single delayed work function for all of them
> > in the first place.  That would address the problem at the source instead
> > of dealing with the symptoms.
> 
> That's what we had long back. The problem is that the timers queued
> for cpufreq are deferrable and if the CPU, on which the timer is
> queued, goes idle, then the governor would halt. And there can be
> other CPUs in the policy->cpus group which are still running.

It looks like we shouldn't be using delayed works for this, really.

We should be using timer functions and normal work items.  Schedule the
timer function on all CPUs sharing the policy and then queue up the
work item from the first one that executes the timer.  Then make the
timer function bail out immediately until the work has completed and
re-schedule the timers from the work item.

Thanks,
Rafael


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH V3 1/5] cpufreq: ondemand: Drop unnecessary locks from update_sampling_rate()
  2015-10-28  6:43         ` Viresh Kumar
@ 2015-10-28  7:46           ` Rafael J. Wysocki
  2015-10-28  8:56             ` Viresh Kumar
  0 siblings, 1 reply; 31+ messages in thread
From: Rafael J. Wysocki @ 2015-10-28  7:46 UTC (permalink / raw)
  To: Viresh Kumar; +Cc: linaro-kernel, linux-pm, Preeti U Murthy, open list

On Wednesday, October 28, 2015 12:13:17 PM Viresh Kumar wrote:
> On 28-10-15, 06:54, Rafael J. Wysocki wrote:
> > On Wednesday, October 28, 2015 10:14:51 AM Viresh Kumar wrote:
> > > In cases where a single policy controls multiple CPUs, a timer is
> > > queued for every cpu present in policy->cpus. When we reach the timer
> > > handler (which can be on multiple CPUs together) on any CPU, we trace
> > > CPU load for all policy->cpus and update the frequency accordingly.
> > 
> > That would be in dbs_timer(), right?
> 
> Yeah, and we already do stuff from within the mutex there.
> 
> > > The lock is for protecting multiple CPUs to do the same thing
> > > together, as only its required to be done by a single CPU. Once any
> > > CPUs handler has completed, it updates the last update time and drops
> > > the mutex. At that point of time, other blocked handler (if any) check
> > > the last update time and return early.
> > 
> > Well, that would mean we only needed to hold the lock around the
> > need_load_eval() evaluation in dbs_timer() if I'm not mistaken.
> 
> Actually yeah, but then the fourth patch of this series uses the
> timer_mutex to fix a long standing problem (which was fixed by hacking
> the code earlier). And so we need to take the lock for the entire
> dbs_timer() routine.

I don't actually think that that patch is correct and even if it is,
we'll only need to do that *after* that patch, so at least it would be
fair to say a word about it in the changelog, wouldn't it?

> > We also should acquire it around updates of the sampling rate, which
> > essentially is set_sampling_rate().
> 
> Why? In the worst case we may schedule the next timer for the earlier
> sampling rate. But do we care that much for that race, that we want to
> add locks here as well ?

OK

That works because we actully hold the mutex around the whole function,
as otherwise we'd have seen races between delayed work items on different
CPUs sharing the policy.

> > Is there any reason to acquire it in cpufreq_governor_limits(), then,
> > for example?
> 
> Yeah, we are calling dbs_check_cpu(dbs_data, cpu) from that path,
> which will reevaluate the load.

Which means that we should take the lock around dbs_check_cpu() everywhere
in a consistent way.  Which in turn means that the lock actually does more
than you said.

My point is basically that we seem to have a vague idea about what the lock
is used for, while we need to know exactly why we need it.

Thanks,
Rafael


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH V3 4/5] cpufreq: governor: Quit work-handlers early if governor is stopped
  2015-10-28  7:10   ` Rafael J. Wysocki
@ 2015-10-28  8:25     ` Viresh Kumar
  2015-10-28 15:12       ` Rafael J. Wysocki
  0 siblings, 1 reply; 31+ messages in thread
From: Viresh Kumar @ 2015-10-28  8:25 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: linaro-kernel, linux-pm, open list

On 28-10-15, 08:10, Rafael J. Wysocki wrote:
> I have a hard time figuring out what the patch is supposed to achieve from
> the above.

We had a problem earlier, where even after stopping the governor for a
policy, the work was still queued for some of its CPUs.

We failed to understand the real problem then, and so abused the wider
cpufreq_governor_lock.

I understood the problem much better now, and got a straight forward,
and precise solution for that.

> Do we eventually want to get rid of cpufreq_governor_lock and that's why we're
> doing this?

That's another benefit we get out of this change.

> > +	mutex_lock(&shared->timer_mutex);
> > +	shared->policy = NULL;
> > +	mutex_unlock(&shared->timer_mutex);

Right.

> So this assumes that dbs_timer() will acquire the mutex and see that
> shared->policy is now NULL, so it will bail out immediately, but ->
> 
> > +
> >  	gov_cancel_work(dbs_data, policy);
> >  
> > -	shared->policy = NULL;
> >  	mutex_destroy(&shared->timer_mutex);
> 
> -> the mutex is destroyed here, so what the guarantee that the mutex will
> still be around when dbs_timer() runs?

You really got me worried for few minutes :)

The earlier update of shared->policy = NULL, makes sure that no
work-handler can start real work. After the unlock the work handlers
will start executing but will return early.

We also have gov_cancel_work(), which will until the time all the
current handlers have finished executing and no work is queued.

And so we are sure that there are no users of the mutex when it is
destroyed.

-- 
viresh

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH V3 3/5] cpufreq: ondemand: queue work for policy->cpus together
  2015-10-28  7:33       ` Rafael J. Wysocki
@ 2015-10-28  8:34         ` Viresh Kumar
  0 siblings, 0 replies; 31+ messages in thread
From: Viresh Kumar @ 2015-10-28  8:34 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: linaro-kernel, linux-pm, open list

On 28-10-15, 08:33, Rafael J. Wysocki wrote:
> It looks like we shouldn't be using delayed works for this, really.
> 
> We should be using timer functions and normal work items.  Schedule the
> timer function on all CPUs sharing the policy and then queue up the
> work item from the first one that executes the timer.  Then make the
> timer function bail out immediately until the work has completed and
> re-schedule the timers from the work item.

Okay, I will try to get some code out for that then.

-- 
viresh

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH V3 1/5] cpufreq: ondemand: Drop unnecessary locks from update_sampling_rate()
  2015-10-28  7:46           ` Rafael J. Wysocki
@ 2015-10-28  8:56             ` Viresh Kumar
  0 siblings, 0 replies; 31+ messages in thread
From: Viresh Kumar @ 2015-10-28  8:56 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: linaro-kernel, linux-pm, Preeti U Murthy, open list

On 28-10-15, 08:46, Rafael J. Wysocki wrote:
> On Wednesday, October 28, 2015 12:13:17 PM Viresh Kumar wrote:
> > Actually yeah, but then the fourth patch of this series uses the
> > timer_mutex to fix a long standing problem (which was fixed by hacking
> > the code earlier). And so we need to take the lock for the entire
> > dbs_timer() routine.

Well, there is another reason why the lock is taken for the complete
dbs_timer() routine. There are two parts of that routine:
- Checking if load evaluation is required or not + updating the
  last-update time.
- The second is the load evaluation + freq change thing.

Lock around the first check makes sure that timer handlers of other
CPUs don't do load evaluation in parallel and that they don't do it
before the sampling period.

Lock around the second part makes sure there is only one thread which
is doing load evaluation + freq update. The other thread being
cpufreq_governor_limits(). And so the same lock taken across that part
as well.

> I don't actually think that that patch is correct and even if it is,
> we'll only need to do that *after* that patch, so at least it would be
> fair to say a word about it in the changelog, wouldn't it?

Hmm, If you agree about the above reasoning, then we may not require
an update to the changelog, otherwise I will mention that in the
changelog of this patch.

> > Yeah, we are calling dbs_check_cpu(dbs_data, cpu) from that path,
> > which will reevaluate the load.
> 
> Which means that we should take the lock around dbs_check_cpu() everywhere
> in a consistent way.

We already do this from everywhere.

> Which in turn means that the lock actually does more
> than you said.

What I described towards the top is probably a better answer to the
earlier query.

> My point is basically that we seem to have a vague idea about what the lock
> is used for, while we need to know exactly why we need it.

I am totally with you on this, we have surely screwed up on locking
for a long time in cpufreq. And we should know exactly why we want to
change it now.

-- 
viresh

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH V3 2/5] cpufreq: ondemand: update sampling rate immediately
  2015-10-28  6:28   ` Rafael J. Wysocki
@ 2015-10-28  9:31     ` Viresh Kumar
  2015-10-28 15:31       ` Rafael J. Wysocki
  0 siblings, 1 reply; 31+ messages in thread
From: Viresh Kumar @ 2015-10-28  9:31 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: linaro-kernel, linux-pm, open list

On 28-10-15, 07:28, Rafael J. Wysocki wrote:
> Your argument seems to be that it should be OK to do the
> cancel_delayed_work_sync()/gov_queue_work() combo in all cases, because
> even if the new rate is greater than the old one, the user may actually
> want it to take effect immediately and it shouldn't hurt to skip the next
> sample anyway in that case.
> 
> Is this really the case, though?  What about the old rate is 1s, the new one
> is 2s and the timer is just about to expire?  Won't the canceling effectively
> move the next sample 3s away from the previous one which may not be desirable?
> 
> The current code just allows the timer to expire, unless that would prevent
> the new rate from taking effect for too long, which seems perfectly reasonable
> to me.

Okay, what about this case: old rate is 1s, new rate it 5s and we have
just serviced the timer. With the current code we will receive
evaluate again after 1 second instead of 5. Is that desirable ?

I didn't wanted to keep special code for such corner cases. And then
how many times are we going to update sampling rates ?

But if we want to do something special, then we may schedule the work
for following delay:

delay = shared->time_stamp + new_sampling_rate.

shared->time_stamp is the last time we evaluated the load.

With this, we will be at shoot at the exact requested time, relative
to the last time we evaluated the loads.

-- 
viresh

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH V3 4/5] cpufreq: governor: Quit work-handlers early if governor is stopped
  2015-10-28 15:12       ` Rafael J. Wysocki
@ 2015-10-28 14:46         ` Viresh Kumar
  0 siblings, 0 replies; 31+ messages in thread
From: Viresh Kumar @ 2015-10-28 14:46 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: linaro-kernel, linux-pm, open list

On 28-10-15, 16:12, Rafael J. Wysocki wrote:
> So this is a changelog matching your patch:
> 
> "gov_queue_work() acquires cpufreq_governor_lock to allow cpufreq_governor_stop()
> to drain delayed work items possibly scheduled on CPUs that share the policy with
> a CPU being taken offline.
> 
> However, the same goal may be achieved in a more straightforward way if the
> policy pointer in the struct cpu_dbs_info matching the policy CPU is reset
> upfront by cpufreq_governor_stop() under the timer_mutex belonging to it and
> checked against NULL, under the same lock, at the beginning of dbs_timer().
> 
> In that case every instance of dbs_timer() run for a struct cpu_dbs_info
> sharing the policy pointer in question after cpufreq_governor_stop() has started
> will notice that that pointer is NULL and bail out immediately without queuing up
> any new work items.  In turn, gov_cancel_work() called by cpufreq_governor_stop()
> before destroying timer_mutex will wait for all of the delayed work items
> currently running on the CPUs sharing the policy to drop the mutex, so it may
> be destroyed safely.
> 
> Make cpufreq_governor_stop() and dbs_timer() work as described and modify
> gov_queue_work() so it does not acquire cpufreq_governor_lock any more."

Looks far better, thanks :)

-- 
viresh

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH V3 4/5] cpufreq: governor: Quit work-handlers early if governor is stopped
  2015-10-28  8:25     ` Viresh Kumar
@ 2015-10-28 15:12       ` Rafael J. Wysocki
  2015-10-28 14:46         ` Viresh Kumar
  0 siblings, 1 reply; 31+ messages in thread
From: Rafael J. Wysocki @ 2015-10-28 15:12 UTC (permalink / raw)
  To: Viresh Kumar; +Cc: linaro-kernel, linux-pm, open list

On Wednesday, October 28, 2015 01:55:59 PM Viresh Kumar wrote:
> On 28-10-15, 08:10, Rafael J. Wysocki wrote:
> > I have a hard time figuring out what the patch is supposed to achieve from
> > the above.
> 
> We had a problem earlier, where even after stopping the governor for a
> policy, the work was still queued for some of its CPUs.
> 
> We failed to understand the real problem then, and so abused the wider
> cpufreq_governor_lock.
> 
> I understood the problem much better now, and got a straight forward,
> and precise solution for that.
> 
> > Do we eventually want to get rid of cpufreq_governor_lock and that's why we're
> > doing this?
> 
> That's another benefit we get out of this change.
> 
> > > +	mutex_lock(&shared->timer_mutex);
> > > +	shared->policy = NULL;
> > > +	mutex_unlock(&shared->timer_mutex);
> 
> Right.
> 
> > So this assumes that dbs_timer() will acquire the mutex and see that
> > shared->policy is now NULL, so it will bail out immediately, but ->
> > 
> > > +
> > >  	gov_cancel_work(dbs_data, policy);
> > >  
> > > -	shared->policy = NULL;
> > >  	mutex_destroy(&shared->timer_mutex);
> > 
> > -> the mutex is destroyed here, so what the guarantee that the mutex will
> > still be around when dbs_timer() runs?
> 
> You really got me worried for few minutes :)
> 
> The earlier update of shared->policy = NULL, makes sure that no
> work-handler can start real work. After the unlock the work handlers
> will start executing but will return early.

That's not sufficient, because it doesn't guarantee that the lock will be
dropped before we destroy it.

> We also have gov_cancel_work(), which will until the time all the
> current handlers have finished executing and no work is queued.
> 
> And so we are sure that there are no users of the mutex when it is
> destroyed.

OK, so the gov_cancel_work() provides the guarantee.

So this is a changelog matching your patch:

"gov_queue_work() acquires cpufreq_governor_lock to allow cpufreq_governor_stop()
to drain delayed work items possibly scheduled on CPUs that share the policy with
a CPU being taken offline.

However, the same goal may be achieved in a more straightforward way if the
policy pointer in the struct cpu_dbs_info matching the policy CPU is reset
upfront by cpufreq_governor_stop() under the timer_mutex belonging to it and
checked against NULL, under the same lock, at the beginning of dbs_timer().

In that case every instance of dbs_timer() run for a struct cpu_dbs_info
sharing the policy pointer in question after cpufreq_governor_stop() has started
will notice that that pointer is NULL and bail out immediately without queuing up
any new work items.  In turn, gov_cancel_work() called by cpufreq_governor_stop()
before destroying timer_mutex will wait for all of the delayed work items
currently running on the CPUs sharing the policy to drop the mutex, so it may
be destroyed safely.

Make cpufreq_governor_stop() and dbs_timer() work as described and modify
gov_queue_work() so it does not acquire cpufreq_governor_lock any more."

Thanks,
Rafael


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH V3 2/5] cpufreq: ondemand: update sampling rate immediately
  2015-10-28 15:31       ` Rafael J. Wysocki
@ 2015-10-28 15:28         ` Viresh Kumar
  2015-10-28 16:13           ` Rafael J. Wysocki
  0 siblings, 1 reply; 31+ messages in thread
From: Viresh Kumar @ 2015-10-28 15:28 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: linaro-kernel, linux-pm, open list

On 28-10-15, 16:31, Rafael J. Wysocki wrote:
> Is the current code really problematic?

Its not problematic, but just that I didn't like special code written
here.

Also, its a blocker for the next patch which tries to schedule work on
all the policy->cpus together.

-- 
viresh

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH V3 2/5] cpufreq: ondemand: update sampling rate immediately
  2015-10-28  9:31     ` Viresh Kumar
@ 2015-10-28 15:31       ` Rafael J. Wysocki
  2015-10-28 15:28         ` Viresh Kumar
  0 siblings, 1 reply; 31+ messages in thread
From: Rafael J. Wysocki @ 2015-10-28 15:31 UTC (permalink / raw)
  To: Viresh Kumar; +Cc: linaro-kernel, linux-pm, open list

On Wednesday, October 28, 2015 03:01:09 PM Viresh Kumar wrote:
> On 28-10-15, 07:28, Rafael J. Wysocki wrote:
> > Your argument seems to be that it should be OK to do the
> > cancel_delayed_work_sync()/gov_queue_work() combo in all cases, because
> > even if the new rate is greater than the old one, the user may actually
> > want it to take effect immediately and it shouldn't hurt to skip the next
> > sample anyway in that case.
> > 
> > Is this really the case, though?  What about the old rate is 1s, the new one
> > is 2s and the timer is just about to expire?  Won't the canceling effectively
> > move the next sample 3s away from the previous one which may not be desirable?
> > 
> > The current code just allows the timer to expire, unless that would prevent
> > the new rate from taking effect for too long, which seems perfectly reasonable
> > to me.
> 
> Okay, what about this case: old rate is 1s, new rate it 5s and we have
> just serviced the timer. With the current code we will receive
> evaluate again after 1 second instead of 5. Is that desirable ?

That is OK.

The change is not guaranteed to happen instantaneously and the old rate may
stay in effect for a longer while.

The case in which that may be annoying (but arguably not incorrect) is when
the new rate is much less than the old one, but that is currently optimized
for.

> I didn't wanted to keep special code for such corner cases. And then
> how many times are we going to update sampling rates ?
> 
> But if we want to do something special, then we may schedule the work
> for following delay:
> 
> delay = shared->time_stamp + new_sampling_rate.
> 
> shared->time_stamp is the last time we evaluated the load.
> 
> With this, we will be at shoot at the exact requested time, relative
> to the last time we evaluated the loads.

Is the current code really problematic?

Thanks,
Rafael


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH V3 2/5] cpufreq: ondemand: update sampling rate immediately
  2015-10-28 16:13           ` Rafael J. Wysocki
@ 2015-10-28 15:47             ` Viresh Kumar
  0 siblings, 0 replies; 31+ messages in thread
From: Viresh Kumar @ 2015-10-28 15:47 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: linaro-kernel, linux-pm, open list

On 28-10-15, 17:13, Rafael J. Wysocki wrote:
> Well, the second statement above sort of contradicts the first one. :-)
> 
> I guess the answer is "it is problematic, because I can't do the other
> optimization then".

Hehe, right.

> To that I'd really suggest trying to rework the code to use timer
> functions directly in the first place.

I will, but this problem will be present there as well. Because at
that point of time, we will talk about per-cpu timers instead of
delayed-works.

-- 
viresh

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH V3 2/5] cpufreq: ondemand: update sampling rate immediately
  2015-10-28 15:28         ` Viresh Kumar
@ 2015-10-28 16:13           ` Rafael J. Wysocki
  2015-10-28 15:47             ` Viresh Kumar
  0 siblings, 1 reply; 31+ messages in thread
From: Rafael J. Wysocki @ 2015-10-28 16:13 UTC (permalink / raw)
  To: Viresh Kumar; +Cc: linaro-kernel, linux-pm, open list

On Wednesday, October 28, 2015 08:58:11 PM Viresh Kumar wrote:
> On 28-10-15, 16:31, Rafael J. Wysocki wrote:
> > Is the current code really problematic?
> 
> Its not problematic, but just that I didn't like special code written
> here.
> 
> Also, its a blocker for the next patch which tries to schedule work on
> all the policy->cpus together.

Well, the second statement above sort of contradicts the first one. :-)

I guess the answer is "it is problematic, because I can't do the other
optimization then".

To that I'd really suggest trying to rework the code to use timer
functions directly in the first place.

Thanks,
Rafael


^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2015-10-28 15:51 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-10-13  8:09 [PATCH V3 0/5] CPUFreq: governors: further cleanups Viresh Kumar
2015-10-13  8:09 ` [PATCH V3 1/5] cpufreq: ondemand: Drop unnecessary locks from update_sampling_rate() Viresh Kumar
2015-10-13  8:09   ` Viresh Kumar
2015-10-28  4:05   ` Rafael J. Wysocki
2015-10-28  4:44     ` Viresh Kumar
2015-10-28  5:54       ` Rafael J. Wysocki
2015-10-28  6:43         ` Viresh Kumar
2015-10-28  7:46           ` Rafael J. Wysocki
2015-10-28  8:56             ` Viresh Kumar
2015-10-13  8:09 ` [PATCH V3 2/5] cpufreq: ondemand: update sampling rate immediately Viresh Kumar
2015-10-13  8:09   ` Viresh Kumar
2015-10-28  6:28   ` Rafael J. Wysocki
2015-10-28  9:31     ` Viresh Kumar
2015-10-28 15:31       ` Rafael J. Wysocki
2015-10-28 15:28         ` Viresh Kumar
2015-10-28 16:13           ` Rafael J. Wysocki
2015-10-28 15:47             ` Viresh Kumar
2015-10-13  8:09 ` [PATCH V3 3/5] cpufreq: ondemand: queue work for policy->cpus together Viresh Kumar
2015-10-13  8:09   ` Viresh Kumar
2015-10-28  6:38   ` Rafael J. Wysocki
2015-10-28  6:46     ` Viresh Kumar
2015-10-28  7:33       ` Rafael J. Wysocki
2015-10-28  8:34         ` Viresh Kumar
2015-10-13  8:09 ` [PATCH V3 4/5] cpufreq: governor: Quit work-handlers early if governor is stopped Viresh Kumar
2015-10-13  8:09   ` Viresh Kumar
2015-10-28  7:10   ` Rafael J. Wysocki
2015-10-28  8:25     ` Viresh Kumar
2015-10-28 15:12       ` Rafael J. Wysocki
2015-10-28 14:46         ` Viresh Kumar
2015-10-13  8:09 ` [PATCH V3 5/5] cpufreq: Get rid of ->governor_enabled and its lock Viresh Kumar
2015-10-13  8:09   ` Viresh Kumar

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.