* [PATCH V2 0/6] cpufreq: governor: replace per-cpu delayed work with timers
From: Viresh Kumar @ 2015-12-03  4:07 UTC (permalink / raw)
  To: Rafael Wysocki; +Cc: linaro-kernel, linux-pm, ashwin.chaugule, Viresh Kumar

Hi Rafael,

As suggested by you, this series updates the governor core to keep
per-cpu timers and a single shared work item for the entire policy.

More details are present in the individual changelogs; hopefully they
read better this time.

I have tested the series with the test suite I created a while back
when fixing locking issues in the governors. It runs, in parallel, all
the operations that have historically broken the governors (the test
cases that various people reported over time). Everything works fine.

The first patch is really a bug fix, which I noticed only today :). The
next three are minor cleanups that prepare for the big change, the
fifth is the main patch that does the conversion, and the final one is
cherry-picked from the previous series as it is still relevant.
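
In condensed form, the new scheme looks like this (distilled from patch
5/6, with the restructuring discussed later in the thread; not new
code):

	/*
	 * Each CPU of a policy gets a lightweight deferrable timer, and
	 * whichever timer expires first queues the single, policy-wide
	 * work item that evaluates the load.
	 */
	static void dbs_timer_handler(unsigned long data)
	{
		struct cpu_dbs_info *cdbs = (struct cpu_dbs_info *)data;
		struct cpu_common_dbs_info *shared = cdbs->shared;
		unsigned long flags;

		spin_lock_irqsave(&shared->timer_lock, flags);

		/* Only the first expiring timer queues the work. */
		if (!shared->skip_work) {
			shared->skip_work++;
			queue_work(system_wq, &shared->work);
		}

		spin_unlock_irqrestore(&shared->timer_lock, flags);
	}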

This is rebased over: pm/linux-next.

V1->V2:
- Updated first patch to not move variable definitions to the top of the
  routine.
- The last patch had a minor rebase conflict due to the above change
  and so is updated as well.
- Included Reviewed-by tag from Ashwin.
- Rebased over pm/linux-next

Viresh Kumar (6):
  cpufreq: ondemand: Update sampling rate only for concerned policies
  cpufreq: ondemand: Work is guaranteed to be pending
  cpufreq: governor: Pass policy as argument to ->gov_dbs_timer()
  cpufreq: governor: initialize/destroy timer_mutex with 'shared'
  cpufreq: governor: replace per-cpu delayed work with timers
  cpufreq: ondemand: update update_sampling_rate() to make it more
    efficient

 drivers/cpufreq/cpufreq_conservative.c |   6 +-
 drivers/cpufreq/cpufreq_governor.c     | 145 +++++++++++++++++++--------------
 drivers/cpufreq/cpufreq_governor.h     |  23 ++++--
 drivers/cpufreq/cpufreq_ondemand.c     |  61 ++++++++++----
 4 files changed, 145 insertions(+), 90 deletions(-)

-- 
2.6.2.198.g614a2ac



* [PATCH V2 1/6] cpufreq: ondemand: Update sampling rate only for concerned policies
From: Viresh Kumar @ 2015-12-03  4:07 UTC (permalink / raw)
  To: Rafael Wysocki
  Cc: linaro-kernel, linux-pm, ashwin.chaugule, Viresh Kumar, open list

We compare policy->governor against cpufreq_gov_ondemand to make sure
that we update the sampling rate only for the concerned CPUs, but that
isn't enough.

In the governor_per_policy case there can be multiple instances of the
ondemand governor, and the current code always ends up updating all of
them. What we need to do instead is compare dbs_data with
policy->governor_data, which matches only for the policies governed by
this dbs_data.

The code is also racy: the governor might be getting stopped at that
time, and we may end up scheduling work for a policy that we have just
disabled.

Fix that by protecting the entire function with od_dbs_cdata.mutex,
which prevents races with policy START/STOP/etc.

With that lock in place, we can safely get the policy via the per-cpu
dbs_info.
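
In condensed form, the fixed loop looks like this (distilled from the
diff below; the actual requeueing is elided):

	mutex_lock(&od_dbs_cdata.mutex);

	for_each_online_cpu(cpu) {
		struct od_cpu_dbs_info_s *dbs_info = &per_cpu(od_cpu_dbs_info, cpu);
		struct cpu_common_dbs_info *shared = dbs_info->cdbs.shared;

		/* Governor not started (or already stopped) for this CPU. */
		if (!shared || !shared->policy)
			continue;

		/* Skip policies governed by some other dbs_data instance. */
		if (dbs_data != shared->policy->governor_data)
			continue;

		/* ... requeue this policy's work with the new rate ... */
	}

	mutex_unlock(&od_dbs_cdata.mutex);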

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/cpufreq_ondemand.c | 35 ++++++++++++++++++++++++++++-------
 1 file changed, 28 insertions(+), 7 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
index 03ac6ce54042..089ca6a6ca02 100644
--- a/drivers/cpufreq/cpufreq_ondemand.c
+++ b/drivers/cpufreq/cpufreq_ondemand.c
@@ -252,20 +252,39 @@ static void update_sampling_rate(struct dbs_data *dbs_data,
 	od_tuners->sampling_rate = new_rate = max(new_rate,
 			dbs_data->min_sampling_rate);
 
+	/*
+	 * Lock governor so that governor start/stop can't execute in parallel.
+	 */
+	mutex_lock(&od_dbs_cdata.mutex);
+
 	for_each_online_cpu(cpu) {
 		struct cpufreq_policy *policy;
 		struct od_cpu_dbs_info_s *dbs_info;
+		struct cpu_dbs_info *cdbs;
+		struct cpu_common_dbs_info *shared;
 		unsigned long next_sampling, appointed_at;
 
-		policy = cpufreq_cpu_get(cpu);
-		if (!policy)
+		dbs_info = &per_cpu(od_cpu_dbs_info, cpu);
+		cdbs = &dbs_info->cdbs;
+		shared = cdbs->shared;
+
+		/*
+		 * A valid shared and shared->policy means governor hasn't
+		 * stopped or exited yet.
+		 */
+		if (!shared || !shared->policy)
 			continue;
-		if (policy->governor != &cpufreq_gov_ondemand) {
-			cpufreq_cpu_put(policy);
+
+		policy = shared->policy;
+
+		/*
+		 * Update sampling rate for CPUs whose policy is governed by
+		 * dbs_data. In case of governor_per_policy, only a single
+		 * policy will be governed by dbs_data, otherwise there can be
+		 * multiple policies that are governed by the same dbs_data.
+		 */
+		if (dbs_data != policy->governor_data)
 			continue;
-		}
-		dbs_info = &per_cpu(od_cpu_dbs_info, cpu);
-		cpufreq_cpu_put(policy);
 
 		if (!delayed_work_pending(&dbs_info->cdbs.dwork))
 			continue;
@@ -281,6 +300,8 @@ static void update_sampling_rate(struct dbs_data *dbs_data,
 
 		}
 	}
+
+	mutex_unlock(&od_dbs_cdata.mutex);
 }
 
 static ssize_t store_sampling_rate(struct dbs_data *dbs_data, const char *buf,
-- 
2.6.2.198.g614a2ac



* [PATCH V2 2/6] cpufreq: ondemand: Work is guaranteed to be pending
From: Viresh Kumar @ 2015-12-03  4:07 UTC (permalink / raw)
  To: Rafael Wysocki
  Cc: linaro-kernel, linux-pm, ashwin.chaugule, Viresh Kumar, open list

We are guaranteed to have work scheduled for policy->cpus, as the
policy hasn't been stopped yet, so there is no need to check for
pending work again. Drop the check.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/cpufreq_ondemand.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
index 089ca6a6ca02..08f2aa602f9e 100644
--- a/drivers/cpufreq/cpufreq_ondemand.c
+++ b/drivers/cpufreq/cpufreq_ondemand.c
@@ -286,9 +286,6 @@ static void update_sampling_rate(struct dbs_data *dbs_data,
 		if (dbs_data != policy->governor_data)
 			continue;
 
-		if (!delayed_work_pending(&dbs_info->cdbs.dwork))
-			continue;
-
 		next_sampling = jiffies + usecs_to_jiffies(new_rate);
 		appointed_at = dbs_info->cdbs.dwork.timer.expires;
 
-- 
2.6.2.198.g614a2ac



* [PATCH V2 3/6] cpufreq: governor: Pass policy as argument to ->gov_dbs_timer()
From: Viresh Kumar @ 2015-12-03  4:07 UTC (permalink / raw)
  To: Rafael Wysocki
  Cc: linaro-kernel, linux-pm, ashwin.chaugule, Viresh Kumar, open list

Pass 'policy' as argument to ->gov_dbs_timer() instead of cdbs and
dbs_data.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/cpufreq_conservative.c | 6 +++---
 drivers/cpufreq/cpufreq_governor.c     | 2 +-
 drivers/cpufreq/cpufreq_governor.h     | 3 +--
 drivers/cpufreq/cpufreq_ondemand.c     | 5 ++---
 4 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_conservative.c b/drivers/cpufreq/cpufreq_conservative.c
index 1fa1deb6e91f..606ad74abe6e 100644
--- a/drivers/cpufreq/cpufreq_conservative.c
+++ b/drivers/cpufreq/cpufreq_conservative.c
@@ -115,13 +115,13 @@ static void cs_check_cpu(int cpu, unsigned int load)
 	}
 }
 
-static unsigned int cs_dbs_timer(struct cpu_dbs_info *cdbs,
-				 struct dbs_data *dbs_data, bool modify_all)
+static unsigned int cs_dbs_timer(struct cpufreq_policy *policy, bool modify_all)
 {
+	struct dbs_data *dbs_data = policy->governor_data;
 	struct cs_dbs_tuners *cs_tuners = dbs_data->tuners;
 
 	if (modify_all)
-		dbs_check_cpu(dbs_data, cdbs->shared->policy->cpu);
+		dbs_check_cpu(dbs_data, policy->cpu);
 
 	return delay_for_sampling_rate(cs_tuners->sampling_rate);
 }
diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index b260576ddb12..cdcb56a49b28 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -253,7 +253,7 @@ static void dbs_timer(struct work_struct *work)
 	if (!need_load_eval(cdbs->shared, sampling_rate))
 		modify_all = false;
 
-	delay = dbs_data->cdata->gov_dbs_timer(cdbs, dbs_data, modify_all);
+	delay = dbs_data->cdata->gov_dbs_timer(policy, modify_all);
 	gov_queue_work(dbs_data, policy, delay, modify_all);
 
 unlock:
diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h
index 5621bb03e874..0c7589016b6c 100644
--- a/drivers/cpufreq/cpufreq_governor.h
+++ b/drivers/cpufreq/cpufreq_governor.h
@@ -209,8 +209,7 @@ struct common_dbs_data {
 
 	struct cpu_dbs_info *(*get_cpu_cdbs)(int cpu);
 	void *(*get_cpu_dbs_info_s)(int cpu);
-	unsigned int (*gov_dbs_timer)(struct cpu_dbs_info *cdbs,
-				      struct dbs_data *dbs_data,
+	unsigned int (*gov_dbs_timer)(struct cpufreq_policy *policy,
 				      bool modify_all);
 	void (*gov_check_cpu)(int cpu, unsigned int load);
 	int (*init)(struct dbs_data *dbs_data, bool notify);
diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
index 08f2aa602f9e..fc0384b4d02d 100644
--- a/drivers/cpufreq/cpufreq_ondemand.c
+++ b/drivers/cpufreq/cpufreq_ondemand.c
@@ -191,10 +191,9 @@ static void od_check_cpu(int cpu, unsigned int load)
 	}
 }
 
-static unsigned int od_dbs_timer(struct cpu_dbs_info *cdbs,
-				 struct dbs_data *dbs_data, bool modify_all)
+static unsigned int od_dbs_timer(struct cpufreq_policy *policy, bool modify_all)
 {
-	struct cpufreq_policy *policy = cdbs->shared->policy;
+	struct dbs_data *dbs_data = policy->governor_data;
 	unsigned int cpu = policy->cpu;
 	struct od_cpu_dbs_info_s *dbs_info = &per_cpu(od_cpu_dbs_info,
 			cpu);
-- 
2.6.2.198.g614a2ac



* [PATCH V2 4/6] cpufreq: governor: initialize/destroy timer_mutex with 'shared'
From: Viresh Kumar @ 2015-12-03  4:07 UTC (permalink / raw)
  To: Rafael Wysocki
  Cc: linaro-kernel, linux-pm, ashwin.chaugule, Viresh Kumar, open list

timer_mutex only needs to be initialized when the memory for 'shared'
is allocated and, similarly, only needs to be destroyed when that
memory is freed.

There is no need to do this every time we start/stop the governor. Move
the code that initializes/destroys timer_mutex to the relevant places.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/cpufreq_governor.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index cdcb56a49b28..999e1f6addf9 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -287,6 +287,7 @@ static int alloc_common_dbs_info(struct cpufreq_policy *policy,
 	for_each_cpu(j, policy->related_cpus)
 		cdata->get_cpu_cdbs(j)->shared = shared;
 
+	mutex_init(&shared->timer_mutex);
 	return 0;
 }
 
@@ -297,6 +298,8 @@ static void free_common_dbs_info(struct cpufreq_policy *policy,
 	struct cpu_common_dbs_info *shared = cdbs->shared;
 	int j;
 
+	mutex_destroy(&shared->timer_mutex);
+
 	for_each_cpu(j, policy->cpus)
 		cdata->get_cpu_cdbs(j)->shared = NULL;
 
@@ -433,7 +436,6 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy,
 
 	shared->policy = policy;
 	shared->time_stamp = ktime_get();
-	mutex_init(&shared->timer_mutex);
 
 	for_each_cpu(j, policy->cpus) {
 		struct cpu_dbs_info *j_cdbs = cdata->get_cpu_cdbs(j);
@@ -493,8 +495,6 @@ static int cpufreq_governor_stop(struct cpufreq_policy *policy,
 	mutex_unlock(&shared->timer_mutex);
 
 	gov_cancel_work(dbs_data, policy);
-
-	mutex_destroy(&shared->timer_mutex);
 	return 0;
 }
 
-- 
2.6.2.198.g614a2ac



* [PATCH V2 5/6] cpufreq: governor: replace per-cpu delayed work with timers
From: Viresh Kumar @ 2015-12-03  4:07 UTC (permalink / raw)
  To: Rafael Wysocki
  Cc: linaro-kernel, linux-pm, ashwin.chaugule, Viresh Kumar,
	Rafael J. Wysocki, open list

cpufreq governors evaluate the load at every sampling interval and,
based on that, update the frequency for the group of CPUs belonging to
the same cpufreq policy.

This must be done in a single thread for all policy->cpus, but because
we don't want to wake up idle CPUs just for that, we use deferrable
work. Had we used a single delayed deferrable work for the entire
policy, the CPU required to run its handler could have been idle, and
we might have ended up not changing the frequency for the entire group
despite load variations.

And so we were forced to keep per-cpu works; only the one that expires
first needs to do the real work, while the others are rescheduled for
the next sampling time.

Until now we have been using the heavier solution: a delayed deferrable
work per CPU, which is a combination of a timer and a work item.

This can be made lighter by keeping per-cpu deferrable timers with a
single work item, which is queued by the first timer that expires.

This patch does just that. The important changes are:
- The timer handler runs in irq context, so we need a spinlock instead
  of timer_mutex, and a separate timer_lock is created. This also makes
  it clear exactly what the mutex and the lock are each protecting.
- A new field, 'skip_work', is added to track when the timer handlers
  are allowed to queue a work. More comments are present in the code.
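
Condensed, the shared per-policy state introduced here looks as
follows (comments paraphrased from the cpufreq_governor.h hunk below):

	struct cpu_common_dbs_info {
		struct cpufreq_policy *policy;
		struct mutex timer_mutex;	/* serializes load evaluation
						   against limit changes */
		spinlock_t timer_lock;		/* serializes work-queuing from
						   the irq-context timer handlers */
		ktime_t time_stamp;
		unsigned int skip_work;		/* non-zero: timers must not
						   queue the work */
		struct work_struct work;	/* the single, policy-wide work */
	};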

Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Ashwin Chaugule <ashwin.chaugule@linaro.org>
---
 drivers/cpufreq/cpufreq_governor.c | 139 +++++++++++++++++++++----------------
 drivers/cpufreq/cpufreq_governor.h |  20 ++++--
 drivers/cpufreq/cpufreq_ondemand.c |   8 +--
 3 files changed, 98 insertions(+), 69 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index 999e1f6addf9..a3f9bc9b98e9 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -158,47 +158,52 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu)
 }
 EXPORT_SYMBOL_GPL(dbs_check_cpu);
 
-static inline void __gov_queue_work(int cpu, struct dbs_data *dbs_data,
-		unsigned int delay)
+void gov_add_timers(struct cpufreq_policy *policy, unsigned int delay)
 {
-	struct cpu_dbs_info *cdbs = dbs_data->cdata->get_cpu_cdbs(cpu);
-
-	mod_delayed_work_on(cpu, system_wq, &cdbs->dwork, delay);
-}
-
-void gov_queue_work(struct dbs_data *dbs_data, struct cpufreq_policy *policy,
-		unsigned int delay, bool all_cpus)
-{
-	int i;
+	struct dbs_data *dbs_data = policy->governor_data;
+	struct cpu_dbs_info *cdbs;
+	int cpu;
 
-	if (!all_cpus) {
-		/*
-		 * Use raw_smp_processor_id() to avoid preemptible warnings.
-		 * We know that this is only called with all_cpus == false from
-		 * works that have been queued with *_work_on() functions and
-		 * those works are canceled during CPU_DOWN_PREPARE so they
-		 * can't possibly run on any other CPU.
-		 */
-		__gov_queue_work(raw_smp_processor_id(), dbs_data, delay);
-	} else {
-		for_each_cpu(i, policy->cpus)
-			__gov_queue_work(i, dbs_data, delay);
+	for_each_cpu(cpu, policy->cpus) {
+		cdbs = dbs_data->cdata->get_cpu_cdbs(cpu);
+		cdbs->timer.expires = jiffies + delay;
+		add_timer_on(&cdbs->timer, cpu);
 	}
 }
-EXPORT_SYMBOL_GPL(gov_queue_work);
+EXPORT_SYMBOL_GPL(gov_add_timers);
 
-static inline void gov_cancel_work(struct dbs_data *dbs_data,
-		struct cpufreq_policy *policy)
+static inline void gov_cancel_timers(struct cpufreq_policy *policy)
 {
+	struct dbs_data *dbs_data = policy->governor_data;
 	struct cpu_dbs_info *cdbs;
 	int i;
 
 	for_each_cpu(i, policy->cpus) {
 		cdbs = dbs_data->cdata->get_cpu_cdbs(i);
-		cancel_delayed_work_sync(&cdbs->dwork);
+		del_timer_sync(&cdbs->timer);
 	}
 }
 
+void gov_cancel_work(struct cpu_common_dbs_info *shared)
+{
+	unsigned long flags;
+
+	/*
+	 * No work will be queued from timer handlers after skip_work is
+	 * updated. And so we can safely cancel the work first and then the
+	 * timers.
+	 */
+	spin_lock_irqsave(&shared->timer_lock, flags);
+	shared->skip_work++;
+	spin_unlock_irqrestore(&shared->timer_lock, flags);
+
+	cancel_work_sync(&shared->work);
+
+	gov_cancel_timers(shared->policy);
+
+	shared->skip_work = 0;
+}
+
 /* Will return if we need to evaluate cpu load again or not */
 static bool need_load_eval(struct cpu_common_dbs_info *shared,
 			   unsigned int sampling_rate)
@@ -217,29 +222,21 @@ static bool need_load_eval(struct cpu_common_dbs_info *shared,
 	return true;
 }
 
-static void dbs_timer(struct work_struct *work)
+static void dbs_work_handler(struct work_struct *work)
 {
-	struct cpu_dbs_info *cdbs = container_of(work, struct cpu_dbs_info,
-						 dwork.work);
-	struct cpu_common_dbs_info *shared = cdbs->shared;
+	struct cpu_common_dbs_info *shared = container_of(work, struct
+					cpu_common_dbs_info, work);
 	struct cpufreq_policy *policy;
 	struct dbs_data *dbs_data;
 	unsigned int sampling_rate, delay;
-	bool modify_all = true;
-
-	mutex_lock(&shared->timer_mutex);
+	bool eval_load;
 
 	policy = shared->policy;
-
-	/*
-	 * Governor might already be disabled and there is no point continuing
-	 * with the work-handler.
-	 */
-	if (!policy)
-		goto unlock;
-
 	dbs_data = policy->governor_data;
 
+	/* Kill all timers */
+	gov_cancel_timers(policy);
+
 	if (dbs_data->cdata->governor == GOV_CONSERVATIVE) {
 		struct cs_dbs_tuners *cs_tuners = dbs_data->tuners;
 
@@ -250,14 +247,44 @@ static void dbs_timer(struct work_struct *work)
 		sampling_rate = od_tuners->sampling_rate;
 	}
 
-	if (!need_load_eval(cdbs->shared, sampling_rate))
-		modify_all = false;
+	eval_load = need_load_eval(shared, sampling_rate);
 
-	delay = dbs_data->cdata->gov_dbs_timer(policy, modify_all);
-	gov_queue_work(dbs_data, policy, delay, modify_all);
+	/*
+	 * Make sure cpufreq_governor_limits() isn't evaluating load in
+	 * parallel.
+	 */
+	mutex_lock(&shared->timer_mutex);
+	delay = dbs_data->cdata->gov_dbs_timer(policy, eval_load);
+	mutex_unlock(&shared->timer_mutex);
+
+	shared->skip_work--;
+	gov_add_timers(policy, delay);
+}
+
+static void dbs_timer_handler(unsigned long data)
+{
+	struct cpu_dbs_info *cdbs = (struct cpu_dbs_info *)data;
+	struct cpu_common_dbs_info *shared = cdbs->shared;
+	struct cpufreq_policy *policy;
+	unsigned long flags;
+
+	spin_lock_irqsave(&shared->timer_lock, flags);
+	policy = shared->policy;
+
+	/*
+	 * Timer handler isn't allowed to queue work at the moment, because:
+	 * - Another timer handler has done that
+	 * - We are stopping the governor
+	 * - Or we are updating the sampling rate of ondemand governor
+	 */
+	if (shared->skip_work)
+		goto unlock;
+
+	shared->skip_work++;
+	queue_work(system_wq, &shared->work);
 
 unlock:
-	mutex_unlock(&shared->timer_mutex);
+	spin_unlock_irqrestore(&shared->timer_lock, flags);
 }
 
 static void set_sampling_rate(struct dbs_data *dbs_data,
@@ -288,6 +315,8 @@ static int alloc_common_dbs_info(struct cpufreq_policy *policy,
 		cdata->get_cpu_cdbs(j)->shared = shared;
 
 	mutex_init(&shared->timer_mutex);
+	spin_lock_init(&shared->timer_lock);
+	INIT_WORK(&shared->work, dbs_work_handler);
 	return 0;
 }
 
@@ -452,7 +481,9 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy,
 		if (ignore_nice)
 			j_cdbs->prev_cpu_nice = kcpustat_cpu(j).cpustat[CPUTIME_NICE];
 
-		INIT_DEFERRABLE_WORK(&j_cdbs->dwork, dbs_timer);
+		__setup_timer(&j_cdbs->timer, dbs_timer_handler,
+			      (unsigned long)j_cdbs,
+			      TIMER_DEFERRABLE | TIMER_IRQSAFE);
 	}
 
 	if (cdata->governor == GOV_CONSERVATIVE) {
@@ -470,8 +501,7 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy,
 		od_ops->powersave_bias_init_cpu(cpu);
 	}
 
-	gov_queue_work(dbs_data, policy, delay_for_sampling_rate(sampling_rate),
-		       true);
+	gov_add_timers(policy, delay_for_sampling_rate(sampling_rate));
 	return 0;
 }
 
@@ -485,16 +515,9 @@ static int cpufreq_governor_stop(struct cpufreq_policy *policy,
 	if (!shared || !shared->policy)
 		return -EBUSY;
 
-	/*
-	 * Work-handler must see this updated, as it should not proceed any
-	 * further after governor is disabled. And so timer_mutex is taken while
-	 * updating this value.
-	 */
-	mutex_lock(&shared->timer_mutex);
+	gov_cancel_work(shared);
 	shared->policy = NULL;
-	mutex_unlock(&shared->timer_mutex);
 
-	gov_cancel_work(dbs_data, policy);
 	return 0;
 }
 
diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h
index 0c7589016b6c..76742902491e 100644
--- a/drivers/cpufreq/cpufreq_governor.h
+++ b/drivers/cpufreq/cpufreq_governor.h
@@ -132,12 +132,20 @@ static void *get_cpu_dbs_info_s(int cpu)				\
 struct cpu_common_dbs_info {
 	struct cpufreq_policy *policy;
 	/*
-	 * percpu mutex that serializes governor limit change with dbs_timer
-	 * invocation. We do not want dbs_timer to run when user is changing
-	 * the governor or limits.
+	 * Per policy mutex that serializes load evaluation from limit-change
+	 * and work-handler.
 	 */
 	struct mutex timer_mutex;
+
+	/*
+	 * Per policy lock that serializes access to queuing work from timer
+	 * handlers.
+	 */
+	spinlock_t timer_lock;
+
 	ktime_t time_stamp;
+	unsigned int skip_work;
+	struct work_struct work;
 };
 
 /* Per cpu structures */
@@ -152,7 +160,7 @@ struct cpu_dbs_info {
 	 * wake-up from idle.
 	 */
 	unsigned int prev_load;
-	struct delayed_work dwork;
+	struct timer_list timer;
 	struct cpu_common_dbs_info *shared;
 };
 
@@ -268,11 +276,11 @@ static ssize_t show_sampling_rate_min_gov_pol				\
 
 extern struct mutex cpufreq_governor_lock;
 
+void gov_add_timers(struct cpufreq_policy *policy, unsigned int delay);
+void gov_cancel_work(struct cpu_common_dbs_info *shared);
 void dbs_check_cpu(struct dbs_data *dbs_data, int cpu);
 int cpufreq_governor_dbs(struct cpufreq_policy *policy,
 		struct common_dbs_data *cdata, unsigned int event);
-void gov_queue_work(struct dbs_data *dbs_data, struct cpufreq_policy *policy,
-		unsigned int delay, bool all_cpus);
 void od_register_powersave_bias_handler(unsigned int (*f)
 		(struct cpufreq_policy *, unsigned int, unsigned int),
 		unsigned int powersave_bias);
diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
index fc0384b4d02d..f879012cf849 100644
--- a/drivers/cpufreq/cpufreq_ondemand.c
+++ b/drivers/cpufreq/cpufreq_ondemand.c
@@ -286,13 +286,11 @@ static void update_sampling_rate(struct dbs_data *dbs_data,
 			continue;
 
 		next_sampling = jiffies + usecs_to_jiffies(new_rate);
-		appointed_at = dbs_info->cdbs.dwork.timer.expires;
+		appointed_at = dbs_info->cdbs.timer.expires;
 
 		if (time_before(next_sampling, appointed_at)) {
-			cancel_delayed_work_sync(&dbs_info->cdbs.dwork);
-
-			gov_queue_work(dbs_data, policy,
-				       usecs_to_jiffies(new_rate), true);
+			gov_cancel_work(shared);
+			gov_add_timers(policy, usecs_to_jiffies(new_rate));
 
 		}
 	}
-- 
2.6.2.198.g614a2ac



* [PATCH V2 6/6] cpufreq: ondemand: update update_sampling_rate() to make it more efficient
From: Viresh Kumar @ 2015-12-03  4:07 UTC (permalink / raw)
  To: Rafael Wysocki
  Cc: linaro-kernel, linux-pm, ashwin.chaugule, Viresh Kumar, open list

Currently update_sampling_rate() runs over each online CPU and
cancels/requeues timers on all policy->cpus every time, i.e. once per
CPU of the policy. This needs to be done only once per policy.

Create a cpumask of the online CPUs and clear all CPUs of a policy from
it as that policy is processed, so that we don't traverse the remaining
CPUs of an already-handled policy.
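
The iteration pattern in isolation (a sketch only; the real loop in
the diff below also performs the governor/dbs_data checks, and NULL
checks are elided):

	struct cpumask cpumask;
	int cpu;

	cpumask_copy(&cpumask, cpu_online_mask);

	for_each_cpu(cpu, &cpumask) {
		struct cpufreq_policy *policy =
			per_cpu(od_cpu_dbs_info, cpu).cdbs.shared->policy;

		/*
		 * Clear all CPUs of this policy, so that later iterations
		 * skip them and each policy is visited exactly once.
		 */
		cpumask_andnot(&cpumask, &cpumask, policy->cpus);

		/* ... cancel/requeue this policy's timers once ... */
	}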

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/cpufreq_ondemand.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
index f879012cf849..eae51070c034 100644
--- a/drivers/cpufreq/cpufreq_ondemand.c
+++ b/drivers/cpufreq/cpufreq_ondemand.c
@@ -246,6 +246,7 @@ static void update_sampling_rate(struct dbs_data *dbs_data,
 		unsigned int new_rate)
 {
 	struct od_dbs_tuners *od_tuners = dbs_data->tuners;
+	struct cpumask cpumask;
 	int cpu;
 
 	od_tuners->sampling_rate = new_rate = max(new_rate,
@@ -256,7 +257,9 @@ static void update_sampling_rate(struct dbs_data *dbs_data,
 	 */
 	mutex_lock(&od_dbs_cdata.mutex);
 
-	for_each_online_cpu(cpu) {
+	cpumask_copy(&cpumask, cpu_online_mask);
+
+	for_each_cpu(cpu, &cpumask) {
 		struct cpufreq_policy *policy;
 		struct od_cpu_dbs_info_s *dbs_info;
 		struct cpu_dbs_info *cdbs;
@@ -276,6 +279,9 @@ static void update_sampling_rate(struct dbs_data *dbs_data,
 
 		policy = shared->policy;
 
+		/* clear all CPUs of this policy */
+		cpumask_andnot(&cpumask, &cpumask, policy->cpus);
+
 		/*
 		 * Update sampling rate for CPUs whose policy is governed by
 		 * dbs_data. In case of governor_per_policy, only a single
@@ -285,6 +291,10 @@ static void update_sampling_rate(struct dbs_data *dbs_data,
 		if (dbs_data != policy->governor_data)
 			continue;
 
+		/*
+		 * Checking this for any CPU should be fine, timers for all of
+		 * them are scheduled together.
+		 */
 		next_sampling = jiffies + usecs_to_jiffies(new_rate);
 		appointed_at = dbs_info->cdbs.timer.expires;
 
-- 
2.6.2.198.g614a2ac



* Re: [PATCH V2 5/6] cpufreq: governor: replace per-cpu delayed work with timers
From: Rafael J. Wysocki @ 2015-12-04  1:18 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: linaro-kernel, linux-pm, ashwin.chaugule, Rafael J. Wysocki, open list

On Thursday, December 03, 2015 09:37:53 AM Viresh Kumar wrote:
> [...]
> 
> Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> Reviewed-by: Ashwin Chaugule <ashwin.chaugule@linaro.org>

I've tentatively queued this one up, but I still have a couple of questions.

> ---
>  drivers/cpufreq/cpufreq_governor.c | 139 +++++++++++++++++++++----------------
>  drivers/cpufreq/cpufreq_governor.h |  20 ++++--
>  drivers/cpufreq/cpufreq_ondemand.c |   8 +--
>  3 files changed, 98 insertions(+), 69 deletions(-)

[cut]

> @@ -250,14 +247,44 @@ static void dbs_timer(struct work_struct *work)
>  		sampling_rate = od_tuners->sampling_rate;
>  	}
>  
> -	if (!need_load_eval(cdbs->shared, sampling_rate))
> -		modify_all = false;
> +	eval_load = need_load_eval(shared, sampling_rate);
>  
> -	delay = dbs_data->cdata->gov_dbs_timer(policy, modify_all);
> -	gov_queue_work(dbs_data, policy, delay, modify_all);
> +	/*
> +	 * Make sure cpufreq_governor_limits() isn't evaluating load in
> +	 * parallel.
> +	 */
> +	mutex_lock(&shared->timer_mutex);
> +	delay = dbs_data->cdata->gov_dbs_timer(policy, eval_load);
> +	mutex_unlock(&shared->timer_mutex);
> +
> +	shared->skip_work--;

Is there any reason for incrementing and decrementing this instead of setting
it to either 0 or 1 (or maybe either 'true' or 'false' for that matter)?

If my reading of the patch is correct, it can only be either 0 or 1 anyway, right?

> +	gov_add_timers(policy, delay);
> +}
> +
> +static void dbs_timer_handler(unsigned long data)
> +{
> +	struct cpu_dbs_info *cdbs = (struct cpu_dbs_info *)data;
> +	struct cpu_common_dbs_info *shared = cdbs->shared;
> +	struct cpufreq_policy *policy;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&shared->timer_lock, flags);
> +	policy = shared->policy;

Why do we need policy here?

> +
> +	/*
> +	 * Timer handler isn't allowed to queue work at the moment, because:
> +	 * - Another timer handler has done that
> +	 * - We are stopping the governor
> +	 * - Or we are updating the sampling rate of ondemand governor
> +	 */
> +	if (shared->skip_work)
> +		goto unlock;
> +
> +	shared->skip_work++;
> +	queue_work(system_wq, &shared->work);
>  
>  unlock:

What about writing the above as

	if (!shared->work_in_progress) {
		shared->work_in_progress = true;
		queue_work(system_wq, &shared->work);
	}

and then you won't need the unlock label.

> -	mutex_unlock(&shared->timer_mutex);
> +	spin_unlock_irqrestore(&shared->timer_lock, flags);
>  }
>  
>  static void set_sampling_rate(struct dbs_data *dbs_data,

Thanks,
Rafael



* Re: [PATCH V2 5/6] cpufreq: governor: replace per-cpu delayed work with timers
From: Viresh Kumar @ 2015-12-04  6:11 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linaro-kernel, linux-pm, ashwin.chaugule, Rafael J. Wysocki, open list

On 04-12-15, 02:18, Rafael J. Wysocki wrote:
> > +	shared->skip_work--;
> 
> Is there any reason for incrementing and decrementing this instead of setting
> it to either 0 or 1 (or maybe either 'true' or 'false' for that matter)?
> 
> If my reading of the patch is correct, it can only be either 0 or 1 anyway, right?

No. It can be 0, 1 or 2.

If the timer handler is running on any CPU, we increment skip_work, so
its value is 1. If, at the same time, we try to stop the governor, we
increment it again and its value becomes 2.

Once the work handler (queued by the timer handler) finishes, it
decrements skip_work and its value becomes 1 again. That guarantees
that no other timer handler starts queuing work from this point on, so
we can safely do gov_cancel_timers(). And once we are sure that no
work/timer is left, we reset it to 0 (we can't rely on the current
value, which may be 0 if no handler was running when we stopped the
governor, or 1 if one was).

Hope this clarifies it.
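
To make the lifecycle concrete, here is a minimal userspace model of the
counter (a hypothetical sketch: a pthread mutex stands in for the
spinlock and plain calls stand in for the timer/work machinery; the
decrement is done under the lock here just for uniformity, which is not
exactly what the patch does):

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t timer_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned int skip_work;		/* 0, 1 or 2, as described above */

/* Timer handler: queues the work only when skip_work is 0 (0 -> 1). */
static int timer_fires(void)
{
	int queued = 0;

	pthread_mutex_lock(&timer_lock);
	if (!skip_work) {
		skip_work++;		/* now 1 */
		queued = 1;		/* stands in for queue_work() */
	}
	pthread_mutex_unlock(&timer_lock);
	return queued;
}

/* Work handler: done with this sample (1 -> 0, or 2 -> 1 during stop). */
static void work_runs(void)
{
	pthread_mutex_lock(&timer_lock);
	skip_work--;
	pthread_mutex_unlock(&timer_lock);
	/* gov_add_timers() would re-arm the per-cpu timers here */
}

/* Governor stop: 0 -> 1, or 1 -> 2 if the work is still in flight. */
static void governor_stop(void)
{
	pthread_mutex_lock(&timer_lock);
	skip_work++;
	pthread_mutex_unlock(&timer_lock);
	/* cancel_work_sync() and del_timer_sync() would go here; after
	 * they return nothing can be pending, so reset to a known 0. */
	skip_work = 0;
}

int main(void)
{
	if (timer_fires())
		work_runs();
	governor_stop();
	printf("skip_work = %u\n", skip_work);	/* always 0 here */
	return 0;
}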

> > +static void dbs_timer_handler(unsigned long data)
> > +{
> > +	struct cpu_dbs_info *cdbs = (struct cpu_dbs_info *)data;
> > +	struct cpu_common_dbs_info *shared = cdbs->shared;
> > +	struct cpufreq_policy *policy;
> > +	unsigned long flags;
> > +
> > +	spin_lock_irqsave(&shared->timer_lock, flags);
> > +	policy = shared->policy;
> 
> Why do we need policy here?
> 
> > +
> > +	/*
> > +	 * Timer handler isn't allowed to queue work at the moment, because:
> > +	 * - Another timer handler has done that
> > +	 * - We are stopping the governor
> > +	 * - Or we are updating the sampling rate of ondemand governor
> > +	 */
> > +	if (shared->skip_work)
> > +		goto unlock;
> > +
> > +	shared->skip_work++;
> > +	queue_work(system_wq, &shared->work);
> >  
> >  unlock:
> 
> What about writing the above as
> 
> 	if (!shared->work_in_progress) {
> 		shared->work_in_progress = true;
> 		queue_work(system_wq, &shared->work);
> 	}
> 
> and then you won't need the unlock label.

Here is a diff for that:

diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index a3f9bc9b98e9..c9e420bd0eec 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -265,11 +265,9 @@ static void dbs_timer_handler(unsigned long data)
 {
        struct cpu_dbs_info *cdbs = (struct cpu_dbs_info *)data;
        struct cpu_common_dbs_info *shared = cdbs->shared;
-       struct cpufreq_policy *policy;
        unsigned long flags;
 
        spin_lock_irqsave(&shared->timer_lock, flags);
-       policy = shared->policy;
 
        /*
         * Timer handler isn't allowed to queue work at the moment, because:
@@ -277,13 +275,11 @@ static void dbs_timer_handler(unsigned long data)
         * - We are stopping the governor
         * - Or we are updating the sampling rate of ondemand governor
         */
-       if (shared->skip_work)
-               goto unlock;
-
-       shared->skip_work++;
-       queue_work(system_wq, &shared->work);
+       if (!shared->skip_work) {
+               shared->skip_work++;
+               queue_work(system_wq, &shared->work);
+       }
 
-unlock:
        spin_unlock_irqrestore(&shared->timer_lock, flags);
 }

I will resend this patch now.

-- 
viresh

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH V3 5/6] cpufreq: governor: replace per-cpu delayed work with timers
  2015-12-03  4:07   ` Viresh Kumar
@ 2015-12-04  6:13     ` Viresh Kumar
  -1 siblings, 0 replies; 44+ messages in thread
From: Viresh Kumar @ 2015-12-04  6:13 UTC (permalink / raw)
  To: Rafael Wysocki
  Cc: linaro-kernel, linux-pm, Viresh Kumar, Rafael J. Wysocki,
	Ashwin Chaugule, open list

cpufreq governors evaluate load at the sampling rate and, based on that,
update the frequency for a group of CPUs belonging to the same cpufreq
policy.

This is required to be done in a single thread for all policy->cpus, but
because we don't want to wake up idle CPUs just for that, we use
deferrable work for this. Had we used a single delayed deferrable work
for the entire policy, there was a chance that the CPU required to run
the handler would be idle, and we might have ended up not changing the
frequency for the entire group despite load variations.

And so we were forced to keep per-cpu works, where only the one that
expires first needs to do the real work and the others are rescheduled
for the next sampling time.

We have been using the more complex solution until now, where we used a
delayed deferrable work for this, which is a combination of a timer and
a work.

This could be made lightweight by keeping per-cpu deferrable timers with a
single work item, which is scheduled by the first timer that expires.

This patch does just that, and here are the important changes:
- The timer handler will run in irq context and so we need to use a
  spinlock instead of the timer_mutex. And so a separate timer_lock is
  created. This also makes the use of the mutex and the lock quite clear,
  as we know exactly what they are protecting.
- A new field 'skip_work' is added to track when the timer handlers are
  allowed to queue the work. More comments are present in the code.

Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Ashwin Chaugule <ashwin.chaugule@linaro.org>
---
V2->V3:
- Dropped unused variable policy
- Rearranged code to kill an extra label

 drivers/cpufreq/cpufreq_governor.c | 137 +++++++++++++++++++++----------------
 drivers/cpufreq/cpufreq_governor.h |  20 ++++--
 drivers/cpufreq/cpufreq_ondemand.c |   8 +--
 3 files changed, 95 insertions(+), 70 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index 999e1f6addf9..c9e420bd0eec 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -158,47 +158,52 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu)
 }
 EXPORT_SYMBOL_GPL(dbs_check_cpu);
 
-static inline void __gov_queue_work(int cpu, struct dbs_data *dbs_data,
-		unsigned int delay)
+void gov_add_timers(struct cpufreq_policy *policy, unsigned int delay)
 {
-	struct cpu_dbs_info *cdbs = dbs_data->cdata->get_cpu_cdbs(cpu);
-
-	mod_delayed_work_on(cpu, system_wq, &cdbs->dwork, delay);
-}
-
-void gov_queue_work(struct dbs_data *dbs_data, struct cpufreq_policy *policy,
-		unsigned int delay, bool all_cpus)
-{
-	int i;
+	struct dbs_data *dbs_data = policy->governor_data;
+	struct cpu_dbs_info *cdbs;
+	int cpu;
 
-	if (!all_cpus) {
-		/*
-		 * Use raw_smp_processor_id() to avoid preemptible warnings.
-		 * We know that this is only called with all_cpus == false from
-		 * works that have been queued with *_work_on() functions and
-		 * those works are canceled during CPU_DOWN_PREPARE so they
-		 * can't possibly run on any other CPU.
-		 */
-		__gov_queue_work(raw_smp_processor_id(), dbs_data, delay);
-	} else {
-		for_each_cpu(i, policy->cpus)
-			__gov_queue_work(i, dbs_data, delay);
+	for_each_cpu(cpu, policy->cpus) {
+		cdbs = dbs_data->cdata->get_cpu_cdbs(cpu);
+		cdbs->timer.expires = jiffies + delay;
+		add_timer_on(&cdbs->timer, cpu);
 	}
 }
-EXPORT_SYMBOL_GPL(gov_queue_work);
+EXPORT_SYMBOL_GPL(gov_add_timers);
 
-static inline void gov_cancel_work(struct dbs_data *dbs_data,
-		struct cpufreq_policy *policy)
+static inline void gov_cancel_timers(struct cpufreq_policy *policy)
 {
+	struct dbs_data *dbs_data = policy->governor_data;
 	struct cpu_dbs_info *cdbs;
 	int i;
 
 	for_each_cpu(i, policy->cpus) {
 		cdbs = dbs_data->cdata->get_cpu_cdbs(i);
-		cancel_delayed_work_sync(&cdbs->dwork);
+		del_timer_sync(&cdbs->timer);
 	}
 }
 
+void gov_cancel_work(struct cpu_common_dbs_info *shared)
+{
+	unsigned long flags;
+
+	/*
+	 * No work will be queued from timer handlers after skip_work is
+	 * updated. And so we can safely cancel the work first and then the
+	 * timers.
+	 */
+	spin_lock_irqsave(&shared->timer_lock, flags);
+	shared->skip_work++;
+	spin_unlock_irqrestore(&shared->timer_lock, flags);
+
+	cancel_work_sync(&shared->work);
+
+	gov_cancel_timers(shared->policy);
+
+	shared->skip_work = 0;
+}
+
 /* Will return if we need to evaluate cpu load again or not */
 static bool need_load_eval(struct cpu_common_dbs_info *shared,
 			   unsigned int sampling_rate)
@@ -217,29 +222,21 @@ static bool need_load_eval(struct cpu_common_dbs_info *shared,
 	return true;
 }
 
-static void dbs_timer(struct work_struct *work)
+static void dbs_work_handler(struct work_struct *work)
 {
-	struct cpu_dbs_info *cdbs = container_of(work, struct cpu_dbs_info,
-						 dwork.work);
-	struct cpu_common_dbs_info *shared = cdbs->shared;
+	struct cpu_common_dbs_info *shared = container_of(work, struct
+					cpu_common_dbs_info, work);
 	struct cpufreq_policy *policy;
 	struct dbs_data *dbs_data;
 	unsigned int sampling_rate, delay;
-	bool modify_all = true;
-
-	mutex_lock(&shared->timer_mutex);
+	bool eval_load;
 
 	policy = shared->policy;
-
-	/*
-	 * Governor might already be disabled and there is no point continuing
-	 * with the work-handler.
-	 */
-	if (!policy)
-		goto unlock;
-
 	dbs_data = policy->governor_data;
 
+	/* Kill all timers */
+	gov_cancel_timers(policy);
+
 	if (dbs_data->cdata->governor == GOV_CONSERVATIVE) {
 		struct cs_dbs_tuners *cs_tuners = dbs_data->tuners;
 
@@ -250,14 +247,40 @@ static void dbs_timer(struct work_struct *work)
 		sampling_rate = od_tuners->sampling_rate;
 	}
 
-	if (!need_load_eval(cdbs->shared, sampling_rate))
-		modify_all = false;
-
-	delay = dbs_data->cdata->gov_dbs_timer(policy, modify_all);
-	gov_queue_work(dbs_data, policy, delay, modify_all);
+	eval_load = need_load_eval(shared, sampling_rate);
 
-unlock:
+	/*
+	 * Make sure cpufreq_governor_limits() isn't evaluating load in
+	 * parallel.
+	 */
+	mutex_lock(&shared->timer_mutex);
+	delay = dbs_data->cdata->gov_dbs_timer(policy, eval_load);
 	mutex_unlock(&shared->timer_mutex);
+
+	shared->skip_work--;
+	gov_add_timers(policy, delay);
+}
+
+static void dbs_timer_handler(unsigned long data)
+{
+	struct cpu_dbs_info *cdbs = (struct cpu_dbs_info *)data;
+	struct cpu_common_dbs_info *shared = cdbs->shared;
+	unsigned long flags;
+
+	spin_lock_irqsave(&shared->timer_lock, flags);
+
+	/*
+	 * Timer handler isn't allowed to queue work at the moment, because:
+	 * - Another timer handler has done that
+	 * - We are stopping the governor
+	 * - Or we are updating the sampling rate of ondemand governor
+	 */
+	if (!shared->skip_work) {
+		shared->skip_work++;
+		queue_work(system_wq, &shared->work);
+	}
+
+	spin_unlock_irqrestore(&shared->timer_lock, flags);
 }
 
 static void set_sampling_rate(struct dbs_data *dbs_data,
@@ -288,6 +311,8 @@ static int alloc_common_dbs_info(struct cpufreq_policy *policy,
 		cdata->get_cpu_cdbs(j)->shared = shared;
 
 	mutex_init(&shared->timer_mutex);
+	spin_lock_init(&shared->timer_lock);
+	INIT_WORK(&shared->work, dbs_work_handler);
 	return 0;
 }
 
@@ -452,7 +477,9 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy,
 		if (ignore_nice)
 			j_cdbs->prev_cpu_nice = kcpustat_cpu(j).cpustat[CPUTIME_NICE];
 
-		INIT_DEFERRABLE_WORK(&j_cdbs->dwork, dbs_timer);
+		__setup_timer(&j_cdbs->timer, dbs_timer_handler,
+			      (unsigned long)j_cdbs,
+			      TIMER_DEFERRABLE | TIMER_IRQSAFE);
 	}
 
 	if (cdata->governor == GOV_CONSERVATIVE) {
@@ -470,8 +497,7 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy,
 		od_ops->powersave_bias_init_cpu(cpu);
 	}
 
-	gov_queue_work(dbs_data, policy, delay_for_sampling_rate(sampling_rate),
-		       true);
+	gov_add_timers(policy, delay_for_sampling_rate(sampling_rate));
 	return 0;
 }
 
@@ -485,16 +511,9 @@ static int cpufreq_governor_stop(struct cpufreq_policy *policy,
 	if (!shared || !shared->policy)
 		return -EBUSY;
 
-	/*
-	 * Work-handler must see this updated, as it should not proceed any
-	 * further after governor is disabled. And so timer_mutex is taken while
-	 * updating this value.
-	 */
-	mutex_lock(&shared->timer_mutex);
+	gov_cancel_work(shared);
 	shared->policy = NULL;
-	mutex_unlock(&shared->timer_mutex);
 
-	gov_cancel_work(dbs_data, policy);
 	return 0;
 }
 
diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h
index 0c7589016b6c..76742902491e 100644
--- a/drivers/cpufreq/cpufreq_governor.h
+++ b/drivers/cpufreq/cpufreq_governor.h
@@ -132,12 +132,20 @@ static void *get_cpu_dbs_info_s(int cpu)				\
 struct cpu_common_dbs_info {
 	struct cpufreq_policy *policy;
 	/*
-	 * percpu mutex that serializes governor limit change with dbs_timer
-	 * invocation. We do not want dbs_timer to run when user is changing
-	 * the governor or limits.
+	 * Per policy mutex that serializes load evaluation from limit-change
+	 * and work-handler.
 	 */
 	struct mutex timer_mutex;
+
+	/*
+	 * Per policy lock that serializes access to queuing work from timer
+	 * handlers.
+	 */
+	spinlock_t timer_lock;
+
 	ktime_t time_stamp;
+	unsigned int skip_work;
+	struct work_struct work;
 };
 
 /* Per cpu structures */
@@ -152,7 +160,7 @@ struct cpu_dbs_info {
 	 * wake-up from idle.
 	 */
 	unsigned int prev_load;
-	struct delayed_work dwork;
+	struct timer_list timer;
 	struct cpu_common_dbs_info *shared;
 };
 
@@ -268,11 +276,11 @@ static ssize_t show_sampling_rate_min_gov_pol				\
 
 extern struct mutex cpufreq_governor_lock;
 
+void gov_add_timers(struct cpufreq_policy *policy, unsigned int delay);
+void gov_cancel_work(struct cpu_common_dbs_info *shared);
 void dbs_check_cpu(struct dbs_data *dbs_data, int cpu);
 int cpufreq_governor_dbs(struct cpufreq_policy *policy,
 		struct common_dbs_data *cdata, unsigned int event);
-void gov_queue_work(struct dbs_data *dbs_data, struct cpufreq_policy *policy,
-		unsigned int delay, bool all_cpus);
 void od_register_powersave_bias_handler(unsigned int (*f)
 		(struct cpufreq_policy *, unsigned int, unsigned int),
 		unsigned int powersave_bias);
diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
index fc0384b4d02d..f879012cf849 100644
--- a/drivers/cpufreq/cpufreq_ondemand.c
+++ b/drivers/cpufreq/cpufreq_ondemand.c
@@ -286,13 +286,11 @@ static void update_sampling_rate(struct dbs_data *dbs_data,
 			continue;
 
 		next_sampling = jiffies + usecs_to_jiffies(new_rate);
-		appointed_at = dbs_info->cdbs.dwork.timer.expires;
+		appointed_at = dbs_info->cdbs.timer.expires;
 
 		if (time_before(next_sampling, appointed_at)) {
-			cancel_delayed_work_sync(&dbs_info->cdbs.dwork);
-
-			gov_queue_work(dbs_data, policy,
-				       usecs_to_jiffies(new_rate), true);
+			gov_cancel_work(shared);
+			gov_add_timers(policy, usecs_to_jiffies(new_rate));
 
 		}
 	}
-- 
2.6.2.198.g614a2ac


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH V2 5/6] cpufreq: governor: replace per-cpu delayed work with timers
  2015-12-04  6:11     ` Viresh Kumar
@ 2015-12-05  2:14       ` Rafael J. Wysocki
  2015-12-05  4:10         ` Viresh Kumar
  0 siblings, 1 reply; 44+ messages in thread
From: Rafael J. Wysocki @ 2015-12-05  2:14 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: linaro-kernel, linux-pm, ashwin.chaugule, Rafael J. Wysocki, open list

On Friday, December 04, 2015 11:41:01 AM Viresh Kumar wrote:
> On 04-12-15, 02:18, Rafael J. Wysocki wrote:
> > > +	shared->skip_work--;
> > 
> > Is there any reason for incrementing and decrementing this instead of setting
> > it to either 0 or 1 (or maybe either 'true' or 'false' for that matter)?
> > 
> > If my reading of the patch is correct, it can only be either 0 or 1 anyway, right?
> 
> No. It can be 0, 1 or 2.
> 
> If the timer handler is running on any CPU, we increment skip_work, so
> its value is 1. If at the same time, we try to stop the governor, we
> increment it again and its value is 2 now.
> 
> Once the timer handler finishes, it decrements it and its value becomes 1.
> That guarantees that no other timer handler starts executing at this
> point and we can safely do gov_cancel_timers(). And once we are sure
> that we don't have any work/timer left, we set it back to 0 (as we
> aren't sure of the current value, which can be 0 (if the timer handler
> wasn't running when we stopped the governor) or 1 (if the timer
> handler was running while stopping the governor)).
> 
> Hope this clarifies it.

Well, almost, but not quite yet, because now the question is what prevents
gov_cancel_work() from racing with dbs_work_handler().

If you can guarantee that they'll never run in parallel with each other,
you probably don't need the whole counter dance.  Otherwise, dbs_work_handler()
should decrement the counter under the spinlock after all I suppose.

Thanks,
Rafael


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH V2 5/6] cpufreq: governor: replace per-cpu delayed work with timers
  2015-12-05  2:14       ` Rafael J. Wysocki
@ 2015-12-05  4:10         ` Viresh Kumar
  2015-12-07  1:28           ` Rafael J. Wysocki
  0 siblings, 1 reply; 44+ messages in thread
From: Viresh Kumar @ 2015-12-05  4:10 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linaro-kernel, linux-pm, ashwin.chaugule, Rafael J. Wysocki, open list

On 05-12-15, 03:14, Rafael J. Wysocki wrote:
> Well, almost, but not quite yet, because now the question is what prevents
> gov_cancel_work() from racing with dbs_work_handler().
> 
> If you can guarantee that they'll never run in parallel with each other,

They can run in parallel and that's how we fix it now:
- Raising skip_work to 2 makes sure that no timer handler can
  queue new work.
- After raising the value of skip_work to 2, we do cancel_work_sync(),
  which makes sure that the work handler has finished by the time
  cancel_work_sync() returns.
- At this point we are sure that the works and their handlers
  are completely killed.
- All that is left is to kill the timer handlers (which might have
  gotten queued from the work handler, before it finished).
- And we do that with gov_cancel_timers().
- And then we are in a safe state, where we are guaranteed that there
  are no leftovers (an annotated sketch of this sequence follows below).
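
For reference, here is the gov_cancel_work() from the patch again, with
step-by-step comments added to match the list above (the comments are
annotations, not part of the patch):

void gov_cancel_work(struct cpu_common_dbs_info *shared)
{
	unsigned long flags;

	/* 1. Raise skip_work so that no timer handler can queue the work
	 *    from here on (it becomes 2 if the work handler was running
	 *    and had already raised it to 1). */
	spin_lock_irqsave(&shared->timer_lock, flags);
	shared->skip_work++;
	spin_unlock_irqrestore(&shared->timer_lock, flags);

	/* 2. Wait for an already queued/running work handler; it may
	 *    still re-arm the timers before it finishes, but it cannot
	 *    be queued again. */
	cancel_work_sync(&shared->work);

	/* 3. Kill the timers; their handlers see skip_work != 0 and so
	 *    cannot queue the work either. */
	gov_cancel_timers(shared->policy);

	/* 4. Everything is quiescent now; reset to a known value. */
	shared->skip_work = 0;
}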

> you probably don't need the whole counter dance.  Otherwise, dbs_work_handler()
> should decrement the counter under the spinlock after all I suppose.

It's not required because we don't have any race around that decrement
operation.

-- 
viresh

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH V2 5/6] cpufreq: governor: replace per-cpu delayed work with timers
  2015-12-05  4:10         ` Viresh Kumar
@ 2015-12-07  1:28           ` Rafael J. Wysocki
  2015-12-07  7:50             ` Viresh Kumar
  0 siblings, 1 reply; 44+ messages in thread
From: Rafael J. Wysocki @ 2015-12-07  1:28 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: linaro-kernel, linux-pm, ashwin.chaugule, Rafael J. Wysocki, open list

On Saturday, December 05, 2015 09:40:42 AM Viresh Kumar wrote:
> On 05-12-15, 03:14, Rafael J. Wysocki wrote:
> > Well, almost, but not quite yet, because now the question is what prevents
> > gov_cancel_work() from racing with dbs_work_handler().
> > 
> > If you can guarantee that they'll never run in parallel with each other,
> 
> They can run in parallel and that's how we fix it now:
> - Raising skip_work to 2 makes sure that no timer handler can
>   queue new work.

What if that happens in parallel with the decrementation in
dbs_work_handler()?

Is there anything preventing that from happening?

> - After raising the value of skip_work to 2, we do cancel_work_sync(),
>   which makes sure that the work handler has finished by the time
>   cancel_work_sync() returns.
> - At this point we are sure that the works and their handlers
>   are completely killed.
> - All that is left is to kill the timer handlers (which might have
>   gotten queued from the work handler, before it finished).
> - And we do that with gov_cancel_timers().
> - And then we are in a safe state, where we are guaranteed that there
>   are no leftovers.

Yes, that part will work.

> > you probably don't need the whole counter dance.  Otherwise, dbs_work_handler()
> > should decrement the counter under the spinlock after all I suppose.
> 
> It's not required because we don't have any race around that decrement
> operation.

As I said, if you can guarantee that the decrementation of the counter in
dbs_work_handler() cannot happen at the same time as the incrementation of
it in gov_cancel_work(), all is fine, but can you actually guarantee that?

That aside, I think you could avoid using the spinlock altogether if the
counter was atomic (which would make the above irrelevant too).

Say skip_work is atomic and the relevant code in dbs_timer_handler() is
written as

	atomic_inc(&shared->skip_work);
	smp_mb__after_atomic();
	if (atomic_read(&shared->skip_work) > 1)
		atomic_dec(&shared->skip_work);
	else
		queue_work(system_wq, &shared->work);

and the remaining incrementation and decrementation of skip_work are replaced
with the corresponding atomic operations, it still should work, no?

Thanks,
Rafael


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH V2 5/6] cpufreq: governor: replace per-cpu delayed work with timers
  2015-12-07  1:28           ` Rafael J. Wysocki
@ 2015-12-07  7:50             ` Viresh Kumar
  2015-12-07 22:43               ` Rafael J. Wysocki
  0 siblings, 1 reply; 44+ messages in thread
From: Viresh Kumar @ 2015-12-07  7:50 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linaro-kernel, linux-pm, ashwin.chaugule, Rafael J. Wysocki, open list

On 07-12-15, 02:28, Rafael J. Wysocki wrote:
> What if that happens in parallel with the decrementation in
> dbs_work_handler()?
> 
> Is there anything preventing that from happening?

Hmmm, you are right. The following is required for that.

diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index c9e420bd0eec..d8a89e653933 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -230,6 +230,7 @@ static void dbs_work_handler(struct work_struct *work)
        struct dbs_data *dbs_data;
        unsigned int sampling_rate, delay;
        bool eval_load;
+       unsigned long flags;
 
        policy = shared->policy;
        dbs_data = policy->governor_data;
@@ -257,7 +258,10 @@ static void dbs_work_handler(struct work_struct *work)
        delay = dbs_data->cdata->gov_dbs_timer(policy, eval_load);
        mutex_unlock(&shared->timer_mutex);
 
+       spin_lock_irqsave(&shared->timer_lock, flags);
        shared->skip_work--;
+       spin_unlock_irqrestore(&shared->timer_lock, flags);
+
        gov_add_timers(policy, delay);
 }

> That aside, I think you could avoid using the spinlock altogether if the
> counter was atomic (which would make the above irrelevant too).
> 
> Say skip_work is atomic and the relevant code in dbs_timer_handler() is
> written as
> 
> 	atomic_inc(&shared->skip_work);
> 	smp_mb__after_atomic();
> 	if (atomic_read(&shared->skip_work) > 1)
> 		atomic_dec(&shared->skip_work);
> 	else

At this point we might end up decrementing skip_work from
gov_cancel_work() and then cancelling the work which we haven't queued
yet. And the end result will be that the work is still queued while
gov_cancel_work() has finished.

And we have to keep the atomic operation, as well as queue_work()
within the lock.

> 		queue_work(system_wq, &shared->work);
> 
> and the remaining incrementation and decrementation of skip_work are replaced
> with the corresponding atomic operations, it still should work, no?

-- 
viresh

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH V2 5/6] cpufreq: governor: replace per-cpu delayed work with timers
  2015-12-07  7:50             ` Viresh Kumar
@ 2015-12-07 22:43               ` Rafael J. Wysocki
  2015-12-07 23:17                 ` Rafael J. Wysocki
  2015-12-08  6:56                 ` Viresh Kumar
  0 siblings, 2 replies; 44+ messages in thread
From: Rafael J. Wysocki @ 2015-12-07 22:43 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: linaro-kernel, linux-pm, ashwin.chaugule, Rafael J. Wysocki, open list

On Monday, December 07, 2015 01:20:27 PM Viresh Kumar wrote:
> On 07-12-15, 02:28, Rafael J. Wysocki wrote:
> > What if that happens in parallel with the decrementation in
> > dbs_work_handler()?
> > 
> > Is there anything preventing that from happening?
> 
> Hmmm, you are right. The following is required for that.
> 
> diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
> index c9e420bd0eec..d8a89e653933 100644
> --- a/drivers/cpufreq/cpufreq_governor.c
> +++ b/drivers/cpufreq/cpufreq_governor.c
> @@ -230,6 +230,7 @@ static void dbs_work_handler(struct work_struct *work)
>         struct dbs_data *dbs_data;
>         unsigned int sampling_rate, delay;
>         bool eval_load;
> +       unsigned long flags;
>  
>         policy = shared->policy;
>         dbs_data = policy->governor_data;
> @@ -257,7 +258,10 @@ static void dbs_work_handler(struct work_struct *work)
>         delay = dbs_data->cdata->gov_dbs_timer(policy, eval_load);
>         mutex_unlock(&shared->timer_mutex);
>  
> +       spin_lock_irqsave(&shared->timer_lock, flags);
>         shared->skip_work--;
> +       spin_unlock_irqrestore(&shared->timer_lock, flags);
> +
>         gov_add_timers(policy, delay);
>  }

OK, so can you please send an updated patch with the above change folded in?

> > That aside, I think you could avoid using the spinlock altogether if the
> > counter was atomic (which would make the above irrelevant too).
> > 
> > Say skip_work is atomic and the relevant code in dbs_timer_handler() is
> > written as
> > 
> > 	atomic_inc(&shared->skip_work);
> > 	smp_mb__after_atomic();
> > 	if (atomic_read(&shared->skip_work) > 1)
> > 		atomic_dec(&shared->skip_work);
> > 	else
> 
> At this point we might end up decrementing skip_work from
> > gov_cancel_work() and then cancelling the work which we haven't queued
> yet. And the end result will be that the work is still queued while
> gov_cancel_work() has finished.

I'm not quite sure how that can happen.

There is a bug in this code snippet, but it may cause us to fail to queue
the work at all, so the incrementation and the check need to be done
under the spinlock.

> And we have to keep the atomic operation, as well as queue_work()
> within the lock.

Putting queue_work() under the lock doesn't prevent any races from happening,
because only one of the CPUs can execute that part of the function anyway.

> > 		queue_work(system_wq, &shared->work);
> > 
> > and the remaining incrementation and decrementation of skip_work are replaced
> > with the corresponding atomic operations, it still should work, no?

Well, no, the above wouldn't work.

But what about something like this instead:

	if (atomic_inc_return(&shared->skip_work) > 1)
		atomic_dec(&shared->skip_work);
	else
		queue_work(system_wq, &shared->work);

(plus the requisite replacements in the other places)?

Only one CPU can see the result of the atomic_inc_return() as 1 and this is the
only one that will queue up the work item, unless I'm missing anything super
subtle.
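
For what it's worth, that particular property is easy to check in
userspace; below is a hypothetical sketch with C11 atomics standing in
for atomic_t (atomic_fetch_add() returns the old value, so old == 0
corresponds to atomic_inc_return() == 1):

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static atomic_uint skip_work;
static atomic_uint queued;	/* how many threads "queued the work" */

static void *timer_fires(void *arg)
{
	(void)arg;
	/* old value 0 here corresponds to atomic_inc_return() == 1 */
	if (atomic_fetch_add(&skip_work, 1) > 0)
		atomic_fetch_sub(&skip_work, 1);
	else
		atomic_fetch_add(&queued, 1);	/* queue_work() stand-in */
	return NULL;
}

int main(void)
{
	pthread_t t[4];
	int i;

	for (i = 0; i < 4; i++)
		pthread_create(&t[i], NULL, timer_fires, NULL);
	for (i = 0; i < 4; i++)
		pthread_join(t[i], NULL);

	/* prints 1 regardless of how the threads interleave */
	printf("queued = %u\n", atomic_load(&queued));
	return 0;
}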

Thanks,
Rafael


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH V2 5/6] cpufreq: governor: replace per-cpu delayed work with timers
  2015-12-07 22:43               ` Rafael J. Wysocki
@ 2015-12-07 23:17                 ` Rafael J. Wysocki
  2015-12-08  0:39                   ` [PATCH][experimental] cpufreq: governor: Use an atomic variable for synchronization Rafael J. Wysocki
  2015-12-08  6:46                   ` [PATCH V2 5/6] cpufreq: governor: replace per-cpu delayed work with timers Viresh Kumar
  2015-12-08  6:56                 ` Viresh Kumar
  1 sibling, 2 replies; 44+ messages in thread
From: Rafael J. Wysocki @ 2015-12-07 23:17 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: linaro-kernel, linux-pm, ashwin.chaugule, Rafael J. Wysocki, LKML

On Monday, December 07, 2015 11:43:50 PM Rafael J. Wysocki wrote:
> On Monday, December 07, 2015 01:20:27 PM Viresh Kumar wrote:
> > On 07-12-15, 02:28, Rafael J. Wysocki wrote:
> > > What if that happens in parallel with the decrementation in
> > > dbs_work_handler()?
> > > 
> > > Is there anything preventing that from happening?
> > 
> > Hmmm, you are right. The following is required for that.
> > 
> > diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
> > index c9e420bd0eec..d8a89e653933 100644
> > --- a/drivers/cpufreq/cpufreq_governor.c
> > +++ b/drivers/cpufreq/cpufreq_governor.c
> > @@ -230,6 +230,7 @@ static void dbs_work_handler(struct work_struct *work)
> >         struct dbs_data *dbs_data;
> >         unsigned int sampling_rate, delay;
> >         bool eval_load;
> > +       unsigned long flags;
> >  
> >         policy = shared->policy;
> >         dbs_data = policy->governor_data;
> > @@ -257,7 +258,10 @@ static void dbs_work_handler(struct work_struct *work)
> >         delay = dbs_data->cdata->gov_dbs_timer(policy, eval_load);
> >         mutex_unlock(&shared->timer_mutex);
> >  
> > +       spin_lock_irqsave(&shared->timer_lock, flags);
> >         shared->skip_work--;
> > +       spin_unlock_irqrestore(&shared->timer_lock, flags);
> > +
> >         gov_add_timers(policy, delay);
> >  }
> 
> OK, so can you please send an updated patch with the above change folded in?

In fact, I've already folded the above changes into the $subject patch (but this
is an exception).

I'll send the "atomic" changes separately.

Thanks,
Rafael


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH][experimental] cpufreq: governor: Use an atomic variable for synchronization
  2015-12-07 23:17                 ` Rafael J. Wysocki
@ 2015-12-08  0:39                   ` Rafael J. Wysocki
  2015-12-08  6:59                     ` Viresh Kumar
  2015-12-08  6:46                   ` [PATCH V2 5/6] cpufreq: governor: replace per-cpu delayed work with timers Viresh Kumar
  1 sibling, 1 reply; 44+ messages in thread
From: Rafael J. Wysocki @ 2015-12-08  0:39 UTC (permalink / raw)
  To: Viresh Kumar, linux-pm
  Cc: linaro-kernel, ashwin.chaugule, Rafael J. Wysocki, LKML

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Use the observation that if skip_work in struct cpu_common_dbs_info
is an atomic_t variable, the code may be rearranged to avoid using
the timer_lock spinlock, in which case that lock is not necessary
any more.

Make that change and drop timer_lock.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---

This is on top of my current linux-next branch.  Completely experimental and
untested.

---
 drivers/cpufreq/cpufreq_governor.c |   29 +++++++----------------------
 drivers/cpufreq/cpufreq_governor.h |    9 ++-------
 2 files changed, 9 insertions(+), 29 deletions(-)

Index: linux-pm/drivers/cpufreq/cpufreq_governor.c
===================================================================
--- linux-pm.orig/drivers/cpufreq/cpufreq_governor.c
+++ linux-pm/drivers/cpufreq/cpufreq_governor.c
@@ -186,22 +186,15 @@ static inline void gov_cancel_timers(str
 
 void gov_cancel_work(struct cpu_common_dbs_info *shared)
 {
-	unsigned long flags;
-
 	/*
 	 * No work will be queued from timer handlers after skip_work is
 	 * updated. And so we can safely cancel the work first and then the
 	 * timers.
 	 */
-	spin_lock_irqsave(&shared->timer_lock, flags);
-	shared->skip_work++;
-	spin_unlock_irqrestore(&shared->timer_lock, flags);
-
+	atomic_inc(&shared->skip_work);
 	cancel_work_sync(&shared->work);
-
 	gov_cancel_timers(shared->policy);
-
-	shared->skip_work = 0;
+	atomic_set(&shared->skip_work, 0);
 }
 
 /* Will return if we need to evaluate cpu load again or not */
@@ -229,7 +222,6 @@ static void dbs_work_handler(struct work
 	struct cpufreq_policy *policy;
 	struct dbs_data *dbs_data;
 	unsigned int sampling_rate, delay;
-	unsigned long flags;
 	bool eval_load;
 
 	policy = shared->policy;
@@ -258,9 +250,7 @@ static void dbs_work_handler(struct work
 	delay = dbs_data->cdata->gov_dbs_timer(policy, eval_load);
 	mutex_unlock(&shared->timer_mutex);
 
-	spin_lock_irqsave(&shared->timer_lock, flags);
-	shared->skip_work--;
-	spin_unlock_irqrestore(&shared->timer_lock, flags);
+	atomic_dec(&shared->skip_work);
 
 	gov_add_timers(policy, delay);
 }
@@ -269,9 +259,6 @@ static void dbs_timer_handler(unsigned l
 {
 	struct cpu_dbs_info *cdbs = (struct cpu_dbs_info *)data;
 	struct cpu_common_dbs_info *shared = cdbs->shared;
-	unsigned long flags;
-
-	spin_lock_irqsave(&shared->timer_lock, flags);
 
 	/*
 	 * Timer handler isn't allowed to queue work at the moment, because:
@@ -279,12 +266,10 @@ static void dbs_timer_handler(unsigned l
 	 * - We are stopping the governor
 	 * - Or we are updating the sampling rate of ondemand governor
 	 */
-	if (!shared->skip_work) {
-		shared->skip_work++;
+	if (atomic_inc_return(&shared->skip_work) > 1)
+		atomic_dec(&shared->skip_work);
+	else
 		queue_work(system_wq, &shared->work);
-	}
-
-	spin_unlock_irqrestore(&shared->timer_lock, flags);
 }
 
 static void set_sampling_rate(struct dbs_data *dbs_data,
@@ -315,7 +300,7 @@ static int alloc_common_dbs_info(struct
 		cdata->get_cpu_cdbs(j)->shared = shared;
 
 	mutex_init(&shared->timer_mutex);
-	spin_lock_init(&shared->timer_lock);
+	atomic_set(&shared->skip_work, 0);
 	INIT_WORK(&shared->work, dbs_work_handler);
 	return 0;
 }
Index: linux-pm/drivers/cpufreq/cpufreq_governor.h
===================================================================
--- linux-pm.orig/drivers/cpufreq/cpufreq_governor.h
+++ linux-pm/drivers/cpufreq/cpufreq_governor.h
@@ -17,6 +17,7 @@
 #ifndef _CPUFREQ_GOVERNOR_H
 #define _CPUFREQ_GOVERNOR_H
 
+#include <linux/atomic.h>
 #include <linux/cpufreq.h>
 #include <linux/kernel_stat.h>
 #include <linux/module.h>
@@ -137,14 +138,8 @@ struct cpu_common_dbs_info {
 	 */
 	struct mutex timer_mutex;
 
-	/*
-	 * Per policy lock that serializes access to queuing work from timer
-	 * handlers.
-	 */
-	spinlock_t timer_lock;
-
 	ktime_t time_stamp;
-	unsigned int skip_work;
+	atomic_t skip_work;
 	struct work_struct work;
 };
 


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH V2 5/6] cpufreq: governor: replace per-cpu delayed work with timers
  2015-12-07 23:17                 ` Rafael J. Wysocki
  2015-12-08  0:39                   ` [PATCH][experimental] cpufreq: governor: Use an atomic variable for synchronization Rafael J. Wysocki
@ 2015-12-08  6:46                   ` Viresh Kumar
  1 sibling, 0 replies; 44+ messages in thread
From: Viresh Kumar @ 2015-12-08  6:46 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linaro-kernel, linux-pm, ashwin.chaugule, Rafael J. Wysocki, LKML

On 08-12-15, 00:17, Rafael J. Wysocki wrote:
> In fact, I've already folded the above changes into the $subject patch (but this
> is an exception).

Yeah I know, I would have sent a patch this morning though.

-- 
viresh

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH V2 5/6] cpufreq: governor: replace per-cpu delayed work with timers
  2015-12-07 22:43               ` Rafael J. Wysocki
  2015-12-07 23:17                 ` Rafael J. Wysocki
@ 2015-12-08  6:56                 ` Viresh Kumar
  2015-12-08 13:18                   ` Rafael J. Wysocki
  1 sibling, 1 reply; 44+ messages in thread
From: Viresh Kumar @ 2015-12-08  6:56 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linaro-kernel, linux-pm, ashwin.chaugule, Rafael J. Wysocki, open list

On 07-12-15, 23:43, Rafael J. Wysocki wrote:
> On Monday, December 07, 2015 01:20:27 PM Viresh Kumar wrote:

> > At this point we might end up decrementing skip_work from
> > gov_cancel_work() and then cancelling the work which we haven't queued
> > yet. And the end result will be that the work is still queued while
> > gov_cancel_work() has finished.
> 
> I'm not quite sure how that can happen.

I will describe that towards the end of this email.

> There is a bug in this code snippet, but it may cause us to fail to queue
> the work at all, so the incrementation and the check need to be done
> under the spinlock.

What bug ?

> > And we have to keep the atomic operation, as well as queue_work()
> > within the lock.
> 
> Putting queue_work() under the lock doesn't prevent any races from happening,

Then I am not able to think about it properly, but I will at least
present my case here :)

> because only one of the CPUs can execute that part of the function anyway.
> 
> > > 		queue_work(system_wq, &shared->work);
> > > 
> > > and the remaining incrementation and decrementation of skip_work are replaced
> > > with the corresponding atomic operations, it still should work, no?
> 
> Well, no, the above wouldn't work.
> 
> But what about something like this instead:
> 
> 	if (atomic_inc_return(&shared->skip_work) > 1)
> 		atomic_dec(&shared->skip_work);
> 	else
> 		queue_work(system_wq, &shared->work);
> 
> (plus the requisite replacements in the other places)?
> 
> Only one CPU can see the result of the atomic_inc_return() as 1 and this is the
> only one that will queue up the work item, unless I'm missing anything super
> subtle.

Looks like you are talking about the race between different timer
handlers, which race against queuing the work. Sorry if you are not,
but I am not talking about that.

Suppose queue_work() isn't done within the spin lock.

CPU0                                            CPU1

cpufreq_governor_stop()                         dbs_timer_handler()
-> gov_cancel_work()                            -> lock
                                                -> shared->skip_work++, as skip_work was 0. //skip_work=1
                                                -> unlock
   -> lock
   -> shared->skip_work++; //skip_work=2
   -> unlock
   -> cancel_work_sync(&shared->work);
                                                -> queue_work();
   -> gov_cancel_timers(shared->policy);
   -> shared->skip_work = 0;
                                                dbs_work_handler();



And according to how I understand it, we are screwed up at this point.
And it's the same old bug which I fixed recently (which we hacked around by
using gov-lock earlier).

The work handler is still active after the policy-governor is stopped.

And your latest patch looks wrong for the same reason ...

-- 
viresh

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH][experimental] cpufreq: governor: Use an atomic variable for synchronization
  2015-12-08  0:39                   ` [PATCH][experimental] cpufreq: governor: Use an atomic variable for synchronization Rafael J. Wysocki
@ 2015-12-08  6:59                     ` Viresh Kumar
  2015-12-08 13:30                       ` Rafael J. Wysocki
  0 siblings, 1 reply; 44+ messages in thread
From: Viresh Kumar @ 2015-12-08  6:59 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-pm, linaro-kernel, ashwin.chaugule, Rafael J. Wysocki, LKML

On 08-12-15, 01:39, Rafael J. Wysocki wrote:
> @@ -269,9 +259,6 @@ static void dbs_timer_handler(unsigned l
>  {
>  	struct cpu_dbs_info *cdbs = (struct cpu_dbs_info *)data;
>  	struct cpu_common_dbs_info *shared = cdbs->shared;
> -	unsigned long flags;
> -
> -	spin_lock_irqsave(&shared->timer_lock, flags);
>  
>  	/*
>  	 * Timer handler isn't allowed to queue work at the moment, because:
> @@ -279,12 +266,10 @@ static void dbs_timer_handler(unsigned l
>  	 * - We are stopping the governor
>  	 * - Or we are updating the sampling rate of ondemand governor
>  	 */
> -	if (!shared->skip_work) {
> -		shared->skip_work++;
> +	if (atomic_inc_return(&shared->skip_work) > 1)
> +		atomic_dec(&shared->skip_work);
> +	else
>  		queue_work(system_wq, &shared->work);
> -	}

As explained in the other email, this is wrong..

-- 
viresh

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH V2 5/6] cpufreq: governor: replace per-cpu delayed work with timers
  2015-12-08  6:56                 ` Viresh Kumar
@ 2015-12-08 13:18                   ` Rafael J. Wysocki
  2015-12-08 13:30                     ` Viresh Kumar
  0 siblings, 1 reply; 44+ messages in thread
From: Rafael J. Wysocki @ 2015-12-08 13:18 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: linaro-kernel, linux-pm, ashwin.chaugule, Rafael J. Wysocki, open list

On Tuesday, December 08, 2015 12:26:22 PM Viresh Kumar wrote:
> On 07-12-15, 23:43, Rafael J. Wysocki wrote:
> > On Monday, December 07, 2015 01:20:27 PM Viresh Kumar wrote:
> 
> > > At this point we might end up decrementing skip_work from
> > > gov_cancel_work() and then cancelling the work which we haven't queued
> > > yet. And the end result will be that the work is still queued while
> > > gov_cancel_work() has finished.
> > 
> > I'm not quite sure how that can happen.
> 
> I will describe that towards the end of this email.
> 
> > There is a bug in this code snippet, but it may cause us to fail to queue
> > the work at all, so the incrementation and the check need to be done
> > under the spinlock.
> 
> What bug ?

Well, if the timer function runs on all CPUs at the same time, they all
can see skip_work > 1 and none of them will queue the work.
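
Just to illustrate: a hypothetical userspace sketch of that lost-queue
window, with C11 atomics standing in for atomic_t; with the
inc-then-test-then-dec pattern, two racing callers can both observe 2
and both back off, so neither queues the work (the window is tiny, so
the demo may need many trials):

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static atomic_uint skip_work;
static atomic_uint queued;

static void *timer_fires(void *arg)
{
	(void)arg;
	atomic_fetch_add(&skip_work, 1);
	if (atomic_load(&skip_work) > 1)	/* both threads can see 2 */
		atomic_fetch_sub(&skip_work, 1);
	else
		atomic_fetch_add(&queued, 1);	/* queue_work() stand-in */
	return NULL;
}

int main(void)
{
	pthread_t a, b;
	unsigned long trial;

	for (trial = 0; trial < 1000000; trial++) {
		atomic_store(&skip_work, 0);
		atomic_store(&queued, 0);
		pthread_create(&a, NULL, timer_fires, NULL);
		pthread_create(&b, NULL, timer_fires, NULL);
		pthread_join(a, NULL);
		pthread_join(b, NULL);
		if (atomic_load(&queued) == 0) {
			printf("work lost on trial %lu\n", trial);
			return 0;
		}
	}
	printf("race not observed in %lu trials\n", trial);
	return 0;
}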

> > > And we have to keep the atomic operation, as well as queue_work()
> > > within the lock.
> > 
> > Putting queue_work() under the lock doesn't prevent any races from happening,
> 
> Then I am not able to think about it properly, but I will at least
> present my case here :)
> 
> > because only one of the CPUs can execute that part of the function anyway.
> > 
> > > > 		queue_work(system_wq, &shared->work);
> > > > 
> > > > and the remaining incrementation and decrementation of skip_work are replaced
> > > > with the corresponding atomic operations, it still should work, no?
> > 
> > Well, no, the above wouldn't work.
> > 
> > But what about something like this instead:
> > 
> > 	if (atomic_inc_return(&shared->skip_work) > 1)
> > 		atomic_dec(&shared->skip_work);
> > 	else
> > 		queue_work(system_wq, &shared->work);
> > 
> > (plus the requisite replacements in the other places)?
> > 
> > Only one CPU can see the result of the atomic_inc_return() as 1 and this is the
> > only one that will queue up the work item, unless I'm missing anything super
> > subtle.
> 
> Looks like you are talking about the race between different timer
> handlers, which race against queuing the work. Sorry if you are not,
> but I am not talking about that.
> 
> Suppose queue_work() isn't done within the spin lock.
> 
> CPU0                                            CPU1
> 
> cpufreq_governor_stop()                         dbs_timer_handler()
> -> gov_cancel_work()                            -> lock
>                                                 -> shared->skip_work++, as skip_work was 0. //skip_work=1
>                                                 -> unlock
>    -> lock
>    -> shared->skip_work++; //skip_work=2
>    -> unlock
>    -> cancel_work_sync(&shared->work);
>                                                 -> queue_work();
>    -> gov_cancel_timers(shared->policy);
>    -> shared->skip_work = 0;
>                                                 dbs_work_handler();
> 
> 
> 
> And according to how I understand it, we are screwed up at this point.
> And it's the same old bug which I fixed recently (which we hacked around by
> using gov-lock earlier).

You are right, I've overlooked that race (but then it is rather easy to
overlook).

Thanks,
Rafael


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH V2 5/6] cpufreq: governor: replace per-cpu delayed work with timers
  2015-12-08 13:18                   ` Rafael J. Wysocki
@ 2015-12-08 13:30                     ` Viresh Kumar
  2015-12-08 14:04                       ` Rafael J. Wysocki
  0 siblings, 1 reply; 44+ messages in thread
From: Viresh Kumar @ 2015-12-08 13:30 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linaro-kernel, linux-pm, ashwin.chaugule, Rafael J. Wysocki, open list

On 08-12-15, 14:18, Rafael J. Wysocki wrote:
> Well, if the timer function runs on all CPUs at the same time, they all
> can see skip_work > 1 and none of them will queue the work.

You are talking about the code after my patch, right?

But will all of them see it > 1? At least one of them will see it as 0
and queue the work, unless the governor is stopped completely.

> You are right, I've overlooked that race (but then it is rather easy to
> overlook).

Yeah, we (at least I) took a long time to understand that this was the
real problem we always had and so fixed it recently.

-- 
viresh

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH][experimental] cpufreq: governor: Use an atomic variable for synchronization
  2015-12-08  6:59                     ` Viresh Kumar
@ 2015-12-08 13:30                       ` Rafael J. Wysocki
  2015-12-08 13:36                         ` Viresh Kumar
  0 siblings, 1 reply; 44+ messages in thread
From: Rafael J. Wysocki @ 2015-12-08 13:30 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: linux-pm, linaro-kernel, ashwin.chaugule, Rafael J. Wysocki, LKML

On Tuesday, December 08, 2015 12:29:05 PM Viresh Kumar wrote:
> On 08-12-15, 01:39, Rafael J. Wysocki wrote:
> > @@ -269,9 +259,6 @@ static void dbs_timer_handler(unsigned l
> >  {
> >  	struct cpu_dbs_info *cdbs = (struct cpu_dbs_info *)data;
> >  	struct cpu_common_dbs_info *shared = cdbs->shared;
> > -	unsigned long flags;
> > -
> > -	spin_lock_irqsave(&shared->timer_lock, flags);
> >  
> >  	/*
> >  	 * Timer handler isn't allowed to queue work at the moment, because:
> > @@ -279,12 +266,10 @@ static void dbs_timer_handler(unsigned l
> >  	 * - We are stopping the governor
> >  	 * - Or we are updating the sampling rate of ondemand governor
> >  	 */
> > -	if (!shared->skip_work) {
> > -		shared->skip_work++;
> > +	if (atomic_inc_return(&shared->skip_work) > 1)
> > +		atomic_dec(&shared->skip_work);
> > +	else
> >  		queue_work(system_wq, &shared->work);
> > -	}
> 
> As explained in the other email, this is wrong..

OK, but instead of relying on the spinlock to wait for the already running
dbs_timer_handler() in gov_cancel_work() (which is really easy to overlook
and should at least be mentioned in a comment) we can wait for it explicitly.

That is, if the relevant code in gov_cancel_work() is like this:


	atomic_inc(&shared->skip_work);
	gov_cancel_timers(shared->policy);
	cancel_work_sync(&shared->work);
	gov_cancel_timers(shared->policy);
	atomic_set(&shared->skip_work, 0);

then the work item should not be leaked behind the cancel_work_sync() any more
AFAICS.
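
Spelled out with the intent of each step (a sketch of the ordering
only; the final form, with full comments, is in the lockless patch
further down this thread):

	atomic_inc(&shared->skip_work);     /* 1. stop timer fns queuing work */
	gov_cancel_timers(shared->policy);  /* 2. wait for running timer fns */
	cancel_work_sync(&shared->work);    /* 3. flush work queued before 2. */
	gov_cancel_timers(shared->policy);  /* 4. kill timers that work re-armed */
	atomic_set(&shared->skip_work, 0);  /* 5. allow queuing work again */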

Thanks,
Rafael


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH][experimental] cpufreq: governor: Use an atomic variable for synchronization
  2015-12-08 13:30                       ` Rafael J. Wysocki
@ 2015-12-08 13:36                         ` Viresh Kumar
  2015-12-08 14:19                           ` Rafael J. Wysocki
  0 siblings, 1 reply; 44+ messages in thread
From: Viresh Kumar @ 2015-12-08 13:36 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-pm, linaro-kernel, ashwin.chaugule, Rafael J. Wysocki, LKML

On 08-12-15, 14:30, Rafael J. Wysocki wrote:
> OK, but instead of relying on the spinlock to wait for the already running

That's the purpose of the spinlock, not a side-effect.

> dbs_timer_handler() in gov_cancel_work() (which is really easy to overlook
> and should at least be mentioned in a comment) we can wait for it explicitly.

I agree, and I will add explicit comment about it.

> That is, if the relevant code in gov_cancel_work() is like this:
> 
> 
> 	atomic_inc(&shared->skip_work);
> 	gov_cancel_timers(shared->policy);
> 	cancel_work_sync(&shared->work);
> 	gov_cancel_timers(shared->policy);

Apart from it being *really* ugly (we should know exactly what needs to
be done; this rather looks like trial and error), it is still racy.

> 	atomic_set(&shared->skip_work, 0);
> 
> then the work item should not be leaked behind the cancel_work_sync() any more
> AFAICS.

Suppose queue_work() isn't done within the spin lock.

CPU0                                            CPU1

cpufreq_governor_stop()                         dbs_timer_handler()
-> gov_cancel_work()                            -> lock
                                                -> shared->skip_work++, as skip_work was 0. //skip_work=1
                                                -> unlock
   -> lock
   -> shared->skip_work++; //skip_work=2
   -> unlock
                                                -> queue_work();
   -> gov_cancel_timers(shared->policy);
                                                dbs_work_handler();
                                                -> queue-timers again (as we aren't checking skip_work here)
   -> cancel_work_sync(&shared->work);
                                                dbs_timer_handler()
                                                -> lock
                                                -> shared->skip_work++, as skip_work was 0. //skip_work=1
                                                -> unlock
                                                ->queue_work()
   -> gov_cancel_timers(shared->policy);
   -> shared->skip_work = 0;


And we have the same situation again. I thought through all of this
before I wrote the initial patch, and I really did try the ugly double
timer-cancel thing. But the current approach is the right thing to do.
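
For reference, the spinlock-based handler being defended here keeps the
check, the increment and the queue_work() inside one critical section
(this is the version from the V4 patch further down the thread):

	static void dbs_timer_handler(unsigned long data)
	{
		struct cpu_dbs_info *cdbs = (struct cpu_dbs_info *)data;
		struct cpu_common_dbs_info *shared = cdbs->shared;
		unsigned long flags;

		spin_lock_irqsave(&shared->timer_lock, flags);
		/* Check and queue atomically wrt gov_cancel_work(). */
		if (!shared->skip_work) {
			shared->skip_work++;
			queue_work(system_wq, &shared->work);
		}
		spin_unlock_irqrestore(&shared->timer_lock, flags);
	}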

I will send a patch adding the comment.

-- 
viresh

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH][experimental] cpufreq: governor: Use an atomic variable for synchronization
  2015-12-08 14:19                           ` Rafael J. Wysocki
@ 2015-12-08 13:55                             ` Viresh Kumar
  2015-12-08 14:30                               ` Rafael J. Wysocki
  0 siblings, 1 reply; 44+ messages in thread
From: Viresh Kumar @ 2015-12-08 13:55 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-pm, linaro-kernel, ashwin.chaugule, Rafael J. Wysocki, LKML

On 08-12-15, 15:19, Rafael J. Wysocki wrote:
> We know what should be done.  We need to wait for the timer function to
> complete, then cancel the work item spawned by it (if any) and then
> cancel the timers set by that work item.

Yeah, there is no race, but it looks ugly to me. I had written it that
way earlier, and then the spinlock approach looked better to me. :)

-- 
viresh

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH V2 5/6] cpufreq: governor: replace per-cpu delayed work with timers
  2015-12-08 13:30                     ` Viresh Kumar
@ 2015-12-08 14:04                       ` Rafael J. Wysocki
  0 siblings, 0 replies; 44+ messages in thread
From: Rafael J. Wysocki @ 2015-12-08 14:04 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: linaro-kernel, linux-pm, ashwin.chaugule, Rafael J. Wysocki, open list

On Tuesday, December 08, 2015 07:00:36 PM Viresh Kumar wrote:
> On 08-12-15, 14:18, Rafael J. Wysocki wrote:
> > Well, if the timer function runs on all CPUs at the same time, they all
> > can see skip_work > 1 and none of them will queue the work.
> 
> You are talking about code after my patch, right?

No, I was talking about my first attempt at using the atomic variable. :-)

Thanks,
Rafael


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH][experimental] cpufreq: governor: Use an atomic variable for synchronization
  2015-12-08 13:36                         ` Viresh Kumar
@ 2015-12-08 14:19                           ` Rafael J. Wysocki
  2015-12-08 13:55                             ` Viresh Kumar
  0 siblings, 1 reply; 44+ messages in thread
From: Rafael J. Wysocki @ 2015-12-08 14:19 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: linux-pm, linaro-kernel, ashwin.chaugule, Rafael J. Wysocki, LKML

On Tuesday, December 08, 2015 07:06:33 PM Viresh Kumar wrote:
> On 08-12-15, 14:30, Rafael J. Wysocki wrote:
> > OK, but instead of relying on the spinlock to wait for the already running
> 
> That's the purpose of the spinlock, not a side-effect.
> 
> > dbs_timer_handler() in gov_cancel_work() (which is really easy to overlook
> > and should at least be mentioned in a comment) we can wait for it explicitly.
> 
> I agree, and I will add explicit comment about it.
> 
> > That is, if the relevant code in gov_cancel_work() is like this:
> > 
> > 
> > 	atomic_inc(&shared->skip_work);
> > 	gov_cancel_timers(shared->policy);
> > 	cancel_work_sync(&shared->work);
> > 	gov_cancel_timers(shared->policy);
> 
> Apart from it being *really* ugly (we should know exactly what needs to
> be done; this rather looks like trial and error),

We know what should be done.  We need to wait for the timer function to
complete, then cancel the work item spawned by it (if any) and then
cancel the timers set by that work item.

> it is still racy.
> 
> > 	atomic_set(&shared->skip_work, 0);
> > 
> > then the work item should not be leaked behind the cancel_work_sync() any more
> > AFAICS.
> 
> Suppose queue_work() isn't done within the spin lock.
> 
> CPU0                                            CPU1
> 
> cpufreq_governor_stop()                         dbs_timer_handler()
> -> gov_cancel_work()                            -> lock
>                                                 -> shared->skip_work++, as skip_work was 0. //skip_work=1
>                                                 -> unlock
>    -> lock
>    -> shared->skip_work++; //skip_work=2

(*)

>    -> unlock
>                                                 -> queue_work();
>    -> gov_cancel_timers(shared->policy);
>                                                 dbs_work_handler();
>                                                 -> queue-timers again (as we aren't checking skip_work here)

skip_work = 1 (because dbs_work_handler() decrements it).

>    -> cancel_work_sync(&shared->work);
>                                                 dbs_timer_handler()
>                                                 -> lock
>                                                 -> shared->skip_work++, as skip_work was 0.

No, it wasn't 0, it was 1, because (*) incremented it
and it has only been decremented once by dbs_work_handler().

> //skip_work=1
>                                                 -> unlock

And the below won't happen.

>                                                 ->queue_work()
>
>    -> gov_cancel_timers(shared->policy);
>    -> shared->skip_work = 0;
> 
> 
> And we have the same situation again.

I don't really think so.  Following the skip_work values through the
trace: 0 -> 1 at the first dbs_timer_handler(), 1 -> 2 at (*) in
gov_cancel_work(), 2 -> 1 when dbs_work_handler() decrements it; so the
second dbs_timer_handler() sees 1, not 0, and never queues the work.

Thanks,
Rafael


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH][experimental] cpufreq: governor: Use an atomic variable for synchronization
  2015-12-08 13:55                             ` Viresh Kumar
@ 2015-12-08 14:30                               ` Rafael J. Wysocki
  2015-12-08 14:56                                 ` Viresh Kumar
  0 siblings, 1 reply; 44+ messages in thread
From: Rafael J. Wysocki @ 2015-12-08 14:30 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: linux-pm, linaro-kernel, ashwin.chaugule, Rafael J. Wysocki, LKML

On Tuesday, December 08, 2015 07:25:18 PM Viresh Kumar wrote:
> On 08-12-15, 15:19, Rafael J. Wysocki wrote:
> > We know what should be done.  We need to wait for the timer function to
> > complete, then cancel the work item spawned by it (if any) and then
> > cancel the timers set by that work item.
> 
> Yeah, there is no race, but it looks ugly to me. I had written it that
> way earlier, and then the spinlock approach looked better to me. :)

It doesn't look nice, but then having a lockless timer function is worth
it in my view.

The code in gov_cancel_work() runs relatively rarely, but the timer
function can run very often, so avoiding the lock in there is a priority
to me.

Plus we can avoid disabling interrupts in two places this way.

Thanks,
Rafael


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH][experimental] cpufreq: governor: Use an atomic variable for synchronization
  2015-12-08 14:30                               ` Rafael J. Wysocki
@ 2015-12-08 14:56                                 ` Viresh Kumar
  2015-12-08 16:42                                   ` Rafael J. Wysocki
  0 siblings, 1 reply; 44+ messages in thread
From: Viresh Kumar @ 2015-12-08 14:56 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-pm, linaro-kernel, ashwin.chaugule, Rafael J. Wysocki, LKML

On 08-12-15, 15:30, Rafael J. Wysocki wrote:
> It doesn't look nice, but then having a lockless timer function is worth
> it in my view.
> 
> The code in gov_cancel_work() runs relatively rarely, but the timer
> function can run very often, so avoiding the lock in there is a priority
> to me.
> 
> Plus we can avoid disabling interrupts in two places this way.

Okay, that's good enough then. You will be sending these patches now,
right? And of course, we need documentation in this case as well.

-- 
viresh

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH][experimental] cpufreq: governor: Use an atomic variable for synchronization
  2015-12-08 16:42                                   ` Rafael J. Wysocki
@ 2015-12-08 16:34                                     ` Viresh Kumar
  0 siblings, 0 replies; 44+ messages in thread
From: Viresh Kumar @ 2015-12-08 16:34 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-pm, linaro-kernel, ashwin.chaugule, Rafael J. Wysocki, LKML

On 08-12-15, 17:42, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Subject: [PATCH] cpufreq: governor: Use lockless timer function
> 
> It is possible to get rid of the timer_lock spinlock used by the
> governor timer function for synchronization, but a couple of races
> need to be avoided.
> 
> The first race is between multiple dbs_timer_handler() instances
> that may be running in parallel with each other on different
> CPUs.  Namely, one of them has to queue up the work item, but it
> cannot be queued up more than once.  To achieve that,
> atomic_inc_return() can be used on the skip_work field of
> struct cpu_common_dbs_info.
> 
> The second race is between an already running dbs_timer_handler()
> and gov_cancel_work().  In that case the dbs_timer_handler() might
> not notice the skip_work incrementation in gov_cancel_work() and
> it might queue up its work item after gov_cancel_work() had
> returned (and that work item would corrupt skip_work going
> forward).  To prevent that from happening, gov_cancel_work()
> can be made wait for the timer function to complete (on all CPUs)
> right after skip_work has been incremented.
> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>  drivers/cpufreq/cpufreq_governor.c |   49 ++++++++++++++++---------------------
>  drivers/cpufreq/cpufreq_governor.h |    9 +-----
>  2 files changed, 24 insertions(+), 34 deletions(-)

Acked-by: Viresh Kumar <viresh.kumar@linaro.org>

-- 
viresh

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH][experimental] cpufreq: governor: Use an atomic variable for synchronization
  2015-12-08 14:56                                 ` Viresh Kumar
@ 2015-12-08 16:42                                   ` Rafael J. Wysocki
  2015-12-08 16:34                                     ` Viresh Kumar
  0 siblings, 1 reply; 44+ messages in thread
From: Rafael J. Wysocki @ 2015-12-08 16:42 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: linux-pm, linaro-kernel, ashwin.chaugule, Rafael J. Wysocki, LKML

On Tuesday, December 08, 2015 08:26:58 PM Viresh Kumar wrote:
> On 08-12-15, 15:30, Rafael J. Wysocki wrote:
> > It doesn't look nice, but then having a lockless timer function is worth
> > it in my view.
> > 
> > The code in gov_cancel_work() runs relatively rarely, but the timer
> > function can run very often, so avoiding the lock in there is a priority
> > to me.
> > 
> > Plus we can avoid disabling interrupts in two places this way.
> 
> Okay, that's good enough then. You will be sending these patches now,
> right? And of course, we need documentation in this case as well.

Your series is in my linux-next branch now, so that's just one patch on top
of it.  The current version of it is appended.  Unfortunately, I can't test
it here, but I'll do that later today.

I have updated the comments too, so please let me know if they are clear enough.

Thanks,
Rafael


---
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Subject: [PATCH] cpufreq: governor: Use lockless timer function

It is possible to get rid of the timer_lock spinlock used by the
governor timer function for synchronization, but a couple of races
need to be avoided.

The first race is between multiple dbs_timer_handler() instances
that may be running in parallel with each other on different
CPUs.  Namely, one of them has to queue up the work item, but it
cannot be queued up more than once.  To achieve that,
atomic_inc_return() can be used on the skip_work field of
struct cpu_common_dbs_info.

The second race is between an already running dbs_timer_handler()
and gov_cancel_work().  In that case the dbs_timer_handler() might
not notice the skip_work incrementation in gov_cancel_work() and
it might queue up its work item after gov_cancel_work() had
returned (and that work item would corrupt skip_work going
forward).  To prevent that from happening, gov_cancel_work()
can be made to wait for the timer function to complete (on all CPUs)
right after skip_work has been incremented.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/cpufreq/cpufreq_governor.c |   49 ++++++++++++++++---------------------
 drivers/cpufreq/cpufreq_governor.h |    9 +-----
 2 files changed, 24 insertions(+), 34 deletions(-)

Index: linux-pm/drivers/cpufreq/cpufreq_governor.c
===================================================================
--- linux-pm.orig/drivers/cpufreq/cpufreq_governor.c
+++ linux-pm/drivers/cpufreq/cpufreq_governor.c
@@ -186,22 +186,24 @@ static inline void gov_cancel_timers(str
 
 void gov_cancel_work(struct cpu_common_dbs_info *shared)
 {
-	unsigned long flags;
-
+	/* Tell dbs_timer_handler() to skip queuing up work items. */
+	atomic_inc(&shared->skip_work);
 	/*
-	 * No work will be queued from timer handlers after skip_work is
-	 * updated. And so we can safely cancel the work first and then the
-	 * timers.
+	 * If dbs_timer_handler() is already running, it may not notice the
+	 * incremented skip_work, so wait for it to complete to prevent its work
+	 * item from being queued up after the cancel_work_sync() below.
+	 */
+	gov_cancel_timers(shared->policy);
+	/*
+	 * In case dbs_timer_handler() managed to run and spawn a work item
+	 * before the timers have been canceled, wait for that work item to
+	 * complete and then cancel all of the timers set up by it.  If
+	 * dbs_timer_handler() runs again at that point, it will see the
+	 * positive value of skip_work and won't spawn any more work items.
 	 */
-	spin_lock_irqsave(&shared->timer_lock, flags);
-	shared->skip_work++;
-	spin_unlock_irqrestore(&shared->timer_lock, flags);
-
 	cancel_work_sync(&shared->work);
-
 	gov_cancel_timers(shared->policy);
-
-	shared->skip_work = 0;
+	atomic_set(&shared->skip_work, 0);
 }
 
 /* Will return if we need to evaluate cpu load again or not */
@@ -229,7 +231,6 @@ static void dbs_work_handler(struct work
 	struct cpufreq_policy *policy;
 	struct dbs_data *dbs_data;
 	unsigned int sampling_rate, delay;
-	unsigned long flags;
 	bool eval_load;
 
 	policy = shared->policy;
@@ -258,9 +259,7 @@ static void dbs_work_handler(struct work
 	delay = dbs_data->cdata->gov_dbs_timer(policy, eval_load);
 	mutex_unlock(&shared->timer_mutex);
 
-	spin_lock_irqsave(&shared->timer_lock, flags);
-	shared->skip_work--;
-	spin_unlock_irqrestore(&shared->timer_lock, flags);
+	atomic_dec(&shared->skip_work);
 
 	gov_add_timers(policy, delay);
 }
@@ -269,22 +268,18 @@ static void dbs_timer_handler(unsigned l
 {
 	struct cpu_dbs_info *cdbs = (struct cpu_dbs_info *)data;
 	struct cpu_common_dbs_info *shared = cdbs->shared;
-	unsigned long flags;
-
-	spin_lock_irqsave(&shared->timer_lock, flags);
 
 	/*
-	 * Timer handler isn't allowed to queue work at the moment, because:
+	 * Timer handler may not be allowed to queue the work at the moment,
+	 * because:
 	 * - Another timer handler has done that
 	 * - We are stopping the governor
-	 * - Or we are updating the sampling rate of ondemand governor
+	 * - Or we are updating the sampling rate of the ondemand governor
 	 */
-	if (!shared->skip_work) {
-		shared->skip_work++;
+	if (atomic_inc_return(&shared->skip_work) > 1)
+		atomic_dec(&shared->skip_work);
+	else
 		queue_work(system_wq, &shared->work);
-	}
-
-	spin_unlock_irqrestore(&shared->timer_lock, flags);
 }
 
 static void set_sampling_rate(struct dbs_data *dbs_data,
@@ -315,7 +310,7 @@ static int alloc_common_dbs_info(struct
 		cdata->get_cpu_cdbs(j)->shared = shared;
 
 	mutex_init(&shared->timer_mutex);
-	spin_lock_init(&shared->timer_lock);
+	atomic_set(&shared->skip_work, 0);
 	INIT_WORK(&shared->work, dbs_work_handler);
 	return 0;
 }
Index: linux-pm/drivers/cpufreq/cpufreq_governor.h
===================================================================
--- linux-pm.orig/drivers/cpufreq/cpufreq_governor.h
+++ linux-pm/drivers/cpufreq/cpufreq_governor.h
@@ -17,6 +17,7 @@
 #ifndef _CPUFREQ_GOVERNOR_H
 #define _CPUFREQ_GOVERNOR_H
 
+#include <linux/atomic.h>
 #include <linux/cpufreq.h>
 #include <linux/kernel_stat.h>
 #include <linux/module.h>
@@ -137,14 +138,8 @@ struct cpu_common_dbs_info {
 	 */
 	struct mutex timer_mutex;
 
-	/*
-	 * Per policy lock that serializes access to queuing work from timer
-	 * handlers.
-	 */
-	spinlock_t timer_lock;
-
 	ktime_t time_stamp;
-	unsigned int skip_work;
+	atomic_t skip_work;
 	struct work_struct work;
 };
 


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH V4 5/6] cpufreq: governor: replace per-cpu delayed work with timers
  2015-12-04  6:13     ` Viresh Kumar
@ 2015-12-09  2:04       ` Viresh Kumar
  -1 siblings, 0 replies; 44+ messages in thread
From: Viresh Kumar @ 2015-12-09  2:04 UTC (permalink / raw)
  To: Rafael Wysocki
  Cc: linaro-kernel, linux-pm, sfr, Viresh Kumar, Rafael J. Wysocki,
	Ashwin Chaugule, open list

cpufreq governors evaluate load at the sampling rate and, based on that,
update the frequency for a group of CPUs belonging to the same cpufreq
policy.

This is required to be done in a single thread for all policy->cpus, but
because we don't want to wake up idle CPUs just for that, we use
deferrable work for this. If we had used a single delayed deferrable
work for the entire policy, there was a chance that the CPU required to
run the handler would be idle and we might end up not changing the
frequency for the entire group despite load variations.

And so we were forced to keep per-cpu works, where only the one that
expires first needs to do the real work and the others are rescheduled
for the next sampling time.

We have been using the more complex solution until now, where we used a
delayed deferrable work for this, which is a combination of a timer and
a work item.

This could be made lightweight by keeping per-cpu deferrable timers with
a single work item, which is queued by the first timer that expires.

This patch does just that, and here are the important changes:
- The timer handler runs in irq context and so we need to use a spinlock
  instead of the timer_mutex; a separate timer_lock is created for this.
  This also makes the use of the mutex and the lock quite clear, as we
  know exactly what each of them is protecting.
- A new field, 'skip_work', is added to track when the timer handlers
  are allowed to queue the work. More comments are present in the code.

Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Ashwin Chaugule <ashwin.chaugule@linaro.org>
---
V4: Add the missing EXPORT_SYMBOL_GPL(gov_cancel_work).

 drivers/cpufreq/cpufreq_governor.c | 142 ++++++++++++++++++++++---------------
 drivers/cpufreq/cpufreq_governor.h |  20 ++++--
 drivers/cpufreq/cpufreq_ondemand.c |   8 +--
 3 files changed, 100 insertions(+), 70 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index 999e1f6addf9..2d61eae5cc5d 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -158,47 +158,53 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu)
 }
 EXPORT_SYMBOL_GPL(dbs_check_cpu);
 
-static inline void __gov_queue_work(int cpu, struct dbs_data *dbs_data,
-		unsigned int delay)
+void gov_add_timers(struct cpufreq_policy *policy, unsigned int delay)
 {
-	struct cpu_dbs_info *cdbs = dbs_data->cdata->get_cpu_cdbs(cpu);
-
-	mod_delayed_work_on(cpu, system_wq, &cdbs->dwork, delay);
-}
-
-void gov_queue_work(struct dbs_data *dbs_data, struct cpufreq_policy *policy,
-		unsigned int delay, bool all_cpus)
-{
-	int i;
+	struct dbs_data *dbs_data = policy->governor_data;
+	struct cpu_dbs_info *cdbs;
+	int cpu;
 
-	if (!all_cpus) {
-		/*
-		 * Use raw_smp_processor_id() to avoid preemptible warnings.
-		 * We know that this is only called with all_cpus == false from
-		 * works that have been queued with *_work_on() functions and
-		 * those works are canceled during CPU_DOWN_PREPARE so they
-		 * can't possibly run on any other CPU.
-		 */
-		__gov_queue_work(raw_smp_processor_id(), dbs_data, delay);
-	} else {
-		for_each_cpu(i, policy->cpus)
-			__gov_queue_work(i, dbs_data, delay);
+	for_each_cpu(cpu, policy->cpus) {
+		cdbs = dbs_data->cdata->get_cpu_cdbs(cpu);
+		cdbs->timer.expires = jiffies + delay;
+		add_timer_on(&cdbs->timer, cpu);
 	}
 }
-EXPORT_SYMBOL_GPL(gov_queue_work);
+EXPORT_SYMBOL_GPL(gov_add_timers);
 
-static inline void gov_cancel_work(struct dbs_data *dbs_data,
-		struct cpufreq_policy *policy)
+static inline void gov_cancel_timers(struct cpufreq_policy *policy)
 {
+	struct dbs_data *dbs_data = policy->governor_data;
 	struct cpu_dbs_info *cdbs;
 	int i;
 
 	for_each_cpu(i, policy->cpus) {
 		cdbs = dbs_data->cdata->get_cpu_cdbs(i);
-		cancel_delayed_work_sync(&cdbs->dwork);
+		del_timer_sync(&cdbs->timer);
 	}
 }
 
+void gov_cancel_work(struct cpu_common_dbs_info *shared)
+{
+	unsigned long flags;
+
+	/*
+	 * No work will be queued from timer handlers after skip_work is
+	 * updated. And so we can safely cancel the work first and then the
+	 * timers.
+	 */
+	spin_lock_irqsave(&shared->timer_lock, flags);
+	shared->skip_work++;
+	spin_unlock_irqrestore(&shared->timer_lock, flags);
+
+	cancel_work_sync(&shared->work);
+
+	gov_cancel_timers(shared->policy);
+
+	shared->skip_work = 0;
+}
+EXPORT_SYMBOL_GPL(gov_cancel_work);
+
 /* Will return if we need to evaluate cpu load again or not */
 static bool need_load_eval(struct cpu_common_dbs_info *shared,
 			   unsigned int sampling_rate)
@@ -217,29 +223,22 @@ static bool need_load_eval(struct cpu_common_dbs_info *shared,
 	return true;
 }
 
-static void dbs_timer(struct work_struct *work)
+static void dbs_work_handler(struct work_struct *work)
 {
-	struct cpu_dbs_info *cdbs = container_of(work, struct cpu_dbs_info,
-						 dwork.work);
-	struct cpu_common_dbs_info *shared = cdbs->shared;
+	struct cpu_common_dbs_info *shared = container_of(work, struct
+					cpu_common_dbs_info, work);
 	struct cpufreq_policy *policy;
 	struct dbs_data *dbs_data;
 	unsigned int sampling_rate, delay;
-	bool modify_all = true;
-
-	mutex_lock(&shared->timer_mutex);
+	unsigned long flags;
+	bool eval_load;
 
 	policy = shared->policy;
-
-	/*
-	 * Governor might already be disabled and there is no point continuing
-	 * with the work-handler.
-	 */
-	if (!policy)
-		goto unlock;
-
 	dbs_data = policy->governor_data;
 
+	/* Kill all timers */
+	gov_cancel_timers(policy);
+
 	if (dbs_data->cdata->governor == GOV_CONSERVATIVE) {
 		struct cs_dbs_tuners *cs_tuners = dbs_data->tuners;
 
@@ -250,14 +249,43 @@ static void dbs_timer(struct work_struct *work)
 		sampling_rate = od_tuners->sampling_rate;
 	}
 
-	if (!need_load_eval(cdbs->shared, sampling_rate))
-		modify_all = false;
+	eval_load = need_load_eval(shared, sampling_rate);
 
-	delay = dbs_data->cdata->gov_dbs_timer(policy, modify_all);
-	gov_queue_work(dbs_data, policy, delay, modify_all);
-
-unlock:
+	/*
+	 * Make sure cpufreq_governor_limits() isn't evaluating load in
+	 * parallel.
+	 */
+	mutex_lock(&shared->timer_mutex);
+	delay = dbs_data->cdata->gov_dbs_timer(policy, eval_load);
 	mutex_unlock(&shared->timer_mutex);
+
+	spin_lock_irqsave(&shared->timer_lock, flags);
+	shared->skip_work--;
+	spin_unlock_irqrestore(&shared->timer_lock, flags);
+
+	gov_add_timers(policy, delay);
+}
+
+static void dbs_timer_handler(unsigned long data)
+{
+	struct cpu_dbs_info *cdbs = (struct cpu_dbs_info *)data;
+	struct cpu_common_dbs_info *shared = cdbs->shared;
+	unsigned long flags;
+
+	spin_lock_irqsave(&shared->timer_lock, flags);
+
+	/*
+	 * Timer handler isn't allowed to queue work at the moment, because:
+	 * - Another timer handler has done that
+	 * - We are stopping the governor
+	 * - Or we are updating the sampling rate of ondemand governor
+	 */
+	if (!shared->skip_work) {
+		shared->skip_work++;
+		queue_work(system_wq, &shared->work);
+	}
+
+	spin_unlock_irqrestore(&shared->timer_lock, flags);
 }
 
 static void set_sampling_rate(struct dbs_data *dbs_data,
@@ -288,6 +316,8 @@ static int alloc_common_dbs_info(struct cpufreq_policy *policy,
 		cdata->get_cpu_cdbs(j)->shared = shared;
 
 	mutex_init(&shared->timer_mutex);
+	spin_lock_init(&shared->timer_lock);
+	INIT_WORK(&shared->work, dbs_work_handler);
 	return 0;
 }
 
@@ -452,7 +482,9 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy,
 		if (ignore_nice)
 			j_cdbs->prev_cpu_nice = kcpustat_cpu(j).cpustat[CPUTIME_NICE];
 
-		INIT_DEFERRABLE_WORK(&j_cdbs->dwork, dbs_timer);
+		__setup_timer(&j_cdbs->timer, dbs_timer_handler,
+			      (unsigned long)j_cdbs,
+			      TIMER_DEFERRABLE | TIMER_IRQSAFE);
 	}
 
 	if (cdata->governor == GOV_CONSERVATIVE) {
@@ -470,8 +502,7 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy,
 		od_ops->powersave_bias_init_cpu(cpu);
 	}
 
-	gov_queue_work(dbs_data, policy, delay_for_sampling_rate(sampling_rate),
-		       true);
+	gov_add_timers(policy, delay_for_sampling_rate(sampling_rate));
 	return 0;
 }
 
@@ -485,16 +516,9 @@ static int cpufreq_governor_stop(struct cpufreq_policy *policy,
 	if (!shared || !shared->policy)
 		return -EBUSY;
 
-	/*
-	 * Work-handler must see this updated, as it should not proceed any
-	 * further after governor is disabled. And so timer_mutex is taken while
-	 * updating this value.
-	 */
-	mutex_lock(&shared->timer_mutex);
+	gov_cancel_work(shared);
 	shared->policy = NULL;
-	mutex_unlock(&shared->timer_mutex);
 
-	gov_cancel_work(dbs_data, policy);
 	return 0;
 }
 
diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h
index 0c7589016b6c..76742902491e 100644
--- a/drivers/cpufreq/cpufreq_governor.h
+++ b/drivers/cpufreq/cpufreq_governor.h
@@ -132,12 +132,20 @@ static void *get_cpu_dbs_info_s(int cpu)				\
 struct cpu_common_dbs_info {
 	struct cpufreq_policy *policy;
 	/*
-	 * percpu mutex that serializes governor limit change with dbs_timer
-	 * invocation. We do not want dbs_timer to run when user is changing
-	 * the governor or limits.
+	 * Per policy mutex that serializes load evaluation from limit-change
+	 * and work-handler.
 	 */
 	struct mutex timer_mutex;
+
+	/*
+	 * Per policy lock that serializes access to queuing work from timer
+	 * handlers.
+	 */
+	spinlock_t timer_lock;
+
 	ktime_t time_stamp;
+	unsigned int skip_work;
+	struct work_struct work;
 };
 
 /* Per cpu structures */
@@ -152,7 +160,7 @@ struct cpu_dbs_info {
 	 * wake-up from idle.
 	 */
 	unsigned int prev_load;
-	struct delayed_work dwork;
+	struct timer_list timer;
 	struct cpu_common_dbs_info *shared;
 };
 
@@ -268,11 +276,11 @@ static ssize_t show_sampling_rate_min_gov_pol				\
 
 extern struct mutex cpufreq_governor_lock;
 
+void gov_add_timers(struct cpufreq_policy *policy, unsigned int delay);
+void gov_cancel_work(struct cpu_common_dbs_info *shared);
 void dbs_check_cpu(struct dbs_data *dbs_data, int cpu);
 int cpufreq_governor_dbs(struct cpufreq_policy *policy,
 		struct common_dbs_data *cdata, unsigned int event);
-void gov_queue_work(struct dbs_data *dbs_data, struct cpufreq_policy *policy,
-		unsigned int delay, bool all_cpus);
 void od_register_powersave_bias_handler(unsigned int (*f)
 		(struct cpufreq_policy *, unsigned int, unsigned int),
 		unsigned int powersave_bias);
diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
index fc0384b4d02d..f879012cf849 100644
--- a/drivers/cpufreq/cpufreq_ondemand.c
+++ b/drivers/cpufreq/cpufreq_ondemand.c
@@ -286,13 +286,11 @@ static void update_sampling_rate(struct dbs_data *dbs_data,
 			continue;
 
 		next_sampling = jiffies + usecs_to_jiffies(new_rate);
-		appointed_at = dbs_info->cdbs.dwork.timer.expires;
+		appointed_at = dbs_info->cdbs.timer.expires;
 
 		if (time_before(next_sampling, appointed_at)) {
-			cancel_delayed_work_sync(&dbs_info->cdbs.dwork);
-
-			gov_queue_work(dbs_data, policy,
-				       usecs_to_jiffies(new_rate), true);
+			gov_cancel_work(shared);
+			gov_add_timers(policy, usecs_to_jiffies(new_rate));
 
 		}
 	}
-- 
2.6.2.198.g614a2ac


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH V4 5/6] cpufreq: governor: replace per-cpu delayed work with timers
  2015-12-09  2:04       ` Viresh Kumar
@ 2015-12-09 22:06       ` Rafael J. Wysocki
  2015-12-10  2:36         ` Viresh Kumar
  -1 siblings, 1 reply; 44+ messages in thread
From: Rafael J. Wysocki @ 2015-12-09 22:06 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: linaro-kernel, linux-pm, sfr, Rafael J. Wysocki, Ashwin Chaugule,
	open list

On Wednesday, December 09, 2015 07:34:42 AM Viresh Kumar wrote:
> cpufreq governors evaluate load at the sampling rate and, based on that,
> update the frequency for a group of CPUs belonging to the same cpufreq
> policy.
> 
> This is required to be done in a single thread for all policy->cpus, but
> because we don't want to wake up idle CPUs just for that, we use
> deferrable work for this. If we had used a single delayed deferrable
> work for the entire policy, there was a chance that the CPU required to
> run the handler would be idle and we might end up not changing the
> frequency for the entire group despite load variations.
> 
> And so we were forced to keep per-cpu works, where only the one that
> expires first needs to do the real work and the others are rescheduled
> for the next sampling time.
> 
> We have been using the more complex solution until now, where we used a
> delayed deferrable work for this, which is a combination of a timer and
> a work item.
> 
> This could be made lightweight by keeping per-cpu deferrable timers with
> a single work item, which is queued by the first timer that expires.
> 
> This patch does just that, and here are the important changes:
> - The timer handler runs in irq context and so we need to use a spinlock
>   instead of the timer_mutex; a separate timer_lock is created for this.
>   This also makes the use of the mutex and the lock quite clear, as we
>   know exactly what each of them is protecting.
> - A new field, 'skip_work', is added to track when the timer handlers
>   are allowed to queue the work. More comments are present in the code.
> 
> Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> Reviewed-by: Ashwin Chaugule <ashwin.chaugule@linaro.org>

OK, replaced the one in my tree with this one, thanks!

BTW, can you please add an extra From: line to the bodies of your patch
messages?

For some unknown reason Patchwork or your mailer or the combination of the
two mangles your name for me and I have to fix it up manually in every patch
from you which is a !@#$%^&*() pain.

Thanks,
Rafael


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH V4 5/6] cpufreq: governor: replace per-cpu delayed work with timers
  2015-12-09 22:06       ` Rafael J. Wysocki
@ 2015-12-10  2:36         ` Viresh Kumar
  2015-12-10 22:17           ` Rafael J. Wysocki
  0 siblings, 1 reply; 44+ messages in thread
From: Viresh Kumar @ 2015-12-10  2:36 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linaro-kernel, linux-pm, sfr, Rafael J. Wysocki, Ashwin Chaugule,
	open list

On 09-12-15, 23:06, Rafael J. Wysocki wrote:
> BTW, can you please add an extra From: line to the bodies of your patch
> messages?
> 
> For some unknown reason Patchwork or your mailer or the combination of the
> two mangles your name for me and I have to fix it up manually in every patch
> from you which is a !@#$%^&*() pain.

I have raised an RT request for this; let's see what's wrong.

-- 
viresh

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH V4 5/6] cpufreq: governor: replace per-cpu delayed work with timers
  2015-12-10  2:36         ` Viresh Kumar
@ 2015-12-10 22:17           ` Rafael J. Wysocki
  2015-12-11  1:42             ` Viresh Kumar
  0 siblings, 1 reply; 44+ messages in thread
From: Rafael J. Wysocki @ 2015-12-10 22:17 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: linaro-kernel, linux-pm, sfr, Rafael J. Wysocki, Ashwin Chaugule,
	open list

On Thursday, December 10, 2015 08:06:26 AM Viresh Kumar wrote:
> On 09-12-15, 23:06, Rafael J. Wysocki wrote:
> > BTW, can you please add an extra From: line to the bodies of your patch
> > messages?
> > 
> > For some unknown reason Patchwork or your mailer or the combination of the
> > two mangles your name for me and I have to fix it up manually in every patch
> > from you which is a !@#$%^&*() pain.
> 
> I have raised an RT request for this; let's see what's wrong.

Well, if you did what I asked for, it would work around the problem, whatever
that is.

Thanks,
Rafael


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH V4 5/6] cpufreq: governor: replace per-cpu delayed work with timers
  2015-12-10 22:17           ` Rafael J. Wysocki
@ 2015-12-11  1:42             ` Viresh Kumar
  0 siblings, 0 replies; 44+ messages in thread
From: Viresh Kumar @ 2015-12-11  1:42 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linaro-kernel, linux-pm, sfr, Rafael J. Wysocki, Ashwin Chaugule,
	open list

On 10-12-15, 23:17, Rafael J. Wysocki wrote:
> Well, if you did what I asked for, it would work around the problem, whatever
> that is.

Yeah, I will do that until it is resolved from the backend.

-- 
viresh

^ permalink raw reply	[flat|nested] 44+ messages in thread
