All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] cpufreq: governor: Fix potential races
@ 2015-06-03 10:27 Viresh Kumar
  2015-06-03 10:27 ` [PATCH 1/3] cpufreq: governor: register notifier from cs_init() Viresh Kumar
                   ` (4 more replies)
  0 siblings, 5 replies; 24+ messages in thread
From: Viresh Kumar @ 2015-06-03 10:27 UTC (permalink / raw)
  To: Rafael Wysocki, Preeti U Murthy
  Cc: linaro-kernel, linux-pm, ego, paulus, shilpa.bhat, prarit,
	robert.schoene, skannan, Viresh Kumar

Hi Rafael,

Preeti recently highlighted [1] some issues in cpufreq core locking with
respect to governors. I wanted to solve them after we have simplified
the hotplug paths in cpufreq core with my latest patches, but now that
she has poked me, I have done some work in that area.

I am trying to solve only a part of the bigger problem (in a way that I
feel is the right way ahead). The first patches restructures code to
make it more readable and the last patch does all the major changes. The
logs in that one should be good enough to explain why and what I am
doing.

The first two shouldn't bring any functional change and so can be
applied early if you are confident about them.

@Preeti: I would like you to test these patches. These should get rid of
the crashes you were facing but may generate a WARN() from line 447 of
cpufreq_governor.c, if the sequence is wrong. That has to be fixed
separately.

Line 447: WARN_ON(!dbs_data && (event != CPUFREQ_GOV_POLICY_INIT))

Rebased over: v4.1-rc6
Tested-on: ARM dual Cortex -A15 Exynos board.

[1] http://marc.info/?i=20150601064031.2972.59208.stgit%40perfhull-ltc.austin.ibm.com

Viresh Kumar (3):
  cpufreq: governor: register notifier from cs_init()
  cpufreq: governor: split cpufreq_governor_dbs()
  cpufreq: governor: Serialize governor callbacks

 drivers/cpufreq/cpufreq_conservative.c |  28 +--
 drivers/cpufreq/cpufreq_governor.c     | 340 ++++++++++++++++++---------------
 drivers/cpufreq/cpufreq_governor.h     |  16 +-
 drivers/cpufreq/cpufreq_ondemand.c     |   6 +-
 4 files changed, 209 insertions(+), 181 deletions(-)

-- 
2.4.0


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 1/3] cpufreq: governor: register notifier from cs_init()
  2015-06-03 10:27 [PATCH 0/3] cpufreq: governor: Fix potential races Viresh Kumar
@ 2015-06-03 10:27 ` Viresh Kumar
  2015-06-04  5:38   ` Preeti U Murthy
  2015-06-03 10:27 ` [PATCH 2/3] cpufreq: governor: split cpufreq_governor_dbs() Viresh Kumar
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 24+ messages in thread
From: Viresh Kumar @ 2015-06-03 10:27 UTC (permalink / raw)
  To: Rafael Wysocki, Preeti U Murthy
  Cc: linaro-kernel, linux-pm, ego, paulus, shilpa.bhat, prarit,
	robert.schoene, skannan, Viresh Kumar

Notifiers are required only for conservative governor and the common
governor code is unnecessarily polluted with that. Handle that from
cs_init/exit() instead of cpufreq_governor_dbs().

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/cpufreq_conservative.c | 26 +++++++++++++++-----------
 drivers/cpufreq/cpufreq_governor.c     | 22 +++-------------------
 drivers/cpufreq/cpufreq_governor.h     |  8 ++------
 drivers/cpufreq/cpufreq_ondemand.c     |  4 ++--
 4 files changed, 22 insertions(+), 38 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_conservative.c b/drivers/cpufreq/cpufreq_conservative.c
index 25a70d06c5bf..75f875bb155e 100644
--- a/drivers/cpufreq/cpufreq_conservative.c
+++ b/drivers/cpufreq/cpufreq_conservative.c
@@ -148,6 +148,10 @@ static int dbs_cpufreq_notifier(struct notifier_block *nb, unsigned long val,
 	return 0;
 }
 
+static struct notifier_block cs_cpufreq_notifier_block = {
+	.notifier_call = dbs_cpufreq_notifier,
+};
+
 /************************** sysfs interface ************************/
 static struct common_dbs_data cs_dbs_cdata;
 
@@ -317,7 +321,7 @@ static struct attribute_group cs_attr_group_gov_pol = {
 
 /************************** sysfs end ************************/
 
-static int cs_init(struct dbs_data *dbs_data)
+static int cs_init(struct dbs_data *dbs_data, bool notify)
 {
 	struct cs_dbs_tuners *tuners;
 
@@ -336,25 +340,26 @@ static int cs_init(struct dbs_data *dbs_data)
 	dbs_data->tuners = tuners;
 	dbs_data->min_sampling_rate = MIN_SAMPLING_RATE_RATIO *
 		jiffies_to_usecs(10);
+
+	if (notify)
+		cpufreq_register_notifier(&cs_cpufreq_notifier_block,
+					  CPUFREQ_TRANSITION_NOTIFIER);
+
 	mutex_init(&dbs_data->mutex);
 	return 0;
 }
 
-static void cs_exit(struct dbs_data *dbs_data)
+static void cs_exit(struct dbs_data *dbs_data, bool notify)
 {
+	if (notify)
+		cpufreq_unregister_notifier(&cs_cpufreq_notifier_block,
+					    CPUFREQ_TRANSITION_NOTIFIER);
+
 	kfree(dbs_data->tuners);
 }
 
 define_get_cpu_dbs_routines(cs_cpu_dbs_info);
 
-static struct notifier_block cs_cpufreq_notifier_block = {
-	.notifier_call = dbs_cpufreq_notifier,
-};
-
-static struct cs_ops cs_ops = {
-	.notifier_block = &cs_cpufreq_notifier_block,
-};
-
 static struct common_dbs_data cs_dbs_cdata = {
 	.governor = GOV_CONSERVATIVE,
 	.attr_group_gov_sys = &cs_attr_group_gov_sys,
@@ -363,7 +368,6 @@ static struct common_dbs_data cs_dbs_cdata = {
 	.get_cpu_dbs_info_s = get_cpu_dbs_info_s,
 	.gov_dbs_timer = cs_dbs_timer,
 	.gov_check_cpu = cs_check_cpu,
-	.gov_ops = &cs_ops,
 	.init = cs_init,
 	.exit = cs_exit,
 };
diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index 1b44496b2d2b..d64a82e6481a 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -278,7 +278,7 @@ int cpufreq_governor_dbs(struct cpufreq_policy *policy,
 
 		dbs_data->cdata = cdata;
 		dbs_data->usage_count = 1;
-		rc = cdata->init(dbs_data);
+		rc = cdata->init(dbs_data, !policy->governor->initialized);
 		if (rc) {
 			pr_err("%s: POLICY_INIT: init() failed\n", __func__);
 			kfree(dbs_data);
@@ -291,7 +291,7 @@ int cpufreq_governor_dbs(struct cpufreq_policy *policy,
 		rc = sysfs_create_group(get_governor_parent_kobj(policy),
 				get_sysfs_attr(dbs_data));
 		if (rc) {
-			cdata->exit(dbs_data);
+			cdata->exit(dbs_data, !policy->governor->initialized);
 			kfree(dbs_data);
 			return rc;
 		}
@@ -309,14 +309,6 @@ int cpufreq_governor_dbs(struct cpufreq_policy *policy,
 		set_sampling_rate(dbs_data, max(dbs_data->min_sampling_rate,
 					latency * LATENCY_MULTIPLIER));
 
-		if ((cdata->governor == GOV_CONSERVATIVE) &&
-				(!policy->governor->initialized)) {
-			struct cs_ops *cs_ops = dbs_data->cdata->gov_ops;
-
-			cpufreq_register_notifier(cs_ops->notifier_block,
-					CPUFREQ_TRANSITION_NOTIFIER);
-		}
-
 		if (!have_governor_per_policy())
 			cdata->gdbs_data = dbs_data;
 
@@ -329,15 +321,7 @@ int cpufreq_governor_dbs(struct cpufreq_policy *policy,
 			if (!have_governor_per_policy())
 				cpufreq_put_global_kobject();
 
-			if ((dbs_data->cdata->governor == GOV_CONSERVATIVE) &&
-				(policy->governor->initialized == 1)) {
-				struct cs_ops *cs_ops = dbs_data->cdata->gov_ops;
-
-				cpufreq_unregister_notifier(cs_ops->notifier_block,
-						CPUFREQ_TRANSITION_NOTIFIER);
-			}
-
-			cdata->exit(dbs_data);
+			cdata->exit(dbs_data, policy->governor->initialized == 1);
 			kfree(dbs_data);
 			cdata->gdbs_data = NULL;
 		}
diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h
index cc401d147e72..1690120df487 100644
--- a/drivers/cpufreq/cpufreq_governor.h
+++ b/drivers/cpufreq/cpufreq_governor.h
@@ -208,8 +208,8 @@ struct common_dbs_data {
 	void *(*get_cpu_dbs_info_s)(int cpu);
 	void (*gov_dbs_timer)(struct work_struct *work);
 	void (*gov_check_cpu)(int cpu, unsigned int load);
-	int (*init)(struct dbs_data *dbs_data);
-	void (*exit)(struct dbs_data *dbs_data);
+	int (*init)(struct dbs_data *dbs_data, bool notify);
+	void (*exit)(struct dbs_data *dbs_data, bool notify);
 
 	/* Governor specific ops, see below */
 	void *gov_ops;
@@ -234,10 +234,6 @@ struct od_ops {
 	void (*freq_increase)(struct cpufreq_policy *policy, unsigned int freq);
 };
 
-struct cs_ops {
-	struct notifier_block *notifier_block;
-};
-
 static inline int delay_for_sampling_rate(unsigned int sampling_rate)
 {
 	int delay = usecs_to_jiffies(sampling_rate);
diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
index ad3f38fd3eb9..4fe78a9caa04 100644
--- a/drivers/cpufreq/cpufreq_ondemand.c
+++ b/drivers/cpufreq/cpufreq_ondemand.c
@@ -475,7 +475,7 @@ static struct attribute_group od_attr_group_gov_pol = {
 
 /************************** sysfs end ************************/
 
-static int od_init(struct dbs_data *dbs_data)
+static int od_init(struct dbs_data *dbs_data, bool notify)
 {
 	struct od_dbs_tuners *tuners;
 	u64 idle_time;
@@ -517,7 +517,7 @@ static int od_init(struct dbs_data *dbs_data)
 	return 0;
 }
 
-static void od_exit(struct dbs_data *dbs_data)
+static void od_exit(struct dbs_data *dbs_data, bool notify)
 {
 	kfree(dbs_data->tuners);
 }
-- 
2.4.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 2/3] cpufreq: governor: split cpufreq_governor_dbs()
  2015-06-03 10:27 [PATCH 0/3] cpufreq: governor: Fix potential races Viresh Kumar
  2015-06-03 10:27 ` [PATCH 1/3] cpufreq: governor: register notifier from cs_init() Viresh Kumar
@ 2015-06-03 10:27 ` Viresh Kumar
  2015-06-04 10:04   ` Preeti U Murthy
  2015-06-04 11:13   ` [PATCH V2 " Viresh Kumar
  2015-06-03 10:27 ` [PATCH 3/3] cpufreq: governor: Serialize governor callbacks Viresh Kumar
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 24+ messages in thread
From: Viresh Kumar @ 2015-06-03 10:27 UTC (permalink / raw)
  To: Rafael Wysocki, Preeti U Murthy
  Cc: linaro-kernel, linux-pm, ego, paulus, shilpa.bhat, prarit,
	robert.schoene, skannan, Viresh Kumar

cpufreq_governor_dbs() is hardly readable, it is just too big and
complicated. Lets make it more readable by splitting out event specific
routines.

Order of statements is changed at few places, but that shouldn't bring
any functional change.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
Best way to verify the changes here is to keep both copies of code side
by side and comparing it event wise.

 drivers/cpufreq/cpufreq_governor.c | 326 +++++++++++++++++++++----------------
 1 file changed, 185 insertions(+), 141 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index d64a82e6481a..dc382a5a2158 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -239,195 +239,239 @@ static void set_sampling_rate(struct dbs_data *dbs_data,
 	}
 }
 
-int cpufreq_governor_dbs(struct cpufreq_policy *policy,
-		struct common_dbs_data *cdata, unsigned int event)
+static int cpufreq_governor_init(struct cpufreq_policy *policy,
+				 struct dbs_data *dbs_data,
+				 struct common_dbs_data *cdata)
 {
-	struct dbs_data *dbs_data;
-	struct od_cpu_dbs_info_s *od_dbs_info = NULL;
-	struct cs_cpu_dbs_info_s *cs_dbs_info = NULL;
-	struct od_ops *od_ops = NULL;
-	struct od_dbs_tuners *od_tuners = NULL;
-	struct cs_dbs_tuners *cs_tuners = NULL;
-	struct cpu_dbs_common_info *cpu_cdbs;
-	unsigned int sampling_rate, latency, ignore_nice, j, cpu = policy->cpu;
-	int io_busy = 0;
-	int rc;
+	unsigned int latency;
+	int ret;
 
-	if (have_governor_per_policy())
-		dbs_data = policy->governor_data;
-	else
-		dbs_data = cdata->gdbs_data;
+	if (dbs_data) {
+		WARN_ON(have_governor_per_policy());
+		dbs_data->usage_count++;
+		policy->governor_data = dbs_data;
+		return 0;
+	}
 
-	WARN_ON(!dbs_data && (event != CPUFREQ_GOV_POLICY_INIT));
+	dbs_data = kzalloc(sizeof(*dbs_data), GFP_KERNEL);
+	if (!dbs_data)
+		return -ENOMEM;
 
-	switch (event) {
-	case CPUFREQ_GOV_POLICY_INIT:
-		if (have_governor_per_policy()) {
-			WARN_ON(dbs_data);
-		} else if (dbs_data) {
-			dbs_data->usage_count++;
-			policy->governor_data = dbs_data;
-			return 0;
-		}
+	dbs_data->cdata = cdata;
+	dbs_data->usage_count = 1;
 
-		dbs_data = kzalloc(sizeof(*dbs_data), GFP_KERNEL);
-		if (!dbs_data) {
-			pr_err("%s: POLICY_INIT: kzalloc failed\n", __func__);
-			return -ENOMEM;
-		}
+	ret = cdata->init(dbs_data, !policy->governor->initialized);
+	if (ret)
+		goto free_dbs_data;
 
-		dbs_data->cdata = cdata;
-		dbs_data->usage_count = 1;
-		rc = cdata->init(dbs_data, !policy->governor->initialized);
-		if (rc) {
-			pr_err("%s: POLICY_INIT: init() failed\n", __func__);
-			kfree(dbs_data);
-			return rc;
-		}
+	/* policy latency is in ns. Convert it to us first */
+	latency = policy->cpuinfo.transition_latency / 1000;
+	if (latency == 0)
+		latency = 1;
 
-		if (!have_governor_per_policy())
-			WARN_ON(cpufreq_get_global_kobject());
+	/* Bring kernel and HW constraints together */
+	dbs_data->min_sampling_rate = max(dbs_data->min_sampling_rate,
+					  MIN_LATENCY_MULTIPLIER * latency);
+	set_sampling_rate(dbs_data, max(dbs_data->min_sampling_rate,
+					latency * LATENCY_MULTIPLIER));
 
-		rc = sysfs_create_group(get_governor_parent_kobj(policy),
-				get_sysfs_attr(dbs_data));
-		if (rc) {
-			cdata->exit(dbs_data, !policy->governor->initialized);
-			kfree(dbs_data);
-			return rc;
-		}
+	if (!have_governor_per_policy()) {
+		WARN_ON(cpufreq_get_global_kobject());
+		cdata->gdbs_data = dbs_data;
+	}
 
-		policy->governor_data = dbs_data;
+	ret = sysfs_create_group(get_governor_parent_kobj(policy),
+				 get_sysfs_attr(dbs_data));
+	if (ret)
+		goto cdata_exit;
 
-		/* policy latency is in ns. Convert it to us first */
-		latency = policy->cpuinfo.transition_latency / 1000;
-		if (latency == 0)
-			latency = 1;
+	policy->governor_data = dbs_data;
 
-		/* Bring kernel and HW constraints together */
-		dbs_data->min_sampling_rate = max(dbs_data->min_sampling_rate,
-				MIN_LATENCY_MULTIPLIER * latency);
-		set_sampling_rate(dbs_data, max(dbs_data->min_sampling_rate,
-					latency * LATENCY_MULTIPLIER));
+	return 0;
 
-		if (!have_governor_per_policy())
-			cdata->gdbs_data = dbs_data;
+cdata_exit:
+	if (!have_governor_per_policy()) {
+		cdata->gdbs_data = NULL;
+		cpufreq_put_global_kobject();
+	}
+	cdata->exit(dbs_data, !policy->governor->initialized);
+free_dbs_data:
+	kfree(dbs_data);
+	return ret;
+}
 
-		return 0;
-	case CPUFREQ_GOV_POLICY_EXIT:
-		if (!--dbs_data->usage_count) {
-			sysfs_remove_group(get_governor_parent_kobj(policy),
-					get_sysfs_attr(dbs_data));
+static void cpufreq_governor_exit(struct cpufreq_policy *policy,
+				  struct dbs_data *dbs_data)
+{
+	struct common_dbs_data *cdata = dbs_data->cdata;
 
-			if (!have_governor_per_policy())
-				cpufreq_put_global_kobject();
+	policy->governor_data = NULL;
+	if (!--dbs_data->usage_count) {
+		sysfs_remove_group(get_governor_parent_kobj(policy),
+				   get_sysfs_attr(dbs_data));
 
-			cdata->exit(dbs_data, policy->governor->initialized == 1);
-			kfree(dbs_data);
+		if (!have_governor_per_policy()) {
 			cdata->gdbs_data = NULL;
+			cpufreq_put_global_kobject();
 		}
 
-		policy->governor_data = NULL;
-		return 0;
+		cdata->exit(dbs_data, policy->governor->initialized == 1);
+		kfree(dbs_data);
 	}
+}
 
-	cpu_cdbs = dbs_data->cdata->get_cpu_cdbs(cpu);
+static int cpufreq_governor_start(struct cpufreq_policy *policy,
+				  struct dbs_data *dbs_data)
+{
+	struct common_dbs_data *cdata = dbs_data->cdata;
+	unsigned int sampling_rate, ignore_nice, j, cpu = policy->cpu;
+	struct cpu_dbs_common_info *cpu_cdbs = cdata->get_cpu_cdbs(cpu);
+	int io_busy = 0;
+
+	if (!policy->cur)
+		return -EINVAL;
+
+	if (cdata->governor == GOV_CONSERVATIVE) {
+		struct cs_dbs_tuners *cs_tuners = dbs_data->tuners;
 
-	if (dbs_data->cdata->governor == GOV_CONSERVATIVE) {
-		cs_tuners = dbs_data->tuners;
-		cs_dbs_info = dbs_data->cdata->get_cpu_dbs_info_s(cpu);
 		sampling_rate = cs_tuners->sampling_rate;
 		ignore_nice = cs_tuners->ignore_nice_load;
 	} else {
-		od_tuners = dbs_data->tuners;
-		od_dbs_info = dbs_data->cdata->get_cpu_dbs_info_s(cpu);
+		struct od_dbs_tuners *od_tuners = dbs_data->tuners;
+
 		sampling_rate = od_tuners->sampling_rate;
 		ignore_nice = od_tuners->ignore_nice_load;
-		od_ops = dbs_data->cdata->gov_ops;
 		io_busy = od_tuners->io_is_busy;
 	}
 
-	switch (event) {
-	case CPUFREQ_GOV_START:
-		if (!policy->cur)
-			return -EINVAL;
+	mutex_lock(&dbs_data->mutex);
 
-		mutex_lock(&dbs_data->mutex);
+	for_each_cpu(j, policy->cpus) {
+		struct cpu_dbs_common_info *j_cdbs = cdata->get_cpu_cdbs(j);
+		unsigned int prev_load;
 
-		for_each_cpu(j, policy->cpus) {
-			struct cpu_dbs_common_info *j_cdbs =
-				dbs_data->cdata->get_cpu_cdbs(j);
-			unsigned int prev_load;
+		j_cdbs->cpu = j;
+		j_cdbs->cur_policy = policy;
+		j_cdbs->prev_cpu_idle =
+			get_cpu_idle_time(j, &j_cdbs->prev_cpu_wall, io_busy);
 
-			j_cdbs->cpu = j;
-			j_cdbs->cur_policy = policy;
-			j_cdbs->prev_cpu_idle = get_cpu_idle_time(j,
-					       &j_cdbs->prev_cpu_wall, io_busy);
+		prev_load = (unsigned int)(j_cdbs->prev_cpu_wall -
+					    j_cdbs->prev_cpu_idle);
+		j_cdbs->prev_load = 100 * prev_load /
+				    (unsigned int)j_cdbs->prev_cpu_wall;
 
-			prev_load = (unsigned int)
-				(j_cdbs->prev_cpu_wall - j_cdbs->prev_cpu_idle);
-			j_cdbs->prev_load = 100 * prev_load /
-					(unsigned int) j_cdbs->prev_cpu_wall;
+		if (ignore_nice)
+			j_cdbs->prev_cpu_nice = kcpustat_cpu(j).cpustat[CPUTIME_NICE];
 
-			if (ignore_nice)
-				j_cdbs->prev_cpu_nice =
-					kcpustat_cpu(j).cpustat[CPUTIME_NICE];
+		mutex_init(&j_cdbs->timer_mutex);
+		INIT_DEFERRABLE_WORK(&j_cdbs->work, cdata->gov_dbs_timer);
+	}
 
-			mutex_init(&j_cdbs->timer_mutex);
-			INIT_DEFERRABLE_WORK(&j_cdbs->work,
-					     dbs_data->cdata->gov_dbs_timer);
-		}
+	if (cdata->governor == GOV_CONSERVATIVE) {
+		struct cs_cpu_dbs_info_s *cs_dbs_info =
+			cdata->get_cpu_dbs_info_s(cpu);
 
-		if (dbs_data->cdata->governor == GOV_CONSERVATIVE) {
-			cs_dbs_info->down_skip = 0;
-			cs_dbs_info->enable = 1;
-			cs_dbs_info->requested_freq = policy->cur;
-		} else {
-			od_dbs_info->rate_mult = 1;
-			od_dbs_info->sample_type = OD_NORMAL_SAMPLE;
-			od_ops->powersave_bias_init_cpu(cpu);
-		}
+		cs_dbs_info->down_skip = 0;
+		cs_dbs_info->enable = 1;
+		cs_dbs_info->requested_freq = policy->cur;
+	} else {
+		struct od_ops *od_ops = cdata->gov_ops;
+		struct od_cpu_dbs_info_s *od_dbs_info = cdata->get_cpu_dbs_info_s(cpu);
 
-		mutex_unlock(&dbs_data->mutex);
+		od_dbs_info->rate_mult = 1;
+		od_dbs_info->sample_type = OD_NORMAL_SAMPLE;
+		od_ops->powersave_bias_init_cpu(cpu);
+	}
 
-		/* Initiate timer time stamp */
-		cpu_cdbs->time_stamp = ktime_get();
+	mutex_unlock(&dbs_data->mutex);
 
-		gov_queue_work(dbs_data, policy,
-				delay_for_sampling_rate(sampling_rate), true);
-		break;
+	/* Initiate timer time stamp */
+	cpu_cdbs->time_stamp = ktime_get();
 
-	case CPUFREQ_GOV_STOP:
-		if (dbs_data->cdata->governor == GOV_CONSERVATIVE)
-			cs_dbs_info->enable = 0;
+	gov_queue_work(dbs_data, policy, delay_for_sampling_rate(sampling_rate),
+		       true);
+	return 0;
+}
+
+static void cpufreq_governor_stop(struct cpufreq_policy *policy,
+				  struct dbs_data *dbs_data)
+{
+	struct common_dbs_data *cdata = dbs_data->cdata;
+	unsigned int cpu = policy->cpu;
+	struct cpu_dbs_common_info *cpu_cdbs = cdata->get_cpu_cdbs(cpu);
+
+	if (cdata->governor == GOV_CONSERVATIVE) {
+		struct cs_cpu_dbs_info_s *cs_dbs_info =
+			cdata->get_cpu_dbs_info_s(cpu);
 
-		gov_cancel_work(dbs_data, policy);
+		cs_dbs_info->enable = 0;
+	}
+
+	gov_cancel_work(dbs_data, policy);
+
+	mutex_lock(&dbs_data->mutex);
+	mutex_destroy(&cpu_cdbs->timer_mutex);
+	cpu_cdbs->cur_policy = NULL;
+	mutex_unlock(&dbs_data->mutex);
+}
 
-		mutex_lock(&dbs_data->mutex);
-		mutex_destroy(&cpu_cdbs->timer_mutex);
-		cpu_cdbs->cur_policy = NULL;
+static void cpufreq_governor_limits(struct cpufreq_policy *policy,
+				    struct dbs_data *dbs_data)
+{
+	struct common_dbs_data *cdata = dbs_data->cdata;
+	unsigned int cpu = policy->cpu;
+	struct cpu_dbs_common_info *cpu_cdbs = cdata->get_cpu_cdbs(cpu);
 
+	mutex_lock(&dbs_data->mutex);
+	if (!cpu_cdbs->cur_policy) {
 		mutex_unlock(&dbs_data->mutex);
+		return;
+	}
 
-		break;
+	mutex_lock(&cpu_cdbs->timer_mutex);
+	if (policy->max < cpu_cdbs->cur_policy->cur)
+		__cpufreq_driver_target(cpu_cdbs->cur_policy, policy->max,
+					CPUFREQ_RELATION_H);
+	else if (policy->min > cpu_cdbs->cur_policy->cur)
+		__cpufreq_driver_target(cpu_cdbs->cur_policy, policy->min,
+					CPUFREQ_RELATION_L);
+	dbs_check_cpu(dbs_data, cpu);
+	mutex_unlock(&cpu_cdbs->timer_mutex);
+
+	mutex_unlock(&dbs_data->mutex);
+}
 
+int cpufreq_governor_dbs(struct cpufreq_policy *policy,
+			 struct common_dbs_data *cdata, unsigned int event)
+{
+	struct dbs_data *dbs_data;
+	int ret = 0;
+
+	if (have_governor_per_policy())
+		dbs_data = policy->governor_data;
+	else
+		dbs_data = cdata->gdbs_data;
+
+	WARN_ON(!dbs_data && (event != CPUFREQ_GOV_POLICY_INIT));
+
+	switch (event) {
+	case CPUFREQ_GOV_POLICY_INIT:
+		ret = cpufreq_governor_init(policy, dbs_data, cdata);
+		break;
+	case CPUFREQ_GOV_POLICY_EXIT:
+		cpufreq_governor_exit(policy, dbs_data);
+		break;
+	case CPUFREQ_GOV_START:
+		ret = cpufreq_governor_start(policy, dbs_data);
+		break;
+	case CPUFREQ_GOV_STOP:
+		cpufreq_governor_stop(policy, dbs_data);
+		break;
 	case CPUFREQ_GOV_LIMITS:
-		mutex_lock(&dbs_data->mutex);
-		if (!cpu_cdbs->cur_policy) {
-			mutex_unlock(&dbs_data->mutex);
-			break;
-		}
-		mutex_lock(&cpu_cdbs->timer_mutex);
-		if (policy->max < cpu_cdbs->cur_policy->cur)
-			__cpufreq_driver_target(cpu_cdbs->cur_policy,
-					policy->max, CPUFREQ_RELATION_H);
-		else if (policy->min > cpu_cdbs->cur_policy->cur)
-			__cpufreq_driver_target(cpu_cdbs->cur_policy,
-					policy->min, CPUFREQ_RELATION_L);
-		dbs_check_cpu(dbs_data, cpu);
-		mutex_unlock(&cpu_cdbs->timer_mutex);
-		mutex_unlock(&dbs_data->mutex);
+		cpufreq_governor_limits(policy, dbs_data);
 		break;
 	}
-	return 0;
+
+	return ret;
 }
 EXPORT_SYMBOL_GPL(cpufreq_governor_dbs);
-- 
2.4.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 3/3] cpufreq: governor: Serialize governor callbacks
  2015-06-03 10:27 [PATCH 0/3] cpufreq: governor: Fix potential races Viresh Kumar
  2015-06-03 10:27 ` [PATCH 1/3] cpufreq: governor: register notifier from cs_init() Viresh Kumar
  2015-06-03 10:27 ` [PATCH 2/3] cpufreq: governor: split cpufreq_governor_dbs() Viresh Kumar
@ 2015-06-03 10:27 ` Viresh Kumar
  2015-06-04 10:47   ` Preeti U Murthy
  2015-06-04  5:14 ` [PATCH 0/3] cpufreq: governor: Fix potential races Preeti U Murthy
  2015-06-15 23:48 ` Rafael J. Wysocki
  4 siblings, 1 reply; 24+ messages in thread
From: Viresh Kumar @ 2015-06-03 10:27 UTC (permalink / raw)
  To: Rafael Wysocki, Preeti U Murthy
  Cc: linaro-kernel, linux-pm, ego, paulus, shilpa.bhat, prarit,
	robert.schoene, skannan, Viresh Kumar

There are several races reported in cpufreq core around governors (only
ondemand and conservative) by different people.

There are at least two race scenarios present in governor code:
 (a) Concurrent access/updates of governor internal structures.

 It is possible that fields such as 'dbs_data->usage_count', etc.  are
 accessed simultaneously for different policies using same governor
 structure (i.e. CPUFREQ_HAVE_GOVERNOR_PER_POLICY flag unset). And
 because of this we can dereference bad pointers.

 For example consider a system with two CPUs with separate 'struct
 cpufreq_policy' instances. CPU0 governor: ondemand and CPU1: powersave.
 CPU0 switching to powersave and CPU1 to ondemand:
	CPU0				CPU1

	store*				store*

	cpufreq_governor_exit()		cpufreq_governor_init()
					dbs_data = cdata->gdbs_data;

	if (!--dbs_data->usage_count)
		kfree(dbs_data);

					dbs_data->usage_count++;
					*Bad pointer dereference*

 There are other races possible between EXIT and START/STOP/LIMIT as
 well. Its really complicated.

 (b) Switching governor state in bad sequence:

 For example trying to switch a governor to START state, when the
 governor is in EXIT state. There are some checks present in
 __cpufreq_governor() but they aren't sufficient as they compare events
 against 'policy->governor_enabled', where as we need to take governor's
 state into account, which can be used by multiple policies.

These two issues need to be solved separately and the responsibility
should be properly divided between cpufreq and governor core.

The first problem is more about the governor core, as it needs to
protect its structures properly. And the second problem should be fixed
in cpufreq core instead of governor, as its all about sequence of
events.

This patch is trying to solve only the first problem.

There are two types of data we need to protect,
- 'struct common_dbs_data': No matter what, there is going to be a
  single copy of this per governor.
- 'struct dbs_data': With CPUFREQ_HAVE_GOVERNOR_PER_POLICY flag set, we
  will have per-policy copy of this data, otherwise a single copy.

Because of such complexities, the mutex present in 'struct dbs_data' is
insufficient to solve our problem. For example we need to protect
fetching of 'dbs_data' from different structures at the beginning of
cpufreq_governor_dbs(), to make sure it isn't currently being updated.

This can be fixed if we can guarantee serialization of event parsing
code for an individual governor. This is best solved with a mutex per
governor, and the placeholder for that is 'struct common_dbs_data'.

And so this patch moves the mutex from 'struct dbs_data' to 'struct
common_dbs_data' and takes it at the beginning and drops it at the end
of cpufreq_governor_dbs().

Tested with and without following configuration options:

CONFIG_LOCKDEP_SUPPORT=y
CONFIG_DEBUG_RT_MUTEXES=y
CONFIG_DEBUG_PI_LIST=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_MUTEXES=y
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_PROVE_LOCKING=y
CONFIG_LOCKDEP=y
CONFIG_DEBUG_ATOMIC_SLEEP=y

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/cpufreq_conservative.c |  2 +-
 drivers/cpufreq/cpufreq_governor.c     | 24 +++++++++++-------------
 drivers/cpufreq/cpufreq_governor.h     |  8 +++++---
 drivers/cpufreq/cpufreq_ondemand.c     |  2 +-
 4 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_conservative.c b/drivers/cpufreq/cpufreq_conservative.c
index 75f875bb155e..c86a10c30912 100644
--- a/drivers/cpufreq/cpufreq_conservative.c
+++ b/drivers/cpufreq/cpufreq_conservative.c
@@ -345,7 +345,6 @@ static int cs_init(struct dbs_data *dbs_data, bool notify)
 		cpufreq_register_notifier(&cs_cpufreq_notifier_block,
 					  CPUFREQ_TRANSITION_NOTIFIER);
 
-	mutex_init(&dbs_data->mutex);
 	return 0;
 }
 
@@ -370,6 +369,7 @@ static struct common_dbs_data cs_dbs_cdata = {
 	.gov_check_cpu = cs_check_cpu,
 	.init = cs_init,
 	.exit = cs_exit,
+	.mutex = __MUTEX_INITIALIZER(cs_dbs_cdata.mutex),
 };
 
 static int cs_cpufreq_governor_dbs(struct cpufreq_policy *policy,
diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index dc382a5a2158..ed849a8777dd 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -344,8 +344,6 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy,
 		io_busy = od_tuners->io_is_busy;
 	}
 
-	mutex_lock(&dbs_data->mutex);
-
 	for_each_cpu(j, policy->cpus) {
 		struct cpu_dbs_common_info *j_cdbs = cdata->get_cpu_cdbs(j);
 		unsigned int prev_load;
@@ -383,8 +381,6 @@ static int cpufreq_governor_start(struct cpufreq_policy *policy,
 		od_ops->powersave_bias_init_cpu(cpu);
 	}
 
-	mutex_unlock(&dbs_data->mutex);
-
 	/* Initiate timer time stamp */
 	cpu_cdbs->time_stamp = ktime_get();
 
@@ -409,10 +405,8 @@ static void cpufreq_governor_stop(struct cpufreq_policy *policy,
 
 	gov_cancel_work(dbs_data, policy);
 
-	mutex_lock(&dbs_data->mutex);
 	mutex_destroy(&cpu_cdbs->timer_mutex);
 	cpu_cdbs->cur_policy = NULL;
-	mutex_unlock(&dbs_data->mutex);
 }
 
 static void cpufreq_governor_limits(struct cpufreq_policy *policy,
@@ -422,11 +416,8 @@ static void cpufreq_governor_limits(struct cpufreq_policy *policy,
 	unsigned int cpu = policy->cpu;
 	struct cpu_dbs_common_info *cpu_cdbs = cdata->get_cpu_cdbs(cpu);
 
-	mutex_lock(&dbs_data->mutex);
-	if (!cpu_cdbs->cur_policy) {
-		mutex_unlock(&dbs_data->mutex);
+	if (!cpu_cdbs->cur_policy)
 		return;
-	}
 
 	mutex_lock(&cpu_cdbs->timer_mutex);
 	if (policy->max < cpu_cdbs->cur_policy->cur)
@@ -437,8 +428,6 @@ static void cpufreq_governor_limits(struct cpufreq_policy *policy,
 					CPUFREQ_RELATION_L);
 	dbs_check_cpu(dbs_data, cpu);
 	mutex_unlock(&cpu_cdbs->timer_mutex);
-
-	mutex_unlock(&dbs_data->mutex);
 }
 
 int cpufreq_governor_dbs(struct cpufreq_policy *policy,
@@ -447,12 +436,18 @@ int cpufreq_governor_dbs(struct cpufreq_policy *policy,
 	struct dbs_data *dbs_data;
 	int ret = 0;
 
+	/* Lock governor to block concurrent initialization of governor */
+	mutex_lock(&cdata->mutex);
+
 	if (have_governor_per_policy())
 		dbs_data = policy->governor_data;
 	else
 		dbs_data = cdata->gdbs_data;
 
-	WARN_ON(!dbs_data && (event != CPUFREQ_GOV_POLICY_INIT));
+	if (WARN_ON(!dbs_data && (event != CPUFREQ_GOV_POLICY_INIT))) {
+		ret = -EINVAL;
+		goto unlock;
+	}
 
 	switch (event) {
 	case CPUFREQ_GOV_POLICY_INIT:
@@ -472,6 +467,9 @@ int cpufreq_governor_dbs(struct cpufreq_policy *policy,
 		break;
 	}
 
+unlock:
+	mutex_unlock(&cdata->mutex);
+
 	return ret;
 }
 EXPORT_SYMBOL_GPL(cpufreq_governor_dbs);
diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h
index 1690120df487..34736f5e869d 100644
--- a/drivers/cpufreq/cpufreq_governor.h
+++ b/drivers/cpufreq/cpufreq_governor.h
@@ -213,6 +213,11 @@ struct common_dbs_data {
 
 	/* Governor specific ops, see below */
 	void *gov_ops;
+
+	/*
+	 * Protects governor's data (struct dbs_data and struct common_dbs_data)
+	 */
+	struct mutex mutex;
 };
 
 /* Governor Per policy data */
@@ -221,9 +226,6 @@ struct dbs_data {
 	unsigned int min_sampling_rate;
 	int usage_count;
 	void *tuners;
-
-	/* dbs_mutex protects dbs_enable in governor start/stop */
-	struct mutex mutex;
 };
 
 /* Governor specific ops, will be passed to dbs_data->gov_ops */
diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
index 4fe78a9caa04..3c1e10f2304c 100644
--- a/drivers/cpufreq/cpufreq_ondemand.c
+++ b/drivers/cpufreq/cpufreq_ondemand.c
@@ -513,7 +513,6 @@ static int od_init(struct dbs_data *dbs_data, bool notify)
 	tuners->io_is_busy = should_io_be_busy();
 
 	dbs_data->tuners = tuners;
-	mutex_init(&dbs_data->mutex);
 	return 0;
 }
 
@@ -541,6 +540,7 @@ static struct common_dbs_data od_dbs_cdata = {
 	.gov_ops = &od_ops,
 	.init = od_init,
 	.exit = od_exit,
+	.mutex = __MUTEX_INITIALIZER(od_dbs_cdata.mutex),
 };
 
 static void od_set_powersave_bias(unsigned int powersave_bias)
-- 
2.4.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/3] cpufreq: governor: Fix potential races
  2015-06-03 10:27 [PATCH 0/3] cpufreq: governor: Fix potential races Viresh Kumar
                   ` (2 preceding siblings ...)
  2015-06-03 10:27 ` [PATCH 3/3] cpufreq: governor: Serialize governor callbacks Viresh Kumar
@ 2015-06-04  5:14 ` Preeti U Murthy
  2015-06-04  6:08   ` Preeti U Murthy
  2015-06-05  3:00   ` Viresh Kumar
  2015-06-15 23:48 ` Rafael J. Wysocki
  4 siblings, 2 replies; 24+ messages in thread
From: Preeti U Murthy @ 2015-06-04  5:14 UTC (permalink / raw)
  To: Viresh Kumar, Rafael Wysocki
  Cc: linaro-kernel, linux-pm, ego, paulus, shilpa.bhat, prarit,
	robert.schoene, skannan

On 06/03/2015 03:57 PM, Viresh Kumar wrote:
> Hi Rafael,
> 
> Preeti recently highlighted [1] some issues in cpufreq core locking with
> respect to governors. I wanted to solve them after we have simplified
> the hotplug paths in cpufreq core with my latest patches, but now that
> she has poked me, I have done some work in that area.
> 
> I am trying to solve only a part of the bigger problem (in a way that I
> feel is the right way ahead). The first patches restructures code to
> make it more readable and the last patch does all the major changes. The
> logs in that one should be good enough to explain why and what I am
> doing.
> 
> The first two shouldn't bring any functional change and so can be
> applied early if you are confident about them.
> 
> @Preeti: I would like you to test these patches. These should get rid of
> the crashes you were facing but may generate a WARN() from line 447 of
> cpufreq_governor.c, if the sequence is wrong. That has to be fixed
> separately.
> 
> Line 447: WARN_ON(!dbs_data && (event != CPUFREQ_GOV_POLICY_INIT))
> 
> Rebased over: v4.1-rc6
> Tested-on: ARM dual Cortex -A15 Exynos board.
> 
> [1] http://marc.info/?i=20150601064031.2972.59208.stgit%40perfhull-ltc.austin.ibm.com
> 
> Viresh Kumar (3):
>   cpufreq: governor: register notifier from cs_init()
>   cpufreq: governor: split cpufreq_governor_dbs()
>   cpufreq: governor: Serialize governor callbacks
> 
>  drivers/cpufreq/cpufreq_conservative.c |  28 +--
>  drivers/cpufreq/cpufreq_governor.c     | 340 ++++++++++++++++++---------------
>  drivers/cpufreq/cpufreq_governor.h     |  16 +-
>  drivers/cpufreq/cpufreq_ondemand.c     |   6 +-
>  4 files changed, 209 insertions(+), 181 deletions(-)
> 

I did a hotplug test on a single core alongside changing governors
between ondemand and conservative on the same core. The policy is per
core on powerpc. Within a second of that run the kernel panics. The
backtrace is below:

[  165.981836] Unable to handle kernel paging request for data at
address 0x00000000
[  165.981929] Faulting instruction address: 0xc00000000053b3e0
cpu 0x4: Vector: 300 (Data Access) at [c000000fe0b2b880]
    pc: c00000000053b3e0: __bitmap_weight+0x70/0x100
    lr: c00000000085a008: need_load_eval+0x38/0xf0
    sp: c000000fe0b2bb00
   msr: 9000000100009033
   dar: 0
 dsisr: 40000000
  current = 0xc000000003e4fc90
  paca    = 0xc000000007da2600	 softe: 0	 irq_happened: 0x01
    pid   = 812, comm = kworker/4:2
enter ? for help
[c000000fe0b2bb50] c00000000085a008 need_load_eval+0x38/0xf0
[c000000fe0b2bb80] c00000000085815c cs_dbs_timer+0xdc/0x150
[c000000fe0b2bbe0] c0000000000f489c process_one_work+0x24c/0x910
[c000000fe0b2bc90] c0000000000f50dc worker_thread+0x17c/0x540
[c000000fe0b2bd20] c0000000000fed70 kthread+0x120/0x140
[c000000fe0b2be30] c000000000009678 ret_from_kernel_thread+0x5c/0x64

The crash is the same as was reported at
http://www.gossamer-threads.com/lists/linux/kernel/2186336.

Regards
Preeti U Murthy


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/3] cpufreq: governor: register notifier from cs_init()
  2015-06-03 10:27 ` [PATCH 1/3] cpufreq: governor: register notifier from cs_init() Viresh Kumar
@ 2015-06-04  5:38   ` Preeti U Murthy
  2015-06-04  6:02     ` Viresh Kumar
  0 siblings, 1 reply; 24+ messages in thread
From: Preeti U Murthy @ 2015-06-04  5:38 UTC (permalink / raw)
  To: Viresh Kumar, Rafael Wysocki
  Cc: linaro-kernel, linux-pm, ego, paulus, shilpa.bhat, prarit,
	robert.schoene, skannan

On 06/03/2015 03:57 PM, Viresh Kumar wrote:
> Notifiers are required only for conservative governor and the common
> governor code is unnecessarily polluted with that. Handle that from
> cs_init/exit() instead of cpufreq_governor_dbs().
> 
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> ---
>  drivers/cpufreq/cpufreq_conservative.c | 26 +++++++++++++++-----------
>  drivers/cpufreq/cpufreq_governor.c     | 22 +++-------------------
>  drivers/cpufreq/cpufreq_governor.h     |  8 ++------
>  drivers/cpufreq/cpufreq_ondemand.c     |  4 ++--
>  4 files changed, 22 insertions(+), 38 deletions(-)
> 
> diff --git a/drivers/cpufreq/cpufreq_conservative.c b/drivers/cpufreq/cpufreq_conservative.c
> index 25a70d06c5bf..75f875bb155e 100644
> --- a/drivers/cpufreq/cpufreq_conservative.c
> +++ b/drivers/cpufreq/cpufreq_conservative.c
> @@ -148,6 +148,10 @@ static int dbs_cpufreq_notifier(struct notifier_block *nb, unsigned long val,
>  	return 0;
>  }
> 
> +static struct notifier_block cs_cpufreq_notifier_block = {
> +	.notifier_call = dbs_cpufreq_notifier,
> +};
> +
>  /************************** sysfs interface ************************/
>  static struct common_dbs_data cs_dbs_cdata;
> 
> @@ -317,7 +321,7 @@ static struct attribute_group cs_attr_group_gov_pol = {
> 
>  /************************** sysfs end ************************/
> 
> -static int cs_init(struct dbs_data *dbs_data)
> +static int cs_init(struct dbs_data *dbs_data, bool notify)
>  {
>  	struct cs_dbs_tuners *tuners;
> 
> @@ -336,25 +340,26 @@ static int cs_init(struct dbs_data *dbs_data)
>  	dbs_data->tuners = tuners;
>  	dbs_data->min_sampling_rate = MIN_SAMPLING_RATE_RATIO *
>  		jiffies_to_usecs(10);
> +
> +	if (notify)
> +		cpufreq_register_notifier(&cs_cpufreq_notifier_block,
> +					  CPUFREQ_TRANSITION_NOTIFIER);
> +
>  	mutex_init(&dbs_data->mutex);
>  	return 0;
>  }
> 
> -static void cs_exit(struct dbs_data *dbs_data)
> +static void cs_exit(struct dbs_data *dbs_data, bool notify)
>  {
> +	if (notify)
> +		cpufreq_unregister_notifier(&cs_cpufreq_notifier_block,
> +					    CPUFREQ_TRANSITION_NOTIFIER);
> +
>  	kfree(dbs_data->tuners);
>  }
> 
>  define_get_cpu_dbs_routines(cs_cpu_dbs_info);
> 
> -static struct notifier_block cs_cpufreq_notifier_block = {
> -	.notifier_call = dbs_cpufreq_notifier,
> -};
> -
> -static struct cs_ops cs_ops = {
> -	.notifier_block = &cs_cpufreq_notifier_block,
> -};
> -
>  static struct common_dbs_data cs_dbs_cdata = {
>  	.governor = GOV_CONSERVATIVE,
>  	.attr_group_gov_sys = &cs_attr_group_gov_sys,
> @@ -363,7 +368,6 @@ static struct common_dbs_data cs_dbs_cdata = {
>  	.get_cpu_dbs_info_s = get_cpu_dbs_info_s,
>  	.gov_dbs_timer = cs_dbs_timer,
>  	.gov_check_cpu = cs_check_cpu,
> -	.gov_ops = &cs_ops,
>  	.init = cs_init,
>  	.exit = cs_exit,
>  };
> diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
> index 1b44496b2d2b..d64a82e6481a 100644
> --- a/drivers/cpufreq/cpufreq_governor.c
> +++ b/drivers/cpufreq/cpufreq_governor.c
> @@ -278,7 +278,7 @@ int cpufreq_governor_dbs(struct cpufreq_policy *policy,
> 
>  		dbs_data->cdata = cdata;
>  		dbs_data->usage_count = 1;
> -		rc = cdata->init(dbs_data);
> +		rc = cdata->init(dbs_data, !policy->governor->initialized);
>  		if (rc) {
>  			pr_err("%s: POLICY_INIT: init() failed\n", __func__);
>  			kfree(dbs_data);
> @@ -291,7 +291,7 @@ int cpufreq_governor_dbs(struct cpufreq_policy *policy,
>  		rc = sysfs_create_group(get_governor_parent_kobj(policy),
>  				get_sysfs_attr(dbs_data));
>  		if (rc) {
> -			cdata->exit(dbs_data);
> +			cdata->exit(dbs_data, !policy->governor->initialized);
>  			kfree(dbs_data);
>  			return rc;
>  		}
> @@ -309,14 +309,6 @@ int cpufreq_governor_dbs(struct cpufreq_policy *policy,
>  		set_sampling_rate(dbs_data, max(dbs_data->min_sampling_rate,
>  					latency * LATENCY_MULTIPLIER));
> 
> -		if ((cdata->governor == GOV_CONSERVATIVE) &&
> -				(!policy->governor->initialized)) {
> -			struct cs_ops *cs_ops = dbs_data->cdata->gov_ops;
> -
> -			cpufreq_register_notifier(cs_ops->notifier_block,
> -					CPUFREQ_TRANSITION_NOTIFIER);
> -		}
> -
>  		if (!have_governor_per_policy())
>  			cdata->gdbs_data = dbs_data;
> 
> @@ -329,15 +321,7 @@ int cpufreq_governor_dbs(struct cpufreq_policy *policy,
>  			if (!have_governor_per_policy())
>  				cpufreq_put_global_kobject();
> 
> -			if ((dbs_data->cdata->governor == GOV_CONSERVATIVE) &&
> -				(policy->governor->initialized == 1)) {
> -				struct cs_ops *cs_ops = dbs_data->cdata->gov_ops;
> -
> -				cpufreq_unregister_notifier(cs_ops->notifier_block,
> -						CPUFREQ_TRANSITION_NOTIFIER);
> -			}
> -
> -			cdata->exit(dbs_data);
> +			cdata->exit(dbs_data, policy->governor->initialized == 1);
>  			kfree(dbs_data);
>  			cdata->gdbs_data = NULL;

I don't see why we need the check on policy->governor->initialized
because we call cdata->init() and cdata->exit(), *only* when the first
and last references to the governor are being made respectively
(filtered by dbs_data->usage_count), which is precisely what the
initialized flag checks. So passing policy->governor->initialized seems
to be redundant? And this is the case for both gov_per_policy and otherwise.

Regards
Preeti U Murthy


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/3] cpufreq: governor: register notifier from cs_init()
  2015-06-04  5:38   ` Preeti U Murthy
@ 2015-06-04  6:02     ` Viresh Kumar
  2015-06-04  7:33       ` Preeti U Murthy
  0 siblings, 1 reply; 24+ messages in thread
From: Viresh Kumar @ 2015-06-04  6:02 UTC (permalink / raw)
  To: Preeti U Murthy
  Cc: Rafael Wysocki, linaro-kernel, linux-pm, ego, paulus,
	shilpa.bhat, prarit, robert.schoene, skannan

On 04-06-15, 11:08, Preeti U Murthy wrote:
> I don't see why we need the check on policy->governor->initialized
> because we call cdata->init() and cdata->exit(), *only* when the first
> and last references to the governor are being made respectively
> (filtered by dbs_data->usage_count), which is precisely what the
> initialized flag checks. So passing policy->governor->initialized seems
> to be redundant? And this is the case for both gov_per_policy and otherwise.

That's the case only for !gov_per_policy.

In case of gov_per_policy, the same governor is used by multiple
policies but with different dbs_data objects. And in this case INIT
will be called only once for a dbs_data and so usage count will be max
1 at any time. But we want to register the notifiers for a governor
only once and so we need this extra check.

policy->governor->initialized is set to 1 when the governor is
initialized for the first policy. For all others it is incremented to
other values.

-- 
viresh

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/3] cpufreq: governor: Fix potential races
  2015-06-04  5:14 ` [PATCH 0/3] cpufreq: governor: Fix potential races Preeti U Murthy
@ 2015-06-04  6:08   ` Preeti U Murthy
  2015-06-04  6:11     ` Viresh Kumar
  2015-06-05  3:00   ` Viresh Kumar
  1 sibling, 1 reply; 24+ messages in thread
From: Preeti U Murthy @ 2015-06-04  6:08 UTC (permalink / raw)
  To: Viresh Kumar, Rafael Wysocki
  Cc: linaro-kernel, linux-pm, ego, paulus, shilpa.bhat, prarit,
	robert.schoene, skannan

On 06/04/2015 10:44 AM, Preeti U Murthy wrote:
> On 06/03/2015 03:57 PM, Viresh Kumar wrote:
>> Hi Rafael,
>>
>> Preeti recently highlighted [1] some issues in cpufreq core locking with
>> respect to governors. I wanted to solve them after we have simplified
>> the hotplug paths in cpufreq core with my latest patches, but now that
>> she has poked me, I have done some work in that area.
>>
>> I am trying to solve only a part of the bigger problem (in a way that I
>> feel is the right way ahead). The first patches restructures code to
>> make it more readable and the last patch does all the major changes. The
>> logs in that one should be good enough to explain why and what I am
>> doing.
>>
>> The first two shouldn't bring any functional change and so can be
>> applied early if you are confident about them.
>>
>> @Preeti: I would like you to test these patches. These should get rid of
>> the crashes you were facing but may generate a WARN() from line 447 of
>> cpufreq_governor.c, if the sequence is wrong. That has to be fixed
>> separately.
>>
>> Line 447: WARN_ON(!dbs_data && (event != CPUFREQ_GOV_POLICY_INIT))
>>
>> Rebased over: v4.1-rc6
>> Tested-on: ARM dual Cortex -A15 Exynos board.
>>
>> [1] http://marc.info/?i=20150601064031.2972.59208.stgit%40perfhull-ltc.austin.ibm.com
>>
>> Viresh Kumar (3):
>>   cpufreq: governor: register notifier from cs_init()
>>   cpufreq: governor: split cpufreq_governor_dbs()
>>   cpufreq: governor: Serialize governor callbacks
>>
>>  drivers/cpufreq/cpufreq_conservative.c |  28 +--
>>  drivers/cpufreq/cpufreq_governor.c     | 340 ++++++++++++++++++---------------
>>  drivers/cpufreq/cpufreq_governor.h     |  16 +-
>>  drivers/cpufreq/cpufreq_ondemand.c     |   6 +-
>>  4 files changed, 209 insertions(+), 181 deletions(-)
>>
> 
> I did a hotplug test on a single core alongside changing governors
> between ondemand and conservative on the same core. The policy is per
> core on powerpc. Within a second of that run the kernel panics. The
> backtrace is below:
> 
> [  165.981836] Unable to handle kernel paging request for data at
> address 0x00000000
> [  165.981929] Faulting instruction address: 0xc00000000053b3e0
> cpu 0x4: Vector: 300 (Data Access) at [c000000fe0b2b880]
>     pc: c00000000053b3e0: __bitmap_weight+0x70/0x100
>     lr: c00000000085a008: need_load_eval+0x38/0xf0
>     sp: c000000fe0b2bb00
>    msr: 9000000100009033
>    dar: 0
>  dsisr: 40000000
>   current = 0xc000000003e4fc90
>   paca    = 0xc000000007da2600	 softe: 0	 irq_happened: 0x01
>     pid   = 812, comm = kworker/4:2
> enter ? for help
> [c000000fe0b2bb50] c00000000085a008 need_load_eval+0x38/0xf0
> [c000000fe0b2bb80] c00000000085815c cs_dbs_timer+0xdc/0x150
> [c000000fe0b2bbe0] c0000000000f489c process_one_work+0x24c/0x910
> [c000000fe0b2bc90] c0000000000f50dc worker_thread+0x17c/0x540
> [c000000fe0b2bd20] c0000000000fed70 kthread+0x120/0x140
> [c000000fe0b2be30] c000000000009678 ret_from_kernel_thread+0x5c/0x64
> 
> The crash is the same as was reported at
> http://www.gossamer-threads.com/lists/linux/kernel/2186336.
> 
> Regards
> Preeti U Murthy

And a crash at the cpufreq worker thread again due to data access
exception when I change governors in parallel on a single core.

cpu 0x3: Vector: 300 (Data Access) at [c000000fedb538f0]
    pc: c000000000856750: od_dbs_timer+0x60/0x1e0
    lr: c0000000000f489c: process_one_work+0x24c/0x910
    sp: c000000fedb53b70
   msr: 9000000100009033
   dar: 10
 dsisr: 40000000
  current = 0xc000000fe3d128e0
  paca    = 0xc000000007da1c80	 softe: 0	 irq_happened: 0x01
    pid   = 17227, comm = kworker/3:1

With the backtrace being:

[c000000fedb53be0] c0000000000f489c process_one_work+0x24c/0x910
[c000000fedb53c90] c0000000000f50dc worker_thread+0x17c/0x540
[c000000fedb53d20] c0000000000fed70 kthread+0x120/0x140
[c000000fedb53e30] c000000000009678 ret_from_kernel_thread+0x5c/0x64

But the kernel stays sane longer than before with the patchset. The
above crash happens around 15 seconds after the test begins, while
earlier it wouldn't survive 2 seconds even.

Regards
Preeti U Murthy
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/3] cpufreq: governor: Fix potential races
  2015-06-04  6:08   ` Preeti U Murthy
@ 2015-06-04  6:11     ` Viresh Kumar
  2015-06-04  6:36       ` Preeti U Murthy
  0 siblings, 1 reply; 24+ messages in thread
From: Viresh Kumar @ 2015-06-04  6:11 UTC (permalink / raw)
  To: Preeti U Murthy
  Cc: Rafael Wysocki, linaro-kernel, linux-pm, ego, paulus,
	shilpa.bhat, prarit, robert.schoene, skannan

On 04-06-15, 11:38, Preeti U Murthy wrote:
> And a crash at the cpufreq worker thread again due to data access
> exception when I change governors in parallel on a single core.
> 
> cpu 0x3: Vector: 300 (Data Access) at [c000000fedb538f0]
>     pc: c000000000856750: od_dbs_timer+0x60/0x1e0
>     lr: c0000000000f489c: process_one_work+0x24c/0x910
>     sp: c000000fedb53b70
>    msr: 9000000100009033
>    dar: 10
>  dsisr: 40000000
>   current = 0xc000000fe3d128e0
>   paca    = 0xc000000007da1c80	 softe: 0	 irq_happened: 0x01
>     pid   = 17227, comm = kworker/3:1
> 
> With the backtrace being:
> 
> [c000000fedb53be0] c0000000000f489c process_one_work+0x24c/0x910
> [c000000fedb53c90] c0000000000f50dc worker_thread+0x17c/0x540
> [c000000fedb53d20] c0000000000fed70 kthread+0x120/0x140
> [c000000fedb53e30] c000000000009678 ret_from_kernel_thread+0x5c/0x64
> 
> But the kernel stays sane longer than before with the patchset. The
> above crash happens around 15 seconds after the test begins, while
> earlier it wouldn't survive 2 seconds even.

I haven't attempted to solve the race between the worker threads and
governor-callbacks yet. What I have tried to solve is the race between
different callbacks. And you shouldn't see a race there for now. For
example a race between INIT/EXIT/START/STOP/LIMITS.

-- 
viresh

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/3] cpufreq: governor: Fix potential races
  2015-06-04  6:11     ` Viresh Kumar
@ 2015-06-04  6:36       ` Preeti U Murthy
  2015-06-04  6:42         ` Viresh Kumar
  0 siblings, 1 reply; 24+ messages in thread
From: Preeti U Murthy @ 2015-06-04  6:36 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael Wysocki, linaro-kernel, linux-pm, ego, paulus,
	shilpa.bhat, prarit, robert.schoene, skannan

On 06/04/2015 11:41 AM, Viresh Kumar wrote:
> On 04-06-15, 11:38, Preeti U Murthy wrote:
>> And a crash at the cpufreq worker thread again due to data access
>> exception when I change governors in parallel on a single core.
>>
>> cpu 0x3: Vector: 300 (Data Access) at [c000000fedb538f0]
>>     pc: c000000000856750: od_dbs_timer+0x60/0x1e0
>>     lr: c0000000000f489c: process_one_work+0x24c/0x910
>>     sp: c000000fedb53b70
>>    msr: 9000000100009033
>>    dar: 10
>>  dsisr: 40000000
>>   current = 0xc000000fe3d128e0
>>   paca    = 0xc000000007da1c80	 softe: 0	 irq_happened: 0x01
>>     pid   = 17227, comm = kworker/3:1
>>
>> With the backtrace being:
>>
>> [c000000fedb53be0] c0000000000f489c process_one_work+0x24c/0x910
>> [c000000fedb53c90] c0000000000f50dc worker_thread+0x17c/0x540
>> [c000000fedb53d20] c0000000000fed70 kthread+0x120/0x140
>> [c000000fedb53e30] c000000000009678 ret_from_kernel_thread+0x5c/0x64
>>
>> But the kernel stays sane longer than before with the patchset. The
>> above crash happens around 15 seconds after the test begins, while
>> earlier it wouldn't survive 2 seconds even.
> 
> I haven't attempted to solve the race between the worker threads and
> governor-callbacks yet. What I have tried to solve is the race between
> different callbacks. And you shouldn't see a race there for now. For
> example a race between INIT/EXIT/START/STOP/LIMITS.

Your fix may not be complete and here is why. The reason we see the crash
is because we have *only* attempted to serialize calls to cpufreq_governor_dbs()
and not attempted to serialize *entire logical sequence of operations*. Let's
take a look at what is happening as a consequence.

CPU0                                          CPU1
store_scaling_governor()                   __cpufreq_remove_dev_finish()
 __cpufreq_governor(CPUFREQ_GOV_STOP)       __cpufreq_governor(CPUFREQ_GOV_START)
   policy->governor_enabled = false                    
   cpufreq_governor_dbs()                     policy->governor_enabled = true
    mutex_lock()
     gov_cancel_work()                        cpufreq_governor_dbs()
                                               wait on lock

      may call gov_queue_work()
       if (!policy->enabled) : fails
         and we end up queuing work

    mutex_unlock()                              
                                              mutex_lock()
                                              gov_queue_work()
                                              mutex_unlock()
 
__cpufreq_governor(CPUFREQ_GOV_POLICY_EXIT)
  mutex_lock()
    cpufreq_governor_dbs()
        kfree(dbs_data)
                                           
                                               timer fires and od_dbs_timer/cs_dbs_timer() runs
                                               References governor data structures which are freed                                          

The issue as I see it is one set of operations must be allowed to run *completely* 
before another begins. When store_scaling_governor() says STOP, all governor operations
must be stopped, till the time store_scaling_governor() itself gives permission
to restart. Somebody else, in this case __cpufreq_remove_dev_finish() cannot overrule
this if it arrives late. This is what is happening above.

So if store_scaling_governor() arrives first, STOP|EXIT|START|LIMIT must complete before
START|LIMIT of __cpufreq_remove_dev_finish() is allowed to run. So it is just not about
serializing, its about *what needs to be serialized*.

Regards
Preeti U Murthy
                                       
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/3] cpufreq: governor: Fix potential races
  2015-06-04  6:36       ` Preeti U Murthy
@ 2015-06-04  6:42         ` Viresh Kumar
  2015-06-04  7:04           ` Preeti U Murthy
  0 siblings, 1 reply; 24+ messages in thread
From: Viresh Kumar @ 2015-06-04  6:42 UTC (permalink / raw)
  To: Preeti U Murthy
  Cc: Rafael Wysocki, linaro-kernel, linux-pm, ego, paulus,
	shilpa.bhat, prarit, robert.schoene, skannan

On 04-06-15, 12:06, Preeti U Murthy wrote:
> Your fix may not be complete and here is why. The reason we see the crash
> is because we have *only* attempted to serialize calls to cpufreq_governor_dbs()
> and not attempted to serialize *entire logical sequence of operations*. Let's
> take a look at what is happening as a consequence.

You missed my logs (For the first time in my life I wrote them so well).
This is what I mentioned in 3/3:

"
These two issues need to be solved separately and the responsibility
should be properly divided between cpufreq and governor core.

The first problem is more about the governor core, as it needs to
protect its structures properly. And the second problem should be fixed
in cpufreq core instead of governor, as its all about sequence of
events.

This patch is trying to solve only the first problem.
"

I NEVER claimed that I solved all the issues.

-- 
viresh

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/3] cpufreq: governor: Fix potential races
  2015-06-04  6:42         ` Viresh Kumar
@ 2015-06-04  7:04           ` Preeti U Murthy
  2015-06-04  7:13             ` Viresh Kumar
  0 siblings, 1 reply; 24+ messages in thread
From: Preeti U Murthy @ 2015-06-04  7:04 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael Wysocki, linaro-kernel, linux-pm, ego, paulus,
	shilpa.bhat, prarit, robert.schoene, skannan

On 06/04/2015 12:12 PM, Viresh Kumar wrote:
> On 04-06-15, 12:06, Preeti U Murthy wrote:
>> Your fix may not be complete and here is why. The reason we see the crash
>> is because we have *only* attempted to serialize calls to cpufreq_governor_dbs()
>> and not attempted to serialize *entire logical sequence of operations*. Let's
>> take a look at what is happening as a consequence.
> 
> You missed my logs (For the first time in my life I wrote them so well).
> This is what I mentioned in 3/3:
> 
> "
> These two issues need to be solved separately and the responsibility
> should be properly divided between cpufreq and governor core.

> 
> The first problem is more about the governor core, as it needs to
> protect its structures properly. And the second problem should be fixed
> in cpufreq core instead of governor, as its all about sequence of
> events.

My point is do we really need to treat them as separate problems ?
Will not serializing sequence of events help solve both issues ?

When we know the problem, why not fix it proper, rather than breaking it
up ?
> 
> This patch is trying to solve only the first problem.
> "
> 
> I NEVER claimed that I solved all the issues.

That is true. My intention was to point out explicitly what still
remains to be solved. It is true that you have mentioned that the
problem in the cpufreq core is about sequencing of events. I intended to
highlight what it was.

I would have restrained from pointing it out had the issues that I am
seeing waned a wee bit, but it has not, which is why I did not see value
in having the third patch as a stand alone patch, with more going in as
series.

Regards
Preeti U Murthy
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/3] cpufreq: governor: Fix potential races
  2015-06-04  7:04           ` Preeti U Murthy
@ 2015-06-04  7:13             ` Viresh Kumar
  2015-06-04  7:27               ` Preeti U Murthy
  0 siblings, 1 reply; 24+ messages in thread
From: Viresh Kumar @ 2015-06-04  7:13 UTC (permalink / raw)
  To: Preeti U Murthy
  Cc: Rafael Wysocki, linaro-kernel, linux-pm, ego, paulus,
	shilpa.bhat, prarit, robert.schoene, skannan

On 04-06-15, 12:34, Preeti U Murthy wrote:
> My point is do we really need to treat them as separate problems ?

Yes.

> Will not serializing sequence of events help solve both issues ?

That's not the point. Even if it solves the problem, it may not be the
right approach. There are two problems here:
- One lies in cpufreq.c or in policies domain.
- Other one is about governor. Governor code shouldn't rely of
  cpufreq.c locking to guarantee that access to its structures isn't
  racy.

And the way you proposed to solve it (yes the original idea was from
one of my earlier patches) is not the right way to do it.

For example, cpufreq_set_policy() shouldn't care about how the
governor code is placed. It should just do enough to get rid of racy
code belonging to that policy.

But with our other approach, we are trying to stop the governor to be
used by anyone else in the kernel. Who the hell is that 'policy' to
decide who can access the governor ?

That's why I divided it up, so that we don't come to it again. I
haven't learnt these things earlier, and wrote messy locking code
earlier. But after looking/working on core code like timers etc, I
understood the importance of right design and proper partitioning of
responsibilities.

> When we know the problem, why not fix it proper, rather than breaking it
> up ?

What are we breaking here ?

> > This patch is trying to solve only the first problem.
> > "
> > 
> > I NEVER claimed that I solved all the issues.
> 
> That is true. My intention was to point out explicitly what still
> remains to be solved. It is true that you have mentioned that the
> problem in the cpufreq core is about sequencing of events. I intended to
> highlight what it was.

Okay, that should be the next target we have to fix after applying
these patches.

> I would have restrained from pointing it out had the issues that I am
> seeing waned a wee bit, but it has not, which is why I did not see value
> in having the third patch as a stand alone patch, with more going in as
> series.

Its solving what it is intended to solve. But we need more patches
over it to fix problems outside the scope of governors.

-- 
viresh

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/3] cpufreq: governor: Fix potential races
  2015-06-04  7:13             ` Viresh Kumar
@ 2015-06-04  7:27               ` Preeti U Murthy
  0 siblings, 0 replies; 24+ messages in thread
From: Preeti U Murthy @ 2015-06-04  7:27 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael Wysocki, linaro-kernel, linux-pm, ego, paulus,
	shilpa.bhat, prarit, robert.schoene, skannan

On 06/04/2015 12:43 PM, Viresh Kumar wrote:
> On 04-06-15, 12:34, Preeti U Murthy wrote:
>> My point is do we really need to treat them as separate problems ?
> 
> Yes.
> 
>> Will not serializing sequence of events help solve both issues ?
> 
> That's not the point. Even if it solves the problem, it may not be the
> right approach. There are two problems here:
> - One lies in cpufreq.c or in policies domain.
> - Other one is about governor. Governor code shouldn't rely of
>   cpufreq.c locking to guarantee that access to its structures isn't
>   racy.
> 
> And the way you proposed to solve it (yes the original idea was from
> one of my earlier patches) is not the right way to do it.
> 
> For example, cpufreq_set_policy() shouldn't care about how the
> governor code is placed. It should just do enough to get rid of racy
> code belonging to that policy.
> 
> But with our other approach, we are trying to stop the governor to be
> used by anyone else in the kernel. Who the hell is that 'policy' to
> decide who can access the governor ?
> 
> That's why I divided it up, so that we don't come to it again. I
> haven't learnt these things earlier, and wrote messy locking code
> earlier. But after looking/working on core code like timers etc, I
> understood the importance of right design and proper partitioning of
> responsibilities.

Ok, fair enough.

Regards
Preeti U Murthy


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/3] cpufreq: governor: register notifier from cs_init()
  2015-06-04  6:02     ` Viresh Kumar
@ 2015-06-04  7:33       ` Preeti U Murthy
  0 siblings, 0 replies; 24+ messages in thread
From: Preeti U Murthy @ 2015-06-04  7:33 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael Wysocki, linaro-kernel, linux-pm, ego, paulus,
	shilpa.bhat, prarit, robert.schoene, skannan

On 06/04/2015 11:32 AM, Viresh Kumar wrote:
> On 04-06-15, 11:08, Preeti U Murthy wrote:
>> I don't see why we need the check on policy->governor->initialized
>> because we call cdata->init() and cdata->exit(), *only* when the first
>> and last references to the governor are being made respectively
>> (filtered by dbs_data->usage_count), which is precisely what the
>> initialized flag checks. So passing policy->governor->initialized seems
>> to be redundant? And this is the case for both gov_per_policy and otherwise.
> 
> That's the case only for !gov_per_policy.
> 
> In case of gov_per_policy, the same governor is used by multiple
> policies but with different dbs_data objects. And in this case INIT
> will be called only once for a dbs_data and so usage count will be max
> 1 at any time. But we want to register the notifiers for a governor
> only once and so we need this extra check.
> 
> policy->governor->initialized is set to 1 when the governor is
> initialized for the first policy. For all others it is incremented to
> other values.

Ok I see. The patch looks good to me then.

Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/3] cpufreq: governor: split cpufreq_governor_dbs()
  2015-06-03 10:27 ` [PATCH 2/3] cpufreq: governor: split cpufreq_governor_dbs() Viresh Kumar
@ 2015-06-04 10:04   ` Preeti U Murthy
  2015-06-04 10:17     ` Viresh Kumar
  2015-06-04 11:13   ` [PATCH V2 " Viresh Kumar
  1 sibling, 1 reply; 24+ messages in thread
From: Preeti U Murthy @ 2015-06-04 10:04 UTC (permalink / raw)
  To: Viresh Kumar, Rafael Wysocki
  Cc: linaro-kernel, linux-pm, ego, paulus, shilpa.bhat, prarit,
	robert.schoene, skannan

On 06/03/2015 03:57 PM, Viresh Kumar wrote:
> cpufreq_governor_dbs() is hardly readable, it is just too big and
> complicated. Lets make it more readable by splitting out event specific
> routines.
> 
> Order of statements is changed at few places, but that shouldn't bring
> any functional change.
> 
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> ---
> Best way to verify the changes here is to keep both copies of code side
> by side and comparing it event wise.
> 
>  drivers/cpufreq/cpufreq_governor.c | 326 +++++++++++++++++++++----------------
>  1 file changed, 185 insertions(+), 141 deletions(-)
> 
> diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
> index d64a82e6481a..dc382a5a2158 100644
> --- a/drivers/cpufreq/cpufreq_governor.c
> +++ b/drivers/cpufreq/cpufreq_governor.c
> @@ -239,195 +239,239 @@ static void set_sampling_rate(struct dbs_data *dbs_data,
>  	}
>  }
> 
> -int cpufreq_governor_dbs(struct cpufreq_policy *policy,
> -		struct common_dbs_data *cdata, unsigned int event)
> +static int cpufreq_governor_init(struct cpufreq_policy *policy,
> +				 struct dbs_data *dbs_data,
> +				 struct common_dbs_data *cdata)
>  {
> -	struct dbs_data *dbs_data;
> -	struct od_cpu_dbs_info_s *od_dbs_info = NULL;
> -	struct cs_cpu_dbs_info_s *cs_dbs_info = NULL;
> -	struct od_ops *od_ops = NULL;
> -	struct od_dbs_tuners *od_tuners = NULL;
> -	struct cs_dbs_tuners *cs_tuners = NULL;
> -	struct cpu_dbs_common_info *cpu_cdbs;
> -	unsigned int sampling_rate, latency, ignore_nice, j, cpu = policy->cpu;
> -	int io_busy = 0;
> -	int rc;
> +	unsigned int latency;
> +	int ret;
> 
> -	if (have_governor_per_policy())
> -		dbs_data = policy->governor_data;
> -	else
> -		dbs_data = cdata->gdbs_data;
> +	if (dbs_data) {
> +		WARN_ON(have_governor_per_policy());

Shouldn't this be outside this loop ? We warn here and allocate dbs_dta
freshly in the current code for the case where governor is per policy.

> +		dbs_data->usage_count++;

Besides, in the case where a governor exists per policy, we will end up
incrementing the usage_count to more than 1 under this condition, which
does not make sense.

> +		policy->governor_data = dbs_data;
> +		return 0;
> +	}

Regards
Preeti U Murthy


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/3] cpufreq: governor: split cpufreq_governor_dbs()
  2015-06-04 10:04   ` Preeti U Murthy
@ 2015-06-04 10:17     ` Viresh Kumar
  0 siblings, 0 replies; 24+ messages in thread
From: Viresh Kumar @ 2015-06-04 10:17 UTC (permalink / raw)
  To: Preeti U Murthy
  Cc: Rafael Wysocki, linaro-kernel, linux-pm, ego, paulus,
	shilpa.bhat, prarit, robert.schoene, skannan

On 04-06-15, 15:34, Preeti U Murthy wrote:
> > +	if (dbs_data) {
> > +		WARN_ON(have_governor_per_policy());
> 
> Shouldn't this be outside this loop ? We warn here and allocate dbs_dta

Loop ? Its just an 'if' block :)

> freshly in the current code for the case where governor is per policy.

So what we are doing in the current code is equally disgusting. We
already have a pointer and we overwrite it.

> 
> > +		dbs_data->usage_count++;
> 
> Besides, in the case where a governor exists per policy, we will end up
> incrementing the usage_count to more than 1 under this condition, which
> does not make sense.

So, the only sane option here is to return an error immediately I
think.

diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index ed849a8777dd..57a39f8a92b7 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -247,7 +247,8 @@ static int cpufreq_governor_init(struct cpufreq_policy *policy,
 	int ret;
 
 	if (dbs_data) {
-		WARN_ON(have_governor_per_policy());
+		if (WARN_ON(have_governor_per_policy()))
+			return -EINVAL;
 		dbs_data->usage_count++;
 		policy->governor_data = dbs_data;
 		return 0;
@@ -276,24 +277,28 @@ static int cpufreq_governor_init(struct cpufreq_policy *policy,
 					latency * LATENCY_MULTIPLIER));
 
 	if (!have_governor_per_policy()) {
-		WARN_ON(cpufreq_get_global_kobject());
+		if (WARN_ON(cpufreq_get_global_kobject())) {
+			ret = -EINVAL;
+			goto cdata_exit;
+		}
 		cdata->gdbs_data = dbs_data;
 	}
 
 	ret = sysfs_create_group(get_governor_parent_kobj(policy),
 				 get_sysfs_attr(dbs_data));
 	if (ret)
-		goto cdata_exit;
+		goto put_kobj;
 
 	policy->governor_data = dbs_data;
 
 	return 0;
 
-cdata_exit:
+put_kobj:
 	if (!have_governor_per_policy()) {
 		cdata->gdbs_data = NULL;
 		cpufreq_put_global_kobject();
 	}
+cdata_exit:
 	cdata->exit(dbs_data, !policy->governor->initialized);
 free_dbs_data:
 	kfree(dbs_data);

-- 
viresh

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/3] cpufreq: governor: Serialize governor callbacks
  2015-06-03 10:27 ` [PATCH 3/3] cpufreq: governor: Serialize governor callbacks Viresh Kumar
@ 2015-06-04 10:47   ` Preeti U Murthy
  0 siblings, 0 replies; 24+ messages in thread
From: Preeti U Murthy @ 2015-06-04 10:47 UTC (permalink / raw)
  To: Viresh Kumar, Rafael Wysocki
  Cc: linaro-kernel, linux-pm, ego, paulus, shilpa.bhat, prarit,
	robert.schoene, skannan

On 06/03/2015 03:57 PM, Viresh Kumar wrote:
> There are several races reported in cpufreq core around governors (only
> ondemand and conservative) by different people.
> 
> There are at least two race scenarios present in governor code:
>  (a) Concurrent access/updates of governor internal structures.
> 
>  It is possible that fields such as 'dbs_data->usage_count', etc.  are
>  accessed simultaneously for different policies using same governor
>  structure (i.e. CPUFREQ_HAVE_GOVERNOR_PER_POLICY flag unset). And
>  because of this we can dereference bad pointers.
> 
>  For example consider a system with two CPUs with separate 'struct
>  cpufreq_policy' instances. CPU0 governor: ondemand and CPU1: powersave.
>  CPU0 switching to powersave and CPU1 to ondemand:
> 	CPU0				CPU1
> 
> 	store*				store*
> 
> 	cpufreq_governor_exit()		cpufreq_governor_init()
> 					dbs_data = cdata->gdbs_data;
> 
> 	if (!--dbs_data->usage_count)
> 		kfree(dbs_data);
> 
> 					dbs_data->usage_count++;
> 					*Bad pointer dereference*
> 
>  There are other races possible between EXIT and START/STOP/LIMIT as
>  well. Its really complicated.
> 
>  (b) Switching governor state in bad sequence:
> 
>  For example trying to switch a governor to START state, when the
>  governor is in EXIT state. There are some checks present in
>  __cpufreq_governor() but they aren't sufficient as they compare events
>  against 'policy->governor_enabled', where as we need to take governor's
>  state into account, which can be used by multiple policies.
> 
> These two issues need to be solved separately and the responsibility
> should be properly divided between cpufreq and governor core.
> 
> The first problem is more about the governor core, as it needs to
> protect its structures properly. And the second problem should be fixed
> in cpufreq core instead of governor, as its all about sequence of
> events.
> 
> This patch is trying to solve only the first problem.
> 
> There are two types of data we need to protect,
> - 'struct common_dbs_data': No matter what, there is going to be a
>   single copy of this per governor.
> - 'struct dbs_data': With CPUFREQ_HAVE_GOVERNOR_PER_POLICY flag set, we
>   will have per-policy copy of this data, otherwise a single copy.
> 
> Because of such complexities, the mutex present in 'struct dbs_data' is
> insufficient to solve our problem. For example we need to protect
> fetching of 'dbs_data' from different structures at the beginning of
> cpufreq_governor_dbs(), to make sure it isn't currently being updated.
> 
> This can be fixed if we can guarantee serialization of event parsing
> code for an individual governor. This is best solved with a mutex per
> governor, and the placeholder for that is 'struct common_dbs_data'.
> 
> And so this patch moves the mutex from 'struct dbs_data' to 'struct
> common_dbs_data' and takes it at the beginning and drops it at the end
> of cpufreq_governor_dbs().
> 
> Tested with and without following configuration options:
> 
> CONFIG_LOCKDEP_SUPPORT=y
> CONFIG_DEBUG_RT_MUTEXES=y
> CONFIG_DEBUG_PI_LIST=y
> CONFIG_DEBUG_SPINLOCK=y
> CONFIG_DEBUG_MUTEXES=y
> CONFIG_DEBUG_LOCK_ALLOC=y
> CONFIG_PROVE_LOCKING=y
> CONFIG_LOCKDEP=y
> CONFIG_DEBUG_ATOMIC_SLEEP=y
> 
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH V2 2/3] cpufreq: governor: split cpufreq_governor_dbs()
  2015-06-03 10:27 ` [PATCH 2/3] cpufreq: governor: split cpufreq_governor_dbs() Viresh Kumar
  2015-06-04 10:04   ` Preeti U Murthy
@ 2015-06-04 11:13   ` Viresh Kumar
  2015-06-05  2:51     ` Preeti U Murthy
  1 sibling, 1 reply; 24+ messages in thread
From: Viresh Kumar @ 2015-06-04 11:13 UTC (permalink / raw)
  To: Rafael Wysocki, Preeti U Murthy
  Cc: linaro-kernel, linux-pm, ego, paulus, shilpa.bhat, prarit,
	robert.schoene, skannan, Viresh Kumar

cpufreq_governor_dbs() is hardly readable, it is just too big and
complicated. Lets make it more readable by splitting out event specific
routines.

Order of statements is changed at few places, but that shouldn't bring
any functional change.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
V1->V2: Return errors after hitting WARN_ON. (Preeti)

 drivers/cpufreq/cpufreq_governor.c | 329 +++++++++++++++++++++----------------
 1 file changed, 189 insertions(+), 140 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index d64a82e6481a..ccf6ce7e5983 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -239,195 +239,244 @@ static void set_sampling_rate(struct dbs_data *dbs_data,
 	}
 }
 
-int cpufreq_governor_dbs(struct cpufreq_policy *policy,
-		struct common_dbs_data *cdata, unsigned int event)
+static int cpufreq_governor_init(struct cpufreq_policy *policy,
+				 struct dbs_data *dbs_data,
+				 struct common_dbs_data *cdata)
 {
-	struct dbs_data *dbs_data;
-	struct od_cpu_dbs_info_s *od_dbs_info = NULL;
-	struct cs_cpu_dbs_info_s *cs_dbs_info = NULL;
-	struct od_ops *od_ops = NULL;
-	struct od_dbs_tuners *od_tuners = NULL;
-	struct cs_dbs_tuners *cs_tuners = NULL;
-	struct cpu_dbs_common_info *cpu_cdbs;
-	unsigned int sampling_rate, latency, ignore_nice, j, cpu = policy->cpu;
-	int io_busy = 0;
-	int rc;
+	unsigned int latency;
+	int ret;
 
-	if (have_governor_per_policy())
-		dbs_data = policy->governor_data;
-	else
-		dbs_data = cdata->gdbs_data;
+	if (dbs_data) {
+		if (WARN_ON(have_governor_per_policy()))
+			return -EINVAL;
+		dbs_data->usage_count++;
+		policy->governor_data = dbs_data;
+		return 0;
+	}
 
-	WARN_ON(!dbs_data && (event != CPUFREQ_GOV_POLICY_INIT));
+	dbs_data = kzalloc(sizeof(*dbs_data), GFP_KERNEL);
+	if (!dbs_data)
+		return -ENOMEM;
 
-	switch (event) {
-	case CPUFREQ_GOV_POLICY_INIT:
-		if (have_governor_per_policy()) {
-			WARN_ON(dbs_data);
-		} else if (dbs_data) {
-			dbs_data->usage_count++;
-			policy->governor_data = dbs_data;
-			return 0;
-		}
+	dbs_data->cdata = cdata;
+	dbs_data->usage_count = 1;
 
-		dbs_data = kzalloc(sizeof(*dbs_data), GFP_KERNEL);
-		if (!dbs_data) {
-			pr_err("%s: POLICY_INIT: kzalloc failed\n", __func__);
-			return -ENOMEM;
-		}
+	ret = cdata->init(dbs_data, !policy->governor->initialized);
+	if (ret)
+		goto free_dbs_data;
 
-		dbs_data->cdata = cdata;
-		dbs_data->usage_count = 1;
-		rc = cdata->init(dbs_data, !policy->governor->initialized);
-		if (rc) {
-			pr_err("%s: POLICY_INIT: init() failed\n", __func__);
-			kfree(dbs_data);
-			return rc;
-		}
+	/* policy latency is in ns. Convert it to us first */
+	latency = policy->cpuinfo.transition_latency / 1000;
+	if (latency == 0)
+		latency = 1;
 
-		if (!have_governor_per_policy())
-			WARN_ON(cpufreq_get_global_kobject());
+	/* Bring kernel and HW constraints together */
+	dbs_data->min_sampling_rate = max(dbs_data->min_sampling_rate,
+					  MIN_LATENCY_MULTIPLIER * latency);
+	set_sampling_rate(dbs_data, max(dbs_data->min_sampling_rate,
+					latency * LATENCY_MULTIPLIER));
 
-		rc = sysfs_create_group(get_governor_parent_kobj(policy),
-				get_sysfs_attr(dbs_data));
-		if (rc) {
-			cdata->exit(dbs_data, !policy->governor->initialized);
-			kfree(dbs_data);
-			return rc;
+	if (!have_governor_per_policy()) {
+		if (WARN_ON(cpufreq_get_global_kobject())) {
+			ret = -EINVAL;
+			goto cdata_exit;
 		}
+		cdata->gdbs_data = dbs_data;
+	}
 
-		policy->governor_data = dbs_data;
+	ret = sysfs_create_group(get_governor_parent_kobj(policy),
+				 get_sysfs_attr(dbs_data));
+	if (ret)
+		goto put_kobj;
 
-		/* policy latency is in ns. Convert it to us first */
-		latency = policy->cpuinfo.transition_latency / 1000;
-		if (latency == 0)
-			latency = 1;
+	policy->governor_data = dbs_data;
 
-		/* Bring kernel and HW constraints together */
-		dbs_data->min_sampling_rate = max(dbs_data->min_sampling_rate,
-				MIN_LATENCY_MULTIPLIER * latency);
-		set_sampling_rate(dbs_data, max(dbs_data->min_sampling_rate,
-					latency * LATENCY_MULTIPLIER));
+	return 0;
 
-		if (!have_governor_per_policy())
-			cdata->gdbs_data = dbs_data;
+put_kobj:
+	if (!have_governor_per_policy()) {
+		cdata->gdbs_data = NULL;
+		cpufreq_put_global_kobject();
+	}
+cdata_exit:
+	cdata->exit(dbs_data, !policy->governor->initialized);
+free_dbs_data:
+	kfree(dbs_data);
+	return ret;
+}
 
-		return 0;
-	case CPUFREQ_GOV_POLICY_EXIT:
-		if (!--dbs_data->usage_count) {
-			sysfs_remove_group(get_governor_parent_kobj(policy),
-					get_sysfs_attr(dbs_data));
+static void cpufreq_governor_exit(struct cpufreq_policy *policy,
+				  struct dbs_data *dbs_data)
+{
+	struct common_dbs_data *cdata = dbs_data->cdata;
 
-			if (!have_governor_per_policy())
-				cpufreq_put_global_kobject();
+	policy->governor_data = NULL;
+	if (!--dbs_data->usage_count) {
+		sysfs_remove_group(get_governor_parent_kobj(policy),
+				   get_sysfs_attr(dbs_data));
 
-			cdata->exit(dbs_data, policy->governor->initialized == 1);
-			kfree(dbs_data);
+		if (!have_governor_per_policy()) {
 			cdata->gdbs_data = NULL;
+			cpufreq_put_global_kobject();
 		}
 
-		policy->governor_data = NULL;
-		return 0;
+		cdata->exit(dbs_data, policy->governor->initialized == 1);
+		kfree(dbs_data);
 	}
+}
 
-	cpu_cdbs = dbs_data->cdata->get_cpu_cdbs(cpu);
+static int cpufreq_governor_start(struct cpufreq_policy *policy,
+				  struct dbs_data *dbs_data)
+{
+	struct common_dbs_data *cdata = dbs_data->cdata;
+	unsigned int sampling_rate, ignore_nice, j, cpu = policy->cpu;
+	struct cpu_dbs_common_info *cpu_cdbs = cdata->get_cpu_cdbs(cpu);
+	int io_busy = 0;
+
+	if (!policy->cur)
+		return -EINVAL;
+
+	if (cdata->governor == GOV_CONSERVATIVE) {
+		struct cs_dbs_tuners *cs_tuners = dbs_data->tuners;
 
-	if (dbs_data->cdata->governor == GOV_CONSERVATIVE) {
-		cs_tuners = dbs_data->tuners;
-		cs_dbs_info = dbs_data->cdata->get_cpu_dbs_info_s(cpu);
 		sampling_rate = cs_tuners->sampling_rate;
 		ignore_nice = cs_tuners->ignore_nice_load;
 	} else {
-		od_tuners = dbs_data->tuners;
-		od_dbs_info = dbs_data->cdata->get_cpu_dbs_info_s(cpu);
+		struct od_dbs_tuners *od_tuners = dbs_data->tuners;
+
 		sampling_rate = od_tuners->sampling_rate;
 		ignore_nice = od_tuners->ignore_nice_load;
-		od_ops = dbs_data->cdata->gov_ops;
 		io_busy = od_tuners->io_is_busy;
 	}
 
-	switch (event) {
-	case CPUFREQ_GOV_START:
-		if (!policy->cur)
-			return -EINVAL;
+	mutex_lock(&dbs_data->mutex);
 
-		mutex_lock(&dbs_data->mutex);
+	for_each_cpu(j, policy->cpus) {
+		struct cpu_dbs_common_info *j_cdbs = cdata->get_cpu_cdbs(j);
+		unsigned int prev_load;
 
-		for_each_cpu(j, policy->cpus) {
-			struct cpu_dbs_common_info *j_cdbs =
-				dbs_data->cdata->get_cpu_cdbs(j);
-			unsigned int prev_load;
+		j_cdbs->cpu = j;
+		j_cdbs->cur_policy = policy;
+		j_cdbs->prev_cpu_idle =
+			get_cpu_idle_time(j, &j_cdbs->prev_cpu_wall, io_busy);
 
-			j_cdbs->cpu = j;
-			j_cdbs->cur_policy = policy;
-			j_cdbs->prev_cpu_idle = get_cpu_idle_time(j,
-					       &j_cdbs->prev_cpu_wall, io_busy);
+		prev_load = (unsigned int)(j_cdbs->prev_cpu_wall -
+					    j_cdbs->prev_cpu_idle);
+		j_cdbs->prev_load = 100 * prev_load /
+				    (unsigned int)j_cdbs->prev_cpu_wall;
 
-			prev_load = (unsigned int)
-				(j_cdbs->prev_cpu_wall - j_cdbs->prev_cpu_idle);
-			j_cdbs->prev_load = 100 * prev_load /
-					(unsigned int) j_cdbs->prev_cpu_wall;
+		if (ignore_nice)
+			j_cdbs->prev_cpu_nice = kcpustat_cpu(j).cpustat[CPUTIME_NICE];
 
-			if (ignore_nice)
-				j_cdbs->prev_cpu_nice =
-					kcpustat_cpu(j).cpustat[CPUTIME_NICE];
+		mutex_init(&j_cdbs->timer_mutex);
+		INIT_DEFERRABLE_WORK(&j_cdbs->work, cdata->gov_dbs_timer);
+	}
 
-			mutex_init(&j_cdbs->timer_mutex);
-			INIT_DEFERRABLE_WORK(&j_cdbs->work,
-					     dbs_data->cdata->gov_dbs_timer);
-		}
+	if (cdata->governor == GOV_CONSERVATIVE) {
+		struct cs_cpu_dbs_info_s *cs_dbs_info =
+			cdata->get_cpu_dbs_info_s(cpu);
 
-		if (dbs_data->cdata->governor == GOV_CONSERVATIVE) {
-			cs_dbs_info->down_skip = 0;
-			cs_dbs_info->enable = 1;
-			cs_dbs_info->requested_freq = policy->cur;
-		} else {
-			od_dbs_info->rate_mult = 1;
-			od_dbs_info->sample_type = OD_NORMAL_SAMPLE;
-			od_ops->powersave_bias_init_cpu(cpu);
-		}
+		cs_dbs_info->down_skip = 0;
+		cs_dbs_info->enable = 1;
+		cs_dbs_info->requested_freq = policy->cur;
+	} else {
+		struct od_ops *od_ops = cdata->gov_ops;
+		struct od_cpu_dbs_info_s *od_dbs_info = cdata->get_cpu_dbs_info_s(cpu);
 
-		mutex_unlock(&dbs_data->mutex);
+		od_dbs_info->rate_mult = 1;
+		od_dbs_info->sample_type = OD_NORMAL_SAMPLE;
+		od_ops->powersave_bias_init_cpu(cpu);
+	}
 
-		/* Initiate timer time stamp */
-		cpu_cdbs->time_stamp = ktime_get();
+	mutex_unlock(&dbs_data->mutex);
 
-		gov_queue_work(dbs_data, policy,
-				delay_for_sampling_rate(sampling_rate), true);
-		break;
+	/* Initiate timer time stamp */
+	cpu_cdbs->time_stamp = ktime_get();
 
-	case CPUFREQ_GOV_STOP:
-		if (dbs_data->cdata->governor == GOV_CONSERVATIVE)
-			cs_dbs_info->enable = 0;
+	gov_queue_work(dbs_data, policy, delay_for_sampling_rate(sampling_rate),
+		       true);
+	return 0;
+}
+
+static void cpufreq_governor_stop(struct cpufreq_policy *policy,
+				  struct dbs_data *dbs_data)
+{
+	struct common_dbs_data *cdata = dbs_data->cdata;
+	unsigned int cpu = policy->cpu;
+	struct cpu_dbs_common_info *cpu_cdbs = cdata->get_cpu_cdbs(cpu);
+
+	if (cdata->governor == GOV_CONSERVATIVE) {
+		struct cs_cpu_dbs_info_s *cs_dbs_info =
+			cdata->get_cpu_dbs_info_s(cpu);
 
-		gov_cancel_work(dbs_data, policy);
+		cs_dbs_info->enable = 0;
+	}
+
+	gov_cancel_work(dbs_data, policy);
+
+	mutex_lock(&dbs_data->mutex);
+	mutex_destroy(&cpu_cdbs->timer_mutex);
+	cpu_cdbs->cur_policy = NULL;
+	mutex_unlock(&dbs_data->mutex);
+}
 
-		mutex_lock(&dbs_data->mutex);
-		mutex_destroy(&cpu_cdbs->timer_mutex);
-		cpu_cdbs->cur_policy = NULL;
+static void cpufreq_governor_limits(struct cpufreq_policy *policy,
+				    struct dbs_data *dbs_data)
+{
+	struct common_dbs_data *cdata = dbs_data->cdata;
+	unsigned int cpu = policy->cpu;
+	struct cpu_dbs_common_info *cpu_cdbs = cdata->get_cpu_cdbs(cpu);
 
+	mutex_lock(&dbs_data->mutex);
+	if (!cpu_cdbs->cur_policy) {
 		mutex_unlock(&dbs_data->mutex);
+		return;
+	}
 
-		break;
+	mutex_lock(&cpu_cdbs->timer_mutex);
+	if (policy->max < cpu_cdbs->cur_policy->cur)
+		__cpufreq_driver_target(cpu_cdbs->cur_policy, policy->max,
+					CPUFREQ_RELATION_H);
+	else if (policy->min > cpu_cdbs->cur_policy->cur)
+		__cpufreq_driver_target(cpu_cdbs->cur_policy, policy->min,
+					CPUFREQ_RELATION_L);
+	dbs_check_cpu(dbs_data, cpu);
+	mutex_unlock(&cpu_cdbs->timer_mutex);
+
+	mutex_unlock(&dbs_data->mutex);
+}
 
+int cpufreq_governor_dbs(struct cpufreq_policy *policy,
+			 struct common_dbs_data *cdata, unsigned int event)
+{
+	struct dbs_data *dbs_data;
+	int ret = 0;
+
+	if (have_governor_per_policy())
+		dbs_data = policy->governor_data;
+	else
+		dbs_data = cdata->gdbs_data;
+
+	WARN_ON(!dbs_data && (event != CPUFREQ_GOV_POLICY_INIT));
+
+	switch (event) {
+	case CPUFREQ_GOV_POLICY_INIT:
+		ret = cpufreq_governor_init(policy, dbs_data, cdata);
+		break;
+	case CPUFREQ_GOV_POLICY_EXIT:
+		cpufreq_governor_exit(policy, dbs_data);
+		break;
+	case CPUFREQ_GOV_START:
+		ret = cpufreq_governor_start(policy, dbs_data);
+		break;
+	case CPUFREQ_GOV_STOP:
+		cpufreq_governor_stop(policy, dbs_data);
+		break;
 	case CPUFREQ_GOV_LIMITS:
-		mutex_lock(&dbs_data->mutex);
-		if (!cpu_cdbs->cur_policy) {
-			mutex_unlock(&dbs_data->mutex);
-			break;
-		}
-		mutex_lock(&cpu_cdbs->timer_mutex);
-		if (policy->max < cpu_cdbs->cur_policy->cur)
-			__cpufreq_driver_target(cpu_cdbs->cur_policy,
-					policy->max, CPUFREQ_RELATION_H);
-		else if (policy->min > cpu_cdbs->cur_policy->cur)
-			__cpufreq_driver_target(cpu_cdbs->cur_policy,
-					policy->min, CPUFREQ_RELATION_L);
-		dbs_check_cpu(dbs_data, cpu);
-		mutex_unlock(&cpu_cdbs->timer_mutex);
-		mutex_unlock(&dbs_data->mutex);
+		cpufreq_governor_limits(policy, dbs_data);
 		break;
 	}
-	return 0;
+
+	return ret;
 }
 EXPORT_SYMBOL_GPL(cpufreq_governor_dbs);
-- 
2.4.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH V2 2/3] cpufreq: governor: split cpufreq_governor_dbs()
  2015-06-04 11:13   ` [PATCH V2 " Viresh Kumar
@ 2015-06-05  2:51     ` Preeti U Murthy
  0 siblings, 0 replies; 24+ messages in thread
From: Preeti U Murthy @ 2015-06-05  2:51 UTC (permalink / raw)
  To: Viresh Kumar, Rafael Wysocki
  Cc: linaro-kernel, linux-pm, ego, paulus, shilpa.bhat, prarit,
	robert.schoene, skannan

On 06/04/2015 04:43 PM, Viresh Kumar wrote:
> cpufreq_governor_dbs() is hardly readable, it is just too big and
> complicated. Lets make it more readable by splitting out event specific
> routines.
> 
> Order of statements is changed at few places, but that shouldn't bring
> any functional change.
> 
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> ---
> V1->V2: Return errors after hitting WARN_ON. (Preeti)

Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>

Regards
Preeti U Murthy


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/3] cpufreq: governor: Fix potential races
  2015-06-04  5:14 ` [PATCH 0/3] cpufreq: governor: Fix potential races Preeti U Murthy
  2015-06-04  6:08   ` Preeti U Murthy
@ 2015-06-05  3:00   ` Viresh Kumar
  2015-06-05  3:04     ` Preeti U Murthy
  2015-06-05  4:05     ` Preeti U Murthy
  1 sibling, 2 replies; 24+ messages in thread
From: Viresh Kumar @ 2015-06-05  3:00 UTC (permalink / raw)
  To: Preeti U Murthy
  Cc: Rafael Wysocki, linaro-kernel, linux-pm, ego, paulus,
	shilpa.bhat, prarit, robert.schoene, skannan

On 04-06-15, 10:44, Preeti U Murthy wrote:
> On 06/03/2015 03:57 PM, Viresh Kumar wrote:
> > Preeti recently highlighted [1] some issues in cpufreq core locking with
> > respect to governors. I wanted to solve them after we have simplified
> > the hotplug paths in cpufreq core with my latest patches, but now that
> > she has poked me, I have done some work in that area.
> > 
> > I am trying to solve only a part of the bigger problem (in a way that I
> > feel is the right way ahead). The first patches restructures code to
> > make it more readable and the last patch does all the major changes. The
> > logs in that one should be good enough to explain why and what I am
> > doing.
> > 
> > The first two shouldn't bring any functional change and so can be
> > applied early if you are confident about them.
> > 
> > @Preeti: I would like you to test these patches. These should get rid of
> > the crashes you were facing but may generate a WARN() from line 447 of
> > cpufreq_governor.c, if the sequence is wrong. That has to be fixed
> > separately.
> > 
> > Line 447: WARN_ON(!dbs_data && (event != CPUFREQ_GOV_POLICY_INIT))

Hi Preeti,

Thanks for giving your RBY tags for all the patches, would you also
like to give Tested-by's if you have done any testing on these.

That is just to confirm it hasn't broken things any further and that
we haven't seen any crashes in races between
INIT/EXIT/START/STOP/LIMITS.

-- 
viresh

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/3] cpufreq: governor: Fix potential races
  2015-06-05  3:00   ` Viresh Kumar
@ 2015-06-05  3:04     ` Preeti U Murthy
  2015-06-05  4:05     ` Preeti U Murthy
  1 sibling, 0 replies; 24+ messages in thread
From: Preeti U Murthy @ 2015-06-05  3:04 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael Wysocki, linaro-kernel, linux-pm, ego, paulus,
	shilpa.bhat, prarit, robert.schoene, skannan

On 06/05/2015 08:30 AM, Viresh Kumar wrote:
> On 04-06-15, 10:44, Preeti U Murthy wrote:
>> On 06/03/2015 03:57 PM, Viresh Kumar wrote:
>>> Preeti recently highlighted [1] some issues in cpufreq core locking with
>>> respect to governors. I wanted to solve them after we have simplified
>>> the hotplug paths in cpufreq core with my latest patches, but now that
>>> she has poked me, I have done some work in that area.
>>>
>>> I am trying to solve only a part of the bigger problem (in a way that I
>>> feel is the right way ahead). The first patches restructures code to
>>> make it more readable and the last patch does all the major changes. The
>>> logs in that one should be good enough to explain why and what I am
>>> doing.
>>>
>>> The first two shouldn't bring any functional change and so can be
>>> applied early if you are confident about them.
>>>
>>> @Preeti: I would like you to test these patches. These should get rid of
>>> the crashes you were facing but may generate a WARN() from line 447 of
>>> cpufreq_governor.c, if the sequence is wrong. That has to be fixed
>>> separately.
>>>
>>> Line 447: WARN_ON(!dbs_data && (event != CPUFREQ_GOV_POLICY_INIT))
> 
> Hi Preeti,
> 
> Thanks for giving your RBY tags for all the patches, would you also
> like to give Tested-by's if you have done any testing on these.
> 
> That is just to confirm it hasn't broken things any further and that
> we haven't seen any crashes in races between
> INIT/EXIT/START/STOP/LIMITS.
> 

Let me run a fair bit of tests today and confirm this.

Regards
Preeti U Murthy


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/3] cpufreq: governor: Fix potential races
  2015-06-05  3:00   ` Viresh Kumar
  2015-06-05  3:04     ` Preeti U Murthy
@ 2015-06-05  4:05     ` Preeti U Murthy
  1 sibling, 0 replies; 24+ messages in thread
From: Preeti U Murthy @ 2015-06-05  4:05 UTC (permalink / raw)
  To: Viresh Kumar, Rafael Wysocki
  Cc: linaro-kernel, linux-pm, ego, paulus, shilpa.bhat, prarit,
	robert.schoene, skannan

On 06/05/2015 08:30 AM, Viresh Kumar wrote:
> On 04-06-15, 10:44, Preeti U Murthy wrote:
>> On 06/03/2015 03:57 PM, Viresh Kumar wrote:
>>> Preeti recently highlighted [1] some issues in cpufreq core locking with
>>> respect to governors. I wanted to solve them after we have simplified
>>> the hotplug paths in cpufreq core with my latest patches, but now that
>>> she has poked me, I have done some work in that area.
>>>
>>> I am trying to solve only a part of the bigger problem (in a way that I
>>> feel is the right way ahead). The first patches restructures code to
>>> make it more readable and the last patch does all the major changes. The
>>> logs in that one should be good enough to explain why and what I am
>>> doing.
>>>
>>> The first two shouldn't bring any functional change and so can be
>>> applied early if you are confident about them.
>>>
>>> @Preeti: I would like you to test these patches. These should get rid of
>>> the crashes you were facing but may generate a WARN() from line 447 of
>>> cpufreq_governor.c, if the sequence is wrong. That has to be fixed
>>> separately.
>>>
>>> Line 447: WARN_ON(!dbs_data && (event != CPUFREQ_GOV_POLICY_INIT))
> 
> Hi Preeti,
> 
> Thanks for giving your RBY tags for all the patches, would you also
> like to give Tested-by's if you have done any testing on these.
> 
> That is just to confirm it hasn't broken things any further and that
> we haven't seen any crashes in races between
> INIT/EXIT/START/STOP/LIMITS.

I realize that I am not currently hitting the races on my machine that
this patchset is trying to solve. The races that I am hitting have been
conveyed on this thread and we should be fixing that separately as
Viresh pointed out. I can therefore provide only by Reviewed-bys to this
patchset.

Thanks

Regards
Preeti U Murthy
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/3] cpufreq: governor: Fix potential races
  2015-06-03 10:27 [PATCH 0/3] cpufreq: governor: Fix potential races Viresh Kumar
                   ` (3 preceding siblings ...)
  2015-06-04  5:14 ` [PATCH 0/3] cpufreq: governor: Fix potential races Preeti U Murthy
@ 2015-06-15 23:48 ` Rafael J. Wysocki
  4 siblings, 0 replies; 24+ messages in thread
From: Rafael J. Wysocki @ 2015-06-15 23:48 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Preeti U Murthy, linaro-kernel, linux-pm, ego, paulus,
	shilpa.bhat, prarit, robert.schoene, skannan

On Wednesday, June 03, 2015 03:57:10 PM Viresh Kumar wrote:
> Hi Rafael,
> 
> Preeti recently highlighted [1] some issues in cpufreq core locking with
> respect to governors. I wanted to solve them after we have simplified
> the hotplug paths in cpufreq core with my latest patches, but now that
> she has poked me, I have done some work in that area.
> 
> I am trying to solve only a part of the bigger problem (in a way that I
> feel is the right way ahead). The first patches restructures code to
> make it more readable and the last patch does all the major changes. The
> logs in that one should be good enough to explain why and what I am
> doing.
> 
> The first two shouldn't bring any functional change and so can be
> applied early if you are confident about them.
> 
> @Preeti: I would like you to test these patches. These should get rid of
> the crashes you were facing but may generate a WARN() from line 447 of
> cpufreq_governor.c, if the sequence is wrong. That has to be fixed
> separately.
> 
> Line 447: WARN_ON(!dbs_data && (event != CPUFREQ_GOV_POLICY_INIT))
> 
> Rebased over: v4.1-rc6
> Tested-on: ARM dual Cortex -A15 Exynos board.
> 
> [1] http://marc.info/?i=20150601064031.2972.59208.stgit%40perfhull-ltc.austin.ibm.com
> 
> Viresh Kumar (3):
>   cpufreq: governor: register notifier from cs_init()
>   cpufreq: governor: split cpufreq_governor_dbs()
>   cpufreq: governor: Serialize governor callbacks
> 
>  drivers/cpufreq/cpufreq_conservative.c |  28 +--
>  drivers/cpufreq/cpufreq_governor.c     | 340 ++++++++++++++++++---------------
>  drivers/cpufreq/cpufreq_governor.h     |  16 +-
>  drivers/cpufreq/cpufreq_ondemand.c     |   6 +-
>  4 files changed, 209 insertions(+), 181 deletions(-)

I've queued up the series (with the replacement [2/3]) for 4.2, thanks!


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2015-06-15 23:22 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-06-03 10:27 [PATCH 0/3] cpufreq: governor: Fix potential races Viresh Kumar
2015-06-03 10:27 ` [PATCH 1/3] cpufreq: governor: register notifier from cs_init() Viresh Kumar
2015-06-04  5:38   ` Preeti U Murthy
2015-06-04  6:02     ` Viresh Kumar
2015-06-04  7:33       ` Preeti U Murthy
2015-06-03 10:27 ` [PATCH 2/3] cpufreq: governor: split cpufreq_governor_dbs() Viresh Kumar
2015-06-04 10:04   ` Preeti U Murthy
2015-06-04 10:17     ` Viresh Kumar
2015-06-04 11:13   ` [PATCH V2 " Viresh Kumar
2015-06-05  2:51     ` Preeti U Murthy
2015-06-03 10:27 ` [PATCH 3/3] cpufreq: governor: Serialize governor callbacks Viresh Kumar
2015-06-04 10:47   ` Preeti U Murthy
2015-06-04  5:14 ` [PATCH 0/3] cpufreq: governor: Fix potential races Preeti U Murthy
2015-06-04  6:08   ` Preeti U Murthy
2015-06-04  6:11     ` Viresh Kumar
2015-06-04  6:36       ` Preeti U Murthy
2015-06-04  6:42         ` Viresh Kumar
2015-06-04  7:04           ` Preeti U Murthy
2015-06-04  7:13             ` Viresh Kumar
2015-06-04  7:27               ` Preeti U Murthy
2015-06-05  3:00   ` Viresh Kumar
2015-06-05  3:04     ` Preeti U Murthy
2015-06-05  4:05     ` Preeti U Murthy
2015-06-15 23:48 ` Rafael J. Wysocki

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.