All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH V2 0/7] cpufreq: governors: Fix ABBA lockups
@ 2016-02-03 14:02 Viresh Kumar
  2016-02-03 14:02 ` [PATCH V2 1/7] cpufreq: governor: Treat min_sampling_rate as a governor-specific tunable Viresh Kumar
                   ` (7 more replies)
  0 siblings, 8 replies; 36+ messages in thread
From: Viresh Kumar @ 2016-02-03 14:02 UTC (permalink / raw)
  To: Rafael Wysocki, juri.lelli
  Cc: linaro-kernel, linux-pm, skannan, peterz, mturquette,
	steve.muckle, vincent.guittot, morten.rasmussen,
	dietmar.eggemann, linux-kernel, Viresh Kumar

Hi Rafael,

Here is the V2 with updated patches as suggested by you guys.

These are pushed here:
git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm.git cpufreq/governor-kobject

The first four patches are for 4.5, if possible and others you can keep
for 4.6.

V1->V2:
- Improved changelogs, thanks Rafael.
- Added new dbs_data->mutex to avoid concurrent updates to tunables.
- Moved kobj_type to common_dbs_data.
- Updated macros to static inline routines
- s/show/governor_show
- s/store/governor_store
- Improved comments

@Juri: More testing requested :)

Viresh Kumar (7):
  cpufreq: governor: Treat min_sampling_rate as a governor-specific
    tunable
  cpufreq: governor: New sysfs show/store callbacks for governor
    tunables
  cpufreq: governor: Drop unused macros for creating governor tunable
    attributes
  Revert "cpufreq: Drop rwsem lock around CPUFREQ_GOV_POLICY_EXIT"
  cpufreq: Merge cpufreq_offline_prepare/finish routines
  cpufreq: Call __cpufreq_governor() with policy->rwsem held
  cpufreq: Remove cpufreq_governor_lock

 drivers/cpufreq/cpufreq.c              |  93 +++++++++++++----------------
 drivers/cpufreq/cpufreq_conservative.c |  79 +++++++++----------------
 drivers/cpufreq/cpufreq_governor.c     | 105 +++++++++++++++++++++++++--------
 drivers/cpufreq/cpufreq_governor.h     | 104 ++++++++------------------------
 drivers/cpufreq/cpufreq_ondemand.c     |  79 +++++++++----------------
 include/linux/cpufreq.h                |   4 --
 6 files changed, 203 insertions(+), 261 deletions(-)

-- 
2.7.0.79.gdc08a19

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH V2 1/7] cpufreq: governor: Treat min_sampling_rate as a governor-specific tunable
  2016-02-03 14:02 [PATCH V2 0/7] cpufreq: governors: Fix ABBA lockups Viresh Kumar
@ 2016-02-03 14:02 ` Viresh Kumar
  2016-02-05  2:31   ` Rafael J. Wysocki
  2016-02-03 14:02 ` [PATCH V2 2/7] cpufreq: governor: New sysfs show/store callbacks for governor tunables Viresh Kumar
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 36+ messages in thread
From: Viresh Kumar @ 2016-02-03 14:02 UTC (permalink / raw)
  To: Rafael Wysocki, juri.lelli
  Cc: linaro-kernel, linux-pm, skannan, peterz, mturquette,
	steve.muckle, vincent.guittot, morten.rasmussen,
	dietmar.eggemann, linux-kernel, Viresh Kumar

The min_sampling_rate governor tunable is a field in struct dbs_data, so
it has to be handled in a special way separate from the rest of governor
tunables.  In particular, that requires a special macro to be present
for creating its show/store sysfs attribute callbacks.

However, there is no real need for the data structures and code in
question to be arranged this way and if min_sampling_rate is moved to
data structures holding the other governor tunables, the sysfs attribute
creation macros that work with those tunables will also work with
min_sampling_rate and the special macro for it won't be necessary any
more.  That will make it easier to modify the governor code going
forward, so do it.

[ Rafael: Written changelog ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/cpufreq_conservative.c | 14 ++++++-------
 drivers/cpufreq/cpufreq_governor.c     | 36 ++++++++++++++++++++--------------
 drivers/cpufreq/cpufreq_governor.h     | 18 ++---------------
 drivers/cpufreq/cpufreq_ondemand.c     | 14 ++++++-------
 4 files changed, 37 insertions(+), 45 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_conservative.c b/drivers/cpufreq/cpufreq_conservative.c
index 606ad74abe6e..57750367bd26 100644
--- a/drivers/cpufreq/cpufreq_conservative.c
+++ b/drivers/cpufreq/cpufreq_conservative.c
@@ -185,7 +185,7 @@ static ssize_t store_sampling_rate(struct dbs_data *dbs_data, const char *buf,
 	if (ret != 1)
 		return -EINVAL;
 
-	cs_tuners->sampling_rate = max(input, dbs_data->min_sampling_rate);
+	cs_tuners->sampling_rate = max(input, cs_tuners->min_sampling_rate);
 	return count;
 }
 
@@ -281,7 +281,7 @@ show_store_one(cs, up_threshold);
 show_store_one(cs, down_threshold);
 show_store_one(cs, ignore_nice_load);
 show_store_one(cs, freq_step);
-declare_show_sampling_rate_min(cs);
+show_one(cs, min_sampling_rate);
 
 gov_sys_pol_attr_rw(sampling_rate);
 gov_sys_pol_attr_rw(sampling_down_factor);
@@ -289,10 +289,10 @@ gov_sys_pol_attr_rw(up_threshold);
 gov_sys_pol_attr_rw(down_threshold);
 gov_sys_pol_attr_rw(ignore_nice_load);
 gov_sys_pol_attr_rw(freq_step);
-gov_sys_pol_attr_ro(sampling_rate_min);
+gov_sys_pol_attr_ro(min_sampling_rate);
 
 static struct attribute *dbs_attributes_gov_sys[] = {
-	&sampling_rate_min_gov_sys.attr,
+	&min_sampling_rate_gov_sys.attr,
 	&sampling_rate_gov_sys.attr,
 	&sampling_down_factor_gov_sys.attr,
 	&up_threshold_gov_sys.attr,
@@ -308,7 +308,7 @@ static struct attribute_group cs_attr_group_gov_sys = {
 };
 
 static struct attribute *dbs_attributes_gov_pol[] = {
-	&sampling_rate_min_gov_pol.attr,
+	&min_sampling_rate_gov_pol.attr,
 	&sampling_rate_gov_pol.attr,
 	&sampling_down_factor_gov_pol.attr,
 	&up_threshold_gov_pol.attr,
@@ -340,10 +340,10 @@ static int cs_init(struct dbs_data *dbs_data, bool notify)
 	tuners->sampling_down_factor = DEF_SAMPLING_DOWN_FACTOR;
 	tuners->ignore_nice_load = 0;
 	tuners->freq_step = DEF_FREQUENCY_STEP;
+	tuners->min_sampling_rate = MIN_SAMPLING_RATE_RATIO *
+		jiffies_to_usecs(10);
 
 	dbs_data->tuners = tuners;
-	dbs_data->min_sampling_rate = MIN_SAMPLING_RATE_RATIO *
-		jiffies_to_usecs(10);
 
 	if (notify)
 		cpufreq_register_notifier(&cs_cpufreq_notifier_block,
diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index e0d111024d48..9a7edc91ad57 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -286,16 +286,32 @@ static void dbs_timer_handler(unsigned long data)
 		queue_work(system_wq, &shared->work);
 }
 
-static void set_sampling_rate(struct dbs_data *dbs_data,
-		unsigned int sampling_rate)
+static void set_sampling_rate(struct cpufreq_policy *policy,
+			      struct dbs_data *dbs_data)
 {
+	unsigned int *sampling_rate;
+	unsigned int *min_sampling_rate;
+	unsigned int latency;
+
+	/* policy latency is in ns. Convert it to us first */
+	latency = policy->cpuinfo.transition_latency / 1000;
+	if (latency == 0)
+		latency = 1;
+
 	if (dbs_data->cdata->governor == GOV_CONSERVATIVE) {
 		struct cs_dbs_tuners *cs_tuners = dbs_data->tuners;
-		cs_tuners->sampling_rate = sampling_rate;
+		sampling_rate = &cs_tuners->sampling_rate;
+		min_sampling_rate = &cs_tuners->min_sampling_rate;
 	} else {
 		struct od_dbs_tuners *od_tuners = dbs_data->tuners;
-		od_tuners->sampling_rate = sampling_rate;
+		sampling_rate = &od_tuners->sampling_rate;
+		min_sampling_rate = &od_tuners->min_sampling_rate;
 	}
+
+	/* Bring kernel and HW constraints together */
+	*min_sampling_rate = max(*min_sampling_rate,
+				 MIN_LATENCY_MULTIPLIER * latency);
+	*sampling_rate = max(*min_sampling_rate, latency * LATENCY_MULTIPLIER);
 }
 
 static int alloc_common_dbs_info(struct cpufreq_policy *policy,
@@ -338,7 +354,6 @@ static int cpufreq_governor_init(struct cpufreq_policy *policy,
 				 struct dbs_data *dbs_data,
 				 struct common_dbs_data *cdata)
 {
-	unsigned int latency;
 	int ret;
 
 	/* State should be equivalent to EXIT */
@@ -373,16 +388,7 @@ static int cpufreq_governor_init(struct cpufreq_policy *policy,
 	if (ret)
 		goto free_common_dbs_info;
 
-	/* policy latency is in ns. Convert it to us first */
-	latency = policy->cpuinfo.transition_latency / 1000;
-	if (latency == 0)
-		latency = 1;
-
-	/* Bring kernel and HW constraints together */
-	dbs_data->min_sampling_rate = max(dbs_data->min_sampling_rate,
-					  MIN_LATENCY_MULTIPLIER * latency);
-	set_sampling_rate(dbs_data, max(dbs_data->min_sampling_rate,
-					latency * LATENCY_MULTIPLIER));
+	set_sampling_rate(policy, dbs_data);
 
 	if (!have_governor_per_policy())
 		cdata->gdbs_data = dbs_data;
diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h
index 91e767a058a7..ad44a8546a3a 100644
--- a/drivers/cpufreq/cpufreq_governor.h
+++ b/drivers/cpufreq/cpufreq_governor.h
@@ -183,6 +183,7 @@ struct od_dbs_tuners {
 	unsigned int up_threshold;
 	unsigned int powersave_bias;
 	unsigned int io_is_busy;
+	unsigned int min_sampling_rate;
 };
 
 struct cs_dbs_tuners {
@@ -192,6 +193,7 @@ struct cs_dbs_tuners {
 	unsigned int up_threshold;
 	unsigned int down_threshold;
 	unsigned int freq_step;
+	unsigned int min_sampling_rate;
 };
 
 /* Common Governor data across policies */
@@ -230,7 +232,6 @@ struct common_dbs_data {
 /* Governor Per policy data */
 struct dbs_data {
 	struct common_dbs_data *cdata;
-	unsigned int min_sampling_rate;
 	int usage_count;
 	void *tuners;
 };
@@ -254,21 +255,6 @@ static inline int delay_for_sampling_rate(unsigned int sampling_rate)
 	return delay;
 }
 
-#define declare_show_sampling_rate_min(_gov)				\
-static ssize_t show_sampling_rate_min_gov_sys				\
-(struct kobject *kobj, struct attribute *attr, char *buf)		\
-{									\
-	struct dbs_data *dbs_data = _gov##_dbs_cdata.gdbs_data;		\
-	return sprintf(buf, "%u\n", dbs_data->min_sampling_rate);	\
-}									\
-									\
-static ssize_t show_sampling_rate_min_gov_pol				\
-(struct cpufreq_policy *policy, char *buf)				\
-{									\
-	struct dbs_data *dbs_data = policy->governor_data;		\
-	return sprintf(buf, "%u\n", dbs_data->min_sampling_rate);	\
-}
-
 extern struct mutex cpufreq_governor_lock;
 
 void gov_add_timers(struct cpufreq_policy *policy, unsigned int delay);
diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
index eae51070c034..b31f64745232 100644
--- a/drivers/cpufreq/cpufreq_ondemand.c
+++ b/drivers/cpufreq/cpufreq_ondemand.c
@@ -250,7 +250,7 @@ static void update_sampling_rate(struct dbs_data *dbs_data,
 	int cpu;
 
 	od_tuners->sampling_rate = new_rate = max(new_rate,
-			dbs_data->min_sampling_rate);
+			od_tuners->min_sampling_rate);
 
 	/*
 	 * Lock governor so that governor start/stop can't execute in parallel.
@@ -442,7 +442,7 @@ show_store_one(od, up_threshold);
 show_store_one(od, sampling_down_factor);
 show_store_one(od, ignore_nice_load);
 show_store_one(od, powersave_bias);
-declare_show_sampling_rate_min(od);
+show_one(od, min_sampling_rate);
 
 gov_sys_pol_attr_rw(sampling_rate);
 gov_sys_pol_attr_rw(io_is_busy);
@@ -450,10 +450,10 @@ gov_sys_pol_attr_rw(up_threshold);
 gov_sys_pol_attr_rw(sampling_down_factor);
 gov_sys_pol_attr_rw(ignore_nice_load);
 gov_sys_pol_attr_rw(powersave_bias);
-gov_sys_pol_attr_ro(sampling_rate_min);
+gov_sys_pol_attr_ro(min_sampling_rate);
 
 static struct attribute *dbs_attributes_gov_sys[] = {
-	&sampling_rate_min_gov_sys.attr,
+	&min_sampling_rate_gov_sys.attr,
 	&sampling_rate_gov_sys.attr,
 	&up_threshold_gov_sys.attr,
 	&sampling_down_factor_gov_sys.attr,
@@ -469,7 +469,7 @@ static struct attribute_group od_attr_group_gov_sys = {
 };
 
 static struct attribute *dbs_attributes_gov_pol[] = {
-	&sampling_rate_min_gov_pol.attr,
+	&min_sampling_rate_gov_pol.attr,
 	&sampling_rate_gov_pol.attr,
 	&up_threshold_gov_pol.attr,
 	&sampling_down_factor_gov_pol.attr,
@@ -509,12 +509,12 @@ static int od_init(struct dbs_data *dbs_data, bool notify)
 		 * not depending on HZ, but fixed (very low). The deferred
 		 * timer might skip some samples if idle/sleeping as needed.
 		*/
-		dbs_data->min_sampling_rate = MICRO_FREQUENCY_MIN_SAMPLE_RATE;
+		tuners->min_sampling_rate = MICRO_FREQUENCY_MIN_SAMPLE_RATE;
 	} else {
 		tuners->up_threshold = DEF_FREQUENCY_UP_THRESHOLD;
 
 		/* For correct statistics, we need 10 ticks for each measure */
-		dbs_data->min_sampling_rate = MIN_SAMPLING_RATE_RATIO *
+		tuners->min_sampling_rate = MIN_SAMPLING_RATE_RATIO *
 			jiffies_to_usecs(10);
 	}
 
-- 
2.7.0.79.gdc08a19

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH V2 2/7] cpufreq: governor: New sysfs show/store callbacks for governor tunables
  2016-02-03 14:02 [PATCH V2 0/7] cpufreq: governors: Fix ABBA lockups Viresh Kumar
  2016-02-03 14:02 ` [PATCH V2 1/7] cpufreq: governor: Treat min_sampling_rate as a governor-specific tunable Viresh Kumar
@ 2016-02-03 14:02 ` Viresh Kumar
  2016-02-03 16:17   ` Viresh Kumar
  2016-02-03 14:02 ` [PATCH V2 3/7] cpufreq: governor: Drop unused macros for creating governor tunable attributes Viresh Kumar
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 36+ messages in thread
From: Viresh Kumar @ 2016-02-03 14:02 UTC (permalink / raw)
  To: Rafael Wysocki, juri.lelli
  Cc: linaro-kernel, linux-pm, skannan, peterz, mturquette,
	steve.muckle, vincent.guittot, morten.rasmussen,
	dietmar.eggemann, linux-kernel, Viresh Kumar

The ondemand and conservative governors use the global-attr or freq-attr
structures to represent sysfs attributes corresponding to their tunables
(which of them is actually used depends on whether or not different
policy objects can use the same governor with different tunables at the
same time and, consequently, on where those attributes are located in
sysfs).

Unfortunately, in the freq-attr case, the standard cpufreq show/store
sysfs attribute callbacks are applied to the governor tunable attributes
and they always acquire the policy->rwsem lock before carrying out the
operation.  That may lead to an ABBA deadlock if governor tunable
attributes are removed under policy->rwsem while one of them is being
accessed concurrently (if sysfs attributes removal wins the race, it
will wait for the access to complete with policy->rwsem held while the
attribute callback will block on policy->rwsem indefinitely).

We attempted to address this issue by dropping policy->rwsem around
governor tunable attributes removal (that is, around invocations of the
->governor callback with the event arg equal to CPUFREQ_GOV_POLICY_EXIT)
in cpufreq_set_policy(), but that opened up race conditions that had not
been possible with policy->rwsem held all the time.  Therefore
policy->rwsem cannot be dropped in cpufreq_set_policy() at any point,
but the deadlock situation described above must be avoided too.

To that end, use the observation that in principle governor tunables may
be represented by the same data type regardless of whether the governor
is system-wide or per-policy and introduce a new structure, struct
governor_attr, for representing them and new corresponding macros for
creating show/store sysfs callbacks for them.  Also make their parent
kobject use a new kobject type whose default show/store callbacks are
not related to the standard core cpufreq ones in any way (and they don't
acquire policy->rwsem in particular).

[ Rafael: Written changelog ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/cpufreq_conservative.c | 73 ++++++++++++----------------------
 drivers/cpufreq/cpufreq_governor.c     | 69 +++++++++++++++++++++++++++-----
 drivers/cpufreq/cpufreq_governor.h     | 34 ++++++++++++++--
 drivers/cpufreq/cpufreq_ondemand.c     | 73 ++++++++++++----------------------
 4 files changed, 142 insertions(+), 107 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_conservative.c b/drivers/cpufreq/cpufreq_conservative.c
index 57750367bd26..c749fb4fe5d2 100644
--- a/drivers/cpufreq/cpufreq_conservative.c
+++ b/drivers/cpufreq/cpufreq_conservative.c
@@ -275,54 +275,33 @@ static ssize_t store_freq_step(struct dbs_data *dbs_data, const char *buf,
 	return count;
 }
 
-show_store_one(cs, sampling_rate);
-show_store_one(cs, sampling_down_factor);
-show_store_one(cs, up_threshold);
-show_store_one(cs, down_threshold);
-show_store_one(cs, ignore_nice_load);
-show_store_one(cs, freq_step);
-show_one(cs, min_sampling_rate);
-
-gov_sys_pol_attr_rw(sampling_rate);
-gov_sys_pol_attr_rw(sampling_down_factor);
-gov_sys_pol_attr_rw(up_threshold);
-gov_sys_pol_attr_rw(down_threshold);
-gov_sys_pol_attr_rw(ignore_nice_load);
-gov_sys_pol_attr_rw(freq_step);
-gov_sys_pol_attr_ro(min_sampling_rate);
-
-static struct attribute *dbs_attributes_gov_sys[] = {
-	&min_sampling_rate_gov_sys.attr,
-	&sampling_rate_gov_sys.attr,
-	&sampling_down_factor_gov_sys.attr,
-	&up_threshold_gov_sys.attr,
-	&down_threshold_gov_sys.attr,
-	&ignore_nice_load_gov_sys.attr,
-	&freq_step_gov_sys.attr,
+gov_show_one(cs, sampling_rate);
+gov_show_one(cs, sampling_down_factor);
+gov_show_one(cs, up_threshold);
+gov_show_one(cs, down_threshold);
+gov_show_one(cs, ignore_nice_load);
+gov_show_one(cs, freq_step);
+gov_show_one(cs, min_sampling_rate);
+
+gov_attr_rw(sampling_rate);
+gov_attr_rw(sampling_down_factor);
+gov_attr_rw(up_threshold);
+gov_attr_rw(down_threshold);
+gov_attr_rw(ignore_nice_load);
+gov_attr_rw(freq_step);
+gov_attr_ro(min_sampling_rate);
+
+static struct attribute *cs_attributes[] = {
+	&min_sampling_rate.attr,
+	&sampling_rate.attr,
+	&sampling_down_factor.attr,
+	&up_threshold.attr,
+	&down_threshold.attr,
+	&ignore_nice_load.attr,
+	&freq_step.attr,
 	NULL
 };
 
-static struct attribute_group cs_attr_group_gov_sys = {
-	.attrs = dbs_attributes_gov_sys,
-	.name = "conservative",
-};
-
-static struct attribute *dbs_attributes_gov_pol[] = {
-	&min_sampling_rate_gov_pol.attr,
-	&sampling_rate_gov_pol.attr,
-	&sampling_down_factor_gov_pol.attr,
-	&up_threshold_gov_pol.attr,
-	&down_threshold_gov_pol.attr,
-	&ignore_nice_load_gov_pol.attr,
-	&freq_step_gov_pol.attr,
-	NULL
-};
-
-static struct attribute_group cs_attr_group_gov_pol = {
-	.attrs = dbs_attributes_gov_pol,
-	.name = "conservative",
-};
-
 /************************** sysfs end ************************/
 
 static int cs_init(struct dbs_data *dbs_data, bool notify)
@@ -365,8 +344,8 @@ define_get_cpu_dbs_routines(cs_cpu_dbs_info);
 
 static struct common_dbs_data cs_dbs_cdata = {
 	.governor = GOV_CONSERVATIVE,
-	.attr_group_gov_sys = &cs_attr_group_gov_sys,
-	.attr_group_gov_pol = &cs_attr_group_gov_pol,
+	.kobj_name = "conservative",
+	.kobj_type = { .sysfs_ops = &governor_sysfs_ops, .default_attrs = cs_attributes },
 	.get_cpu_cdbs = get_cpu_cdbs,
 	.get_cpu_dbs_info_s = get_cpu_dbs_info_s,
 	.gov_dbs_timer = cs_dbs_timer,
diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index 9a7edc91ad57..e7f79d2477fa 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -22,14 +22,58 @@
 
 #include "cpufreq_governor.h"
 
-static struct attribute_group *get_sysfs_attr(struct dbs_data *dbs_data)
+static inline struct dbs_data *to_dbs_data(struct kobject *kobj)
 {
-	if (have_governor_per_policy())
-		return dbs_data->cdata->attr_group_gov_pol;
-	else
-		return dbs_data->cdata->attr_group_gov_sys;
+	return container_of(kobj, struct dbs_data, kobj);
 }
 
+static inline struct governor_attr *to_gov_attr(struct attribute *attr)
+{
+	return container_of(attr, struct governor_attr, attr);
+}
+
+static ssize_t governor_show(struct kobject *kobj, struct attribute *attr,
+			     char *buf)
+{
+	struct dbs_data *dbs_data = to_dbs_data(kobj);
+	struct governor_attr *gattr = to_gov_attr(attr);
+	int ret = -EIO;
+
+	if (gattr->show)
+		ret = gattr->show(dbs_data, buf);
+
+	return ret;
+}
+
+static ssize_t governor_store(struct kobject *kobj, struct attribute *attr,
+			      const char *buf, size_t count)
+{
+	struct dbs_data *dbs_data = to_dbs_data(kobj);
+	struct governor_attr *gattr = to_gov_attr(attr);
+	int ret = -EIO;
+
+	mutex_lock(&dbs_data->mutex);
+
+	if (gattr->store)
+		ret = gattr->store(dbs_data, buf, count);
+
+	mutex_unlock(&dbs_data->mutex);
+
+	return ret;
+}
+
+/*
+ * Sysfs Ops for accessing governor attributes.
+ *
+ * All show/store invocations for governor specific sysfs attributes, will first
+ * call the below show/store callbacks and the attribute specific callback will
+ * be called from within it.
+ */
+const struct sysfs_ops governor_sysfs_ops = {
+	.show	= governor_show,
+	.store	= governor_store,
+};
+
 void dbs_check_cpu(struct dbs_data *dbs_data, int cpu)
 {
 	struct cpu_dbs_info *cdbs = dbs_data->cdata->get_cpu_cdbs(cpu);
@@ -383,6 +427,7 @@ static int cpufreq_governor_init(struct cpufreq_policy *policy,
 
 	dbs_data->cdata = cdata;
 	dbs_data->usage_count = 1;
+	mutex_init(&dbs_data->mutex);
 
 	ret = cdata->init(dbs_data, !policy->governor->initialized);
 	if (ret)
@@ -395,10 +440,14 @@ static int cpufreq_governor_init(struct cpufreq_policy *policy,
 
 	policy->governor_data = dbs_data;
 
-	ret = sysfs_create_group(get_governor_parent_kobj(policy),
-				 get_sysfs_attr(dbs_data));
-	if (ret)
+	ret = kobject_init_and_add(&dbs_data->kobj, &cdata->kobj_type,
+				   get_governor_parent_kobj(policy),
+				   cdata->kobj_name);
+	if (ret) {
+		pr_err("cpufreq: Governor initialization failed (dbs_data kobject initialization error %d)\n",
+		       ret);
 		goto reset_gdbs_data;
+	}
 
 	return 0;
 
@@ -426,8 +475,7 @@ static int cpufreq_governor_exit(struct cpufreq_policy *policy,
 		return -EBUSY;
 
 	if (!--dbs_data->usage_count) {
-		sysfs_remove_group(get_governor_parent_kobj(policy),
-				   get_sysfs_attr(dbs_data));
+		kobject_put(&dbs_data->kobj);
 
 		policy->governor_data = NULL;
 
@@ -435,6 +483,7 @@ static int cpufreq_governor_exit(struct cpufreq_policy *policy,
 			cdata->gdbs_data = NULL;
 
 		cdata->exit(dbs_data, policy->governor->initialized == 1);
+		mutex_destroy(&dbs_data->mutex);
 		kfree(dbs_data);
 	} else {
 		policy->governor_data = NULL;
diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h
index ad44a8546a3a..45a6ac1ecfb0 100644
--- a/drivers/cpufreq/cpufreq_governor.h
+++ b/drivers/cpufreq/cpufreq_governor.h
@@ -108,6 +108,31 @@ static ssize_t store_##file_name##_gov_pol				\
 show_one(_gov, file_name);						\
 store_one(_gov, file_name)
 
+/* Governor's specific attributes */
+struct dbs_data;
+struct governor_attr {
+	struct attribute attr;
+	ssize_t (*show)(struct dbs_data *dbs_data, char *buf);
+	ssize_t (*store)(struct dbs_data *dbs_data, const char *buf,
+			 size_t count);
+};
+
+#define gov_show_one(_gov, file_name)					\
+static ssize_t show_##file_name						\
+(struct dbs_data *dbs_data, char *buf)					\
+{									\
+	struct _gov##_dbs_tuners *tuners = dbs_data->tuners;		\
+	return sprintf(buf, "%u\n", tuners->file_name);			\
+}
+
+#define gov_attr_ro(_name)						\
+static struct governor_attr _name =					\
+__ATTR(_name, 0444, show_##_name, NULL)
+
+#define gov_attr_rw(_name)						\
+static struct governor_attr _name =					\
+__ATTR(_name, 0644, show_##_name, store_##_name)
+
 /* create helper routines */
 #define define_get_cpu_dbs_routines(_dbs_info)				\
 static struct cpu_dbs_info *get_cpu_cdbs(int cpu)			\
@@ -197,14 +222,13 @@ struct cs_dbs_tuners {
 };
 
 /* Common Governor data across policies */
-struct dbs_data;
 struct common_dbs_data {
 	/* Common across governors */
 	#define GOV_ONDEMAND		0
 	#define GOV_CONSERVATIVE	1
 	int governor;
-	struct attribute_group *attr_group_gov_sys; /* one governor - system */
-	struct attribute_group *attr_group_gov_pol; /* one governor - policy */
+	const char *kobj_name;
+	struct kobj_type kobj_type;
 
 	/*
 	 * Common data for platforms that don't set
@@ -234,6 +258,9 @@ struct dbs_data {
 	struct common_dbs_data *cdata;
 	int usage_count;
 	void *tuners;
+	struct kobject kobj;
+	/* Protect concurrent updates to governor tunables from sysfs */
+	struct mutex mutex;
 };
 
 /* Governor specific ops, will be passed to dbs_data->gov_ops */
@@ -256,6 +283,7 @@ static inline int delay_for_sampling_rate(unsigned int sampling_rate)
 }
 
 extern struct mutex cpufreq_governor_lock;
+extern const struct sysfs_ops governor_sysfs_ops;
 
 void gov_add_timers(struct cpufreq_policy *policy, unsigned int delay);
 void gov_cancel_work(struct cpu_common_dbs_info *shared);
diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
index b31f64745232..82ed490f7de0 100644
--- a/drivers/cpufreq/cpufreq_ondemand.c
+++ b/drivers/cpufreq/cpufreq_ondemand.c
@@ -436,54 +436,33 @@ static ssize_t store_powersave_bias(struct dbs_data *dbs_data, const char *buf,
 	return count;
 }
 
-show_store_one(od, sampling_rate);
-show_store_one(od, io_is_busy);
-show_store_one(od, up_threshold);
-show_store_one(od, sampling_down_factor);
-show_store_one(od, ignore_nice_load);
-show_store_one(od, powersave_bias);
-show_one(od, min_sampling_rate);
-
-gov_sys_pol_attr_rw(sampling_rate);
-gov_sys_pol_attr_rw(io_is_busy);
-gov_sys_pol_attr_rw(up_threshold);
-gov_sys_pol_attr_rw(sampling_down_factor);
-gov_sys_pol_attr_rw(ignore_nice_load);
-gov_sys_pol_attr_rw(powersave_bias);
-gov_sys_pol_attr_ro(min_sampling_rate);
-
-static struct attribute *dbs_attributes_gov_sys[] = {
-	&min_sampling_rate_gov_sys.attr,
-	&sampling_rate_gov_sys.attr,
-	&up_threshold_gov_sys.attr,
-	&sampling_down_factor_gov_sys.attr,
-	&ignore_nice_load_gov_sys.attr,
-	&powersave_bias_gov_sys.attr,
-	&io_is_busy_gov_sys.attr,
+gov_show_one(od, sampling_rate);
+gov_show_one(od, io_is_busy);
+gov_show_one(od, up_threshold);
+gov_show_one(od, sampling_down_factor);
+gov_show_one(od, ignore_nice_load);
+gov_show_one(od, powersave_bias);
+gov_show_one(od, min_sampling_rate);
+
+gov_attr_rw(sampling_rate);
+gov_attr_rw(io_is_busy);
+gov_attr_rw(up_threshold);
+gov_attr_rw(sampling_down_factor);
+gov_attr_rw(ignore_nice_load);
+gov_attr_rw(powersave_bias);
+gov_attr_ro(min_sampling_rate);
+
+static struct attribute *od_attributes[] = {
+	&min_sampling_rate.attr,
+	&sampling_rate.attr,
+	&up_threshold.attr,
+	&sampling_down_factor.attr,
+	&ignore_nice_load.attr,
+	&powersave_bias.attr,
+	&io_is_busy.attr,
 	NULL
 };
 
-static struct attribute_group od_attr_group_gov_sys = {
-	.attrs = dbs_attributes_gov_sys,
-	.name = "ondemand",
-};
-
-static struct attribute *dbs_attributes_gov_pol[] = {
-	&min_sampling_rate_gov_pol.attr,
-	&sampling_rate_gov_pol.attr,
-	&up_threshold_gov_pol.attr,
-	&sampling_down_factor_gov_pol.attr,
-	&ignore_nice_load_gov_pol.attr,
-	&powersave_bias_gov_pol.attr,
-	&io_is_busy_gov_pol.attr,
-	NULL
-};
-
-static struct attribute_group od_attr_group_gov_pol = {
-	.attrs = dbs_attributes_gov_pol,
-	.name = "ondemand",
-};
-
 /************************** sysfs end ************************/
 
 static int od_init(struct dbs_data *dbs_data, bool notify)
@@ -542,8 +521,8 @@ static struct od_ops od_ops = {
 
 static struct common_dbs_data od_dbs_cdata = {
 	.governor = GOV_ONDEMAND,
-	.attr_group_gov_sys = &od_attr_group_gov_sys,
-	.attr_group_gov_pol = &od_attr_group_gov_pol,
+	.kobj_name = "ondemand",
+	.kobj_type = { .sysfs_ops = &governor_sysfs_ops, .default_attrs = od_attributes },
 	.get_cpu_cdbs = get_cpu_cdbs,
 	.get_cpu_dbs_info_s = get_cpu_dbs_info_s,
 	.gov_dbs_timer = od_dbs_timer,
-- 
2.7.0.79.gdc08a19

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH V2 3/7] cpufreq: governor: Drop unused macros for creating governor tunable attributes
  2016-02-03 14:02 [PATCH V2 0/7] cpufreq: governors: Fix ABBA lockups Viresh Kumar
  2016-02-03 14:02 ` [PATCH V2 1/7] cpufreq: governor: Treat min_sampling_rate as a governor-specific tunable Viresh Kumar
  2016-02-03 14:02 ` [PATCH V2 2/7] cpufreq: governor: New sysfs show/store callbacks for governor tunables Viresh Kumar
@ 2016-02-03 14:02 ` Viresh Kumar
  2016-02-03 14:02 ` [PATCH V2 4/7] Revert "cpufreq: Drop rwsem lock around CPUFREQ_GOV_POLICY_EXIT" Viresh Kumar
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 36+ messages in thread
From: Viresh Kumar @ 2016-02-03 14:02 UTC (permalink / raw)
  To: Rafael Wysocki, juri.lelli
  Cc: linaro-kernel, linux-pm, skannan, peterz, mturquette,
	steve.muckle, vincent.guittot, morten.rasmussen,
	dietmar.eggemann, linux-kernel, Viresh Kumar

"The previous commit introduced a new set of macros for creating sysfs
attributes that represent governor tunables and the old macros used for
this purpose are not needed any more, so drop them."

[ Rafael: Written changelog ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/cpufreq_governor.h | 68 --------------------------------------
 1 file changed, 68 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h
index 45a6ac1ecfb0..ed328a39c4ac 100644
--- a/drivers/cpufreq/cpufreq_governor.h
+++ b/drivers/cpufreq/cpufreq_governor.h
@@ -40,74 +40,6 @@
 /* Ondemand Sampling types */
 enum {OD_NORMAL_SAMPLE, OD_SUB_SAMPLE};
 
-/*
- * Macro for creating governors sysfs routines
- *
- * - gov_sys: One governor instance per whole system
- * - gov_pol: One governor instance per policy
- */
-
-/* Create attributes */
-#define gov_sys_attr_ro(_name)						\
-static struct global_attr _name##_gov_sys =				\
-__ATTR(_name, 0444, show_##_name##_gov_sys, NULL)
-
-#define gov_sys_attr_rw(_name)						\
-static struct global_attr _name##_gov_sys =				\
-__ATTR(_name, 0644, show_##_name##_gov_sys, store_##_name##_gov_sys)
-
-#define gov_pol_attr_ro(_name)						\
-static struct freq_attr _name##_gov_pol =				\
-__ATTR(_name, 0444, show_##_name##_gov_pol, NULL)
-
-#define gov_pol_attr_rw(_name)						\
-static struct freq_attr _name##_gov_pol =				\
-__ATTR(_name, 0644, show_##_name##_gov_pol, store_##_name##_gov_pol)
-
-#define gov_sys_pol_attr_rw(_name)					\
-	gov_sys_attr_rw(_name);						\
-	gov_pol_attr_rw(_name)
-
-#define gov_sys_pol_attr_ro(_name)					\
-	gov_sys_attr_ro(_name);						\
-	gov_pol_attr_ro(_name)
-
-/* Create show/store routines */
-#define show_one(_gov, file_name)					\
-static ssize_t show_##file_name##_gov_sys				\
-(struct kobject *kobj, struct attribute *attr, char *buf)		\
-{									\
-	struct _gov##_dbs_tuners *tuners = _gov##_dbs_cdata.gdbs_data->tuners; \
-	return sprintf(buf, "%u\n", tuners->file_name);			\
-}									\
-									\
-static ssize_t show_##file_name##_gov_pol				\
-(struct cpufreq_policy *policy, char *buf)				\
-{									\
-	struct dbs_data *dbs_data = policy->governor_data;		\
-	struct _gov##_dbs_tuners *tuners = dbs_data->tuners;		\
-	return sprintf(buf, "%u\n", tuners->file_name);			\
-}
-
-#define store_one(_gov, file_name)					\
-static ssize_t store_##file_name##_gov_sys				\
-(struct kobject *kobj, struct attribute *attr, const char *buf, size_t count) \
-{									\
-	struct dbs_data *dbs_data = _gov##_dbs_cdata.gdbs_data;		\
-	return store_##file_name(dbs_data, buf, count);			\
-}									\
-									\
-static ssize_t store_##file_name##_gov_pol				\
-(struct cpufreq_policy *policy, const char *buf, size_t count)		\
-{									\
-	struct dbs_data *dbs_data = policy->governor_data;		\
-	return store_##file_name(dbs_data, buf, count);			\
-}
-
-#define show_store_one(_gov, file_name)					\
-show_one(_gov, file_name);						\
-store_one(_gov, file_name)
-
 /* Governor's specific attributes */
 struct dbs_data;
 struct governor_attr {
-- 
2.7.0.79.gdc08a19

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH V2 4/7] Revert "cpufreq: Drop rwsem lock around CPUFREQ_GOV_POLICY_EXIT"
  2016-02-03 14:02 [PATCH V2 0/7] cpufreq: governors: Fix ABBA lockups Viresh Kumar
                   ` (2 preceding siblings ...)
  2016-02-03 14:02 ` [PATCH V2 3/7] cpufreq: governor: Drop unused macros for creating governor tunable attributes Viresh Kumar
@ 2016-02-03 14:02 ` Viresh Kumar
  2016-02-03 14:02 ` [PATCH V2 5/7] cpufreq: Merge cpufreq_offline_prepare/finish routines Viresh Kumar
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 36+ messages in thread
From: Viresh Kumar @ 2016-02-03 14:02 UTC (permalink / raw)
  To: Rafael Wysocki, juri.lelli
  Cc: linaro-kernel, linux-pm, skannan, peterz, mturquette,
	steve.muckle, vincent.guittot, morten.rasmussen,
	dietmar.eggemann, linux-kernel, Viresh Kumar

Earlier, when the struct freq-attr was used to represent governor
attributes, the standard cpufreq show/store sysfs attribute callbacks
were applied to the governor tunable attributes and they always acquire
the policy->rwsem lock before carrying out the operation.  That could
have resulted in an ABBA deadlock if governor tunable attributes are
removed under policy->rwsem while one of them is being accessed
concurrently (if sysfs attributes removal wins the race, it will wait
for the access to complete with policy->rwsem held while the attribute
callback will block on policy->rwsem indefinitely).

We attempted to address this issue by dropping policy->rwsem around
governor tunable attributes removal (that is, around invocations of the
->governor callback with the event arg equal to CPUFREQ_GOV_POLICY_EXIT)
in cpufreq_set_policy(), but that opened up race conditions that had not
been possible with policy->rwsem held all the time.

The previous commit, "cpufreq: governor: New sysfs show/store callbacks
for governor tunables", fixed the original ABBA deadlock by adding new
governor specific show/store callbacks.

We don't have to drop rwsem around invocations of governor event
CPUFREQ_GOV_POLICY_EXIT anymore, and original fix can be reverted now.

Fixes: 955ef4833574 ("cpufreq: Drop rwsem lock around CPUFREQ_GOV_POLICY_EXIT")
Reported-by: Juri Lelli <juri.lelli@arm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/cpufreq.c | 5 -----
 include/linux/cpufreq.h   | 4 ----
 2 files changed, 9 deletions(-)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index e979ec78b695..5f7e24567e0e 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -2155,10 +2155,7 @@ static int cpufreq_set_policy(struct cpufreq_policy *policy,
 			return ret;
 		}
 
-		up_write(&policy->rwsem);
 		ret = __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT);
-		down_write(&policy->rwsem);
-
 		if (ret) {
 			pr_err("%s: Failed to Exit Governor: %s (%d)\n",
 			       __func__, old_gov->name, ret);
@@ -2174,9 +2171,7 @@ static int cpufreq_set_policy(struct cpufreq_policy *policy,
 		if (!ret)
 			goto out;
 
-		up_write(&policy->rwsem);
 		__cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT);
-		down_write(&policy->rwsem);
 	}
 
 	/* new governor failed, so re-start old one */
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index 88a4215125bc..79b87cebaa9c 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -100,10 +100,6 @@ struct cpufreq_policy {
 	 * - Any routine that will write to the policy structure and/or may take away
 	 *   the policy altogether (eg. CPU hotplug), will hold this lock in write
 	 *   mode before doing so.
-	 *
-	 * Additional rules:
-	 * - Lock should not be held across
-	 *     __cpufreq_governor(data, CPUFREQ_GOV_POLICY_EXIT);
 	 */
 	struct rw_semaphore	rwsem;
 
-- 
2.7.0.79.gdc08a19

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH V2 5/7] cpufreq: Merge cpufreq_offline_prepare/finish routines
  2016-02-03 14:02 [PATCH V2 0/7] cpufreq: governors: Fix ABBA lockups Viresh Kumar
                   ` (3 preceding siblings ...)
  2016-02-03 14:02 ` [PATCH V2 4/7] Revert "cpufreq: Drop rwsem lock around CPUFREQ_GOV_POLICY_EXIT" Viresh Kumar
@ 2016-02-03 14:02 ` Viresh Kumar
  2016-02-03 20:21   ` Saravana Kannan
  2016-02-03 14:02 ` [PATCH V2 6/7] cpufreq: Call __cpufreq_governor() with policy->rwsem held Viresh Kumar
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 36+ messages in thread
From: Viresh Kumar @ 2016-02-03 14:02 UTC (permalink / raw)
  To: Rafael Wysocki, juri.lelli
  Cc: linaro-kernel, linux-pm, skannan, peterz, mturquette,
	steve.muckle, vincent.guittot, morten.rasmussen,
	dietmar.eggemann, linux-kernel, Viresh Kumar

The offline routine was separated into two halves earlier by
'commit 1aee40ac9c86 ("cpufreq: Invoke __cpufreq_remove_dev_finish()
after releasing cpu_hotplug.lock");.

And the reasons cited were, race issues between accessing policy's sysfs
files and policy kobject's cleanup.

That race isn't valid anymore, as we don't remove the policy & its
kobject completely on hotplugs, but do that from ->remove() callback of
subsys framework.

These two routines can be merged back now.

This is a preparatory step for the next patch, that will enforce
policy->rwsem lock around __cpufreq_governor() routines STOP/EXIT
sequence.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/cpufreq.c | 36 ++++++++++--------------------------
 1 file changed, 10 insertions(+), 26 deletions(-)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 5f7e24567e0e..dc43294e7b31 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -1309,9 +1309,10 @@ static int cpufreq_add_dev(struct device *dev, struct subsys_interface *sif)
 	return ret;
 }
 
-static void cpufreq_offline_prepare(unsigned int cpu)
+static void cpufreq_offline(unsigned int cpu)
 {
 	struct cpufreq_policy *policy;
+	int ret;
 
 	pr_debug("%s: unregistering CPU %u\n", __func__, cpu);
 
@@ -1322,7 +1323,7 @@ static void cpufreq_offline_prepare(unsigned int cpu)
 	}
 
 	if (has_target()) {
-		int ret = __cpufreq_governor(policy, CPUFREQ_GOV_STOP);
+		ret = __cpufreq_governor(policy, CPUFREQ_GOV_STOP);
 		if (ret)
 			pr_err("%s: Failed to stop governor\n", __func__);
 	}
@@ -1345,34 +1346,23 @@ static void cpufreq_offline_prepare(unsigned int cpu)
 	/* Start governor again for active policy */
 	if (!policy_is_inactive(policy)) {
 		if (has_target()) {
-			int ret = __cpufreq_governor(policy, CPUFREQ_GOV_START);
+			ret = __cpufreq_governor(policy, CPUFREQ_GOV_START);
 			if (!ret)
 				ret = __cpufreq_governor(policy, CPUFREQ_GOV_LIMITS);
 
 			if (ret)
 				pr_err("%s: Failed to start governor\n", __func__);
 		}
-	} else if (cpufreq_driver->stop_cpu) {
-		cpufreq_driver->stop_cpu(policy);
-	}
-}
 
-static void cpufreq_offline_finish(unsigned int cpu)
-{
-	struct cpufreq_policy *policy = per_cpu(cpufreq_cpu_data, cpu);
-
-	if (!policy) {
-		pr_debug("%s: No cpu_data found\n", __func__);
 		return;
 	}
 
-	/* Only proceed for inactive policies */
-	if (!policy_is_inactive(policy))
-		return;
+	if (cpufreq_driver->stop_cpu)
+		cpufreq_driver->stop_cpu(policy);
 
 	/* If cpu is last user of policy, free policy */
 	if (has_target()) {
-		int ret = __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT);
+		ret = __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT);
 		if (ret)
 			pr_err("%s: Failed to exit governor\n", __func__);
 	}
@@ -1401,10 +1391,8 @@ static void cpufreq_remove_dev(struct device *dev, struct subsys_interface *sif)
 	if (!policy)
 		return;
 
-	if (cpu_online(cpu)) {
-		cpufreq_offline_prepare(cpu);
-		cpufreq_offline_finish(cpu);
-	}
+	if (cpu_online(cpu))
+		cpufreq_offline(cpu);
 
 	cpumask_clear_cpu(cpu, policy->real_cpus);
 	remove_cpu_dev_symlink(policy, cpu);
@@ -2255,11 +2243,7 @@ static int cpufreq_cpu_callback(struct notifier_block *nfb,
 		break;
 
 	case CPU_DOWN_PREPARE:
-		cpufreq_offline_prepare(cpu);
-		break;
-
-	case CPU_POST_DEAD:
-		cpufreq_offline_finish(cpu);
+		cpufreq_offline(cpu);
 		break;
 
 	case CPU_DOWN_FAILED:
-- 
2.7.0.79.gdc08a19

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH V2 6/7] cpufreq: Call __cpufreq_governor() with policy->rwsem held
  2016-02-03 14:02 [PATCH V2 0/7] cpufreq: governors: Fix ABBA lockups Viresh Kumar
                   ` (4 preceding siblings ...)
  2016-02-03 14:02 ` [PATCH V2 5/7] cpufreq: Merge cpufreq_offline_prepare/finish routines Viresh Kumar
@ 2016-02-03 14:02 ` Viresh Kumar
  2016-02-03 14:02 ` [PATCH V2 7/7] cpufreq: Remove cpufreq_governor_lock Viresh Kumar
  2016-02-03 15:54 ` [PATCH V2 0/7] cpufreq: governors: Fix ABBA lockups Juri Lelli
  7 siblings, 0 replies; 36+ messages in thread
From: Viresh Kumar @ 2016-02-03 14:02 UTC (permalink / raw)
  To: Rafael Wysocki, juri.lelli
  Cc: linaro-kernel, linux-pm, skannan, peterz, mturquette,
	steve.muckle, vincent.guittot, morten.rasmussen,
	dietmar.eggemann, linux-kernel, Viresh Kumar

This isn't followed properly by all parts of the core code, some follow
it, whereas others don't.

Enforcing it will also enable us to remove cpufreq_governor_lock, that
is used today because we can't guarantee that __cpufreq_governor() isn't
executed in parallel.

We should also ensure that the lock is held across state changes to the
governors.

For example, while adding a CPU to the policy on cpu-online path, we
need to stop the governor, change policy->cpus, start the governor and
then refresh its limits. The complete sequence must be guaranteed to
execute without any concurrent races. And that can be achieved using
policy->rwsem around these use cases.

Also note that cpufreq_driver->stop_cpu() and ->exit() can get called
while policy->rwsem is held. That shouldn't have any side effects
though.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/cpufreq.c | 45 ++++++++++++++++++++++++++++++---------------
 1 file changed, 30 insertions(+), 15 deletions(-)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index dc43294e7b31..4fc3889ca7c9 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -996,30 +996,29 @@ static int cpufreq_add_policy_cpu(struct cpufreq_policy *policy, unsigned int cp
 	if (cpumask_test_cpu(cpu, policy->cpus))
 		return 0;
 
+	down_write(&policy->rwsem);
 	if (has_target()) {
 		ret = __cpufreq_governor(policy, CPUFREQ_GOV_STOP);
 		if (ret) {
 			pr_err("%s: Failed to stop governor\n", __func__);
-			return ret;
+			goto unlock;
 		}
 	}
 
-	down_write(&policy->rwsem);
 	cpumask_set_cpu(cpu, policy->cpus);
-	up_write(&policy->rwsem);
 
 	if (has_target()) {
 		ret = __cpufreq_governor(policy, CPUFREQ_GOV_START);
 		if (!ret)
 			ret = __cpufreq_governor(policy, CPUFREQ_GOV_LIMITS);
 
-		if (ret) {
+		if (ret)
 			pr_err("%s: Failed to start governor\n", __func__);
-			return ret;
-		}
 	}
 
-	return 0;
+unlock:
+	up_write(&policy->rwsem);
+	return ret;
 }
 
 static struct cpufreq_policy *cpufreq_policy_alloc(unsigned int cpu)
@@ -1322,13 +1321,13 @@ static void cpufreq_offline(unsigned int cpu)
 		return;
 	}
 
+	down_write(&policy->rwsem);
 	if (has_target()) {
 		ret = __cpufreq_governor(policy, CPUFREQ_GOV_STOP);
 		if (ret)
 			pr_err("%s: Failed to stop governor\n", __func__);
 	}
 
-	down_write(&policy->rwsem);
 	cpumask_clear_cpu(cpu, policy->cpus);
 
 	if (policy_is_inactive(policy)) {
@@ -1341,7 +1340,6 @@ static void cpufreq_offline(unsigned int cpu)
 		/* Nominate new CPU */
 		policy->cpu = cpumask_any(policy->cpus);
 	}
-	up_write(&policy->rwsem);
 
 	/* Start governor again for active policy */
 	if (!policy_is_inactive(policy)) {
@@ -1376,6 +1374,7 @@ static void cpufreq_offline(unsigned int cpu)
 		cpufreq_driver->exit(policy);
 		policy->freq_table = NULL;
 	}
+	up_write(&policy->rwsem);
 }
 
 /**
@@ -1572,6 +1571,7 @@ EXPORT_SYMBOL(cpufreq_generic_suspend);
 void cpufreq_suspend(void)
 {
 	struct cpufreq_policy *policy;
+	int ret;
 
 	if (!cpufreq_driver)
 		return;
@@ -1582,7 +1582,11 @@ void cpufreq_suspend(void)
 	pr_debug("%s: Suspending Governors\n", __func__);
 
 	for_each_active_policy(policy) {
-		if (__cpufreq_governor(policy, CPUFREQ_GOV_STOP))
+		down_write(&policy->rwsem);
+		ret = __cpufreq_governor(policy, CPUFREQ_GOV_STOP);
+		up_write(&policy->rwsem);
+
+		if (ret)
 			pr_err("%s: Failed to stop governor for policy: %p\n",
 				__func__, policy);
 		else if (cpufreq_driver->suspend
@@ -1604,6 +1608,7 @@ void cpufreq_suspend(void)
 void cpufreq_resume(void)
 {
 	struct cpufreq_policy *policy;
+	int ret;
 
 	if (!cpufreq_driver)
 		return;
@@ -1616,13 +1621,20 @@ void cpufreq_resume(void)
 	pr_debug("%s: Resuming Governors\n", __func__);
 
 	for_each_active_policy(policy) {
-		if (cpufreq_driver->resume && cpufreq_driver->resume(policy))
+		if (cpufreq_driver->resume && cpufreq_driver->resume(policy)) {
 			pr_err("%s: Failed to resume driver: %p\n", __func__,
 				policy);
-		else if (__cpufreq_governor(policy, CPUFREQ_GOV_START)
-		    || __cpufreq_governor(policy, CPUFREQ_GOV_LIMITS))
-			pr_err("%s: Failed to start governor for policy: %p\n",
-				__func__, policy);
+		} else {
+			down_write(&policy->rwsem);
+			ret = __cpufreq_governor(policy, CPUFREQ_GOV_START);
+			if (!ret)
+				__cpufreq_governor(policy, CPUFREQ_GOV_LIMITS);
+			up_write(&policy->rwsem);
+
+			if (ret)
+				pr_err("%s: Failed to start governor for policy: %p\n",
+				       __func__, policy);
+		}
 	}
 
 	/*
@@ -2276,8 +2288,11 @@ static int cpufreq_boost_set_sw(int state)
 				       __func__);
 				break;
 			}
+
+			down_write(&policy->rwsem);
 			policy->user_policy.max = policy->max;
 			__cpufreq_governor(policy, CPUFREQ_GOV_LIMITS);
+			up_write(&policy->rwsem);
 		}
 	}
 
-- 
2.7.0.79.gdc08a19

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH V2 7/7] cpufreq: Remove cpufreq_governor_lock
  2016-02-03 14:02 [PATCH V2 0/7] cpufreq: governors: Fix ABBA lockups Viresh Kumar
                   ` (5 preceding siblings ...)
  2016-02-03 14:02 ` [PATCH V2 6/7] cpufreq: Call __cpufreq_governor() with policy->rwsem held Viresh Kumar
@ 2016-02-03 14:02 ` Viresh Kumar
  2016-02-04  6:43   ` Viresh Kumar
  2016-02-03 15:54 ` [PATCH V2 0/7] cpufreq: governors: Fix ABBA lockups Juri Lelli
  7 siblings, 1 reply; 36+ messages in thread
From: Viresh Kumar @ 2016-02-03 14:02 UTC (permalink / raw)
  To: Rafael Wysocki, juri.lelli
  Cc: linaro-kernel, linux-pm, skannan, peterz, mturquette,
	steve.muckle, vincent.guittot, morten.rasmussen,
	dietmar.eggemann, linux-kernel, Viresh Kumar

We used to drop policy->rwsem just before calling __cpufreq_governor()
in some cases earlier and so it was possible that __cpufreq_governor()
runs concurrently via separate threads.

In order to guarantee valid state transitions for governors,
'governor_enabled' was required to be protected using some locking and
we created cpufreq_governor_lock for that.

But now, __cpufreq_governor() is always called from within policy->rwsem
held and so 'governor_enabled' is protected against races even without
cpufreq_governor_lock.

Get rid of the extra lock now.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/cpufreq.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 4fc3889ca7c9..7bc8a5ed97e5 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -102,7 +102,6 @@ static LIST_HEAD(cpufreq_governor_list);
 static struct cpufreq_driver *cpufreq_driver;
 static DEFINE_PER_CPU(struct cpufreq_policy *, cpufreq_cpu_data);
 static DEFINE_RWLOCK(cpufreq_driver_lock);
-DEFINE_MUTEX(cpufreq_governor_lock);
 
 /* Flag to suspend/resume CPUFreq governors */
 static bool cpufreq_suspended;
@@ -1963,11 +1962,9 @@ static int __cpufreq_governor(struct cpufreq_policy *policy,
 
 	pr_debug("%s: for CPU %u, event %u\n", __func__, policy->cpu, event);
 
-	mutex_lock(&cpufreq_governor_lock);
 	if ((policy->governor_enabled && event == CPUFREQ_GOV_START)
 	    || (!policy->governor_enabled
 	    && (event == CPUFREQ_GOV_LIMITS || event == CPUFREQ_GOV_STOP))) {
-		mutex_unlock(&cpufreq_governor_lock);
 		return -EBUSY;
 	}
 
@@ -1976,8 +1973,6 @@ static int __cpufreq_governor(struct cpufreq_policy *policy,
 	else if (event == CPUFREQ_GOV_START)
 		policy->governor_enabled = true;
 
-	mutex_unlock(&cpufreq_governor_lock);
-
 	ret = policy->governor->governor(policy, event);
 
 	if (!ret) {
@@ -1987,12 +1982,10 @@ static int __cpufreq_governor(struct cpufreq_policy *policy,
 			policy->governor->initialized--;
 	} else {
 		/* Restore original values */
-		mutex_lock(&cpufreq_governor_lock);
 		if (event == CPUFREQ_GOV_STOP)
 			policy->governor_enabled = true;
 		else if (event == CPUFREQ_GOV_START)
 			policy->governor_enabled = false;
-		mutex_unlock(&cpufreq_governor_lock);
 	}
 
 	if (((event == CPUFREQ_GOV_POLICY_INIT) && ret) ||
-- 
2.7.0.79.gdc08a19

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH V2 0/7] cpufreq: governors: Fix ABBA lockups
  2016-02-03 14:02 [PATCH V2 0/7] cpufreq: governors: Fix ABBA lockups Viresh Kumar
                   ` (6 preceding siblings ...)
  2016-02-03 14:02 ` [PATCH V2 7/7] cpufreq: Remove cpufreq_governor_lock Viresh Kumar
@ 2016-02-03 15:54 ` Juri Lelli
  2016-02-03 16:10   ` Viresh Kumar
  7 siblings, 1 reply; 36+ messages in thread
From: Juri Lelli @ 2016-02-03 15:54 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael Wysocki, linaro-kernel, linux-pm, skannan, peterz,
	mturquette, steve.muckle, vincent.guittot, morten.rasmussen,
	dietmar.eggemann, linux-kernel

Hi Viresh,

On 03/02/16 19:32, Viresh Kumar wrote:
> Hi Rafael,
> 
> Here is the V2 with updated patches as suggested by you guys.
> 
> These are pushed here:
> git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm.git cpufreq/governor-kobject
> 
> The first four patches are for 4.5, if possible and others you can keep
> for 4.6.
> 
> V1->V2:
> - Improved changelogs, thanks Rafael.
> - Added new dbs_data->mutex to avoid concurrent updates to tunables.
> - Moved kobj_type to common_dbs_data.
> - Updated macros to static inline routines
> - s/show/governor_show
> - s/store/governor_store
> - Improved comments
> 
> @Juri: More testing requested :)
> 

Ouch, I've just got this executing -f basic on Juno. :(
It happens with the hotplug_1_by_1 test.


[ 1086.531252] IRQ1 no longer affine to CPU1
[ 1086.531495] CPU1: shutdown
[ 1086.538199] psci: CPU1 killed.
[ 1086.583396]
[ 1086.584881] ======================================================
[ 1086.590999] [ INFO: possible circular locking dependency detected ]
[ 1086.597205] 4.5.0-rc2+ #37 Not tainted
[ 1086.600914] -------------------------------------------------------
[ 1086.607118] runme.sh/1052 is trying to acquire lock:
[ 1086.612031]  (sb_writers#7){.+.+.+}, at: [<ffffffc000249500>] __sb_start_write+0xcc/0xe0
[ 1086.620090]
[ 1086.620090] but task is already holding lock:
[ 1086.625865]  (&policy->rwsem){+++++.}, at: [<ffffffc0005c8ee4>] cpufreq_offline+0x7c/0x278
[ 1086.634081]
[ 1086.634081] which lock already depends on the new lock.
[ 1086.634081]
[ 1086.642180]
[ 1086.642180] the existing dependency chain (in reverse order) is:
[ 1086.649589]
-> #1 (&policy->rwsem){+++++.}:
[ 1086.653929]        [<ffffffc00011d9a4>] check_prev_add+0x670/0x754
[ 1086.660060]        [<ffffffc00011e1ac>] validate_chain.isra.36+0x724/0xa0c
[ 1086.666876]        [<ffffffc00011f904>] __lock_acquire+0x4e4/0xba0
[ 1086.673001]        [<ffffffc000120b58>] lock_release+0x244/0x570
[ 1086.678955]        [<ffffffc0007351d0>] __mutex_unlock_slowpath+0xa0/0x18c
[ 1086.685771]        [<ffffffc0007352dc>] mutex_unlock+0x20/0x2c
[ 1086.691553]        [<ffffffc0002ccd24>] kernfs_fop_write+0xb0/0x194
[ 1086.697768]        [<ffffffc00024478c>] __vfs_write+0x48/0x104
[ 1086.703550]        [<ffffffc0002457a4>] vfs_write+0x98/0x198
[ 1086.709161]        [<ffffffc0002465e4>] SyS_write+0x54/0xb0
[ 1086.714684]        [<ffffffc000085d30>] el0_svc_naked+0x24/0x28
[ 1086.720555]
-> #0 (sb_writers#7){.+.+.+}:
[ 1086.724730]        [<ffffffc00011c574>] print_circular_bug+0x80/0x2e4
[ 1086.731116]        [<ffffffc00011d470>] check_prev_add+0x13c/0x754
[ 1086.737243]        [<ffffffc00011e1ac>] validate_chain.isra.36+0x724/0xa0c
[ 1086.744059]        [<ffffffc00011f904>] __lock_acquire+0x4e4/0xba0
[ 1086.750184]        [<ffffffc0001207f4>] lock_acquire+0xe4/0x204
[ 1086.756052]        [<ffffffc000118da0>] percpu_down_read+0x50/0xe4
[ 1086.762180]        [<ffffffc000249500>] __sb_start_write+0xcc/0xe0
[ 1086.768306]        [<ffffffc00026ae90>] mnt_want_write+0x28/0x54
[ 1086.774263]        [<ffffffc0002555f8>] do_last+0x660/0xcb8
[ 1086.779788]        [<ffffffc000255cdc>] path_openat+0x8c/0x2b0
[ 1086.785570]        [<ffffffc000256fbc>] do_filp_open+0x78/0xf0
[ 1086.791353]        [<ffffffc000244058>] do_sys_open+0x150/0x214
[ 1086.797222]        [<ffffffc0002441a0>] SyS_openat+0x3c/0x48
[ 1086.802831]        [<ffffffc000085d30>] el0_svc_naked+0x24/0x28
[ 1086.808700]
[ 1086.808700] other info that might help us debug this:
[ 1086.808700]
[ 1086.816627]  Possible unsafe locking scenario:
[ 1086.816627]
[ 1086.822488]        CPU0                    CPU1
[ 1086.826971]        ----                    ----
[ 1086.831453]   lock(&policy->rwsem);
[ 1086.834918]                                lock(sb_writers#7);
[ 1086.840713]                                lock(&policy->rwsem);
[ 1086.846671]   lock(sb_writers#7);
[ 1086.849972]
[ 1086.849972]  *** DEADLOCK ***
[ 1086.849972]
[ 1086.855836] 1 lock held by runme.sh/1052:
[ 1086.859802]  #0:  (&policy->rwsem){+++++.}, at: [<ffffffc0005c8ee4>] cpufreq_offline+0x7c/0x278
[ 1086.868453]
[ 1086.868453] stack backtrace:
[ 1086.872769] CPU: 5 PID: 1052 Comm: runme.sh Not tainted 4.5.0-rc2+ #37
[ 1086.879229] Hardware name: ARM Juno development board (r2) (DT)
[ 1086.885089] Call trace:
[ 1086.887511] [<ffffffc00008a788>] dump_backtrace+0x0/0x1f4
[ 1086.892858] [<ffffffc00008a99c>] show_stack+0x20/0x28
[ 1086.897861] [<ffffffc00041a380>] dump_stack+0x84/0xc0
[ 1086.902863] [<ffffffc00011c6c8>] print_circular_bug+0x1d4/0x2e4
[ 1086.908725] [<ffffffc00011d470>] check_prev_add+0x13c/0x754
[ 1086.914244] [<ffffffc00011e1ac>] validate_chain.isra.36+0x724/0xa0c
[ 1086.920448] [<ffffffc00011f904>] __lock_acquire+0x4e4/0xba0
[ 1086.925965] [<ffffffc0001207f4>] lock_acquire+0xe4/0x204
[ 1086.931224] [<ffffffc000118da0>] percpu_down_read+0x50/0xe4
[ 1086.936742] [<ffffffc000249500>] __sb_start_write+0xcc/0xe0
[ 1086.942260] [<ffffffc00026ae90>] mnt_want_write+0x28/0x54
[ 1086.947605] [<ffffffc0002555f8>] do_last+0x660/0xcb8
[ 1086.952520] [<ffffffc000255cdc>] path_openat+0x8c/0x2b0
[ 1086.957693] [<ffffffc000256fbc>] do_filp_open+0x78/0xf0
[ 1086.962865] [<ffffffc000244058>] do_sys_open+0x150/0x214
[ 1086.968123] [<ffffffc0002441a0>] SyS_openat+0x3c/0x48
[ 1086.973124] [<ffffffc000085d30>] el0_svc_naked+0x24/0x28
[ 1087.019315] Detected PIPT I-cache on CPU1
[ 1087.019373] CPU1: Booted secondary processor [410fd080]

Best,

- Juri

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH V2 0/7] cpufreq: governors: Fix ABBA lockups
  2016-02-03 15:54 ` [PATCH V2 0/7] cpufreq: governors: Fix ABBA lockups Juri Lelli
@ 2016-02-03 16:10   ` Viresh Kumar
  2016-02-03 17:20     ` Juri Lelli
  2016-02-04  6:24     ` Viresh Kumar
  0 siblings, 2 replies; 36+ messages in thread
From: Viresh Kumar @ 2016-02-03 16:10 UTC (permalink / raw)
  To: Juri Lelli
  Cc: Rafael Wysocki, linaro-kernel, linux-pm, skannan, peterz,
	mturquette, steve.muckle, vincent.guittot, morten.rasmussen,
	dietmar.eggemann, linux-kernel

On 03-02-16, 15:54, Juri Lelli wrote:
> Ouch, I've just got this executing -f basic on Juno. :(
> It happens with the hotplug_1_by_1 test.
> 
> 
> [ 1086.531252] IRQ1 no longer affine to CPU1
> [ 1086.531495] CPU1: shutdown
> [ 1086.538199] psci: CPU1 killed.
> [ 1086.583396]
> [ 1086.584881] ======================================================
> [ 1086.590999] [ INFO: possible circular locking dependency detected ]
> [ 1086.597205] 4.5.0-rc2+ #37 Not tainted
> [ 1086.600914] -------------------------------------------------------
> [ 1086.607118] runme.sh/1052 is trying to acquire lock:
> [ 1086.612031]  (sb_writers#7){.+.+.+}, at: [<ffffffc000249500>] __sb_start_write+0xcc/0xe0
> [ 1086.620090]
> [ 1086.620090] but task is already holding lock:
> [ 1086.625865]  (&policy->rwsem){+++++.}, at: [<ffffffc0005c8ee4>] cpufreq_offline+0x7c/0x278
> [ 1086.634081]
> [ 1086.634081] which lock already depends on the new lock.
> [ 1086.634081]
> [ 1086.642180]
> [ 1086.642180] the existing dependency chain (in reverse order) is:
> [ 1086.649589]
> -> #1 (&policy->rwsem){+++++.}:
> [ 1086.653929]        [<ffffffc00011d9a4>] check_prev_add+0x670/0x754
> [ 1086.660060]        [<ffffffc00011e1ac>] validate_chain.isra.36+0x724/0xa0c
> [ 1086.666876]        [<ffffffc00011f904>] __lock_acquire+0x4e4/0xba0
> [ 1086.673001]        [<ffffffc000120b58>] lock_release+0x244/0x570
> [ 1086.678955]        [<ffffffc0007351d0>] __mutex_unlock_slowpath+0xa0/0x18c
> [ 1086.685771]        [<ffffffc0007352dc>] mutex_unlock+0x20/0x2c
> [ 1086.691553]        [<ffffffc0002ccd24>] kernfs_fop_write+0xb0/0x194
> [ 1086.697768]        [<ffffffc00024478c>] __vfs_write+0x48/0x104
> [ 1086.703550]        [<ffffffc0002457a4>] vfs_write+0x98/0x198
> [ 1086.709161]        [<ffffffc0002465e4>] SyS_write+0x54/0xb0
> [ 1086.714684]        [<ffffffc000085d30>] el0_svc_naked+0x24/0x28
> [ 1086.720555]
> -> #0 (sb_writers#7){.+.+.+}:
> [ 1086.724730]        [<ffffffc00011c574>] print_circular_bug+0x80/0x2e4
> [ 1086.731116]        [<ffffffc00011d470>] check_prev_add+0x13c/0x754
> [ 1086.737243]        [<ffffffc00011e1ac>] validate_chain.isra.36+0x724/0xa0c
> [ 1086.744059]        [<ffffffc00011f904>] __lock_acquire+0x4e4/0xba0
> [ 1086.750184]        [<ffffffc0001207f4>] lock_acquire+0xe4/0x204
> [ 1086.756052]        [<ffffffc000118da0>] percpu_down_read+0x50/0xe4
> [ 1086.762180]        [<ffffffc000249500>] __sb_start_write+0xcc/0xe0
> [ 1086.768306]        [<ffffffc00026ae90>] mnt_want_write+0x28/0x54
> [ 1086.774263]        [<ffffffc0002555f8>] do_last+0x660/0xcb8
> [ 1086.779788]        [<ffffffc000255cdc>] path_openat+0x8c/0x2b0
> [ 1086.785570]        [<ffffffc000256fbc>] do_filp_open+0x78/0xf0
> [ 1086.791353]        [<ffffffc000244058>] do_sys_open+0x150/0x214
> [ 1086.797222]        [<ffffffc0002441a0>] SyS_openat+0x3c/0x48
> [ 1086.802831]        [<ffffffc000085d30>] el0_svc_naked+0x24/0x28
> [ 1086.808700]
> [ 1086.808700] other info that might help us debug this:
> [ 1086.808700]
> [ 1086.816627]  Possible unsafe locking scenario:
> [ 1086.816627]
> [ 1086.822488]        CPU0                    CPU1
> [ 1086.826971]        ----                    ----
> [ 1086.831453]   lock(&policy->rwsem);
> [ 1086.834918]                                lock(sb_writers#7);
> [ 1086.840713]                                lock(&policy->rwsem);
> [ 1086.846671]   lock(sb_writers#7);
> [ 1086.849972]
> [ 1086.849972]  *** DEADLOCK ***
> [ 1086.849972]
> [ 1086.855836] 1 lock held by runme.sh/1052:
> [ 1086.859802]  #0:  (&policy->rwsem){+++++.}, at: [<ffffffc0005c8ee4>] cpufreq_offline+0x7c/0x278
> [ 1086.868453]
> [ 1086.868453] stack backtrace:
> [ 1086.872769] CPU: 5 PID: 1052 Comm: runme.sh Not tainted 4.5.0-rc2+ #37
> [ 1086.879229] Hardware name: ARM Juno development board (r2) (DT)
> [ 1086.885089] Call trace:
> [ 1086.887511] [<ffffffc00008a788>] dump_backtrace+0x0/0x1f4
> [ 1086.892858] [<ffffffc00008a99c>] show_stack+0x20/0x28
> [ 1086.897861] [<ffffffc00041a380>] dump_stack+0x84/0xc0
> [ 1086.902863] [<ffffffc00011c6c8>] print_circular_bug+0x1d4/0x2e4
> [ 1086.908725] [<ffffffc00011d470>] check_prev_add+0x13c/0x754
> [ 1086.914244] [<ffffffc00011e1ac>] validate_chain.isra.36+0x724/0xa0c
> [ 1086.920448] [<ffffffc00011f904>] __lock_acquire+0x4e4/0xba0
> [ 1086.925965] [<ffffffc0001207f4>] lock_acquire+0xe4/0x204
> [ 1086.931224] [<ffffffc000118da0>] percpu_down_read+0x50/0xe4
> [ 1086.936742] [<ffffffc000249500>] __sb_start_write+0xcc/0xe0
> [ 1086.942260] [<ffffffc00026ae90>] mnt_want_write+0x28/0x54
> [ 1086.947605] [<ffffffc0002555f8>] do_last+0x660/0xcb8
> [ 1086.952520] [<ffffffc000255cdc>] path_openat+0x8c/0x2b0
> [ 1086.957693] [<ffffffc000256fbc>] do_filp_open+0x78/0xf0
> [ 1086.962865] [<ffffffc000244058>] do_sys_open+0x150/0x214
> [ 1086.968123] [<ffffffc0002441a0>] SyS_openat+0x3c/0x48
> [ 1086.973124] [<ffffffc000085d30>] el0_svc_naked+0x24/0x28
> [ 1087.019315] Detected PIPT I-cache on CPU1
> [ 1087.019373] CPU1: Booted secondary processor [410fd080]

Urg..

I failed to understand it for now though. Please test only the first 4
patches and leave the bottom three. AFAICT, this is caused by the 6th
patch.

The first 4 are important for 4.5 and must be tested soonish.

-- 
viresh

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH V2 2/7] cpufreq: governor: New sysfs show/store callbacks for governor tunables
  2016-02-03 14:02 ` [PATCH V2 2/7] cpufreq: governor: New sysfs show/store callbacks for governor tunables Viresh Kumar
@ 2016-02-03 16:17   ` Viresh Kumar
  0 siblings, 0 replies; 36+ messages in thread
From: Viresh Kumar @ 2016-02-03 16:17 UTC (permalink / raw)
  To: Rafael Wysocki, juri.lelli
  Cc: linaro-kernel, linux-pm, skannan, peterz, mturquette,
	steve.muckle, vincent.guittot, morten.rasmussen,
	dietmar.eggemann, linux-kernel

On 03-02-16, 19:32, Viresh Kumar wrote:

Build bot reported a minor fix here for compiling governors as
modules:

diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index e7f79d2477fa..f76a83a99ca4 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -73,6 +73,7 @@ const struct sysfs_ops governor_sysfs_ops = {
        .show   = governor_show,
        .store  = governor_store,
 };
+EXPORT_SYMBOL_GPL(governor_sysfs_ops);
 
 void dbs_check_cpu(struct dbs_data *dbs_data, int cpu)
 {


Full patch pasted below.

-------------------------8<-------------------------

From: Viresh Kumar <viresh.kumar@linaro.org>
Date: Tue, 2 Feb 2016 12:35:01 +0530
Subject: [PATCH] cpufreq: governor: New sysfs show/store callbacks for
 governor tunables

The ondemand and conservative governors use the global-attr or freq-attr
structures to represent sysfs attributes corresponding to their tunables
(which of them is actually used depends on whether or not different
policy objects can use the same governor with different tunables at the
same time and, consequently, on where those attributes are located in
sysfs).

Unfortunately, in the freq-attr case, the standard cpufreq show/store
sysfs attribute callbacks are applied to the governor tunable attributes
and they always acquire the policy->rwsem lock before carrying out the
operation.  That may lead to an ABBA deadlock if governor tunable
attributes are removed under policy->rwsem while one of them is being
accessed concurrently (if sysfs attributes removal wins the race, it
will wait for the access to complete with policy->rwsem held while the
attribute callback will block on policy->rwsem indefinitely).

We attempted to address this issue by dropping policy->rwsem around
governor tunable attributes removal (that is, around invocations of the
->governor callback with the event arg equal to CPUFREQ_GOV_POLICY_EXIT)
in cpufreq_set_policy(), but that opened up race conditions that had not
been possible with policy->rwsem held all the time.  Therefore
policy->rwsem cannot be dropped in cpufreq_set_policy() at any point,
but the deadlock situation described above must be avoided too.

To that end, use the observation that in principle governor tunables may
be represented by the same data type regardless of whether the governor
is system-wide or per-policy and introduce a new structure, struct
governor_attr, for representing them and new corresponding macros for
creating show/store sysfs callbacks for them.  Also make their parent
kobject use a new kobject type whose default show/store callbacks are
not related to the standard core cpufreq ones in any way (and they don't
acquire policy->rwsem in particular).

[ Rafael: Written changelog ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/cpufreq_conservative.c | 73 ++++++++++++----------------------
 drivers/cpufreq/cpufreq_governor.c     | 70 +++++++++++++++++++++++++++-----
 drivers/cpufreq/cpufreq_governor.h     | 34 ++++++++++++++--
 drivers/cpufreq/cpufreq_ondemand.c     | 73 ++++++++++++----------------------
 4 files changed, 143 insertions(+), 107 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_conservative.c b/drivers/cpufreq/cpufreq_conservative.c
index 57750367bd26..c749fb4fe5d2 100644
--- a/drivers/cpufreq/cpufreq_conservative.c
+++ b/drivers/cpufreq/cpufreq_conservative.c
@@ -275,54 +275,33 @@ static ssize_t store_freq_step(struct dbs_data *dbs_data, const char *buf,
 	return count;
 }
 
-show_store_one(cs, sampling_rate);
-show_store_one(cs, sampling_down_factor);
-show_store_one(cs, up_threshold);
-show_store_one(cs, down_threshold);
-show_store_one(cs, ignore_nice_load);
-show_store_one(cs, freq_step);
-show_one(cs, min_sampling_rate);
-
-gov_sys_pol_attr_rw(sampling_rate);
-gov_sys_pol_attr_rw(sampling_down_factor);
-gov_sys_pol_attr_rw(up_threshold);
-gov_sys_pol_attr_rw(down_threshold);
-gov_sys_pol_attr_rw(ignore_nice_load);
-gov_sys_pol_attr_rw(freq_step);
-gov_sys_pol_attr_ro(min_sampling_rate);
-
-static struct attribute *dbs_attributes_gov_sys[] = {
-	&min_sampling_rate_gov_sys.attr,
-	&sampling_rate_gov_sys.attr,
-	&sampling_down_factor_gov_sys.attr,
-	&up_threshold_gov_sys.attr,
-	&down_threshold_gov_sys.attr,
-	&ignore_nice_load_gov_sys.attr,
-	&freq_step_gov_sys.attr,
+gov_show_one(cs, sampling_rate);
+gov_show_one(cs, sampling_down_factor);
+gov_show_one(cs, up_threshold);
+gov_show_one(cs, down_threshold);
+gov_show_one(cs, ignore_nice_load);
+gov_show_one(cs, freq_step);
+gov_show_one(cs, min_sampling_rate);
+
+gov_attr_rw(sampling_rate);
+gov_attr_rw(sampling_down_factor);
+gov_attr_rw(up_threshold);
+gov_attr_rw(down_threshold);
+gov_attr_rw(ignore_nice_load);
+gov_attr_rw(freq_step);
+gov_attr_ro(min_sampling_rate);
+
+static struct attribute *cs_attributes[] = {
+	&min_sampling_rate.attr,
+	&sampling_rate.attr,
+	&sampling_down_factor.attr,
+	&up_threshold.attr,
+	&down_threshold.attr,
+	&ignore_nice_load.attr,
+	&freq_step.attr,
 	NULL
 };
 
-static struct attribute_group cs_attr_group_gov_sys = {
-	.attrs = dbs_attributes_gov_sys,
-	.name = "conservative",
-};
-
-static struct attribute *dbs_attributes_gov_pol[] = {
-	&min_sampling_rate_gov_pol.attr,
-	&sampling_rate_gov_pol.attr,
-	&sampling_down_factor_gov_pol.attr,
-	&up_threshold_gov_pol.attr,
-	&down_threshold_gov_pol.attr,
-	&ignore_nice_load_gov_pol.attr,
-	&freq_step_gov_pol.attr,
-	NULL
-};
-
-static struct attribute_group cs_attr_group_gov_pol = {
-	.attrs = dbs_attributes_gov_pol,
-	.name = "conservative",
-};
-
 /************************** sysfs end ************************/
 
 static int cs_init(struct dbs_data *dbs_data, bool notify)
@@ -365,8 +344,8 @@ define_get_cpu_dbs_routines(cs_cpu_dbs_info);
 
 static struct common_dbs_data cs_dbs_cdata = {
 	.governor = GOV_CONSERVATIVE,
-	.attr_group_gov_sys = &cs_attr_group_gov_sys,
-	.attr_group_gov_pol = &cs_attr_group_gov_pol,
+	.kobj_name = "conservative",
+	.kobj_type = { .sysfs_ops = &governor_sysfs_ops, .default_attrs = cs_attributes },
 	.get_cpu_cdbs = get_cpu_cdbs,
 	.get_cpu_dbs_info_s = get_cpu_dbs_info_s,
 	.gov_dbs_timer = cs_dbs_timer,
diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index 9a7edc91ad57..f76a83a99ca4 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -22,14 +22,59 @@
 
 #include "cpufreq_governor.h"
 
-static struct attribute_group *get_sysfs_attr(struct dbs_data *dbs_data)
+static inline struct dbs_data *to_dbs_data(struct kobject *kobj)
 {
-	if (have_governor_per_policy())
-		return dbs_data->cdata->attr_group_gov_pol;
-	else
-		return dbs_data->cdata->attr_group_gov_sys;
+	return container_of(kobj, struct dbs_data, kobj);
 }
 
+static inline struct governor_attr *to_gov_attr(struct attribute *attr)
+{
+	return container_of(attr, struct governor_attr, attr);
+}
+
+static ssize_t governor_show(struct kobject *kobj, struct attribute *attr,
+			     char *buf)
+{
+	struct dbs_data *dbs_data = to_dbs_data(kobj);
+	struct governor_attr *gattr = to_gov_attr(attr);
+	int ret = -EIO;
+
+	if (gattr->show)
+		ret = gattr->show(dbs_data, buf);
+
+	return ret;
+}
+
+static ssize_t governor_store(struct kobject *kobj, struct attribute *attr,
+			      const char *buf, size_t count)
+{
+	struct dbs_data *dbs_data = to_dbs_data(kobj);
+	struct governor_attr *gattr = to_gov_attr(attr);
+	int ret = -EIO;
+
+	mutex_lock(&dbs_data->mutex);
+
+	if (gattr->store)
+		ret = gattr->store(dbs_data, buf, count);
+
+	mutex_unlock(&dbs_data->mutex);
+
+	return ret;
+}
+
+/*
+ * Sysfs Ops for accessing governor attributes.
+ *
+ * All show/store invocations for governor specific sysfs attributes, will first
+ * call the below show/store callbacks and the attribute specific callback will
+ * be called from within it.
+ */
+const struct sysfs_ops governor_sysfs_ops = {
+	.show	= governor_show,
+	.store	= governor_store,
+};
+EXPORT_SYMBOL_GPL(governor_sysfs_ops);
+
 void dbs_check_cpu(struct dbs_data *dbs_data, int cpu)
 {
 	struct cpu_dbs_info *cdbs = dbs_data->cdata->get_cpu_cdbs(cpu);
@@ -383,6 +428,7 @@ static int cpufreq_governor_init(struct cpufreq_policy *policy,
 
 	dbs_data->cdata = cdata;
 	dbs_data->usage_count = 1;
+	mutex_init(&dbs_data->mutex);
 
 	ret = cdata->init(dbs_data, !policy->governor->initialized);
 	if (ret)
@@ -395,10 +441,14 @@ static int cpufreq_governor_init(struct cpufreq_policy *policy,
 
 	policy->governor_data = dbs_data;
 
-	ret = sysfs_create_group(get_governor_parent_kobj(policy),
-				 get_sysfs_attr(dbs_data));
-	if (ret)
+	ret = kobject_init_and_add(&dbs_data->kobj, &cdata->kobj_type,
+				   get_governor_parent_kobj(policy),
+				   cdata->kobj_name);
+	if (ret) {
+		pr_err("cpufreq: Governor initialization failed (dbs_data kobject initialization error %d)\n",
+		       ret);
 		goto reset_gdbs_data;
+	}
 
 	return 0;
 
@@ -426,8 +476,7 @@ static int cpufreq_governor_exit(struct cpufreq_policy *policy,
 		return -EBUSY;
 
 	if (!--dbs_data->usage_count) {
-		sysfs_remove_group(get_governor_parent_kobj(policy),
-				   get_sysfs_attr(dbs_data));
+		kobject_put(&dbs_data->kobj);
 
 		policy->governor_data = NULL;
 
@@ -435,6 +484,7 @@ static int cpufreq_governor_exit(struct cpufreq_policy *policy,
 			cdata->gdbs_data = NULL;
 
 		cdata->exit(dbs_data, policy->governor->initialized == 1);
+		mutex_destroy(&dbs_data->mutex);
 		kfree(dbs_data);
 	} else {
 		policy->governor_data = NULL;
diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h
index ad44a8546a3a..45a6ac1ecfb0 100644
--- a/drivers/cpufreq/cpufreq_governor.h
+++ b/drivers/cpufreq/cpufreq_governor.h
@@ -108,6 +108,31 @@ static ssize_t store_##file_name##_gov_pol				\
 show_one(_gov, file_name);						\
 store_one(_gov, file_name)
 
+/* Governor's specific attributes */
+struct dbs_data;
+struct governor_attr {
+	struct attribute attr;
+	ssize_t (*show)(struct dbs_data *dbs_data, char *buf);
+	ssize_t (*store)(struct dbs_data *dbs_data, const char *buf,
+			 size_t count);
+};
+
+#define gov_show_one(_gov, file_name)					\
+static ssize_t show_##file_name						\
+(struct dbs_data *dbs_data, char *buf)					\
+{									\
+	struct _gov##_dbs_tuners *tuners = dbs_data->tuners;		\
+	return sprintf(buf, "%u\n", tuners->file_name);			\
+}
+
+#define gov_attr_ro(_name)						\
+static struct governor_attr _name =					\
+__ATTR(_name, 0444, show_##_name, NULL)
+
+#define gov_attr_rw(_name)						\
+static struct governor_attr _name =					\
+__ATTR(_name, 0644, show_##_name, store_##_name)
+
 /* create helper routines */
 #define define_get_cpu_dbs_routines(_dbs_info)				\
 static struct cpu_dbs_info *get_cpu_cdbs(int cpu)			\
@@ -197,14 +222,13 @@ struct cs_dbs_tuners {
 };
 
 /* Common Governor data across policies */
-struct dbs_data;
 struct common_dbs_data {
 	/* Common across governors */
 	#define GOV_ONDEMAND		0
 	#define GOV_CONSERVATIVE	1
 	int governor;
-	struct attribute_group *attr_group_gov_sys; /* one governor - system */
-	struct attribute_group *attr_group_gov_pol; /* one governor - policy */
+	const char *kobj_name;
+	struct kobj_type kobj_type;
 
 	/*
 	 * Common data for platforms that don't set
@@ -234,6 +258,9 @@ struct dbs_data {
 	struct common_dbs_data *cdata;
 	int usage_count;
 	void *tuners;
+	struct kobject kobj;
+	/* Protect concurrent updates to governor tunables from sysfs */
+	struct mutex mutex;
 };
 
 /* Governor specific ops, will be passed to dbs_data->gov_ops */
@@ -256,6 +283,7 @@ static inline int delay_for_sampling_rate(unsigned int sampling_rate)
 }
 
 extern struct mutex cpufreq_governor_lock;
+extern const struct sysfs_ops governor_sysfs_ops;
 
 void gov_add_timers(struct cpufreq_policy *policy, unsigned int delay);
 void gov_cancel_work(struct cpu_common_dbs_info *shared);
diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
index b31f64745232..82ed490f7de0 100644
--- a/drivers/cpufreq/cpufreq_ondemand.c
+++ b/drivers/cpufreq/cpufreq_ondemand.c
@@ -436,54 +436,33 @@ static ssize_t store_powersave_bias(struct dbs_data *dbs_data, const char *buf,
 	return count;
 }
 
-show_store_one(od, sampling_rate);
-show_store_one(od, io_is_busy);
-show_store_one(od, up_threshold);
-show_store_one(od, sampling_down_factor);
-show_store_one(od, ignore_nice_load);
-show_store_one(od, powersave_bias);
-show_one(od, min_sampling_rate);
-
-gov_sys_pol_attr_rw(sampling_rate);
-gov_sys_pol_attr_rw(io_is_busy);
-gov_sys_pol_attr_rw(up_threshold);
-gov_sys_pol_attr_rw(sampling_down_factor);
-gov_sys_pol_attr_rw(ignore_nice_load);
-gov_sys_pol_attr_rw(powersave_bias);
-gov_sys_pol_attr_ro(min_sampling_rate);
-
-static struct attribute *dbs_attributes_gov_sys[] = {
-	&min_sampling_rate_gov_sys.attr,
-	&sampling_rate_gov_sys.attr,
-	&up_threshold_gov_sys.attr,
-	&sampling_down_factor_gov_sys.attr,
-	&ignore_nice_load_gov_sys.attr,
-	&powersave_bias_gov_sys.attr,
-	&io_is_busy_gov_sys.attr,
+gov_show_one(od, sampling_rate);
+gov_show_one(od, io_is_busy);
+gov_show_one(od, up_threshold);
+gov_show_one(od, sampling_down_factor);
+gov_show_one(od, ignore_nice_load);
+gov_show_one(od, powersave_bias);
+gov_show_one(od, min_sampling_rate);
+
+gov_attr_rw(sampling_rate);
+gov_attr_rw(io_is_busy);
+gov_attr_rw(up_threshold);
+gov_attr_rw(sampling_down_factor);
+gov_attr_rw(ignore_nice_load);
+gov_attr_rw(powersave_bias);
+gov_attr_ro(min_sampling_rate);
+
+static struct attribute *od_attributes[] = {
+	&min_sampling_rate.attr,
+	&sampling_rate.attr,
+	&up_threshold.attr,
+	&sampling_down_factor.attr,
+	&ignore_nice_load.attr,
+	&powersave_bias.attr,
+	&io_is_busy.attr,
 	NULL
 };
 
-static struct attribute_group od_attr_group_gov_sys = {
-	.attrs = dbs_attributes_gov_sys,
-	.name = "ondemand",
-};
-
-static struct attribute *dbs_attributes_gov_pol[] = {
-	&min_sampling_rate_gov_pol.attr,
-	&sampling_rate_gov_pol.attr,
-	&up_threshold_gov_pol.attr,
-	&sampling_down_factor_gov_pol.attr,
-	&ignore_nice_load_gov_pol.attr,
-	&powersave_bias_gov_pol.attr,
-	&io_is_busy_gov_pol.attr,
-	NULL
-};
-
-static struct attribute_group od_attr_group_gov_pol = {
-	.attrs = dbs_attributes_gov_pol,
-	.name = "ondemand",
-};
-
 /************************** sysfs end ************************/
 
 static int od_init(struct dbs_data *dbs_data, bool notify)
@@ -542,8 +521,8 @@ static struct od_ops od_ops = {
 
 static struct common_dbs_data od_dbs_cdata = {
 	.governor = GOV_ONDEMAND,
-	.attr_group_gov_sys = &od_attr_group_gov_sys,
-	.attr_group_gov_pol = &od_attr_group_gov_pol,
+	.kobj_name = "ondemand",
+	.kobj_type = { .sysfs_ops = &governor_sysfs_ops, .default_attrs = od_attributes },
 	.get_cpu_cdbs = get_cpu_cdbs,
 	.get_cpu_dbs_info_s = get_cpu_dbs_info_s,
 	.gov_dbs_timer = od_dbs_timer,

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH V2 0/7] cpufreq: governors: Fix ABBA lockups
  2016-02-03 16:10   ` Viresh Kumar
@ 2016-02-03 17:20     ` Juri Lelli
  2016-02-03 17:20       ` Rafael J. Wysocki
  2016-02-04  6:24     ` Viresh Kumar
  1 sibling, 1 reply; 36+ messages in thread
From: Juri Lelli @ 2016-02-03 17:20 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael Wysocki, linaro-kernel, linux-pm, skannan, peterz,
	mturquette, steve.muckle, vincent.guittot, morten.rasmussen,
	dietmar.eggemann, linux-kernel

On 03/02/16 21:40, Viresh Kumar wrote:
> On 03-02-16, 15:54, Juri Lelli wrote:
> > Ouch, I've just got this executing -f basic on Juno. :(
> > It happens with the hotplug_1_by_1 test.
> > 

[...]

> 
> Urg..
> 
> I failed to understand it for now though. Please test only the first 4
> patches and leave the bottom three. AFAICT, this is caused by the 6th
> patch.
> 
> The first 4 are important for 4.5 and must be tested soonish.
> 

First 4 look ok from a testing viewpoint.

Best,

- Juri

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH V2 0/7] cpufreq: governors: Fix ABBA lockups
  2016-02-03 17:20     ` Juri Lelli
@ 2016-02-03 17:20       ` Rafael J. Wysocki
  2016-02-03 23:31         ` Shilpa Bhat
  0 siblings, 1 reply; 36+ messages in thread
From: Rafael J. Wysocki @ 2016-02-03 17:20 UTC (permalink / raw)
  To: Juri Lelli
  Cc: Viresh Kumar, Rafael Wysocki, Lists linaro-kernel, linux-pm,
	Saravana Kannan, Peter Zijlstra, Michael Turquette, Steve Muckle,
	Vincent Guittot, Morten Rasmussen, dietmar.eggemann,
	Linux Kernel Mailing List

On Wed, Feb 3, 2016 at 6:20 PM, Juri Lelli <juri.lelli@arm.com> wrote:
> On 03/02/16 21:40, Viresh Kumar wrote:
>> On 03-02-16, 15:54, Juri Lelli wrote:
>> > Ouch, I've just got this executing -f basic on Juno. :(
>> > It happens with the hotplug_1_by_1 test.
>> >
>
> [...]
>
>>
>> Urg..
>>
>> I failed to understand it for now though. Please test only the first 4
>> patches and leave the bottom three. AFAICT, this is caused by the 6th
>> patch.
>>
>> The first 4 are important for 4.5 and must be tested soonish.
>>
>
> First 4 look ok from a testing viewpoint.

Good, thanks for the confirmation!

I'm going to apply them and they will go to Linus next week.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH V2 5/7] cpufreq: Merge cpufreq_offline_prepare/finish routines
  2016-02-03 14:02 ` [PATCH V2 5/7] cpufreq: Merge cpufreq_offline_prepare/finish routines Viresh Kumar
@ 2016-02-03 20:21   ` Saravana Kannan
  2016-02-04  1:49     ` Viresh Kumar
  0 siblings, 1 reply; 36+ messages in thread
From: Saravana Kannan @ 2016-02-03 20:21 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael Wysocki, juri.lelli, linaro-kernel, linux-pm, peterz,
	mturquette, steve.muckle, vincent.guittot, morten.rasmussen,
	dietmar.eggemann, linux-kernel

On 02/03/2016 06:02 AM, Viresh Kumar wrote:
> The offline routine was separated into two halves earlier by
> 'commit 1aee40ac9c86 ("cpufreq: Invoke __cpufreq_remove_dev_finish()
> after releasing cpu_hotplug.lock");.
>
> And the reasons cited were, race issues between accessing policy's sysfs
> files and policy kobject's cleanup.
>
> That race isn't valid anymore, as we don't remove the policy & its
> kobject completely on hotplugs, but do that from ->remove() callback of
> subsys framework.
>
> These two routines can be merged back now.
>
> This is a preparatory step for the next patch, that will enforce
> policy->rwsem lock around __cpufreq_governor() routines STOP/EXIT
> sequence.

Is this stale text? Seems like this is now done in the *previous* patch?

-Saravana


-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH V2 0/7] cpufreq: governors: Fix ABBA lockups
  2016-02-03 17:20       ` Rafael J. Wysocki
@ 2016-02-03 23:31         ` Shilpa Bhat
  2016-02-03 23:50           ` Rafael J. Wysocki
  0 siblings, 1 reply; 36+ messages in thread
From: Shilpa Bhat @ 2016-02-03 23:31 UTC (permalink / raw)
  To: Rafael J. Wysocki, Juri Lelli
  Cc: Viresh Kumar, Rafael Wysocki, Lists linaro-kernel, linux-pm,
	Saravana Kannan, Peter Zijlstra, Michael Turquette, Steve Muckle,
	Vincent Guittot, Morten Rasmussen, dietmar.eggemann,
	Linux Kernel Mailing List

Hi,

On 02/03/2016 10:50 PM, Rafael J. Wysocki wrote:
> On Wed, Feb 3, 2016 at 6:20 PM, Juri Lelli <juri.lelli@arm.com> wrote:
>> On 03/02/16 21:40, Viresh Kumar wrote:
>>> On 03-02-16, 15:54, Juri Lelli wrote:
>>>> Ouch, I've just got this executing -f basic on Juno. :(
>>>> It happens with the hotplug_1_by_1 test.
>>>>
>>
>> [...]
>>
>>>
>>> Urg..
>>>
>>> I failed to understand it for now though. Please test only the first 4
>>> patches and leave the bottom three. AFAICT, this is caused by the 6th
>>> patch.
>>>
>>> The first 4 are important for 4.5 and must be tested soonish.
>>>
>>
>> First 4 look ok from a testing viewpoint.
>
> Good, thanks for the confirmation!
>
> I'm going to apply them and they will go to Linus next week.
>
> Thanks,
> Rafael

Sorry for the delayed report. But I see the below backtrace on Power8 box. It
has 4 chips with 128 cpus.

I see the below trace with the first four patches on running tests
from Viresh's testcase.
'./runme.sh -f basic'
 hit this trace at 'shuffle_governors_for_all_cpus' test.

[  906.762045] ======================================================
[  906.762114] [ INFO: possible circular locking dependency detected ]
[  906.762172] 4.5.0-rc2-sgb+ #96 Not tainted
[  906.762207] -------------------------------------------------------
[  906.762263] runme.sh/2840 is trying to acquire lock:
[  906.762309]  (s_active#91){++++.+}, at: [<c000000000407db8>]
kernfs_remove+0x48/0x70
[  906.762419]
but task is already holding lock:
[  906.762476]  (od_dbs_cdata.mutex){+.+.+.}, at: [<c000000000ad7594>]
cpufreq_governor_dbs+0x64/0x7e0
[  906.762592]
which lock already depends on the new lock.

[  906.762659]
the existing dependency chain (in reverse order) is:
[  906.762727]
-> #2 (od_dbs_cdata.mutex){+.+.+.}:
[  906.762807]        [<c000000000d485b0>] mutex_lock_nested+0x90/0x590
[  906.762877]        [<c000000000ad57f8>] update_sampling_rate+0x88/0x1c0
[  906.762946]        [<c000000000ad5990>] store_sampling_rate+0x60/0xa0
[  906.763013]        [<c000000000ad6af0>] governor_store+0x80/0xc0
[  906.763070]        [<c00000000040a8a4>] sysfs_kf_write+0x94/0xc0
[  906.763128]        [<c0000000004094a8>] kernfs_fop_write+0x188/0x1f0
[  906.763196]        [<c000000000347b8c>] __vfs_write+0x6c/0x180
[  906.763254]        [<c0000000003490a0>] vfs_write+0xc0/0x200
[  906.763311]        [<c00000000034a3cc>] SyS_write+0x6c/0x110
[  906.763369]        [<c00000000000926c>] system_call+0x38/0xd0
[  906.763427]
-> #1 (&dbs_data->mutex){+.+...}:
[  906.763495]        [<c000000000d485b0>] mutex_lock_nested+0x90/0x590
[  906.763563]        [<c000000000ad6ac0>] governor_store+0x50/0xc0
[  906.763620]        [<c00000000040a8a4>] sysfs_kf_write+0x94/0xc0
[  906.763677]        [<c0000000004094a8>] kernfs_fop_write+0x188/0x1f0
[  906.763745]        [<c000000000347b8c>] __vfs_write+0x6c/0x180
[  906.763801]        [<c0000000003490a0>] vfs_write+0xc0/0x200
[  906.763859]        [<c00000000034a3cc>] SyS_write+0x6c/0x110
[  906.763916]        [<c00000000000926c>] system_call+0x38/0xd0
[  906.763973]
-> #0 (s_active#91){++++.+}:
[  906.764052]        [<c00000000015f318>] lock_acquire+0xd8/0x1a0
[  906.764111]        [<c0000000004065f4>] __kernfs_remove+0x344/0x410
[  906.764179]        [<c000000000407db8>] kernfs_remove+0x48/0x70
[  906.764236]        [<c00000000040b868>] sysfs_remove_dir+0x78/0xd0
[  906.764304]        [<c0000000005eccec>] kobject_del+0x2c/0x80
[  906.764362]        [<c0000000005ec9e8>] kobject_release+0xa8/0x250
[  906.764430]        [<c000000000ad7c28>] cpufreq_governor_dbs+0x6f8/0x7e0
[  906.764497]        [<c000000000ad4bdc>] od_cpufreq_governor_dbs+0x3c/0x60
[  906.764567]        [<c000000000acf830>] __cpufreq_governor+0x1d0/0x390
[  906.764634]        [<c000000000ad0750>] cpufreq_set_policy+0x3b0/0x450
[  906.764703]        [<c000000000ad12cc>] store_scaling_governor+0x8c/0xf0
[  906.764771]        [<c000000000aced34>] store+0xb4/0x110
[  906.764828]        [<c00000000040a8a4>] sysfs_kf_write+0x94/0xc0
[  906.764885]        [<c0000000004094a8>] kernfs_fop_write+0x188/0x1f0
[  906.764952]        [<c000000000347b8c>] __vfs_write+0x6c/0x180
[  906.765048]        [<c0000000003490a0>] vfs_write+0xc0/0x200
[  906.765160]        [<c00000000034a3cc>] SyS_write+0x6c/0x110
[  906.765272]        [<c00000000000926c>] system_call+0x38/0xd0
[  906.765384]
other info that might help us debug this:

[  906.765522] Chain exists of:
  s_active#91 --> &dbs_data->mutex --> od_dbs_cdata.mutex

[  906.765768]  Possible unsafe locking scenario:

[  906.765880]        CPU0                    CPU1
[  906.765969]        ----                    ----
[  906.766058]   lock(od_dbs_cdata.mutex);
[  906.766170]                                lock(&dbs_data->mutex);
[  906.766304]                                lock(od_dbs_cdata.mutex);
[  906.766461]   lock(s_active#91);
[  906.766572]
 *** DEADLOCK ***

[  906.766686] 6 locks held by runme.sh/2840:
[  906.766756]  #0:  (sb_writers#6){.+.+.+}, at: [<c00000000034cf10>]
__sb_start_write+0x120/0x150
[  906.767002]  #1:  (&of->mutex){+.+.+.}, at: [<c00000000040939c>]
kernfs_fop_write+0x7c/0x1f0
[  906.767225]  #2:  (s_active#82){.+.+.+}, at: [<c0000000004093a8>]
kernfs_fop_write+0x88/0x1f0
[  906.767471]  #3:  (cpu_hotplug.lock){++++++}, at: [<c0000000000e06d8>]
get_online_cpus+0x48/0xc0
[  906.767676]  #4:  (&policy->rwsem){+++++.}, at: [<c000000000aced04>]
store+0x84/0x110
[  906.767878]  #5:  (od_dbs_cdata.mutex){+.+.+.}, at: [<c000000000ad7594>]
cpufreq_governor_dbs+0x64/0x7e0
[  906.768124]
stack backtrace:
[  906.768215] CPU: 0 PID: 2840 Comm: runme.sh Not tainted 4.5.0-rc2-sgb+ #96
[  906.768329] Call Trace:
[  906.768375] [c000007fe3126ec0] [c000000000d56530] dump_stack+0x90/0xbc
(unreliable)
[  906.768536] [c000007fe3126ef0] [c00000000015884c]
print_circular_bug+0x28c/0x3e0
[  906.768696] [c000007fe3126f90] [c00000000015ed88]
__lock_acquire+0x2278/0x22d0
[  906.768853] [c000007fe3127120] [c00000000015f318] lock_acquire+0xd8/0x1a0
[  906.768987] [c000007fe31271e0] [c0000000004065f4] __kernfs_remove+0x344/0x410
[  906.769121] [c000007fe31272e0] [c000000000407db8] kernfs_remove+0x48/0x70
[  906.769256] [c000007fe3127310] [c00000000040b868] sysfs_remove_dir+0x78/0xd0
[  906.769394] [c000007fe3127350] [c0000000005eccec] kobject_del+0x2c/0x80
[  906.769528] [c000007fe3127380] [c0000000005ec9e8] kobject_release+0xa8/0x250
[  906.769607] [c000007fe3127410] [c000000000ad7c28]
cpufreq_governor_dbs+0x6f8/0x7e0
[  906.769687] [c000007fe31274c0] [c000000000ad4bdc]
od_cpufreq_governor_dbs+0x3c/0x60
[  906.769766] [c000007fe3127500] [c000000000acf830]
__cpufreq_governor+0x1d0/0x390
[  906.769845] [c000007fe3127580] [c000000000ad0750]
cpufreq_set_policy+0x3b0/0x450
[  906.769924] [c000007fe3127610] [c000000000ad12cc]
store_scaling_governor+0x8c/0xf0
[  906.770003] [c000007fe3127c10] [c000000000aced34] store+0xb4/0x110
[  906.770071] [c000007fe3127c60] [c00000000040a8a4] sysfs_kf_write+0x94/0xc0
[  906.770139] [c000007fe3127ca0] [c0000000004094a8]
kernfs_fop_write+0x188/0x1f0
[  906.770221] [c000007fe3127cf0] [c000000000347b8c] __vfs_write+0x6c/0x180
[  906.770290] [c000007fe3127d90] [c0000000003490a0] vfs_write+0xc0/0x200
[  906.770358] [c000007fe3127de0] [c00000000034a3cc] SyS_write+0x6c/0x110
[  906.770426] [c000007fe3127e30] [c00000000000926c] system_call+0x38/0xd0

Thanks and Regards,
Shilpa

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH V2 0/7] cpufreq: governors: Fix ABBA lockups
  2016-02-03 23:31         ` Shilpa Bhat
@ 2016-02-03 23:50           ` Rafael J. Wysocki
  2016-02-04  5:51             ` Viresh Kumar
  2016-02-04 11:09             ` Viresh Kumar
  0 siblings, 2 replies; 36+ messages in thread
From: Rafael J. Wysocki @ 2016-02-03 23:50 UTC (permalink / raw)
  To: Shilpa Bhat
  Cc: Rafael J. Wysocki, Juri Lelli, Viresh Kumar, Rafael Wysocki,
	Lists linaro-kernel, linux-pm, Saravana Kannan, Peter Zijlstra,
	Michael Turquette, Steve Muckle, Vincent Guittot,
	Morten Rasmussen, dietmar.eggemann, Linux Kernel Mailing List

On Thu, Feb 4, 2016 at 12:31 AM, Shilpa Bhat <shilpabhatppc@gmail.com> wrote:
> Hi,
>
> On 02/03/2016 10:50 PM, Rafael J. Wysocki wrote:
>> On Wed, Feb 3, 2016 at 6:20 PM, Juri Lelli <juri.lelli@arm.com> wrote:
>>> On 03/02/16 21:40, Viresh Kumar wrote:
>>>> On 03-02-16, 15:54, Juri Lelli wrote:
>>>>> Ouch, I've just got this executing -f basic on Juno. :(
>>>>> It happens with the hotplug_1_by_1 test.
>>>>>
>>>
>>> [...]
>>>
>>>>
>>>> Urg..
>>>>
>>>> I failed to understand it for now though. Please test only the first 4
>>>> patches and leave the bottom three. AFAICT, this is caused by the 6th
>>>> patch.
>>>>
>>>> The first 4 are important for 4.5 and must be tested soonish.
>>>>
>>>
>>> First 4 look ok from a testing viewpoint.
>>
>> Good, thanks for the confirmation!
>>
>> I'm going to apply them and they will go to Linus next week.
>>
>> Thanks,
>> Rafael
>
> Sorry for the delayed report. But I see the below backtrace on Power8 box. It
> has 4 chips with 128 cpus.

Thanks for the report.

> I see the below trace with the first four patches on running tests
> from Viresh's testcase.
> './runme.sh -f basic'
>  hit this trace at 'shuffle_governors_for_all_cpus' test.
>
> [  906.762045] ======================================================
> [  906.762114] [ INFO: possible circular locking dependency detected ]
> [  906.762172] 4.5.0-rc2-sgb+ #96 Not tainted
> [  906.762207] -------------------------------------------------------
> [  906.762263] runme.sh/2840 is trying to acquire lock:
> [  906.762309]  (s_active#91){++++.+}, at: [<c000000000407db8>]
> kernfs_remove+0x48/0x70
> [  906.762419]
> but task is already holding lock:
> [  906.762476]  (od_dbs_cdata.mutex){+.+.+.}, at: [<c000000000ad7594>]
> cpufreq_governor_dbs+0x64/0x7e0
> [  906.762592]
> which lock already depends on the new lock.
>
> [  906.762659]
> the existing dependency chain (in reverse order) is:
> [  906.762727]
> -> #2 (od_dbs_cdata.mutex){+.+.+.}:
> [  906.762807]        [<c000000000d485b0>] mutex_lock_nested+0x90/0x590
> [  906.762877]        [<c000000000ad57f8>] update_sampling_rate+0x88/0x1c0
> [  906.762946]        [<c000000000ad5990>] store_sampling_rate+0x60/0xa0
> [  906.763013]        [<c000000000ad6af0>] governor_store+0x80/0xc0
> [  906.763070]        [<c00000000040a8a4>] sysfs_kf_write+0x94/0xc0
> [  906.763128]        [<c0000000004094a8>] kernfs_fop_write+0x188/0x1f0
> [  906.763196]        [<c000000000347b8c>] __vfs_write+0x6c/0x180
> [  906.763254]        [<c0000000003490a0>] vfs_write+0xc0/0x200
> [  906.763311]        [<c00000000034a3cc>] SyS_write+0x6c/0x110
> [  906.763369]        [<c00000000000926c>] system_call+0x38/0xd0
> [  906.763427]
> -> #1 (&dbs_data->mutex){+.+...}:
> [  906.763495]        [<c000000000d485b0>] mutex_lock_nested+0x90/0x590
> [  906.763563]        [<c000000000ad6ac0>] governor_store+0x50/0xc0
> [  906.763620]        [<c00000000040a8a4>] sysfs_kf_write+0x94/0xc0
> [  906.763677]        [<c0000000004094a8>] kernfs_fop_write+0x188/0x1f0
> [  906.763745]        [<c000000000347b8c>] __vfs_write+0x6c/0x180
> [  906.763801]        [<c0000000003490a0>] vfs_write+0xc0/0x200
> [  906.763859]        [<c00000000034a3cc>] SyS_write+0x6c/0x110
> [  906.763916]        [<c00000000000926c>] system_call+0x38/0xd0
> [  906.763973]
> -> #0 (s_active#91){++++.+}:
> [  906.764052]        [<c00000000015f318>] lock_acquire+0xd8/0x1a0
> [  906.764111]        [<c0000000004065f4>] __kernfs_remove+0x344/0x410
> [  906.764179]        [<c000000000407db8>] kernfs_remove+0x48/0x70
> [  906.764236]        [<c00000000040b868>] sysfs_remove_dir+0x78/0xd0
> [  906.764304]        [<c0000000005eccec>] kobject_del+0x2c/0x80
> [  906.764362]        [<c0000000005ec9e8>] kobject_release+0xa8/0x250
> [  906.764430]        [<c000000000ad7c28>] cpufreq_governor_dbs+0x6f8/0x7e0
> [  906.764497]        [<c000000000ad4bdc>] od_cpufreq_governor_dbs+0x3c/0x60
> [  906.764567]        [<c000000000acf830>] __cpufreq_governor+0x1d0/0x390
> [  906.764634]        [<c000000000ad0750>] cpufreq_set_policy+0x3b0/0x450
> [  906.764703]        [<c000000000ad12cc>] store_scaling_governor+0x8c/0xf0
> [  906.764771]        [<c000000000aced34>] store+0xb4/0x110
> [  906.764828]        [<c00000000040a8a4>] sysfs_kf_write+0x94/0xc0
> [  906.764885]        [<c0000000004094a8>] kernfs_fop_write+0x188/0x1f0
> [  906.764952]        [<c000000000347b8c>] __vfs_write+0x6c/0x180
> [  906.765048]        [<c0000000003490a0>] vfs_write+0xc0/0x200
> [  906.765160]        [<c00000000034a3cc>] SyS_write+0x6c/0x110
> [  906.765272]        [<c00000000000926c>] system_call+0x38/0xd0
> [  906.765384]
> other info that might help us debug this:
>
> [  906.765522] Chain exists of:
>   s_active#91 --> &dbs_data->mutex --> od_dbs_cdata.mutex
>
> [  906.765768]  Possible unsafe locking scenario:
>
> [  906.765880]        CPU0                    CPU1
> [  906.765969]        ----                    ----
> [  906.766058]   lock(od_dbs_cdata.mutex);
> [  906.766170]                                lock(&dbs_data->mutex);
> [  906.766304]                                lock(od_dbs_cdata.mutex);
> [  906.766461]   lock(s_active#91);
> [  906.766572]
>  *** DEADLOCK ***

This is exactly right.  We've avoided one deadlock only to trip into
another one.

This happens because update_sampling_rate() acquires
od_dbs_cdata.mutex which is held around cpufreq_governor_exit() by
cpufreq_governor_dbs().

Worse yet, a deadlock can still happen without (the new)
dbs_data->mutex, just between s_active and od_dbs_cdata.mutex if
update_sampling_rate() runs in parallel with
cpufreq_governor_dbs()->cpufreq_governor_exit() and the latter wins
the race.

It looks like we need to drop the governor mutex before putting the
kobject in cpufreq_governor_exit().

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH V2 5/7] cpufreq: Merge cpufreq_offline_prepare/finish routines
  2016-02-03 20:21   ` Saravana Kannan
@ 2016-02-04  1:49     ` Viresh Kumar
  0 siblings, 0 replies; 36+ messages in thread
From: Viresh Kumar @ 2016-02-04  1:49 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Rafael Wysocki, juri.lelli, linaro-kernel, linux-pm, peterz,
	mturquette, steve.muckle, vincent.guittot, morten.rasmussen,
	dietmar.eggemann, linux-kernel

On 03-02-16, 12:21, Saravana Kannan wrote:
> On 02/03/2016 06:02 AM, Viresh Kumar wrote:
> >The offline routine was separated into two halves earlier by
> >'commit 1aee40ac9c86 ("cpufreq: Invoke __cpufreq_remove_dev_finish()
> >after releasing cpu_hotplug.lock");.
> >
> >And the reasons cited were, race issues between accessing policy's sysfs
> >files and policy kobject's cleanup.
> >
> >That race isn't valid anymore, as we don't remove the policy & its
> >kobject completely on hotplugs, but do that from ->remove() callback of
> >subsys framework.
> >
> >These two routines can be merged back now.
> >
> >This is a preparatory step for the next patch, that will enforce
> >policy->rwsem lock around __cpufreq_governor() routines STOP/EXIT
> >sequence.
> 
> Is this stale text? Seems like this is now done in the *previous* patch?

No, the previous patch has fixed a single location only. The next
patch tried to fix others as well.

-- 
viresh

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH V2 0/7] cpufreq: governors: Fix ABBA lockups
  2016-02-03 23:50           ` Rafael J. Wysocki
@ 2016-02-04  5:51             ` Viresh Kumar
  2016-02-04 11:09             ` Viresh Kumar
  1 sibling, 0 replies; 36+ messages in thread
From: Viresh Kumar @ 2016-02-04  5:51 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Shilpa Bhat, Juri Lelli, Rafael Wysocki, Lists linaro-kernel,
	linux-pm, Saravana Kannan, Peter Zijlstra, Michael Turquette,
	Steve Muckle, Vincent Guittot, Morten Rasmussen,
	dietmar.eggemann, Linux Kernel Mailing List

On 04-02-16, 00:50, Rafael J. Wysocki wrote:
> On Thu, Feb 4, 2016 at 12:31 AM, Shilpa Bhat <shilpabhatppc@gmail.com> wrote:
> > Sorry for the delayed report. But I see the below backtrace on Power8 box. It
> > has 4 chips with 128 cpus.

Honestly, I wasn't expecting you to test this stuff, but I really
appreciate you doing that.

Thanks a lot ..

> > [  906.765768]  Possible unsafe locking scenario:
> >
> > [  906.765880]        CPU0                    CPU1
> > [  906.765969]        ----                    ----

This race scenario is perhaps incomplete and difficult to understand
without below lines:

                          Governor's EXIT         Update sampling rate from sysfs

                                                  lock(s_active#91);

> > [  906.766058]   lock(od_dbs_cdata.mutex);
> > [  906.766170]                                lock(&dbs_data->mutex);
> > [  906.766304]                                lock(od_dbs_cdata.mutex);
> > [  906.766461]   lock(s_active#91);
> > [  906.766572]
> >  *** DEADLOCK ***
> 
> This is exactly right.  We've avoided one deadlock only to trip into
> another one.

As we discussed on IRC, we haven't introduced this deadlock with the
current series.  But this is what Juri has reported some days back,
while he tested linus/master on TC2.

> This happens because update_sampling_rate() acquires
> od_dbs_cdata.mutex which is held around cpufreq_governor_exit() by
> cpufreq_governor_dbs().
> 
> Worse yet, a deadlock can still happen without (the new)
> dbs_data->mutex, just between s_active and od_dbs_cdata.mutex if
> update_sampling_rate() runs in parallel with
> cpufreq_governor_dbs()->cpufreq_governor_exit() and the latter wins
> the race.
> 
> It looks like we need to drop the governor mutex before putting the
> kobject in cpufreq_governor_exit().

That wouldn't be trivial to implement as we discussed.

Okay, here is a proposal for the current series and the series's you
have post Rafael:

- Firstly, I would like to clarify that I don't have any issues with
  rebasing on top of your series, it should be easy enough.

- One thing is for sure that nothing from these 3 series's is getting
  merged in 4.5, as we aren't fixing the real issue Shilpa/Juril have
  reported.

- I think the first 4 patches here are just fine and don't need any
  updates. They actually do the right thing and makes code so much
  cleaner.

- So, can we apply the first 4 patches (which  you have already
  applied to bleeding-edge) now and do all work on top of that ?

Again, I can rebase if you merge your patches first, no issues at all
:)

-- 
viresh

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH V2 0/7] cpufreq: governors: Fix ABBA lockups
  2016-02-03 16:10   ` Viresh Kumar
  2016-02-03 17:20     ` Juri Lelli
@ 2016-02-04  6:24     ` Viresh Kumar
  2016-02-04 12:17       ` Viresh Kumar
  1 sibling, 1 reply; 36+ messages in thread
From: Viresh Kumar @ 2016-02-04  6:24 UTC (permalink / raw)
  To: Juri Lelli
  Cc: Rafael Wysocki, linaro-kernel, linux-pm, skannan, peterz,
	mturquette, steve.muckle, vincent.guittot, morten.rasmussen,
	dietmar.eggemann, linux-kernel

On 03-02-16, 21:40, Viresh Kumar wrote:
> On 03-02-16, 15:54, Juri Lelli wrote:
> > Ouch, I've just got this executing -f basic on Juno. :(
> > It happens with the hotplug_1_by_1 test.
> > 
> > 
> > [ 1086.531252] IRQ1 no longer affine to CPU1
> > [ 1086.531495] CPU1: shutdown
> > [ 1086.538199] psci: CPU1 killed.
> > [ 1086.583396]
> > [ 1086.584881] ======================================================
> > [ 1086.590999] [ INFO: possible circular locking dependency detected ]
> > [ 1086.597205] 4.5.0-rc2+ #37 Not tainted
> > [ 1086.600914] -------------------------------------------------------
> > [ 1086.607118] runme.sh/1052 is trying to acquire lock:
> > [ 1086.612031]  (sb_writers#7){.+.+.+}, at: [<ffffffc000249500>] __sb_start_write+0xcc/0xe0
> > [ 1086.620090]
> > [ 1086.620090] but task is already holding lock:
> > [ 1086.625865]  (&policy->rwsem){+++++.}, at: [<ffffffc0005c8ee4>] cpufreq_offline+0x7c/0x278
> > [ 1086.634081]
> > [ 1086.634081] which lock already depends on the new lock.
> > [ 1086.634081]
> > [ 1086.642180]
> > [ 1086.642180] the existing dependency chain (in reverse order) is:
> > [ 1086.649589]
> > -> #1 (&policy->rwsem){+++++.}:
> > [ 1086.653929]        [<ffffffc00011d9a4>] check_prev_add+0x670/0x754
> > [ 1086.660060]        [<ffffffc00011e1ac>] validate_chain.isra.36+0x724/0xa0c
> > [ 1086.666876]        [<ffffffc00011f904>] __lock_acquire+0x4e4/0xba0
> > [ 1086.673001]        [<ffffffc000120b58>] lock_release+0x244/0x570
> > [ 1086.678955]        [<ffffffc0007351d0>] __mutex_unlock_slowpath+0xa0/0x18c
> > [ 1086.685771]        [<ffffffc0007352dc>] mutex_unlock+0x20/0x2c
> > [ 1086.691553]        [<ffffffc0002ccd24>] kernfs_fop_write+0xb0/0x194
> > [ 1086.697768]        [<ffffffc00024478c>] __vfs_write+0x48/0x104
> > [ 1086.703550]        [<ffffffc0002457a4>] vfs_write+0x98/0x198
> > [ 1086.709161]        [<ffffffc0002465e4>] SyS_write+0x54/0xb0
> > [ 1086.714684]        [<ffffffc000085d30>] el0_svc_naked+0x24/0x28
> > [ 1086.720555]
> > -> #0 (sb_writers#7){.+.+.+}:
> > [ 1086.724730]        [<ffffffc00011c574>] print_circular_bug+0x80/0x2e4
> > [ 1086.731116]        [<ffffffc00011d470>] check_prev_add+0x13c/0x754
> > [ 1086.737243]        [<ffffffc00011e1ac>] validate_chain.isra.36+0x724/0xa0c
> > [ 1086.744059]        [<ffffffc00011f904>] __lock_acquire+0x4e4/0xba0
> > [ 1086.750184]        [<ffffffc0001207f4>] lock_acquire+0xe4/0x204
> > [ 1086.756052]        [<ffffffc000118da0>] percpu_down_read+0x50/0xe4
> > [ 1086.762180]        [<ffffffc000249500>] __sb_start_write+0xcc/0xe0
> > [ 1086.768306]        [<ffffffc00026ae90>] mnt_want_write+0x28/0x54
> > [ 1086.774263]        [<ffffffc0002555f8>] do_last+0x660/0xcb8
> > [ 1086.779788]        [<ffffffc000255cdc>] path_openat+0x8c/0x2b0
> > [ 1086.785570]        [<ffffffc000256fbc>] do_filp_open+0x78/0xf0
> > [ 1086.791353]        [<ffffffc000244058>] do_sys_open+0x150/0x214
> > [ 1086.797222]        [<ffffffc0002441a0>] SyS_openat+0x3c/0x48
> > [ 1086.802831]        [<ffffffc000085d30>] el0_svc_naked+0x24/0x28
> > [ 1086.808700]
> > [ 1086.808700] other info that might help us debug this:
> > [ 1086.808700]
> > [ 1086.816627]  Possible unsafe locking scenario:
> > [ 1086.816627]
> > [ 1086.822488]        CPU0                    CPU1
> > [ 1086.826971]        ----                    ----
> > [ 1086.831453]   lock(&policy->rwsem);
> > [ 1086.834918]                                lock(sb_writers#7);
> > [ 1086.840713]                                lock(&policy->rwsem);
> > [ 1086.846671]   lock(sb_writers#7);
> > [ 1086.849972]
> > [ 1086.849972]  *** DEADLOCK ***
> > [ 1086.849972]
> > [ 1086.855836] 1 lock held by runme.sh/1052:
> > [ 1086.859802]  #0:  (&policy->rwsem){+++++.}, at: [<ffffffc0005c8ee4>] cpufreq_offline+0x7c/0x278
> > [ 1086.868453]
> > [ 1086.868453] stack backtrace:
> > [ 1086.872769] CPU: 5 PID: 1052 Comm: runme.sh Not tainted 4.5.0-rc2+ #37
> > [ 1086.879229] Hardware name: ARM Juno development board (r2) (DT)
> > [ 1086.885089] Call trace:
> > [ 1086.887511] [<ffffffc00008a788>] dump_backtrace+0x0/0x1f4
> > [ 1086.892858] [<ffffffc00008a99c>] show_stack+0x20/0x28
> > [ 1086.897861] [<ffffffc00041a380>] dump_stack+0x84/0xc0
> > [ 1086.902863] [<ffffffc00011c6c8>] print_circular_bug+0x1d4/0x2e4
> > [ 1086.908725] [<ffffffc00011d470>] check_prev_add+0x13c/0x754
> > [ 1086.914244] [<ffffffc00011e1ac>] validate_chain.isra.36+0x724/0xa0c
> > [ 1086.920448] [<ffffffc00011f904>] __lock_acquire+0x4e4/0xba0
> > [ 1086.925965] [<ffffffc0001207f4>] lock_acquire+0xe4/0x204
> > [ 1086.931224] [<ffffffc000118da0>] percpu_down_read+0x50/0xe4
> > [ 1086.936742] [<ffffffc000249500>] __sb_start_write+0xcc/0xe0
> > [ 1086.942260] [<ffffffc00026ae90>] mnt_want_write+0x28/0x54
> > [ 1086.947605] [<ffffffc0002555f8>] do_last+0x660/0xcb8
> > [ 1086.952520] [<ffffffc000255cdc>] path_openat+0x8c/0x2b0
> > [ 1086.957693] [<ffffffc000256fbc>] do_filp_open+0x78/0xf0
> > [ 1086.962865] [<ffffffc000244058>] do_sys_open+0x150/0x214
> > [ 1086.968123] [<ffffffc0002441a0>] SyS_openat+0x3c/0x48
> > [ 1086.973124] [<ffffffc000085d30>] el0_svc_naked+0x24/0x28
> > [ 1087.019315] Detected PIPT I-cache on CPU1
> > [ 1087.019373] CPU1: Booted secondary processor [410fd080]
> 
> Urg..

Urg square :(

> I failed to understand it for now though. Please test only the first 4
> patches and leave the bottom three. AFAICT, this is caused by the 6th
> patch.

>From the code I still failed to understand this since sometime back
and I something just caught my eyes and the 6th patch needs this
fixup:

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 7bc8a5ed97e5..ac3348ecde7b 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -1351,7 +1351,7 @@ static void cpufreq_offline(unsigned int cpu)
                                pr_err("%s: Failed to start governor\n", __func__);
                }
 
-               return;
+               goto unlock;
        }
 
        if (cpufreq_driver->stop_cpu)
@@ -1373,6 +1373,8 @@ static void cpufreq_offline(unsigned int cpu)
                cpufreq_driver->exit(policy);
                policy->freq_table = NULL;
        }
+
+unlock:
        up_write(&policy->rwsem);
 }

I tried the basic tests using './runme' and they aren't reporting the
same lockdep now. And yes, your lockdep occurred on my exynos board as
well :)

I have re-pushed my patches again to the same branch. All 7 look fine
to me now :)

-- 
viresh

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH V2 7/7] cpufreq: Remove cpufreq_governor_lock
  2016-02-03 14:02 ` [PATCH V2 7/7] cpufreq: Remove cpufreq_governor_lock Viresh Kumar
@ 2016-02-04  6:43   ` Viresh Kumar
  0 siblings, 0 replies; 36+ messages in thread
From: Viresh Kumar @ 2016-02-04  6:43 UTC (permalink / raw)
  To: Rafael Wysocki, juri.lelli
  Cc: linaro-kernel, linux-pm, skannan, peterz, mturquette,
	steve.muckle, vincent.guittot, morten.rasmussen,
	dietmar.eggemann, linux-kernel

On 03-02-16, 19:32, Viresh Kumar wrote:
> We used to drop policy->rwsem just before calling __cpufreq_governor()
> in some cases earlier and so it was possible that __cpufreq_governor()
> runs concurrently via separate threads.
> 
> In order to guarantee valid state transitions for governors,
> 'governor_enabled' was required to be protected using some locking and
> we created cpufreq_governor_lock for that.
> 
> But now, __cpufreq_governor() is always called from within policy->rwsem
> held and so 'governor_enabled' is protected against races even without
> cpufreq_governor_lock.
> 
> Get rid of the extra lock now.
> 
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> ---
>  drivers/cpufreq/cpufreq.c | 7 -------
>  1 file changed, 7 deletions(-)
> 
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 4fc3889ca7c9..7bc8a5ed97e5 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -102,7 +102,6 @@ static LIST_HEAD(cpufreq_governor_list);
>  static struct cpufreq_driver *cpufreq_driver;
>  static DEFINE_PER_CPU(struct cpufreq_policy *, cpufreq_cpu_data);
>  static DEFINE_RWLOCK(cpufreq_driver_lock);
> -DEFINE_MUTEX(cpufreq_governor_lock);
>  
>  /* Flag to suspend/resume CPUFreq governors */
>  static bool cpufreq_suspended;
> @@ -1963,11 +1962,9 @@ static int __cpufreq_governor(struct cpufreq_policy *policy,
>  
>  	pr_debug("%s: for CPU %u, event %u\n", __func__, policy->cpu, event);
>  
> -	mutex_lock(&cpufreq_governor_lock);
>  	if ((policy->governor_enabled && event == CPUFREQ_GOV_START)
>  	    || (!policy->governor_enabled
>  	    && (event == CPUFREQ_GOV_LIMITS || event == CPUFREQ_GOV_STOP))) {
> -		mutex_unlock(&cpufreq_governor_lock);
>  		return -EBUSY;
>  	}
>  
> @@ -1976,8 +1973,6 @@ static int __cpufreq_governor(struct cpufreq_policy *policy,
>  	else if (event == CPUFREQ_GOV_START)
>  		policy->governor_enabled = true;
>  
> -	mutex_unlock(&cpufreq_governor_lock);
> -
>  	ret = policy->governor->governor(policy, event);
>  
>  	if (!ret) {
> @@ -1987,12 +1982,10 @@ static int __cpufreq_governor(struct cpufreq_policy *policy,
>  			policy->governor->initialized--;
>  	} else {
>  		/* Restore original values */
> -		mutex_lock(&cpufreq_governor_lock);
>  		if (event == CPUFREQ_GOV_STOP)
>  			policy->governor_enabled = true;
>  		else if (event == CPUFREQ_GOV_START)
>  			policy->governor_enabled = false;
> -		mutex_unlock(&cpufreq_governor_lock);
>  	}
>  
>  	if (((event == CPUFREQ_GOV_POLICY_INIT) && ret) ||

+ minor cleanup:

diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h
index ed328a39c4ac..7bed63e14e7d 100644
--- a/drivers/cpufreq/cpufreq_governor.h
+++ b/drivers/cpufreq/cpufreq_governor.h
@@ -214,7 +214,6 @@ static inline int delay_for_sampling_rate(unsigned int sampling_rate)
        return delay;
 }
 
-extern struct mutex cpufreq_governor_lock;
 extern const struct sysfs_ops governor_sysfs_ops;
 
 void gov_add_timers(struct cpufreq_policy *policy, unsigned int delay);

-- 
viresh

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH V2 0/7] cpufreq: governors: Fix ABBA lockups
  2016-02-03 23:50           ` Rafael J. Wysocki
  2016-02-04  5:51             ` Viresh Kumar
@ 2016-02-04 11:09             ` Viresh Kumar
  2016-02-04 17:43               ` Saravana Kannan
  1 sibling, 1 reply; 36+ messages in thread
From: Viresh Kumar @ 2016-02-04 11:09 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Shilpa Bhat, Juri Lelli, Rafael Wysocki, Lists linaro-kernel,
	linux-pm, Saravana Kannan, Peter Zijlstra, Michael Turquette,
	Steve Muckle, Vincent Guittot, Morten Rasmussen,
	dietmar.eggemann, Linux Kernel Mailing List

On 04-02-16, 00:50, Rafael J. Wysocki wrote:
> This is exactly right.  We've avoided one deadlock only to trip into
> another one.
> 
> This happens because update_sampling_rate() acquires
> od_dbs_cdata.mutex which is held around cpufreq_governor_exit() by
> cpufreq_governor_dbs().
> 
> Worse yet, a deadlock can still happen without (the new)
> dbs_data->mutex, just between s_active and od_dbs_cdata.mutex if
> update_sampling_rate() runs in parallel with
> cpufreq_governor_dbs()->cpufreq_governor_exit() and the latter wins
> the race.
> 
> It looks like we need to drop the governor mutex before putting the
> kobject in cpufreq_governor_exit().

I have tried to explore all possible ways of fixing this, and every
other way looked to be racy in some way.

Does anyone else have a better idea (untested):

-------------------------8<-------------------------

Subject: [PATCH] cpufreq: ondemand: Shoot update_sampling_rate with a separate
 work

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/cpufreq_governor.h |  2 ++
 drivers/cpufreq/cpufreq_ondemand.c | 39 +++++++++++++++++++++++++++++---------
 2 files changed, 32 insertions(+), 9 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h
index 7bed63e14e7d..97e604356b20 100644
--- a/drivers/cpufreq/cpufreq_governor.h
+++ b/drivers/cpufreq/cpufreq_governor.h
@@ -141,6 +141,8 @@ struct od_dbs_tuners {
 	unsigned int powersave_bias;
 	unsigned int io_is_busy;
 	unsigned int min_sampling_rate;
+	struct work_struct work;
+	struct dbs_data *dbs_data;
 };
 
 struct cs_dbs_tuners {
diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
index 82ed490f7de0..93ad7a226aee 100644
--- a/drivers/cpufreq/cpufreq_ondemand.c
+++ b/drivers/cpufreq/cpufreq_ondemand.c
@@ -242,20 +242,27 @@ static struct common_dbs_data od_dbs_cdata;
  * reducing the sampling rate, we need to make the new value effective
  * immediately.
  */
-static void update_sampling_rate(struct dbs_data *dbs_data,
-		unsigned int new_rate)
+static void update_sampling_rate(struct work_struct *work)
 {
-	struct od_dbs_tuners *od_tuners = dbs_data->tuners;
+	struct od_dbs_tuners *od_tuners = container_of(work, struct
+						       od_dbs_tuners, work);
+	unsigned int new_rate = od_tuners->sampling_rate;
+	struct dbs_data *dbs_data = od_tuners->dbs_data;
 	struct cpumask cpumask;
 	int cpu;
 
-	od_tuners->sampling_rate = new_rate = max(new_rate,
-			od_tuners->min_sampling_rate);
-
 	/*
 	 * Lock governor so that governor start/stop can't execute in parallel.
+	 *
+	 * We can't do a regular mutex_lock() here, as that may deadlock against
+	 * another thread performing CPUFREQ_GOV_POLICY_EXIT event on the
+	 * governor, which might have already taken od_dbs_cdata.mutex and is
+	 * waiting for this work to finish.
 	 */
-	mutex_lock(&od_dbs_cdata.mutex);
+	if (!mutex_trylock(&od_dbs_cdata.mutex)) {
+		queue_work(system_wq, &od_tuners->work);
+		return;
+	}
 
 	cpumask_copy(&cpumask, cpu_online_mask);
 
@@ -311,13 +318,22 @@ static void update_sampling_rate(struct dbs_data *dbs_data,
 static ssize_t store_sampling_rate(struct dbs_data *dbs_data, const char *buf,
 		size_t count)
 {
+	struct od_dbs_tuners *od_tuners = dbs_data->tuners;
 	unsigned int input;
 	int ret;
 	ret = sscanf(buf, "%u", &input);
 	if (ret != 1)
 		return -EINVAL;
 
-	update_sampling_rate(dbs_data, input);
+	od_tuners->sampling_rate = max(input, od_tuners->min_sampling_rate);
+
+	/*
+	 * update_sampling_rate() requires to hold od_dbs_cdata.mutex, but we
+	 * can't take that from this thread, otherwise it results in ABBA
+	 * lockdep between s_active and od_dbs_cdata.mutex locks.
+	 */
+	queue_work(system_wq, &od_tuners->work);
+
 	return count;
 }
 
@@ -501,6 +517,8 @@ static int od_init(struct dbs_data *dbs_data, bool notify)
 	tuners->ignore_nice_load = 0;
 	tuners->powersave_bias = default_powersave_bias;
 	tuners->io_is_busy = should_io_be_busy();
+	INIT_WORK(&tuners->work, update_sampling_rate);
+	tuners->dbs_data = dbs_data;
 
 	dbs_data->tuners = tuners;
 	return 0;
@@ -508,7 +526,10 @@ static int od_init(struct dbs_data *dbs_data, bool notify)
 
 static void od_exit(struct dbs_data *dbs_data, bool notify)
 {
-	kfree(dbs_data->tuners);
+	struct od_dbs_tuners *tuners = dbs_data->tuners;
+
+	cancel_work_sync(&tuners->work);
+	kfree(tuners);
 }
 
 define_get_cpu_dbs_routines(od_cpu_dbs_info);

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH V2 0/7] cpufreq: governors: Fix ABBA lockups
  2016-02-04  6:24     ` Viresh Kumar
@ 2016-02-04 12:17       ` Viresh Kumar
  2016-02-04 20:50         ` Shilpasri G Bhat
  0 siblings, 1 reply; 36+ messages in thread
From: Viresh Kumar @ 2016-02-04 12:17 UTC (permalink / raw)
  To: Juri Lelli
  Cc: Rafael Wysocki, linaro-kernel, linux-pm, skannan, peterz,
	mturquette, steve.muckle, vincent.guittot, morten.rasmussen,
	dietmar.eggemann, linux-kernel

On 04-02-16, 11:54, Viresh Kumar wrote:
> From the code I still failed to understand this since sometime back
> and I something just caught my eyes and the 6th patch needs this
> fixup:
> 
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 7bc8a5ed97e5..ac3348ecde7b 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -1351,7 +1351,7 @@ static void cpufreq_offline(unsigned int cpu)
>                                 pr_err("%s: Failed to start governor\n", __func__);
>                 }
>  
> -               return;
> +               goto unlock;
>         }
>  
>         if (cpufreq_driver->stop_cpu)
> @@ -1373,6 +1373,8 @@ static void cpufreq_offline(unsigned int cpu)
>                 cpufreq_driver->exit(policy);
>                 policy->freq_table = NULL;
>         }
> +
> +unlock:
>         up_write(&policy->rwsem);
>  }
> 
> I tried the basic tests using './runme' and they aren't reporting the
> same lockdep now. And yes, your lockdep occurred on my exynos board as
> well :)
> 
> I have re-pushed my patches again to the same branch. All 7 look fine
> to me now :)

FWIW, Juri has reported on IRC that the above diff fixed the lockdep
he reported yesterday and all the 7 patches are working fine on his
test machine, Juno.

-- 
viresh

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH V2 0/7] cpufreq: governors: Fix ABBA lockups
  2016-02-04 11:09             ` Viresh Kumar
@ 2016-02-04 17:43               ` Saravana Kannan
  2016-02-04 17:44                 ` Saravana Kannan
  0 siblings, 1 reply; 36+ messages in thread
From: Saravana Kannan @ 2016-02-04 17:43 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael J. Wysocki, Shilpa Bhat, Juri Lelli, Rafael Wysocki,
	Lists linaro-kernel, linux-pm, Peter Zijlstra, Michael Turquette,
	Steve Muckle, Vincent Guittot, Morten Rasmussen,
	dietmar.eggemann, Linux Kernel Mailing List

On 02/04/2016 03:09 AM, Viresh Kumar wrote:
> On 04-02-16, 00:50, Rafael J. Wysocki wrote:
>> This is exactly right.  We've avoided one deadlock only to trip into
>> another one.
>>
>> This happens because update_sampling_rate() acquires
>> od_dbs_cdata.mutex which is held around cpufreq_governor_exit() by
>> cpufreq_governor_dbs().
>>
>> Worse yet, a deadlock can still happen without (the new)
>> dbs_data->mutex, just between s_active and od_dbs_cdata.mutex if
>> update_sampling_rate() runs in parallel with
>> cpufreq_governor_dbs()->cpufreq_governor_exit() and the latter wins
>> the race.
>>
>> It looks like we need to drop the governor mutex before putting the
>> kobject in cpufreq_governor_exit().
>
> I have tried to explore all possible ways of fixing this, and every
> other way looked to be racy in some way.
>
> Does anyone else have a better idea (untested):
>
> -------------------------8<-------------------------
>
> Subject: [PATCH] cpufreq: ondemand: Shoot update_sampling_rate with a separate
>   work
>
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> ---
>   drivers/cpufreq/cpufreq_governor.h |  2 ++
>   drivers/cpufreq/cpufreq_ondemand.c | 39 +++++++++++++++++++++++++++++---------
>   2 files changed, 32 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h
> index 7bed63e14e7d..97e604356b20 100644
> --- a/drivers/cpufreq/cpufreq_governor.h
> +++ b/drivers/cpufreq/cpufreq_governor.h
> @@ -141,6 +141,8 @@ struct od_dbs_tuners {
>   	unsigned int powersave_bias;
>   	unsigned int io_is_busy;
>   	unsigned int min_sampling_rate;
> +	struct work_struct work;
> +	struct dbs_data *dbs_data;
>   };
>
>   struct cs_dbs_tuners {
> diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
> index 82ed490f7de0..93ad7a226aee 100644
> --- a/drivers/cpufreq/cpufreq_ondemand.c
> +++ b/drivers/cpufreq/cpufreq_ondemand.c
> @@ -242,20 +242,27 @@ static struct common_dbs_data od_dbs_cdata;
>    * reducing the sampling rate, we need to make the new value effective
>    * immediately.
>    */
> -static void update_sampling_rate(struct dbs_data *dbs_data,
> -		unsigned int new_rate)
> +static void update_sampling_rate(struct work_struct *work)
>   {
> -	struct od_dbs_tuners *od_tuners = dbs_data->tuners;
> +	struct od_dbs_tuners *od_tuners = container_of(work, struct
> +						       od_dbs_tuners, work);
> +	unsigned int new_rate = od_tuners->sampling_rate;
> +	struct dbs_data *dbs_data = od_tuners->dbs_data;
>   	struct cpumask cpumask;
>   	int cpu;
>
> -	od_tuners->sampling_rate = new_rate = max(new_rate,
> -			od_tuners->min_sampling_rate);
> -
>   	/*
>   	 * Lock governor so that governor start/stop can't execute in parallel.
> +	 *
> +	 * We can't do a regular mutex_lock() here, as that may deadlock against
> +	 * another thread performing CPUFREQ_GOV_POLICY_EXIT event on the
> +	 * governor, which might have already taken od_dbs_cdata.mutex and is
> +	 * waiting for this work to finish.
>   	 */
> -	mutex_lock(&od_dbs_cdata.mutex);
> +	if (!mutex_trylock(&od_dbs_cdata.mutex)) {
> +		queue_work(system_wq, &od_tuners->work);
> +		return;
> +	}
>
>   	cpumask_copy(&cpumask, cpu_online_mask);
>
> @@ -311,13 +318,22 @@ static void update_sampling_rate(struct dbs_data *dbs_data,
>   static ssize_t store_sampling_rate(struct dbs_data *dbs_data, const char *buf,
>   		size_t count)
>   {
> +	struct od_dbs_tuners *od_tuners = dbs_data->tuners;
>   	unsigned int input;
>   	int ret;
>   	ret = sscanf(buf, "%u", &input);
>   	if (ret != 1)
>   		return -EINVAL;
>
> -	update_sampling_rate(dbs_data, input);
> +	od_tuners->sampling_rate = max(input, od_tuners->min_sampling_rate);
> +
> +	/*
> +	 * update_sampling_rate() requires to hold od_dbs_cdata.mutex, but we
> +	 * can't take that from this thread, otherwise it results in ABBA
> +	 * lockdep between s_active and od_dbs_cdata.mutex locks.
> +	 */
> +	queue_work(system_wq, &od_tuners->work);
> +
>   	return count;
>   }
>
> @@ -501,6 +517,8 @@ static int od_init(struct dbs_data *dbs_data, bool notify)
>   	tuners->ignore_nice_load = 0;
>   	tuners->powersave_bias = default_powersave_bias;
>   	tuners->io_is_busy = should_io_be_busy();
> +	INIT_WORK(&tuners->work, update_sampling_rate);
> +	tuners->dbs_data = dbs_data;
>
>   	dbs_data->tuners = tuners;
>   	return 0;
> @@ -508,7 +526,10 @@ static int od_init(struct dbs_data *dbs_data, bool notify)
>
>   static void od_exit(struct dbs_data *dbs_data, bool notify)
>   {
> -	kfree(dbs_data->tuners);
> +	struct od_dbs_tuners *tuners = dbs_data->tuners;
> +
> +	cancel_work_sync(&tuners->work);
> +	kfree(tuners);
>   }
>
>   define_get_cpu_dbs_routines(od_cpu_dbs_info);
>

No no no no! Let's not open up this can of worms of queuing up the work 
to handle a write to a sysfs file. It *MIGHT* work for this specific 
tunable (I haven't bothered to analyze), but this makes it impossible to 
return a useful/proper error value.

-Saravana

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH V2 0/7] cpufreq: governors: Fix ABBA lockups
  2016-02-04 17:43               ` Saravana Kannan
@ 2016-02-04 17:44                 ` Saravana Kannan
  2016-02-04 18:18                   ` Rafael J. Wysocki
  0 siblings, 1 reply; 36+ messages in thread
From: Saravana Kannan @ 2016-02-04 17:44 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael J. Wysocki, Shilpa Bhat, Juri Lelli, Rafael Wysocki,
	Lists linaro-kernel, linux-pm, Peter Zijlstra, Michael Turquette,
	Steve Muckle, Vincent Guittot, Morten Rasmussen,
	dietmar.eggemann, Linux Kernel Mailing List

On 02/04/2016 09:43 AM, Saravana Kannan wrote:
> On 02/04/2016 03:09 AM, Viresh Kumar wrote:
>> On 04-02-16, 00:50, Rafael J. Wysocki wrote:
>>> This is exactly right.  We've avoided one deadlock only to trip into
>>> another one.
>>>
>>> This happens because update_sampling_rate() acquires
>>> od_dbs_cdata.mutex which is held around cpufreq_governor_exit() by
>>> cpufreq_governor_dbs().
>>>
>>> Worse yet, a deadlock can still happen without (the new)
>>> dbs_data->mutex, just between s_active and od_dbs_cdata.mutex if
>>> update_sampling_rate() runs in parallel with
>>> cpufreq_governor_dbs()->cpufreq_governor_exit() and the latter wins
>>> the race.
>>>
>>> It looks like we need to drop the governor mutex before putting the
>>> kobject in cpufreq_governor_exit().
>>
>> I have tried to explore all possible ways of fixing this, and every
>> other way looked to be racy in some way.
>>
>> Does anyone else have a better idea (untested):
>>
>> -------------------------8<-------------------------
>>
>> Subject: [PATCH] cpufreq: ondemand: Shoot update_sampling_rate with a
>> separate
>>   work
>>
>> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
>> ---
>>   drivers/cpufreq/cpufreq_governor.h |  2 ++
>>   drivers/cpufreq/cpufreq_ondemand.c | 39
>> +++++++++++++++++++++++++++++---------
>>   2 files changed, 32 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/cpufreq/cpufreq_governor.h
>> b/drivers/cpufreq/cpufreq_governor.h
>> index 7bed63e14e7d..97e604356b20 100644
>> --- a/drivers/cpufreq/cpufreq_governor.h
>> +++ b/drivers/cpufreq/cpufreq_governor.h
>> @@ -141,6 +141,8 @@ struct od_dbs_tuners {
>>       unsigned int powersave_bias;
>>       unsigned int io_is_busy;
>>       unsigned int min_sampling_rate;
>> +    struct work_struct work;
>> +    struct dbs_data *dbs_data;
>>   };
>>
>>   struct cs_dbs_tuners {
>> diff --git a/drivers/cpufreq/cpufreq_ondemand.c
>> b/drivers/cpufreq/cpufreq_ondemand.c
>> index 82ed490f7de0..93ad7a226aee 100644
>> --- a/drivers/cpufreq/cpufreq_ondemand.c
>> +++ b/drivers/cpufreq/cpufreq_ondemand.c
>> @@ -242,20 +242,27 @@ static struct common_dbs_data od_dbs_cdata;
>>    * reducing the sampling rate, we need to make the new value effective
>>    * immediately.
>>    */
>> -static void update_sampling_rate(struct dbs_data *dbs_data,
>> -        unsigned int new_rate)
>> +static void update_sampling_rate(struct work_struct *work)
>>   {
>> -    struct od_dbs_tuners *od_tuners = dbs_data->tuners;
>> +    struct od_dbs_tuners *od_tuners = container_of(work, struct
>> +                               od_dbs_tuners, work);
>> +    unsigned int new_rate = od_tuners->sampling_rate;
>> +    struct dbs_data *dbs_data = od_tuners->dbs_data;
>>       struct cpumask cpumask;
>>       int cpu;
>>
>> -    od_tuners->sampling_rate = new_rate = max(new_rate,
>> -            od_tuners->min_sampling_rate);
>> -
>>       /*
>>        * Lock governor so that governor start/stop can't execute in
>> parallel.
>> +     *
>> +     * We can't do a regular mutex_lock() here, as that may deadlock
>> against
>> +     * another thread performing CPUFREQ_GOV_POLICY_EXIT event on the
>> +     * governor, which might have already taken od_dbs_cdata.mutex
>> and is
>> +     * waiting for this work to finish.
>>        */
>> -    mutex_lock(&od_dbs_cdata.mutex);
>> +    if (!mutex_trylock(&od_dbs_cdata.mutex)) {
>> +        queue_work(system_wq, &od_tuners->work);
>> +        return;
>> +    }
>>
>>       cpumask_copy(&cpumask, cpu_online_mask);
>>
>> @@ -311,13 +318,22 @@ static void update_sampling_rate(struct dbs_data
>> *dbs_data,
>>   static ssize_t store_sampling_rate(struct dbs_data *dbs_data, const
>> char *buf,
>>           size_t count)
>>   {
>> +    struct od_dbs_tuners *od_tuners = dbs_data->tuners;
>>       unsigned int input;
>>       int ret;
>>       ret = sscanf(buf, "%u", &input);
>>       if (ret != 1)
>>           return -EINVAL;
>>
>> -    update_sampling_rate(dbs_data, input);
>> +    od_tuners->sampling_rate = max(input, od_tuners->min_sampling_rate);
>> +
>> +    /*
>> +     * update_sampling_rate() requires to hold od_dbs_cdata.mutex,
>> but we
>> +     * can't take that from this thread, otherwise it results in ABBA
>> +     * lockdep between s_active and od_dbs_cdata.mutex locks.
>> +     */
>> +    queue_work(system_wq, &od_tuners->work);
>> +
>>       return count;
>>   }
>>
>> @@ -501,6 +517,8 @@ static int od_init(struct dbs_data *dbs_data, bool
>> notify)
>>       tuners->ignore_nice_load = 0;
>>       tuners->powersave_bias = default_powersave_bias;
>>       tuners->io_is_busy = should_io_be_busy();
>> +    INIT_WORK(&tuners->work, update_sampling_rate);
>> +    tuners->dbs_data = dbs_data;
>>
>>       dbs_data->tuners = tuners;
>>       return 0;
>> @@ -508,7 +526,10 @@ static int od_init(struct dbs_data *dbs_data,
>> bool notify)
>>
>>   static void od_exit(struct dbs_data *dbs_data, bool notify)
>>   {
>> -    kfree(dbs_data->tuners);
>> +    struct od_dbs_tuners *tuners = dbs_data->tuners;
>> +
>> +    cancel_work_sync(&tuners->work);
>> +    kfree(tuners);
>>   }
>>
>>   define_get_cpu_dbs_routines(od_cpu_dbs_info);
>>
>
> No no no no! Let's not open up this can of worms of queuing up the work
> to handle a write to a sysfs file. It *MIGHT* work for this specific
> tunable (I haven't bothered to analyze), but this makes it impossible to
> return a useful/proper error value.

Sent too soon. Not only that, but it can also cause the writes to the 
sysfs files to get processed in a different order and I don't know what 
other issues/races THAT will open up.

-Saravana

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH V2 0/7] cpufreq: governors: Fix ABBA lockups
  2016-02-04 17:44                 ` Saravana Kannan
@ 2016-02-04 18:18                   ` Rafael J. Wysocki
  2016-02-05  2:44                     ` Viresh Kumar
  2016-02-05  3:54                     ` Rafael J. Wysocki
  0 siblings, 2 replies; 36+ messages in thread
From: Rafael J. Wysocki @ 2016-02-04 18:18 UTC (permalink / raw)
  To: Saravana Kannan, Viresh Kumar
  Cc: Rafael J. Wysocki, Shilpa Bhat, Juri Lelli, Rafael Wysocki,
	Lists linaro-kernel, linux-pm, Peter Zijlstra, Michael Turquette,
	Steve Muckle, Vincent Guittot, Morten Rasmussen,
	dietmar.eggemann, Linux Kernel Mailing List

On Thu, Feb 4, 2016 at 6:44 PM, Saravana Kannan <skannan@codeaurora.org> wrote:
> On 02/04/2016 09:43 AM, Saravana Kannan wrote:
>>
>> On 02/04/2016 03:09 AM, Viresh Kumar wrote:
>>>
>>> On 04-02-16, 00:50, Rafael J. Wysocki wrote:
>>>>
>>>> This is exactly right.  We've avoided one deadlock only to trip into
>>>> another one.
>>>>
>>>> This happens because update_sampling_rate() acquires
>>>> od_dbs_cdata.mutex which is held around cpufreq_governor_exit() by
>>>> cpufreq_governor_dbs().
>>>>
>>>> Worse yet, a deadlock can still happen without (the new)
>>>> dbs_data->mutex, just between s_active and od_dbs_cdata.mutex if
>>>> update_sampling_rate() runs in parallel with
>>>> cpufreq_governor_dbs()->cpufreq_governor_exit() and the latter wins
>>>> the race.
>>>>
>>>> It looks like we need to drop the governor mutex before putting the
>>>> kobject in cpufreq_governor_exit().
>>>

[cut]

>>
>> No no no no! Let's not open up this can of worms of queuing up the work
>> to handle a write to a sysfs file. It *MIGHT* work for this specific
>> tunable (I haven't bothered to analyze), but this makes it impossible to
>> return a useful/proper error value.
>
>
> Sent too soon. Not only that, but it can also cause the writes to the sysfs
> files to get processed in a different order and I don't know what other
> issues/races THAT will open up.

Well, I don't like this too.

I actually do have an idea about how to fix these deadlocks, but it is
on top of my cleanup series.

I'll write more about it later today.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH V2 0/7] cpufreq: governors: Fix ABBA lockups
  2016-02-04 12:17       ` Viresh Kumar
@ 2016-02-04 20:50         ` Shilpasri G Bhat
  2016-02-05  2:49           ` Viresh Kumar
  0 siblings, 1 reply; 36+ messages in thread
From: Shilpasri G Bhat @ 2016-02-04 20:50 UTC (permalink / raw)
  To: Viresh Kumar, Juri Lelli
  Cc: Rafael Wysocki, linaro-kernel, linux-pm, skannan, peterz,
	mturquette, steve.muckle, vincent.guittot, morten.rasmussen,
	dietmar.eggemann, linux-kernel

Hi,

> 
> FWIW, Juri has reported on IRC that the above diff fixed the lockdep
> he reported yesterday and all the 7 patches are working fine on his
> test machine, Juno.
> 

I could see the previous lockdep warnings on pm/next and on top of patch[4].
On Patch[5-7] I see the below lockdep trace on running './runme' on a Power8 box.

[  710.332841] ======================================================
[  710.332911] [ INFO: possible circular locking dependency detected ]
[  710.332969] 4.5.0-rc2-sgb+ #104 Not tainted
[  710.333004] -------------------------------------------------------
[  710.333060] runme.sh/2476 is trying to acquire lock:
[  710.333107]  (s_active#91){++++.+}, at: [<c000000000407db8>]
kernfs_remove+0x48/0x70
[  710.333215]
but task is already holding lock:
[  710.333272]  (od_dbs_cdata.mutex){+.+.+.}, at: [<c000000000ad7434>]
cpufreq_governor_dbs+0x64/0x7e0
[  710.333388]
which lock already depends on the new lock.

[  710.333456]
the existing dependency chain (in reverse order) is:
[  710.333523]
-> #2 (od_dbs_cdata.mutex){+.+.+.}:
[  710.333604]        [<c000000000d48450>] mutex_lock_nested+0x90/0x590
[  710.333673]        [<c000000000ad5698>] update_sampling_rate+0x88/0x1c0
[  710.333741]        [<c000000000ad5830>] store_sampling_rate+0x60/0xa0
[  710.333809]        [<c000000000ad6990>] governor_store+0x80/0xc0
[  710.333865]        [<c00000000040a8a4>] sysfs_kf_write+0x94/0xc0
[  710.333923]        [<c0000000004094a8>] kernfs_fop_write+0x188/0x1f0
[  710.333991]        [<c000000000347b8c>] __vfs_write+0x6c/0x180
[  710.334049]        [<c0000000003490a0>] vfs_write+0xc0/0x200
[  710.334107]        [<c00000000034a3cc>] SyS_write+0x6c/0x110
[  710.334163]        [<c00000000000926c>] system_call+0x38/0xd0
[  710.334222]
-> #1 (&dbs_data->mutex){+.+...}:
[  710.334290]        [<c000000000d48450>] mutex_lock_nested+0x90/0x590
[  710.334358]        [<c000000000ad6960>] governor_store+0x50/0xc0
[  710.334415]        [<c00000000040a8a4>] sysfs_kf_write+0x94/0xc0
[  710.334471]        [<c0000000004094a8>] kernfs_fop_write+0x188/0x1f0
[  710.334539]        [<c000000000347b8c>] __vfs_write+0x6c/0x180
[  710.334596]        [<c0000000003490a0>] vfs_write+0xc0/0x200
[  710.334653]        [<c00000000034a3cc>] SyS_write+0x6c/0x110
[  710.334710]        [<c00000000000926c>] system_call+0x38/0xd0
[  710.334767]
-> #0 (s_active#91){++++.+}:
[  710.334847]        [<c00000000015f318>] lock_acquire+0xd8/0x1a0
[  710.334905]        [<c0000000004065f4>] __kernfs_remove+0x344/0x410
[  710.334973]        [<c000000000407db8>] kernfs_remove+0x48/0x70
[  710.335030]        [<c00000000040b868>] sysfs_remove_dir+0x78/0xd0
[  710.335098]        [<c0000000005eccec>] kobject_del+0x2c/0x80
[  710.335156]        [<c0000000005ec9e8>] kobject_release+0xa8/0x250
[  710.335224]        [<c000000000ad7ac8>] cpufreq_governor_dbs+0x6f8/0x7e0
[  710.335292]        [<c000000000ad4a7c>] od_cpufreq_governor_dbs+0x3c/0x60
[  710.335361]        [<c000000000acf7c4>] __cpufreq_governor+0x164/0x300
[  710.335429]        [<c000000000ad0600>] cpufreq_set_policy+0x3b0/0x450
[  710.335497]        [<c000000000ad117c>] store_scaling_governor+0x8c/0xf0
[  710.335565]        [<c000000000aced34>] store+0xb4/0x110
[  710.335622]        [<c00000000040a8a4>] sysfs_kf_write+0x94/0xc0
[  710.335679]        [<c0000000004094a8>] kernfs_fop_write+0x188/0x1f0
[  710.335747]        [<c000000000347b8c>] __vfs_write+0x6c/0x180
[  710.335803]        [<c0000000003490a0>] vfs_write+0xc0/0x200
[  710.335861]        [<c00000000034a3cc>] SyS_write+0x6c/0x110
[  710.335918]        [<c00000000000926c>] system_call+0x38/0xd0
[  710.335993]
other info that might help us debug this:

[  710.336130] Chain exists of:
  s_active#91 --> &dbs_data->mutex --> od_dbs_cdata.mutex

[  710.336376]  Possible unsafe locking scenario:

[  710.336488]        CPU0                    CPU1
[  710.336577]        ----                    ----
[  710.336666]   lock(od_dbs_cdata.mutex);
[  710.336778]                                lock(&dbs_data->mutex);
[  710.336911]                                lock(od_dbs_cdata.mutex);
[  710.337064]   lock(s_active#91);
[  710.337176]
 *** DEADLOCK ***

[  710.337289] 6 locks held by runme.sh/2476:
[  710.337355]  #0:  (sb_writers#6){.+.+.+}, at: [<c00000000034cf10>]
__sb_start_write+0x120/0x150
[  710.337600]  #1:  (&of->mutex){+.+.+.}, at: [<c00000000040939c>]
kernfs_fop_write+0x7c/0x1f0
[  710.337824]  #2:  (s_active#82){.+.+.+}, at: [<c0000000004093a8>]
kernfs_fop_write+0x88/0x1f0
[  710.338070]  #3:  (cpu_hotplug.lock){++++++}, at: [<c0000000000e06d8>]
get_online_cpus+0x48/0xc0
[  710.338276]  #4:  (&policy->rwsem){+++++.}, at: [<c000000000aced04>]
store+0x84/0x110
[  710.338476]  #5:  (od_dbs_cdata.mutex){+.+.+.}, at: [<c000000000ad7434>]
cpufreq_governor_dbs+0x64/0x7e0
[  710.338722]
stack backtrace:
[  710.338813] CPU: 0 PID: 2476 Comm: runme.sh Not tainted 4.5.0-rc2-sgb+ #104
[  710.338929] Call Trace:
[  710.338978] [c000005fd40eaec0] [c000000000d563d0] dump_stack+0x90/0xbc
(unreliable)
[  710.339138] [c000005fd40eaef0] [c00000000015884c] print_circular_bug+0x28c/0x3e0
[  710.339295] [c000005fd40eaf90] [c00000000015ed88] __lock_acquire+0x2278/0x22d0
[  710.339455] [c000005fd40eb120] [c00000000015f318] lock_acquire+0xd8/0x1a0
[  710.339589] [c000005fd40eb1e0] [c0000000004065f4] __kernfs_remove+0x344/0x410
[  710.339724] [c000005fd40eb2e0] [c000000000407db8] kernfs_remove+0x48/0x70
[  710.339859] [c000005fd40eb310] [c00000000040b868] sysfs_remove_dir+0x78/0xd0
[  710.339993] [c000005fd40eb350] [c0000000005eccec] kobject_del+0x2c/0x80
[  710.340128] [c000005fd40eb380] [c0000000005ec9e8] kobject_release+0xa8/0x250
[  710.340265] [c000005fd40eb410] [c000000000ad7ac8]
cpufreq_governor_dbs+0x6f8/0x7e0
[  710.340423] [c000005fd40eb4c0] [c000000000ad4a7c]
od_cpufreq_governor_dbs+0x3c/0x60
[  710.340561] [c000005fd40eb500] [c000000000acf7c4] __cpufreq_governor+0x164/0x300
[  710.340639] [c000005fd40eb580] [c000000000ad0600] cpufreq_set_policy+0x3b0/0x450
[  710.340719] [c000005fd40eb610] [c000000000ad117c]
store_scaling_governor+0x8c/0xf0
[  710.340797] [c000005fd40ebc10] [c000000000aced34] store+0xb4/0x110
[  710.340866] [c000005fd40ebc60] [c00000000040a8a4] sysfs_kf_write+0x94/0xc0
[  710.340934] [c000005fd40ebca0] [c0000000004094a8] kernfs_fop_write+0x188/0x1f0
[  710.341013] [c000005fd40ebcf0] [c000000000347b8c] __vfs_write+0x6c/0x180
[  710.341081] [c000005fd40ebd90] [c0000000003490a0] vfs_write+0xc0/0x200
[  710.341151] [c000005fd40ebde0] [c00000000034a3cc] SyS_write+0x6c/0x110
[  710.341219] [c000005fd40ebe30] [c00000000000926c] system_call+0x38/0xd0

Thanks and Regards,
Shilpa

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH V2 1/7] cpufreq: governor: Treat min_sampling_rate as a governor-specific tunable
  2016-02-03 14:02 ` [PATCH V2 1/7] cpufreq: governor: Treat min_sampling_rate as a governor-specific tunable Viresh Kumar
@ 2016-02-05  2:31   ` Rafael J. Wysocki
  2016-02-05  2:47     ` Viresh Kumar
  0 siblings, 1 reply; 36+ messages in thread
From: Rafael J. Wysocki @ 2016-02-05  2:31 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: juri.lelli, linaro-kernel, linux-pm, skannan, peterz, mturquette,
	steve.muckle, vincent.guittot, morten.rasmussen,
	dietmar.eggemann, linux-kernel

On Wednesday, February 03, 2016 07:32:17 PM Viresh Kumar wrote:
> The min_sampling_rate governor tunable is a field in struct dbs_data, so
> it has to be handled in a special way separate from the rest of governor
> tunables.  In particular, that requires a special macro to be present
> for creating its show/store sysfs attribute callbacks.
> 
> However, there is no real need for the data structures and code in
> question to be arranged this way and if min_sampling_rate is moved to
> data structures holding the other governor tunables, the sysfs attribute
> creation macros that work with those tunables will also work with
> min_sampling_rate and the special macro for it won't be necessary any
> more.  That will make it easier to modify the governor code going
> forward, so do it.
> 
> [ Rafael: Written changelog ]
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

I'm having some second thoughts about the utility of this patch to be honest.

I actually would like to move some tunables in the opposite direction.  That is,
from struct od_dbs_tuners and struct cs_dbs_tuners to struct dbs_data.  The
tuners field in that will then become something like gov_tunables (in analogy
with gov_ops in struct common_dbs_data) and it will point to governor-specific
tunables.

The reason why I'd like to do that is to make it easier to get rid of the
super-ugly governor == GOV_CONSERVATIVE etc tests in the common code.

Also I think that governor-specific tunables should be defined in the .c file
for that governor rather than in the common header.

We will need two set of macros for their sysfs attributes then, but that's
not a big deal IMO.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH V2 0/7] cpufreq: governors: Fix ABBA lockups
  2016-02-04 18:18                   ` Rafael J. Wysocki
@ 2016-02-05  2:44                     ` Viresh Kumar
  2016-02-05  3:54                     ` Rafael J. Wysocki
  1 sibling, 0 replies; 36+ messages in thread
From: Viresh Kumar @ 2016-02-05  2:44 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Saravana Kannan, Shilpa Bhat, Juri Lelli, Rafael Wysocki,
	Lists linaro-kernel, linux-pm, Peter Zijlstra, Michael Turquette,
	Steve Muckle, Vincent Guittot, Morten Rasmussen,
	dietmar.eggemann, Linux Kernel Mailing List

On 04-02-16, 19:18, Rafael J. Wysocki wrote:
> On Thu, Feb 4, 2016 at 6:44 PM, Saravana Kannan <skannan@codeaurora.org> wrote:
> > On 02/04/2016 09:43 AM, Saravana Kannan wrote:

> >> No no no no! Let's not open up this can of worms of queuing up the work
> >> to handle a write to a sysfs file. It *MIGHT* work for this specific
> >> tunable (I haven't bothered to analyze), but this makes it impossible to
> >> return a useful/proper error value.
> >
> >
> > Sent too soon. Not only that, but it can also cause the writes to the sysfs
> > files to get processed in a different order and I don't know what other
> > issues/races THAT will open up.
> 
> Well, I don't like this too.

I expected similar responses only, so no surprises for me :)

Though there are few things I would like to tell here:
- There wouldn't be any race for updating the file, as that is done
  directly from store_sampling_rate(). It updates the *real* file we
  wanted to.

- What's offloaded to the work-handler is something very special about
  ondemand governor and sampling rate. The same is not done for
  conservative governor as well, don't know why though.

- After updating the sampling rate, we assess if we need to reschedule
  the timers/workqueue to a different time for better efficiency. I
  don't think there can be a race there and it can be safely done in a
  work..

> I actually do have an idea about how to fix these deadlocks, but it is
> on top of my cleanup series.

I would like to hear that, if you can, to save your time. I have tried
so many different ways of fixing this yesterday and found issue
everywhere :(

-- 
viresh

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH V2 1/7] cpufreq: governor: Treat min_sampling_rate as a governor-specific tunable
  2016-02-05  2:31   ` Rafael J. Wysocki
@ 2016-02-05  2:47     ` Viresh Kumar
  0 siblings, 0 replies; 36+ messages in thread
From: Viresh Kumar @ 2016-02-05  2:47 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: juri.lelli, linaro-kernel, linux-pm, skannan, peterz, mturquette,
	steve.muckle, vincent.guittot, morten.rasmussen,
	dietmar.eggemann, linux-kernel

On 05-02-16, 03:31, Rafael J. Wysocki wrote:
> I'm having some second thoughts about the utility of this patch to be honest.
> 
> I actually would like to move some tunables in the opposite direction.  That is,
> from struct od_dbs_tuners and struct cs_dbs_tuners to struct dbs_data.  The
> tuners field in that will then become something like gov_tunables (in analogy
> with gov_ops in struct common_dbs_data) and it will point to governor-specific
> tunables.
> 
> The reason why I'd like to do that is to make it easier to get rid of the
> super-ugly governor == GOV_CONSERVATIVE etc tests in the common code.
> 
> Also I think that governor-specific tunables should be defined in the .c file
> for that governor rather than in the common header.
> 
> We will need two set of macros for their sysfs attributes then, but that's
> not a big deal IMO.

I agree with that, no issues from my side.

But, this patch was actually required to kill the ugly macros. So, if
we are planning to take this series as is, then maybe we can keep it
for now and fix everything together with your patches :)

-- 
viresh

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH V2 0/7] cpufreq: governors: Fix ABBA lockups
  2016-02-04 20:50         ` Shilpasri G Bhat
@ 2016-02-05  2:49           ` Viresh Kumar
  0 siblings, 0 replies; 36+ messages in thread
From: Viresh Kumar @ 2016-02-05  2:49 UTC (permalink / raw)
  To: Shilpasri G Bhat
  Cc: Juri Lelli, Rafael Wysocki, linaro-kernel, linux-pm, skannan,
	peterz, mturquette, steve.muckle, vincent.guittot,
	morten.rasmussen, dietmar.eggemann, linux-kernel

On 05-02-16, 02:20, Shilpasri G Bhat wrote:
> I could see the previous lockdep warnings on pm/next and on top of patch[4].
> On Patch[5-7] I see the below lockdep trace on running './runme' on a Power8 box.

> [  710.336130] Chain exists of:
>   s_active#91 --> &dbs_data->mutex --> od_dbs_cdata.mutex
> 
> [  710.336376]  Possible unsafe locking scenario:
> 
> [  710.336488]        CPU0                    CPU1
> [  710.336577]        ----                    ----
> [  710.336666]   lock(od_dbs_cdata.mutex);
> [  710.336778]                                lock(&dbs_data->mutex);
> [  710.336911]                                lock(od_dbs_cdata.mutex);
> [  710.337064]   lock(s_active#91);
> [  710.337176]

This is the same lockdep, just that we have added another mutex
(dbs_data->mutex) to it.

Have you tried if all the lockdeps go away with the 8th patch that I
provided yesterday ?

-- 
viresh

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH V2 0/7] cpufreq: governors: Fix ABBA lockups
  2016-02-04 18:18                   ` Rafael J. Wysocki
  2016-02-05  2:44                     ` Viresh Kumar
@ 2016-02-05  3:54                     ` Rafael J. Wysocki
  2016-02-05  9:49                       ` Viresh Kumar
  2016-02-06  2:22                       ` Saravana Kannan
  1 sibling, 2 replies; 36+ messages in thread
From: Rafael J. Wysocki @ 2016-02-05  3:54 UTC (permalink / raw)
  To: Saravana Kannan, Viresh Kumar
  Cc: Rafael J. Wysocki, Shilpa Bhat, Juri Lelli, Lists linaro-kernel,
	linux-pm, Peter Zijlstra, Michael Turquette, Steve Muckle,
	Vincent Guittot, Morten Rasmussen, dietmar.eggemann,
	Linux Kernel Mailing List

On Thursday, February 04, 2016 07:18:32 PM Rafael J. Wysocki wrote:
> On Thu, Feb 4, 2016 at 6:44 PM, Saravana Kannan <skannan@codeaurora.org> wrote:
> > On 02/04/2016 09:43 AM, Saravana Kannan wrote:
> >>
> >> On 02/04/2016 03:09 AM, Viresh Kumar wrote:
> >>>
> >>> On 04-02-16, 00:50, Rafael J. Wysocki wrote:
> >>>>
> >>>> This is exactly right.  We've avoided one deadlock only to trip into
> >>>> another one.
> >>>>
> >>>> This happens because update_sampling_rate() acquires
> >>>> od_dbs_cdata.mutex which is held around cpufreq_governor_exit() by
> >>>> cpufreq_governor_dbs().
> >>>>
> >>>> Worse yet, a deadlock can still happen without (the new)
> >>>> dbs_data->mutex, just between s_active and od_dbs_cdata.mutex if
> >>>> update_sampling_rate() runs in parallel with
> >>>> cpufreq_governor_dbs()->cpufreq_governor_exit() and the latter wins
> >>>> the race.
> >>>>
> >>>> It looks like we need to drop the governor mutex before putting the
> >>>> kobject in cpufreq_governor_exit().
> >>>
> 
> [cut]
> 
> >>
> >> No no no no! Let's not open up this can of worms of queuing up the work
> >> to handle a write to a sysfs file. It *MIGHT* work for this specific
> >> tunable (I haven't bothered to analyze), but this makes it impossible to
> >> return a useful/proper error value.
> >
> >
> > Sent too soon. Not only that, but it can also cause the writes to the sysfs
> > files to get processed in a different order and I don't know what other
> > issues/races THAT will open up.
> 
> Well, I don't like this too.
> 
> I actually do have an idea about how to fix these deadlocks, but it is
> on top of my cleanup series.
> 
> I'll write more about it later today.

Having actually posted that series again after cleaning it up I can say
what I'm thinking about hopefully without confusing anyone too much.  So
please bear in mind that I'm going to refer to this series below:

http://marc.info/?l=linux-pm&m=145463901630950&w=4

Also this is more of a brain dump rather than actual design description,
so there may be holes etc in it.  Please let me know if you can see any.

The problem at hand is that policy->rwsem needs to be held around *all*
operations in cpufreq_set_policy().  In particular, it cannot be dropped
around invocations of __cpufreq_governor() with the event arg equal to
_EXIT as that leads to interesting races.

Unfortunately, we know that holding policy->rwsem in those places leads
to a deadlock with governor sysfs attributes removal in cpufreq_governor_exit().

Viresh attempted to fix this by avoiding to acquire policy->rwsem for governor
attributes access (as holding it is not necessary for them in principle).  That
was a nice try, but it turned out to be insufficient because of another deadlock
scenario uncovered by it.  Namely, since the ondemand governor's update_sampling_rate()
acquires the governor mutex (called dbs_data_mutex after my patches mentioned
above), it may deadlock with exactly the same piece of code in cpufreq_governor_exit()
in almost exactly the same way.

To avoid that other deadlock, we'd either need to drop dbs_data_mutex from
update_sampling_rate(), or drop it for the removal of the governor sysfs
attributes in cpufreq_governor_exit().  I don't think the former is an option
at least at this point, so it looks like we pretty much have to do the latter.

With that in mind, I'd start with the changes made by Viresh (maybe without the
first patch which really isn't essential here).  That is, introduce a separate
kobject type for the governor attributes kobject and register that in
cpufreq_governor_init().  The show/store callbacks for that kobject type won't
acquire policy->rwsem so the first deadlock will be avoided.

But in addition to that, I'd drop dbs_data_mutex before the removal of governor
sysfs attributes.  That actually happens in two places, in cpufreq_governor_exit()
and in the error path of cpufreq_governor_init().

To that end, I'd move the locking from cpufreq_governor_dbs() to the functions
called by it.  That should be readily doable and they can do all of the
necessary checks themselves.  cpufreq_governor_dbs() would become a pure mux then,
but that's not such a big deal.

With that, cpufreq_governor_exit() may just drop the lock before it does the
final kobject_put().  The danger here is that the sysfs show/store callbacks of
the governor attributes kobject may see invalid dbs_data for a while, after the
lock has been dropped and before the kobject is deleted.  That may be addressed
by checking, for example, the presence of the dbs_data's "tuners" pointer in those
callbacks.  If it is NULL, they can simply return -EAGAIN or similar.

Now, that means, though, that they need to acquire the same lock as
cpufreq_governor_exit(), or they may see things go away while they are running.
The simplest approach here would be to take dbs_data_mutex in them too, although
that's a bit of a sledgehammer.  It might be better to have a per-policy lock
in struct policy_dbs_info for that, for example, but then the governor attribute
sysfs callbacks would need to get that object instead of dbs_data.

On the flip side, it might be possible to migrate update_sampling_rate() to
that lock too.  And maybe we can get rid of dbs_data_mutex even, who knows?

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH V2 0/7] cpufreq: governors: Fix ABBA lockups
  2016-02-05  3:54                     ` Rafael J. Wysocki
@ 2016-02-05  9:49                       ` Viresh Kumar
  2016-02-08  2:20                         ` Rafael J. Wysocki
  2016-02-06  2:22                       ` Saravana Kannan
  1 sibling, 1 reply; 36+ messages in thread
From: Viresh Kumar @ 2016-02-05  9:49 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Saravana Kannan, Rafael J. Wysocki, Shilpa Bhat, Juri Lelli,
	Lists linaro-kernel, linux-pm, Peter Zijlstra, Michael Turquette,
	Steve Muckle, Vincent Guittot, Morten Rasmussen,
	dietmar.eggemann, Linux Kernel Mailing List

On 05-02-16, 04:54, Rafael J. Wysocki wrote:
> Having actually posted that series again after cleaning it up I can say
> what I'm thinking about hopefully without confusing anyone too much.  So
> please bear in mind that I'm going to refer to this series below:
> 
> http://marc.info/?l=linux-pm&m=145463901630950&w=4
> 
> Also this is more of a brain dump rather than actual design description,
> so there may be holes etc in it.  Please let me know if you can see any.
> 
> The problem at hand is that policy->rwsem needs to be held around *all*
> operations in cpufreq_set_policy().  In particular, it cannot be dropped
> around invocations of __cpufreq_governor() with the event arg equal to
> _EXIT as that leads to interesting races.
> 
> Unfortunately, we know that holding policy->rwsem in those places leads
> to a deadlock with governor sysfs attributes removal in cpufreq_governor_exit().
> 
> Viresh attempted to fix this by avoiding to acquire policy->rwsem for governor
> attributes access (as holding it is not necessary for them in principle).  That
> was a nice try, but it turned out to be insufficient because of another deadlock
> scenario uncovered by it.

Not really.

The other deadlock wasn't uncovered by it, its just that Shilpa tested
directly after my patches and reported the issue. Later yesterday, she
was hitting the exactly same issue on pm/linux-next as well (i.e.
without my patches). And ofcourse Juri has also reported the same
issue on linux-next few days back.

> Namely, since the ondemand governor's update_sampling_rate()
> acquires the governor mutex (called dbs_data_mutex after my patches mentioned
> above), it may deadlock with exactly the same piece of code in cpufreq_governor_exit()
> in almost exactly the same way.

Right.

> To avoid that other deadlock, we'd either need to drop dbs_data_mutex from
> update_sampling_rate(),

And my so called 'ugly' 8th patch tried to do just that :)

But as I also mentioned in reply to the update-util patchset of yours,
its possible somewhat.

> or drop it for the removal of the governor sysfs
> attributes in cpufreq_governor_exit().  I don't think the former is an option
> at least at this point, so it looks like we pretty much have to do the latter.
> 
> With that in mind, I'd start with the changes made by Viresh (maybe without the
> first patch which really isn't essential here).

That was just to cleanup the macro mess a bit, nothing more. Over
that, I think the first 7 patches can be picked as it is without any
changes. Ofcourse they are required to be rebased over your 13
patches, if those are going in first :)

> That is, introduce a separate
> kobject type for the governor attributes kobject and register that in
> cpufreq_governor_init().  The show/store callbacks for that kobject type won't
> acquire policy->rwsem so the first deadlock will be avoided.
> 
> But in addition to that, I'd drop dbs_data_mutex before the removal of governor
> sysfs attributes.  That actually happens in two places, in cpufreq_governor_exit()
> and in the error path of cpufreq_governor_init().
> 
> To that end, I'd move the locking from cpufreq_governor_dbs() to the functions
> called by it.  That should be readily doable and they can do all of the
> necessary checks themselves.  cpufreq_governor_dbs() would become a pure mux then,
> but that's not such a big deal.
> 
> With that, cpufreq_governor_exit() may just drop the lock before it does the
> final kobject_put().  The danger here is that the sysfs show/store callbacks of
> the governor attributes kobject may see invalid dbs_data for a while, after the
> lock has been dropped and before the kobject is deleted.  That may be addressed
> by checking, for example, the presence of the dbs_data's "tuners" pointer in those
> callbacks.  If it is NULL, they can simply return -EAGAIN or similar.

So you mean something like this (consider only !governor_per_policy
case with ondemand governor for now):

exit()
{
       lock-dbs_data_mutex;
       ...
       dbs_data->tuners = NULL; //so that sysfs files can return early
       dbs_governor->gdbs_data = NULL; //For !governor_per_policy case
       unlock-dbs_data_mutex;

       /*
        * Problem: Who is stopping us to set ondemand as governor for
        * another policy, which can try create a kobject which will
        * try to create sysfs directory at the same path ?
        *
        * Though another field in dbs_governor can be used to fix this
        * I think, which needs to block the other INIT operation.
        */
        
       kobject_put(dbs_data->kobj); //This should wait for all sysfs operations to end.

       kfree(dbs_data);
}

And the sysfs operations show/store need to take dbs_data_mutex() for
their entire operations.

??

-- 
viresh

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH V2 0/7] cpufreq: governors: Fix ABBA lockups
  2016-02-05  3:54                     ` Rafael J. Wysocki
  2016-02-05  9:49                       ` Viresh Kumar
@ 2016-02-06  2:22                       ` Saravana Kannan
  2016-02-08  2:28                         ` Rafael J. Wysocki
  1 sibling, 1 reply; 36+ messages in thread
From: Saravana Kannan @ 2016-02-06  2:22 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Viresh Kumar, Rafael J. Wysocki, Shilpa Bhat, Juri Lelli,
	Lists linaro-kernel, linux-pm, Peter Zijlstra, Michael Turquette,
	Steve Muckle, Vincent Guittot, Morten Rasmussen,
	dietmar.eggemann, Linux Kernel Mailing List

On 02/04/2016 07:54 PM, Rafael J. Wysocki wrote:
> On Thursday, February 04, 2016 07:18:32 PM Rafael J. Wysocki wrote:
>> On Thu, Feb 4, 2016 at 6:44 PM, Saravana Kannan <skannan@codeaurora.org> wrote:
>>> On 02/04/2016 09:43 AM, Saravana Kannan wrote:
>>>>
>>>> On 02/04/2016 03:09 AM, Viresh Kumar wrote:
>>>>>
>>>>> On 04-02-16, 00:50, Rafael J. Wysocki wrote:
>>>>>>
>>>>>> This is exactly right.  We've avoided one deadlock only to trip into
>>>>>> another one.
>>>>>>
>>>>>> This happens because update_sampling_rate() acquires
>>>>>> od_dbs_cdata.mutex which is held around cpufreq_governor_exit() by
>>>>>> cpufreq_governor_dbs().
>>>>>>
>>>>>> Worse yet, a deadlock can still happen without (the new)
>>>>>> dbs_data->mutex, just between s_active and od_dbs_cdata.mutex if
>>>>>> update_sampling_rate() runs in parallel with
>>>>>> cpufreq_governor_dbs()->cpufreq_governor_exit() and the latter wins
>>>>>> the race.
>>>>>>
>>>>>> It looks like we need to drop the governor mutex before putting the
>>>>>> kobject in cpufreq_governor_exit().
>>>>>
>>
>> [cut]
>>
>>>>
>>>> No no no no! Let's not open up this can of worms of queuing up the work
>>>> to handle a write to a sysfs file. It *MIGHT* work for this specific
>>>> tunable (I haven't bothered to analyze), but this makes it impossible to
>>>> return a useful/proper error value.
>>>
>>>
>>> Sent too soon. Not only that, but it can also cause the writes to the sysfs
>>> files to get processed in a different order and I don't know what other
>>> issues/races THAT will open up.
>>
>> Well, I don't like this too.
>>
>> I actually do have an idea about how to fix these deadlocks, but it is
>> on top of my cleanup series.
>>
>> I'll write more about it later today.
>
> Having actually posted that series again after cleaning it up I can say
> what I'm thinking about hopefully without confusing anyone too much.  So
> please bear in mind that I'm going to refer to this series below:
>
> http://marc.info/?l=linux-pm&m=145463901630950&w=4
>
> Also this is more of a brain dump rather than actual design description,
> so there may be holes etc in it.  Please let me know if you can see any.
>
> The problem at hand is that policy->rwsem needs to be held around *all*
> operations in cpufreq_set_policy().  In particular, it cannot be dropped
> around invocations of __cpufreq_governor() with the event arg equal to
> _EXIT as that leads to interesting races.
>
> Unfortunately, we know that holding policy->rwsem in those places leads
> to a deadlock with governor sysfs attributes removal in cpufreq_governor_exit().
>
> Viresh attempted to fix this by avoiding to acquire policy->rwsem for governor
> attributes access (as holding it is not necessary for them in principle).  That
> was a nice try, but it turned out to be insufficient because of another deadlock
> scenario uncovered by it.  Namely, since the ondemand governor's update_sampling_rate()
> acquires the governor mutex (called dbs_data_mutex after my patches mentioned
> above), it may deadlock with exactly the same piece of code in cpufreq_governor_exit()
> in almost exactly the same way.
>
> To avoid that other deadlock, we'd either need to drop dbs_data_mutex from
> update_sampling_rate(), or drop it for the removal of the governor sysfs
> attributes in cpufreq_governor_exit().  I don't think the former is an option
> at least at this point, so it looks like we pretty much have to do the latter.
>
> With that in mind, I'd start with the changes made by Viresh (maybe without the
> first patch which really isn't essential here).  That is, introduce a separate
> kobject type for the governor attributes kobject and register that in
> cpufreq_governor_init().  The show/store callbacks for that kobject type won't
> acquire policy->rwsem so the first deadlock will be avoided.
>
> But in addition to that, I'd drop dbs_data_mutex before the removal of governor
> sysfs attributes.  That actually happens in two places, in cpufreq_governor_exit()
> and in the error path of cpufreq_governor_init().
>
> To that end, I'd move the locking from cpufreq_governor_dbs() to the functions
> called by it.  That should be readily doable and they can do all of the
> necessary checks themselves.  cpufreq_governor_dbs() would become a pure mux then,
> but that's not such a big deal.
>
> With that, cpufreq_governor_exit() may just drop the lock before it does the
> final kobject_put().  The danger here is that the sysfs show/store callbacks of
> the governor attributes kobject may see invalid dbs_data for a while, after the
> lock has been dropped and before the kobject is deleted.  That may be addressed
> by checking, for example, the presence of the dbs_data's "tuners" pointer in those
> callbacks.  If it is NULL, they can simply return -EAGAIN or similar.
>
> Now, that means, though, that they need to acquire the same lock as
> cpufreq_governor_exit(), or they may see things go away while they are running.
> The simplest approach here would be to take dbs_data_mutex in them too, although
> that's a bit of a sledgehammer.  It might be better to have a per-policy lock
> in struct policy_dbs_info for that, for example, but then the governor attribute
> sysfs callbacks would need to get that object instead of dbs_data.
>
> On the flip side, it might be possible to migrate update_sampling_rate() to
> that lock too.  And maybe we can get rid of dbs_data_mutex even, who knows?

I'm glad you've analyzed it this far. So, the rest of my comments will 
be easier to understand.

I'm going to go back to my point of NOT doing the sysfs add/remove 
inside the governor at all (that includes cpufreq_governor.c) and doing 
it in cpufreq.c. That suggestion was confusing to explain/understand 
before when we were using policy rwsem inside the show/store ops for the 
governor attributes. Now that has been removed, my suggestion would be 
even easier/cleaner to implement/understand and you don't have to worry 
about ANY races in the governor.

I'll just talk about the have_governor_per_policy() case. It can be 
easily extended to the global case.

In cpufreq_governor.c:
cpufreq_governor_init(...)
{
  ...
  /* NOT kobject_init_and_add */
  kobject_init();
  /* New field */
  policy->gov_kobj = &dbs_data->kobj);
  ...
}

In cpufreq.c:
__cpufreq_governor(...)
{

    if (event == POLICY_EXIT) {
       kobject_put(policy->gov_kobj);
    }
    ret = policy->governor->governor(policy, event);
    if (event == POLICY_INIT) {
       kobj_add(policy->gov_kobj, policy->kobj, policy->governor->name);
    }
}

This guarantees that there can be no races of the governor specific data 
structures going away while being accessed from sysfs because the first 
thing we do once we decide to "kill" a governor is to remove the sysfs 
files and the accesses to governor data (and flush out all on going 
accesses) and THEN ask the governor to exit.

Thoughts?

Thanks,
Saravana
-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH V2 0/7] cpufreq: governors: Fix ABBA lockups
  2016-02-05  9:49                       ` Viresh Kumar
@ 2016-02-08  2:20                         ` Rafael J. Wysocki
  0 siblings, 0 replies; 36+ messages in thread
From: Rafael J. Wysocki @ 2016-02-08  2:20 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Saravana Kannan, Rafael J. Wysocki, Shilpa Bhat, Juri Lelli,
	Lists linaro-kernel, linux-pm, Peter Zijlstra, Michael Turquette,
	Steve Muckle, Vincent Guittot, Morten Rasmussen,
	dietmar.eggemann, Linux Kernel Mailing List

On Friday, February 05, 2016 03:19:25 PM Viresh Kumar wrote:
> On 05-02-16, 04:54, Rafael J. Wysocki wrote:
> > Having actually posted that series again after cleaning it up I can say
> > what I'm thinking about hopefully without confusing anyone too much.  So
> > please bear in mind that I'm going to refer to this series below:
> > 
> > http://marc.info/?l=linux-pm&m=145463901630950&w=4
> > 
> > Also this is more of a brain dump rather than actual design description,
> > so there may be holes etc in it.  Please let me know if you can see any.
> > 
> > The problem at hand is that policy->rwsem needs to be held around *all*
> > operations in cpufreq_set_policy().  In particular, it cannot be dropped
> > around invocations of __cpufreq_governor() with the event arg equal to
> > _EXIT as that leads to interesting races.
> > 
> > Unfortunately, we know that holding policy->rwsem in those places leads
> > to a deadlock with governor sysfs attributes removal in cpufreq_governor_exit().
> > 
> > Viresh attempted to fix this by avoiding to acquire policy->rwsem for governor
> > attributes access (as holding it is not necessary for them in principle).  That
> > was a nice try, but it turned out to be insufficient because of another deadlock
> > scenario uncovered by it.
> 
> Not really.
> 
> The other deadlock wasn't uncovered by it, its just that Shilpa tested
> directly after my patches and reported the issue. Later yesterday, she
> was hitting the exactly same issue on pm/linux-next as well (i.e.
> without my patches). And ofcourse Juri has also reported the same
> issue on linux-next few days back.

OK, fair enough.

> > Namely, since the ondemand governor's update_sampling_rate()
> > acquires the governor mutex (called dbs_data_mutex after my patches mentioned
> > above), it may deadlock with exactly the same piece of code in cpufreq_governor_exit()
> > in almost exactly the same way.
> 
> Right.
> 
> > To avoid that other deadlock, we'd either need to drop dbs_data_mutex from
> > update_sampling_rate(),
> 
> And my so called 'ugly' 8th patch tried to do just that :)
> 
> But as I also mentioned in reply to the update-util patchset of yours,
> its possible somewhat.

Yes, it should be possible and not even too difficult.

> > or drop it for the removal of the governor sysfs
> > attributes in cpufreq_governor_exit().  I don't think the former is an option
> > at least at this point, so it looks like we pretty much have to do the latter.
> > 
> > With that in mind, I'd start with the changes made by Viresh (maybe without the
> > first patch which really isn't essential here).
> 
> That was just to cleanup the macro mess a bit, nothing more. Over
> that, I think the first 7 patches can be picked as it is without any
> changes. Ofcourse they are required to be rebased over your 13
> patches, if those are going in first :)

Yes, please rebase.

Also please skip the first one that was moving min_sampling_rate around,
at least for now.

As I said, we may be moving other attributes in the opposite direction,
so two sets of macros may be necessary anyway.

> > That is, introduce a separate
> > kobject type for the governor attributes kobject and register that in
> > cpufreq_governor_init().  The show/store callbacks for that kobject type won't
> > acquire policy->rwsem so the first deadlock will be avoided.
> > 
> > But in addition to that, I'd drop dbs_data_mutex before the removal of governor
> > sysfs attributes.  That actually happens in two places, in cpufreq_governor_exit()
> > and in the error path of cpufreq_governor_init().
> > 
> > To that end, I'd move the locking from cpufreq_governor_dbs() to the functions
> > called by it.  That should be readily doable and they can do all of the
> > necessary checks themselves.  cpufreq_governor_dbs() would become a pure mux then,
> > but that's not such a big deal.
> > 
> > With that, cpufreq_governor_exit() may just drop the lock before it does the
> > final kobject_put().  The danger here is that the sysfs show/store callbacks of
> > the governor attributes kobject may see invalid dbs_data for a while, after the
> > lock has been dropped and before the kobject is deleted.  That may be addressed
> > by checking, for example, the presence of the dbs_data's "tuners" pointer in those
> > callbacks.  If it is NULL, they can simply return -EAGAIN or similar.
> 
> So you mean something like this (consider only !governor_per_policy
> case with ondemand governor for now):
> 
> exit()
> {
>        lock-dbs_data_mutex;
>        ...
>        dbs_data->tuners = NULL; //so that sysfs files can return early
>        dbs_governor->gdbs_data = NULL; //For !governor_per_policy case
>        unlock-dbs_data_mutex;
> 
>        /*
>         * Problem: Who is stopping us to set ondemand as governor for
>         * another policy, which can try create a kobject which will
>         * try to create sysfs directory at the same path ?
>         *
>         * Though another field in dbs_governor can be used to fix this
>         * I think, which needs to block the other INIT operation.
>         */
>         
>        kobject_put(dbs_data->kobj); //This should wait for all sysfs operations to end.
> 
>        kfree(dbs_data);
> }
> 
> And the sysfs operations show/store need to take dbs_data_mutex() for
> their entire operations.
> 
> ??

Yes, roughly.

But it shouldn't be necessary after all, because dropping the mutex from
update_sampling_rate() looks easier than I thought previously.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH V2 0/7] cpufreq: governors: Fix ABBA lockups
  2016-02-06  2:22                       ` Saravana Kannan
@ 2016-02-08  2:28                         ` Rafael J. Wysocki
  2016-02-09 21:02                           ` Saravana Kannan
  0 siblings, 1 reply; 36+ messages in thread
From: Rafael J. Wysocki @ 2016-02-08  2:28 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Viresh Kumar, Rafael J. Wysocki, Shilpa Bhat, Juri Lelli,
	Lists linaro-kernel, linux-pm, Peter Zijlstra, Michael Turquette,
	Steve Muckle, Vincent Guittot, Morten Rasmussen,
	dietmar.eggemann, Linux Kernel Mailing List

On Friday, February 05, 2016 06:22:35 PM Saravana Kannan wrote:
> On 02/04/2016 07:54 PM, Rafael J. Wysocki wrote:
> > On Thursday, February 04, 2016 07:18:32 PM Rafael J. Wysocki wrote:
> >> On Thu, Feb 4, 2016 at 6:44 PM, Saravana Kannan <skannan@codeaurora.org> wrote:
> >>> On 02/04/2016 09:43 AM, Saravana Kannan wrote:
> >>>>
> >>>> On 02/04/2016 03:09 AM, Viresh Kumar wrote:
> >>>>>
> >>>>> On 04-02-16, 00:50, Rafael J. Wysocki wrote:
> >>>>>>
> >>>>>> This is exactly right.  We've avoided one deadlock only to trip into
> >>>>>> another one.
> >>>>>>
> >>>>>> This happens because update_sampling_rate() acquires
> >>>>>> od_dbs_cdata.mutex which is held around cpufreq_governor_exit() by
> >>>>>> cpufreq_governor_dbs().
> >>>>>>
> >>>>>> Worse yet, a deadlock can still happen without (the new)
> >>>>>> dbs_data->mutex, just between s_active and od_dbs_cdata.mutex if
> >>>>>> update_sampling_rate() runs in parallel with
> >>>>>> cpufreq_governor_dbs()->cpufreq_governor_exit() and the latter wins
> >>>>>> the race.
> >>>>>>
> >>>>>> It looks like we need to drop the governor mutex before putting the
> >>>>>> kobject in cpufreq_governor_exit().
> >>>>>
> >>
> >> [cut]
> >>
> >>>>
> >>>> No no no no! Let's not open up this can of worms of queuing up the work
> >>>> to handle a write to a sysfs file. It *MIGHT* work for this specific
> >>>> tunable (I haven't bothered to analyze), but this makes it impossible to
> >>>> return a useful/proper error value.
> >>>
> >>>
> >>> Sent too soon. Not only that, but it can also cause the writes to the sysfs
> >>> files to get processed in a different order and I don't know what other
> >>> issues/races THAT will open up.
> >>
> >> Well, I don't like this too.
> >>
> >> I actually do have an idea about how to fix these deadlocks, but it is
> >> on top of my cleanup series.
> >>
> >> I'll write more about it later today.
> >
> > Having actually posted that series again after cleaning it up I can say
> > what I'm thinking about hopefully without confusing anyone too much.  So
> > please bear in mind that I'm going to refer to this series below:
> >
> > http://marc.info/?l=linux-pm&m=145463901630950&w=4
> >
> > Also this is more of a brain dump rather than actual design description,
> > so there may be holes etc in it.  Please let me know if you can see any.
> >
> > The problem at hand is that policy->rwsem needs to be held around *all*
> > operations in cpufreq_set_policy().  In particular, it cannot be dropped
> > around invocations of __cpufreq_governor() with the event arg equal to
> > _EXIT as that leads to interesting races.
> >
> > Unfortunately, we know that holding policy->rwsem in those places leads
> > to a deadlock with governor sysfs attributes removal in cpufreq_governor_exit().
> >
> > Viresh attempted to fix this by avoiding to acquire policy->rwsem for governor
> > attributes access (as holding it is not necessary for them in principle).  That
> > was a nice try, but it turned out to be insufficient because of another deadlock
> > scenario uncovered by it.  Namely, since the ondemand governor's update_sampling_rate()
> > acquires the governor mutex (called dbs_data_mutex after my patches mentioned
> > above), it may deadlock with exactly the same piece of code in cpufreq_governor_exit()
> > in almost exactly the same way.
> >
> > To avoid that other deadlock, we'd either need to drop dbs_data_mutex from
> > update_sampling_rate(), or drop it for the removal of the governor sysfs
> > attributes in cpufreq_governor_exit().  I don't think the former is an option
> > at least at this point, so it looks like we pretty much have to do the latter.
> >
> > With that in mind, I'd start with the changes made by Viresh (maybe without the
> > first patch which really isn't essential here).  That is, introduce a separate
> > kobject type for the governor attributes kobject and register that in
> > cpufreq_governor_init().  The show/store callbacks for that kobject type won't
> > acquire policy->rwsem so the first deadlock will be avoided.
> >
> > But in addition to that, I'd drop dbs_data_mutex before the removal of governor
> > sysfs attributes.  That actually happens in two places, in cpufreq_governor_exit()
> > and in the error path of cpufreq_governor_init().
> >
> > To that end, I'd move the locking from cpufreq_governor_dbs() to the functions
> > called by it.  That should be readily doable and they can do all of the
> > necessary checks themselves.  cpufreq_governor_dbs() would become a pure mux then,
> > but that's not such a big deal.
> >
> > With that, cpufreq_governor_exit() may just drop the lock before it does the
> > final kobject_put().  The danger here is that the sysfs show/store callbacks of
> > the governor attributes kobject may see invalid dbs_data for a while, after the
> > lock has been dropped and before the kobject is deleted.  That may be addressed
> > by checking, for example, the presence of the dbs_data's "tuners" pointer in those
> > callbacks.  If it is NULL, they can simply return -EAGAIN or similar.
> >
> > Now, that means, though, that they need to acquire the same lock as
> > cpufreq_governor_exit(), or they may see things go away while they are running.
> > The simplest approach here would be to take dbs_data_mutex in them too, although
> > that's a bit of a sledgehammer.  It might be better to have a per-policy lock
> > in struct policy_dbs_info for that, for example, but then the governor attribute
> > sysfs callbacks would need to get that object instead of dbs_data.
> >
> > On the flip side, it might be possible to migrate update_sampling_rate() to
> > that lock too.  And maybe we can get rid of dbs_data_mutex even, who knows?
> 
> I'm glad you've analyzed it this far. So, the rest of my comments will 
> be easier to understand.
> 
> I'm going to go back to my point of NOT doing the sysfs add/remove 
> inside the governor at all (that includes cpufreq_governor.c) and doing 
> it in cpufreq.c. That suggestion was confusing to explain/understand 
> before when we were using policy rwsem inside the show/store ops for the 
> governor attributes. Now that has been removed, my suggestion would be 
> even easier/cleaner to implement/understand and you don't have to worry 
> about ANY races in the governor.
> 
> I'll just talk about the have_governor_per_policy() case. It can be 
> easily extended to the global case.
> 
> In cpufreq_governor.c:
> cpufreq_governor_init(...)
> {
>   ...
>   /* NOT kobject_init_and_add */
>   kobject_init();
>   /* New field */
>   policy->gov_kobj = &dbs_data->kobj);
>   ...
> }
> 
> In cpufreq.c:
> __cpufreq_governor(...)
> {
> 
>     if (event == POLICY_EXIT) {
>        kobject_put(policy->gov_kobj);
>     }
>     ret = policy->governor->governor(policy, event);
>     if (event == POLICY_INIT) {
>        kobj_add(policy->gov_kobj, policy->kobj, policy->governor->name);
>     }
> }
> 
> This guarantees that there can be no races of the governor specific data 
> structures going away while being accessed from sysfs because the first 
> thing we do once we decide to "kill" a governor is to remove the sysfs 
> files and the accesses to governor data (and flush out all on going 
> accesses) and THEN ask the governor to exit.
> 
> Thoughts?

The core would then have to rely on the governor code to populate the gov_kobj
field correctly which doesn't look really straightforward to me.  It is better
if each code layer arranges the data structures it is going to use by itself.

Besides, ondemand and conservative are the only governors that use the governor
kobject at all, so I'm not sure if that really belongs to the core.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH V2 0/7] cpufreq: governors: Fix ABBA lockups
  2016-02-08  2:28                         ` Rafael J. Wysocki
@ 2016-02-09 21:02                           ` Saravana Kannan
  0 siblings, 0 replies; 36+ messages in thread
From: Saravana Kannan @ 2016-02-09 21:02 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Viresh Kumar, Rafael J. Wysocki, Shilpa Bhat, Juri Lelli,
	Lists linaro-kernel, linux-pm, Peter Zijlstra, Michael Turquette,
	Steve Muckle, Vincent Guittot, Morten Rasmussen,
	dietmar.eggemann, Linux Kernel Mailing List

On 02/07/2016 06:28 PM, Rafael J. Wysocki wrote:
> On Friday, February 05, 2016 06:22:35 PM Saravana Kannan wrote:
>> On 02/04/2016 07:54 PM, Rafael J. Wysocki wrote:
>>> On Thursday, February 04, 2016 07:18:32 PM Rafael J. Wysocki wrote:
>>>> On Thu, Feb 4, 2016 at 6:44 PM, Saravana Kannan <skannan@codeaurora.org> wrote:
>>>>> On 02/04/2016 09:43 AM, Saravana Kannan wrote:
>>>>>>
>>>>>> On 02/04/2016 03:09 AM, Viresh Kumar wrote:
>>>>>>>
>>>>>>> On 04-02-16, 00:50, Rafael J. Wysocki wrote:
>>>>>>>>
>>>>>>>> This is exactly right.  We've avoided one deadlock only to trip into
>>>>>>>> another one.
>>>>>>>>
>>>>>>>> This happens because update_sampling_rate() acquires
>>>>>>>> od_dbs_cdata.mutex which is held around cpufreq_governor_exit() by
>>>>>>>> cpufreq_governor_dbs().
>>>>>>>>
>>>>>>>> Worse yet, a deadlock can still happen without (the new)
>>>>>>>> dbs_data->mutex, just between s_active and od_dbs_cdata.mutex if
>>>>>>>> update_sampling_rate() runs in parallel with
>>>>>>>> cpufreq_governor_dbs()->cpufreq_governor_exit() and the latter wins
>>>>>>>> the race.
>>>>>>>>
>>>>>>>> It looks like we need to drop the governor mutex before putting the
>>>>>>>> kobject in cpufreq_governor_exit().
>>>>>>>
>>>>
>>>> [cut]
>>>>
>>>>>>
>>>>>> No no no no! Let's not open up this can of worms of queuing up the work
>>>>>> to handle a write to a sysfs file. It *MIGHT* work for this specific
>>>>>> tunable (I haven't bothered to analyze), but this makes it impossible to
>>>>>> return a useful/proper error value.
>>>>>
>>>>>
>>>>> Sent too soon. Not only that, but it can also cause the writes to the sysfs
>>>>> files to get processed in a different order and I don't know what other
>>>>> issues/races THAT will open up.
>>>>
>>>> Well, I don't like this too.
>>>>
>>>> I actually do have an idea about how to fix these deadlocks, but it is
>>>> on top of my cleanup series.
>>>>
>>>> I'll write more about it later today.
>>>
>>> Having actually posted that series again after cleaning it up I can say
>>> what I'm thinking about hopefully without confusing anyone too much.  So
>>> please bear in mind that I'm going to refer to this series below:
>>>
>>> http://marc.info/?l=linux-pm&m=145463901630950&w=4
>>>
>>> Also this is more of a brain dump rather than actual design description,
>>> so there may be holes etc in it.  Please let me know if you can see any.
>>>
>>> The problem at hand is that policy->rwsem needs to be held around *all*
>>> operations in cpufreq_set_policy().  In particular, it cannot be dropped
>>> around invocations of __cpufreq_governor() with the event arg equal to
>>> _EXIT as that leads to interesting races.
>>>
>>> Unfortunately, we know that holding policy->rwsem in those places leads
>>> to a deadlock with governor sysfs attributes removal in cpufreq_governor_exit().
>>>
>>> Viresh attempted to fix this by avoiding to acquire policy->rwsem for governor
>>> attributes access (as holding it is not necessary for them in principle).  That
>>> was a nice try, but it turned out to be insufficient because of another deadlock
>>> scenario uncovered by it.  Namely, since the ondemand governor's update_sampling_rate()
>>> acquires the governor mutex (called dbs_data_mutex after my patches mentioned
>>> above), it may deadlock with exactly the same piece of code in cpufreq_governor_exit()
>>> in almost exactly the same way.
>>>
>>> To avoid that other deadlock, we'd either need to drop dbs_data_mutex from
>>> update_sampling_rate(), or drop it for the removal of the governor sysfs
>>> attributes in cpufreq_governor_exit().  I don't think the former is an option
>>> at least at this point, so it looks like we pretty much have to do the latter.
>>>
>>> With that in mind, I'd start with the changes made by Viresh (maybe without the
>>> first patch which really isn't essential here).  That is, introduce a separate
>>> kobject type for the governor attributes kobject and register that in
>>> cpufreq_governor_init().  The show/store callbacks for that kobject type won't
>>> acquire policy->rwsem so the first deadlock will be avoided.
>>>
>>> But in addition to that, I'd drop dbs_data_mutex before the removal of governor
>>> sysfs attributes.  That actually happens in two places, in cpufreq_governor_exit()
>>> and in the error path of cpufreq_governor_init().
>>>
>>> To that end, I'd move the locking from cpufreq_governor_dbs() to the functions
>>> called by it.  That should be readily doable and they can do all of the
>>> necessary checks themselves.  cpufreq_governor_dbs() would become a pure mux then,
>>> but that's not such a big deal.
>>>
>>> With that, cpufreq_governor_exit() may just drop the lock before it does the
>>> final kobject_put().  The danger here is that the sysfs show/store callbacks of
>>> the governor attributes kobject may see invalid dbs_data for a while, after the
>>> lock has been dropped and before the kobject is deleted.  That may be addressed
>>> by checking, for example, the presence of the dbs_data's "tuners" pointer in those
>>> callbacks.  If it is NULL, they can simply return -EAGAIN or similar.
>>>
>>> Now, that means, though, that they need to acquire the same lock as
>>> cpufreq_governor_exit(), or they may see things go away while they are running.
>>> The simplest approach here would be to take dbs_data_mutex in them too, although
>>> that's a bit of a sledgehammer.  It might be better to have a per-policy lock
>>> in struct policy_dbs_info for that, for example, but then the governor attribute
>>> sysfs callbacks would need to get that object instead of dbs_data.
>>>
>>> On the flip side, it might be possible to migrate update_sampling_rate() to
>>> that lock too.  And maybe we can get rid of dbs_data_mutex even, who knows?
>>
>> I'm glad you've analyzed it this far. So, the rest of my comments will
>> be easier to understand.
>>
>> I'm going to go back to my point of NOT doing the sysfs add/remove
>> inside the governor at all (that includes cpufreq_governor.c) and doing
>> it in cpufreq.c. That suggestion was confusing to explain/understand
>> before when we were using policy rwsem inside the show/store ops for the
>> governor attributes. Now that has been removed, my suggestion would be
>> even easier/cleaner to implement/understand and you don't have to worry
>> about ANY races in the governor.
>>
>> I'll just talk about the have_governor_per_policy() case. It can be
>> easily extended to the global case.
>>
>> In cpufreq_governor.c:
>> cpufreq_governor_init(...)
>> {
>>    ...
>>    /* NOT kobject_init_and_add */
>>    kobject_init();
>>    /* New field */
>>    policy->gov_kobj = &dbs_data->kobj);
>>    ...
>> }
>>
>> In cpufreq.c:
>> __cpufreq_governor(...)
>> {
>>
>>      if (event == POLICY_EXIT) {
>>         kobject_put(policy->gov_kobj);
>>      }
>>      ret = policy->governor->governor(policy, event);
>>      if (event == POLICY_INIT) {
>>         kobj_add(policy->gov_kobj, policy->kobj, policy->governor->name);
>>      }
>> }
>>
>> This guarantees that there can be no races of the governor specific data
>> structures going away while being accessed from sysfs because the first
>> thing we do once we decide to "kill" a governor is to remove the sysfs
>> files and the accesses to governor data (and flush out all on going
>> accesses) and THEN ask the governor to exit.
>>
>> Thoughts?
>
> The core would then have to rely on the governor code to populate the gov_kobj
> field correctly which doesn't look really straightforward to me.  It is better
> if each code layer arranges the data structures it is going to use by itself.

The core depends a lot on the drivers and governors filling up some 
fields correctly. This isn't any worse than that. It just seems way more 
logical to me to remove the interface to changing governor attributes 
(the sysfs files) before we start "exiting" a governor. But it looks 
like there's a v3 series of patches from Viresh that people seem to 
agree is fixing the race in a different method -- I haven't had time to 
look at it. So, I'm not going to keep pushing my point about removing 
the sysfs files at the core level. I'll jump back to it if we later find 
another race with this v3 patch series :)

> Besides, ondemand and conservative are the only governors that use the governor
> kobject at all, so I'm not sure if that really belongs to the core.

Technically userspace should be using kobject and sysfs attributes for 
set speed, but for whatever reason (legacy/historical I assume) we let 
the core add/remove sysfs files for an op that's supported only by 
userspace governor.

-Saravana


-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2016-02-09 21:02 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-03 14:02 [PATCH V2 0/7] cpufreq: governors: Fix ABBA lockups Viresh Kumar
2016-02-03 14:02 ` [PATCH V2 1/7] cpufreq: governor: Treat min_sampling_rate as a governor-specific tunable Viresh Kumar
2016-02-05  2:31   ` Rafael J. Wysocki
2016-02-05  2:47     ` Viresh Kumar
2016-02-03 14:02 ` [PATCH V2 2/7] cpufreq: governor: New sysfs show/store callbacks for governor tunables Viresh Kumar
2016-02-03 16:17   ` Viresh Kumar
2016-02-03 14:02 ` [PATCH V2 3/7] cpufreq: governor: Drop unused macros for creating governor tunable attributes Viresh Kumar
2016-02-03 14:02 ` [PATCH V2 4/7] Revert "cpufreq: Drop rwsem lock around CPUFREQ_GOV_POLICY_EXIT" Viresh Kumar
2016-02-03 14:02 ` [PATCH V2 5/7] cpufreq: Merge cpufreq_offline_prepare/finish routines Viresh Kumar
2016-02-03 20:21   ` Saravana Kannan
2016-02-04  1:49     ` Viresh Kumar
2016-02-03 14:02 ` [PATCH V2 6/7] cpufreq: Call __cpufreq_governor() with policy->rwsem held Viresh Kumar
2016-02-03 14:02 ` [PATCH V2 7/7] cpufreq: Remove cpufreq_governor_lock Viresh Kumar
2016-02-04  6:43   ` Viresh Kumar
2016-02-03 15:54 ` [PATCH V2 0/7] cpufreq: governors: Fix ABBA lockups Juri Lelli
2016-02-03 16:10   ` Viresh Kumar
2016-02-03 17:20     ` Juri Lelli
2016-02-03 17:20       ` Rafael J. Wysocki
2016-02-03 23:31         ` Shilpa Bhat
2016-02-03 23:50           ` Rafael J. Wysocki
2016-02-04  5:51             ` Viresh Kumar
2016-02-04 11:09             ` Viresh Kumar
2016-02-04 17:43               ` Saravana Kannan
2016-02-04 17:44                 ` Saravana Kannan
2016-02-04 18:18                   ` Rafael J. Wysocki
2016-02-05  2:44                     ` Viresh Kumar
2016-02-05  3:54                     ` Rafael J. Wysocki
2016-02-05  9:49                       ` Viresh Kumar
2016-02-08  2:20                         ` Rafael J. Wysocki
2016-02-06  2:22                       ` Saravana Kannan
2016-02-08  2:28                         ` Rafael J. Wysocki
2016-02-09 21:02                           ` Saravana Kannan
2016-02-04  6:24     ` Viresh Kumar
2016-02-04 12:17       ` Viresh Kumar
2016-02-04 20:50         ` Shilpasri G Bhat
2016-02-05  2:49           ` Viresh Kumar

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.