All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/5] cpufreq: governors: Solve the ABBA lockups
@ 2016-02-02 10:57 Viresh Kumar
  2016-02-02 10:57 ` [PATCH 1/5] cpufreq: governor: Kill declare_show_sampling_rate_min() Viresh Kumar
                   ` (6 more replies)
  0 siblings, 7 replies; 45+ messages in thread
From: Viresh Kumar @ 2016-02-02 10:57 UTC (permalink / raw)
  To: Rafael Wysocki, juri.lelli
  Cc: linaro-kernel, linux-pm, skannan, peterz, mturquette,
	steve.muckle, vincent.guittot, morten.rasmussen,
	dietmar.eggemann, linux-kernel, Viresh Kumar

Hi Rafael,

Sorry for doing this, I know you were also looking to fix this in a
possibly different way. But I thought, it would be better if we fix
that. We can scrap this version and take yours if that looks better.

The root cause of all the issues we were facing, was that we were taking
policy->rwsem while accessing governor sysfs attributes. And that
happened because we were sharing the show/store calls present in
cpufreq.c.

I thought, perhaps the best way to fix it is to give separate sysfs-ops
to governors. And that's what I did.

@Juri: I need your help in testing these. My platform doesn't give me
those lockups (even without these patches) and Juno/Tc2 would fit
better.

Can you please run some tests on these?

They are pushed here for easy access (and auto test by build-bot):
git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm.git cpufreq/governor-kobject

--
viresh

Viresh Kumar (5):
  cpufreq: governor: Kill declare_show_sampling_rate_min()
  cpufreq: governor: Create separate sysfs-ops
  cpufreq: governor: Remove unused sysfs attribute macros
  cpufreq: Don't drop rwsem before calling CPUFREQ_GOV_POLICY_EXIT
  cpufreq: Get rid of ->governor_enabled and its lock

 drivers/cpufreq/cpufreq.c              |  29 ----------
 drivers/cpufreq/cpufreq_conservative.c |  77 ++++++++++---------------
 drivers/cpufreq/cpufreq_governor.c     |  86 ++++++++++++++++++++--------
 drivers/cpufreq/cpufreq_governor.h     | 101 +++++++--------------------------
 drivers/cpufreq/cpufreq_ondemand.c     |  77 ++++++++++---------------
 include/linux/cpufreq.h                |   5 --
 6 files changed, 143 insertions(+), 232 deletions(-)

-- 
2.7.0.79.gdc08a19

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 1/5] cpufreq: governor: Kill declare_show_sampling_rate_min()
  2016-02-02 10:57 [PATCH 0/5] cpufreq: governors: Solve the ABBA lockups Viresh Kumar
@ 2016-02-02 10:57 ` Viresh Kumar
  2016-02-02 20:23   ` Rafael J. Wysocki
  2016-02-02 10:57 ` [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops Viresh Kumar
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 45+ messages in thread
From: Viresh Kumar @ 2016-02-02 10:57 UTC (permalink / raw)
  To: Rafael Wysocki, juri.lelli
  Cc: linaro-kernel, linux-pm, skannan, peterz, mturquette,
	steve.muckle, vincent.guittot, morten.rasmussen,
	dietmar.eggemann, linux-kernel, Viresh Kumar

This extra macro is required because the variable min_sampling_rate is
made part of 'struct dbs_data' instead of governor specific tunables.

For further optimization, its better that we kill
declare_show_sampling_rate_min() by moving min_sampling_rate to governor
specific tunables.

Lets do it.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/cpufreq_conservative.c | 14 ++++++-------
 drivers/cpufreq/cpufreq_governor.c     | 36 ++++++++++++++++++++--------------
 drivers/cpufreq/cpufreq_governor.h     | 18 ++---------------
 drivers/cpufreq/cpufreq_ondemand.c     | 14 ++++++-------
 4 files changed, 37 insertions(+), 45 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_conservative.c b/drivers/cpufreq/cpufreq_conservative.c
index 606ad74abe6e..57750367bd26 100644
--- a/drivers/cpufreq/cpufreq_conservative.c
+++ b/drivers/cpufreq/cpufreq_conservative.c
@@ -185,7 +185,7 @@ static ssize_t store_sampling_rate(struct dbs_data *dbs_data, const char *buf,
 	if (ret != 1)
 		return -EINVAL;
 
-	cs_tuners->sampling_rate = max(input, dbs_data->min_sampling_rate);
+	cs_tuners->sampling_rate = max(input, cs_tuners->min_sampling_rate);
 	return count;
 }
 
@@ -281,7 +281,7 @@ show_store_one(cs, up_threshold);
 show_store_one(cs, down_threshold);
 show_store_one(cs, ignore_nice_load);
 show_store_one(cs, freq_step);
-declare_show_sampling_rate_min(cs);
+show_one(cs, min_sampling_rate);
 
 gov_sys_pol_attr_rw(sampling_rate);
 gov_sys_pol_attr_rw(sampling_down_factor);
@@ -289,10 +289,10 @@ gov_sys_pol_attr_rw(up_threshold);
 gov_sys_pol_attr_rw(down_threshold);
 gov_sys_pol_attr_rw(ignore_nice_load);
 gov_sys_pol_attr_rw(freq_step);
-gov_sys_pol_attr_ro(sampling_rate_min);
+gov_sys_pol_attr_ro(min_sampling_rate);
 
 static struct attribute *dbs_attributes_gov_sys[] = {
-	&sampling_rate_min_gov_sys.attr,
+	&min_sampling_rate_gov_sys.attr,
 	&sampling_rate_gov_sys.attr,
 	&sampling_down_factor_gov_sys.attr,
 	&up_threshold_gov_sys.attr,
@@ -308,7 +308,7 @@ static struct attribute_group cs_attr_group_gov_sys = {
 };
 
 static struct attribute *dbs_attributes_gov_pol[] = {
-	&sampling_rate_min_gov_pol.attr,
+	&min_sampling_rate_gov_pol.attr,
 	&sampling_rate_gov_pol.attr,
 	&sampling_down_factor_gov_pol.attr,
 	&up_threshold_gov_pol.attr,
@@ -340,10 +340,10 @@ static int cs_init(struct dbs_data *dbs_data, bool notify)
 	tuners->sampling_down_factor = DEF_SAMPLING_DOWN_FACTOR;
 	tuners->ignore_nice_load = 0;
 	tuners->freq_step = DEF_FREQUENCY_STEP;
+	tuners->min_sampling_rate = MIN_SAMPLING_RATE_RATIO *
+		jiffies_to_usecs(10);
 
 	dbs_data->tuners = tuners;
-	dbs_data->min_sampling_rate = MIN_SAMPLING_RATE_RATIO *
-		jiffies_to_usecs(10);
 
 	if (notify)
 		cpufreq_register_notifier(&cs_cpufreq_notifier_block,
diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index e0d111024d48..9a7edc91ad57 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -286,16 +286,32 @@ static void dbs_timer_handler(unsigned long data)
 		queue_work(system_wq, &shared->work);
 }
 
-static void set_sampling_rate(struct dbs_data *dbs_data,
-		unsigned int sampling_rate)
+static void set_sampling_rate(struct cpufreq_policy *policy,
+			      struct dbs_data *dbs_data)
 {
+	unsigned int *sampling_rate;
+	unsigned int *min_sampling_rate;
+	unsigned int latency;
+
+	/* policy latency is in ns. Convert it to us first */
+	latency = policy->cpuinfo.transition_latency / 1000;
+	if (latency == 0)
+		latency = 1;
+
 	if (dbs_data->cdata->governor == GOV_CONSERVATIVE) {
 		struct cs_dbs_tuners *cs_tuners = dbs_data->tuners;
-		cs_tuners->sampling_rate = sampling_rate;
+		sampling_rate = &cs_tuners->sampling_rate;
+		min_sampling_rate = &cs_tuners->min_sampling_rate;
 	} else {
 		struct od_dbs_tuners *od_tuners = dbs_data->tuners;
-		od_tuners->sampling_rate = sampling_rate;
+		sampling_rate = &od_tuners->sampling_rate;
+		min_sampling_rate = &od_tuners->min_sampling_rate;
 	}
+
+	/* Bring kernel and HW constraints together */
+	*min_sampling_rate = max(*min_sampling_rate,
+				 MIN_LATENCY_MULTIPLIER * latency);
+	*sampling_rate = max(*min_sampling_rate, latency * LATENCY_MULTIPLIER);
 }
 
 static int alloc_common_dbs_info(struct cpufreq_policy *policy,
@@ -338,7 +354,6 @@ static int cpufreq_governor_init(struct cpufreq_policy *policy,
 				 struct dbs_data *dbs_data,
 				 struct common_dbs_data *cdata)
 {
-	unsigned int latency;
 	int ret;
 
 	/* State should be equivalent to EXIT */
@@ -373,16 +388,7 @@ static int cpufreq_governor_init(struct cpufreq_policy *policy,
 	if (ret)
 		goto free_common_dbs_info;
 
-	/* policy latency is in ns. Convert it to us first */
-	latency = policy->cpuinfo.transition_latency / 1000;
-	if (latency == 0)
-		latency = 1;
-
-	/* Bring kernel and HW constraints together */
-	dbs_data->min_sampling_rate = max(dbs_data->min_sampling_rate,
-					  MIN_LATENCY_MULTIPLIER * latency);
-	set_sampling_rate(dbs_data, max(dbs_data->min_sampling_rate,
-					latency * LATENCY_MULTIPLIER));
+	set_sampling_rate(policy, dbs_data);
 
 	if (!have_governor_per_policy())
 		cdata->gdbs_data = dbs_data;
diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h
index 91e767a058a7..ad44a8546a3a 100644
--- a/drivers/cpufreq/cpufreq_governor.h
+++ b/drivers/cpufreq/cpufreq_governor.h
@@ -183,6 +183,7 @@ struct od_dbs_tuners {
 	unsigned int up_threshold;
 	unsigned int powersave_bias;
 	unsigned int io_is_busy;
+	unsigned int min_sampling_rate;
 };
 
 struct cs_dbs_tuners {
@@ -192,6 +193,7 @@ struct cs_dbs_tuners {
 	unsigned int up_threshold;
 	unsigned int down_threshold;
 	unsigned int freq_step;
+	unsigned int min_sampling_rate;
 };
 
 /* Common Governor data across policies */
@@ -230,7 +232,6 @@ struct common_dbs_data {
 /* Governor Per policy data */
 struct dbs_data {
 	struct common_dbs_data *cdata;
-	unsigned int min_sampling_rate;
 	int usage_count;
 	void *tuners;
 };
@@ -254,21 +255,6 @@ static inline int delay_for_sampling_rate(unsigned int sampling_rate)
 	return delay;
 }
 
-#define declare_show_sampling_rate_min(_gov)				\
-static ssize_t show_sampling_rate_min_gov_sys				\
-(struct kobject *kobj, struct attribute *attr, char *buf)		\
-{									\
-	struct dbs_data *dbs_data = _gov##_dbs_cdata.gdbs_data;		\
-	return sprintf(buf, "%u\n", dbs_data->min_sampling_rate);	\
-}									\
-									\
-static ssize_t show_sampling_rate_min_gov_pol				\
-(struct cpufreq_policy *policy, char *buf)				\
-{									\
-	struct dbs_data *dbs_data = policy->governor_data;		\
-	return sprintf(buf, "%u\n", dbs_data->min_sampling_rate);	\
-}
-
 extern struct mutex cpufreq_governor_lock;
 
 void gov_add_timers(struct cpufreq_policy *policy, unsigned int delay);
diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
index eae51070c034..b31f64745232 100644
--- a/drivers/cpufreq/cpufreq_ondemand.c
+++ b/drivers/cpufreq/cpufreq_ondemand.c
@@ -250,7 +250,7 @@ static void update_sampling_rate(struct dbs_data *dbs_data,
 	int cpu;
 
 	od_tuners->sampling_rate = new_rate = max(new_rate,
-			dbs_data->min_sampling_rate);
+			od_tuners->min_sampling_rate);
 
 	/*
 	 * Lock governor so that governor start/stop can't execute in parallel.
@@ -442,7 +442,7 @@ show_store_one(od, up_threshold);
 show_store_one(od, sampling_down_factor);
 show_store_one(od, ignore_nice_load);
 show_store_one(od, powersave_bias);
-declare_show_sampling_rate_min(od);
+show_one(od, min_sampling_rate);
 
 gov_sys_pol_attr_rw(sampling_rate);
 gov_sys_pol_attr_rw(io_is_busy);
@@ -450,10 +450,10 @@ gov_sys_pol_attr_rw(up_threshold);
 gov_sys_pol_attr_rw(sampling_down_factor);
 gov_sys_pol_attr_rw(ignore_nice_load);
 gov_sys_pol_attr_rw(powersave_bias);
-gov_sys_pol_attr_ro(sampling_rate_min);
+gov_sys_pol_attr_ro(min_sampling_rate);
 
 static struct attribute *dbs_attributes_gov_sys[] = {
-	&sampling_rate_min_gov_sys.attr,
+	&min_sampling_rate_gov_sys.attr,
 	&sampling_rate_gov_sys.attr,
 	&up_threshold_gov_sys.attr,
 	&sampling_down_factor_gov_sys.attr,
@@ -469,7 +469,7 @@ static struct attribute_group od_attr_group_gov_sys = {
 };
 
 static struct attribute *dbs_attributes_gov_pol[] = {
-	&sampling_rate_min_gov_pol.attr,
+	&min_sampling_rate_gov_pol.attr,
 	&sampling_rate_gov_pol.attr,
 	&up_threshold_gov_pol.attr,
 	&sampling_down_factor_gov_pol.attr,
@@ -509,12 +509,12 @@ static int od_init(struct dbs_data *dbs_data, bool notify)
 		 * not depending on HZ, but fixed (very low). The deferred
 		 * timer might skip some samples if idle/sleeping as needed.
 		*/
-		dbs_data->min_sampling_rate = MICRO_FREQUENCY_MIN_SAMPLE_RATE;
+		tuners->min_sampling_rate = MICRO_FREQUENCY_MIN_SAMPLE_RATE;
 	} else {
 		tuners->up_threshold = DEF_FREQUENCY_UP_THRESHOLD;
 
 		/* For correct statistics, we need 10 ticks for each measure */
-		dbs_data->min_sampling_rate = MIN_SAMPLING_RATE_RATIO *
+		tuners->min_sampling_rate = MIN_SAMPLING_RATE_RATIO *
 			jiffies_to_usecs(10);
 	}
 
-- 
2.7.0.79.gdc08a19

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops
  2016-02-02 10:57 [PATCH 0/5] cpufreq: governors: Solve the ABBA lockups Viresh Kumar
  2016-02-02 10:57 ` [PATCH 1/5] cpufreq: governor: Kill declare_show_sampling_rate_min() Viresh Kumar
@ 2016-02-02 10:57 ` Viresh Kumar
  2016-02-02 15:47   ` Juri Lelli
  2016-02-02 21:23   ` Rafael J. Wysocki
  2016-02-02 10:57 ` [PATCH 3/5] cpufreq: governor: Remove unused sysfs attribute macros Viresh Kumar
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 45+ messages in thread
From: Viresh Kumar @ 2016-02-02 10:57 UTC (permalink / raw)
  To: Rafael Wysocki, juri.lelli
  Cc: linaro-kernel, linux-pm, skannan, peterz, mturquette,
	steve.muckle, vincent.guittot, morten.rasmussen,
	dietmar.eggemann, linux-kernel, Viresh Kumar

Until now, governors (ondemand/conservative) were using the
'global-attr' or 'freq-attr', depending on the sysfs location where we
want to create governor's directory.

The problem is that, in case of 'freq-attr', we are forced to use
show()/store() present in cpufreq.c, which always take policy->rwsem.

And because of that we were facing some ABBA lockups during governor
callback event CPUFREQ_GOV_POLICY_EXIT. And so we were dropping the
rwsem right before calling governor callback for CPUFREQ_GOV_POLICY_EXIT
event.

That caused further problems and it never worked perfectly.

This patch attempts to fix that by creating separate sysfs-ops for
cpufreq governors.

Because things got much simplified now, we don't need separate
show/store callbacks for governor-for-system and governor-per-policy
cases.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/cpufreq_conservative.c | 71 +++++++++++++---------------------
 drivers/cpufreq/cpufreq_governor.c     | 50 +++++++++++++++++++-----
 drivers/cpufreq/cpufreq_governor.h     | 31 +++++++++++++--
 drivers/cpufreq/cpufreq_ondemand.c     | 71 +++++++++++++---------------------
 4 files changed, 122 insertions(+), 101 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_conservative.c b/drivers/cpufreq/cpufreq_conservative.c
index 57750367bd26..980145da796a 100644
--- a/drivers/cpufreq/cpufreq_conservative.c
+++ b/drivers/cpufreq/cpufreq_conservative.c
@@ -275,51 +275,35 @@ static ssize_t store_freq_step(struct dbs_data *dbs_data, const char *buf,
 	return count;
 }
 
-show_store_one(cs, sampling_rate);
-show_store_one(cs, sampling_down_factor);
-show_store_one(cs, up_threshold);
-show_store_one(cs, down_threshold);
-show_store_one(cs, ignore_nice_load);
-show_store_one(cs, freq_step);
-show_one(cs, min_sampling_rate);
-
-gov_sys_pol_attr_rw(sampling_rate);
-gov_sys_pol_attr_rw(sampling_down_factor);
-gov_sys_pol_attr_rw(up_threshold);
-gov_sys_pol_attr_rw(down_threshold);
-gov_sys_pol_attr_rw(ignore_nice_load);
-gov_sys_pol_attr_rw(freq_step);
-gov_sys_pol_attr_ro(min_sampling_rate);
-
-static struct attribute *dbs_attributes_gov_sys[] = {
-	&min_sampling_rate_gov_sys.attr,
-	&sampling_rate_gov_sys.attr,
-	&sampling_down_factor_gov_sys.attr,
-	&up_threshold_gov_sys.attr,
-	&down_threshold_gov_sys.attr,
-	&ignore_nice_load_gov_sys.attr,
-	&freq_step_gov_sys.attr,
+gov_show_one(cs, sampling_rate);
+gov_show_one(cs, sampling_down_factor);
+gov_show_one(cs, up_threshold);
+gov_show_one(cs, down_threshold);
+gov_show_one(cs, ignore_nice_load);
+gov_show_one(cs, freq_step);
+gov_show_one(cs, min_sampling_rate);
+
+gov_attr_rw(sampling_rate);
+gov_attr_rw(sampling_down_factor);
+gov_attr_rw(up_threshold);
+gov_attr_rw(down_threshold);
+gov_attr_rw(ignore_nice_load);
+gov_attr_rw(freq_step);
+gov_attr_ro(min_sampling_rate);
+
+static struct attribute *dbs_attributes[] = {
+	&min_sampling_rate.attr,
+	&sampling_rate.attr,
+	&sampling_down_factor.attr,
+	&up_threshold.attr,
+	&down_threshold.attr,
+	&ignore_nice_load.attr,
+	&freq_step.attr,
 	NULL
 };
 
-static struct attribute_group cs_attr_group_gov_sys = {
-	.attrs = dbs_attributes_gov_sys,
-	.name = "conservative",
-};
-
-static struct attribute *dbs_attributes_gov_pol[] = {
-	&min_sampling_rate_gov_pol.attr,
-	&sampling_rate_gov_pol.attr,
-	&sampling_down_factor_gov_pol.attr,
-	&up_threshold_gov_pol.attr,
-	&down_threshold_gov_pol.attr,
-	&ignore_nice_load_gov_pol.attr,
-	&freq_step_gov_pol.attr,
-	NULL
-};
-
-static struct attribute_group cs_attr_group_gov_pol = {
-	.attrs = dbs_attributes_gov_pol,
+static struct attribute_group cs_attr_group = {
+	.attrs = dbs_attributes,
 	.name = "conservative",
 };
 
@@ -365,8 +349,7 @@ define_get_cpu_dbs_routines(cs_cpu_dbs_info);
 
 static struct common_dbs_data cs_dbs_cdata = {
 	.governor = GOV_CONSERVATIVE,
-	.attr_group_gov_sys = &cs_attr_group_gov_sys,
-	.attr_group_gov_pol = &cs_attr_group_gov_pol,
+	.attr_group = &cs_attr_group,
 	.get_cpu_cdbs = get_cpu_cdbs,
 	.get_cpu_dbs_info_s = get_cpu_dbs_info_s,
 	.gov_dbs_timer = cs_dbs_timer,
diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index 9a7edc91ad57..e785a118cbdc 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -22,14 +22,37 @@
 
 #include "cpufreq_governor.h"
 
-static struct attribute_group *get_sysfs_attr(struct dbs_data *dbs_data)
+#define to_dbs_data(k) container_of(k, struct dbs_data, kobj)
+#define to_attr(a) container_of(a, struct governor_attr, attr)
+
+static ssize_t show(struct kobject *kobj, struct attribute *attr, char *buf)
 {
-	if (have_governor_per_policy())
-		return dbs_data->cdata->attr_group_gov_pol;
-	else
-		return dbs_data->cdata->attr_group_gov_sys;
+	struct dbs_data *dbs_data = to_dbs_data(kobj);
+	struct governor_attr *gattr = to_attr(attr);
+
+	if (gattr->show)
+		return gattr->show(dbs_data, buf);
+
+	return -EIO;
+}
+
+static ssize_t store(struct kobject *kobj, struct attribute *attr,
+		     const char *buf, size_t count)
+{
+	struct dbs_data *dbs_data = to_dbs_data(kobj);
+	struct governor_attr *gattr = to_attr(attr);
+
+	if (gattr->store)
+		return gattr->store(dbs_data, buf, count);
+
+	return -EIO;
 }
 
+static const struct sysfs_ops sysfs_ops = {
+	.show	= show,
+	.store	= store,
+};
+
 void dbs_check_cpu(struct dbs_data *dbs_data, int cpu)
 {
 	struct cpu_dbs_info *cdbs = dbs_data->cdata->get_cpu_cdbs(cpu);
@@ -354,6 +377,7 @@ static int cpufreq_governor_init(struct cpufreq_policy *policy,
 				 struct dbs_data *dbs_data,
 				 struct common_dbs_data *cdata)
 {
+	struct attribute_group *attr_group;
 	int ret;
 
 	/* State should be equivalent to EXIT */
@@ -395,10 +419,17 @@ static int cpufreq_governor_init(struct cpufreq_policy *policy,
 
 	policy->governor_data = dbs_data;
 
-	ret = sysfs_create_group(get_governor_parent_kobj(policy),
-				 get_sysfs_attr(dbs_data));
-	if (ret)
+	attr_group = dbs_data->cdata->attr_group;
+	dbs_data->kobj_type.sysfs_ops = &sysfs_ops;
+	dbs_data->kobj_type.default_attrs = attr_group->attrs;
+
+	ret = kobject_init_and_add(&dbs_data->kobj, &dbs_data->kobj_type,
+				   get_governor_parent_kobj(policy),
+				   attr_group->name);
+	if (ret) {
+		pr_err("%s: failed to init dbs_data kobj: %d\n", __func__, ret);
 		goto reset_gdbs_data;
+	}
 
 	return 0;
 
@@ -426,8 +457,7 @@ static int cpufreq_governor_exit(struct cpufreq_policy *policy,
 		return -EBUSY;
 
 	if (!--dbs_data->usage_count) {
-		sysfs_remove_group(get_governor_parent_kobj(policy),
-				   get_sysfs_attr(dbs_data));
+		kobject_put(&dbs_data->kobj);
 
 		policy->governor_data = NULL;
 
diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h
index ad44a8546a3a..59b28133dd68 100644
--- a/drivers/cpufreq/cpufreq_governor.h
+++ b/drivers/cpufreq/cpufreq_governor.h
@@ -108,6 +108,31 @@ static ssize_t store_##file_name##_gov_pol				\
 show_one(_gov, file_name);						\
 store_one(_gov, file_name)
 
+/* Governor's specific attributes */
+struct dbs_data;
+struct governor_attr {
+	struct attribute attr;
+	ssize_t (*show)(struct dbs_data *dbs_data, char *buf);
+	ssize_t (*store)(struct dbs_data *dbs_data, const char *buf,
+			 size_t count);
+};
+
+#define gov_show_one(_gov, file_name)					\
+static ssize_t show_##file_name						\
+(struct dbs_data *dbs_data, char *buf)					\
+{									\
+	struct _gov##_dbs_tuners *tuners = dbs_data->tuners;		\
+	return sprintf(buf, "%u\n", tuners->file_name);			\
+}
+
+#define gov_attr_ro(_name)						\
+static struct governor_attr _name =					\
+__ATTR(_name, 0444, show_##_name, NULL)
+
+#define gov_attr_rw(_name)						\
+static struct governor_attr _name =					\
+__ATTR(_name, 0644, show_##_name, store_##_name)
+
 /* create helper routines */
 #define define_get_cpu_dbs_routines(_dbs_info)				\
 static struct cpu_dbs_info *get_cpu_cdbs(int cpu)			\
@@ -197,14 +222,12 @@ struct cs_dbs_tuners {
 };
 
 /* Common Governor data across policies */
-struct dbs_data;
 struct common_dbs_data {
 	/* Common across governors */
 	#define GOV_ONDEMAND		0
 	#define GOV_CONSERVATIVE	1
 	int governor;
-	struct attribute_group *attr_group_gov_sys; /* one governor - system */
-	struct attribute_group *attr_group_gov_pol; /* one governor - policy */
+	struct attribute_group *attr_group; /* one governor - system */
 
 	/*
 	 * Common data for platforms that don't set
@@ -234,6 +257,8 @@ struct dbs_data {
 	struct common_dbs_data *cdata;
 	int usage_count;
 	void *tuners;
+	struct kobject kobj;
+	struct kobj_type kobj_type;
 };
 
 /* Governor specific ops, will be passed to dbs_data->gov_ops */
diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
index b31f64745232..b7983dd02e24 100644
--- a/drivers/cpufreq/cpufreq_ondemand.c
+++ b/drivers/cpufreq/cpufreq_ondemand.c
@@ -436,51 +436,35 @@ static ssize_t store_powersave_bias(struct dbs_data *dbs_data, const char *buf,
 	return count;
 }
 
-show_store_one(od, sampling_rate);
-show_store_one(od, io_is_busy);
-show_store_one(od, up_threshold);
-show_store_one(od, sampling_down_factor);
-show_store_one(od, ignore_nice_load);
-show_store_one(od, powersave_bias);
-show_one(od, min_sampling_rate);
-
-gov_sys_pol_attr_rw(sampling_rate);
-gov_sys_pol_attr_rw(io_is_busy);
-gov_sys_pol_attr_rw(up_threshold);
-gov_sys_pol_attr_rw(sampling_down_factor);
-gov_sys_pol_attr_rw(ignore_nice_load);
-gov_sys_pol_attr_rw(powersave_bias);
-gov_sys_pol_attr_ro(min_sampling_rate);
-
-static struct attribute *dbs_attributes_gov_sys[] = {
-	&min_sampling_rate_gov_sys.attr,
-	&sampling_rate_gov_sys.attr,
-	&up_threshold_gov_sys.attr,
-	&sampling_down_factor_gov_sys.attr,
-	&ignore_nice_load_gov_sys.attr,
-	&powersave_bias_gov_sys.attr,
-	&io_is_busy_gov_sys.attr,
+gov_show_one(od, sampling_rate);
+gov_show_one(od, io_is_busy);
+gov_show_one(od, up_threshold);
+gov_show_one(od, sampling_down_factor);
+gov_show_one(od, ignore_nice_load);
+gov_show_one(od, powersave_bias);
+gov_show_one(od, min_sampling_rate);
+
+gov_attr_rw(sampling_rate);
+gov_attr_rw(io_is_busy);
+gov_attr_rw(up_threshold);
+gov_attr_rw(sampling_down_factor);
+gov_attr_rw(ignore_nice_load);
+gov_attr_rw(powersave_bias);
+gov_attr_ro(min_sampling_rate);
+
+static struct attribute *dbs_attributes[] = {
+	&min_sampling_rate.attr,
+	&sampling_rate.attr,
+	&up_threshold.attr,
+	&sampling_down_factor.attr,
+	&ignore_nice_load.attr,
+	&powersave_bias.attr,
+	&io_is_busy.attr,
 	NULL
 };
 
-static struct attribute_group od_attr_group_gov_sys = {
-	.attrs = dbs_attributes_gov_sys,
-	.name = "ondemand",
-};
-
-static struct attribute *dbs_attributes_gov_pol[] = {
-	&min_sampling_rate_gov_pol.attr,
-	&sampling_rate_gov_pol.attr,
-	&up_threshold_gov_pol.attr,
-	&sampling_down_factor_gov_pol.attr,
-	&ignore_nice_load_gov_pol.attr,
-	&powersave_bias_gov_pol.attr,
-	&io_is_busy_gov_pol.attr,
-	NULL
-};
-
-static struct attribute_group od_attr_group_gov_pol = {
-	.attrs = dbs_attributes_gov_pol,
+static struct attribute_group od_attr_group = {
+	.attrs = dbs_attributes,
 	.name = "ondemand",
 };
 
@@ -542,8 +526,7 @@ static struct od_ops od_ops = {
 
 static struct common_dbs_data od_dbs_cdata = {
 	.governor = GOV_ONDEMAND,
-	.attr_group_gov_sys = &od_attr_group_gov_sys,
-	.attr_group_gov_pol = &od_attr_group_gov_pol,
+	.attr_group = &od_attr_group,
 	.get_cpu_cdbs = get_cpu_cdbs,
 	.get_cpu_dbs_info_s = get_cpu_dbs_info_s,
 	.gov_dbs_timer = od_dbs_timer,
-- 
2.7.0.79.gdc08a19

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 3/5] cpufreq: governor: Remove unused sysfs attribute macros
  2016-02-02 10:57 [PATCH 0/5] cpufreq: governors: Solve the ABBA lockups Viresh Kumar
  2016-02-02 10:57 ` [PATCH 1/5] cpufreq: governor: Kill declare_show_sampling_rate_min() Viresh Kumar
  2016-02-02 10:57 ` [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops Viresh Kumar
@ 2016-02-02 10:57 ` Viresh Kumar
  2016-02-02 21:34   ` Rafael J. Wysocki
  2016-02-02 10:57 ` [PATCH 4/5] cpufreq: Don't drop rwsem before calling CPUFREQ_GOV_POLICY_EXIT Viresh Kumar
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 45+ messages in thread
From: Viresh Kumar @ 2016-02-02 10:57 UTC (permalink / raw)
  To: Rafael Wysocki, juri.lelli
  Cc: linaro-kernel, linux-pm, skannan, peterz, mturquette,
	steve.muckle, vincent.guittot, morten.rasmussen,
	dietmar.eggemann, linux-kernel, Viresh Kumar

The older macros aren't used anymore, remove them.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/cpufreq_governor.h | 68 --------------------------------------
 1 file changed, 68 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h
index 59b28133dd68..5a7de52815c4 100644
--- a/drivers/cpufreq/cpufreq_governor.h
+++ b/drivers/cpufreq/cpufreq_governor.h
@@ -40,74 +40,6 @@
 /* Ondemand Sampling types */
 enum {OD_NORMAL_SAMPLE, OD_SUB_SAMPLE};
 
-/*
- * Macro for creating governors sysfs routines
- *
- * - gov_sys: One governor instance per whole system
- * - gov_pol: One governor instance per policy
- */
-
-/* Create attributes */
-#define gov_sys_attr_ro(_name)						\
-static struct global_attr _name##_gov_sys =				\
-__ATTR(_name, 0444, show_##_name##_gov_sys, NULL)
-
-#define gov_sys_attr_rw(_name)						\
-static struct global_attr _name##_gov_sys =				\
-__ATTR(_name, 0644, show_##_name##_gov_sys, store_##_name##_gov_sys)
-
-#define gov_pol_attr_ro(_name)						\
-static struct freq_attr _name##_gov_pol =				\
-__ATTR(_name, 0444, show_##_name##_gov_pol, NULL)
-
-#define gov_pol_attr_rw(_name)						\
-static struct freq_attr _name##_gov_pol =				\
-__ATTR(_name, 0644, show_##_name##_gov_pol, store_##_name##_gov_pol)
-
-#define gov_sys_pol_attr_rw(_name)					\
-	gov_sys_attr_rw(_name);						\
-	gov_pol_attr_rw(_name)
-
-#define gov_sys_pol_attr_ro(_name)					\
-	gov_sys_attr_ro(_name);						\
-	gov_pol_attr_ro(_name)
-
-/* Create show/store routines */
-#define show_one(_gov, file_name)					\
-static ssize_t show_##file_name##_gov_sys				\
-(struct kobject *kobj, struct attribute *attr, char *buf)		\
-{									\
-	struct _gov##_dbs_tuners *tuners = _gov##_dbs_cdata.gdbs_data->tuners; \
-	return sprintf(buf, "%u\n", tuners->file_name);			\
-}									\
-									\
-static ssize_t show_##file_name##_gov_pol				\
-(struct cpufreq_policy *policy, char *buf)				\
-{									\
-	struct dbs_data *dbs_data = policy->governor_data;		\
-	struct _gov##_dbs_tuners *tuners = dbs_data->tuners;		\
-	return sprintf(buf, "%u\n", tuners->file_name);			\
-}
-
-#define store_one(_gov, file_name)					\
-static ssize_t store_##file_name##_gov_sys				\
-(struct kobject *kobj, struct attribute *attr, const char *buf, size_t count) \
-{									\
-	struct dbs_data *dbs_data = _gov##_dbs_cdata.gdbs_data;		\
-	return store_##file_name(dbs_data, buf, count);			\
-}									\
-									\
-static ssize_t store_##file_name##_gov_pol				\
-(struct cpufreq_policy *policy, const char *buf, size_t count)		\
-{									\
-	struct dbs_data *dbs_data = policy->governor_data;		\
-	return store_##file_name(dbs_data, buf, count);			\
-}
-
-#define show_store_one(_gov, file_name)					\
-show_one(_gov, file_name);						\
-store_one(_gov, file_name)
-
 /* Governor's specific attributes */
 struct dbs_data;
 struct governor_attr {
-- 
2.7.0.79.gdc08a19

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 4/5] cpufreq: Don't drop rwsem before calling CPUFREQ_GOV_POLICY_EXIT
  2016-02-02 10:57 [PATCH 0/5] cpufreq: governors: Solve the ABBA lockups Viresh Kumar
                   ` (2 preceding siblings ...)
  2016-02-02 10:57 ` [PATCH 3/5] cpufreq: governor: Remove unused sysfs attribute macros Viresh Kumar
@ 2016-02-02 10:57 ` Viresh Kumar
  2016-02-02 21:53   ` Rafael J. Wysocki
  2016-02-02 10:57 ` [PATCH 5/5] cpufreq: Get rid of ->governor_enabled and its lock Viresh Kumar
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 45+ messages in thread
From: Viresh Kumar @ 2016-02-02 10:57 UTC (permalink / raw)
  To: Rafael Wysocki, juri.lelli
  Cc: linaro-kernel, linux-pm, skannan, peterz, mturquette,
	steve.muckle, vincent.guittot, morten.rasmussen,
	dietmar.eggemann, linux-kernel, Viresh Kumar

Commit 955ef4833574 ("cpufreq: Drop rwsem lock around
CPUFREQ_GOV_POLICY_EXIT") dropped these because of some ABBA lockup
issues.

The previous commit has fixed them all and we don't need to drop these
locks anymore.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/cpufreq.c | 5 -----
 include/linux/cpufreq.h   | 4 ----
 2 files changed, 9 deletions(-)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index e979ec78b695..5f7e24567e0e 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -2155,10 +2155,7 @@ static int cpufreq_set_policy(struct cpufreq_policy *policy,
 			return ret;
 		}
 
-		up_write(&policy->rwsem);
 		ret = __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT);
-		down_write(&policy->rwsem);
-
 		if (ret) {
 			pr_err("%s: Failed to Exit Governor: %s (%d)\n",
 			       __func__, old_gov->name, ret);
@@ -2174,9 +2171,7 @@ static int cpufreq_set_policy(struct cpufreq_policy *policy,
 		if (!ret)
 			goto out;
 
-		up_write(&policy->rwsem);
 		__cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT);
-		down_write(&policy->rwsem);
 	}
 
 	/* new governor failed, so re-start old one */
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index 88a4215125bc..79b87cebaa9c 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -100,10 +100,6 @@ struct cpufreq_policy {
 	 * - Any routine that will write to the policy structure and/or may take away
 	 *   the policy altogether (eg. CPU hotplug), will hold this lock in write
 	 *   mode before doing so.
-	 *
-	 * Additional rules:
-	 * - Lock should not be held across
-	 *     __cpufreq_governor(data, CPUFREQ_GOV_POLICY_EXIT);
 	 */
 	struct rw_semaphore	rwsem;
 
-- 
2.7.0.79.gdc08a19

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 5/5] cpufreq: Get rid of ->governor_enabled and its lock
  2016-02-02 10:57 [PATCH 0/5] cpufreq: governors: Solve the ABBA lockups Viresh Kumar
                   ` (3 preceding siblings ...)
  2016-02-02 10:57 ` [PATCH 4/5] cpufreq: Don't drop rwsem before calling CPUFREQ_GOV_POLICY_EXIT Viresh Kumar
@ 2016-02-02 10:57 ` Viresh Kumar
  2016-02-02 16:49   ` Juri Lelli
  2016-02-02 21:57   ` Rafael J. Wysocki
  2016-02-02 11:25 ` [PATCH 0/5] cpufreq: governors: Solve the ABBA lockups Juri Lelli
  2016-02-02 20:04 ` Rafael J. Wysocki
  6 siblings, 2 replies; 45+ messages in thread
From: Viresh Kumar @ 2016-02-02 10:57 UTC (permalink / raw)
  To: Rafael Wysocki, juri.lelli
  Cc: linaro-kernel, linux-pm, skannan, peterz, mturquette,
	steve.muckle, vincent.guittot, morten.rasmussen,
	dietmar.eggemann, linux-kernel, Viresh Kumar

Invalid state-transitions is verified by governor core now and there is
no need to replicate that in cpufreq core. Also we don't drop
policy->rwsem anymore, which makes rest of the races go away.

Simplify code a bit now.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/cpufreq.c | 24 ------------------------
 include/linux/cpufreq.h   |  1 -
 2 files changed, 25 deletions(-)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 5f7e24567e0e..052ad1b9372c 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -102,7 +102,6 @@ static LIST_HEAD(cpufreq_governor_list);
 static struct cpufreq_driver *cpufreq_driver;
 static DEFINE_PER_CPU(struct cpufreq_policy *, cpufreq_cpu_data);
 static DEFINE_RWLOCK(cpufreq_driver_lock);
-DEFINE_MUTEX(cpufreq_governor_lock);
 
 /* Flag to suspend/resume CPUFreq governors */
 static bool cpufreq_suspended;
@@ -1963,21 +1962,6 @@ static int __cpufreq_governor(struct cpufreq_policy *policy,
 
 	pr_debug("%s: for CPU %u, event %u\n", __func__, policy->cpu, event);
 
-	mutex_lock(&cpufreq_governor_lock);
-	if ((policy->governor_enabled && event == CPUFREQ_GOV_START)
-	    || (!policy->governor_enabled
-	    && (event == CPUFREQ_GOV_LIMITS || event == CPUFREQ_GOV_STOP))) {
-		mutex_unlock(&cpufreq_governor_lock);
-		return -EBUSY;
-	}
-
-	if (event == CPUFREQ_GOV_STOP)
-		policy->governor_enabled = false;
-	else if (event == CPUFREQ_GOV_START)
-		policy->governor_enabled = true;
-
-	mutex_unlock(&cpufreq_governor_lock);
-
 	ret = policy->governor->governor(policy, event);
 
 	if (!ret) {
@@ -1985,14 +1969,6 @@ static int __cpufreq_governor(struct cpufreq_policy *policy,
 			policy->governor->initialized++;
 		else if (event == CPUFREQ_GOV_POLICY_EXIT)
 			policy->governor->initialized--;
-	} else {
-		/* Restore original values */
-		mutex_lock(&cpufreq_governor_lock);
-		if (event == CPUFREQ_GOV_STOP)
-			policy->governor_enabled = true;
-		else if (event == CPUFREQ_GOV_START)
-			policy->governor_enabled = false;
-		mutex_unlock(&cpufreq_governor_lock);
 	}
 
 	if (((event == CPUFREQ_GOV_POLICY_INIT) && ret) ||
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index 79b87cebaa9c..e90cf5d31e85 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -80,7 +80,6 @@ struct cpufreq_policy {
 	unsigned int		last_policy; /* policy before unplug */
 	struct cpufreq_governor	*governor; /* see below */
 	void			*governor_data;
-	bool			governor_enabled; /* governor start/stop flag */
 	char			last_governor[CPUFREQ_NAME_LEN]; /* last governor used */
 
 	struct work_struct	update; /* if update_policy() needs to be
-- 
2.7.0.79.gdc08a19

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH 0/5] cpufreq: governors: Solve the ABBA lockups
  2016-02-02 10:57 [PATCH 0/5] cpufreq: governors: Solve the ABBA lockups Viresh Kumar
                   ` (4 preceding siblings ...)
  2016-02-02 10:57 ` [PATCH 5/5] cpufreq: Get rid of ->governor_enabled and its lock Viresh Kumar
@ 2016-02-02 11:25 ` Juri Lelli
  2016-02-02 20:04 ` Rafael J. Wysocki
  6 siblings, 0 replies; 45+ messages in thread
From: Juri Lelli @ 2016-02-02 11:25 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael Wysocki, linaro-kernel, linux-pm, skannan, peterz,
	mturquette, steve.muckle, vincent.guittot, morten.rasmussen,
	dietmar.eggemann, linux-kernel

Hi Viresh,

On 02/02/16 16:27, Viresh Kumar wrote:
> Hi Rafael,
> 
> Sorry for doing this, I know you were also looking to fix this in a
> possibly different way. But I thought, it would be better if we fix
> that. We can scrap this version and take yours if that looks better.
> 
> The root cause of all the issues we were facing, was that we were taking
> policy->rwsem while accessing governor sysfs attributes. And that
> happened because we were sharing the show/store calls present in
> cpufreq.c.
> 
> I thought, perhaps the best way to fix it is to give separate sysfs-ops
> to governors. And that's what I did.
> 
> @Juri: I need your help in testing these. My platform doesn't give me
> those lockups (even without these patches) and Juno/Tc2 would fit
> better.
> 
> Can you please run some tests on these?
> 

Sure! Will do in the next few days.

Best,

- Juri

> They are pushed here for easy access (and auto test by build-bot):
> git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm.git cpufreq/governor-kobject
> 
> --
> viresh
> 
> Viresh Kumar (5):
>   cpufreq: governor: Kill declare_show_sampling_rate_min()
>   cpufreq: governor: Create separate sysfs-ops
>   cpufreq: governor: Remove unused sysfs attribute macros
>   cpufreq: Don't drop rwsem before calling CPUFREQ_GOV_POLICY_EXIT
>   cpufreq: Get rid of ->governor_enabled and its lock
> 
>  drivers/cpufreq/cpufreq.c              |  29 ----------
>  drivers/cpufreq/cpufreq_conservative.c |  77 ++++++++++---------------
>  drivers/cpufreq/cpufreq_governor.c     |  86 ++++++++++++++++++++--------
>  drivers/cpufreq/cpufreq_governor.h     | 101 +++++++--------------------------
>  drivers/cpufreq/cpufreq_ondemand.c     |  77 ++++++++++---------------
>  include/linux/cpufreq.h                |   5 --
>  6 files changed, 143 insertions(+), 232 deletions(-)
> 
> -- 
> 2.7.0.79.gdc08a19
> 

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops
  2016-02-02 10:57 ` [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops Viresh Kumar
@ 2016-02-02 15:47   ` Juri Lelli
  2016-02-02 16:35     ` Rafael J. Wysocki
  2016-02-02 21:23   ` Rafael J. Wysocki
  1 sibling, 1 reply; 45+ messages in thread
From: Juri Lelli @ 2016-02-02 15:47 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael Wysocki, linaro-kernel, linux-pm, skannan, peterz,
	mturquette, steve.muckle, vincent.guittot, morten.rasmussen,
	dietmar.eggemann, linux-kernel, Juri Lelli

Hi Viresh,

On 02/02/16 16:27, Viresh Kumar wrote:
> Until now, governors (ondemand/conservative) were using the
> 'global-attr' or 'freq-attr', depending on the sysfs location where we
> want to create governor's directory.
> 
> The problem is that, in case of 'freq-attr', we are forced to use
> show()/store() present in cpufreq.c, which always take policy->rwsem.
> 
> And because of that we were facing some ABBA lockups during governor
> callback event CPUFREQ_GOV_POLICY_EXIT. And so we were dropping the
> rwsem right before calling governor callback for CPUFREQ_GOV_POLICY_EXIT
> event.
> 
> That caused further problems and it never worked perfectly.
> 
> This patch attempts to fix that by creating separate sysfs-ops for
> cpufreq governors.
> 
> Because things got much simplified now, we don't need separate
> show/store callbacks for governor-for-system and governor-per-policy
> cases.
> 
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

This patch cleans things up a lot, that's good.

One thing I'm still concerned about, though: don't we need some locking
in place for some of the store operations on governors attributes? Are
store_{ignore_nice_load, sampling_down_fact, etc} safe without locking?
It seems that we can call them from different cpus concurrently.

Best,

- Juri

> ---
>  drivers/cpufreq/cpufreq_conservative.c | 71 +++++++++++++---------------------
>  drivers/cpufreq/cpufreq_governor.c     | 50 +++++++++++++++++++-----
>  drivers/cpufreq/cpufreq_governor.h     | 31 +++++++++++++--
>  drivers/cpufreq/cpufreq_ondemand.c     | 71 +++++++++++++---------------------
>  4 files changed, 122 insertions(+), 101 deletions(-)
> 
> diff --git a/drivers/cpufreq/cpufreq_conservative.c b/drivers/cpufreq/cpufreq_conservative.c
> index 57750367bd26..980145da796a 100644
> --- a/drivers/cpufreq/cpufreq_conservative.c
> +++ b/drivers/cpufreq/cpufreq_conservative.c
> @@ -275,51 +275,35 @@ static ssize_t store_freq_step(struct dbs_data *dbs_data, const char *buf,
>  	return count;
>  }
>  
> -show_store_one(cs, sampling_rate);
> -show_store_one(cs, sampling_down_factor);
> -show_store_one(cs, up_threshold);
> -show_store_one(cs, down_threshold);
> -show_store_one(cs, ignore_nice_load);
> -show_store_one(cs, freq_step);
> -show_one(cs, min_sampling_rate);
> -
> -gov_sys_pol_attr_rw(sampling_rate);
> -gov_sys_pol_attr_rw(sampling_down_factor);
> -gov_sys_pol_attr_rw(up_threshold);
> -gov_sys_pol_attr_rw(down_threshold);
> -gov_sys_pol_attr_rw(ignore_nice_load);
> -gov_sys_pol_attr_rw(freq_step);
> -gov_sys_pol_attr_ro(min_sampling_rate);
> -
> -static struct attribute *dbs_attributes_gov_sys[] = {
> -	&min_sampling_rate_gov_sys.attr,
> -	&sampling_rate_gov_sys.attr,
> -	&sampling_down_factor_gov_sys.attr,
> -	&up_threshold_gov_sys.attr,
> -	&down_threshold_gov_sys.attr,
> -	&ignore_nice_load_gov_sys.attr,
> -	&freq_step_gov_sys.attr,
> +gov_show_one(cs, sampling_rate);
> +gov_show_one(cs, sampling_down_factor);
> +gov_show_one(cs, up_threshold);
> +gov_show_one(cs, down_threshold);
> +gov_show_one(cs, ignore_nice_load);
> +gov_show_one(cs, freq_step);
> +gov_show_one(cs, min_sampling_rate);
> +
> +gov_attr_rw(sampling_rate);
> +gov_attr_rw(sampling_down_factor);
> +gov_attr_rw(up_threshold);
> +gov_attr_rw(down_threshold);
> +gov_attr_rw(ignore_nice_load);
> +gov_attr_rw(freq_step);
> +gov_attr_ro(min_sampling_rate);
> +
> +static struct attribute *dbs_attributes[] = {
> +	&min_sampling_rate.attr,
> +	&sampling_rate.attr,
> +	&sampling_down_factor.attr,
> +	&up_threshold.attr,
> +	&down_threshold.attr,
> +	&ignore_nice_load.attr,
> +	&freq_step.attr,
>  	NULL
>  };
>  
> -static struct attribute_group cs_attr_group_gov_sys = {
> -	.attrs = dbs_attributes_gov_sys,
> -	.name = "conservative",
> -};
> -
> -static struct attribute *dbs_attributes_gov_pol[] = {
> -	&min_sampling_rate_gov_pol.attr,
> -	&sampling_rate_gov_pol.attr,
> -	&sampling_down_factor_gov_pol.attr,
> -	&up_threshold_gov_pol.attr,
> -	&down_threshold_gov_pol.attr,
> -	&ignore_nice_load_gov_pol.attr,
> -	&freq_step_gov_pol.attr,
> -	NULL
> -};
> -
> -static struct attribute_group cs_attr_group_gov_pol = {
> -	.attrs = dbs_attributes_gov_pol,
> +static struct attribute_group cs_attr_group = {
> +	.attrs = dbs_attributes,
>  	.name = "conservative",
>  };
>  
> @@ -365,8 +349,7 @@ define_get_cpu_dbs_routines(cs_cpu_dbs_info);
>  
>  static struct common_dbs_data cs_dbs_cdata = {
>  	.governor = GOV_CONSERVATIVE,
> -	.attr_group_gov_sys = &cs_attr_group_gov_sys,
> -	.attr_group_gov_pol = &cs_attr_group_gov_pol,
> +	.attr_group = &cs_attr_group,
>  	.get_cpu_cdbs = get_cpu_cdbs,
>  	.get_cpu_dbs_info_s = get_cpu_dbs_info_s,
>  	.gov_dbs_timer = cs_dbs_timer,
> diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
> index 9a7edc91ad57..e785a118cbdc 100644
> --- a/drivers/cpufreq/cpufreq_governor.c
> +++ b/drivers/cpufreq/cpufreq_governor.c
> @@ -22,14 +22,37 @@
>  
>  #include "cpufreq_governor.h"
>  
> -static struct attribute_group *get_sysfs_attr(struct dbs_data *dbs_data)
> +#define to_dbs_data(k) container_of(k, struct dbs_data, kobj)
> +#define to_attr(a) container_of(a, struct governor_attr, attr)
> +
> +static ssize_t show(struct kobject *kobj, struct attribute *attr, char *buf)
>  {
> -	if (have_governor_per_policy())
> -		return dbs_data->cdata->attr_group_gov_pol;
> -	else
> -		return dbs_data->cdata->attr_group_gov_sys;
> +	struct dbs_data *dbs_data = to_dbs_data(kobj);
> +	struct governor_attr *gattr = to_attr(attr);
> +
> +	if (gattr->show)
> +		return gattr->show(dbs_data, buf);
> +
> +	return -EIO;
> +}
> +
> +static ssize_t store(struct kobject *kobj, struct attribute *attr,
> +		     const char *buf, size_t count)
> +{
> +	struct dbs_data *dbs_data = to_dbs_data(kobj);
> +	struct governor_attr *gattr = to_attr(attr);
> +
> +	if (gattr->store)
> +		return gattr->store(dbs_data, buf, count);
> +
> +	return -EIO;
>  }
>  
> +static const struct sysfs_ops sysfs_ops = {
> +	.show	= show,
> +	.store	= store,
> +};
> +
>  void dbs_check_cpu(struct dbs_data *dbs_data, int cpu)
>  {
>  	struct cpu_dbs_info *cdbs = dbs_data->cdata->get_cpu_cdbs(cpu);
> @@ -354,6 +377,7 @@ static int cpufreq_governor_init(struct cpufreq_policy *policy,
>  				 struct dbs_data *dbs_data,
>  				 struct common_dbs_data *cdata)
>  {
> +	struct attribute_group *attr_group;
>  	int ret;
>  
>  	/* State should be equivalent to EXIT */
> @@ -395,10 +419,17 @@ static int cpufreq_governor_init(struct cpufreq_policy *policy,
>  
>  	policy->governor_data = dbs_data;
>  
> -	ret = sysfs_create_group(get_governor_parent_kobj(policy),
> -				 get_sysfs_attr(dbs_data));
> -	if (ret)
> +	attr_group = dbs_data->cdata->attr_group;
> +	dbs_data->kobj_type.sysfs_ops = &sysfs_ops;
> +	dbs_data->kobj_type.default_attrs = attr_group->attrs;
> +
> +	ret = kobject_init_and_add(&dbs_data->kobj, &dbs_data->kobj_type,
> +				   get_governor_parent_kobj(policy),
> +				   attr_group->name);
> +	if (ret) {
> +		pr_err("%s: failed to init dbs_data kobj: %d\n", __func__, ret);
>  		goto reset_gdbs_data;
> +	}
>  
>  	return 0;
>  
> @@ -426,8 +457,7 @@ static int cpufreq_governor_exit(struct cpufreq_policy *policy,
>  		return -EBUSY;
>  
>  	if (!--dbs_data->usage_count) {
> -		sysfs_remove_group(get_governor_parent_kobj(policy),
> -				   get_sysfs_attr(dbs_data));
> +		kobject_put(&dbs_data->kobj);
>  
>  		policy->governor_data = NULL;
>  
> diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h
> index ad44a8546a3a..59b28133dd68 100644
> --- a/drivers/cpufreq/cpufreq_governor.h
> +++ b/drivers/cpufreq/cpufreq_governor.h
> @@ -108,6 +108,31 @@ static ssize_t store_##file_name##_gov_pol				\
>  show_one(_gov, file_name);						\
>  store_one(_gov, file_name)
>  
> +/* Governor's specific attributes */
> +struct dbs_data;
> +struct governor_attr {
> +	struct attribute attr;
> +	ssize_t (*show)(struct dbs_data *dbs_data, char *buf);
> +	ssize_t (*store)(struct dbs_data *dbs_data, const char *buf,
> +			 size_t count);
> +};
> +
> +#define gov_show_one(_gov, file_name)					\
> +static ssize_t show_##file_name						\
> +(struct dbs_data *dbs_data, char *buf)					\
> +{									\
> +	struct _gov##_dbs_tuners *tuners = dbs_data->tuners;		\
> +	return sprintf(buf, "%u\n", tuners->file_name);			\
> +}
> +
> +#define gov_attr_ro(_name)						\
> +static struct governor_attr _name =					\
> +__ATTR(_name, 0444, show_##_name, NULL)
> +
> +#define gov_attr_rw(_name)						\
> +static struct governor_attr _name =					\
> +__ATTR(_name, 0644, show_##_name, store_##_name)
> +
>  /* create helper routines */
>  #define define_get_cpu_dbs_routines(_dbs_info)				\
>  static struct cpu_dbs_info *get_cpu_cdbs(int cpu)			\
> @@ -197,14 +222,12 @@ struct cs_dbs_tuners {
>  };
>  
>  /* Common Governor data across policies */
> -struct dbs_data;
>  struct common_dbs_data {
>  	/* Common across governors */
>  	#define GOV_ONDEMAND		0
>  	#define GOV_CONSERVATIVE	1
>  	int governor;
> -	struct attribute_group *attr_group_gov_sys; /* one governor - system */
> -	struct attribute_group *attr_group_gov_pol; /* one governor - policy */
> +	struct attribute_group *attr_group; /* one governor - system */
>  
>  	/*
>  	 * Common data for platforms that don't set
> @@ -234,6 +257,8 @@ struct dbs_data {
>  	struct common_dbs_data *cdata;
>  	int usage_count;
>  	void *tuners;
> +	struct kobject kobj;
> +	struct kobj_type kobj_type;
>  };
>  
>  /* Governor specific ops, will be passed to dbs_data->gov_ops */
> diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
> index b31f64745232..b7983dd02e24 100644
> --- a/drivers/cpufreq/cpufreq_ondemand.c
> +++ b/drivers/cpufreq/cpufreq_ondemand.c
> @@ -436,51 +436,35 @@ static ssize_t store_powersave_bias(struct dbs_data *dbs_data, const char *buf,
>  	return count;
>  }
>  
> -show_store_one(od, sampling_rate);
> -show_store_one(od, io_is_busy);
> -show_store_one(od, up_threshold);
> -show_store_one(od, sampling_down_factor);
> -show_store_one(od, ignore_nice_load);
> -show_store_one(od, powersave_bias);
> -show_one(od, min_sampling_rate);
> -
> -gov_sys_pol_attr_rw(sampling_rate);
> -gov_sys_pol_attr_rw(io_is_busy);
> -gov_sys_pol_attr_rw(up_threshold);
> -gov_sys_pol_attr_rw(sampling_down_factor);
> -gov_sys_pol_attr_rw(ignore_nice_load);
> -gov_sys_pol_attr_rw(powersave_bias);
> -gov_sys_pol_attr_ro(min_sampling_rate);
> -
> -static struct attribute *dbs_attributes_gov_sys[] = {
> -	&min_sampling_rate_gov_sys.attr,
> -	&sampling_rate_gov_sys.attr,
> -	&up_threshold_gov_sys.attr,
> -	&sampling_down_factor_gov_sys.attr,
> -	&ignore_nice_load_gov_sys.attr,
> -	&powersave_bias_gov_sys.attr,
> -	&io_is_busy_gov_sys.attr,
> +gov_show_one(od, sampling_rate);
> +gov_show_one(od, io_is_busy);
> +gov_show_one(od, up_threshold);
> +gov_show_one(od, sampling_down_factor);
> +gov_show_one(od, ignore_nice_load);
> +gov_show_one(od, powersave_bias);
> +gov_show_one(od, min_sampling_rate);
> +
> +gov_attr_rw(sampling_rate);
> +gov_attr_rw(io_is_busy);
> +gov_attr_rw(up_threshold);
> +gov_attr_rw(sampling_down_factor);
> +gov_attr_rw(ignore_nice_load);
> +gov_attr_rw(powersave_bias);
> +gov_attr_ro(min_sampling_rate);
> +
> +static struct attribute *dbs_attributes[] = {
> +	&min_sampling_rate.attr,
> +	&sampling_rate.attr,
> +	&up_threshold.attr,
> +	&sampling_down_factor.attr,
> +	&ignore_nice_load.attr,
> +	&powersave_bias.attr,
> +	&io_is_busy.attr,
>  	NULL
>  };
>  
> -static struct attribute_group od_attr_group_gov_sys = {
> -	.attrs = dbs_attributes_gov_sys,
> -	.name = "ondemand",
> -};
> -
> -static struct attribute *dbs_attributes_gov_pol[] = {
> -	&min_sampling_rate_gov_pol.attr,
> -	&sampling_rate_gov_pol.attr,
> -	&up_threshold_gov_pol.attr,
> -	&sampling_down_factor_gov_pol.attr,
> -	&ignore_nice_load_gov_pol.attr,
> -	&powersave_bias_gov_pol.attr,
> -	&io_is_busy_gov_pol.attr,
> -	NULL
> -};
> -
> -static struct attribute_group od_attr_group_gov_pol = {
> -	.attrs = dbs_attributes_gov_pol,
> +static struct attribute_group od_attr_group = {
> +	.attrs = dbs_attributes,
>  	.name = "ondemand",
>  };
>  
> @@ -542,8 +526,7 @@ static struct od_ops od_ops = {
>  
>  static struct common_dbs_data od_dbs_cdata = {
>  	.governor = GOV_ONDEMAND,
> -	.attr_group_gov_sys = &od_attr_group_gov_sys,
> -	.attr_group_gov_pol = &od_attr_group_gov_pol,
> +	.attr_group = &od_attr_group,
>  	.get_cpu_cdbs = get_cpu_cdbs,
>  	.get_cpu_dbs_info_s = get_cpu_dbs_info_s,
>  	.gov_dbs_timer = od_dbs_timer,
> -- 
> 2.7.0.79.gdc08a19
> 

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops
  2016-02-02 15:47   ` Juri Lelli
@ 2016-02-02 16:35     ` Rafael J. Wysocki
  2016-02-02 17:01       ` Juri Lelli
  0 siblings, 1 reply; 45+ messages in thread
From: Rafael J. Wysocki @ 2016-02-02 16:35 UTC (permalink / raw)
  To: Juri Lelli
  Cc: Viresh Kumar, Rafael Wysocki, Lists linaro-kernel, linux-pm,
	Saravana Kannan, Peter Zijlstra, Michael Turquette, Steve Muckle,
	Vincent Guittot, Morten Rasmussen, dietmar.eggemann,
	Linux Kernel Mailing List

On Tue, Feb 2, 2016 at 4:47 PM, Juri Lelli <juri.lelli@arm.com> wrote:
> Hi Viresh,
>
> On 02/02/16 16:27, Viresh Kumar wrote:
>> Until now, governors (ondemand/conservative) were using the
>> 'global-attr' or 'freq-attr', depending on the sysfs location where we
>> want to create governor's directory.
>>
>> The problem is that, in case of 'freq-attr', we are forced to use
>> show()/store() present in cpufreq.c, which always take policy->rwsem.
>>
>> And because of that we were facing some ABBA lockups during governor
>> callback event CPUFREQ_GOV_POLICY_EXIT. And so we were dropping the
>> rwsem right before calling governor callback for CPUFREQ_GOV_POLICY_EXIT
>> event.
>>
>> That caused further problems and it never worked perfectly.
>>
>> This patch attempts to fix that by creating separate sysfs-ops for
>> cpufreq governors.
>>
>> Because things got much simplified now, we don't need separate
>> show/store callbacks for governor-for-system and governor-per-policy
>> cases.
>>
>> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
>
> This patch cleans things up a lot, that's good.
>
> One thing I'm still concerned about, though: don't we need some locking
> in place for some of the store operations on governors attributes? Are
> store_{ignore_nice_load, sampling_down_fact, etc} safe without locking?

That would require some investigation I suppose.

> It seems that we can call them from different cpus concurrently.

Yes, we can.

One quick-and-dirty way of dealing with that might be to introduce a
"sysfs lock" into struct dbs_data and hold that around the invocation
of gattr->store() in the sysfs_ops's ->store callback.

>
> Best,
>
> - Juri

BTW, you could have dropped the stuff below this line from your reply
message.  That at least would have prevented tools like Patchwork from
storing useless garbage.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 5/5] cpufreq: Get rid of ->governor_enabled and its lock
  2016-02-02 10:57 ` [PATCH 5/5] cpufreq: Get rid of ->governor_enabled and its lock Viresh Kumar
@ 2016-02-02 16:49   ` Juri Lelli
  2016-02-03  6:05     ` Viresh Kumar
  2016-02-02 21:57   ` Rafael J. Wysocki
  1 sibling, 1 reply; 45+ messages in thread
From: Juri Lelli @ 2016-02-02 16:49 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael Wysocki, linaro-kernel, linux-pm, skannan, peterz,
	mturquette, steve.muckle, vincent.guittot, morten.rasmussen,
	dietmar.eggemann, linux-kernel

Hi Viresh,

On 02/02/16 16:27, Viresh Kumar wrote:
> Invalid state-transitions is verified by governor core now and there is
> no need to replicate that in cpufreq core. Also we don't drop
> policy->rwsem anymore, which makes rest of the races go away.

There are still paths where we call __cpufreq_governor() without holding
policy->rwsem, but those should be fixed with my cleanups (that I intend
to refresh and post soon). So, I'm not sure we can safely remove this
yet.

> 
> Simplify code a bit now.
> 
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> ---
>  drivers/cpufreq/cpufreq.c | 24 ------------------------
>  include/linux/cpufreq.h   |  1 -
>  2 files changed, 25 deletions(-)
> 
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 5f7e24567e0e..052ad1b9372c 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -102,7 +102,6 @@ static LIST_HEAD(cpufreq_governor_list);
>  static struct cpufreq_driver *cpufreq_driver;
>  static DEFINE_PER_CPU(struct cpufreq_policy *, cpufreq_cpu_data);
>  static DEFINE_RWLOCK(cpufreq_driver_lock);
> -DEFINE_MUTEX(cpufreq_governor_lock);
>  
>  /* Flag to suspend/resume CPUFreq governors */
>  static bool cpufreq_suspended;
> @@ -1963,21 +1962,6 @@ static int __cpufreq_governor(struct cpufreq_policy *policy,
>  
>  	pr_debug("%s: for CPU %u, event %u\n", __func__, policy->cpu, event);
>  
> -	mutex_lock(&cpufreq_governor_lock);
> -	if ((policy->governor_enabled && event == CPUFREQ_GOV_START)
> -	    || (!policy->governor_enabled
> -	    && (event == CPUFREQ_GOV_LIMITS || event == CPUFREQ_GOV_STOP))) {
> -		mutex_unlock(&cpufreq_governor_lock);
> -		return -EBUSY;
> -	}
> -
> -	if (event == CPUFREQ_GOV_STOP)
> -		policy->governor_enabled = false;
> -	else if (event == CPUFREQ_GOV_START)
> -		policy->governor_enabled = true;
> -
> -	mutex_unlock(&cpufreq_governor_lock);
> -
>  	ret = policy->governor->governor(policy, event);

So, __cpufreq_governor() becomes effectively a wrapper around
->governor() calls and governors are left responsible for implementing
the state machine with appropriate checks.

I'm wondering if this approach is completely sane, but what we end up
with your changes should work (and we kill a lock! :)).

Maybe we add a comment somewhere stating exactly how things are meant to
work?

Thanks,

- Juri

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops
  2016-02-02 16:35     ` Rafael J. Wysocki
@ 2016-02-02 17:01       ` Juri Lelli
  2016-02-02 19:40         ` Rafael J. Wysocki
  2016-02-03  6:33         ` Viresh Kumar
  0 siblings, 2 replies; 45+ messages in thread
From: Juri Lelli @ 2016-02-02 17:01 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Viresh Kumar, Rafael Wysocki, Lists linaro-kernel, linux-pm,
	Saravana Kannan, Peter Zijlstra, Michael Turquette, Steve Muckle,
	Vincent Guittot, Morten Rasmussen, dietmar.eggemann,
	Linux Kernel Mailing List

Hi Rafael,

On 02/02/16 17:35, Rafael J. Wysocki wrote:
> On Tue, Feb 2, 2016 at 4:47 PM, Juri Lelli <juri.lelli@arm.com> wrote:
> > Hi Viresh,
> >
> > On 02/02/16 16:27, Viresh Kumar wrote:
> >> Until now, governors (ondemand/conservative) were using the
> >> 'global-attr' or 'freq-attr', depending on the sysfs location where we
> >> want to create governor's directory.
> >>
> >> The problem is that, in case of 'freq-attr', we are forced to use
> >> show()/store() present in cpufreq.c, which always take policy->rwsem.
> >>
> >> And because of that we were facing some ABBA lockups during governor
> >> callback event CPUFREQ_GOV_POLICY_EXIT. And so we were dropping the
> >> rwsem right before calling governor callback for CPUFREQ_GOV_POLICY_EXIT
> >> event.
> >>
> >> That caused further problems and it never worked perfectly.
> >>
> >> This patch attempts to fix that by creating separate sysfs-ops for
> >> cpufreq governors.
> >>
> >> Because things got much simplified now, we don't need separate
> >> show/store callbacks for governor-for-system and governor-per-policy
> >> cases.
> >>
> >> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> >
> > This patch cleans things up a lot, that's good.
> >
> > One thing I'm still concerned about, though: don't we need some locking
> > in place for some of the store operations on governors attributes? Are
> > store_{ignore_nice_load, sampling_down_fact, etc} safe without locking?
> 
> That would require some investigation I suppose.
> 
> > It seems that we can call them from different cpus concurrently.
> 
> Yes, we can.
> 
> One quick-and-dirty way of dealing with that might be to introduce a
> "sysfs lock" into struct dbs_data and hold that around the invocation
> of gattr->store() in the sysfs_ops's ->store callback.
> 

There is value in trying to solve this issue by using some of the
existing locks, IMHO.

Can't we actually try to use the policy->rwsem (or one of the core
locks) + wait_for_completion approach as we do in cpufreq core?

> BTW, you could have dropped the stuff below this line from your reply
> message.  That at least would have prevented tools like Patchwork from
> storing useless garbage.
> 

Right. Sorry for the garbage; I'll check twice that I trim my replies in
the future.

Best,

- Juri

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops
  2016-02-02 17:01       ` Juri Lelli
@ 2016-02-02 19:40         ` Rafael J. Wysocki
  2016-02-02 22:21           ` Saravana Kannan
  2016-02-03  6:33         ` Viresh Kumar
  1 sibling, 1 reply; 45+ messages in thread
From: Rafael J. Wysocki @ 2016-02-02 19:40 UTC (permalink / raw)
  To: Juri Lelli
  Cc: Rafael J. Wysocki, Viresh Kumar, Rafael Wysocki,
	Lists linaro-kernel, linux-pm, Saravana Kannan, Peter Zijlstra,
	Michael Turquette, Steve Muckle, Vincent Guittot,
	Morten Rasmussen, dietmar.eggemann, Linux Kernel Mailing List

On Tue, Feb 2, 2016 at 6:01 PM, Juri Lelli <juri.lelli@arm.com> wrote:
> Hi Rafael,
>
> On 02/02/16 17:35, Rafael J. Wysocki wrote:
>> On Tue, Feb 2, 2016 at 4:47 PM, Juri Lelli <juri.lelli@arm.com> wrote:
>> > Hi Viresh,
>> >
>> > On 02/02/16 16:27, Viresh Kumar wrote:
>> >> Until now, governors (ondemand/conservative) were using the
>> >> 'global-attr' or 'freq-attr', depending on the sysfs location where we
>> >> want to create governor's directory.
>> >>
>> >> The problem is that, in case of 'freq-attr', we are forced to use
>> >> show()/store() present in cpufreq.c, which always take policy->rwsem.
>> >>
>> >> And because of that we were facing some ABBA lockups during governor
>> >> callback event CPUFREQ_GOV_POLICY_EXIT. And so we were dropping the
>> >> rwsem right before calling governor callback for CPUFREQ_GOV_POLICY_EXIT
>> >> event.
>> >>
>> >> That caused further problems and it never worked perfectly.
>> >>
>> >> This patch attempts to fix that by creating separate sysfs-ops for
>> >> cpufreq governors.
>> >>
>> >> Because things got much simplified now, we don't need separate
>> >> show/store callbacks for governor-for-system and governor-per-policy
>> >> cases.
>> >>
>> >> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
>> >
>> > This patch cleans things up a lot, that's good.
>> >
>> > One thing I'm still concerned about, though: don't we need some locking
>> > in place for some of the store operations on governors attributes? Are
>> > store_{ignore_nice_load, sampling_down_fact, etc} safe without locking?
>>
>> That would require some investigation I suppose.
>>
>> > It seems that we can call them from different cpus concurrently.
>>
>> Yes, we can.
>>
>> One quick-and-dirty way of dealing with that might be to introduce a
>> "sysfs lock" into struct dbs_data and hold that around the invocation
>> of gattr->store() in the sysfs_ops's ->store callback.
>>
>
> There is value in trying to solve this issue by using some of the
> existing locks, IMHO.

Some value - maybe.  I'm not sure how much of it, though.

Finer-grained locking is generally easier to follow, because the locks
tend to be used for specific purposes only.

> Can't we actually try to use the policy->rwsem (or one of the core
> locks) + wait_for_completion approach as we do in cpufreq core?

No.  Too many things depend on that lock already and some of them work
by accident rather than by design.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 0/5] cpufreq: governors: Solve the ABBA lockups
  2016-02-02 10:57 [PATCH 0/5] cpufreq: governors: Solve the ABBA lockups Viresh Kumar
                   ` (5 preceding siblings ...)
  2016-02-02 11:25 ` [PATCH 0/5] cpufreq: governors: Solve the ABBA lockups Juri Lelli
@ 2016-02-02 20:04 ` Rafael J. Wysocki
  2016-02-03  2:22   ` Viresh Kumar
  6 siblings, 1 reply; 45+ messages in thread
From: Rafael J. Wysocki @ 2016-02-02 20:04 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael Wysocki, Juri Lelli, Lists linaro-kernel, linux-pm,
	Saravana Kannan, Peter Zijlstra, Michael Turquette, Steve Muckle,
	Vincent Guittot, Morten Rasmussen, dietmar.eggemann,
	Linux Kernel Mailing List

On Tue, Feb 2, 2016 at 11:57 AM, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> Hi Rafael,
>
> Sorry for doing this, I know you were also looking to fix this in a
> possibly different way. But I thought, it would be better if we fix
> that. We can scrap this version and take yours if that looks better.

That's not nice to be honest.  At the very least you could have asked
me about the status of my work before sending this.

Fortunately, though, when I started to look deeper into fixing this
problem I thought I didn't like the overall design of things in the
governor land, so I started to change that and my modifications turn
out to be sort of complementary with respect to this patchset.  Of
course, they do conflict, but I can redo my patches on top of these if
that's necessary.  That said I'm going to post them in their current
form anyway, at least to show the direction I want to take going
forward.

> The root cause of all the issues we were facing, was that we were taking
> policy->rwsem while accessing governor sysfs attributes. And that
> happened because we were sharing the show/store calls present in
> cpufreq.c.
>
> I thought, perhaps the best way to fix it is to give separate sysfs-ops
> to governors. And that's what I did.

Yes, that's what I was talking about in the other thread.

My overall impression here is that the code changes make sense.  Some
details need to be improved (like the concurrent ->store for governor
tunables pointed out by Juri).  The patch changelogs suck, though.

If your hope was that this might go into 4.5, there is a small chance
of that happening, but only if it can be made ready this week.
Otherwise, I'd prefer it to be redone on top of my changes.

Let me comment on the individual patches.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/5] cpufreq: governor: Kill declare_show_sampling_rate_min()
  2016-02-02 10:57 ` [PATCH 1/5] cpufreq: governor: Kill declare_show_sampling_rate_min() Viresh Kumar
@ 2016-02-02 20:23   ` Rafael J. Wysocki
  2016-02-03  2:29     ` Viresh Kumar
  0 siblings, 1 reply; 45+ messages in thread
From: Rafael J. Wysocki @ 2016-02-02 20:23 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael Wysocki, Juri Lelli, Lists linaro-kernel, linux-pm,
	Saravana Kannan, Peter Zijlstra, Michael Turquette, Steve Muckle,
	Vincent Guittot, Morten Rasmussen, dietmar.eggemann,
	Linux Kernel Mailing List

On Tue, Feb 2, 2016 at 11:57 AM, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> This extra macro is required because the variable min_sampling_rate is
> made part of 'struct dbs_data' instead of governor specific tunables.
>
> For further optimization, its better that we kill
> declare_show_sampling_rate_min() by moving min_sampling_rate to governor
> specific tunables.
>
> Lets do it.
>
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

To me, this is not about the macro, but about moving min_sampling_rate
to governor tunables, so my subject would be something like "cpufreq:
governor: Treat min_sampling_rate as a governor-specific tunable".

My changelog, then, would be something like the following:

"The min_sampling_rate governor tunable is a field in struct dbs_data,
so it has to be handled in a special way separate from the rest of
governor tunables.  In particular, that requires a special macro to be
present for creating its show/store sysfs attribute callbacks.

However, there is no real need for the data structures and code in
question to be arranged this way and if min_sampling_rate is moved to
data structures holding the other governor tunables, the sysfs
attribute creation macros that work with those tunables will also work
with min_sampling_rate and the special macro for it won't be necessary
any more.  That will make it easier to modify the governor code going
forward, so do it."

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops
  2016-02-02 10:57 ` [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops Viresh Kumar
  2016-02-02 15:47   ` Juri Lelli
@ 2016-02-02 21:23   ` Rafael J. Wysocki
  2016-02-03  6:58     ` Viresh Kumar
  1 sibling, 1 reply; 45+ messages in thread
From: Rafael J. Wysocki @ 2016-02-02 21:23 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael Wysocki, Juri Lelli, Lists linaro-kernel, linux-pm,
	Saravana Kannan, Peter Zijlstra, Michael Turquette, Steve Muckle,
	Vincent Guittot, Morten Rasmussen, dietmar.eggemann,
	Linux Kernel Mailing List

On Tue, Feb 2, 2016 at 11:57 AM, Viresh Kumar <viresh.kumar@linaro.org> wrote:

First, the subject might be better.  What about something like
"cpufreq: governor: New sysfs show/store callbacks for governor
tunables", for example?

> Until now, governors (ondemand/conservative) were using the
> 'global-attr' or 'freq-attr', depending on the sysfs location where we
> want to create governor's directory.

"The ondemand and conservative governors use the global-attr or
freq-attr structures to represent sysfs attributes corresponding to
their tunables (which of them is actually used depends on whether or
not different policy objects can use different governors at the same
time and, consequently, on where those attributes are located in
sysfs).

Unfortunately, in the freq-attr case, the standard cpufreq show/store
sysfs attribute callbacks are applied to the governor tunable
attributes and they always acquire the policy->rwsem lock before
carrying out the operation.  That may lead to an ABBA deadlock if
governor tunable attributes are removed under policy->rwsem while one
of them is being accessed concurrently (if sysfs attributes removal
wins the race, it will wait for the access to complete with
policy->rwsem held while the attribute callback will block on
policy->rwsem indefinitely).

We attempted to address this issue by dropping policy->rwsem around
governor tunable attributes removal (that is, around invocations of
the ->governor callback with the event arg equal to
CPUFREQ_GOV_POLICY_EXIT) in cpufreq_set_policy(), but that opened up
race conditions that had not been possible with policy->rwsem held all
the time.  Therefore policy->rwsem cannot be dropped in
cpufreq_set_policy() at any point, but the deadlock situation
described above must be avoided too.

To that end, use the observation that in principle governor tunables
may be represented by the same data type regardless of whether the
governor is system-wide or per-policy and introduce a new structure,
struct governor_attr, for representing them and new corresponding
macros for creating show/store sysfs callbacks for them.  Also make
their parent kobject use a new kobject type whose default show/store
callbacks are not related to the standard core cpufreq ones in any way
(and they don't acquire policy->rwsem in particular)."

IOW, (1) describe the problem you're addressing so that people
unfamiliar with the code in question can understand it, (2) describe
what is done to address the problem (what's the idea and what changes
are made to implement it).

[cut]

> --- a/drivers/cpufreq/cpufreq_governor.c
> +++ b/drivers/cpufreq/cpufreq_governor.c
> @@ -22,14 +22,37 @@
>
>  #include "cpufreq_governor.h"
>
> -static struct attribute_group *get_sysfs_attr(struct dbs_data *dbs_data)
> +#define to_dbs_data(k) container_of(k, struct dbs_data, kobj)
> +#define to_attr(a) container_of(a, struct governor_attr, attr)

Please change the above to static inline routines.

> +
> +static ssize_t show(struct kobject *kobj, struct attribute *attr, char *buf)

A better name please.  Something that will correspond to the purpose.

>  {
> -       if (have_governor_per_policy())
> -               return dbs_data->cdata->attr_group_gov_pol;
> -       else
> -               return dbs_data->cdata->attr_group_gov_sys;
> +       struct dbs_data *dbs_data = to_dbs_data(kobj);
> +       struct governor_attr *gattr = to_attr(attr);
> +
> +       if (gattr->show)
> +               return gattr->show(dbs_data, buf);
> +
> +       return -EIO;
> +}
> +
> +static ssize_t store(struct kobject *kobj, struct attribute *attr,
> +                    const char *buf, size_t count)

Ditto.

> +{
> +       struct dbs_data *dbs_data = to_dbs_data(kobj);
> +       struct governor_attr *gattr = to_attr(attr);
> +
> +       if (gattr->store)
> +               return gattr->store(dbs_data, buf, count);

Say two instances of this run in parallel with each other, either for
the same attribute or for different attributes under the same
dbs_data.  What's the guarantee that they won't make conflicting
changes?

> +
> +       return -EIO;
>  }
>
> +static const struct sysfs_ops sysfs_ops = {
> +       .show   = show,
> +       .store  = store,
> +};

That is completely enigmatic, so please at least add a comment describing it.

> +
>  void dbs_check_cpu(struct dbs_data *dbs_data, int cpu)
>  {
>         struct cpu_dbs_info *cdbs = dbs_data->cdata->get_cpu_cdbs(cpu);
> @@ -354,6 +377,7 @@ static int cpufreq_governor_init(struct cpufreq_policy *policy,
>                                  struct dbs_data *dbs_data,
>                                  struct common_dbs_data *cdata)
>  {
> +       struct attribute_group *attr_group;
>         int ret;
>
>         /* State should be equivalent to EXIT */
> @@ -395,10 +419,17 @@ static int cpufreq_governor_init(struct cpufreq_policy *policy,
>
>         policy->governor_data = dbs_data;
>
> -       ret = sysfs_create_group(get_governor_parent_kobj(policy),
> -                                get_sysfs_attr(dbs_data));
> -       if (ret)
> +       attr_group = dbs_data->cdata->attr_group;
> +       dbs_data->kobj_type.sysfs_ops = &sysfs_ops;
> +       dbs_data->kobj_type.default_attrs = attr_group->attrs;

Why can't the kobject type be defined in struct common_dbs_data?
Surely, it will be the same for all dbs_data objects corresponding to
the same governor, won't it?

> +
> +       ret = kobject_init_and_add(&dbs_data->kobj, &dbs_data->kobj_type,
> +                                  get_governor_parent_kobj(policy),
> +                                  attr_group->name);
> +       if (ret) {
> +               pr_err("%s: failed to init dbs_data kobj: %d\n", __func__, ret);

pr_debug() would be better here.

>                 goto reset_gdbs_data;
> +       }
>
>         return 0;
>
> @@ -426,8 +457,7 @@ static int cpufreq_governor_exit(struct cpufreq_policy *policy,
>                 return -EBUSY;
>
>         if (!--dbs_data->usage_count) {
> -               sysfs_remove_group(get_governor_parent_kobj(policy),
> -                                  get_sysfs_attr(dbs_data));
> +               kobject_put(&dbs_data->kobj);

Don't we need a ->release callback for this kobject?

>
>                 policy->governor_data = NULL;
>
> diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h
> index ad44a8546a3a..59b28133dd68 100644
> --- a/drivers/cpufreq/cpufreq_governor.h
> +++ b/drivers/cpufreq/cpufreq_governor.h
> @@ -108,6 +108,31 @@ static ssize_t store_##file_name##_gov_pol                         \
>  show_one(_gov, file_name);                                             \
>  store_one(_gov, file_name)
>
> +/* Governor's specific attributes */
> +struct dbs_data;
> +struct governor_attr {
> +       struct attribute attr;
> +       ssize_t (*show)(struct dbs_data *dbs_data, char *buf);
> +       ssize_t (*store)(struct dbs_data *dbs_data, const char *buf,
> +                        size_t count);
> +};
> +
> +#define gov_show_one(_gov, file_name)                                  \
> +static ssize_t show_##file_name                                                \
> +(struct dbs_data *dbs_data, char *buf)                                 \
> +{                                                                      \
> +       struct _gov##_dbs_tuners *tuners = dbs_data->tuners;            \
> +       return sprintf(buf, "%u\n", tuners->file_name);                 \
> +}
> +
> +#define gov_attr_ro(_name)                                             \
> +static struct governor_attr _name =                                    \
> +__ATTR(_name, 0444, show_##_name, NULL)
> +
> +#define gov_attr_rw(_name)                                             \
> +static struct governor_attr _name =                                    \
> +__ATTR(_name, 0644, show_##_name, store_##_name)
> +
>  /* create helper routines */
>  #define define_get_cpu_dbs_routines(_dbs_info)                         \
>  static struct cpu_dbs_info *get_cpu_cdbs(int cpu)                      \
> @@ -197,14 +222,12 @@ struct cs_dbs_tuners {
>  };
>
>  /* Common Governor data across policies */
> -struct dbs_data;
>  struct common_dbs_data {
>         /* Common across governors */
>         #define GOV_ONDEMAND            0
>         #define GOV_CONSERVATIVE        1
>         int governor;
> -       struct attribute_group *attr_group_gov_sys; /* one governor - system */
> -       struct attribute_group *attr_group_gov_pol; /* one governor - policy */
> +       struct attribute_group *attr_group; /* one governor - system */
>
>         /*
>          * Common data for platforms that don't set
> @@ -234,6 +257,8 @@ struct dbs_data {
>         struct common_dbs_data *cdata;
>         int usage_count;
>         void *tuners;
> +       struct kobject kobj;
> +       struct kobj_type kobj_type;

This is questionable.  The kobject type doesn't have to be dynamic IMO.

>  };

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 3/5] cpufreq: governor: Remove unused sysfs attribute macros
  2016-02-02 10:57 ` [PATCH 3/5] cpufreq: governor: Remove unused sysfs attribute macros Viresh Kumar
@ 2016-02-02 21:34   ` Rafael J. Wysocki
  0 siblings, 0 replies; 45+ messages in thread
From: Rafael J. Wysocki @ 2016-02-02 21:34 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael Wysocki, Juri Lelli, Lists linaro-kernel, linux-pm,
	Saravana Kannan, Peter Zijlstra, Michael Turquette, Steve Muckle,
	Vincent Guittot, Morten Rasmussen, dietmar.eggemann,
	Linux Kernel Mailing List

On Tue, Feb 2, 2016 at 11:57 AM, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> The older macros aren't used anymore, remove them.

It doesn't follow from either the subject or the changelog which
macros *in* *particular* are in question.

What about changing the subject to something like "cpufreq: governor:
Drop unused macros for creating governor tunable attributes" and the
changelog to something like:

"The previous commit introduced a new set of macros for creating sysfs
attributes that represent governor tunables and the old macros used
for this purpose are not needed any more, so drop them."

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 4/5] cpufreq: Don't drop rwsem before calling CPUFREQ_GOV_POLICY_EXIT
  2016-02-02 10:57 ` [PATCH 4/5] cpufreq: Don't drop rwsem before calling CPUFREQ_GOV_POLICY_EXIT Viresh Kumar
@ 2016-02-02 21:53   ` Rafael J. Wysocki
  2016-02-03  5:51     ` Viresh Kumar
  0 siblings, 1 reply; 45+ messages in thread
From: Rafael J. Wysocki @ 2016-02-02 21:53 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael Wysocki, Juri Lelli, Lists linaro-kernel, linux-pm,
	Saravana Kannan, Peter Zijlstra, Michael Turquette, Steve Muckle,
	Vincent Guittot, Morten Rasmussen, dietmar.eggemann,
	Linux Kernel Mailing List

On Tue, Feb 2, 2016 at 11:57 AM, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> Commit 955ef4833574 ("cpufreq: Drop rwsem lock around
> CPUFREQ_GOV_POLICY_EXIT") dropped these because of some ABBA lockup
> issues.
>
> The previous commit has fixed them all and we don't need to drop these
> locks anymore.
>
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

First of all, this is effectively reverting commit 955ef4833574, so
the subject should be "Revert commit 955ef4833574 (cpufreq: Drop rwsem
lock around CPUFREQ_GOV_POLICY_EXIT)".

There should be a Fixes: tag pointing to commit 955ef4833574 and a
Reported-by: for Juri.

If there is a link to a bug report related to this, it should be
pointed to by a Link: tag.

The changelog should say why the original commit was there and why the
way it attempted to solve the problem was incorrect.  It also should
say that the original problem was addressed by a previous commit, so
this one can be reverted without consequences.

But I'm not going to write that changelog.  I actually am not going to
write any changelogs for you any more, because I'm seriously tired of
doing that.  Moreover, if I see a patch from you with a changelog
that's not acceptable to me, it will immediately go to the "not
applicable" trash bin no matter what the changes below look like.  You
*have* *to* write useful changelogs.  This isn't optional or best
effort.  This is mandatory and important.

Now, I'm not really sure if the ordering of this patchset is right.
Maybe we should just revert upfront with the "we'll address the
original problem in the following commits" statement in the changelog
and fix it in a different way?  It looks like patches [1-3/5] fix a
problem that isn't there even, but would appear after the [4/5] if
they were not applied previously, which doesn't sound really
straightforward to me.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 5/5] cpufreq: Get rid of ->governor_enabled and its lock
  2016-02-02 10:57 ` [PATCH 5/5] cpufreq: Get rid of ->governor_enabled and its lock Viresh Kumar
  2016-02-02 16:49   ` Juri Lelli
@ 2016-02-02 21:57   ` Rafael J. Wysocki
  1 sibling, 0 replies; 45+ messages in thread
From: Rafael J. Wysocki @ 2016-02-02 21:57 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael Wysocki, Juri Lelli, Lists linaro-kernel, linux-pm,
	Saravana Kannan, Peter Zijlstra, Michael Turquette, Steve Muckle,
	Vincent Guittot, Morten Rasmussen, dietmar.eggemann,
	Linux Kernel Mailing List

On Tue, Feb 2, 2016 at 11:57 AM, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> Invalid state-transitions is verified by governor core now

What about the governors that don't use cpufreq_governor_dbs()?

> and there is no need to replicate that in cpufreq core. Also we don't drop
> policy->rwsem anymore, which makes rest of the races go away.
>
> Simplify code a bit now.
>
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

I won't make this change just yet.  At least not in 4.5 (provided that
the other patches go into it).

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops
  2016-02-02 19:40         ` Rafael J. Wysocki
@ 2016-02-02 22:21           ` Saravana Kannan
  2016-02-02 23:42             ` Rafael J. Wysocki
  2016-02-03  6:51             ` Viresh Kumar
  0 siblings, 2 replies; 45+ messages in thread
From: Saravana Kannan @ 2016-02-02 22:21 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Juri Lelli, Viresh Kumar, Rafael Wysocki, Lists linaro-kernel,
	linux-pm, Peter Zijlstra, Michael Turquette, Steve Muckle,
	Vincent Guittot, Morten Rasmussen, dietmar.eggemann,
	Linux Kernel Mailing List

On 02/02/2016 11:40 AM, Rafael J. Wysocki wrote:
> On Tue, Feb 2, 2016 at 6:01 PM, Juri Lelli <juri.lelli@arm.com> wrote:
>> Hi Rafael,
>>
>> On 02/02/16 17:35, Rafael J. Wysocki wrote:
>>> On Tue, Feb 2, 2016 at 4:47 PM, Juri Lelli <juri.lelli@arm.com> wrote:
>>>> Hi Viresh,
>>>>
>>>> On 02/02/16 16:27, Viresh Kumar wrote:
>>>>> Until now, governors (ondemand/conservative) were using the
>>>>> 'global-attr' or 'freq-attr', depending on the sysfs location where we
>>>>> want to create governor's directory.
>>>>>
>>>>> The problem is that, in case of 'freq-attr', we are forced to use
>>>>> show()/store() present in cpufreq.c, which always take policy->rwsem.
>>>>>
>>>>> And because of that we were facing some ABBA lockups during governor
>>>>> callback event CPUFREQ_GOV_POLICY_EXIT. And so we were dropping the
>>>>> rwsem right before calling governor callback for CPUFREQ_GOV_POLICY_EXIT
>>>>> event.
>>>>>
>>>>> That caused further problems and it never worked perfectly.
>>>>>
>>>>> This patch attempts to fix that by creating separate sysfs-ops for
>>>>> cpufreq governors.
>>>>>
>>>>> Because things got much simplified now, we don't need separate
>>>>> show/store callbacks for governor-for-system and governor-per-policy
>>>>> cases.
>>>>>
>>>>> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
>>>>
>>>> This patch cleans things up a lot, that's good.
>>>>
>>>> One thing I'm still concerned about, though: don't we need some locking
>>>> in place for some of the store operations on governors attributes? Are
>>>> store_{ignore_nice_load, sampling_down_fact, etc} safe without locking?
>>>
>>> That would require some investigation I suppose.
>>>
>>>> It seems that we can call them from different cpus concurrently.
>>>
>>> Yes, we can.
>>>
>>> One quick-and-dirty way of dealing with that might be to introduce a
>>> "sysfs lock" into struct dbs_data and hold that around the invocation
>>> of gattr->store() in the sysfs_ops's ->store callback.
>>>
>>
>> There is value in trying to solve this issue by using some of the
>> existing locks, IMHO.
>
> Some value - maybe.  I'm not sure how much of it, though.
>
> Finer-grained locking is generally easier to follow, because the locks
> tend to be used for specific purposes only.
>
>> Can't we actually try to use the policy->rwsem (or one of the core
>> locks) + wait_for_completion approach as we do in cpufreq core?
>
> No.  Too many things depend on that lock already and some of them work
> by accident rather than by design.

Also, wait_for_completion() and complete() is just another way to 
implement a lock. So, it won't necessarily solve any deadlock issues.

I also don't like this patch because it forces governors to either 
implement their own macros and management of their attributes or force 
them to use the governor structs that come with cpufreq_governor.h. 
cpufreq_governor.h IMHO is very ondemand and conservative governor 
specific and is very irrelevant for sched-dvfs or any other governors 
(hint hint).

The only time this ABBA locking is an issue is when governor are 
changing and trying to add/remove attributes. That can easily be checked 
in store_governor and dealt with without holding the policy rwsem if the 
governors can provide their per sys and per policy attribute arrays as 
part of registering themselves.

I'm sorry that I just keep talking about the idea and not sending out 
the patches.

-Saravana

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops
  2016-02-02 22:21           ` Saravana Kannan
@ 2016-02-02 23:42             ` Rafael J. Wysocki
  2016-02-03  1:07               ` Rafael J. Wysocki
  2016-02-03  6:51             ` Viresh Kumar
  1 sibling, 1 reply; 45+ messages in thread
From: Rafael J. Wysocki @ 2016-02-02 23:42 UTC (permalink / raw)
  To: Saravana Kannan, Viresh Kumar
  Cc: Rafael J. Wysocki, Juri Lelli, Rafael Wysocki,
	Lists linaro-kernel, linux-pm, Peter Zijlstra, Michael Turquette,
	Steve Muckle, Vincent Guittot, Morten Rasmussen,
	dietmar.eggemann, Linux Kernel Mailing List

On Tue, Feb 2, 2016 at 11:21 PM, Saravana Kannan <skannan@codeaurora.org> wrote:
> On 02/02/2016 11:40 AM, Rafael J. Wysocki wrote:
>>
>> On Tue, Feb 2, 2016 at 6:01 PM, Juri Lelli <juri.lelli@arm.com> wrote:
>>>
>>> Hi Rafael,
>>>
>>> On 02/02/16 17:35, Rafael J. Wysocki wrote:
>>>>
>>>> On Tue, Feb 2, 2016 at 4:47 PM, Juri Lelli <juri.lelli@arm.com> wrote:
>>>>>
>>>>> Hi Viresh,
>>>>>
>>>>> On 02/02/16 16:27, Viresh Kumar wrote:
>>>>>>
>>>>>> Until now, governors (ondemand/conservative) were using the
>>>>>> 'global-attr' or 'freq-attr', depending on the sysfs location where we
>>>>>> want to create governor's directory.
>>>>>>
>>>>>> The problem is that, in case of 'freq-attr', we are forced to use
>>>>>> show()/store() present in cpufreq.c, which always take policy->rwsem.
>>>>>>
>>>>>> And because of that we were facing some ABBA lockups during governor
>>>>>> callback event CPUFREQ_GOV_POLICY_EXIT. And so we were dropping the
>>>>>> rwsem right before calling governor callback for
>>>>>> CPUFREQ_GOV_POLICY_EXIT
>>>>>> event.
>>>>>>
>>>>>> That caused further problems and it never worked perfectly.
>>>>>>
>>>>>> This patch attempts to fix that by creating separate sysfs-ops for
>>>>>> cpufreq governors.
>>>>>>
>>>>>> Because things got much simplified now, we don't need separate
>>>>>> show/store callbacks for governor-for-system and governor-per-policy
>>>>>> cases.
>>>>>>
>>>>>> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
>>>>>
>>>>>
>>>>> This patch cleans things up a lot, that's good.
>>>>>
>>>>> One thing I'm still concerned about, though: don't we need some locking
>>>>> in place for some of the store operations on governors attributes? Are
>>>>> store_{ignore_nice_load, sampling_down_fact, etc} safe without locking?
>>>>
>>>>
>>>> That would require some investigation I suppose.
>>>>
>>>>> It seems that we can call them from different cpus concurrently.
>>>>
>>>>
>>>> Yes, we can.
>>>>
>>>> One quick-and-dirty way of dealing with that might be to introduce a
>>>> "sysfs lock" into struct dbs_data and hold that around the invocation
>>>> of gattr->store() in the sysfs_ops's ->store callback.
>>>>
>>>
>>> There is value in trying to solve this issue by using some of the
>>> existing locks, IMHO.
>>
>>
>> Some value - maybe.  I'm not sure how much of it, though.
>>
>> Finer-grained locking is generally easier to follow, because the locks
>> tend to be used for specific purposes only.
>>
>>> Can't we actually try to use the policy->rwsem (or one of the core
>>> locks) + wait_for_completion approach as we do in cpufreq core?
>>
>>
>> No.  Too many things depend on that lock already and some of them work
>> by accident rather than by design.
>
>
> Also, wait_for_completion() and complete() is just another way to implement
> a lock. So, it won't necessarily solve any deadlock issues.
>
> I also don't like this patch because it forces governors to either implement
> their own macros and management of their attributes or force them to use the
> governor structs that come with cpufreq_governor.h. cpufreq_governor.h IMHO
> is very ondemand and conservative governor specific and is very irrelevant
> for sched-dvfs or any other governors (hint hint).
>
> The only time this ABBA locking is an issue is when governor are changing
> and trying to add/remove attributes. That can easily be checked in
> store_governor and dealt with without holding the policy rwsem if the
> governors can provide their per sys and per policy attribute arrays as part
> of registering themselves.
>
> I'm sorry that I just keep talking about the idea and not sending out the
> patches.

I think you have a point, though.

The deadlock really is specific to the governors using the code in
cpufreq_governor.c.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops
  2016-02-02 23:42             ` Rafael J. Wysocki
@ 2016-02-03  1:07               ` Rafael J. Wysocki
  2016-02-03  1:32                 ` Saravana Kannan
  0 siblings, 1 reply; 45+ messages in thread
From: Rafael J. Wysocki @ 2016-02-03  1:07 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Saravana Kannan, Viresh Kumar, Juri Lelli, Rafael Wysocki,
	Lists linaro-kernel, linux-pm, Peter Zijlstra, Michael Turquette,
	Steve Muckle, Vincent Guittot, Morten Rasmussen,
	dietmar.eggemann, Linux Kernel Mailing List

On Wed, Feb 3, 2016 at 12:42 AM, Rafael J. Wysocki <rafael@kernel.org> wrote:
> On Tue, Feb 2, 2016 at 11:21 PM, Saravana Kannan <skannan@codeaurora.org> wrote:
>> On 02/02/2016 11:40 AM, Rafael J. Wysocki wrote:
>>>
>>> On Tue, Feb 2, 2016 at 6:01 PM, Juri Lelli <juri.lelli@arm.com> wrote:
>>>>

[cut]

>>
>> I also don't like this patch because it forces governors to either implement
>> their own macros and management of their attributes or force them to use the
>> governor structs that come with cpufreq_governor.h. cpufreq_governor.h IMHO
>> is very ondemand and conservative governor specific and is very irrelevant
>> for sched-dvfs or any other governors (hint hint).
>>
>> The only time this ABBA locking is an issue is when governor are changing
>> and trying to add/remove attributes. That can easily be checked in
>> store_governor and dealt with without holding the policy rwsem if the
>> governors can provide their per sys and per policy attribute arrays as part
>> of registering themselves.
>>
>> I'm sorry that I just keep talking about the idea and not sending out the
>> patches.
>
> I think you have a point, though.
>
> The deadlock really is specific to the governors using the code in
> cpufreq_governor.c.

That said no other governors in the tree use any sysfs attributes for
tunables AFAICS and the out-of-the tree ones are out of interest here.

Also the deadlock happens if one of the tunable attributes is accessed
while we're trying to remove it which very well may happen on read
access too.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops
  2016-02-03  1:07               ` Rafael J. Wysocki
@ 2016-02-03  1:32                 ` Saravana Kannan
  2016-02-03  1:52                   ` Rafael J. Wysocki
  2016-02-03  6:54                   ` Viresh Kumar
  0 siblings, 2 replies; 45+ messages in thread
From: Saravana Kannan @ 2016-02-03  1:32 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Viresh Kumar, Juri Lelli, Rafael Wysocki, Lists linaro-kernel,
	linux-pm, Peter Zijlstra, Michael Turquette, Steve Muckle,
	Vincent Guittot, Morten Rasmussen, dietmar.eggemann,
	Linux Kernel Mailing List

On 02/02/2016 05:07 PM, Rafael J. Wysocki wrote:
> On Wed, Feb 3, 2016 at 12:42 AM, Rafael J. Wysocki <rafael@kernel.org> wrote:
>> On Tue, Feb 2, 2016 at 11:21 PM, Saravana Kannan <skannan@codeaurora.org> wrote:
>>> On 02/02/2016 11:40 AM, Rafael J. Wysocki wrote:
>>>>
>>>> On Tue, Feb 2, 2016 at 6:01 PM, Juri Lelli <juri.lelli@arm.com> wrote:
>>>>>
>
> [cut]
>
>>>
>>> I also don't like this patch because it forces governors to either implement
>>> their own macros and management of their attributes or force them to use the
>>> governor structs that come with cpufreq_governor.h. cpufreq_governor.h IMHO
>>> is very ondemand and conservative governor specific and is very irrelevant
>>> for sched-dvfs or any other governors (hint hint).
>>>
>>> The only time this ABBA locking is an issue is when governor are changing
>>> and trying to add/remove attributes. That can easily be checked in
>>> store_governor and dealt with without holding the policy rwsem if the
>>> governors can provide their per sys and per policy attribute arrays as part
>>> of registering themselves.
>>>
>>> I'm sorry that I just keep talking about the idea and not sending out the
>>> patches.
>>
>> I think you have a point, though.
>>
>> The deadlock really is specific to the governors using the code in
>> cpufreq_governor.c.
>
> That said no other governors in the tree use any sysfs attributes for
> tunables AFAICS and the out-of-the tree ones are out of interest here.

But if we are expecting sched dvfs to come in, why make it worse for it. 
It would be completely pointless to try and shoehorn sched dvfs to use 
cpufreq_governor.c

> Also the deadlock happens if one of the tunable attributes is accessed
> while we're trying to remove it which very well may happen on read
> access too.

Isn't this THE deadlock we are talking about? The removal of the 
attributes only happen when governors are changes and we send a 
POLICY_EXIT and or all the cores are hotplugged out. And my suggestion 
would work just as well there.

Why are you prefixing your sentence with "Also"? Is there some other 
case I'm not considering?

-Saravana
-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops
  2016-02-03  1:32                 ` Saravana Kannan
@ 2016-02-03  1:52                   ` Rafael J. Wysocki
  2016-02-03  4:03                     ` Saravana Kannan
  2016-02-03  6:54                   ` Viresh Kumar
  1 sibling, 1 reply; 45+ messages in thread
From: Rafael J. Wysocki @ 2016-02-03  1:52 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Rafael J. Wysocki, Viresh Kumar, Juri Lelli, Rafael Wysocki,
	Lists linaro-kernel, linux-pm, Peter Zijlstra, Michael Turquette,
	Steve Muckle, Vincent Guittot, Morten Rasmussen,
	dietmar.eggemann, Linux Kernel Mailing List

On Wed, Feb 3, 2016 at 2:32 AM, Saravana Kannan <skannan@codeaurora.org> wrote:
> On 02/02/2016 05:07 PM, Rafael J. Wysocki wrote:
>>
>> On Wed, Feb 3, 2016 at 12:42 AM, Rafael J. Wysocki <rafael@kernel.org>
>> wrote:
>>>
>>> On Tue, Feb 2, 2016 at 11:21 PM, Saravana Kannan <skannan@codeaurora.org>
>>> wrote:
>>>>
>>>> On 02/02/2016 11:40 AM, Rafael J. Wysocki wrote:
>>>>>
>>>>>
>>>>> On Tue, Feb 2, 2016 at 6:01 PM, Juri Lelli <juri.lelli@arm.com> wrote:
>>>>>>
>>>>>>
>>
>> [cut]
>>
>>>>
>>>> I also don't like this patch because it forces governors to either
>>>> implement
>>>> their own macros and management of their attributes or force them to use
>>>> the
>>>> governor structs that come with cpufreq_governor.h. cpufreq_governor.h
>>>> IMHO
>>>> is very ondemand and conservative governor specific and is very
>>>> irrelevant
>>>> for sched-dvfs or any other governors (hint hint).
>>>>
>>>> The only time this ABBA locking is an issue is when governor are
>>>> changing
>>>> and trying to add/remove attributes. That can easily be checked in
>>>> store_governor and dealt with without holding the policy rwsem if the
>>>> governors can provide their per sys and per policy attribute arrays as
>>>> part
>>>> of registering themselves.
>>>>
>>>> I'm sorry that I just keep talking about the idea and not sending out
>>>> the
>>>> patches.
>>>
>>>
>>> I think you have a point, though.
>>>
>>> The deadlock really is specific to the governors using the code in
>>> cpufreq_governor.c.
>>
>>
>> That said no other governors in the tree use any sysfs attributes for
>> tunables AFAICS and the out-of-the tree ones are out of interest here.
>
>
> But if we are expecting sched dvfs to come in, why make it worse for it. It
> would be completely pointless to try and shoehorn sched dvfs to use
> cpufreq_governor.c

Well, do you honestly think that using the existing stuff in it would
be a good idea?

If not, then why it matters at all?

>> Also the deadlock happens if one of the tunable attributes is accessed
>> while we're trying to remove it which very well may happen on read
>> access too.
>
> Isn't this THE deadlock we are talking about? The removal of the attributes
> only happen when governors are changes and we send a POLICY_EXIT and or all
> the cores are hotplugged out.

It generally happens when the "old" governor is going away, whatever the reason.

> And my suggestion would work just as well there.
>
> Why are you prefixing your sentence with "Also"? Is there some other case
> I'm not considering?

Say someone is reading sampling_rate for a policy with 1 CPU in it and
someone else is taking the CPU offline.  The governor EXIT code path
(that will trigger as a result) will try to remove the sampling_rate
attribute and (if it does that under policy->rwsem) it'll wait for the
read access to finish.  Where exactly would you put the deadlock
prevention in this case?

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 0/5] cpufreq: governors: Solve the ABBA lockups
  2016-02-02 20:04 ` Rafael J. Wysocki
@ 2016-02-03  2:22   ` Viresh Kumar
  2016-02-03 11:37     ` Viresh Kumar
  0 siblings, 1 reply; 45+ messages in thread
From: Viresh Kumar @ 2016-02-03  2:22 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael Wysocki, Juri Lelli, Lists linaro-kernel, linux-pm,
	Saravana Kannan, Peter Zijlstra, Michael Turquette, Steve Muckle,
	Vincent Guittot, Morten Rasmussen, dietmar.eggemann,
	Linux Kernel Mailing List

On 02-02-16, 21:04, Rafael J. Wysocki wrote:
> On Tue, Feb 2, 2016 at 11:57 AM, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > Hi Rafael,
> >
> > Sorry for doing this, I know you were also looking to fix this in a
> > possibly different way. But I thought, it would be better if we fix
> > that. We can scrap this version and take yours if that looks better.
> 
> That's not nice to be honest.  At the very least you could have asked
> me about the status of my work before sending this.

I started looking into this yesterday morning, while you were away and
finished it before you came back. So, it would have taken more time
and so I just sent them. I don't have any issues (as mentioned
earlier) is discarding them for the work you might have already done.

> Fortunately, though, when I started to look deeper into fixing this
> problem I thought I didn't like the overall design of things in the
> governor land, so I started to change that and my modifications turn
> out to be sort of complementary with respect to this patchset.

That's good.

> Of
> course, they do conflict, but I can redo my patches on top of these if
> that's necessary.

Yeah, even I don't have any issues in rebasing over your work, if I
have to.

> That said I'm going to post them in their current
> form anyway, at least to show the direction I want to take going
> forward.

Sure.

> > I thought, perhaps the best way to fix it is to give separate sysfs-ops
> > to governors. And that's what I did.
> 
> Yes, that's what I was talking about in the other thread.

I must have missed that then :(

> My overall impression here is that the code changes make sense.  Some
> details need to be improved (like the concurrent ->store for governor
> tunables pointed out by Juri). 

> The patch changelogs suck, though.

Like always :(

> If your hope was that this might go into 4.5, there is a small chance
> of that happening, but only if it can be made ready this week.

Will try my best.

> Otherwise, I'd prefer it to be redone on top of my changes.

No worries.

> Let me comment on the individual patches.

Thanks for taking this up Rafael, really appreciate it :)

-- 
viresh

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 1/5] cpufreq: governor: Kill declare_show_sampling_rate_min()
  2016-02-02 20:23   ` Rafael J. Wysocki
@ 2016-02-03  2:29     ` Viresh Kumar
  0 siblings, 0 replies; 45+ messages in thread
From: Viresh Kumar @ 2016-02-03  2:29 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael Wysocki, Juri Lelli, Lists linaro-kernel, linux-pm,
	Saravana Kannan, Peter Zijlstra, Michael Turquette, Steve Muckle,
	Vincent Guittot, Morten Rasmussen, dietmar.eggemann,
	Linux Kernel Mailing List

On 02-02-16, 21:23, Rafael J. Wysocki wrote:
> To me, this is not about the macro, but about moving min_sampling_rate
> to governor tunables, so my subject would be something like "cpufreq:
> governor: Treat min_sampling_rate as a governor-specific tunable".
> 
> My changelog, then, would be something like the following:
> 
> "The min_sampling_rate governor tunable is a field in struct dbs_data,
> so it has to be handled in a special way separate from the rest of
> governor tunables.  In particular, that requires a special macro to be
> present for creating its show/store sysfs attribute callbacks.
> 
> However, there is no real need for the data structures and code in
> question to be arranged this way and if min_sampling_rate is moved to
> data structures holding the other governor tunables, the sysfs
> attribute creation macros that work with those tunables will also work
> with min_sampling_rate and the special macro for it won't be necessary
> any more.  That will make it easier to modify the governor code going
> forward, so do it."

I just copy pasted it with:

[ Rafael: Written changelog ]

just before my sign-off. Thanks.

-- 
viresh

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops
  2016-02-03  1:52                   ` Rafael J. Wysocki
@ 2016-02-03  4:03                     ` Saravana Kannan
  2016-02-03  6:57                       ` Viresh Kumar
  0 siblings, 1 reply; 45+ messages in thread
From: Saravana Kannan @ 2016-02-03  4:03 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Viresh Kumar, Juri Lelli, Rafael Wysocki, Lists linaro-kernel,
	linux-pm, Peter Zijlstra, Michael Turquette, Steve Muckle,
	Vincent Guittot, Morten Rasmussen, dietmar.eggemann,
	Linux Kernel Mailing List

On 02/02/2016 05:52 PM, Rafael J. Wysocki wrote:
> On Wed, Feb 3, 2016 at 2:32 AM, Saravana Kannan <skannan@codeaurora.org> wrote:
>> On 02/02/2016 05:07 PM, Rafael J. Wysocki wrote:
>>>
>>> On Wed, Feb 3, 2016 at 12:42 AM, Rafael J. Wysocki <rafael@kernel.org>
>>> wrote:
>>>>
>>>> On Tue, Feb 2, 2016 at 11:21 PM, Saravana Kannan <skannan@codeaurora.org>
>>>> wrote:
>>>>>
>>>>> On 02/02/2016 11:40 AM, Rafael J. Wysocki wrote:
>>>>>>
>>>>>>
>>>>>> On Tue, Feb 2, 2016 at 6:01 PM, Juri Lelli <juri.lelli@arm.com> wrote:
>>>>>>>
>>>>>>>
>>>
>>> [cut]
>>>
>>>>>
>>>>> I also don't like this patch because it forces governors to either
>>>>> implement
>>>>> their own macros and management of their attributes or force them to use
>>>>> the
>>>>> governor structs that come with cpufreq_governor.h. cpufreq_governor.h
>>>>> IMHO
>>>>> is very ondemand and conservative governor specific and is very
>>>>> irrelevant
>>>>> for sched-dvfs or any other governors (hint hint).
>>>>>
>>>>> The only time this ABBA locking is an issue is when governor are
>>>>> changing
>>>>> and trying to add/remove attributes. That can easily be checked in
>>>>> store_governor and dealt with without holding the policy rwsem if the
>>>>> governors can provide their per sys and per policy attribute arrays as
>>>>> part
>>>>> of registering themselves.
>>>>>
>>>>> I'm sorry that I just keep talking about the idea and not sending out
>>>>> the
>>>>> patches.
>>>>
>>>>
>>>> I think you have a point, though.
>>>>
>>>> The deadlock really is specific to the governors using the code in
>>>> cpufreq_governor.c.
>>>
>>>
>>> That said no other governors in the tree use any sysfs attributes for
>>> tunables AFAICS and the out-of-the tree ones are out of interest here.
>>
>>
>> But if we are expecting sched dvfs to come in, why make it worse for it. It
>> would be completely pointless to try and shoehorn sched dvfs to use
>> cpufreq_governor.c
>
> Well, do you honestly think that using the existing stuff in it would
> be a good idea?
>
> If not, then why it matters at all?
>
>>> Also the deadlock happens if one of the tunable attributes is accessed
>>> while we're trying to remove it which very well may happen on read
>>> access too.
>>
>> Isn't this THE deadlock we are talking about? The removal of the attributes
>> only happen when governors are changes and we send a POLICY_EXIT and or all
>> the cores are hotplugged out.
>
> It generally happens when the "old" governor is going away, whatever the reason.
>
>> And my suggestion would work just as well there.
>>
>> Why are you prefixing your sentence with "Also"? Is there some other case
>> I'm not considering?
>
> Say someone is reading sampling_rate for a policy with 1 CPU in it and
> someone else is taking the CPU offline.  The governor EXIT code path
> (that will trigger as a result) will try to remove the sampling_rate
> attribute and (if it does that under policy->rwsem) it'll wait for the
> read access to finish.  Where exactly would you put the deadlock
> prevention in this case?

This is the hotplug case I mentioned. The sysfs file removals will 
happen only for the last CPU in that policy (we thankfully optimized 
that part last year). We also know that multiple CPUs can't be 
hotplugged at the same time. So, in the start of 
cpufreq_offline_prepare, we just need to check if this is the last CPU 
in the policy and if that's the case, do the gov sysfs remove and then 
grab the policy lock and do all our crap. If a read is going on, that's 
going to finish before the sysfs attr remove can go ahead and it can 
grab the policy lock if it needs to and that still won't cause a 
deadlock because we haven't yet grabbed the policy lock in 
cpufreq_offline_prepare(). If the read comes after the sysfs remove, 
then the read is obviously going to fail (we can depend on the sysfs 
framework on doing its job there).

Will that still leave any race conditions in?

-Saravana
-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 4/5] cpufreq: Don't drop rwsem before calling CPUFREQ_GOV_POLICY_EXIT
  2016-02-02 21:53   ` Rafael J. Wysocki
@ 2016-02-03  5:51     ` Viresh Kumar
  2016-02-03 12:24       ` Rafael J. Wysocki
  0 siblings, 1 reply; 45+ messages in thread
From: Viresh Kumar @ 2016-02-03  5:51 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael Wysocki, Juri Lelli, Lists linaro-kernel, linux-pm,
	Saravana Kannan, Peter Zijlstra, Michael Turquette, Steve Muckle,
	Vincent Guittot, Morten Rasmussen, dietmar.eggemann,
	Linux Kernel Mailing List

On 02-02-16, 22:53, Rafael J. Wysocki wrote:
> First of all, this is effectively reverting commit 955ef4833574, so
> the subject should be "Revert commit 955ef4833574 (cpufreq: Drop rwsem
> lock around CPUFREQ_GOV_POLICY_EXIT)".
> 
> There should be a Fixes: tag pointing to commit 955ef4833574 and a
> Reported-by: for Juri.
> 
> If there is a link to a bug report related to this, it should be
> pointed to by a Link: tag.
> 
> The changelog should say why the original commit was there and why the
> way it attempted to solve the problem was incorrect.  It also should
> say that the original problem was addressed by a previous commit, so
> this one can be reverted without consequences.

How about this:

    Revert "cpufreq: Drop rwsem lock around CPUFREQ_GOV_POLICY_EXIT"
    
    Earlier, when the struct freq-attr was used to represent governor
    attributes, the standard cpufreq show/store sysfs attribute callbacks
    were applied to the governor tunable attributes and they always acquire
    the policy->rwsem lock before carrying out the operation.  That could
    have resulted in an ABBA deadlock if governor tunable attributes are
    removed under policy->rwsem while one of them is being accessed
    concurrently (if sysfs attributes removal wins the race, it will wait
    for the access to complete with policy->rwsem held while the attribute
    callback will block on policy->rwsem indefinitely).
    
    We attempted to address this issue by dropping policy->rwsem around
    governor tunable attributes removal (that is, around invocations of the
    ->governor callback with the event arg equal to CPUFREQ_GOV_POLICY_EXIT)
    in cpufreq_set_policy(), but that opened up race conditions that had not
    been possible with policy->rwsem held all the time.
    
    The previous commit, "cpufreq: governor: New sysfs show/store callbacks
    for governor tunables", fixed the original ABBA deadlock by adding new
    governor specific show/store callbacks.
    
    We don't have to drop rwsem around invocations of governor event
    CPUFREQ_GOV_POLICY_EXIT anymore, and original fix can be reverted now.
    
    Fixes: 955ef4833574 ("cpufreq: Drop rwsem lock around CPUFREQ_GOV_POLICY_EXIT")
    Reported-by: Juri Lelli <juri.lelli@arm.com>
    Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

> But I'm not going to write that changelog.  I actually am not going to
> write any changelogs for you any more, because I'm seriously tired of
> doing that.  Moreover, if I see a patch from you with a changelog
> that's not acceptable to me, it will immediately go to the "not
> applicable" trash bin no matter what the changes below look like.  You
> *have* *to* write useful changelogs.  This isn't optional or best
> effort.  This is mandatory and important.

Will try to improve, sorry about that (again).

> Now, I'm not really sure if the ordering of this patchset is right.
> Maybe we should just revert upfront with the "we'll address the
> original problem in the following commits" statement in the changelog
> and fix it in a different way?

Wouldn't that break things like 'git bisect'? People running kernels
after the reverted commits may start hitting lockdeps.

> It looks like patches [1-3/5] fix a
> problem that isn't there even, but would appear after the [4/5] if
> they were not applied previously, which doesn't sound really
> straightforward to me.

I am going to fight hard for it, if you feel 4/5 should be the first
patch here, I will do that.

-- 
viresh

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 5/5] cpufreq: Get rid of ->governor_enabled and its lock
  2016-02-02 16:49   ` Juri Lelli
@ 2016-02-03  6:05     ` Viresh Kumar
  2016-02-03 11:05       ` Juri Lelli
  0 siblings, 1 reply; 45+ messages in thread
From: Viresh Kumar @ 2016-02-03  6:05 UTC (permalink / raw)
  To: Juri Lelli
  Cc: Rafael Wysocki, linaro-kernel, linux-pm, skannan, peterz,
	mturquette, steve.muckle, vincent.guittot, morten.rasmussen,
	dietmar.eggemann, linux-kernel

On 02-02-16, 16:49, Juri Lelli wrote:
> There are still paths where we call __cpufreq_governor() without holding
> policy->rwsem, but those should be fixed with my cleanups (that I intend
> to refresh and post soon). So, I'm not sure we can safely remove this
> yet.

No, we can't.. Though I haven't seen any races from happening even
after removing it, but it doesn't mean we can't.

The deal is that, the entire sequence of events needs to be guaranteed
to happen in a particular order without any other code performing
similar operations concurrently.

And so we need to preserve the other sites with proper rwsem locking
first.

> So, __cpufreq_governor() becomes effectively a wrapper around
> ->governor() calls and governors are left responsible for implementing
> the state machine with appropriate checks.

Not really. The core can now guarantee that the entire sequence
happens atomically. For example, on governor switch, we need to
guarantee that STOP/EXIT happen without any intervention for the old
governor. Or, INIT/START/LIMITS happen without any intervention for
the new governor, etc..

> Maybe we add a comment somewhere stating exactly how things are meant to
> work?

Hmm.

-- 
viresh

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops
  2016-02-02 17:01       ` Juri Lelli
  2016-02-02 19:40         ` Rafael J. Wysocki
@ 2016-02-03  6:33         ` Viresh Kumar
  1 sibling, 0 replies; 45+ messages in thread
From: Viresh Kumar @ 2016-02-03  6:33 UTC (permalink / raw)
  To: Juri Lelli
  Cc: Rafael J. Wysocki, Rafael Wysocki, Lists linaro-kernel, linux-pm,
	Saravana Kannan, Peter Zijlstra, Michael Turquette, Steve Muckle,
	Vincent Guittot, Morten Rasmussen, dietmar.eggemann,
	Linux Kernel Mailing List

On 02-02-16, 17:01, Juri Lelli wrote:
> Hi Rafael,
> 
> On 02/02/16 17:35, Rafael J. Wysocki wrote:
> > On Tue, Feb 2, 2016 at 4:47 PM, Juri Lelli <juri.lelli@arm.com> wrote:

> > > This patch cleans things up a lot, that's good.
> > >
> > > One thing I'm still concerned about, though: don't we need some locking
> > > in place for some of the store operations on governors attributes? Are
> > > store_{ignore_nice_load, sampling_down_fact, etc} safe without locking?
> > 
> > That would require some investigation I suppose.

Yeah, that protection is required. Sorry about that.

> > > It seems that we can call them from different cpus concurrently.
> > 
> > Yes, we can.
> > 
> > One quick-and-dirty way of dealing with that might be to introduce a
> > "sysfs lock" into struct dbs_data and hold that around the invocation
> > of gattr->store() in the sysfs_ops's ->store callback.

s/dirty/sane ? :)

> Can't we actually try to use the policy->rwsem (or one of the core
> locks) + wait_for_completion approach as we do in cpufreq core?

policy->rwsem will defeat the purpose of this change as it will
reintroduce the ABBA issue.

-- 
viresh

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops
  2016-02-02 22:21           ` Saravana Kannan
  2016-02-02 23:42             ` Rafael J. Wysocki
@ 2016-02-03  6:51             ` Viresh Kumar
  1 sibling, 0 replies; 45+ messages in thread
From: Viresh Kumar @ 2016-02-03  6:51 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Rafael J. Wysocki, Juri Lelli, Rafael Wysocki,
	Lists linaro-kernel, linux-pm, Peter Zijlstra, Michael Turquette,
	Steve Muckle, Vincent Guittot, Morten Rasmussen,
	dietmar.eggemann, Linux Kernel Mailing List

On 02-02-16, 14:21, Saravana Kannan wrote:
> I also don't like this patch because it forces governors to either implement
> their own macros and management of their attributes or force them to use the
> governor structs that come with cpufreq_governor.h. cpufreq_governor.h IMHO
> is very ondemand and conservative governor specific and is very irrelevant
> for sched-dvfs or any other governors (hint hint).

But who is stopping us from breaking that file and moving some of it
into include/linux/cpufreq.h ?

We can do that today as well, but it would be fine to do that, when we
add more governors to the core. Though, it would only take a simple
patch if people want me to do it now.

> The only time this ABBA locking is an issue is when governor are changing
> and trying to add/remove attributes. That can easily be checked in
> store_governor

store_scaling_governor ??

> and dealt with without holding the policy rwsem if the

Are you saying that we could have taken the rwsem from the generic
cpufreq.c:store() and dropped it from store_scaling_governor() ? That
would have been something similar to what I tried earlier, which I
never posted (I gave the link to that few days back).

> governors can provide their per sys and per policy attribute arrays as part
> of registering themselves.

These per-sys and per-policy attributes really suck. There is nothing
really different in the implementation, just that the show/store
callbacks have different prototype. One accept 'kboj' as the parameter,
other accept 'policy'. I would call that a HACK as well (I only
implemented it though).

That should just die. A single list of attributes is what we should
have had initially as well.

-- 
viresh

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops
  2016-02-03  1:32                 ` Saravana Kannan
  2016-02-03  1:52                   ` Rafael J. Wysocki
@ 2016-02-03  6:54                   ` Viresh Kumar
  2016-02-03 10:51                     ` Juri Lelli
  2016-02-03 20:14                     ` Saravana Kannan
  1 sibling, 2 replies; 45+ messages in thread
From: Viresh Kumar @ 2016-02-03  6:54 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Rafael J. Wysocki, Juri Lelli, Rafael Wysocki,
	Lists linaro-kernel, linux-pm, Peter Zijlstra, Michael Turquette,
	Steve Muckle, Vincent Guittot, Morten Rasmussen,
	dietmar.eggemann, Linux Kernel Mailing List

On 02-02-16, 17:32, Saravana Kannan wrote:
> But if we are expecting sched dvfs to come in, why make it worse for it. It
> would be completely pointless to try and shoehorn sched dvfs to use
> cpufreq_governor.c

We can move the common part to cpufreq core and not make sched-dvfs
reuse cpufreq_governor.c

-- 
viresh

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops
  2016-02-03  4:03                     ` Saravana Kannan
@ 2016-02-03  6:57                       ` Viresh Kumar
  2016-02-03 20:07                         ` Saravana Kannan
  0 siblings, 1 reply; 45+ messages in thread
From: Viresh Kumar @ 2016-02-03  6:57 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Rafael J. Wysocki, Juri Lelli, Rafael Wysocki,
	Lists linaro-kernel, linux-pm, Peter Zijlstra, Michael Turquette,
	Steve Muckle, Vincent Guittot, Morten Rasmussen,
	dietmar.eggemann, Linux Kernel Mailing List

On 02-02-16, 20:03, Saravana Kannan wrote:
> This is the hotplug case I mentioned. The sysfs file removals will happen
> only for the last CPU in that policy (we thankfully optimized that part last
> year). We also know that multiple CPUs can't be hotplugged at the same time.
> So, in the start of cpufreq_offline_prepare, we just need to check if this
> is the last CPU in the policy and if that's the case, do the gov sysfs
> remove and then grab the policy lock and do all our crap. If a read is going
> on, that's going to finish before the sysfs attr remove can go ahead and it
> can grab the policy lock if it needs to and that still won't cause a
> deadlock because we haven't yet grabbed the policy lock in
> cpufreq_offline_prepare(). If the read comes after the sysfs remove, then
> the read is obviously going to fail (we can depend on the sysfs framework on
> doing its job there).

IMHO, these are all dirty hacks we should stay away from. Adding such
hunks in code is considered a band-aid kind of solution and hurts
readability badly. The new solution (new governor show/store)
implement this in a very clean and proper way I feel..

Others are free to disagree though :)

-- 
viresh

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops
  2016-02-02 21:23   ` Rafael J. Wysocki
@ 2016-02-03  6:58     ` Viresh Kumar
  2016-02-03 12:42       ` Rafael J. Wysocki
  0 siblings, 1 reply; 45+ messages in thread
From: Viresh Kumar @ 2016-02-03  6:58 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael Wysocki, Juri Lelli, Lists linaro-kernel, linux-pm,
	Saravana Kannan, Peter Zijlstra, Michael Turquette, Steve Muckle,
	Vincent Guittot, Morten Rasmussen, dietmar.eggemann,
	Linux Kernel Mailing List

On 02-02-16, 22:23, Rafael J. Wysocki wrote:
> On Tue, Feb 2, 2016 at 11:57 AM, Viresh Kumar <viresh.kumar@linaro.org> wrote:

> "The ondemand and conservative governors use the global-attr or
> freq-attr structures to represent sysfs attributes corresponding to
> their tunables

> (which of them is actually used depends on whether or
> not different policy objects can use different governors at the same
> time

Not exactly. Different policies can always use different governors.
What made the difference was that different policies using same
governor, with different tunables or separate governor directories.

I have reworded this para like:

    The ondemand and conservative governors use the global-attr or freq-attr
    structures to represent sysfs attributes corresponding to their tunables
    (which of them is actually used depends on whether or not different
    policy objects can use same governor with different tunables at the same
    time and, consequently, on where those attributes are located in sysfs).

Please let me know if isn't clear.

> > --- a/drivers/cpufreq/cpufreq_governor.c
> > +       ret = kobject_init_and_add(&dbs_data->kobj, &dbs_data->kobj_type,
> > +                                  get_governor_parent_kobj(policy),
> > +                                  attr_group->name);
> > +       if (ret) {
> > +               pr_err("%s: failed to init dbs_data kobj: %d\n", __func__, ret);
> 
> pr_debug() would be better here.

Its a real error, why pr_debug for that ?

> >                 goto reset_gdbs_data;
> > +       }
> >
> >         return 0;
> >
> > @@ -426,8 +457,7 @@ static int cpufreq_governor_exit(struct cpufreq_policy *policy,
> >                 return -EBUSY;
> >
> >         if (!--dbs_data->usage_count) {
> > -               sysfs_remove_group(get_governor_parent_kobj(policy),
> > -                                  get_sysfs_attr(dbs_data));
> > +               kobject_put(&dbs_data->kobj);
> 
> Don't we need a ->release callback for this kobject?

There is nothing that we need to free from the ->release() callback.
We are using the kobject here just to get separate show/store
callbacks.

Here is the new version based on the review comments received until
now:

-------------------------8<-------------------------

From: Viresh Kumar <viresh.kumar@linaro.org>
Date: Tue, 2 Feb 2016 12:35:01 +0530
Subject: [PATCH] cpufreq: governor: New sysfs show/store callbacks for
 governor tunables

The ondemand and conservative governors use the global-attr or freq-attr
structures to represent sysfs attributes corresponding to their tunables
(which of them is actually used depends on whether or not different
policy objects can use same governor with different tunables at the same
time and, consequently, on where those attributes are located in sysfs).

Unfortunately, in the freq-attr case, the standard cpufreq show/store
sysfs attribute callbacks are applied to the governor tunable attributes
and they always acquire the policy->rwsem lock before carrying out the
operation.  That may lead to an ABBA deadlock if governor tunable
attributes are removed under policy->rwsem while one of them is being
accessed concurrently (if sysfs attributes removal wins the race, it
will wait for the access to complete with policy->rwsem held while the
attribute callback will block on policy->rwsem indefinitely).

We attempted to address this issue by dropping policy->rwsem around
governor tunable attributes removal (that is, around invocations of the
->governor callback with the event arg equal to CPUFREQ_GOV_POLICY_EXIT)
in cpufreq_set_policy(), but that opened up race conditions that had not
been possible with policy->rwsem held all the time.  Therefore
policy->rwsem cannot be dropped in cpufreq_set_policy() at any point,
but the deadlock situation described above must be avoided too.

To that end, use the observation that in principle governor tunables may
be represented by the same data type regardless of whether the governor
is system-wide or per-policy and introduce a new structure, struct
governor_attr, for representing them and new corresponding macros for
creating show/store sysfs callbacks for them.  Also make their parent
kobject use a new kobject type whose default show/store callbacks are
not related to the standard core cpufreq ones in any way (and they don't
acquire policy->rwsem in particular).

[ Rafael: Written changelog ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/cpufreq_conservative.c | 73 ++++++++++++----------------------
 drivers/cpufreq/cpufreq_governor.c     | 71 ++++++++++++++++++++++++++++-----
 drivers/cpufreq/cpufreq_governor.h     | 34 ++++++++++++++--
 drivers/cpufreq/cpufreq_ondemand.c     | 73 ++++++++++++----------------------
 4 files changed, 144 insertions(+), 107 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_conservative.c b/drivers/cpufreq/cpufreq_conservative.c
index 57750367bd26..c749fb4fe5d2 100644
--- a/drivers/cpufreq/cpufreq_conservative.c
+++ b/drivers/cpufreq/cpufreq_conservative.c
@@ -275,54 +275,33 @@ static ssize_t store_freq_step(struct dbs_data *dbs_data, const char *buf,
 	return count;
 }
 
-show_store_one(cs, sampling_rate);
-show_store_one(cs, sampling_down_factor);
-show_store_one(cs, up_threshold);
-show_store_one(cs, down_threshold);
-show_store_one(cs, ignore_nice_load);
-show_store_one(cs, freq_step);
-show_one(cs, min_sampling_rate);
-
-gov_sys_pol_attr_rw(sampling_rate);
-gov_sys_pol_attr_rw(sampling_down_factor);
-gov_sys_pol_attr_rw(up_threshold);
-gov_sys_pol_attr_rw(down_threshold);
-gov_sys_pol_attr_rw(ignore_nice_load);
-gov_sys_pol_attr_rw(freq_step);
-gov_sys_pol_attr_ro(min_sampling_rate);
-
-static struct attribute *dbs_attributes_gov_sys[] = {
-	&min_sampling_rate_gov_sys.attr,
-	&sampling_rate_gov_sys.attr,
-	&sampling_down_factor_gov_sys.attr,
-	&up_threshold_gov_sys.attr,
-	&down_threshold_gov_sys.attr,
-	&ignore_nice_load_gov_sys.attr,
-	&freq_step_gov_sys.attr,
+gov_show_one(cs, sampling_rate);
+gov_show_one(cs, sampling_down_factor);
+gov_show_one(cs, up_threshold);
+gov_show_one(cs, down_threshold);
+gov_show_one(cs, ignore_nice_load);
+gov_show_one(cs, freq_step);
+gov_show_one(cs, min_sampling_rate);
+
+gov_attr_rw(sampling_rate);
+gov_attr_rw(sampling_down_factor);
+gov_attr_rw(up_threshold);
+gov_attr_rw(down_threshold);
+gov_attr_rw(ignore_nice_load);
+gov_attr_rw(freq_step);
+gov_attr_ro(min_sampling_rate);
+
+static struct attribute *cs_attributes[] = {
+	&min_sampling_rate.attr,
+	&sampling_rate.attr,
+	&sampling_down_factor.attr,
+	&up_threshold.attr,
+	&down_threshold.attr,
+	&ignore_nice_load.attr,
+	&freq_step.attr,
 	NULL
 };
 
-static struct attribute_group cs_attr_group_gov_sys = {
-	.attrs = dbs_attributes_gov_sys,
-	.name = "conservative",
-};
-
-static struct attribute *dbs_attributes_gov_pol[] = {
-	&min_sampling_rate_gov_pol.attr,
-	&sampling_rate_gov_pol.attr,
-	&sampling_down_factor_gov_pol.attr,
-	&up_threshold_gov_pol.attr,
-	&down_threshold_gov_pol.attr,
-	&ignore_nice_load_gov_pol.attr,
-	&freq_step_gov_pol.attr,
-	NULL
-};
-
-static struct attribute_group cs_attr_group_gov_pol = {
-	.attrs = dbs_attributes_gov_pol,
-	.name = "conservative",
-};
-
 /************************** sysfs end ************************/
 
 static int cs_init(struct dbs_data *dbs_data, bool notify)
@@ -365,8 +344,8 @@ define_get_cpu_dbs_routines(cs_cpu_dbs_info);
 
 static struct common_dbs_data cs_dbs_cdata = {
 	.governor = GOV_CONSERVATIVE,
-	.attr_group_gov_sys = &cs_attr_group_gov_sys,
-	.attr_group_gov_pol = &cs_attr_group_gov_pol,
+	.kobj_name = "conservative",
+	.kobj_type = { .sysfs_ops = &governor_sysfs_ops, .default_attrs = cs_attributes },
 	.get_cpu_cdbs = get_cpu_cdbs,
 	.get_cpu_dbs_info_s = get_cpu_dbs_info_s,
 	.gov_dbs_timer = cs_dbs_timer,
diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index 9a7edc91ad57..a9f335c4e461 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -22,14 +22,62 @@
 
 #include "cpufreq_governor.h"
 
-static struct attribute_group *get_sysfs_attr(struct dbs_data *dbs_data)
+static inline struct dbs_data *to_dbs_data(struct kobject *kobj)
 {
-	if (have_governor_per_policy())
-		return dbs_data->cdata->attr_group_gov_pol;
-	else
-		return dbs_data->cdata->attr_group_gov_sys;
+	return container_of(kobj, struct dbs_data, kobj);
+}
+
+static inline struct governor_attr *to_gov_attr(struct attribute *attr)
+{
+	return container_of(attr, struct governor_attr, attr);
+}
+
+static ssize_t governor_show(struct kobject *kobj, struct attribute *attr,
+			     char *buf)
+{
+	struct dbs_data *dbs_data = to_dbs_data(kobj);
+	struct governor_attr *gattr = to_gov_attr(attr);
+	int ret = -EIO;
+
+	down_read(&dbs_data->rwsem);
+
+	if (gattr->show)
+		ret = gattr->show(dbs_data, buf);
+
+	up_read(&dbs_data->rwsem);
+
+	return ret;
 }
 
+static ssize_t governor_store(struct kobject *kobj, struct attribute *attr,
+			      const char *buf, size_t count)
+{
+	struct dbs_data *dbs_data = to_dbs_data(kobj);
+	struct governor_attr *gattr = to_gov_attr(attr);
+	int ret = -EIO;
+
+	down_write(&dbs_data->rwsem);
+
+	if (gattr->store)
+		ret = gattr->store(dbs_data, buf, count);
+
+	up_write(&dbs_data->rwsem);
+
+	return ret;
+}
+
+/*
+ * Sysfs Ops for accessing governor attributes.
+ *
+ * All show/store invocations for governor specific sysfs attributes, will first
+ * call the below show/store callbacks and the attribute specific callback will
+ * be called from within it.
+ */
+const struct sysfs_ops governor_sysfs_ops = {
+	.show	= governor_show,
+	.store	= governor_store,
+};
+
 void dbs_check_cpu(struct dbs_data *dbs_data, int cpu)
 {
 	struct cpu_dbs_info *cdbs = dbs_data->cdata->get_cpu_cdbs(cpu);
@@ -383,6 +431,7 @@ static int cpufreq_governor_init(struct cpufreq_policy *policy,
 
 	dbs_data->cdata = cdata;
 	dbs_data->usage_count = 1;
+	init_rwsem(&dbs_data->rwsem);
 
 	ret = cdata->init(dbs_data, !policy->governor->initialized);
 	if (ret)
@@ -395,10 +444,13 @@ static int cpufreq_governor_init(struct cpufreq_policy *policy,
 
 	policy->governor_data = dbs_data;
 
-	ret = sysfs_create_group(get_governor_parent_kobj(policy),
-				 get_sysfs_attr(dbs_data));
-	if (ret)
+	ret = kobject_init_and_add(&dbs_data->kobj, &cdata->kobj_type,
+				   get_governor_parent_kobj(policy),
+				   cdata->kobj_name);
+	if (ret) {
+		pr_err("%s: failed to init dbs_data kobj: %d\n", __func__, ret);
 		goto reset_gdbs_data;
+	}
 
 	return 0;
 
@@ -426,8 +478,7 @@ static int cpufreq_governor_exit(struct cpufreq_policy *policy,
 		return -EBUSY;
 
 	if (!--dbs_data->usage_count) {
-		sysfs_remove_group(get_governor_parent_kobj(policy),
-				   get_sysfs_attr(dbs_data));
+		kobject_put(&dbs_data->kobj);
 
 		policy->governor_data = NULL;
 
diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h
index ad44a8546a3a..67500a1015cf 100644
--- a/drivers/cpufreq/cpufreq_governor.h
+++ b/drivers/cpufreq/cpufreq_governor.h
@@ -108,6 +108,31 @@ static ssize_t store_##file_name##_gov_pol				\
 show_one(_gov, file_name);						\
 store_one(_gov, file_name)
 
+/* Governor's specific attributes */
+struct dbs_data;
+struct governor_attr {
+	struct attribute attr;
+	ssize_t (*show)(struct dbs_data *dbs_data, char *buf);
+	ssize_t (*store)(struct dbs_data *dbs_data, const char *buf,
+			 size_t count);
+};
+
+#define gov_show_one(_gov, file_name)					\
+static ssize_t show_##file_name						\
+(struct dbs_data *dbs_data, char *buf)					\
+{									\
+	struct _gov##_dbs_tuners *tuners = dbs_data->tuners;		\
+	return sprintf(buf, "%u\n", tuners->file_name);			\
+}
+
+#define gov_attr_ro(_name)						\
+static struct governor_attr _name =					\
+__ATTR(_name, 0444, show_##_name, NULL)
+
+#define gov_attr_rw(_name)						\
+static struct governor_attr _name =					\
+__ATTR(_name, 0644, show_##_name, store_##_name)
+
 /* create helper routines */
 #define define_get_cpu_dbs_routines(_dbs_info)				\
 static struct cpu_dbs_info *get_cpu_cdbs(int cpu)			\
@@ -197,14 +222,13 @@ struct cs_dbs_tuners {
 };
 
 /* Common Governor data across policies */
-struct dbs_data;
 struct common_dbs_data {
 	/* Common across governors */
 	#define GOV_ONDEMAND		0
 	#define GOV_CONSERVATIVE	1
 	int governor;
-	struct attribute_group *attr_group_gov_sys; /* one governor - system */
-	struct attribute_group *attr_group_gov_pol; /* one governor - policy */
+	const char *kobj_name;
+	struct kobj_type kobj_type;
 
 	/*
 	 * Common data for platforms that don't set
@@ -234,6 +258,9 @@ struct dbs_data {
 	struct common_dbs_data *cdata;
 	int usage_count;
 	void *tuners;
+	struct kobject kobj;
+	/* Protect concurrent updates to governor tunables from sysfs */
+	struct rw_semaphore rwsem;
 };
 
 /* Governor specific ops, will be passed to dbs_data->gov_ops */
@@ -256,6 +283,7 @@ static inline int delay_for_sampling_rate(unsigned int sampling_rate)
 }
 
 extern struct mutex cpufreq_governor_lock;
+extern const struct sysfs_ops governor_sysfs_ops;
 
 void gov_add_timers(struct cpufreq_policy *policy, unsigned int delay);
 void gov_cancel_work(struct cpu_common_dbs_info *shared);
diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
index b31f64745232..82ed490f7de0 100644
--- a/drivers/cpufreq/cpufreq_ondemand.c
+++ b/drivers/cpufreq/cpufreq_ondemand.c
@@ -436,54 +436,33 @@ static ssize_t store_powersave_bias(struct dbs_data *dbs_data, const char *buf,
 	return count;
 }
 
-show_store_one(od, sampling_rate);
-show_store_one(od, io_is_busy);
-show_store_one(od, up_threshold);
-show_store_one(od, sampling_down_factor);
-show_store_one(od, ignore_nice_load);
-show_store_one(od, powersave_bias);
-show_one(od, min_sampling_rate);
-
-gov_sys_pol_attr_rw(sampling_rate);
-gov_sys_pol_attr_rw(io_is_busy);
-gov_sys_pol_attr_rw(up_threshold);
-gov_sys_pol_attr_rw(sampling_down_factor);
-gov_sys_pol_attr_rw(ignore_nice_load);
-gov_sys_pol_attr_rw(powersave_bias);
-gov_sys_pol_attr_ro(min_sampling_rate);
-
-static struct attribute *dbs_attributes_gov_sys[] = {
-	&min_sampling_rate_gov_sys.attr,
-	&sampling_rate_gov_sys.attr,
-	&up_threshold_gov_sys.attr,
-	&sampling_down_factor_gov_sys.attr,
-	&ignore_nice_load_gov_sys.attr,
-	&powersave_bias_gov_sys.attr,
-	&io_is_busy_gov_sys.attr,
+gov_show_one(od, sampling_rate);
+gov_show_one(od, io_is_busy);
+gov_show_one(od, up_threshold);
+gov_show_one(od, sampling_down_factor);
+gov_show_one(od, ignore_nice_load);
+gov_show_one(od, powersave_bias);
+gov_show_one(od, min_sampling_rate);
+
+gov_attr_rw(sampling_rate);
+gov_attr_rw(io_is_busy);
+gov_attr_rw(up_threshold);
+gov_attr_rw(sampling_down_factor);
+gov_attr_rw(ignore_nice_load);
+gov_attr_rw(powersave_bias);
+gov_attr_ro(min_sampling_rate);
+
+static struct attribute *od_attributes[] = {
+	&min_sampling_rate.attr,
+	&sampling_rate.attr,
+	&up_threshold.attr,
+	&sampling_down_factor.attr,
+	&ignore_nice_load.attr,
+	&powersave_bias.attr,
+	&io_is_busy.attr,
 	NULL
 };
 
-static struct attribute_group od_attr_group_gov_sys = {
-	.attrs = dbs_attributes_gov_sys,
-	.name = "ondemand",
-};
-
-static struct attribute *dbs_attributes_gov_pol[] = {
-	&min_sampling_rate_gov_pol.attr,
-	&sampling_rate_gov_pol.attr,
-	&up_threshold_gov_pol.attr,
-	&sampling_down_factor_gov_pol.attr,
-	&ignore_nice_load_gov_pol.attr,
-	&powersave_bias_gov_pol.attr,
-	&io_is_busy_gov_pol.attr,
-	NULL
-};
-
-static struct attribute_group od_attr_group_gov_pol = {
-	.attrs = dbs_attributes_gov_pol,
-	.name = "ondemand",
-};
-
 /************************** sysfs end ************************/
 
 static int od_init(struct dbs_data *dbs_data, bool notify)
@@ -542,8 +521,8 @@ static struct od_ops od_ops = {
 
 static struct common_dbs_data od_dbs_cdata = {
 	.governor = GOV_ONDEMAND,
-	.attr_group_gov_sys = &od_attr_group_gov_sys,
-	.attr_group_gov_pol = &od_attr_group_gov_pol,
+	.kobj_name = "ondemand",
+	.kobj_type = { .sysfs_ops = &governor_sysfs_ops, .default_attrs = od_attributes },
 	.get_cpu_cdbs = get_cpu_cdbs,
 	.get_cpu_dbs_info_s = get_cpu_dbs_info_s,
 	.gov_dbs_timer = od_dbs_timer,

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops
  2016-02-03  6:54                   ` Viresh Kumar
@ 2016-02-03 10:51                     ` Juri Lelli
  2016-02-03 10:55                       ` Viresh Kumar
  2016-02-03 20:14                     ` Saravana Kannan
  1 sibling, 1 reply; 45+ messages in thread
From: Juri Lelli @ 2016-02-03 10:51 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Saravana Kannan, Rafael J. Wysocki, Rafael Wysocki,
	Lists linaro-kernel, linux-pm, Peter Zijlstra, Michael Turquette,
	Steve Muckle, Vincent Guittot, Morten Rasmussen,
	dietmar.eggemann, Linux Kernel Mailing List

On 03/02/16 12:24, Viresh Kumar wrote:
> On 02-02-16, 17:32, Saravana Kannan wrote:
> > But if we are expecting sched dvfs to come in, why make it worse for it. It
> > would be completely pointless to try and shoehorn sched dvfs to use
> > cpufreq_governor.c
> 
> We can move the common part to cpufreq core and not make sched-dvfs
> reuse cpufreq_governor.c
> 

I also think that sched-dvfs should not use cpufreq_governor.c.  It is
useful boilerplate code for ondemand and conservative, as they share lot
of data structures and how they work, but it doesn't necessarily suit
everybody's needs, IMHO.

OTOH, fixing the current issue in the best way we can come up with has
still value of course :).

Best,

- Juri

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops
  2016-02-03 10:51                     ` Juri Lelli
@ 2016-02-03 10:55                       ` Viresh Kumar
  0 siblings, 0 replies; 45+ messages in thread
From: Viresh Kumar @ 2016-02-03 10:55 UTC (permalink / raw)
  To: Juri Lelli
  Cc: Saravana Kannan, Rafael J. Wysocki, Rafael Wysocki,
	Lists linaro-kernel, linux-pm, Peter Zijlstra, Michael Turquette,
	Steve Muckle, Vincent Guittot, Morten Rasmussen,
	dietmar.eggemann, Linux Kernel Mailing List

On 03-02-16, 10:51, Juri Lelli wrote:
> I also think that sched-dvfs should not use cpufreq_governor.c.  It is
> useful boilerplate code for ondemand and conservative, as they share lot
> of data structures and how they work, but it doesn't necessarily suit
> everybody's needs, IMHO.
> 
> OTOH, fixing the current issue in the best way we can come up with has
> still value of course :).

Right. cpufreq_governor.c is more about the technique where we do load
evaluation using deferred timers and workqueues, which isn't required
for sched-dvfs.

We can just move the common parts, like, governor_show/governor_store
routines, and the new macros being added here.

-- 
viresh

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 5/5] cpufreq: Get rid of ->governor_enabled and its lock
  2016-02-03  6:05     ` Viresh Kumar
@ 2016-02-03 11:05       ` Juri Lelli
  2016-02-03 11:08         ` Viresh Kumar
  0 siblings, 1 reply; 45+ messages in thread
From: Juri Lelli @ 2016-02-03 11:05 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael Wysocki, linaro-kernel, linux-pm, skannan, peterz,
	mturquette, steve.muckle, vincent.guittot, morten.rasmussen,
	dietmar.eggemann, linux-kernel

On 03/02/16 11:35, Viresh Kumar wrote:
> On 02-02-16, 16:49, Juri Lelli wrote:
> > There are still paths where we call __cpufreq_governor() without holding
> > policy->rwsem, but those should be fixed with my cleanups (that I intend
> > to refresh and post soon). So, I'm not sure we can safely remove this
> > yet.
> 
> No, we can't.. Though I haven't seen any races from happening even
> after removing it, but it doesn't mean we can't.
> 
> The deal is that, the entire sequence of events needs to be guaranteed
> to happen in a particular order without any other code performing
> similar operations concurrently.
> 
> And so we need to preserve the other sites with proper rwsem locking
> first.
> 

Right. I guess it is what I was trying to do with my cleanups, adding
assertions and fixing paths that didn't verify those.

It should be easy to rebase that set (or a part of it) on top of your
and/or Rafael changes. I realize that there are multiple sets of changes
under discussion; so, please tell me how do you, and Rafael, want to
proceed about this.

> > So, __cpufreq_governor() becomes effectively a wrapper around
> > ->governor() calls and governors are left responsible for implementing
> > the state machine with appropriate checks.
> 
> Not really. The core can now guarantee that the entire sequence
> happens atomically. For example, on governor switch, we need to
> guarantee that STOP/EXIT happen without any intervention for the old
> governor. Or, INIT/START/LIMITS happen without any intervention for
> the new governor, etc..
> 

OK, checking for invalid state transitions (for ondemand and
conservative) is still in done cpufreq_governor.c.

> > Maybe we add a comment somewhere stating exactly how things are meant to
> > work?

But, I guess any other governor that will bypass cpufreq_governor.c, it
will also have to implement such checks. I was just proposing to state
this somewhere, so that we don't forget.

Best,

- Juri

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 5/5] cpufreq: Get rid of ->governor_enabled and its lock
  2016-02-03 11:05       ` Juri Lelli
@ 2016-02-03 11:08         ` Viresh Kumar
  0 siblings, 0 replies; 45+ messages in thread
From: Viresh Kumar @ 2016-02-03 11:08 UTC (permalink / raw)
  To: Juri Lelli
  Cc: Rafael Wysocki, linaro-kernel, linux-pm, skannan, peterz,
	mturquette, steve.muckle, vincent.guittot, morten.rasmussen,
	dietmar.eggemann, linux-kernel

On 03-02-16, 11:05, Juri Lelli wrote:
> It should be easy to rebase that set (or a part of it) on top of your
> and/or Rafael changes. I realize that there are multiple sets of changes
> under discussion; so, please tell me how do you, and Rafael, want to
> proceed about this.

Yeah, please wait for a bit for Rafael to apply both the series (if
they pass the litmus test) to PM tree :)

> But, I guess any other governor that will bypass cpufreq_governor.c, it
> will also have to implement such checks. I was just proposing to state
> this somewhere, so that we don't forget.

We can surely add a comment for that.

But I would like to review the state after these patches are applied,
as we may be able to guarantee that from cpufreq-core instead.

-- 
viresh

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 0/5] cpufreq: governors: Solve the ABBA lockups
  2016-02-03  2:22   ` Viresh Kumar
@ 2016-02-03 11:37     ` Viresh Kumar
  0 siblings, 0 replies; 45+ messages in thread
From: Viresh Kumar @ 2016-02-03 11:37 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael Wysocki, Juri Lelli, Lists linaro-kernel, linux-pm,
	Saravana Kannan, Peter Zijlstra, Michael Turquette, Steve Muckle,
	Vincent Guittot, Morten Rasmussen, dietmar.eggemann,
	Linux Kernel Mailing List

On 03-02-16, 07:52, Viresh Kumar wrote:
> > My overall impression here is that the code changes make sense.  Some
> > details need to be improved (like the concurrent ->store for governor
> > tunables pointed out by Juri). 

The next version is ready with some improvements, will send it once
you are done with this thread.

-- 
viresh

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 4/5] cpufreq: Don't drop rwsem before calling CPUFREQ_GOV_POLICY_EXIT
  2016-02-03  5:51     ` Viresh Kumar
@ 2016-02-03 12:24       ` Rafael J. Wysocki
  2016-02-03 13:09         ` Viresh Kumar
  0 siblings, 1 reply; 45+ messages in thread
From: Rafael J. Wysocki @ 2016-02-03 12:24 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael J. Wysocki, Rafael Wysocki, Juri Lelli,
	Lists linaro-kernel, linux-pm, Saravana Kannan, Peter Zijlstra,
	Michael Turquette, Steve Muckle, Vincent Guittot,
	Morten Rasmussen, dietmar.eggemann, Linux Kernel Mailing List

On Wed, Feb 3, 2016 at 6:51 AM, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> On 02-02-16, 22:53, Rafael J. Wysocki wrote:
>> First of all, this is effectively reverting commit 955ef4833574, so
>> the subject should be "Revert commit 955ef4833574 (cpufreq: Drop rwsem
>> lock around CPUFREQ_GOV_POLICY_EXIT)".
>>
>> There should be a Fixes: tag pointing to commit 955ef4833574 and a
>> Reported-by: for Juri.
>>
>> If there is a link to a bug report related to this, it should be
>> pointed to by a Link: tag.
>>
>> The changelog should say why the original commit was there and why the
>> way it attempted to solve the problem was incorrect.  It also should
>> say that the original problem was addressed by a previous commit, so
>> this one can be reverted without consequences.
>
> How about this:
>
>     Revert "cpufreq: Drop rwsem lock around CPUFREQ_GOV_POLICY_EXIT"
>
>     Earlier, when the struct freq-attr was used to represent governor
>     attributes, the standard cpufreq show/store sysfs attribute callbacks
>     were applied to the governor tunable attributes and they always acquire
>     the policy->rwsem lock before carrying out the operation.  That could
>     have resulted in an ABBA deadlock if governor tunable attributes are
>     removed under policy->rwsem while one of them is being accessed
>     concurrently (if sysfs attributes removal wins the race, it will wait
>     for the access to complete with policy->rwsem held while the attribute
>     callback will block on policy->rwsem indefinitely).
>
>     We attempted to address this issue by dropping policy->rwsem around
>     governor tunable attributes removal (that is, around invocations of the
>     ->governor callback with the event arg equal to CPUFREQ_GOV_POLICY_EXIT)
>     in cpufreq_set_policy(), but that opened up race conditions that had not
>     been possible with policy->rwsem held all the time.
>
>     The previous commit, "cpufreq: governor: New sysfs show/store callbacks
>     for governor tunables", fixed the original ABBA deadlock by adding new
>     governor specific show/store callbacks.
>
>     We don't have to drop rwsem around invocations of governor event
>     CPUFREQ_GOV_POLICY_EXIT anymore, and original fix can be reverted now.
>
>     Fixes: 955ef4833574 ("cpufreq: Drop rwsem lock around CPUFREQ_GOV_POLICY_EXIT")
>     Reported-by: Juri Lelli <juri.lelli@arm.com>
>     Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

Much better.

>> But I'm not going to write that changelog.  I actually am not going to
>> write any changelogs for you any more, because I'm seriously tired of
>> doing that.  Moreover, if I see a patch from you with a changelog
>> that's not acceptable to me, it will immediately go to the "not
>> applicable" trash bin no matter what the changes below look like.  You
>> *have* *to* write useful changelogs.  This isn't optional or best
>> effort.  This is mandatory and important.
>
> Will try to improve, sorry about that (again).
>
>> Now, I'm not really sure if the ordering of this patchset is right.
>> Maybe we should just revert upfront with the "we'll address the
>> original problem in the following commits" statement in the changelog
>> and fix it in a different way?
>
> Wouldn't that break things like 'git bisect'? People running kernels
> after the reverted commits may start hitting lockdeps.

Well, we have at least one bug in there before applying the whole
series anyway, regardless of the ordering.

>> It looks like patches [1-3/5] fix a
>> problem that isn't there even, but would appear after the [4/5] if
>> they were not applied previously, which doesn't sound really
>> straightforward to me.
>
> I am going to fight hard for it, if you feel 4/5 should be the first
> patch here, I will do that.

I guess this was supposed to be "I am not ...".

With the above changelog the current order of patches in the series is fine.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops
  2016-02-03  6:58     ` Viresh Kumar
@ 2016-02-03 12:42       ` Rafael J. Wysocki
  2016-02-03 13:21         ` Viresh Kumar
  0 siblings, 1 reply; 45+ messages in thread
From: Rafael J. Wysocki @ 2016-02-03 12:42 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael J. Wysocki, Rafael Wysocki, Juri Lelli,
	Lists linaro-kernel, linux-pm, Saravana Kannan, Peter Zijlstra,
	Michael Turquette, Steve Muckle, Vincent Guittot,
	Morten Rasmussen, dietmar.eggemann, Linux Kernel Mailing List

On Wed, Feb 3, 2016 at 7:58 AM, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> On 02-02-16, 22:23, Rafael J. Wysocki wrote:
>> On Tue, Feb 2, 2016 at 11:57 AM, Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
>> "The ondemand and conservative governors use the global-attr or
>> freq-attr structures to represent sysfs attributes corresponding to
>> their tunables
>
>> (which of them is actually used depends on whether or
>> not different policy objects can use different governors at the same
>> time
>
> Not exactly. Different policies can always use different governors.
> What made the difference was that different policies using same
> governor, with different tunables or separate governor directories.
>
> I have reworded this para like:
>
>     The ondemand and conservative governors use the global-attr or freq-attr
>     structures to represent sysfs attributes corresponding to their tunables
>     (which of them is actually used depends on whether or not different
>     policy objects can use same governor with different tunables at the same
>     time and, consequently, on where those attributes are located in sysfs).
>
> Please let me know if isn't clear.

That's OK.  IMO you should say "use the same governor", but that's
easily fixable. :-)

>> > --- a/drivers/cpufreq/cpufreq_governor.c
>> > +       ret = kobject_init_and_add(&dbs_data->kobj, &dbs_data->kobj_type,
>> > +                                  get_governor_parent_kobj(policy),
>> > +                                  attr_group->name);
>> > +       if (ret) {
>> > +               pr_err("%s: failed to init dbs_data kobj: %d\n", __func__, ret);
>>
>> pr_debug() would be better here.
>
> Its a real error, why pr_debug for that ?

What's the value of printing that on user systems?  It contains debug
information only and it is not useful to anyone unfamiliar with the
code in question anyway.

>> >                 goto reset_gdbs_data;
>> > +       }
>> >
>> >         return 0;
>> >
>> > @@ -426,8 +457,7 @@ static int cpufreq_governor_exit(struct cpufreq_policy *policy,
>> >                 return -EBUSY;
>> >
>> >         if (!--dbs_data->usage_count) {
>> > -               sysfs_remove_group(get_governor_parent_kobj(policy),
>> > -                                  get_sysfs_attr(dbs_data));
>> > +               kobject_put(&dbs_data->kobj);
>>
>> Don't we need a ->release callback for this kobject?
>
> There is nothing that we need to free from the ->release() callback.
> We are using the kobject here just to get separate show/store
> callbacks.

Well, I guess the answer should be that there can't be more active
references to the kobject, so it is safe to free it synchronously
later.

> Here is the new version based on the review comments received until
> now:
>
> -------------------------8<-------------------------
>
> From: Viresh Kumar <viresh.kumar@linaro.org>
> Date: Tue, 2 Feb 2016 12:35:01 +0530
> Subject: [PATCH] cpufreq: governor: New sysfs show/store callbacks for
>  governor tunables
>

[cut]

> @@ -22,14 +22,62 @@
>
>  #include "cpufreq_governor.h"
>
> -static struct attribute_group *get_sysfs_attr(struct dbs_data *dbs_data)
> +static inline struct dbs_data *to_dbs_data(struct kobject *kobj)
>  {
> -       if (have_governor_per_policy())
> -               return dbs_data->cdata->attr_group_gov_pol;
> -       else
> -               return dbs_data->cdata->attr_group_gov_sys;
> +       return container_of(kobj, struct dbs_data, kobj);
> +}
> +
> +static inline struct governor_attr *to_gov_attr(struct attribute *attr)
> +{
> +       return container_of(attr, struct governor_attr, attr);
> +}
> +
> +static ssize_t governor_show(struct kobject *kobj, struct attribute *attr,
> +                            char *buf)
> +{
> +       struct dbs_data *dbs_data = to_dbs_data(kobj);
> +       struct governor_attr *gattr = to_gov_attr(attr);
> +       int ret = -EIO;
> +
> +       down_read(&dbs_data->rwsem);
> +
> +       if (gattr->show)
> +               ret = gattr->show(dbs_data, buf);
> +
> +       up_read(&dbs_data->rwsem);

Do we need the lock here too?

show() is only going to read the value, isn't it?  And everything u32
or smaller is read atomically anyway.

Apart from this it looks good to me.

When you're ready, please resend the whole series without patch [5/5]
which is premature IMO.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 4/5] cpufreq: Don't drop rwsem before calling CPUFREQ_GOV_POLICY_EXIT
  2016-02-03 12:24       ` Rafael J. Wysocki
@ 2016-02-03 13:09         ` Viresh Kumar
  0 siblings, 0 replies; 45+ messages in thread
From: Viresh Kumar @ 2016-02-03 13:09 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael Wysocki, Juri Lelli, Lists linaro-kernel, linux-pm,
	Saravana Kannan, Peter Zijlstra, Michael Turquette, Steve Muckle,
	Vincent Guittot, Morten Rasmussen, dietmar.eggemann,
	Linux Kernel Mailing List

On 03-02-16, 13:24, Rafael J. Wysocki wrote:
> I guess this was supposed to be "I am not ...".

Urg..

-- 
viresh

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops
  2016-02-03 12:42       ` Rafael J. Wysocki
@ 2016-02-03 13:21         ` Viresh Kumar
  2016-02-03 13:38           ` Rafael J. Wysocki
  0 siblings, 1 reply; 45+ messages in thread
From: Viresh Kumar @ 2016-02-03 13:21 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael Wysocki, Juri Lelli, Lists linaro-kernel, linux-pm,
	Saravana Kannan, Peter Zijlstra, Michael Turquette, Steve Muckle,
	Vincent Guittot, Morten Rasmussen, dietmar.eggemann,
	Linux Kernel Mailing List

On 03-02-16, 13:42, Rafael J. Wysocki wrote:
> > +static ssize_t governor_show(struct kobject *kobj, struct attribute *attr,
> > +                            char *buf)
> > +{
> > +       struct dbs_data *dbs_data = to_dbs_data(kobj);
> > +       struct governor_attr *gattr = to_gov_attr(attr);
> > +       int ret = -EIO;
> > +
> > +       down_read(&dbs_data->rwsem);
> > +
> > +       if (gattr->show)
> > +               ret = gattr->show(dbs_data, buf);
> > +
> > +       up_read(&dbs_data->rwsem);
> 
> Do we need the lock here too?
> 
> show() is only going to read the value, isn't it?  And everything u32
> or smaller is read atomically anyway.

Okay, will drop it for now.

> Apart from this it looks good to me.
> 
> When you're ready, please resend the whole series without patch [5/5]
> which is premature IMO.

I have changed that patch a bit, and am dropping just the lock now and
not governor_enable thing. That should be sane enough I believe.

Anyway I will be posting 7 patches now, pick only first 4 if you
aren't confident about the rest.

-- 
viresh

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops
  2016-02-03 13:21         ` Viresh Kumar
@ 2016-02-03 13:38           ` Rafael J. Wysocki
  0 siblings, 0 replies; 45+ messages in thread
From: Rafael J. Wysocki @ 2016-02-03 13:38 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael J. Wysocki, Rafael Wysocki, Juri Lelli,
	Lists linaro-kernel, linux-pm, Saravana Kannan, Peter Zijlstra,
	Michael Turquette, Steve Muckle, Vincent Guittot,
	Morten Rasmussen, dietmar.eggemann, Linux Kernel Mailing List

On Wed, Feb 3, 2016 at 2:21 PM, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> On 03-02-16, 13:42, Rafael J. Wysocki wrote:
>> > +static ssize_t governor_show(struct kobject *kobj, struct attribute *attr,
>> > +                            char *buf)
>> > +{
>> > +       struct dbs_data *dbs_data = to_dbs_data(kobj);
>> > +       struct governor_attr *gattr = to_gov_attr(attr);
>> > +       int ret = -EIO;
>> > +
>> > +       down_read(&dbs_data->rwsem);
>> > +
>> > +       if (gattr->show)
>> > +               ret = gattr->show(dbs_data, buf);
>> > +
>> > +       up_read(&dbs_data->rwsem);
>>
>> Do we need the lock here too?
>>
>> show() is only going to read the value, isn't it?  And everything u32
>> or smaller is read atomically anyway.
>
> Okay, will drop it for now.
>
>> Apart from this it looks good to me.
>>
>> When you're ready, please resend the whole series without patch [5/5]
>> which is premature IMO.
>
> I have changed that patch a bit, and am dropping just the lock now and
> not governor_enable thing. That should be sane enough I believe.

In any case this is not suitable for 4.5 IMO.

> Anyway I will be posting 7 patches now, pick only first 4 if you
> aren't confident about the rest.

OK

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops
  2016-02-03  6:57                       ` Viresh Kumar
@ 2016-02-03 20:07                         ` Saravana Kannan
  0 siblings, 0 replies; 45+ messages in thread
From: Saravana Kannan @ 2016-02-03 20:07 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael J. Wysocki, Juri Lelli, Rafael Wysocki,
	Lists linaro-kernel, linux-pm, Peter Zijlstra, Michael Turquette,
	Steve Muckle, Vincent Guittot, Morten Rasmussen,
	dietmar.eggemann, Linux Kernel Mailing List

On 02/02/2016 10:57 PM, Viresh Kumar wrote:
> On 02-02-16, 20:03, Saravana Kannan wrote:
>> This is the hotplug case I mentioned. The sysfs file removals will happen
>> only for the last CPU in that policy (we thankfully optimized that part last
>> year). We also know that multiple CPUs can't be hotplugged at the same time.
>> So, in the start of cpufreq_offline_prepare, we just need to check if this
>> is the last CPU in the policy and if that's the case, do the gov sysfs
>> remove and then grab the policy lock and do all our crap. If a read is going
>> on, that's going to finish before the sysfs attr remove can go ahead and it
>> can grab the policy lock if it needs to and that still won't cause a
>> deadlock because we haven't yet grabbed the policy lock in
>> cpufreq_offline_prepare(). If the read comes after the sysfs remove, then
>> the read is obviously going to fail (we can depend on the sysfs framework on
>> doing its job there).
>
> IMHO, these are all dirty hacks we should stay away from. Adding such
> hunks in code is considered a band-aid kind of solution and hurts
> readability badly. The new solution (new governor show/store)
> implement this in a very clean and proper way I feel..
>
> Others are free to disagree though :)
>

I think it looks clean since we haven't sorted out the race conditions 
that Juri pointed out. So, it's early to call this series clean :)

Also, I don't see it as a dirty hack at all. What's so hacky about it? 
We are just identifying conditions when we'll have to remove the sysfs 
files and removing them before grabbing the policy lock.

The unlock/lock that we have now is what is a dirty hack.

-Saravana

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops
  2016-02-03  6:54                   ` Viresh Kumar
  2016-02-03 10:51                     ` Juri Lelli
@ 2016-02-03 20:14                     ` Saravana Kannan
  1 sibling, 0 replies; 45+ messages in thread
From: Saravana Kannan @ 2016-02-03 20:14 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael J. Wysocki, Juri Lelli, Rafael Wysocki,
	Lists linaro-kernel, linux-pm, Peter Zijlstra, Michael Turquette,
	Steve Muckle, Vincent Guittot, Morten Rasmussen,
	dietmar.eggemann, Linux Kernel Mailing List

On 02/02/2016 10:54 PM, Viresh Kumar wrote:
> On 02-02-16, 17:32, Saravana Kannan wrote:
>> But if we are expecting sched dvfs to come in, why make it worse for it. It
>> would be completely pointless to try and shoehorn sched dvfs to use
>> cpufreq_governor.c
>
> We can move the common part to cpufreq core and not make sched-dvfs
> reuse cpufreq_governor.c
>

Let's do this please.

-Saravana

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 45+ messages in thread

end of thread, other threads:[~2016-02-03 20:14 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-02 10:57 [PATCH 0/5] cpufreq: governors: Solve the ABBA lockups Viresh Kumar
2016-02-02 10:57 ` [PATCH 1/5] cpufreq: governor: Kill declare_show_sampling_rate_min() Viresh Kumar
2016-02-02 20:23   ` Rafael J. Wysocki
2016-02-03  2:29     ` Viresh Kumar
2016-02-02 10:57 ` [PATCH 2/5] cpufreq: governor: Create separate sysfs-ops Viresh Kumar
2016-02-02 15:47   ` Juri Lelli
2016-02-02 16:35     ` Rafael J. Wysocki
2016-02-02 17:01       ` Juri Lelli
2016-02-02 19:40         ` Rafael J. Wysocki
2016-02-02 22:21           ` Saravana Kannan
2016-02-02 23:42             ` Rafael J. Wysocki
2016-02-03  1:07               ` Rafael J. Wysocki
2016-02-03  1:32                 ` Saravana Kannan
2016-02-03  1:52                   ` Rafael J. Wysocki
2016-02-03  4:03                     ` Saravana Kannan
2016-02-03  6:57                       ` Viresh Kumar
2016-02-03 20:07                         ` Saravana Kannan
2016-02-03  6:54                   ` Viresh Kumar
2016-02-03 10:51                     ` Juri Lelli
2016-02-03 10:55                       ` Viresh Kumar
2016-02-03 20:14                     ` Saravana Kannan
2016-02-03  6:51             ` Viresh Kumar
2016-02-03  6:33         ` Viresh Kumar
2016-02-02 21:23   ` Rafael J. Wysocki
2016-02-03  6:58     ` Viresh Kumar
2016-02-03 12:42       ` Rafael J. Wysocki
2016-02-03 13:21         ` Viresh Kumar
2016-02-03 13:38           ` Rafael J. Wysocki
2016-02-02 10:57 ` [PATCH 3/5] cpufreq: governor: Remove unused sysfs attribute macros Viresh Kumar
2016-02-02 21:34   ` Rafael J. Wysocki
2016-02-02 10:57 ` [PATCH 4/5] cpufreq: Don't drop rwsem before calling CPUFREQ_GOV_POLICY_EXIT Viresh Kumar
2016-02-02 21:53   ` Rafael J. Wysocki
2016-02-03  5:51     ` Viresh Kumar
2016-02-03 12:24       ` Rafael J. Wysocki
2016-02-03 13:09         ` Viresh Kumar
2016-02-02 10:57 ` [PATCH 5/5] cpufreq: Get rid of ->governor_enabled and its lock Viresh Kumar
2016-02-02 16:49   ` Juri Lelli
2016-02-03  6:05     ` Viresh Kumar
2016-02-03 11:05       ` Juri Lelli
2016-02-03 11:08         ` Viresh Kumar
2016-02-02 21:57   ` Rafael J. Wysocki
2016-02-02 11:25 ` [PATCH 0/5] cpufreq: governors: Solve the ABBA lockups Juri Lelli
2016-02-02 20:04 ` Rafael J. Wysocki
2016-02-03  2:22   ` Viresh Kumar
2016-02-03 11:37     ` Viresh Kumar

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.