linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/4] CPUFreq: Implement per policy instances of governors
@ 2013-02-04 11:38 Viresh Kumar
  2013-02-04 11:38 ` [PATCH 1/4] cpufreq: Don't check cpu_online(policy->cpu) Viresh Kumar
                   ` (5 more replies)
  0 siblings, 6 replies; 44+ messages in thread
From: Viresh Kumar @ 2013-02-04 11:38 UTC (permalink / raw)
  To: rjw
  Cc: cpufreq, linux-pm, linux-kernel, linaro-dev, robin.randhawa,
	Steve.Bannister, Liviu.Dudau, Viresh Kumar

Currently, there can't be multiple instances of single governor_type. If we have
a multi-package system, where we have multiple instances of struct policy (per
package), we can't have multiple instances of same governor. i.e. We can't have
multiple instances of ondemand governor for multiple packages.

Governors directory in sysfs is created at /sys/devices/system/cpu/cpufreq/
governor-name/. Which again reflects that there can be only one instance of a
governor_type in the system.

This is a bottleneck for multicluster system, where we want different packages
to use same governor type, but with different tunables.

This patchset is inclined towards fixing this issue.

Viresh Kumar (4):
  cpufreq: Don't check cpu_online(policy->cpu)
  cpufreq: stats: Get rid of CPUFREQ_STATDEVICE_ATTR
  cpufreq: Add per policy governor-init/exit infrastructure
  cpufreq: governor: Implement per policy instances of governors

 drivers/cpufreq/cpufreq.c              |  41 ++++---
 drivers/cpufreq/cpufreq_conservative.c | 142 +++++++++++++----------
 drivers/cpufreq/cpufreq_governor.c     | 140 +++++++++++++---------
 drivers/cpufreq/cpufreq_governor.h     |  42 ++++---
 drivers/cpufreq/cpufreq_ondemand.c     | 205 +++++++++++++++++++--------------
 drivers/cpufreq/cpufreq_stats.c        |  18 +--
 drivers/cpufreq/cpufreq_userspace.c    |   2 -
 drivers/cpufreq/freq_table.c           |   6 -
 include/linux/cpufreq.h                |  26 +----
 9 files changed, 346 insertions(+), 276 deletions(-)

-- 
1.7.12.rc2.18.g61b472e


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH 1/4] cpufreq: Don't check cpu_online(policy->cpu)
  2013-02-04 11:38 [PATCH 0/4] CPUFreq: Implement per policy instances of governors Viresh Kumar
@ 2013-02-04 11:38 ` Viresh Kumar
  2013-02-04 11:38 ` [PATCH 2/4] cpufreq: stats: Get rid of CPUFREQ_STATDEVICE_ATTR Viresh Kumar
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 44+ messages in thread
From: Viresh Kumar @ 2013-02-04 11:38 UTC (permalink / raw)
  To: rjw
  Cc: cpufreq, linux-pm, linux-kernel, linaro-dev, robin.randhawa,
	Steve.Bannister, Liviu.Dudau, Viresh Kumar

policy->cpu or cpus in policy->cpus can't be offline anymore. And so we don't
need to check if they are online or not.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/cpufreq.c           | 17 +++--------------
 drivers/cpufreq/cpufreq_governor.c  |  2 +-
 drivers/cpufreq/cpufreq_userspace.c |  2 --
 drivers/cpufreq/freq_table.c        |  6 ------
 4 files changed, 4 insertions(+), 23 deletions(-)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 9656420..e619f4f 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -76,10 +76,6 @@ static int lock_policy_rwsem_##mode					\
 	int policy_cpu = per_cpu(cpufreq_policy_cpu, cpu);		\
 	BUG_ON(policy_cpu == -1);					\
 	down_##mode(&per_cpu(cpu_policy_rwsem, policy_cpu));		\
-	if (unlikely(!cpu_online(cpu))) {				\
-		up_##mode(&per_cpu(cpu_policy_rwsem, policy_cpu));	\
-		return -1;						\
-	}								\
 									\
 	return 0;							\
 }
@@ -719,8 +715,6 @@ static int cpufreq_add_dev_symlink(unsigned int cpu,
 
 		if (j == cpu)
 			continue;
-		if (!cpu_online(j))
-			continue;
 
 		pr_debug("CPU %u already managed, adding link\n", j);
 		managed_policy = cpufreq_cpu_get(cpu);
@@ -777,8 +771,6 @@ static int cpufreq_add_dev_interface(unsigned int cpu,
 
 	spin_lock_irqsave(&cpufreq_driver_lock, flags);
 	for_each_cpu(j, policy->cpus) {
-		if (!cpu_online(j))
-			continue;
 		per_cpu(cpufreq_cpu_data, j) = policy;
 		per_cpu(cpufreq_policy_cpu, j) = policy->cpu;
 	}
@@ -1005,11 +997,8 @@ static void update_policy_cpu(struct cpufreq_policy *policy, unsigned int cpu)
 	policy->last_cpu = policy->cpu;
 	policy->cpu = cpu;
 
-	for_each_cpu(j, policy->cpus) {
-		if (!cpu_online(j))
-			continue;
+	for_each_cpu(j, policy->cpus)
 		per_cpu(cpufreq_policy_cpu, j) = cpu;
-	}
 
 #ifdef CONFIG_CPU_FREQ_TABLE
 	cpufreq_frequency_table_update_policy_cpu(policy);
@@ -1468,7 +1457,7 @@ int __cpufreq_driver_target(struct cpufreq_policy *policy,
 	if (target_freq == policy->cur)
 		return 0;
 
-	if (cpu_online(policy->cpu) && cpufreq_driver->target)
+	if (cpufreq_driver->target)
 		retval = cpufreq_driver->target(policy, target_freq, relation);
 
 	return retval;
@@ -1506,7 +1495,7 @@ int __cpufreq_driver_getavg(struct cpufreq_policy *policy, unsigned int cpu)
 	if (cpufreq_disabled())
 		return ret;
 
-	if (!(cpu_online(cpu) && cpufreq_driver->getavg))
+	if (!cpufreq_driver->getavg)
 		return 0;
 
 	policy = cpufreq_cpu_get(policy->cpu);
diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index 79795c4..e4a306c 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -225,7 +225,7 @@ int cpufreq_governor_dbs(struct dbs_data *dbs_data,
 
 	switch (event) {
 	case CPUFREQ_GOV_START:
-		if ((!cpu_online(cpu)) || (!policy->cur))
+		if (!policy->cur)
 			return -EINVAL;
 
 		mutex_lock(&dbs_data->mutex);
diff --git a/drivers/cpufreq/cpufreq_userspace.c b/drivers/cpufreq/cpufreq_userspace.c
index c8c3d29..bbeb9c0 100644
--- a/drivers/cpufreq/cpufreq_userspace.c
+++ b/drivers/cpufreq/cpufreq_userspace.c
@@ -118,8 +118,6 @@ static int cpufreq_governor_userspace(struct cpufreq_policy *policy,
 
 	switch (event) {
 	case CPUFREQ_GOV_START:
-		if (!cpu_online(cpu))
-			return -EINVAL;
 		BUG_ON(!policy->cur);
 		mutex_lock(&userspace_mutex);
 
diff --git a/drivers/cpufreq/freq_table.c b/drivers/cpufreq/freq_table.c
index aa5bd39..d7a7966 100644
--- a/drivers/cpufreq/freq_table.c
+++ b/drivers/cpufreq/freq_table.c
@@ -63,9 +63,6 @@ int cpufreq_frequency_table_verify(struct cpufreq_policy *policy,
 	pr_debug("request for verification of policy (%u - %u kHz) for cpu %u\n",
 					policy->min, policy->max, policy->cpu);
 
-	if (!cpu_online(policy->cpu))
-		return -EINVAL;
-
 	cpufreq_verify_within_limits(policy, policy->cpuinfo.min_freq,
 				     policy->cpuinfo.max_freq);
 
@@ -121,9 +118,6 @@ int cpufreq_frequency_table_target(struct cpufreq_policy *policy,
 		break;
 	}
 
-	if (!cpu_online(policy->cpu))
-		return -EINVAL;
-
 	for (i = 0; (table[i].frequency != CPUFREQ_TABLE_END); i++) {
 		unsigned int freq = table[i].frequency;
 		if (freq == CPUFREQ_ENTRY_INVALID)
-- 
1.7.12.rc2.18.g61b472e


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 2/4] cpufreq: stats: Get rid of CPUFREQ_STATDEVICE_ATTR
  2013-02-04 11:38 [PATCH 0/4] CPUFreq: Implement per policy instances of governors Viresh Kumar
  2013-02-04 11:38 ` [PATCH 1/4] cpufreq: Don't check cpu_online(policy->cpu) Viresh Kumar
@ 2013-02-04 11:38 ` Viresh Kumar
  2013-02-04 11:38 ` [PATCH 3/4] cpufreq: Add per policy governor-init/exit infrastructure Viresh Kumar
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 44+ messages in thread
From: Viresh Kumar @ 2013-02-04 11:38 UTC (permalink / raw)
  To: rjw
  Cc: cpufreq, linux-pm, linux-kernel, linaro-dev, robin.randhawa,
	Steve.Bannister, Liviu.Dudau, Viresh Kumar

Macro "CPUFREQ_STATDEVICE_ATTR" is defined local to cpufreq_stats.c file and is
almost a copy of the generic version present in cpufreq.h file. Lets use the
generic version instead.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/cpufreq_stats.c | 18 ++++++------------
 1 file changed, 6 insertions(+), 12 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_stats.c b/drivers/cpufreq/cpufreq_stats.c
index 572124c..a2dee4c 100644
--- a/drivers/cpufreq/cpufreq_stats.c
+++ b/drivers/cpufreq/cpufreq_stats.c
@@ -24,12 +24,6 @@
 
 static spinlock_t cpufreq_stats_lock;
 
-#define CPUFREQ_STATDEVICE_ATTR(_name, _mode, _show) \
-static struct freq_attr _attr_##_name = {\
-	.attr = {.name = __stringify(_name), .mode = _mode, }, \
-	.show = _show,\
-};
-
 struct cpufreq_stats {
 	unsigned int cpu;
 	unsigned int total_trans;
@@ -136,17 +130,17 @@ static ssize_t show_trans_table(struct cpufreq_policy *policy, char *buf)
 		return PAGE_SIZE;
 	return len;
 }
-CPUFREQ_STATDEVICE_ATTR(trans_table, 0444, show_trans_table);
+cpufreq_freq_attr_ro(trans_table);
 #endif
 
-CPUFREQ_STATDEVICE_ATTR(total_trans, 0444, show_total_trans);
-CPUFREQ_STATDEVICE_ATTR(time_in_state, 0444, show_time_in_state);
+cpufreq_freq_attr_ro(total_trans);
+cpufreq_freq_attr_ro(time_in_state);
 
 static struct attribute *default_attrs[] = {
-	&_attr_total_trans.attr,
-	&_attr_time_in_state.attr,
+	&total_trans.attr,
+	&time_in_state.attr,
 #ifdef CONFIG_CPU_FREQ_STAT_DETAILS
-	&_attr_trans_table.attr,
+	&trans_table.attr,
 #endif
 	NULL
 };
-- 
1.7.12.rc2.18.g61b472e


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 3/4] cpufreq: Add per policy governor-init/exit infrastructure
  2013-02-04 11:38 [PATCH 0/4] CPUFreq: Implement per policy instances of governors Viresh Kumar
  2013-02-04 11:38 ` [PATCH 1/4] cpufreq: Don't check cpu_online(policy->cpu) Viresh Kumar
  2013-02-04 11:38 ` [PATCH 2/4] cpufreq: stats: Get rid of CPUFREQ_STATDEVICE_ATTR Viresh Kumar
@ 2013-02-04 11:38 ` Viresh Kumar
  2013-02-04 11:38 ` [PATCH 4/4] cpufreq: governor: Implement per policy instances of governors Viresh Kumar
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 44+ messages in thread
From: Viresh Kumar @ 2013-02-04 11:38 UTC (permalink / raw)
  To: rjw
  Cc: cpufreq, linux-pm, linux-kernel, linaro-dev, robin.randhawa,
	Steve.Bannister, Liviu.Dudau, Viresh Kumar

Currently, there can't be multiple instances of single governor_type. If we have
a multi-package system, where we have multiple instances of struct policy (per
package), we can't have multiple instances of same governor. i.e. We can't have
multiple instances of ondemand governor for multiple packages.

Governors directory in sysfs is created at /sys/devices/system/cpu/cpufreq/
governor-name/. Which again reflects that there can be only one instance of a
governor_type in the system.

This is a bottleneck for multicluster system, where we want different packages
to use same governor type, but with different tunables.

This patch is inclined towards providing this infrastructure. Because we are
required to allocate governor's resources dynamically now, we must do it at
policy creation and end. And so got CPUFREQ_GOV_POLICY_INIT/EXIT.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/cpufreq.c | 20 +++++++++++++++++---
 include/linux/cpufreq.h   |  9 ++++++---
 2 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index e619f4f..1ae78d4 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -1076,6 +1076,7 @@ static int __cpufreq_remove_dev(struct device *dev, struct subsys_interface *sif
 
 	/* If cpu is last user of policy, free policy */
 	if (cpus == 1) {
+		__cpufreq_governor(data, CPUFREQ_GOV_POLICY_EXIT);
 		lock_policy_rwsem_write(cpu);
 		kobj = &data->kobj;
 		cmp = &data->kobj_unregister;
@@ -1655,7 +1656,7 @@ EXPORT_SYMBOL(cpufreq_get_policy);
 static int __cpufreq_set_policy(struct cpufreq_policy *data,
 				struct cpufreq_policy *policy)
 {
-	int ret = 0;
+	int ret = 0, failed = 1;
 
 	pr_debug("setting new policy for CPU %u: %u - %u kHz\n", policy->cpu,
 		policy->min, policy->max);
@@ -1709,18 +1710,31 @@ static int __cpufreq_set_policy(struct cpufreq_policy *data,
 			pr_debug("governor switch\n");
 
 			/* end old governor */
-			if (data->governor)
+			if (data->governor) {
 				__cpufreq_governor(data, CPUFREQ_GOV_STOP);
+				__cpufreq_governor(data,
+						CPUFREQ_GOV_POLICY_EXIT);
+			}
 
 			/* start new governor */
 			data->governor = policy->governor;
-			if (__cpufreq_governor(data, CPUFREQ_GOV_START)) {
+			if (!__cpufreq_governor(data, CPUFREQ_GOV_POLICY_INIT)) {
+				if (!__cpufreq_governor(data, CPUFREQ_GOV_START))
+					failed = 0;
+				else
+					__cpufreq_governor(data,
+							CPUFREQ_GOV_POLICY_EXIT);
+			}
+
+			if (failed) {
 				/* new governor failed, so re-start old one */
 				pr_debug("starting governor %s failed\n",
 							data->governor->name);
 				if (old_gov) {
 					data->governor = old_gov;
 					__cpufreq_governor(data,
+							CPUFREQ_GOV_POLICY_INIT);
+					__cpufreq_governor(data,
 							   CPUFREQ_GOV_START);
 				}
 				ret = -EINVAL;
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index a22944c..3b822ce 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -106,6 +106,7 @@ struct cpufreq_policy {
 					 * governors are used */
 	unsigned int		policy; /* see above */
 	struct cpufreq_governor	*governor; /* see below */
+	void			*governor_data;
 
 	struct work_struct	update; /* if update_policy() needs to be
 					 * called, but you're in IRQ context */
@@ -178,9 +179,11 @@ static inline unsigned long cpufreq_scale(unsigned long old, u_int div, u_int mu
  *                          CPUFREQ GOVERNORS                        *
  *********************************************************************/
 
-#define CPUFREQ_GOV_START  1
-#define CPUFREQ_GOV_STOP   2
-#define CPUFREQ_GOV_LIMITS 3
+#define CPUFREQ_GOV_START	1
+#define CPUFREQ_GOV_STOP	2
+#define CPUFREQ_GOV_LIMITS	3
+#define CPUFREQ_GOV_POLICY_INIT	4
+#define CPUFREQ_GOV_POLICY_EXIT	4
 
 struct cpufreq_governor {
 	char	name[CPUFREQ_NAME_LEN];
-- 
1.7.12.rc2.18.g61b472e


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 4/4] cpufreq: governor: Implement per policy instances of governors
  2013-02-04 11:38 [PATCH 0/4] CPUFreq: Implement per policy instances of governors Viresh Kumar
                   ` (2 preceding siblings ...)
  2013-02-04 11:38 ` [PATCH 3/4] cpufreq: Add per policy governor-init/exit infrastructure Viresh Kumar
@ 2013-02-04 11:38 ` Viresh Kumar
  2013-02-10 21:14   ` Francesco Lavra
  2013-02-04 12:17 ` [PATCH 0/4] CPUFreq: " Rafael J. Wysocki
  2013-02-05 16:21 ` Viresh Kumar
  5 siblings, 1 reply; 44+ messages in thread
From: Viresh Kumar @ 2013-02-04 11:38 UTC (permalink / raw)
  To: rjw
  Cc: cpufreq, linux-pm, linux-kernel, linaro-dev, robin.randhawa,
	Steve.Bannister, Liviu.Dudau, Viresh Kumar

Currently, there can't be multiple instances of single governor_type. If we have
a multi-package system, where we have multiple instances of struct policy (per
package), we can't have multiple instances of same governor. i.e. We can't have
multiple instances of ondemand governor for multiple packages.

Governors directory in sysfs is created at /sys/devices/system/cpu/cpufreq/
governor-name/. Which again reflects that there can be only one instance of a
governor_type in the system.

This is a bottleneck for multicluster system, where we want different packages
to use same governor type, but with different tunables.

This patch uses the infrastructure provided by earlier patch and implements
init/exit routines for ondemand and conservative governors.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/cpufreq.c              |   4 -
 drivers/cpufreq/cpufreq_conservative.c | 142 +++++++++++++----------
 drivers/cpufreq/cpufreq_governor.c     | 138 +++++++++++++---------
 drivers/cpufreq/cpufreq_governor.h     |  42 ++++---
 drivers/cpufreq/cpufreq_ondemand.c     | 205 +++++++++++++++++++--------------
 include/linux/cpufreq.h                |  19 +--
 6 files changed, 314 insertions(+), 236 deletions(-)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 1ae78d4..7aacfbf 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -1551,9 +1551,6 @@ static int __cpufreq_governor(struct cpufreq_policy *policy,
 						policy->cpu, event);
 	ret = policy->governor->governor(policy, event);
 
-	if (!policy->governor->initialized && (event == CPUFREQ_GOV_START))
-		policy->governor->initialized = 1;
-
 	/* we keep one module reference alive for
 			each CPU governed by this CPU */
 	if ((event != CPUFREQ_GOV_START) || ret)
@@ -1577,7 +1574,6 @@ int cpufreq_register_governor(struct cpufreq_governor *governor)
 
 	mutex_lock(&cpufreq_governor_mutex);
 
-	governor->initialized = 0;
 	err = -EBUSY;
 	if (__find_governor(governor->name) == NULL) {
 		err = 0;
diff --git a/drivers/cpufreq/cpufreq_conservative.c b/drivers/cpufreq/cpufreq_conservative.c
index e8bb915..6b13f9f 100644
--- a/drivers/cpufreq/cpufreq_conservative.c
+++ b/drivers/cpufreq/cpufreq_conservative.c
@@ -20,6 +20,7 @@
 #include <linux/mutex.h>
 #include <linux/notifier.h>
 #include <linux/percpu-defs.h>
+#include <linux/slab.h>
 #include <linux/sysfs.h>
 #include <linux/types.h>
 
@@ -31,17 +32,8 @@
 #define DEF_SAMPLING_DOWN_FACTOR		(1)
 #define MAX_SAMPLING_DOWN_FACTOR		(10)
 
-static struct dbs_data cs_dbs_data;
 static DEFINE_PER_CPU(struct cs_cpu_dbs_info_s, cs_cpu_dbs_info);
 
-static struct cs_dbs_tuners cs_tuners = {
-	.up_threshold = DEF_FREQUENCY_UP_THRESHOLD,
-	.down_threshold = DEF_FREQUENCY_DOWN_THRESHOLD,
-	.sampling_down_factor = DEF_SAMPLING_DOWN_FACTOR,
-	.ignore_nice = 0,
-	.freq_step = 5,
-};
-
 /*
  * Every sampling_rate, we check, if current idle time is less than 20%
  * (default), then we try to increase frequency Every sampling_rate *
@@ -55,24 +47,26 @@ static void cs_check_cpu(int cpu, unsigned int load)
 {
 	struct cs_cpu_dbs_info_s *dbs_info = &per_cpu(cs_cpu_dbs_info, cpu);
 	struct cpufreq_policy *policy = dbs_info->cdbs.cur_policy;
+	struct dbs_data *dbs_data = policy->governor_data;
+	struct cs_dbs_tuners *cs_tuners = dbs_data->tuners;
 	unsigned int freq_target;
 
 	/*
 	 * break out if we 'cannot' reduce the speed as the user might
 	 * want freq_step to be zero
 	 */
-	if (cs_tuners.freq_step == 0)
+	if (cs_tuners->freq_step == 0)
 		return;
 
 	/* Check for frequency increase */
-	if (load > cs_tuners.up_threshold) {
+	if (load > cs_tuners->up_threshold) {
 		dbs_info->down_skip = 0;
 
 		/* if we are already at full speed then break out early */
 		if (dbs_info->requested_freq == policy->max)
 			return;
 
-		freq_target = (cs_tuners.freq_step * policy->max) / 100;
+		freq_target = (cs_tuners->freq_step * policy->max) / 100;
 
 		/* max freq cannot be less than 100. But who knows.... */
 		if (unlikely(freq_target == 0))
@@ -92,8 +86,8 @@ static void cs_check_cpu(int cpu, unsigned int load)
 	 * support the current CPU usage without triggering the up policy. To be
 	 * safe, we focus 10 points under the threshold.
 	 */
-	if (load < (cs_tuners.down_threshold - 10)) {
-		freq_target = (cs_tuners.freq_step * policy->max) / 100;
+	if (load < (cs_tuners->down_threshold - 10)) {
+		freq_target = (cs_tuners->freq_step * policy->max) / 100;
 
 		dbs_info->requested_freq -= freq_target;
 		if (dbs_info->requested_freq < policy->min)
@@ -119,11 +113,13 @@ static void cs_dbs_timer(struct work_struct *work)
 	unsigned int cpu = dbs_info->cdbs.cur_policy->cpu;
 	struct cs_cpu_dbs_info_s *core_dbs_info = &per_cpu(cs_cpu_dbs_info,
 			cpu);
-	int delay = delay_for_sampling_rate(cs_tuners.sampling_rate);
+	struct dbs_data *dbs_data = dbs_info->cdbs.cur_policy->governor_data;
+	struct cs_dbs_tuners *cs_tuners = dbs_data->tuners;
+	int delay = delay_for_sampling_rate(cs_tuners->sampling_rate);
 
 	mutex_lock(&core_dbs_info->cdbs.timer_mutex);
-	if (need_load_eval(&core_dbs_info->cdbs, cs_tuners.sampling_rate))
-		dbs_check_cpu(&cs_dbs_data, cpu);
+	if (need_load_eval(&core_dbs_info->cdbs, cs_tuners->sampling_rate))
+		dbs_check_cpu(dbs_data, cpu);
 
 	schedule_delayed_work_on(smp_processor_id(), dw, delay);
 	mutex_unlock(&core_dbs_info->cdbs.timer_mutex);
@@ -154,16 +150,11 @@ static int dbs_cpufreq_notifier(struct notifier_block *nb, unsigned long val,
 }
 
 /************************** sysfs interface ************************/
-static ssize_t show_sampling_rate_min(struct kobject *kobj,
-				      struct attribute *attr, char *buf)
-{
-	return sprintf(buf, "%u\n", cs_dbs_data.min_sampling_rate);
-}
-
-static ssize_t store_sampling_down_factor(struct kobject *a,
-					  struct attribute *b,
-					  const char *buf, size_t count)
+static ssize_t store_sampling_down_factor(struct cpufreq_policy *policy,
+		const char *buf, size_t count)
 {
+	struct dbs_data *dbs_data = policy->governor_data;
+	struct cs_dbs_tuners *cs_tuners = dbs_data->tuners;
 	unsigned int input;
 	int ret;
 	ret = sscanf(buf, "%u", &input);
@@ -171,13 +162,15 @@ static ssize_t store_sampling_down_factor(struct kobject *a,
 	if (ret != 1 || input > MAX_SAMPLING_DOWN_FACTOR || input < 1)
 		return -EINVAL;
 
-	cs_tuners.sampling_down_factor = input;
+	cs_tuners->sampling_down_factor = input;
 	return count;
 }
 
-static ssize_t store_sampling_rate(struct kobject *a, struct attribute *b,
-				   const char *buf, size_t count)
+static ssize_t store_sampling_rate(struct cpufreq_policy *policy,
+		const char *buf, size_t count)
 {
+	struct dbs_data *dbs_data = policy->governor_data;
+	struct cs_dbs_tuners *cs_tuners = dbs_data->tuners;
 	unsigned int input;
 	int ret;
 	ret = sscanf(buf, "%u", &input);
@@ -185,43 +178,49 @@ static ssize_t store_sampling_rate(struct kobject *a, struct attribute *b,
 	if (ret != 1)
 		return -EINVAL;
 
-	cs_tuners.sampling_rate = max(input, cs_dbs_data.min_sampling_rate);
+	cs_tuners->sampling_rate = max(input, dbs_data->min_sampling_rate);
 	return count;
 }
 
-static ssize_t store_up_threshold(struct kobject *a, struct attribute *b,
-				  const char *buf, size_t count)
+static ssize_t store_up_threshold(struct cpufreq_policy *policy,
+		const char *buf, size_t count)
 {
+	struct dbs_data *dbs_data = policy->governor_data;
+	struct cs_dbs_tuners *cs_tuners = dbs_data->tuners;
 	unsigned int input;
 	int ret;
 	ret = sscanf(buf, "%u", &input);
 
-	if (ret != 1 || input > 100 || input <= cs_tuners.down_threshold)
+	if (ret != 1 || input > 100 || input <= cs_tuners->down_threshold)
 		return -EINVAL;
 
-	cs_tuners.up_threshold = input;
+	cs_tuners->up_threshold = input;
 	return count;
 }
 
-static ssize_t store_down_threshold(struct kobject *a, struct attribute *b,
-				    const char *buf, size_t count)
+static ssize_t store_down_threshold(struct cpufreq_policy *policy,
+		const char *buf, size_t count)
 {
+	struct dbs_data *dbs_data = policy->governor_data;
+	struct cs_dbs_tuners *cs_tuners = dbs_data->tuners;
 	unsigned int input;
 	int ret;
 	ret = sscanf(buf, "%u", &input);
 
 	/* cannot be lower than 11 otherwise freq will not fall */
 	if (ret != 1 || input < 11 || input > 100 ||
-			input >= cs_tuners.up_threshold)
+			input >= cs_tuners->up_threshold)
 		return -EINVAL;
 
-	cs_tuners.down_threshold = input;
+	cs_tuners->down_threshold = input;
 	return count;
 }
 
-static ssize_t store_ignore_nice_load(struct kobject *a, struct attribute *b,
-				      const char *buf, size_t count)
+static ssize_t store_ignore_nice(struct cpufreq_policy *policy,
+		const char *buf, size_t count)
 {
+	struct dbs_data *dbs_data = policy->governor_data;
+	struct cs_dbs_tuners *cs_tuners = dbs_data->tuners;
 	unsigned int input, j;
 	int ret;
 
@@ -232,10 +231,10 @@ static ssize_t store_ignore_nice_load(struct kobject *a, struct attribute *b,
 	if (input > 1)
 		input = 1;
 
-	if (input == cs_tuners.ignore_nice) /* nothing to do */
+	if (input == cs_tuners->ignore_nice) /* nothing to do */
 		return count;
 
-	cs_tuners.ignore_nice = input;
+	cs_tuners->ignore_nice = input;
 
 	/* we need to re-evaluate prev_cpu_idle */
 	for_each_online_cpu(j) {
@@ -243,16 +242,18 @@ static ssize_t store_ignore_nice_load(struct kobject *a, struct attribute *b,
 		dbs_info = &per_cpu(cs_cpu_dbs_info, j);
 		dbs_info->cdbs.prev_cpu_idle = get_cpu_idle_time(j,
 						&dbs_info->cdbs.prev_cpu_wall);
-		if (cs_tuners.ignore_nice)
+		if (cs_tuners->ignore_nice)
 			dbs_info->cdbs.prev_cpu_nice =
 				kcpustat_cpu(j).cpustat[CPUTIME_NICE];
 	}
 	return count;
 }
 
-static ssize_t store_freq_step(struct kobject *a, struct attribute *b,
-			       const char *buf, size_t count)
+static ssize_t store_freq_step(struct cpufreq_policy *policy,
+		const char *buf, size_t count)
 {
+	struct dbs_data *dbs_data = policy->governor_data;
+	struct cs_dbs_tuners *cs_tuners = dbs_data->tuners;
 	unsigned int input;
 	int ret;
 	ret = sscanf(buf, "%u", &input);
@@ -267,7 +268,7 @@ static ssize_t store_freq_step(struct kobject *a, struct attribute *b,
 	 * no need to test here if freq_step is zero as the user might actually
 	 * want this, they would be crazy though :)
 	 */
-	cs_tuners.freq_step = input;
+	cs_tuners->freq_step = input;
 	return count;
 }
 
@@ -275,16 +276,17 @@ show_one(cs, sampling_rate, sampling_rate);
 show_one(cs, sampling_down_factor, sampling_down_factor);
 show_one(cs, up_threshold, up_threshold);
 show_one(cs, down_threshold, down_threshold);
-show_one(cs, ignore_nice_load, ignore_nice);
+show_one(cs, ignore_nice, ignore_nice);
 show_one(cs, freq_step, freq_step);
+declare_show_sampling_rate_min();
 
-define_one_global_rw(sampling_rate);
-define_one_global_rw(sampling_down_factor);
-define_one_global_rw(up_threshold);
-define_one_global_rw(down_threshold);
-define_one_global_rw(ignore_nice_load);
-define_one_global_rw(freq_step);
-define_one_global_ro(sampling_rate_min);
+cpufreq_freq_attr_rw(sampling_rate);
+cpufreq_freq_attr_rw(sampling_down_factor);
+cpufreq_freq_attr_rw(up_threshold);
+cpufreq_freq_attr_rw(down_threshold);
+cpufreq_freq_attr_rw(ignore_nice);
+cpufreq_freq_attr_rw(freq_step);
+cpufreq_freq_attr_ro(sampling_rate_min);
 
 static struct attribute *dbs_attributes[] = {
 	&sampling_rate_min.attr,
@@ -292,7 +294,7 @@ static struct attribute *dbs_attributes[] = {
 	&sampling_down_factor.attr,
 	&up_threshold.attr,
 	&down_threshold.attr,
-	&ignore_nice_load.attr,
+	&ignore_nice.attr,
 	&freq_step.attr,
 	NULL
 };
@@ -304,6 +306,29 @@ static struct attribute_group cs_attr_group = {
 
 /************************** sysfs end ************************/
 
+static int cs_init(struct dbs_data *dbs_data)
+{
+	struct cs_dbs_tuners *tuners;
+
+	tuners = kzalloc(sizeof(struct cs_dbs_tuners), GFP_KERNEL);
+	if (!tuners) {
+		pr_err("%s: kzalloc failed\n", __func__);
+		return -ENOMEM;
+	}
+
+	tuners->up_threshold = DEF_FREQUENCY_UP_THRESHOLD;
+	tuners->down_threshold = DEF_FREQUENCY_DOWN_THRESHOLD;
+	tuners->sampling_down_factor = DEF_SAMPLING_DOWN_FACTOR;
+	tuners->ignore_nice = 0;
+	tuners->freq_step = 5;
+
+	dbs_data->tuners = tuners;
+	dbs_data->min_sampling_rate = MIN_SAMPLING_RATE_RATIO *
+		jiffies_to_usecs(10);
+	mutex_init(&dbs_data->mutex);
+	return 0;
+}
+
 define_get_cpu_dbs_routines(cs_cpu_dbs_info);
 
 static struct notifier_block cs_cpufreq_notifier_block = {
@@ -314,21 +339,21 @@ static struct cs_ops cs_ops = {
 	.notifier_block = &cs_cpufreq_notifier_block,
 };
 
-static struct dbs_data cs_dbs_data = {
+static struct common_dbs_data cs_dbs_cdata = {
 	.governor = GOV_CONSERVATIVE,
 	.attr_group = &cs_attr_group,
-	.tuners = &cs_tuners,
 	.get_cpu_cdbs = get_cpu_cdbs,
 	.get_cpu_dbs_info_s = get_cpu_dbs_info_s,
 	.gov_dbs_timer = cs_dbs_timer,
 	.gov_check_cpu = cs_check_cpu,
 	.gov_ops = &cs_ops,
+	.init = cs_init,
 };
 
 static int cs_cpufreq_governor_dbs(struct cpufreq_policy *policy,
 				   unsigned int event)
 {
-	return cpufreq_governor_dbs(&cs_dbs_data, policy, event);
+	return cpufreq_governor_dbs(policy, &cs_dbs_cdata, event);
 }
 
 #ifndef CONFIG_CPU_FREQ_DEFAULT_GOV_CONSERVATIVE
@@ -343,7 +368,6 @@ struct cpufreq_governor cpufreq_gov_conservative = {
 
 static int __init cpufreq_gov_dbs_init(void)
 {
-	mutex_init(&cs_dbs_data.mutex);
 	return cpufreq_register_governor(&cpufreq_gov_conservative);
 }
 
diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index e4a306c..545ae24 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -22,6 +22,7 @@
 #include <linux/export.h>
 #include <linux/kernel_stat.h>
 #include <linux/mutex.h>
+#include <linux/slab.h>
 #include <linux/tick.h>
 #include <linux/types.h>
 #include <linux/workqueue.h>
@@ -65,7 +66,7 @@ EXPORT_SYMBOL_GPL(get_cpu_idle_time);
 
 void dbs_check_cpu(struct dbs_data *dbs_data, int cpu)
 {
-	struct cpu_dbs_common_info *cdbs = dbs_data->get_cpu_cdbs(cpu);
+	struct cpu_dbs_common_info *cdbs = dbs_data->cdata->get_cpu_cdbs(cpu);
 	struct od_dbs_tuners *od_tuners = dbs_data->tuners;
 	struct cs_dbs_tuners *cs_tuners = dbs_data->tuners;
 	struct cpufreq_policy *policy;
@@ -73,7 +74,7 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu)
 	unsigned int ignore_nice;
 	unsigned int j;
 
-	if (dbs_data->governor == GOV_ONDEMAND)
+	if (dbs_data->cdata->governor == GOV_ONDEMAND)
 		ignore_nice = od_tuners->ignore_nice;
 	else
 		ignore_nice = cs_tuners->ignore_nice;
@@ -87,7 +88,7 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu)
 		unsigned int idle_time, wall_time, iowait_time;
 		unsigned int load;
 
-		j_cdbs = dbs_data->get_cpu_cdbs(j);
+		j_cdbs = dbs_data->cdata->get_cpu_cdbs(j);
 
 		cur_idle_time = get_cpu_idle_time(j, &cur_wall_time);
 
@@ -117,9 +118,9 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu)
 			idle_time += jiffies_to_usecs(cur_nice_jiffies);
 		}
 
-		if (dbs_data->governor == GOV_ONDEMAND) {
+		if (dbs_data->cdata->governor == GOV_ONDEMAND) {
 			struct od_cpu_dbs_info_s *od_j_dbs_info =
-				dbs_data->get_cpu_dbs_info_s(cpu);
+				dbs_data->cdata->get_cpu_dbs_info_s(cpu);
 
 			cur_iowait_time = get_cpu_iowait_time_us(j,
 					&cur_wall_time);
@@ -145,7 +146,7 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu)
 
 		load = 100 * (wall_time - idle_time) / wall_time;
 
-		if (dbs_data->governor == GOV_ONDEMAND) {
+		if (dbs_data->cdata->governor == GOV_ONDEMAND) {
 			int freq_avg = __cpufreq_driver_getavg(policy, j);
 			if (freq_avg <= 0)
 				freq_avg = policy->cur;
@@ -157,7 +158,7 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu)
 			max_load = load;
 	}
 
-	dbs_data->gov_check_cpu(cpu, max_load);
+	dbs_data->cdata->gov_check_cpu(cpu, max_load);
 }
 EXPORT_SYMBOL_GPL(dbs_check_cpu);
 
@@ -165,14 +166,14 @@ static inline void dbs_timer_init(struct dbs_data *dbs_data, int cpu,
 				  unsigned int sampling_rate)
 {
 	int delay = delay_for_sampling_rate(sampling_rate);
-	struct cpu_dbs_common_info *cdbs = dbs_data->get_cpu_cdbs(cpu);
+	struct cpu_dbs_common_info *cdbs = dbs_data->cdata->get_cpu_cdbs(cpu);
 
 	schedule_delayed_work_on(cpu, &cdbs->work, delay);
 }
 
 static inline void dbs_timer_exit(struct dbs_data *dbs_data, int cpu)
 {
-	struct cpu_dbs_common_info *cdbs = dbs_data->get_cpu_cdbs(cpu);
+	struct cpu_dbs_common_info *cdbs = dbs_data->cdata->get_cpu_cdbs(cpu);
 
 	cancel_delayed_work_sync(&cdbs->work);
 }
@@ -196,31 +197,83 @@ bool need_load_eval(struct cpu_dbs_common_info *cdbs,
 }
 EXPORT_SYMBOL_GPL(need_load_eval);
 
-int cpufreq_governor_dbs(struct dbs_data *dbs_data,
-		struct cpufreq_policy *policy, unsigned int event)
+static void set_sampling_rate(struct dbs_data *dbs_data,
+		unsigned int sampling_rate)
+{
+	if (dbs_data->cdata->governor == GOV_CONSERVATIVE) {
+		struct cs_dbs_tuners *cs_tuners = dbs_data->tuners;
+		cs_tuners->sampling_rate = sampling_rate;
+	} else {
+		struct od_dbs_tuners *od_tuners = dbs_data->tuners;
+		od_tuners->sampling_rate = sampling_rate;
+	}
+}
+
+int cpufreq_governor_dbs(struct cpufreq_policy *policy,
+		struct common_dbs_data *cdata, unsigned int event)
 {
+	struct dbs_data *dbs_data = policy->governor_data;
 	struct od_cpu_dbs_info_s *od_dbs_info = NULL;
 	struct cs_cpu_dbs_info_s *cs_dbs_info = NULL;
 	struct cs_ops *cs_ops = NULL;
 	struct od_ops *od_ops = NULL;
-	struct od_dbs_tuners *od_tuners = dbs_data->tuners;
-	struct cs_dbs_tuners *cs_tuners = dbs_data->tuners;
+	struct od_dbs_tuners *od_tuners = NULL;
+	struct cs_dbs_tuners *cs_tuners = NULL;
 	struct cpu_dbs_common_info *cpu_cdbs;
-	unsigned int *sampling_rate, latency, ignore_nice, j, cpu = policy->cpu;
+	unsigned int sampling_rate, latency, ignore_nice, j, cpu = policy->cpu;
 	int rc;
 
-	cpu_cdbs = dbs_data->get_cpu_cdbs(cpu);
+	WARN_ON(!dbs_data && (event != CPUFREQ_GOV_POLICY_INIT));
 
-	if (dbs_data->governor == GOV_CONSERVATIVE) {
-		cs_dbs_info = dbs_data->get_cpu_dbs_info_s(cpu);
-		sampling_rate = &cs_tuners->sampling_rate;
+	switch (event) {
+	case CPUFREQ_GOV_POLICY_INIT:
+		WARN_ON(dbs_data);
+		dbs_data = kzalloc(sizeof(*dbs_data), GFP_KERNEL);
+		if (!dbs_data) {
+			pr_err("%s: POLICY_INIT: kzalloc failed\n", __func__);
+			return -ENOMEM;
+		}
+
+		dbs_data->cdata = cdata;
+		rc = cdata->init(dbs_data);
+		if (rc) {
+			pr_err("%s: POLICY_INIT: init() failed\n", __func__);
+			kfree(dbs_data);
+			return rc;
+		}
+		policy->governor_data = dbs_data;
+
+		/* policy latency is in nS. Convert it to uS first */
+		latency = policy->cpuinfo.transition_latency / 1000;
+		if (latency == 0)
+			latency = 1;
+
+		/* Bring kernel and HW constraints together */
+		dbs_data->min_sampling_rate = max(dbs_data->min_sampling_rate,
+				MIN_LATENCY_MULTIPLIER * latency);
+		set_sampling_rate(dbs_data, max(dbs_data->min_sampling_rate,
+					latency * LATENCY_MULTIPLIER));
+		return 0;
+	case CPUFREQ_GOV_POLICY_EXIT:
+		kfree(dbs_data);
+		policy->governor_data = NULL;
+		return 0;
+	}
+
+	cpu_cdbs = dbs_data->cdata->get_cpu_cdbs(cpu);
+
+	if (dbs_data->cdata->governor == GOV_CONSERVATIVE) {
+		cs_tuners = dbs_data->tuners;
+		cs_dbs_info = dbs_data->cdata->get_cpu_dbs_info_s(cpu);
+		sampling_rate = cs_tuners->sampling_rate;
 		ignore_nice = cs_tuners->ignore_nice;
-		cs_ops = dbs_data->gov_ops;
+		cs_ops = dbs_data->cdata->gov_ops;
 	} else {
-		od_dbs_info = dbs_data->get_cpu_dbs_info_s(cpu);
-		sampling_rate = &od_tuners->sampling_rate;
+		od_tuners = dbs_data->tuners;
+		od_dbs_info = dbs_data->cdata->get_cpu_dbs_info_s(cpu);
+		sampling_rate = od_tuners->sampling_rate;
 		ignore_nice = od_tuners->ignore_nice;
-		od_ops = dbs_data->gov_ops;
+		od_ops = dbs_data->cdata->gov_ops;
 	}
 
 	switch (event) {
@@ -232,7 +285,7 @@ int cpufreq_governor_dbs(struct dbs_data *dbs_data,
 
 		for_each_cpu(j, policy->cpus) {
 			struct cpu_dbs_common_info *j_cdbs =
-				dbs_data->get_cpu_cdbs(j);
+				dbs_data->cdata->get_cpu_cdbs(j);
 
 			j_cdbs->cpu = j;
 			j_cdbs->cur_policy = policy;
@@ -244,11 +297,11 @@ int cpufreq_governor_dbs(struct dbs_data *dbs_data,
 
 			mutex_init(&j_cdbs->timer_mutex);
 			INIT_DEFERRABLE_WORK(&j_cdbs->work,
-					     dbs_data->gov_dbs_timer);
+					     dbs_data->cdata->gov_dbs_timer);
 		}
 
-		rc = sysfs_create_group(cpufreq_global_kobject,
-				dbs_data->attr_group);
+		rc = sysfs_create_group(&policy->kobj,
+				dbs_data->cdata->attr_group);
 		if (rc) {
 			mutex_unlock(&dbs_data->mutex);
 			return rc;
@@ -258,51 +311,29 @@ int cpufreq_governor_dbs(struct dbs_data *dbs_data,
 		 * conservative does not implement micro like ondemand
 		 * governor, thus we are bound to jiffes/HZ
 		 */
-		if (dbs_data->governor == GOV_CONSERVATIVE) {
+		if (dbs_data->cdata->governor == GOV_CONSERVATIVE) {
 			cs_dbs_info->down_skip = 0;
 			cs_dbs_info->enable = 1;
 			cs_dbs_info->requested_freq = policy->cur;
 			cpufreq_register_notifier(cs_ops->notifier_block,
 					CPUFREQ_TRANSITION_NOTIFIER);
-
-			if (!policy->governor->initialized)
-				dbs_data->min_sampling_rate =
-					MIN_SAMPLING_RATE_RATIO *
-					jiffies_to_usecs(10);
 		} else {
 			od_dbs_info->rate_mult = 1;
 			od_dbs_info->sample_type = OD_NORMAL_SAMPLE;
 			od_ops->powersave_bias_init_cpu(cpu);
-
-			if (!policy->governor->initialized)
-				od_tuners->io_is_busy = od_ops->io_busy();
 		}
 
-		if (policy->governor->initialized)
-			goto unlock;
-
-		/* policy latency is in nS. Convert it to uS first */
-		latency = policy->cpuinfo.transition_latency / 1000;
-		if (latency == 0)
-			latency = 1;
-
-		/* Bring kernel and HW constraints together */
-		dbs_data->min_sampling_rate = max(dbs_data->min_sampling_rate,
-				MIN_LATENCY_MULTIPLIER * latency);
-		*sampling_rate = max(dbs_data->min_sampling_rate, latency *
-				LATENCY_MULTIPLIER);
-unlock:
 		mutex_unlock(&dbs_data->mutex);
 
 		/* Initiate timer time stamp */
 		cpu_cdbs->time_stamp = ktime_get();
 
 		for_each_cpu(j, policy->cpus)
-			dbs_timer_init(dbs_data, j, *sampling_rate);
+			dbs_timer_init(dbs_data, j, sampling_rate);
 		break;
 
 	case CPUFREQ_GOV_STOP:
-		if (dbs_data->governor == GOV_CONSERVATIVE)
+		if (dbs_data->cdata->governor == GOV_CONSERVATIVE)
 			cs_dbs_info->enable = 0;
 
 		for_each_cpu(j, policy->cpus)
@@ -311,9 +342,8 @@ unlock:
 		mutex_lock(&dbs_data->mutex);
 		mutex_destroy(&cpu_cdbs->timer_mutex);
 
-		sysfs_remove_group(cpufreq_global_kobject,
-				dbs_data->attr_group);
-		if (dbs_data->governor == GOV_CONSERVATIVE)
+		sysfs_remove_group(&policy->kobj, dbs_data->cdata->attr_group);
+		if (dbs_data->cdata->governor == GOV_CONSERVATIVE)
 			cpufreq_unregister_notifier(cs_ops->notifier_block,
 					CPUFREQ_TRANSITION_NOTIFIER);
 		mutex_unlock(&dbs_data->mutex);
diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h
index 16314b6..8ea106a 100644
--- a/drivers/cpufreq/cpufreq_governor.h
+++ b/drivers/cpufreq/cpufreq_governor.h
@@ -43,9 +43,11 @@ enum {OD_NORMAL_SAMPLE, OD_SUB_SAMPLE};
 /* Macro creating sysfs show routines */
 #define show_one(_gov, file_name, object)				\
 static ssize_t show_##file_name						\
-(struct kobject *kobj, struct attribute *attr, char *buf)		\
+(struct cpufreq_policy *policy, char *buf)				\
 {									\
-	return sprintf(buf, "%u\n", _gov##_tuners.object);		\
+	struct dbs_data *dbs_data = policy->governor_data;		\
+	struct _gov##_dbs_tuners *tuners = dbs_data->tuners;		\
+	return sprintf(buf, "%u\n", tuners->file_name);			\
 }
 
 #define define_get_cpu_dbs_routines(_dbs_info)				\
@@ -103,7 +105,7 @@ struct cs_cpu_dbs_info_s {
 	unsigned int enable:1;
 };
 
-/* Governers sysfs tunables */
+/* Per policy Governers sysfs tunables */
 struct od_dbs_tuners {
 	unsigned int ignore_nice;
 	unsigned int sampling_rate;
@@ -123,31 +125,37 @@ struct cs_dbs_tuners {
 	unsigned int freq_step;
 };
 
-/* Per Governer data */
-struct dbs_data {
+/* Common Governer data across policies */
+struct dbs_data;
+struct common_dbs_data {
 	/* Common across governors */
 	#define GOV_ONDEMAND		0
 	#define GOV_CONSERVATIVE	1
 	int governor;
-	unsigned int min_sampling_rate;
 	struct attribute_group *attr_group;
-	void *tuners;
-
-	/* dbs_mutex protects dbs_enable in governor start/stop */
-	struct mutex mutex;
 
 	struct cpu_dbs_common_info *(*get_cpu_cdbs)(int cpu);
 	void *(*get_cpu_dbs_info_s)(int cpu);
 	void (*gov_dbs_timer)(struct work_struct *work);
 	void (*gov_check_cpu)(int cpu, unsigned int load);
+	int (*init)(struct dbs_data *dbs_data);
 
 	/* Governor specific ops, see below */
 	void *gov_ops;
 };
 
+/* Governer Per policy data */
+struct dbs_data {
+	struct common_dbs_data *cdata;
+	unsigned int min_sampling_rate;
+	void *tuners;
+
+	/* dbs_mutex protects dbs_enable in governor start/stop */
+	struct mutex mutex;
+};
+
 /* Governor specific ops, will be passed to dbs_data->gov_ops */
 struct od_ops {
-	int (*io_busy)(void);
 	void (*powersave_bias_init_cpu)(int cpu);
 	unsigned int (*powersave_bias_target)(struct cpufreq_policy *policy,
 			unsigned int freq_next, unsigned int relation);
@@ -169,10 +177,18 @@ static inline int delay_for_sampling_rate(unsigned int sampling_rate)
 	return delay;
 }
 
+#define declare_show_sampling_rate_min()				\
+static ssize_t show_sampling_rate_min(struct cpufreq_policy *policy,	\
+		char *buf)						\
+{									\
+	struct dbs_data *dbs_data = policy->governor_data;		\
+	return sprintf(buf, "%u\n", dbs_data->min_sampling_rate);	\
+}
+
 u64 get_cpu_idle_time(unsigned int cpu, u64 *wall);
 void dbs_check_cpu(struct dbs_data *dbs_data, int cpu);
 bool need_load_eval(struct cpu_dbs_common_info *cdbs,
 		unsigned int sampling_rate);
-int cpufreq_governor_dbs(struct dbs_data *dbs_data,
-		struct cpufreq_policy *policy, unsigned int event);
+int cpufreq_governor_dbs(struct cpufreq_policy *policy,
+		struct common_dbs_data *cdata, unsigned int event);
 #endif /* _CPUFREQ_GOVERNER_H */
diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
index f38b8da..7e15bbe 100644
--- a/drivers/cpufreq/cpufreq_ondemand.c
+++ b/drivers/cpufreq/cpufreq_ondemand.c
@@ -20,6 +20,7 @@
 #include <linux/module.h>
 #include <linux/mutex.h>
 #include <linux/percpu-defs.h>
+#include <linux/slab.h>
 #include <linux/sysfs.h>
 #include <linux/tick.h>
 #include <linux/types.h>
@@ -37,21 +38,12 @@
 #define MIN_FREQUENCY_UP_THRESHOLD		(11)
 #define MAX_FREQUENCY_UP_THRESHOLD		(100)
 
-static struct dbs_data od_dbs_data;
 static DEFINE_PER_CPU(struct od_cpu_dbs_info_s, od_cpu_dbs_info);
 
 #ifndef CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND
 static struct cpufreq_governor cpufreq_gov_ondemand;
 #endif
 
-static struct od_dbs_tuners od_tuners = {
-	.up_threshold = DEF_FREQUENCY_UP_THRESHOLD,
-	.sampling_down_factor = DEF_SAMPLING_DOWN_FACTOR,
-	.down_differential = DEF_FREQUENCY_DOWN_DIFFERENTIAL,
-	.ignore_nice = 0,
-	.powersave_bias = 0,
-};
-
 static void ondemand_powersave_bias_init_cpu(int cpu)
 {
 	struct od_cpu_dbs_info_s *dbs_info = &per_cpu(od_cpu_dbs_info, cpu);
@@ -97,6 +89,8 @@ static unsigned int powersave_bias_target(struct cpufreq_policy *policy,
 	unsigned int jiffies_total, jiffies_hi, jiffies_lo;
 	struct od_cpu_dbs_info_s *dbs_info = &per_cpu(od_cpu_dbs_info,
 						   policy->cpu);
+	struct dbs_data *dbs_data = policy->governor_data;
+	struct od_dbs_tuners *od_tuners = dbs_data->tuners;
 
 	if (!dbs_info->freq_table) {
 		dbs_info->freq_lo = 0;
@@ -107,7 +101,7 @@ static unsigned int powersave_bias_target(struct cpufreq_policy *policy,
 	cpufreq_frequency_table_target(policy, dbs_info->freq_table, freq_next,
 			relation, &index);
 	freq_req = dbs_info->freq_table[index].frequency;
-	freq_reduc = freq_req * od_tuners.powersave_bias / 1000;
+	freq_reduc = freq_req * od_tuners->powersave_bias / 1000;
 	freq_avg = freq_req - freq_reduc;
 
 	/* Find freq bounds for freq_avg in freq_table */
@@ -126,7 +120,7 @@ static unsigned int powersave_bias_target(struct cpufreq_policy *policy,
 		dbs_info->freq_lo_jiffies = 0;
 		return freq_lo;
 	}
-	jiffies_total = usecs_to_jiffies(od_tuners.sampling_rate);
+	jiffies_total = usecs_to_jiffies(od_tuners->sampling_rate);
 	jiffies_hi = (freq_avg - freq_lo) * jiffies_total;
 	jiffies_hi += ((freq_hi - freq_lo) / 2);
 	jiffies_hi /= (freq_hi - freq_lo);
@@ -147,12 +141,15 @@ static void ondemand_powersave_bias_init(void)
 
 static void dbs_freq_increase(struct cpufreq_policy *p, unsigned int freq)
 {
-	if (od_tuners.powersave_bias)
+	struct dbs_data *dbs_data = p->governor_data;
+	struct od_dbs_tuners *od_tuners = dbs_data->tuners;
+
+	if (od_tuners->powersave_bias)
 		freq = powersave_bias_target(p, freq, CPUFREQ_RELATION_H);
 	else if (p->cur == p->max)
 		return;
 
-	__cpufreq_driver_target(p, freq, od_tuners.powersave_bias ?
+	__cpufreq_driver_target(p, freq, od_tuners->powersave_bias ?
 			CPUFREQ_RELATION_L : CPUFREQ_RELATION_H);
 }
 
@@ -169,15 +166,17 @@ static void od_check_cpu(int cpu, unsigned int load_freq)
 {
 	struct od_cpu_dbs_info_s *dbs_info = &per_cpu(od_cpu_dbs_info, cpu);
 	struct cpufreq_policy *policy = dbs_info->cdbs.cur_policy;
+	struct dbs_data *dbs_data = policy->governor_data;
+	struct od_dbs_tuners *od_tuners = dbs_data->tuners;
 
 	dbs_info->freq_lo = 0;
 
 	/* Check for frequency increase */
-	if (load_freq > od_tuners.up_threshold * policy->cur) {
+	if (load_freq > od_tuners->up_threshold * policy->cur) {
 		/* If switching to max speed, apply sampling_down_factor */
 		if (policy->cur < policy->max)
 			dbs_info->rate_mult =
-				od_tuners.sampling_down_factor;
+				od_tuners->sampling_down_factor;
 		dbs_freq_increase(policy, policy->max);
 		return;
 	}
@@ -192,11 +191,11 @@ static void od_check_cpu(int cpu, unsigned int load_freq)
 	 * support the current CPU usage without triggering the up policy. To be
 	 * safe, we focus 10 points under the threshold.
 	 */
-	if (load_freq < (od_tuners.up_threshold - od_tuners.down_differential) *
-			policy->cur) {
+	if (load_freq < (od_tuners->up_threshold - od_tuners->down_differential)
+			* policy->cur) {
 		unsigned int freq_next;
-		freq_next = load_freq / (od_tuners.up_threshold -
-				od_tuners.down_differential);
+		freq_next = load_freq / (od_tuners->up_threshold -
+				od_tuners->down_differential);
 
 		/* No longer fully busy, reset rate_mult */
 		dbs_info->rate_mult = 1;
@@ -204,7 +203,7 @@ static void od_check_cpu(int cpu, unsigned int load_freq)
 		if (freq_next < policy->min)
 			freq_next = policy->min;
 
-		if (!od_tuners.powersave_bias) {
+		if (!od_tuners->powersave_bias) {
 			__cpufreq_driver_target(policy, freq_next,
 					CPUFREQ_RELATION_L);
 		} else {
@@ -224,12 +223,14 @@ static void od_dbs_timer(struct work_struct *work)
 	unsigned int cpu = dbs_info->cdbs.cur_policy->cpu;
 	struct od_cpu_dbs_info_s *core_dbs_info = &per_cpu(od_cpu_dbs_info,
 			cpu);
+	struct dbs_data *dbs_data = dbs_info->cdbs.cur_policy->governor_data;
+	struct od_dbs_tuners *od_tuners = dbs_data->tuners;
 	int delay, sample_type = core_dbs_info->sample_type;
 	bool eval_load;
 
 	mutex_lock(&core_dbs_info->cdbs.timer_mutex);
 	eval_load = need_load_eval(&core_dbs_info->cdbs,
-			od_tuners.sampling_rate);
+			od_tuners->sampling_rate);
 
 	/* Common NORMAL_SAMPLE setup */
 	core_dbs_info->sample_type = OD_NORMAL_SAMPLE;
@@ -241,13 +242,13 @@ static void od_dbs_timer(struct work_struct *work)
 						CPUFREQ_RELATION_H);
 	} else {
 		if (eval_load)
-			dbs_check_cpu(&od_dbs_data, cpu);
+			dbs_check_cpu(dbs_data, cpu);
 		if (core_dbs_info->freq_lo) {
 			/* Setup timer for SUB_SAMPLE */
 			core_dbs_info->sample_type = OD_SUB_SAMPLE;
 			delay = core_dbs_info->freq_hi_jiffies;
 		} else {
-			delay = delay_for_sampling_rate(od_tuners.sampling_rate
+			delay = delay_for_sampling_rate(od_tuners->sampling_rate
 						* core_dbs_info->rate_mult);
 		}
 	}
@@ -257,13 +258,6 @@ static void od_dbs_timer(struct work_struct *work)
 }
 
 /************************** sysfs interface ************************/
-
-static ssize_t show_sampling_rate_min(struct kobject *kobj,
-				      struct attribute *attr, char *buf)
-{
-	return sprintf(buf, "%u\n", od_dbs_data.min_sampling_rate);
-}
-
 /**
  * update_sampling_rate - update sampling rate effective immediately if needed.
  * @new_rate: new sampling rate
@@ -277,12 +271,14 @@ static ssize_t show_sampling_rate_min(struct kobject *kobj,
  * reducing the sampling rate, we need to make the new value effective
  * immediately.
  */
-static void update_sampling_rate(unsigned int new_rate)
+static void update_sampling_rate(struct dbs_data *dbs_data,
+		unsigned int new_rate)
 {
+	struct od_dbs_tuners *od_tuners = dbs_data->tuners;
 	int cpu;
 
-	od_tuners.sampling_rate = new_rate = max(new_rate,
-			od_dbs_data.min_sampling_rate);
+	od_tuners->sampling_rate = new_rate = max(new_rate,
+			dbs_data->min_sampling_rate);
 
 	for_each_online_cpu(cpu) {
 		struct cpufreq_policy *policy;
@@ -323,34 +319,39 @@ static void update_sampling_rate(unsigned int new_rate)
 	}
 }
 
-static ssize_t store_sampling_rate(struct kobject *a, struct attribute *b,
-				   const char *buf, size_t count)
+static ssize_t store_sampling_rate(struct cpufreq_policy *policy,
+		const char *buf, size_t count)
 {
+	struct dbs_data *dbs_data = policy->governor_data;
 	unsigned int input;
 	int ret;
 	ret = sscanf(buf, "%u", &input);
 	if (ret != 1)
 		return -EINVAL;
-	update_sampling_rate(input);
+	update_sampling_rate(dbs_data, input);
 	return count;
 }
 
-static ssize_t store_io_is_busy(struct kobject *a, struct attribute *b,
-				   const char *buf, size_t count)
+static ssize_t store_io_is_busy(struct cpufreq_policy *policy, const char *buf,
+		size_t count)
 {
+	struct dbs_data *dbs_data = policy->governor_data;
+	struct od_dbs_tuners *od_tuners = dbs_data->tuners;
 	unsigned int input;
 	int ret;
 
 	ret = sscanf(buf, "%u", &input);
 	if (ret != 1)
 		return -EINVAL;
-	od_tuners.io_is_busy = !!input;
+	od_tuners->io_is_busy = !!input;
 	return count;
 }
 
-static ssize_t store_up_threshold(struct kobject *a, struct attribute *b,
-				  const char *buf, size_t count)
+static ssize_t store_up_threshold(struct cpufreq_policy *policy,
+		const char *buf, size_t count)
 {
+	struct dbs_data *dbs_data = policy->governor_data;
+	struct od_dbs_tuners *od_tuners = dbs_data->tuners;
 	unsigned int input;
 	int ret;
 	ret = sscanf(buf, "%u", &input);
@@ -359,20 +360,22 @@ static ssize_t store_up_threshold(struct kobject *a, struct attribute *b,
 			input < MIN_FREQUENCY_UP_THRESHOLD) {
 		return -EINVAL;
 	}
-	od_tuners.up_threshold = input;
+	od_tuners->up_threshold = input;
 	return count;
 }
 
-static ssize_t store_sampling_down_factor(struct kobject *a,
-			struct attribute *b, const char *buf, size_t count)
+static ssize_t store_sampling_down_factor(struct cpufreq_policy *policy,
+		const char *buf, size_t count)
 {
+	struct dbs_data *dbs_data = policy->governor_data;
+	struct od_dbs_tuners *od_tuners = dbs_data->tuners;
 	unsigned int input, j;
 	int ret;
 	ret = sscanf(buf, "%u", &input);
 
 	if (ret != 1 || input > MAX_SAMPLING_DOWN_FACTOR || input < 1)
 		return -EINVAL;
-	od_tuners.sampling_down_factor = input;
+	od_tuners->sampling_down_factor = input;
 
 	/* Reset down sampling multiplier in case it was active */
 	for_each_online_cpu(j) {
@@ -383,9 +386,11 @@ static ssize_t store_sampling_down_factor(struct kobject *a,
 	return count;
 }
 
-static ssize_t store_ignore_nice_load(struct kobject *a, struct attribute *b,
-				      const char *buf, size_t count)
+static ssize_t store_ignore_nice(struct cpufreq_policy *policy, const char *buf,
+		size_t count)
 {
+	struct dbs_data *dbs_data = policy->governor_data;
+	struct od_dbs_tuners *od_tuners = dbs_data->tuners;
 	unsigned int input;
 	int ret;
 
@@ -398,10 +403,10 @@ static ssize_t store_ignore_nice_load(struct kobject *a, struct attribute *b,
 	if (input > 1)
 		input = 1;
 
-	if (input == od_tuners.ignore_nice) { /* nothing to do */
+	if (input == od_tuners->ignore_nice) { /* nothing to do */
 		return count;
 	}
-	od_tuners.ignore_nice = input;
+	od_tuners->ignore_nice = input;
 
 	/* we need to re-evaluate prev_cpu_idle */
 	for_each_online_cpu(j) {
@@ -409,7 +414,7 @@ static ssize_t store_ignore_nice_load(struct kobject *a, struct attribute *b,
 		dbs_info = &per_cpu(od_cpu_dbs_info, j);
 		dbs_info->cdbs.prev_cpu_idle = get_cpu_idle_time(j,
 						&dbs_info->cdbs.prev_cpu_wall);
-		if (od_tuners.ignore_nice)
+		if (od_tuners->ignore_nice)
 			dbs_info->cdbs.prev_cpu_nice =
 				kcpustat_cpu(j).cpustat[CPUTIME_NICE];
 
@@ -417,9 +422,11 @@ static ssize_t store_ignore_nice_load(struct kobject *a, struct attribute *b,
 	return count;
 }
 
-static ssize_t store_powersave_bias(struct kobject *a, struct attribute *b,
-				    const char *buf, size_t count)
+static ssize_t store_powersave_bias(struct cpufreq_policy *policy,
+		const char *buf, size_t count)
 {
+	struct dbs_data *dbs_data = policy->governor_data;
+	struct od_dbs_tuners *od_tuners = dbs_data->tuners;
 	unsigned int input;
 	int ret;
 	ret = sscanf(buf, "%u", &input);
@@ -430,7 +437,7 @@ static ssize_t store_powersave_bias(struct kobject *a, struct attribute *b,
 	if (input > 1000)
 		input = 1000;
 
-	od_tuners.powersave_bias = input;
+	od_tuners->powersave_bias = input;
 	ondemand_powersave_bias_init();
 	return count;
 }
@@ -439,23 +446,24 @@ show_one(od, sampling_rate, sampling_rate);
 show_one(od, io_is_busy, io_is_busy);
 show_one(od, up_threshold, up_threshold);
 show_one(od, sampling_down_factor, sampling_down_factor);
-show_one(od, ignore_nice_load, ignore_nice);
+show_one(od, ignore_nice, ignore_nice);
 show_one(od, powersave_bias, powersave_bias);
+declare_show_sampling_rate_min();
 
-define_one_global_rw(sampling_rate);
-define_one_global_rw(io_is_busy);
-define_one_global_rw(up_threshold);
-define_one_global_rw(sampling_down_factor);
-define_one_global_rw(ignore_nice_load);
-define_one_global_rw(powersave_bias);
-define_one_global_ro(sampling_rate_min);
+cpufreq_freq_attr_rw(sampling_rate);
+cpufreq_freq_attr_rw(io_is_busy);
+cpufreq_freq_attr_rw(up_threshold);
+cpufreq_freq_attr_rw(sampling_down_factor);
+cpufreq_freq_attr_rw(ignore_nice);
+cpufreq_freq_attr_rw(powersave_bias);
+cpufreq_freq_attr_ro(sampling_rate_min);
 
 static struct attribute *dbs_attributes[] = {
 	&sampling_rate_min.attr,
 	&sampling_rate.attr,
 	&up_threshold.attr,
 	&sampling_down_factor.attr,
-	&ignore_nice_load.attr,
+	&ignore_nice.attr,
 	&powersave_bias.attr,
 	&io_is_busy.attr,
 	NULL
@@ -468,30 +476,73 @@ static struct attribute_group od_attr_group = {
 
 /************************** sysfs end ************************/
 
+static int od_init(struct dbs_data *dbs_data)
+{
+	struct od_dbs_tuners *tuners;
+	u64 idle_time;
+	int cpu;
+
+	tuners = kzalloc(sizeof(struct od_dbs_tuners), GFP_KERNEL);
+	if (!tuners) {
+		pr_err("%s: kzalloc failed\n", __func__);
+		return -ENOMEM;
+	}
+
+	cpu = get_cpu();
+	idle_time = get_cpu_idle_time_us(cpu, NULL);
+	put_cpu();
+	if (idle_time != -1ULL) {
+		/* Idle micro accounting is supported. Use finer thresholds */
+		tuners->up_threshold = MICRO_FREQUENCY_UP_THRESHOLD;
+		tuners->down_differential = MICRO_FREQUENCY_DOWN_DIFFERENTIAL;
+		/*
+		 * In nohz/micro accounting case we set the minimum frequency
+		 * not depending on HZ, but fixed (very low). The deferred
+		 * timer might skip some samples if idle/sleeping as needed.
+		*/
+		dbs_data->min_sampling_rate = MICRO_FREQUENCY_MIN_SAMPLE_RATE;
+	} else {
+		tuners->up_threshold = DEF_FREQUENCY_UP_THRESHOLD;
+		tuners->down_differential = DEF_FREQUENCY_DOWN_DIFFERENTIAL;
+
+		/* For correct statistics, we need 10 ticks for each measure */
+		dbs_data->min_sampling_rate = MIN_SAMPLING_RATE_RATIO *
+			jiffies_to_usecs(10);
+	}
+
+	tuners->sampling_down_factor = DEF_SAMPLING_DOWN_FACTOR;
+	tuners->ignore_nice = 0;
+	tuners->powersave_bias = 0;
+	tuners->io_is_busy = should_io_be_busy();
+
+	dbs_data->tuners = tuners;
+	mutex_init(&dbs_data->mutex);
+	return 0;
+}
+
 define_get_cpu_dbs_routines(od_cpu_dbs_info);
 
 static struct od_ops od_ops = {
-	.io_busy = should_io_be_busy,
 	.powersave_bias_init_cpu = ondemand_powersave_bias_init_cpu,
 	.powersave_bias_target = powersave_bias_target,
 	.freq_increase = dbs_freq_increase,
 };
 
-static struct dbs_data od_dbs_data = {
+static struct common_dbs_data od_dbs_cdata = {
 	.governor = GOV_ONDEMAND,
 	.attr_group = &od_attr_group,
-	.tuners = &od_tuners,
 	.get_cpu_cdbs = get_cpu_cdbs,
 	.get_cpu_dbs_info_s = get_cpu_dbs_info_s,
 	.gov_dbs_timer = od_dbs_timer,
 	.gov_check_cpu = od_check_cpu,
 	.gov_ops = &od_ops,
+	.init = od_init,
 };
 
 static int od_cpufreq_governor_dbs(struct cpufreq_policy *policy,
 		unsigned int event)
 {
-	return cpufreq_governor_dbs(&od_dbs_data, policy, event);
+	return cpufreq_governor_dbs(policy, &od_dbs_cdata, event);
 }
 
 #ifndef CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND
@@ -506,28 +557,6 @@ struct cpufreq_governor cpufreq_gov_ondemand = {
 
 static int __init cpufreq_gov_dbs_init(void)
 {
-	u64 idle_time;
-	int cpu = get_cpu();
-
-	mutex_init(&od_dbs_data.mutex);
-	idle_time = get_cpu_idle_time_us(cpu, NULL);
-	put_cpu();
-	if (idle_time != -1ULL) {
-		/* Idle micro accounting is supported. Use finer thresholds */
-		od_tuners.up_threshold = MICRO_FREQUENCY_UP_THRESHOLD;
-		od_tuners.down_differential = MICRO_FREQUENCY_DOWN_DIFFERENTIAL;
-		/*
-		 * In nohz/micro accounting case we set the minimum frequency
-		 * not depending on HZ, but fixed (very low). The deferred
-		 * timer might skip some samples if idle/sleeping as needed.
-		*/
-		od_dbs_data.min_sampling_rate = MICRO_FREQUENCY_MIN_SAMPLE_RATE;
-	} else {
-		/* For correct statistics, we need 10 ticks for each measure */
-		od_dbs_data.min_sampling_rate = MIN_SAMPLING_RATE_RATIO *
-			jiffies_to_usecs(10);
-	}
-
 	return cpufreq_register_governor(&cpufreq_gov_ondemand);
 }
 
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index 3b822ce..5dae7e6 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -183,11 +183,10 @@ static inline unsigned long cpufreq_scale(unsigned long old, u_int div, u_int mu
 #define CPUFREQ_GOV_STOP	2
 #define CPUFREQ_GOV_LIMITS	3
 #define CPUFREQ_GOV_POLICY_INIT	4
-#define CPUFREQ_GOV_POLICY_EXIT	4
+#define CPUFREQ_GOV_POLICY_EXIT	5
 
 struct cpufreq_governor {
 	char	name[CPUFREQ_NAME_LEN];
-	int	initialized;
 	int	(*governor)	(struct cpufreq_policy *policy,
 				 unsigned int event);
 	ssize_t	(*show_setspeed)	(struct cpufreq_policy *policy,
@@ -307,22 +306,6 @@ __ATTR(_name, _perm, show_##_name, NULL)
 static struct freq_attr _name =			\
 __ATTR(_name, 0644, show_##_name, store_##_name)
 
-struct global_attr {
-	struct attribute attr;
-	ssize_t (*show)(struct kobject *kobj,
-			struct attribute *attr, char *buf);
-	ssize_t (*store)(struct kobject *a, struct attribute *b,
-			 const char *c, size_t count);
-};
-
-#define define_one_global_ro(_name)		\
-static struct global_attr _name =		\
-__ATTR(_name, 0444, show_##_name, NULL)
-
-#define define_one_global_rw(_name)		\
-static struct global_attr _name =		\
-__ATTR(_name, 0644, show_##_name, store_##_name)
-
 struct cpufreq_policy *cpufreq_cpu_get(unsigned int cpu);
 void cpufreq_cpu_put(struct cpufreq_policy *data);
 const char *cpufreq_get_current_driver(void);
-- 
1.7.12.rc2.18.g61b472e


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH 0/4] CPUFreq: Implement per policy instances of governors
  2013-02-04 11:38 [PATCH 0/4] CPUFreq: Implement per policy instances of governors Viresh Kumar
                   ` (3 preceding siblings ...)
  2013-02-04 11:38 ` [PATCH 4/4] cpufreq: governor: Implement per policy instances of governors Viresh Kumar
@ 2013-02-04 12:17 ` Rafael J. Wysocki
  2013-02-04 12:24   ` Viresh Kumar
  2013-02-05 16:21 ` Viresh Kumar
  5 siblings, 1 reply; 44+ messages in thread
From: Rafael J. Wysocki @ 2013-02-04 12:17 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: cpufreq, linux-pm, linux-kernel, linaro-dev, robin.randhawa,
	Steve.Bannister, Liviu.Dudau

On Monday, February 04, 2013 05:08:50 PM Viresh Kumar wrote:
> Currently, there can't be multiple instances of single governor_type. If we have
> a multi-package system, where we have multiple instances of struct policy (per
> package), we can't have multiple instances of same governor. i.e. We can't have
> multiple instances of ondemand governor for multiple packages.
> 
> Governors directory in sysfs is created at /sys/devices/system/cpu/cpufreq/
> governor-name/. Which again reflects that there can be only one instance of a
> governor_type in the system.
> 
> This is a bottleneck for multicluster system, where we want different packages
> to use same governor type, but with different tunables.
> 
> This patchset is inclined towards fixing this issue.
> 
> Viresh Kumar (4):
>   cpufreq: Don't check cpu_online(policy->cpu)
>   cpufreq: stats: Get rid of CPUFREQ_STATDEVICE_ATTR
>   cpufreq: Add per policy governor-init/exit infrastructure
>   cpufreq: governor: Implement per policy instances of governors

Well, [1-2/4] are things I can take for v3.9 no problem.  The other two I'd
wait for the next cycle to be honest.  We already have 30+ cpufreq patches
scheduled for v3.9 and some of them quite subtle for that matter.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 0/4] CPUFreq: Implement per policy instances of governors
  2013-02-04 12:17 ` [PATCH 0/4] CPUFreq: " Rafael J. Wysocki
@ 2013-02-04 12:24   ` Viresh Kumar
  2013-02-04 12:32     ` Borislav Petkov
  0 siblings, 1 reply; 44+ messages in thread
From: Viresh Kumar @ 2013-02-04 12:24 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: cpufreq, linux-pm, linux-kernel, linaro-dev, robin.randhawa,
	Steve.Bannister, Liviu.Dudau

On 4 February 2013 17:47, Rafael J. Wysocki <rjw@sisk.pl> wrote:
> Well, [1-2/4] are things I can take for v3.9 no problem.  The other two I'd
> wait for the next cycle to be honest.  We already have 30+ cpufreq patches
> scheduled for v3.9 and some of them quite subtle for that matter.

To be honest, i wanted to get these in 3.9 :) .
And that's why hurried on them and got this full series working in a single
day of work :)

Anyway, i understand your point and i also believe last two patches require some
testing/review before these getting in.

One important point i would like to highlight is:
governors directory would be present in cpu/cpu*/cpufreq/ now instead
of cpu/cpufreq/.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 0/4] CPUFreq: Implement per policy instances of governors
  2013-02-04 12:24   ` Viresh Kumar
@ 2013-02-04 12:32     ` Borislav Petkov
  2013-02-04 12:54       ` Viresh Kumar
  0 siblings, 1 reply; 44+ messages in thread
From: Borislav Petkov @ 2013-02-04 12:32 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael J. Wysocki, cpufreq, linux-pm, linux-kernel, linaro-dev,
	robin.randhawa, Steve.Bannister, Liviu.Dudau

On Mon, Feb 04, 2013 at 05:54:18PM +0530, Viresh Kumar wrote:
> One important point i would like to highlight is: governors directory
> would be present in cpu/cpu*/cpufreq/ now instead of cpu/cpufreq/.

Uh, hold on, isn't this breaking a bunch of userspace with this move?
Also, on all those other systems which don't need per-policy governors,
we probably don't need this. So maybe this should be made optional, to
be enabled by a config option IMO...

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 0/4] CPUFreq: Implement per policy instances of governors
  2013-02-04 12:32     ` Borislav Petkov
@ 2013-02-04 12:54       ` Viresh Kumar
  2013-02-04 13:04         ` Borislav Petkov
  0 siblings, 1 reply; 44+ messages in thread
From: Viresh Kumar @ 2013-02-04 12:54 UTC (permalink / raw)
  To: Borislav Petkov, Viresh Kumar, Rafael J. Wysocki, cpufreq,
	linux-pm, linux-kernel, linaro-dev, robin.randhawa,
	Steve.Bannister, Liviu.Dudau

On 4 February 2013 18:02, Borislav Petkov <bp@alien8.de> wrote:
> On Mon, Feb 04, 2013 at 05:54:18PM +0530, Viresh Kumar wrote:
>> One important point i would like to highlight is: governors directory
>> would be present in cpu/cpu*/cpufreq/ now instead of cpu/cpufreq/.
>
> Uh, hold on, isn't this breaking a bunch of userspace with this move?
> Also, on all those other systems which don't need per-policy governors,
> we probably don't need this. So maybe this should be made optional, to
> be enabled by a config option IMO...

That's why i am highlighting it again and again. :)
What i believe is, the place where this directory was present earlier
(cpu/cpufreq/) wasn't the right place. Everything else was in cpu/cpu*/cpufreq,
then why this in cpu/cpufreq/ ?

I don't know how much of a pain it would be to fix userspace for it, but i know
it wouldn't be that small.

I had another idea of doing this only for platforms where we have multiple
struct policy alive at the same time. But didn't wanted to implement it before
discussing this further.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 0/4] CPUFreq: Implement per policy instances of governors
  2013-02-04 12:54       ` Viresh Kumar
@ 2013-02-04 13:04         ` Borislav Petkov
  2013-02-04 13:25           ` Viresh Kumar
  0 siblings, 1 reply; 44+ messages in thread
From: Borislav Petkov @ 2013-02-04 13:04 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael J. Wysocki, cpufreq, linux-pm, linux-kernel, linaro-dev,
	robin.randhawa, Steve.Bannister, Liviu.Dudau

On Mon, Feb 04, 2013 at 06:24:19PM +0530, Viresh Kumar wrote:
> That's why i am highlighting it again and again. :)

Ah, see, someone caught up with it :).

> What i believe is, the place where this directory was present earlier
> (cpu/cpufreq/) wasn't the right place. Everything else was in cpu/cpu*/cpufreq,
> then why this in cpu/cpufreq/ ?

For the simple reason that the "cpu*" stuff is per-cpu - the
"cpu/cpufreq" is per system, i.e. one governor for the whole system.

> I don't know how much of a pain it would be to fix userspace for it,
> but i know it wouldn't be that small.

I wouldn't fix userspace but simply not touch it. You can add your
per-policy stuff in "cpu/cpu*" as new sysfs nodes and no need to
change anything. And, also, as I suggested earlier, you should make it
configurable since this code wouldn't make sense on x86, for example,
where one system-wide governor should suffice.

> I had another idea of doing this only for platforms where we have
> multiple struct policy alive at the same time. But didn't wanted to
> implement it before discussing this further.

Simply put it behind a config option like
CONFIG_CPU_IDLE_MULTIPLE_DRIVERS, call the whole menu
"Multi-power-domain-policy" something and that should be modulary
enough.

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 0/4] CPUFreq: Implement per policy instances of governors
  2013-02-04 13:04         ` Borislav Petkov
@ 2013-02-04 13:25           ` Viresh Kumar
  2013-02-04 13:36             ` Borislav Petkov
  0 siblings, 1 reply; 44+ messages in thread
From: Viresh Kumar @ 2013-02-04 13:25 UTC (permalink / raw)
  To: Borislav Petkov, Viresh Kumar, Rafael J. Wysocki, cpufreq,
	linux-pm, linux-kernel, linaro-dev, robin.randhawa,
	Steve.Bannister, Liviu.Dudau

On 4 February 2013 18:34, Borislav Petkov <bp@alien8.de> wrote:
> On Mon, Feb 04, 2013 at 06:24:19PM +0530, Viresh Kumar wrote:

>> What i believe is, the place where this directory was present earlier
>> (cpu/cpufreq/) wasn't the right place. Everything else was in cpu/cpu*/cpufreq,
>> then why this in cpu/cpufreq/ ?
>
> For the simple reason that the "cpu*" stuff is per-cpu - the
> "cpu/cpufreq" is per system, i.e. one governor for the whole system.

That's not completely true. There lies cpufreq directory in cpu/cpu*/ too,
where we have per policy stuff in cpu/cpu*/, like policy tunables and stats.
And the same is true for governor too.

>> I don't know how much of a pain it would be to fix userspace for it,
>> but i know it wouldn't be that small.
>
> I wouldn't fix userspace but simply not touch it. You can add your
> per-policy stuff in "cpu/cpu*" as new sysfs nodes and no need to
> change anything.

That was slightly confusing to me :(
The whole governor directory is per policy, i have to keep that in
cpu/cpu*/cpufreq
instead of cpu/cpufreq .

> And, also, as I suggested earlier, you should make it
> configurable since this code wouldn't make sense on x86, for example,
> where one system-wide governor should suffice.

Its not only for multicluster system, but a system where multiple cpus have
separate clock control and hence multiple policy structures.

>> I had another idea of doing this only for platforms where we have
>> multiple struct policy alive at the same time. But didn't wanted to
>> implement it before discussing this further.
>
> Simply put it behind a config option like
> CONFIG_CPU_IDLE_MULTIPLE_DRIVERS, call the whole menu
> "Multi-power-domain-policy" something and that should be modulary
> enough.

Problem with this is it would fail for single image solutions on which everybody
is working on. So, with multiple platforms compiled into a single
image, this wouldn't
work.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 0/4] CPUFreq: Implement per policy instances of governors
  2013-02-04 13:25           ` Viresh Kumar
@ 2013-02-04 13:36             ` Borislav Petkov
  2013-02-04 13:58               ` Viresh Kumar
  2013-02-05  9:36               ` Viresh Kumar
  0 siblings, 2 replies; 44+ messages in thread
From: Borislav Petkov @ 2013-02-04 13:36 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael J. Wysocki, cpufreq, linux-pm, linux-kernel, linaro-dev,
	robin.randhawa, Steve.Bannister, Liviu.Dudau

On Mon, Feb 04, 2013 at 06:55:25PM +0530, Viresh Kumar wrote:
> That's not completely true. There lies cpufreq directory in cpu/cpu*/
> too, where we have per policy stuff in cpu/cpu*/, like policy tunables
> and stats. And the same is true for governor too.

$ tree /sys/devices/system/cpu/cpu0/cpufreq/
/sys/devices/system/cpu/cpu0/cpufreq/
├── affected_cpus
├── bios_limit
├── cpb
├── cpuinfo_cur_freq
├── cpuinfo_max_freq
├── cpuinfo_min_freq
├── cpuinfo_transition_latency
├── related_cpus
├── scaling_available_frequencies
├── scaling_available_governors
├── scaling_cur_freq
├── scaling_driver
├── scaling_governor
├── scaling_max_freq
├── scaling_min_freq
├── scaling_setspeed
└── stats
    ├── time_in_state
    ├── total_trans
    └── trans_table

1 directory, 19 files

$ grep -r . /sys/devices/system/cpu/cpu0/cpufreq/*
/sys/devices/system/cpu/cpu0/cpufreq/affected_cpus:0
/sys/devices/system/cpu/cpu0/cpufreq/bios_limit:4000000
/sys/devices/system/cpu/cpu0/cpufreq/cpb:1
/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq:1400000
/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq:4000000
/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_min_freq:1400000
/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_transition_latency:4000
/sys/devices/system/cpu/cpu0/cpufreq/related_cpus:0 1
/sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies:4000000 3400000 2800000 2100000 1400000
/sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors:powersave userspace conservative ondemand perform
/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq:1400000
/sys/devices/system/cpu/cpu0/cpufreq/scaling_driver:acpi-cpufreq
/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor:ondemand
/sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq:4000000
/sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq:1400000
/sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed:<unsupported>
/sys/devices/system/cpu/cpu0/cpufreq/stats/time_in_state:4000000 3089328
/sys/devices/system/cpu/cpu0/cpufreq/stats/time_in_state:3400000 47448
/sys/devices/system/cpu/cpu0/cpufreq/stats/time_in_state:2800000 67185
/sys/devices/system/cpu/cpu0/cpufreq/stats/time_in_state:2100000 92731
/sys/devices/system/cpu/cpu0/cpufreq/stats/time_in_state:1400000 11416914
/sys/devices/system/cpu/cpu0/cpufreq/stats/trans_table:   From  :    To
/sys/devices/system/cpu/cpu0/cpufreq/stats/trans_table:         :   4000000   3400000   2800000   2100000   140000
/sys/devices/system/cpu/cpu0/cpufreq/stats/trans_table:  4000000:         0     34756     46388     53179    21824
/sys/devices/system/cpu/cpu0/cpufreq/stats/trans_table:  3400000:     12938         0      3755      3555     1450
/sys/devices/system/cpu/cpu0/cpufreq/stats/trans_table:  2800000:     19940         0         0      4547     2565
/sys/devices/system/cpu/cpu0/cpufreq/stats/trans_table:  2100000:     18523         0         0         0     4275
/sys/devices/system/cpu/cpu0/cpufreq/stats/trans_table:  1400000:    301168         0         0         0
/sys/devices/system/cpu/cpu0/cpufreq/stats/total_trans:799918

Show me the policy tunables here.

> That was slightly confusing to me :( The whole governor directory
> is per policy, i have to keep that in cpu/cpu*/cpufreq instead of
> cpu/cpufreq.

So make a /sys/devices/system/cpu/cpufreq/policies/ and add
functionality to assign cpus to policies or whatever the design of this
thing will be.

> Its not only for multicluster system, but a system where multiple cpus
> have separate clock control and hence multiple policy structures.

What are those systems? Examples?

> Problem with this is it would fail for single image solutions on which
> everybody is working on. So, with multiple platforms compiled into a
> single image, this wouldn't work.

Single-image solutions will enable that config option and get built with
it - no problem at all.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 0/4] CPUFreq: Implement per policy instances of governors
  2013-02-04 13:36             ` Borislav Petkov
@ 2013-02-04 13:58               ` Viresh Kumar
  2013-02-04 14:09                 ` Borislav Petkov
  2013-02-05  9:36               ` Viresh Kumar
  1 sibling, 1 reply; 44+ messages in thread
From: Viresh Kumar @ 2013-02-04 13:58 UTC (permalink / raw)
  To: Borislav Petkov, Viresh Kumar, Rafael J. Wysocki, cpufreq,
	linux-pm, linux-kernel, linaro-dev, robin.randhawa,
	Steve.Bannister, Liviu.Dudau

On 4 February 2013 19:06, Borislav Petkov <bp@alien8.de> wrote:
> On Mon, Feb 04, 2013 at 06:55:25PM +0530, Viresh Kumar wrote:
>> That's not completely true. There lies cpufreq directory in cpu/cpu*/
>> too, where we have per policy stuff in cpu/cpu*/, like policy tunables
>> and stats. And the same is true for governor too.
>
> $ tree /sys/devices/system/cpu/cpu0/cpufreq/
> /sys/devices/system/cpu/cpu0/cpufreq/
> ├── affected_cpus
> ├── bios_limit
> ├── cpb
> ├── cpuinfo_cur_freq
> ├── cpuinfo_max_freq
> ├── cpuinfo_min_freq
> ├── cpuinfo_transition_latency
> ├── related_cpus
> ├── scaling_available_frequencies
> ├── scaling_available_governors
> ├── scaling_cur_freq
> ├── scaling_driver
> ├── scaling_governor
> ├── scaling_max_freq
> ├── scaling_min_freq
> ├── scaling_setspeed
> └── stats
>     ├── time_in_state
>     ├── total_trans
>     └── trans_table
>
> 1 directory, 19 files
>
> $ grep -r . /sys/devices/system/cpu/cpu0/cpufreq/*
> /sys/devices/system/cpu/cpu0/cpufreq/affected_cpus:0
> /sys/devices/system/cpu/cpu0/cpufreq/bios_limit:4000000
> /sys/devices/system/cpu/cpu0/cpufreq/cpb:1
> /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq:1400000
> /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq:4000000
> /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_min_freq:1400000
> /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_transition_latency:4000
> /sys/devices/system/cpu/cpu0/cpufreq/related_cpus:0 1
> /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies:4000000 3400000 2800000 2100000 1400000
> /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors:powersave userspace conservative ondemand perform
> /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq:1400000
> /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver:acpi-cpufreq
> /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor:ondemand
> /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq:4000000
> /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq:1400000
> /sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed:<unsupported>
> /sys/devices/system/cpu/cpu0/cpufreq/stats/time_in_state:4000000 3089328
> /sys/devices/system/cpu/cpu0/cpufreq/stats/time_in_state:3400000 47448
> /sys/devices/system/cpu/cpu0/cpufreq/stats/time_in_state:2800000 67185
> /sys/devices/system/cpu/cpu0/cpufreq/stats/time_in_state:2100000 92731
> /sys/devices/system/cpu/cpu0/cpufreq/stats/time_in_state:1400000 11416914
> /sys/devices/system/cpu/cpu0/cpufreq/stats/trans_table:   From  :    To
> /sys/devices/system/cpu/cpu0/cpufreq/stats/trans_table:         :   4000000   3400000   2800000   2100000   140000
> /sys/devices/system/cpu/cpu0/cpufreq/stats/trans_table:  4000000:         0     34756     46388     53179    21824
> /sys/devices/system/cpu/cpu0/cpufreq/stats/trans_table:  3400000:     12938         0      3755      3555     1450
> /sys/devices/system/cpu/cpu0/cpufreq/stats/trans_table:  2800000:     19940         0         0      4547     2565
> /sys/devices/system/cpu/cpu0/cpufreq/stats/trans_table:  2100000:     18523         0         0         0     4275
> /sys/devices/system/cpu/cpu0/cpufreq/stats/trans_table:  1400000:    301168         0         0         0
> /sys/devices/system/cpu/cpu0/cpufreq/stats/total_trans:799918
>
> Show me the policy tunables here.

All files which are directly present in cpu/cpu*/cpufreq/ folder. I am not
talking about governor tunables but policy tunables. Things like
scaling_[min]max_freq are policy tunables.

>> That was slightly confusing to me :( The whole governor directory
>> is per policy, i have to keep that in cpu/cpu*/cpufreq instead of
>> cpu/cpufreq.
>
> So make a /sys/devices/system/cpu/cpufreq/policies/ and add
> functionality to assign cpus to policies or whatever the design of this
> thing will be.

Policies don't have a name associated with them and so cpu/cpufreq/policies
doesn't make any sense. Rather one policy is related to multiple cpus and
its tunables are linked in all the cpus that belong to it, like
scaling_[min]max_freq.

>> Its not only for multicluster system, but a system where multiple cpus
>> have separate clock control and hence multiple policy structures.
>
> What are those systems? Examples?

Don't have examples of these, but there can be few. Over that it is
a must for multicluster systems as clusters normally have separate
clock control.

>> Problem with this is it would fail for single image solutions on which
>> everybody is working on. So, with multiple platforms compiled into a
>> single image, this wouldn't work.
>
> Single-image solutions will enable that config option and get built with
> it - no problem at all.

But then we will get governors tunables in cpu/cpu*/cpufreq/ instead of
cpu/cpufreq/ . Will that not break userspace for other systems?

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 0/4] CPUFreq: Implement per policy instances of governors
  2013-02-04 13:58               ` Viresh Kumar
@ 2013-02-04 14:09                 ` Borislav Petkov
  2013-02-04 14:21                   ` Viresh Kumar
  0 siblings, 1 reply; 44+ messages in thread
From: Borislav Petkov @ 2013-02-04 14:09 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael J. Wysocki, cpufreq, linux-pm, linux-kernel, linaro-dev,
	robin.randhawa, Steve.Bannister, Liviu.Dudau

On Mon, Feb 04, 2013 at 07:28:16PM +0530, Viresh Kumar wrote:
> All files which are directly present in cpu/cpu*/cpufreq/ folder. I am
> not talking about governor tunables but policy tunables. Things like
> scaling_[min]max_freq are policy tunables.

No, on x86 those are the P-states frequencies. They're defined by the
hardware.

> Policies don't have a name associated with them and so
> cpu/cpufreq/policies doesn't make any sense. Rather one policy is
> related to multiple cpus and its tunables are linked in all the cpus
> that belong to it, like scaling_[min]max_freq.

Then do the following:

cpu/cpufreq/policies/
|-> policy0
    |-> min_freq
    |-> max_freq
    |-> affected_cpus
    ...

or whatever needs to be a flexible interface for multi-policy cpufreq
support.

Remember: once you do those, they're more or less cast in stone so take
your time and do the design right, do not hurry those.

> Don't have examples of these, but there can be few. Over that it is a
> must for multicluster systems as clusters normally have separate clock
> control.

Yeah, nice try. We only support real hardware in the kernel, not what
could there be.

> But then we will get governors tunables in cpu/cpu*/cpufreq/ instead
> of cpu/cpufreq/ . Will that not break userspace for other systems?

What's wrong with having both? The cpu/cpufreq/ governor will set the
system-wide governor and the cpu/cpu*/cpufreq/ will add the different
policies.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 0/4] CPUFreq: Implement per policy instances of governors
  2013-02-04 14:09                 ` Borislav Petkov
@ 2013-02-04 14:21                   ` Viresh Kumar
  2013-02-04 15:05                     ` Borislav Petkov
  0 siblings, 1 reply; 44+ messages in thread
From: Viresh Kumar @ 2013-02-04 14:21 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Rafael J. Wysocki, cpufreq, linux-pm, linux-kernel, linaro-dev,
	robin.randhawa, Steve.Bannister, Liviu.Dudau

On 4 February 2013 19:39, Borislav Petkov <bp@alien8.de> wrote:
> On Mon, Feb 04, 2013 at 07:28:16PM +0530, Viresh Kumar wrote:
>> All files which are directly present in cpu/cpu*/cpufreq/ folder. I am
>> not talking about governor tunables but policy tunables. Things like
>> scaling_[min]max_freq are policy tunables.
>
> No, on x86 those are the P-states frequencies. They're defined by the
> hardware.

A question we need to answer: What is "policy"?

For me, "policy" is the hardware policy for a group of cpus, defined by the
hardware. We call them P-states. But these are strictly per policy (i.e. per
cpu group sharing clock line).

So, if we have two packages with two cpus each, we will have two copies
of these P-states and every other file/directory in cpu/cpu*/cpufreq.
One common copy would be shared between cpu/cpu[0-1]/cpufreq
directory and other one between cpu/cpu[2-3]/cpufreq.

>> Policies don't have a name associated with them and so
>> cpu/cpufreq/policies doesn't make any sense. Rather one policy is
>> related to multiple cpus and its tunables are linked in all the cpus
>> that belong to it, like scaling_[min]max_freq.
>
> Then do the following:
>
> cpu/cpufreq/policies/
> |-> policy0
>     |-> min_freq
>     |-> max_freq
>     |-> affected_cpus
>     ...

We correlate things with cpus rather than policies and so the current directory
structure of cpu/cpu*/cpufreq/*** is the best suited ones.

> or whatever needs to be a flexible interface for multi-policy cpufreq
> support.

The multi policy part is already well implemented, we are talking about governor
per policy here.

> Remember: once you do those, they're more or less cast in stone so take
> your time and do the design right, do not hurry those.

Yes, and that's why cpu/cpu*/cpufreq/ondemand/*** suits the best, with exactly
the same logic that went for P-states or cpufreq-stats.

>> But then we will get governors tunables in cpu/cpu*/cpufreq/ instead
>> of cpu/cpufreq/ . Will that not break userspace for other systems?
>
> What's wrong with having both? The cpu/cpufreq/ governor will set the
> system-wide governor and the cpu/cpu*/cpufreq/ will add the different
> policies.

Hmm.. confused..
Consider two systems:
- A dual core system, with cores sharing clocks.
- A dual cluster system (dual core per cluster), with separate clocks
per cluster.

Where will you keep governor directories for both of these configurations?

We need to select only one... cpu/cpufreq doesn't suit the second case at all
as we need to use ondemand governor for both the clusters but with separate
tunables. And so a single cpu/cpufreq/ondemand directory wouldn't solve the
issue.

--
viresh

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 0/4] CPUFreq: Implement per policy instances of governors
  2013-02-04 14:21                   ` Viresh Kumar
@ 2013-02-04 15:05                     ` Borislav Petkov
  2013-02-04 15:37                       ` Viresh Kumar
  0 siblings, 1 reply; 44+ messages in thread
From: Borislav Petkov @ 2013-02-04 15:05 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael J. Wysocki, cpufreq, linux-pm, linux-kernel, linaro-dev,
	robin.randhawa, Steve.Bannister, Liviu.Dudau

On Mon, Feb 04, 2013 at 07:51:33PM +0530, Viresh Kumar wrote:
> We correlate things with cpus rather than policies and so the current
> directory structure of cpu/cpu*/cpufreq/*** is the best suited ones.

Ok, show me the details of that layout. How is that going to look?

One thing I've come to realize with the current interface is that if
you want to change stuff, you need to iterate over all cpus instead of
writing to a system-wide node.

And, in this case, if you can and need to change the policy per
clock-domain, I wouldn't make it needlessly too-granulary per-cpu.

That's why I'm advocating the cpu/cpufreq/ path.

> Yes, and that's why cpu/cpu*/cpufreq/ondemand/*** suits the best, with
> exactly the same logic that went for P-states or cpufreq-stats.

See above.

> Hmm.. confused..
> Consider two systems:
> - A dual core system, with cores sharing clocks.
> - A dual cluster system (dual core per cluster), with separate clocks
> per cluster.
> 
> Where will you keep governor directories for both of these configurations?

Easy: as said above, make the policy granularity per clock-domain. On
systems which have only one set of P-states - like it is the case with
the overwhelming majority of systems running linux now - nothing should
change.

> We need to select only one... cpu/cpufreq doesn't suit the second case
> at all as we need to use ondemand governor for both the clusters but
> with separate tunables. And so a single cpu/cpufreq/ondemand directory
> wouldn't solve the issue.

Think of it this way: what is the highest granularity you need per
clock-domain? If you want to control the policy per clock-domain, then
cpu/cpufreq/ is what you want. If you want finer-grained control -
and you need to think hard of what use cases are sensible for that
finer-grained solution - then you're better off with cpu/cpu*/ layout.

In both cases though, having clear examples of why you've come up with
the layout you're advocating would help reviewers a lot. If you simply
come and say we need this because there might be systems out there who
could use it, then that probably is not going to get you that far.

HTH.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 0/4] CPUFreq: Implement per policy instances of governors
  2013-02-04 15:05                     ` Borislav Petkov
@ 2013-02-04 15:37                       ` Viresh Kumar
  2013-02-04 16:50                         ` Borislav Petkov
  0 siblings, 1 reply; 44+ messages in thread
From: Viresh Kumar @ 2013-02-04 15:37 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Rafael J. Wysocki, cpufreq, linux-pm, linux-kernel, linaro-dev,
	robin.randhawa, Steve.Bannister, Liviu.Dudau

On 4 February 2013 20:35, Borislav Petkov <bp@alien8.de> wrote:
> On Mon, Feb 04, 2013 at 07:51:33PM +0530, Viresh Kumar wrote:
>> We correlate things with cpus rather than policies and so the current
>> directory structure of cpu/cpu*/cpufreq/*** is the best suited ones.
>
> Ok, show me the details of that layout. How is that going to look?

I don't have board right now to take the snapshot, but it would be
like:

$ tree /sys/devices/system/cpu/cpu0/cpufreq/
/sys/devices/system/cpu/cpu0/cpufreq/
├── affected_cpus
├── bios_limit
├── cpb
├── cpuinfo_cur_freq
├── cpuinfo_max_freq
├── cpuinfo_min_freq
├── cpuinfo_transition_latency
├── related_cpus
├── scaling_available_frequencies
├── scaling_available_governors
├── scaling_cur_freq
├── scaling_driver
├── scaling_governor
├── scaling_max_freq
├── scaling_min_freq
├── scaling_setspeed
└── stats
    ├── time_in_state
    ├── total_trans
    └── trans_table
└── ondemand
    ├── sampling_rate
    ├── up_threshold
    └── ignore_nice
etc..

> One thing I've come to realize with the current interface is that if
> you want to change stuff, you need to iterate over all cpus instead of
> writing to a system-wide node.

Not really. Following is the way by which cpu/cpu*/cpufreq directories
are created:

For policy->cpu:
	ret = kobject_init_and_add(&policy->kobj, &ktype_cpufreq,
				   &dev->kobj, "cpufreq");

This creates cpufreq directory for policy in policy->cpu...

For all other cpus in policy->cpus, we do:
		ret = sysfs_create_link(&cpu_dev->kobj, &policy->kobj,
					"cpufreq");

And so whatever gets added in cpu/cpu0/cpufreq directory is reflected in
all other policy->cpus.

> And, in this case, if you can and need to change the policy per
> clock-domain, I wouldn't make it needlessly too-granulary per-cpu.
>
> That's why I'm advocating the cpu/cpufreq/ path.

Its already like this, i.e. per policy or clock-domain. Other cpus just have a
link. And that's why in my code, i just add governor directory in policy->cpu's
cpufreq directory and it gets reflected in other cpus of policy->cpus.

That's why i said P-states as policy tunables.

>> Hmm.. confused..
>> Consider two systems:
>> - A dual core system, with cores sharing clocks.
>> - A dual cluster system (dual core per cluster), with separate clocks
>> per cluster.
>>
>> Where will you keep governor directories for both of these configurations?
>
> Easy: as said above, make the policy granularity per clock-domain. On
> systems which have only one set of P-states - like it is the case with
> the overwhelming majority of systems running linux now - nothing should
> change.

Currently its not per policy, but single instance of any governor is supported.
And it is present in cpu/cpufreq . That's why i said earlier, it isn't the right
place for governor's directory. It is very much related to a policy or
clock-domain.

>> We need to select only one... cpu/cpufreq doesn't suit the second case
>> at all as we need to use ondemand governor for both the clusters but
>> with separate tunables. And so a single cpu/cpufreq/ondemand directory
>> wouldn't solve the issue.
>
> Think of it this way: what is the highest granularity you need per
> clock-domain? If you want to control the policy per clock-domain, then
> cpu/cpufreq/ is what you want. If you want finer-grained control -
> and you need to think hard of what use cases are sensible for that
> finer-grained solution - then you're better off with cpu/cpu*/ layout.

I want to control it over clock-domain, but can't get that in cpu/cpufreq/.
Policies don't have numbers assigned to them.

> In both cases though, having clear examples of why you've come up with
> the layout you're advocating would help reviewers a lot. If you simply
> come and say we need this because there might be systems out there who
> could use it, then that probably is not going to get you that far.

So, i am working on ARM's big.LITTLE system where we have two clusters.
One of A15s and other of A7s. Because of their different power ratings or
performance figures, we need to have separate set of ondemand tunables
for them. And hence this patch. Though this patch is required for any
multi-cluster system.

--
viresh

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 0/4] CPUFreq: Implement per policy instances of governors
  2013-02-04 15:37                       ` Viresh Kumar
@ 2013-02-04 16:50                         ` Borislav Petkov
  2013-02-05  7:20                           ` Viresh Kumar
  0 siblings, 1 reply; 44+ messages in thread
From: Borislav Petkov @ 2013-02-04 16:50 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael J. Wysocki, cpufreq, linux-pm, linux-kernel, linaro-dev,
	robin.randhawa, Steve.Bannister, Liviu.Dudau

On Mon, Feb 04, 2013 at 09:07:11PM +0530, Viresh Kumar wrote:
> I don't have board right now to take the snapshot, but it would be
> like:
> 
> $ tree /sys/devices/system/cpu/cpu0/cpufreq/
> /sys/devices/system/cpu/cpu0/cpufreq/
> ├── affected_cpus
> ├── bios_limit
> ├── cpb
> ├── cpuinfo_cur_freq
> ├── cpuinfo_max_freq
> ├── cpuinfo_min_freq
> ├── cpuinfo_transition_latency
> ├── related_cpus
> ├── scaling_available_frequencies
> ├── scaling_available_governors
> ├── scaling_cur_freq
> ├── scaling_driver
> ├── scaling_governor
> ├── scaling_max_freq
> ├── scaling_min_freq
> ├── scaling_setspeed
> └── stats
>     ├── time_in_state
>     ├── total_trans
>     └── trans_table
> └── ondemand
>     ├── sampling_rate
>     ├── up_threshold
>     └── ignore_nice

So this is adding the current governor as a per-cpu thing.

> > One thing I've come to realize with the current interface is that if
> > you want to change stuff, you need to iterate over all cpus instead of
> > writing to a system-wide node.
> 
> Not really. Following is the way by which cpu/cpu*/cpufreq directories
> are created:

That's not what I meant - I meant from userspace:

for $i in $(grep processor /proc/cpuinfo | awk '{ print $3 }');
do
	echo "performance" > /sys/devices/system/cpu/cpu$i/cpufreq/scaling_governor;
done

Instead of

echo "performance" > /sys/devices/system/cpu/cpufreq/scaling_governor

which is hypothetical but sets it for the whole system without fuss.

[ … ]

> I want to control it over clock-domain, but can't get that in cpu/cpufreq/.
> Policies don't have numbers assigned to them.

So, give them names.

> So, i am working on ARM's big.LITTLE system where we have two
> clusters. One of A15s and other of A7s. Because of their different
> power ratings or performance figures, we need to have separate set of
> ondemand tunables for them. And hence this patch. Though this patch is
> required for any multi-cluster system.

So you want this (values after "="):

cpu/cpufreq/
|-> policy0
    |-> name		= A15
    |-> min_freq	= ...
    |-> max_freq	= ...
    |-> affected_cpus	= 0,1,2,...
    |-> ondemand
        |-> sampling_rate
	|-> up_threshold
	|-> ignore_nice
    ...
|-> policy1
    |-> name		= A7
    |-> min_freq	= ...
    |-> max_freq	= ...
    |-> affected_cpus	= n,n+1,n+2,...
    |-> performance
        |-> sampling_rate
	|-> up_threshold
	|-> ignore_nice
    ...

Other arches create other policies and that's it. If you need another
policy added to the set, you simply add 'policyN++' and that's it.

I think this is cleaner but whatever - I don't care that much. My
only strong concern is that this thing should be a Kconfig option and
optional for arches where it doesn't apply.

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 0/4] CPUFreq: Implement per policy instances of governors
  2013-02-04 16:50                         ` Borislav Petkov
@ 2013-02-05  7:20                           ` Viresh Kumar
  2013-02-05  9:15                             ` Borislav Petkov
  0 siblings, 1 reply; 44+ messages in thread
From: Viresh Kumar @ 2013-02-05  7:20 UTC (permalink / raw)
  To: Borislav Petkov, Viresh Kumar, Rafael J. Wysocki, cpufreq,
	linux-pm, linux-kernel, linaro-dev, robin.randhawa,
	Steve.Bannister, Liviu.Dudau

On Mon, Feb 4, 2013 at 10:20 PM, Borislav Petkov <bp@alien8.de> wrote:
> On Mon, Feb 04, 2013 at 09:07:11PM +0530, Viresh Kumar wrote:
>> └── ondemand
>>     ├── sampling_rate
>>     ├── up_threshold
>>     └── ignore_nice
>
> So this is adding the current governor as a per-cpu thing.

Its per policy, but yes it is replicated into all cpus as all policy->cpus
share the same directory.

>> > One thing I've come to realize with the current interface is that if
>> > you want to change stuff, you need to iterate over all cpus instead of
>> > writing to a system-wide node.
>>
>> Not really. Following is the way by which cpu/cpu*/cpufreq directories
>> are created:
>
> That's not what I meant - I meant from userspace:
>
> for $i in $(grep processor /proc/cpuinfo | awk '{ print $3 }');
> do
>         echo "performance" > /sys/devices/system/cpu/cpu$i/cpufreq/scaling_governor;
> done
>
> Instead of
>
> echo "performance" > /sys/devices/system/cpu/cpufreq/scaling_governor
>
> which is hypothetical but sets it for the whole system without fuss.

We actually need to do this only for policy->cpu, but yes the user might not
be well aware of policy cpu-groups and may do what you wrote.

But that is true even today, when we need to change any policy tunables or
P-states.

>> I want to control it over clock-domain, but can't get that in cpu/cpufreq/.
>> Policies don't have numbers assigned to them.
>
> So, give them names.

IMHO, names doesn't suit policies.

>> So, i am working on ARM's big.LITTLE system where we have two
>> clusters. One of A15s and other of A7s. Because of their different
>> power ratings or performance figures, we need to have separate set of
>> ondemand tunables for them. And hence this patch. Though this patch is
>> required for any multi-cluster system.
>
> So you want this (values after "="):
>
> cpu/cpufreq/
> |-> policy0
>     |-> name            = A15
>     |-> min_freq        = ...
>     |-> max_freq        = ...
>     |-> affected_cpus   = 0,1,2,...
>     |-> ondemand
>         |-> sampling_rate
>         |-> up_threshold
>         |-> ignore_nice
>     ...
> |-> policy1
>     |-> name            = A7
>     |-> min_freq        = ...
>     |-> max_freq        = ...
>     |-> affected_cpus   = n,n+1,n+2,...
>     |-> performance
>         |-> sampling_rate
>         |-> up_threshold
>         |-> ignore_nice
>     ...

We may have two clusters of A7's also, in a non-big little arch. Then
these names become very confusing.

> Other arches create other policies and that's it. If you need another
> policy added to the set, you simply add 'policyN++' and that's it.

For me, adding policy->names per arch is increasing complexity without
much gain. We already have an existing infrastructure where this info
is present inside cpus and that looks good to me.

> I think this is cleaner but whatever - I don't care that much. My
> only strong concern is that this thing should be a Kconfig option and
> optional for arches where it doesn't apply.

Your concern is: we don't want to fix userspace for existing platforms
where we have just a single cluster and so struct policy in the system.

so, a good solution instead of Kconfig stuff is, add another field in policy
structure that would be filled by platform specific cpufreq drivers init()
routine. Based on which we can decide to put governor directory in
cpu/cpufreq/gov-name or cpu/cpu*/cpufreq/gov-name.

But i am not sure if keeping both kind of directory locations for separate
platforms is a good idea. Userspace needs to adapt to these changes as
well for multi-cluster platforms.

@Rafael: you know about any other multi-cluster/multi-clock-domain
platform leaving big.LITTLE, where we have multiple structs policy?

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 0/4] CPUFreq: Implement per policy instances of governors
  2013-02-05  7:20                           ` Viresh Kumar
@ 2013-02-05  9:15                             ` Borislav Petkov
  2013-02-05  9:47                               ` Viresh Kumar
  0 siblings, 1 reply; 44+ messages in thread
From: Borislav Petkov @ 2013-02-05  9:15 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael J. Wysocki, cpufreq, linux-pm, linux-kernel, linaro-dev,
	robin.randhawa, Steve.Bannister, Liviu.Dudau

On Tue, Feb 05, 2013 at 12:50:31PM +0530, Viresh Kumar wrote:
> > I think this is cleaner but whatever - I don't care that much. My
> > only strong concern is that this thing should be a Kconfig option and
> > optional for arches where it doesn't apply.
> 
> Your concern is: we don't want to fix userspace for existing platforms
> where we have just a single cluster and so struct policy in the system.

No, as I said so many times already and you're unwilling to understand
it: multiple policies support in cpufreq should be optional and
selectable in Kconfig so that systems which don't need that, don't
have to see or use it. It is yet another feature which doesn't apply
universally so we make such features optional. Like the rest of the
gazillion things in the kernel already.

The existing sysfs layout cannot be changed because you're breaking
userspace and we don't do that. It is that simple.

Concerning adding new sysfs entries, I told you to make it as easy as
possible and as sensible as possible, dictated by the use cases. If you
can't come up with some, then talk to the people who are going to use
your design and ask them what makes sense the most.

*Then* write the code.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 0/4] CPUFreq: Implement per policy instances of governors
  2013-02-04 13:36             ` Borislav Petkov
  2013-02-04 13:58               ` Viresh Kumar
@ 2013-02-05  9:36               ` Viresh Kumar
  2013-02-05 11:29                 ` Charles Garcia-Tobin
  1 sibling, 1 reply; 44+ messages in thread
From: Viresh Kumar @ 2013-02-05  9:36 UTC (permalink / raw)
  To: Borislav Petkov, Viresh Kumar, Rafael J. Wysocki, cpufreq,
	linux-pm, linux-kernel, linaro-dev, robin.randhawa,
	Steve.Bannister, Liviu.Dudau, Charles Garcia-Tobin

On 4 February 2013 19:06, Borislav Petkov <bp@alien8.de> wrote:
> On Mon, Feb 04, 2013 at 06:55:25PM +0530, Viresh Kumar wrote:

>> Its not only for multicluster system, but a system where multiple cpus
>> have separate clock control and hence multiple policy structures.
>
> What are those systems? Examples?

Qualcomm's ARM based "krait". Currently shipping in millions of Android
phones.

http://en.wikipedia.org/wiki/Krait_(CPU)

Thanks Charles for pointing it out, I knew there is one :)

--
viresh

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 0/4] CPUFreq: Implement per policy instances of governors
  2013-02-05  9:15                             ` Borislav Petkov
@ 2013-02-05  9:47                               ` Viresh Kumar
  2013-02-05 10:27                                 ` Borislav Petkov
  0 siblings, 1 reply; 44+ messages in thread
From: Viresh Kumar @ 2013-02-05  9:47 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Rafael J. Wysocki, cpufreq, linux-pm, linux-kernel, linaro-dev,
	robin.randhawa, Steve.Bannister, Liviu.Dudau,
	Charles Garcia-Tobin

On 5 February 2013 14:45, Borislav Petkov <bp@alien8.de> wrote:
> On Tue, Feb 05, 2013 at 12:50:31PM +0530, Viresh Kumar wrote:
>> > I think this is cleaner but whatever - I don't care that much. My
>> > only strong concern is that this thing should be a Kconfig option and
>> > optional for arches where it doesn't apply.
>>
>> Your concern is: we don't want to fix userspace for existing platforms
>> where we have just a single cluster and so struct policy in the system.
>
> No, as I said so many times already and you're unwilling to understand
> it:

I am willing to, but not able to :)

> multiple policies support in cpufreq should be optional and
> selectable in Kconfig so that systems which don't need that, don't
> have to see or use it. It is yet another feature which doesn't apply
> universally so we make such features optional. Like the rest of the
> gazillion things in the kernel already.

I understand what Kconfig options are for, but i am not able to understand
what's the benefit of this option here. For example: for single image solutions
we need to keep it enabled. And so, would need some sort of logic in cpufreq
core & platform driver to decide where to create the governors directory.

The code without Kconfig option would be as simple as:

platform_driver:
init(struct cpufreq_policy *policy)
{
..
    policy->have_multiple_policies = true;
..
}

cpufreq-core:

add_dev()
{
    if (policy->have_multiple_policies)
        create-folder-in-cpu/cpu*/cpufreq;
    else
        create-folder-in-cpu/cpufreq;
}

And so, platforms like Krait or big.LITTLE can set it to true from their
cpufreq-drivers. And this wouldn't break any of the current platforms.

> The existing sysfs layout cannot be changed because you're breaking
> userspace and we don't do that. It is that simple.

That's fine. I understood it already. :)

The problem i see is:
- both governor tunables, cpufreq-stats & policy tunables (P-states) have the
same requirement. They are all per policy or clock-domain, instead of per cpu.
- I want to keep all of these at the same place, as they should be
present in the
same hierarchy.
- If we move everything to cpu/cpufreq/policy-names/ then also we would break
existing userspace stuff for stats and P-states.
- If we move everything to cpu/cpu*/cpufreq/ then also we would break
existing userspace stuff for governors.

> Concerning adding new sysfs entries, I told you to make it as easy as
> possible and as sensible as possible, dictated by the use cases. If you
> can't come up with some, then talk to the people who are going to use
> your design and ask them what makes sense the most.

For me cpu/cpu*/ is the most sensible as it is an very easy/convenient interface
for users. I am the first one who is going to use it :)

@Rafael: What's your view on this discussion we are having? We probably need few
more "minds" to jump in, as we are not moving towards a conclusion. :)

--
viresh

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 0/4] CPUFreq: Implement per policy instances of governors
  2013-02-05  9:47                               ` Viresh Kumar
@ 2013-02-05 10:27                                 ` Borislav Petkov
  2013-02-05 10:43                                   ` Viresh Kumar
  0 siblings, 1 reply; 44+ messages in thread
From: Borislav Petkov @ 2013-02-05 10:27 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael J. Wysocki, cpufreq, linux-pm, linux-kernel, linaro-dev,
	robin.randhawa, Steve.Bannister, Liviu.Dudau,
	Charles Garcia-Tobin

On Tue, Feb 05, 2013 at 03:17:21PM +0530, Viresh Kumar wrote:
> > multiple policies support in cpufreq should be optional and
> > selectable in Kconfig so that systems which don't need that, don't
> > have to see or use it. It is yet another feature which doesn't apply
> > universally so we make such features optional. Like the rest of the
> > gazillion things in the kernel already.
> 
> I understand what Kconfig options are for, but i am not able to understand
> what's the benefit of this option here.

Are you kidding me? You're simply not reading what I'm saying to you:
"... should be optional and selectable in Kconfig so that systems which
don't need that, don't have to see or use it." Because on those systems
it doesn't apply.

How about we add an x86-specific extension which is a big wad of code
and is needlessly run on ARM just because it is easier?

That's why we do config options, so that code which doesn't apply on a
specific system, doesn't run on it.

Goddammit, how hard is to understand that??!

> For example: for single image solutions we need to keep it enabled.

So keep it enabled!

> And so, would need some sort of logic in cpufreq core & platform
> driver to decide where to create the governors directory.
>
> The code without Kconfig option would be as simple as:

> 
> platform_driver:
> init(struct cpufreq_policy *policy)
> {
> ..
>     policy->have_multiple_policies = true;
> ..
> }
> 
> cpufreq-core:
> 
> add_dev()
> {
>     if (policy->have_multiple_policies)
>         create-folder-in-cpu/cpu*/cpufreq;
>     else
>         create-folder-in-cpu/cpufreq;

Yes, this is how it could be done.

And this is what I mean by making it optional:

You go and abstract away the "create-folder-in-cpu/cpu*/cpufreq"
functionality. If this is a function called
add_additional_sysfs_entries(), for example, it should do nothing when
CONFIG_CPUFREQ_MULTIPLE_POLICIES is disabled. Otherwise, it will do your
dance:

#ifdef CONFIG_CPUFREQ_MULTIPLE_POLICIES
static int add_additional_sysfs_entries(...)
{

	do all stuff required for multiple policies

}
#else /* CONFIG_CPUFREQ_MULTIPLE_POLICIES */
static int add_additional_sysfs_entries(...)
{
	return 0;
}
#endif /* CONFIG_CPUFREQ_MULTIPLE_POLICIES */

and all the rest of the stuff which is needed for multiple system
policies, should be abstracted that way, more functions added, whatever.
If it is starting to become more, you can create your own compilation
unit. And so on and so on... the kernel is full of examples how to do
stuff like that.

> And so, platforms like Krait or big.LITTLE can set it to true from their
> cpufreq-drivers. And this wouldn't break any of the current platforms.
> 
> > The existing sysfs layout cannot be changed because you're breaking
> > userspace and we don't do that. It is that simple.
> 
> That's fine. I understood it already. :)

Not really, you obviously didn't.

> The problem i see is:
> - both governor tunables, cpufreq-stats & policy tunables (P-states) have the
> same requirement. They are all per policy or clock-domain, instead of per cpu.
> - I want to keep all of these at the same place, as they should be
> present in the
> same hierarchy.
> - If we move everything to cpu/cpufreq/policy-names/ then also we would break
> existing userspace stuff for stats and P-states.
> - If we move everything to cpu/cpu*/cpufreq/ then also we would break
> existing userspace stuff for governors.

No, you're not allowed to change existing sysfs layout. FULLSTOP.

Simply add the new stuff to cpu/cpu*/cpufreq/ with code which
is enabled when CONFIG_CPUFREQ_MULTIPLE_POLICIES is set. If
CONFIG_CPUFREQ_MULTIPLE_POLICIES is not enabled, nothing changes.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 0/4] CPUFreq: Implement per policy instances of governors
  2013-02-05 10:27                                 ` Borislav Petkov
@ 2013-02-05 10:43                                   ` Viresh Kumar
  2013-02-05 11:04                                     ` Borislav Petkov
  0 siblings, 1 reply; 44+ messages in thread
From: Viresh Kumar @ 2013-02-05 10:43 UTC (permalink / raw)
  To: Borislav Petkov, Viresh Kumar, Rafael J. Wysocki, cpufreq,
	linux-pm, linux-kernel, linaro-dev, robin.randhawa,
	Steve.Bannister, Liviu.Dudau, Charles Garcia-Tobin

On 5 February 2013 15:57, Borislav Petkov <bp@alien8.de> wrote:
> Are you kidding me? You're simply not reading what I'm saying to you:
> "... should be optional and selectable in Kconfig so that systems which
> don't need that, don't have to see or use it." Because on those systems
> it doesn't apply.
>
> How about we add an x86-specific extension which is a big wad of code
> and is needlessly run on ARM just because it is easier?

There isn't lot of code that we have to keep inside the macro you suggest.
Its just an if else (with single line block), which would give the parent
kobject. Nothing else.

I didn't wanted to create a macro for just that. For me an if/else is not that
big code.

Anyway, if nobody else comes on my side i can create that macro for you.
But, personally i would prefer code without such macros.

--
viresh

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 0/4] CPUFreq: Implement per policy instances of governors
  2013-02-05 10:43                                   ` Viresh Kumar
@ 2013-02-05 11:04                                     ` Borislav Petkov
  2013-02-05 11:12                                       ` Viresh Kumar
  0 siblings, 1 reply; 44+ messages in thread
From: Borislav Petkov @ 2013-02-05 11:04 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael J. Wysocki, cpufreq, linux-pm, linux-kernel, linaro-dev,
	robin.randhawa, Steve.Bannister, Liviu.Dudau,
	Charles Garcia-Tobin

On Tue, Feb 05, 2013 at 04:13:23PM +0530, Viresh Kumar wrote:
> There isn't lot of code that we have to keep inside the macro you
> suggest. Its just an if else (with single line block), which would
> give the parent kobject. Nothing else.
>
> I didn't wanted to create a macro for just that. For me an if/else is
> not that big code.

Yeah, I imagine for you it isn't, no.

> Anyway, if nobody else comes on my side i can create that macro for you.
> But, personally i would prefer code without such macros.

Here's an even cleaner way:

platform_driver:
init(struct cpufreq_policy *policy)
{
	...

	add_additional_sysfs_entries(policy);

	...
}

...

static void add_additional_sysfs_entries(struct cpufreq_policy *policy)
{
#ifdef CONFIG_CPUFREQ_MULTIPLE_POLICIES
	create-folder-in-cpu/cpu*/cpufreq;
	...
#endif
}

and the platform driver will have in its Kconfig section:

config CPUFREQ_PLATFORM_DRIVER_X
	...
	select CPUFREQ_MULTIPLE_POLICIES


You don't need the policy->have_multiple_policies member even.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 0/4] CPUFreq: Implement per policy instances of governors
  2013-02-05 11:04                                     ` Borislav Petkov
@ 2013-02-05 11:12                                       ` Viresh Kumar
  2013-02-05 11:19                                         ` Borislav Petkov
  0 siblings, 1 reply; 44+ messages in thread
From: Viresh Kumar @ 2013-02-05 11:12 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Rafael J. Wysocki, cpufreq, linux-pm, linux-kernel, linaro-dev,
	robin.randhawa, Steve.Bannister, Liviu.Dudau,
	Charles Garcia-Tobin

On 5 February 2013 16:34, Borislav Petkov <bp@alien8.de> wrote:
> Here's an even cleaner way:
>
> platform_driver:
> init(struct cpufreq_policy *policy)
> {
>         ...
>
>         add_additional_sysfs_entries(policy);
>
>         ...
> }
>
> ...
>
> static void add_additional_sysfs_entries(struct cpufreq_policy *policy)
> {
> #ifdef CONFIG_CPUFREQ_MULTIPLE_POLICIES
>         create-folder-in-cpu/cpu*/cpufreq;
>         ...
> #endif
> }
>
> and the platform driver will have in its Kconfig section:
>
> config CPUFREQ_PLATFORM_DRIVER_X
>         ...
>         select CPUFREQ_MULTIPLE_POLICIES
>
>
> You don't need the policy->have_multiple_policies member even.

Tricky part is the name of this routine: add_additional_sysfs_entries().

I am not proposing additional set of directories inside cpu/cpu*/cpufreq/
but either of cpu/cpufreq/ or cpu/cpu*/cpufreq/ for governor.

And so, keeping that additional variable looks a better solution. Let me
get a patch with this change only and see how it looks.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 0/4] CPUFreq: Implement per policy instances of governors
  2013-02-05 11:12                                       ` Viresh Kumar
@ 2013-02-05 11:19                                         ` Borislav Petkov
  2013-02-05 11:26                                           ` Viresh Kumar
  0 siblings, 1 reply; 44+ messages in thread
From: Borislav Petkov @ 2013-02-05 11:19 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael J. Wysocki, cpufreq, linux-pm, linux-kernel, linaro-dev,
	robin.randhawa, Steve.Bannister, Liviu.Dudau,
	Charles Garcia-Tobin

On Tue, Feb 05, 2013 at 04:42:23PM +0530, Viresh Kumar wrote:
> Tricky part is the name of this routine: add_additional_sysfs_entries().

Now you're just being silly - this is just an example how to do it. If
you want me to do it for ya, you need to send me your monthly salary.

> And so, keeping that additional variable looks a better solution.

Yeah, I don't think you understand me at all. :(

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 0/4] CPUFreq: Implement per policy instances of governors
  2013-02-05 11:19                                         ` Borislav Petkov
@ 2013-02-05 11:26                                           ` Viresh Kumar
  2013-02-05 11:32                                             ` Borislav Petkov
  0 siblings, 1 reply; 44+ messages in thread
From: Viresh Kumar @ 2013-02-05 11:26 UTC (permalink / raw)
  To: Borislav Petkov, Viresh Kumar, Rafael J. Wysocki, cpufreq,
	linux-pm, linux-kernel, linaro-dev, robin.randhawa,
	Steve.Bannister, Liviu.Dudau, Charles Garcia-Tobin

On 5 February 2013 16:49, Borislav Petkov <bp@alien8.de> wrote:
> On Tue, Feb 05, 2013 at 04:42:23PM +0530, Viresh Kumar wrote:
>> Tricky part is the name of this routine: add_additional_sysfs_entries().
>
> Now you're just being silly - this is just an example how to do it. If
> you want me to do it for ya, you need to send me your monthly salary.

Haha... I don't want you to do it. I don't want such routine to be exposed
to cpufreq drivers as this belongs to the core code.

Just some kind of indication from platform driver is required about
how/where it wants its governor directory to be present.
And the variable suits more here.

Lets discuss it more on the next patch i send :)

--
viresh

^ permalink raw reply	[flat|nested] 44+ messages in thread

* RE: [PATCH 0/4] CPUFreq: Implement per policy instances of governors
  2013-02-05  9:36               ` Viresh Kumar
@ 2013-02-05 11:29                 ` Charles Garcia-Tobin
  2013-02-05 11:39                   ` Borislav Petkov
  0 siblings, 1 reply; 44+ messages in thread
From: Charles Garcia-Tobin @ 2013-02-05 11:29 UTC (permalink / raw)
  To: Viresh Kumar, Borislav Petkov, Rafael J. Wysocki, cpufreq,
	linux-pm, linux-kernel, linaro-dev, Robin Randhawa,
	Steve Bannister, Liviu Dudau

>
> Qualcomm's ARM based "krait". Currently shipping in millions of Android
> phones.
>
> http://en.wikipedia.org/wiki/Krait_(CPU)
>
> Thanks Charles for pointing it out, I knew there is one :)
>
> --
> viresh

> On 4 February 2013 19:06, Borislav Petkov <bp@alien8.de> wrote:
> > On Mon, Feb 04, 2013 at 06:55:25PM +0530, Viresh Kumar wrote:
>
> >> Its not only for multicluster system, but a system where multiple cpus
> >> have separate clock control and hence multiple policy structures.
> >
> > What are those systems? Examples?
>
> Qualcomm's ARM based "krait". Currently shipping in millions of Android
> phones.
>
> http://en.wikipedia.org/wiki/Krait_(CPU)
>
> Thanks Charles for pointing it out, I knew there is one :)
>
> --
> viresh

Actually shooting myself in the foot here, Krait is not such a great example because although you can use difference between frequencies you are less likely to use different tunables (not inconceivable but unlikely). The best examples systems are multi cluster and hereterogeneous systems, like the recently announced Samsung Exynos 5 octa http://en.wikipedia.org/wiki/Exynos_(system_on_chip). We will see more systems like this appearing, sporting low power cores combined with high performance ones, all running at the same time. I appreciate this is all very new, but more will come, and the requirement to have different tunables per cluster is very real. In ARM on our own multi cluster test chip, using an experimental version of this approach, we have seen good improvements in power consumption without compromising performance.

(Apologies ahead for any bit my mail server appends, not much I can do about it)

-- IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium.  Thank you.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 0/4] CPUFreq: Implement per policy instances of governors
  2013-02-05 11:26                                           ` Viresh Kumar
@ 2013-02-05 11:32                                             ` Borislav Petkov
  2013-02-05 12:24                                               ` Viresh Kumar
  0 siblings, 1 reply; 44+ messages in thread
From: Borislav Petkov @ 2013-02-05 11:32 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael J. Wysocki, cpufreq, linux-pm, linux-kernel, linaro-dev,
	robin.randhawa, Steve.Bannister, Liviu.Dudau,
	Charles Garcia-Tobin

On Tue, Feb 05, 2013 at 04:56:03PM +0530, Viresh Kumar wrote:
> Just some kind of indication from platform driver is required about
> how/where it wants its governor directory to be present.

The indication is this:

config CPUFREQ_PLATFORM_DRIVER_X
        ...
        select CPUFREQ_MULTIPLE_POLICIES

You really need to slow down and really look at what I'm proposing.

> And the variable suits more here.

And this variable means nothing on other systems so why add it in the
first place?

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 0/4] CPUFreq: Implement per policy instances of governors
  2013-02-05 11:29                 ` Charles Garcia-Tobin
@ 2013-02-05 11:39                   ` Borislav Petkov
  2013-02-05 18:38                     ` Charles Garcia-Tobin
  0 siblings, 1 reply; 44+ messages in thread
From: Borislav Petkov @ 2013-02-05 11:39 UTC (permalink / raw)
  To: Charles Garcia-Tobin
  Cc: Viresh Kumar, Rafael J. Wysocki, cpufreq, linux-pm, linux-kernel,
	linaro-dev, Robin Randhawa, Steve Bannister, Liviu Dudau

On Tue, Feb 05, 2013 at 11:29:04AM +0000, Charles Garcia-Tobin wrote:
> Actually shooting myself in the foot here, Krait is not such a great
> example because although you can use difference between frequencies
> you are less likely to use different tunables (not inconceivable
> but unlikely). The best examples systems are multi cluster and
> hereterogeneous systems, like the recently announced Samsung Exynos 5
> octa http://en.wikipedia.org/wiki/Exynos_(system_on_chip). We will see
> more systems like this appearing, sporting low power cores combined
> with high performance ones, all running at the same time. I appreciate
> this is all very new, but more will come, and the requirement to have
> different tunables per cluster is very real. In ARM on our own multi
> cluster test chip, using an experimental version of this approach, we
> have seen good improvements in power consumption without compromising
> performance.

Ok, thanks for giving this insight, this is useful.

Question: do you need the granularity of that control to be per cpu
(with that I mean what linux understands under "cpu," i.e. logical or
physical core) or does one governor suffice per a set of cores, or as
you call it, a cluster?

> (Apologies ahead for any bit my mail server appends, not much I can do
> about it)

Yeah, my condolences :-)

> -- IMPORTANT NOTICE: The contents of this email and any attachments
> are confidential and may also be privileged. If you are not the
> intended recipient, please notify the sender immediately and do not
> disclose the contents to any other person, use it for any purpose, or
> store or copy the information in any medium. Thank you.

Leaving it in, in case you haven't seen how it looks like :-)

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 0/4] CPUFreq: Implement per policy instances of governors
  2013-02-05 11:32                                             ` Borislav Petkov
@ 2013-02-05 12:24                                               ` Viresh Kumar
  2013-02-05 13:22                                                 ` Borislav Petkov
  0 siblings, 1 reply; 44+ messages in thread
From: Viresh Kumar @ 2013-02-05 12:24 UTC (permalink / raw)
  To: Borislav Petkov, Viresh Kumar, Rafael J. Wysocki, cpufreq,
	linux-pm, linux-kernel, linaro-dev, robin.randhawa,
	Steve.Bannister, Liviu.Dudau, Charles Garcia-Tobin

On 5 February 2013 17:02, Borislav Petkov <bp@alien8.de> wrote:
> On Tue, Feb 05, 2013 at 04:56:03PM +0530, Viresh Kumar wrote:
>> Just some kind of indication from platform driver is required about
>> how/where it wants its governor directory to be present.
>
> The indication is this:
>
> config CPUFREQ_PLATFORM_DRIVER_X
>         ...
>         select CPUFREQ_MULTIPLE_POLICIES
>
> You really need to slow down and really look at what I'm proposing.

This indication isn't enough. On a single image solution, we need to
identify the system which needs support for multiple policies and
i still feel we need that variable type indication :)

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 0/4] CPUFreq: Implement per policy instances of governors
  2013-02-05 12:24                                               ` Viresh Kumar
@ 2013-02-05 13:22                                                 ` Borislav Petkov
  2013-02-05 13:55                                                   ` Viresh Kumar
  0 siblings, 1 reply; 44+ messages in thread
From: Borislav Petkov @ 2013-02-05 13:22 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael J. Wysocki, cpufreq, linux-pm, linux-kernel, linaro-dev,
	robin.randhawa, Steve.Bannister, Liviu.Dudau,
	Charles Garcia-Tobin

On Tue, Feb 05, 2013 at 05:54:57PM +0530, Viresh Kumar wrote:q
> This indication isn't enough. On a single image solution, we need to
> identify the system which needs support for multiple policies and i
> still feel we need that variable type indication :)

If the image is going to run also on systems which support only a
single policy, then I guess you can make it a bool, stuff it in struct
cpufreq_policy and ifdef around it.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 0/4] CPUFreq: Implement per policy instances of governors
  2013-02-05 13:22                                                 ` Borislav Petkov
@ 2013-02-05 13:55                                                   ` Viresh Kumar
  0 siblings, 0 replies; 44+ messages in thread
From: Viresh Kumar @ 2013-02-05 13:55 UTC (permalink / raw)
  To: Borislav Petkov, Viresh Kumar, Rafael J. Wysocki, cpufreq,
	linux-pm, linux-kernel, linaro-dev, robin.randhawa,
	Steve.Bannister, Liviu.Dudau, Charles Garcia-Tobin

On 5 February 2013 18:52, Borislav Petkov <bp@alien8.de> wrote:
> On Tue, Feb 05, 2013 at 05:54:57PM +0530, Viresh Kumar wrote:q
>> This indication isn't enough. On a single image solution, we need to
>> identify the system which needs support for multiple policies and i
>> still feel we need that variable type indication :)
>
> If the image is going to run also on systems which support only a
> single policy, then I guess you can make it a bool, stuff it in struct
> cpufreq_policy and ifdef around it.

That's what i proposed initially.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 0/4] CPUFreq: Implement per policy instances of governors
  2013-02-04 11:38 [PATCH 0/4] CPUFreq: Implement per policy instances of governors Viresh Kumar
                   ` (4 preceding siblings ...)
  2013-02-04 12:17 ` [PATCH 0/4] CPUFreq: " Rafael J. Wysocki
@ 2013-02-05 16:21 ` Viresh Kumar
  2013-02-06  9:58   ` Viresh Kumar
  5 siblings, 1 reply; 44+ messages in thread
From: Viresh Kumar @ 2013-02-05 16:21 UTC (permalink / raw)
  To: rjw, Borislav Petkov
  Cc: cpufreq, linux-pm, linux-kernel, linaro-dev, robin.randhawa,
	Steve.Bannister, Liviu.Dudau, Viresh Kumar, Charles Garcia-Tobin

On 4 February 2013 17:08, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> Currently, there can't be multiple instances of single governor_type. If we have
> a multi-package system, where we have multiple instances of struct policy (per
> package), we can't have multiple instances of same governor. i.e. We can't have
> multiple instances of ondemand governor for multiple packages.
>
> Governors directory in sysfs is created at /sys/devices/system/cpu/cpufreq/
> governor-name/. Which again reflects that there can be only one instance of a
> governor_type in the system.
>
> This is a bottleneck for multicluster system, where we want different packages
> to use same governor type, but with different tunables.
>
> This patchset is inclined towards fixing this issue.

After a long & hot discussion over the changes made by this patchset, following
patches came out:

--------------x------------------x-------------------

commit 15b5548c9ccfb8088270f7574710d9d67edfe33b
Author: Viresh Kumar <viresh.kumar@linaro.org>
Date:   Tue Feb 5 21:29:05 2013 +0530

    cpufreq: Make governors directory sysfs location based on
have_multiple_policies

    Until now directory for governors tunables was getting created in
    cpu/cpufreq/<gov-name>. With the introduction of following patch:
    "cpufreq: governor: Implement per policy instances of governors"

    this directory would be created in
cpu/cpu<num>/cpufreq/<gov-name>. This might
    break userspace of existing platforms. Lets do this change only
for platforms
    which need support for multiple policies and thus above mentioned patch.

    From now on, such platforms would be require to do following from
their init()
    routines:

        policy->have_multiple_policies = true;

    Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/cpufreq_governor.c |  2 +-
 include/linux/cpufreq.h            | 14 ++++++++++++++
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/cpufreq/cpufreq_governor.c
b/drivers/cpufreq/cpufreq_governor.c
index 545ae24..41ee86f 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -300,7 +300,7 @@ int cpufreq_governor_dbs(struct cpufreq_policy *policy,
                                             dbs_data->cdata->gov_dbs_timer);
                }

-               rc = sysfs_create_group(&policy->kobj,
+               rc = sysfs_create_group(get_governor_parent_kobj(policy),
                                dbs_data->cdata->attr_group);
                if (rc) {
                        mutex_unlock(&dbs_data->mutex);
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index 5dae7e6..6e1abd2 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -107,6 +107,11 @@ struct cpufreq_policy {
        unsigned int            policy; /* see above */
        struct cpufreq_governor *governor; /* see below */
        void                    *governor_data;
+       /* This should be set by init() of platforms having multiple
+        * clock-domains, i.e.  supporting multiple policies. With this sysfs
+        * directories of governor would be created in cpu/cpu<num>/cpufreq/
+        * directory */
+       bool                    have_multiple_policies;

        struct work_struct      update; /* if update_policy() needs to be
                                         * called, but you're in IRQ context */
@@ -134,6 +139,15 @@ static inline bool policy_is_shared(struct
cpufreq_policy *policy)
        return cpumask_weight(policy->cpus) > 1;
 }

+static inline struct kobject *
+get_governor_parent_kobj(struct cpufreq_policy *policy)
+{
+       if (policy->have_multiple_policies)
+               return &policy->kobj;
+       else
+               return cpufreq_global_kobject;
+}
+
 /******************** cpufreq transition notifiers *******************/

 #define CPUFREQ_PRECHANGE      (0)

--------------x------------------x-------------------

Plus the following patch, though i am still not in favor of this patch.
@Rafael: Please share your opinion too on this one. :)

--------------x------------------x-------------------
commit 1c7e9871fce7388136eda1c86ca75a520e4d3b9d
Author: Viresh Kumar <viresh.kumar@linaro.org>
Date:   Tue Feb 5 21:41:40 2013 +0530

    cpufreq: Add Kconfig option to enable/disable have_multiple_policies

    have_multiple_policies is required by platforms having multiple
clock-domains
    for cpus, i.e. supporting multiple policies for cpus. This patch adds in a
    Kconfig option for enabling execution of this code.

    Requested-by: Borislav Petkov <bp@alien8.de>
    Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 drivers/cpufreq/Kconfig | 3 +++
 include/linux/cpufreq.h | 4 ++++
 2 files changed, 7 insertions(+)

diff --git a/drivers/cpufreq/Kconfig b/drivers/cpufreq/Kconfig
index cbcb21e..e6e6939 100644
--- a/drivers/cpufreq/Kconfig
+++ b/drivers/cpufreq/Kconfig
@@ -23,6 +23,9 @@ config CPU_FREQ_TABLE
 config CPU_FREQ_GOV_COMMON
        bool

+config CPU_FREQ_HAVE_MULTIPLE_POLICIES
+       bool
+
 config CPU_FREQ_STAT
        tristate "CPU frequency translation statistics"
        select CPU_FREQ_TABLE
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index 6e1abd2..a092fcb 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -107,11 +107,13 @@ struct cpufreq_policy {
        unsigned int            policy; /* see above */
        struct cpufreq_governor *governor; /* see below */
        void                    *governor_data;
+#ifdef CONFIG_CPU_FREQ_HAVE_MULTIPLE_POLICIES
        /* This should be set by init() of platforms having multiple
         * clock-domains, i.e.  supporting multiple policies. With this sysfs
         * directories of governor would be created in cpu/cpu<num>/cpufreq/
         * directory */
        bool                    have_multiple_policies;
+#endif

        struct work_struct      update; /* if update_policy() needs to be
                                         * called, but you're in IRQ context */
@@ -142,9 +144,11 @@ static inline bool policy_is_shared(struct
cpufreq_policy *policy)
 static inline struct kobject *
 get_governor_parent_kobj(struct cpufreq_policy *policy)
 {
+#ifdef CONFIG_CPU_FREQ_HAVE_MULTIPLE_POLICIES
        if (policy->have_multiple_policies)
                return &policy->kobj;
        else
+#endif
                return cpufreq_global_kobject;
 }


--------------x------------------x-------------------

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* RE: [PATCH 0/4] CPUFreq: Implement per policy instances of governors
  2013-02-05 11:39                   ` Borislav Petkov
@ 2013-02-05 18:38                     ` Charles Garcia-Tobin
  2013-02-05 18:44                       ` Borislav Petkov
  0 siblings, 1 reply; 44+ messages in thread
From: Charles Garcia-Tobin @ 2013-02-05 18:38 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Viresh Kumar, Rafael J. Wysocki, cpufreq, linux-pm, linux-kernel,
	linaro-dev, Robin Randhawa, Steve Bannister, Liviu Dudau

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset=UTF-8, Size: 2053 bytes --]

>
> On Tue, Feb 05, 2013 at 11:29:04AM +0000, Charles Garcia-Tobin wrote:
> > Actually shooting myself in the foot here, Krait is not such a great
> > example because although you can use difference between frequencies
> > you are less likely to use different tunables (not inconceivable
> > but unlikely). The best examples systems are multi cluster and
> > hereterogeneous systems, like the recently announced Samsung Exynos 5
> > octa http://en.wikipedia.org/wiki/Exynos_(system_on_chip). We will see
> > more systems like this appearing, sporting low power cores combined
> > with high performance ones, all running at the same time. I appreciate
> > this is all very new, but more will come, and the requirement to have
> > different tunables per cluster is very real. In ARM on our own multi
> > cluster test chip, using an experimental version of this approach, we
> > have seen good improvements in power consumption without compromising
> > performance.
>
> Ok, thanks for giving this insight, this is useful.
>
> Question: do you need the granularity of that control to be per cpu
> (with that I mean what linux understands under "cpu," i.e. logical or
> physical core) or does one governor suffice per a set of cores, or as
> you call it, a cluster?
>

Later. Whatever you'd like to call it, but essentially a set of cpus, as linux understands them, that are logically related by the fact that you'd like to be able to use the same tuning policy and same governor across all of them.

Cheers

Charles

(apologies again for annoying addendums to follow)


-- IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium.  Thank you.
ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dʇڙë,j\a­¢f£¢·hšïêÿ‘êçz_è®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨è­Ú&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 0/4] CPUFreq: Implement per policy instances of governors
  2013-02-05 18:38                     ` Charles Garcia-Tobin
@ 2013-02-05 18:44                       ` Borislav Petkov
  0 siblings, 0 replies; 44+ messages in thread
From: Borislav Petkov @ 2013-02-05 18:44 UTC (permalink / raw)
  To: Charles Garcia-Tobin
  Cc: Viresh Kumar, Rafael J. Wysocki, cpufreq, linux-pm, linux-kernel,
	linaro-dev, Robin Randhawa, Steve Bannister, Liviu Dudau

On Tue, Feb 05, 2013 at 06:38:07PM +0000, Charles Garcia-Tobin wrote:
> Later. Whatever you'd like to call it, but essentially a set of cpus,
> as linux understands them, that are logically related by the fact that
> you'd like to be able to use the same tuning policy and same governor
> across all of them.

Right, policy will be applied to the whole set, yes, but can you imagine
that per-core settings could also make sense at some point?

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 0/4] CPUFreq: Implement per policy instances of governors
  2013-02-05 16:21 ` Viresh Kumar
@ 2013-02-06  9:58   ` Viresh Kumar
  2013-02-06 10:08     ` Amit Kucheria
  0 siblings, 1 reply; 44+ messages in thread
From: Viresh Kumar @ 2013-02-06  9:58 UTC (permalink / raw)
  To: rjw, Borislav Petkov
  Cc: cpufreq, linux-pm, linux-kernel, linaro-dev, robin.randhawa,
	Steve.Bannister, Liviu.Dudau, Viresh Kumar, Charles Garcia-Tobin

On 5 February 2013 21:51, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> commit 15b5548c9ccfb8088270f7574710d9d67edfe33b
> Author: Viresh Kumar <viresh.kumar@linaro.org>
> Date:   Tue Feb 5 21:29:05 2013 +0530
>
>     cpufreq: Make governors directory sysfs location based on
> have_multiple_policies
>
>     Until now directory for governors tunables was getting created in
>     cpu/cpufreq/<gov-name>. With the introduction of following patch:
>     "cpufreq: governor: Implement per policy instances of governors"
>
>     this directory would be created in
> cpu/cpu<num>/cpufreq/<gov-name>. This might
>     break userspace of existing platforms. Lets do this change only
> for platforms
>     which need support for multiple policies and thus above mentioned patch.
>
>     From now on, such platforms would be require to do following from
> their init()
>     routines:
>
>         policy->have_multiple_policies = true;
>
>     Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> ---
>  drivers/cpufreq/cpufreq_governor.c |  2 +-
>  include/linux/cpufreq.h            | 14 ++++++++++++++
>  2 files changed, 15 insertions(+), 1 deletion(-)

Hi Rafael,

Because this patch was quite big (317 insertions(+), 238 deletions(-)), i was
planning a detailed self review to capture any mistakes and luckily i found
one for above patch :)

diff --git a/drivers/cpufreq/cpufreq_governor.c
b/drivers/cpufreq/cpufreq_governor.c
index 41ee86f..fe037c0 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -342,7 +342,8 @@ int cpufreq_governor_dbs(struct cpufreq_policy *policy,
                mutex_lock(&dbs_data->mutex);
                mutex_destroy(&cpu_cdbs->timer_mutex);

-               sysfs_remove_group(&policy->kobj, dbs_data->cdata->attr_group);
+               sysfs_remove_group(get_governor_parent_kobj(policy),
+                               dbs_data->cdata->attr_group);
                if (dbs_data->cdata->governor == GOV_CONSERVATIVE)
                        cpufreq_unregister_notifier(cs_ops->notifier_block,
                                        CPUFREQ_TRANSITION_NOTIFIER);

I have pushed the complete patchset here:

http://git.linaro.org/gitweb?p=people/vireshk/linux.git;a=shortlog;h=refs/heads/cpufreq-updates

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH 0/4] CPUFreq: Implement per policy instances of governors
  2013-02-06  9:58   ` Viresh Kumar
@ 2013-02-06 10:08     ` Amit Kucheria
  2013-02-06 10:15       ` Viresh Kumar
  0 siblings, 1 reply; 44+ messages in thread
From: Amit Kucheria @ 2013-02-06 10:08 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: rjw, Borislav Petkov, Steve Bannister, Lists linaro-dev,
	linux-pm, linux-kernel, cpufreq, Charles Garcia-Tobin

On Wed, Feb 6, 2013 at 3:28 PM, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> On 5 February 2013 21:51, Viresh Kumar <viresh.kumar@linaro.org> wrote:
>> commit 15b5548c9ccfb8088270f7574710d9d67edfe33b
>> Author: Viresh Kumar <viresh.kumar@linaro.org>
>> Date:   Tue Feb 5 21:29:05 2013 +0530
>>
>>     cpufreq: Make governors directory sysfs location based on
>> have_multiple_policies
>>
>>     Until now directory for governors tunables was getting created in
>>     cpu/cpufreq/<gov-name>. With the introduction of following patch:
>>     "cpufreq: governor: Implement per policy instances of governors"
>>
>>     this directory would be created in
>> cpu/cpu<num>/cpufreq/<gov-name>. This might
>>     break userspace of existing platforms. Lets do this change only
>> for platforms
>>     which need support for multiple policies and thus above mentioned patch.
>>
>>     From now on, such platforms would be require to do following from
>> their init()
>>     routines:
>>
>>         policy->have_multiple_policies = true;
>>
>>     Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
>> ---
>>  drivers/cpufreq/cpufreq_governor.c |  2 +-
>>  include/linux/cpufreq.h            | 14 ++++++++++++++
>>  2 files changed, 15 insertions(+), 1 deletion(-)
>
> Hi Rafael,
>
> Because this patch was quite big (317 insertions(+), 238 deletions(-)), i was
> planning a detailed self review to capture any mistakes and luckily i found
> one for above patch :)
>
> diff --git a/drivers/cpufreq/cpufreq_governor.c
> b/drivers/cpufreq/cpufreq_governor.c
> index 41ee86f..fe037c0 100644
> --- a/drivers/cpufreq/cpufreq_governor.c
> +++ b/drivers/cpufreq/cpufreq_governor.c
> @@ -342,7 +342,8 @@ int cpufreq_governor_dbs(struct cpufreq_policy *policy,
>                 mutex_lock(&dbs_data->mutex);
>                 mutex_destroy(&cpu_cdbs->timer_mutex);
>
> -               sysfs_remove_group(&policy->kobj, dbs_data->cdata->attr_group);
> +               sysfs_remove_group(get_governor_parent_kobj(policy),
> +                               dbs_data->cdata->attr_group);
>                 if (dbs_data->cdata->governor == GOV_CONSERVATIVE)
>                         cpufreq_unregister_notifier(cs_ops->notifier_block,
>                                         CPUFREQ_TRANSITION_NOTIFIER);
>
> I have pushed the complete patchset here:
>
> http://git.linaro.org/gitweb?p=people/vireshk/linux.git;a=shortlog;h=refs/heads/cpufreq-updates
>

Viresh, perhaps you should ask Stephen Rothwell to pull in your tree
to get some more testing before Rafael pulls it in for 3.10?

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 0/4] CPUFreq: Implement per policy instances of governors
  2013-02-06 10:08     ` Amit Kucheria
@ 2013-02-06 10:15       ` Viresh Kumar
  2013-02-06 10:38         ` Rafael J. Wysocki
  0 siblings, 1 reply; 44+ messages in thread
From: Viresh Kumar @ 2013-02-06 10:15 UTC (permalink / raw)
  To: Amit Kucheria
  Cc: rjw, Borislav Petkov, Steve Bannister, Lists linaro-dev,
	linux-pm, linux-kernel, cpufreq, Charles Garcia-Tobin

On 6 February 2013 15:38, Amit Kucheria <amit.kucheria@linaro.org> wrote:
> On Wed, Feb 6, 2013 at 3:28 PM, Viresh Kumar <viresh.kumar@linaro.org> wrote:
>> I have pushed the complete patchset here:
>>
>> http://git.linaro.org/gitweb?p=people/vireshk/linux.git;a=shortlog;h=refs/heads/cpufreq-updates
>>
>
> Viresh, perhaps you should ask Stephen Rothwell to pull in your tree
> to get some more testing before Rafael pulls it in for 3.10?

Its has been made clear by Rafael that these patches wouldn't make it for
3.9 (though i wanted them to :) ), and so once the merge window is over
Rafael might pull them in and so they would reach Stephen's linux-next too...

I am not sure if sending a cpufreq pull request directly to Stephen is
preferred.
@Rafael: ??

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 0/4] CPUFreq: Implement per policy instances of governors
  2013-02-06 10:15       ` Viresh Kumar
@ 2013-02-06 10:38         ` Rafael J. Wysocki
  0 siblings, 0 replies; 44+ messages in thread
From: Rafael J. Wysocki @ 2013-02-06 10:38 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Amit Kucheria, Borislav Petkov, Steve Bannister,
	Lists linaro-dev, linux-pm, linux-kernel, cpufreq,
	Charles Garcia-Tobin

On Wednesday, February 06, 2013 03:45:58 PM Viresh Kumar wrote:
> On 6 February 2013 15:38, Amit Kucheria <amit.kucheria@linaro.org> wrote:
> > On Wed, Feb 6, 2013 at 3:28 PM, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> >> I have pushed the complete patchset here:
> >>
> >> http://git.linaro.org/gitweb?p=people/vireshk/linux.git;a=shortlog;h=refs/heads/cpufreq-updates
> >>
> >
> > Viresh, perhaps you should ask Stephen Rothwell to pull in your tree
> > to get some more testing before Rafael pulls it in for 3.10?
> 
> Its has been made clear by Rafael that these patches wouldn't make it for
> 3.9 (though i wanted them to :) ), and so once the merge window is over
> Rafael might pull them in and so they would reach Stephen's linux-next too...
> 
> I am not sure if sending a cpufreq pull request directly to Stephen is
> preferred.
> @Rafael: ??

You may do that, if you want, but that's slightly confusing.

Also the policy is that material which is not going to be included into v3.9
shouldn't be in linux-next before v3.9-rc1.

Moreover, for build testing it is sufficient to put it into a branch somewhere
at git.kernel.org (as you have already noticed :-)).

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 4/4] cpufreq: governor: Implement per policy instances of governors
  2013-02-04 11:38 ` [PATCH 4/4] cpufreq: governor: Implement per policy instances of governors Viresh Kumar
@ 2013-02-10 21:14   ` Francesco Lavra
  2013-02-11  4:16     ` Viresh Kumar
  0 siblings, 1 reply; 44+ messages in thread
From: Francesco Lavra @ 2013-02-10 21:14 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: rjw, Steve.Bannister, linaro-dev, linux-pm, linux-kernel, cpufreq

Hi,

On 02/04/2013 12:38 PM, Viresh Kumar wrote:
> Currently, there can't be multiple instances of single governor_type. If we have
> a multi-package system, where we have multiple instances of struct policy (per
> package), we can't have multiple instances of same governor. i.e. We can't have
> multiple instances of ondemand governor for multiple packages.
> 
> Governors directory in sysfs is created at /sys/devices/system/cpu/cpufreq/
> governor-name/. Which again reflects that there can be only one instance of a
> governor_type in the system.
> 
> This is a bottleneck for multicluster system, where we want different packages
> to use same governor type, but with different tunables.
> 
> This patch uses the infrastructure provided by earlier patch and implements
> init/exit routines for ondemand and conservative governors.
> 
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> ---
>  drivers/cpufreq/cpufreq.c              |   4 -
>  drivers/cpufreq/cpufreq_conservative.c | 142 +++++++++++++----------
>  drivers/cpufreq/cpufreq_governor.c     | 138 +++++++++++++---------
>  drivers/cpufreq/cpufreq_governor.h     |  42 ++++---
>  drivers/cpufreq/cpufreq_ondemand.c     | 205 +++++++++++++++++++--------------
>  include/linux/cpufreq.h                |  19 +--
>  6 files changed, 314 insertions(+), 236 deletions(-)
> 
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 1ae78d4..7aacfbf 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -1551,9 +1551,6 @@ static int __cpufreq_governor(struct cpufreq_policy *policy,
>  						policy->cpu, event);
>  	ret = policy->governor->governor(policy, event);
>  
> -	if (!policy->governor->initialized && (event == CPUFREQ_GOV_START))
> -		policy->governor->initialized = 1;
> -
>  	/* we keep one module reference alive for
>  			each CPU governed by this CPU */
>  	if ((event != CPUFREQ_GOV_START) || ret)
> @@ -1577,7 +1574,6 @@ int cpufreq_register_governor(struct cpufreq_governor *governor)
>  
>  	mutex_lock(&cpufreq_governor_mutex);
>  
> -	governor->initialized = 0;
>  	err = -EBUSY;
>  	if (__find_governor(governor->name) == NULL) {
>  		err = 0;
> diff --git a/drivers/cpufreq/cpufreq_conservative.c b/drivers/cpufreq/cpufreq_conservative.c
[...]
> +static int cs_init(struct dbs_data *dbs_data)
> +{
> +	struct cs_dbs_tuners *tuners;
> +
> +	tuners = kzalloc(sizeof(struct cs_dbs_tuners), GFP_KERNEL);
> +	if (!tuners) {
> +		pr_err("%s: kzalloc failed\n", __func__);
> +		return -ENOMEM;
> +	}
> +
> +	tuners->up_threshold = DEF_FREQUENCY_UP_THRESHOLD;
> +	tuners->down_threshold = DEF_FREQUENCY_DOWN_THRESHOLD;
> +	tuners->sampling_down_factor = DEF_SAMPLING_DOWN_FACTOR;
> +	tuners->ignore_nice = 0;
> +	tuners->freq_step = 5;
> +
> +	dbs_data->tuners = tuners;

dbs_data->tuners is never freed, which means there is a memory leak
across CPUFREQ_GOV_POLICY_INIT and CPUFREQ_GOV_POLICY_EXIT events.

The same goes for the ondemand governor.

Regards,
Francesco

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 4/4] cpufreq: governor: Implement per policy instances of governors
  2013-02-10 21:14   ` Francesco Lavra
@ 2013-02-11  4:16     ` Viresh Kumar
  2013-02-11  4:39       ` Viresh Kumar
  0 siblings, 1 reply; 44+ messages in thread
From: Viresh Kumar @ 2013-02-11  4:16 UTC (permalink / raw)
  To: Francesco Lavra
  Cc: rjw, Steve.Bannister, linaro-dev, linux-pm, linux-kernel, cpufreq

On 11 February 2013 02:44, Francesco Lavra <francescolavra.fl@gmail.com> wrote:
> dbs_data->tuners is never freed, which means there is a memory leak
> across CPUFREQ_GOV_POLICY_INIT and CPUFREQ_GOV_POLICY_EXIT events.
>
> The same goes for the ondemand governor.

Thanks for pointing out. Would be fixed in next version.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 4/4] cpufreq: governor: Implement per policy instances of governors
  2013-02-11  4:16     ` Viresh Kumar
@ 2013-02-11  4:39       ` Viresh Kumar
  0 siblings, 0 replies; 44+ messages in thread
From: Viresh Kumar @ 2013-02-11  4:39 UTC (permalink / raw)
  To: Francesco Lavra
  Cc: rjw, Steve.Bannister, linaro-dev, linux-pm, linux-kernel, cpufreq

On 11 February 2013 09:46, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> On 11 February 2013 02:44, Francesco Lavra <francescolavra.fl@gmail.com> wrote:
>> dbs_data->tuners is never freed, which means there is a memory leak
>> across CPUFREQ_GOV_POLICY_INIT and CPUFREQ_GOV_POLICY_EXIT events.
>>
>> The same goes for the ondemand governor.
>
> Thanks for pointing out. Would be fixed in next version.

Adding following to the original patch:

diff --git a/drivers/cpufreq/cpufreq_conservative.c
b/drivers/cpufreq/cpufreq_conservative.c
index 6b13f9f..5d03577 100644
--- a/drivers/cpufreq/cpufreq_conservative.c
+++ b/drivers/cpufreq/cpufreq_conservative.c
@@ -329,6 +329,11 @@ static int cs_init(struct dbs_data *dbs_data)
        return 0;
 }

+static void cs_exit(struct dbs_data *dbs_data)
+{
+       kfree(dbs_data->tuners);
+}
+
 define_get_cpu_dbs_routines(cs_cpu_dbs_info);

 static struct notifier_block cs_cpufreq_notifier_block = {
@@ -348,6 +353,7 @@ static struct common_dbs_data cs_dbs_cdata = {
        .gov_check_cpu = cs_check_cpu,
        .gov_ops = &cs_ops,
        .init = cs_init,
+       .exit = cs_exit,
 };

 static int cs_cpufreq_governor_dbs(struct cpufreq_policy *policy,
diff --git a/drivers/cpufreq/cpufreq_governor.c
b/drivers/cpufreq/cpufreq_governor.c
index 925f5b3..7722505 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -255,6 +255,7 @@ int cpufreq_governor_dbs(struct cpufreq_policy *policy,
                                        latency * LATENCY_MULTIPLIER));
                return 0;
        case CPUFREQ_GOV_POLICY_EXIT:
+               cdata->exit(dbs_data);
                kfree(dbs_data);
                policy->governor_data = NULL;
                return 0;
diff --git a/drivers/cpufreq/cpufreq_governor.h
b/drivers/cpufreq/cpufreq_governor.h
index 8bd8df6..6301790 100644
--- a/drivers/cpufreq/cpufreq_governor.h
+++ b/drivers/cpufreq/cpufreq_governor.h
@@ -139,6 +139,7 @@ struct common_dbs_data {
        void (*gov_dbs_timer)(struct work_struct *work);
        void (*gov_check_cpu)(int cpu, unsigned int load);
        int (*init)(struct dbs_data *dbs_data);
+       void (*exit)(struct dbs_data *dbs_data);

        /* Governor specific ops, see below */
        void *gov_ops;
diff --git a/drivers/cpufreq/cpufreq_ondemand.c
b/drivers/cpufreq/cpufreq_ondemand.c
index 5eda540..f2539e0 100644
--- a/drivers/cpufreq/cpufreq_ondemand.c
+++ b/drivers/cpufreq/cpufreq_ondemand.c
@@ -525,6 +525,11 @@ static int od_init(struct dbs_data *dbs_data)
        return 0;
 }

+static void od_exit(struct dbs_data *dbs_data)
+{
+       kfree(dbs_data->tuners);
+}
+
 define_get_cpu_dbs_routines(od_cpu_dbs_info);

 static struct od_ops od_ops = {
@@ -542,6 +547,7 @@ static struct common_dbs_data od_dbs_cdata = {
        .gov_check_cpu = od_check_cpu,
        .gov_ops = &od_ops,
        .init = od_init,
+       .exit = od_exit,
 };

 static int od_cpufreq_governor_dbs(struct cpufreq_policy *policy,

^ permalink raw reply related	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2013-02-11  4:39 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-02-04 11:38 [PATCH 0/4] CPUFreq: Implement per policy instances of governors Viresh Kumar
2013-02-04 11:38 ` [PATCH 1/4] cpufreq: Don't check cpu_online(policy->cpu) Viresh Kumar
2013-02-04 11:38 ` [PATCH 2/4] cpufreq: stats: Get rid of CPUFREQ_STATDEVICE_ATTR Viresh Kumar
2013-02-04 11:38 ` [PATCH 3/4] cpufreq: Add per policy governor-init/exit infrastructure Viresh Kumar
2013-02-04 11:38 ` [PATCH 4/4] cpufreq: governor: Implement per policy instances of governors Viresh Kumar
2013-02-10 21:14   ` Francesco Lavra
2013-02-11  4:16     ` Viresh Kumar
2013-02-11  4:39       ` Viresh Kumar
2013-02-04 12:17 ` [PATCH 0/4] CPUFreq: " Rafael J. Wysocki
2013-02-04 12:24   ` Viresh Kumar
2013-02-04 12:32     ` Borislav Petkov
2013-02-04 12:54       ` Viresh Kumar
2013-02-04 13:04         ` Borislav Petkov
2013-02-04 13:25           ` Viresh Kumar
2013-02-04 13:36             ` Borislav Petkov
2013-02-04 13:58               ` Viresh Kumar
2013-02-04 14:09                 ` Borislav Petkov
2013-02-04 14:21                   ` Viresh Kumar
2013-02-04 15:05                     ` Borislav Petkov
2013-02-04 15:37                       ` Viresh Kumar
2013-02-04 16:50                         ` Borislav Petkov
2013-02-05  7:20                           ` Viresh Kumar
2013-02-05  9:15                             ` Borislav Petkov
2013-02-05  9:47                               ` Viresh Kumar
2013-02-05 10:27                                 ` Borislav Petkov
2013-02-05 10:43                                   ` Viresh Kumar
2013-02-05 11:04                                     ` Borislav Petkov
2013-02-05 11:12                                       ` Viresh Kumar
2013-02-05 11:19                                         ` Borislav Petkov
2013-02-05 11:26                                           ` Viresh Kumar
2013-02-05 11:32                                             ` Borislav Petkov
2013-02-05 12:24                                               ` Viresh Kumar
2013-02-05 13:22                                                 ` Borislav Petkov
2013-02-05 13:55                                                   ` Viresh Kumar
2013-02-05  9:36               ` Viresh Kumar
2013-02-05 11:29                 ` Charles Garcia-Tobin
2013-02-05 11:39                   ` Borislav Petkov
2013-02-05 18:38                     ` Charles Garcia-Tobin
2013-02-05 18:44                       ` Borislav Petkov
2013-02-05 16:21 ` Viresh Kumar
2013-02-06  9:58   ` Viresh Kumar
2013-02-06 10:08     ` Amit Kucheria
2013-02-06 10:15       ` Viresh Kumar
2013-02-06 10:38         ` Rafael J. Wysocki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).