linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] cpufreq: stats: Make the stats code non-modular
@ 2016-05-25 22:23 Rafael J. Wysocki
  2016-05-26  4:59 ` Viresh Kumar
  0 siblings, 1 reply; 7+ messages in thread
From: Rafael J. Wysocki @ 2016-05-25 22:23 UTC (permalink / raw)
  To: Linux PM list, Viresh Kumar
  Cc: Linux Kernel Mailing List, Srinivas Pandruvada

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

The modularity of cpufreq_stats is quite problematic.

First off, the usage of policy notifiers for the initialization
and cleanup in the cpufreq_stats module is inherently racy with
respect to CPU offline/online and the initialization and cleanup
of the cpufreq driver.

Second, fast frequency switching (used by the schedutil governor)
cannot be enabled if any transition notifiers are registered, so
if the cpufreq_stats module (that registers a transition notifier
for updating transition statistics) is loaded, the schedutil governor
cannot use fast frequency switching.

On the other hand, allowing cpufreq_stats to be built as a module
doesn't really add much value.  Arguably, there's not much reason
for that code to be modular at all.

For the above reasons, make the cpufreq stats code non-modular,
modify the core to invoke functions provided by that code directly
and drop the notifiers from it.

While at it, clean up Kconfig help for the CPU_FREQ_STAT and
CPU_FREQ_STAT_DETAILS options.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/cpufreq/Kconfig         |   13 +--
 drivers/cpufreq/cpufreq.c       |    4 +
 drivers/cpufreq/cpufreq_stats.c |  146 +++-------------------------------------
 include/linux/cpufreq.h         |   12 +++
 4 files changed, 34 insertions(+), 141 deletions(-)

Index: linux-pm/drivers/cpufreq/Kconfig
===================================================================
--- linux-pm.orig/drivers/cpufreq/Kconfig
+++ linux-pm/drivers/cpufreq/Kconfig
@@ -31,23 +31,18 @@ config CPU_FREQ_BOOST_SW
 	depends on THERMAL
 
 config CPU_FREQ_STAT
-	tristate "CPU frequency translation statistics"
+	bool "CPU frequency transition statistics"
 	default y
 	help
-	  This driver exports CPU frequency statistics information through sysfs
-	  file system.
-
-	  To compile this driver as a module, choose M here: the
-	  module will be called cpufreq_stats.
+	  Export CPU frequency statistics information through sysfs.
 
 	  If in doubt, say N.
 
 config CPU_FREQ_STAT_DETAILS
-	bool "CPU frequency translation statistics details"
+	bool "CPU frequency transition statistics details"
 	depends on CPU_FREQ_STAT
 	help
-	  This will show detail CPU frequency translation table in sysfs file
-	  system.
+	  Show detailed CPU frequency transition table in sysfs.
 
 	  If in doubt, say N.
 
Index: linux-pm/drivers/cpufreq/cpufreq.c
===================================================================
--- linux-pm.orig/drivers/cpufreq/cpufreq.c
+++ linux-pm/drivers/cpufreq/cpufreq.c
@@ -347,6 +347,7 @@ static void __cpufreq_notify_transition(
 		pr_debug("FREQ: %lu - CPU: %lu\n",
 			 (unsigned long)freqs->new, (unsigned long)freqs->cpu);
 		trace_cpu_frequency(freqs->new, freqs->cpu);
+		cpufreq_stats_record_transition(policy, freqs->new);
 		srcu_notifier_call_chain(&cpufreq_transition_notifier_list,
 				CPUFREQ_POSTCHANGE, freqs);
 		if (likely(policy) && likely(policy->cpu == freqs->cpu))
@@ -1108,6 +1109,7 @@ static void cpufreq_policy_put_kobj(stru
 					     CPUFREQ_REMOVE_POLICY, policy);
 
 	down_write(&policy->rwsem);
+	cpufreq_stats_free_table(policy);
 	cpufreq_remove_dev_symlink(policy);
 	kobj = &policy->kobj;
 	cmp = &policy->kobj_unregister;
@@ -1265,6 +1267,8 @@ static int cpufreq_online(unsigned int c
 		ret = cpufreq_add_dev_interface(policy);
 		if (ret)
 			goto out_exit_policy;
+
+		cpufreq_stats_create_table(policy);
 		blocking_notifier_call_chain(&cpufreq_policy_notifier_list,
 				CPUFREQ_CREATE_POLICY, policy);
 
Index: linux-pm/drivers/cpufreq/cpufreq_stats.c
===================================================================
--- linux-pm.orig/drivers/cpufreq/cpufreq_stats.c
+++ linux-pm/drivers/cpufreq/cpufreq_stats.c
@@ -130,7 +130,7 @@ static int freq_table_get_index(struct c
 	return -1;
 }
 
-static void __cpufreq_stats_free_table(struct cpufreq_policy *policy)
+void cpufreq_stats_free_table(struct cpufreq_policy *policy)
 {
 	struct cpufreq_stats *stats = policy->stats;
 
@@ -146,20 +146,7 @@ static void __cpufreq_stats_free_table(s
 	policy->stats = NULL;
 }
 
-static void cpufreq_stats_free_table(unsigned int cpu)
-{
-	struct cpufreq_policy *policy;
-
-	policy = cpufreq_cpu_get(cpu);
-	if (!policy)
-		return;
-
-	__cpufreq_stats_free_table(policy);
-
-	cpufreq_cpu_put(policy);
-}
-
-static int __cpufreq_stats_create_table(struct cpufreq_policy *policy)
+void cpufreq_stats_create_table(struct cpufreq_policy *policy)
 {
 	unsigned int i = 0, count = 0, ret = -ENOMEM;
 	struct cpufreq_stats *stats;
@@ -170,15 +157,15 @@ static int __cpufreq_stats_create_table(
 	/* We need cpufreq table for creating stats table */
 	table = cpufreq_frequency_get_table(cpu);
 	if (unlikely(!table))
-		return 0;
+		return;
 
 	/* stats already initialized */
 	if (policy->stats)
-		return -EEXIST;
+		return;
 
 	stats = kzalloc(sizeof(*stats), GFP_KERNEL);
 	if (!stats)
-		return -ENOMEM;
+		return;
 
 	/* Find total allocation size */
 	cpufreq_for_each_valid_entry(pos, table)
@@ -215,80 +202,32 @@ static int __cpufreq_stats_create_table(
 	policy->stats = stats;
 	ret = sysfs_create_group(&policy->kobj, &stats_attr_group);
 	if (!ret)
-		return 0;
+		return;
 
 	/* We failed, release resources */
 	policy->stats = NULL;
 	kfree(stats->time_in_state);
 free_stat:
 	kfree(stats);
-
-	return ret;
-}
-
-static void cpufreq_stats_create_table(unsigned int cpu)
-{
-	struct cpufreq_policy *policy;
-
-	/*
-	 * "likely(!policy)" because normally cpufreq_stats will be registered
-	 * before cpufreq driver
-	 */
-	policy = cpufreq_cpu_get(cpu);
-	if (likely(!policy))
-		return;
-
-	__cpufreq_stats_create_table(policy);
-
-	cpufreq_cpu_put(policy);
-}
-
-static int cpufreq_stat_notifier_policy(struct notifier_block *nb,
-		unsigned long val, void *data)
-{
-	int ret = 0;
-	struct cpufreq_policy *policy = data;
-
-	if (val == CPUFREQ_CREATE_POLICY)
-		ret = __cpufreq_stats_create_table(policy);
-	else if (val == CPUFREQ_REMOVE_POLICY)
-		__cpufreq_stats_free_table(policy);
-
-	return ret;
 }
 
-static int cpufreq_stat_notifier_trans(struct notifier_block *nb,
-		unsigned long val, void *data)
+void cpufreq_stats_record_transition(struct cpufreq_policy *policy,
+				     unsigned int new_freq)
 {
-	struct cpufreq_freqs *freq = data;
-	struct cpufreq_policy *policy = cpufreq_cpu_get(freq->cpu);
-	struct cpufreq_stats *stats;
+	struct cpufreq_stats *stats = policy->stats;
 	int old_index, new_index;
 
-	if (!policy) {
-		pr_err("%s: No policy found\n", __func__);
-		return 0;
-	}
-
-	if (val != CPUFREQ_POSTCHANGE)
-		goto put_policy;
-
-	if (!policy->stats) {
+	if (!stats) {
 		pr_debug("%s: No stats found\n", __func__);
-		goto put_policy;
+		return;
 	}
 
-	stats = policy->stats;
-
 	old_index = stats->last_index;
-	new_index = freq_table_get_index(stats, freq->new);
+	new_index = freq_table_get_index(stats, new_freq);
 
 	/* We can't do stats->time_in_state[-1]= .. */
-	if (old_index == -1 || new_index == -1)
-		goto put_policy;
-
-	if (old_index == new_index)
-		goto put_policy;
+	if (old_index == -1 || new_index == -1 || old_index == new_index)
+		return;
 
 	cpufreq_stats_update(stats);
 
@@ -297,61 +236,4 @@ static int cpufreq_stat_notifier_trans(s
 	stats->trans_table[old_index * stats->max_state + new_index]++;
 #endif
 	stats->total_trans++;
-
-put_policy:
-	cpufreq_cpu_put(policy);
-	return 0;
 }
-
-static struct notifier_block notifier_policy_block = {
-	.notifier_call = cpufreq_stat_notifier_policy
-};
-
-static struct notifier_block notifier_trans_block = {
-	.notifier_call = cpufreq_stat_notifier_trans
-};
-
-static int __init cpufreq_stats_init(void)
-{
-	int ret;
-	unsigned int cpu;
-
-	spin_lock_init(&cpufreq_stats_lock);
-	ret = cpufreq_register_notifier(&notifier_policy_block,
-				CPUFREQ_POLICY_NOTIFIER);
-	if (ret)
-		return ret;
-
-	for_each_online_cpu(cpu)
-		cpufreq_stats_create_table(cpu);
-
-	ret = cpufreq_register_notifier(&notifier_trans_block,
-				CPUFREQ_TRANSITION_NOTIFIER);
-	if (ret) {
-		cpufreq_unregister_notifier(&notifier_policy_block,
-				CPUFREQ_POLICY_NOTIFIER);
-		for_each_online_cpu(cpu)
-			cpufreq_stats_free_table(cpu);
-		return ret;
-	}
-
-	return 0;
-}
-static void __exit cpufreq_stats_exit(void)
-{
-	unsigned int cpu;
-
-	cpufreq_unregister_notifier(&notifier_policy_block,
-			CPUFREQ_POLICY_NOTIFIER);
-	cpufreq_unregister_notifier(&notifier_trans_block,
-			CPUFREQ_TRANSITION_NOTIFIER);
-	for_each_online_cpu(cpu)
-		cpufreq_stats_free_table(cpu);
-}
-
-MODULE_AUTHOR("Zou Nan hai <nanhai.zou@intel.com>");
-MODULE_DESCRIPTION("Export cpufreq stats via sysfs");
-MODULE_LICENSE("GPL");
-
-module_init(cpufreq_stats_init);
-module_exit(cpufreq_stats_exit);
Index: linux-pm/include/linux/cpufreq.h
===================================================================
--- linux-pm.orig/include/linux/cpufreq.h
+++ linux-pm/include/linux/cpufreq.h
@@ -185,6 +185,18 @@ static inline unsigned int cpufreq_quick
 static inline void disable_cpufreq(void) { }
 #endif
 
+#ifdef CONFIG_CPU_FREQ_STAT
+void cpufreq_stats_create_table(struct cpufreq_policy *policy);
+void cpufreq_stats_free_table(struct cpufreq_policy *policy);
+void cpufreq_stats_record_transition(struct cpufreq_policy *policy,
+				     unsigned int new_freq);
+#else
+static inline void cpufreq_stats_create_table(struct cpufreq_policy *policy) { }
+static inline void cpufreq_stats_free_table(struct cpufreq_policy *policy) { }
+static inline void cpufreq_stats_record_transition(struct cpufreq_policy *policy,
+						   unsigned int new_freq) { }
+#endif /* CONFIG_CPU_FREQ_STAT */
+
 /*********************************************************************
  *                      CPUFREQ DRIVER INTERFACE                     *
  *********************************************************************/

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] cpufreq: stats: Make the stats code non-modular
  2016-05-25 22:23 [PATCH] cpufreq: stats: Make the stats code non-modular Rafael J. Wysocki
@ 2016-05-26  4:59 ` Viresh Kumar
  2016-05-28 13:15   ` Rafael J. Wysocki
  0 siblings, 1 reply; 7+ messages in thread
From: Viresh Kumar @ 2016-05-26  4:59 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux PM list, Linux Kernel Mailing List, Srinivas Pandruvada

On 26-05-16, 00:23, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> The modularity of cpufreq_stats is quite problematic.
> 
> First off, the usage of policy notifiers for the initialization
> and cleanup in the cpufreq_stats module is inherently racy with
> respect to CPU offline/online and the initialization and cleanup
> of the cpufreq driver.
> 
> Second, fast frequency switching (used by the schedutil governor)
> cannot be enabled if any transition notifiers are registered, so
> if the cpufreq_stats module (that registers a transition notifier
> for updating transition statistics) is loaded, the schedutil governor
> cannot use fast frequency switching.
> 
> On the other hand, allowing cpufreq_stats to be built as a module
> doesn't really add much value.  Arguably, there's not much reason
> for that code to be modular at all.
> 
> For the above reasons, make the cpufreq stats code non-modular,
> modify the core to invoke functions provided by that code directly
> and drop the notifiers from it.
> 
> While at it, clean up Kconfig help for the CPU_FREQ_STAT and
> CPU_FREQ_STAT_DETAILS options.
> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>  drivers/cpufreq/Kconfig         |   13 +--
>  drivers/cpufreq/cpufreq.c       |    4 +
>  drivers/cpufreq/cpufreq_stats.c |  146 +++-------------------------------------
>  include/linux/cpufreq.h         |   12 +++
>  4 files changed, 34 insertions(+), 141 deletions(-)

Looks far better now :)

Acked-by: Viresh Kumar <viresh.kumar@linaro.org>

-- 
viresh

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] cpufreq: stats: Make the stats code non-modular
  2016-05-26  4:59 ` Viresh Kumar
@ 2016-05-28 13:15   ` Rafael J. Wysocki
  2016-05-30  4:53     ` Viresh Kumar
  2016-05-31 10:44     ` Viresh Kumar
  0 siblings, 2 replies; 7+ messages in thread
From: Rafael J. Wysocki @ 2016-05-28 13:15 UTC (permalink / raw)
  To: Viresh Kumar, Linux PM list
  Cc: Linux Kernel Mailing List, Srinivas Pandruvada

On Thursday, May 26, 2016 10:29:45 AM Viresh Kumar wrote:
> On 26-05-16, 00:23, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > 
> > The modularity of cpufreq_stats is quite problematic.
> > 
> > First off, the usage of policy notifiers for the initialization
> > and cleanup in the cpufreq_stats module is inherently racy with
> > respect to CPU offline/online and the initialization and cleanup
> > of the cpufreq driver.
> > 
> > Second, fast frequency switching (used by the schedutil governor)
> > cannot be enabled if any transition notifiers are registered, so
> > if the cpufreq_stats module (that registers a transition notifier
> > for updating transition statistics) is loaded, the schedutil governor
> > cannot use fast frequency switching.
> > 
> > On the other hand, allowing cpufreq_stats to be built as a module
> > doesn't really add much value.  Arguably, there's not much reason
> > for that code to be modular at all.
> > 
> > For the above reasons, make the cpufreq stats code non-modular,
> > modify the core to invoke functions provided by that code directly
> > and drop the notifiers from it.
> > 
> > While at it, clean up Kconfig help for the CPU_FREQ_STAT and
> > CPU_FREQ_STAT_DETAILS options.
> > 
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > ---
> >  drivers/cpufreq/Kconfig         |   13 +--
> >  drivers/cpufreq/cpufreq.c       |    4 +
> >  drivers/cpufreq/cpufreq_stats.c |  146 +++-------------------------------------
> >  include/linux/cpufreq.h         |   12 +++
> >  4 files changed, 34 insertions(+), 141 deletions(-)
> 
> Looks far better now :)

It does, but there's a problem.

If fast frequency switching is in use, the stats attributes just sit there
full of zeros (because the stats are not updated then) which is confusing.

Of course, the ultimate solution here would be to make the stats actually
work with fast frequency switching, but that requires some major surgery of
the stats code, so for now I'd like to simply make those attributes return
nothing if fast frequency switching is enabled for the policy (returning
-EBUSY from there, which would have been cleaner, unfortunately breaks
powertop).

Updated patch is below.

---
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Subject: [PATCH] cpufreq: stats: Make the stats code non-modular

The modularity of cpufreq_stats is quite problematic.

First off, the usage of policy notifiers for the initialization
and cleanup in the cpufreq_stats module is inherently racy with
respect to CPU offline/online and the initialization and cleanup
of the cpufreq driver.

Second, fast frequency switching (used by the schedutil governor)
cannot be enabled if any transition notifiers are registered, so
if the cpufreq_stats module (that registers a transition notifier
for updating transition statistics) is loaded, the schedutil governor
cannot use fast frequency switching.

On the other hand, allowing cpufreq_stats to be built as a module
doesn't really add much value.  Arguably, there's not much reason
for that code to be modular at all.

For the above reasons, make the cpufreq stats code non-modular,
modify the core to invoke functions provided by that code directly
and drop the notifiers from it.

Make the stats sysfs attributes appear empty if fast frequency
switching is enabled as the statistics will not be updated in that
case anyway (and returning -EBUSY from those attributes breaks
powertop).

While at it, clean up Kconfig help for the CPU_FREQ_STAT and
CPU_FREQ_STAT_DETAILS options.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/cpufreq/Kconfig         |   13 +--
 drivers/cpufreq/cpufreq.c       |    4 +
 drivers/cpufreq/cpufreq_stats.c |  152 +++++-----------------------------------
 include/linux/cpufreq.h         |   12 +++
 4 files changed, 40 insertions(+), 141 deletions(-)

Index: linux-pm/drivers/cpufreq/Kconfig
===================================================================
--- linux-pm.orig/drivers/cpufreq/Kconfig
+++ linux-pm/drivers/cpufreq/Kconfig
@@ -31,23 +31,18 @@ config CPU_FREQ_BOOST_SW
 	depends on THERMAL
 
 config CPU_FREQ_STAT
-	tristate "CPU frequency translation statistics"
+	bool "CPU frequency transition statistics"
 	default y
 	help
-	  This driver exports CPU frequency statistics information through sysfs
-	  file system.
-
-	  To compile this driver as a module, choose M here: the
-	  module will be called cpufreq_stats.
+	  Export CPU frequency statistics information through sysfs.
 
 	  If in doubt, say N.
 
 config CPU_FREQ_STAT_DETAILS
-	bool "CPU frequency translation statistics details"
+	bool "CPU frequency transition statistics details"
 	depends on CPU_FREQ_STAT
 	help
-	  This will show detail CPU frequency translation table in sysfs file
-	  system.
+	  Show detailed CPU frequency transition table in sysfs.
 
 	  If in doubt, say N.
 
Index: linux-pm/drivers/cpufreq/cpufreq.c
===================================================================
--- linux-pm.orig/drivers/cpufreq/cpufreq.c
+++ linux-pm/drivers/cpufreq/cpufreq.c
@@ -347,6 +347,7 @@ static void __cpufreq_notify_transition(
 		pr_debug("FREQ: %lu - CPU: %lu\n",
 			 (unsigned long)freqs->new, (unsigned long)freqs->cpu);
 		trace_cpu_frequency(freqs->new, freqs->cpu);
+		cpufreq_stats_record_transition(policy, freqs->new);
 		srcu_notifier_call_chain(&cpufreq_transition_notifier_list,
 				CPUFREQ_POSTCHANGE, freqs);
 		if (likely(policy) && likely(policy->cpu == freqs->cpu))
@@ -1108,6 +1109,7 @@ static void cpufreq_policy_put_kobj(stru
 					     CPUFREQ_REMOVE_POLICY, policy);
 
 	down_write(&policy->rwsem);
+	cpufreq_stats_free_table(policy);
 	cpufreq_remove_dev_symlink(policy);
 	kobj = &policy->kobj;
 	cmp = &policy->kobj_unregister;
@@ -1265,6 +1267,8 @@ static int cpufreq_online(unsigned int c
 		ret = cpufreq_add_dev_interface(policy);
 		if (ret)
 			goto out_exit_policy;
+
+		cpufreq_stats_create_table(policy);
 		blocking_notifier_call_chain(&cpufreq_policy_notifier_list,
 				CPUFREQ_CREATE_POLICY, policy);
 
Index: linux-pm/drivers/cpufreq/cpufreq_stats.c
===================================================================
--- linux-pm.orig/drivers/cpufreq/cpufreq_stats.c
+++ linux-pm/drivers/cpufreq/cpufreq_stats.c
@@ -52,6 +52,9 @@ static ssize_t show_time_in_state(struct
 	ssize_t len = 0;
 	int i;
 
+	if (policy->fast_switch_enabled)
+		return 0;
+
 	cpufreq_stats_update(stats);
 	for (i = 0; i < stats->state_num; i++) {
 		len += sprintf(buf + len, "%u %llu\n", stats->freq_table[i],
@@ -68,6 +71,9 @@ static ssize_t show_trans_table(struct c
 	ssize_t len = 0;
 	int i, j;
 
+	if (policy->fast_switch_enabled)
+		return 0;
+
 	len += snprintf(buf + len, PAGE_SIZE - len, "   From  :    To\n");
 	len += snprintf(buf + len, PAGE_SIZE - len, "         : ");
 	for (i = 0; i < stats->state_num; i++) {
@@ -130,7 +136,7 @@ static int freq_table_get_index(struct c
 	return -1;
 }
 
-static void __cpufreq_stats_free_table(struct cpufreq_policy *policy)
+void cpufreq_stats_free_table(struct cpufreq_policy *policy)
 {
 	struct cpufreq_stats *stats = policy->stats;
 
@@ -146,20 +152,7 @@ static void __cpufreq_stats_free_table(s
 	policy->stats = NULL;
 }
 
-static void cpufreq_stats_free_table(unsigned int cpu)
-{
-	struct cpufreq_policy *policy;
-
-	policy = cpufreq_cpu_get(cpu);
-	if (!policy)
-		return;
-
-	__cpufreq_stats_free_table(policy);
-
-	cpufreq_cpu_put(policy);
-}
-
-static int __cpufreq_stats_create_table(struct cpufreq_policy *policy)
+void cpufreq_stats_create_table(struct cpufreq_policy *policy)
 {
 	unsigned int i = 0, count = 0, ret = -ENOMEM;
 	struct cpufreq_stats *stats;
@@ -170,15 +163,15 @@ static int __cpufreq_stats_create_table(
 	/* We need cpufreq table for creating stats table */
 	table = cpufreq_frequency_get_table(cpu);
 	if (unlikely(!table))
-		return 0;
+		return;
 
 	/* stats already initialized */
 	if (policy->stats)
-		return -EEXIST;
+		return;
 
 	stats = kzalloc(sizeof(*stats), GFP_KERNEL);
 	if (!stats)
-		return -ENOMEM;
+		return;
 
 	/* Find total allocation size */
 	cpufreq_for_each_valid_entry(pos, table)
@@ -215,80 +208,32 @@ static int __cpufreq_stats_create_table(
 	policy->stats = stats;
 	ret = sysfs_create_group(&policy->kobj, &stats_attr_group);
 	if (!ret)
-		return 0;
+		return;
 
 	/* We failed, release resources */
 	policy->stats = NULL;
 	kfree(stats->time_in_state);
 free_stat:
 	kfree(stats);
-
-	return ret;
-}
-
-static void cpufreq_stats_create_table(unsigned int cpu)
-{
-	struct cpufreq_policy *policy;
-
-	/*
-	 * "likely(!policy)" because normally cpufreq_stats will be registered
-	 * before cpufreq driver
-	 */
-	policy = cpufreq_cpu_get(cpu);
-	if (likely(!policy))
-		return;
-
-	__cpufreq_stats_create_table(policy);
-
-	cpufreq_cpu_put(policy);
-}
-
-static int cpufreq_stat_notifier_policy(struct notifier_block *nb,
-		unsigned long val, void *data)
-{
-	int ret = 0;
-	struct cpufreq_policy *policy = data;
-
-	if (val == CPUFREQ_CREATE_POLICY)
-		ret = __cpufreq_stats_create_table(policy);
-	else if (val == CPUFREQ_REMOVE_POLICY)
-		__cpufreq_stats_free_table(policy);
-
-	return ret;
 }
 
-static int cpufreq_stat_notifier_trans(struct notifier_block *nb,
-		unsigned long val, void *data)
+void cpufreq_stats_record_transition(struct cpufreq_policy *policy,
+				     unsigned int new_freq)
 {
-	struct cpufreq_freqs *freq = data;
-	struct cpufreq_policy *policy = cpufreq_cpu_get(freq->cpu);
-	struct cpufreq_stats *stats;
+	struct cpufreq_stats *stats = policy->stats;
 	int old_index, new_index;
 
-	if (!policy) {
-		pr_err("%s: No policy found\n", __func__);
-		return 0;
-	}
-
-	if (val != CPUFREQ_POSTCHANGE)
-		goto put_policy;
-
-	if (!policy->stats) {
+	if (!stats) {
 		pr_debug("%s: No stats found\n", __func__);
-		goto put_policy;
+		return;
 	}
 
-	stats = policy->stats;
-
 	old_index = stats->last_index;
-	new_index = freq_table_get_index(stats, freq->new);
+	new_index = freq_table_get_index(stats, new_freq);
 
 	/* We can't do stats->time_in_state[-1]= .. */
-	if (old_index == -1 || new_index == -1)
-		goto put_policy;
-
-	if (old_index == new_index)
-		goto put_policy;
+	if (old_index == -1 || new_index == -1 || old_index == new_index)
+		return;
 
 	cpufreq_stats_update(stats);
 
@@ -297,61 +242,4 @@ static int cpufreq_stat_notifier_trans(s
 	stats->trans_table[old_index * stats->max_state + new_index]++;
 #endif
 	stats->total_trans++;
-
-put_policy:
-	cpufreq_cpu_put(policy);
-	return 0;
-}
-
-static struct notifier_block notifier_policy_block = {
-	.notifier_call = cpufreq_stat_notifier_policy
-};
-
-static struct notifier_block notifier_trans_block = {
-	.notifier_call = cpufreq_stat_notifier_trans
-};
-
-static int __init cpufreq_stats_init(void)
-{
-	int ret;
-	unsigned int cpu;
-
-	spin_lock_init(&cpufreq_stats_lock);
-	ret = cpufreq_register_notifier(&notifier_policy_block,
-				CPUFREQ_POLICY_NOTIFIER);
-	if (ret)
-		return ret;
-
-	for_each_online_cpu(cpu)
-		cpufreq_stats_create_table(cpu);
-
-	ret = cpufreq_register_notifier(&notifier_trans_block,
-				CPUFREQ_TRANSITION_NOTIFIER);
-	if (ret) {
-		cpufreq_unregister_notifier(&notifier_policy_block,
-				CPUFREQ_POLICY_NOTIFIER);
-		for_each_online_cpu(cpu)
-			cpufreq_stats_free_table(cpu);
-		return ret;
-	}
-
-	return 0;
 }
-static void __exit cpufreq_stats_exit(void)
-{
-	unsigned int cpu;
-
-	cpufreq_unregister_notifier(&notifier_policy_block,
-			CPUFREQ_POLICY_NOTIFIER);
-	cpufreq_unregister_notifier(&notifier_trans_block,
-			CPUFREQ_TRANSITION_NOTIFIER);
-	for_each_online_cpu(cpu)
-		cpufreq_stats_free_table(cpu);
-}
-
-MODULE_AUTHOR("Zou Nan hai <nanhai.zou@intel.com>");
-MODULE_DESCRIPTION("Export cpufreq stats via sysfs");
-MODULE_LICENSE("GPL");
-
-module_init(cpufreq_stats_init);
-module_exit(cpufreq_stats_exit);
Index: linux-pm/include/linux/cpufreq.h
===================================================================
--- linux-pm.orig/include/linux/cpufreq.h
+++ linux-pm/include/linux/cpufreq.h
@@ -185,6 +185,18 @@ static inline unsigned int cpufreq_quick
 static inline void disable_cpufreq(void) { }
 #endif
 
+#ifdef CONFIG_CPU_FREQ_STAT
+void cpufreq_stats_create_table(struct cpufreq_policy *policy);
+void cpufreq_stats_free_table(struct cpufreq_policy *policy);
+void cpufreq_stats_record_transition(struct cpufreq_policy *policy,
+				     unsigned int new_freq);
+#else
+static inline void cpufreq_stats_create_table(struct cpufreq_policy *policy) { }
+static inline void cpufreq_stats_free_table(struct cpufreq_policy *policy) { }
+static inline void cpufreq_stats_record_transition(struct cpufreq_policy *policy,
+						   unsigned int new_freq) { }
+#endif /* CONFIG_CPU_FREQ_STAT */
+
 /*********************************************************************
  *                      CPUFREQ DRIVER INTERFACE                     *
  *********************************************************************/

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] cpufreq: stats: Make the stats code non-modular
  2016-05-28 13:15   ` Rafael J. Wysocki
@ 2016-05-30  4:53     ` Viresh Kumar
  2016-05-30 13:43       ` Rafael J. Wysocki
  2016-05-31 10:44     ` Viresh Kumar
  1 sibling, 1 reply; 7+ messages in thread
From: Viresh Kumar @ 2016-05-30  4:53 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux PM list, Linux Kernel Mailing List, Srinivas Pandruvada

On 28-05-16, 15:15, Rafael J. Wysocki wrote:
> It does, but there's a problem.
> 
> If fast frequency switching is in use, the stats attributes just sit there
> full of zeros (because the stats are not updated then) which is confusing.
> 
> Of course, the ultimate solution here would be to make the stats actually
> work with fast frequency switching, but that requires some major surgery of
> the stats code, so for now I'd like to simply make those attributes return
> nothing if fast frequency switching is enabled for the policy (returning
> -EBUSY from there, which would have been cleaner, unfortunately breaks
> powertop).

What's stopping us from doing that? Sorry I don't remember that well :(

I mean, why can't we call cpufreq_stats_record_transition() somehow via that
code ?

> Updated patch is below.
> 
> ---
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Subject: [PATCH] cpufreq: stats: Make the stats code non-modular
> 
> The modularity of cpufreq_stats is quite problematic.
> 
> First off, the usage of policy notifiers for the initialization
> and cleanup in the cpufreq_stats module is inherently racy with
> respect to CPU offline/online and the initialization and cleanup
> of the cpufreq driver.
> 
> Second, fast frequency switching (used by the schedutil governor)
> cannot be enabled if any transition notifiers are registered, so
> if the cpufreq_stats module (that registers a transition notifier
> for updating transition statistics) is loaded, the schedutil governor
> cannot use fast frequency switching.
> 
> On the other hand, allowing cpufreq_stats to be built as a module
> doesn't really add much value.  Arguably, there's not much reason
> for that code to be modular at all.
> 
> For the above reasons, make the cpufreq stats code non-modular,
> modify the core to invoke functions provided by that code directly
> and drop the notifiers from it.
> 
> Make the stats sysfs attributes appear empty if fast frequency
> switching is enabled as the statistics will not be updated in that
> case anyway (and returning -EBUSY from those attributes breaks
> powertop).
> 
> While at it, clean up Kconfig help for the CPU_FREQ_STAT and
> CPU_FREQ_STAT_DETAILS options.
> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>  drivers/cpufreq/Kconfig         |   13 +--
>  drivers/cpufreq/cpufreq.c       |    4 +
>  drivers/cpufreq/cpufreq_stats.c |  152 +++++-----------------------------------
>  include/linux/cpufreq.h         |   12 +++
>  4 files changed, 40 insertions(+), 141 deletions(-)

Acked-by: Viresh Kumar <viresh.kumar@linaro.org>

-- 
viresh

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] cpufreq: stats: Make the stats code non-modular
  2016-05-30  4:53     ` Viresh Kumar
@ 2016-05-30 13:43       ` Rafael J. Wysocki
  0 siblings, 0 replies; 7+ messages in thread
From: Rafael J. Wysocki @ 2016-05-30 13:43 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael J. Wysocki, Linux PM list, Linux Kernel Mailing List,
	Srinivas Pandruvada

On Mon, May 30, 2016 at 6:53 AM, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> On 28-05-16, 15:15, Rafael J. Wysocki wrote:
>> It does, but there's a problem.
>>
>> If fast frequency switching is in use, the stats attributes just sit there
>> full of zeros (because the stats are not updated then) which is confusing.
>>
>> Of course, the ultimate solution here would be to make the stats actually
>> work with fast frequency switching, but that requires some major surgery of
>> the stats code, so for now I'd like to simply make those attributes return
>> nothing if fast frequency switching is enabled for the policy (returning
>> -EBUSY from there, which would have been cleaner, unfortunately breaks
>> powertop).
>
> What's stopping us from doing that? Sorry I don't remember that well :(
>
> I mean, why can't we call cpufreq_stats_record_transition() somehow via that
> code ?

Because it does things that aren't allowed to be done from the
scheduler paths (using the non-raw spinlock in the first place).

It also isn't efficient enough.  For example, the fast switch path in
schedutil for non-shared policies is lockless and it would be a shame
to add a lock to it just for the stats.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] cpufreq: stats: Make the stats code non-modular
  2016-05-28 13:15   ` Rafael J. Wysocki
  2016-05-30  4:53     ` Viresh Kumar
@ 2016-05-31 10:44     ` Viresh Kumar
  2016-05-31 20:47       ` Rafael J. Wysocki
  1 sibling, 1 reply; 7+ messages in thread
From: Viresh Kumar @ 2016-05-31 10:44 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux PM list, Linux Kernel Mailing List, Srinivas Pandruvada

On 28-05-16, 15:15, Rafael J. Wysocki wrote:
> -static int __init cpufreq_stats_init(void)
> -{
> -	int ret;
> -	unsigned int cpu;
> -
> -	spin_lock_init(&cpufreq_stats_lock);

My bad for ignoring this, but the spin lock is left uninitialized now
and this is what I get:

[    3.335732] BUG: spinlock bad magic on CPU#0, kworker/0:0/4
[    3.339847]  lock: cpufreq_stats_lock+0x0/0x10, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
[    3.348789] CPU: 0 PID: 4 Comm: kworker/0:0 Tainted: G        W       4.7.0-rc1-00046-g6ce5bc9b2166-dirty #148
[    3.358754] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
[    3.364831] Workqueue: events dbs_work_handler
[    3.369259] [<c010e1ac>] (unwind_backtrace) from [<c010af38>] (show_stack+0x10/0x14)
[    3.376987] [<c010af38>] (show_stack) from [<c03248dc>] (dump_stack+0x88/0x9c)
[    3.384191] [<c03248dc>] (dump_stack) from [<c0159b6c>] (do_raw_spin_lock+0x1ac/0x1b0)
[    3.392089] [<c0159b6c>] (do_raw_spin_lock) from [<c0510b40>] (cpufreq_stats_update+0x24/0x60)
[    3.400682] [<c0510b40>] (cpufreq_stats_update) from [<c0511030>] (cpufreq_stats_record_transition+0x6c/0x9c)
[    3.410581] [<c0511030>] (cpufreq_stats_record_transition) from [<c050d4dc>] (cpufreq_notify_transition+0x94/0x13c)
[    3.420995] [<c050d4dc>] (cpufreq_notify_transition) from [<c050d6b8>] (cpufreq_freq_transition_end+0x24/0x90)
[    3.430975] [<c050d6b8>] (cpufreq_freq_transition_end) from [<c050e954>] (__cpufreq_driver_target+0x124/0x264)
[    3.440957] [<c050e954>] (__cpufreq_driver_target) from [<c051159c>] (od_dbs_timer+0x9c/0x168)
[    3.449550] [<c051159c>] (od_dbs_timer) from [<c0512060>] (dbs_work_handler+0x2c/0x60)
[    3.457449] [<c0512060>] (dbs_work_handler) from [<c0130f68>] (process_one_work+0x124/0x338)
[    3.465867] [<c0130f68>] (process_one_work) from [<c01311b4>] (worker_thread+0x38/0x4d4)
[    3.473943] [<c01311b4>] (worker_thread) from [<c0136490>] (kthread+0xdc/0xf4)
[    3.481145] [<c0136490>] (kthread) from [<c0107978>] (ret_from_fork+0x14/0x3c)

-- 
viresh

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] cpufreq: stats: Make the stats code non-modular
  2016-05-31 10:44     ` Viresh Kumar
@ 2016-05-31 20:47       ` Rafael J. Wysocki
  0 siblings, 0 replies; 7+ messages in thread
From: Rafael J. Wysocki @ 2016-05-31 20:47 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Linux PM list, Linux Kernel Mailing List, Srinivas Pandruvada

On Tuesday, May 31, 2016 04:14:34 PM Viresh Kumar wrote:
> On 28-05-16, 15:15, Rafael J. Wysocki wrote:
> > -static int __init cpufreq_stats_init(void)
> > -{
> > -	int ret;
> > -	unsigned int cpu;
> > -
> > -	spin_lock_init(&cpufreq_stats_lock);
> 
> My bad for ignoring this, but the spin lock is left uninitialized now

That's embarrassing.

Should be fixed now, thanks!

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2016-05-31 20:43 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-25 22:23 [PATCH] cpufreq: stats: Make the stats code non-modular Rafael J. Wysocki
2016-05-26  4:59 ` Viresh Kumar
2016-05-28 13:15   ` Rafael J. Wysocki
2016-05-30  4:53     ` Viresh Kumar
2016-05-30 13:43       ` Rafael J. Wysocki
2016-05-31 10:44     ` Viresh Kumar
2016-05-31 20:47       ` Rafael J. Wysocki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).