All of lore.kernel.org
 help / color / mirror / Atom feed
From: Stephen Boyd <sboyd@codeaurora.org>
To: Viresh Kumar <viresh.kumar@linaro.org>
Cc: linux-kernel@vger.kernel.org, cpufreq@vger.kernel.org,
	linux-pm@vger.kernel.org, "Rafael J . Wysocki" <rjw@sisk.pl>
Subject: [PATCH] cpufreq: Fix timer/workqueue corruption due to double queueing
Date: Tue, 27 Aug 2013 11:47:29 -0700	[thread overview]
Message-ID: <1377629249-1177-1-git-send-email-sboyd@codeaurora.org> (raw)
In-Reply-To: <CAKohpoktPNtSMRn0dEVjcEW5_62E+_UEgySOBJ-hG+r=NU3V2w@mail.gmail.com>

When a CPU is hot removed we'll cancel all the delayed work items
via gov_cancel_work(). Normally this will just cancel a delayed
timer on each CPU that the policy is managing and the work won't
run, but if the work is already running the workqueue code will
wait for the work to finish before continuing to prevent the
work items from re-queuing themselves like they normally do. This
scheme will work most of the time, except for the case where the
work function determines that it should adjust the delay for all
other CPUs that the policy is managing. If this scenario occurs,
the canceling CPU will cancel its own work but queue up the other
CPUs works to run. For example:

 CPU0                                        CPU1
 ----                                        ----
 cpu_down()
  ...
  __cpufreq_remove_dev()
   cpufreq_governor_dbs()
    case CPUFREQ_GOV_STOP:
     gov_cancel_work(dbs_data, policy);
      cpu0 work is canceled
       timer is canceled
       cpu1 work is canceled                    <work runs>
       <waits for cpu1>                         od_dbs_timer()
                                                 gov_queue_work(*, *, true);
 						  cpu0 work queued
 						  cpu1 work queued
						  cpu2 work queued
						  ...
       cpu1 work is canceled
       cpu2 work is canceled
       ...

At the end of the GOV_STOP case cpu0 still has a work queued to
run although the code is expecting all of the works to be
canceled. __cpufreq_remove_dev() will then proceed to
re-initialize all the other CPUs works except for the CPU that is
going down. The CPUFREQ_GOV_START case in cpufreq_governor_dbs()
will trample over the queued work and debugobjects will spit out
a warning:

WARNING: at lib/debugobjects.c:260 debug_print_object+0x94/0xbc()
ODEBUG: init active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x10
Modules linked in:
CPU: 0 PID: 1491 Comm: sh Tainted: G        W    3.10.0 #19
[<c010c178>] (unwind_backtrace+0x0/0x11c) from [<c0109dec>] (show_stack+0x10/0x14)
[<c0109dec>] (show_stack+0x10/0x14) from [<c01904cc>] (warn_slowpath_common+0x4c/0x6c)
[<c01904cc>] (warn_slowpath_common+0x4c/0x6c) from [<c019056c>] (warn_slowpath_fmt+0x2c/0x3c)
[<c019056c>] (warn_slowpath_fmt+0x2c/0x3c) from [<c0388a7c>] (debug_print_object+0x94/0xbc)
[<c0388a7c>] (debug_print_object+0x94/0xbc) from [<c0388e34>] (__debug_object_init+0x2d0/0x340)
[<c0388e34>] (__debug_object_init+0x2d0/0x340) from [<c019e3b0>] (init_timer_key+0x14/0xb0)
[<c019e3b0>] (init_timer_key+0x14/0xb0) from [<c0635f78>] (cpufreq_governor_dbs+0x3e8/0x5f8)
[<c0635f78>] (cpufreq_governor_dbs+0x3e8/0x5f8) from [<c06325a0>] (__cpufreq_governor+0xdc/0x1a4)
[<c06325a0>] (__cpufreq_governor+0xdc/0x1a4) from [<c0633704>] (__cpufreq_remove_dev.isra.10+0x3b4/0x434)
[<c0633704>] (__cpufreq_remove_dev.isra.10+0x3b4/0x434) from [<c08989f4>] (cpufreq_cpu_callback+0x60/0x80)
[<c08989f4>] (cpufreq_cpu_callback+0x60/0x80) from [<c08a43c0>] (notifier_call_chain+0x38/0x68)
[<c08a43c0>] (notifier_call_chain+0x38/0x68) from [<c01938e0>] (__cpu_notify+0x28/0x40)
[<c01938e0>] (__cpu_notify+0x28/0x40) from [<c0892ad4>] (_cpu_down+0x7c/0x2c0)
[<c0892ad4>] (_cpu_down+0x7c/0x2c0) from [<c0892d3c>] (cpu_down+0x24/0x40)
[<c0892d3c>] (cpu_down+0x24/0x40) from [<c0893ea8>] (store_online+0x2c/0x74)
[<c0893ea8>] (store_online+0x2c/0x74) from [<c04519d8>] (dev_attr_store+0x18/0x24)
[<c04519d8>] (dev_attr_store+0x18/0x24) from [<c02a69d4>] (sysfs_write_file+0x100/0x148)
[<c02a69d4>] (sysfs_write_file+0x100/0x148) from [<c0255c18>] (vfs_write+0xcc/0x174)
[<c0255c18>] (vfs_write+0xcc/0x174) from [<c0255f70>] (SyS_write+0x38/0x64)
[<c0255f70>] (SyS_write+0x38/0x64) from [<c0106120>] (ret_fast_syscall+0x0/0x30)

Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
---

On 08/27, Viresh Kumar wrote:
> On 27 August 2013 04:15, Stephen Boyd <sboyd@codeaurora.org> wrote:
> > +++ b/drivers/cpufreq/cpufreq_governor.c
> > @@ -133,7 +133,7 @@ void gov_queue_work(struct dbs_data *dbs_data, struct cpufreq_policy *policy,
> >  {
> >         int i;
> >
> > -       if (!all_cpus) {
> > +       if (!all_cpus || !policy->governor_enabled) {
> >                 __gov_queue_work(smp_processor_id(), dbs_data, delay);
> >         } else {
> >                 for_each_cpu(i, policy->cpus)
> 
> Shouldn't we simply do this instead at the top of this function?
> 
> > +       if (!policy->governor_enabled)
> > +              return;

Sure that works just as well. Here's a patch.

 drivers/cpufreq/cpufreq_governor.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index 7b839a8..b9b20fd 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -133,6 +133,9 @@ void gov_queue_work(struct dbs_data *dbs_data, struct cpufreq_policy *policy,
 {
 	int i;
 
+	if (!policy->governor_enabled)
+		return;
+
 	if (!all_cpus) {
 		__gov_queue_work(smp_processor_id(), dbs_data, delay);
 	} else {
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation


  reply	other threads:[~2013-08-27 18:47 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-08-26 22:45 [PATCH] cpufreq: Fix timer/workqueue corruption due to double queueing Stephen Boyd
2013-08-27  6:31 ` Viresh Kumar
2013-08-27 18:47   ` Stephen Boyd [this message]
2013-08-27 22:01     ` [PATCH] cpufreq: Don't use smp_processor_id() in preemptible context Stephen Boyd
2013-08-28  6:34       ` Viresh Kumar
2013-08-28 16:26         ` Stephen Boyd
2013-08-28 21:24           ` [PATCH v2] " Stephen Boyd
2013-08-29  4:20             ` Viresh Kumar
2013-08-29 20:31               ` Rafael J. Wysocki
2013-08-28  5:37     ` [PATCH] cpufreq: Fix timer/workqueue corruption due to double queueing Viresh Kumar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1377629249-1177-1-git-send-email-sboyd@codeaurora.org \
    --to=sboyd@codeaurora.org \
    --cc=cpufreq@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=rjw@sisk.pl \
    --cc=viresh.kumar@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.