linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Colin Cross <ccross@android.com>
To: Pavel Machek <pavel@ucw.cz>
Cc: Zoran Markovic <zoran.markovic@linaro.org>,
	lkml <linux-kernel@vger.kernel.org>,
	Linux PM list <linux-pm@vger.kernel.org>,
	Benoit Goby <benoit@android.com>,
	Android Kernel Team <kernel-team@android.com>,
	Todd Poynor <toddpoynor@google.com>, San Mehat <san@google.com>,
	John Stultz <john.stultz@linaro.org>,
	"Rafael J. Wysocki" <rjw@sisk.pl>,
	Len Brown <len.brown@intel.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Subject: Re: [RFC PATCH] drivers: power: Add watchdog timer to catch drivers which lockup during suspend.
Date: Wed, 1 May 2013 09:10:49 -0700	[thread overview]
Message-ID: <CAMbhsRTtYSLpAmB0hsanBH3ez4bm8BgbcH1xEBQsZSj4dc069g@mail.gmail.com> (raw)
In-Reply-To: <20130501105630.GA22552@amd.pavel.ucw.cz>

On Wed, May 1, 2013 at 3:56 AM, Pavel Machek <pavel@ucw.cz> wrote:
> Hi!
>
>> >> @@ -663,6 +671,30 @@ static bool is_async(struct device *dev)
>> >>  }
>> >>
>> >>  /**
>> >> + *     dpm_drv_timeout - Driver suspend / resume watchdog handler
>> >> + *     @data: struct device which timed out
>> >> + *
>> >> + *     Called when a driver has timed out suspending or resuming.
>> >> + *     There's not much we can do here to recover so
>> >> + *     BUG() out for a crash-dump
>> >> + *
>> >> + */
>> >> +static void dpm_drv_timeout(unsigned long data)
>> >> +{
>> >> +     struct dpm_drv_wd_data *wd_data = (void *)data;
>> >> +     struct device *dev = wd_data->dev;
>> >> +     struct task_struct *tsk = wd_data->tsk;
>> >> +
>> >> +     pr_emerg("**** DPM device timeout: %s (%s)\n", dev_name(dev),
>> >> +             (dev->driver ? dev->driver->name : "no driver"));
>> >> +
>> >> +     pr_emerg("dpm suspend stack:\n");
>> >> +     show_stack(tsk, NULL);
>> >> +
>> >> +     BUG();
>> >> +}
>> >
>> > So you:
>> >
>> > dump stack of the suspend task
>> It dumps the stack of the suspend task if the suspend callback is run
>> synchronously, or the async task if the suspend op is run
>> asynchronously.
>
> Lets call that [a]suspend task.
>
>> > do BUG which
>> >    dumps stack of current task
>> >    kills current task
>> >
>> > Current task may very well be idle task; in such case you kill the
>> > machine. Sounds like you should be doing something else, like kill -9
>> > instead of BUG()?
>>
>> Not much else you can do, you are stuck part way into suspend with a
>> driver's suspend callback half executed.  All userspace tasks are
>> frozen, and the suspend task is blocked indefinitely.
>
> Yes, there's better option. Attempt killing the [a]suspend task,
> instead of killing the current task.

That will leave you in a completely undefined state.  If you just kill
the task, you are likely to kill the synchronous suspend task, which
is the task that would resume your drivers and unfreeze tasks.  That
will leave you with no userspace tasks running, and much of your
hardware suspended.  How is that a useful result?  If you somehow
respawn a resume thread to resume whatever hardware you can and
unfreeze tasks, you still have the hardware that was suspending when
it was killed in a bad state, and probably has locks held, so you're
just going to deadlock or crash soon after.

> Try putting mdelay(100000) into suspend path. Your patch will do the
> wrong thing in that case (actually turning debuggable problem into
> undebuggable one).

I'm not saying this patch as is is right for everyone (it probably at
least needs to be configurable to be turned off, change the delay, and
change the panic to just a stack trace), but from a mobile perspective
this patch is far more debuggable than without this patch.  We work
very hard to make sure that panic's are highly debuggable, in fact we
often prefer panics over any other behavior when the device is in a
bad state, because it immediately gets the user's device working again
while still giving us useful information in our automatic log
collection.

With an mdelay(100000) in the suspend path, users in our debug device
pool are likely to just pull the battery because their screen won't
turn on, in which case I get no debugging information.  With this
patch, the device will automatically reboot due to the panic, and I
will get captured logs after reboot that show a stack trace ending
with
mdelay, which tells me exactly where to look for this mdelay(100000).

  reply	other threads:[~2013-05-01 16:10 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-04-30 22:28 [RFC PATCH] drivers: power: Add watchdog timer to catch drivers which lockup during suspend Zoran Markovic
2013-04-30 22:28 ` [RFC PATCH] power: Add option to log time spent in suspend Zoran Markovic
2013-05-01  0:29   ` Pavel Machek
2013-05-01  3:29     ` Colin Cross
2013-05-02 12:27       ` Pavel Machek
2013-05-02 18:29         ` Colin Cross
2013-05-02 18:58           ` John Stultz
2013-05-02 19:11             ` Colin Cross
2013-04-30 23:30 ` [RFC PATCH] drivers: power: Add watchdog timer to catch drivers which lockup during suspend Greg Kroah-Hartman
2013-05-01  3:36   ` Colin Cross
2013-05-01  4:17     ` Greg Kroah-Hartman
2013-05-01  4:39       ` Colin Cross
     [not found]         ` <CAK7N6voYXxJKWDwSj5T9Y2fKK+Y5JqN9Wm8Qoffi9N7nRnsYhw@mail.gmail.com>
2013-05-01  5:14           ` Colin Cross
2013-05-01  0:30 ` Pavel Machek
2013-05-01  3:39   ` Colin Cross
2013-05-01 10:56     ` Pavel Machek
2013-05-01 16:10       ` Colin Cross [this message]
2013-05-01 16:24         ` Greg Kroah-Hartman
2013-05-02 12:30         ` Pavel Machek
2013-05-02 18:25           ` Colin Cross

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAMbhsRTtYSLpAmB0hsanBH3ez4bm8BgbcH1xEBQsZSj4dc069g@mail.gmail.com \
    --to=ccross@android.com \
    --cc=benoit@android.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=john.stultz@linaro.org \
    --cc=kernel-team@android.com \
    --cc=len.brown@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=pavel@ucw.cz \
    --cc=rjw@sisk.pl \
    --cc=san@google.com \
    --cc=toddpoynor@google.com \
    --cc=zoran.markovic@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).