All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Rafael J. Wysocki" <rjw@sisk.pl>
To: Colin Cross <ccross@android.com>
Cc: Zoran Markovic <zoran.markovic@linaro.org>,
	lkml <linux-kernel@vger.kernel.org>,
	Linux PM list <linux-pm@vger.kernel.org>,
	Benoit Goby <benoit@android.com>,
	Android Kernel Team <kernel-team@android.com>,
	Todd Poynor <toddpoynor@google.com>, San Mehat <san@google.com>,
	John Stultz <john.stultz@linaro.org>, Pavel Machek <pavel@ucw.cz>,
	Len Brown <len.brown@intel.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Subject: Re: [RFC PATCHv2 1/2] drivers: power: Add watchdog timer to catch drivers which lockup during suspend/resume.
Date: Sun, 12 May 2013 02:39:44 +0200	[thread overview]
Message-ID: <29127567.mTXaJOMILH@vostro.rjw.lan> (raw)
In-Reply-To: <CAMbhsRS+fNo19srma9WqcHutJnv8driOMnpM1-yW=pduJMo63g@mail.gmail.com>

On Friday, May 10, 2013 11:13:27 PM Colin Cross wrote:
> On Fri, May 10, 2013 at 2:28 PM, Zoran Markovic
> <zoran.markovic@linaro.org> wrote:
> > From: Benoit Goby <benoit@android.com>
> >
> > Below is a patch from android kernel that detects a driver suspend/resume
> > lockup and captures dump in the kernel log. Please review and provide
> > comments.
> 
> This paragraph should go below the --- line so it doesn't end up in
> the final commit message.
> 
> > Rather than hard-lock the kernel, dump the suspend/resume thread stack and
> > BUG() when a driver takes too long to suspend/resume.

And how exactly is that different from hanging the kernel?

> > The timeout is set to 12 seconds to be longer than the usbhid 10
> > second timeout.

Which is kind of arbitrary.

> > Exclude from the watchdog the time spent waiting for children that
> > are resumed asynchronously and time every device, whether or not they
> > resumed synchronously.

What about changing the code to use wait_for_completion_timeout() instead
of wait_for_completion() and actually trying to recover if one of them times
out?  [It could try to unregister the device in question and if *that* hangs
indefinitely, *then* commit a panic() or something similar, but if it succeeds,
abort the ongoing suspend or complete the resume without that device.]

Of course, that would involve changing things not to depend on async, but
might be better than slapping a timer on top.

> > This patch is targeted for mobile devices where a suspend/resume lockup
> > could cause a system reboot and catch user's attention. Information
> > about failing device can later be retrieved from captured log in
> > subsequent boot session.
> 
> I would take out the phrase "catch user's attention", the intention of
> this patch is actually the opposite - get the system back to working
> normally as fast as possible, while still putting enough information
> to debug the problem into the log.
> 
> > The hardware watchdog timer is likely suspended during this time and
> > couldn't be relied upon. The soft-lockup detector would eventually tell
> > that tasks are not scheduled, but would provide little context as to why.
> > The patch hence uses system timer and assumes it is still active while the
> > devices are suspended/resumed.

I must say I'm not particularly impressed by this patch series.  I understand
the motivation, but I'm wondering if that's the best we can do.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

  reply	other threads:[~2013-05-12  0:31 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-05-10 21:28 [RFC PATCHv2 0/2] power: device suspend/resume watchdog Zoran Markovic
2013-05-10 21:28 ` [RFC PATCHv2 1/2] drivers: power: Add watchdog timer to catch drivers which lockup during suspend/resume Zoran Markovic
2013-05-11  6:13   ` Colin Cross
2013-05-12  0:39     ` Rafael J. Wysocki [this message]
2013-05-12 19:15       ` Colin Cross
2013-05-13 11:26         ` Rafael J. Wysocki
2013-05-28 18:26           ` Zoran Markovic
2013-05-28 20:49             ` Rafael J. Wysocki
2013-05-31 21:13               ` Zoran Markovic
2013-06-05 22:17                 ` Zoran Markovic
2013-06-05 22:29                   ` Rafael J. Wysocki
2013-06-06 14:12                   ` Alan Stern
2013-06-10 21:25                     ` Colin Cross
2013-05-10 21:28 ` [RFC PATCHv2 2/2] PM: compile-time configuration of device suspend/resume watchdogs Zoran Markovic
2013-05-11  6:23   ` Colin Cross
2013-05-13 16:03     ` John Stultz
2013-05-11  9:28   ` Pavel Machek
2013-05-11 22:21     ` Colin Cross
2013-05-12  0:05       ` Pavel Machek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=29127567.mTXaJOMILH@vostro.rjw.lan \
    --to=rjw@sisk.pl \
    --cc=benoit@android.com \
    --cc=ccross@android.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=john.stultz@linaro.org \
    --cc=kernel-team@android.com \
    --cc=len.brown@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=pavel@ucw.cz \
    --cc=san@google.com \
    --cc=toddpoynor@google.com \
    --cc=zoran.markovic@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.