From: "Rafael J. Wysocki" <rjw@sisk.pl>
To: Zoran Markovic <zoran.markovic@linaro.org>
Cc: Colin Cross <ccross@android.com>,
	lkml <linux-kernel@vger.kernel.org>,
	Linux PM list <linux-pm@vger.kernel.org>,
	Benoit Goby <benoit@android.com>,
	Android Kernel Team <kernel-team@android.com>,
	Todd Poynor <toddpoynor@google.com>, San Mehat <san@google.com>,
	John Stultz <john.stultz@linaro.org>, Pavel Machek <pavel@ucw.cz>,
	Len Brown <len.brown@intel.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Subject: Re: [RFC PATCHv2 1/2] drivers: power: Add watchdog timer to catch drivers which lockup during suspend/resume.
Date: Tue, 28 May 2013 22:49:26 +0200
Message-ID: <1515063.BZy7p4GtyV@vostro.rjw.lan>
In-Reply-To: <CAME+o4kJYD2K_aL4xNi-jNP1Q8R+seDNZDwsGBhvU00ArR0Skg@mail.gmail.com>

On Tuesday, May 28, 2013 11:26:09 AM Zoran Markovic wrote:
> > What about this:
> >  - Add one more list_head to struct dev_pm_info.
> >  - Make dpm_prepare() create a new list for the next steps instead of moving
> >    devices out of dpm_list.
> >  - Start an async work to carry out dpm_suspend() and make the main thread
> >    do wait_for_completion_timeout() for every device in dpm_list (in the
> >    reverse order).
> >  - If it times out, mark the device in question as unusable, possibly resume
> >    the already suspended devices (except for descendants of the failed one)
> >    and abort the suspend.  Return a specific error code to user space so that
> >    it knows what happened.  [You can make this step configurable to BUG()
> >    instead of doing all those things if you think that will be more useful for
> >    platforms you care about.]
> >  - Disable future suspends.
> > And analogously for resume.
> >
> > That should allow people to investigate what happened on a system that
> > (hopefully) is not completely dead and you still can have your "reboot if
> > suspend hangs" feature if you like.
> 
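
A minimal userspace sketch of the scheme quoted above, for illustration
only. It is not kernel code: pthreads stand in for the async work, and
struct completion, struct fake_device, struct suspend_work, suspend_all()
and wait_for_completion_timeout_ms() are all hypothetical names made up
for this sketch (the kernel's wait_for_completion_timeout() takes jiffies
and returns the remaining time, not a bool). One worker suspends the
devices in reverse order while the main thread waits on each device's
completion with a timeout.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/* Userspace stand-in for the kernel's struct completion (assumption). */
struct completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

/* Hypothetical device record holding only what the sketch needs. */
struct fake_device {
	struct completion suspended;
	unsigned int suspend_ms;	/* duration of the fake suspend callback */
	bool hangs;			/* callback never returns */
	bool unusable;			/* set by the waiter on timeout */
};

struct suspend_work {
	struct fake_device *devs;
	int n;
};

static void complete(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_broadcast(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

/* Returns true if the completion fired, false if the timeout expired. */
static bool wait_for_completion_timeout_ms(struct completion *c, unsigned int ms)
{
	struct timespec deadline;
	bool done;

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += ms / 1000;
	deadline.tv_nsec += (long)(ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&c->lock);
	while (!c->done &&
	       pthread_cond_timedwait(&c->cond, &c->lock, &deadline) != ETIMEDOUT)
		;
	done = c->done;
	pthread_mutex_unlock(&c->lock);
	return done;
}

/* The "async work": suspend every device, last one first. */
static void *async_suspend(void *arg)
{
	struct suspend_work *work = arg;

	for (int i = work->n - 1; i >= 0; i--) {
		struct fake_device *dev = &work->devs[i];
		struct timespec ts = { dev->suspend_ms / 1000,
				       (long)(dev->suspend_ms % 1000) * 1000000L };

		while (dev->hangs)
			pause();	/* a locked-up driver: never comes back */
		nanosleep(&ts, NULL);
		complete(&dev->suspended);
	}
	free(work);
	return NULL;
}

/* Returns -1 if every device suspended, else the index that timed out. */
static int suspend_all(struct fake_device *devs, int n, unsigned int timeout_ms)
{
	struct suspend_work *work = malloc(sizeof(*work));
	pthread_t worker;

	assert(n > 0);
	if (!work)
		abort();
	for (int i = 0; i < n; i++) {
		pthread_mutex_init(&devs[i].suspended.lock, NULL);
		pthread_cond_init(&devs[i].suspended.cond, NULL);
		devs[i].suspended.done = false;
		devs[i].unusable = false;
	}
	work->devs = devs;
	work->n = n;
	if (pthread_create(&worker, NULL, async_suspend, work))
		abort();
	pthread_detach(worker);	/* it may never finish, so never join it */

	for (int i = n - 1; i >= 0; i--) {
		if (!wait_for_completion_timeout_ms(&devs[i].suspended, timeout_ms)) {
			devs[i].unusable = true;
			return i;
		}
	}
	return -1;
}
```

Note that on a timeout the worker is simply left behind, still stuck in
the hung callback. The sketch makes no attempt to cancel it, so the
device array has to outlive the call.
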
> I looked into implementing this. The problem that I encountered is
> that there is no reliable way of canceling an async task, and hence
> the asynchronous __device_suspend() would be left racing with a
> recovery from a suspend timeout.

Why exactly would it be racing?  We wouldn't call device_resume() for
the device that timed out (and its descendants).
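
To illustrate the point, here is a rough sketch of a recovery pass that
never touches the device that timed out or anything below it. It is a
simplified userspace model, not the PM core: struct fake_device, its
fields, depends_on_failed() and resume_after_timeout() are hypothetical
names, and the array is assumed to be ordered parents first, as dpm_list
is.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified device node; parent is NULL for a root. */
struct fake_device {
	const char *name;
	struct fake_device *parent;
	bool suspended;	/* suspend callback completed */
	bool failed;	/* suspend callback timed out */
};

/* True if dev is the failed device or has it as an ancestor. */
static bool depends_on_failed(const struct fake_device *dev)
{
	for (; dev; dev = dev->parent)
		if (dev->failed)
			return true;
	return false;
}

/*
 * Resume the already suspended devices in list order (parents first),
 * leaving the failed device and its descendants alone.
 * Returns the number of devices resumed.
 */
static int resume_after_timeout(struct fake_device *devs, size_t n)
{
	int resumed = 0;

	assert(devs != NULL);
	for (size_t i = 0; i < n; i++) {
		if (!devs[i].suspended || depends_on_failed(&devs[i]))
			continue;
		devs[i].suspended = false;	/* stand-in for device_resume() */
		resumed++;
	}
	return resumed;
}
```

In this model the stuck suspend callback and the recovery pass operate
on disjoint sets of devices. Whether that is enough to rule out a race
in the real PM core is exactly the open question here.
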

> We could do cancel_work_sync() as a recovery, but that call blocks until the
> running async task is flushed, which might never happen. So doing a panic()
> is pretty much the only option for recovering.

Well, its usefulness is quite limited, then.  That said, I'm still not convinced
that this actually is the case.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
