All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Rafael J. Wysocki" <rjw@sisk.pl>
To: David Brownell <david-b@pacbell.net>
Cc: Nigel Cunningham <nigel@suspend2.net>,
	Andrew Morton <akpm@osdl.org>,
	linux-pm@lists.osdl.org, linux-usb-devel@lists.sourceforge.net
Subject: Re: [patch/rft 2.6.17-rc2] swsusp resume must not device_suspend()
Date: Tue, 25 Apr 2006 23:55:28 +0200	[thread overview]
Message-ID: <200604252355.29006.rjw@sisk.pl> (raw)
In-Reply-To: <200604251404.30228.david-b@pacbell.net>

On Tuesday 25 April 2006 23:04, David Brownell wrote:
> On Tuesday 25 April 2006 11:56 am, Rafael J. Wysocki wrote:
> > 
> > > I've begun thinking that calls like pm_should_I_spin_down_drives() would be a
> > > better structural approach than continually redefining this "freeze" thing so
> > > it makes less and less sense to all other drivers ... who nonethless need to
> > > clutter themselves up with a growing list of special cases, to accomodate
> > > rotating media that may not even exist in the target system.
> > 
> > I think we should do something different to device_power_down(PMSG_FREEZE)
> > there, but I'm not sure it should be kernel_restart_prepare(NULL).
> > 
> > Actually spinning down disks during resume is a problem for some users (yes,
> > we've had such bug reports recently), so it's better to avoid this.
> 
> Well, if we had a pm_should_I_spin_down_drives() it would make sense to me
> that it return FALSE during kernel_restart_prepare() too ... surely kexec
> users have the same issues!
> 
> If you currently have users who object to spindown-during-resume, then it'd
> seem that my patch couldn't change anything except maybe details.

Fortunately this particular problem has been fixed in the driver. ;-)

> And that switching over to a call like pm_should_I_spin_down_drives() should
> fix it all.

Agreed, but I have to learn quite a bit to implement such a thing.

> > > > OTOH I think at least some device driver writers assume that .resume() will
> > > > always be called after .suspend() which only is true for non-modular drivers
> > > > (or for modular drivers loaded from an initrd before resume). 
> > > 
> > > Say what?  Of _course_ resume() should only be called after suspend().  If
> > > that's not true in any case, the code wrongly issuing the resume() is buggy.
> > 
> > Well, suppose we have a modular driver that's not loaded before resume.
> 
> That's not the problem case though; it works correctly, since the device
> hardware is already being left in an appropriate (RESET) state.
> 
> 
> > Then it goes like that (approximately):
> > (1) We activate swsusp which calls .suspend() for all devices including our
> > driver (this is a real suspend).
> > (2) swsusp snapshots the system and creates the image.
> > (3) swsusp calls .resume() for all devices in order to be able to save the
> > image (.resume() for our driver is also called which is OK).
> > (4) swsusp turns off the system.
> > (5) (some time later) We start a new kernel and tell it to resume.
> > (6) It activates swsusp which reads the image.
> 
> And assuming this is an x86 PC, at this point every device is in one of three states:
> 
>   - initialized by BIOS.  This is a particular PITA for USB, but one that's
>     handled OK (mostly) except when BIOS bugs kick in.  There's some nasty
>     code that kicks in along with PCI quirk handling, which ensures that by
>     the time Linux-USB  driver could see this state (or the input subsystem
>     needs to care about it), the state has morphed to reset.  Video cards
>     have funky issues here too.
> 
>   - (powerup) reset.  This is the ideal state, in terms of "truth" to convey
>     to the image we're about to restore ... no ambiguity, every driver will
>     need to re-init.  As if there were (thank you!!) no BIOS.
> 
>   - initialized by Linux ... which leads to the case my patch addresses.
> 
> Those first two states are legit for any resume() call, and they apply in
> your scenario restriction.

IIRC, there were some ALSA problems with .resume() called on a reseted
device, but they seem to be fixed now.
 
> The third state is the problem scenario, kicking in when the driver was
> statically linked (or modprobed from initramfs, etc), but not during
> your scenario.

The problem, as I see it, is that too many devices may be initialized at the
kernel startup.  I think we _can_ reset some of them before the image
is restored, but at least some of them need to be treated more carefully.

> > (7) (without your change) swsusp calls .suspend() for all device drivers that
> > are present at that time,
> 
> ... the current troublesome consequence of that third state ...
> 
> > but our driver is not there, so its .suspend() 
> > _won't_ be called.  [Of course with your change .suspend() won't be called
> > for any driver.]
> 
> Right:  the first two "safe" cases kick in.  This is the partial workaround I
> had identified:  dodging the code paths for that third state, where suspend()
> is being used to put the hardware into a broken suspend state.

So perhaps we should just make them enter a state that's not broken?
That may be reset for some devices (eg. USB) and something else for some
others (eg. storage).

> Note that with that third state, there are actually two suspend() calls, but
> only one resume() call.  (Suspend before snapshot, suspend before resume
> snapshot, resume after activating snapshot.)  Such an extra suspend() call is
> a small hint that something's odd, and maybe wrong.

Agreed.

> > (8) swsusp restores the image.
> > (9) swsusp calls .resume() for all devices _including_ our driver, because it
> > was in memory before suspend.  For our driver this .resume() is not
> > called after .suspend(), is it?
> 
> The suspend() was called from the kernel being resumed ... and the device hardware
> is in one of the states (reset) that it's allowed to be in when calling its
> matching resume().  No problem there.
> 
> 
> > You're saying that (9) is wrong, so could you please suggest what to do
> > instead of it?
> 
> The case in which (9) is wrong is the case you excluded:  where the pre-resume
> kernel loaded the driver and used the third state listed above, and then trashed
> the correct device hardware state (reset) and replaced it with a suspend state.
> 
> It may help to think of two distinct types of device hardware suspend states
> (only the first is real, the second is just a software bug):
> 
>  - Correct, with internal state corresponding to what the driver suspend() did;
>    what a normal hardware suspend/resume cycle (not powercycle!) could do.
> 
>  - Broken, with any other internal state (except reset).  This is what swsusp
>    currently forces, by adding **AND HIDING** a reset and reinit cycle, because
>    of the extra suspend() call in (7).

Still there are drivers that have no problems with it, so why we should we
forcibly reset their devices?

> My patch/suggestion just ensures that instead of that broken state, reset is used.
> in all cases ... not just the "driver not initialized before snapshot resume" case.

As I said before I generally agree with this except I think some more fine
grained approach is needed in this case.

Basically it seems we need something like a .prepare_for_resume() routine,
that will be called before restoring the swsusp's image, apart from .suspend()
and .resume() for each driver.  Your patch just assumes that
.prepare_for_resume() should be "reset" for all devices.

Greetings,
Rafael

  parent reply	other threads:[~2006-04-25 21:55 UTC|newest]

Thread overview: 80+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-04-24 21:29 [patch/rft 2.6.17-rc2] swsusp resume must not device_suspend() David Brownell
2006-04-24 21:47 ` [linux-pm] " Rafael J. Wysocki
2006-04-24 22:47   ` David Brownell
2006-04-25 10:34     ` Nigel Cunningham
2006-04-25 14:41       ` Alan Stern
2006-04-25 17:37         ` [linux-pm] " David Brownell
2006-04-25 20:45           ` Alan Stern
2006-04-26  0:30             ` David Brownell
2006-04-27  8:20               ` Pavel Machek
2006-04-27  8:16             ` Pavel Machek
2006-04-27  8:08         ` Pavel Machek
2006-04-27 14:34           ` [linux-pm] " Alan Stern
2006-04-27 16:55             ` Patrick Mochel
2006-04-27 17:41               ` Alan Stern
2006-04-27 19:21               ` David Brownell
2006-04-27 20:35                 ` Nigel Cunningham
2006-04-27 20:58                   ` Pavel Machek
2006-04-25 16:56       ` David Brownell
2006-04-24 22:31 ` [linux-pm] " Nigel Cunningham
2006-04-25  8:32   ` Rafael J. Wysocki
2006-04-25 16:11     ` David Brownell
2006-04-25 18:56       ` Rafael J. Wysocki
2006-04-25 20:28         ` Nigel Cunningham
2006-04-25 20:53           ` [linux-pm] " Rafael J. Wysocki
2006-04-25 21:03             ` Nigel Cunningham
2006-04-25 22:06               ` Rafael J. Wysocki
2006-04-25 22:18                 ` Nigel Cunningham
2006-04-25 22:34                   ` Rafael J. Wysocki
2006-04-25 23:55                   ` David Brownell
2006-04-26  1:16                     ` Nigel Cunningham
2006-04-26  3:32                       ` [linux-pm] " David Brownell
2006-04-26  3:44                         ` Nigel Cunningham
2006-04-26 14:24           ` Alan Stern
2006-04-26 19:47             ` [linux-pm] " Nigel Cunningham
2006-04-25 21:04         ` David Brownell
2006-04-25 21:41           ` Pavel Machek
2006-04-25 23:13             ` [linux-pm] " David Brownell
2006-04-26  9:07               ` Pavel Machek
2006-04-25 21:55           ` Rafael J. Wysocki [this message]
2006-04-25 22:56             ` David Brownell
2006-04-26 11:26               ` Rafael J. Wysocki
2006-04-26 14:38                 ` [linux-pm] " Alan Stern
2006-04-26 15:26                   ` Rafael J. Wysocki
2006-04-26 15:38                     ` Alan Stern
2006-04-26 16:09                       ` Rafael J. Wysocki
2006-04-26 19:06                         ` Alan Stern
2006-04-26 20:37                           ` Rafael J. Wysocki
2006-04-26 21:31                 ` David Brownell
2006-04-26 22:24                   ` Rafael J. Wysocki
2006-04-27 19:44                     ` David Brownell
2006-04-25 15:56   ` David Brownell
2006-04-27 10:54     ` Pavel Machek
2006-04-25 13:50 ` [linux-usb-devel] " Alan Stern
2006-04-25 15:49   ` David Brownell
2006-04-27  1:22 ` Patrick Mochel
2006-04-27 19:41   ` [linux-pm] " David Brownell
2006-05-02 16:12     ` Patrick Mochel
2006-05-26  3:06       ` David Brownell
2006-05-26 19:50         ` Rafael J. Wysocki
2006-05-26 23:16         ` Pavel Machek
2006-05-27  0:19           ` [linux-usb-devel] " David Brownell
2006-05-27 16:38             ` Pavel Machek
2006-06-05 15:31               ` David Brownell
2006-06-05 16:36               ` [patch/rft 2.6.17-rc5-git 0/6] PM_EVENT_PRETHAW David Brownell
2006-06-05 16:36               ` [patch/rft 2.6.17-rc5-git 1/6] fix broken/dubious driver suspend() methods David Brownell
     [not found]                 ` <20060530191140.GA4017@ucw.cz>
2006-06-07  0:53                   ` David Brownell
2006-06-05 16:37               ` [patch/rft 2.6.17-rc5-git 2/6] add PM_EVENT_PRETHAW David Brownell
2006-05-30 19:17                 ` Pavel Machek
2006-06-07  1:02                   ` David Brownell
2006-06-05 16:38               ` [patch/rft 2.6.17-rc5-git 3/6] PM_EVENT_PRETHAW, handle in IDE and PCI David Brownell
2006-05-30 19:21                 ` Pavel Machek
2006-06-07  0:51                   ` David Brownell
2006-06-05 16:38               ` [patch/rft 2.6.17-rc5-git 4/6] PM_EVENT_PRETHAW for various graphics cards David Brownell
2006-05-30 19:30                 ` Pavel Machek
2006-06-07  1:24                   ` David Brownell
2006-06-07 18:57                     ` PM docs and API? bsmith
2006-06-07 22:58                       ` David Brownell
2006-06-05 16:38               ` [patch/rft 2.6.17-rc5-git 5/6] PM_EVENT_PRETHAW, handle for USB David Brownell
2006-06-05 16:38               ` [patch/rft 2.6.17-rc5-git 6/6] PM_EVENT_PRETHAW, issue from PM core David Brownell
2006-05-30 19:28                 ` Pavel Machek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=200604252355.29006.rjw@sisk.pl \
    --to=rjw@sisk.pl \
    --cc=akpm@osdl.org \
    --cc=david-b@pacbell.net \
    --cc=linux-pm@lists.osdl.org \
    --cc=linux-usb-devel@lists.sourceforge.net \
    --cc=nigel@suspend2.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.