linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [2.6.30-rc1-git2 regressions] Hibernation broken and (minor but annoying) audio problem
@ 2009-04-09 22:57 Rafael J. Wysocki
  2009-04-09 23:06 ` Linus Torvalds
  2009-04-14 12:09 ` Takashi Iwai
  0 siblings, 2 replies; 30+ messages in thread
From: Rafael J. Wysocki @ 2009-04-09 22:57 UTC (permalink / raw)
  To: Arjan van de Ven, Takashi Iwai
  Cc: LKML, pm list, Linus Torvalds, Andrew Morton

Hi,

I'm seeing two problems with -git from today.

First, resume from hibernation is apparently broken if the storage driver
handling the resume device is modular.  If it is compiled statically into
the kernel, the resume works correctly.  I guess this is a consequence of the
async changes (Arjan, please help).

Second, KDE4 on openSUSE 11.1 sometimes fails to handle audio correctly after
a fresh boot.  Everything seems to work, but there's no sound at all.  It is
sufficient to close the X session and start the desktop environment again to
make it work, though (may that be async too? ;-)).  The hardware in question is
Intel Corporation 82801G (ICH7 Family) High Definition Audio Controller
(rev 02) and the driver is snd_hda_intel.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [2.6.30-rc1-git2 regressions] Hibernation broken and (minor but annoying) audio problem
  2009-04-09 22:57 [2.6.30-rc1-git2 regressions] Hibernation broken and (minor but annoying) audio problem Rafael J. Wysocki
@ 2009-04-09 23:06 ` Linus Torvalds
  2009-04-10 12:39   ` Rafael J. Wysocki
  2009-04-14 12:09 ` Takashi Iwai
  1 sibling, 1 reply; 30+ messages in thread
From: Linus Torvalds @ 2009-04-09 23:06 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Arjan van de Ven, Takashi Iwai, LKML, pm list, Andrew Morton



On Fri, 10 Apr 2009, Rafael J. Wysocki wrote:
> 
> I'm seeing two problems with -git from today.

When today? A number of people apparently had module problems before 
commit 97c18e2c7a8e36d2d83d50ee070314aadac73a11 ("module: 
try_then_request_module must wait").

If you don't have that one, then that might explain both problems, in that 
a "second try" would work for modules (your sound issue), and that 
compiling things in would obviously avoid it (your storage driver thing).

But if you _do_ have that, then it's obviously something else ;)

		Linus

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [2.6.30-rc1-git2 regressions] Hibernation broken and (minor but annoying) audio problem
  2009-04-09 23:06 ` Linus Torvalds
@ 2009-04-10 12:39   ` Rafael J. Wysocki
  2009-04-10 18:58     ` [linux-pm] " Rafael J. Wysocki
  2009-04-11  2:35     ` Len Brown
  0 siblings, 2 replies; 30+ messages in thread
From: Rafael J. Wysocki @ 2009-04-10 12:39 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Arjan van de Ven, Takashi Iwai, LKML, pm list, Andrew Morton

On Friday 10 April 2009, Linus Torvalds wrote:
> 
> On Fri, 10 Apr 2009, Rafael J. Wysocki wrote:
> > 
> > I'm seeing two problems with -git from today.
> 
> When today? A number of people apparently had module problems before 
> commit 97c18e2c7a8e36d2d83d50ee070314aadac73a11 ("module: 
> try_then_request_module must wait").
> 
> If you don't have that one, then that might explain both problems, in that 
> a "second try" would work for modules (your sound issue), and that 
> compiling things in would obviously avoid it (your storage driver thing).
> 
> But if you _do_ have that, then it's obviously something else ;)

Well, I do have that one. :-)

Rafael

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [linux-pm] [2.6.30-rc1-git2 regressions] Hibernation broken and (minor but annoying) audio problem
  2009-04-10 12:39   ` Rafael J. Wysocki
@ 2009-04-10 18:58     ` Rafael J. Wysocki
  2009-04-10 19:17       ` Linus Torvalds
  2009-04-11  2:35     ` Len Brown
  1 sibling, 1 reply; 30+ messages in thread
From: Rafael J. Wysocki @ 2009-04-10 18:58 UTC (permalink / raw)
  To: linux-pm
  Cc: Linus Torvalds, Takashi Iwai, Arjan van de Ven, LKML,
	Andrew Morton, Ingo Molnar

On Friday 10 April 2009, Rafael J. Wysocki wrote:
> On Friday 10 April 2009, Linus Torvalds wrote:
> > 
> > On Fri, 10 Apr 2009, Rafael J. Wysocki wrote:
> > > 
> > > I'm seeing two problems with -git from today.
> > 
> > When today? A number of people apparently had module problems before 
> > commit 97c18e2c7a8e36d2d83d50ee070314aadac73a11 ("module: 
> > try_then_request_module must wait").
> > 
> > If you don't have that one, then that might explain both problems, in that 
> > a "second try" would work for modules (your sound issue), and that 
> > compiling things in would obviously avoid it (your storage driver thing).
> > 
> > But if you _do_ have that, then it's obviously something else ;)
> 
> Well, I do have that one. :-)

I've just verified that the resume-after-hibernation issue goes away after
reverting commit 9710794383ee5008d67f1a6613a4717bf6de47bc
(async: remove the temporary (2.6.29) "async is off by default" code) , so it
is async-related.

The audio issue still remains after the revert, so it is really different.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [linux-pm] [2.6.30-rc1-git2 regressions] Hibernation broken and (minor but annoying) audio problem
  2009-04-10 18:58     ` [linux-pm] " Rafael J. Wysocki
@ 2009-04-10 19:17       ` Linus Torvalds
  2009-04-11  0:06         ` Rafael J. Wysocki
  2009-04-11  1:53         ` Arjan van de Ven
  0 siblings, 2 replies; 30+ messages in thread
From: Linus Torvalds @ 2009-04-10 19:17 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-pm, Takashi Iwai, Arjan van de Ven, LKML, Andrew Morton,
	Ingo Molnar



On Fri, 10 Apr 2009, Rafael J. Wysocki wrote:
> 
> I've just verified that the resume-after-hibernation issue goes away after
> reverting commit 9710794383ee5008d67f1a6613a4717bf6de47bc
> (async: remove the temporary (2.6.29) "async is off by default" code) , so it
> is async-related.

Arjan? Clearly all the necessary fixes weren't found..

There _is_ a module loading problem wrt initmem - I think you found that 
and we added a hack for it for the ACPI battery driver. I wonder if we're 
hitting a similar issue now with module discovery: modules that use 
"async_schedule()" to do their discovery asynchronously are now not 
necessarily fully "done" when the module is loaded. 

And so, anything that expected the devices to be available after module 
load (like they used to) would be screwed.

IOW, maybe something like the totally untested patch appended here (that 
should also allow us to make the ACPI battery code to go back to using 
__init).

As usual, I'm not using modules, so what do I know.

> The audio issue still remains after the revert, so it is really different.

Ok, probably something from Takashi..

		Linus

---
 kernel/module.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/kernel/module.c b/kernel/module.c
index 05f014e..e797812 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2388,6 +2388,9 @@ SYSCALL_DEFINE3(init_module, void __user *, umod,
 	blocking_notifier_call_chain(&module_notify_list,
 				     MODULE_STATE_LIVE, mod);
 
+	/* We need to finish all async code before the module init sequence is done */
+	async_synchronize_full();
+
 	mutex_lock(&module_mutex);
 	/* Drop initial reference. */
 	module_put(mod);

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [linux-pm] [2.6.30-rc1-git2 regressions] Hibernation broken and (minor but annoying) audio problem
  2009-04-10 19:17       ` Linus Torvalds
@ 2009-04-11  0:06         ` Rafael J. Wysocki
  2009-04-11  1:19           ` Linus Torvalds
  2009-04-11  1:53         ` Arjan van de Ven
  1 sibling, 1 reply; 30+ messages in thread
From: Rafael J. Wysocki @ 2009-04-11  0:06 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-pm, Takashi Iwai, Arjan van de Ven, LKML, Andrew Morton,
	Ingo Molnar

[Sorry for having to switch the From address, my MTA has just decided to break.]

On Friday 10 April 2009, Linus Torvalds wrote:
> 
> On Fri, 10 Apr 2009, Rafael J. Wysocki wrote:
> > 
> > I've just verified that the resume-after-hibernation issue goes away after
> > reverting commit 9710794383ee5008d67f1a6613a4717bf6de47bc
> > (async: remove the temporary (2.6.29) "async is off by default" code) , so it
> > is async-related.
> 
> Arjan? Clearly all the necessary fixes weren't found..
> 
> There _is_ a module loading problem wrt initmem - I think you found that 
> and we added a hack for it for the ACPI battery driver. I wonder if we're 
> hitting a similar issue now with module discovery: modules that use 
> "async_schedule()" to do their discovery asynchronously are now not 
> necessarily fully "done" when the module is loaded. 
> 
> And so, anything that expected the devices to be available after module 
> load (like they used to) would be screwed.
> 
> IOW, maybe something like the totally untested patch appended here (that 
> should also allow us to make the ACPI battery code to go back to using 
> __init).

I tested it and it worked.

> As usual, I'm not using modules, so what do I know.
> 
> > The audio issue still remains after the revert, so it is really different.
> 
> Ok, probably something from Takashi..

This one is only reproducible in one out of three attempts on the average.
I tried to bisect, but it went to nowhere.

Thanks,
Rafael

 
> ---
>  kernel/module.c |    3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
> 
> diff --git a/kernel/module.c b/kernel/module.c
> index 05f014e..e797812 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -2388,6 +2388,9 @@ SYSCALL_DEFINE3(init_module, void __user *, umod,
>  	blocking_notifier_call_chain(&module_notify_list,
>  				     MODULE_STATE_LIVE, mod);
>  
> +	/* We need to finish all async code before the module init sequence is done */
> +	async_synchronize_full();
> +
>  	mutex_lock(&module_mutex);
>  	/* Drop initial reference. */
>  	module_put(mod);
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [linux-pm] [2.6.30-rc1-git2 regressions] Hibernation broken and (minor but annoying) audio problem
  2009-04-11  0:06         ` Rafael J. Wysocki
@ 2009-04-11  1:19           ` Linus Torvalds
  2009-04-11  1:45             ` Arjan van de Ven
  2009-04-11  2:10             ` Arjan van de Ven
  0 siblings, 2 replies; 30+ messages in thread
From: Linus Torvalds @ 2009-04-11  1:19 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-pm, Takashi Iwai, Arjan van de Ven, LKML, Andrew Morton,
	Ingo Molnar



On Sat, 11 Apr 2009, Rafael J. Wysocki wrote:
> 
> On Friday 10 April 2009, Linus Torvalds wrote:
> > 
> > On Fri, 10 Apr 2009, Rafael J. Wysocki wrote:
> > > 
> > > I've just verified that the resume-after-hibernation issue goes away after
> > > reverting commit 9710794383ee5008d67f1a6613a4717bf6de47bc
> > > (async: remove the temporary (2.6.29) "async is off by default" code) , so it
> > > is async-related.
> > 
> > Arjan? Clearly all the necessary fixes weren't found..
> > 
> > There _is_ a module loading problem wrt initmem - I think you found that 
> > and we added a hack for it for the ACPI battery driver. I wonder if we're 
> > hitting a similar issue now with module discovery: modules that use 
> > "async_schedule()" to do their discovery asynchronously are now not 
> > necessarily fully "done" when the module is loaded. 
> > 
> > And so, anything that expected the devices to be available after module 
> > load (like they used to) would be screwed.
> > 
> > IOW, maybe something like the totally untested patch appended here (that 
> > should also allow us to make the ACPI battery code to go back to using 
> > __init).
> 
> I tested it and it worked.

Hmm.

I'm not 100% sure that patch is good.

The reason? I think it's going to deadlock if an async caller ends up 
wanting to load a module, because then the nestecd 
"async_synchronize_full()" will basically want to wait for itself.

So it's a good test-patch, and maybe no async caller ever loads a module, 
but it makes me a bit nervous.

But the fact that it fixes things for you at least means that the _reason_ 
for the problem is know, and maybe there are alternative solutions. Arjan?

		Linus

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [linux-pm] [2.6.30-rc1-git2 regressions] Hibernation broken and (minor but annoying) audio problem
  2009-04-11  1:19           ` Linus Torvalds
@ 2009-04-11  1:45             ` Arjan van de Ven
  2009-04-11  2:10             ` Arjan van de Ven
  1 sibling, 0 replies; 30+ messages in thread
From: Arjan van de Ven @ 2009-04-11  1:45 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rafael J. Wysocki, linux-pm, Takashi Iwai, LKML, Andrew Morton,
	Ingo Molnar

Linus Torvalds wrote:
> 
> Hmm.
> 
> I'm not 100% sure that patch is good.
> 
> The reason? I think it's going to deadlock if an async caller ends up 
> wanting to load a module, because then the nestecd 
> "async_synchronize_full()" will basically want to wait for itself.
> 
> So it's a good test-patch, and maybe no async caller ever loads a module, 
> but it makes me a bit nervous.
> 
> But the fact that it fixes things for you at least means that the _reason_ 
> for the problem is know, and maybe there are alternative solutions. Arjan?

I just got back from the LF summit and am catching up on mail; I'll take a look
at the history.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [linux-pm] [2.6.30-rc1-git2 regressions] Hibernation broken and (minor but annoying) audio problem
  2009-04-10 19:17       ` Linus Torvalds
  2009-04-11  0:06         ` Rafael J. Wysocki
@ 2009-04-11  1:53         ` Arjan van de Ven
  1 sibling, 0 replies; 30+ messages in thread
From: Arjan van de Ven @ 2009-04-11  1:53 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rafael J. Wysocki, linux-pm, Takashi Iwai, LKML, Andrew Morton,
	Ingo Molnar

Linus Torvalds wrote:
> 
> On Fri, 10 Apr 2009, Rafael J. Wysocki wrote:
>> I've just verified that the resume-after-hibernation issue goes away after
>> reverting commit 9710794383ee5008d67f1a6613a4717bf6de47bc
>> (async: remove the temporary (2.6.29) "async is off by default" code) , so it
>> is async-related.
> 
> Arjan? Clearly all the necessary fixes weren't found..
> 
> There _is_ a module loading problem wrt initmem - I think you found that 
> and we added a hack for it for the ACPI battery driver. I wonder if we're 
> hitting a similar issue now with module discovery: modules that use 
> "async_schedule()" to do their discovery asynchronously are now not 
> necessarily fully "done" when the module is loaded. 

this is both a "yes and no" kind of thing. It already happened for many cases
(USB, SCSI (and thus libata) etc) but now it can happen for more new cases.


> And so, anything that expected the devices to be available after module 
> load (like they used to) would be screwed.
> 
> IOW, maybe something like the totally untested patch appended here (that 
> should also allow us to make the ACPI battery code to go back to using 
> __init).
> 

this will work. It's a tad unfortunate that we basically end up synchronizing
at load time; maybe some time in the future we can make this opt-in/out ;-(

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [linux-pm] [2.6.30-rc1-git2 regressions] Hibernation broken and (minor but annoying) audio problem
  2009-04-11  1:19           ` Linus Torvalds
  2009-04-11  1:45             ` Arjan van de Ven
@ 2009-04-11  2:10             ` Arjan van de Ven
  2009-04-11 10:46               ` Rafael J. Wysocki
  2009-04-11 19:38               ` Rafael J. Wysocki
  1 sibling, 2 replies; 30+ messages in thread
From: Arjan van de Ven @ 2009-04-11  2:10 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rafael J. Wysocki, linux-pm, Takashi Iwai, LKML, Andrew Morton,
	Ingo Molnar

Linus Torvalds wrote:
> 
> On Sat, 11 Apr 2009, Rafael J. Wysocki wrote:
>> On Friday 10 April 2009, Linus Torvalds wrote:
>>> On Fri, 10 Apr 2009, Rafael J. Wysocki wrote:
>>>> I've just verified that the resume-after-hibernation issue goes away after
>>>> reverting commit 9710794383ee5008d67f1a6613a4717bf6de47bc
>>>> (async: remove the temporary (2.6.29) "async is off by default" code) , so it
>>>> is async-related.
>>> Arjan? Clearly all the necessary fixes weren't found..
>>>
>>> There _is_ a module loading problem wrt initmem - I think you found that 
>>> and we added a hack for it for the ACPI battery driver. I wonder if we're 
>>> hitting a similar issue now with module discovery: modules that use 
>>> "async_schedule()" to do their discovery asynchronously are now not 
>>> necessarily fully "done" when the module is loaded. 
>>>
>>> And so, anything that expected the devices to be available after module 
>>> load (like they used to) would be screwed.
>>>
>>> IOW, maybe something like the totally untested patch appended here (that 
>>> should also allow us to make the ACPI battery code to go back to using 
>>> __init).
>> I tested it and it worked.
> 
> Hmm.
> 
> I'm not 100% sure that patch is good.
> 
> The reason? I think it's going to deadlock if an async caller ends up 
> wanting to load a module, because then the nested 
> "async_synchronize_full()" will basically want to wait for itself.
> 
> So it's a good test-patch, and maybe no async caller ever loads a module, 
> but it makes me a bit nervous.

It would make a rule that async context can only use request_module_nowait().
Not too nice, I think we can do better.

> But the fact that it fixes things for you at least means that the _reason_ 
> for the problem is know, and maybe there are alternative solutions. Arjan?

We have async domains; for the acpi type of case we could make a "__init" domain
that we wait on selectively... but I'm not too fond of this since it'll be fragile over time.

I suppose this got exposed now that we (just) removed the stop_machine stuff from the module loader...
(which is a generally good improvement, don't get me wrong).

For the __init freeing case there is a even more interesting option:
We can schedule an async task that will free the init mem of the module, and have that
async task just wait for all its predecessors to complete before doing the actual work.
That way the module is available and ready to its caller, while the freeing of the init mem
will be done at a safe time.

Like this (not yet tested):

diff --git a/kernel/module.c b/kernel/module.c
index 1196f5d..4ec90e8 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2312,6 +2312,22 @@ static noinline struct module *load_module(void __user *umod,
  	goto free_hdr;
  }

+static void async_module_free_initmem(void *data, async_cookie_t cookie)
+{
+	struct module *mod = data;
+	async_synchronize_cookie(cookie);
+
+	mutex_lock(&module_mutex);
+	/* Drop initial reference. */
+	module_put(mod);
+	module_free(mod, mod->module_init);
+	mod->module_init = NULL;
+	mod->init_size = 0;
+	mod->init_text_size = 0;
+	mutex_unlock(&module_mutex);
+
+}
+
  /* This is where the real work happens */
  SYSCALL_DEFINE3(init_module, void __user *, umod,
  		unsigned long, len, const char __user *, uargs)
@@ -2372,15 +2388,7 @@ SYSCALL_DEFINE3(init_module, void __user *, umod,
  	blocking_notifier_call_chain(&module_notify_list,
  				     MODULE_STATE_LIVE, mod);

-	mutex_lock(&module_mutex);
-	/* Drop initial reference. */
-	module_put(mod);
-	module_free(mod, mod->module_init);
-	mod->module_init = NULL;
-	mod->init_size = 0;
-	mod->init_text_size = 0;
-	mutex_unlock(&module_mutex);
-
+	async_schedule(async_module_free_initmem, mod);
  	return 0;
  }



Now the second case of "the devices are available when the module load is done".

The hard case here is that we're talking about a storage device. In 2.6.28 and before (eg well before
any async stuff landed), SCSI probing already was done asynchronously. Same for USB. Libata I think is the same,
by virtue of using scsi as infrastructure.

Realistically, unless you call scsi_complete_async_scans(), you could not depend on scsi devices being available
after loading one of their modules. (from userland you could do this via the scsi_wait_scan module)

(speculation: on ATA likely the scan was done relatively quickly ... and with async it's just done with different timing. So it
was kind of luck before)

Rafael: would it be reasonable to call "scsi_compete_async_scans()" from the resume-from-disk code ?

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [linux-pm] [2.6.30-rc1-git2 regressions] Hibernation broken and (minor but annoying) audio problem
  2009-04-10 12:39   ` Rafael J. Wysocki
  2009-04-10 18:58     ` [linux-pm] " Rafael J. Wysocki
@ 2009-04-11  2:35     ` Len Brown
  2009-04-11 18:11       ` Linus Torvalds
  1 sibling, 1 reply; 30+ messages in thread
From: Len Brown @ 2009-04-11  2:35 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linus Torvalds, Takashi Iwai, pm list, Arjan van de Ven, LKML,
	Andrew Morton, linux-acpi

fastboot also causes the S3 regression here:

http://bugzilla.kernel.org/show_bug.cgi?id=12936


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [linux-pm] [2.6.30-rc1-git2 regressions] Hibernation broken and (minor but annoying) audio problem
  2009-04-11  2:10             ` Arjan van de Ven
@ 2009-04-11 10:46               ` Rafael J. Wysocki
  2009-04-11 19:38               ` Rafael J. Wysocki
  1 sibling, 0 replies; 30+ messages in thread
From: Rafael J. Wysocki @ 2009-04-11 10:46 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Linus Torvalds, linux-pm, Takashi Iwai, LKML, Andrew Morton, Ingo Molnar

On Saturday 11 April 2009, Arjan van de Ven wrote:
> Linus Torvalds wrote:
> > 
> > On Sat, 11 Apr 2009, Rafael J. Wysocki wrote:
> >> On Friday 10 April 2009, Linus Torvalds wrote:
> >>> On Fri, 10 Apr 2009, Rafael J. Wysocki wrote:
> >>>> I've just verified that the resume-after-hibernation issue goes away after
> >>>> reverting commit 9710794383ee5008d67f1a6613a4717bf6de47bc
> >>>> (async: remove the temporary (2.6.29) "async is off by default" code) , so it
> >>>> is async-related.
> >>> Arjan? Clearly all the necessary fixes weren't found..
> >>>
> >>> There _is_ a module loading problem wrt initmem - I think you found that 
> >>> and we added a hack for it for the ACPI battery driver. I wonder if we're 
> >>> hitting a similar issue now with module discovery: modules that use 
> >>> "async_schedule()" to do their discovery asynchronously are now not 
> >>> necessarily fully "done" when the module is loaded. 
> >>>
> >>> And so, anything that expected the devices to be available after module 
> >>> load (like they used to) would be screwed.
> >>>
> >>> IOW, maybe something like the totally untested patch appended here (that 
> >>> should also allow us to make the ACPI battery code to go back to using 
> >>> __init).
> >> I tested it and it worked.
> > 
> > Hmm.
> > 
> > I'm not 100% sure that patch is good.
> > 
> > The reason? I think it's going to deadlock if an async caller ends up 
> > wanting to load a module, because then the nested 
> > "async_synchronize_full()" will basically want to wait for itself.
> > 
> > So it's a good test-patch, and maybe no async caller ever loads a module, 
> > but it makes me a bit nervous.
> 
> It would make a rule that async context can only use request_module_nowait().
> Not too nice, I think we can do better.
> 
> > But the fact that it fixes things for you at least means that the _reason_ 
> > for the problem is know, and maybe there are alternative solutions. Arjan?
> 
> We have async domains; for the acpi type of case we could make a "__init" domain
> that we wait on selectively... but I'm not too fond of this since it'll be fragile over time.
> 
> I suppose this got exposed now that we (just) removed the stop_machine stuff from the module loader...
> (which is a generally good improvement, don't get me wrong).
> 
> For the __init freeing case there is a even more interesting option:
> We can schedule an async task that will free the init mem of the module, and have that
> async task just wait for all its predecessors to complete before doing the actual work.
> That way the module is available and ready to its caller, while the freeing of the init mem
> will be done at a safe time.
> 
> Like this (not yet tested):
> 
> diff --git a/kernel/module.c b/kernel/module.c
> index 1196f5d..4ec90e8 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -2312,6 +2312,22 @@ static noinline struct module *load_module(void __user *umod,
>   	goto free_hdr;
>   }
> 
> +static void async_module_free_initmem(void *data, async_cookie_t cookie)
> +{
> +	struct module *mod = data;
> +	async_synchronize_cookie(cookie);
> +
> +	mutex_lock(&module_mutex);
> +	/* Drop initial reference. */
> +	module_put(mod);
> +	module_free(mod, mod->module_init);
> +	mod->module_init = NULL;
> +	mod->init_size = 0;
> +	mod->init_text_size = 0;
> +	mutex_unlock(&module_mutex);
> +
> +}
> +
>   /* This is where the real work happens */
>   SYSCALL_DEFINE3(init_module, void __user *, umod,
>   		unsigned long, len, const char __user *, uargs)
> @@ -2372,15 +2388,7 @@ SYSCALL_DEFINE3(init_module, void __user *, umod,
>   	blocking_notifier_call_chain(&module_notify_list,
>   				     MODULE_STATE_LIVE, mod);
> 
> -	mutex_lock(&module_mutex);
> -	/* Drop initial reference. */
> -	module_put(mod);
> -	module_free(mod, mod->module_init);
> -	mod->module_init = NULL;
> -	mod->init_size = 0;
> -	mod->init_text_size = 0;
> -	mutex_unlock(&module_mutex);
> -
> +	async_schedule(async_module_free_initmem, mod);
>   	return 0;
>   }
> 
> 
> 
> Now the second case of "the devices are available when the module load is done".
> 
> The hard case here is that we're talking about a storage device. In 2.6.28 and before (eg well before
> any async stuff landed), SCSI probing already was done asynchronously. Same for USB. Libata I think is the same,
> by virtue of using scsi as infrastructure.
> 
> Realistically, unless you call scsi_complete_async_scans(), you could not depend on scsi devices being available
> after loading one of their modules. (from userland you could do this via the scsi_wait_scan module)
> 
> (speculation: on ATA likely the scan was done relatively quickly ... and with async it's just done with different timing. So it
> was kind of luck before)
> 
> Rafael: would it be reasonable to call "scsi_compete_async_scans()" from the resume-from-disk code ?

I think it would.

I'll try to put it in there and see if that helps.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [linux-pm] [2.6.30-rc1-git2 regressions] Hibernation broken and (minor but annoying) audio problem
  2009-04-11  2:35     ` Len Brown
@ 2009-04-11 18:11       ` Linus Torvalds
  2009-04-11 19:00         ` Heinz Diehl
  2009-04-11 19:16         ` Arjan van de Ven
  0 siblings, 2 replies; 30+ messages in thread
From: Linus Torvalds @ 2009-04-11 18:11 UTC (permalink / raw)
  To: Len Brown
  Cc: Rafael J. Wysocki, Takashi Iwai, pm list, Arjan van de Ven, LKML,
	Andrew Morton, linux-acpi



On Fri, 10 Apr 2009, Len Brown wrote:
>
> fastboot also causes the S3 regression here:
> 
> http://bugzilla.kernel.org/show_bug.cgi?id=12936

Looks like the same issue. Does my one-liner fix that case too?

		Linus

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [linux-pm] [2.6.30-rc1-git2 regressions] Hibernation broken and (minor but annoying) audio problem
  2009-04-11 18:11       ` Linus Torvalds
@ 2009-04-11 19:00         ` Heinz Diehl
  2009-04-11 19:16         ` Arjan van de Ven
  1 sibling, 0 replies; 30+ messages in thread
From: Heinz Diehl @ 2009-04-11 19:00 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rafael J. Wysocki, Takashi Iwai, pm list, Arjan van de Ven, LKML,
	Andrew Morton, linux-acpi

On 11.04.2009, Linus Torvalds wrote: 

[Len Brown:]
> > fastboot also causes the S3 regression here:
> > http://bugzilla.kernel.org/show_bug.cgi?id=12936
 
> Looks like the same issue. Does my one-liner fix that case too?

http://bugzilla.kernel.org/show_bug.cgi?id=13063

Your one-liner fixes this bug, too. Without the revertion mentioned in the
bugreport applied!

Regards,
Heinz.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [linux-pm] [2.6.30-rc1-git2 regressions] Hibernation broken and (minor but annoying) audio problem
  2009-04-11 18:11       ` Linus Torvalds
  2009-04-11 19:00         ` Heinz Diehl
@ 2009-04-11 19:16         ` Arjan van de Ven
  2009-04-11 19:50           ` Linus Torvalds
  1 sibling, 1 reply; 30+ messages in thread
From: Arjan van de Ven @ 2009-04-11 19:16 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Len Brown, Rafael J. Wysocki, Takashi Iwai, pm list, LKML,
	Andrew Morton, linux-acpi

Linus Torvalds wrote:
> 
> On Fri, 10 Apr 2009, Len Brown wrote:
>> fastboot also causes the S3 regression here:
>>
>> http://bugzilla.kernel.org/show_bug.cgi?id=12936
> 
> Looks like the same issue. Does my one-liner fix that case too?
Now that I've been able to scan most of my mail; it looks like the one-liner should just go in.
Restricting module loading from async work to only non-blocking is not a big deal,
and it seems that userspace in various distros is really broken (and breaks on scsi already today),
but that is not something we can really fix quickly.
The sad part is that the userland is unlikely get fixed unless it shows breakage, and we can't break it (obviously).

I don't know of any good other solutions; right now it's mostly the partition scan (which is already mostly async on
scsi since a really long time, just in practice fast enough unless you have a big server).

(The __init thing in modules is separate and easy to fix, but without the other stuff being fixed there's no point)



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [linux-pm] [2.6.30-rc1-git2 regressions] Hibernation broken and (minor but annoying) audio problem
  2009-04-11  2:10             ` Arjan van de Ven
  2009-04-11 10:46               ` Rafael J. Wysocki
@ 2009-04-11 19:38               ` Rafael J. Wysocki
  2009-04-11 19:55                 ` Linus Torvalds
  1 sibling, 1 reply; 30+ messages in thread
From: Rafael J. Wysocki @ 2009-04-11 19:38 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Linus Torvalds, linux-pm, Takashi Iwai, LKML, Andrew Morton, Ingo Molnar

On Saturday 11 April 2009, Arjan van de Ven wrote:
> Linus Torvalds wrote:
> > 
> > On Sat, 11 Apr 2009, Rafael J. Wysocki wrote:
> >> On Friday 10 April 2009, Linus Torvalds wrote:
> >>> On Fri, 10 Apr 2009, Rafael J. Wysocki wrote:
> >>>> I've just verified that the resume-after-hibernation issue goes away after
> >>>> reverting commit 9710794383ee5008d67f1a6613a4717bf6de47bc
> >>>> (async: remove the temporary (2.6.29) "async is off by default" code) , so it
> >>>> is async-related.
> >>> Arjan? Clearly all the necessary fixes weren't found..
> >>>
> >>> There _is_ a module loading problem wrt initmem - I think you found that 
> >>> and we added a hack for it for the ACPI battery driver. I wonder if we're 
> >>> hitting a similar issue now with module discovery: modules that use 
> >>> "async_schedule()" to do their discovery asynchronously are now not 
> >>> necessarily fully "done" when the module is loaded. 
> >>>
> >>> And so, anything that expected the devices to be available after module 
> >>> load (like they used to) would be screwed.
> >>>
> >>> IOW, maybe something like the totally untested patch appended here (that 
> >>> should also allow us to make the ACPI battery code to go back to using 
> >>> __init).
> >> I tested it and it worked.
> > 
> > Hmm.
> > 
> > I'm not 100% sure that patch is good.
> > 
> > The reason? I think it's going to deadlock if an async caller ends up 
> > wanting to load a module, because then the nested 
> > "async_synchronize_full()" will basically want to wait for itself.
> > 
> > So it's a good test-patch, and maybe no async caller ever loads a module, 
> > but it makes me a bit nervous.
> 
> It would make a rule that async context can only use request_module_nowait().
> Not too nice, I think we can do better.
> 
> > But the fact that it fixes things for you at least means that the _reason_ 
> > for the problem is know, and maybe there are alternative solutions. Arjan?
> 
> We have async domains; for the acpi type of case we could make a "__init" domain
> that we wait on selectively... but I'm not too fond of this since it'll be fragile over time.
> 
> I suppose this got exposed now that we (just) removed the stop_machine stuff from the module loader...
> (which is a generally good improvement, don't get me wrong).
> 
> For the __init freeing case there is a even more interesting option:
> We can schedule an async task that will free the init mem of the module, and have that
> async task just wait for all its predecessors to complete before doing the actual work.
> That way the module is available and ready to its caller, while the freeing of the init mem
> will be done at a safe time.
> 
> Like this (not yet tested):
> 
> diff --git a/kernel/module.c b/kernel/module.c
> index 1196f5d..4ec90e8 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -2312,6 +2312,22 @@ static noinline struct module *load_module(void __user *umod,
>   	goto free_hdr;
>   }
> 
> +static void async_module_free_initmem(void *data, async_cookie_t cookie)
> +{
> +	struct module *mod = data;
> +	async_synchronize_cookie(cookie);
> +
> +	mutex_lock(&module_mutex);
> +	/* Drop initial reference. */
> +	module_put(mod);
> +	module_free(mod, mod->module_init);
> +	mod->module_init = NULL;
> +	mod->init_size = 0;
> +	mod->init_text_size = 0;
> +	mutex_unlock(&module_mutex);
> +
> +}
> +
>   /* This is where the real work happens */
>   SYSCALL_DEFINE3(init_module, void __user *, umod,
>   		unsigned long, len, const char __user *, uargs)
> @@ -2372,15 +2388,7 @@ SYSCALL_DEFINE3(init_module, void __user *, umod,
>   	blocking_notifier_call_chain(&module_notify_list,
>   				     MODULE_STATE_LIVE, mod);
> 
> -	mutex_lock(&module_mutex);
> -	/* Drop initial reference. */
> -	module_put(mod);
> -	module_free(mod, mod->module_init);
> -	mod->module_init = NULL;
> -	mod->init_size = 0;
> -	mod->init_text_size = 0;
> -	mutex_unlock(&module_mutex);
> -
> +	async_schedule(async_module_free_initmem, mod);
>   	return 0;
>   }
> 
> 
> 
> Now the second case of "the devices are available when the module load is done".
> 
> The hard case here is that we're talking about a storage device. In 2.6.28 and before (eg well before
> any async stuff landed), SCSI probing already was done asynchronously. Same for USB. Libata I think is the same,
> by virtue of using scsi as infrastructure.
> 
> Realistically, unless you call scsi_complete_async_scans(), you could not depend on scsi devices being available
> after loading one of their modules. (from userland you could do this via the scsi_wait_scan module)
> 
> (speculation: on ATA likely the scan was done relatively quickly ... and with async it's just done with different timing. So it
> was kind of luck before)
> 
> Rafael: would it be reasonable to call "scsi_compete_async_scans()" from the resume-from-disk code ?

FWIW, I applied the appended patch and it fixed the problem for me.

Thanks,
Rafael

---
 drivers/scsi/scsi_priv.h      |    3 ---
 drivers/scsi/scsi_wait_scan.c |    2 +-
 include/scsi/scsi_scan.h      |   11 +++++++++++
 kernel/power/disk.c           |    8 ++++++++
 4 files changed, 20 insertions(+), 4 deletions(-)

Index: linux-2.6/include/scsi/scsi_scan.h
===================================================================
--- /dev/null
+++ linux-2.6/include/scsi/scsi_scan.h
@@ -0,0 +1,11 @@
+#ifndef _SCSI_SCSI_SCAN_H
+#define _SCSI_SCSI_SCAN_H
+
+#ifdef CONFIG_SCSI
+/* drivers/scsi/scsi_scan.c */
+extern int scsi_complete_async_scans(void);
+#else
+static inline int scsi_complete_async_scans(void) { return 0; }
+#endif
+
+#endif /* _SCSI_SCSI_SCAN_H */
Index: linux-2.6/drivers/scsi/scsi_priv.h
===================================================================
--- linux-2.6.orig/drivers/scsi/scsi_priv.h
+++ linux-2.6/drivers/scsi/scsi_priv.h
@@ -38,9 +38,6 @@ static inline void scsi_log_completion(s
 	{ };
 #endif
 
-/* scsi_scan.c */
-int scsi_complete_async_scans(void);
-
 /* scsi_devinfo.c */
 extern int scsi_get_device_flags(struct scsi_device *sdev,
 				 const unsigned char *vendor,
Index: linux-2.6/drivers/scsi/scsi_wait_scan.c
===================================================================
--- linux-2.6.orig/drivers/scsi/scsi_wait_scan.c
+++ linux-2.6/drivers/scsi/scsi_wait_scan.c
@@ -11,7 +11,7 @@
  */
 
 #include <linux/module.h>
-#include "scsi_priv.h"
+#include <scsi/scsi_scan.h>
 
 static int __init wait_scan_init(void)
 {
Index: linux-2.6/kernel/power/disk.c
===================================================================
--- linux-2.6.orig/kernel/power/disk.c
+++ linux-2.6/kernel/power/disk.c
@@ -22,6 +22,7 @@
 #include <linux/console.h>
 #include <linux/cpu.h>
 #include <linux/freezer.h>
+#include <scsi/scsi_scan.h>
 #include <asm/suspend.h>
 
 #include "power.h"
@@ -645,6 +646,13 @@ static int software_resume(void)
 		return 0;
 
 	/*
+	 * We can't depend on SCSI devices being available after loading one of
+	 * their modules if scsi_complete_async_scans() is not called and the
+	 * resume device usually is a SCSI one.
+	 */
+	scsi_complete_async_scans();
+
+	/*
 	 * name_to_dev_t() below takes a sysfs buffer mutex when sysfs
 	 * is configured into the kernel. Since the regular hibernate
 	 * trigger path is via sysfs which takes a buffer mutex before


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [linux-pm] [2.6.30-rc1-git2 regressions] Hibernation broken and (minor but annoying) audio problem
  2009-04-11 19:16         ` Arjan van de Ven
@ 2009-04-11 19:50           ` Linus Torvalds
  2009-04-11 20:11             ` Arjan van de Ven
  2009-04-11 20:18             ` Arkadiusz Miskiewicz
  0 siblings, 2 replies; 30+ messages in thread
From: Linus Torvalds @ 2009-04-11 19:50 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Len Brown, Rafael J. Wysocki, Takashi Iwai, pm list, LKML,
	Andrew Morton, linux-acpi, Heinz Diehl, Arkadiusz Miskiewicz,
	Vegard Nossum



On Sat, 11 Apr 2009, Arjan van de Ven wrote:
>
> Now that I've been able to scan most of my mail; it looks like the one-liner
> should just go in.

Ok, I committed my one-liner, and then also reverted the ACPI battery 
workaround of removing __init, since the one-liner should fix that too.

Cc'ing the people involved with that commit, just so that they know to 
test to make sure the alternate fix really did fix it for them (I'm pretty 
sure it does, but still a good idea to verify or at least let people know 
that a previous fix got reverted)

			Linus

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [linux-pm] [2.6.30-rc1-git2 regressions] Hibernation broken and (minor but annoying) audio problem
  2009-04-11 19:38               ` Rafael J. Wysocki
@ 2009-04-11 19:55                 ` Linus Torvalds
  2009-04-11 20:05                   ` Arjan van de Ven
  0 siblings, 1 reply; 30+ messages in thread
From: Linus Torvalds @ 2009-04-11 19:55 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Arjan van de Ven, linux-pm, Takashi Iwai, LKML, Andrew Morton,
	Ingo Molnar



On Sat, 11 Apr 2009, Rafael J. Wysocki wrote:
> 
> FWIW, I applied the appended patch and it fixed the problem for me.

I think that patch is probably the right thing to do regardless, but I 
also think it doesn't obviate the need for also doing so at module loading 
time. 

I'm pretty sure that calling scsi_complete_async_scans() won't protect 
against an async USB bus scan, for example.

But I do think your patch is probably worth doing anyway.

		Linus

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [linux-pm] [2.6.30-rc1-git2 regressions] Hibernation broken and (minor but annoying) audio problem
  2009-04-11 19:55                 ` Linus Torvalds
@ 2009-04-11 20:05                   ` Arjan van de Ven
  2009-04-12 18:06                     ` Rafael J. Wysocki
  0 siblings, 1 reply; 30+ messages in thread
From: Arjan van de Ven @ 2009-04-11 20:05 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rafael J. Wysocki, linux-pm, Takashi Iwai, LKML, Andrew Morton,
	Ingo Molnar

Linus Torvalds wrote:
> 
> On Sat, 11 Apr 2009, Rafael J. Wysocki wrote:
>> FWIW, I applied the appended patch and it fixed the problem for me.
> 
> I think that patch is probably the right thing to do regardless, but I 
> also think it doesn't obviate the need for also doing so at module loading 
> time. 
> 
> I'm pretty sure that calling scsi_complete_async_scans() won't protect 
> against an async USB bus scan, for example.

but neither does anything else we have in the kernel right now.......
(specifically, USB does not use the async infrastructure)

> But I do think your patch is probably worth doing anyway.

Absolutely; I wonder if we should make a more generic "wait for async storage scans"
function, that just calls this one, but allows other systems to also be in there.
(although.. right now I'd not know which ones it would be).




^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [linux-pm] [2.6.30-rc1-git2 regressions] Hibernation broken and (minor but annoying) audio problem
  2009-04-11 19:50           ` Linus Torvalds
@ 2009-04-11 20:11             ` Arjan van de Ven
  2009-04-11 20:18             ` Arkadiusz Miskiewicz
  1 sibling, 0 replies; 30+ messages in thread
From: Arjan van de Ven @ 2009-04-11 20:11 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Len Brown, Rafael J. Wysocki, Takashi Iwai, pm list, LKML,
	Andrew Morton, linux-acpi, Heinz Diehl, Arkadiusz Miskiewicz,
	Vegard Nossum

Linus Torvalds wrote:
> 
> On Sat, 11 Apr 2009, Arjan van de Ven wrote:
>> Now that I've been able to scan most of my mail; it looks like the one-liner
>> should just go in.
> 
> Ok, I committed my one-liner, and then also reverted the ACPI battery 
> workaround of removing __init, since the one-liner should fix that too.
> 
> Cc'ing the people involved with that commit, just so that they know to 
> test to make sure the alternate fix really did fix it for them (I'm pretty 
> sure it does, but still a good idea to verify or at least let people know 
> that a previous fix got reverted)

longer term I would like to work on two things
1) Fix the __init thing by just only freeing when appropriate, but not having to wait for it
2) Find a way for userland to trigger a sync
    - for all storage probing
    - for just async work
    - "global sync", which includes all driver init
    We can then have insmod/modprobe use this always, unless a flag is set
    (so say, udev, can load a slew of modules, and then sync only on the last one)

and then after a year or two we can maybe remove the sync from the module loader ;(

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [linux-pm] [2.6.30-rc1-git2 regressions] Hibernation broken and (minor but annoying) audio problem
  2009-04-11 19:50           ` Linus Torvalds
  2009-04-11 20:11             ` Arjan van de Ven
@ 2009-04-11 20:18             ` Arkadiusz Miskiewicz
  1 sibling, 0 replies; 30+ messages in thread
From: Arkadiusz Miskiewicz @ 2009-04-11 20:18 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Arjan van de Ven, Len Brown, Rafael J. Wysocki, Takashi Iwai,
	pm list, LKML, Andrew Morton, linux-acpi, Heinz Diehl,
	Vegard Nossum

On Saturday 11 of April 2009, Linus Torvalds wrote:
> On Sat, 11 Apr 2009, Arjan van de Ven wrote:
> > Now that I've been able to scan most of my mail; it looks like the
> > one-liner should just go in.
>
> Ok, I committed my one-liner, and then also reverted the ACPI battery
> workaround of removing __init, since the one-liner should fix that too.

Tested - works here|.

> 			Linus


-- 
Arkadiusz Miśkiewicz        PLD/Linux Team
arekm / maven.pl            http://ftp.pld-linux.org/


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [linux-pm] [2.6.30-rc1-git2 regressions] Hibernation broken and (minor but annoying) audio problem
  2009-04-11 20:05                   ` Arjan van de Ven
@ 2009-04-12 18:06                     ` Rafael J. Wysocki
  2009-04-12 18:27                       ` Arjan van de Ven
  0 siblings, 1 reply; 30+ messages in thread
From: Rafael J. Wysocki @ 2009-04-12 18:06 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Linus Torvalds, linux-pm, Takashi Iwai, LKML, Andrew Morton, Ingo Molnar

On Saturday 11 April 2009, Arjan van de Ven wrote:
> Linus Torvalds wrote:
> > 
> > On Sat, 11 Apr 2009, Rafael J. Wysocki wrote:
> >> FWIW, I applied the appended patch and it fixed the problem for me.
> > 
> > I think that patch is probably the right thing to do regardless, but I 
> > also think it doesn't obviate the need for also doing so at module loading 
> > time. 
> > 
> > I'm pretty sure that calling scsi_complete_async_scans() won't protect 
> > against an async USB bus scan, for example.
> 
> but neither does anything else we have in the kernel right now.......
> (specifically, USB does not use the async infrastructure)
> 
> > But I do think your patch is probably worth doing anyway.
> 
> Absolutely; I wonder if we should make a more generic "wait for async storage scans"
> function, that just calls this one, but allows other systems to also be in there.
> (although.. right now I'd not know which ones it would be).

OK, updated patch follows, with a changelog.

I've added this check to user.c too, because that code can be called
independently of the one in disk.c .  Also, if resume is user space-driven,
it's a good idea to wait for all of the device probes to complete before
continuing.

Thanks,
Rafael

---
From: Rafael J. Wysocki <rjw@sisk.pl>
Subject: PM/Hibernate: Wait for SCSI devices scan to complete during resume

There is a race between resume from hibernation and the asynchronous
scanning of SCSI devices and to prevent it from happening we need to
call scsi_complete_async_scans() during resume from hibernation.

In addition, if the resume from hibernation is userland-driven, it's
better to wait for all device probes in the kernel to complete before
attempting to open the resume device.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
---
 drivers/scsi/scsi_priv.h      |    3 ---
 drivers/scsi/scsi_wait_scan.c |    2 +-
 include/scsi/scsi_scan.h      |   11 +++++++++++
 kernel/power/disk.c           |    8 ++++++++
 kernel/power/user.c           |    9 +++++++++
 5 files changed, 29 insertions(+), 4 deletions(-)

Index: linux-2.6/include/scsi/scsi_scan.h
===================================================================
--- /dev/null
+++ linux-2.6/include/scsi/scsi_scan.h
@@ -0,0 +1,11 @@
+#ifndef _SCSI_SCSI_SCAN_H
+#define _SCSI_SCSI_SCAN_H
+
+#ifdef CONFIG_SCSI
+/* drivers/scsi/scsi_scan.c */
+extern int scsi_complete_async_scans(void);
+#else
+static inline int scsi_complete_async_scans(void) { return 0; }
+#endif
+
+#endif /* _SCSI_SCSI_SCAN_H */
Index: linux-2.6/drivers/scsi/scsi_priv.h
===================================================================
--- linux-2.6.orig/drivers/scsi/scsi_priv.h
+++ linux-2.6/drivers/scsi/scsi_priv.h
@@ -38,9 +38,6 @@ static inline void scsi_log_completion(s
 	{ };
 #endif
 
-/* scsi_scan.c */
-int scsi_complete_async_scans(void);
-
 /* scsi_devinfo.c */
 extern int scsi_get_device_flags(struct scsi_device *sdev,
 				 const unsigned char *vendor,
Index: linux-2.6/drivers/scsi/scsi_wait_scan.c
===================================================================
--- linux-2.6.orig/drivers/scsi/scsi_wait_scan.c
+++ linux-2.6/drivers/scsi/scsi_wait_scan.c
@@ -11,7 +11,7 @@
  */
 
 #include <linux/module.h>
-#include "scsi_priv.h"
+#include <scsi/scsi_scan.h>
 
 static int __init wait_scan_init(void)
 {
Index: linux-2.6/kernel/power/disk.c
===================================================================
--- linux-2.6.orig/kernel/power/disk.c
+++ linux-2.6/kernel/power/disk.c
@@ -22,6 +22,7 @@
 #include <linux/console.h>
 #include <linux/cpu.h>
 #include <linux/freezer.h>
+#include <scsi/scsi_scan.h>
 #include <asm/suspend.h>
 
 #include "power.h"
@@ -645,6 +646,13 @@ static int software_resume(void)
 		return 0;
 
 	/*
+	 * We can't depend on SCSI devices being available after loading one of
+	 * their modules if scsi_complete_async_scans() is not called and the
+	 * resume device usually is a SCSI one.
+	 */
+	scsi_complete_async_scans();
+
+	/*
 	 * name_to_dev_t() below takes a sysfs buffer mutex when sysfs
 	 * is configured into the kernel. Since the regular hibernate
 	 * trigger path is via sysfs which takes a buffer mutex before
Index: linux-2.6/kernel/power/user.c
===================================================================
--- linux-2.6.orig/kernel/power/user.c
+++ linux-2.6/kernel/power/user.c
@@ -24,6 +24,7 @@
 #include <linux/cpu.h>
 #include <linux/freezer.h>
 #include <linux/smp_lock.h>
+#include <scsi/scsi_scan.h>
 
 #include <asm/uaccess.h>
 
@@ -92,6 +93,7 @@ static int snapshot_open(struct inode *i
 	filp->private_data = data;
 	memset(&data->handle, 0, sizeof(struct snapshot_handle));
 	if ((filp->f_flags & O_ACCMODE) == O_RDONLY) {
+		/* Hibernating.  The image device should be accessible. */
 		data->swap = swsusp_resume_device ?
 			swap_type_of(swsusp_resume_device, 0, NULL) : -1;
 		data->mode = O_RDONLY;
@@ -99,6 +101,13 @@ static int snapshot_open(struct inode *i
 		if (error)
 			pm_notifier_call_chain(PM_POST_HIBERNATION);
 	} else {
+		/*
+		 * Resuming.  We may need to wait for the image device to
+		 * appear.
+		 */
+		wait_for_device_probe();
+		scsi_complete_async_scans();
+
 		data->swap = -1;
 		data->mode = O_WRONLY;
 		error = pm_notifier_call_chain(PM_RESTORE_PREPARE);

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [linux-pm] [2.6.30-rc1-git2 regressions] Hibernation broken and (minor but annoying) audio problem
  2009-04-12 18:06                     ` Rafael J. Wysocki
@ 2009-04-12 18:27                       ` Arjan van de Ven
  0 siblings, 0 replies; 30+ messages in thread
From: Arjan van de Ven @ 2009-04-12 18:27 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linus Torvalds, linux-pm, Takashi Iwai, LKML, Andrew Morton, Ingo Molnar

Rafael J. Wysocki wrote:
> 
> OK, updated patch follows, with a changelog.
> 
> I've added this check to user.c too, because that code can be called
> independently of the one in disk.c .  Also, if resume is user space-driven,
> it's a good idea to wait for all of the device probes to complete before
> continuing.
> 
> Thanks,
> Rafael
> 
> ---
> From: Rafael J. Wysocki <rjw@sisk.pl>
> Subject: PM/Hibernate: Wait for SCSI devices scan to complete during resume
> 
> There is a race between resume from hibernation and the asynchronous
> scanning of SCSI devices and to prevent it from happening we need to
> call scsi_complete_async_scans() during resume from hibernation.
> 
> In addition, if the resume from hibernation is userland-driven, it's
> better to wait for all device probes in the kernel to complete before
> attempting to open the resume device.
> 
> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>

looks like a good fix to me (regardless of the async stuff)

Acked-by: Arjan van de Ven <arjan@linux.intel.com>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [2.6.30-rc1-git2 regressions] Hibernation broken and (minor but annoying) audio problem
  2009-04-09 22:57 [2.6.30-rc1-git2 regressions] Hibernation broken and (minor but annoying) audio problem Rafael J. Wysocki
  2009-04-09 23:06 ` Linus Torvalds
@ 2009-04-14 12:09 ` Takashi Iwai
  2009-04-14 21:12   ` Rafael J. Wysocki
  1 sibling, 1 reply; 30+ messages in thread
From: Takashi Iwai @ 2009-04-14 12:09 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Arjan van de Ven, LKML, pm list, Linus Torvalds, Andrew Morton

At Fri, 10 Apr 2009 00:57:41 +0200,
Rafael J. Wysocki wrote:
> 
> Second, KDE4 on openSUSE 11.1 sometimes fails to handle audio correctly after
> a fresh boot.  Everything seems to work, but there's no sound at all.  It is
> sufficient to close the X session and start the desktop environment again to
> make it work, though (may that be async too? ;-)).  The hardware in question is
> Intel Corporation 82801G (ICH7 Family) High Definition Audio Controller
> (rev 02) and the driver is snd_hda_intel.

Is this problem still there?

If restarting the X session fixes the problem, it could be a
pulseaudio problem.  But, then it doesn't sound like a kernel update
issue.

Could you check whether non-PA environment (e.g. linux console) works
reliably?


thanks,

Takashi

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [2.6.30-rc1-git2 regressions] Hibernation broken and (minor but annoying) audio problem
  2009-04-14 12:09 ` Takashi Iwai
@ 2009-04-14 21:12   ` Rafael J. Wysocki
  2009-04-15 13:13     ` Takashi Iwai
  0 siblings, 1 reply; 30+ messages in thread
From: Rafael J. Wysocki @ 2009-04-14 21:12 UTC (permalink / raw)
  To: Takashi Iwai
  Cc: Arjan van de Ven, LKML, pm list, Linus Torvalds, Andrew Morton

On Tuesday 14 April 2009, Takashi Iwai wrote:
> At Fri, 10 Apr 2009 00:57:41 +0200,
> Rafael J. Wysocki wrote:
> > 
> > Second, KDE4 on openSUSE 11.1 sometimes fails to handle audio correctly after
> > a fresh boot.  Everything seems to work, but there's no sound at all.  It is
> > sufficient to close the X session and start the desktop environment again to
> > make it work, though (may that be async too? ;-)).  The hardware in question is
> > Intel Corporation 82801G (ICH7 Family) High Definition Audio Controller
> > (rev 02) and the driver is snd_hda_intel.
> 
> Is this problem still there?

Just a few minutes ago sound stopped working (that was after a resume from S3,
but not immediately after it), without anything suspicious in dmesg etc.
Restarting X fixed that.

Still, that doesn't happen very often and it is not readily reproducible.

> If restarting the X session fixes the problem, it could be a
> pulseaudio problem.  But, then it doesn't sound like a kernel update
> issue.

Well, that may be a pulseaudio problem, but the very same pulsaudio apparently
 worked well with 2.6.29-git<something>.  That might be a coincidence, though.
 
> Could you check whether non-PA environment (e.g. linux console) works
> reliably?

Yes, it appears to work.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [2.6.30-rc1-git2 regressions] Hibernation broken and (minor but annoying) audio problem
  2009-04-14 21:12   ` Rafael J. Wysocki
@ 2009-04-15 13:13     ` Takashi Iwai
  2009-04-15 20:59       ` Rafael J. Wysocki
  0 siblings, 1 reply; 30+ messages in thread
From: Takashi Iwai @ 2009-04-15 13:13 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Arjan van de Ven, LKML, pm list, Linus Torvalds, Andrew Morton

At Tue, 14 Apr 2009 23:12:33 +0200,
Rafael J. Wysocki wrote:
> 
> On Tuesday 14 April 2009, Takashi Iwai wrote:
> > At Fri, 10 Apr 2009 00:57:41 +0200,
> > Rafael J. Wysocki wrote:
> > > 
> > > Second, KDE4 on openSUSE 11.1 sometimes fails to handle audio correctly after
> > > a fresh boot.  Everything seems to work, but there's no sound at all.  It is
> > > sufficient to close the X session and start the desktop environment again to
> > > make it work, though (may that be async too? ;-)).  The hardware in question is
> > > Intel Corporation 82801G (ICH7 Family) High Definition Audio Controller
> > > (rev 02) and the driver is snd_hda_intel.
> > 
> > Is this problem still there?
> 
> Just a few minutes ago sound stopped working (that was after a resume from S3,
> but not immediately after it), without anything suspicious in dmesg etc.
> Restarting X fixed that.
> 
> Still, that doesn't happen very often and it is not readily reproducible.
> 
> > If restarting the X session fixes the problem, it could be a
> > pulseaudio problem.  But, then it doesn't sound like a kernel update
> > issue.
> 
> Well, that may be a pulseaudio problem, but the very same pulsaudio apparently
>  worked well with 2.6.29-git<something>.  That might be a coincidence, though.

2.6.30-rc[12] have some patches regarding the DMA pointer handling,
which may influence on PA.  More fix patches are pending.  Could you
try sound git tree either master or for-next branch?  At least it
seems working for some others.
    git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6.git


thanks,

Takashi

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [2.6.30-rc1-git2 regressions] Hibernation broken and (minor but annoying) audio problem
  2009-04-15 13:13     ` Takashi Iwai
@ 2009-04-15 20:59       ` Rafael J. Wysocki
  2009-04-15 21:05         ` Linus Torvalds
  0 siblings, 1 reply; 30+ messages in thread
From: Rafael J. Wysocki @ 2009-04-15 20:59 UTC (permalink / raw)
  To: Takashi Iwai
  Cc: Arjan van de Ven, LKML, pm list, Linus Torvalds, Andrew Morton

On Wednesday 15 April 2009, Takashi Iwai wrote:
> At Tue, 14 Apr 2009 23:12:33 +0200,
> Rafael J. Wysocki wrote:
> > 
> > On Tuesday 14 April 2009, Takashi Iwai wrote:
> > > At Fri, 10 Apr 2009 00:57:41 +0200,
> > > Rafael J. Wysocki wrote:
> > > > 
> > > > Second, KDE4 on openSUSE 11.1 sometimes fails to handle audio correctly after
> > > > a fresh boot.  Everything seems to work, but there's no sound at all.  It is
> > > > sufficient to close the X session and start the desktop environment again to
> > > > make it work, though (may that be async too? ;-)).  The hardware in question is
> > > > Intel Corporation 82801G (ICH7 Family) High Definition Audio Controller
> > > > (rev 02) and the driver is snd_hda_intel.
> > > 
> > > Is this problem still there?
> > 
> > Just a few minutes ago sound stopped working (that was after a resume from S3,
> > but not immediately after it), without anything suspicious in dmesg etc.
> > Restarting X fixed that.
> > 
> > Still, that doesn't happen very often and it is not readily reproducible.
> > 
> > > If restarting the X session fixes the problem, it could be a
> > > pulseaudio problem.  But, then it doesn't sound like a kernel update
> > > issue.
> > 
> > Well, that may be a pulseaudio problem, but the very same pulsaudio apparently
> >  worked well with 2.6.29-git<something>.  That might be a coincidence, though.
> 
> 2.6.30-rc[12] have some patches regarding the DMA pointer handling,
> which may influence on PA.  More fix patches are pending.  Could you
> try sound git tree either master or for-next branch?  At least it
> seems working for some others.
>     git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6.git

I'm going to try 2.6.30-rc2-git in a while.  If I have the time, I'll also test
the sound git tree.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [2.6.30-rc1-git2 regressions] Hibernation broken and (minor but annoying) audio problem
  2009-04-15 20:59       ` Rafael J. Wysocki
@ 2009-04-15 21:05         ` Linus Torvalds
  2009-04-16  6:05           ` Takashi Iwai
  0 siblings, 1 reply; 30+ messages in thread
From: Linus Torvalds @ 2009-04-15 21:05 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Takashi Iwai, Arjan van de Ven, LKML, pm list, Andrew Morton



On Wed, 15 Apr 2009, Rafael J. Wysocki wrote:
> 
> I'm going to try 2.6.30-rc2-git in a while.  If I have the time, I'll also test
> the sound git tree.

There's a sound merge today (ie post-rc2) which hopefully contains the 
relevant fixes. 

			Linus

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [2.6.30-rc1-git2 regressions] Hibernation broken and (minor but annoying) audio problem
  2009-04-15 21:05         ` Linus Torvalds
@ 2009-04-16  6:05           ` Takashi Iwai
  2009-04-16 20:18             ` Rafael J. Wysocki
  0 siblings, 1 reply; 30+ messages in thread
From: Takashi Iwai @ 2009-04-16  6:05 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rafael J. Wysocki, Arjan van de Ven, LKML, pm list, Andrew Morton

At Wed, 15 Apr 2009 14:05:20 -0700 (PDT),
Linus Torvalds wrote:
> 
> 
> 
> On Wed, 15 Apr 2009, Rafael J. Wysocki wrote:
> > 
> > I'm going to try 2.6.30-rc2-git in a while.  If I have the time, I'll also test
> > the sound git tree.
> 
> There's a sound merge today (ie post-rc2) which hopefully contains the 
> relevant fixes. 

Yep, they are in.

thanks,

Takashi

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [2.6.30-rc1-git2 regressions] Hibernation broken and (minor but annoying) audio problem
  2009-04-16  6:05           ` Takashi Iwai
@ 2009-04-16 20:18             ` Rafael J. Wysocki
  0 siblings, 0 replies; 30+ messages in thread
From: Rafael J. Wysocki @ 2009-04-16 20:18 UTC (permalink / raw)
  To: Takashi Iwai
  Cc: Linus Torvalds, Arjan van de Ven, LKML, pm list, Andrew Morton

On Thursday 16 April 2009, Takashi Iwai wrote:
> At Wed, 15 Apr 2009 14:05:20 -0700 (PDT),
> Linus Torvalds wrote:
> > 
> > 
> > 
> > On Wed, 15 Apr 2009, Rafael J. Wysocki wrote:
> > > 
> > > I'm going to try 2.6.30-rc2-git in a while.  If I have the time, I'll also test
> > > the sound git tree.
> > 
> > There's a sound merge today (ie post-rc2) which hopefully contains the 
> > relevant fixes. 
> 
> Yep, they are in.

I've been running the post-rc2 kernel containing these changes for several
hours without any visible problems.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2009-04-16 20:19 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-04-09 22:57 [2.6.30-rc1-git2 regressions] Hibernation broken and (minor but annoying) audio problem Rafael J. Wysocki
2009-04-09 23:06 ` Linus Torvalds
2009-04-10 12:39   ` Rafael J. Wysocki
2009-04-10 18:58     ` [linux-pm] " Rafael J. Wysocki
2009-04-10 19:17       ` Linus Torvalds
2009-04-11  0:06         ` Rafael J. Wysocki
2009-04-11  1:19           ` Linus Torvalds
2009-04-11  1:45             ` Arjan van de Ven
2009-04-11  2:10             ` Arjan van de Ven
2009-04-11 10:46               ` Rafael J. Wysocki
2009-04-11 19:38               ` Rafael J. Wysocki
2009-04-11 19:55                 ` Linus Torvalds
2009-04-11 20:05                   ` Arjan van de Ven
2009-04-12 18:06                     ` Rafael J. Wysocki
2009-04-12 18:27                       ` Arjan van de Ven
2009-04-11  1:53         ` Arjan van de Ven
2009-04-11  2:35     ` Len Brown
2009-04-11 18:11       ` Linus Torvalds
2009-04-11 19:00         ` Heinz Diehl
2009-04-11 19:16         ` Arjan van de Ven
2009-04-11 19:50           ` Linus Torvalds
2009-04-11 20:11             ` Arjan van de Ven
2009-04-11 20:18             ` Arkadiusz Miskiewicz
2009-04-14 12:09 ` Takashi Iwai
2009-04-14 21:12   ` Rafael J. Wysocki
2009-04-15 13:13     ` Takashi Iwai
2009-04-15 20:59       ` Rafael J. Wysocki
2009-04-15 21:05         ` Linus Torvalds
2009-04-16  6:05           ` Takashi Iwai
2009-04-16 20:18             ` Rafael J. Wysocki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).