All of lore.kernel.org
 help / color / mirror / Atom feed
* Fixing gave up waiting for init of module libcrc32c.
@ 2010-03-19 23:40 Brandon Philips
  2010-03-20  1:01 ` Herbert Xu
  0 siblings, 1 reply; 15+ messages in thread
From: Brandon Philips @ 2010-03-19 23:40 UTC (permalink / raw)
  To: davem, herbert; +Cc: linux-crypto

Hello Herbert, Dave-

I have a little "lockup" in libcrc32c on boot and I don't know how to
solve it correctly but there are a few ideas.

First, note this bug is happening with bnx2x. But, since I didn't have
the hardware I added usage of crc32c to bnx2 to reproduce.

Here are backtraces of the two processes and the relevant dmesg
output:

[   61.816131] ehci_hcd 0000:00:1d.7: suspend root hub
[   91.512113] bnx2: gave up waiting for init of module libcrc32c.
[   91.524105] bnx2: Unknown symbol crc32c
[   92.215003] bp: lib/libcrc32c.c:66 libcrc32c_mod_init crypto_alloc_shash SUCCESS
[   92.401131] Broadcom NetXtreme II Gigabit Ethernet Driver bnx2 v2.0.4 (Mar 03, 2010)
[   92.401262]   alloc irq_desc for 16 on node 0
[   92.401266]   alloc kstat_irqs on node 0
[   92.401282] bnx2 0000:02:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16

The issue is that bnx2 is waiting for crc32c on use_module while
libcrc32c is trying to insert crc32-generic to resolve the symbol. You
can see it here via the process stacks for modprobe:

[0]kdb> btp 1661  # modprobe for bnx2
Stack traceback for pid 1661
0xffff88051209ddc0     1661     1657  0   13   S  0xffff88051209e300  modprobe
[   75.797063]  ffff8805131e7c78 0000000000000046 0000000000000000 0000000000000000
[   75.797063]  000000000000e480 ffff8805131e7fd8 ffff88051209ddc0 ffff88051209e1c0
[   75.797063]  00000000ffff1edb ffffffff8109517d ffff8800283ae480 ffff8800283b3e18
[   75.797063] Call Trace:
[   75.797063]  [<ffffffff81437785>] schedule_timeout+0x1e5/0x2f0
[   75.797063]  [<ffffffff810a5e7b>] use_module+0x2eb/0x330
[   75.797063]  [<ffffffff810a7605>] load_module+0x1545/0x1e10
[   75.797063]  [<ffffffff810a7f51>] sys_init_module+0x81/0x2b0
[   75.797063]  [<ffffffff810031ab>] system_call_fastpath+0x16/0x1b
[   75.797063]  [<00007f4bf994dfba>] 0x7f4bf994dfba
[0]kdb> 
[0]kdb> 
[0]kdb> 
[0]kdb> btp 1662 #modprobe for libcrc32c
Stack traceback for pid 1662  # modprobe for libcrc32c blocking on 
0xffff880513278980     1662     1658  0   13   D  0xffff880513278ec0  modprobe
[   75.797063]  ffff88051254bb28 0000000000000046 ffff88051254baf8 ffff88051254baf4
[   75.797063]  000000000000e480 ffff88051254bfd8 ffff880513278980 ffff880513278d80
[   75.797063]  00000000ffff1e51 ffff8800282d3e00 ffff8800283ae480 ffff8800283b3e18
[   75.797063] Call Trace:
[   75.797063]  [<ffffffff814377fd>] schedule_timeout+0x25d/0x2f0
[   75.797063]  [<ffffffff8143652b>] wait_for_common+0xdb/0x190
[   75.797063]  [<ffffffff8143669d>] wait_for_completion+0x1d/0x20
[   75.797063]  [<ffffffff810787d2>] call_usermodehelper_exec+0xd2/0xe0
[   75.797063]  [<ffffffff81078a7c>] __request_module+0x19c/0x1f0
[   75.797063]  [<ffffffff8122184f>] crypto_larval_lookup+0x10f/0x210
[   75.797063]  [<ffffffff812219b3>] crypto_alg_mod_lookup+0x63/0xf0
[   75.797063]  [<ffffffff81220cae>] crypto_find_alg+0xae/0xd0
[   75.797063]  [<ffffffff812215a4>] crypto_alloc_tfm+0x84/0x100
[   75.797063]  [<ffffffff81227fb9>] crypto_alloc_shash+0x19/0x20
[   75.797063]  [<ffffffffa0131035>] libcrc32c_mod_init+0x35/0x72 [libcrc32c]
[   75.797063]  [<ffffffff810001ed>] do_one_initcall+0x3d/0x1c0
[   75.797063]  [<ffffffff810a7fbd>] sys_init_module+0xed/0x2b0
[   75.797063]  [<ffffffff810031ab>] system_call_fastpath+0x16/0x1b
[   75.797063]  [<00007f4aeefaffba>] 0x7f4aeefaffba

Possible solutions:

1) Figure out the right place to drop the module_lock while waiting on
other modules to load. This boots and seems to work but who knows what
it will break. :) This might be the right fix after investigating
possible issues.

Index: linux-2.6.32/kernel/module.c
===================================================================
--- linux-2.6.32.orig/kernel/module.c
+++ linux-2.6.32/kernel/module.c
@@ -678,6 +678,8 @@ int use_module(struct module *a, struct
 
 	if (b == NULL || already_uses(a, b)) return 1;
 
+	mutex_unlock(&module_mutex);
+
 	/* If we're interrupted or time out, we fail. */
 	if (wait_event_interruptible_timeout(
 		    module_wq, (err = strong_try_module_get(b)) != -EBUSY,
@@ -687,6 +689,8 @@ int use_module(struct module *a, struct
 		return 0;
 	}
 
+	mutex_lock(&module_mutex);
+
 	/* If strong_try_module_get() returned a different error, we fail. */
 	if (err)
 		return 0;

2) Force libcrc32c to be built in?

3) Tell everyone to load it first thing in initrd? This seems hacky.

What are you thoughts on the right way to fix this?

Cheers,

	Brandon

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Fixing gave up waiting for init of module libcrc32c.
  2010-03-19 23:40 Fixing gave up waiting for init of module libcrc32c Brandon Philips
@ 2010-03-20  1:01 ` Herbert Xu
  2010-03-20  1:08   ` Herbert Xu
  0 siblings, 1 reply; 15+ messages in thread
From: Herbert Xu @ 2010-03-20  1:01 UTC (permalink / raw)
  To: Brandon Philips; +Cc: davem, linux-crypto, Rusty Russell

On Fri, Mar 19, 2010 at 04:40:45PM -0700, Brandon Philips wrote:
>
> Possible solutions:
> 
> 1) Figure out the right place to drop the module_lock while waiting on
> other modules to load. This boots and seems to work but who knows what
> it will break. :) This might be the right fix after investigating
> possible issues.

I'm not sure that this is the problem.

IIRC modprobe(8) in user-space is supposed to load all dependencies
prior to loading the main module.  So something must've gone wrong
before bnx2/bnx2x was loaded.  Please strace it to see why libcrc32c
wasn't loaded successfully before the kernel ever saw bnx2/bnx2x.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Fixing gave up waiting for init of module libcrc32c.
  2010-03-20  1:01 ` Herbert Xu
@ 2010-03-20  1:08   ` Herbert Xu
  2010-03-20  2:46     ` David Miller
  2010-03-20  4:21     ` Brandon Philips
  0 siblings, 2 replies; 15+ messages in thread
From: Herbert Xu @ 2010-03-20  1:08 UTC (permalink / raw)
  To: Brandon Philips; +Cc: davem, linux-crypto, Rusty Russell

On Sat, Mar 20, 2010 at 09:01:15AM +0800, Herbert Xu wrote:
>
> IIRC modprobe(8) in user-space is supposed to load all dependencies
> prior to loading the main module.  So something must've gone wrong
> before bnx2/bnx2x was loaded.  Please strace it to see why libcrc32c
> wasn't loaded successfully before the kernel ever saw bnx2/bnx2x.

In fact I just tried it on my machine and modprobe bnx2x correctly
loaded libcrc32c + crc32c before loading bnx2x (you don't need the
hardware, it'll stay loaded).  My kernel is 2.6.33-rc8.

I suspect it's a user-space problem.  I'm using module-init-tools
3.4-1 from Debian.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Fixing gave up waiting for init of module libcrc32c.
  2010-03-20  1:08   ` Herbert Xu
@ 2010-03-20  2:46     ` David Miller
  2010-03-20  3:45       ` Herbert Xu
  2010-03-20  4:21     ` Brandon Philips
  1 sibling, 1 reply; 15+ messages in thread
From: David Miller @ 2010-03-20  2:46 UTC (permalink / raw)
  To: herbert; +Cc: brandon, linux-crypto, rusty

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Sat, 20 Mar 2010 09:08:49 +0800

> On Sat, Mar 20, 2010 at 09:01:15AM +0800, Herbert Xu wrote:
>>
>> IIRC modprobe(8) in user-space is supposed to load all dependencies
>> prior to loading the main module.  So something must've gone wrong
>> before bnx2/bnx2x was loaded.  Please strace it to see why libcrc32c
>> wasn't loaded successfully before the kernel ever saw bnx2/bnx2x.
> 
> In fact I just tried it on my machine and modprobe bnx2x correctly
> loaded libcrc32c + crc32c before loading bnx2x (you don't need the
> hardware, it'll stay loaded).  My kernel is 2.6.33-rc8.
> 
> I suspect it's a user-space problem.  I'm using module-init-tools
> 3.4-1 from Debian.

Even if userland "does" this, nothing stops another thread
from rmmod'ing the dependency in between the modloads.

We have to handle this case properly in the kernel, no matter
what.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Fixing gave up waiting for init of module libcrc32c.
  2010-03-20  2:46     ` David Miller
@ 2010-03-20  3:45       ` Herbert Xu
  0 siblings, 0 replies; 15+ messages in thread
From: Herbert Xu @ 2010-03-20  3:45 UTC (permalink / raw)
  To: David Miller; +Cc: brandon, linux-crypto, rusty

On Fri, Mar 19, 2010 at 07:46:12PM -0700, David Miller wrote:
>
> Even if userland "does" this, nothing stops another thread
> from rmmod'ing the dependency in between the modloads.
> 
> We have to handle this case properly in the kernel, no matter
> what.

Right, if that had happened I think the reasonable response would
be to fail the subsequent load for lack of dependency rather than
looping around forever.

However, I don't think that's what Brandon is seeing here, unless
there is something performing the rmmod?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Fixing gave up waiting for init of module libcrc32c.
  2010-03-20  1:08   ` Herbert Xu
  2010-03-20  2:46     ` David Miller
@ 2010-03-20  4:21     ` Brandon Philips
  2010-03-20  4:24       ` Herbert Xu
  1 sibling, 1 reply; 15+ messages in thread
From: Brandon Philips @ 2010-03-20  4:21 UTC (permalink / raw)
  To: Herbert Xu; +Cc: davem, linux-crypto, Rusty Russell

On 09:08 Sat 20 Mar 2010, Herbert Xu wrote:
> On Sat, Mar 20, 2010 at 09:01:15AM +0800, Herbert Xu wrote:
> >
> > IIRC modprobe(8) in user-space is supposed to load all dependencies
> > prior to loading the main module.  So something must've gone wrong
> > before bnx2/bnx2x was loaded.  Please strace it to see why libcrc32c
> > wasn't loaded successfully before the kernel ever saw bnx2/bnx2x.
> 
> In fact I just tried it on my machine and modprobe bnx2x correctly
> loaded libcrc32c + crc32c before loading bnx2x (you don't need the
> hardware, it'll stay loaded).  My kernel is 2.6.33-rc8.
> 
> I suspect it's a user-space problem.  I'm using module-init-tools
> 3.4-1 from Debian.

This is module-init-tools 3.11.1 and SUSE Linux Enterprise SP1 RC1. I
can only reproduce on boot (thus why I changed bnx2 which I have) so
perhaps SUSE is doing something odd in how it loads modules at boot.

I will trace the boot process and get more information on exactly what
happens before it gets stuck.

Do you see how it is getting stuck though with modprobe bnx2 and
modprobe libcrc32c going at the same time though?

Cheers,

	Brandon

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Fixing gave up waiting for init of module libcrc32c.
  2010-03-20  4:21     ` Brandon Philips
@ 2010-03-20  4:24       ` Herbert Xu
  2010-03-20  5:23         ` David Miller
  0 siblings, 1 reply; 15+ messages in thread
From: Herbert Xu @ 2010-03-20  4:24 UTC (permalink / raw)
  To: Brandon Philips; +Cc: davem, linux-crypto, Rusty Russell

On Fri, Mar 19, 2010 at 09:21:03PM -0700, Brandon Philips wrote:
>
> Do you see how it is getting stuck though with modprobe bnx2 and
> modprobe libcrc32c going at the same time though?

Yes I agree that's a bug.  Something in those paths should simply
fail instead of waiting around.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Fixing gave up waiting for init of module libcrc32c.
  2010-03-20  4:24       ` Herbert Xu
@ 2010-03-20  5:23         ` David Miller
  2010-03-20 12:29           ` Herbert Xu
  0 siblings, 1 reply; 15+ messages in thread
From: David Miller @ 2010-03-20  5:23 UTC (permalink / raw)
  To: herbert; +Cc: brandon, linux-crypto, rusty

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Sat, 20 Mar 2010 12:24:34 +0800

> On Fri, Mar 19, 2010 at 09:21:03PM -0700, Brandon Philips wrote:
>>
>> Do you see how it is getting stuck though with modprobe bnx2 and
>> modprobe libcrc32c going at the same time though?
> 
> Yes I agree that's a bug.  Something in those paths should simply
> fail instead of waiting around.

I hear what you're saying Herbert, but thinking about this a bit I
really think we should make this situation work instead of fail.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Fixing gave up waiting for init of module libcrc32c.
  2010-03-20  5:23         ` David Miller
@ 2010-03-20 12:29           ` Herbert Xu
  2010-03-20 13:16             ` Neil Horman
                               ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Herbert Xu @ 2010-03-20 12:29 UTC (permalink / raw)
  To: David Miller; +Cc: brandon, linux-crypto, rusty

On Fri, Mar 19, 2010 at 10:23:25PM -0700, David Miller wrote:
>
> I hear what you're saying Herbert, but thinking about this a bit I
> really think we should make this situation work instead of fail.

I think the initial report perhaps painted this in a slight
different fashion than what it really is.  The code that was
looping in module.c is not trying to load libcrc32c, but rather
it is trying to get a reference on the already-loaded libcrc32c
module.

AFAICS the only way to make it "work" would be to reload the
module in question when we can't get a reference on it.  But
that would entail recursively loading a module during the process
of loading another module.

Rusty can chime in on whether this is doable.

I think I have a good guess as to why this problem is occuring
for Brandon.  It is probably the result of two near-simultaneous
modprobes, one issued against libcrc32c and one against bnx2x.

The libcrc32c module is partially loaded to the point of invoking
its init function, which then tries to modprobe crc32c.

However, before this starts the modprobe on bnx2x is already in
progression.  When bnx2x's loading tries to acquire a reference
on libcrc32c which it depends on, we hit the dead-lock.

So if Suse were doing some kind of parallel booting where multiple
modules may be loaded together then this could occur.

The easiest solution again would be for modprobe(8) to block the
loading of bnx2x because the module that it depends on libcrc32c
hasn't yet finished loading.

I'm open to a kernel solution too if anyone has suggestions.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Fixing gave up waiting for init of module libcrc32c.
  2010-03-20 12:29           ` Herbert Xu
@ 2010-03-20 13:16             ` Neil Horman
  2010-03-29 23:06             ` Rusty Russell
  2010-03-30 16:50             ` Brandon Philips
  2 siblings, 0 replies; 15+ messages in thread
From: Neil Horman @ 2010-03-20 13:16 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David Miller, brandon, linux-crypto, rusty

On Sat, Mar 20, 2010 at 08:29:59PM +0800, Herbert Xu wrote:
> On Fri, Mar 19, 2010 at 10:23:25PM -0700, David Miller wrote:
> >
> > I hear what you're saying Herbert, but thinking about this a bit I
> > really think we should make this situation work instead of fail.
> 
> I think the initial report perhaps painted this in a slight
> different fashion than what it really is.  The code that was
> looping in module.c is not trying to load libcrc32c, but rather
> it is trying to get a reference on the already-loaded libcrc32c
> module.
> 
> AFAICS the only way to make it "work" would be to reload the
> module in question when we can't get a reference on it.  But
> that would entail recursively loading a module during the process
> of loading another module.
> 
> Rusty can chime in on whether this is doable.
> 
> I think I have a good guess as to why this problem is occuring
> for Brandon.  It is probably the result of two near-simultaneous
> modprobes, one issued against libcrc32c and one against bnx2x.
> 
> The libcrc32c module is partially loaded to the point of invoking
> its init function, which then tries to modprobe crc32c.
> 
> However, before this starts the modprobe on bnx2x is already in
> progression.  When bnx2x's loading tries to acquire a reference
> on libcrc32c which it depends on, we hit the dead-lock.
> 
> So if Suse were doing some kind of parallel booting where multiple
> modules may be loaded together then this could occur.
> 
> The easiest solution again would be for modprobe(8) to block the
> loading of bnx2x because the module that it depends on libcrc32c
> hasn't yet finished loading.
> 
> I'm open to a kernel solution too if anyone has suggestions.
> 

FWIW, this sounds like a regression in modprobe to me.  A few years ago I fixed
a deadlock condition in the netfilter conntrack code that was tickled by
parallel rmmod's and modprobes.  modprobe would take file locks on modules, and
if the same module was getting rmmodded and modprobed in parallel we'd wind up
with a deadlock.  I fixed it by making the default modprobe -r behavior to be
non-blocking (which is the same as rmmod).  That commit is here:
http://git.kernel.org/?p=utils/kernel/module-init-tools/module-init-tools.git;a=commit;h=b45a24e9c89a14baf63bffe0a9ff04c1c1bffb29

Later, in late 2009, That behavior was reverted:
http://git.kernel.org/?p=utils/kernel/module-init-tools/module-init-tools.git;a=commit;h=b45a24e9c89a14baf63bffe0a9ff04c1c1bffb29

withuot consideration of the consequences, of which this sounds like one.

JCM I think is working on fixing the problem in a sane way.  I'd suggested that
he reapply the patch, but IIRC he told me that hes planning on trying to fix it
by removing the file locking on the modules in userspace entirely, which I think
is also reasonable.

As a test, you might try massaging my old patch above into the latest
module-init-tools to see if it makes the problem go away.  Note, the result of
this will be that either the modprobe or rmmod will fail and will need to be
retried, but its non-fatal, and a retry is usually successful, as it moves the
rmmod and modprobe further apart in time.

Regards
Neil

> Cheers,
> -- 
> Visit Openswan at http://www.openswan.org/
> Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
> --
> To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Fixing gave up waiting for init of module libcrc32c.
  2010-03-20 12:29           ` Herbert Xu
  2010-03-20 13:16             ` Neil Horman
@ 2010-03-29 23:06             ` Rusty Russell
  2010-03-31 19:03               ` Brandon Philips
  2010-03-30 16:50             ` Brandon Philips
  2 siblings, 1 reply; 15+ messages in thread
From: Rusty Russell @ 2010-03-29 23:06 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David Miller, brandon, linux-crypto, Tim Abbott

On Sat, 20 Mar 2010 10:59:59 pm Herbert Xu wrote:
> On Fri, Mar 19, 2010 at 10:23:25PM -0700, David Miller wrote:
> >
> > I hear what you're saying Herbert, but thinking about this a bit I
> > really think we should make this situation work instead of fail.
> 
> I think the initial report perhaps painted this in a slight
> different fashion than what it really is.  The code that was
> looping in module.c is not trying to load libcrc32c, but rather
> it is trying to get a reference on the already-loaded libcrc32c
> module.
> 
> AFAICS the only way to make it "work" would be to reload the
> module in question when we can't get a reference on it.  But
> that would entail recursively loading a module during the process
> of loading another module.
> 
> Rusty can chime in on whether this is doable.

Not really.  The kernel code is pretty simple: if we can't find a symbol
we need, fail the load.  To avoid nasty spurious races, we *do* wait if the
symbol we need is in a module which is still initializing.  But we time out
after 30 seconds to avoid a module trainwreck.

The real fix here is to drop the lock, like Brandon suggested, but we need
to do it more carefully: when we re-acquire the lock we need to re-lookup
the symbol in case the module has vanished or changed.

Brandon, I can't see how libcrc32c's module_init calls crypto_alloc_shash,
but the problem is reproducible with simple example modules.  Does it fix
your problem?

Tim: note that use_module() return value changes.

Thanks,
Rusty.

module: drop the lock while waiting for module to complete initialization.

This fixes "gave up waiting for init of module libcrc32c." which
happened at boot time due to multiple parallel module loads.

The problem was a deadlock: we wait for a module to finish
initializing, but we keep the module_lock mutex so it can't complete.
In particular, this could reasonably happen if libcrc32c does a
request_module() in its initialization routine.

So we change use_module() to return an errno rather than a bool, and if
it's -EBUSY we drop the lock and wait in the caller, then reaquire the
lock.

Reported-by: Brandon Philips <brandon@ifup.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

diff --git a/kernel/module.c b/kernel/module.c
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -510,33 +510,25 @@ int use_module(struct module *a, struct 
 	struct module_use *use;
 	int no_warn, err;
 
-	if (b == NULL || already_uses(a, b)) return 1;
+	if (b == NULL || already_uses(a, b)) return 0;
 
 	/* If we're interrupted or time out, we fail. */
-	if (wait_event_interruptible_timeout(
-		    module_wq, (err = strong_try_module_get(b)) != -EBUSY,
-		    30 * HZ) <= 0) {
-		printk("%s: gave up waiting for init of module %s.\n",
-		       a->name, b->name);
-		return 0;
-	}
-
-	/* If strong_try_module_get() returned a different error, we fail. */
+	err = strong_try_module_get(b);
 	if (err)
-		return 0;
+		return err;
 
 	DEBUGP("Allocating new usage for %s.\n", a->name);
 	use = kmalloc(sizeof(*use), GFP_ATOMIC);
 	if (!use) {
 		printk("%s: out of memory loading\n", a->name);
 		module_put(b);
-		return 0;
+		return -ENOMEM;
 	}
 
 	use->module_which_uses = a;
 	list_add(&use->list, &b->modules_which_use_me);
 	no_warn = sysfs_create_link(b->holders_dir, &a->mkobj.kobj, a->name);
-	return 1;
+	return 0;
 }
 EXPORT_SYMBOL_GPL(use_module);
 
@@ -823,7 +815,7 @@ static inline void module_unload_free(st
 
 int use_module(struct module *a, struct module *b)
 {
-	return strong_try_module_get(b) == 0;
+	return strong_try_module_get(b);
 }
 EXPORT_SYMBOL_GPL(use_module);
 
@@ -994,17 +986,39 @@ static const struct kernel_symbol *resol
 	struct module *owner;
 	const struct kernel_symbol *sym;
 	const unsigned long *crc;
+	DEFINE_WAIT(wait);
+	int err;
+	long timeleft = 30 * HZ;
 
+again:
 	sym = find_symbol(name, &owner, &crc,
 			  !(mod->taints & (1 << TAINT_PROPRIETARY_MODULE)), true);
-	/* use_module can fail due to OOM,
-	   or module initialization or unloading */
-	if (sym) {
-		if (!check_version(sechdrs, versindex, name, mod, crc, owner)
-		    || !use_module(mod, owner))
-			sym = NULL;
+	if (!sym)
+		return NULL;
+
+	if (!check_version(sechdrs, versindex, name, mod, crc, owner))
+		return NULL;
+
+	prepare_to_wait(&module_wq, &wait, TASK_INTERRUPTIBLE);
+	err = use_module(mod, owner);
+	if (likely(!err) || err != -EBUSY || signal_pending(current)) {
+		finish_wait(&module_wq, &wait);
+		return err ? NULL : sym;
 	}
-	return sym;
+
+	/* Module is still loading.  Drop lock and wait. */
+	mutex_unlock(&module_mutex);
+	timeleft = schedule_timeout(timeleft);
+	mutex_lock(&module_mutex);
+	finish_wait(&module_wq, &wait);
+
+	/* Module might be gone entirely, or replaced.  Re-lookup. */
+	if (timeleft)
+		goto again;
+
+	printk("%s: gave up waiting for init of module %s.\n",
+	       mod->name, owner->name);
+	return NULL;
 }
 
 /*


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Fixing gave up waiting for init of module libcrc32c.
  2010-03-20 12:29           ` Herbert Xu
  2010-03-20 13:16             ` Neil Horman
  2010-03-29 23:06             ` Rusty Russell
@ 2010-03-30 16:50             ` Brandon Philips
  2010-03-30 17:47               ` Kay Sievers
  2 siblings, 1 reply; 15+ messages in thread
From: Brandon Philips @ 2010-03-30 16:50 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David Miller, linux-crypto, rusty, Kay Sievers

On 20:29 Sat 20 Mar 2010, Herbert Xu wrote:
> On Fri, Mar 19, 2010 at 10:23:25PM -0700, David Miller wrote:
> >
> > I hear what you're saying Herbert, but thinking about this a bit I
> > really think we should make this situation work instead of fail.
> 
> I think the initial report perhaps painted this in a slight
> different fashion than what it really is.  The code that was
> looping in module.c is not trying to load libcrc32c, but rather
> it is trying to get a reference on the already-loaded libcrc32c
> module.

You are correct. Sorry for the confusion.

> AFAICS the only way to make it "work" would be to reload the
> module in question when we can't get a reference on it.  But
> that would entail recursively loading a module during the process
> of loading another module.
> 
> Rusty can chime in on whether this is doable.
> 
> I think I have a good guess as to why this problem is occuring
> for Brandon.  It is probably the result of two near-simultaneous
> modprobes, one issued against libcrc32c and one against bnx2x.
> 
> So if Suse were doing some kind of parallel booting where multiple
> modules may be loaded together then this could occur.

It is actually due to two nearly simultaneous modprobes of bnx2x from
udev since there are two devices:

02:00.0 Ethernet controller [0200]: Broadcom Corporation NetXtreme II
BCM5709 Gigabit Ethernet [14e4:1639] (rev 01)
02:00.1 Ethernet controller [0200]: Broadcom Corporation NetXtreme II
BCM5709 Gigabit Ethernet [14e4:1639] (rev 01)

Output from my modprobe strace wrappers:

Process 4589 (first to get to libcrc32c)
------------

4589  execve("/sbin/modprobe.bin", ["/sbin/modprobe.bin", "pci:v000014E4d00001639sv00001014"...], [/* 14 vars */]) = 0

...

4589  open("/lib/modules/2.6.32.9-debug/kernel/lib/libcrc32c.ko", O_RDONLY) = 3
4589  stat("/sys/module", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
4589  stat("/sys/module/libcrc32c", 0x7fff3d03bef0) = -1 ENOENT (No such file or directory)
4589  read(3, "\177E", 2)               = 2
4589  lseek(3, 0, SEEK_SET)             = 0
4589  fstat(3, {st_mode=S_IFREG|0644, st_size=90482, ...}) = 0
4589  mmap(NULL, 90482, PROT_READ|PROT_WRITE, MAP_PRIVATE, 3, 0) = 0x7f7963d49000
4589  init_module(0x7f7963d49000, 90482, "") = 0
4589  munmap(0x7f7963d49000, 90482)     = 0
4589  close(3)                          = 0
4589  open("/lib/modules/2.6.32.9-debug/kernel/drivers/net/bnx2.ko", O_RDONLY) = 3
4589  stat("/sys/module", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
4589  stat("/sys/module/bnx2", 0x7fff3d03bf90) = -1 ENOENT (No such file or directory)
4589  read(3, "\177E", 2)               = 2
4589  lseek(3, 0, SEEK_SET)             = 0
4589  fstat(3, {st_mode=S_IFREG|0644, st_size=950325, ...}) = 0
4589  mmap(NULL, 950325, PROT_READ|PROT_WRITE, MAP_PRIVATE, 3, 0) = 0x7f7963c4c000
4589  init_module(0x7f7963c4c000, 950325, "") = 0

Process 4590 (loses out on libcrc32c and gets stuck)
------------

4590  execve("/sbin/modprobe.bin", ["/sbin/modprobe.bin", "pci:v000014E4d00001639sv00001014"...], [/* 14 vars */]) = 0

....

4590  open("/lib/modules/2.6.32.9-debug/kernel/lib/libcrc32c.ko",O_RDONLY) = 3
4590  stat("/sys/module", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
4590  stat("/sys/module/libcrc32c", 0x7ffff52a0ef0) = -1 ENOENT (No such file or directory)
4590  read(3, "\177E", 2)               = 2
4590  lseek(3, 0, SEEK_SET)             = 0
4590  fstat(3, {st_mode=S_IFREG|0644, st_size=90482, ...}) = 0
4590  mmap(NULL, 90482, PROT_READ|PROT_WRITE, MAP_PRIVATE, 3, 0) = 0x7f1397a1b000
4590  init_module(0x7f1397a1b000, 90482, "") = -1 EEXIST (File exists)
4590  close(3)                          = 0
4590  open("/lib/modules/2.6.32.9-debug/kernel/drivers/net/bnx2.ko", O_RDONLY) = 3
4590  stat("/sys/module", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
4590  stat("/sys/module/bnx2", 0x7ffff52a0f90) = -1 ENOENT (No such file or directory)
4590  read(3, "\177E", 2)               = 2
4590  lseek(3, 0, SEEK_SET)             = 0
4590  fstat(3, {st_mode=S_IFREG|0644, st_size=950325, ...}) = 0
4590  mmap(NULL, 950325, PROT_READ|PROT_WRITE, MAP_PRIVATE, 3, 0) = 0x7f139791e000
4590  init_module(0x7f139791e000, 950325, "") = -1 ENOENT (No such file or directory)
4590  write(2, "FATAL: Error inserting bnx2 (/li"..., 145) = 145

As you can see 4589 inserts libcrc32c and the other starts inserting
bnx2 after seeing libcrc32c was loaded. Creating our situation. Phew,
took me a bit to see what was really going on :)

Side note: Intially I thought that modprobe was forking for each
module dependency but that was incorrect after looking at the
code. udev is forking off two modprobes almost at the same time for
each bnx2x device alias when udevadm trigger is called during the boot
sequence.

FWIW, I don't think this is a SUSE thing as our udev is pretty lightly
patched.

Cheers,

	Brandon

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Fixing gave up waiting for init of module libcrc32c.
  2010-03-30 16:50             ` Brandon Philips
@ 2010-03-30 17:47               ` Kay Sievers
  0 siblings, 0 replies; 15+ messages in thread
From: Kay Sievers @ 2010-03-30 17:47 UTC (permalink / raw)
  To: Brandon Philips; +Cc: Herbert Xu, David Miller, linux-crypto, rusty

On Tue, Mar 30, 2010 at 18:50, Brandon Philips <brandon@ifup.org> wrote:
> Side note: Intially I thought that modprobe was forking for each
> module dependency but that was incorrect after looking at the
> code. udev is forking off two modprobes almost at the same time for
> each bnx2x device alias when udevadm trigger is called during the boot
> sequence.

Right, that's the standard behavior, and applies to all usual systems.

> FWIW, I don't think this is a SUSE thing as our udev is pretty lightly
> patched.

Yes, there is no SUSE patch at all. All usual systems ship the same stuff here.

Kay

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Fixing gave up waiting for init of module libcrc32c.
  2010-03-29 23:06             ` Rusty Russell
@ 2010-03-31 19:03               ` Brandon Philips
  2010-03-31 23:25                 ` Rusty Russell
  0 siblings, 1 reply; 15+ messages in thread
From: Brandon Philips @ 2010-03-31 19:03 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Herbert Xu, David Miller, linux-crypto, Tim Abbott

On 09:36 Tue 30 Mar 2010, Rusty Russell wrote:
> The real fix here is to drop the lock, like Brandon suggested, but
> we need to do it more carefully: when we re-acquire the lock we need
> to re-lookup the symbol in case the module has vanished or changed.
> 
> Brandon, I can't see how libcrc32c's module_init calls
> crypto_alloc_shash, but the problem is reproducible with simple
> example modules.  Does it fix your problem?

Did you see my email yesterday explaining how this came about? Does
that answer your question?

http://www.mail-archive.com/linux-crypto@vger.kernel.org/msg04363.html

> module: drop the lock while waiting for module to complete initialization.
> 
> This fixes "gave up waiting for init of module libcrc32c." which
> happened at boot time due to multiple parallel module loads.

Since the root cause is known a bit more detail can be added:

This fixes "gave up waiting for init of module libcrc32c." which is
triggered by parallel modprobes by udev for the same driver. For
example on a machine with two network interfaces:

02:00.0 Ethernet controller [0200]: Broadcom Corporation NetXtreme II
BCM5709 Gigabit Ethernet [14e4:1639] (rev 01)
02:00.1 Ethernet controller [0200]: Broadcom Corporation NetXtreme II
BCM5709 Gigabit Ethernet [14e4:1639] (rev 01)

udev will fork off two modprobe processes:

/sbin/modprobe pci:v000014E4d00001639sv00001014sd0000037Cbc02sc00i00
/sbin/modprobe pci:v000014E4d00001639sv00001014sd0000037Cbc02sc00i00

> The problem was a deadlock: we wait for a module to finish
> initializing, but we keep the module_lock mutex so it can't
> complete.  In particular, this could reasonably happen if libcrc32c
> does a request_module() in its initialization routine.
> 
> So we change use_module() to return an errno rather than a bool, and if
> it's -EBUSY we drop the lock and wait in the caller, then reaquire the
> lock.
> 
> Reported-by: Brandon Philips <brandon@ifup.org>
> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

Reviewed the patch and it looks good and I tested it on the machine
and it works. A couple of trivial things inline below if you care.

Signed-off-by: Brandon Philips <brandon@ifup.org>
Cc: stable@kernel.org

Thanks Rusty.

Cheers,

	Brandon


> -	if (b == NULL || already_uses(a, b)) return 1;
> +	if (b == NULL || already_uses(a, b)) return 0;

	if (b == NULL || already_uses(a, b))
		return 0;

>  	/* If we're interrupted or time out, we fail. */
> -	if (wait_event_interruptible_timeout(
> -		    module_wq, (err = strong_try_module_get(b)) != -EBUSY,
> -		    30 * HZ) <= 0) {
> -		printk("%s: gave up waiting for init of module %s.\n",
> -		       a->name, b->name);
> -		return 0;
> -	}
> -
> -	/* If strong_try_module_get() returned a different error, we fail. */
> +	err = strong_try_module_get(b);
>  	if (err)
> -		return 0;
> +		return err;
>  
>  	DEBUGP("Allocating new usage for %s.\n", a->name);
>  	use = kmalloc(sizeof(*use), GFP_ATOMIC);
>  	if (!use) {
>  		printk("%s: out of memory loading\n", a->name);

  		printk(KERN_ERR "%s: out of memory loading\n", a->name);

>  		module_put(b);
> -		return 0;
> +		return -ENOMEM;
>  	}
>  
>  	use->module_which_uses = a;
>  	list_add(&use->list, &b->modules_which_use_me);
>  	no_warn = sysfs_create_link(b->holders_dir, &a->mkobj.kobj, a->name);
> -	return 1;
> +	return 0;
>  }
>  EXPORT_SYMBOL_GPL(use_module);
>  
> @@ -823,7 +815,7 @@ static inline void module_unload_free(st
>  
>  int use_module(struct module *a, struct module *b)
>  {
> -	return strong_try_module_get(b) == 0;
> +	return strong_try_module_get(b);
>  }
>  EXPORT_SYMBOL_GPL(use_module);
>  
> @@ -994,17 +986,39 @@ static const struct kernel_symbol *resol
>  	struct module *owner;
>  	const struct kernel_symbol *sym;
>  	const unsigned long *crc;
> +	DEFINE_WAIT(wait);
> +	int err;
> +	long timeleft = 30 * HZ;
>  
> +again:
>  	sym = find_symbol(name, &owner, &crc,
>  			  !(mod->taints & (1 << TAINT_PROPRIETARY_MODULE)), true);
> -	/* use_module can fail due to OOM,
> -	   or module initialization or unloading */
> -	if (sym) {
> -		if (!check_version(sechdrs, versindex, name, mod, crc, owner)
> -		    || !use_module(mod, owner))
> -			sym = NULL;
> +	if (!sym)
> +		return NULL;
> +
> +	if (!check_version(sechdrs, versindex, name, mod, crc, owner))
> +		return NULL;
> +
> +	prepare_to_wait(&module_wq, &wait, TASK_INTERRUPTIBLE);
> +	err = use_module(mod, owner);
> +	if (likely(!err) || err != -EBUSY || signal_pending(current)) {
> +		finish_wait(&module_wq, &wait);
> +		return err ? NULL : sym;
>  	}
> -	return sym;
> +
> +	/* Module is still loading.  Drop lock and wait. */
> +	mutex_unlock(&module_mutex);
> +	timeleft = schedule_timeout(timeleft);
> +	mutex_lock(&module_mutex);
> +	finish_wait(&module_wq, &wait);
> +
> +	/* Module might be gone entirely, or replaced.  Re-lookup. */
> +	if (timeleft)
> +		goto again;
> +
> +	printk("%s: gave up waiting for init of module %s.\n",
> +	       mod->name, owner->name);

	printk(KERN_ERR "%s: gave up waiting for init of module %s.\n",
	       mod->name, owner->name);

> +	return NULL;
>  }
>  
>  /*
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Fixing gave up waiting for init of module libcrc32c.
  2010-03-31 19:03               ` Brandon Philips
@ 2010-03-31 23:25                 ` Rusty Russell
  0 siblings, 0 replies; 15+ messages in thread
From: Rusty Russell @ 2010-03-31 23:25 UTC (permalink / raw)
  To: Brandon Philips; +Cc: Herbert Xu, David Miller, linux-crypto, Tim Abbott

On Thu, 1 Apr 2010 05:33:51 am Brandon Philips wrote:
> On 09:36 Tue 30 Mar 2010, Rusty Russell wrote:
> > The real fix here is to drop the lock, like Brandon suggested, but
> > we need to do it more carefully: when we re-acquire the lock we need
> > to re-lookup the symbol in case the module has vanished or changed.
> > 
> > Brandon, I can't see how libcrc32c's module_init calls
> > crypto_alloc_shash, but the problem is reproducible with simple
> > example modules.  Does it fix your problem?
> 
> Did you see my email yesterday explaining how this came about? Does
> that answer your question?

Yep, thanks.

> Reviewed the patch and it looks good and I tested it on the machine
> and it works. A couple of trivial things inline below if you care.
> 
> Signed-off-by: Brandon Philips <brandon@ifup.org>
> Cc: stable@kernel.org
> 
> Thanks Rusty.

Thanks; already had the two caught by checkpatch; changing the other
printk is not something I like to do in the same patch.

Thanks!
Rusty.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2010-03-31 23:25 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-03-19 23:40 Fixing gave up waiting for init of module libcrc32c Brandon Philips
2010-03-20  1:01 ` Herbert Xu
2010-03-20  1:08   ` Herbert Xu
2010-03-20  2:46     ` David Miller
2010-03-20  3:45       ` Herbert Xu
2010-03-20  4:21     ` Brandon Philips
2010-03-20  4:24       ` Herbert Xu
2010-03-20  5:23         ` David Miller
2010-03-20 12:29           ` Herbert Xu
2010-03-20 13:16             ` Neil Horman
2010-03-29 23:06             ` Rusty Russell
2010-03-31 19:03               ` Brandon Philips
2010-03-31 23:25                 ` Rusty Russell
2010-03-30 16:50             ` Brandon Philips
2010-03-30 17:47               ` Kay Sievers

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.