linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC] Suspicious bug in module refcounting
@ 2009-02-03 13:47 Karsten Keil
  2009-02-03 15:02 ` richard kennedy
  2009-02-04  3:48 ` Rusty Russell
  0 siblings, 2 replies; 13+ messages in thread
From: Karsten Keil @ 2009-02-03 13:47 UTC (permalink / raw)
  To: linux-kernel; +Cc: Michal Hocko, Rusty Russell

Hi,

While debugging a wired SCTP bug we hit from time to time
the BUG_ON(module_refcount(module) == 0) statement in __module_get().
After fixing the SCTP bug in final tests runs with lot of traffic
we still saw this bug message from time to time.
Looking at the refcounting in sctp, does not show up any forgotten
get module operation or some wrong put module calls.
We added some debug code and this shows only, that the test case heavly
change module_refcount(sctp) bacause the socket accept call also increase
the module count (and the BUG always was triggered in the __module_get() here).
If the socket close, it decrease the module refcount, so no problem here.
Some times the refcount seems to go very high, very quickly and later
goes down very quickly again, I think this occurs if the network stalls for
some time (the test case try to saturate a GB networklink).
So I had the idea that the bug is in the refcounting itself and not related
to the sctp code.
After looking closer at __module_get() and module_refcount(module) it looks
as I'm right, but I can not belive that this bug was not discovered before,
the code is here since long time.
The refcount is a per CPU atomic variable, module_refcount() simple add
in a fully unprotected loop (not disabled irqs, not protected against
scheduling) all per cpu values.

The issue is that so the process (it is as syscall from a userspace process)
get scheduled in the middle of the counting loop, so it already has counted
some per cpu vales, but not all.
Now while the process is not active, other processes modify the counts, some
accepts (== module gets) increase the count on already summed up CPUs,
some other release sockets on CPUs, which are not already summed up.
If the process become active again, it read the now decremented values from
later CPUS, so the total count is too low and may reach zero, in which the above
BUG_ON() will be triggered.

To prove this I replaced the BUG_ON with following code:

        if (module) {
-               BUG_ON(module_refcount(module) == 0);
+               unsigned int c = module_refcount(module);
+
+               if (unlikely(c == 0)) {
+                       printk(KERN_ERR" module %s refcount=%x/%x\n", module->name, c,  module_refcount(module));
+                       module_dump_refcounts(module);
+                       WARN_ON(1);
+               }
                local_inc(&module->ref[get_cpu()].count);
                put_cpu();
        }

module_dump_refcounts() does print out the per cpu refcounts.
If my idea is correct the second call to  module_refcount(module) should
have a none zero value.

And indeed:

Feb  2 03:29:51 pingi5 kernel:  module sctp refcount=0/8e
Feb  2 03:29:51 pingi5 kernel: CPU0 refcount:38562331
Feb  2 03:29:51 pingi5 kernel: CPU1 refcount:-38562187
Feb  2 03:29:51 pingi5 kernel: Badness in __module_get at include/linux/module.h:374
Feb  2 03:29:51 pingi5 kernel:
Feb  2 03:29:51 pingi5 kernel: Call Trace: <ffffffff802837c2>{sys_accept+212} <ffffffff8019c76f>{dput+44}
Feb  2 03:29:51 pingi5 kernel:        <ffffffff8018704f>{__fput+355} <ffffffff801a0dff>{mntput_no_expire+29}
Feb  2 03:29:51 pingi5 kernel:        <ffffffff801845df>{filp_close+92} <ffffffff8010adba>{system_call+126}


Do my findings be correct, or do I miss something ?

A other thing is, why __module_get() should be used anyway, I think it was a
optimation long time ago, current code seems to need more cycles on default
SMP kernels as try_module_get(), because of the big loop  in module_refcount()
which goes trough all possible CPUs (NR_CPU, 64 in default config).
try_module_get() only has one test and one atomic increment.


I think we should replace all unprotected __module_get() calls with
try_module_get(), or remove __module_get() completely.


Any comments ?


-- 
Karsten Keil
SuSE Labs

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC] Suspicious bug in module refcounting
  2009-02-03 13:47 [RFC] Suspicious bug in module refcounting Karsten Keil
@ 2009-02-03 15:02 ` richard kennedy
  2009-02-04  3:48 ` Rusty Russell
  1 sibling, 0 replies; 13+ messages in thread
From: richard kennedy @ 2009-02-03 15:02 UTC (permalink / raw)
  To: Karsten Keil; +Cc: linux-kernel, Michal Hocko, Rusty Russell

Karsten Keil wrote:
> Hi,
> 
> While debugging a wired SCTP bug we hit from time to time
> the BUG_ON(module_refcount(module) == 0) statement in __module_get().
> After fixing the SCTP bug in final tests runs with lot of traffic
> we still saw this bug message from time to time.
> Looking at the refcounting in sctp, does not show up any forgotten
> get module operation or some wrong put module calls.
> We added some debug code and this shows only, that the test case heavly
> change module_refcount(sctp) bacause the socket accept call also increase
> the module count (and the BUG always was triggered in the __module_get() here).
> If the socket close, it decrease the module refcount, so no problem here.
> Some times the refcount seems to go very high, very quickly and later
> goes down very quickly again, I think this occurs if the network stalls for
> some time (the test case try to saturate a GB networklink).
> So I had the idea that the bug is in the refcounting itself and not related
> to the sctp code.
> After looking closer at __module_get() and module_refcount(module) it looks
> as I'm right, but I can not belive that this bug was not discovered before,
> the code is here since long time.
> The refcount is a per CPU atomic variable, module_refcount() simple add
> in a fully unprotected loop (not disabled irqs, not protected against
> scheduling) all per cpu values.
> 
> The issue is that so the process (it is as syscall from a userspace process)
> get scheduled in the middle of the counting loop, so it already has counted
> some per cpu vales, but not all.
> Now while the process is not active, other processes modify the counts, some
> accepts (== module gets) increase the count on already summed up CPUs,
> some other release sockets on CPUs, which are not already summed up.
> If the process become active again, it read the now decremented values from
> later CPUS, so the total count is too low and may reach zero, in which the above
> BUG_ON() will be triggered.
> 
> To prove this I replaced the BUG_ON with following code:
> 
>         if (module) {
> -               BUG_ON(module_refcount(module) == 0);
> +               unsigned int c = module_refcount(module);
> +
> +               if (unlikely(c == 0)) {
> +                       printk(KERN_ERR" module %s refcount=%x/%x\n", module->name, c,  module_refcount(module));
> +                       module_dump_refcounts(module);
> +                       WARN_ON(1);
> +               }
>                 local_inc(&module->ref[get_cpu()].count);
>                 put_cpu();
>         }
> 
> module_dump_refcounts() does print out the per cpu refcounts.
> If my idea is correct the second call to  module_refcount(module) should
> have a none zero value.
> 
> And indeed:
> 
> Feb  2 03:29:51 pingi5 kernel:  module sctp refcount=0/8e
> Feb  2 03:29:51 pingi5 kernel: CPU0 refcount:38562331
> Feb  2 03:29:51 pingi5 kernel: CPU1 refcount:-38562187
> Feb  2 03:29:51 pingi5 kernel: Badness in __module_get at include/linux/module.h:374
> Feb  2 03:29:51 pingi5 kernel:
> Feb  2 03:29:51 pingi5 kernel: Call Trace: <ffffffff802837c2>{sys_accept+212} <ffffffff8019c76f>{dput+44}
> Feb  2 03:29:51 pingi5 kernel:        <ffffffff8018704f>{__fput+355} <ffffffff801a0dff>{mntput_no_expire+29}
> Feb  2 03:29:51 pingi5 kernel:        <ffffffff801845df>{filp_close+92} <ffffffff8010adba>{system_call+126}
> 
> 
> Do my findings be correct, or do I miss something ?
> 
> A other thing is, why __module_get() should be used anyway, I think it was a
> optimation long time ago, current code seems to need more cycles on default
> SMP kernels as try_module_get(), because of the big loop  in module_refcount()
> which goes trough all possible CPUs (NR_CPU, 64 in default config).
> try_module_get() only has one test and one atomic increment.
> 
> 
> I think we should replace all unprotected __module_get() calls with
> try_module_get(), or remove __module_get() completely.
> 
> 
> Any comments ?
> 

Hi Karsten,

I've been seeing problems in a simple socket test harness that I think
has the same cause. I have several clients all calling a single socket
server on the same machine, under heavy load & after some time the
clients will fail to connect with -EADDRNOTAVAIL even though the server
is still running.

I changed the module ref count to be an atomic_t and the problem goes
away. I haven't tracked down the exact cause of the problem yet ( the
network code is kind of tricky!).

So I think you're right module_refcount() does need some sort of locking.

BTW On my desktop machine I cannot measure any significant performance
differences between the atomic_t refcount & the existing code, so it may
be that atomic_t will be good enough.

regards
Richard






^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC] Suspicious bug in module refcounting
  2009-02-03 13:47 [RFC] Suspicious bug in module refcounting Karsten Keil
  2009-02-03 15:02 ` richard kennedy
@ 2009-02-04  3:48 ` Rusty Russell
  2009-02-04 10:11   ` Russell King
                     ` (3 more replies)
  1 sibling, 4 replies; 13+ messages in thread
From: Rusty Russell @ 2009-02-04  3:48 UTC (permalink / raw)
  To: Karsten Keil
  Cc: linux-kernel, Michal Hocko, richard kennedy, Dan Williams,
	Dmitry Torokhov, Russell King, dwmw2, Scott Wood, netdev,
	Al Viro

On Wednesday 04 February 2009 00:17:21 Karsten Keil wrote:
> The refcount is a per CPU atomic variable, module_refcount() simple add
> in a fully unprotected loop (not disabled irqs, not protected against
> scheduling) all per cpu values.

Hi Karsten,

   Yes, the BUG_ON() is overly aggressive.  And I really hate __module_get,
and it looks like most of the callers are completely bogus.  The watchdog
drivers use it to nail themselves in place in their open routines: this is
OK, if a bit weird.

   We should only use __module_get() when you *can't handle* failure;
otherwise you should accept that the admin did rmmod --wait and don't use the
module any further.

  dmaengine.c seems to be taking liberties like this.  AFAICT it can error
out, so why not just try_module_get() always?

  gameport.c, serio.c and input.c increment their own refcount, but to get
into those init functions someone must be holding a refcount already (ie. a
module depends on this module).  Ditto cyber2000fb.c, and MTD.

  mdio-bitbang.c should definitely use try_module_get.

  loop.c bumping its own refcount, Al might know why, but definitely can be
try_module_get() if it's valid at all.

  net/socket.c can also handle failure, so that's another try_module_get.

etc.

> I think we should replace all unprotected __module_get() calls with
> try_module_get(), or remove __module_get() completely.

Agreed.  We will need a "nail_module()" call for those legitimate uses (which
should clear mod->exit, rather than manipulating the refcount at all).

Meanwhile, I'll remove the BUG_ON for 2.6.29.

Thanks,
Rusty.

module: remove over-zealous check in __module_get()

module_refcount() isn't reliable outside stop_machine(), as demonstrated
by Karsten Keil <kkeil@suse.de>, networking can trigger it under load
(an inc on one cpu and dec on another while module_refcount() is tallying
 can give false results, for example).

Almost noone should be using __module_get, but that's another issue.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

diff --git a/include/linux/module.h b/include/linux/module.h
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -407,7 +407,6 @@ static inline void __module_get(struct m
 static inline void __module_get(struct module *module)
 {
 	if (module) {
-		BUG_ON(module_refcount(module) == 0);
 		local_inc(__module_ref_addr(module, get_cpu()));
 		put_cpu();
 	}

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC] Suspicious bug in module refcounting
  2009-02-04  3:48 ` Rusty Russell
@ 2009-02-04 10:11   ` Russell King
  2009-02-04 10:55     ` Rusty Russell
  2009-02-04 16:33   ` Dan Williams
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 13+ messages in thread
From: Russell King @ 2009-02-04 10:11 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Karsten Keil, linux-kernel, Michal Hocko, richard kennedy,
	Dan Williams, Dmitry Torokhov, dwmw2, Scott Wood, netdev,
	Al Viro

On Wed, Feb 04, 2009 at 02:18:08PM +1030, Rusty Russell wrote:
>   gameport.c, serio.c and input.c increment their own refcount, but to get
> into those init functions someone must be holding a refcount already (ie. a
> module depends on this module).  Ditto cyber2000fb.c, and MTD.

Err, wrong.  cyber2000fb.c does it in its module initialization function
to prevent the module (when built for Shark) from being unloaded.  It
does this because it's from the days of 2.2 kernels and no one bothered
writing the module unload support for Shark.  I'm certainly not in a
position to do that.

Since you can't unload a module while its initialization function is
running, so someone else must be holding a refcount (the insmod process).

I'm not saying that it's the right solution, I'm saying that this is how
it's evolved.

If someone has an idea on what to do about it then patches will be given
due consideration.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC] Suspicious bug in module refcounting
  2009-02-04 10:11   ` Russell King
@ 2009-02-04 10:55     ` Rusty Russell
  2009-02-04 10:59       ` Russell King
  0 siblings, 1 reply; 13+ messages in thread
From: Rusty Russell @ 2009-02-04 10:55 UTC (permalink / raw)
  To: Russell King
  Cc: Karsten Keil, linux-kernel, Michal Hocko, richard kennedy,
	Dan Williams, Dmitry Torokhov, dwmw2, Scott Wood, netdev,
	Al Viro

On Wednesday 04 February 2009 20:41:30 Russell King wrote:
> On Wed, Feb 04, 2009 at 02:18:08PM +1030, Rusty Russell wrote:
> >   gameport.c, serio.c and input.c increment their own refcount, but to get
> > into those init functions someone must be holding a refcount already (ie. a
> > module depends on this module).  Ditto cyber2000fb.c, and MTD.
> 
> Err, wrong.  cyber2000fb.c does it in its module initialization function
> to prevent the module (when built for Shark) from being unloaded.  It
> does this because it's from the days of 2.2 kernels and no one bothered
> writing the module unload support for Shark.  I'm certainly not in a
> position to do that.

Thanks, here's the patch then:

Subject: cyber2000fb.c: use proper method for stopping unload if CONFIG_ARCH_SHARK

Russell explains the __module_get():
> cyber2000fb.c does it in its module initialization function
> to prevent the module (when built for Shark) from being unloaded.  It
> does this because it's from the days of 2.2 kernels and no one bothered
> writing the module unload support for Shark.

Since 2.4, the correct answer has been to not define an unload fn.

Cc: Russell King <rmk+lkml@arm.linux.org.uk>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

diff --git a/drivers/video/cyber2000fb.c b/drivers/video/cyber2000fb.c
--- a/drivers/video/cyber2000fb.c
+++ b/drivers/video/cyber2000fb.c
@@ -1736,10 +1736,8 @@ static int __init cyber2000fb_init(void)
 
 #ifdef CONFIG_ARCH_SHARK
 	err = cyberpro_vl_probe();
-	if (!err) {
+	if (!err)
 		ret = 0;
-		__module_get(THIS_MODULE);
-	}
 #endif
 #ifdef CONFIG_PCI
 	err = pci_register_driver(&cyberpro_driver);
@@ -1749,14 +1747,15 @@ static int __init cyber2000fb_init(void)
 
 	return ret ? err : 0;
 }
+module_init(cyber2000fb_init);
 
+#ifndef CONFIG_ARCH_SHARK
 static void __exit cyberpro_exit(void)
 {
 	pci_unregister_driver(&cyberpro_driver);
 }
-
-module_init(cyber2000fb_init);
 module_exit(cyberpro_exit);
+#endif
 
 MODULE_AUTHOR("Russell King");
 MODULE_DESCRIPTION("CyberPro 2000, 2010 and 5000 framebuffer driver");

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC] Suspicious bug in module refcounting
  2009-02-04 10:55     ` Rusty Russell
@ 2009-02-04 10:59       ` Russell King
  0 siblings, 0 replies; 13+ messages in thread
From: Russell King @ 2009-02-04 10:59 UTC (permalink / raw)
  To: Rusty Russell, alex
  Cc: Karsten Keil, linux-kernel, Michal Hocko, richard kennedy,
	Dan Williams, Dmitry Torokhov, dwmw2, Scott Wood, netdev,
	Al Viro

Adding Alexander Schulz who can comment on the correctness of your patch
for Shark.

On Wed, Feb 04, 2009 at 09:25:47PM +1030, Rusty Russell wrote:
> On Wednesday 04 February 2009 20:41:30 Russell King wrote:
> > On Wed, Feb 04, 2009 at 02:18:08PM +1030, Rusty Russell wrote:
> > >   gameport.c, serio.c and input.c increment their own refcount, but to get
> > > into those init functions someone must be holding a refcount already (ie. a
> > > module depends on this module).  Ditto cyber2000fb.c, and MTD.
> > 
> > Err, wrong.  cyber2000fb.c does it in its module initialization function
> > to prevent the module (when built for Shark) from being unloaded.  It
> > does this because it's from the days of 2.2 kernels and no one bothered
> > writing the module unload support for Shark.  I'm certainly not in a
> > position to do that.
> 
> Thanks, here's the patch then:
> 
> Subject: cyber2000fb.c: use proper method for stopping unload if CONFIG_ARCH_SHARK
> 
> Russell explains the __module_get():
> > cyber2000fb.c does it in its module initialization function
> > to prevent the module (when built for Shark) from being unloaded.  It
> > does this because it's from the days of 2.2 kernels and no one bothered
> > writing the module unload support for Shark.
> 
> Since 2.4, the correct answer has been to not define an unload fn.
> 
> Cc: Russell King <rmk+lkml@arm.linux.org.uk>
> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
> 
> diff --git a/drivers/video/cyber2000fb.c b/drivers/video/cyber2000fb.c
> --- a/drivers/video/cyber2000fb.c
> +++ b/drivers/video/cyber2000fb.c
> @@ -1736,10 +1736,8 @@ static int __init cyber2000fb_init(void)
>  
>  #ifdef CONFIG_ARCH_SHARK
>  	err = cyberpro_vl_probe();
> -	if (!err) {
> +	if (!err)
>  		ret = 0;
> -		__module_get(THIS_MODULE);
> -	}
>  #endif
>  #ifdef CONFIG_PCI
>  	err = pci_register_driver(&cyberpro_driver);
> @@ -1749,14 +1747,15 @@ static int __init cyber2000fb_init(void)
>  
>  	return ret ? err : 0;
>  }
> +module_init(cyber2000fb_init);
>  
> +#ifndef CONFIG_ARCH_SHARK
>  static void __exit cyberpro_exit(void)
>  {
>  	pci_unregister_driver(&cyberpro_driver);
>  }
> -
> -module_init(cyber2000fb_init);
>  module_exit(cyberpro_exit);
> +#endif
>  
>  MODULE_AUTHOR("Russell King");
>  MODULE_DESCRIPTION("CyberPro 2000, 2010 and 5000 framebuffer driver");

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC] Suspicious bug in module refcounting
  2009-02-04  3:48 ` Rusty Russell
  2009-02-04 10:11   ` Russell King
@ 2009-02-04 16:33   ` Dan Williams
  2009-02-06 22:41   ` Karsten Keil
  2009-02-09 15:18   ` Michal Hocko
  3 siblings, 0 replies; 13+ messages in thread
From: Dan Williams @ 2009-02-04 16:33 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Karsten Keil, linux-kernel, Michal Hocko, richard kennedy,
	Dmitry Torokhov, Russell King, dwmw2, Scott Wood, netdev,
	Al Viro

On Tue, Feb 3, 2009 at 8:48 PM, Rusty Russell <rusty@rustcorp.com.au> wrote:
> On Wednesday 04 February 2009 00:17:21 Karsten Keil wrote:
>> The refcount is a per CPU atomic variable, module_refcount() simple add
>> in a fully unprotected loop (not disabled irqs, not protected against
>> scheduling) all per cpu values.
>
> Hi Karsten,
>
>   Yes, the BUG_ON() is overly aggressive.  And I really hate __module_get,
> and it looks like most of the callers are completely bogus.  The watchdog
> drivers use it to nail themselves in place in their open routines: this is
> OK, if a bit weird.
>
>   We should only use __module_get() when you *can't handle* failure;
> otherwise you should accept that the admin did rmmod --wait and don't use the
> module any further.
>
>  dmaengine.c seems to be taking liberties like this.  AFAICT it can error
> out, so why not just try_module_get() always?

Currently there is no feedback loop for clients calling
dmaengine_get().  It simply means "I want to do offload, pin any
offload resources you may see, and don't let the resource leave until
dmaengine_ref_count == 0".  Even if we always called try_module_get()
we would still need to wait until dmaengine_ref_count reached zero to
be sure no transactions are in flight, effectively ignoring module_get
failures.

However, dma-driver module removal is still in the central control of
the administrator as downing all network interfaces and unloading the
async_tx api (i.e. raid456) will kill all dmaengine references.  We
just have the caveat highlighted below:

modprobe raid456
ifup eth0
rmmod --wait ioat_dma &
ifup eth1
modprobe -r raid456
ifdown eth0  <-- module removal succeeds here in a perfect world
ifdown eth1 <-- module removal currently succeeds here

Regards,
Dan

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC] Suspicious bug in module refcounting
  2009-02-04  3:48 ` Rusty Russell
  2009-02-04 10:11   ` Russell King
  2009-02-04 16:33   ` Dan Williams
@ 2009-02-06 22:41   ` Karsten Keil
  2009-02-09 15:18   ` Michal Hocko
  3 siblings, 0 replies; 13+ messages in thread
From: Karsten Keil @ 2009-02-06 22:41 UTC (permalink / raw)
  To: Rusty Russell
  Cc: linux-kernel, Michal Hocko, richard kennedy, Dan Williams,
	Dmitry Torokhov, Russell King, dwmw2, Scott Wood, netdev,
	Al Viro, gregkh

Hi Rusty,

On Wed, Feb 04, 2009 at 02:18:08PM +1030, Rusty Russell wrote:
> On Wednesday 04 February 2009 00:17:21 Karsten Keil wrote:
> > The refcount is a per CPU atomic variable, module_refcount() simple add
> > in a fully unprotected loop (not disabled irqs, not protected against
> > scheduling) all per cpu values.
> 
> Hi Karsten,
> 
>    Yes, the BUG_ON() is overly aggressive.  And I really hate __module_get,
> and it looks like most of the callers are completely bogus.  The watchdog
> drivers use it to nail themselves in place in their open routines: this is
> OK, if a bit weird.
> 

...
> 
> Meanwhile, I'll remove the BUG_ON for 2.6.29.
> 
> Thanks,
> Rusty.

Seems that this was not picked up yet for 2.6.29, but I think it really should
go in random triggering BUG() is not very nice, maybe it should also added to
the stable trees.

Can you please submit it again ?

> 
> module: remove over-zealous check in __module_get()
> 
> module_refcount() isn't reliable outside stop_machine(), as demonstrated
> by Karsten Keil <kkeil@suse.de>, networking can trigger it under load
> (an inc on one cpu and dec on another while module_refcount() is tallying
>  can give false results, for example).
> 
> Almost noone should be using __module_get, but that's another issue.
> 
> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
> 
> diff --git a/include/linux/module.h b/include/linux/module.h
> --- a/include/linux/module.h
> +++ b/include/linux/module.h
> @@ -407,7 +407,6 @@ static inline void __module_get(struct m
>  static inline void __module_get(struct module *module)
>  {
>  	if (module) {
> -		BUG_ON(module_refcount(module) == 0);
>  		local_inc(__module_ref_addr(module, get_cpu()));
>  		put_cpu();
>  	}
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

-- 
Karsten Keil
SuSE Labs
ISDN and VOIP development
SUSE LINUX Products GmbH, Maxfeldstr.5 90409 Nuernberg, GF: Markus Rex, HRB 16746 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC] Suspicious bug in module refcounting
  2009-02-04  3:48 ` Rusty Russell
                     ` (2 preceding siblings ...)
  2009-02-06 22:41   ` Karsten Keil
@ 2009-02-09 15:18   ` Michal Hocko
  2009-02-10  3:15     ` Rusty Russell
  3 siblings, 1 reply; 13+ messages in thread
From: Michal Hocko @ 2009-02-09 15:18 UTC (permalink / raw)
  To: David S. Miller
  Cc: Karsten Keil, linux-kernel, richard kennedy, Dan Williams,
	Dmitry Torokhov, Russell King, dwmw2, Scott Wood, netdev,
	Al Viro, Rusty Russell

Hi David,

On Wed 04-02-09 14:18:08, Rusty Russell wrote:
> On Wednesday 04 February 2009 00:17:21 Karsten Keil wrote:
> > The refcount is a per CPU atomic variable, module_refcount() simple add
> > in a fully unprotected loop (not disabled irqs, not protected against
> > scheduling) all per cpu values.
> 
> Hi Karsten,
> 
>    Yes, the BUG_ON() is overly aggressive.  And I really hate __module_get,
> and it looks like most of the callers are completely bogus.  The watchdog
> drivers use it to nail themselves in place in their open routines: this is
> OK, if a bit weird.
> 
>    We should only use __module_get() when you *can't handle* failure;
> otherwise you should accept that the admin did rmmod --wait and don't use the
> module any further.
> 
>   dmaengine.c seems to be taking liberties like this.  AFAICT it can error
> out, so why not just try_module_get() always?
> 
>   gameport.c, serio.c and input.c increment their own refcount, but to get
> into those init functions someone must be holding a refcount already (ie. a
> module depends on this module).  Ditto cyber2000fb.c, and MTD.
> 
>   mdio-bitbang.c should definitely use try_module_get.
> 
>   loop.c bumping its own refcount, Al might know why, but definitely can be
> try_module_get() if it's valid at all.
> 
>   net/socket.c can also handle failure, so that's another try_module_get.
> 
> etc.
> 
> > I think we should replace all unprotected __module_get() calls with
> > try_module_get(), or remove __module_get() completely.
> 
> Agreed.  We will need a "nail_module()" call for those legitimate uses (which
> should clear mod->exit, rather than manipulating the refcount at all).
> 
> Meanwhile, I'll remove the BUG_ON for 2.6.29.
> 
> Thanks,
> Rusty.
> 
> module: remove over-zealous check in __module_get()
> 
> module_refcount() isn't reliable outside stop_machine(), as demonstrated
> by Karsten Keil <kkeil@suse.de>, networking can trigger it under load
> (an inc on one cpu and dec on another while module_refcount() is tallying
>  can give false results, for example).
> 
> Almost noone should be using __module_get, but that's another issue.
> 
> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
> 
> diff --git a/include/linux/module.h b/include/linux/module.h
> --- a/include/linux/module.h
> +++ b/include/linux/module.h
> @@ -407,7 +407,6 @@ static inline void __module_get(struct m
>  static inline void __module_get(struct module *module)
>  {
>  	if (module) {
> -		BUG_ON(module_refcount(module) == 0);
>  		local_inc(__module_ref_addr(module, get_cpu()));
>  		put_cpu();
>  	}

Based on this change, would it make sense to update sys_accept to change
__module_get to try_module_get like in the following patch?


>From 368c52b25414d1ccd0851d77fa5f20431487c172 Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.cz>
Date: Mon, 9 Feb 2009 16:06:15 +0100
Subject: [PATCH] [NET] replace __module_get by try_module_get in accept4

After 7f9a50a5b89b87f8e754f59ae9968da28be618a5 we are not checking for
potential BUG in module reference counting. Therefore we should replace
__module_get by try_module_get and BUG if the module is being unloaded.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
---
 net/socket.c |    7 ++++---
 1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/net/socket.c b/net/socket.c
index 35dd737..d0d4c92 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -1444,10 +1444,11 @@ SYSCALL_DEFINE4(accept4, int, fd, struct sockaddr __user *, upeer_sockaddr,
 	newsock->ops = sock->ops;
 
 	/*
-	 * We don't need try_module_get here, as the listening socket (sock)
-	 * has the protocol module (sock->ops->owner) held.
+	 * Socket's owner cannot be in unloading path because there
+	 * must be at least one listening reference
 	 */
-	__module_get(newsock->ops->owner);
+	if (unlikely(!try_module_get(newsock->ops->owner)))
+		BUG();
 
 	newfd = sock_alloc_fd(&newfile, flags & O_CLOEXEC);
 	if (unlikely(newfd < 0)) {
-- 
1.5.6.5


-- 
Michal Hocko
L3 team 
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [RFC] Suspicious bug in module refcounting
  2009-02-09 15:18   ` Michal Hocko
@ 2009-02-10  3:15     ` Rusty Russell
  2009-02-10  3:42       ` Karsten Keil
  2009-02-10 10:31       ` Michal Hocko
  0 siblings, 2 replies; 13+ messages in thread
From: Rusty Russell @ 2009-02-10  3:15 UTC (permalink / raw)
  To: Michal Hocko
  Cc: David S. Miller, Karsten Keil, linux-kernel, richard kennedy,
	Dan Williams, Dmitry Torokhov, Russell King, dwmw2, Scott Wood,
	netdev, Al Viro

On Tuesday 10 February 2009 01:48:31 Michal Hocko wrote:
> Based on this change, would it make sense to update sys_accept to change
> __module_get to try_module_get like in the following patch?

I don't think so:

>  	/*
> -	 * We don't need try_module_get here, as the listening socket (sock)
> -	 * has the protocol module (sock->ops->owner) held.
> +	 * Socket's owner cannot be in unloading path because there
> +	 * must be at least one listening reference
>  	 */
> -	__module_get(newsock->ops->owner);
> +	if (unlikely(!try_module_get(newsock->ops->owner)))
> +		BUG();

rmmod --wait can make try_module_get fail even if the reference count isn't
zero.  But in this case, we should return an error from the accept call;
presumably the admin really doesn't want us to keep using the module...

Thanks,
Rusty.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC] Suspicious bug in module refcounting
  2009-02-10  3:15     ` Rusty Russell
@ 2009-02-10  3:42       ` Karsten Keil
  2009-02-10 10:31       ` Michal Hocko
  1 sibling, 0 replies; 13+ messages in thread
From: Karsten Keil @ 2009-02-10  3:42 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Michal Hocko, David S. Miller, linux-kernel, richard kennedy,
	Dan Williams, Dmitry Torokhov, Russell King, dwmw2, Scott Wood,
	netdev, Al Viro

On Tue, Feb 10, 2009 at 01:45:07PM +1030, Rusty Russell wrote:
> On Tuesday 10 February 2009 01:48:31 Michal Hocko wrote:
> > Based on this change, would it make sense to update sys_accept to change
> > __module_get to try_module_get like in the following patch?
> 
> I don't think so:
> 
> >  	/*
> > -	 * We don't need try_module_get here, as the listening socket (sock)
> > -	 * has the protocol module (sock->ops->owner) held.
> > +	 * Socket's owner cannot be in unloading path because there
> > +	 * must be at least one listening reference
> >  	 */
> > -	__module_get(newsock->ops->owner);
> > +	if (unlikely(!try_module_get(newsock->ops->owner)))
> > +		BUG();
> 
> rmmod --wait can make try_module_get fail even if the reference count isn't
> zero.  But in this case, we should return an error from the accept call;
> presumably the admin really doesn't want us to keep using the module...
> 

I was thinking about this for a while too, but I did not find a valid
error value for this case, the current accept syscall API only allow few
error codes and I think we should not break this.
Maybe -ECONNABORTED  could be used but it doesn't fit 100% and applications
may interpret this in a wrong way.
If the admin decide to remove the module, he should also stop the services
needing this resource. In this case  the effect of __module_get(newsock->ops->owner);
is ok, if the service is still in use, the module will not be removed and
still does the expected work, if all sockets are closed the module refcount
will reach zero and the module will go away.

-- 
Karsten Keil
SuSE Labs

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC] Suspicious bug in module refcounting
  2009-02-10  3:15     ` Rusty Russell
  2009-02-10  3:42       ` Karsten Keil
@ 2009-02-10 10:31       ` Michal Hocko
  2009-02-10 13:36         ` Rusty Russell
  1 sibling, 1 reply; 13+ messages in thread
From: Michal Hocko @ 2009-02-10 10:31 UTC (permalink / raw)
  To: Rusty Russell
  Cc: David S. Miller, Karsten Keil, linux-kernel, richard kennedy,
	Dan Williams, Dmitry Torokhov, Russell King, dwmw2, Scott Wood,
	netdev, Al Viro

On Tue 10-02-09 13:45:07, Rusty Russell wrote:
> On Tuesday 10 February 2009 01:48:31 Michal Hocko wrote:
> > Based on this change, would it make sense to update sys_accept to change
> > __module_get to try_module_get like in the following patch?
> 
> I don't think so:
> 
> >  	/*
> > -	 * We don't need try_module_get here, as the listening socket (sock)
> > -	 * has the protocol module (sock->ops->owner) held.
> > +	 * Socket's owner cannot be in unloading path because there
> > +	 * must be at least one listening reference
> >  	 */
> > -	__module_get(newsock->ops->owner);
> > +	if (unlikely(!try_module_get(newsock->ops->owner)))
> > +		BUG();
> 
> rmmod --wait can make try_module_get fail even if the reference count isn't
> zero.  

OK, I though that rmmod --wait waits for refcount==0 and then changes
the state.

> But in this case, we should return an error from the accept call;
> presumably the admin really doesn't want us to keep using the module...
> 
> Thanks,
> Rusty.

-- 
Michal Hocko
L3 team 
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC] Suspicious bug in module refcounting
  2009-02-10 10:31       ` Michal Hocko
@ 2009-02-10 13:36         ` Rusty Russell
  0 siblings, 0 replies; 13+ messages in thread
From: Rusty Russell @ 2009-02-10 13:36 UTC (permalink / raw)
  To: Michal Hocko
  Cc: David S. Miller, Karsten Keil, linux-kernel, richard kennedy,
	Dan Williams, Dmitry Torokhov, Russell King, dwmw2, Scott Wood,
	netdev, Al Viro

On Tuesday 10 February 2009 21:01:37 Michal Hocko wrote:
> On Tue 10-02-09 13:45:07, Rusty Russell wrote:
> > On Tuesday 10 February 2009 01:48:31 Michal Hocko wrote:
> > > Based on this change, would it make sense to update sys_accept to change
> > > __module_get to try_module_get like in the following patch?
> > 
> > I don't think so:
> > 
> > >  	/*
> > > -	 * We don't need try_module_get here, as the listening socket (sock)
> > > -	 * has the protocol module (sock->ops->owner) held.
> > > +	 * Socket's owner cannot be in unloading path because there
> > > +	 * must be at least one listening reference
> > >  	 */
> > > -	__module_get(newsock->ops->owner);
> > > +	if (unlikely(!try_module_get(newsock->ops->owner)))
> > > +		BUG();
> > 
> > rmmod --wait can make try_module_get fail even if the reference count isn't
> > zero.  
> 
> OK, I though that rmmod --wait waits for refcount==0 and then changes
> the state.

No, it has to stop all future use, otherwise it's useless under load.

Cheers,
Rusty.

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2009-02-10 13:36 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-02-03 13:47 [RFC] Suspicious bug in module refcounting Karsten Keil
2009-02-03 15:02 ` richard kennedy
2009-02-04  3:48 ` Rusty Russell
2009-02-04 10:11   ` Russell King
2009-02-04 10:55     ` Rusty Russell
2009-02-04 10:59       ` Russell King
2009-02-04 16:33   ` Dan Williams
2009-02-06 22:41   ` Karsten Keil
2009-02-09 15:18   ` Michal Hocko
2009-02-10  3:15     ` Rusty Russell
2009-02-10  3:42       ` Karsten Keil
2009-02-10 10:31       ` Michal Hocko
2009-02-10 13:36         ` Rusty Russell

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).