All of lore.kernel.org
 help / color / mirror / Atom feed
* module: fix module_refcount() return when running in a module exit routine
@ 2015-01-18 16:55 James Bottomley
  2015-01-18 23:37   ` Rusty Russell
  2015-01-19  5:18 ` Masami Hiramatsu
  0 siblings, 2 replies; 22+ messages in thread
From: James Bottomley @ 2015-01-18 16:55 UTC (permalink / raw)
  To: rusty, masami.hiramatsu.pt; +Cc: linux-kernel, linux-scsi

From: James Bottomley <JBottomley@Parallels.com>

After e513cc1 module: Remove stop_machine from module unloading,
module_refcount() is returning (unsigned long)-1 when called from within
a routine that runs in module_exit.  This is confusing the scsi device
put code which is coded to detect a module_refcount() of zero for
running within a module exit routine and not try to do another
module_put.  The fix is to restore the original behaviour of
module_refcount() and return zero if we're running inside an exit
routine.

Fixes: e513cc1c07e2ab93a4514eec9833e031df3e30bb
Reported-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>

---

diff --git a/kernel/module.c b/kernel/module.c
index 3965511..c33a113 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -774,7 +774,12 @@ static int try_stop_module(struct module *mod, int flags, int *forced)
 
 unsigned long module_refcount(struct module *mod)
 {
-	return (unsigned long)atomic_read(&mod->refcnt) - MODULE_REF_BASE;
+	unsigned long ret = atomic_read(&mod->refcnt);
+
+	if (ret == 0)
+		/* ref is already zero (probably in module exit) */
+		return 0;
+	return ret - MODULE_REF_BASE;
 }
 EXPORT_SYMBOL(module_refcount);
 



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: module: fix module_refcount() return when running in a module exit routine
  2015-01-18 16:55 module: fix module_refcount() return when running in a module exit routine James Bottomley
@ 2015-01-18 23:37   ` Rusty Russell
  2015-01-19  5:18 ` Masami Hiramatsu
  1 sibling, 0 replies; 22+ messages in thread
From: Rusty Russell @ 2015-01-18 23:37 UTC (permalink / raw)
  To: James Bottomley, masami.hiramatsu.pt; +Cc: linux-kernel, linux-scsi

James Bottomley <James.Bottomley@HansenPartnership.com> writes:
> From: James Bottomley <JBottomley@Parallels.com>
>
> After e513cc1 module: Remove stop_machine from module unloading,
> module_refcount() is returning (unsigned long)-1 when called from within
> a routine that runs in module_exit.  This is confusing the scsi device
> put code which is coded to detect a module_refcount() of zero for
> running within a module exit routine and not try to do another
> module_put.  The fix is to restore the original behaviour of
> module_refcount() and return zero if we're running inside an exit
> routine.

I think this code used to be wrong, if someone used rmmod --wait.
Fortunately nobody ever did that, so we removed the feature in 2013 and
your code was OK again :)

But I think the correct fix is the same: turn try_module_get() into
__module_get():

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index e02885451425..9b3829931f40 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -986,9 +986,9 @@ int scsi_device_get(struct scsi_device *sdev)
 		return -ENXIO;
 	if (!get_device(&sdev->sdev_gendev))
 		return -ENXIO;
-	/* We can fail this if we're doing SCSI operations
+	/* We can fail try_module_get if we're doing SCSI operations
 	 * from module exit (like cache flush) */
-	try_module_get(sdev->host->hostt->module);
+	__module_get(sdev->host->hostt->module);
 
 	return 0;
 }
@@ -1004,14 +1004,7 @@ EXPORT_SYMBOL(scsi_device_get);
  */
 void scsi_device_put(struct scsi_device *sdev)
 {
-#ifdef CONFIG_MODULE_UNLOAD
-	struct module *module = sdev->host->hostt->module;
-
-	/* The module refcount will be zero if scsi_device_get()
-	 * was called from a module removal routine */
-	if (module && module_refcount(module) != 0)
-		module_put(module);
-#endif
+	module_put(sdev->host->hostt->module);
 	put_device(&sdev->sdev_gendev);
 }
 EXPORT_SYMBOL(scsi_device_put);

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: module: fix module_refcount() return when running in a module exit routine
@ 2015-01-18 23:37   ` Rusty Russell
  0 siblings, 0 replies; 22+ messages in thread
From: Rusty Russell @ 2015-01-18 23:37 UTC (permalink / raw)
  To: James Bottomley, masami.hiramatsu.pt; +Cc: linux-kernel, linux-scsi

James Bottomley <James.Bottomley@HansenPartnership.com> writes:
> From: James Bottomley <JBottomley@Parallels.com>
>
> After e513cc1 module: Remove stop_machine from module unloading,
> module_refcount() is returning (unsigned long)-1 when called from within
> a routine that runs in module_exit.  This is confusing the scsi device
> put code which is coded to detect a module_refcount() of zero for
> running within a module exit routine and not try to do another
> module_put.  The fix is to restore the original behaviour of
> module_refcount() and return zero if we're running inside an exit
> routine.

I think this code used to be wrong, if someone used rmmod --wait.
Fortunately nobody ever did that, so we removed the feature in 2013 and
your code was OK again :)

But I think the correct fix is the same: turn try_module_get() into
__module_get():

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index e02885451425..9b3829931f40 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -986,9 +986,9 @@ int scsi_device_get(struct scsi_device *sdev)
 		return -ENXIO;
 	if (!get_device(&sdev->sdev_gendev))
 		return -ENXIO;
-	/* We can fail this if we're doing SCSI operations
+	/* We can fail try_module_get if we're doing SCSI operations
 	 * from module exit (like cache flush) */
-	try_module_get(sdev->host->hostt->module);
+	__module_get(sdev->host->hostt->module);
 
 	return 0;
 }
@@ -1004,14 +1004,7 @@ EXPORT_SYMBOL(scsi_device_get);
  */
 void scsi_device_put(struct scsi_device *sdev)
 {
-#ifdef CONFIG_MODULE_UNLOAD
-	struct module *module = sdev->host->hostt->module;
-
-	/* The module refcount will be zero if scsi_device_get()
-	 * was called from a module removal routine */
-	if (module && module_refcount(module) != 0)
-		module_put(module);
-#endif
+	module_put(sdev->host->hostt->module);
 	put_device(&sdev->sdev_gendev);
 }
 EXPORT_SYMBOL(scsi_device_put);

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: module: fix module_refcount() return when running in a module exit routine
  2015-01-18 16:55 module: fix module_refcount() return when running in a module exit routine James Bottomley
  2015-01-18 23:37   ` Rusty Russell
@ 2015-01-19  5:18 ` Masami Hiramatsu
  2015-01-19  5:51   ` Rusty Russell
  1 sibling, 1 reply; 22+ messages in thread
From: Masami Hiramatsu @ 2015-01-19  5:18 UTC (permalink / raw)
  To: James Bottomley; +Cc: rusty, linux-kernel, linux-scsi

(2015/01/19 1:55), James Bottomley wrote:
> From: James Bottomley <JBottomley@Parallels.com>
> 
> After e513cc1 module: Remove stop_machine from module unloading,
> module_refcount() is returning (unsigned long)-1 when called from within
> a routine that runs in module_exit.  This is confusing the scsi device
> put code which is coded to detect a module_refcount() of zero for
> running within a module exit routine and not try to do another
> module_put.  The fix is to restore the original behaviour of
> module_refcount() and return zero if we're running inside an exit
> routine.
> 
> Fixes: e513cc1c07e2ab93a4514eec9833e031df3e30bb
> Reported-by: Bart Van Assche <bvanassche@acm.org>
> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
> 

Yes, this should be fixed as you said, since it must return "unsigned long" value.

Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>

Just one comment below;

> diff --git a/kernel/module.c b/kernel/module.c
> index 3965511..c33a113 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -774,7 +774,12 @@ static int try_stop_module(struct module *mod, int flags, int *forced)
>  
>  unsigned long module_refcount(struct module *mod)
>  {
> -	return (unsigned long)atomic_read(&mod->refcnt) - MODULE_REF_BASE;
> +	unsigned long ret = atomic_read(&mod->refcnt);
> +
> +	if (ret == 0)

This would better be "if (unlikely(ret == 0))".

Thank you,

> +		/* ref is already zero (probably in module exit) */
> +		return 0;
> +	return ret - MODULE_REF_BASE;
>  }
>  EXPORT_SYMBOL(module_refcount);
>  
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 


-- 
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Research Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: module: fix module_refcount() return when running in a module exit routine
  2015-01-19  5:18 ` Masami Hiramatsu
@ 2015-01-19  5:51   ` Rusty Russell
  2015-01-19  8:28     ` Christoph Hellwig
  2015-01-19 16:08     ` James Bottomley
  0 siblings, 2 replies; 22+ messages in thread
From: Rusty Russell @ 2015-01-19  5:51 UTC (permalink / raw)
  To: Masami Hiramatsu, James Bottomley; +Cc: linux-kernel, linux-scsi

Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> writes:
> (2015/01/19 1:55), James Bottomley wrote:
>> From: James Bottomley <JBottomley@Parallels.com>
>> 
>> After e513cc1 module: Remove stop_machine from module unloading,
>> module_refcount() is returning (unsigned long)-1 when called from within
>> a routine that runs in module_exit.  This is confusing the scsi device
>> put code which is coded to detect a module_refcount() of zero for
>> running within a module exit routine and not try to do another
>> module_put.  The fix is to restore the original behaviour of
>> module_refcount() and return zero if we're running inside an exit
>> routine.
>> 
>> Fixes: e513cc1c07e2ab93a4514eec9833e031df3e30bb
>> Reported-by: Bart Van Assche <bvanassche@acm.org>
>> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
>> 
>
> Yes, this should be fixed as you said, since it must return "unsigned long" value.

But there are only three non-module callers:

drivers/scsi/scsi.c:1012:	if (module && module_refcount(module) != 0)
drivers/staging/lustre/lustre/obdclass/lu_object.c:1359:			LINVRNT(module_refcount(key->lct_owner) > 0);
include/linux/module.h:447:unsigned long module_refcount(struct module *mod);
kernel/debug/kdb/kdb_main.c:2026:		kdb_printf("%4ld ", module_refcount(mod));
kernel/module.c:775:unsigned long module_refcount(struct module *mod)
kernel/module.c:779:EXPORT_SYMBOL(module_refcount);
kernel/module.c:859:	seq_printf(m, " %lu ", module_refcount(mod));
kernel/module.c:911:	return sprintf(buffer, "%lu\n", module_refcount(mk->mod));

The first one I think should be eliminated, and the second one is simply
an assertion before calling module_put() (which should probably be
eliminated).  The others are just printing information.

Cheers,
Rusty.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: module: fix module_refcount() return when running in a module exit routine
  2015-01-19  5:51   ` Rusty Russell
@ 2015-01-19  8:28     ` Christoph Hellwig
  2015-01-19 16:07       ` James Bottomley
  2015-01-19 16:08     ` James Bottomley
  1 sibling, 1 reply; 22+ messages in thread
From: Christoph Hellwig @ 2015-01-19  8:28 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Masami Hiramatsu, James Bottomley, linux-kernel, linux-scsi

On Mon, Jan 19, 2015 at 04:21:15PM +1030, Rusty Russell wrote:
> The first one I think should be eliminated, and the second one is simply
> an assertion before calling module_put() (which should probably be
> eliminated).  The others are just printing information.

FYI, I've got a pathcset to eliminate the use of module_refcount in
SCSI, which was a horrible hack to start with, but it needs a little more
clarification / work, so I'd prefer to do it for 3.20.  Bart has a fix
that eliminates it for 3.19, which piles aother bandaid over the bandaid
that introduced the use module_refcount, and James doesn't seem to like
it.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: module: fix module_refcount() return when running in a module exit routine
  2015-01-19  8:28     ` Christoph Hellwig
@ 2015-01-19 16:07       ` James Bottomley
  0 siblings, 0 replies; 22+ messages in thread
From: James Bottomley @ 2015-01-19 16:07 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Rusty Russell, Masami Hiramatsu, linux-kernel, linux-scsi

On Mon, 2015-01-19 at 00:28 -0800, Christoph Hellwig wrote:
> On Mon, Jan 19, 2015 at 04:21:15PM +1030, Rusty Russell wrote:
> > The first one I think should be eliminated, and the second one is simply
> > an assertion before calling module_put() (which should probably be
> > eliminated).  The others are just printing information.
> 
> FYI, I've got a pathcset to eliminate the use of module_refcount in
> SCSI, which was a horrible hack to start with, but it needs a little more
> clarification / work, so I'd prefer to do it for 3.20.  Bart has a fix
> that eliminates it for 3.19, which piles aother bandaid over the bandaid
> that introduced the use module_refcount, and James doesn't seem to like
> it.

I don't like adding another refcount where one already exists because
it's pointless duplication.  Rusty's fix is fine because it captures the
intention of the use in scsi_device_get, so let's go with it.  You can
add an Acked-by from me to his patch.

James




^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: module: fix module_refcount() return when running in a module exit routine
  2015-01-19  5:51   ` Rusty Russell
  2015-01-19  8:28     ` Christoph Hellwig
@ 2015-01-19 16:08     ` James Bottomley
  2015-01-20  0:45       ` Rusty Russell
  1 sibling, 1 reply; 22+ messages in thread
From: James Bottomley @ 2015-01-19 16:08 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Masami Hiramatsu, linux-kernel, linux-scsi

On Mon, 2015-01-19 at 16:21 +1030, Rusty Russell wrote:
> Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> writes:
> > (2015/01/19 1:55), James Bottomley wrote:
> >> From: James Bottomley <JBottomley@Parallels.com>
> >> 
> >> After e513cc1 module: Remove stop_machine from module unloading,
> >> module_refcount() is returning (unsigned long)-1 when called from within
> >> a routine that runs in module_exit.  This is confusing the scsi device
> >> put code which is coded to detect a module_refcount() of zero for
> >> running within a module exit routine and not try to do another
> >> module_put.  The fix is to restore the original behaviour of
> >> module_refcount() and return zero if we're running inside an exit
> >> routine.
> >> 
> >> Fixes: e513cc1c07e2ab93a4514eec9833e031df3e30bb
> >> Reported-by: Bart Van Assche <bvanassche@acm.org>
> >> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
> >> 
> >
> > Yes, this should be fixed as you said, since it must return "unsigned long" value.
> 
> But there are only three non-module callers:
> 
> drivers/scsi/scsi.c:1012:	if (module && module_refcount(module) != 0)
> drivers/staging/lustre/lustre/obdclass/lu_object.c:1359:			LINVRNT(module_refcount(key->lct_owner) > 0);
> include/linux/module.h:447:unsigned long module_refcount(struct module *mod);
> kernel/debug/kdb/kdb_main.c:2026:		kdb_printf("%4ld ", module_refcount(mod));
> kernel/module.c:775:unsigned long module_refcount(struct module *mod)
> kernel/module.c:779:EXPORT_SYMBOL(module_refcount);
> kernel/module.c:859:	seq_printf(m, " %lu ", module_refcount(mod));
> kernel/module.c:911:	return sprintf(buffer, "%lu\n", module_refcount(mk->mod));
> 
> The first one I think should be eliminated, and the second one is simply
> an assertion before calling module_put() (which should probably be
> eliminated).  The others are just printing information.

If you really want to insist on module_reference() returning -1 when the
module is in it's exit phase, OK, but in that case, I think it should
return a signed value, not an unsigned one.

James



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: module: fix module_refcount() return when running in a module exit routine
  2015-01-19 16:08     ` James Bottomley
@ 2015-01-20  0:45       ` Rusty Russell
  2015-01-20  2:17         ` Masami Hiramatsu
  2015-01-20 17:23         ` James Bottomley
  0 siblings, 2 replies; 22+ messages in thread
From: Rusty Russell @ 2015-01-20  0:45 UTC (permalink / raw)
  To: James Bottomley; +Cc: Masami Hiramatsu, linux-kernel, linux-scsi

James Bottomley <James.Bottomley@HansenPartnership.com> writes:
> On Mon, 2015-01-19 at 16:21 +1030, Rusty Russell wrote:
>> Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> writes:
>> > (2015/01/19 1:55), James Bottomley wrote:
>> >> From: James Bottomley <JBottomley@Parallels.com>
>> >> 
>> >> After e513cc1 module: Remove stop_machine from module unloading,
>> >> module_refcount() is returning (unsigned long)-1 when called from within
>> >> a routine that runs in module_exit.  This is confusing the scsi device
>> >> put code which is coded to detect a module_refcount() of zero for
>> >> running within a module exit routine and not try to do another
>> >> module_put.  The fix is to restore the original behaviour of
>> >> module_refcount() and return zero if we're running inside an exit
>> >> routine.
>> >> 
>> >> Fixes: e513cc1c07e2ab93a4514eec9833e031df3e30bb
>> >> Reported-by: Bart Van Assche <bvanassche@acm.org>
>> >> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
>> >> 
>> >
>> > Yes, this should be fixed as you said, since it must return "unsigned long" value.
>> 
>> But there are only three non-module callers:
>> 
>> drivers/scsi/scsi.c:1012:	if (module && module_refcount(module) != 0)
>> drivers/staging/lustre/lustre/obdclass/lu_object.c:1359:			LINVRNT(module_refcount(key->lct_owner) > 0);
>> include/linux/module.h:447:unsigned long module_refcount(struct module *mod);
>> kernel/debug/kdb/kdb_main.c:2026:		kdb_printf("%4ld ", module_refcount(mod));
>> kernel/module.c:775:unsigned long module_refcount(struct module *mod)
>> kernel/module.c:779:EXPORT_SYMBOL(module_refcount);
>> kernel/module.c:859:	seq_printf(m, " %lu ", module_refcount(mod));
>> kernel/module.c:911:	return sprintf(buffer, "%lu\n", module_refcount(mk->mod));
>> 
>> The first one I think should be eliminated, and the second one is simply
>> an assertion before calling module_put() (which should probably be
>> eliminated).  The others are just printing information.
>
> If you really want to insist on module_reference() returning -1 when the
> module is in it's exit phase, OK, but in that case, I think it should
> return a signed value, not an unsigned one.

Sure; I just didn't want to paper over the problem here.  And I'm not
sure we want to lose information, eg. in kgdb we're presumably looking
at it because something went wrong...

Thanks,
Rusty.

Subject: module: make module_refcount() a signed integer.

James Bottomley points out that it will be -1 during unload.  It's
only used for diagnostics, so let's not hide that as it could be a
clue as to what's gone wrong.

Cc: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

diff --git a/include/linux/module.h b/include/linux/module.h
index ebfb0e153c6a..b653d7c0a05a 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -444,7 +444,7 @@ extern void __module_put_and_exit(struct module *mod, long code)
 #define module_put_and_exit(code) __module_put_and_exit(THIS_MODULE, code)
 
 #ifdef CONFIG_MODULE_UNLOAD
-unsigned long module_refcount(struct module *mod);
+int module_refcount(struct module *mod);
 void __symbol_put(const char *symbol);
 #define symbol_put(x) __symbol_put(VMLINUX_SYMBOL_STR(x))
 void symbol_put_addr(void *addr);
diff --git a/kernel/debug/kdb/kdb_main.c b/kernel/debug/kdb/kdb_main.c
index f191bddf64b8..7b40c5f07dce 100644
--- a/kernel/debug/kdb/kdb_main.c
+++ b/kernel/debug/kdb/kdb_main.c
@@ -2023,7 +2023,7 @@ static int kdb_lsmod(int argc, const char **argv)
 		kdb_printf("%-20s%8u  0x%p ", mod->name,
 			   mod->core_size, (void *)mod);
 #ifdef CONFIG_MODULE_UNLOAD
-		kdb_printf("%4ld ", module_refcount(mod));
+		kdb_printf("%4d ", module_refcount(mod));
 #endif
 		if (mod->state == MODULE_STATE_GOING)
 			kdb_printf(" (Unloading)");
diff --git a/kernel/module.c b/kernel/module.c
index 3965511ae133..2387c98347c1 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -772,9 +772,9 @@ static int try_stop_module(struct module *mod, int flags, int *forced)
 	return 0;
 }
 
-unsigned long module_refcount(struct module *mod)
+int module_refcount(struct module *mod)
 {
-	return (unsigned long)atomic_read(&mod->refcnt) - MODULE_REF_BASE;
+	return atomic_read(&mod->refcnt) - MODULE_REF_BASE;
 }
 EXPORT_SYMBOL(module_refcount);
 
@@ -856,7 +856,7 @@ static inline void print_unload_info(struct seq_file *m, struct module *mod)
 	struct module_use *use;
 	int printed_something = 0;
 
-	seq_printf(m, " %lu ", module_refcount(mod));
+	seq_printf(m, " %i ", module_refcount(mod));
 
 	/*
 	 * Always include a trailing , so userspace can differentiate
@@ -908,7 +908,7 @@ EXPORT_SYMBOL_GPL(symbol_put_addr);
 static ssize_t show_refcnt(struct module_attribute *mattr,
 			   struct module_kobject *mk, char *buffer)
 {
-	return sprintf(buffer, "%lu\n", module_refcount(mk->mod));
+	return sprintf(buffer, "%i\n", module_refcount(mk->mod));
 }
 
 static struct module_attribute modinfo_refcnt =

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: module: fix module_refcount() return when running in a module exit routine
  2015-01-20  0:45       ` Rusty Russell
@ 2015-01-20  2:17         ` Masami Hiramatsu
  2015-01-20 17:23         ` James Bottomley
  1 sibling, 0 replies; 22+ messages in thread
From: Masami Hiramatsu @ 2015-01-20  2:17 UTC (permalink / raw)
  To: Rusty Russell; +Cc: James Bottomley, linux-kernel, linux-scsi

(2015/01/20 9:45), Rusty Russell wrote:
> James Bottomley <James.Bottomley@HansenPartnership.com> writes:
>> On Mon, 2015-01-19 at 16:21 +1030, Rusty Russell wrote:
>>> Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> writes:
>>>> (2015/01/19 1:55), James Bottomley wrote:
>>>>> From: James Bottomley <JBottomley@Parallels.com>
>>>>>
>>>>> After e513cc1 module: Remove stop_machine from module unloading,
>>>>> module_refcount() is returning (unsigned long)-1 when called from within
>>>>> a routine that runs in module_exit.  This is confusing the scsi device
>>>>> put code which is coded to detect a module_refcount() of zero for
>>>>> running within a module exit routine and not try to do another
>>>>> module_put.  The fix is to restore the original behaviour of
>>>>> module_refcount() and return zero if we're running inside an exit
>>>>> routine.
>>>>>
>>>>> Fixes: e513cc1c07e2ab93a4514eec9833e031df3e30bb
>>>>> Reported-by: Bart Van Assche <bvanassche@acm.org>
>>>>> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
>>>>>
>>>>
>>>> Yes, this should be fixed as you said, since it must return "unsigned long" value.
>>>
>>> But there are only three non-module callers:
>>>
>>> drivers/scsi/scsi.c:1012:	if (module && module_refcount(module) != 0)
>>> drivers/staging/lustre/lustre/obdclass/lu_object.c:1359:			LINVRNT(module_refcount(key->lct_owner) > 0);
>>> include/linux/module.h:447:unsigned long module_refcount(struct module *mod);
>>> kernel/debug/kdb/kdb_main.c:2026:		kdb_printf("%4ld ", module_refcount(mod));
>>> kernel/module.c:775:unsigned long module_refcount(struct module *mod)
>>> kernel/module.c:779:EXPORT_SYMBOL(module_refcount);
>>> kernel/module.c:859:	seq_printf(m, " %lu ", module_refcount(mod));
>>> kernel/module.c:911:	return sprintf(buffer, "%lu\n", module_refcount(mk->mod));
>>>
>>> The first one I think should be eliminated, and the second one is simply
>>> an assertion before calling module_put() (which should probably be
>>> eliminated).  The others are just printing information.
>>
>> If you really want to insist on module_reference() returning -1 when the
>> module is in it's exit phase, OK, but in that case, I think it should
>> return a signed value, not an unsigned one.
> 
> Sure; I just didn't want to paper over the problem here.  And I'm not
> sure we want to lose information, eg. in kgdb we're presumably looking
> at it because something went wrong...
> 
> Thanks,
> Rusty.
> 
> Subject: module: make module_refcount() a signed integer.
> 
> James Bottomley points out that it will be -1 during unload.  It's
> only used for diagnostics, so let's not hide that as it could be a
> clue as to what's gone wrong.

Thanks!
This looks good to me too :)

Reviewed-by: Masami Hiramatsu <maasami.hiramatsu.pt@hitachi.com>

> 
> Cc: Jason Wessel <jason.wessel@windriver.com>
> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
> 
> diff --git a/include/linux/module.h b/include/linux/module.h
> index ebfb0e153c6a..b653d7c0a05a 100644
> --- a/include/linux/module.h
> +++ b/include/linux/module.h
> @@ -444,7 +444,7 @@ extern void __module_put_and_exit(struct module *mod, long code)
>  #define module_put_and_exit(code) __module_put_and_exit(THIS_MODULE, code)
>  
>  #ifdef CONFIG_MODULE_UNLOAD
> -unsigned long module_refcount(struct module *mod);
> +int module_refcount(struct module *mod);
>  void __symbol_put(const char *symbol);
>  #define symbol_put(x) __symbol_put(VMLINUX_SYMBOL_STR(x))
>  void symbol_put_addr(void *addr);
> diff --git a/kernel/debug/kdb/kdb_main.c b/kernel/debug/kdb/kdb_main.c
> index f191bddf64b8..7b40c5f07dce 100644
> --- a/kernel/debug/kdb/kdb_main.c
> +++ b/kernel/debug/kdb/kdb_main.c
> @@ -2023,7 +2023,7 @@ static int kdb_lsmod(int argc, const char **argv)
>  		kdb_printf("%-20s%8u  0x%p ", mod->name,
>  			   mod->core_size, (void *)mod);
>  #ifdef CONFIG_MODULE_UNLOAD
> -		kdb_printf("%4ld ", module_refcount(mod));
> +		kdb_printf("%4d ", module_refcount(mod));
>  #endif
>  		if (mod->state == MODULE_STATE_GOING)
>  			kdb_printf(" (Unloading)");
> diff --git a/kernel/module.c b/kernel/module.c
> index 3965511ae133..2387c98347c1 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -772,9 +772,9 @@ static int try_stop_module(struct module *mod, int flags, int *forced)
>  	return 0;
>  }
>  
> -unsigned long module_refcount(struct module *mod)
> +int module_refcount(struct module *mod)
>  {
> -	return (unsigned long)atomic_read(&mod->refcnt) - MODULE_REF_BASE;
> +	return atomic_read(&mod->refcnt) - MODULE_REF_BASE;
>  }
>  EXPORT_SYMBOL(module_refcount);
>  
> @@ -856,7 +856,7 @@ static inline void print_unload_info(struct seq_file *m, struct module *mod)
>  	struct module_use *use;
>  	int printed_something = 0;
>  
> -	seq_printf(m, " %lu ", module_refcount(mod));
> +	seq_printf(m, " %i ", module_refcount(mod));
>  
>  	/*
>  	 * Always include a trailing , so userspace can differentiate
> @@ -908,7 +908,7 @@ EXPORT_SYMBOL_GPL(symbol_put_addr);
>  static ssize_t show_refcnt(struct module_attribute *mattr,
>  			   struct module_kobject *mk, char *buffer)
>  {
> -	return sprintf(buffer, "%lu\n", module_refcount(mk->mod));
> +	return sprintf(buffer, "%i\n", module_refcount(mk->mod));
>  }
>  
>  static struct module_attribute modinfo_refcnt =
> 


-- 
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Research Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu.pt@hitachi.com



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: module: fix module_refcount() return when running in a module exit routine
  2015-01-20  0:45       ` Rusty Russell
  2015-01-20  2:17         ` Masami Hiramatsu
@ 2015-01-20 17:23         ` James Bottomley
  2015-01-21  5:30           ` Rusty Russell
  2015-01-22 16:50           ` Christoph Hellwig
  1 sibling, 2 replies; 22+ messages in thread
From: James Bottomley @ 2015-01-20 17:23 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Masami Hiramatsu, linux-kernel, linux-scsi

On Tue, 2015-01-20 at 11:15 +1030, Rusty Russell wrote:
> James Bottomley <James.Bottomley@HansenPartnership.com> writes:
> > On Mon, 2015-01-19 at 16:21 +1030, Rusty Russell wrote:
> >> Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> writes:
> >> > (2015/01/19 1:55), James Bottomley wrote:
> >> >> From: James Bottomley <JBottomley@Parallels.com>
> >> >> 
> >> >> After e513cc1 module: Remove stop_machine from module unloading,
> >> >> module_refcount() is returning (unsigned long)-1 when called from within
> >> >> a routine that runs in module_exit.  This is confusing the scsi device
> >> >> put code which is coded to detect a module_refcount() of zero for
> >> >> running within a module exit routine and not try to do another
> >> >> module_put.  The fix is to restore the original behaviour of
> >> >> module_refcount() and return zero if we're running inside an exit
> >> >> routine.
> >> >> 
> >> >> Fixes: e513cc1c07e2ab93a4514eec9833e031df3e30bb
> >> >> Reported-by: Bart Van Assche <bvanassche@acm.org>
> >> >> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
> >> >> 
> >> >
> >> > Yes, this should be fixed as you said, since it must return "unsigned long" value.
> >> 
> >> But there are only three non-module callers:
> >> 
> >> drivers/scsi/scsi.c:1012:	if (module && module_refcount(module) != 0)
> >> drivers/staging/lustre/lustre/obdclass/lu_object.c:1359:			LINVRNT(module_refcount(key->lct_owner) > 0);
> >> include/linux/module.h:447:unsigned long module_refcount(struct module *mod);
> >> kernel/debug/kdb/kdb_main.c:2026:		kdb_printf("%4ld ", module_refcount(mod));
> >> kernel/module.c:775:unsigned long module_refcount(struct module *mod)
> >> kernel/module.c:779:EXPORT_SYMBOL(module_refcount);
> >> kernel/module.c:859:	seq_printf(m, " %lu ", module_refcount(mod));
> >> kernel/module.c:911:	return sprintf(buffer, "%lu\n", module_refcount(mk->mod));
> >> 
> >> The first one I think should be eliminated, and the second one is simply
> >> an assertion before calling module_put() (which should probably be
> >> eliminated).  The others are just printing information.
> >
> > If you really want to insist on module_reference() returning -1 when the
> > module is in it's exit phase, OK, but in that case, I think it should
> > return a signed value, not an unsigned one.
> 
> Sure; I just didn't want to paper over the problem here.  And I'm not
> sure we want to lose information, eg. in kgdb we're presumably looking
> at it because something went wrong...
> 
> Thanks,
> Rusty.
> 
> Subject: module: make module_refcount() a signed integer.
> 
> James Bottomley points out that it will be -1 during unload.  It's
> only used for diagnostics, so let's not hide that as it could be a
> clue as to what's gone wrong.
> 
> Cc: Jason Wessel <jason.wessel@windriver.com>
> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
> 
> diff --git a/include/linux/module.h b/include/linux/module.h
> index ebfb0e153c6a..b653d7c0a05a 100644
> --- a/include/linux/module.h
> +++ b/include/linux/module.h
> @@ -444,7 +444,7 @@ extern void __module_put_and_exit(struct module *mod, long code)
>  #define module_put_and_exit(code) __module_put_and_exit(THIS_MODULE, code)
>  
>  #ifdef CONFIG_MODULE_UNLOAD
> -unsigned long module_refcount(struct module *mod);
> +int module_refcount(struct module *mod);
>  void __symbol_put(const char *symbol);
>  #define symbol_put(x) __symbol_put(VMLINUX_SYMBOL_STR(x))
>  void symbol_put_addr(void *addr);
> diff --git a/kernel/debug/kdb/kdb_main.c b/kernel/debug/kdb/kdb_main.c
> index f191bddf64b8..7b40c5f07dce 100644
> --- a/kernel/debug/kdb/kdb_main.c
> +++ b/kernel/debug/kdb/kdb_main.c
> @@ -2023,7 +2023,7 @@ static int kdb_lsmod(int argc, const char **argv)
>  		kdb_printf("%-20s%8u  0x%p ", mod->name,
>  			   mod->core_size, (void *)mod);
>  #ifdef CONFIG_MODULE_UNLOAD
> -		kdb_printf("%4ld ", module_refcount(mod));
> +		kdb_printf("%4d ", module_refcount(mod));
>  #endif
>  		if (mod->state == MODULE_STATE_GOING)
>  			kdb_printf(" (Unloading)");
> diff --git a/kernel/module.c b/kernel/module.c
> index 3965511ae133..2387c98347c1 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -772,9 +772,9 @@ static int try_stop_module(struct module *mod, int flags, int *forced)
>  	return 0;
>  }
>  
> -unsigned long module_refcount(struct module *mod)
> +int module_refcount(struct module *mod)
>  {
> -	return (unsigned long)atomic_read(&mod->refcnt) - MODULE_REF_BASE;
> +	return atomic_read(&mod->refcnt) - MODULE_REF_BASE;
>  }
>  EXPORT_SYMBOL(module_refcount);
>  
> @@ -856,7 +856,7 @@ static inline void print_unload_info(struct seq_file *m, struct module *mod)
>  	struct module_use *use;
>  	int printed_something = 0;
>  
> -	seq_printf(m, " %lu ", module_refcount(mod));
> +	seq_printf(m, " %i ", module_refcount(mod));
>  
>  	/*
>  	 * Always include a trailing , so userspace can differentiate
> @@ -908,7 +908,7 @@ EXPORT_SYMBOL_GPL(symbol_put_addr);
>  static ssize_t show_refcnt(struct module_attribute *mattr,
>  			   struct module_kobject *mk, char *buffer)
>  {
> -	return sprintf(buffer, "%lu\n", module_refcount(mk->mod));
> +	return sprintf(buffer, "%i\n", module_refcount(mk->mod));
>  }

Actually, I don't think this is enough.  Some Australian once came up
with a guide to APIs, and lectured on it at length, one of which was
that the name should be the obvious use and it is unexpected that a
refcount would go negative.  I think we could raise it from -6 on the
API scale to +3 if we add some documentation like below.

James

---

diff --git a/include/linux/module.h b/include/linux/module.h
index ebfb0e1..b653d7c 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -444,7 +444,7 @@ extern void __module_put_and_exit(struct module *mod, long code)
 #define module_put_and_exit(code) __module_put_and_exit(THIS_MODULE, code)
 
 #ifdef CONFIG_MODULE_UNLOAD
-unsigned long module_refcount(struct module *mod);
+int module_refcount(struct module *mod);
 void __symbol_put(const char *symbol);
 #define symbol_put(x) __symbol_put(VMLINUX_SYMBOL_STR(x))
 void symbol_put_addr(void *addr);
diff --git a/kernel/debug/kdb/kdb_main.c b/kernel/debug/kdb/kdb_main.c
index f191bdd..7b40c5f 100644
--- a/kernel/debug/kdb/kdb_main.c
+++ b/kernel/debug/kdb/kdb_main.c
@@ -2023,7 +2023,7 @@ static int kdb_lsmod(int argc, const char **argv)
 		kdb_printf("%-20s%8u  0x%p ", mod->name,
 			   mod->core_size, (void *)mod);
 #ifdef CONFIG_MODULE_UNLOAD
-		kdb_printf("%4ld ", module_refcount(mod));
+		kdb_printf("%4d ", module_refcount(mod));
 #endif
 		if (mod->state == MODULE_STATE_GOING)
 			kdb_printf(" (Unloading)");
diff --git a/kernel/module.c b/kernel/module.c
index 3965511..63fc779 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -772,9 +772,18 @@ static int try_stop_module(struct module *mod, int flags, int *forced)
 	return 0;
 }
 
-unsigned long module_refcount(struct module *mod)
+/**
+ * module_refcount - return the refcount or -1 if unloading
+ *
+ * @mod:	the module we're checking
+ *
+ * Returns:
+ *	-1 if the module is in the process of unloading
+ *	otherwise the number of references in the kernel to the module
+ */
+int module_refcount(struct module *mod)
 {
-	return (unsigned long)atomic_read(&mod->refcnt) - MODULE_REF_BASE;
+	return atomic_read(&mod->refcnt) - MODULE_REF_BASE;
 }
 EXPORT_SYMBOL(module_refcount);
 
@@ -856,7 +865,7 @@ static inline void print_unload_info(struct seq_file *m, struct module *mod)
 	struct module_use *use;
 	int printed_something = 0;
 
-	seq_printf(m, " %lu ", module_refcount(mod));
+	seq_printf(m, " %i ", module_refcount(mod));
 
 	/*
 	 * Always include a trailing , so userspace can differentiate
@@ -908,7 +917,7 @@ EXPORT_SYMBOL_GPL(symbol_put_addr);
 static ssize_t show_refcnt(struct module_attribute *mattr,
 			   struct module_kobject *mk, char *buffer)
 {
-	return sprintf(buffer, "%lu\n", module_refcount(mk->mod));
+	return sprintf(buffer, "%i\n", module_refcount(mk->mod));
 }
 
 static struct module_attribute modinfo_refcnt =



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: module: fix module_refcount() return when running in a module exit routine
  2015-01-20 17:23         ` James Bottomley
@ 2015-01-21  5:30           ` Rusty Russell
  2015-01-22 16:50           ` Christoph Hellwig
  1 sibling, 0 replies; 22+ messages in thread
From: Rusty Russell @ 2015-01-21  5:30 UTC (permalink / raw)
  To: James Bottomley; +Cc: Masami Hiramatsu, linux-kernel, linux-scsi

James Bottomley <James.Bottomley@HansenPartnership.com> writes:
> Actually, I don't think this is enough.  Some Australian once came up
> with a guide to APIs, and lectured on it at length, one of which was
> that the name should be the obvious use and it is unexpected that a
> refcount would go negative.  I think we could raise it from -6 on the
> API scale to +3 if we add some documentation like below.

Thanks, applied!

Cheers,
Rusty.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: module: fix module_refcount() return when running in a module exit routine
  2015-01-20 17:23         ` James Bottomley
  2015-01-21  5:30           ` Rusty Russell
@ 2015-01-22 16:50           ` Christoph Hellwig
  2015-01-22 17:02             ` James Bottomley
  1 sibling, 1 reply; 22+ messages in thread
From: Christoph Hellwig @ 2015-01-22 16:50 UTC (permalink / raw)
  To: James Bottomley; +Cc: Rusty Russell, Masami Hiramatsu, linux-kernel, linux-scsi

On Tue, Jan 20, 2015 at 09:23:35AM -0800, James Bottomley wrote:
> On Tue, 2015-01-20 at 11:15 +1030, Rusty Russell wrote:
> > James Bottomley <James.Bottomley@HansenPartnership.com> writes:
> > > On Mon, 2015-01-19 at 16:21 +1030, Rusty Russell wrote:
> > >> Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> writes:
> > >> > (2015/01/19 1:55), James Bottomley wrote:
> > >> >> From: James Bottomley <JBottomley@Parallels.com>
> > >> >> 
> > >> >> After e513cc1 module: Remove stop_machine from module unloading,
> > >> >> module_refcount() is returning (unsigned long)-1 when called from within
> > >> >> a routine that runs in module_exit.  This is confusing the scsi device
> > >> >> put code which is coded to detect a module_refcount() of zero for
> > >> >> running within a module exit routine and not try to do another
> > >> >> module_put.  The fix is to restore the original behaviour of
> > >> >> module_refcount() and return zero if we're running inside an exit
> > >> >> routine.
> > >> >> 
> > >> >> Fixes: e513cc1c07e2ab93a4514eec9833e031df3e30bb
> > >> >> Reported-by: Bart Van Assche <bvanassche@acm.org>
> > >> >> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
> > >> >> 
> > >> >
> > >> > Yes, this should be fixed as you said, since it must return "unsigned long" value.
> > >> 
> > >> But there are only three non-module callers:
> > >> 
> > >> drivers/scsi/scsi.c:1012:	if (module && module_refcount(module) != 0)
> > >> drivers/staging/lustre/lustre/obdclass/lu_object.c:1359:			LINVRNT(module_refcount(key->lct_owner) > 0);
> > >> include/linux/module.h:447:unsigned long module_refcount(struct module *mod);
> > >> kernel/debug/kdb/kdb_main.c:2026:		kdb_printf("%4ld ", module_refcount(mod));
> > >> kernel/module.c:775:unsigned long module_refcount(struct module *mod)
> > >> kernel/module.c:779:EXPORT_SYMBOL(module_refcount);
> > >> kernel/module.c:859:	seq_printf(m, " %lu ", module_refcount(mod));
> > >> kernel/module.c:911:	return sprintf(buffer, "%lu\n", module_refcount(mk->mod));
> > >> 
> > >> The first one I think should be eliminated, and the second one is simply
> > >> an assertion before calling module_put() (which should probably be
> > >> eliminated).  The others are just printing information.
> > >
> > > If you really want to insist on module_reference() returning -1 when the
> > > module is in it's exit phase, OK, but in that case, I think it should
> > > return a signed value, not an unsigned one.
> > 
> > Sure; I just didn't want to paper over the problem here.  And I'm not
> > sure we want to lose information, eg. in kgdb we're presumably looking
> > at it because something went wrong...
> > 
> > Thanks,
> > Rusty.
> > 
> > Subject: module: make module_refcount() a signed integer.
> > 
> > James Bottomley points out that it will be -1 during unload.  It's
> > only used for diagnostics, so let's not hide that as it could be a
> > clue as to what's gone wrong.
> > 
> > Cc: Jason Wessel <jason.wessel@windriver.com>
> > Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
> > 
> > diff --git a/include/linux/module.h b/include/linux/module.h
> > index ebfb0e153c6a..b653d7c0a05a 100644
> > --- a/include/linux/module.h
> > +++ b/include/linux/module.h
> > @@ -444,7 +444,7 @@ extern void __module_put_and_exit(struct module *mod, long code)
> >  #define module_put_and_exit(code) __module_put_and_exit(THIS_MODULE, code)
> >  
> >  #ifdef CONFIG_MODULE_UNLOAD
> > -unsigned long module_refcount(struct module *mod);
> > +int module_refcount(struct module *mod);
> >  void __symbol_put(const char *symbol);
> >  #define symbol_put(x) __symbol_put(VMLINUX_SYMBOL_STR(x))
> >  void symbol_put_addr(void *addr);
> > diff --git a/kernel/debug/kdb/kdb_main.c b/kernel/debug/kdb/kdb_main.c
> > index f191bddf64b8..7b40c5f07dce 100644
> > --- a/kernel/debug/kdb/kdb_main.c
> > +++ b/kernel/debug/kdb/kdb_main.c
> > @@ -2023,7 +2023,7 @@ static int kdb_lsmod(int argc, const char **argv)
> >  		kdb_printf("%-20s%8u  0x%p ", mod->name,
> >  			   mod->core_size, (void *)mod);
> >  #ifdef CONFIG_MODULE_UNLOAD
> > -		kdb_printf("%4ld ", module_refcount(mod));
> > +		kdb_printf("%4d ", module_refcount(mod));
> >  #endif
> >  		if (mod->state == MODULE_STATE_GOING)
> >  			kdb_printf(" (Unloading)");
> > diff --git a/kernel/module.c b/kernel/module.c
> > index 3965511ae133..2387c98347c1 100644
> > --- a/kernel/module.c
> > +++ b/kernel/module.c
> > @@ -772,9 +772,9 @@ static int try_stop_module(struct module *mod, int flags, int *forced)
> >  	return 0;
> >  }
> >  
> > -unsigned long module_refcount(struct module *mod)
> > +int module_refcount(struct module *mod)
> >  {
> > -	return (unsigned long)atomic_read(&mod->refcnt) - MODULE_REF_BASE;
> > +	return atomic_read(&mod->refcnt) - MODULE_REF_BASE;
> >  }
> >  EXPORT_SYMBOL(module_refcount);
> >  
> > @@ -856,7 +856,7 @@ static inline void print_unload_info(struct seq_file *m, struct module *mod)
> >  	struct module_use *use;
> >  	int printed_something = 0;
> >  
> > -	seq_printf(m, " %lu ", module_refcount(mod));
> > +	seq_printf(m, " %i ", module_refcount(mod));
> >  
> >  	/*
> >  	 * Always include a trailing , so userspace can differentiate
> > @@ -908,7 +908,7 @@ EXPORT_SYMBOL_GPL(symbol_put_addr);
> >  static ssize_t show_refcnt(struct module_attribute *mattr,
> >  			   struct module_kobject *mk, char *buffer)
> >  {
> > -	return sprintf(buffer, "%lu\n", module_refcount(mk->mod));
> > +	return sprintf(buffer, "%i\n", module_refcount(mk->mod));
> >  }
> 
> Actually, I don't think this is enough.  Some Australian once came up
> with a guide to APIs, and lectured on it at length, one of which was
> that the name should be the obvious use and it is unexpected that a
> refcount would go negative.  I think we could raise it from -6 on the
> API scale to +3 if we add some documentation like below.

We'll also still need to change scsi_device_put to deal with
a negative refcount..

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: module: fix module_refcount() return when running in a module exit routine
  2015-01-22 16:50           ` Christoph Hellwig
@ 2015-01-22 17:02             ` James Bottomley
  2015-01-23  2:54               ` Rusty Russell
  0 siblings, 1 reply; 22+ messages in thread
From: James Bottomley @ 2015-01-22 17:02 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Rusty Russell, Masami Hiramatsu, linux-kernel, linux-scsi

On Thu, 2015-01-22 at 08:50 -0800, Christoph Hellwig wrote:
> On Tue, Jan 20, 2015 at 09:23:35AM -0800, James Bottomley wrote:
> > On Tue, 2015-01-20 at 11:15 +1030, Rusty Russell wrote:
> > > James Bottomley <James.Bottomley@HansenPartnership.com> writes:
> > > > On Mon, 2015-01-19 at 16:21 +1030, Rusty Russell wrote:
> > > >> Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> writes:
> > > >> > (2015/01/19 1:55), James Bottomley wrote:
> > > >> >> From: James Bottomley <JBottomley@Parallels.com>
> > > >> >> 
> > > >> >> After e513cc1 module: Remove stop_machine from module unloading,
> > > >> >> module_refcount() is returning (unsigned long)-1 when called from within
> > > >> >> a routine that runs in module_exit.  This is confusing the scsi device
> > > >> >> put code which is coded to detect a module_refcount() of zero for
> > > >> >> running within a module exit routine and not try to do another
> > > >> >> module_put.  The fix is to restore the original behaviour of
> > > >> >> module_refcount() and return zero if we're running inside an exit
> > > >> >> routine.
> > > >> >> 
> > > >> >> Fixes: e513cc1c07e2ab93a4514eec9833e031df3e30bb
> > > >> >> Reported-by: Bart Van Assche <bvanassche@acm.org>
> > > >> >> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
> > > >> >> 
> > > >> >
> > > >> > Yes, this should be fixed as you said, since it must return "unsigned long" value.
> > > >> 
> > > >> But there are only three non-module callers:
> > > >> 
> > > >> drivers/scsi/scsi.c:1012:	if (module && module_refcount(module) != 0)
> > > >> drivers/staging/lustre/lustre/obdclass/lu_object.c:1359:			LINVRNT(module_refcount(key->lct_owner) > 0);
> > > >> include/linux/module.h:447:unsigned long module_refcount(struct module *mod);
> > > >> kernel/debug/kdb/kdb_main.c:2026:		kdb_printf("%4ld ", module_refcount(mod));
> > > >> kernel/module.c:775:unsigned long module_refcount(struct module *mod)
> > > >> kernel/module.c:779:EXPORT_SYMBOL(module_refcount);
> > > >> kernel/module.c:859:	seq_printf(m, " %lu ", module_refcount(mod));
> > > >> kernel/module.c:911:	return sprintf(buffer, "%lu\n", module_refcount(mk->mod));
> > > >> 
> > > >> The first one I think should be eliminated, and the second one is simply
> > > >> an assertion before calling module_put() (which should probably be
> > > >> eliminated).  The others are just printing information.
> > > >
> > > > If you really want to insist on module_reference() returning -1 when the
> > > > module is in it's exit phase, OK, but in that case, I think it should
> > > > return a signed value, not an unsigned one.
> > > 
> > > Sure; I just didn't want to paper over the problem here.  And I'm not
> > > sure we want to lose information, eg. in kgdb we're presumably looking
> > > at it because something went wrong...
> > > 
> > > Thanks,
> > > Rusty.
> > > 
> > > Subject: module: make module_refcount() a signed integer.
> > > 
> > > James Bottomley points out that it will be -1 during unload.  It's
> > > only used for diagnostics, so let's not hide that as it could be a
> > > clue as to what's gone wrong.
> > > 
> > > Cc: Jason Wessel <jason.wessel@windriver.com>
> > > Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
> > > 
> > > diff --git a/include/linux/module.h b/include/linux/module.h
> > > index ebfb0e153c6a..b653d7c0a05a 100644
> > > --- a/include/linux/module.h
> > > +++ b/include/linux/module.h
> > > @@ -444,7 +444,7 @@ extern void __module_put_and_exit(struct module *mod, long code)
> > >  #define module_put_and_exit(code) __module_put_and_exit(THIS_MODULE, code)
> > >  
> > >  #ifdef CONFIG_MODULE_UNLOAD
> > > -unsigned long module_refcount(struct module *mod);
> > > +int module_refcount(struct module *mod);
> > >  void __symbol_put(const char *symbol);
> > >  #define symbol_put(x) __symbol_put(VMLINUX_SYMBOL_STR(x))
> > >  void symbol_put_addr(void *addr);
> > > diff --git a/kernel/debug/kdb/kdb_main.c b/kernel/debug/kdb/kdb_main.c
> > > index f191bddf64b8..7b40c5f07dce 100644
> > > --- a/kernel/debug/kdb/kdb_main.c
> > > +++ b/kernel/debug/kdb/kdb_main.c
> > > @@ -2023,7 +2023,7 @@ static int kdb_lsmod(int argc, const char **argv)
> > >  		kdb_printf("%-20s%8u  0x%p ", mod->name,
> > >  			   mod->core_size, (void *)mod);
> > >  #ifdef CONFIG_MODULE_UNLOAD
> > > -		kdb_printf("%4ld ", module_refcount(mod));
> > > +		kdb_printf("%4d ", module_refcount(mod));
> > >  #endif
> > >  		if (mod->state == MODULE_STATE_GOING)
> > >  			kdb_printf(" (Unloading)");
> > > diff --git a/kernel/module.c b/kernel/module.c
> > > index 3965511ae133..2387c98347c1 100644
> > > --- a/kernel/module.c
> > > +++ b/kernel/module.c
> > > @@ -772,9 +772,9 @@ static int try_stop_module(struct module *mod, int flags, int *forced)
> > >  	return 0;
> > >  }
> > >  
> > > -unsigned long module_refcount(struct module *mod)
> > > +int module_refcount(struct module *mod)
> > >  {
> > > -	return (unsigned long)atomic_read(&mod->refcnt) - MODULE_REF_BASE;
> > > +	return atomic_read(&mod->refcnt) - MODULE_REF_BASE;
> > >  }
> > >  EXPORT_SYMBOL(module_refcount);
> > >  
> > > @@ -856,7 +856,7 @@ static inline void print_unload_info(struct seq_file *m, struct module *mod)
> > >  	struct module_use *use;
> > >  	int printed_something = 0;
> > >  
> > > -	seq_printf(m, " %lu ", module_refcount(mod));
> > > +	seq_printf(m, " %i ", module_refcount(mod));
> > >  
> > >  	/*
> > >  	 * Always include a trailing , so userspace can differentiate
> > > @@ -908,7 +908,7 @@ EXPORT_SYMBOL_GPL(symbol_put_addr);
> > >  static ssize_t show_refcnt(struct module_attribute *mattr,
> > >  			   struct module_kobject *mk, char *buffer)
> > >  {
> > > -	return sprintf(buffer, "%lu\n", module_refcount(mk->mod));
> > > +	return sprintf(buffer, "%i\n", module_refcount(mk->mod));
> > >  }
> > 
> > Actually, I don't think this is enough.  Some Australian once came up
> > with a guide to APIs, and lectured on it at length, one of which was
> > that the name should be the obvious use and it is unexpected that a
> > refcount would go negative.  I think we could raise it from -6 on the
> > API scale to +3 if we add some documentation like below.
> 
> We'll also still need to change scsi_device_put to deal with
> a negative refcount..

I don't believe so ... we never call module_refcount() now, and the
actual module ref count never goes negative.  That's the point of
Rusty's change.  In the unload routines, the actual module refcount is
zero.  __module_get() will increment this and the final module_put()
will decrement it without triggering a warning.

In the old code, we expected try_module_get() to fail but then used
module_refcount() to detect the failure and not do a put.

James



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: module: fix module_refcount() return when running in a module exit routine
  2015-01-22 17:02             ` James Bottomley
@ 2015-01-23  2:54               ` Rusty Russell
  2015-01-23 13:17                 ` Christoph Hellwig
  0 siblings, 1 reply; 22+ messages in thread
From: Rusty Russell @ 2015-01-23  2:54 UTC (permalink / raw)
  To: James Bottomley, Christoph Hellwig
  Cc: Masami Hiramatsu, linux-kernel, linux-scsi

James Bottomley <James.Bottomley@HansenPartnership.com> writes:
> On Thu, 2015-01-22 at 08:50 -0800, Christoph Hellwig wrote:
>> We'll also still need to change scsi_device_put to deal with
>> a negative refcount..
>
> I don't believe so ... we never call module_refcount() now, and the
> actual module ref count never goes negative.  That's the point of
> Rusty's change.  In the unload routines, the actual module refcount is
> zero.  __module_get() will increment this and the final module_put()
> will decrement it without triggering a warning.
>
> In the old code, we expected try_module_get() to fail but then used
> module_refcount() to detect the failure and not do a put.

Yep, I've put that patch in my fixes queue now.

Thanks,
Rusty.

From: Rusty Russell <rusty@rustcorp.com.au>
Subject: scsi: always increment reference count

James reported:
> After e513cc1 module: Remove stop_machine from module unloading,
> module_refcount() is returning (unsigned long)-1 when called from within
> a routine that runs in module_exit.  This is confusing the scsi device
> put code which is coded to detect a module_refcount() of zero for
> running within a module exit routine and not try to do another
> module_put.  The fix is to restore the original behaviour of
> module_refcount() and return zero if we're running inside an exit
> routine.

The correct fix is to turn try_module_get() into __module_get(), and
always do the module_put().

Acked-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index e02885451425..9b3829931f40 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -986,9 +986,9 @@ int scsi_device_get(struct scsi_device *sdev)
 		return -ENXIO;
 	if (!get_device(&sdev->sdev_gendev))
 		return -ENXIO;
-	/* We can fail this if we're doing SCSI operations
+	/* We can fail try_module_get if we're doing SCSI operations
 	 * from module exit (like cache flush) */
-	try_module_get(sdev->host->hostt->module);
+	__module_get(sdev->host->hostt->module);
 
 	return 0;
 }
@@ -1004,14 +1004,7 @@ EXPORT_SYMBOL(scsi_device_get);
  */
 void scsi_device_put(struct scsi_device *sdev)
 {
-#ifdef CONFIG_MODULE_UNLOAD
-	struct module *module = sdev->host->hostt->module;
-
-	/* The module refcount will be zero if scsi_device_get()
-	 * was called from a module removal routine */
-	if (module && module_refcount(module) != 0)
-		module_put(module);
-#endif
+	module_put(sdev->host->hostt->module);
 	put_device(&sdev->sdev_gendev);
 }
 EXPORT_SYMBOL(scsi_device_put);

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: module: fix module_refcount() return when running in a module exit routine
  2015-01-23  2:54               ` Rusty Russell
@ 2015-01-23 13:17                 ` Christoph Hellwig
  2015-01-23 18:42                   ` James Bottomley
  0 siblings, 1 reply; 22+ messages in thread
From: Christoph Hellwig @ 2015-01-23 13:17 UTC (permalink / raw)
  To: Rusty Russell; +Cc: James Bottomley, Masami Hiramatsu, linux-kernel, linux-scsi

On Fri, Jan 23, 2015 at 01:24:15PM +1030, Rusty Russell wrote:
> The correct fix is to turn try_module_get() into __module_get(), and
> always do the module_put().

Is this really safe?  __module_get sais it needs a non-zero refcount
to start with, but scsi_device_get is the only thing every incrementing
the refcount on the module pointer in the scsi host template, so
exactly that case can happen easily.  There's not assert actually
hardcoding the assumption, so I'm not sure if requirement the comment
really just is nice to have or a strong requirement.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: module: fix module_refcount() return when running in a module exit routine
  2015-01-23 13:17                 ` Christoph Hellwig
@ 2015-01-23 18:42                   ` James Bottomley
  2015-01-23 23:35                     ` Rusty Russell
                                       ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: James Bottomley @ 2015-01-23 18:42 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Rusty Russell, Masami Hiramatsu, linux-kernel, linux-scsi

On Fri, 2015-01-23 at 05:17 -0800, Christoph Hellwig wrote:
> On Fri, Jan 23, 2015 at 01:24:15PM +1030, Rusty Russell wrote:
> > The correct fix is to turn try_module_get() into __module_get(), and
> > always do the module_put().
> 
> Is this really safe?  __module_get sais it needs a non-zero refcount
> to start with, but scsi_device_get is the only thing every incrementing
> the refcount on the module pointer in the scsi host template, so
> exactly that case can happen easily.  There's not assert actually
> hardcoding the assumption, so I'm not sure if requirement the comment
> really just is nice to have or a strong requirement.

The comment was just documenting the old status quo: the
try_module_get() was expected to fail if called within the host module
remove path.  If you look at the flow, we use the refcounts on the
actual scsi device to measure.  If they fail we know we have a problem.
The module stuff is really only slaved to our master refcount on the
device.  It's purpose is to prevent an inopportune rmmod.  Our default
operating state is zero references on everything, so the user can just
do rmmod ... obviously if the device is open or mounted then we hold
both the device and the module.

To that point, Rusty's patch just keeps the status quo in the new
module_refcount() environment, so it's the quick bandaid.

I think the use case you're worrying about is what happens if someone
tries to use a device after module removal begins executing but before
the device has been deleted (say by opening it)?  We'll exit the device
removal routines and then kill the module, because after the module code
gets to ->exit(), nothing re-checks the module refcount, so the host
module will get free'd while we're still using the device.

The fix for this seems to be to differentiate between special uses of
scsi_get_device, which are allowed to get the device in the module exit
routines and ordinary uses which aren't.  Something like this? (the
patch isn't complete, but you get the idea).

James

---

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 08c90a7..31ba254 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -965,6 +965,15 @@ int scsi_report_opcode(struct scsi_device *sdev, unsigned char *buffer,
 }
 EXPORT_SYMBOL(scsi_report_opcode);
 
+static int __scsi_device_get_common(struct scsi_device *sdev)
+{
+	if (sdev->sdev_state == SDEV_DEL)
+		return -ENXIO;
+	if (!get_device(&sdev->sdev_gendev))
+		return -ENXIO;
+	return 0;
+}
+
 /**
  * scsi_device_get  -  get an additional reference to a scsi_device
  * @sdev:	device to get a reference to
@@ -975,19 +984,45 @@ EXPORT_SYMBOL(scsi_report_opcode);
  */
 int scsi_device_get(struct scsi_device *sdev)
 {
-	if (sdev->sdev_state == SDEV_DEL)
-		return -ENXIO;
-	if (!get_device(&sdev->sdev_gendev))
-		return -ENXIO;
-	/* We can fail this if we're doing SCSI operations
-	 * from module exit (like cache flush) */
-	try_module_get(sdev->host->hostt->module);
+	int ret;
 
-	return 0;
+	ret = __scsi_device_get_common(sdev);
+	if (ret)
+		return ret;
+
+	ret = try_module_get(sdev->host->hostt->module);
+
+	if (ret)
+		put_device(&sdev->sdev_gendev);
+
+	return ret;
 }
 EXPORT_SYMBOL(scsi_device_get);
 
 /**
+ * scsi_device_get_in_module_exit() - get an additional reference to a scsi_device
+ * @sdev:	device to get a reference to
+ *
+ * Functions identically to scsi_device_get() except that it unconditionally
+ * gets the module reference.  This allows it to be called from module exit
+ * routines where scsi_device_get() will fail.  This routine is still paired
+ * with scsi_device_put().
+ */
+int scsi_device_get_in_module_exit(struct scsi_device *sdev)
+{
+	int ret;
+
+	ret = __scsi_device_get_common(sdev);
+	if (ret)
+		return ret;
+
+	__module_get(sdev->host->hostt->module);
+
+	return 0;
+}
+EXPORT_SYMBOL(scsi_device_get_in_module_exit);
+
+/**
  * scsi_device_put  -  release a reference to a scsi_device
  * @sdev:	device to release a reference on.
  *
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index ebf35cb6..057604e 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -564,16 +564,22 @@ static int sd_major(int major_idx)
 	}
 }
 
-static struct scsi_disk *__scsi_disk_get(struct gendisk *disk)
+static struct scsi_disk *__scsi_disk_get(struct gendisk *disk, int in_exit)
 {
 	struct scsi_disk *sdkp = NULL;
 
 	if (disk->private_data) {
+		int ret;
+
 		sdkp = scsi_disk(disk);
-		if (scsi_device_get(sdkp->device) == 0)
-			get_device(&sdkp->dev);
+		if (in_exit)
+			ret = scsi_device_get_in_module_exit(sdkp->device);
 		else
+			ret = scsi_device_get(sdkp->device);
+		if (unlikely(ret))
 			sdkp = NULL;
+		else
+			get_device(&sdkp->dev);
 	}
 	return sdkp;
 }
@@ -583,19 +589,19 @@ static struct scsi_disk *scsi_disk_get(struct gendisk *disk)
 	struct scsi_disk *sdkp;
 
 	mutex_lock(&sd_ref_mutex);
-	sdkp = __scsi_disk_get(disk);
+	sdkp = __scsi_disk_get(disk, 0);
 	mutex_unlock(&sd_ref_mutex);
 	return sdkp;
 }
 
-static struct scsi_disk *scsi_disk_get_from_dev(struct device *dev)
+static struct scsi_disk *scsi_disk_get_from_dev(struct device *dev, int in_exit)
 {
 	struct scsi_disk *sdkp;
 
 	mutex_lock(&sd_ref_mutex);
 	sdkp = dev_get_drvdata(dev);
 	if (sdkp)
-		sdkp = __scsi_disk_get(sdkp->disk);
+		sdkp = __scsi_disk_get(sdkp->disk, in_exit);
 	mutex_unlock(&sd_ref_mutex);
 	return sdkp;
 }
@@ -1525,7 +1531,7 @@ static int sd_sync_cache(struct scsi_disk *sdkp)
 
 static void sd_rescan(struct device *dev)
 {
-	struct scsi_disk *sdkp = scsi_disk_get_from_dev(dev);
+	struct scsi_disk *sdkp = scsi_disk_get_from_dev(dev, 0);
 
 	if (sdkp) {
 		revalidate_disk(sdkp->disk);
@@ -3147,7 +3153,7 @@ static int sd_start_stop_device(struct scsi_disk *sdkp, int start)
  */
 static void sd_shutdown(struct device *dev)
 {
-	struct scsi_disk *sdkp = scsi_disk_get_from_dev(dev);
+	struct scsi_disk *sdkp = scsi_disk_get_from_dev(dev, 1);
 
 	if (!sdkp)
 		return;         /* this can happen */
@@ -3171,7 +3177,7 @@ exit:
 
 static int sd_suspend_common(struct device *dev, bool ignore_stop_errors)
 {
-	struct scsi_disk *sdkp = scsi_disk_get_from_dev(dev);
+	struct scsi_disk *sdkp = scsi_disk_get_from_dev(dev, 0);
 	int ret = 0;
 
 	if (!sdkp)
@@ -3213,7 +3219,7 @@ static int sd_suspend_runtime(struct device *dev)
 
 static int sd_resume(struct device *dev)
 {
-	struct scsi_disk *sdkp = scsi_disk_get_from_dev(dev);
+	struct scsi_disk *sdkp = scsi_disk_get_from_dev(dev, 0);
 	int ret = 0;
 
 	if (!sdkp->device->manage_start_stop)
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index 2e0281e..0bad37c 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -327,6 +327,7 @@ extern int scsi_unregister_device_handler(struct scsi_device_handler *scsi_dh);
 void scsi_attach_vpd(struct scsi_device *sdev);
 
 extern int scsi_device_get(struct scsi_device *);
+extern int scsi_device_get_in_module_exit(struct scsi_device *);
 extern void scsi_device_put(struct scsi_device *);
 extern struct scsi_device *scsi_device_lookup(struct Scsi_Host *,
 					      uint, uint, u64);



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: module: fix module_refcount() return when running in a module exit routine
  2015-01-23 18:42                   ` James Bottomley
@ 2015-01-23 23:35                     ` Rusty Russell
  2015-01-26 17:16                     ` Christoph Hellwig
  2015-01-28  9:23                     ` Bart Van Assche
  2 siblings, 0 replies; 22+ messages in thread
From: Rusty Russell @ 2015-01-23 23:35 UTC (permalink / raw)
  To: James Bottomley, Christoph Hellwig
  Cc: Masami Hiramatsu, linux-kernel, linux-scsi

James Bottomley <James.Bottomley@HansenPartnership.com> writes:
> On Fri, 2015-01-23 at 05:17 -0800, Christoph Hellwig wrote:
>> On Fri, Jan 23, 2015 at 01:24:15PM +1030, Rusty Russell wrote:
>> > The correct fix is to turn try_module_get() into __module_get(), and
>> > always do the module_put().
>> 
>> Is this really safe?  __module_get sais it needs a non-zero refcount
>> to start with, but scsi_device_get is the only thing every incrementing
>> the refcount on the module pointer in the scsi host template, so
>> exactly that case can happen easily.  There's not assert actually
>> hardcoding the assumption, so I'm not sure if requirement the comment
>> really just is nice to have or a strong requirement.

Yes, as James says, my patch makes it no worse.

The "someone else grabs a reference" can be fixed in two ways.  James'
alternate path as per below, or by waiting in the exit function.  The
latter is kind of annoying, which is why we do the whole atomic
deregistration for modules...

Cheers,
Rusty.

> The comment was just documenting the old status quo: the
> try_module_get() was expected to fail if called within the host module
> remove path.  If you look at the flow, we use the refcounts on the
> actual scsi device to measure.  If they fail we know we have a problem.
> The module stuff is really only slaved to our master refcount on the
> device.  It's purpose is to prevent an inopportune rmmod.  Our default
> operating state is zero references on everything, so the user can just
> do rmmod ... obviously if the device is open or mounted then we hold
> both the device and the module.
>
> To that point, Rusty's patch just keeps the status quo in the new
> module_refcount() environment, so it's the quick bandaid.
>
> I think the use case you're worrying about is what happens if someone
> tries to use a device after module removal begins executing but before
> the device has been deleted (say by opening it)?  We'll exit the device
> removal routines and then kill the module, because after the module code
> gets to ->exit(), nothing re-checks the module refcount, so the host
> module will get free'd while we're still using the device.
>
> The fix for this seems to be to differentiate between special uses of
> scsi_get_device, which are allowed to get the device in the module exit
> routines and ordinary uses which aren't.  Something like this? (the
> patch isn't complete, but you get the idea).
>
> James
>
> ---
>
> diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
> index 08c90a7..31ba254 100644
> --- a/drivers/scsi/scsi.c
> +++ b/drivers/scsi/scsi.c
> @@ -965,6 +965,15 @@ int scsi_report_opcode(struct scsi_device *sdev, unsigned char *buffer,
>  }
>  EXPORT_SYMBOL(scsi_report_opcode);
>  
> +static int __scsi_device_get_common(struct scsi_device *sdev)
> +{
> +	if (sdev->sdev_state == SDEV_DEL)
> +		return -ENXIO;
> +	if (!get_device(&sdev->sdev_gendev))
> +		return -ENXIO;
> +	return 0;
> +}
> +
>  /**
>   * scsi_device_get  -  get an additional reference to a scsi_device
>   * @sdev:	device to get a reference to
> @@ -975,19 +984,45 @@ EXPORT_SYMBOL(scsi_report_opcode);
>   */
>  int scsi_device_get(struct scsi_device *sdev)
>  {
> -	if (sdev->sdev_state == SDEV_DEL)
> -		return -ENXIO;
> -	if (!get_device(&sdev->sdev_gendev))
> -		return -ENXIO;
> -	/* We can fail this if we're doing SCSI operations
> -	 * from module exit (like cache flush) */
> -	try_module_get(sdev->host->hostt->module);
> +	int ret;
>  
> -	return 0;
> +	ret = __scsi_device_get_common(sdev);
> +	if (ret)
> +		return ret;
> +
> +	ret = try_module_get(sdev->host->hostt->module);
> +
> +	if (ret)
> +		put_device(&sdev->sdev_gendev);
> +
> +	return ret;
>  }
>  EXPORT_SYMBOL(scsi_device_get);
>  
>  /**
> + * scsi_device_get_in_module_exit() - get an additional reference to a scsi_device
> + * @sdev:	device to get a reference to
> + *
> + * Functions identically to scsi_device_get() except that it unconditionally
> + * gets the module reference.  This allows it to be called from module exit
> + * routines where scsi_device_get() will fail.  This routine is still paired
> + * with scsi_device_put().
> + */
> +int scsi_device_get_in_module_exit(struct scsi_device *sdev)
> +{
> +	int ret;
> +
> +	ret = __scsi_device_get_common(sdev);
> +	if (ret)
> +		return ret;
> +
> +	__module_get(sdev->host->hostt->module);
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL(scsi_device_get_in_module_exit);
> +
> +/**
>   * scsi_device_put  -  release a reference to a scsi_device
>   * @sdev:	device to release a reference on.
>   *
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index ebf35cb6..057604e 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -564,16 +564,22 @@ static int sd_major(int major_idx)
>  	}
>  }
>  
> -static struct scsi_disk *__scsi_disk_get(struct gendisk *disk)
> +static struct scsi_disk *__scsi_disk_get(struct gendisk *disk, int in_exit)
>  {
>  	struct scsi_disk *sdkp = NULL;
>  
>  	if (disk->private_data) {
> +		int ret;
> +
>  		sdkp = scsi_disk(disk);
> -		if (scsi_device_get(sdkp->device) == 0)
> -			get_device(&sdkp->dev);
> +		if (in_exit)
> +			ret = scsi_device_get_in_module_exit(sdkp->device);
>  		else
> +			ret = scsi_device_get(sdkp->device);
> +		if (unlikely(ret))
>  			sdkp = NULL;
> +		else
> +			get_device(&sdkp->dev);
>  	}
>  	return sdkp;
>  }
> @@ -583,19 +589,19 @@ static struct scsi_disk *scsi_disk_get(struct gendisk *disk)
>  	struct scsi_disk *sdkp;
>  
>  	mutex_lock(&sd_ref_mutex);
> -	sdkp = __scsi_disk_get(disk);
> +	sdkp = __scsi_disk_get(disk, 0);
>  	mutex_unlock(&sd_ref_mutex);
>  	return sdkp;
>  }
>  
> -static struct scsi_disk *scsi_disk_get_from_dev(struct device *dev)
> +static struct scsi_disk *scsi_disk_get_from_dev(struct device *dev, int in_exit)
>  {
>  	struct scsi_disk *sdkp;
>  
>  	mutex_lock(&sd_ref_mutex);
>  	sdkp = dev_get_drvdata(dev);
>  	if (sdkp)
> -		sdkp = __scsi_disk_get(sdkp->disk);
> +		sdkp = __scsi_disk_get(sdkp->disk, in_exit);
>  	mutex_unlock(&sd_ref_mutex);
>  	return sdkp;
>  }
> @@ -1525,7 +1531,7 @@ static int sd_sync_cache(struct scsi_disk *sdkp)
>  
>  static void sd_rescan(struct device *dev)
>  {
> -	struct scsi_disk *sdkp = scsi_disk_get_from_dev(dev);
> +	struct scsi_disk *sdkp = scsi_disk_get_from_dev(dev, 0);
>  
>  	if (sdkp) {
>  		revalidate_disk(sdkp->disk);
> @@ -3147,7 +3153,7 @@ static int sd_start_stop_device(struct scsi_disk *sdkp, int start)
>   */
>  static void sd_shutdown(struct device *dev)
>  {
> -	struct scsi_disk *sdkp = scsi_disk_get_from_dev(dev);
> +	struct scsi_disk *sdkp = scsi_disk_get_from_dev(dev, 1);
>  
>  	if (!sdkp)
>  		return;         /* this can happen */
> @@ -3171,7 +3177,7 @@ exit:
>  
>  static int sd_suspend_common(struct device *dev, bool ignore_stop_errors)
>  {
> -	struct scsi_disk *sdkp = scsi_disk_get_from_dev(dev);
> +	struct scsi_disk *sdkp = scsi_disk_get_from_dev(dev, 0);
>  	int ret = 0;
>  
>  	if (!sdkp)
> @@ -3213,7 +3219,7 @@ static int sd_suspend_runtime(struct device *dev)
>  
>  static int sd_resume(struct device *dev)
>  {
> -	struct scsi_disk *sdkp = scsi_disk_get_from_dev(dev);
> +	struct scsi_disk *sdkp = scsi_disk_get_from_dev(dev, 0);
>  	int ret = 0;
>  
>  	if (!sdkp->device->manage_start_stop)
> diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
> index 2e0281e..0bad37c 100644
> --- a/include/scsi/scsi_device.h
> +++ b/include/scsi/scsi_device.h
> @@ -327,6 +327,7 @@ extern int scsi_unregister_device_handler(struct scsi_device_handler *scsi_dh);
>  void scsi_attach_vpd(struct scsi_device *sdev);
>  
>  extern int scsi_device_get(struct scsi_device *);
> +extern int scsi_device_get_in_module_exit(struct scsi_device *);
>  extern void scsi_device_put(struct scsi_device *);
>  extern struct scsi_device *scsi_device_lookup(struct Scsi_Host *,
>  					      uint, uint, u64);

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: module: fix module_refcount() return when running in a module exit routine
  2015-01-23 18:42                   ` James Bottomley
  2015-01-23 23:35                     ` Rusty Russell
@ 2015-01-26 17:16                     ` Christoph Hellwig
  2015-01-28  9:23                     ` Bart Van Assche
  2 siblings, 0 replies; 22+ messages in thread
From: Christoph Hellwig @ 2015-01-26 17:16 UTC (permalink / raw)
  To: James Bottomley
  Cc: Christoph Hellwig, Rusty Russell, Masami Hiramatsu, linux-kernel,
	linux-scsi

On Fri, Jan 23, 2015 at 10:42:47AM -0800, James Bottomley wrote:
> To that point, Rusty's patch just keeps the status quo in the new
> module_refcount() environment, so it's the quick bandaid.
> 
> I think the use case you're worrying about is what happens if someone
> tries to use a device after module removal begins executing but before
> the device has been deleted (say by opening it)?  We'll exit the device
> removal routines and then kill the module, because after the module code
> gets to ->exit(), nothing re-checks the module refcount, so the host
> module will get free'd while we're still using the device.
> 
> The fix for this seems to be to differentiate between special uses of
> scsi_get_device, which are allowed to get the device in the module exit
> routines and ordinary uses which aren't.  Something like this? (the
> patch isn't complete, but you get the idea).

Yes, that's exactly what I worry about.  But you're right, the patch
doesn't make anything worse compared to the 3.18 and earlier status quo.

So I think I'm fine with the __module_get patch (which I assume will go
through the module tree as well?), and I'll get back to my series to
unwind our bandaids back to the start for the proper fix.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: module: fix module_refcount() return when running in a module exit routine
  2015-01-23 18:42                   ` James Bottomley
  2015-01-23 23:35                     ` Rusty Russell
  2015-01-26 17:16                     ` Christoph Hellwig
@ 2015-01-28  9:23                     ` Bart Van Assche
  2015-01-28 21:45                       ` James Bottomley
  2 siblings, 1 reply; 22+ messages in thread
From: Bart Van Assche @ 2015-01-28  9:23 UTC (permalink / raw)
  To: James Bottomley, Christoph Hellwig
  Cc: Rusty Russell, Masami Hiramatsu, linux-kernel, linux-scsi

On 01/23/15 19:42, James Bottomley wrote:
> diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
> index 08c90a7..31ba254 100644
> --- a/drivers/scsi/scsi.c
> +++ b/drivers/scsi/scsi.c
> @@ -965,6 +965,15 @@ int scsi_report_opcode(struct scsi_device *sdev, unsigned char *buffer,
>  }
>  EXPORT_SYMBOL(scsi_report_opcode);
>  
> +static int __scsi_device_get_common(struct scsi_device *sdev)
> +{
> +	if (sdev->sdev_state == SDEV_DEL)
> +		return -ENXIO;
> +	if (!get_device(&sdev->sdev_gendev))
> +		return -ENXIO;
> +	return 0;
> +}
> +
>  /**
>   * scsi_device_get  -  get an additional reference to a scsi_device
>   * @sdev:	device to get a reference to
> @@ -975,19 +984,45 @@ EXPORT_SYMBOL(scsi_report_opcode);
>   */
>  int scsi_device_get(struct scsi_device *sdev)
>  {
> -	if (sdev->sdev_state == SDEV_DEL)
> -		return -ENXIO;
> -	if (!get_device(&sdev->sdev_gendev))
> -		return -ENXIO;
> -	/* We can fail this if we're doing SCSI operations
> -	 * from module exit (like cache flush) */
> -	try_module_get(sdev->host->hostt->module);
> +	int ret;
>  
> -	return 0;
> +	ret = __scsi_device_get_common(sdev);
> +	if (ret)
> +		return ret;
> +
> +	ret = try_module_get(sdev->host->hostt->module);
> +
> +	if (ret)
> +		put_device(&sdev->sdev_gendev);
> +
> +	return ret;
>  }
>  EXPORT_SYMBOL(scsi_device_get);
>  
>  /**
> + * scsi_device_get_in_module_exit() - get an additional reference to a scsi_device
> + * @sdev:	device to get a reference to
> + *
> + * Functions identically to scsi_device_get() except that it unconditionally
> + * gets the module reference.  This allows it to be called from module exit
> + * routines where scsi_device_get() will fail.  This routine is still paired
> + * with scsi_device_put().
> + */
> +int scsi_device_get_in_module_exit(struct scsi_device *sdev)
> +{
> +	int ret;
> +
> +	ret = __scsi_device_get_common(sdev);
> +	if (ret)
> +		return ret;
> +
> +	__module_get(sdev->host->hostt->module);
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL(scsi_device_get_in_module_exit);
> +
> +/**
>   * scsi_device_put  -  release a reference to a scsi_device
>   * @sdev:	device to release a reference on.
>   *
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index ebf35cb6..057604e 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -564,16 +564,22 @@ static int sd_major(int major_idx)
>  	}
>  }
>  
> -static struct scsi_disk *__scsi_disk_get(struct gendisk *disk)
> +static struct scsi_disk *__scsi_disk_get(struct gendisk *disk, int in_exit)
>  {
>  	struct scsi_disk *sdkp = NULL;
>  
>  	if (disk->private_data) {
> +		int ret;
> +
>  		sdkp = scsi_disk(disk);
> -		if (scsi_device_get(sdkp->device) == 0)
> -			get_device(&sdkp->dev);
> +		if (in_exit)
> +			ret = scsi_device_get_in_module_exit(sdkp->device);
>  		else
> +			ret = scsi_device_get(sdkp->device);
> +		if (unlikely(ret))
>  			sdkp = NULL;
> +		else
> +			get_device(&sdkp->dev);
>  	}
>  	return sdkp;
>  }
> @@ -583,19 +589,19 @@ static struct scsi_disk *scsi_disk_get(struct gendisk *disk)
>  	struct scsi_disk *sdkp;
>  
>  	mutex_lock(&sd_ref_mutex);
> -	sdkp = __scsi_disk_get(disk);
> +	sdkp = __scsi_disk_get(disk, 0);
>  	mutex_unlock(&sd_ref_mutex);
>  	return sdkp;
>  }
>  
> -static struct scsi_disk *scsi_disk_get_from_dev(struct device *dev)
> +static struct scsi_disk *scsi_disk_get_from_dev(struct device *dev, int in_exit)
>  {
>  	struct scsi_disk *sdkp;
>  
>  	mutex_lock(&sd_ref_mutex);
>  	sdkp = dev_get_drvdata(dev);
>  	if (sdkp)
> -		sdkp = __scsi_disk_get(sdkp->disk);
> +		sdkp = __scsi_disk_get(sdkp->disk, in_exit);
>  	mutex_unlock(&sd_ref_mutex);
>  	return sdkp;
>  }
> @@ -1525,7 +1531,7 @@ static int sd_sync_cache(struct scsi_disk *sdkp)
>  
>  static void sd_rescan(struct device *dev)
>  {
> -	struct scsi_disk *sdkp = scsi_disk_get_from_dev(dev);
> +	struct scsi_disk *sdkp = scsi_disk_get_from_dev(dev, 0);
>  
>  	if (sdkp) {
>  		revalidate_disk(sdkp->disk);
> @@ -3147,7 +3153,7 @@ static int sd_start_stop_device(struct scsi_disk *sdkp, int start)
>   */
>  static void sd_shutdown(struct device *dev)
>  {
> -	struct scsi_disk *sdkp = scsi_disk_get_from_dev(dev);
> +	struct scsi_disk *sdkp = scsi_disk_get_from_dev(dev, 1);
>  
>  	if (!sdkp)
>  		return;         /* this can happen */
> @@ -3171,7 +3177,7 @@ exit:
>  
>  static int sd_suspend_common(struct device *dev, bool ignore_stop_errors)
>  {
> -	struct scsi_disk *sdkp = scsi_disk_get_from_dev(dev);
> +	struct scsi_disk *sdkp = scsi_disk_get_from_dev(dev, 0);
>  	int ret = 0;
>  
>  	if (!sdkp)
> @@ -3213,7 +3219,7 @@ static int sd_suspend_runtime(struct device *dev)
>  
>  static int sd_resume(struct device *dev)
>  {
> -	struct scsi_disk *sdkp = scsi_disk_get_from_dev(dev);
> +	struct scsi_disk *sdkp = scsi_disk_get_from_dev(dev, 0);
>  	int ret = 0;
>  
>  	if (!sdkp->device->manage_start_stop)
> diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
> index 2e0281e..0bad37c 100644
> --- a/include/scsi/scsi_device.h
> +++ b/include/scsi/scsi_device.h
> @@ -327,6 +327,7 @@ extern int scsi_unregister_device_handler(struct scsi_device_handler *scsi_dh);
>  void scsi_attach_vpd(struct scsi_device *sdev);
>  
>  extern int scsi_device_get(struct scsi_device *);
> +extern int scsi_device_get_in_module_exit(struct scsi_device *);
>  extern void scsi_device_put(struct scsi_device *);
>  extern struct scsi_device *scsi_device_lookup(struct Scsi_Host *,
>  					      uint, uint, u64);

Hello James,

Is this the latest version of this patch that is available ? I have
tried to test the above patch. However, I couldn't test the impact of
this patch on the SRP initiator driver since my test system didn't boot
anymore with the above patch applied. That test system boots from an ATA
disk managed by the SCSI subsystem:
$ lsscsi | head -n1
[0:0:0:0]    disk    ATA      KINGSTON SH103S3 BBF0  /dev/sda

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: module: fix module_refcount() return when running in a module exit routine
  2015-01-28  9:23                     ` Bart Van Assche
@ 2015-01-28 21:45                       ` James Bottomley
  2015-01-29 12:16                         ` Bart Van Assche
  0 siblings, 1 reply; 22+ messages in thread
From: James Bottomley @ 2015-01-28 21:45 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Christoph Hellwig, Rusty Russell, Masami Hiramatsu, linux-kernel,
	linux-scsi

On Wed, 2015-01-28 at 10:23 +0100, Bart Van Assche wrote:
> On 01/23/15 19:42, James Bottomley wrote:
> > diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
> > index 08c90a7..31ba254 100644
> > --- a/drivers/scsi/scsi.c
> > +++ b/drivers/scsi/scsi.c
> > @@ -965,6 +965,15 @@ int scsi_report_opcode(struct scsi_device *sdev, unsigned char *buffer,
> >  }
> >  EXPORT_SYMBOL(scsi_report_opcode);
> >  
> > +static int __scsi_device_get_common(struct scsi_device *sdev)
> > +{
> > +	if (sdev->sdev_state == SDEV_DEL)
> > +		return -ENXIO;
> > +	if (!get_device(&sdev->sdev_gendev))
> > +		return -ENXIO;
> > +	return 0;
> > +}
> > +
> >  /**
> >   * scsi_device_get  -  get an additional reference to a scsi_device
> >   * @sdev:	device to get a reference to
> > @@ -975,19 +984,45 @@ EXPORT_SYMBOL(scsi_report_opcode);
> >   */
> >  int scsi_device_get(struct scsi_device *sdev)
> >  {
> > -	if (sdev->sdev_state == SDEV_DEL)
> > -		return -ENXIO;
> > -	if (!get_device(&sdev->sdev_gendev))
> > -		return -ENXIO;
> > -	/* We can fail this if we're doing SCSI operations
> > -	 * from module exit (like cache flush) */
> > -	try_module_get(sdev->host->hostt->module);
> > +	int ret;
> >  
> > -	return 0;
> > +	ret = __scsi_device_get_common(sdev);
> > +	if (ret)
> > +		return ret;
> > +
> > +	ret = try_module_get(sdev->host->hostt->module);
> > +
> > +	if (ret)
> > +		put_device(&sdev->sdev_gendev);
> > +
> > +	return ret;
> >  }
> >  EXPORT_SYMBOL(scsi_device_get);
> >  
> >  /**
> > + * scsi_device_get_in_module_exit() - get an additional reference to a scsi_device
> > + * @sdev:	device to get a reference to
> > + *
> > + * Functions identically to scsi_device_get() except that it unconditionally
> > + * gets the module reference.  This allows it to be called from module exit
> > + * routines where scsi_device_get() will fail.  This routine is still paired
> > + * with scsi_device_put().
> > + */
> > +int scsi_device_get_in_module_exit(struct scsi_device *sdev)
> > +{
> > +	int ret;
> > +
> > +	ret = __scsi_device_get_common(sdev);
> > +	if (ret)
> > +		return ret;
> > +
> > +	__module_get(sdev->host->hostt->module);
> > +
> > +	return 0;
> > +}
> > +EXPORT_SYMBOL(scsi_device_get_in_module_exit);
> > +
> > +/**
> >   * scsi_device_put  -  release a reference to a scsi_device
> >   * @sdev:	device to release a reference on.
> >   *
> > diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> > index ebf35cb6..057604e 100644
> > --- a/drivers/scsi/sd.c
> > +++ b/drivers/scsi/sd.c
> > @@ -564,16 +564,22 @@ static int sd_major(int major_idx)
> >  	}
> >  }
> >  
> > -static struct scsi_disk *__scsi_disk_get(struct gendisk *disk)
> > +static struct scsi_disk *__scsi_disk_get(struct gendisk *disk, int in_exit)
> >  {
> >  	struct scsi_disk *sdkp = NULL;
> >  
> >  	if (disk->private_data) {
> > +		int ret;
> > +
> >  		sdkp = scsi_disk(disk);
> > -		if (scsi_device_get(sdkp->device) == 0)
> > -			get_device(&sdkp->dev);
> > +		if (in_exit)
> > +			ret = scsi_device_get_in_module_exit(sdkp->device);
> >  		else
> > +			ret = scsi_device_get(sdkp->device);
> > +		if (unlikely(ret))
> >  			sdkp = NULL;
> > +		else
> > +			get_device(&sdkp->dev);
> >  	}
> >  	return sdkp;
> >  }
> > @@ -583,19 +589,19 @@ static struct scsi_disk *scsi_disk_get(struct gendisk *disk)
> >  	struct scsi_disk *sdkp;
> >  
> >  	mutex_lock(&sd_ref_mutex);
> > -	sdkp = __scsi_disk_get(disk);
> > +	sdkp = __scsi_disk_get(disk, 0);
> >  	mutex_unlock(&sd_ref_mutex);
> >  	return sdkp;
> >  }
> >  
> > -static struct scsi_disk *scsi_disk_get_from_dev(struct device *dev)
> > +static struct scsi_disk *scsi_disk_get_from_dev(struct device *dev, int in_exit)
> >  {
> >  	struct scsi_disk *sdkp;
> >  
> >  	mutex_lock(&sd_ref_mutex);
> >  	sdkp = dev_get_drvdata(dev);
> >  	if (sdkp)
> > -		sdkp = __scsi_disk_get(sdkp->disk);
> > +		sdkp = __scsi_disk_get(sdkp->disk, in_exit);
> >  	mutex_unlock(&sd_ref_mutex);
> >  	return sdkp;
> >  }
> > @@ -1525,7 +1531,7 @@ static int sd_sync_cache(struct scsi_disk *sdkp)
> >  
> >  static void sd_rescan(struct device *dev)
> >  {
> > -	struct scsi_disk *sdkp = scsi_disk_get_from_dev(dev);
> > +	struct scsi_disk *sdkp = scsi_disk_get_from_dev(dev, 0);
> >  
> >  	if (sdkp) {
> >  		revalidate_disk(sdkp->disk);
> > @@ -3147,7 +3153,7 @@ static int sd_start_stop_device(struct scsi_disk *sdkp, int start)
> >   */
> >  static void sd_shutdown(struct device *dev)
> >  {
> > -	struct scsi_disk *sdkp = scsi_disk_get_from_dev(dev);
> > +	struct scsi_disk *sdkp = scsi_disk_get_from_dev(dev, 1);
> >  
> >  	if (!sdkp)
> >  		return;         /* this can happen */
> > @@ -3171,7 +3177,7 @@ exit:
> >  
> >  static int sd_suspend_common(struct device *dev, bool ignore_stop_errors)
> >  {
> > -	struct scsi_disk *sdkp = scsi_disk_get_from_dev(dev);
> > +	struct scsi_disk *sdkp = scsi_disk_get_from_dev(dev, 0);
> >  	int ret = 0;
> >  
> >  	if (!sdkp)
> > @@ -3213,7 +3219,7 @@ static int sd_suspend_runtime(struct device *dev)
> >  
> >  static int sd_resume(struct device *dev)
> >  {
> > -	struct scsi_disk *sdkp = scsi_disk_get_from_dev(dev);
> > +	struct scsi_disk *sdkp = scsi_disk_get_from_dev(dev, 0);
> >  	int ret = 0;
> >  
> >  	if (!sdkp->device->manage_start_stop)
> > diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
> > index 2e0281e..0bad37c 100644
> > --- a/include/scsi/scsi_device.h
> > +++ b/include/scsi/scsi_device.h
> > @@ -327,6 +327,7 @@ extern int scsi_unregister_device_handler(struct scsi_device_handler *scsi_dh);
> >  void scsi_attach_vpd(struct scsi_device *sdev);
> >  
> >  extern int scsi_device_get(struct scsi_device *);
> > +extern int scsi_device_get_in_module_exit(struct scsi_device *);
> >  extern void scsi_device_put(struct scsi_device *);
> >  extern struct scsi_device *scsi_device_lookup(struct Scsi_Host *,
> >  					      uint, uint, u64);
> 
> Hello James,
> 
> Is this the latest version of this patch that is available ? I have
> tried to test the above patch. However, I couldn't test the impact of
> this patch on the SRP initiator driver since my test system didn't boot
> anymore with the above patch applied. That test system boots from an ATA
> disk managed by the SCSI subsystem:
> $ lsscsi | head -n1
> [0:0:0:0]    disk    ATA      KINGSTON SH103S3 BBF0  /dev/sda

Not yet, since I knew it would need a bit of testing to identify all the
potential in_exit acquisitions.  However, you could help me by
diagnosing the current failure.

Thanks,

James



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: module: fix module_refcount() return when running in a module exit routine
  2015-01-28 21:45                       ` James Bottomley
@ 2015-01-29 12:16                         ` Bart Van Assche
  0 siblings, 0 replies; 22+ messages in thread
From: Bart Van Assche @ 2015-01-29 12:16 UTC (permalink / raw)
  To: James Bottomley
  Cc: Christoph Hellwig, Rusty Russell, Masami Hiramatsu, linux-kernel,
	linux-scsi

On 01/28/15 22:45, James Bottomley wrote:
> On Wed, 2015-01-28 at 10:23 +0100, Bart Van Assche wrote:
>> Is this the latest version of this patch that is available ? I have
>> tried to test the above patch. However, I couldn't test the impact of
>> this patch on the SRP initiator driver since my test system didn't boot
>> anymore with the above patch applied. That test system boots from an ATA
>> disk managed by the SCSI subsystem:
>> $ lsscsi | head -n1
>> [0:0:0:0]    disk    ATA      KINGSTON SH103S3 BBF0  /dev/sda
> 
> Not yet, since I knew it would need a bit of testing to identify all the
> potential in_exit acquisitions.  However, you could help me by
> diagnosing the current failure.

Hello James,

I have done the following:
- Read over your patch but didn't spot anything obvious.
- Added printk() statements in the modified functions in an attempt to
  determine where in the boot process the hang occurs. Apparently none
  of the functions touched by this patch got called before the hang
  occurred.
- Uploaded a screenshot of the boot messages
(https://drive.google.com/file/d/0B1YQOreL3_FxN1pPMnZrSDdLYTQ/view?usp=sharing).
The most remarkable message is "SATA link down". This message does not
appear when booting e.g. kernel version v3.16.
- Left the system alone for a few minutes after the boot process
  stopped making progress. The message "boot drive not found" appeared.

So this is probably an issue in another component than the SCSI
subsystem. What's not clear to me is why every time I tried to boot
without this patch that booting succeeded and every time I tried to boot
with this patch applied that booting did not succeed ... I have not yet
had the time to run a bisect.

Note: the tests I ran today were performed with kernel v3.19-rc6 with
patch https://lkml.org/lkml/2015/1/28/334 applied. However, that last
patch shouldn't have any impact on the boot process since
CONFIG_SCSI_MQ_DEFAULT was not set.

Bart.


^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2015-01-29 12:16 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-01-18 16:55 module: fix module_refcount() return when running in a module exit routine James Bottomley
2015-01-18 23:37 ` Rusty Russell
2015-01-18 23:37   ` Rusty Russell
2015-01-19  5:18 ` Masami Hiramatsu
2015-01-19  5:51   ` Rusty Russell
2015-01-19  8:28     ` Christoph Hellwig
2015-01-19 16:07       ` James Bottomley
2015-01-19 16:08     ` James Bottomley
2015-01-20  0:45       ` Rusty Russell
2015-01-20  2:17         ` Masami Hiramatsu
2015-01-20 17:23         ` James Bottomley
2015-01-21  5:30           ` Rusty Russell
2015-01-22 16:50           ` Christoph Hellwig
2015-01-22 17:02             ` James Bottomley
2015-01-23  2:54               ` Rusty Russell
2015-01-23 13:17                 ` Christoph Hellwig
2015-01-23 18:42                   ` James Bottomley
2015-01-23 23:35                     ` Rusty Russell
2015-01-26 17:16                     ` Christoph Hellwig
2015-01-28  9:23                     ` Bart Van Assche
2015-01-28 21:45                       ` James Bottomley
2015-01-29 12:16                         ` Bart Van Assche

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.