* [PATCH] scsi: fix race condition when removing target
@ 2017-11-29  3:05 Jason Yan
  2017-11-29  7:41 ` Hannes Reinecke
                   ` (3 more replies)
  0 siblings, 4 replies; 29+ messages in thread
From: Jason Yan @ 2017-11-29  3:05 UTC (permalink / raw)
  To: martin.petersen, jejb
  Cc: linux-scsi, Jason Yan, Hannes Reinecke, Christoph Hellwig,
	Johannes Thumshirn, Zhaohongjiang, Miao Xie

In commit fbce4d97fd43 ("scsi: fixup kernel warning during rmmod()"), we
removed scsi_device_get() and directly called get_device() to increase
the refcount of the device. But actually scsi_device_get() will fail in
three cases:
1. the scsi device is in SDEV_DEL or SDEV_CANCEL state
2. get_device() fails
3. the module is not alive

The intended purpose was to remove the module-alive check.
Unfortunately the device state check was dropped too, and this
introduced a race condition like this:

      CPU0                                           CPU1
__scsi_remove_target()
  ->iterate shost->__devices
  ->scsi_remove_device()
  ->put_device()
      someone still holds a refcount
                                                   sd_release()
                                                      ->scsi_disk_put()
                                                      ->put_device() last put and trigger the device release

  ->goto restart
  ->iterate shost->__devices and got the same device
  ->get_device() while refcount is 0
  ->scsi_remove_device()
  ->put_device() refcount decreased to 0 again
  ->scsi_device_dev_release()
  ->scsi_device_dev_release_usercontext()

                                                      ->scsi_device_dev_release()
                                                      ->scsi_device_dev_release_usercontext()

The same scsi device will be found again because it stays in the
shost->__devices list until scsi_device_dev_release_usercontext() is called,
although the device state was set to SDEV_DEL after the first
scsi_remove_device().

Finally we got an oops in scsi_device_dev_release_usercontext() when it was
called the second time.

Call trace:
[<ffff0000086bc624>] scsi_device_dev_release_usercontext+0x7c/0x1c0
[<ffff0000080f1f90>] execute_in_process_context+0x70/0x80
[<ffff0000086bc598>] scsi_device_dev_release+0x28/0x38
[<ffff0000086662cc>] device_release+0x3c/0xa0
[<ffff000008c2e780>] kobject_put+0x80/0xf0
[<ffff0000086666fc>] put_device+0x24/0x30
[<ffff0000086aeee0>] scsi_device_put+0x30/0x40
[<ffff000008704894>] scsi_disk_put+0x44/0x60
[<ffff000008704a50>] sd_release+0x50/0x80
[<ffff0000082bc704>] __blkdev_put+0x21c/0x230
[<ffff0000082bcb2c>] blkdev_put+0x54/0x118
[<ffff0000082bcc1c>] blkdev_close+0x2c/0x40
[<ffff000008279b64>] __fput+0x94/0x1d8
[<ffff000008279d20>] ____fput+0x20/0x30
[<ffff0000080f6f54>] task_work_run+0x9c/0xb8
[<ffff0000080dba64>] do_exit+0x2b4/0x9f8
[<ffff0000080dc234>] do_group_exit+0x3c/0xa0
[<ffff0000080dc2b8>] __wake_up_parent+0x0/0x40

And sometimes __scsi_remove_target() will loop for a long time removing
the same device, if someone else is holding a refcount, until the last
refcount is released.

Notice that if CONFIG_REFCOUNT_FULL is enabled this race won't be triggered
because the full refcount implementation will prevent the refcount from
being increased when it is 0.

Fix this by checking the sdev_state again like we did before in
scsi_device_get(). Then when iterating shost again we will skip the
deleted device because scsi_remove_device() will have set the device state
to SDEV_CANCEL or SDEV_DEL.

Fixes: fbce4d97fd43 ("scsi: fixup kernel warning during rmmod()")
Signed-off-by: Jason Yan <yanaijie@huawei.com>
CC: Hannes Reinecke <hare@suse.de>
CC: Christoph Hellwig <hch@lst.de>
CC: Johannes Thumshirn <jthumshirn@suse.de>
CC: Zhaohongjiang <zhaohongjiang@huawei.com>
CC: Miao Xie <miaoxie@huawei.com>
---
 drivers/scsi/scsi_sysfs.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 50e7d7e..d398894 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -1398,6 +1398,15 @@ void scsi_remove_device(struct scsi_device *sdev)
 }
 EXPORT_SYMBOL(scsi_remove_device);
 
+static int scsi_device_get_not_deleted(struct scsi_device *sdev)
+{
+	if (sdev->sdev_state == SDEV_DEL || sdev->sdev_state == SDEV_CANCEL)
+		return -ENXIO;
+	if (!get_device(&sdev->sdev_gendev))
+		return -ENXIO;
+	return 0;
+}
+
 static void __scsi_remove_target(struct scsi_target *starget)
 {
 	struct Scsi_Host *shost = dev_to_shost(starget->dev.parent);
@@ -1415,7 +1424,7 @@ static void __scsi_remove_target(struct scsi_target *starget)
 		 */
 		if (sdev->channel != starget->channel ||
 		    sdev->id != starget->id ||
-		    !get_device(&sdev->sdev_gendev))
+		    scsi_device_get_not_deleted(sdev))
 			continue;
 		spin_unlock_irqrestore(shost->host_lock, flags);
 		scsi_remove_device(sdev);
-- 
2.9.5
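
The interleaving described in the commit message can be replayed
deterministically in userspace. The following is a simplified,
single-threaded model and not kernel code: the toy_* names, the plain int
refcount and the releases counter are assumptions made for illustration.
Only the ordering of get, remove and put follows the description above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; only the two "going away" states matter here. */
enum toy_state { TOY_RUNNING, TOY_CANCEL, TOY_DEL };

struct toy_sdev {
	int refcount;		/* plain int: the replay is single-threaded */
	enum toy_state state;
	int releases;		/* times the release callback was triggered */
};

/* Unconditional get, like a refcount increment that ignores zero. */
static void toy_get(struct toy_sdev *d)
{
	d->refcount++;
}

/* Get that refuses devices already being removed (the check the patch adds). */
static int toy_get_not_deleted(struct toy_sdev *d)
{
	if (d->state == TOY_DEL || d->state == TOY_CANCEL)
		return -1;
	toy_get(d);
	return 0;
}

static void toy_put(struct toy_sdev *d)
{
	if (--d->refcount == 0)
		d->releases++;	/* release triggered; device is still on the list */
}

/* One pass of the removal loop; returns false if the device was skipped. */
static bool toy_remove_pass(struct toy_sdev *d, bool check_state)
{
	if (check_state) {
		if (toy_get_not_deleted(d))
			return false;
	} else {
		toy_get(d);
	}
	d->state = TOY_DEL;	/* what the first removal leaves behind */
	toy_put(d);
	return true;
}

/* Replays the CPU0/CPU1 ordering; returns how often release was triggered. */
static int toy_replay(bool check_state)
{
	struct toy_sdev d = { .refcount = 1, .state = TOY_RUNNING, .releases = 0 };

	toy_remove_pass(&d, check_state);	/* CPU0: first pass, opener still holds a ref */
	toy_put(&d);				/* CPU1: last put, release pending */
	toy_remove_pass(&d, check_state);	/* CPU0: restart finds the same device */
	return d.releases;
}
```

In this model toy_replay(false) triggers the release twice, which
corresponds to the double call of the release function, while
toy_replay(true) triggers it once because the second pass skips the device.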

* Re: [PATCH] scsi: fix race condition when removing target
  2017-11-29  3:05 [PATCH] scsi: fix race condition when removing target Jason Yan
@ 2017-11-29  7:41 ` Hannes Reinecke
  2017-11-29 16:18 ` Bart Van Assche
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 29+ messages in thread
From: Hannes Reinecke @ 2017-11-29  7:41 UTC (permalink / raw)
  To: Jason Yan, martin.petersen, jejb
  Cc: linux-scsi, Christoph Hellwig, Johannes Thumshirn, Zhaohongjiang,
	Miao Xie

Reviewed-by: Hannes Reinecke <hare@suse.com>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

* Re: [PATCH] scsi: fix race condition when removing target
  2017-11-29  3:05 [PATCH] scsi: fix race condition when removing target Jason Yan
  2017-11-29  7:41 ` Hannes Reinecke
@ 2017-11-29 16:18 ` Bart Van Assche
  2017-11-29 16:20   ` hch
  2017-11-29 17:39   ` gregkh
  2017-11-29 16:31 ` James Bottomley
  2017-11-29 19:05 ` Ewan D. Milne
  3 siblings, 2 replies; 29+ messages in thread
From: Bart Van Assche @ 2017-11-29 16:18 UTC (permalink / raw)
  To: gregkh
  Cc: zhaohongjiang, jthumshirn, hch, martin.petersen, hare,
	linux-scsi, yanaijie, jejb, miaoxie

Hi Greg,

As the above patch description shows it can happen that the SCSI core calls
get_device() after the device reference count has reached zero and before
the memory for struct device is freed. Although the above patch looks fine
to me, would you consider it acceptable to modify get_device() such that it
uses kobject_get_unless_zero() instead of kobject_get()? I'm asking this
because that change would help to reduce the complexity of the already too
complicated SCSI core.

Thanks,

Bart.

* Re: [PATCH] scsi: fix race condition when removing target
  2017-11-29 16:18 ` Bart Van Assche
@ 2017-11-29 16:20   ` hch
  2017-11-29 17:39     ` Bart Van Assche
  2017-11-29 17:39     ` gregkh
  2017-11-29 17:39   ` gregkh
  1 sibling, 2 replies; 29+ messages in thread
From: hch @ 2017-11-29 16:20 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: gregkh, zhaohongjiang, jthumshirn, hch, martin.petersen, hare,
	linux-scsi, yanaijie, jejb, miaoxie

On Wed, Nov 29, 2017 at 04:18:30PM +0000, Bart Van Assche wrote:
> As the above patch description shows it can happen that the SCSI core calls
> get_device() after the device reference count has reached zero and before
> the memory for struct device is freed. Although the above patch looks fine
> to me, would you consider it acceptable to modify get_device() such that it
> uses kobject_get_unless_zero() instead of kobject_get()? I'm asking this
> because that change would help to reduce the complexity of the already too
> complicated SCSI core.

I don't think we can just modify get_device, but we can add a new
get_device_unless_zero.  In fact I have an open coded variant of that
in nvme, and was planning to submit one for the current merge window.

* Re: [PATCH] scsi: fix race condition when removing target
  2017-11-29  3:05 [PATCH] scsi: fix race condition when removing target Jason Yan
  2017-11-29  7:41 ` Hannes Reinecke
  2017-11-29 16:18 ` Bart Van Assche
@ 2017-11-29 16:31 ` James Bottomley
  2017-11-29 16:34   ` Christoph Hellwig
  2017-11-29 19:05 ` Ewan D. Milne
  3 siblings, 1 reply; 29+ messages in thread
From: James Bottomley @ 2017-11-29 16:31 UTC (permalink / raw)
  To: Jason Yan, martin.petersen
  Cc: linux-scsi, Hannes Reinecke, Christoph Hellwig,
	Johannes Thumshirn, Zhaohongjiang, Miao Xie

On Wed, 2017-11-29 at 11:05 +0800, Jason Yan wrote:
> In commit fbce4d97fd43 ("scsi: fixup kernel warning during rmmod()"), we
> removed scsi_device_get() and directly called get_device() to increase
> the refcount of the device. But actually scsi_device_get() will fail in
> three cases:
> 1. the scsi device is in SDEV_DEL or SDEV_CANCEL state
> 2. get_device() fails
> 3. the module is not alive
> 
> The intended purpose was to remove the module-alive check.
> Unfortunately the device state check was dropped too, and this
> introduced a race condition like this:
> 
>       CPU0                                           CPU1
> __scsi_remove_target()
>   ->iterate shost->__devices
>   ->scsi_remove_device()
>   ->put_device()
>       someone still holds a refcount
>                                                    sd_release()
>                                                       ->scsi_disk_put()
>                                                       ->put_device() last put and trigger the device release
> 
>   ->goto restart
>   ->iterate shost->__devices and got the same device
>   ->get_device() while refcount is 0

This analysis fails here: get_device() on something with refcount 0
returns NULL.  That triggers the if clause to ignore this device.

We may have a more complex way of triggering a dual put race as the
trace implies, but I don't think this is it.

[...]
> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
> index 50e7d7e..d398894 100644
> --- a/drivers/scsi/scsi_sysfs.c
> +++ b/drivers/scsi/scsi_sysfs.c
> @@ -1398,6 +1398,15 @@ void scsi_remove_device(struct scsi_device *sdev)
>  }
>  EXPORT_SYMBOL(scsi_remove_device);
>  
> +static int scsi_device_get_not_deleted(struct scsi_device *sdev)
> +{
> +	if (sdev->sdev_state == SDEV_DEL || sdev->sdev_state == SDEV_CANCEL)
> +		return -ENXIO;
> +	if (!get_device(&sdev->sdev_gendev))
> +		return -ENXIO;
> +	return 0;
> +}

This is pretty much scsi_device_get() without the try_module get, so
they should probably be combined.

James

>  static void __scsi_remove_target(struct scsi_target *starget)
>  {
>  	struct Scsi_Host *shost = dev_to_shost(starget->dev.parent);
> @@ -1415,7 +1424,7 @@ static void __scsi_remove_target(struct scsi_target *starget)
>  		 */
>  		if (sdev->channel != starget->channel ||
>  		    sdev->id != starget->id ||
> -		    !get_device(&sdev->sdev_gendev))
> +		    scsi_device_get_not_deleted(sdev))
>  			continue;
>  		spin_unlock_irqrestore(shost->host_lock, flags);
>  		scsi_remove_device(sdev);

* Re: [PATCH] scsi: fix race condition when removing target
  2017-11-29 16:31 ` James Bottomley
@ 2017-11-29 16:34   ` Christoph Hellwig
  2017-11-29 16:47     ` James Bottomley
  0 siblings, 1 reply; 29+ messages in thread
From: Christoph Hellwig @ 2017-11-29 16:34 UTC (permalink / raw)
  To: James Bottomley
  Cc: Jason Yan, martin.petersen, linux-scsi, Hannes Reinecke,
	Christoph Hellwig, Johannes Thumshirn, Zhaohongjiang, Miao Xie

On Wed, Nov 29, 2017 at 08:31:48AM -0800, James Bottomley wrote:
> This analysis fails here: get_device() on something with refcount 0
> returns NULL.  That triggers the if clause to ignore this device.

No, it doesn't.  Take a look at the get_device and kobject_get
implementations.
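
The distinction being argued about here can be shown with a small userspace
model. This is only a sketch using C11 atomics: toy_ref and the toy_*
helpers are hypothetical names, not the kernel's kref or kobject API, and
the assumption that an unconditional get still succeeds at zero follows the
description in this thread (the case without CONFIG_REFCOUNT_FULL).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a reference counter; not the kernel's struct kref. */
struct toy_ref {
	atomic_int count;
};

/* Unconditional get: increments even when the count is already zero. */
static void toy_get(struct toy_ref *r)
{
	atomic_fetch_add(&r->count, 1);
}

/* Conditional get: refuses to take a reference once the count has hit zero. */
static bool toy_get_unless_zero(struct toy_ref *r)
{
	int old = atomic_load(&r->count);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&r->count, &old, old + 1))
			return true;
	}
	return false;
}

/* Returns true when this put dropped the last reference. */
static bool toy_put(struct toy_ref *r)
{
	return atomic_fetch_sub(&r->count, 1) == 1;
}
```

With a counter that has already dropped to zero, toy_get() followed by
toy_put() reports a "last reference dropped" event a second time, whereas
toy_get_unless_zero() simply fails and leaves the counter at zero.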

* Re: [PATCH] scsi: fix race condition when removing target
  2017-11-29 16:34   ` Christoph Hellwig
@ 2017-11-29 16:47     ` James Bottomley
  0 siblings, 0 replies; 29+ messages in thread
From: James Bottomley @ 2017-11-29 16:47 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jason Yan, martin.petersen, linux-scsi, Hannes Reinecke,
	Johannes Thumshirn, Zhaohongjiang, Miao Xie

On Wed, 2017-11-29 at 17:34 +0100, Christoph Hellwig wrote:
> On Wed, Nov 29, 2017 at 08:31:48AM -0800, James Bottomley wrote:
> > 
> > This analysis fails here: get_device() on something with refcount 0
> > returns NULL.  That triggers the if clause to ignore this device.
> 
> No, it doesn't.  Take a look at the get_device and kobject_get
> implementations,

Hm, so why doesn't get_device use kref_get_unless_zero()?

James

* Re: [PATCH] scsi: fix race condition when removing target
  2017-11-29 16:18 ` Bart Van Assche
  2017-11-29 16:20   ` hch
@ 2017-11-29 17:39   ` gregkh
  2017-11-29 17:47     ` Bart Van Assche
  1 sibling, 1 reply; 29+ messages in thread
From: gregkh @ 2017-11-29 17:39 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: zhaohongjiang, jthumshirn, hch, martin.petersen, hare,
	linux-scsi, yanaijie, jejb, miaoxie

On Wed, Nov 29, 2017 at 04:18:30PM +0000, Bart Van Assche wrote:
> Hi Greg,
> 
> As the above patch description shows it can happen that the SCSI core calls
> get_device() after the device reference count has reached zero and before
> the memory for struct device is freed. Although the above patch looks fine
> to me, would you consider it acceptable to modify get_device() such that it
> uses kobject_get_unless_zero() instead of kobject_get()? I'm asking this
> because that change would help to reduce the complexity of the already too
> complicated SCSI core.

Shouldn't there be a bus lock somewhere preventing this race?  Having an
open-coded put call isn't good, as you see here.

thanks,

greg k-h

* Re: [PATCH] scsi: fix race condition when removing target
  2017-11-29 16:20   ` hch
@ 2017-11-29 17:39     ` Bart Van Assche
  2017-11-30  1:18       ` Jason Yan
  2017-11-29 17:39     ` gregkh
  1 sibling, 1 reply; 29+ messages in thread
From: Bart Van Assche @ 2017-11-29 17:39 UTC (permalink / raw)
  To: hch
  Cc: zhaohongjiang, jthumshirn, martin.petersen, hare, linux-scsi,
	gregkh, yanaijie, jejb, miaoxie

On Wed, 2017-11-29 at 17:20 +0100, hch@lst.de wrote:
> On Wed, Nov 29, 2017 at 04:18:30PM +0000, Bart Van Assche wrote:
> > As the above patch description shows it can happen that the SCSI core calls
> > get_device() after the device reference count has reached zero and before
> > the memory for struct device is freed. Although the above patch looks fine
> > to me, would you consider it acceptable to modify get_device() such that it
> > uses kobject_get_unless_zero() instead of kobject_get()? I'm asking this
> > because that change would help to reduce the complexity of the already too
> > complicated SCSI core.
> 
> I don't think we can just modify get_device, but we can add a new
> get_device_unless_zero.  In fact I have an open coded variant of that
> in nvme, and was planning to submit one for the current merge window..

Sorry but I don't see why we can't modify get_device()? Can you explain why
you think that something like the patch below is wrong?

Thanks,

Bart.


[PATCH] Make it safe to use get_device() if the reference count is zero

---
 drivers/base/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 110230d86527..049a5d9dba8a 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -1916,7 +1916,7 @@ EXPORT_SYMBOL_GPL(device_register);
  */
 struct device *get_device(struct device *dev)
 {
-	return dev ? kobj_to_dev(kobject_get(&dev->kobj)) : NULL;
+	return dev && kobject_get_unless_zero(&dev->kobj) ? dev : NULL;
 }
 EXPORT_SYMBOL_GPL(get_device);

* Re: [PATCH] scsi: fix race condition when removing target
  2017-11-29 16:20   ` hch
  2017-11-29 17:39     ` Bart Van Assche
@ 2017-11-29 17:39     ` gregkh
  2017-11-29 18:49       ` Ewan D. Milne
  1 sibling, 1 reply; 29+ messages in thread
From: gregkh @ 2017-11-29 17:39 UTC (permalink / raw)
  To: hch
  Cc: Bart Van Assche, zhaohongjiang, jthumshirn, martin.petersen,
	hare, linux-scsi, yanaijie, jejb, miaoxie

On Wed, Nov 29, 2017 at 05:20:50PM +0100, hch@lst.de wrote:
> On Wed, Nov 29, 2017 at 04:18:30PM +0000, Bart Van Assche wrote:
> > As the above patch description shows it can happen that the SCSI core calls
> > get_device() after the device reference count has reached zero and before
> > the memory for struct device is freed. Although the above patch looks fine
> > to me, would you consider it acceptable to modify get_device() such that it
> > uses kobject_get_unless_zero() instead of kobject_get()? I'm asking this
> > because that change would help to reduce the complexity of the already too
> > complicated SCSI core.
> 
> I don't think we can just modify get_device, but we can add a new
> get_device_unless_zero.  In fact I have an open coded variant of that
> in nvme, and was planning to submit one for the current merge window..

I feel like that is just delaying the real fix, shouldn't there be a bus
lock somewhere on the put_device path for this bus to prevent this?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] scsi: fix race condition when removing target
  2017-11-29 17:39   ` gregkh
@ 2017-11-29 17:47     ` Bart Van Assche
  0 siblings, 0 replies; 29+ messages in thread
From: Bart Van Assche @ 2017-11-29 17:47 UTC (permalink / raw)
  To: gregkh
  Cc: zhaohongjiang, jthumshirn, hch, martin.petersen, hare,
	linux-scsi, yanaijie, jejb, miaoxie

On Wed, 2017-11-29 at 17:39 +0000, gregkh@linuxfoundation.org wrote:
> On Wed, Nov 29, 2017 at 04:18:30PM +0000, Bart Van Assche wrote:
> > As the above patch description shows it can happen that the SCSI core calls
> > get_device() after the device reference count has reached zero and before
> > the memory for struct device is freed. Although the above patch looks fine
> > to me, would you consider it acceptable to modify get_device() such that it
> > uses kobject_get_unless_zero() instead of kobject_get()? I'm asking this
> > because that change would help to reduce the complexity of the already too
> > complicated SCSI core.
> 
> Shouldn't there be a bus lock somewhere preventing this race?  Having an
> open-coded put call isn't good, as you see here.

Hello Greg,

The get_device() call occurs with the SCSI host lock held. The SCSI host lock
serializes iteration over the sibling list by the get_device() caller and
removal of the SCSI device from the SCSI host sibling list by
scsi_device_dev_release_usercontext(). If you have a look at __scsi_remove_target()
then you will see that the host lock has to be released after a matching SCSI
device has been found and before scsi_remove_device() is called, because the
latter function may sleep.

Bart.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] scsi: fix race condition when removing target
  2017-11-29 17:39     ` gregkh
@ 2017-11-29 18:49       ` Ewan D. Milne
  2017-11-29 19:11         ` Bart Van Assche
  0 siblings, 1 reply; 29+ messages in thread
From: Ewan D. Milne @ 2017-11-29 18:49 UTC (permalink / raw)
  To: gregkh
  Cc: hch, Bart Van Assche, zhaohongjiang, jthumshirn, martin.petersen,
	hare, linux-scsi, yanaijie, jejb, miaoxie

On Wed, 2017-11-29 at 17:39 +0000, gregkh@linuxfoundation.org wrote:
> On Wed, Nov 29, 2017 at 05:20:50PM +0100, hch@lst.de wrote:
> > On Wed, Nov 29, 2017 at 04:18:30PM +0000, Bart Van Assche wrote:
> > > As the above patch description shows it can happen that the SCSI core calls
> > > get_device() after the device reference count has reached zero and before
> > > the memory for struct device is freed. Although the above patch looks fine
> > > to me, would you consider it acceptable to modify get_device() such that it
> > > uses kobject_get_unless_zero() instead of kobject_get()? I'm asking this
> > > because that change would help to reduce the complexity of the already too
> > > complicated SCSI core.
> > 
> > I don't think we can just modify get_device, but we can add a new
> > get_device_unless_zero.  In fact I have an open coded variant of that
> > in nvme, and was planning to submit one for the current merge window..
> 
> I feel like that is just delaying the real fix, shouldn't there be a bus
> lock somewhere on the put_device path for this bus to prevent this?
> 
> thanks,
> 
> greg k-h

Why is it that clients of the kobject code have to have their own
lock / state checking to prevent a duplicate destructor callback?
It seems to me like this is something the core functionality should
provide, because a get inside a destructor would *always* be wrong, no?

It looks like:

void refcount_inc(refcount_t *r)
{
        WARN_ONCE(!refcount_inc_not_zero(r), "refcount_t: increment on 0; use-after-free.\n");
}

would have warned if CONFIG_REFCOUNT_FULL was on, I/we don't normally
enable that though.

-Ewan

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] scsi: fix race condition when removing target
  2017-11-29  3:05 [PATCH] scsi: fix race condition when removing target Jason Yan
                   ` (2 preceding siblings ...)
  2017-11-29 16:31 ` James Bottomley
@ 2017-11-29 19:05 ` Ewan D. Milne
  3 siblings, 0 replies; 29+ messages in thread
From: Ewan D. Milne @ 2017-11-29 19:05 UTC (permalink / raw)
  To: Jason Yan
  Cc: Bart Van Assche, martin.petersen, jejb, linux-scsi,
	Hannes Reinecke, Christoph Hellwig, Johannes Thumshirn,
	Zhaohongjiang, Miao Xie

On Wed, 2017-11-29 at 11:05 +0800, Jason Yan wrote:
> In commit fbce4d97fd43 ("scsi: fixup kernel warning during rmmod()"), we
> removed scsi_device_get() and directly called get_device() to increase
> the refcount of the device. But actually scsi_device_get() will fail in
> three cases:
> 1. the scsi device is in SDEV_DEL or SDEV_CANCEL state
> 2. get_device() fails
> 3. the module is not alive
> 
> The intended purpose was to remove only the module-alive check.
> Unfortunately the check of the device state was dropped too. And this
> introduced a race condition like this:
> 
>       CPU0                                           CPU1
> __scsi_remove_target()
>   ->iterate shost->__devices
>   ->scsi_remove_device()
>   ->put_device()
>       someone still hold a refcount
>                                                    sd_release()
>                                                       ->scsi_disk_put()
>                                                       ->put_device() last put and trigger the device release
> 
>   ->goto restart
>   ->iterate shost->__devices and got the same device
>   ->get_device() while refcount is 0
>   ->scsi_remove_device()
>   ->put_device() refcount decreased to 0 again
>   ->scsi_device_dev_release()
>   ->scsi_device_dev_release_usercontext()
> 
>                                                       ->scsi_device_dev_release()
>                                                       ->scsi_device_dev_release_usercontext()
> 
> The same scsi device will be found again because it stays on the shost->__devices
> list until scsi_device_dev_release_usercontext() is called, although the device
> state was set to SDEV_DEL after the first scsi_remove_device().
> 
> Finally we got an oops in scsi_device_dev_release_usercontext() when it was
> called the second time.
> 
> Call trace:
> [<ffff0000086bc624>] scsi_device_dev_release_usercontext+0x7c/0x1c0
> [<ffff0000080f1f90>] execute_in_process_context+0x70/0x80
> [<ffff0000086bc598>] scsi_device_dev_release+0x28/0x38
> [<ffff0000086662cc>] device_release+0x3c/0xa0
> [<ffff000008c2e780>] kobject_put+0x80/0xf0
> [<ffff0000086666fc>] put_device+0x24/0x30
> [<ffff0000086aeee0>] scsi_device_put+0x30/0x40
> [<ffff000008704894>] scsi_disk_put+0x44/0x60
> [<ffff000008704a50>] sd_release+0x50/0x80
> [<ffff0000082bc704>] __blkdev_put+0x21c/0x230
> [<ffff0000082bcb2c>] blkdev_put+0x54/0x118
> [<ffff0000082bcc1c>] blkdev_close+0x2c/0x40
> [<ffff000008279b64>] __fput+0x94/0x1d8
> [<ffff000008279d20>] ____fput+0x20/0x30
> [<ffff0000080f6f54>] task_work_run+0x9c/0xb8
> [<ffff0000080dba64>] do_exit+0x2b4/0x9f8
> [<ffff0000080dc234>] do_group_exit+0x3c/0xa0
> [<ffff0000080dc2b8>] __wake_up_parent+0x0/0x40
> 
> And sometimes __scsi_remove_target() will loop for a long time
> removing the same device if someone else is holding a refcount, until the
> last refcount is released.
> 
> Notice that if CONFIG_REFCOUNT_FULL is enabled this race won't be triggered
> because the full refcount implementation will refuse to increase the refcount
> when it is 0.
> 
> Fix this by checking the sdev_state again like we did before in
> scsi_device_get(). Then when iterating shost again we will skip the deleted
> device because scsi_remove_device() will set the device state to
> SDEV_CANCEL or SDEV_DEL.
> 
> Fixes: fbce4d97fd43 ("scsi: fixup kernel warning during rmmod()")
> Signed-off-by: Jason Yan <yanaijie@huawei.com>
> CC: Hannes Reinecke <hare@suse.de>
> CC: Christoph Hellwig <hch@lst.de>
> CC: Johannes Thumshirn <jthumshirn@suse.de>
> CC: Zhaohongjiang <zhaohongjiang@huawei.com>
> CC: Miao Xie <miaoxie@huawei.com>
> ---
>  drivers/scsi/scsi_sysfs.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
> index 50e7d7e..d398894 100644
> --- a/drivers/scsi/scsi_sysfs.c
> +++ b/drivers/scsi/scsi_sysfs.c
> @@ -1398,6 +1398,15 @@ void scsi_remove_device(struct scsi_device *sdev)
>  }
>  EXPORT_SYMBOL(scsi_remove_device);
>  
> +static int scsi_device_get_not_deleted(struct scsi_device *sdev)
> +{
> +	if (sdev->sdev_state == SDEV_DEL || sdev->sdev_state == SDEV_CANCEL)
> +		return -ENXIO;
> +	if (!get_device(&sdev->sdev_gendev))
> +		return -ENXIO;
> +	return 0;
> +}
> +
>  static void __scsi_remove_target(struct scsi_target *starget)
>  {
>  	struct Scsi_Host *shost = dev_to_shost(starget->dev.parent);
> @@ -1415,7 +1424,7 @@ static void __scsi_remove_target(struct scsi_target *starget)
>  		 */
>  		if (sdev->channel != starget->channel ||
>  		    sdev->id != starget->id ||
> -		    !get_device(&sdev->sdev_gendev))
> +		    scsi_device_get_not_deleted(sdev))
>  			continue;
>  		spin_unlock_irqrestore(shost->host_lock, flags);
>  		scsi_remove_device(sdev);

See subsequent discussion, however, we have a reproducible case here
and the patch does appear to fix the issue (500+ iterations).

Reviewed-by: Ewan D. Milne <emilne@redhat.com>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] scsi: fix race condition when removing target
  2017-11-29 18:49       ` Ewan D. Milne
@ 2017-11-29 19:11         ` Bart Van Assche
  2017-11-29 19:20           ` Ewan D. Milne
  0 siblings, 1 reply; 29+ messages in thread
From: Bart Van Assche @ 2017-11-29 19:11 UTC (permalink / raw)
  To: emilne, gregkh
  Cc: zhaohongjiang, jthumshirn, hch, martin.petersen, hare,
	linux-scsi, yanaijie, jejb, miaoxie

On Wed, 2017-11-29 at 13:49 -0500, Ewan D. Milne wrote:
> because a get inside a destructor would *always* be wrong, no?

Hello Ewan,

That's not what we are discussing. What can happen with the SCSI core is that
get_device() is called concurrently with the destructor. get_device() can be
called concurrently with the destructor because the destructor removes a
device from the siblings list and because the SCSI core can call get_device()
for devices it finds on the siblings list. Personally I think that design is
superior compared to removing a SCSI device from the sibling list before the
last put_device() call because the approach followed in the SCSI core leads to
a simpler implementation. However, it seems like the current get_device()
implementation does not yet support the SCSI core design ...
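
To make the interleaving concrete, here is a minimal single-threaded userspace
sketch that replays it step by step. Everything in it (struct model_dev, the
model_* helpers, the initial count of 2) is a hypothetical stand-in chosen for
illustration, not SCSI core code. The only property it borrows from the thread
is that the device stays on the list until the release callback finishes.

```c
#include <stdbool.h>
#include <stddef.h>

enum model_state { MODEL_RUNNING, MODEL_CANCEL, MODEL_DEL };

/* Hypothetical stand-in for a scsi_device; only what the race needs. */
struct model_dev {
	int refcount;
	enum model_state state;
	bool on_list;		/* cleared only when the release finishes */
	int releases_started;	/* how often the destructor was entered */
};

/* The last put enters the destructor; unlinking happens later. */
static void model_put(struct model_dev *d)
{
	if (--d->refcount == 0)
		d->releases_started++;
}

/* End of the destructor: only now does the device leave the list. */
static void model_release_finish(struct model_dev *d)
{
	d->on_list = false;
}

/* First call marks the device deleted and drops the initial reference. */
static void model_remove(struct model_dev *d)
{
	if (d->state == MODEL_DEL)
		return;
	d->state = MODEL_DEL;
	model_put(d);
}

/* Lookup that takes a reference; optionally skips deleted devices. */
static struct model_dev *model_lookup(struct model_dev *d, bool check_state)
{
	if (!d->on_list)
		return NULL;
	if (check_state &&
	    (d->state == MODEL_DEL || d->state == MODEL_CANCEL))
		return NULL;
	d->refcount++;		/* may resurrect a count of 0 */
	return d;
}

/* Replays the interleaving; returns the number of destructor entries. */
static int model_replay(bool check_state)
{
	/* Assumed start: initial reference plus one held by an opener. */
	struct model_dev d = {
		.refcount = 2, .state = MODEL_RUNNING, .on_list = true,
	};
	struct model_dev *sdev;

	sdev = model_lookup(&d, check_state);	/* CPU0: first pass */
	model_remove(sdev);
	model_put(sdev);

	model_put(&d);				/* CPU1: last put, release begins */

	sdev = model_lookup(&d, check_state);	/* CPU0: restart */
	if (sdev) {
		model_remove(sdev);
		model_put(sdev);		/* count hits 0 a second time */
	}

	model_release_finish(&d);
	return d.releases_started;
}
```

In this model, model_replay(false) enters the destructor twice, while
model_replay(true), which restores the state check, enters it once.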

Bart.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] scsi: fix race condition when removing target
  2017-11-29 19:11         ` Bart Van Assche
@ 2017-11-29 19:20           ` Ewan D. Milne
  2017-11-29 19:50             ` Bart Van Assche
  0 siblings, 1 reply; 29+ messages in thread
From: Ewan D. Milne @ 2017-11-29 19:20 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: gregkh, zhaohongjiang, jthumshirn, hch, martin.petersen, hare,
	linux-scsi, yanaijie, jejb, miaoxie

On Wed, 2017-11-29 at 19:11 +0000, Bart Van Assche wrote:
> On Wed, 2017-11-29 at 13:49 -0500, Ewan D. Milne wrote:
> > because a get inside a destructor would *always* be wrong, no?
> 
> Hello Ewan,
> 
> That's not what we are discussing. What can happen with the SCSI core is that
> get_device() is called concurrently with the destructor. get_device() can be
> called concurrently with the destructor because the destructor removes a
> device from the siblings list and because the SCSI core can call get_device()
> for devices it finds on the siblings list. Personally I think that design is
> superior compared to removing a SCSI device from the sibling list before the
> last put_device() call because the approach followed in the SCSI core leads to
> a simpler implementation. However, it seems like the current get_device()
> implementation does not yet support the SCSI core design ...
> 
> Bart.

OK, well, I think the point still stands, though, once the refcount
goes to zero and the destructor is invoked, a get that then increments
the refcount seems fundamentally wrong to me.  Especially if a
subsequent put causes the destructor to be invoked *simultaneously*
*on another thread*.  The locking has to happen somewhere, why isn't
this done by the kobject?

Relying on the client code to get this right means that there are
opportunities all over the kernel for problems like this to happen,
just like here, where we inadvertently removed the state check that
prevented the get_device() call.

-Ewan

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] scsi: fix race condition when removing target
  2017-11-29 19:20           ` Ewan D. Milne
@ 2017-11-29 19:50             ` Bart Van Assche
  0 siblings, 0 replies; 29+ messages in thread
From: Bart Van Assche @ 2017-11-29 19:50 UTC (permalink / raw)
  To: emilne
  Cc: zhaohongjiang, jthumshirn, hch, martin.petersen, hare,
	linux-scsi, gregkh, yanaijie, jejb, miaoxie

On Wed, 2017-11-29 at 14:20 -0500, Ewan D. Milne wrote:
> OK, well, I think the point still stands, though, once the refcount
> goes to zero and the destructor is invoked, a get that then increments
> the refcount seems fundamentally wrong to me.

I agree that incrementing a reference count that has dropped to zero is wrong.
However, that's what happens currently. That behavior has been reported as a
bug. We need to fix this behavior, either through the patch at the start of
this thread or by using code that avoids incrementing a zero reference count,
e.g. kobject_get_unless_zero().
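
For readers unfamiliar with the idea, the difference between the two kinds of
"get" can be sketched in userspace with C11 atomics. This is only a model under
stated assumptions: struct model_ref and the model_* functions are hypothetical
names, and the kernel's kobject and refcount code is implemented differently.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a refcounted object. */
struct model_ref {
	atomic_uint count;
};

/* Unconditional get: turns a count of 0 back into 1. */
static void model_get(struct model_ref *r)
{
	atomic_fetch_add(&r->count, 1);
}

/* Take a reference only while the count is still nonzero. */
static bool model_get_unless_zero(struct model_ref *r)
{
	unsigned int old = atomic_load(&r->count);

	while (old != 0) {
		/* On failure, old is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&r->count, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true if this was the last one. */
static bool model_put(struct model_ref *r)
{
	unsigned int old = atomic_fetch_sub(&r->count, 1);

	assert(old != 0);	/* putting an already-dead object is a bug */
	return old == 1;
}
```

Once model_put() has returned true, model_get_unless_zero() fails and leaves
the count at 0, whereas model_get() would silently bring the object back to a
count of 1.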

Bart.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] scsi: fix race condition when removing target
  2017-11-29 17:39     ` Bart Van Assche
@ 2017-11-30  1:18       ` Jason Yan
  2017-11-30 16:08         ` Bart Van Assche
  0 siblings, 1 reply; 29+ messages in thread
From: Jason Yan @ 2017-11-30  1:18 UTC (permalink / raw)
  To: Bart Van Assche, hch
  Cc: zhaohongjiang, jthumshirn, martin.petersen, hare, linux-scsi,
	gregkh, jejb, miaoxie



On 2017/11/30 1:39, Bart Van Assche wrote:
> On Wed, 2017-11-29 at 17:20 +0100, hch@lst.de wrote:
>> On Wed, Nov 29, 2017 at 04:18:30PM +0000, Bart Van Assche wrote:
>>> As the above patch description shows it can happen that the SCSI core calls
>>> get_device() after the device reference count has reached zero and before
>>> the memory for struct device is freed. Although the above patch looks fine
>>> to me, would you consider it acceptable to modify get_device() such that it
>>> uses kobject_get_unless_zero() instead of kobject_get()? I'm asking this
>>> because that change would help to reduce the complexity of the already too
>>> complicated SCSI core.
>>
>> I don't think we can just modify get_device, but we can add a new
>> get_device_unless_zero.  In fact I have an open coded variant of that
>> in nvme, and was planning to submit one for the current merge window..
>
> Sorry but I don't see why we can't modify get_device()? Can you explain why
> you think that something like the patch below is wrong?
>
> Thanks,
>
> Bart.
>

Hi Bart, I chose the approach in my patch because it has been used in
scsi_device_get() for years and been proved safe. I think using
kobject_get_unless_zero() is safe here and can fix this issue too. And
this approach is beneficial to all users.

>
> [PATCH] Make it safe to use get_device() if the reference count is zero
>
> ---
>   drivers/base/core.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index 110230d86527..049a5d9dba8a 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -1916,7 +1916,7 @@ EXPORT_SYMBOL_GPL(device_register);
>    */
>   struct device *get_device(struct device *dev)
>   {
> -	return dev ? kobj_to_dev(kobject_get(&dev->kobj)) : NULL;
> +	return dev && kobject_get_unless_zero(&dev->kobj) ? dev : NULL;
>   }
>   EXPORT_SYMBOL_GPL(get_device);
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] scsi: fix race condition when removing target
  2017-11-30  1:18       ` Jason Yan
@ 2017-11-30 16:08         ` Bart Van Assche
  2017-11-30 16:40           ` gregkh
  2017-11-30 23:56           ` James Bottomley
  0 siblings, 2 replies; 29+ messages in thread
From: Bart Van Assche @ 2017-11-30 16:08 UTC (permalink / raw)
  To: hch, yanaijie
  Cc: zhaohongjiang, jthumshirn, martin.petersen, hare, linux-scsi,
	gregkh, jejb, miaoxie

On Thu, 2017-11-30 at 09:18 +0800, Jason Yan wrote:
> Hi Bart, I chose the approach in my patch because it has been used in
> scsi_device_get() for years and been proved safe. I think using
> kobject_get_unless_zero() is safe here and can fix this issue too. And
> this approach is beneficial to all users.

Hello Jason,

A possible approach is that we start with your patch and defer any get_device()
changes until after your patch has been applied.

Bart.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] scsi: fix race condition when removing target
  2017-11-30 16:08         ` Bart Van Assche
@ 2017-11-30 16:40           ` gregkh
  2017-11-30 23:56           ` James Bottomley
  1 sibling, 0 replies; 29+ messages in thread
From: gregkh @ 2017-11-30 16:40 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: hch, yanaijie, zhaohongjiang, jthumshirn, martin.petersen, hare,
	linux-scsi, jejb, miaoxie

On Thu, Nov 30, 2017 at 04:08:38PM +0000, Bart Van Assche wrote:
> On Thu, 2017-11-30 at 09:18 +0800, Jason Yan wrote:
> > Hi Bart, I chose the approach in my patch because it has been used in
> > scsi_device_get() for years and been proved safe. I think using
> > kobject_get_unless_zero() is safe here and can fix this issue too. And
> > this approach is beneficial to all users.
> 
> Hello Jason,
> 
> A possible approach is that we start with your patch and defer any get_device()
> changes until after your patch has been applied.

That might be good, I don't have the chance to look at any driver core
changes until Monday as I'm on the road until then, sorry...

greg k-h

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] scsi: fix race condition when removing target
  2017-11-30 16:08         ` Bart Van Assche
  2017-11-30 16:40           ` gregkh
@ 2017-11-30 23:56           ` James Bottomley
  2017-12-01  1:12             ` Finn Thain
  2017-12-01  8:40             ` Jason Yan
  1 sibling, 2 replies; 29+ messages in thread
From: James Bottomley @ 2017-11-30 23:56 UTC (permalink / raw)
  To: Bart Van Assche, hch, yanaijie
  Cc: zhaohongjiang, jthumshirn, martin.petersen, hare, linux-scsi,
	gregkh, miaoxie

On Thu, 2017-11-30 at 16:08 +0000, Bart Van Assche wrote:
> On Thu, 2017-11-30 at 09:18 +0800, Jason Yan wrote:
> > 
> > Hi Bart, I chose the approach in my patch because it has been used
> > in scsi_device_get() for years and been proved safe. I think using
> > kobject_get_unless_zero() is safe here and can fix this issue too.
> > And this approach is beneficial to all users.
> 
> Hello Jason,
> 
> A possible approach is that we start with your patch and defer any
> get_device() changes until after your patch has been applied.

It's possible, but not quite good enough: the same race can be produced
with any of our sdev lists that are deleted in the release callback,
because there could be a released device on any one of them.  The only
way to mediate it properly is to get a reference in the iterator using
kobject_get_unless_zero().

It's a bit like a huge can of worms, there's another problem every time
I look.  However, this is something like the mechanism that could work
(and if get_device() ever gets fixed, we can put it in place of
kobject_get_unless_zero()).

James

---

diff --git a/drivers/scsi/53c700.c b/drivers/scsi/53c700.c
index 6be77b3aa8a5..c3246f26c02c 100644
--- a/drivers/scsi/53c700.c
+++ b/drivers/scsi/53c700.c
@@ -1169,6 +1169,7 @@ process_script_interrupt(__u32 dsps, __u32 dsp, struct scsi_cmnd *SCp,
 
 			
 		}
+		put_device(&SDp->sdev_gendev);
 	} else if(dsps == A_RESELECTED_DURING_SELECTION) {
 
 		/* This section is full of debugging code because I've
diff --git a/drivers/scsi/esp_scsi.c b/drivers/scsi/esp_scsi.c
index c3fc34b9964d..7736f3fb2501 100644
--- a/drivers/scsi/esp_scsi.c
+++ b/drivers/scsi/esp_scsi.c
@@ -1198,6 +1198,7 @@ static int esp_reconnect(struct esp *esp)
 		goto do_reset;
 	}
 	lp = dev->hostdata;
+	put_device(&dev->sdev_gendev);
 
 	ent = lp->non_tagged_cmd;
 	if (!ent) {
diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index a7e4fba724b7..c96c11716152 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -677,11 +677,10 @@ struct scsi_device *__scsi_device_lookup_by_target(struct scsi_target *starget,
 {
 	struct scsi_device *sdev;
 
-	list_for_each_entry(sdev, &starget->devices, same_target_siblings) {
-		if (sdev->sdev_state == SDEV_DEL)
-			continue;
-		if (sdev->lun ==lun)
+	__sdev_for_each_get(sdev, &starget->devices, same_target_siblings) {
+		if (sdev->sdev_state != SDEV_DEL && sdev->lun ==lun)
 			return sdev;
+		put_device(&sdev->sdev_gendev);
 	}
 
 	return NULL;
@@ -700,15 +699,16 @@ EXPORT_SYMBOL(__scsi_device_lookup_by_target);
 struct scsi_device *scsi_device_lookup_by_target(struct scsi_target *starget,
 						 u64 lun)
 {
-	struct scsi_device *sdev;
+	struct scsi_device *sdev, *sdev_copy;
 	struct Scsi_Host *shost = dev_to_shost(starget->dev.parent);
 	unsigned long flags;
 
 	spin_lock_irqsave(shost->host_lock, flags);
-	sdev = __scsi_device_lookup_by_target(starget, lun);
+	sdev_copy = sdev = __scsi_device_lookup_by_target(starget, lun);
+	spin_unlock_irqrestore(shost->host_lock, flags);
 	if (sdev && scsi_device_get(sdev))
 		sdev = NULL;
-	spin_unlock_irqrestore(shost->host_lock, flags);
+	put_device(&sdev_copy->sdev_gendev);
 
 	return sdev;
 }
@@ -735,12 +735,12 @@ struct scsi_device *__scsi_device_lookup(struct Scsi_Host *shost,
 {
 	struct scsi_device *sdev;
 
-	list_for_each_entry(sdev, &shost->__devices, siblings) {
-		if (sdev->sdev_state == SDEV_DEL)
-			continue;
-		if (sdev->channel == channel && sdev->id == id &&
-				sdev->lun ==lun)
+	__sdev_for_each_get(sdev, &shost->__devices, siblings) {
+		if (sdev->sdev_state != SDEV_DEL &&
+		    sdev->channel == channel && sdev->id == id &&
+		    sdev->lun ==lun)
 			return sdev;
+		put_device(&sdev->sdev_gendev);
 	}
 
 	return NULL;
@@ -761,14 +761,15 @@ EXPORT_SYMBOL(__scsi_device_lookup);
 struct scsi_device *scsi_device_lookup(struct Scsi_Host *shost,
 		uint channel, uint id, u64 lun)
 {
-	struct scsi_device *sdev;
+	struct scsi_device *sdev, *sdev_copy;
 	unsigned long flags;
 
 	spin_lock_irqsave(shost->host_lock, flags);
-	sdev = __scsi_device_lookup(shost, channel, id, lun);
+	sdev_copy = sdev = __scsi_device_lookup(shost, channel, id, lun);
+	spin_unlock_irqrestore(shost->host_lock, flags);
 	if (sdev && scsi_device_get(sdev))
 		sdev = NULL;
-	spin_unlock_irqrestore(shost->host_lock, flags);
+	put_device(&sdev_copy->sdev_gendev);
 
 	return sdev;
 }
diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index 40124648a07b..cddd5a93e962 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -1870,11 +1870,14 @@ void scsi_forget_host(struct Scsi_Host *shost)
 
  restart:
 	spin_lock_irqsave(shost->host_lock, flags);
-	list_for_each_entry(sdev, &shost->__devices, siblings) {
-		if (sdev->sdev_state == SDEV_DEL)
+	__sdev_for_each_get(sdev, &shost->__devices, siblings) {
+		if (sdev->sdev_state == SDEV_DEL) {
+			put_device(&sdev->sdev_gendev);
 			continue;
+		}
 		spin_unlock_irqrestore(shost->host_lock, flags);
 		__scsi_remove_device(sdev);
+		put_device(&sdev->sdev_gendev);
 		goto restart;
 	}
 	spin_unlock_irqrestore(shost->host_lock, flags);
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index f796bd61f3f0..380404ec49cd 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -1375,17 +1375,7 @@ static void __scsi_remove_target(struct scsi_target *starget)
 
 	spin_lock_irqsave(shost->host_lock, flags);
  restart:
-	list_for_each_entry(sdev, &shost->__devices, siblings) {
-		/*
-		 * We cannot call scsi_device_get() here, as
-		 * we might've been called from rmmod() causing
-		 * scsi_device_get() to fail the module_is_live()
-		 * check.
-		 */
-		if (sdev->channel != starget->channel ||
-		    sdev->id != starget->id ||
-		    !get_device(&sdev->sdev_gendev))
-			continue;
+	__sdev_for_each_get(sdev, &starget->devices, same_target_siblings) {
 		spin_unlock_irqrestore(shost->host_lock, flags);
 		scsi_remove_device(sdev);
 		put_device(&sdev->sdev_gendev);
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index 571ddb49b926..2e4d48d8cd68 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -380,6 +380,23 @@ extern struct scsi_device *__scsi_iterate_devices(struct Scsi_Host *,
 #define __shost_for_each_device(sdev, shost) \
 	list_for_each_entry((sdev), &((shost)->__devices), siblings)
 
+/**
+ * __sdev_for_each_get - get a reference to each element
+ * @sdev: the scsi device to use in the body
+ * @head: the head of the list
+ * @list: the element (sdev->list) containing list members
+ *
+ * Iterator that only executes the body if it can obtain a reference
+ * to the element.  This closes a race where the device release can
+ * have been called, but the element is still on the lists.
+ *
+ * The lock protecting the list (the host lock) must be held before
+ * calling this iterator
+ */
+#define __sdev_for_each_get(sdev, head, list)				\
+	list_for_each_entry(sdev, head, list)				\
+		if (kobject_get_unless_zero(&sdev->sdev_gendev.kobj))
+
 extern int scsi_change_queue_depth(struct scsi_device *, int);
 extern int scsi_track_queue_full(struct scsi_device *, int);
 

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH] scsi: fix race condition when removing target
  2017-11-30 23:56           ` James Bottomley
@ 2017-12-01  1:12             ` Finn Thain
  2017-12-01  8:40             ` Jason Yan
  1 sibling, 0 replies; 29+ messages in thread
From: Finn Thain @ 2017-12-01  1:12 UTC (permalink / raw)
  To: James Bottomley
  Cc: Bart Van Assche, hch, yanaijie, zhaohongjiang, jthumshirn,
	martin.petersen, hare, linux-scsi, gregkh, miaoxie

On Thu, 30 Nov 2017, James Bottomley wrote:

> +#define __sdev_for_each_get(sdev, head, list)				\
> +	list_for_each_entry(sdev, head, list)				\
> +		if (kobject_get_unless_zero(&sdev->sdev_gendev.kobj))
> +

I think that should have an 'else' clause, like this macro from 
include/drm/drmP.h:

#define for_each_if(condition) if (!(condition)) {} else
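
A small userspace sketch of the hazard, assuming two hypothetical array
iterators (for_each_nonzero_bare and for_each_nonzero_safe) that differ only in
how the hidden condition is written. With the bare "if", a caller's "else"
attaches to the macro's "if" instead of the caller's own. Compilers may warn
about the dangling else in the bare variant, which is the point.

```c
#include <stddef.h>

/* Same shape as the drmP.h macro quoted above. */
#define for_each_if(condition) if (!(condition)) {} else

/* Hypothetical iterators that run the body only for nonzero entries. */
#define for_each_nonzero_bare(i, arr, n)		\
	for ((i) = 0; (i) < (n); (i)++)			\
		if ((arr)[(i)] != 0)

#define for_each_nonzero_safe(i, arr, n)		\
	for ((i) = 0; (i) < (n); (i)++)			\
		for_each_if((arr)[(i)] != 0)

/* The else is meant for "if (enabled)" but binds to the macro's if. */
static int count_bare(const int *arr, size_t n, int enabled)
{
	size_t i;
	int hits = 0, disabled_path = 0;

	if (enabled)
		for_each_nonzero_bare(i, arr, n)
			hits++;
	else
		disabled_path = 1;

	return disabled_path ? -1 : hits;
}

/* Here the macro's if already has an else, so ours binds as intended. */
static int count_safe(const int *arr, size_t n, int enabled)
{
	size_t i;
	int hits = 0, disabled_path = 0;

	if (enabled)
		for_each_nonzero_safe(i, arr, n)
			hits++;
	else
		disabled_path = 1;

	return disabled_path ? -1 : hits;
}
```

For the array {1, 0, 2}, count_safe() returns 2 when enabled and -1 when
disabled. count_bare() returns -1 when enabled, because the zero entry takes
the stray else, and 0 when disabled, because the else never runs.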

-- 

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] scsi: fix race condition when removing target
  2017-11-30 23:56           ` James Bottomley
  2017-12-01  1:12             ` Finn Thain
@ 2017-12-01  8:40             ` Jason Yan
  2017-12-01 14:41               ` Ewan D. Milne
  2017-12-01 15:35               ` James Bottomley
  1 sibling, 2 replies; 29+ messages in thread
From: Jason Yan @ 2017-12-01  8:40 UTC (permalink / raw)
  To: James Bottomley, Bart Van Assche, hch
  Cc: zhaohongjiang, jthumshirn, martin.petersen, hare, linux-scsi,
	gregkh, miaoxie


On 2017/12/1 7:56, James Bottomley wrote:
> On Thu, 2017-11-30 at 16:08 +0000, Bart Van Assche wrote:
>> On Thu, 2017-11-30 at 09:18 +0800, Jason Yan wrote:
>>>
>>> Hi Bart, I chose the approach in my patch because it has been used
>>> in scsi_device_get() for years and been proved safe. I think using
>>> kobject_get_unless_zero() is safe here and can fix this issue too.
>>> And this approach is beneficial to all users.
>>
>> Hello Jason,
>>
>> A possible approach is that we start with your patch and defer any
>> get_device() changes until after your patch has been applied.
>
> It's possible, but not quite good enough: the same race can be produced
> with any of our sdev lists that are deleted in the release callback,
> because there could be a released device on any one of them.  The only
> way to mediate it properly is to get a reference in the iterator using
> kobject_get_unless_zero().
>
> It's a bit like a huge can of worms, there's another problem every time
> I look.  However, this is something like the mechanism that could work
> (and if get_device() ever gets fixed, we can put it in place of
> kobject_get_unless_zero()).
>
> James
>
> ---
>
> diff --git a/drivers/scsi/53c700.c b/drivers/scsi/53c700.c
> index 6be77b3aa8a5..c3246f26c02c 100644
> --- a/drivers/scsi/53c700.c
> +++ b/drivers/scsi/53c700.c
> @@ -1169,6 +1169,7 @@ process_script_interrupt(__u32 dsps, __u32 dsp, struct scsi_cmnd *SCp,
>
>   			
>   		}
> +		put_device(&SDp->sdev_gendev);
>   	} else if(dsps == A_RESELECTED_DURING_SELECTION) {
>
>   		/* This section is full of debugging code because I've
> diff --git a/drivers/scsi/esp_scsi.c b/drivers/scsi/esp_scsi.c
> index c3fc34b9964d..7736f3fb2501 100644
> --- a/drivers/scsi/esp_scsi.c
> +++ b/drivers/scsi/esp_scsi.c
> @@ -1198,6 +1198,7 @@ static int esp_reconnect(struct esp *esp)
>   		goto do_reset;
>   	}
>   	lp = dev->hostdata;
> +	put_device(&dev->sdev_gendev);
>
>   	ent = lp->non_tagged_cmd;
>   	if (!ent) {
> diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
> index a7e4fba724b7..c96c11716152 100644
> --- a/drivers/scsi/scsi.c
> +++ b/drivers/scsi/scsi.c
> @@ -677,11 +677,10 @@ struct scsi_device *__scsi_device_lookup_by_target(struct scsi_target *starget,
>   {
>   	struct scsi_device *sdev;
>
> -	list_for_each_entry(sdev, &starget->devices, same_target_siblings) {
> -		if (sdev->sdev_state == SDEV_DEL)
> -			continue;
> -		if (sdev->lun ==lun)
> +	__sdev_for_each_get(sdev, &starget->devices, same_target_siblings) {
> +		if (sdev->sdev_state != SDEV_DEL && sdev->lun ==lun)
>   			return sdev;
> +		put_device(&sdev->sdev_gendev);
>   	}
>
>   	return NULL;
> @@ -700,15 +699,16 @@ EXPORT_SYMBOL(__scsi_device_lookup_by_target);
>   struct scsi_device *scsi_device_lookup_by_target(struct scsi_target *starget,
>   						 u64 lun)
>   {
> -	struct scsi_device *sdev;
> +	struct scsi_device *sdev, *sdev_copy;
>   	struct Scsi_Host *shost = dev_to_shost(starget->dev.parent);
>   	unsigned long flags;
>
>   	spin_lock_irqsave(shost->host_lock, flags);
> -	sdev = __scsi_device_lookup_by_target(starget, lun);
> +	sdev_copy = sdev = __scsi_device_lookup_by_target(starget, lun);
> +	spin_unlock_irqrestore(shost->host_lock, flags);
>   	if (sdev && scsi_device_get(sdev))
>   		sdev = NULL;
> -	spin_unlock_irqrestore(shost->host_lock, flags);
> +	put_device(&sdev_copy->sdev_gendev);
>
>   	return sdev;
>   }
> @@ -735,12 +735,12 @@ struct scsi_device *__scsi_device_lookup(struct Scsi_Host *shost,
>   {
>   	struct scsi_device *sdev;
>
> -	list_for_each_entry(sdev, &shost->__devices, siblings) {
> -		if (sdev->sdev_state == SDEV_DEL)
> -			continue;
> -		if (sdev->channel == channel && sdev->id == id &&
> -				sdev->lun ==lun)
> +	__sdev_for_each_get(sdev, &shost->__devices, siblings) {
> +		if (sdev->sdev_state != SDEV_DEL &&
> +		    sdev->channel == channel && sdev->id == id &&
> +		    sdev->lun ==lun)
>   			return sdev;
> +		put_device(&sdev->sdev_gendev);
>   	}
>
>   	return NULL;
> @@ -761,14 +761,15 @@ EXPORT_SYMBOL(__scsi_device_lookup);
>   struct scsi_device *scsi_device_lookup(struct Scsi_Host *shost,
>   		uint channel, uint id, u64 lun)
>   {
> -	struct scsi_device *sdev;
> +	struct scsi_device *sdev, *sdev_copy;
>   	unsigned long flags;
>
>   	spin_lock_irqsave(shost->host_lock, flags);
> -	sdev = __scsi_device_lookup(shost, channel, id, lun);
> +	sdev_copy = sdev = __scsi_device_lookup(shost, channel, id, lun);
> +	spin_unlock_irqrestore(shost->host_lock, flags);
>   	if (sdev && scsi_device_get(sdev))
>   		sdev = NULL;
> -	spin_unlock_irqrestore(shost->host_lock, flags);
> +	put_device(&sdev_copy->sdev_gendev);
>
>   	return sdev;
>   }
> diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
> index 40124648a07b..cddd5a93e962 100644
> --- a/drivers/scsi/scsi_scan.c
> +++ b/drivers/scsi/scsi_scan.c
> @@ -1870,11 +1870,14 @@ void scsi_forget_host(struct Scsi_Host *shost)
>
>    restart:
>   	spin_lock_irqsave(shost->host_lock, flags);
> -	list_for_each_entry(sdev, &shost->__devices, siblings) {
> -		if (sdev->sdev_state == SDEV_DEL)
> +	__sdev_for_each_get(sdev, &shost->__devices, siblings) {
> +		if (sdev->sdev_state == SDEV_DEL) {
> +			put_device(&sdev->sdev_gendev);
>   			continue;
> +		}
>   		spin_unlock_irqrestore(shost->host_lock, flags);
>   		__scsi_remove_device(sdev);
> +		put_device(&sdev->sdev_gendev);
>   		goto restart;
>   	}
>   	spin_unlock_irqrestore(shost->host_lock, flags);
> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
> index f796bd61f3f0..380404ec49cd 100644
> --- a/drivers/scsi/scsi_sysfs.c
> +++ b/drivers/scsi/scsi_sysfs.c
> @@ -1375,17 +1375,7 @@ static void __scsi_remove_target(struct scsi_target *starget)
>
>   	spin_lock_irqsave(shost->host_lock, flags);
>    restart:
> -	list_for_each_entry(sdev, &shost->__devices, siblings) {
> -		/*
> -		 * We cannot call scsi_device_get() here, as
> -		 * we might've been called from rmmod() causing
> -		 * scsi_device_get() to fail the module_is_live()
> -		 * check.
> -		 */
> -		if (sdev->channel != starget->channel ||
> -		    sdev->id != starget->id ||
> -		    !get_device(&sdev->sdev_gendev))
> -			continue;
> +	__sdev_for_each_get(sdev, &starget->devices, same_target_siblings) {
>   		spin_unlock_irqrestore(shost->host_lock, flags);
>   		scsi_remove_device(sdev);
>   		put_device(&sdev->sdev_gendev);
> diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
> index 571ddb49b926..2e4d48d8cd68 100644
> --- a/include/scsi/scsi_device.h
> +++ b/include/scsi/scsi_device.h
> @@ -380,6 +380,23 @@ extern struct scsi_device *__scsi_iterate_devices(struct Scsi_Host *,
>   #define __shost_for_each_device(sdev, shost) \
>   	list_for_each_entry((sdev), &((shost)->__devices), siblings)
>

It seems that __shost_for_each_device() is still not safe. A deleted scsi
device stays in the list, and put_device() can be called anywhere outside
the host lock.

> +/**
> + * __sdev_for_each_get - get a reference to each element
> + * @sdev: the scsi device to use in the body
> + * @head: the head of the list
> + * @list: the element (sdev->list) containing list members
> + *
> + * Iterator that only executes the body if it can obtain a reference
> + * to the element.  This closes a race where the device release can
> + * have been called, but the element is still on the lists.
> + *
> + * The lock protecting the list (the host lock) must be held before
> + * calling this iterator
> + */
> +#define __sdev_for_each_get(sdev, head, list)				\
> +	list_for_each_entry(sdev, head, list)				\
> +		if (kobject_get_unless_zero(&sdev->sdev_gendev.kobj))
> +
>   extern int scsi_change_queue_depth(struct scsi_device *, int);
>   extern int scsi_track_queue_full(struct scsi_device *, int);
>
>
>
> .
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] scsi: fix race condition when removing target
  2017-12-01  8:40             ` Jason Yan
@ 2017-12-01 14:41               ` Ewan D. Milne
  2017-12-01 15:35               ` James Bottomley
  1 sibling, 0 replies; 29+ messages in thread
From: Ewan D. Milne @ 2017-12-01 14:41 UTC (permalink / raw)
  To: Jason Yan
  Cc: James Bottomley, Bart Van Assche, hch, zhaohongjiang, jthumshirn,
	martin.petersen, hare, linux-scsi, gregkh, miaoxie

We have another test case that demonstrates this issue involving
duplicate invocations of scsi_device_dev_release() on the same
device.  This other test case involves repeated log in / log out
of an iSCSI target.  (The first test case I mentioned in an earlier
mail was an oscillating FC port with a low dev_loss_tmo value.)

The iSCSI test was not fixed by Jason Yan's patch; however, adding Bart's
change to use kobject_get_unless_zero() in get_device() as well seems to
have resolved it.  We are going to try with just Bart's change next.

-Ewan
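
The difference between an unconditional get and a get-unless-zero can be
modelled in userspace.  The sketch below is illustrative only: struct obj,
obj_get(), obj_get_unless_zero() and obj_put() are hypothetical names, C11
atomics stand in for the kernel's refcount helpers, and none of this is the
kernel's kobject code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a refcounted object such as a kobject. */
struct obj {
	atomic_int refcount;
};

/* Unconditional get: increments even when the count has already hit zero. */
static void obj_get(struct obj *o)
{
	atomic_fetch_add(&o->refcount, 1);
}

/* Take a reference only if the count is still non-zero. */
static bool obj_get_unless_zero(struct obj *o)
{
	int old = atomic_load(&o->refcount);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&o->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true when this was the last put. */
static bool obj_put(struct obj *o)
{
	return atomic_fetch_sub(&o->refcount, 1) == 1;
}
```

With the unconditional obj_get(), a lookup that finds an object whose count
has already dropped to zero brings it back to one, and the following put
reports a second "last put".  That corresponds to the duplicate
scsi_device_dev_release() invocations described above.
obj_get_unless_zero() refuses the reference instead.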

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] scsi: fix race condition when removing target
  2017-12-01  8:40             ` Jason Yan
  2017-12-01 14:41               ` Ewan D. Milne
@ 2017-12-01 15:35               ` James Bottomley
  2017-12-05 12:37                 ` Jason Yan
  1 sibling, 1 reply; 29+ messages in thread
From: James Bottomley @ 2017-12-01 15:35 UTC (permalink / raw)
  To: Jason Yan, Bart Van Assche, hch
  Cc: zhaohongjiang, jthumshirn, martin.petersen, hare, linux-scsi,
	gregkh, miaoxie

On Fri, 2017-12-01 at 16:40 +0800, Jason Yan wrote:
> On 2017/12/1 7:56, James Bottomley wrote:
> > b/include/scsi/scsi_device.h
> > index 571ddb49b926..2e4d48d8cd68 100644
> > --- a/include/scsi/scsi_device.h
> > +++ b/include/scsi/scsi_device.h
> > @@ -380,6 +380,23 @@ extern struct scsi_device
> > *__scsi_iterate_devices(struct Scsi_Host *,
> >   #define __shost_for_each_device(sdev, shost) \
> >   	list_for_each_entry((sdev), &((shost)->__devices),
> > siblings)
> > 
> 
> Seems that __shost_for_each_device() is still not safe. scsi device
> been deleted stays in the list and put_device() can be called
> anywhere out of the host lock.

Not if it's used with scsi_device_get().  As I said, I only did a
cursory inspection, so if I've missed a loop, please specify.

The point was more a demonstration of how we could fix the problem if
we don't change get_device().

James

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] scsi: fix race condition when removing target
  2017-12-01 15:35               ` James Bottomley
@ 2017-12-05 12:37                 ` Jason Yan
  2017-12-05 15:37                   ` James Bottomley
  0 siblings, 1 reply; 29+ messages in thread
From: Jason Yan @ 2017-12-05 12:37 UTC (permalink / raw)
  To: James Bottomley, Bart Van Assche, hch
  Cc: zhaohongjiang, jthumshirn, martin.petersen, hare, linux-scsi,
	gregkh, miaoxie



On 2017/12/1 23:35, James Bottomley wrote:
> On Fri, 2017-12-01 at 16:40 +0800, Jason Yan wrote:
>> On 2017/12/1 7:56, James Bottomley wrote:
>>> b/include/scsi/scsi_device.h
>>> index 571ddb49b926..2e4d48d8cd68 100644
>>> --- a/include/scsi/scsi_device.h
>>> +++ b/include/scsi/scsi_device.h
>>> @@ -380,6 +380,23 @@ extern struct scsi_device
>>> *__scsi_iterate_devices(struct Scsi_Host *,
>>>    #define __shost_for_each_device(sdev, shost) \
>>>    	list_for_each_entry((sdev), &((shost)->__devices),
>>> siblings)
>>>
>>
>> Seems that __shost_for_each_device() is still not safe. scsi device
>> been deleted stays in the list and put_device() can be called
>> anywhere out of the host lock.
>
> Not if it's used with scsi_device_get().  As I said, I only did a
> cursory inspection, so if I've missed a loop, please specify.
>
> The point was more a demonstration of how we could fix the problem if
> we don't change get_device().
>
> James
>

Yes, it's OK now. __shost_for_each_device() is not used with
scsi_device_get() yet.

Another problem is that put_device() cannot be called while holding the
host lock, so we need to move all put_device() calls outside the lock. Some
places, such as scsi_device_lookup() and scsi_device_lookup_by_target(), need
rework:

@@ -765,12 +772,22 @@ struct scsi_device *scsi_device_lookup(struct Scsi_Host *shost,
         unsigned long flags;

         spin_lock_irqsave(shost->host_lock, flags);
-       sdev = __scsi_device_lookup(shost, channel, id, lun);
-       if (sdev && scsi_device_get(sdev))
-               sdev = NULL;
+       __sdev_for_each_get(sdev, &shost->__devices, siblings) {
+               spin_unlock_irqrestore(shost->host_lock, flags);
+               if (sdev->sdev_state != SDEV_DEL &&
+                   sdev->channel == channel && sdev->id == id &&
+                   sdev->lun ==lun) {
+                       if (!scsi_device_get(sdev)) {
+                               put_device(&sdev->sdev_gendev);
+                               return sdev;
+                       }
+               }
+               put_device(&sdev->sdev_gendev);
+               spin_lock_irqsave(shost->host_lock, flags);
+       }
         spin_unlock_irqrestore(shost->host_lock, flags);

-       return sdev;
+       return NULL;
  }
  EXPORT_SYMBOL(scsi_device_lookup);

>
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] scsi: fix race condition when removing target
  2017-12-05 12:37                 ` Jason Yan
@ 2017-12-05 15:37                   ` James Bottomley
  2017-12-06  0:41                     ` Jason Yan
  0 siblings, 1 reply; 29+ messages in thread
From: James Bottomley @ 2017-12-05 15:37 UTC (permalink / raw)
  To: Jason Yan, Bart Van Assche, hch
  Cc: zhaohongjiang, jthumshirn, martin.petersen, hare, linux-scsi,
	gregkh, miaoxie

On Tue, 2017-12-05 at 20:37 +0800, Jason Yan wrote:
> 
> On 2017/12/1 23:35, James Bottomley wrote:
> > 
> > On Fri, 2017-12-01 at 16:40 +0800, Jason Yan wrote:
> > > 
> > > On 2017/12/1 7:56, James Bottomley wrote:
> > > > 
> > > > b/include/scsi/scsi_device.h
> > > > index 571ddb49b926..2e4d48d8cd68 100644
> > > > --- a/include/scsi/scsi_device.h
> > > > +++ b/include/scsi/scsi_device.h
> > > > @@ -380,6 +380,23 @@ extern struct scsi_device
> > > > *__scsi_iterate_devices(struct Scsi_Host *,
> > > >    #define __shost_for_each_device(sdev, shost) \
> > > >    	list_for_each_entry((sdev), &((shost)->__devices),
> > > > siblings)
> > > > 
> > > 
> > > Seems that __shost_for_each_device() is still not safe. scsi
> > > device
> > > been deleted stays in the list and put_device() can be called
> > > anywhere out of the host lock.
> > 
> > > Not if it's used with scsi_device_get().  As I said, I only did a
> > > cursory inspection, so if I've missed a loop, please specify.
> > 
> > The point was more a demonstration of how we could fix the problem
> > if we don't change get_device().
> > 
> > James
> > 
> 
> Yes, it's OK now. __shost_for_each_device() is not used with
> scsi_device_get() yet.
> 
> Another problem is that put_device() cannot be called while holding
> the host lock,

Yes it can.  That's one of the design goals of the execute in process
context: you can call it from interrupt context and you can call it
with locks held and we'll return immediately and delay all the
dangerous stuff until we have a process context.

To get the process context to be acquired, the in_interrupt() test must
pass (so the spin lock must be acquired irqsave) ; is that condition
missing anywhere?

James

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] scsi: fix race condition when removing target
  2017-12-05 15:37                   ` James Bottomley
@ 2017-12-06  0:41                     ` Jason Yan
  2017-12-06  2:07                       ` James Bottomley
  0 siblings, 1 reply; 29+ messages in thread
From: Jason Yan @ 2017-12-06  0:41 UTC (permalink / raw)
  To: James Bottomley, Bart Van Assche, hch
  Cc: zhaohongjiang, jthumshirn, martin.petersen, hare, linux-scsi,
	gregkh, miaoxie


On 2017/12/5 23:37, James Bottomley wrote:
> On Tue, 2017-12-05 at 20:37 +0800, Jason Yan wrote:
>>
>> On 2017/12/1 23:35, James Bottomley wrote:
>>>
>>> On Fri, 2017-12-01 at 16:40 +0800, Jason Yan wrote:
>>>>
>>>> On 2017/12/1 7:56, James Bottomley wrote:
>>>>>
>>>>> b/include/scsi/scsi_device.h
>>>>> index 571ddb49b926..2e4d48d8cd68 100644
>>>>> --- a/include/scsi/scsi_device.h
>>>>> +++ b/include/scsi/scsi_device.h
>>>>> @@ -380,6 +380,23 @@ extern struct scsi_device
>>>>> *__scsi_iterate_devices(struct Scsi_Host *,
>>>>>     #define __shost_for_each_device(sdev, shost) \
>>>>>     	list_for_each_entry((sdev), &((shost)->__devices),
>>>>> siblings)
>>>>>
>>>>
>>>> Seems that __shost_for_each_device() is still not safe. scsi
>>>> device
>>>> been deleted stays in the list and put_device() can be called
>>>> anywhere out of the host lock.
>>>
>>> Not if it's used with scsi_device_get().  As I said, I only did a
>>> cursory inspection, so if I've missed a loop, please specify.
>>>
>>> The point was more a demonstration of how we could fix the problem
>>> if we don't change get_device().
>>>
>>> James
>>>
>>
>> Yes, it's OK now. __shost_for_each_device() is not used with
>> scsi_device_get() yet.
>>
>> Another problem is that put_device() cannot be called while holding
>> the host lock,
>
> Yes it can.  That's one of the design goals of the execute in process
> context: you can call it from interrupt context and you can call it
> with locks held and we'll return immediately and delay all the
> dangerous stuff until we have a process context.
>
> To get the process context to be acquired, the in_interrupt() test must
> pass (so the spin lock must be acquired irqsave) ; is that condition
> missing anywhere?
>
> James
>
>

Calling it from interrupt context is OK. I'm talking about calling it from
process context.

Think about this in a process context:
scsi_device_lookup()
    ->spin_lock_irqsave(shost->host_lock, flags);
    ->__scsi_device_lookup()
       ->iterate and kobject_get_unless_zero()
       ->put_device()
          ->scsi_device_dev_release() if the last put
          ->scsi_device_dev_release_usercontext()
             ->acquire the host lock = deadlock

Jason
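
The call chain above can be modelled in userspace.  This is a sketch under
stated assumptions, not kernel code: toy_lock is a hypothetical
non-recursive lock whose acquire reports failure at the point where a real
spinlock would spin forever, and toy_dev, toy_put() and toy_release() are
made-up names standing in for the device, put_device() and the release path
that takes the host lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical non-recursive lock; a real spinlock would hang on re-acquire. */
struct toy_lock {
	bool held;
};

/* Returns false at the point where a real spinlock would deadlock. */
static bool toy_lock_acquire(struct toy_lock *l)
{
	if (l->held)
		return false;
	l->held = true;
	return true;
}

static void toy_lock_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = false;
}

struct toy_dev {
	int refcount;
	struct toy_lock *host_lock;
	bool release_deadlocked;	/* release ran with the lock already held */
	bool released;			/* release completed normally */
};

/* Release path that needs the host lock to unlink the device. */
static void toy_release(struct toy_dev *d)
{
	if (!toy_lock_acquire(d->host_lock)) {
		d->release_deadlocked = true;
		return;
	}
	d->released = true;
	toy_lock_release(d->host_lock);
}

/* Drop a reference; the last put runs the release inline. */
static void toy_put(struct toy_dev *d)
{
	if (--d->refcount == 0)
		toy_release(d);
}

/* Last put issued while the caller still holds the host lock. */
static bool last_put_under_lock(struct toy_dev *d)
{
	if (!toy_lock_acquire(d->host_lock))
		return true;
	toy_put(d);
	toy_lock_release(d->host_lock);
	return d->release_deadlocked;
}

/* Last put issued only after the host lock has been dropped. */
static bool last_put_after_unlock(struct toy_dev *d)
{
	if (!toy_lock_acquire(d->host_lock))
		return true;
	toy_lock_release(d->host_lock);
	toy_put(d);
	return d->release_deadlocked;
}
```

In this model last_put_under_lock() hits the re-acquire whenever the put
inside the locked region is the final one, while last_put_after_unlock()
lets the release take the lock normally.  Whether the real release runs
inline depends on execute_in_process_context(), which the model does not
cover.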

> .
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] scsi: fix race condition when removing target
  2017-12-06  0:41                     ` Jason Yan
@ 2017-12-06  2:07                       ` James Bottomley
  2017-12-06  2:43                         ` Jason Yan
  0 siblings, 1 reply; 29+ messages in thread
From: James Bottomley @ 2017-12-06  2:07 UTC (permalink / raw)
  To: Jason Yan, Bart Van Assche, hch
  Cc: zhaohongjiang, jthumshirn, martin.petersen, hare, linux-scsi,
	gregkh, miaoxie

On Wed, 2017-12-06 at 08:41 +0800, Jason Yan wrote:
> On 2017/12/5 23:37, James Bottomley wrote:
> > 
> > On Tue, 2017-12-05 at 20:37 +0800, Jason Yan wrote:
> > > 
> > > 
> > > On 2017/12/1 23:35, James Bottomley wrote:
> > > > 
> > > > 
> > > > On Fri, 2017-12-01 at 16:40 +0800, Jason Yan wrote:
> > > > > 
> > > > > 
> > > > > On 2017/12/1 7:56, James Bottomley wrote:
> > > > > > 
> > > > > > 
> > > > > > b/include/scsi/scsi_device.h
> > > > > > index 571ddb49b926..2e4d48d8cd68 100644
> > > > > > --- a/include/scsi/scsi_device.h
> > > > > > +++ b/include/scsi/scsi_device.h
> > > > > > @@ -380,6 +380,23 @@ extern struct scsi_device
> > > > > > *__scsi_iterate_devices(struct Scsi_Host *,
> > > > > >     #define __shost_for_each_device(sdev, shost) \
> > > > > >     	list_for_each_entry((sdev), &((shost)-
> > > > > > >__devices),
> > > > > > siblings)
> > > > > > 
> > > > > 
> > > > > Seems that __shost_for_each_device() is still not safe. scsi
> > > > > device
> > > > > been deleted stays in the list and put_device() can be called
> > > > > anywhere out of the host lock.
> > > > 
> > > > Not if it's used with scsi_device_get().  As I said, I only did
> > > > a cursory inspection, so if I've missed a loop, please
> > > > specify.
> > > > 
> > > > The point was more a demonstration of how we could fix the
> > > > problem if we don't change get_device().
> > > > 
> > > > James
> > > > 
> > > 
> > > Yes, it's OK now. __shost_for_each_device() is not used with
> > > scsi_device_get() yet.
> > > 
> > > Another problem is that put_device() cannot be called while
> > > holding the host lock,
> > 
> > Yes it can.  That's one of the design goals of the execute in
> > process context: you can call it from interrupt context and you can
> > call it with locks held and we'll return immediately and delay all
> > the dangerous stuff until we have a process context.
> > 
> > To get the process context to be acquired, the in_interrupt() test
> > must pass (so the spin lock must be acquired irqsave) ; is that
> > condition missing anywhere?
> > 
> > James
> > 
> > 
> 
> Call it from interrupt context is ok. I'm talking about calling it
> from process context.
> 
> Think about this in a process context:
> scsi_device_lookup()
>     ->spin_lock_irqsave(shost->host_lock, flags);
>     ->__scsi_device_lookup()
>        ->iterate and kobject_get_unless_zero()
>        ->put_device()
>           ->scsi_device_dev_release() if the last put
>           ->scsi_device_dev_release_usercontext()
>              ->acquire the host lock = deadlock

execute_in_process_context() is supposed to produce us a context
whenever the local context isn't available, and that's supposed to
include when interrupts are disabled as in spin_lock_irqsave().

So let me ask this another way: have you seen this deadlock (which
would mean we have a bug in execute_in_process_context())?

James

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] scsi: fix race condition when removing target
  2017-12-06  2:07                       ` James Bottomley
@ 2017-12-06  2:43                         ` Jason Yan
  0 siblings, 0 replies; 29+ messages in thread
From: Jason Yan @ 2017-12-06  2:43 UTC (permalink / raw)
  To: James Bottomley, Bart Van Assche, hch
  Cc: zhaohongjiang, jthumshirn, martin.petersen, hare, linux-scsi,
	gregkh, miaoxie


> execute_in_process_context() is supposed to produce us a context
> whenever the local context isn't available, and that's supposed to
> include when interrupts are disabled as in spin_lock_irqsave().
>
> So let me ask this another way: have you seen this deadlock (which
> would mean we have a bug in execute_in_process_context())?
>
> James
>
>

I haven't seen this deadlock, but in_interrupt() does not check whether
interrupts are disabled. Please refer to the definition of
in_interrupt():

#define irq_count()	(preempt_count() & (HARDIRQ_MASK | SOFTIRQ_MASK\
				 | NMI_MASK))
#define in_interrupt()		(irq_count())


Jason
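
The point about in_interrupt() can be checked with a small userspace model.
This is a sketch with assumptions called out: the mask values mirror the
preempt_count layout as I understand it for kernels of this era (8 preempt
bits, 8 softirq bits, 4 hardirq bits, 1 NMI bit), struct toy_cpu and the
toy_*() helpers are hypothetical, and the interrupt-disable state is kept
as a separate flag because it is not part of preempt_count.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed preempt_count layout; verify against the kernel tree in use. */
#define PREEMPT_MASK	0x000000ffU
#define SOFTIRQ_MASK	0x0000ff00U
#define HARDIRQ_MASK	0x000f0000U
#define NMI_MASK	0x00100000U

#define PREEMPT_OFFSET	0x00000001U
#define HARDIRQ_OFFSET	0x00010000U

/* Hypothetical per-CPU state: the counter plus a separate irq-disable flag. */
struct toy_cpu {
	unsigned int preempt_count;
	bool irqs_disabled;
};

/* Same expression as irq_count() quoted above, applied to the model. */
static unsigned int toy_irq_count(const struct toy_cpu *c)
{
	return c->preempt_count & (HARDIRQ_MASK | SOFTIRQ_MASK | NMI_MASK);
}

static bool toy_in_interrupt(const struct toy_cpu *c)
{
	return toy_irq_count(c) != 0;
}

/*
 * Model of spin_lock_irqsave() from process context: interrupts go off and
 * (assuming a kernel that counts preemption) the preempt bits are bumped.
 * No hardirq, softirq or NMI bit is touched.
 */
static void toy_spin_lock_irqsave(struct toy_cpu *c)
{
	c->irqs_disabled = true;
	c->preempt_count += PREEMPT_OFFSET;
}

/*
 * Decision made by execute_in_process_context(): the function is deferred
 * to a workqueue only when in_interrupt() is true, otherwise it runs inline.
 */
static bool toy_would_defer(const struct toy_cpu *c)
{
	return toy_in_interrupt(c);
}
```

In this model a process-context caller that has taken the lock with
irqsave still sees toy_would_defer() return false, so the release function
would run inline with the lock held.  Only an actual hardirq, softirq or
NMI context sets one of the tested bits.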

> .
>

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2017-12-06  2:46 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-29  3:05 [PATCH] scsi: fix race condition when removing target Jason Yan
2017-11-29  7:41 ` Hannes Reinecke
2017-11-29 16:18 ` Bart Van Assche
2017-11-29 16:20   ` hch
2017-11-29 17:39     ` Bart Van Assche
2017-11-30  1:18       ` Jason Yan
2017-11-30 16:08         ` Bart Van Assche
2017-11-30 16:40           ` gregkh
2017-11-30 23:56           ` James Bottomley
2017-12-01  1:12             ` Finn Thain
2017-12-01  8:40             ` Jason Yan
2017-12-01 14:41               ` Ewan D. Milne
2017-12-01 15:35               ` James Bottomley
2017-12-05 12:37                 ` Jason Yan
2017-12-05 15:37                   ` James Bottomley
2017-12-06  0:41                     ` Jason Yan
2017-12-06  2:07                       ` James Bottomley
2017-12-06  2:43                         ` Jason Yan
2017-11-29 17:39     ` gregkh
2017-11-29 18:49       ` Ewan D. Milne
2017-11-29 19:11         ` Bart Van Assche
2017-11-29 19:20           ` Ewan D. Milne
2017-11-29 19:50             ` Bart Van Assche
2017-11-29 17:39   ` gregkh
2017-11-29 17:47     ` Bart Van Assche
2017-11-29 16:31 ` James Bottomley
2017-11-29 16:34   ` Christoph Hellwig
2017-11-29 16:47     ` James Bottomley
2017-11-29 19:05 ` Ewan D. Milne
