* [PATCH] scsi: fix race condition when removing target
@ 2017-11-29  3:05 Jason Yan
  2017-11-29  7:41 ` Hannes Reinecke
                   ` (3 more replies)
  0 siblings, 4 replies; 29+ messages in thread
From: Jason Yan @ 2017-11-29  3:05 UTC
  To: martin.petersen, jejb
  Cc: linux-scsi, Jason Yan, Hannes Reinecke, Christoph Hellwig,
	Johannes Thumshirn, Zhaohongjiang, Miao Xie

In commit fbce4d97fd43 ("scsi: fixup kernel warning during rmmod()"), we
removed scsi_device_get() and directly called get_device() to increase
the refcount of the device. But actually scsi_device_get() will fail in
three cases:
1. the scsi device is in SDEV_DEL or SDEV_CANCEL state
2. get_device() fails
3. the module is not alive

The intended purpose was to remove only the check that the module is alive.
Unfortunately the check of the device state was dropped too. And this
introduced a race condition like this:

CPU0                                                CPU1
__scsi_remove_target()
  ->iterate shost->__devices
  ->scsi_remove_device()
  ->put_device()
      someone still holds a refcount
                                                    sd_release()
                                                      ->scsi_disk_put()
                                                      ->put_device() last put and trigger the device release

  ->goto restart
  ->iterate shost->__devices and got the same device
  ->get_device() while refcount is 0
  ->scsi_remove_device()
  ->put_device() refcount decreased to 0 again
      ->scsi_device_dev_release()
      ->scsi_device_dev_release_usercontext()

                                                    ->scsi_device_dev_release()
                                                    ->scsi_device_dev_release_usercontext()

The same scsi device will be found again because it is in the shost->__devices
list until scsi_device_dev_release_usercontext() is called, although the device
state was set to SDEV_DEL after the first scsi_remove_device().

Finally we got an oops in scsi_device_dev_release_usercontext() when it was
called the second time.

Call trace:
[<ffff0000086bc624>] scsi_device_dev_release_usercontext+0x7c/0x1c0
[<ffff0000080f1f90>] execute_in_process_context+0x70/0x80
[<ffff0000086bc598>] scsi_device_dev_release+0x28/0x38
[<ffff0000086662cc>] device_release+0x3c/0xa0
[<ffff000008c2e780>] kobject_put+0x80/0xf0
[<ffff0000086666fc>] put_device+0x24/0x30
[<ffff0000086aeee0>] scsi_device_put+0x30/0x40
[<ffff000008704894>] scsi_disk_put+0x44/0x60
[<ffff000008704a50>] sd_release+0x50/0x80
[<ffff0000082bc704>] __blkdev_put+0x21c/0x230
[<ffff0000082bcb2c>] blkdev_put+0x54/0x118
[<ffff0000082bcc1c>] blkdev_close+0x2c/0x40
[<ffff000008279b64>] __fput+0x94/0x1d8
[<ffff000008279d20>] ____fput+0x20/0x30
[<ffff0000080f6f54>] task_work_run+0x9c/0xb8
[<ffff0000080dba64>] do_exit+0x2b4/0x9f8
[<ffff0000080dc234>] do_group_exit+0x3c/0xa0
[<ffff0000080dc2b8>] __wake_up_parent+0x0/0x40

And sometimes in __scsi_remove_target() it will loop for a long time
removing the same device if someone else is holding a refcount, until the
last refcount is released.

Notice that if CONFIG_REFCOUNT_FULL is enabled this race won't be triggered
because the full refcount implementation will prevent the refcount from
increasing when it is 0.

Fix this by checking the sdev_state again like we did before in
scsi_device_get(). Then when iterating shost again we will skip the deleted
device because scsi_remove_device() will set the device state to
SDEV_CANCEL or SDEV_DEL.

Fixes: fbce4d97fd43 ("scsi: fixup kernel warning during rmmod()")
Signed-off-by: Jason Yan <yanaijie@huawei.com>
CC: Hannes Reinecke <hare@suse.de>
CC: Christoph Hellwig <hch@lst.de>
CC: Johannes Thumshirn <jthumshirn@suse.de>
CC: Zhaohongjiang <zhaohongjiang@huawei.com>
CC: Miao Xie <miaoxie@huawei.com>
---
 drivers/scsi/scsi_sysfs.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 50e7d7e..d398894 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -1398,6 +1398,15 @@ void scsi_remove_device(struct scsi_device *sdev)
 }
 EXPORT_SYMBOL(scsi_remove_device);
 
+static int scsi_device_get_not_deleted(struct scsi_device *sdev)
+{
+	if (sdev->sdev_state == SDEV_DEL || sdev->sdev_state == SDEV_CANCEL)
+		return -ENXIO;
+	if (!get_device(&sdev->sdev_gendev))
+		return -ENXIO;
+	return 0;
+}
+
 static void __scsi_remove_target(struct scsi_target *starget)
 {
 	struct Scsi_Host *shost = dev_to_shost(starget->dev.parent);
@@ -1415,7 +1424,7 @@ static void __scsi_remove_target(struct scsi_target *starget)
 		 */
 		if (sdev->channel != starget->channel ||
 		    sdev->id != starget->id ||
-		    !get_device(&sdev->sdev_gendev))
+		    scsi_device_get_not_deleted(sdev))
 			continue;
 		spin_unlock_irqrestore(shost->host_lock, flags);
 		scsi_remove_device(sdev);
-- 
2.9.5

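
The state check that the patch restores can be modelled outside the kernel. The following is only a minimal userspace sketch under stated assumptions: `struct sdev_model`, `model_get()` and `model_get_not_deleted()` are hypothetical stand-ins invented for illustration, and `model_get()` imitates a reference count that increments unconditionally, as described above for kernels without CONFIG_REFCOUNT_FULL.

```c
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical stand-ins for the relevant scsi_device states. */
enum sdev_state_model { MODEL_RUNNING, MODEL_CANCEL, MODEL_DEL };

struct sdev_model {
	enum sdev_state_model state;
	atomic_int refcount;
};

/* Unconditional get: happily increments even when the count is 0. */
void model_get(struct sdev_model *sdev)
{
	atomic_fetch_add(&sdev->refcount, 1);
}

/*
 * Mirrors the shape of scsi_device_get_not_deleted() from the patch:
 * refuse devices that are already being removed, otherwise take a
 * reference. Returns 0 on success and -ENXIO on refusal.
 */
int model_get_not_deleted(struct sdev_model *sdev)
{
	if (sdev->state == MODEL_DEL || sdev->state == MODEL_CANCEL)
		return -ENXIO;
	model_get(sdev);
	return 0;
}
```

In this model, a second pass over the list skips any device whose state was already moved to cancel or deleted, so its reference count is never touched again after the first removal.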
* Re: [PATCH] scsi: fix race condition when removing target
  2017-11-29  3:05 [PATCH] scsi: fix race condition when removing target Jason Yan
@ 2017-11-29  7:41 ` Hannes Reinecke
  2017-11-29 16:18 ` Bart Van Assche
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 29+ messages in thread
From: Hannes Reinecke @ 2017-11-29  7:41 UTC
  To: Jason Yan, martin.petersen, jejb
  Cc: linux-scsi, Christoph Hellwig, Johannes Thumshirn, Zhaohongjiang,
	Miao Xie

On 11/29/2017 04:05 AM, Jason Yan wrote:
> In commit fbce4d97fd43 ("scsi: fixup kernel warning during rmmod()"), we
> removed scsi_device_get() and directly called get_device() to increase
> the refcount of the device. But actually scsi_device_get() will fail in
> three cases:
> 1. the scsi device is in SDEV_DEL or SDEV_CANCEL state
> 2. get_device() fails
> 3. the module is not alive
>
> The intended purpose was to remove only the check that the module is alive.
> Unfortunately the check of the device state was dropped too. And this
> introduced a race condition like this:
>
> CPU0                                                CPU1
> __scsi_remove_target()
>   ->iterate shost->__devices
>   ->scsi_remove_device()
>   ->put_device()
>       someone still holds a refcount
>                                                     sd_release()
>                                                       ->scsi_disk_put()
>                                                       ->put_device() last put and trigger the device release
>
>   ->goto restart
>   ->iterate shost->__devices and got the same device
>   ->get_device() while refcount is 0
>   ->scsi_remove_device()
>   ->put_device() refcount decreased to 0 again
>       ->scsi_device_dev_release()
>       ->scsi_device_dev_release_usercontext()
>
>                                                     ->scsi_device_dev_release()
>                                                     ->scsi_device_dev_release_usercontext()
>
> The same scsi device will be found again because it is in the shost->__devices
> list until scsi_device_dev_release_usercontext() is called, although the device
> state was set to SDEV_DEL after the first scsi_remove_device().
>
> Finally we got an oops in scsi_device_dev_release_usercontext() when it was
> called the second time.
>
> Call trace:
> [<ffff0000086bc624>] scsi_device_dev_release_usercontext+0x7c/0x1c0
> [<ffff0000080f1f90>] execute_in_process_context+0x70/0x80
> [<ffff0000086bc598>] scsi_device_dev_release+0x28/0x38
> [<ffff0000086662cc>] device_release+0x3c/0xa0
> [<ffff000008c2e780>] kobject_put+0x80/0xf0
> [<ffff0000086666fc>] put_device+0x24/0x30
> [<ffff0000086aeee0>] scsi_device_put+0x30/0x40
> [<ffff000008704894>] scsi_disk_put+0x44/0x60
> [<ffff000008704a50>] sd_release+0x50/0x80
> [<ffff0000082bc704>] __blkdev_put+0x21c/0x230
> [<ffff0000082bcb2c>] blkdev_put+0x54/0x118
> [<ffff0000082bcc1c>] blkdev_close+0x2c/0x40
> [<ffff000008279b64>] __fput+0x94/0x1d8
> [<ffff000008279d20>] ____fput+0x20/0x30
> [<ffff0000080f6f54>] task_work_run+0x9c/0xb8
> [<ffff0000080dba64>] do_exit+0x2b4/0x9f8
> [<ffff0000080dc234>] do_group_exit+0x3c/0xa0
> [<ffff0000080dc2b8>] __wake_up_parent+0x0/0x40
>
> And sometimes in __scsi_remove_target() it will loop for a long time
> removing the same device if someone else is holding a refcount, until the
> last refcount is released.
>
> Notice that if CONFIG_REFCOUNT_FULL is enabled this race won't be triggered
> because the full refcount implementation will prevent the refcount from
> increasing when it is 0.
>
> Fix this by checking the sdev_state again like we did before in
> scsi_device_get(). Then when iterating shost again we will skip the deleted
> device because scsi_remove_device() will set the device state to
> SDEV_CANCEL or SDEV_DEL.
>
> Fixes: fbce4d97fd43 ("scsi: fixup kernel warning during rmmod()")
> Signed-off-by: Jason Yan <yanaijie@huawei.com>
> CC: Hannes Reinecke <hare@suse.de>
> CC: Christoph Hellwig <hch@lst.de>
> CC: Johannes Thumshirn <jthumshirn@suse.de>
> CC: Zhaohongjiang <zhaohongjiang@huawei.com>
> CC: Miao Xie <miaoxie@huawei.com>
> ---
>  drivers/scsi/scsi_sysfs.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
> index 50e7d7e..d398894 100644
> --- a/drivers/scsi/scsi_sysfs.c
> +++ b/drivers/scsi/scsi_sysfs.c
> @@ -1398,6 +1398,15 @@ void scsi_remove_device(struct scsi_device *sdev)
>  }
>  EXPORT_SYMBOL(scsi_remove_device);
>  
> +static int scsi_device_get_not_deleted(struct scsi_device *sdev)
> +{
> +	if (sdev->sdev_state == SDEV_DEL || sdev->sdev_state == SDEV_CANCEL)
> +		return -ENXIO;
> +	if (!get_device(&sdev->sdev_gendev))
> +		return -ENXIO;
> +	return 0;
> +}
> +
>  static void __scsi_remove_target(struct scsi_target *starget)
>  {
>  	struct Scsi_Host *shost = dev_to_shost(starget->dev.parent);
> @@ -1415,7 +1424,7 @@ static void __scsi_remove_target(struct scsi_target *starget)
>  		 */
>  		if (sdev->channel != starget->channel ||
>  		    sdev->id != starget->id ||
> -		    !get_device(&sdev->sdev_gendev))
> +		    scsi_device_get_not_deleted(sdev))
>  			continue;
>  		spin_unlock_irqrestore(shost->host_lock, flags);
>  		scsi_remove_device(sdev);
>
Reviewed-by: Hannes Reinecke <hare@suse.com>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

* Re: [PATCH] scsi: fix race condition when removing target
  2017-11-29  3:05 [PATCH] scsi: fix race condition when removing target Jason Yan
  2017-11-29  7:41 ` Hannes Reinecke
@ 2017-11-29 16:18 ` Bart Van Assche
  2017-11-29 16:20   ` hch
  2017-11-29 17:39   ` gregkh
  2017-11-29 16:31 ` James Bottomley
  2017-11-29 19:05 ` Ewan D. Milne
  3 siblings, 2 replies; 29+ messages in thread
From: Bart Van Assche @ 2017-11-29 16:18 UTC
  To: gregkh
  Cc: zhaohongjiang, jthumshirn, hch, martin.petersen, hare,
	linux-scsi, yanaijie, jejb, miaoxie

On Wed, 2017-11-29 at 11:05 +0800, Jason Yan wrote:
> In commit fbce4d97fd43 ("scsi: fixup kernel warning during rmmod()"), we
> removed scsi_device_get() and directly called get_device() to increase
> the refcount of the device. But actually scsi_device_get() will fail in
> three cases:
> 1. the scsi device is in SDEV_DEL or SDEV_CANCEL state
> 2. get_device() fails
> 3. the module is not alive
>
> The intended purpose was to remove only the check that the module is alive.
> Unfortunately the check of the device state was dropped too. And this
> introduced a race condition like this:
>
> CPU0                                                CPU1
> __scsi_remove_target()
>   ->iterate shost->__devices
>   ->scsi_remove_device()
>   ->put_device()
>       someone still holds a refcount
>                                                     sd_release()
>                                                       ->scsi_disk_put()
>                                                       ->put_device() last put and trigger the device release
>
>   ->goto restart
>   ->iterate shost->__devices and got the same device
>   ->get_device() while refcount is 0
>   ->scsi_remove_device()
>   ->put_device() refcount decreased to 0 again
>       ->scsi_device_dev_release()
>       ->scsi_device_dev_release_usercontext()
>
>                                                     ->scsi_device_dev_release()
>                                                     ->scsi_device_dev_release_usercontext()
>
> The same scsi device will be found again because it is in the shost->__devices
> list until scsi_device_dev_release_usercontext() is called, although the device
> state was set to SDEV_DEL after the first scsi_remove_device().
>
> Finally we got an oops in scsi_device_dev_release_usercontext() when it was
> called the second time.
>
> Call trace:
> [<ffff0000086bc624>] scsi_device_dev_release_usercontext+0x7c/0x1c0
> [<ffff0000080f1f90>] execute_in_process_context+0x70/0x80
> [<ffff0000086bc598>] scsi_device_dev_release+0x28/0x38
> [<ffff0000086662cc>] device_release+0x3c/0xa0
> [<ffff000008c2e780>] kobject_put+0x80/0xf0
> [<ffff0000086666fc>] put_device+0x24/0x30
> [<ffff0000086aeee0>] scsi_device_put+0x30/0x40
> [<ffff000008704894>] scsi_disk_put+0x44/0x60
> [<ffff000008704a50>] sd_release+0x50/0x80
> [<ffff0000082bc704>] __blkdev_put+0x21c/0x230
> [<ffff0000082bcb2c>] blkdev_put+0x54/0x118
> [<ffff0000082bcc1c>] blkdev_close+0x2c/0x40
> [<ffff000008279b64>] __fput+0x94/0x1d8
> [<ffff000008279d20>] ____fput+0x20/0x30
> [<ffff0000080f6f54>] task_work_run+0x9c/0xb8
> [<ffff0000080dba64>] do_exit+0x2b4/0x9f8
> [<ffff0000080dc234>] do_group_exit+0x3c/0xa0
> [<ffff0000080dc2b8>] __wake_up_parent+0x0/0x40
>
> And sometimes in __scsi_remove_target() it will loop for a long time
> removing the same device if someone else is holding a refcount, until the
> last refcount is released.
>
> Notice that if CONFIG_REFCOUNT_FULL is enabled this race won't be triggered
> because the full refcount implementation will prevent the refcount from
> increasing when it is 0.
>
> Fix this by checking the sdev_state again like we did before in
> scsi_device_get(). Then when iterating shost again we will skip the deleted
> device because scsi_remove_device() will set the device state to
> SDEV_CANCEL or SDEV_DEL.
>
> Fixes: fbce4d97fd43 ("scsi: fixup kernel warning during rmmod()")
> Signed-off-by: Jason Yan <yanaijie@huawei.com>
> CC: Hannes Reinecke <hare@suse.de>
> CC: Christoph Hellwig <hch@lst.de>
> CC: Johannes Thumshirn <jthumshirn@suse.de>
> CC: Zhaohongjiang <zhaohongjiang@huawei.com>
> CC: Miao Xie <miaoxie@huawei.com>
> ---
>  drivers/scsi/scsi_sysfs.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
> index 50e7d7e..d398894 100644
> --- a/drivers/scsi/scsi_sysfs.c
> +++ b/drivers/scsi/scsi_sysfs.c
> @@ -1398,6 +1398,15 @@ void scsi_remove_device(struct scsi_device *sdev)
>  }
>  EXPORT_SYMBOL(scsi_remove_device);
>  
> +static int scsi_device_get_not_deleted(struct scsi_device *sdev)
> +{
> +	if (sdev->sdev_state == SDEV_DEL || sdev->sdev_state == SDEV_CANCEL)
> +		return -ENXIO;
> +	if (!get_device(&sdev->sdev_gendev))
> +		return -ENXIO;
> +	return 0;
> +}
> +
>  static void __scsi_remove_target(struct scsi_target *starget)
>  {
>  	struct Scsi_Host *shost = dev_to_shost(starget->dev.parent);
> @@ -1415,7 +1424,7 @@ static void __scsi_remove_target(struct scsi_target *starget)
>  		 */
>  		if (sdev->channel != starget->channel ||
>  		    sdev->id != starget->id ||
> -		    !get_device(&sdev->sdev_gendev))
> +		    scsi_device_get_not_deleted(sdev))
>  			continue;
>  		spin_unlock_irqrestore(shost->host_lock, flags);
>  		scsi_remove_device(sdev);

Hi Greg,

As the above patch description shows, it can happen that the SCSI core calls
get_device() after the device reference count has reached zero and before
the memory for struct device is freed. Although the above patch looks fine
to me, would you consider it acceptable to modify get_device() such that it
uses kobject_get_unless_zero() instead of kobject_get()? I'm asking this
because that change would help to reduce the complexity of the already too
complicated SCSI core.

Thanks,

Bart.

* Re: [PATCH] scsi: fix race condition when removing target
  2017-11-29 16:18 ` Bart Van Assche
@ 2017-11-29 16:20   ` hch
  2017-11-29 17:39     ` Bart Van Assche
  2017-11-29 17:39     ` gregkh
  2017-11-29 17:39   ` gregkh
  1 sibling, 2 replies; 29+ messages in thread
From: hch @ 2017-11-29 16:20 UTC
  To: Bart Van Assche
  Cc: gregkh, zhaohongjiang, jthumshirn, hch, martin.petersen, hare,
	linux-scsi, yanaijie, jejb, miaoxie

On Wed, Nov 29, 2017 at 04:18:30PM +0000, Bart Van Assche wrote:
> As the above patch description shows, it can happen that the SCSI core calls
> get_device() after the device reference count has reached zero and before
> the memory for struct device is freed. Although the above patch looks fine
> to me, would you consider it acceptable to modify get_device() such that it
> uses kobject_get_unless_zero() instead of kobject_get()? I'm asking this
> because that change would help to reduce the complexity of the already too
> complicated SCSI core.

I don't think we can just modify get_device, but we can add a new
get_device_unless_zero.  In fact I have an open coded variant of that
in nvme, and was planning to submit one for the current merge window.

* Re: [PATCH] scsi: fix race condition when removing target
  2017-11-29 16:20   ` hch
@ 2017-11-29 17:39     ` Bart Van Assche
  2017-11-30  1:18       ` Jason Yan
  2017-11-29 17:39     ` gregkh
  1 sibling, 1 reply; 29+ messages in thread
From: Bart Van Assche @ 2017-11-29 17:39 UTC
  To: hch
  Cc: zhaohongjiang, jthumshirn, martin.petersen, hare, linux-scsi,
	gregkh, yanaijie, jejb, miaoxie

On Wed, 2017-11-29 at 17:20 +0100, hch@lst.de wrote:
> On Wed, Nov 29, 2017 at 04:18:30PM +0000, Bart Van Assche wrote:
> > As the above patch description shows, it can happen that the SCSI core calls
> > get_device() after the device reference count has reached zero and before
> > the memory for struct device is freed. Although the above patch looks fine
> > to me, would you consider it acceptable to modify get_device() such that it
> > uses kobject_get_unless_zero() instead of kobject_get()? I'm asking this
> > because that change would help to reduce the complexity of the already too
> > complicated SCSI core.
> 
> I don't think we can just modify get_device, but we can add a new
> get_device_unless_zero.  In fact I have an open coded variant of that
> in nvme, and was planning to submit one for the current merge window.

Sorry but I don't see why we can't modify get_device()? Can you explain why
you think that something like the patch below is wrong?

Thanks,

Bart.


[PATCH] Make it safe to use get_device() if the reference count is zero

---
 drivers/base/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 110230d86527..049a5d9dba8a 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -1916,7 +1916,7 @@ EXPORT_SYMBOL_GPL(device_register);
  */
 struct device *get_device(struct device *dev)
 {
-	return dev ? kobj_to_dev(kobject_get(&dev->kobj)) : NULL;
+	return dev && kobject_get_unless_zero(&dev->kobj) ? dev : NULL;
 }
 EXPORT_SYMBOL_GPL(get_device);
 

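
The "get unless zero" idea under discussion can be illustrated with C11 atomics. This is a hedged userspace sketch, not the kernel's implementation: `ref_get_unless_zero()` is a hypothetical name, and the compare-and-swap loop is only one common way to express "increment the counter only if it is currently nonzero".

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Take a reference only if the object is still alive (count != 0).
 * Returns true if the count was incremented, false if it was already 0.
 */
bool ref_get_unless_zero(atomic_int *ref)
{
	int old = atomic_load(ref);

	while (old != 0) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(ref, &old, old + 1))
			return true;
	}
	return false;
}
```

With a helper of this shape, a caller that finds an object whose last reference has already been dropped gets a clear failure instead of resurrecting the count from 0 to 1, which is the step that leads to the double release described in the original patch.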
* Re: [PATCH] scsi: fix race condition when removing target
  2017-11-29 17:39     ` Bart Van Assche
@ 2017-11-30  1:18       ` Jason Yan
  2017-11-30 16:08         ` Bart Van Assche
  0 siblings, 1 reply; 29+ messages in thread
From: Jason Yan @ 2017-11-30  1:18 UTC
  To: Bart Van Assche, hch
  Cc: zhaohongjiang, jthumshirn, martin.petersen, hare, linux-scsi,
	gregkh, jejb, miaoxie

On 2017/11/30 1:39, Bart Van Assche wrote:
> On Wed, 2017-11-29 at 17:20 +0100, hch@lst.de wrote:
>> On Wed, Nov 29, 2017 at 04:18:30PM +0000, Bart Van Assche wrote:
>>> As the above patch description shows, it can happen that the SCSI core calls
>>> get_device() after the device reference count has reached zero and before
>>> the memory for struct device is freed. Although the above patch looks fine
>>> to me, would you consider it acceptable to modify get_device() such that it
>>> uses kobject_get_unless_zero() instead of kobject_get()? I'm asking this
>>> because that change would help to reduce the complexity of the already too
>>> complicated SCSI core.
>>
>> I don't think we can just modify get_device, but we can add a new
>> get_device_unless_zero.  In fact I have an open coded variant of that
>> in nvme, and was planning to submit one for the current merge window.
>
> Sorry but I don't see why we can't modify get_device()? Can you explain why
> you think that something like the patch below is wrong?
>
> Thanks,
>
> Bart.
>

Hi Bart, I chose the approach in my patch because it has been used in
scsi_device_get() for years and has proven safe. I think using
kobject_get_unless_zero() is safe here and can fix this issue too. And
this approach is beneficial to all users.

>
> [PATCH] Make it safe to use get_device() if the reference count is zero
>
> ---
>  drivers/base/core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index 110230d86527..049a5d9dba8a 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -1916,7 +1916,7 @@ EXPORT_SYMBOL_GPL(device_register);
>   */
>  struct device *get_device(struct device *dev)
>  {
> -	return dev ? kobj_to_dev(kobject_get(&dev->kobj)) : NULL;
> +	return dev && kobject_get_unless_zero(&dev->kobj) ? dev : NULL;
>  }
>  EXPORT_SYMBOL_GPL(get_device);
>

* Re: [PATCH] scsi: fix race condition when removing target
  2017-11-30  1:18       ` Jason Yan
@ 2017-11-30 16:08         ` Bart Van Assche
  2017-11-30 16:40           ` gregkh
  2017-11-30 23:56           ` James Bottomley
  0 siblings, 2 replies; 29+ messages in thread
From: Bart Van Assche @ 2017-11-30 16:08 UTC
  To: hch, yanaijie
  Cc: zhaohongjiang, jthumshirn, martin.petersen, hare, linux-scsi,
	gregkh, jejb, miaoxie

On Thu, 2017-11-30 at 09:18 +0800, Jason Yan wrote:
> Hi Bart, I chose the approach in my patch because it has been used in
> scsi_device_get() for years and has proven safe. I think using
> kobject_get_unless_zero() is safe here and can fix this issue too. And
> this approach is beneficial to all users.

Hello Jason,

A possible approach is that we start with your patch and defer any get_device()
changes until after your patch has been applied.

Bart.

* Re: [PATCH] scsi: fix race condition when removing target
  2017-11-30 16:08         ` Bart Van Assche
@ 2017-11-30 16:40           ` gregkh
  2017-11-30 23:56           ` James Bottomley
  1 sibling, 0 replies; 29+ messages in thread
From: gregkh @ 2017-11-30 16:40 UTC
  To: Bart Van Assche
  Cc: hch, yanaijie, zhaohongjiang, jthumshirn, martin.petersen, hare,
	linux-scsi, jejb, miaoxie

On Thu, Nov 30, 2017 at 04:08:38PM +0000, Bart Van Assche wrote:
> On Thu, 2017-11-30 at 09:18 +0800, Jason Yan wrote:
> > Hi Bart, I chose the approach in my patch because it has been used in
> > scsi_device_get() for years and has proven safe. I think using
> > kobject_get_unless_zero() is safe here and can fix this issue too. And
> > this approach is beneficial to all users.
> 
> Hello Jason,
> 
> A possible approach is that we start with your patch and defer any get_device()
> changes until after your patch has been applied.

That might be good, I don't have the chance to look at any driver core
changes until Monday as I'm on the road until then, sorry...

greg k-h

* Re: [PATCH] scsi: fix race condition when removing target
  2017-11-30 16:08         ` Bart Van Assche
  2017-11-30 16:40           ` gregkh
@ 2017-11-30 23:56           ` James Bottomley
  2017-12-01  1:12             ` Finn Thain
  2017-12-01  8:40             ` Jason Yan
  1 sibling, 2 replies; 29+ messages in thread
From: James Bottomley @ 2017-11-30 23:56 UTC
  To: Bart Van Assche, hch, yanaijie
  Cc: zhaohongjiang, jthumshirn, martin.petersen, hare, linux-scsi,
	gregkh, miaoxie

On Thu, 2017-11-30 at 16:08 +0000, Bart Van Assche wrote:
> On Thu, 2017-11-30 at 09:18 +0800, Jason Yan wrote:
> > 
> > Hi Bart, I chose the approach in my patch because it has been used
> > in scsi_device_get() for years and has proven safe. I think using
> > kobject_get_unless_zero() is safe here and can fix this issue too.
> > And this approach is beneficial to all users.
> 
> Hello Jason,
> 
> A possible approach is that we start with your patch and defer any
> get_device() changes until after your patch has been applied.

It's possible, but not quite good enough: the same race can be produced
with any of our sdev lists that are deleted in the release callback,
because there could be a released device on any one of them.  The only
way to mediate it properly is to get a reference in the iterator using
kobject_get_unless_zero().

It's a bit like a huge can of worms: there's another problem every time
I look.  However, this is something like the mechanism that could work
(and if get_device() ever gets fixed, we can put it in place of
kobject_get_unless_zero()).

James

---

diff --git a/drivers/scsi/53c700.c b/drivers/scsi/53c700.c
index 6be77b3aa8a5..c3246f26c02c 100644
--- a/drivers/scsi/53c700.c
+++ b/drivers/scsi/53c700.c
@@ -1169,6 +1169,7 @@ process_script_interrupt(__u32 dsps, __u32 dsp, struct scsi_cmnd *SCp,
 
 
 			}
+			put_device(&SDp->sdev_gendev);
 		} else if(dsps == A_RESELECTED_DURING_SELECTION) {
 
 			/* This section is full of debugging code because I've
diff --git a/drivers/scsi/esp_scsi.c b/drivers/scsi/esp_scsi.c
index c3fc34b9964d..7736f3fb2501 100644
--- a/drivers/scsi/esp_scsi.c
+++ b/drivers/scsi/esp_scsi.c
@@ -1198,6 +1198,7 @@ static int esp_reconnect(struct esp *esp)
 		goto do_reset;
 	}
 	lp = dev->hostdata;
+	put_device(&dev->sdev_gendev);
 
 	ent = lp->non_tagged_cmd;
 	if (!ent) {
diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index a7e4fba724b7..c96c11716152 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -677,11 +677,10 @@ struct scsi_device *__scsi_device_lookup_by_target(struct scsi_target *starget,
 {
 	struct scsi_device *sdev;
 
-	list_for_each_entry(sdev, &starget->devices, same_target_siblings) {
-		if (sdev->sdev_state == SDEV_DEL)
-			continue;
-		if (sdev->lun ==lun)
+	__sdev_for_each_get(sdev, &starget->devices, same_target_siblings) {
+		if (sdev->sdev_state != SDEV_DEL && sdev->lun ==lun)
 			return sdev;
+		put_device(&sdev->sdev_gendev);
 	}
 
 	return NULL;
@@ -700,15 +699,16 @@ EXPORT_SYMBOL(__scsi_device_lookup_by_target);
 struct scsi_device *scsi_device_lookup_by_target(struct scsi_target *starget,
 						 u64 lun)
 {
-	struct scsi_device *sdev;
+	struct scsi_device *sdev, *sdev_copy;
 	struct Scsi_Host *shost = dev_to_shost(starget->dev.parent);
 	unsigned long flags;
 
 	spin_lock_irqsave(shost->host_lock, flags);
-	sdev = __scsi_device_lookup_by_target(starget, lun);
+	sdev_copy = sdev = __scsi_device_lookup_by_target(starget, lun);
+	spin_unlock_irqrestore(shost->host_lock, flags);
 	if (sdev && scsi_device_get(sdev))
 		sdev = NULL;
-	spin_unlock_irqrestore(shost->host_lock, flags);
+	put_device(&sdev_copy->sdev_gendev);
 
 	return sdev;
 }
@@ -735,12 +735,12 @@ struct scsi_device *__scsi_device_lookup(struct Scsi_Host *shost,
 {
 	struct scsi_device *sdev;
 
-	list_for_each_entry(sdev, &shost->__devices, siblings) {
-		if (sdev->sdev_state == SDEV_DEL)
-			continue;
-		if (sdev->channel == channel && sdev->id == id &&
-				sdev->lun ==lun)
+	__sdev_for_each_get(sdev, &shost->__devices, siblings) {
+		if (sdev->sdev_state != SDEV_DEL &&
+		    sdev->channel == channel && sdev->id == id &&
+		    sdev->lun ==lun)
 			return sdev;
+		put_device(&sdev->sdev_gendev);
 	}
 
 	return NULL;
@@ -761,14 +761,15 @@ EXPORT_SYMBOL(__scsi_device_lookup);
 struct scsi_device *scsi_device_lookup(struct Scsi_Host *shost,
 		uint channel, uint id, u64 lun)
 {
-	struct scsi_device *sdev;
+	struct scsi_device *sdev, *sdev_copy;
 	unsigned long flags;
 
 	spin_lock_irqsave(shost->host_lock, flags);
-	sdev = __scsi_device_lookup(shost, channel, id, lun);
+	sdev_copy = sdev = __scsi_device_lookup(shost, channel, id, lun);
+	spin_unlock_irqrestore(shost->host_lock, flags);
 	if (sdev && scsi_device_get(sdev))
 		sdev = NULL;
-	spin_unlock_irqrestore(shost->host_lock, flags);
+	put_device(&sdev_copy->sdev_gendev);
 
 	return sdev;
 }
diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index 40124648a07b..cddd5a93e962 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -1870,11 +1870,14 @@ void scsi_forget_host(struct Scsi_Host *shost)
 
  restart:
 	spin_lock_irqsave(shost->host_lock, flags);
-	list_for_each_entry(sdev, &shost->__devices, siblings) {
-		if (sdev->sdev_state == SDEV_DEL)
+	__sdev_for_each_get(sdev, &shost->__devices, siblings) {
+		if (sdev->sdev_state == SDEV_DEL) {
+			put_device(&sdev->sdev_gendev);
 			continue;
+		}
 		spin_unlock_irqrestore(shost->host_lock, flags);
 		__scsi_remove_device(sdev);
+		put_device(&sdev->sdev_gendev);
 		goto restart;
 	}
 	spin_unlock_irqrestore(shost->host_lock, flags);
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index f796bd61f3f0..380404ec49cd 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -1375,17 +1375,7 @@ static void __scsi_remove_target(struct scsi_target *starget)
 
 	spin_lock_irqsave(shost->host_lock, flags);
  restart:
-	list_for_each_entry(sdev, &shost->__devices, siblings) {
-		/*
-		 * We cannot call scsi_device_get() here, as
-		 * we might've been called from rmmod() causing
-		 * scsi_device_get() to fail the module_is_live()
-		 * check.
-		 */
-		if (sdev->channel != starget->channel ||
-		    sdev->id != starget->id ||
-		    !get_device(&sdev->sdev_gendev))
-			continue;
+	__sdev_for_each_get(sdev, &starget->devices, same_target_siblings) {
 		spin_unlock_irqrestore(shost->host_lock, flags);
 		scsi_remove_device(sdev);
 		put_device(&sdev->sdev_gendev);
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index 571ddb49b926..2e4d48d8cd68 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -380,6 +380,23 @@ extern struct scsi_device *__scsi_iterate_devices(struct Scsi_Host *,
 #define __shost_for_each_device(sdev, shost) \
 	list_for_each_entry((sdev), &((shost)->__devices), siblings)
 
+/**
+ * __sdev_list_for_each_get - get a reference to each element
+ * @sdev: the scsi device to use in the body
+ * @head: the head of the list
+ * @list: the element (sdev->list) containing list members
+ *
+ * Iterator that only executes the body if it can obtain a reference
+ * to the element.  This closes a race where the device release can
+ * have been called, but the element is still on the lists.
+ *
+ * The lock protecting the list (the host lock) must be held before
+ * calling this iterator
+ */
+#define __sdev_for_each_get(sdev, head, list) \
+	list_for_each_entry(sdev, head, list)  \
+		if (kobject_get_unless_zero(&sdev->sdev_gendev.kobj))
+
 extern int scsi_change_queue_depth(struct scsi_device *, int);
 extern int scsi_track_queue_full(struct scsi_device *, int);
 

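
The lookup pattern in the proposal above (take a reference on every visited element, return it on a match, drop it otherwise) can be sketched in plain userspace C. Everything here is an assumption made for illustration: `struct obj_model`, `obj_get_unless_zero()`, `obj_put()` and `lookup_get()` are hypothetical names, an array stands in for the kernel list, and no locking is modelled.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical element: refcount, "deleted" state flag, and a LUN. */
struct obj_model {
	atomic_int ref;
	int deleted;
	int lun;
};

bool obj_get_unless_zero(struct obj_model *o)
{
	int old = atomic_load(&o->ref);

	while (old != 0) {
		if (atomic_compare_exchange_weak(&o->ref, &old, old + 1))
			return true;
	}
	return false;
}

void obj_put(struct obj_model *o)
{
	atomic_fetch_sub(&o->ref, 1);
}

/*
 * Walk the container; skip elements whose count already hit zero,
 * hold a reference while inspecting the rest, and hand the reference
 * to the caller on a match.
 */
struct obj_model *lookup_get(struct obj_model *objs, size_t n, int lun)
{
	for (size_t i = 0; i < n; i++) {
		struct obj_model *o = &objs[i];

		if (!obj_get_unless_zero(o))
			continue;	/* released, but still on the list */
		if (!o->deleted && o->lun == lun)
			return o;	/* caller now owns one reference */
		obj_put(o);		/* not a match: drop our reference */
	}
	return NULL;
}
```

Note the consequence for callers, which the driver hunks in the proposal also reflect: a successful lookup now returns with a reference held, so each caller has to drop it when done.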
* Re: [PATCH] scsi: fix race condition when removing target
  2017-11-30 23:56           ` James Bottomley
@ 2017-12-01  1:12             ` Finn Thain
  2017-12-01  8:40             ` Jason Yan
  1 sibling, 0 replies; 29+ messages in thread
From: Finn Thain @ 2017-12-01  1:12 UTC
  To: James Bottomley
  Cc: Bart Van Assche, hch, yanaijie, zhaohongjiang, jthumshirn,
	martin.petersen, hare, linux-scsi, gregkh, miaoxie

On Thu, 30 Nov 2017, James Bottomley wrote:

> +#define __sdev_for_each_get(sdev, head, list) \
> +	list_for_each_entry(sdev, head, list)  \
> +		if (kobject_get_unless_zero(&sdev->sdev_gendev.kobj))
> +

I think that should have an 'else' clause, like this macro from
include/drm/drmP.h:

#define for_each_if(condition) if (!(condition)) {} else

-- 

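
The reason for the 'else' form is the dangling-else problem: an iterator macro that ends in a bare `if (cond)` can capture an `else` written by the caller. A small userspace sketch follows; `for_each_if` is quoted from the message above, while `for_each_even()`, `count_even()` and `tally_even()` are hypothetical helpers made up for this illustration.

```c
#define for_each_if(condition) if (!(condition)) {} else

/* Hypothetical filtered iterator: visit only the even array entries. */
#define for_each_even(i, arr, n) \
	for ((i) = 0; (i) < (n); (i)++) \
		for_each_if((arr)[(i)] % 2 == 0)

int count_even(const int *arr, int n)
{
	int i, count = 0;

	for_each_even(i, arr, n)
		count++;
	return count;
}

int tally_even(const int *arr, int n, int enabled, int *skipped)
{
	int i, hits = 0;

	if (enabled)
		for_each_even(i, arr, n)
			hits++;
	else
		(*skipped)++;	/* pairs with "if (enabled)", not the macro */

	return hits;
}
```

Because the macro's hidden `if` already carries its own `else`, the caller's `else` in `tally_even()` can only attach to `if (enabled)`. With a macro ending in a plain `if (cond)`, that same `else` would silently bind to the hidden condition instead.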
* Re: [PATCH] scsi: fix race condition when removing target
  2017-11-30 23:56           ` James Bottomley
  2017-12-01  1:12             ` Finn Thain
@ 2017-12-01  8:40             ` Jason Yan
  2017-12-01 14:41               ` Ewan D. Milne
  2017-12-01 15:35               ` James Bottomley
  1 sibling, 2 replies; 29+ messages in thread
From: Jason Yan @ 2017-12-01  8:40 UTC
  To: James Bottomley, Bart Van Assche, hch
  Cc: zhaohongjiang, jthumshirn, martin.petersen, hare, linux-scsi,
	gregkh, miaoxie

On 2017/12/1 7:56, James Bottomley wrote:
> On Thu, 2017-11-30 at 16:08 +0000, Bart Van Assche wrote:
>> On Thu, 2017-11-30 at 09:18 +0800, Jason Yan wrote:
>>>
>>> Hi Bart, I chose the approach in my patch because it has been used
>>> in scsi_device_get() for years and has proven safe. I think using
>>> kobject_get_unless_zero() is safe here and can fix this issue too.
>>> And this approach is beneficial to all users.
>>
>> Hello Jason,
>>
>> A possible approach is that we start with your patch and defer any
>> get_device() changes until after your patch has been applied.
>
> It's possible, but not quite good enough: the same race can be produced
> with any of our sdev lists that are deleted in the release callback,
> because there could be a released device on any one of them.  The only
> way to mediate it properly is to get a reference in the iterator using
> kobject_get_unless_zero().
>
> It's a bit like a huge can of worms: there's another problem every time
> I look.  However, this is something like the mechanism that could work
> (and if get_device() ever gets fixed, we can put it in place of
> kobject_get_unless_zero()).
>
> James
>
> ---
>
> diff --git a/drivers/scsi/53c700.c b/drivers/scsi/53c700.c
> index 6be77b3aa8a5..c3246f26c02c 100644
> --- a/drivers/scsi/53c700.c
> +++ b/drivers/scsi/53c700.c
> @@ -1169,6 +1169,7 @@ process_script_interrupt(__u32 dsps, __u32 dsp, struct scsi_cmnd *SCp,
>
>
>  			}
> +			put_device(&SDp->sdev_gendev);
>  		} else if(dsps == A_RESELECTED_DURING_SELECTION) {
>
>  			/* This section is full of debugging code because I've
> diff --git a/drivers/scsi/esp_scsi.c b/drivers/scsi/esp_scsi.c
> index c3fc34b9964d..7736f3fb2501 100644
> --- a/drivers/scsi/esp_scsi.c
> +++ b/drivers/scsi/esp_scsi.c
> @@ -1198,6 +1198,7 @@ static int esp_reconnect(struct esp *esp)
>  		goto do_reset;
>  	}
>  	lp = dev->hostdata;
> +	put_device(&dev->sdev_gendev);
>
>  	ent = lp->non_tagged_cmd;
>  	if (!ent) {
> diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
> index a7e4fba724b7..c96c11716152 100644
> --- a/drivers/scsi/scsi.c
> +++ b/drivers/scsi/scsi.c
> @@ -677,11 +677,10 @@ struct scsi_device *__scsi_device_lookup_by_target(struct scsi_target *starget,
>  {
>  	struct scsi_device *sdev;
>
> -	list_for_each_entry(sdev, &starget->devices, same_target_siblings) {
> -		if (sdev->sdev_state == SDEV_DEL)
> -			continue;
> -		if (sdev->lun ==lun)
> +	__sdev_for_each_get(sdev, &starget->devices, same_target_siblings) {
> +		if (sdev->sdev_state != SDEV_DEL && sdev->lun ==lun)
>  			return sdev;
> +		put_device(&sdev->sdev_gendev);
>  	}
>
>  	return NULL;
> @@ -700,15 +699,16 @@ EXPORT_SYMBOL(__scsi_device_lookup_by_target);
>  struct scsi_device *scsi_device_lookup_by_target(struct scsi_target *starget,
>  						 u64 lun)
>  {
> -	struct scsi_device *sdev;
> +	struct scsi_device *sdev, *sdev_copy;
>  	struct Scsi_Host *shost = dev_to_shost(starget->dev.parent);
>  	unsigned long flags;
>
>  	spin_lock_irqsave(shost->host_lock, flags);
> -	sdev = __scsi_device_lookup_by_target(starget, lun);
> +	sdev_copy = sdev = __scsi_device_lookup_by_target(starget, lun);
> +	spin_unlock_irqrestore(shost->host_lock, flags);
>  	if (sdev
&& scsi_device_get(sdev)) > sdev = NULL; > - spin_unlock_irqrestore(shost->host_lock, flags); > + put_device(&sdev_copy->sdev_gendev); > > return sdev; > } > @@ -735,12 +735,12 @@ struct scsi_device *__scsi_device_lookup(struct Scsi_Host *shost, > { > struct scsi_device *sdev; > > - list_for_each_entry(sdev, &shost->__devices, siblings) { > - if (sdev->sdev_state == SDEV_DEL) > - continue; > - if (sdev->channel == channel && sdev->id == id && > - sdev->lun ==lun) > + __sdev_for_each_get(sdev, &shost->__devices, siblings) { > + if (sdev->sdev_state != SDEV_DEL && > + sdev->channel == channel && sdev->id == id && > + sdev->lun ==lun) > return sdev; > + put_device(&sdev->sdev_gendev); > } > > return NULL; > @@ -761,14 +761,15 @@ EXPORT_SYMBOL(__scsi_device_lookup); > struct scsi_device *scsi_device_lookup(struct Scsi_Host *shost, > uint channel, uint id, u64 lun) > { > - struct scsi_device *sdev; > + struct scsi_device *sdev, *sdev_copy; > unsigned long flags; > > spin_lock_irqsave(shost->host_lock, flags); > - sdev = __scsi_device_lookup(shost, channel, id, lun); > + sdev_copy = sdev = __scsi_device_lookup(shost, channel, id, lun); > + spin_unlock_irqrestore(shost->host_lock, flags); > if (sdev && scsi_device_get(sdev)) > sdev = NULL; > - spin_unlock_irqrestore(shost->host_lock, flags); > + put_device(&sdev_copy->sdev_gendev); > > return sdev; > } > diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c > index 40124648a07b..cddd5a93e962 100644 > --- a/drivers/scsi/scsi_scan.c > +++ b/drivers/scsi/scsi_scan.c > @@ -1870,11 +1870,14 @@ void scsi_forget_host(struct Scsi_Host *shost) > > restart: > spin_lock_irqsave(shost->host_lock, flags); > - list_for_each_entry(sdev, &shost->__devices, siblings) { > - if (sdev->sdev_state == SDEV_DEL) > + __sdev_for_each_get(sdev, &shost->__devices, siblings) { > + if (sdev->sdev_state == SDEV_DEL) { > + put_device(&sdev->sdev_gendev); > continue; > + } > spin_unlock_irqrestore(shost->host_lock, flags); > 
__scsi_remove_device(sdev); > + put_device(&sdev->sdev_gendev); > goto restart; > } > spin_unlock_irqrestore(shost->host_lock, flags); > diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c > index f796bd61f3f0..380404ec49cd 100644 > --- a/drivers/scsi/scsi_sysfs.c > +++ b/drivers/scsi/scsi_sysfs.c > @@ -1375,17 +1375,7 @@ static void __scsi_remove_target(struct scsi_target *starget) > > spin_lock_irqsave(shost->host_lock, flags); > restart: > - list_for_each_entry(sdev, &shost->__devices, siblings) { > - /* > - * We cannot call scsi_device_get() here, as > - * we might've been called from rmmod() causing > - * scsi_device_get() to fail the module_is_live() > - * check. > - */ > - if (sdev->channel != starget->channel || > - sdev->id != starget->id || > - !get_device(&sdev->sdev_gendev)) > - continue; > + __sdev_for_each_get(sdev, &starget->devices, same_target_siblings) { > spin_unlock_irqrestore(shost->host_lock, flags); > scsi_remove_device(sdev); > put_device(&sdev->sdev_gendev); > diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h > index 571ddb49b926..2e4d48d8cd68 100644 > --- a/include/scsi/scsi_device.h > +++ b/include/scsi/scsi_device.h > @@ -380,6 +380,23 @@ extern struct scsi_device *__scsi_iterate_devices(struct Scsi_Host *, > #define __shost_for_each_device(sdev, shost) \ > list_for_each_entry((sdev), &((shost)->__devices), siblings) > Seems that __shost_for_each_device() is still not safe. scsi device been deleted stays in the list and put_device() can be called anywhere out of the host lock. > +/** > + * __sdev_list_for_each_get - get a reference to each element > + * @sdev: the scsi device to use in the body > + * @head: the head of the list > + * @list: the element (sdev->list) containing list members > + * > + * Iterator that only executes the body if it can obtain a reference > + * to the element. This closes a race where the device release can > + * have been called, but the element is still on the lists. 
> + * > + * The lock protecting the list (the host lock) must be held before > + * calling this iterator > + */ > +#define __sdev_for_each_get(sdev, head, list) \ > + list_for_each_entry(sdev, head, list) \ > + if (kobject_get_unless_zero(&sdev->sdev_gendev.kobj)) > + > extern int scsi_change_queue_depth(struct scsi_device *, int); > extern int scsi_track_queue_full(struct scsi_device *, int); > > > > . > ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] scsi: fix race condition when removing target 2017-12-01 8:40 ` Jason Yan @ 2017-12-01 14:41 ` Ewan D. Milne 2017-12-01 15:35 ` James Bottomley 1 sibling, 0 replies; 29+ messages in thread From: Ewan D. Milne @ 2017-12-01 14:41 UTC (permalink / raw) To: Jason Yan Cc: James Bottomley, Bart Van Assche, hch, zhaohongjiang, jthumshirn, martin.petersen, hare, linux-scsi, gregkh, miaoxie We have another test case that demonstrates this issue involving duplicate invocations of scsi_device_dev_release() on the same device. This other test case involves repeated log in / log out of an iSCSI target. (The first test case I mentioned in an earlier mail was an oscillating FC port with a low dev_loss_tmo value.) The iSCSI test was not fixed by Jason Yan's patch, however adding Bart's change to use kobject_get_unless_zero() in get_device() as well seems to have resolved it. We are going to try with just Bart's change next. -Ewan ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] scsi: fix race condition when removing target 2017-12-01 8:40 ` Jason Yan 2017-12-01 14:41 ` Ewan D. Milne @ 2017-12-01 15:35 ` James Bottomley 2017-12-05 12:37 ` Jason Yan 1 sibling, 1 reply; 29+ messages in thread From: James Bottomley @ 2017-12-01 15:35 UTC (permalink / raw) To: Jason Yan, Bart Van Assche, hch Cc: zhaohongjiang, jthumshirn, martin.petersen, hare, linux-scsi, gregkh, miaoxie On Fri, 2017-12-01 at 16:40 +0800, Jason Yan wrote: > On 2017/12/1 7:56, James Bottomley wrote: > > b/include/scsi/scsi_device.h > > index 571ddb49b926..2e4d48d8cd68 100644 > > --- a/include/scsi/scsi_device.h > > +++ b/include/scsi/scsi_device.h > > @@ -380,6 +380,23 @@ extern struct scsi_device > > *__scsi_iterate_devices(struct Scsi_Host *, > > #define __shost_for_each_device(sdev, shost) \ > > list_for_each_entry((sdev), &((shost)->__devices), > > siblings) > > > > Seems that __shost_for_each_device() is still not safe. scsi device > been deleted stays in the list and put_device() can be called > anywhere out of the host lock. Not if it's used with scsi_device_get(). As I said, I only did a cursory inspection, so if I've missed a loop, please specify. The point was more a demonstration of how we could fix the problem if we don't change get_device(). James ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] scsi: fix race condition when removing target 2017-12-01 15:35 ` James Bottomley @ 2017-12-05 12:37 ` Jason Yan 2017-12-05 15:37 ` James Bottomley 0 siblings, 1 reply; 29+ messages in thread From: Jason Yan @ 2017-12-05 12:37 UTC (permalink / raw) To: James Bottomley, Bart Van Assche, hch Cc: zhaohongjiang, jthumshirn, martin.petersen, hare, linux-scsi, gregkh, miaoxie On 2017/12/1 23:35, James Bottomley wrote: > On Fri, 2017-12-01 at 16:40 +0800, Jason Yan wrote: >> On 2017/12/1 7:56, James Bottomley wrote: >>> b/include/scsi/scsi_device.h >>> index 571ddb49b926..2e4d48d8cd68 100644 >>> --- a/include/scsi/scsi_device.h >>> +++ b/include/scsi/scsi_device.h >>> @@ -380,6 +380,23 @@ extern struct scsi_device >>> *__scsi_iterate_devices(struct Scsi_Host *, >>> #define __shost_for_each_device(sdev, shost) \ >>> list_for_each_entry((sdev), &((shost)->__devices), >>> siblings) >>> >> >> Seems that __shost_for_each_device() is still not safe. scsi device >> been deleted stays in the list and put_device() can be called >> anywhere out of the host lock. > > Not if it's used with scsi_get_device(). As I said, I only did a > cursory inspectiont, so if I've missed a loop, please specify. > > The point was more a demonstration of how we could fix the problem if > we don't change get_device(). > > James > Yes, it's OK now. __shost_for_each_device() is not used with scsi_device_get() yet. Another problem is that put_device() cannot be called while holding the host lock, so we need to move all put_device() calls out of the lock. 
Some places like scsi_device_lookup() and scsi_device_lookup_by_target() need rework: @@ -765,12 +772,22 @@ struct scsi_device *scsi_device_lookup(struct Scsi_Host *shost, unsigned long flags; spin_lock_irqsave(shost->host_lock, flags); - sdev = __scsi_device_lookup(shost, channel, id, lun); - if (sdev && scsi_device_get(sdev)) - sdev = NULL; + __sdev_for_each_get(sdev, &shost->__devices, siblings) { + spin_unlock_irqrestore(shost->host_lock, flags); + if (sdev->sdev_state != SDEV_DEL && + sdev->channel == channel && sdev->id == id && + sdev->lun ==lun) { + if (!scsi_device_get(sdev)) { + put_device(&sdev->sdev_gendev); + return sdev; + } + } + put_device(&sdev->sdev_gendev); + spin_lock_irqsave(shost->host_lock, flags); + } spin_unlock_irqrestore(shost->host_lock, flags); - return sdev; + return NULL; } EXPORT_SYMBOL(scsi_device_lookup); > > ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] scsi: fix race condition when removing target 2017-12-05 12:37 ` Jason Yan @ 2017-12-05 15:37 ` James Bottomley 2017-12-06 0:41 ` Jason Yan 0 siblings, 1 reply; 29+ messages in thread From: James Bottomley @ 2017-12-05 15:37 UTC (permalink / raw) To: Jason Yan, Bart Van Assche, hch Cc: zhaohongjiang, jthumshirn, martin.petersen, hare, linux-scsi, gregkh, miaoxie On Tue, 2017-12-05 at 20:37 +0800, Jason Yan wrote: > > On 2017/12/1 23:35, James Bottomley wrote: > > > > On Fri, 2017-12-01 at 16:40 +0800, Jason Yan wrote: > > > > > > On 2017/12/1 7:56, James Bottomley wrote: > > > > > > > > b/include/scsi/scsi_device.h > > > > index 571ddb49b926..2e4d48d8cd68 100644 > > > > --- a/include/scsi/scsi_device.h > > > > +++ b/include/scsi/scsi_device.h > > > > @@ -380,6 +380,23 @@ extern struct scsi_device > > > > *__scsi_iterate_devices(struct Scsi_Host *, > > > > #define __shost_for_each_device(sdev, shost) \ > > > > list_for_each_entry((sdev), &((shost)->__devices), > > > > siblings) > > > > > > > > > > Seems that __shost_for_each_device() is still not safe. scsi > > > device > > > been deleted stays in the list and put_device() can be called > > > anywhere out of the host lock. > > > > Not if it's used with scsi_get_device(). As I said, I only did a > > cursory inspectiont, so if I've missed a loop, please specify. > > > > The point was more a demonstration of how we could fix the problem > > if we don't change get_device(). > > > > James > > > > Yes, it's OK now. __shost_for_each_device() is not used with > scsi_get_device() yet. > > Another problem is that put_device() cannot be called while holding > the host lock, Yes it can. That's one of the design goals of the execute in process context: you can call it from interrupt context and you can call it with locks held and we'll return immediately and delay all the dangerous stuff until we have a process context. 
To get the process context to be acquired, the in_interrupt() test must pass (so the spin lock must be acquired irqsave) ; is that condition missing anywhere? James ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] scsi: fix race condition when removing target 2017-12-05 15:37 ` James Bottomley @ 2017-12-06 0:41 ` Jason Yan 2017-12-06 2:07 ` James Bottomley 0 siblings, 1 reply; 29+ messages in thread From: Jason Yan @ 2017-12-06 0:41 UTC (permalink / raw) To: James Bottomley, Bart Van Assche, hch Cc: zhaohongjiang, jthumshirn, martin.petersen, hare, linux-scsi, gregkh, miaoxie On 2017/12/5 23:37, James Bottomley wrote: > On Tue, 2017-12-05 at 20:37 +0800, Jason Yan wrote: >> >> On 2017/12/1 23:35, James Bottomley wrote: >>> >>> On Fri, 2017-12-01 at 16:40 +0800, Jason Yan wrote: >>>> >>>> On 2017/12/1 7:56, James Bottomley wrote: >>>>> >>>>> b/include/scsi/scsi_device.h >>>>> index 571ddb49b926..2e4d48d8cd68 100644 >>>>> --- a/include/scsi/scsi_device.h >>>>> +++ b/include/scsi/scsi_device.h >>>>> @@ -380,6 +380,23 @@ extern struct scsi_device >>>>> *__scsi_iterate_devices(struct Scsi_Host *, >>>>> #define __shost_for_each_device(sdev, shost) \ >>>>> list_for_each_entry((sdev), &((shost)->__devices), >>>>> siblings) >>>>> >>>> >>>> Seems that __shost_for_each_device() is still not safe. scsi >>>> device >>>> been deleted stays in the list and put_device() can be called >>>> anywhere out of the host lock. >>> >>> Not if it's used with scsi_get_device(). As I said, I only did a >>> cursory inspectiont, so if I've missed a loop, please specify. >>> >>> The point was more a demonstration of how we could fix the problem >>> if we don't change get_device(). >>> >>> James >>> >> >> Yes, it's OK now. __shost_for_each_device() is not used with >> scsi_get_device() yet. >> >> Another problem is that put_device() cannot be called while holding >> the host lock, > > Yes it can. That's one of the design goals of the execute in process > context: you can call it from interrupt context and you can call it > with locks held and we'll return immediately and delay all the > dangerous stuff until we have a process context. 
> > To get the process context to be acquired, the in_interrupt() test must > pass (so the spin lock must be acquired irqsave) ; is that condition > missing anywhere? > > James > > Calling it from interrupt context is OK. I'm talking about calling it from process context. Think about this in a process context: scsi_device_lookup() ->spin_lock_irqsave(shost->host_lock, flags); ->__scsi_device_lookup() ->iterate and kobject_get_unless_zero() ->put_device() ->scsi_device_dev_release() if the last put ->scsi_device_dev_release_usercontext() ->acquire the host lock = deadlock Jason > . > ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] scsi: fix race condition when removing target 2017-12-06 0:41 ` Jason Yan @ 2017-12-06 2:07 ` James Bottomley 2017-12-06 2:43 ` Jason Yan 0 siblings, 1 reply; 29+ messages in thread From: James Bottomley @ 2017-12-06 2:07 UTC (permalink / raw) To: Jason Yan, Bart Van Assche, hch Cc: zhaohongjiang, jthumshirn, martin.petersen, hare, linux-scsi, gregkh, miaoxie On Wed, 2017-12-06 at 08:41 +0800, Jason Yan wrote: > On 2017/12/5 23:37, James Bottomley wrote: > > > > On Tue, 2017-12-05 at 20:37 +0800, Jason Yan wrote: > > > > > > > > > On 2017/12/1 23:35, James Bottomley wrote: > > > > > > > > > > > > On Fri, 2017-12-01 at 16:40 +0800, Jason Yan wrote: > > > > > > > > > > > > > > > On 2017/12/1 7:56, James Bottomley wrote: > > > > > > > > > > > > > > > > > > b/include/scsi/scsi_device.h > > > > > > index 571ddb49b926..2e4d48d8cd68 100644 > > > > > > --- a/include/scsi/scsi_device.h > > > > > > +++ b/include/scsi/scsi_device.h > > > > > > @@ -380,6 +380,23 @@ extern struct scsi_device > > > > > > *__scsi_iterate_devices(struct Scsi_Host *, > > > > > > #define __shost_for_each_device(sdev, shost) \ > > > > > > list_for_each_entry((sdev), &((shost)- > > > > > > >__devices), > > > > > > siblings) > > > > > > > > > > > > > > > > Seems that __shost_for_each_device() is still not safe. scsi > > > > > device > > > > > been deleted stays in the list and put_device() can be called > > > > > anywhere out of the host lock. > > > > > > > > Not if it's used with scsi_get_device(). As I said, I only did > > > > a cursory inspectiont, so if I've missed a loop, please > > > > specify. > > > > > > > > The point was more a demonstration of how we could fix the > > > > problem if we don't change get_device(). > > > > > > > > James > > > > > > > > > > Yes, it's OK now. __shost_for_each_device() is not used with > > > scsi_get_device() yet. > > > > > > Another problem is that put_device() cannot be called while > > > holding the host lock, > > > > Yes it can. 
That's one of the design goals of the execute in > > process context: you can call it from interrupt context and you can > > call it with locks held and we'll return immediately and delay all > > the dangerous stuff until we have a process context. > > > > To get the process context to be acquired, the in_interrupt() test > > must pass (so the spin lock must be acquired irqsave) ; is that > > condition missing anywhere? > > > > James > > > > > > Call it from interrupt context is ok. I'm talking about calling it > from process context. > > Think about this in a process context: > scsi_device_lookup() > ->spin_lock_irqsave(shost->host_lock, flags); > ->__scsi_device_lookup() > ->iterate and kobject_get_unless_zero() > ->put_device() > ->scsi_device_dev_release() if the last put > ->scsi_device_dev_release_usercontext() > ->acquire the host lock = deadlock execute_in_process_context() is supposed to produce us a context whenever the local context isn't available, and that's supposed to include when interrupts are disabled as in spin_lock_irqsave(). So let me ask this another way: have you seen this deadlock (which would mean we have a bug in execute_process_context())? James ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] scsi: fix race condition when removing target 2017-12-06 2:07 ` James Bottomley @ 2017-12-06 2:43 ` Jason Yan 0 siblings, 0 replies; 29+ messages in thread From: Jason Yan @ 2017-12-06 2:43 UTC (permalink / raw) To: James Bottomley, Bart Van Assche, hch Cc: zhaohongjiang, jthumshirn, martin.petersen, hare, linux-scsi, gregkh, miaoxie > execute_in_process_context() is supposed to produce us a context > whenever the local context isn't available, and that's supposed to > include when interrupts are disabled as in spin_lock_irqsave(). > > So let me ask this another way: have you seen this deadlock (which > would mean we have a bug in execute_process_context())? > > James > > I haven't seen this deadlock, but in_interrupt() does not check whether interrupts are disabled. Please refer to the definition of in_interrupt(). #define irq_count() (preempt_count() & (HARDIRQ_MASK | SOFTIRQ_MASK\ | NMI_MASK)) #define in_interrupt() (irq_count()) Jason > . > ^ permalink raw reply [flat|nested] 29+ messages in thread
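Jason's point about in_interrupt() can be made concrete with a small user-space model. This is a sketch under assumptions, not kernel code: the mask values mirror the preempt_count layout as commonly defined in include/linux/preempt.h around that kernel version (8 preempt bits, 8 softirq bits, 4 hardirq bits, 1 NMI bit), the `model_*` names and `struct cpu_model` are hypothetical, and spin_lock_irqsave() is assumed to do nothing more than disable interrupts and bump the preempt part of the counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed preempt_count layout; treat the exact widths as illustrative. */
#define PREEMPT_MASK   0x000000ffu
#define SOFTIRQ_MASK   0x0000ff00u
#define HARDIRQ_MASK   0x000f0000u
#define NMI_MASK       0x00100000u
#define HARDIRQ_OFFSET 0x00010000u

/* Hypothetical per-CPU state for the model. */
struct cpu_model {
	unsigned int preempt_count;
	bool irqs_disabled;
};

/* Same expression as the in_interrupt() definition quoted above. */
static unsigned int model_in_interrupt(const struct cpu_model *cpu)
{
	return cpu->preempt_count & (HARDIRQ_MASK | SOFTIRQ_MASK | NMI_MASK);
}

/* Assumption: only the irq flag and the preempt bits change here;
 * the hardirq/softirq/NMI bits are left alone. */
static void model_spin_lock_irqsave(struct cpu_model *cpu)
{
	cpu->irqs_disabled = true;
	cpu->preempt_count += 1;
}

/* Entering a hard interrupt handler bumps the hardirq bits. */
static void model_irq_enter(struct cpu_model *cpu)
{
	cpu->preempt_count += HARDIRQ_OFFSET;
}

/* Models the decision in execute_in_process_context(): true means the
 * release function runs inline in the caller's context, false means it
 * is deferred to a workqueue. */
static bool model_runs_inline(const struct cpu_model *cpu)
{
	return model_in_interrupt(cpu) == 0;
}
```

In this model, a process-context caller holding a lock taken with spin_lock_irqsave() still gets the release function run inline, which is the case Jason is worried about; only a caller inside an interrupt handler gets it deferred.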
* Re: [PATCH] scsi: fix race condition when removing target 2017-11-29 16:20 ` hch 2017-11-29 17:39 ` Bart Van Assche @ 2017-11-29 17:39 ` gregkh 2017-11-29 18:49 ` Ewan D. Milne 1 sibling, 1 reply; 29+ messages in thread From: gregkh @ 2017-11-29 17:39 UTC (permalink / raw) To: hch Cc: Bart Van Assche, zhaohongjiang, jthumshirn, martin.petersen, hare, linux-scsi, yanaijie, jejb, miaoxie On Wed, Nov 29, 2017 at 05:20:50PM +0100, hch@lst.de wrote: > On Wed, Nov 29, 2017 at 04:18:30PM +0000, Bart Van Assche wrote: > > As the above patch description shows it can happen that the SCSI core calls > > get_device() after the device reference count has reached zero and before > > the memory for struct device is freed. Although the above patch looks fine > > to me, would you consider it acceptable to modify get_device() such that it > > uses kobject_get_unless_zero() instead of kobject_get()? I'm asking this > > because that change would help to reduce the complexity of the already too > > complicated SCSI core. > > I don't think we can just modify get_device, but we can add a new > get_device_unless_zero. In fact I have an open coded variant of that > in nvme, and was planning to submit one for the current merge window.. I feel like that is just delaying the real fix, shouldn't there be a bus lock somewhere on the put_device path for this bus to prevent this? thanks, greg k-h ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] scsi: fix race condition when removing target 2017-11-29 17:39 ` gregkh @ 2017-11-29 18:49 ` Ewan D. Milne 2017-11-29 19:11 ` Bart Van Assche 0 siblings, 1 reply; 29+ messages in thread From: Ewan D. Milne @ 2017-11-29 18:49 UTC (permalink / raw) To: gregkh Cc: hch, Bart Van Assche, zhaohongjiang, jthumshirn, martin.petersen, hare, linux-scsi, yanaijie, jejb, miaoxie On Wed, 2017-11-29 at 17:39 +0000, gregkh@linuxfoundation.org wrote: > On Wed, Nov 29, 2017 at 05:20:50PM +0100, hch@lst.de wrote: > > On Wed, Nov 29, 2017 at 04:18:30PM +0000, Bart Van Assche wrote: > > > As the above patch description shows it can happen that the SCSI core calls > > > get_device() after the device reference count has reached zero and before > > > the memory for struct device is freed. Although the above patch looks fine > > > to me, would you consider it acceptable to modify get_device() such that it > > > uses kobject_get_unless_zero() instead of kobject_get()? I'm asking this > > > because that change would help to reduce the complexity of the already too > > > complicated SCSI core. > > > > I don't think we can just modify get_device, but we can add a new > > get_device_unless_zero. In fact I have an open coded variant of that > > in nvme, and was planning to submit one for the current merge window.. > > I feel like that is just delaying the real fix, shouldn't there be a bus > lock somewhere on the put_device path for this bus to prevent this? > > thanks, > > greg k-h Why is it that clients of the kobject code have to have their own lock / state checking to prevent a duplicate destructor callback? It seems to me like this is something the core functionality should provide, because a get inside a destructor would *always* be wrong, no? It looks like: void refcount_inc(refcount_t *r) { WARN_ONCE(!refcount_inc_not_zero(r), "refcount_t: increment on 0; use-after-free.\n"); } would have warned if CONFIG_REFCOUNT_FULL was on, I/we don't normally enable that though. 
-Ewan ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] scsi: fix race condition when removing target 2017-11-29 18:49 ` Ewan D. Milne @ 2017-11-29 19:11 ` Bart Van Assche 2017-11-29 19:20 ` Ewan D. Milne 0 siblings, 1 reply; 29+ messages in thread From: Bart Van Assche @ 2017-11-29 19:11 UTC (permalink / raw) To: emilne, gregkh Cc: zhaohongjiang, jthumshirn, hch, martin.petersen, hare, linux-scsi, yanaijie, jejb, miaoxie On Wed, 2017-11-29 at 13:49 -0500, Ewan D. Milne wrote: > because a get inside a destructor would *always* be wrong, no? Hello Ewan, That's not what we are discussing. What can happen with the SCSI core is that get_device() is called concurrently with the destructor. get_device() can be called concurrently with the destructor because the destructor removes a device from the siblings list and because the SCSI core can call get_device() for devices it finds on the siblings list. Personally I think that design is superior to removing a SCSI device from the sibling list before the last put_device() call because the approach followed in the SCSI core leads to a simpler implementation. However, it seems like the current get_device() implementation does not yet support the SCSI core design ... Bart. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] scsi: fix race condition when removing target 2017-11-29 19:11 ` Bart Van Assche @ 2017-11-29 19:20 ` Ewan D. Milne 2017-11-29 19:50 ` Bart Van Assche 0 siblings, 1 reply; 29+ messages in thread From: Ewan D. Milne @ 2017-11-29 19:20 UTC (permalink / raw) To: Bart Van Assche Cc: gregkh, zhaohongjiang, jthumshirn, hch, martin.petersen, hare, linux-scsi, yanaijie, jejb, miaoxie On Wed, 2017-11-29 at 19:11 +0000, Bart Van Assche wrote: > On Wed, 2017-11-29 at 13:49 -0500, Ewan D. Milne wrote: > > because a get inside a destructor would *always* be wrong, no? > > Hello Ewan, > > That's not what we are discussing. What can happen with the SCSI core is that > get_device() is called concurrently with the destructor. get_device() can be > called concurrently with the destructor because the destructore removes a > device from the siblings list and because the SCSI core can call get_device() > for devices it finds on the siblings list. Personally I think that design is > superior compared to removing a SCSI device from the sibling list before the > last put_device() call because the approach followed in the SCSI core leads to > a simpler implementation. However, it seems like the current get_device() > implementation does not yet support the SCSI core design ... > > Bart. OK, well, I think the point still stands, though, once the refcount goes to zero and the destructor is invoked, a get that then increments the refcount seems fundamentally wrong to me. Especially if a subsequent put causes the destructor to be invoked *simultaneously* *on another thread*. The locking has to happen somewhere, why isn't this done by the kobject? Relying on the client code to get this right means that there are opportunities all over the kernel for problems like this to happen, just like here, where we inadvertently removed the state check that prevented the get_device() call. -Ewan ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH] scsi: fix race condition when removing target 2017-11-29 19:20 ` Ewan D. Milne @ 2017-11-29 19:50 ` Bart Van Assche 0 siblings, 0 replies; 29+ messages in thread From: Bart Van Assche @ 2017-11-29 19:50 UTC (permalink / raw) To: emilne Cc: zhaohongjiang, jthumshirn, hch, martin.petersen, hare, linux-scsi, gregkh, yanaijie, jejb, miaoxie On Wed, 2017-11-29 at 14:20 -0500, Ewan D. Milne wrote: > OK, well, I think the point still stands, though, once the refcount > goes to zero and the destructor is invoked, a get that then increments > the refcount seems fundamentally wrong to me. I agree that incrementing a reference count that has dropped to zero is wrong. However, that's what happens currently. That behavior has been reported as a bug. We need to fix this behavior, either through the patch at the start of this thread or by using code that avoids incrementing a zero reference count, e.g. kobject_get_unless_zero(). Bart. ^ permalink raw reply [flat|nested] 29+ messages in thread
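The difference between an unconditional get and a get-unless-zero, which is what this sub-thread turns on, can be sketched in user space with C11 atomics. This is an illustration only: `struct obj` and the `obj_*` helpers are hypothetical names, not the kernel's kobject or kref implementation, and the unconditional variant is assumed to behave like a plain atomic increment (the case without CONFIG_REFCOUNT_FULL described earlier in the thread).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space object; not the kernel's struct kobject. */
struct obj {
	atomic_int refcount;
};

/* Unconditional get: happily "revives" an object whose count is 0. */
static void obj_get(struct obj *o)
{
	atomic_fetch_add(&o->refcount, 1);
}

/* Conditional get: refuses once the count has reached zero. */
static bool obj_get_unless_zero(struct obj *o)
{
	int old = atomic_load(&o->refcount);

	while (old != 0) {
		/* On failure, old is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&o->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Returns true when the caller dropped the last reference, i.e. when
 * the destructor would be invoked. */
static bool obj_put(struct obj *o)
{
	return atomic_fetch_sub(&o->refcount, 1) == 1;
}
```

With the unconditional get, a get followed by a put on an object whose count is already zero makes `obj_put()` report "last reference" a second time, which corresponds to the duplicate release callback seen in the oops; the conditional get simply fails instead.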
* Re: [PATCH] scsi: fix race condition when removing target 2017-11-29 16:18 ` Bart Van Assche 2017-11-29 16:20 ` hch @ 2017-11-29 17:39 ` gregkh 2017-11-29 17:47 ` Bart Van Assche 1 sibling, 1 reply; 29+ messages in thread From: gregkh @ 2017-11-29 17:39 UTC (permalink / raw) To: Bart Van Assche Cc: zhaohongjiang, jthumshirn, hch, martin.petersen, hare, linux-scsi, yanaijie, jejb, miaoxie On Wed, Nov 29, 2017 at 04:18:30PM +0000, Bart Van Assche wrote: > On Wed, 2017-11-29 at 11:05 +0800, Jason Yan wrote: > > In commit fbce4d97fd43 ("scsi: fixup kernel warning during rmmod()"), we > > removed scsi_device_get() and directly called get_device() to increase > > the refcount of the device. But actullay scsi_device_get() will fail in > > three cases: > > 1. the scsi device is in SDEV_DEL or SDEV_CANCEL state > > 2. get_device() fail > > 3. the module is not alive > > > > The intended purpose was to remove the check of the module alive. > > Unfortunately the check of the device state was droped too. And this > > introduced a race condition like this: > > > > CPU0 CPU1 > > __scsi_remove_target() > > ->iterate shost->__devices > > ->scsi_remove_device() > > ->put_device() > > someone still hold a refcount > > sd_release() > > ->scsi_disk_put() > > ->put_device() last put and trigger the device release > > > > ->goto restart > > ->iterate shost->__devices and got the same device > > ->get_device() while refcount is 0 > > ->scsi_remove_device() > > ->put_device() refcount decreased to 0 again > > ->scsi_device_dev_release() > > ->scsi_device_dev_release_usercontext() > > > > ->scsi_device_dev_release() > > ->scsi_device_dev_release_usercontext() > > > > The same scsi device will be found agian because it is in the shost->__devices > > list until scsi_device_dev_release_usercontext() called, although the device > > state was set to SDEV_DEL after the first scsi_remove_device(). 
> > > > Finally we got a oops in scsi_device_dev_release_usercontext() when the second > > time be called. > > > > Call trace: > > [<ffff0000086bc624>] scsi_device_dev_release_usercontext+0x7c/0x1c0 > > [<ffff0000080f1f90>] execute_in_process_context+0x70/0x80 > > [<ffff0000086bc598>] scsi_device_dev_release+0x28/0x38 > > [<ffff0000086662cc>] device_release+0x3c/0xa0 > > [<ffff000008c2e780>] kobject_put+0x80/0xf0 > > [<ffff0000086666fc>] put_device+0x24/0x30 > > [<ffff0000086aeee0>] scsi_device_put+0x30/0x40 > > [<ffff000008704894>] scsi_disk_put+0x44/0x60 > > [<ffff000008704a50>] sd_release+0x50/0x80 > > [<ffff0000082bc704>] __blkdev_put+0x21c/0x230 > > [<ffff0000082bcb2c>] blkdev_put+0x54/0x118 > > [<ffff0000082bcc1c>] blkdev_close+0x2c/0x40 > > [<ffff000008279b64>] __fput+0x94/0x1d8 > > [<ffff000008279d20>] ____fput+0x20/0x30 > > [<ffff0000080f6f54>] task_work_run+0x9c/0xb8 > > [<ffff0000080dba64>] do_exit+0x2b4/0x9f8 > > [<ffff0000080dc234>] do_group_exit+0x3c/0xa0 > > [<ffff0000080dc2b8>] __wake_up_parent+0x0/0x40 > > > > And sometimes in __scsi_remove_target() it will loop for a long time > > removing the same device if someone else holding a refcount until the > > last refcount is released. > > > > Notice that if CONFIG_REFCOUNT_FULL is open this race won't be triggered > > because the full refcount implement will prevent the refcount increase > > when it is 0. > > > > Fix this by checking the sdev_state again like we did before in > > scsi_device_get(). Then when iterating shost again we will skip the device > > deleted because scsi_remove_device() will set the device state to > > SDEV_CANCEL or SDEV_DEL. 
> >
> > Fixes: fbce4d97fd43 ("scsi: fixup kernel warning during rmmod()")
> > Signed-off-by: Jason Yan <yanaijie@huawei.com>
> > CC: Hannes Reinecke <hare@suse.de>
> > CC: Christoph Hellwig <hch@lst.de>
> > CC: Johannes Thumshirn <jthumshirn@suse.de>
> > CC: Zhaohongjiang <zhaohongjiang@huawei.com>
> > CC: Miao Xie <miaoxie@huawei.com>
> > ---
> >  drivers/scsi/scsi_sysfs.c | 11 ++++++++++-
> >  1 file changed, 10 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
> > index 50e7d7e..d398894 100644
> > --- a/drivers/scsi/scsi_sysfs.c
> > +++ b/drivers/scsi/scsi_sysfs.c
> > @@ -1398,6 +1398,15 @@ void scsi_remove_device(struct scsi_device *sdev)
> >  }
> >  EXPORT_SYMBOL(scsi_remove_device);
> >
> > +static int scsi_device_get_not_deleted(struct scsi_device *sdev)
> > +{
> > +	if (sdev->sdev_state == SDEV_DEL || sdev->sdev_state == SDEV_CANCEL)
> > +		return -ENXIO;
> > +	if (!get_device(&sdev->sdev_gendev))
> > +		return -ENXIO;
> > +	return 0;
> > +}
> > +
> >  static void __scsi_remove_target(struct scsi_target *starget)
> >  {
> >  	struct Scsi_Host *shost = dev_to_shost(starget->dev.parent);
> > @@ -1415,7 +1424,7 @@ static void __scsi_remove_target(struct scsi_target *starget)
> >  		 */
> >  		if (sdev->channel != starget->channel ||
> >  		    sdev->id != starget->id ||
> > -		    !get_device(&sdev->sdev_gendev))
> > +		    scsi_device_get_not_deleted(sdev))
> >  			continue;
> >  		spin_unlock_irqrestore(shost->host_lock, flags);
> >  		scsi_remove_device(sdev);
>
> Hi Greg,
>
> As the above patch description shows it can happen that the SCSI core calls
> get_device() after the device reference count has reached zero and before
> the memory for struct device is freed. Although the above patch looks fine
> to me, would you consider it acceptable to modify get_device() such that it
> uses kobject_get_unless_zero() instead of kobject_get()? I'm asking this
> because that change would help to reduce the complexity of the already too
> complicated SCSI core.

Shouldn't there be a bus lock somewhere preventing this race?  Having an
open-coded put call isn't good, as you see here.

thanks,

greg k-h

* Re: [PATCH] scsi: fix race condition when removing target
  2017-11-29 17:39 ` gregkh
@ 2017-11-29 17:47 ` Bart Van Assche
  0 siblings, 0 replies; 29+ messages in thread
From: Bart Van Assche @ 2017-11-29 17:47 UTC
To: gregkh
Cc: zhaohongjiang, jthumshirn, hch, martin.petersen, hare, linux-scsi, yanaijie, jejb, miaoxie

On Wed, 2017-11-29 at 17:39 +0000, gregkh@linuxfoundation.org wrote:
> On Wed, Nov 29, 2017 at 04:18:30PM +0000, Bart Van Assche wrote:
> > As the above patch description shows it can happen that the SCSI core calls
> > get_device() after the device reference count has reached zero and before
> > the memory for struct device is freed. Although the above patch looks fine
> > to me, would you consider it acceptable to modify get_device() such that it
> > uses kobject_get_unless_zero() instead of kobject_get()? I'm asking this
> > because that change would help to reduce the complexity of the already too
> > complicated SCSI core.
>
> Shouldn't there be a bus lock somewhere preventing this race?  Having an
> open-coded put call isn't good, as you see here.

Hello Greg,

The get_device() call occurs with the SCSI host lock held. The SCSI host lock
serializes iteration over the sibling list by the get_device() caller and
removal of the SCSI device from that sibling list by
scsi_device_dev_release_usercontext(). If you have a look at
__scsi_remove_target() then you will see that the host lock has to be released
after a matching SCSI device has been found and before scsi_remove_device() is
called, because the latter function may sleep.

Bart.

* Re: [PATCH] scsi: fix race condition when removing target
  2017-11-29  3:05 [PATCH] scsi: fix race condition when removing target Jason Yan
  2017-11-29  7:41 ` Hannes Reinecke
  2017-11-29 16:18 ` Bart Van Assche
@ 2017-11-29 16:31 ` James Bottomley
  2017-11-29 16:34   ` Christoph Hellwig
  2017-11-29 19:05 ` Ewan D. Milne
  3 siblings, 1 reply; 29+ messages in thread
From: James Bottomley @ 2017-11-29 16:31 UTC
To: Jason Yan, martin.petersen
Cc: linux-scsi, Hannes Reinecke, Christoph Hellwig, Johannes Thumshirn, Zhaohongjiang, Miao Xie

On Wed, 2017-11-29 at 11:05 +0800, Jason Yan wrote:
> In commit fbce4d97fd43 ("scsi: fixup kernel warning during rmmod()"), we
> removed scsi_device_get() and directly called get_device() to increase
> the refcount of the device. But actullay scsi_device_get() will fail in
> three cases:
> 1. the scsi device is in SDEV_DEL or SDEV_CANCEL state
> 2. get_device() fail
> 3. the module is not alive
>
> The intended purpose was to remove the check of the module alive.
> Unfortunately the check of the device state was droped too. And this
> introduced a race condition like this:
>
> CPU0                                                CPU1
> __scsi_remove_target()
>   ->iterate shost->__devices
>   ->scsi_remove_device()
>   ->put_device()
>       someone still hold a refcount
>                                                     sd_release()
>                                                       ->scsi_disk_put()
>                                                       ->put_device() last put and trigger the device release
>
>   ->goto restart
>   ->iterate shost->__devices and got the same device
>   ->get_device() while refcount is 0

This analysis fails here: get_device() on something with refcount 0
returns NULL.  That triggers the if clause to ignore this device.

We may have a more complex way of triggering a dual put race as the
trace implies, but I don't think this is it.

[...]
> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
> index 50e7d7e..d398894 100644
> --- a/drivers/scsi/scsi_sysfs.c
> +++ b/drivers/scsi/scsi_sysfs.c
> @@ -1398,6 +1398,15 @@ void scsi_remove_device(struct scsi_device *sdev)
>  }
>  EXPORT_SYMBOL(scsi_remove_device);
>
> +static int scsi_device_get_not_deleted(struct scsi_device *sdev)
> +{
> +	if (sdev->sdev_state == SDEV_DEL || sdev->sdev_state == SDEV_CANCEL)
> +		return -ENXIO;
> +	if (!get_device(&sdev->sdev_gendev))
> +		return -ENXIO;
> +	return 0;
> +}

This is pretty much scsi_device_get() without the try_module_get(), so
they should probably be combined.

James

>  static void __scsi_remove_target(struct scsi_target *starget)
>  {
>  	struct Scsi_Host *shost = dev_to_shost(starget->dev.parent);
> @@ -1415,7 +1424,7 @@ static void __scsi_remove_target(struct scsi_target *starget)
>  		 */
>  		if (sdev->channel != starget->channel ||
>  		    sdev->id != starget->id ||
> -		    !get_device(&sdev->sdev_gendev))
> +		    scsi_device_get_not_deleted(sdev))
>  			continue;
>  		spin_unlock_irqrestore(shost->host_lock, flags);
>  		scsi_remove_device(sdev);

* Re: [PATCH] scsi: fix race condition when removing target
  2017-11-29 16:31 ` James Bottomley
@ 2017-11-29 16:34 ` Christoph Hellwig
  2017-11-29 16:47   ` James Bottomley
  0 siblings, 1 reply; 29+ messages in thread
From: Christoph Hellwig @ 2017-11-29 16:34 UTC
To: James Bottomley
Cc: Jason Yan, martin.petersen, linux-scsi, Hannes Reinecke, Christoph Hellwig, Johannes Thumshirn, Zhaohongjiang, Miao Xie

On Wed, Nov 29, 2017 at 08:31:48AM -0800, James Bottomley wrote:
> This analysis fails here: get_device() on something with refcount 0
> returns NULL.  That triggers the if clause to ignore this device.

No, it doesn't.  Take a look at the get_device() and kobject_get()
implementations.

* Re: [PATCH] scsi: fix race condition when removing target
  2017-11-29 16:34 ` Christoph Hellwig
@ 2017-11-29 16:47 ` James Bottomley
  0 siblings, 0 replies; 29+ messages in thread
From: James Bottomley @ 2017-11-29 16:47 UTC
To: Christoph Hellwig
Cc: Jason Yan, martin.petersen, linux-scsi, Hannes Reinecke, Johannes Thumshirn, Zhaohongjiang, Miao Xie

On Wed, 2017-11-29 at 17:34 +0100, Christoph Hellwig wrote:
> On Wed, Nov 29, 2017 at 08:31:48AM -0800, James Bottomley wrote:
> >
> > This analysis fails here: get_device() on something with refcount 0
> > returns NULL.  That triggers the if clause to ignore this device.
>
> No, it doesn't.  Take a look at the get_device and kobject_get
> implementations,

Hm, so why doesn't get_device use kref_get_unless_zero()?

James

* Re: [PATCH] scsi: fix race condition when removing target
  2017-11-29  3:05 [PATCH] scsi: fix race condition when removing target Jason Yan
                   ` (2 preceding siblings ...)
  2017-11-29 16:31 ` James Bottomley
@ 2017-11-29 19:05 ` Ewan D. Milne
  3 siblings, 0 replies; 29+ messages in thread
From: Ewan D. Milne @ 2017-11-29 19:05 UTC
To: Jason Yan
Cc: Bart Van Assche, martin.petersen, jejb, linux-scsi, Hannes Reinecke, Christoph Hellwig, Johannes Thumshirn, Zhaohongjiang, Miao Xie

On Wed, 2017-11-29 at 11:05 +0800, Jason Yan wrote:
> In commit fbce4d97fd43 ("scsi: fixup kernel warning during rmmod()"), we
> removed scsi_device_get() and directly called get_device() to increase
> the refcount of the device. But actullay scsi_device_get() will fail in
> three cases:
> 1. the scsi device is in SDEV_DEL or SDEV_CANCEL state
> 2. get_device() fail
> 3. the module is not alive
>
> The intended purpose was to remove the check of the module alive.
> Unfortunately the check of the device state was droped too. And this
> introduced a race condition like this:
>
> CPU0                                                CPU1
> __scsi_remove_target()
>   ->iterate shost->__devices
>   ->scsi_remove_device()
>   ->put_device()
>       someone still hold a refcount
>                                                     sd_release()
>                                                       ->scsi_disk_put()
>                                                       ->put_device() last put and trigger the device release
>
>   ->goto restart
>   ->iterate shost->__devices and got the same device
>   ->get_device() while refcount is 0
>   ->scsi_remove_device()
>   ->put_device() refcount decreased to 0 again
>   ->scsi_device_dev_release()
>   ->scsi_device_dev_release_usercontext()
>
>                                                       ->scsi_device_dev_release()
>                                                       ->scsi_device_dev_release_usercontext()
>
> The same scsi device will be found agian because it is in the shost->__devices
> list until scsi_device_dev_release_usercontext() called, although the device
> state was set to SDEV_DEL after the first scsi_remove_device().
>
> Finally we got a oops in scsi_device_dev_release_usercontext() when the second
> time be called.
>
> Call trace:
> [<ffff0000086bc624>] scsi_device_dev_release_usercontext+0x7c/0x1c0
> [<ffff0000080f1f90>] execute_in_process_context+0x70/0x80
> [<ffff0000086bc598>] scsi_device_dev_release+0x28/0x38
> [<ffff0000086662cc>] device_release+0x3c/0xa0
> [<ffff000008c2e780>] kobject_put+0x80/0xf0
> [<ffff0000086666fc>] put_device+0x24/0x30
> [<ffff0000086aeee0>] scsi_device_put+0x30/0x40
> [<ffff000008704894>] scsi_disk_put+0x44/0x60
> [<ffff000008704a50>] sd_release+0x50/0x80
> [<ffff0000082bc704>] __blkdev_put+0x21c/0x230
> [<ffff0000082bcb2c>] blkdev_put+0x54/0x118
> [<ffff0000082bcc1c>] blkdev_close+0x2c/0x40
> [<ffff000008279b64>] __fput+0x94/0x1d8
> [<ffff000008279d20>] ____fput+0x20/0x30
> [<ffff0000080f6f54>] task_work_run+0x9c/0xb8
> [<ffff0000080dba64>] do_exit+0x2b4/0x9f8
> [<ffff0000080dc234>] do_group_exit+0x3c/0xa0
> [<ffff0000080dc2b8>] __wake_up_parent+0x0/0x40
>
> And sometimes in __scsi_remove_target() it will loop for a long time
> removing the same device if someone else holding a refcount until the
> last refcount is released.
>
> Notice that if CONFIG_REFCOUNT_FULL is open this race won't be triggered
> because the full refcount implement will prevent the refcount increase
> when it is 0.
>
> Fix this by checking the sdev_state again like we did before in
> scsi_device_get(). Then when iterating shost again we will skip the device
> deleted because scsi_remove_device() will set the device state to
> SDEV_CANCEL or SDEV_DEL.
>
> Fixes: fbce4d97fd43 ("scsi: fixup kernel warning during rmmod()")
> Signed-off-by: Jason Yan <yanaijie@huawei.com>
> CC: Hannes Reinecke <hare@suse.de>
> CC: Christoph Hellwig <hch@lst.de>
> CC: Johannes Thumshirn <jthumshirn@suse.de>
> CC: Zhaohongjiang <zhaohongjiang@huawei.com>
> CC: Miao Xie <miaoxie@huawei.com>
> ---
>  drivers/scsi/scsi_sysfs.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
> index 50e7d7e..d398894 100644
> --- a/drivers/scsi/scsi_sysfs.c
> +++ b/drivers/scsi/scsi_sysfs.c
> @@ -1398,6 +1398,15 @@ void scsi_remove_device(struct scsi_device *sdev)
>  }
>  EXPORT_SYMBOL(scsi_remove_device);
>
> +static int scsi_device_get_not_deleted(struct scsi_device *sdev)
> +{
> +	if (sdev->sdev_state == SDEV_DEL || sdev->sdev_state == SDEV_CANCEL)
> +		return -ENXIO;
> +	if (!get_device(&sdev->sdev_gendev))
> +		return -ENXIO;
> +	return 0;
> +}
> +
>  static void __scsi_remove_target(struct scsi_target *starget)
>  {
>  	struct Scsi_Host *shost = dev_to_shost(starget->dev.parent);
> @@ -1415,7 +1424,7 @@ static void __scsi_remove_target(struct scsi_target *starget)
>  		 */
>  		if (sdev->channel != starget->channel ||
>  		    sdev->id != starget->id ||
> -		    !get_device(&sdev->sdev_gendev))
> +		    scsi_device_get_not_deleted(sdev))
>  			continue;
>  		spin_unlock_irqrestore(shost->host_lock, flags);
>  		scsi_remove_device(sdev);

See subsequent discussion, however, we have a reproducible case here and
the patch does appear to fix the issue (500+ iterations).

Reviewed-by: Ewan D. Milne <emilne@redhat.com>

end of thread, other threads:[~2017-12-06  2:46 UTC]

Thread overview: 29+ messages
2017-11-29  3:05 [PATCH] scsi: fix race condition when removing target Jason Yan
2017-11-29  7:41 ` Hannes Reinecke
2017-11-29 16:18 ` Bart Van Assche
2017-11-29 16:20 ` hch
2017-11-29 17:39 ` Bart Van Assche
2017-11-30  1:18 ` Jason Yan
2017-11-30 16:08 ` Bart Van Assche
2017-11-30 16:40 ` gregkh
2017-11-30 23:56 ` James Bottomley
2017-12-01  1:12 ` Finn Thain
2017-12-01  8:40 ` Jason Yan
2017-12-01 14:41 ` Ewan D. Milne
2017-12-01 15:35 ` James Bottomley
2017-12-05 12:37 ` Jason Yan
2017-12-05 15:37 ` James Bottomley
2017-12-06  0:41 ` Jason Yan
2017-12-06  2:07 ` James Bottomley
2017-12-06  2:43 ` Jason Yan
2017-11-29 17:39 ` gregkh
2017-11-29 18:49 ` Ewan D. Milne
2017-11-29 19:11 ` Bart Van Assche
2017-11-29 19:20 ` Ewan D. Milne
2017-11-29 19:50 ` Bart Van Assche
2017-11-29 17:39 ` gregkh
2017-11-29 17:47 ` Bart Van Assche
2017-11-29 16:31 ` James Bottomley
2017-11-29 16:34 ` Christoph Hellwig
2017-11-29 16:47 ` James Bottomley
2017-11-29 19:05 ` Ewan D. Milne