* [PATCH] scsi_sysfs: protect against double execution of __scsi_remove_device()
@ 2015-10-22 17:12 Vitaly Kuznetsov
2015-10-22 17:33 ` Bart Van Assche
0 siblings, 1 reply; 5+ messages in thread
From: Vitaly Kuznetsov @ 2015-10-22 17:12 UTC (permalink / raw)
To: James E.J. Bottomley; +Cc: linux-scsi, linux-kernel, K. Y. Srinivasan
On some host errors storvsc module tries to remove sdev by scheduling a job
which does the following:
sdev = scsi_device_lookup(wrk->host, 0, 0, wrk->lun);
if (sdev) {
scsi_remove_device(sdev);
scsi_device_put(sdev);
}
While this code seems correct the following crash is observed:
general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
RIP: 0010:[<ffffffff81169979>] [<ffffffff81169979>] bdi_destroy+0x39/0x220
...
[<ffffffff814aecdc>] ? _raw_spin_unlock_irq+0x2c/0x40
[<ffffffff8127b7db>] blk_cleanup_queue+0x17b/0x270
[<ffffffffa00b54c4>] __scsi_remove_device+0x54/0xd0 [scsi_mod]
[<ffffffffa00b556b>] scsi_remove_device+0x2b/0x40 [scsi_mod]
[<ffffffffa00ec47d>] storvsc_remove_lun+0x3d/0x60 [hv_storvsc]
[<ffffffff81080791>] process_one_work+0x1b1/0x530
...
The problem comes with the fact that many such jobs (for the same device)
are being scheduled simultaneously. While scsi_remove_device() uses
shost->scan_mutex and scsi_device_lookup() will fail for a device in
SDEV_DEL state there is no protection against someone who did
scsi_device_lookup() before we actually entered __scsi_remove_device(). So
the whole scenario looks like that: two callers do simultaneous (or
preemption happens) calls to scsi_device_lookup() ant these calls succeed
for all of them, after that both callers try doing scsi_remove_device().
shost->scan_mutex only serializes their calls to __scsi_remove_device()
and we end up doing the cleanup path twice.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
drivers/scsi/scsi_sysfs.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index b333389..e0d2707 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -1076,6 +1076,14 @@ void __scsi_remove_device(struct scsi_device *sdev)
{
struct device *dev = &sdev->sdev_gendev;
+ /*
+ * This cleanup path is not reentrant and while it is impossible
+ * to get a new reference with scsi_device_get() someone can still
+ * hold a previously acquired one.
+ */
+ if (sdev->sdev_state == SDEV_DEL)
+ return;
+
if (sdev->is_visible) {
if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0)
return;
--
2.4.3
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH] scsi_sysfs: protect against double execution of __scsi_remove_device()
2015-10-22 17:12 [PATCH] scsi_sysfs: protect against double execution of __scsi_remove_device() Vitaly Kuznetsov
@ 2015-10-22 17:33 ` Bart Van Assche
0 siblings, 0 replies; 5+ messages in thread
From: Bart Van Assche @ 2015-10-22 17:33 UTC (permalink / raw)
To: Vitaly Kuznetsov, James E.J. Bottomley
Cc: linux-scsi, linux-kernel, K. Y. Srinivasan
On 10/22/2015 10:12 AM, Vitaly Kuznetsov wrote:
> On some host errors storvsc module tries to remove sdev by scheduling a job
> which does the following:
>
> sdev = scsi_device_lookup(wrk->host, 0, 0, wrk->lun);
> if (sdev) {
> scsi_remove_device(sdev);
> scsi_device_put(sdev);
> }
>
> While this code seems correct the following crash is observed:
>
> general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
> RIP: 0010:[<ffffffff81169979>] [<ffffffff81169979>] bdi_destroy+0x39/0x220
> ...
> [<ffffffff814aecdc>] ? _raw_spin_unlock_irq+0x2c/0x40
> [<ffffffff8127b7db>] blk_cleanup_queue+0x17b/0x270
> [<ffffffffa00b54c4>] __scsi_remove_device+0x54/0xd0 [scsi_mod]
> [<ffffffffa00b556b>] scsi_remove_device+0x2b/0x40 [scsi_mod]
> [<ffffffffa00ec47d>] storvsc_remove_lun+0x3d/0x60 [hv_storvsc]
> [<ffffffff81080791>] process_one_work+0x1b1/0x530
> ...
>
> The problem comes with the fact that many such jobs (for the same device)
> are being scheduled simultaneously. While scsi_remove_device() uses
> shost->scan_mutex and scsi_device_lookup() will fail for a device in
> SDEV_DEL state there is no protection against someone who did
> scsi_device_lookup() before we actually entered __scsi_remove_device(). So
> the whole scenario looks like that: two callers do simultaneous (or
> preemption happens) calls to scsi_device_lookup() ant these calls succeed
> for all of them, after that both callers try doing scsi_remove_device().
> shost->scan_mutex only serializes their calls to __scsi_remove_device()
> and we end up doing the cleanup path twice.
>
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
> drivers/scsi/scsi_sysfs.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
> index b333389..e0d2707 100644
> --- a/drivers/scsi/scsi_sysfs.c
> +++ b/drivers/scsi/scsi_sysfs.c
> @@ -1076,6 +1076,14 @@ void __scsi_remove_device(struct scsi_device *sdev)
> {
> struct device *dev = &sdev->sdev_gendev;
>
> + /*
> + * This cleanup path is not reentrant and while it is impossible
> + * to get a new reference with scsi_device_get() someone can still
> + * hold a previously acquired one.
> + */
> + if (sdev->sdev_state == SDEV_DEL)
> + return;
> +
> if (sdev->is_visible) {
> if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0)
> return;
Hello Vitaly,
Sorry but I don't see how the above patch could be a proper fix. If two
calls to __scsi_remove_device() occur concurrently the crash explained
above can still occur. The storsvc driver should be modified such that
concurrent __scsi_remove_device() calls do not occur. How about
preventing concurrent calls via a mutex ? Another possible approach is
to use the workqueue mechanism. An example can be found in the SRP
initiator driver (ib_srp).
Bart.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] scsi_sysfs: protect against double execution of __scsi_remove_device()
@ 2015-10-22 17:33 ` Bart Van Assche
0 siblings, 0 replies; 5+ messages in thread
From: Bart Van Assche @ 2015-10-22 17:33 UTC (permalink / raw)
To: Vitaly Kuznetsov, James E.J. Bottomley
Cc: linux-scsi, linux-kernel, K. Y. Srinivasan
On 10/22/2015 10:12 AM, Vitaly Kuznetsov wrote:
> On some host errors storvsc module tries to remove sdev by scheduling a job
> which does the following:
>
> sdev = scsi_device_lookup(wrk->host, 0, 0, wrk->lun);
> if (sdev) {
> scsi_remove_device(sdev);
> scsi_device_put(sdev);
> }
>
> While this code seems correct the following crash is observed:
>
> general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
> RIP: 0010:[<ffffffff81169979>] [<ffffffff81169979>] bdi_destroy+0x39/0x220
> ...
> [<ffffffff814aecdc>] ? _raw_spin_unlock_irq+0x2c/0x40
> [<ffffffff8127b7db>] blk_cleanup_queue+0x17b/0x270
> [<ffffffffa00b54c4>] __scsi_remove_device+0x54/0xd0 [scsi_mod]
> [<ffffffffa00b556b>] scsi_remove_device+0x2b/0x40 [scsi_mod]
> [<ffffffffa00ec47d>] storvsc_remove_lun+0x3d/0x60 [hv_storvsc]
> [<ffffffff81080791>] process_one_work+0x1b1/0x530
> ...
>
> The problem comes with the fact that many such jobs (for the same device)
> are being scheduled simultaneously. While scsi_remove_device() uses
> shost->scan_mutex and scsi_device_lookup() will fail for a device in
> SDEV_DEL state there is no protection against someone who did
> scsi_device_lookup() before we actually entered __scsi_remove_device(). So
> the whole scenario looks like that: two callers do simultaneous (or
> preemption happens) calls to scsi_device_lookup() ant these calls succeed
> for all of them, after that both callers try doing scsi_remove_device().
> shost->scan_mutex only serializes their calls to __scsi_remove_device()
> and we end up doing the cleanup path twice.
>
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
> drivers/scsi/scsi_sysfs.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
> index b333389..e0d2707 100644
> --- a/drivers/scsi/scsi_sysfs.c
> +++ b/drivers/scsi/scsi_sysfs.c
> @@ -1076,6 +1076,14 @@ void __scsi_remove_device(struct scsi_device *sdev)
> {
> struct device *dev = &sdev->sdev_gendev;
>
> + /*
> + * This cleanup path is not reentrant and while it is impossible
> + * to get a new reference with scsi_device_get() someone can still
> + * hold a previously acquired one.
> + */
> + if (sdev->sdev_state == SDEV_DEL)
> + return;
> +
> if (sdev->is_visible) {
> if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0)
> return;
Hello Vitaly,
Sorry but I don't see how the above patch could be a proper fix. If two
calls to __scsi_remove_device() occur concurrently the crash explained
above can still occur. The storsvc driver should be modified such that
concurrent __scsi_remove_device() calls do not occur. How about
preventing concurrent calls via a mutex ? Another possible approach is
to use the workqueue mechanism. An example can be found in the SRP
initiator driver (ib_srp).
Bart.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] scsi_sysfs: protect against double execution of __scsi_remove_device()
2015-10-22 17:33 ` Bart Van Assche
@ 2015-10-23 9:14 ` Vitaly Kuznetsov
-1 siblings, 0 replies; 5+ messages in thread
From: Vitaly Kuznetsov @ 2015-10-23 9:14 UTC (permalink / raw)
To: Bart Van Assche
Cc: James E.J. Bottomley, linux-scsi, linux-kernel, K. Y. Srinivasan
Bart Van Assche <bart.vanassche@sandisk.com> writes:
> On 10/22/2015 10:12 AM, Vitaly Kuznetsov wrote:
>> On some host errors storvsc module tries to remove sdev by scheduling a job
>> which does the following:
>>
>> sdev = scsi_device_lookup(wrk->host, 0, 0, wrk->lun);
>> if (sdev) {
>> scsi_remove_device(sdev);
>> scsi_device_put(sdev);
>> }
>>
>> While this code seems correct the following crash is observed:
>>
>> general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
>> RIP: 0010:[<ffffffff81169979>] [<ffffffff81169979>] bdi_destroy+0x39/0x220
>> ...
>> [<ffffffff814aecdc>] ? _raw_spin_unlock_irq+0x2c/0x40
>> [<ffffffff8127b7db>] blk_cleanup_queue+0x17b/0x270
>> [<ffffffffa00b54c4>] __scsi_remove_device+0x54/0xd0 [scsi_mod]
>> [<ffffffffa00b556b>] scsi_remove_device+0x2b/0x40 [scsi_mod]
>> [<ffffffffa00ec47d>] storvsc_remove_lun+0x3d/0x60 [hv_storvsc]
>> [<ffffffff81080791>] process_one_work+0x1b1/0x530
>> ...
>>
>> The problem comes with the fact that many such jobs (for the same device)
>> are being scheduled simultaneously. While scsi_remove_device() uses
>> shost->scan_mutex and scsi_device_lookup() will fail for a device in
>> SDEV_DEL state there is no protection against someone who did
>> scsi_device_lookup() before we actually entered __scsi_remove_device(). So
>> the whole scenario looks like that: two callers do simultaneous (or
>> preemption happens) calls to scsi_device_lookup() ant these calls succeed
>> for all of them, after that both callers try doing scsi_remove_device().
>> shost->scan_mutex only serializes their calls to __scsi_remove_device()
>> and we end up doing the cleanup path twice.
>>
>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>> ---
>> drivers/scsi/scsi_sysfs.c | 8 ++++++++
>> 1 file changed, 8 insertions(+)
>>
>> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
>> index b333389..e0d2707 100644
>> --- a/drivers/scsi/scsi_sysfs.c
>> +++ b/drivers/scsi/scsi_sysfs.c
>> @@ -1076,6 +1076,14 @@ void __scsi_remove_device(struct scsi_device *sdev)
>> {
>> struct device *dev = &sdev->sdev_gendev;
>>
>> + /*
>> + * This cleanup path is not reentrant and while it is impossible
>> + * to get a new reference with scsi_device_get() someone can still
>> + * hold a previously acquired one.
>> + */
>> + if (sdev->sdev_state == SDEV_DEL)
>> + return;
>> +
>> if (sdev->is_visible) {
>> if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0)
>> return;
>
> Hello Vitaly,
>
> Sorry but I don't see how the above patch could be a proper fix. If
> two calls to __scsi_remove_device() occur concurrently the crash
> explained above can still occur. The storsvc driver should be modified
> such that concurrent __scsi_remove_device() calls do not occur. How
> about preventing concurrent calls via a mutex ?
Nobody is supposed to call __scsi_remove_device() without holding
shost->scan_mutex and scsi_remove_device() does that. Here I'm trying to
protect against two *consequent* calls to the __scsi_remove_device(). As
we set sdev_state to SDEV_DEL on the cleanup path checking it should be
enough.
> Another possible
> approach is to use the workqueue mechanism. An example can be found in
> the SRP initiator driver (ib_srp).
Yes, but I think the existent approach is good enough:
1) Every caller is supposed to get a reference to the device with
scsi_device_get() (scsi_device_lookup() does that).
2) shost->scan_mutex is suppose to be held by all __scsi_remove_device()
callers.
--
Vitaly
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] scsi_sysfs: protect against double execution of __scsi_remove_device()
@ 2015-10-23 9:14 ` Vitaly Kuznetsov
0 siblings, 0 replies; 5+ messages in thread
From: Vitaly Kuznetsov @ 2015-10-23 9:14 UTC (permalink / raw)
To: Bart Van Assche
Cc: James E.J. Bottomley, linux-scsi, linux-kernel, K. Y. Srinivasan
Bart Van Assche <bart.vanassche@sandisk.com> writes:
> On 10/22/2015 10:12 AM, Vitaly Kuznetsov wrote:
>> On some host errors storvsc module tries to remove sdev by scheduling a job
>> which does the following:
>>
>> sdev = scsi_device_lookup(wrk->host, 0, 0, wrk->lun);
>> if (sdev) {
>> scsi_remove_device(sdev);
>> scsi_device_put(sdev);
>> }
>>
>> While this code seems correct the following crash is observed:
>>
>> general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
>> RIP: 0010:[<ffffffff81169979>] [<ffffffff81169979>] bdi_destroy+0x39/0x220
>> ...
>> [<ffffffff814aecdc>] ? _raw_spin_unlock_irq+0x2c/0x40
>> [<ffffffff8127b7db>] blk_cleanup_queue+0x17b/0x270
>> [<ffffffffa00b54c4>] __scsi_remove_device+0x54/0xd0 [scsi_mod]
>> [<ffffffffa00b556b>] scsi_remove_device+0x2b/0x40 [scsi_mod]
>> [<ffffffffa00ec47d>] storvsc_remove_lun+0x3d/0x60 [hv_storvsc]
>> [<ffffffff81080791>] process_one_work+0x1b1/0x530
>> ...
>>
>> The problem comes with the fact that many such jobs (for the same device)
>> are being scheduled simultaneously. While scsi_remove_device() uses
>> shost->scan_mutex and scsi_device_lookup() will fail for a device in
>> SDEV_DEL state there is no protection against someone who did
>> scsi_device_lookup() before we actually entered __scsi_remove_device(). So
>> the whole scenario looks like that: two callers do simultaneous (or
>> preemption happens) calls to scsi_device_lookup() ant these calls succeed
>> for all of them, after that both callers try doing scsi_remove_device().
>> shost->scan_mutex only serializes their calls to __scsi_remove_device()
>> and we end up doing the cleanup path twice.
>>
>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>> ---
>> drivers/scsi/scsi_sysfs.c | 8 ++++++++
>> 1 file changed, 8 insertions(+)
>>
>> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
>> index b333389..e0d2707 100644
>> --- a/drivers/scsi/scsi_sysfs.c
>> +++ b/drivers/scsi/scsi_sysfs.c
>> @@ -1076,6 +1076,14 @@ void __scsi_remove_device(struct scsi_device *sdev)
>> {
>> struct device *dev = &sdev->sdev_gendev;
>>
>> + /*
>> + * This cleanup path is not reentrant and while it is impossible
>> + * to get a new reference with scsi_device_get() someone can still
>> + * hold a previously acquired one.
>> + */
>> + if (sdev->sdev_state == SDEV_DEL)
>> + return;
>> +
>> if (sdev->is_visible) {
>> if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0)
>> return;
>
> Hello Vitaly,
>
> Sorry but I don't see how the above patch could be a proper fix. If
> two calls to __scsi_remove_device() occur concurrently the crash
> explained above can still occur. The storsvc driver should be modified
> such that concurrent __scsi_remove_device() calls do not occur. How
> about preventing concurrent calls via a mutex ?
Nobody is supposed to call __scsi_remove_device() without holding
shost->scan_mutex and scsi_remove_device() does that. Here I'm trying to
protect against two *consequent* calls to the __scsi_remove_device(). As
we set sdev_state to SDEV_DEL on the cleanup path checking it should be
enough.
> Another possible
> approach is to use the workqueue mechanism. An example can be found in
> the SRP initiator driver (ib_srp).
Yes, but I think the existent approach is good enough:
1) Every caller is supposed to get a reference to the device with
scsi_device_get() (scsi_device_lookup() does that).
2) shost->scan_mutex is suppose to be held by all __scsi_remove_device()
callers.
--
Vitaly
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2015-10-23 9:14 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-10-22 17:12 [PATCH] scsi_sysfs: protect against double execution of __scsi_remove_device() Vitaly Kuznetsov
2015-10-22 17:33 ` Bart Van Assche
2015-10-22 17:33 ` Bart Van Assche
2015-10-23 9:14 ` Vitaly Kuznetsov
2015-10-23 9:14 ` Vitaly Kuznetsov
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.