* [RFC] scsi: reduce protection of scan_mutex in scsi_remove_device
@ 2017-04-21 21:13 Song Liu
2017-04-21 21:17 ` Bart Van Assche
` (3 more replies)
0 siblings, 4 replies; 14+ messages in thread
From: Song Liu @ 2017-04-21 21:13 UTC (permalink / raw)
To: linux-scsi; +Cc: hch, Bart.VanAssche, Song Liu
When a device is deleted through sysfs handle "delete", the code
locks shost->scan_mutex. If multiple devices are deleted at the
same time, these deletes will be handled in series.
On the other hand, some devices do long latency IO during deletion,
for example, sd_shutdown() may do sync cache and/or start_stop.
It is not necessary for these commands to run in series.
To reduce latency of parallel "delete" requests, this patch reduces
the protection of scan_mutex. The only function with Scsi_Host
called in __scsi_remove_device() is the optional slave_destroy().
Therefore, the protection of scan_mutex is only necessary for this
function.
Signed-off-by: Song Liu <songliubraving@fb.com>
---
drivers/scsi/scsi_sysfs.c | 13 ++++++-------
1 file changed, 6 insertions(+), 7 deletions(-)
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 82dfe07..e7a9e28 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -610,7 +610,7 @@ static int scsi_sdev_check_buf_bit(const char *buf)
return 1;
else if (buf[0] == '0')
return 0;
- else
+ else
return -EINVAL;
} else
return -EINVAL;
@@ -774,7 +774,7 @@ store_queue_type_field(struct device *dev, struct device_attribute *attr,
if (!sdev->tagged_supported)
return -EINVAL;
-
+
sdev_printk(KERN_INFO, sdev,
"ignoring write to deprecated queue_type attribute");
return count;
@@ -1302,8 +1302,11 @@ void __scsi_remove_device(struct scsi_device *sdev)
blk_cleanup_queue(sdev->request_queue);
cancel_work_sync(&sdev->requeue_work);
- if (sdev->host->hostt->slave_destroy)
+ if (sdev->host->hostt->slave_destroy) {
+ mutex_lock(&sdev->host->scan_mutex);
sdev->host->hostt->slave_destroy(sdev);
+ mutex_unlock(&sdev->host->scan_mutex);
+ }
transport_destroy_device(dev);
/*
@@ -1322,11 +1325,7 @@ void __scsi_remove_device(struct scsi_device *sdev)
**/
void scsi_remove_device(struct scsi_device *sdev)
{
- struct Scsi_Host *shost = sdev->host;
-
- mutex_lock(&shost->scan_mutex);
__scsi_remove_device(sdev);
- mutex_unlock(&shost->scan_mutex);
}
EXPORT_SYMBOL(scsi_remove_device);
--
2.9.3
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [RFC] scsi: reduce protection of scan_mutex in scsi_remove_device
2017-04-21 21:13 [RFC] scsi: reduce protection of scan_mutex in scsi_remove_device Song Liu
@ 2017-04-21 21:17 ` Bart Van Assche
2017-04-21 22:20 ` Song Liu
2017-04-21 21:20 ` Bart Van Assche
` (2 subsequent siblings)
3 siblings, 1 reply; 14+ messages in thread
From: Bart Van Assche @ 2017-04-21 21:17 UTC (permalink / raw)
To: linux-scsi, songliubraving; +Cc: hch
On Fri, 2017-04-21 at 14:13 -0700, Song Liu wrote:
> When a device is deleted through sysfs handle "delete", [ ... ]
If I try to use that sysfs attribute then I encounter a deadlock (see
also the report below). How is it possible that you have not hit that
deadlock in your tests?
Bart.
======================================================
[ INFO: possible circular locking dependency detected ]
4.11.0-rc6-dbg+ #3 Tainted: G I
-------------------------------------------------------
bash/7858 is trying to acquire lock:
(&shost->scan_mutex){+.+.+.}, at: [<ffffffff814de090>] scsi_remove_device+0x20/0x40
but task is already holding lock:
(s_active#326){++++.+}, at: [<ffffffff81293e20>] kernfs_remove_self+0xe0/0x140
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (s_active#326){++++.+}:
lock_acquire+0xd5/0x1c0
__kernfs_remove+0x248/0x310
kernfs_remove_by_name_ns+0x45/0xa0
remove_files.isra.1+0x35/0x70
sysfs_remove_group+0x44/0x90
sysfs_remove_groups+0x2e/0x50
device_remove_attrs+0x5e/0x80
device_del+0x1fd/0x320
__scsi_remove_device+0xe9/0x120
scsi_forget_host+0x60/0x70
scsi_remove_host+0x71/0x110
0xffffffffa0703690
process_one_work+0x20b/0x6a0
worker_thread+0x4e/0x4a0
kthread+0x113/0x150
ret_from_fork+0x2e/0x40
-> #0 (&shost->scan_mutex){+.+.+.}:
__lock_acquire+0x1109/0x1280
lock_acquire+0xd5/0x1c0
__mutex_lock+0x83/0x980
mutex_lock_nested+0x1b/0x20
scsi_remove_device+0x20/0x40
sdev_store_delete+0x27/0x30
dev_attr_store+0x18/0x30
sysfs_kf_write+0x45/0x60
kernfs_fop_write+0x13c/0x1c0
__vfs_write+0x28/0x140
vfs_write+0xc8/0x1e0
SyS_write+0x49/0xa0
entry_SYSCALL_64_fastpath+0x18/0xad
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(s_active#326);
lock(&shost->scan_mutex);
lock(s_active#326);
lock(&shost->scan_mutex);
*** DEADLOCK ***
3 locks held by bash/7858:
#0: (sb_writers#4){.+.+.+}, at: [<ffffffff81206b45>] vfs_write+0x195/0x1e0
#1: (&of->mutex){+.+.+.}, at: [<ffffffff81294af6>] kernfs_fop_write+0x106/0x1c0
#2: (s_active#326){++++.+}, at: [<ffffffff81293e20>] kernfs_remove_self+0xe0/0x140
stack backtrace:
CPU: 3 PID: 7858 Comm: bash Tainted: G I 4.11.0-rc6-dbg+ #3
Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 1.0.2 11/17/2014
Call Trace:
dump_stack+0x68/0x93
print_circular_bug+0x1be/0x210
__lock_acquire+0x1109/0x1280
lock_acquire+0xd5/0x1c0
__mutex_lock+0x83/0x980
mutex_lock_nested+0x1b/0x20
scsi_remove_device+0x20/0x40
sdev_store_delete+0x27/0x30
dev_attr_store+0x18/0x30
sysfs_kf_write+0x45/0x60
kernfs_fop_write+0x13c/0x1c0
__vfs_write+0x28/0x140
vfs_write+0xc8/0x1e0
SyS_write+0x49/0xa0
entry_SYSCALL_64_fastpath+0x18/0xad
RIP: 0033:0x7fec0f748500
RSP: 002b:00007ffc1ddaec98 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000046 RCX: 00007fec0f748500
RDX: 0000000000000002 RSI: 0000000002012aa0 RDI: 0000000000000001
RBP: 00007ffc1ddaebf0 R08: 00007fec0fa0a740 R09: 00007fec10061100
R10: 0000000000000098 R11: 0000000000000246 R12: 00007fec10084bf0
R13: 0000000000000001 R14: 0000000000000000 R15: 00007ffc1ddaec18
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC] scsi: reduce protection of scan_mutex in scsi_remove_device
2017-04-21 21:13 [RFC] scsi: reduce protection of scan_mutex in scsi_remove_device Song Liu
2017-04-21 21:17 ` Bart Van Assche
@ 2017-04-21 21:20 ` Bart Van Assche
2017-04-21 22:31 ` Song Liu
2017-04-24 15:28 ` Christoph Hellwig
2017-04-25 17:23 ` Bart Van Assche
3 siblings, 1 reply; 14+ messages in thread
From: Bart Van Assche @ 2017-04-21 21:20 UTC (permalink / raw)
To: linux-scsi, songliubraving; +Cc: hch
On Fri, 2017-04-21 at 14:13 -0700, Song Liu wrote:
> On the other hand, some devices do long latency IO during deletion,
> for example, sd_shutdown() may do sync cache and/or start_stop.
> It is not necessary for these commands to run in series.
Have you noticed my patch series that makes sd_shutdown() submit the
SYNCHRONIZE CACHE command asynchronously? Have you tried whether that
patch series would be a good alternative?
Thanks,
Bart.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC] scsi: reduce protection of scan_mutex in scsi_remove_device
2017-04-21 21:17 ` Bart Van Assche
@ 2017-04-21 22:20 ` Song Liu
0 siblings, 0 replies; 14+ messages in thread
From: Song Liu @ 2017-04-21 22:20 UTC (permalink / raw)
To: Bart Van Assche; +Cc: linux-scsi, hch
> On Apr 21, 2017, at 2:17 PM, Bart Van Assche <Bart.VanAssche@sandisk.com> wrote:
>
> On Fri, 2017-04-21 at 14:13 -0700, Song Liu wrote:
>> When a device is deleted through sysfs handle "delete", [ ... ]
>
> If I try to use that sysfs attribute then I encounter a deadlock (see
> also the report below). How is it possible that you have not hit that
> deadlock in your tests?
>
> Bart.
Hmm... Seems my test environment (VM) doesn't call the following path
__scsi_remove_device+0xe9/0x120
scsi_forget_host+0x60/0x70
scsi_remove_host+0x71/0x110
Let me fix it.
Thanks,
Song
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC] scsi: reduce protection of scan_mutex in scsi_remove_device
2017-04-21 21:20 ` Bart Van Assche
@ 2017-04-21 22:31 ` Song Liu
2017-04-25 20:59 ` Bart Van Assche
0 siblings, 1 reply; 14+ messages in thread
From: Song Liu @ 2017-04-21 22:31 UTC (permalink / raw)
To: Bart Van Assche; +Cc: linux-scsi, hch
> On Apr 21, 2017, at 2:20 PM, Bart Van Assche <Bart.VanAssche@sandisk.com> wrote:
>
> On Fri, 2017-04-21 at 14:13 -0700, Song Liu wrote:
>> On the other hand, some devices do long latency IO during deletion,
>> for example, sd_shutdown() may do sync cache and/or start_stop.
>> It is not necessary for these commands to run in series.
>
> Have you noticed my patch series that makes sd_shutdown() submit the
> SYNCHRONIZE CACHE command asynchronously? Have you tried whether that
> patch series would be a good alternative?
>
> Thanks,
>
> Bart.
The asynchronous SYNCHRONIZE CACHE will not help our use case, where the
latency comes from sd_start_stop_device(). Seems it is not easy to make
the START STOP UNIT command async.
Thanks,
Song
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC] scsi: reduce protection of scan_mutex in scsi_remove_device
2017-04-21 21:13 [RFC] scsi: reduce protection of scan_mutex in scsi_remove_device Song Liu
2017-04-21 21:17 ` Bart Van Assche
2017-04-21 21:20 ` Bart Van Assche
@ 2017-04-24 15:28 ` Christoph Hellwig
2017-04-25 17:23 ` Bart Van Assche
3 siblings, 0 replies; 14+ messages in thread
From: Christoph Hellwig @ 2017-04-24 15:28 UTC (permalink / raw)
To: Song Liu; +Cc: linux-scsi, hch, Bart.VanAssche
On Fri, Apr 21, 2017 at 02:13:02PM -0700, Song Liu wrote:
> When a device is deleted through sysfs handle "delete", the code
> locks shost->scan_mutex. If multiple devices are deleted at the
> same time, these deletes will be handled in series.
>
> On the other hand, some devices do long latency IO during deletion,
> for example, sd_shutdown() may do sync cache and/or start_stop.
> It is not necessary for these commands to run in series.
>
> To reduce latency of parallel "delete" requests, this patch reduces
> the protection of scan_mutex. The only function with Scsi_Host
> called in __scsi_remove_device() is the optional slave_destroy().
> Therefore, the protection of scan_mutex is only necessary for this
> function.
And I don't think it makes sense for slave_destroy either. Please
do a quick audit of the instances and drop the lock for it, too.
>
> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
> index 82dfe07..e7a9e28 100644
> --- a/drivers/scsi/scsi_sysfs.c
> +++ b/drivers/scsi/scsi_sysfs.c
> @@ -610,7 +610,7 @@ static int scsi_sdev_check_buf_bit(const char *buf)
> return 1;
> else if (buf[0] == '0')
> return 0;
> - else
> + else
> return -EINVAL;
Also the patch has a few odd whitespace changes like this. Please
remove those before reposting.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC] scsi: reduce protection of scan_mutex in scsi_remove_device
2017-04-21 21:13 [RFC] scsi: reduce protection of scan_mutex in scsi_remove_device Song Liu
` (2 preceding siblings ...)
2017-04-24 15:28 ` Christoph Hellwig
@ 2017-04-25 17:23 ` Bart Van Assche
2017-04-25 17:42 ` Song Liu
3 siblings, 1 reply; 14+ messages in thread
From: Bart Van Assche @ 2017-04-25 17:23 UTC (permalink / raw)
To: linux-scsi, songliubraving; +Cc: hch
On Fri, 2017-04-21 at 14:13 -0700, Song Liu wrote:
> When a device is deleted through sysfs handle "delete", the code
> locks shost->scan_mutex. If multiple devices are deleted at the
> same time, these deletes will be handled in series.
>
> On the other hand, some devices do long latency IO during deletion,
> for example, sd_shutdown() may do sync cache and/or start_stop.
> It is not necessary for these commands to run in series.
>
> To reduce latency of parallel "delete" requests, this patch reduces
> the protection of scan_mutex. The only function with Scsi_Host
> called in __scsi_remove_device() is the optional slave_destroy().
> Therefore, the protection of scan_mutex is only necessary for this
> function.
Hello Song,
What you wrote above is wrong. It is necessary to serialize SCSI
scanning against SCSI device removal. That is why scan_mutex is held
around the __scsi_remove_device() call. When adding a LUN the following
(and several other) actions are performed:
* Allocation of memory for struct scsi_device.
* Allocation of a block layer request queue.
* Initialization of the .sdev_gendev and .sdev_dev device structures.
* Association of the transport layer driver with the SCSI device.
* Association of a device handler with the SCSI device (e.g. ALUA).
* Making the .sdev_gendev and .sdev_dev devices visible in sysfs.
* Making the transport layer attributes visible in sysfs.
* Creating a bsg device node in sysfs.
* Association of an upper layer driver
(e.g. sd or sr) with the SCSI LUN.
Removal of a LUN means undoing all of the above. If adding and removing
a LUN would not be not serialized then there would be a risk that removal
and immediate reregistration of a LUN will fail due to the reregistration
process trying to add sysfs attributes that were not yet removed by the
removal step.
Bart.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC] scsi: reduce protection of scan_mutex in scsi_remove_device
2017-04-25 17:23 ` Bart Van Assche
@ 2017-04-25 17:42 ` Song Liu
2017-04-25 17:52 ` Bart Van Assche
0 siblings, 1 reply; 14+ messages in thread
From: Song Liu @ 2017-04-25 17:42 UTC (permalink / raw)
To: Bart Van Assche; +Cc: linux-scsi, hch
> On Apr 25, 2017, at 10:23 AM, Bart Van Assche <Bart.VanAssche@sandisk.com> wrote:
>
> On Fri, 2017-04-21 at 14:13 -0700, Song Liu wrote:
>> When a device is deleted through sysfs handle "delete", the code
>> locks shost->scan_mutex. If multiple devices are deleted at the
>> same time, these deletes will be handled in series.
>>
>> On the other hand, some devices do long latency IO during deletion,
>> for example, sd_shutdown() may do sync cache and/or start_stop.
>> It is not necessary for these commands to run in series.
>>
>> To reduce latency of parallel "delete" requests, this patch reduces
>> the protection of scan_mutex. The only function with Scsi_Host
>> called in __scsi_remove_device() is the optional slave_destroy().
>> Therefore, the protection of scan_mutex is only necessary for this
>> function.
>
> Hello Song,
>
> What you wrote above is wrong. It is necessary to serialize SCSI
> scanning against SCSI device removal. That is why scan_mutex is held
> around the __scsi_remove_device() call. When adding a LUN the following
> (and several other) actions are performed:
> * Allocation of memory for struct scsi_device.
> * Allocation of a block layer request queue.
> * Initialization of the .sdev_gendev and .sdev_dev device structures.
> * Association of the transport layer driver with the SCSI device.
> * Association of a device handler with the SCSI device (e.g. ALUA).
> * Making the .sdev_gendev and .sdev_dev devices visible in sysfs.
> * Making the transport layer attributes visible in sysfs.
> * Creating a bsg device node in sysfs.
> * Association of an upper layer driver
> (e.g. sd or sr) with the SCSI LUN.
>
> Removal of a LUN means undoing all of the above. If adding and removing
> a LUN would not be not serialized then there would be a risk that removal
> and immediate reregistration of a LUN will fail due to the reregistration
> process trying to add sysfs attributes that were not yet removed by the
> removal step.
>
> Bart.
Hello Bart,
Thanks a lot for looking into this.
I have been studying the code recently. I am wondering whether the following
would work:
1. Introduce a new mutex for scsi_device to protect most operations in the
list you gathered above;
2. For operations like host->slave_destroy(), ensure they access scsi_host
data with host_lock (or another spin lock).
I looked into all instances of slave_destroy, only 2 of them:
dc395x_slave_destroy() and visorhba_slave_destroy() access scsi_host data
without protection of spin lock.
3. Once 1 and 2 is ready, __scsi_remove_device() only need to hold the mutex
for the scsi_device. scan_mutex is no longer required.
Is this a valid path?
Thanks again,
Song
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC] scsi: reduce protection of scan_mutex in scsi_remove_device
2017-04-25 17:42 ` Song Liu
@ 2017-04-25 17:52 ` Bart Van Assche
2017-04-25 21:17 ` Song Liu
0 siblings, 1 reply; 14+ messages in thread
From: Bart Van Assche @ 2017-04-25 17:52 UTC (permalink / raw)
To: songliubraving; +Cc: linux-scsi, hch
On Tue, 2017-04-25 at 17:42 +0000, Song Liu wrote:
> I have been studying the code recently. I am wondering whether the following
> would work:
>
> 1. Introduce a new mutex for scsi_device to protect most operations in the
> list you gathered above;
>
> 2. For operations like host->slave_destroy(), ensure they access scsi_host
> data with host_lock (or another spin lock).
>
> I looked into all instances of slave_destroy, only 2 of them:
> dc395x_slave_destroy() and visorhba_slave_destroy() access scsi_host data
> without protection of spin lock.
>
> 3. Once 1 and 2 is ready, __scsi_remove_device() only need to hold the mutex
> for the scsi_device. scan_mutex is no longer required.
>
> Is this a valid path?
Sorry but I don't think so. Unlocking and reacquiring scan_mutex would create
the potential that LUN scanning occurs in the meantime and hence that it fails
because LUN removal is incomplete.
Bart.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC] scsi: reduce protection of scan_mutex in scsi_remove_device
2017-04-21 22:31 ` Song Liu
@ 2017-04-25 20:59 ` Bart Van Assche
2017-04-25 21:29 ` Song Liu
0 siblings, 1 reply; 14+ messages in thread
From: Bart Van Assche @ 2017-04-25 20:59 UTC (permalink / raw)
To: songliubraving; +Cc: linux-scsi, hch
On Fri, 2017-04-21 at 22:31 +0000, Song Liu wrote:
> On Apr 21, 2017, at 2:20 PM, Bart Van Assche <Bart.VanAssche@sandisk.com> wrote:
> > On Fri, 2017-04-21 at 14:13 -0700, Song Liu wrote:
> > > On the other hand, some devices do long latency IO during deletion,
> > > for example, sd_shutdown() may do sync cache and/or start_stop.
> > > It is not necessary for these commands to run in series.
> >
> > Have you noticed my patch series that makes sd_shutdown() submit the
> > SYNCHRONIZE CACHE command asynchronously? Have you tried whether that
> > patch series would be a good alternative?
>
> The asynchronous SYNCHRONIZE CACHE will not help our use case, where the
> latency comes from sd_start_stop_device(). Seems it is not easy to make
> the START STOP UNIT command async.
Hello Song,
It should be possible to make the START STOP UNIT command asynchronous too
by issuing it asynchronously from inside sd_sync_cache_done(). To avoid
dereferencing a stale struct scsi_disk pointer you will either have to hold
an additional reference as long as the SYNCHRONIZE CACHE command is in
progress or copy the data needed from struct scsi_disk into the SCSI request.
Bart.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC] scsi: reduce protection of scan_mutex in scsi_remove_device
2017-04-25 17:52 ` Bart Van Assche
@ 2017-04-25 21:17 ` Song Liu
2017-04-25 22:17 ` Bart Van Assche
0 siblings, 1 reply; 14+ messages in thread
From: Song Liu @ 2017-04-25 21:17 UTC (permalink / raw)
To: Bart Van Assche; +Cc: linux-scsi, hch
> On Apr 25, 2017, at 10:52 AM, Bart Van Assche <Bart.VanAssche@sandisk.com> wrote:
>
> On Tue, 2017-04-25 at 17:42 +0000, Song Liu wrote:
>> I have been studying the code recently. I am wondering whether the following
>> would work:
>>
>> 1. Introduce a new mutex for scsi_device to protect most operations in the
>> list you gathered above;
>>
>> 2. For operations like host->slave_destroy(), ensure they access scsi_host
>> data with host_lock (or another spin lock).
>>
>> I looked into all instances of slave_destroy, only 2 of them:
>> dc395x_slave_destroy() and visorhba_slave_destroy() access scsi_host data
>> without protection of spin lock.
>>
>> 3. Once 1 and 2 is ready, __scsi_remove_device() only need to hold the mutex
>> for the scsi_device. scan_mutex is no longer required.
>>
>> Is this a valid path?
>
> Sorry but I don't think so. Unlocking and reacquiring scan_mutex would create
> the potential that LUN scanning occurs in the meantime and hence that it fails
> because LUN removal is incomplete.
>
Hello Bart,
I am not sure I fully understand the problem here. If I understand the logic
correctly, when a device is being removed, it will stay in scsi_host->__devices
until fully the remove routine is finished. And LUN scanning in parallel will
find the device with scsi_device_lookup_by_target(), and thus it would not
rescan the device until the device is fully removed? Did I miss anything here?
Thanks,
Song
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC] scsi: reduce protection of scan_mutex in scsi_remove_device
2017-04-25 20:59 ` Bart Van Assche
@ 2017-04-25 21:29 ` Song Liu
0 siblings, 0 replies; 14+ messages in thread
From: Song Liu @ 2017-04-25 21:29 UTC (permalink / raw)
To: Bart Van Assche; +Cc: linux-scsi, hch
> On Apr 25, 2017, at 1:59 PM, Bart Van Assche <Bart.VanAssche@sandisk.com> wrote:
>
> On Fri, 2017-04-21 at 22:31 +0000, Song Liu wrote:
>> On Apr 21, 2017, at 2:20 PM, Bart Van Assche <Bart.VanAssche@sandisk.com> wrote:
>>> On Fri, 2017-04-21 at 14:13 -0700, Song Liu wrote:
>>>> On the other hand, some devices do long latency IO during deletion,
>>>> for example, sd_shutdown() may do sync cache and/or start_stop.
>>>> It is not necessary for these commands to run in series.
>>>
>>> Have you noticed my patch series that makes sd_shutdown() submit the
>>> SYNCHRONIZE CACHE command asynchronously? Have you tried whether that
>>> patch series would be a good alternative?
>>
>> The asynchronous SYNCHRONIZE CACHE will not help our use case, where the
>> latency comes from sd_start_stop_device(). Seems it is not easy to make
>> the START STOP UNIT command async.
>
> Hello Song,
>
> It should be possible to make the START STOP UNIT command asynchronous too
> by issuing it asynchronously from inside sd_sync_cache_done(). To avoid
> dereferencing a stale struct scsi_disk pointer you will either have to hold
> an additional reference as long as the SYNCHRONIZE CACHE command is in
> progress or copy the data needed from struct scsi_disk into the SCSI request.
>
Hello Bart,
As you stated in the patch: blk_cleanup_queue() in __scsi_remove_device()
will wait for async SYNCHRONIZE CACHE and START STOP UNIT command to complete.
Since __scsi_remove_device() is called with scan_mutex held, simultaneous
START STOP UNIT requires are still serialized by the scan_mutex.
Song
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC] scsi: reduce protection of scan_mutex in scsi_remove_device
2017-04-25 21:17 ` Song Liu
@ 2017-04-25 22:17 ` Bart Van Assche
2017-04-26 0:41 ` Song Liu
0 siblings, 1 reply; 14+ messages in thread
From: Bart Van Assche @ 2017-04-25 22:17 UTC (permalink / raw)
To: songliubraving; +Cc: linux-scsi, hch
On Tue, 2017-04-25 at 21:17 +0000, Song Liu wrote:
> I am not sure I fully understand the problem here. If I understand the logic
> correctly, when a device is being removed, it will stay in scsi_host->__devices
> until fully the remove routine is finished. And LUN scanning in parallel will
> find the device with scsi_device_lookup_by_target(), and thus it would not
> rescan the device until the device is fully removed? Did I miss anything here?
Hello Song,
The SCSI core is already complicated enough. Please don't complicate it further
by making subtle changes to the semantics of scan_mutex. Please also note that
I have proposed an alternative, namely to make the START STOP UNIT command
asynchronous.
Bart.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [RFC] scsi: reduce protection of scan_mutex in scsi_remove_device
2017-04-25 22:17 ` Bart Van Assche
@ 2017-04-26 0:41 ` Song Liu
0 siblings, 0 replies; 14+ messages in thread
From: Song Liu @ 2017-04-26 0:41 UTC (permalink / raw)
To: Bart Van Assche; +Cc: linux-scsi, hch
> On Apr 25, 2017, at 3:17 PM, Bart Van Assche <Bart.VanAssche@sandisk.com> wrote:
>
> On Tue, 2017-04-25 at 21:17 +0000, Song Liu wrote:
>> I am not sure I fully understand the problem here. If I understand the logic
>> correctly, when a device is being removed, it will stay in scsi_host->__devices
>> until fully the remove routine is finished. And LUN scanning in parallel will
>> find the device with scsi_device_lookup_by_target(), and thus it would not
>> rescan the device until the device is fully removed? Did I miss anything here?
>
> Hello Song,
>
> The SCSI core is already complicated enough. Please don't complicate it further
> by making subtle changes to the semantics of scan_mutex. Please also note that
> I have proposed an alternative, namely to make the START STOP UNIT command
> asynchronous.
>
Hello Bart,
I total agree with your concern in making SCSI core more complicated. However, I am
not sure whether how asynchronous START STOP UNIT command makes multiple deletes run
in parallel. If I understand correctly, each device delete has the following steps:
1. lock scan_mutex
2. issue async START STOP UNIT
3. wait START STOP UNIT to complete
4. unlock scan_mutex
scan_mutex will prevent multiple device deletes to run in parallel.
On the other hand, maybe the problem is simpler than I thought? Once we ensure all
slave_destroy() holds proper lock to access host level data, something as follows
might just work. In this way, echoing into delete handle is not protected by
scan_mutex. However, when the sysfs entry exists, the device should be fully added
to the system. And before delete operation finishes, future scan should not re-add
the device (as the device can be found by scsi_device_lookup_by_target).
Am I too optimistic here?
Thanks,
Song
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index 82dfe07..d1c3338 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -710,7 +710,7 @@ sdev_store_delete(struct device *dev, struct device_attribute *attr,
const char *buf, size_t count)
{
if (device_remove_file_self(dev, attr))
- scsi_remove_device(to_scsi_device(dev));
+ __scsi_remove_device(to_scsi_device(dev));
return count;
};
^ permalink raw reply related [flat|nested] 14+ messages in thread
end of thread, other threads:[~2017-04-26 0:44 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-21 21:13 [RFC] scsi: reduce protection of scan_mutex in scsi_remove_device Song Liu
2017-04-21 21:17 ` Bart Van Assche
2017-04-21 22:20 ` Song Liu
2017-04-21 21:20 ` Bart Van Assche
2017-04-21 22:31 ` Song Liu
2017-04-25 20:59 ` Bart Van Assche
2017-04-25 21:29 ` Song Liu
2017-04-24 15:28 ` Christoph Hellwig
2017-04-25 17:23 ` Bart Van Assche
2017-04-25 17:42 ` Song Liu
2017-04-25 17:52 ` Bart Van Assche
2017-04-25 21:17 ` Song Liu
2017-04-25 22:17 ` Bart Van Assche
2017-04-26 0:41 ` Song Liu
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.