* Re: [PATCH] scsi/sg: don't grab scsi host module reference
2023-06-21 16:01 [PATCH] scsi/sg: don't grab scsi host module reference Yu Kuai
@ 2023-06-21 15:47 ` Bart Van Assche
2023-06-21 16:06 ` Bart Van Assche
` (3 subsequent siblings)
4 siblings, 0 replies; 12+ messages in thread
From: Bart Van Assche @ 2023-06-21 15:47 UTC (permalink / raw)
To: Yu Kuai, hch, chaitanyak, shinichiro.kawasaki, dgilbert, jejb,
martin.petersen, yukuai3, axboe
Cc: linux-scsi, linux-kernel, yi.zhang, yangerkun
On 6/21/23 09:01, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
>
> In order to prevent request_queue to be freed before cleaning up
> blktrace debugfs entries, commit db59133e9279 ("scsi: sg: fix blktrace
> debugfs entries leakage") use scsi_device_get(), however,
> scsi_device_get() will also grab scsi module reference and scsi module
> can't be removed.
>
> It's reported that blktests can't unload scsi_debug after block/001:
>
> blktests (master) # ./check block
> block/001 (stress device hotplugging) [failed]
> +++ /root/blktests/results/nodev/block/001.out.bad 2023-06-19
> Running block/001
> Stressing sd
> +modprobe: FATAL: Module scsi_debug is in use.
>
> Fix this problem by grabbing request_queue reference directly, so that
> scsi host module can still be unloaded while request_queue will be
> pinged by sg device.
pinged -> pinned
Otherwise this patch looks good to me.
Bart.
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH] scsi/sg: don't grab scsi host module reference
@ 2023-06-21 16:01 Yu Kuai
2023-06-21 15:47 ` Bart Van Assche
` (4 more replies)
0 siblings, 5 replies; 12+ messages in thread
From: Yu Kuai @ 2023-06-21 16:01 UTC (permalink / raw)
To: hch, chaitanyak, shinichiro.kawasaki, dgilbert, jejb,
martin.petersen, yukuai3, axboe
Cc: linux-scsi, linux-kernel, yukuai1, yi.zhang, yangerkun
From: Yu Kuai <yukuai3@huawei.com>
In order to prevent request_queue to be freed before cleaning up
blktrace debugfs entries, commit db59133e9279 ("scsi: sg: fix blktrace
debugfs entries leakage") use scsi_device_get(), however,
scsi_device_get() will also grab scsi module reference and scsi module
can't be removed.
It's reported that blktests can't unload scsi_debug after block/001:
blktests (master) # ./check block
block/001 (stress device hotplugging) [failed]
+++ /root/blktests/results/nodev/block/001.out.bad 2023-06-19
Running block/001
Stressing sd
+modprobe: FATAL: Module scsi_debug is in use.
Fix this problem by grabbing request_queue reference directly, so that
scsi host module can still be unloaded while request_queue will be
pinged by sg device.
Reported-by: Chaitanya Kulkarni <chaitanyak@nvidia.com>
Link: https://lore.kernel.org/all/1760da91-876d-fc9c-ab51-999a6f66ad50@nvidia.com/
Fixes: db59133e9279 ("scsi: sg: fix blktrace debugfs entries leakage")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
drivers/scsi/sg.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 2433eeef042a..dcb73787c29d 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -1497,7 +1497,7 @@ sg_add_device(struct device *cl_dev)
int error;
unsigned long iflags;
- error = scsi_device_get(scsidp);
+ error = blk_get_queue(scsidp->request_queue);
if (error)
return error;
@@ -1558,7 +1558,7 @@ sg_add_device(struct device *cl_dev)
out:
if (cdev)
cdev_del(cdev);
- scsi_device_put(scsidp);
+ blk_put_queue(scsidp->request_queue);
return error;
}
@@ -1575,7 +1575,7 @@ sg_device_destroy(struct kref *kref)
*/
blk_trace_remove(q);
- scsi_device_put(sdp->device);
+ blk_put_queue(q);
write_lock_irqsave(&sg_index_lock, flags);
idr_remove(&sg_index_idr, sdp->index);
--
2.39.2
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH] scsi/sg: don't grab scsi host module reference
2023-06-21 16:01 [PATCH] scsi/sg: don't grab scsi host module reference Yu Kuai
2023-06-21 15:47 ` Bart Van Assche
@ 2023-06-21 16:06 ` Bart Van Assche
2023-06-23 5:44 ` Christoph Hellwig
` (2 subsequent siblings)
4 siblings, 0 replies; 12+ messages in thread
From: Bart Van Assche @ 2023-06-21 16:06 UTC (permalink / raw)
To: Yu Kuai, hch, chaitanyak, shinichiro.kawasaki, dgilbert, jejb,
martin.petersen, yukuai3, axboe
Cc: linux-scsi, linux-kernel, yi.zhang, yangerkun
On 6/21/23 09:01, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
>
> In order to prevent request_queue to be freed before cleaning up
> blktrace debugfs entries, commit db59133e9279 ("scsi: sg: fix blktrace
> debugfs entries leakage") use scsi_device_get(), however,
> scsi_device_get() will also grab scsi module reference and scsi module
> can't be removed.
I just noticed that this patch has been posted on the linux-scsi mailing
list. If you plan to resend this patch, please send it to Jens and Cc
both linux-block and linux-scsi because this patch fixes a bug in a
patch that only exists in Jens' tree.
Thanks,
Bart.
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] scsi/sg: don't grab scsi host module reference
2023-06-21 16:01 [PATCH] scsi/sg: don't grab scsi host module reference Yu Kuai
2023-06-21 15:47 ` Bart Van Assche
2023-06-21 16:06 ` Bart Van Assche
@ 2023-06-23 5:44 ` Christoph Hellwig
2023-06-23 14:28 ` Jens Axboe
2023-07-04 17:04 ` Marc Hartmayer
4 siblings, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2023-06-23 5:44 UTC (permalink / raw)
To: Yu Kuai
Cc: hch, chaitanyak, shinichiro.kawasaki, dgilbert, jejb,
martin.petersen, yukuai3, axboe, linux-scsi, linux-kernel,
yi.zhang, yangerkun
Looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] scsi/sg: don't grab scsi host module reference
2023-06-21 16:01 [PATCH] scsi/sg: don't grab scsi host module reference Yu Kuai
` (2 preceding siblings ...)
2023-06-23 5:44 ` Christoph Hellwig
@ 2023-06-23 14:28 ` Jens Axboe
2023-07-04 17:04 ` Marc Hartmayer
4 siblings, 0 replies; 12+ messages in thread
From: Jens Axboe @ 2023-06-23 14:28 UTC (permalink / raw)
To: hch, chaitanyak, shinichiro.kawasaki, dgilbert, jejb,
martin.petersen, yukuai3, Yu Kuai
Cc: linux-scsi, linux-kernel, yi.zhang, yangerkun
On Thu, 22 Jun 2023 00:01:11 +0800, Yu Kuai wrote:
> In order to prevent request_queue to be freed before cleaning up
> blktrace debugfs entries, commit db59133e9279 ("scsi: sg: fix blktrace
> debugfs entries leakage") use scsi_device_get(), however,
> scsi_device_get() will also grab scsi module reference and scsi module
> can't be removed.
>
> It's reported that blktests can't unload scsi_debug after block/001:
>
> [...]
Applied, thanks!
[1/1] scsi/sg: don't grab scsi host module reference
commit: fcaa174a9c995cf0af3967e55644a1543ea07e36
Best regards,
--
Jens Axboe
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] scsi/sg: don't grab scsi host module reference
2023-06-21 16:01 [PATCH] scsi/sg: don't grab scsi host module reference Yu Kuai
` (3 preceding siblings ...)
2023-06-23 14:28 ` Jens Axboe
@ 2023-07-04 17:04 ` Marc Hartmayer
2023-07-04 18:51 ` Benjamin Block
2023-07-05 1:43 ` Yu Kuai
4 siblings, 2 replies; 12+ messages in thread
From: Marc Hartmayer @ 2023-07-04 17:04 UTC (permalink / raw)
To: Yu Kuai
Cc: linux-scsi, linux-kernel, yi.zhang, yangerkun, hch, chaitanyak,
shinichiro.kawasaki, dgilbert, jejb, martin.petersen, yukuai3,
axboe, Benjamin Block
On Thu, Jun 22, 2023 at 12:01 AM +0800, Yu Kuai <yukuai1@huaweicloud.com> wrote:
> From: Yu Kuai <yukuai3@huawei.com>
>
> In order to prevent request_queue to be freed before cleaning up
> blktrace debugfs entries, commit db59133e9279 ("scsi: sg: fix blktrace
> debugfs entries leakage") use scsi_device_get(), however,
> scsi_device_get() will also grab scsi module reference and scsi module
> can't be removed.
>
> It's reported that blktests can't unload scsi_debug after block/001:
>
> blktests (master) # ./check block
> block/001 (stress device hotplugging) [failed]
> +++ /root/blktests/results/nodev/block/001.out.bad 2023-06-19
> Running block/001
> Stressing sd
> +modprobe: FATAL: Module scsi_debug is in use.
>
> Fix this problem by grabbing request_queue reference directly, so that
> scsi host module can still be unloaded while request_queue will be
> pinged by sg device.
>
> Reported-by: Chaitanya Kulkarni <chaitanyak@nvidia.com>
> Link: https://lore.kernel.org/all/1760da91-876d-fc9c-ab51-999a6f66ad50@nvidia.com/
> Fixes: db59133e9279 ("scsi: sg: fix blktrace debugfs entries leakage")
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> ---
> drivers/scsi/sg.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
> index 2433eeef042a..dcb73787c29d 100644
> --- a/drivers/scsi/sg.c
> +++ b/drivers/scsi/sg.c
> @@ -1497,7 +1497,7 @@ sg_add_device(struct device *cl_dev)
> int error;
> unsigned long iflags;
>
> - error = scsi_device_get(scsidp);
> + error = blk_get_queue(scsidp->request_queue);
> if (error)
> return error;
>
> @@ -1558,7 +1558,7 @@ sg_add_device(struct device *cl_dev)
> out:
> if (cdev)
> cdev_del(cdev);
> - scsi_device_put(scsidp);
> + blk_put_queue(scsidp->request_queue);
> return error;
> }
>
> @@ -1575,7 +1575,7 @@ sg_device_destroy(struct kref *kref)
> */
>
> blk_trace_remove(q);
> - scsi_device_put(sdp->device);
> + blk_put_queue(q);
>
> write_lock_irqsave(&sg_index_lock, flags);
> idr_remove(&sg_index_idr, sdp->index);
> --
> 2.39.2
Hi,
This change (bisected) triggers a regression in our KVM on s390x CI. The
symptom is that a “scsi_debug device” does not bind to the scsi_generic
driver. On s390x you can reproduce the problem as follows (I have not
tested on x86):
With this patch applied:
$ sudo modprobe scsi_debug
$ # Get the 'scsi_host,channel,target_number,LUN' tuple for the scsi_debug device
$ lsscsi |grep scsi_debug |awk '{ print $1 }'
[0:0:0:0]
$ sudo stat /sys/bus/scsi/devices/0:0:0:0/scsi_generic
stat: cannot statx '/sys/bus/scsi/devices/0:0:0:0/scsi_generic': No such file or directory
Patch reverted:
$ sudo modprobe scsi_debug
$ lsscsi |grep scsi_debug |awk '{ print $1 }'
[0:0:0:0]
$ sudo stat /sys/bus/scsi/devices/0:0:0:0/scsi_generic
File: /sys/bus/scsi/devices/0:0:0:0/scsi_generic
Size: 0 Blocks: 0 IO Block: 4096 directory
Device: 0,20 Inode: 12155 Links: 3
…
Any ideas?
Marc
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] scsi/sg: don't grab scsi host module reference
2023-07-04 17:04 ` Marc Hartmayer
@ 2023-07-04 18:51 ` Benjamin Block
2023-07-05 2:16 ` Yu Kuai
2023-07-05 1:43 ` Yu Kuai
1 sibling, 1 reply; 12+ messages in thread
From: Benjamin Block @ 2023-07-04 18:51 UTC (permalink / raw)
To: Marc Hartmayer
Cc: Yu Kuai, linux-scsi, linux-kernel, yi.zhang, yangerkun, hch,
chaitanyak, shinichiro.kawasaki, dgilbert, jejb, martin.petersen,
yukuai3, axboe
On Tue, Jul 04, 2023 at 07:04:00PM +0200, Marc Hartmayer wrote:
> On Thu, Jun 22, 2023 at 12:01 AM +0800, Yu Kuai <yukuai1@huaweicloud.com> wrote:
> > From: Yu Kuai <yukuai3@huawei.com>
> > diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
> > index 2433eeef042a..dcb73787c29d 100644
> > --- a/drivers/scsi/sg.c
> > +++ b/drivers/scsi/sg.c
> > @@ -1497,7 +1497,7 @@ sg_add_device(struct device *cl_dev)
> > int error;
> > unsigned long iflags;
> >
> > - error = scsi_device_get(scsidp);
> > + error = blk_get_queue(scsidp->request_queue);
> > if (error)
> > return error;
> >
Might be interesting as well. Marc showed me a `dmesg` snipped earlier
from when the bind fails:
[ 15.441817] scsi host2: scsi_eh_2: sleeping
[ 15.441899] scsi_debug:sdebug_driver_probe: scsi_debug: trim poll_queues to 0. poll_q/nr_hw = (0/1)
[ 15.441907] scsi host2: scsi_debug: version 0191 [20210520]
dev_size_mb=8, opts=0x0, submit_queues=1, statistics=0
[ 15.442078] scsi host2: scsi_scan_host_selected: <4294967295:4294967295:18446744073709551615>
[ 15.442267] scsi 2:0:0:0: scsi scan: INQUIRY pass 1 length 36
[ 15.442286] scsi 2:0:0:0: scsi scan: INQUIRY successful with code 0x0
[ 15.442296] scsi 2:0:0:0: scsi scan: INQUIRY pass 2 length 96
[ 15.442308] scsi 2:0:0:0: scsi scan: INQUIRY successful with code 0x0
[ 15.442317] scsi 2:0:0:0: Direct-Access Linux scsi_debug 0191 PQ: 0 ANSI: 7
[ 15.442554] scsi 2:0:0:0: Power-on or device reset occurred
[ 15.442560] scsi 2:0:0:0: tag#50 Done: SUCCESS Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
[ 15.442565] scsi 2:0:0:0: tag#50 CDB: Report supported operation codes a3 0c 01 88 00 00 00 00 00 14 00 00
[ 15.442569] scsi 2:0:0:0: tag#50 Sense Key : Unit Attention [current]
[ 15.442573] scsi 2:0:0:0: tag#50 Add. Sense: Power on occurred
The bind should happend around here somewhere I think.
[ 15.472680] sd 2:0:0:0: scsi scan: Sending REPORT LUNS to (try 0)
[ 15.472703] sd 2:0:0:0: scsi scan: REPORT LUNS successful (try 0) result 0x0
[ 15.472706] sd 2:0:0:0: scsi scan: REPORT LUN scan
[ 15.472709] sd 2:0:0:0: scsi scan: device exists on 2:0:0:0
[ 15.492874] sd 2:0:0:0: [sdi] 16384 512-byte logical blocks: (8.39 MB/8.00 MiB)
[ 15.502853] sd 2:0:0:0: [sdi] Write Protect is off
[ 15.502856] sd 2:0:0:0: [sdi] Mode Sense: 73 00 10 08
[ 15.522819] sd 2:0:0:0: [sdi] Write cache: enabled, read cache: enabled, supports DPO and FUA
[ 15.552773] sd 2:0:0:0: [sdi] Preferred minimum I/O size 512 bytes
[ 15.552776] sd 2:0:0:0: [sdi] Optimal transfer size 524288 bytes
[ 15.575373] sd 2:0:0:0: [sdi] tag#62 Done: SUCCESS Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
[ 15.575377] sd 2:0:0:0: [sdi] tag#62 CDB: Inquiry 12 01 b9 00 04 00
[ 15.575380] sd 2:0:0:0: [sdi] tag#62 Sense Key : Illegal Request [current]
[ 15.575383] sd 2:0:0:0: [sdi] tag#62 Add. Sense: Invalid field in cdb
[ 15.645749] sd 2:0:0:0: [sdi] Attached SCSI disk
But we don't even see the `sg_alloc: dev=...` message that is logged in
`sg_alloc()`. And between the change above and the call to `sg_alloc()`,
there is only the character device allocation; and if that failed, it
would print an error. So either the bind is never even tried, or the new
`blk_get_queue()` fails to get a reference.
Which is odd, since the only way that would happen is, if the queue
was marked dying; but we see that the stack is using it for LUN probing
in `sd`.
> This change (bisected) triggers a regression in our KVM on s390x CI. The
> symptom is that a “scsi_debug device” does not bind to the scsi_generic
> driver. On s390x you can reproduce the problem as follows (I have not
> tested on x86):
>
> With this patch applied:
>
> $ sudo modprobe scsi_debug
One more thing maybe worth mentioning: in the kernel configuration we
use in the CI we have `sg` built-in. I guess most have it built as
module.
> $ # Get the 'scsi_host,channel,target_number,LUN' tuple for the scsi_debug device
> $ lsscsi |grep scsi_debug |awk '{ print $1 }'
> [0:0:0:0]
> $ sudo stat /sys/bus/scsi/devices/0:0:0:0/scsi_generic
> stat: cannot statx '/sys/bus/scsi/devices/0:0:0:0/scsi_generic': No such file or directory
>
>
> Patch reverted:
>
> $ sudo modprobe scsi_debug
> $ lsscsi |grep scsi_debug |awk '{ print $1 }'
> [0:0:0:0]
> $ sudo stat /sys/bus/scsi/devices/0:0:0:0/scsi_generic
> File: /sys/bus/scsi/devices/0:0:0:0/scsi_generic
> Size: 0 Blocks: 0 IO Block: 4096 directory
> Device: 0,20 Inode: 12155 Links: 3
> …
That's all I got from looking at it earlier, so far.
--
Best Regards, Benjamin Block / Linux on IBM Z Kernel Development
IBM Deutschland Research & Development GmbH / https://www.ibm.com/privacy
Vors. Aufs.-R.: Gregor Pillen / Geschäftsführung: David Faller
Sitz der Ges.: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] scsi/sg: don't grab scsi host module reference
2023-07-04 17:04 ` Marc Hartmayer
2023-07-04 18:51 ` Benjamin Block
@ 2023-07-05 1:43 ` Yu Kuai
2023-07-05 2:28 ` Yu Kuai
1 sibling, 1 reply; 12+ messages in thread
From: Yu Kuai @ 2023-07-05 1:43 UTC (permalink / raw)
To: Marc Hartmayer, Yu Kuai
Cc: linux-scsi, linux-kernel, yi.zhang, yangerkun, hch, chaitanyak,
shinichiro.kawasaki, dgilbert, jejb, martin.petersen, axboe,
Benjamin Block, yukuai (C)
Hi,
在 2023/07/05 1:04, Marc Hartmayer 写道:
> On Thu, Jun 22, 2023 at 12:01 AM +0800, Yu Kuai <yukuai1@huaweicloud.com> wrote:
>> From: Yu Kuai <yukuai3@huawei.com>
>>
>> In order to prevent request_queue to be freed before cleaning up
>> blktrace debugfs entries, commit db59133e9279 ("scsi: sg: fix blktrace
>> debugfs entries leakage") use scsi_device_get(), however,
>> scsi_device_get() will also grab scsi module reference and scsi module
>> can't be removed.
>>
>> It's reported that blktests can't unload scsi_debug after block/001:
>>
>> blktests (master) # ./check block
>> block/001 (stress device hotplugging) [failed]
>> +++ /root/blktests/results/nodev/block/001.out.bad 2023-06-19
>> Running block/001
>> Stressing sd
>> +modprobe: FATAL: Module scsi_debug is in use.
>>
>> Fix this problem by grabbing request_queue reference directly, so that
>> scsi host module can still be unloaded while request_queue will be
>> pinged by sg device.
>>
>> Reported-by: Chaitanya Kulkarni <chaitanyak@nvidia.com>
>> Link: https://lore.kernel.org/all/1760da91-876d-fc9c-ab51-999a6f66ad50@nvidia.com/
>> Fixes: db59133e9279 ("scsi: sg: fix blktrace debugfs entries leakage")
>> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
>> ---
>> drivers/scsi/sg.c | 6 +++---
>> 1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
>> index 2433eeef042a..dcb73787c29d 100644
>> --- a/drivers/scsi/sg.c
>> +++ b/drivers/scsi/sg.c
>> @@ -1497,7 +1497,7 @@ sg_add_device(struct device *cl_dev)
>> int error;
>> unsigned long iflags;
>>
>> - error = scsi_device_get(scsidp);
>> + error = blk_get_queue(scsidp->request_queue);
>> if (error)
>> return error;
>>
>> @@ -1558,7 +1558,7 @@ sg_add_device(struct device *cl_dev)
>> out:
>> if (cdev)
>> cdev_del(cdev);
>> - scsi_device_put(scsidp);
>> + blk_put_queue(scsidp->request_queue);
>> return error;
>> }
>>
>> @@ -1575,7 +1575,7 @@ sg_device_destroy(struct kref *kref)
>> */
>>
>> blk_trace_remove(q);
>> - scsi_device_put(sdp->device);
>> + blk_put_queue(q);
>>
>> write_lock_irqsave(&sg_index_lock, flags);
>> idr_remove(&sg_index_idr, sdp->index);
>> --
>> 2.39.2
>
> Hi,
>
> This change (bisected) triggers a regression in our KVM on s390x CI. The
> symptom is that a “scsi_debug device” does not bind to the scsi_generic
> driver. On s390x you can reproduce the problem as follows (I have not
> tested on x86):
>
> With this patch applied:
>
> $ sudo modprobe scsi_debug
> $ # Get the 'scsi_host,channel,target_number,LUN' tuple for the scsi_debug device
> $ lsscsi |grep scsi_debug |awk '{ print $1 }'
> [0:0:0:0]
> $ sudo stat /sys/bus/scsi/devices/0:0:0:0/scsi_generic
> stat: cannot statx '/sys/bus/scsi/devices/0:0:0:0/scsi_generic': No such file or directory
>
>
> Patch reverted:
>
I didn't figure out the root cause, howver, have you tried to reviert
this patch as well?
db59133e9279 ("scsi: sg: fix blktrace debugfs entries leakage"
Thanks,
Kuai
> $ sudo modprobe scsi_debug
> $ lsscsi |grep scsi_debug |awk '{ print $1 }'
> [0:0:0:0]
> $ sudo stat /sys/bus/scsi/devices/0:0:0:0/scsi_generic
> File: /sys/bus/scsi/devices/0:0:0:0/scsi_generic
> Size: 0 Blocks: 0 IO Block: 4096 directory
> Device: 0,20 Inode: 12155 Links: 3
> …
>
> Any ideas?
>
> Marc
> .
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] scsi/sg: don't grab scsi host module reference
2023-07-04 18:51 ` Benjamin Block
@ 2023-07-05 2:16 ` Yu Kuai
2023-07-05 2:26 ` Yu Kuai
0 siblings, 1 reply; 12+ messages in thread
From: Yu Kuai @ 2023-07-05 2:16 UTC (permalink / raw)
To: Benjamin Block, Marc Hartmayer
Cc: Yu Kuai, linux-scsi, linux-kernel, yi.zhang, yangerkun, hch,
chaitanyak, shinichiro.kawasaki, dgilbert, jejb, martin.petersen,
axboe, yukuai (C)
Hi,
在 2023/07/05 2:51, Benjamin Block 写道:
> On Tue, Jul 04, 2023 at 07:04:00PM +0200, Marc Hartmayer wrote:
>> On Thu, Jun 22, 2023 at 12:01 AM +0800, Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>> From: Yu Kuai <yukuai3@huawei.com>
>>> diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
>>> index 2433eeef042a..dcb73787c29d 100644
>>> --- a/drivers/scsi/sg.c
>>> +++ b/drivers/scsi/sg.c
>>> @@ -1497,7 +1497,7 @@ sg_add_device(struct device *cl_dev)
>>> int error;
>>> unsigned long iflags;
>>>
>>> - error = scsi_device_get(scsidp);
>>> + error = blk_get_queue(scsidp->request_queue);
>>> if (error)
>>> return error;
>>>
>
> Might be interesting as well. Marc showed me a `dmesg` snipped earlier
> from when the bind fails:
>
> [ 15.441817] scsi host2: scsi_eh_2: sleeping
> [ 15.441899] scsi_debug:sdebug_driver_probe: scsi_debug: trim poll_queues to 0. poll_q/nr_hw = (0/1)
> [ 15.441907] scsi host2: scsi_debug: version 0191 [20210520]
> dev_size_mb=8, opts=0x0, submit_queues=1, statistics=0
> [ 15.442078] scsi host2: scsi_scan_host_selected: <4294967295:4294967295:18446744073709551615>
> [ 15.442267] scsi 2:0:0:0: scsi scan: INQUIRY pass 1 length 36
> [ 15.442286] scsi 2:0:0:0: scsi scan: INQUIRY successful with code 0x0
> [ 15.442296] scsi 2:0:0:0: scsi scan: INQUIRY pass 2 length 96
> [ 15.442308] scsi 2:0:0:0: scsi scan: INQUIRY successful with code 0x0
> [ 15.442317] scsi 2:0:0:0: Direct-Access Linux scsi_debug 0191 PQ: 0 ANSI: 7
> [ 15.442554] scsi 2:0:0:0: Power-on or device reset occurred
> [ 15.442560] scsi 2:0:0:0: tag#50 Done: SUCCESS Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
> [ 15.442565] scsi 2:0:0:0: tag#50 CDB: Report supported operation codes a3 0c 01 88 00 00 00 00 00 14 00 00
> [ 15.442569] scsi 2:0:0:0: tag#50 Sense Key : Unit Attention [current]
> [ 15.442573] scsi 2:0:0:0: tag#50 Add. Sense: Power on occurred
>
> The bind should happend around here somewhere I think.
>
> [ 15.472680] sd 2:0:0:0: scsi scan: Sending REPORT LUNS to (try 0)
> [ 15.472703] sd 2:0:0:0: scsi scan: REPORT LUNS successful (try 0) result 0x0
> [ 15.472706] sd 2:0:0:0: scsi scan: REPORT LUN scan
> [ 15.472709] sd 2:0:0:0: scsi scan: device exists on 2:0:0:0
> [ 15.492874] sd 2:0:0:0: [sdi] 16384 512-byte logical blocks: (8.39 MB/8.00 MiB)
> [ 15.502853] sd 2:0:0:0: [sdi] Write Protect is off
> [ 15.502856] sd 2:0:0:0: [sdi] Mode Sense: 73 00 10 08
> [ 15.522819] sd 2:0:0:0: [sdi] Write cache: enabled, read cache: enabled, supports DPO and FUA
> [ 15.552773] sd 2:0:0:0: [sdi] Preferred minimum I/O size 512 bytes
> [ 15.552776] sd 2:0:0:0: [sdi] Optimal transfer size 524288 bytes
> [ 15.575373] sd 2:0:0:0: [sdi] tag#62 Done: SUCCESS Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
> [ 15.575377] sd 2:0:0:0: [sdi] tag#62 CDB: Inquiry 12 01 b9 00 04 00
> [ 15.575380] sd 2:0:0:0: [sdi] tag#62 Sense Key : Illegal Request [current]
> [ 15.575383] sd 2:0:0:0: [sdi] tag#62 Add. Sense: Invalid field in cdb
> [ 15.645749] sd 2:0:0:0: [sdi] Attached SCSI disk
>
> But we don't even see the `sg_alloc: dev=...` message that is logged in
> `sg_alloc()`. And between the change above and the call to `sg_alloc()`,
> there is only the character device allocation; and if that failed, it
> would print an error. So either the bind is never even tried, or the new
> `blk_get_queue()` fails to get a reference.
> Which is odd, since the only way that would happen is, if the queue
> was marked dying; but we see that the stack is using it for LUN probing
> in `sd`.
Yes, if scsi_device_get() works fine, but blk_get_queue() has problems,
it seems to me that sg_add_device() can be called with scsi_device queue
mark dying? This is odd, but I'm not sure if it's the case.
Thanks,
Kuai
>
>> This change (bisected) triggers a regression in our KVM on s390x CI. The
>> symptom is that a “scsi_debug device” does not bind to the scsi_generic
>> driver. On s390x you can reproduce the problem as follows (I have not
>> tested on x86):
>>
>> With this patch applied:
>>
>> $ sudo modprobe scsi_debug
>
> One more thing maybe worth mentioning: in the kernel configuration we
> use in the CI we have `sg` built-in. I guess most have it built as
> module.
>
>> $ # Get the 'scsi_host,channel,target_number,LUN' tuple for the scsi_debug device
>> $ lsscsi |grep scsi_debug |awk '{ print $1 }'
>> [0:0:0:0]
>> $ sudo stat /sys/bus/scsi/devices/0:0:0:0/scsi_generic
>> stat: cannot statx '/sys/bus/scsi/devices/0:0:0:0/scsi_generic': No such file or directory
>>
>>
>> Patch reverted:
>>
>> $ sudo modprobe scsi_debug
>> $ lsscsi |grep scsi_debug |awk '{ print $1 }'
>> [0:0:0:0]
>> $ sudo stat /sys/bus/scsi/devices/0:0:0:0/scsi_generic
>> File: /sys/bus/scsi/devices/0:0:0:0/scsi_generic
>> Size: 0 Blocks: 0 IO Block: 4096 directory
>> Device: 0,20 Inode: 12155 Links: 3
>> …
>
> That's all I got from looking at it earlier, so far.
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] scsi/sg: don't grab scsi host module reference
2023-07-05 2:16 ` Yu Kuai
@ 2023-07-05 2:26 ` Yu Kuai
0 siblings, 0 replies; 12+ messages in thread
From: Yu Kuai @ 2023-07-05 2:26 UTC (permalink / raw)
To: Yu Kuai, Benjamin Block, Marc Hartmayer
Cc: linux-scsi, linux-kernel, yi.zhang, yangerkun, hch, chaitanyak,
shinichiro.kawasaki, dgilbert, jejb, martin.petersen, axboe,
yukuai (C)
Hi,
在 2023/07/05 10:16, Yu Kuai 写道:
> Hi,
>
> 在 2023/07/05 2:51, Benjamin Block 写道:
>> On Tue, Jul 04, 2023 at 07:04:00PM +0200, Marc Hartmayer wrote:
>>> On Thu, Jun 22, 2023 at 12:01 AM +0800, Yu Kuai
>>> <yukuai1@huaweicloud.com> wrote:
>>>> From: Yu Kuai <yukuai3@huawei.com>
>>>> diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
>>>> index 2433eeef042a..dcb73787c29d 100644
>>>> --- a/drivers/scsi/sg.c
>>>> +++ b/drivers/scsi/sg.c
>>>> @@ -1497,7 +1497,7 @@ sg_add_device(struct device *cl_dev)
>>>> int error;
>>>> unsigned long iflags;
>>>> - error = scsi_device_get(scsidp);
>>>> + error = blk_get_queue(scsidp->request_queue);
>>>> if (error)
>>>> return error;
>>
>> Might be interesting as well. Marc showed me a `dmesg` snipped earlier
>> from when the bind fails:
>>
>> [ 15.441817] scsi host2: scsi_eh_2: sleeping
>> [ 15.441899] scsi_debug:sdebug_driver_probe: scsi_debug: trim
>> poll_queues to 0. poll_q/nr_hw = (0/1)
>> [ 15.441907] scsi host2: scsi_debug: version 0191 [20210520]
>> dev_size_mb=8, opts=0x0, submit_queues=1,
>> statistics=0
>> [ 15.442078] scsi host2: scsi_scan_host_selected:
>> <4294967295:4294967295:18446744073709551615>
>> [ 15.442267] scsi 2:0:0:0: scsi scan: INQUIRY pass 1 length 36
>> [ 15.442286] scsi 2:0:0:0: scsi scan: INQUIRY successful with
>> code 0x0
>> [ 15.442296] scsi 2:0:0:0: scsi scan: INQUIRY pass 2 length 96
>> [ 15.442308] scsi 2:0:0:0: scsi scan: INQUIRY successful with
>> code 0x0
>> [ 15.442317] scsi 2:0:0:0: Direct-Access Linux
>> scsi_debug 0191 PQ: 0 ANSI: 7
>> [ 15.442554] scsi 2:0:0:0: Power-on or device reset occurred
>> [ 15.442560] scsi 2:0:0:0: tag#50 Done: SUCCESS Result:
>> hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
>> [ 15.442565] scsi 2:0:0:0: tag#50 CDB: Report supported operation
>> codes a3 0c 01 88 00 00 00 00 00 14 00 00
>> [ 15.442569] scsi 2:0:0:0: tag#50 Sense Key : Unit Attention
>> [current]
>> [ 15.442573] scsi 2:0:0:0: tag#50 Add. Sense: Power on occurred
>>
>> The bind should happend around here somewhere I think.
>>
>> [ 15.472680] sd 2:0:0:0: scsi scan: Sending REPORT LUNS to (try 0)
>> [ 15.472703] sd 2:0:0:0: scsi scan: REPORT LUNS successful (try
>> 0) result 0x0
>> [ 15.472706] sd 2:0:0:0: scsi scan: REPORT LUN scan
>> [ 15.472709] sd 2:0:0:0: scsi scan: device exists on 2:0:0:0
>> [ 15.492874] sd 2:0:0:0: [sdi] 16384 512-byte logical blocks:
>> (8.39 MB/8.00 MiB)
>> [ 15.502853] sd 2:0:0:0: [sdi] Write Protect is off
>> [ 15.502856] sd 2:0:0:0: [sdi] Mode Sense: 73 00 10 08
>> [ 15.522819] sd 2:0:0:0: [sdi] Write cache: enabled, read cache:
>> enabled, supports DPO and FUA
>> [ 15.552773] sd 2:0:0:0: [sdi] Preferred minimum I/O size 512 bytes
>> [ 15.552776] sd 2:0:0:0: [sdi] Optimal transfer size 524288 bytes
>> [ 15.575373] sd 2:0:0:0: [sdi] tag#62 Done: SUCCESS Result:
>> hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
>> [ 15.575377] sd 2:0:0:0: [sdi] tag#62 CDB: Inquiry 12 01 b9 00 04 00
>> [ 15.575380] sd 2:0:0:0: [sdi] tag#62 Sense Key : Illegal Request
>> [current]
>> [ 15.575383] sd 2:0:0:0: [sdi] tag#62 Add. Sense: Invalid field
>> in cdb
>> [ 15.645749] sd 2:0:0:0: [sdi] Attached SCSI disk
>>
>> But we don't even see the `sg_alloc: dev=...` message that is logged in
>> `sg_alloc()`. And between the change above and the call to `sg_alloc()`,
>> there is only the character device allocation; and if that failed, it
>> would print an error. So either the bind is never even tried, or the new
>> `blk_get_queue()` fails to get a reference.
>> Which is odd, since the only way that would happen is, if the queue
>> was marked dying; but we see that the stack is using it for LUN probing
>> in `sd`.
>
> Yes, if scsi_device_get() works fine, but blk_get_queue() has problems,
> it seems to me that sg_add_device() can be called with scsi_device queue
> mark dying? This is odd, but I'm not sure if it's the case.
Sorry that I totally messed how blk_get_queue() is called, it returns
true on success and false on failure, not 0 on success and errno on
failure.
Sorry for all the troble.
Kuai
>
> Thanks,
> Kuai
>>
>>> This change (bisected) triggers a regression in our KVM on s390x CI. The
>>> symptom is that a “scsi_debug device” does not bind to the scsi_generic
>>> driver. On s390x you can reproduce the problem as follows (I have not
>>> tested on x86):
>>>
>>> With this patch applied:
>>>
>>> $ sudo modprobe scsi_debug
>>
>> One more thing maybe worth mentioning: in the kernel configuration we
>> use in the CI we have `sg` built-in. I guess most have it built as
>> module.
>>
>>> $ # Get the 'scsi_host,channel,target_number,LUN' tuple for the
>>> scsi_debug device
>>> $ lsscsi |grep scsi_debug |awk '{ print $1 }'
>>> [0:0:0:0]
>>> $ sudo stat /sys/bus/scsi/devices/0:0:0:0/scsi_generic
>>> stat: cannot statx '/sys/bus/scsi/devices/0:0:0:0/scsi_generic': No
>>> such file or directory
>>>
>>>
>>> Patch reverted:
>>>
>>> $ sudo modprobe scsi_debug
>>> $ lsscsi |grep scsi_debug |awk '{ print $1 }'
>>> [0:0:0:0]
>>> $ sudo stat /sys/bus/scsi/devices/0:0:0:0/scsi_generic
>>> File: /sys/bus/scsi/devices/0:0:0:0/scsi_generic
>>> Size: 0 Blocks: 0 IO Block: 4096 directory
>>> Device: 0,20 Inode: 12155 Links: 3
>>> …
>>
>> That's all I got from looking at it earlier, so far.
>>
>
> .
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] scsi/sg: don't grab scsi host module reference
2023-07-05 1:43 ` Yu Kuai
@ 2023-07-05 2:28 ` Yu Kuai
2023-07-05 6:13 ` Marc Hartmayer
0 siblings, 1 reply; 12+ messages in thread
From: Yu Kuai @ 2023-07-05 2:28 UTC (permalink / raw)
To: Yu Kuai, Marc Hartmayer
Cc: linux-scsi, linux-kernel, yi.zhang, yangerkun, hch, chaitanyak,
shinichiro.kawasaki, dgilbert, jejb, martin.petersen, axboe,
Benjamin Block, yukuai (C)
Hi,
在 2023/07/05 9:43, Yu Kuai 写道:
> Hi,
>
> 在 2023/07/05 1:04, Marc Hartmayer 写道:
>> On Thu, Jun 22, 2023 at 12:01 AM +0800, Yu Kuai
>> <yukuai1@huaweicloud.com> wrote:
>>> From: Yu Kuai <yukuai3@huawei.com>
>>>
>>> In order to prevent request_queue to be freed before cleaning up
>>> blktrace debugfs entries, commit db59133e9279 ("scsi: sg: fix blktrace
>>> debugfs entries leakage") use scsi_device_get(), however,
>>> scsi_device_get() will also grab scsi module reference and scsi module
>>> can't be removed.
>>>
>>> It's reported that blktests can't unload scsi_debug after block/001:
>>>
>>> blktests (master) # ./check block
>>> block/001 (stress device hotplugging) [failed]
>>> +++ /root/blktests/results/nodev/block/001.out.bad 2023-06-19
>>> Running block/001
>>> Stressing sd
>>> +modprobe: FATAL: Module scsi_debug is in use.
>>>
>>> Fix this problem by grabbing request_queue reference directly, so that
>>> scsi host module can still be unloaded while request_queue will be
>>> pinged by sg device.
>>>
>>> Reported-by: Chaitanya Kulkarni <chaitanyak@nvidia.com>
>>> Link:
>>> https://lore.kernel.org/all/1760da91-876d-fc9c-ab51-999a6f66ad50@nvidia.com/
>>>
>>> Fixes: db59133e9279 ("scsi: sg: fix blktrace debugfs entries leakage")
>>> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
>>> ---
>>> drivers/scsi/sg.c | 6 +++---
>>> 1 file changed, 3 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
>>> index 2433eeef042a..dcb73787c29d 100644
>>> --- a/drivers/scsi/sg.c
>>> +++ b/drivers/scsi/sg.c
>>> @@ -1497,7 +1497,7 @@ sg_add_device(struct device *cl_dev)
>>> int error;
>>> unsigned long iflags;
>>> - error = scsi_device_get(scsidp);
>>> + error = blk_get_queue(scsidp->request_queue);
>>> if (error)
>>> return error;
>>> @@ -1558,7 +1558,7 @@ sg_add_device(struct device *cl_dev)
>>> out:
>>> if (cdev)
>>> cdev_del(cdev);
>>> - scsi_device_put(scsidp);
>>> + blk_put_queue(scsidp->request_queue);
>>> return error;
>>> }
>>> @@ -1575,7 +1575,7 @@ sg_device_destroy(struct kref *kref)
>>> */
>>> blk_trace_remove(q);
>>> - scsi_device_put(sdp->device);
>>> + blk_put_queue(q);
>>> write_lock_irqsave(&sg_index_lock, flags);
>>> idr_remove(&sg_index_idr, sdp->index);
>>> --
>>> 2.39.2
>>
>> Hi,
>>
>> This change (bisected) triggers a regression in our KVM on s390x CI. The
>> symptom is that a “scsi_debug device” does not bind to the scsi_generic
>> driver. On s390x you can reproduce the problem as follows (I have not
>> tested on x86):
>>
>> With this patch applied:
>>
>> $ sudo modprobe scsi_debug
>> $ # Get the 'scsi_host,channel,target_number,LUN' tuple for the
>> scsi_debug device
>> $ lsscsi |grep scsi_debug |awk '{ print $1 }'
>> [0:0:0:0]
>> $ sudo stat /sys/bus/scsi/devices/0:0:0:0/scsi_generic
>> stat: cannot statx '/sys/bus/scsi/devices/0:0:0:0/scsi_generic': No
>> such file or directory
>>
>>
>> Patch reverted:
>>
>
> I didn't figure out the root cause, howver, have you tried to reviert
> this patch as well?
>
> db59133e9279 ("scsi: sg: fix blktrace debugfs entries leakage"
Never mind this, root cause is that the checking of return value of
blk_get_queue() is wrong.
This shoud be fixed by following patch:
diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 89fa046c7158..0d8afffd1683 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -1497,9 +1497,10 @@ sg_add_device(struct device *cl_dev)
int error;
unsigned long iflags;
- error = blk_get_queue(scsidp->request_queue);
- if (error)
- return error;
+ if (!blk_get_queue(scsidp->request_queue)) {
+ pr_warn("%s: get scsi_device queue failed\n", __func__);
+ return -ENODEV;
+ }
error = -ENOMEM;
cdev = cdev_alloc();
>
> Thanks,
> Kuai
>> $ sudo modprobe scsi_debug
>> $ lsscsi |grep scsi_debug |awk '{ print $1 }'
>> [0:0:0:0]
>> $ sudo stat /sys/bus/scsi/devices/0:0:0:0/scsi_generic
>> File: /sys/bus/scsi/devices/0:0:0:0/scsi_generic
>> Size: 0 Blocks: 0 IO Block: 4096 directory
>> Device: 0,20 Inode: 12155 Links: 3
>> …
>>
>> Any ideas?
>>
>> Marc
>> .
>>
>
> .
>
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH] scsi/sg: don't grab scsi host module reference
2023-07-05 2:28 ` Yu Kuai
@ 2023-07-05 6:13 ` Marc Hartmayer
0 siblings, 0 replies; 12+ messages in thread
From: Marc Hartmayer @ 2023-07-05 6:13 UTC (permalink / raw)
To: Yu Kuai
Cc: linux-scsi, linux-kernel, yi.zhang, yangerkun, hch, chaitanyak,
shinichiro.kawasaki, dgilbert, jejb, martin.petersen, axboe,
Benjamin Block, yukuai (C)
On Wed, Jul 05, 2023 at 10:28 AM +0800, Yu Kuai <yukuai1@huaweicloud.com> wrote:
> Hi,
>
> 在 2023/07/05 9:43, Yu Kuai 写道:
>> Hi,
>>
>> 在 2023/07/05 1:04, Marc Hartmayer 写道:
>>> On Thu, Jun 22, 2023 at 12:01 AM +0800, Yu Kuai
>>> <yukuai1@huaweicloud.com> wrote:
>>>> From: Yu Kuai <yukuai3@huawei.com>
>>>>
>>>> In order to prevent request_queue to be freed before cleaning up
>>>> blktrace debugfs entries, commit db59133e9279 ("scsi: sg: fix blktrace
>>>> debugfs entries leakage") use scsi_device_get(), however,
>>>> scsi_device_get() will also grab scsi module reference and scsi module
>>>> can't be removed.
>>>>
>>>> It's reported that blktests can't unload scsi_debug after block/001:
>>>>
>>>> blktests (master) # ./check block
>>>> block/001 (stress device hotplugging) [failed]
>>>> +++ /root/blktests/results/nodev/block/001.out.bad 2023-06-19
>>>> Running block/001
>>>> Stressing sd
>>>> +modprobe: FATAL: Module scsi_debug is in use.
>>>>
>>>> Fix this problem by grabbing request_queue reference directly, so that
>>>> scsi host module can still be unloaded while request_queue will be
>>>> pinged by sg device.
>>>>
>>>> Reported-by: Chaitanya Kulkarni <chaitanyak@nvidia.com>
>>>> Link:
>>>> https://lore.kernel.org/all/1760da91-876d-fc9c-ab51-999a6f66ad50@nvidia.com/
>>>>
>>>> Fixes: db59133e9279 ("scsi: sg: fix blktrace debugfs entries leakage")
>>>> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
>>>> ---
>>>> drivers/scsi/sg.c | 6 +++---
>>>> 1 file changed, 3 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
>>>> index 2433eeef042a..dcb73787c29d 100644
>>>> --- a/drivers/scsi/sg.c
>>>> +++ b/drivers/scsi/sg.c
>>>> @@ -1497,7 +1497,7 @@ sg_add_device(struct device *cl_dev)
>>>> int error;
>>>> unsigned long iflags;
>>>> - error = scsi_device_get(scsidp);
>>>> + error = blk_get_queue(scsidp->request_queue);
>>>> if (error)
>>>> return error;
>>>> @@ -1558,7 +1558,7 @@ sg_add_device(struct device *cl_dev)
>>>> out:
>>>> if (cdev)
>>>> cdev_del(cdev);
>>>> - scsi_device_put(scsidp);
>>>> + blk_put_queue(scsidp->request_queue);
>>>> return error;
>>>> }
>>>> @@ -1575,7 +1575,7 @@ sg_device_destroy(struct kref *kref)
>>>> */
>>>> blk_trace_remove(q);
>>>> - scsi_device_put(sdp->device);
>>>> + blk_put_queue(q);
>>>> write_lock_irqsave(&sg_index_lock, flags);
>>>> idr_remove(&sg_index_idr, sdp->index);
>>>> --
>>>> 2.39.2
>>>
>>> Hi,
>>>
>>> This change (bisected) triggers a regression in our KVM on s390x CI. The
>>> symptom is that a “scsi_debug device” does not bind to the scsi_generic
>>> driver. On s390x you can reproduce the problem as follows (I have not
>>> tested on x86):
>>>
>>> With this patch applied:
>>>
>>> $ sudo modprobe scsi_debug
>>> $ # Get the 'scsi_host,channel,target_number,LUN' tuple for the
>>> scsi_debug device
>>> $ lsscsi |grep scsi_debug |awk '{ print $1 }'
>>> [0:0:0:0]
>>> $ sudo stat /sys/bus/scsi/devices/0:0:0:0/scsi_generic
>>> stat: cannot statx '/sys/bus/scsi/devices/0:0:0:0/scsi_generic': No
>>> such file or directory
>>>
>>>
>>> Patch reverted:
>>>
>>
>> I didn't figure out the root cause, howver, have you tried to reviert
>> this patch as well?
>>
>> db59133e9279 ("scsi: sg: fix blktrace debugfs entries leakage"
>
> Never mind this, root cause is that the checking of return value of
> blk_get_queue() is wrong.
>
> This shoud be fixed by following patch:
>
> diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
> index 89fa046c7158..0d8afffd1683 100644
> --- a/drivers/scsi/sg.c
> +++ b/drivers/scsi/sg.c
> @@ -1497,9 +1497,10 @@ sg_add_device(struct device *cl_dev)
> int error;
> unsigned long iflags;
>
> - error = blk_get_queue(scsidp->request_queue);
> - if (error)
> - return error;
> + if (!blk_get_queue(scsidp->request_queue)) {
> + pr_warn("%s: get scsi_device queue failed\n", __func__);
> + return -ENODEV;
> + }
Hi Kuai,
I just tried your fix and it works - thanks.
Tested-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Marc
[…snip]
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2023-07-05 6:14 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-06-21 16:01 [PATCH] scsi/sg: don't grab scsi host module reference Yu Kuai
2023-06-21 15:47 ` Bart Van Assche
2023-06-21 16:06 ` Bart Van Assche
2023-06-23 5:44 ` Christoph Hellwig
2023-06-23 14:28 ` Jens Axboe
2023-07-04 17:04 ` Marc Hartmayer
2023-07-04 18:51 ` Benjamin Block
2023-07-05 2:16 ` Yu Kuai
2023-07-05 2:26 ` Yu Kuai
2023-07-05 1:43 ` Yu Kuai
2023-07-05 2:28 ` Yu Kuai
2023-07-05 6:13 ` Marc Hartmayer
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).