* [PATCH v2 1/2] scsi: target/tcm_loop: ignore already deleted scsi device
@ 2019-08-20 9:04 ` Naohiro Aota
0 siblings, 0 replies; 14+ messages in thread
From: Naohiro Aota @ 2019-08-20 9:04 UTC (permalink / raw)
To: linux-scsi, target-devel
Cc: Mike Christie, Nicholas Bellinger, Martin K . Petersen, Naohiro Aota
If there is no corresponding scsi_device for a LUN,
tcm_loop_port_unlink() complains that it "Unable to locate struct
scsi_device for " the device and keep %tl_tpg_port_count as is. However,
such situation is legal when we delete a SCSI device using
/sys/class/scsi_device/${lun}/device/delete. We can safely ignore the
missing SCSI device case here.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
drivers/target/loopback/tcm_loop.c | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/drivers/target/loopback/tcm_loop.c b/drivers/target/loopback/tcm_loop.c
index 3305b47fdf53..0942f3bd7eec 100644
--- a/drivers/target/loopback/tcm_loop.c
+++ b/drivers/target/loopback/tcm_loop.c
@@ -654,16 +654,16 @@ static void tcm_loop_port_unlink(
sd = scsi_device_lookup(tl_hba->sh, 0, tl_tpg->tl_tpgt,
se_lun->unpacked_lun);
- if (!sd) {
- pr_err("Unable to locate struct scsi_device for %d:%d:%llu\n",
- 0, tl_tpg->tl_tpgt, se_lun->unpacked_lun);
- return;
+ if (sd) {
+ /*
+ * Remove Linux/SCSI struct scsi_device by HCTL
+ */
+ scsi_remove_device(sd);
+ scsi_device_put(sd);
+ } else {
+ pr_debug("Unable to locate struct scsi_device for %d:%d:%llu\n",
+ 0, tl_tpg->tl_tpgt, se_lun->unpacked_lun);
}
- /*
- * Remove Linux/SCSI struct scsi_device by HCTL
- */
- scsi_remove_device(sd);
- scsi_device_put(sd);
atomic_dec_mb(&tl_tpg->tl_tpg_port_count);
--
2.23.0
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH v2 1/2] scsi: target/tcm_loop: ignore already deleted scsi device
@ 2019-08-20 9:04 ` Naohiro Aota
0 siblings, 0 replies; 14+ messages in thread
From: Naohiro Aota @ 2019-08-20 9:04 UTC (permalink / raw)
To: linux-scsi, target-devel
Cc: Mike Christie, Nicholas Bellinger, Martin K . Petersen, Naohiro Aota
If there is no corresponding scsi_device for a LUN,
tcm_loop_port_unlink() complains that it "Unable to locate struct
scsi_device for " the device and keep %tl_tpg_port_count as is. However,
such situation is legal when we delete a SCSI device using
/sys/class/scsi_device/${lun}/device/delete. We can safely ignore the
missing SCSI device case here.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
drivers/target/loopback/tcm_loop.c | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/drivers/target/loopback/tcm_loop.c b/drivers/target/loopback/tcm_loop.c
index 3305b47fdf53..0942f3bd7eec 100644
--- a/drivers/target/loopback/tcm_loop.c
+++ b/drivers/target/loopback/tcm_loop.c
@@ -654,16 +654,16 @@ static void tcm_loop_port_unlink(
sd = scsi_device_lookup(tl_hba->sh, 0, tl_tpg->tl_tpgt,
se_lun->unpacked_lun);
- if (!sd) {
- pr_err("Unable to locate struct scsi_device for %d:%d:%llu\n",
- 0, tl_tpg->tl_tpgt, se_lun->unpacked_lun);
- return;
+ if (sd) {
+ /*
+ * Remove Linux/SCSI struct scsi_device by HCTL
+ */
+ scsi_remove_device(sd);
+ scsi_device_put(sd);
+ } else {
+ pr_debug("Unable to locate struct scsi_device for %d:%d:%llu\n",
+ 0, tl_tpg->tl_tpgt, se_lun->unpacked_lun);
}
- /*
- * Remove Linux/SCSI struct scsi_device by HCTL
- */
- scsi_remove_device(sd);
- scsi_device_put(sd);
atomic_dec_mb(&tl_tpg->tl_tpg_port_count);
--
2.23.0
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH v2 2/2] scsi: target/tcm_loop: update upper limit of LUN
2019-08-20 9:04 ` Naohiro Aota
@ 2019-08-20 9:04 ` Naohiro Aota
-1 siblings, 0 replies; 14+ messages in thread
From: Naohiro Aota @ 2019-08-20 9:04 UTC (permalink / raw)
To: linux-scsi, target-devel
Cc: Mike Christie, Nicholas Bellinger, Martin K . Petersen, Naohiro Aota
targetcli-fb (or its library: rtslib-fb) allows us to create LUN up to
65535. On the other hand, the kernel driver is limiting max_lun to 0.
This limitation causes actual problem when you delete a loopback device
(using /sys/class/scsi_device/${lun}/device/delete) and rescan it (using
/sys/class/scsi_host/host${h}/scan). You can delete the device, but
cannot rescan it because its LUN is larger than max_lun and so the scan
request results in -EINVAL error in scsi_scan_host_selected().
This commit fix the upper limit to a max possible value: U64_MAX.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
drivers/target/loopback/tcm_loop.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/target/loopback/tcm_loop.c b/drivers/target/loopback/tcm_loop.c
index 0942f3bd7eec..50e93b487375 100644
--- a/drivers/target/loopback/tcm_loop.c
+++ b/drivers/target/loopback/tcm_loop.c
@@ -336,10 +336,10 @@ static int tcm_loop_driver_probe(struct device *dev)
*/
*((struct tcm_loop_hba **)sh->hostdata) = tl_hba;
/*
- * Setup single ID, Channel and LUN for now..
+ * Setup single ID, and Channel for now..
*/
sh->max_id = 2;
- sh->max_lun = 0;
+ sh->max_lun = U64_MAX;
sh->max_channel = 0;
sh->max_cmd_len = SCSI_MAX_VARLEN_CDB_SIZE;
--
2.23.0
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH v2 2/2] scsi: target/tcm_loop: update upper limit of LUN
@ 2019-08-20 9:04 ` Naohiro Aota
0 siblings, 0 replies; 14+ messages in thread
From: Naohiro Aota @ 2019-08-20 9:04 UTC (permalink / raw)
To: linux-scsi, target-devel
Cc: Mike Christie, Nicholas Bellinger, Martin K . Petersen, Naohiro Aota
targetcli-fb (or its library: rtslib-fb) allows us to create LUN up to
65535. On the other hand, the kernel driver is limiting max_lun to 0.
This limitation causes actual problem when you delete a loopback device
(using /sys/class/scsi_device/${lun}/device/delete) and rescan it (using
/sys/class/scsi_host/host${h}/scan). You can delete the device, but
cannot rescan it because its LUN is larger than max_lun and so the scan
request results in -EINVAL error in scsi_scan_host_selected().
This commit fix the upper limit to a max possible value: U64_MAX.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
drivers/target/loopback/tcm_loop.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/target/loopback/tcm_loop.c b/drivers/target/loopback/tcm_loop.c
index 0942f3bd7eec..50e93b487375 100644
--- a/drivers/target/loopback/tcm_loop.c
+++ b/drivers/target/loopback/tcm_loop.c
@@ -336,10 +336,10 @@ static int tcm_loop_driver_probe(struct device *dev)
*/
*((struct tcm_loop_hba **)sh->hostdata) = tl_hba;
/*
- * Setup single ID, Channel and LUN for now..
+ * Setup single ID, and Channel for now..
*/
sh->max_id = 2;
- sh->max_lun = 0;
+ sh->max_lun = U64_MAX;
sh->max_channel = 0;
sh->max_cmd_len = SCSI_MAX_VARLEN_CDB_SIZE;
--
2.23.0
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH v2 1/2] scsi: target/tcm_loop: ignore already deleted scsi device
2019-08-20 9:04 ` Naohiro Aota
@ 2019-08-20 14:02 ` Bart Van Assche
-1 siblings, 0 replies; 14+ messages in thread
From: Bart Van Assche @ 2019-08-20 14:02 UTC (permalink / raw)
To: Naohiro Aota, linux-scsi, target-devel
Cc: Mike Christie, Nicholas Bellinger, Martin K . Petersen
On 8/20/19 2:04 AM, Naohiro Aota wrote:
> If there is no corresponding scsi_device for a LUN,
> tcm_loop_port_unlink() complains that it "Unable to locate struct
> scsi_device for " the device and keep %tl_tpg_port_count as is. However,
> such situation is legal when we delete a SCSI device using
> /sys/class/scsi_device/${lun}/device/delete. We can safely ignore the
> missing SCSI device case here.
>
> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> ---
> drivers/target/loopback/tcm_loop.c | 18 +++++++++---------
> 1 file changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/target/loopback/tcm_loop.c b/drivers/target/loopback/tcm_loop.c
> index 3305b47fdf53..0942f3bd7eec 100644
> --- a/drivers/target/loopback/tcm_loop.c
> +++ b/drivers/target/loopback/tcm_loop.c
> @@ -654,16 +654,16 @@ static void tcm_loop_port_unlink(
>
> sd = scsi_device_lookup(tl_hba->sh, 0, tl_tpg->tl_tpgt,
> se_lun->unpacked_lun);
> - if (!sd) {
> - pr_err("Unable to locate struct scsi_device for %d:%d:%llu\n",
> - 0, tl_tpg->tl_tpgt, se_lun->unpacked_lun);
> - return;
> + if (sd) {
> + /*
> + * Remove Linux/SCSI struct scsi_device by HCTL
> + */
> + scsi_remove_device(sd);
> + scsi_device_put(sd);
> + } else {
> + pr_debug("Unable to locate struct scsi_device for %d:%d:%llu\n",
> + 0, tl_tpg->tl_tpgt, se_lun->unpacked_lun);
> }
> - /*
> - * Remove Linux/SCSI struct scsi_device by HCTL
> - */
> - scsi_remove_device(sd);
> - scsi_device_put(sd);
>
> atomic_dec_mb(&tl_tpg->tl_tpg_port_count);
The above patch looks wrong to me. I think this patch does not fix the
reference leak present in the current code. Have you considered to
modify tcm_loop_port_link() such that it saves the pointer returned by
scsi_add_device() and to use that pointer in tcm_loop_port_unlink()?
Bart.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 1/2] scsi: target/tcm_loop: ignore already deleted scsi device
@ 2019-08-20 14:02 ` Bart Van Assche
0 siblings, 0 replies; 14+ messages in thread
From: Bart Van Assche @ 2019-08-20 14:02 UTC (permalink / raw)
To: Naohiro Aota, linux-scsi, target-devel
Cc: Mike Christie, Nicholas Bellinger, Martin K . Petersen
On 8/20/19 2:04 AM, Naohiro Aota wrote:
> If there is no corresponding scsi_device for a LUN,
> tcm_loop_port_unlink() complains that it "Unable to locate struct
> scsi_device for " the device and keep %tl_tpg_port_count as is. However,
> such situation is legal when we delete a SCSI device using
> /sys/class/scsi_device/${lun}/device/delete. We can safely ignore the
> missing SCSI device case here.
>
> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> ---
> drivers/target/loopback/tcm_loop.c | 18 +++++++++---------
> 1 file changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/target/loopback/tcm_loop.c b/drivers/target/loopback/tcm_loop.c
> index 3305b47fdf53..0942f3bd7eec 100644
> --- a/drivers/target/loopback/tcm_loop.c
> +++ b/drivers/target/loopback/tcm_loop.c
> @@ -654,16 +654,16 @@ static void tcm_loop_port_unlink(
>
> sd = scsi_device_lookup(tl_hba->sh, 0, tl_tpg->tl_tpgt,
> se_lun->unpacked_lun);
> - if (!sd) {
> - pr_err("Unable to locate struct scsi_device for %d:%d:%llu\n",
> - 0, tl_tpg->tl_tpgt, se_lun->unpacked_lun);
> - return;
> + if (sd) {
> + /*
> + * Remove Linux/SCSI struct scsi_device by HCTL
> + */
> + scsi_remove_device(sd);
> + scsi_device_put(sd);
> + } else {
> + pr_debug("Unable to locate struct scsi_device for %d:%d:%llu\n",
> + 0, tl_tpg->tl_tpgt, se_lun->unpacked_lun);
> }
> - /*
> - * Remove Linux/SCSI struct scsi_device by HCTL
> - */
> - scsi_remove_device(sd);
> - scsi_device_put(sd);
>
> atomic_dec_mb(&tl_tpg->tl_tpg_port_count);
The above patch looks wrong to me. I think this patch does not fix the
reference leak present in the current code. Have you considered to
modify tcm_loop_port_link() such that it saves the pointer returned by
scsi_add_device() and to use that pointer in tcm_loop_port_unlink()?
Bart.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 1/2] scsi: target/tcm_loop: ignore already deleted scsi device
2019-08-20 14:02 ` Bart Van Assche
@ 2019-08-20 15:27 ` Mike Christie
-1 siblings, 0 replies; 14+ messages in thread
From: Mike Christie @ 2019-08-20 15:27 UTC (permalink / raw)
To: Bart Van Assche, Naohiro Aota, linux-scsi, target-devel
Cc: Nicholas Bellinger, Martin K . Petersen
On 08/20/2019 09:02 AM, Bart Van Assche wrote:
> On 8/20/19 2:04 AM, Naohiro Aota wrote:
>> If there is no corresponding scsi_device for a LUN,
>> tcm_loop_port_unlink() complains that it "Unable to locate struct
>> scsi_device for " the device and keep %tl_tpg_port_count as is. However,
>> such situation is legal when we delete a SCSI device using
>> /sys/class/scsi_device/${lun}/device/delete. We can safely ignore the
>> missing SCSI device case here.
>>
>> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
>> ---
>> drivers/target/loopback/tcm_loop.c | 18 +++++++++---------
>> 1 file changed, 9 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/target/loopback/tcm_loop.c
>> b/drivers/target/loopback/tcm_loop.c
>> index 3305b47fdf53..0942f3bd7eec 100644
>> --- a/drivers/target/loopback/tcm_loop.c
>> +++ b/drivers/target/loopback/tcm_loop.c
>> @@ -654,16 +654,16 @@ static void tcm_loop_port_unlink(
>> sd = scsi_device_lookup(tl_hba->sh, 0, tl_tpg->tl_tpgt,
>> se_lun->unpacked_lun);
>> - if (!sd) {
>> - pr_err("Unable to locate struct scsi_device for %d:%d:%llu\n",
>> - 0, tl_tpg->tl_tpgt, se_lun->unpacked_lun);
>> - return;
>> + if (sd) {
>> + /*
>> + * Remove Linux/SCSI struct scsi_device by HCTL
>> + */
>> + scsi_remove_device(sd);
>> + scsi_device_put(sd);
>> + } else {
>> + pr_debug("Unable to locate struct scsi_device for %d:%d:%llu\n",
>> + 0, tl_tpg->tl_tpgt, se_lun->unpacked_lun);
>> }
>> - /*
>> - * Remove Linux/SCSI struct scsi_device by HCTL
>> - */
>> - scsi_remove_device(sd);
>> - scsi_device_put(sd);
>> atomic_dec_mb(&tl_tpg->tl_tpg_port_count);
>
> The above patch looks wrong to me. I think this patch does not fix the
> reference leak present in the current code. Have you considered to
> modify tcm_loop_port_link() such that it saves the pointer returned by
> scsi_add_device() and to use that pointer in tcm_loop_port_unlink()?
>
Are you guys talking about different issues?
tcm loop does not take a reference to the scsi_device at creation/link
time then need to release at removal/unlink time. The above
scsi_device_put is for the successful scsi_device_lookup call. tcm loop
works like a scsi host driver that does its own scanning via
scsi_add_device (maybe similar to scsi drivers that are raid cards).
Like other host drivers it does not take a reference to the device when
it is added and relies on scsi-ml to handle all that for it before doing
operations like queuecommand.
The leak is if you removed the scsi_device via the scsi ml sysfs
interface then there is no way to completely unlink the lio port because
if scsi_device_lookup fails we return from the function and do not do
not release our refcount on the tl_tpg_port_count.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 1/2] scsi: target/tcm_loop: ignore already deleted scsi device
@ 2019-08-20 15:27 ` Mike Christie
0 siblings, 0 replies; 14+ messages in thread
From: Mike Christie @ 2019-08-20 15:27 UTC (permalink / raw)
To: Bart Van Assche, Naohiro Aota, linux-scsi, target-devel
Cc: Nicholas Bellinger, Martin K . Petersen
On 08/20/2019 09:02 AM, Bart Van Assche wrote:
> On 8/20/19 2:04 AM, Naohiro Aota wrote:
>> If there is no corresponding scsi_device for a LUN,
>> tcm_loop_port_unlink() complains that it "Unable to locate struct
>> scsi_device for " the device and keep %tl_tpg_port_count as is. However,
>> such situation is legal when we delete a SCSI device using
>> /sys/class/scsi_device/${lun}/device/delete. We can safely ignore the
>> missing SCSI device case here.
>>
>> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
>> ---
>> drivers/target/loopback/tcm_loop.c | 18 +++++++++---------
>> 1 file changed, 9 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/target/loopback/tcm_loop.c
>> b/drivers/target/loopback/tcm_loop.c
>> index 3305b47fdf53..0942f3bd7eec 100644
>> --- a/drivers/target/loopback/tcm_loop.c
>> +++ b/drivers/target/loopback/tcm_loop.c
>> @@ -654,16 +654,16 @@ static void tcm_loop_port_unlink(
>> sd = scsi_device_lookup(tl_hba->sh, 0, tl_tpg->tl_tpgt,
>> se_lun->unpacked_lun);
>> - if (!sd) {
>> - pr_err("Unable to locate struct scsi_device for %d:%d:%llu\n",
>> - 0, tl_tpg->tl_tpgt, se_lun->unpacked_lun);
>> - return;
>> + if (sd) {
>> + /*
>> + * Remove Linux/SCSI struct scsi_device by HCTL
>> + */
>> + scsi_remove_device(sd);
>> + scsi_device_put(sd);
>> + } else {
>> + pr_debug("Unable to locate struct scsi_device for %d:%d:%llu\n",
>> + 0, tl_tpg->tl_tpgt, se_lun->unpacked_lun);
>> }
>> - /*
>> - * Remove Linux/SCSI struct scsi_device by HCTL
>> - */
>> - scsi_remove_device(sd);
>> - scsi_device_put(sd);
>> atomic_dec_mb(&tl_tpg->tl_tpg_port_count);
>
> The above patch looks wrong to me. I think this patch does not fix the
> reference leak present in the current code. Have you considered to
> modify tcm_loop_port_link() such that it saves the pointer returned by
> scsi_add_device() and to use that pointer in tcm_loop_port_unlink()?
>
Are you guys talking about different issues?
tcm loop does not take a reference to the scsi_device at creation/link
time then need to release at removal/unlink time. The above
scsi_device_put is for the successful scsi_device_lookup call. tcm loop
works like a scsi host driver that does its own scanning via
scsi_add_device (maybe similar to scsi drivers that are raid cards).
Like other host drivers it does not take a reference to the device when
it is added and relies on scsi-ml to handle all that for it before doing
operations like queuecommand.
The leak is if you removed the scsi_device via the scsi ml sysfs
interface then there is no way to completely unlink the lio port because
if scsi_device_lookup fails we return from the function and do not do
not release our refcount on the tl_tpg_port_count.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 1/2] scsi: target/tcm_loop: ignore already deleted scsi device
2019-08-20 15:27 ` Mike Christie
@ 2019-08-20 15:43 ` Bart Van Assche
-1 siblings, 0 replies; 14+ messages in thread
From: Bart Van Assche @ 2019-08-20 15:43 UTC (permalink / raw)
To: Mike Christie, Naohiro Aota, linux-scsi, target-devel
Cc: Nicholas Bellinger, Martin K . Petersen
On 8/20/19 8:27 AM, Mike Christie wrote:
> tcm loop does not take a reference to the scsi_device at creation/link
> time then need to release at removal/unlink time. The above
> scsi_device_put is for the successful scsi_device_lookup call. tcm loop
> works like a scsi host driver that does its own scanning via
> scsi_add_device (maybe similar to scsi drivers that are raid cards).
> Like other host drivers it does not take a reference to the device when
> it is added and relies on scsi-ml to handle all that for it before doing
> operations like queuecommand.
>
> The leak is if you removed the scsi_device via the scsi ml sysfs
> interface then there is no way to completely unlink the lio port because
> if scsi_device_lookup fails we return from the function and do not do
> not release our refcount on the tl_tpg_port_count.
Hi Mike,
Does this mean that you think that this patch is the right way to
address the reported issue?
Thanks,
Bart.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 1/2] scsi: target/tcm_loop: ignore already deleted scsi device
@ 2019-08-20 15:43 ` Bart Van Assche
0 siblings, 0 replies; 14+ messages in thread
From: Bart Van Assche @ 2019-08-20 15:43 UTC (permalink / raw)
To: Mike Christie, Naohiro Aota, linux-scsi, target-devel
Cc: Nicholas Bellinger, Martin K . Petersen
On 8/20/19 8:27 AM, Mike Christie wrote:
> tcm loop does not take a reference to the scsi_device at creation/link
> time then need to release at removal/unlink time. The above
> scsi_device_put is for the successful scsi_device_lookup call. tcm loop
> works like a scsi host driver that does its own scanning via
> scsi_add_device (maybe similar to scsi drivers that are raid cards).
> Like other host drivers it does not take a reference to the device when
> it is added and relies on scsi-ml to handle all that for it before doing
> operations like queuecommand.
>
> The leak is if you removed the scsi_device via the scsi ml sysfs
> interface then there is no way to completely unlink the lio port because
> if scsi_device_lookup fails we return from the function and do not do
> not release our refcount on the tl_tpg_port_count.
Hi Mike,
Does this mean that you think that this patch is the right way to
address the reported issue?
Thanks,
Bart.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 1/2] scsi: target/tcm_loop: ignore already deleted scsi device
2019-08-20 15:43 ` Bart Van Assche
@ 2019-08-20 17:19 ` Mike Christie
-1 siblings, 0 replies; 14+ messages in thread
From: Mike Christie @ 2019-08-20 17:19 UTC (permalink / raw)
To: Bart Van Assche, Naohiro Aota, linux-scsi, target-devel
Cc: Nicholas Bellinger, Martin K . Petersen
On 08/20/2019 10:43 AM, Bart Van Assche wrote:
> On 8/20/19 8:27 AM, Mike Christie wrote:
>> tcm loop does not take a reference to the scsi_device at creation/link
>> time then need to release at removal/unlink time. The above
>> scsi_device_put is for the successful scsi_device_lookup call. tcm loop
>> works like a scsi host driver that does its own scanning via
>> scsi_add_device (maybe similar to scsi drivers that are raid cards).
>> Like other host drivers it does not take a reference to the device when
>> it is added and relies on scsi-ml to handle all that for it before doing
>> operations like queuecommand.
>>
>> The leak is if you removed the scsi_device via the scsi ml sysfs
>> interface then there is no way to completely unlink the lio port because
>> if scsi_device_lookup fails we return from the function and do not do
>> not release our refcount on the tl_tpg_port_count.
>
> Hi Mike,
>
> Does this mean that you think that this patch is the right way to
> address the reported issue?
>
It fixes that very specific issue, but it leaves others. Maybe it could
be used in a patchset that builds on it?
I think we could hit issues like:
1. tcm_loop_port_unlink runs and releases tl_tpg_port_count count.
2. userspace initiated scan commands were in progress and complete.
target_fabric_port_unlink->core_dev_del_lun->core_tpg_remove_lun was
waiting for those last IOs and now completes and deletes the mapped lun
from lio.
3. scsi-ml scan completed before the unmapping and so now we have a
scsi_device but no lio lun mapping.
The problem with this is that:
1. We can hit mismatched settings like this:
A. We now have this scsi_device at LUN0, but no lio mapping. User thinks
everything got cleaned up ok, so they decide to map another lio lun.
B. tcm_loop_port_link just does a scsi_add_device which does
scsi_probe_and_add_lun with rescan=true. So the original scsi_device is
returned. It is not reprobed so whatever settings the old device has
will be used. Maybe that scsi_device was for a disk and now the user was
adding a CD.
2. And hit races like:
A. tl_tpg_port_count might now be zero so the tpg and nexus can be removed.
B. That removal can race with IO being sent to the scsi_device. If a
command has passed tcm_loop_submission_work's NULL tl_nexus check then
we will hit a NULL pointer crash later in the function.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 1/2] scsi: target/tcm_loop: ignore already deleted scsi device
@ 2019-08-20 17:19 ` Mike Christie
0 siblings, 0 replies; 14+ messages in thread
From: Mike Christie @ 2019-08-20 17:19 UTC (permalink / raw)
To: Bart Van Assche, Naohiro Aota, linux-scsi, target-devel
Cc: Nicholas Bellinger, Martin K . Petersen
On 08/20/2019 10:43 AM, Bart Van Assche wrote:
> On 8/20/19 8:27 AM, Mike Christie wrote:
>> tcm loop does not take a reference to the scsi_device at creation/link
>> time then need to release at removal/unlink time. The above
>> scsi_device_put is for the successful scsi_device_lookup call. tcm loop
>> works like a scsi host driver that does its own scanning via
>> scsi_add_device (maybe similar to scsi drivers that are raid cards).
>> Like other host drivers it does not take a reference to the device when
>> it is added and relies on scsi-ml to handle all that for it before doing
>> operations like queuecommand.
>>
>> The leak is if you removed the scsi_device via the scsi ml sysfs
>> interface then there is no way to completely unlink the lio port because
>> if scsi_device_lookup fails we return from the function and do not do
>> not release our refcount on the tl_tpg_port_count.
>
> Hi Mike,
>
> Does this mean that you think that this patch is the right way to
> address the reported issue?
>
It fixes that very specific issue, but it leaves others. Maybe it could
be used in a patchset that builds on it?
I think we could hit issues like:
1. tcm_loop_port_unlink runs and releases tl_tpg_port_count count.
2. userspace initiated scan commands were in progress and complete.
target_fabric_port_unlink->core_dev_del_lun->core_tpg_remove_lun was
waiting for those last IOs and now completes and deletes the mapped lun
from lio.
3. scsi-ml scan completed before the unmapping and so now we have a
scsi_device but no lio lun mapping.
The problem with this is that:
1. We can hit mismatched settings like this:
A. We now have this scsi_device at LUN0, but no lio mapping. User thinks
everything got cleaned up ok, so they decide to map another lio lun.
B. tcm_loop_port_link just does a scsi_add_device which does
scsi_probe_and_add_lun with rescan=true. So the original scsi_device is
returned. It is not reprobed so whatever settings the old device has
will be used. Maybe that scsi_device was for a disk and now the user was
adding a CD.
2. And hit races like:
A. tl_tpg_port_count might now be zero so the tpg and nexus can be removed.
B. That removal can race with IO being sent to the scsi_device. If a
command has passed tcm_loop_submission_work's NULL tl_nexus check then
we will hit a NULL pointer crash later in the function.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 1/2] scsi: target/tcm_loop: ignore already deleted scsi device
2019-08-20 17:19 ` Mike Christie
@ 2019-08-22 6:51 ` Naohiro Aota
-1 siblings, 0 replies; 14+ messages in thread
From: Naohiro Aota @ 2019-08-22 6:51 UTC (permalink / raw)
To: Mike Christie, Bart Van Assche
Cc: linux-scsi, target-devel, Nicholas Bellinger, Martin K . Petersen
On Tue, Aug 20, 2019 at 12:19:25PM -0500, Mike Christie wrote:
>CAUTION: This email originated from outside of Western Digital. Do not click on links or open attachments unless you recognize the sender and know that the content is safe.
>
>
>On 08/20/2019 10:43 AM, Bart Van Assche wrote:
>> On 8/20/19 8:27 AM, Mike Christie wrote:
>>> tcm loop does not take a reference to the scsi_device at creation/link
>>> time then need to release at removal/unlink time. The above
>>> scsi_device_put is for the successful scsi_device_lookup call. tcm loop
>>> works like a scsi host driver that does its own scanning via
>>> scsi_add_device (maybe similar to scsi drivers that are raid cards).
>>> Like other host drivers it does not take a reference to the device when
>>> it is added and relies on scsi-ml to handle all that for it before doing
>>> operations like queuecommand.
>>>
>>> The leak is if you removed the scsi_device via the scsi ml sysfs
>>> interface then there is no way to completely unlink the lio port because
>>> if scsi_device_lookup fails we return from the function and do not do
>>> not release our refcount on the tl_tpg_port_count.
>>
>> Hi Mike,
>>
>> Does this mean that you think that this patch is the right way to
>> address the reported issue?
>>
>
>It fixes that very specific issue, but it leaves others. Maybe it could
>be used in a patchset that builds on it?
>
>I think we could hit issues like:
>
>1. tcm_loop_port_unlink runs and releases tl_tpg_port_count count.
>2. userspace initiated scan commands were in progress and complete.
>target_fabric_port_unlink->core_dev_del_lun->core_tpg_remove_lun was
>waiting for those last IOs and now completes and deletes the mapped lun
>from lio.
>3. scsi-ml scan completed before the unmapping and so now we have a
>scsi_device but no lio lun mapping.
Right, this can happen. Actually, this can happen even without my
patch if the scan can occur between scsi_remove_device() in
tcm_loop_port_unlink() and core_tpg_remove_lun(), right?
I presumed we need to use some lock around here. I digged around the
code but cannot figure out a suitable lock here. Actually, I tried
(struct Scsi_Host)->scan_mutex, but it didn't work.
Or, we should block tcm_loop_queuecommand() to proceed the INQUIRY
commads on this LUN? move/introduce new hook after
transport_clear_lun_ref(lun)?
Any idea?
>The problem with this is that:
>
>1. We can hit mismatched settings like this:
>
>A. We now have this scsi_device at LUN0, but no lio mapping. User thinks
>everything got cleaned up ok, so they decide to map another lio lun.
>B. tcm_loop_port_link just does a scsi_add_device which does
>scsi_probe_and_add_lun with rescan=true. So the original scsi_device is
>returned. It is not reprobed so whatever settings the old device has
>will be used. Maybe that scsi_device was for a disk and now the user was
>adding a CD.
>
>2. And hit races like:
>
>A. tl_tpg_port_count might now be zero so the tpg and nexus can be removed.
>B. That removal can race with IO being sent to the scsi_device. If a
>command has passed tcm_loop_submission_work's NULL tl_nexus check then
>we will hit a NULL pointer crash later in the function.
>
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 1/2] scsi: target/tcm_loop: ignore already deleted scsi device
@ 2019-08-22 6:51 ` Naohiro Aota
0 siblings, 0 replies; 14+ messages in thread
From: Naohiro Aota @ 2019-08-22 6:51 UTC (permalink / raw)
To: Mike Christie, Bart Van Assche
Cc: linux-scsi, target-devel, Nicholas Bellinger, Martin K . Petersen
On Tue, Aug 20, 2019 at 12:19:25PM -0500, Mike Christie wrote:
>CAUTION: This email originated from outside of Western Digital. Do not click on links or open attachments unless you recognize the sender and know that the content is safe.
>
>
>On 08/20/2019 10:43 AM, Bart Van Assche wrote:
>> On 8/20/19 8:27 AM, Mike Christie wrote:
>>> tcm loop does not take a reference to the scsi_device at creation/link
>>> time then need to release at removal/unlink time. The above
>>> scsi_device_put is for the successful scsi_device_lookup call. tcm loop
>>> works like a scsi host driver that does its own scanning via
>>> scsi_add_device (maybe similar to scsi drivers that are raid cards).
>>> Like other host drivers it does not take a reference to the device when
>>> it is added and relies on scsi-ml to handle all that for it before doing
>>> operations like queuecommand.
>>>
>>> The leak is if you removed the scsi_device via the scsi ml sysfs
>>> interface then there is no way to completely unlink the lio port because
>>> if scsi_device_lookup fails we return from the function and do not do
>>> not release our refcount on the tl_tpg_port_count.
>>
>> Hi Mike,
>>
>> Does this mean that you think that this patch is the right way to
>> address the reported issue?
>>
>
>It fixes that very specific issue, but it leaves others. Maybe it could
>be used in a patchset that builds on it?
>
>I think we could hit issues like:
>
>1. tcm_loop_port_unlink runs and releases tl_tpg_port_count count.
>2. userspace initiated scan commands were in progress and complete.
>target_fabric_port_unlink->core_dev_del_lun->core_tpg_remove_lun was
>waiting for those last IOs and now completes and deletes the mapped lun
>from lio.
>3. scsi-ml scan completed before the unmapping and so now we have a
>scsi_device but no lio lun mapping.
Right, this can happen. Actually, this can happen even without my
patch if the scan can occur between scsi_remove_device() in
tcm_loop_port_unlink() and core_tpg_remove_lun(), right?
I presumed we need to use some lock around here. I digged around the
code but cannot figure out a suitable lock here. Actually, I tried
(struct Scsi_Host)->scan_mutex, but it didn't work.
Or, we should block tcm_loop_queuecommand() to proceed the INQUIRY
commads on this LUN? move/introduce new hook after
transport_clear_lun_ref(lun)?
Any idea?
>The problem with this is that:
>
>1. We can hit mismatched settings like this:
>
>A. We now have this scsi_device at LUN0, but no lio mapping. User thinks
>everything got cleaned up ok, so they decide to map another lio lun.
>B. tcm_loop_port_link just does a scsi_add_device which does
>scsi_probe_and_add_lun with rescan=true. So the original scsi_device is
>returned. It is not reprobed so whatever settings the old device has
>will be used. Maybe that scsi_device was for a disk and now the user was
>adding a CD.
>
>2. And hit races like:
>
>A. tl_tpg_port_count might now be zero so the tpg and nexus can be removed.
>B. That removal can race with IO being sent to the scsi_device. If a
>command has passed tcm_loop_submission_work's NULL tl_nexus check then
>we will hit a NULL pointer crash later in the function.
>
>
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2019-08-22 6:51 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-20 9:04 [PATCH v2 1/2] scsi: target/tcm_loop: ignore already deleted scsi device Naohiro Aota
2019-08-20 9:04 ` Naohiro Aota
2019-08-20 9:04 ` [PATCH v2 2/2] scsi: target/tcm_loop: update upper limit of LUN Naohiro Aota
2019-08-20 9:04 ` Naohiro Aota
2019-08-20 14:02 ` [PATCH v2 1/2] scsi: target/tcm_loop: ignore already deleted scsi device Bart Van Assche
2019-08-20 14:02 ` Bart Van Assche
2019-08-20 15:27 ` Mike Christie
2019-08-20 15:27 ` Mike Christie
2019-08-20 15:43 ` Bart Van Assche
2019-08-20 15:43 ` Bart Van Assche
2019-08-20 17:19 ` Mike Christie
2019-08-20 17:19 ` Mike Christie
2019-08-22 6:51 ` Naohiro Aota
2019-08-22 6:51 ` Naohiro Aota
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.