All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 1/2] scsi: target/tcm_loop: ignore already deleted scsi device
@ 2019-08-20  9:04 ` Naohiro Aota
  0 siblings, 0 replies; 14+ messages in thread
From: Naohiro Aota @ 2019-08-20  9:04 UTC (permalink / raw)
  To: linux-scsi, target-devel
  Cc: Mike Christie, Nicholas Bellinger, Martin K . Petersen, Naohiro Aota

If there is no corresponding scsi_device for a LUN,
tcm_loop_port_unlink() complains that it "Unable to locate struct
scsi_device for " the device and keep %tl_tpg_port_count as is. However,
such situation is legal when we delete a SCSI device using
/sys/class/scsi_device/${lun}/device/delete. We can safely ignore the
missing SCSI device case here.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 drivers/target/loopback/tcm_loop.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/target/loopback/tcm_loop.c b/drivers/target/loopback/tcm_loop.c
index 3305b47fdf53..0942f3bd7eec 100644
--- a/drivers/target/loopback/tcm_loop.c
+++ b/drivers/target/loopback/tcm_loop.c
@@ -654,16 +654,16 @@ static void tcm_loop_port_unlink(
 
 	sd = scsi_device_lookup(tl_hba->sh, 0, tl_tpg->tl_tpgt,
 				se_lun->unpacked_lun);
-	if (!sd) {
-		pr_err("Unable to locate struct scsi_device for %d:%d:%llu\n",
-		       0, tl_tpg->tl_tpgt, se_lun->unpacked_lun);
-		return;
+	if (sd) {
+		/*
+		 * Remove Linux/SCSI struct scsi_device by HCTL
+		 */
+		scsi_remove_device(sd);
+		scsi_device_put(sd);
+	} else {
+		pr_debug("Unable to locate struct scsi_device for %d:%d:%llu\n",
+			 0, tl_tpg->tl_tpgt, se_lun->unpacked_lun);
 	}
-	/*
-	 * Remove Linux/SCSI struct scsi_device by HCTL
-	 */
-	scsi_remove_device(sd);
-	scsi_device_put(sd);
 
 	atomic_dec_mb(&tl_tpg->tl_tpg_port_count);
 
-- 
2.23.0

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 1/2] scsi: target/tcm_loop: ignore already deleted scsi device
@ 2019-08-20  9:04 ` Naohiro Aota
  0 siblings, 0 replies; 14+ messages in thread
From: Naohiro Aota @ 2019-08-20  9:04 UTC (permalink / raw)
  To: linux-scsi, target-devel
  Cc: Mike Christie, Nicholas Bellinger, Martin K . Petersen, Naohiro Aota

If there is no corresponding scsi_device for a LUN,
tcm_loop_port_unlink() complains that it "Unable to locate struct
scsi_device for " the device and keep %tl_tpg_port_count as is. However,
such situation is legal when we delete a SCSI device using
/sys/class/scsi_device/${lun}/device/delete. We can safely ignore the
missing SCSI device case here.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 drivers/target/loopback/tcm_loop.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/target/loopback/tcm_loop.c b/drivers/target/loopback/tcm_loop.c
index 3305b47fdf53..0942f3bd7eec 100644
--- a/drivers/target/loopback/tcm_loop.c
+++ b/drivers/target/loopback/tcm_loop.c
@@ -654,16 +654,16 @@ static void tcm_loop_port_unlink(
 
 	sd = scsi_device_lookup(tl_hba->sh, 0, tl_tpg->tl_tpgt,
 				se_lun->unpacked_lun);
-	if (!sd) {
-		pr_err("Unable to locate struct scsi_device for %d:%d:%llu\n",
-		       0, tl_tpg->tl_tpgt, se_lun->unpacked_lun);
-		return;
+	if (sd) {
+		/*
+		 * Remove Linux/SCSI struct scsi_device by HCTL
+		 */
+		scsi_remove_device(sd);
+		scsi_device_put(sd);
+	} else {
+		pr_debug("Unable to locate struct scsi_device for %d:%d:%llu\n",
+			 0, tl_tpg->tl_tpgt, se_lun->unpacked_lun);
 	}
-	/*
-	 * Remove Linux/SCSI struct scsi_device by HCTL
-	 */
-	scsi_remove_device(sd);
-	scsi_device_put(sd);
 
 	atomic_dec_mb(&tl_tpg->tl_tpg_port_count);
 
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 2/2] scsi: target/tcm_loop: update upper limit of LUN
  2019-08-20  9:04 ` Naohiro Aota
@ 2019-08-20  9:04   ` Naohiro Aota
  -1 siblings, 0 replies; 14+ messages in thread
From: Naohiro Aota @ 2019-08-20  9:04 UTC (permalink / raw)
  To: linux-scsi, target-devel
  Cc: Mike Christie, Nicholas Bellinger, Martin K . Petersen, Naohiro Aota

targetcli-fb (or its library: rtslib-fb) allows us to create LUN up to
65535. On the other hand, the kernel driver is limiting max_lun to 0.

This limitation causes actual problem when you delete a loopback device
(using /sys/class/scsi_device/${lun}/device/delete) and rescan it (using
/sys/class/scsi_host/host${h}/scan). You can delete the device, but
cannot rescan it because its LUN is larger than max_lun and so the scan
request results in -EINVAL error in scsi_scan_host_selected().

This commit fix the upper limit to a max possible value: U64_MAX.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 drivers/target/loopback/tcm_loop.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/target/loopback/tcm_loop.c b/drivers/target/loopback/tcm_loop.c
index 0942f3bd7eec..50e93b487375 100644
--- a/drivers/target/loopback/tcm_loop.c
+++ b/drivers/target/loopback/tcm_loop.c
@@ -336,10 +336,10 @@ static int tcm_loop_driver_probe(struct device *dev)
 	 */
 	*((struct tcm_loop_hba **)sh->hostdata) = tl_hba;
 	/*
-	 * Setup single ID, Channel and LUN for now..
+	 * Setup single ID, and Channel for now..
 	 */
 	sh->max_id = 2;
-	sh->max_lun = 0;
+	sh->max_lun = U64_MAX;
 	sh->max_channel = 0;
 	sh->max_cmd_len = SCSI_MAX_VARLEN_CDB_SIZE;
 
-- 
2.23.0

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 2/2] scsi: target/tcm_loop: update upper limit of LUN
@ 2019-08-20  9:04   ` Naohiro Aota
  0 siblings, 0 replies; 14+ messages in thread
From: Naohiro Aota @ 2019-08-20  9:04 UTC (permalink / raw)
  To: linux-scsi, target-devel
  Cc: Mike Christie, Nicholas Bellinger, Martin K . Petersen, Naohiro Aota

targetcli-fb (or its library: rtslib-fb) allows us to create LUN up to
65535. On the other hand, the kernel driver is limiting max_lun to 0.

This limitation causes actual problem when you delete a loopback device
(using /sys/class/scsi_device/${lun}/device/delete) and rescan it (using
/sys/class/scsi_host/host${h}/scan). You can delete the device, but
cannot rescan it because its LUN is larger than max_lun and so the scan
request results in -EINVAL error in scsi_scan_host_selected().

This commit fix the upper limit to a max possible value: U64_MAX.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 drivers/target/loopback/tcm_loop.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/target/loopback/tcm_loop.c b/drivers/target/loopback/tcm_loop.c
index 0942f3bd7eec..50e93b487375 100644
--- a/drivers/target/loopback/tcm_loop.c
+++ b/drivers/target/loopback/tcm_loop.c
@@ -336,10 +336,10 @@ static int tcm_loop_driver_probe(struct device *dev)
 	 */
 	*((struct tcm_loop_hba **)sh->hostdata) = tl_hba;
 	/*
-	 * Setup single ID, Channel and LUN for now..
+	 * Setup single ID, and Channel for now..
 	 */
 	sh->max_id = 2;
-	sh->max_lun = 0;
+	sh->max_lun = U64_MAX;
 	sh->max_channel = 0;
 	sh->max_cmd_len = SCSI_MAX_VARLEN_CDB_SIZE;
 
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/2] scsi: target/tcm_loop: ignore already deleted scsi device
  2019-08-20  9:04 ` Naohiro Aota
@ 2019-08-20 14:02   ` Bart Van Assche
  -1 siblings, 0 replies; 14+ messages in thread
From: Bart Van Assche @ 2019-08-20 14:02 UTC (permalink / raw)
  To: Naohiro Aota, linux-scsi, target-devel
  Cc: Mike Christie, Nicholas Bellinger, Martin K . Petersen

On 8/20/19 2:04 AM, Naohiro Aota wrote:
> If there is no corresponding scsi_device for a LUN,
> tcm_loop_port_unlink() complains that it "Unable to locate struct
> scsi_device for " the device and keep %tl_tpg_port_count as is. However,
> such situation is legal when we delete a SCSI device using
> /sys/class/scsi_device/${lun}/device/delete. We can safely ignore the
> missing SCSI device case here.
> 
> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> ---
>   drivers/target/loopback/tcm_loop.c | 18 +++++++++---------
>   1 file changed, 9 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/target/loopback/tcm_loop.c b/drivers/target/loopback/tcm_loop.c
> index 3305b47fdf53..0942f3bd7eec 100644
> --- a/drivers/target/loopback/tcm_loop.c
> +++ b/drivers/target/loopback/tcm_loop.c
> @@ -654,16 +654,16 @@ static void tcm_loop_port_unlink(
>   
>   	sd = scsi_device_lookup(tl_hba->sh, 0, tl_tpg->tl_tpgt,
>   				se_lun->unpacked_lun);
> -	if (!sd) {
> -		pr_err("Unable to locate struct scsi_device for %d:%d:%llu\n",
> -		       0, tl_tpg->tl_tpgt, se_lun->unpacked_lun);
> -		return;
> +	if (sd) {
> +		/*
> +		 * Remove Linux/SCSI struct scsi_device by HCTL
> +		 */
> +		scsi_remove_device(sd);
> +		scsi_device_put(sd);
> +	} else {
> +		pr_debug("Unable to locate struct scsi_device for %d:%d:%llu\n",
> +			 0, tl_tpg->tl_tpgt, se_lun->unpacked_lun);
>   	}
> -	/*
> -	 * Remove Linux/SCSI struct scsi_device by HCTL
> -	 */
> -	scsi_remove_device(sd);
> -	scsi_device_put(sd);
>   
>   	atomic_dec_mb(&tl_tpg->tl_tpg_port_count);

The above patch looks wrong to me. I think this patch does not fix the 
reference leak present in the current code. Have you considered to 
modify tcm_loop_port_link() such that it saves the pointer returned by 
scsi_add_device() and to use that pointer in tcm_loop_port_unlink()?

Bart.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/2] scsi: target/tcm_loop: ignore already deleted scsi device
@ 2019-08-20 14:02   ` Bart Van Assche
  0 siblings, 0 replies; 14+ messages in thread
From: Bart Van Assche @ 2019-08-20 14:02 UTC (permalink / raw)
  To: Naohiro Aota, linux-scsi, target-devel
  Cc: Mike Christie, Nicholas Bellinger, Martin K . Petersen

On 8/20/19 2:04 AM, Naohiro Aota wrote:
> If there is no corresponding scsi_device for a LUN,
> tcm_loop_port_unlink() complains that it "Unable to locate struct
> scsi_device for " the device and keep %tl_tpg_port_count as is. However,
> such situation is legal when we delete a SCSI device using
> /sys/class/scsi_device/${lun}/device/delete. We can safely ignore the
> missing SCSI device case here.
> 
> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
> ---
>   drivers/target/loopback/tcm_loop.c | 18 +++++++++---------
>   1 file changed, 9 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/target/loopback/tcm_loop.c b/drivers/target/loopback/tcm_loop.c
> index 3305b47fdf53..0942f3bd7eec 100644
> --- a/drivers/target/loopback/tcm_loop.c
> +++ b/drivers/target/loopback/tcm_loop.c
> @@ -654,16 +654,16 @@ static void tcm_loop_port_unlink(
>   
>   	sd = scsi_device_lookup(tl_hba->sh, 0, tl_tpg->tl_tpgt,
>   				se_lun->unpacked_lun);
> -	if (!sd) {
> -		pr_err("Unable to locate struct scsi_device for %d:%d:%llu\n",
> -		       0, tl_tpg->tl_tpgt, se_lun->unpacked_lun);
> -		return;
> +	if (sd) {
> +		/*
> +		 * Remove Linux/SCSI struct scsi_device by HCTL
> +		 */
> +		scsi_remove_device(sd);
> +		scsi_device_put(sd);
> +	} else {
> +		pr_debug("Unable to locate struct scsi_device for %d:%d:%llu\n",
> +			 0, tl_tpg->tl_tpgt, se_lun->unpacked_lun);
>   	}
> -	/*
> -	 * Remove Linux/SCSI struct scsi_device by HCTL
> -	 */
> -	scsi_remove_device(sd);
> -	scsi_device_put(sd);
>   
>   	atomic_dec_mb(&tl_tpg->tl_tpg_port_count);

The above patch looks wrong to me. I think this patch does not fix the 
reference leak present in the current code. Have you considered to 
modify tcm_loop_port_link() such that it saves the pointer returned by 
scsi_add_device() and to use that pointer in tcm_loop_port_unlink()?

Bart.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/2] scsi: target/tcm_loop: ignore already deleted scsi device
  2019-08-20 14:02   ` Bart Van Assche
@ 2019-08-20 15:27     ` Mike Christie
  -1 siblings, 0 replies; 14+ messages in thread
From: Mike Christie @ 2019-08-20 15:27 UTC (permalink / raw)
  To: Bart Van Assche, Naohiro Aota, linux-scsi, target-devel
  Cc: Nicholas Bellinger, Martin K . Petersen

On 08/20/2019 09:02 AM, Bart Van Assche wrote:
> On 8/20/19 2:04 AM, Naohiro Aota wrote:
>> If there is no corresponding scsi_device for a LUN,
>> tcm_loop_port_unlink() complains that it "Unable to locate struct
>> scsi_device for " the device and keep %tl_tpg_port_count as is. However,
>> such situation is legal when we delete a SCSI device using
>> /sys/class/scsi_device/${lun}/device/delete. We can safely ignore the
>> missing SCSI device case here.
>>
>> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
>> ---
>>   drivers/target/loopback/tcm_loop.c | 18 +++++++++---------
>>   1 file changed, 9 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/target/loopback/tcm_loop.c
>> b/drivers/target/loopback/tcm_loop.c
>> index 3305b47fdf53..0942f3bd7eec 100644
>> --- a/drivers/target/loopback/tcm_loop.c
>> +++ b/drivers/target/loopback/tcm_loop.c
>> @@ -654,16 +654,16 @@ static void tcm_loop_port_unlink(
>>         sd = scsi_device_lookup(tl_hba->sh, 0, tl_tpg->tl_tpgt,
>>                   se_lun->unpacked_lun);
>> -    if (!sd) {
>> -        pr_err("Unable to locate struct scsi_device for %d:%d:%llu\n",
>> -               0, tl_tpg->tl_tpgt, se_lun->unpacked_lun);
>> -        return;
>> +    if (sd) {
>> +        /*
>> +         * Remove Linux/SCSI struct scsi_device by HCTL
>> +         */
>> +        scsi_remove_device(sd);
>> +        scsi_device_put(sd);
>> +    } else {
>> +        pr_debug("Unable to locate struct scsi_device for %d:%d:%llu\n",
>> +             0, tl_tpg->tl_tpgt, se_lun->unpacked_lun);
>>       }
>> -    /*
>> -     * Remove Linux/SCSI struct scsi_device by HCTL
>> -     */
>> -    scsi_remove_device(sd);
>> -    scsi_device_put(sd);
>>         atomic_dec_mb(&tl_tpg->tl_tpg_port_count);
> 
> The above patch looks wrong to me. I think this patch does not fix the
> reference leak present in the current code. Have you considered to
> modify tcm_loop_port_link() such that it saves the pointer returned by
> scsi_add_device() and to use that pointer in tcm_loop_port_unlink()?
> 

Are you guys talking about different issues?

tcm loop does not take a reference to the scsi_device at creation/link
time then need to release at removal/unlink time. The above
scsi_device_put is for the successful scsi_device_lookup call. tcm loop
works like a scsi host driver that does its own scanning via
scsi_add_device (maybe similar to scsi drivers that are raid cards).
Like other host drivers it does not take a reference to the device when
it is added and relies on scsi-ml to handle all that for it before doing
operations like queuecommand.

The leak is if you removed the scsi_device via the scsi ml sysfs
interface then there is no way to completely unlink the lio port because
if scsi_device_lookup fails we return from the function and do not do
not release our refcount on the tl_tpg_port_count.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/2] scsi: target/tcm_loop: ignore already deleted scsi device
@ 2019-08-20 15:27     ` Mike Christie
  0 siblings, 0 replies; 14+ messages in thread
From: Mike Christie @ 2019-08-20 15:27 UTC (permalink / raw)
  To: Bart Van Assche, Naohiro Aota, linux-scsi, target-devel
  Cc: Nicholas Bellinger, Martin K . Petersen

On 08/20/2019 09:02 AM, Bart Van Assche wrote:
> On 8/20/19 2:04 AM, Naohiro Aota wrote:
>> If there is no corresponding scsi_device for a LUN,
>> tcm_loop_port_unlink() complains that it "Unable to locate struct
>> scsi_device for " the device and keep %tl_tpg_port_count as is. However,
>> such situation is legal when we delete a SCSI device using
>> /sys/class/scsi_device/${lun}/device/delete. We can safely ignore the
>> missing SCSI device case here.
>>
>> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
>> ---
>>   drivers/target/loopback/tcm_loop.c | 18 +++++++++---------
>>   1 file changed, 9 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/target/loopback/tcm_loop.c
>> b/drivers/target/loopback/tcm_loop.c
>> index 3305b47fdf53..0942f3bd7eec 100644
>> --- a/drivers/target/loopback/tcm_loop.c
>> +++ b/drivers/target/loopback/tcm_loop.c
>> @@ -654,16 +654,16 @@ static void tcm_loop_port_unlink(
>>         sd = scsi_device_lookup(tl_hba->sh, 0, tl_tpg->tl_tpgt,
>>                   se_lun->unpacked_lun);
>> -    if (!sd) {
>> -        pr_err("Unable to locate struct scsi_device for %d:%d:%llu\n",
>> -               0, tl_tpg->tl_tpgt, se_lun->unpacked_lun);
>> -        return;
>> +    if (sd) {
>> +        /*
>> +         * Remove Linux/SCSI struct scsi_device by HCTL
>> +         */
>> +        scsi_remove_device(sd);
>> +        scsi_device_put(sd);
>> +    } else {
>> +        pr_debug("Unable to locate struct scsi_device for %d:%d:%llu\n",
>> +             0, tl_tpg->tl_tpgt, se_lun->unpacked_lun);
>>       }
>> -    /*
>> -     * Remove Linux/SCSI struct scsi_device by HCTL
>> -     */
>> -    scsi_remove_device(sd);
>> -    scsi_device_put(sd);
>>         atomic_dec_mb(&tl_tpg->tl_tpg_port_count);
> 
> The above patch looks wrong to me. I think this patch does not fix the
> reference leak present in the current code. Have you considered to
> modify tcm_loop_port_link() such that it saves the pointer returned by
> scsi_add_device() and to use that pointer in tcm_loop_port_unlink()?
> 

Are you guys talking about different issues?

tcm loop does not take a reference to the scsi_device at creation/link
time then need to release at removal/unlink time. The above
scsi_device_put is for the successful scsi_device_lookup call. tcm loop
works like a scsi host driver that does its own scanning via
scsi_add_device (maybe similar to scsi drivers that are raid cards).
Like other host drivers it does not take a reference to the device when
it is added and relies on scsi-ml to handle all that for it before doing
operations like queuecommand.

The leak is if you removed the scsi_device via the scsi ml sysfs
interface then there is no way to completely unlink the lio port because
if scsi_device_lookup fails we return from the function and do not do
not release our refcount on the tl_tpg_port_count.



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/2] scsi: target/tcm_loop: ignore already deleted scsi device
  2019-08-20 15:27     ` Mike Christie
@ 2019-08-20 15:43       ` Bart Van Assche
  -1 siblings, 0 replies; 14+ messages in thread
From: Bart Van Assche @ 2019-08-20 15:43 UTC (permalink / raw)
  To: Mike Christie, Naohiro Aota, linux-scsi, target-devel
  Cc: Nicholas Bellinger, Martin K . Petersen

On 8/20/19 8:27 AM, Mike Christie wrote:
> tcm loop does not take a reference to the scsi_device at creation/link
> time then need to release at removal/unlink time. The above
> scsi_device_put is for the successful scsi_device_lookup call. tcm loop
> works like a scsi host driver that does its own scanning via
> scsi_add_device (maybe similar to scsi drivers that are raid cards).
> Like other host drivers it does not take a reference to the device when
> it is added and relies on scsi-ml to handle all that for it before doing
> operations like queuecommand.
> 
> The leak is if you removed the scsi_device via the scsi ml sysfs
> interface then there is no way to completely unlink the lio port because
> if scsi_device_lookup fails we return from the function and do not do
> not release our refcount on the tl_tpg_port_count.

Hi Mike,

Does this mean that you think that this patch is the right way to 
address the reported issue?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/2] scsi: target/tcm_loop: ignore already deleted scsi device
@ 2019-08-20 15:43       ` Bart Van Assche
  0 siblings, 0 replies; 14+ messages in thread
From: Bart Van Assche @ 2019-08-20 15:43 UTC (permalink / raw)
  To: Mike Christie, Naohiro Aota, linux-scsi, target-devel
  Cc: Nicholas Bellinger, Martin K . Petersen

On 8/20/19 8:27 AM, Mike Christie wrote:
> tcm loop does not take a reference to the scsi_device at creation/link
> time then need to release at removal/unlink time. The above
> scsi_device_put is for the successful scsi_device_lookup call. tcm loop
> works like a scsi host driver that does its own scanning via
> scsi_add_device (maybe similar to scsi drivers that are raid cards).
> Like other host drivers it does not take a reference to the device when
> it is added and relies on scsi-ml to handle all that for it before doing
> operations like queuecommand.
> 
> The leak is if you removed the scsi_device via the scsi ml sysfs
> interface then there is no way to completely unlink the lio port because
> if scsi_device_lookup fails we return from the function and do not do
> not release our refcount on the tl_tpg_port_count.

Hi Mike,

Does this mean that you think that this patch is the right way to 
address the reported issue?

Thanks,

Bart.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/2] scsi: target/tcm_loop: ignore already deleted scsi device
  2019-08-20 15:43       ` Bart Van Assche
@ 2019-08-20 17:19         ` Mike Christie
  -1 siblings, 0 replies; 14+ messages in thread
From: Mike Christie @ 2019-08-20 17:19 UTC (permalink / raw)
  To: Bart Van Assche, Naohiro Aota, linux-scsi, target-devel
  Cc: Nicholas Bellinger, Martin K . Petersen

On 08/20/2019 10:43 AM, Bart Van Assche wrote:
> On 8/20/19 8:27 AM, Mike Christie wrote:
>> tcm loop does not take a reference to the scsi_device at creation/link
>> time then need to release at removal/unlink time. The above
>> scsi_device_put is for the successful scsi_device_lookup call. tcm loop
>> works like a scsi host driver that does its own scanning via
>> scsi_add_device (maybe similar to scsi drivers that are raid cards).
>> Like other host drivers it does not take a reference to the device when
>> it is added and relies on scsi-ml to handle all that for it before doing
>> operations like queuecommand.
>>
>> The leak is if you removed the scsi_device via the scsi ml sysfs
>> interface then there is no way to completely unlink the lio port because
>> if scsi_device_lookup fails we return from the function and do not do
>> not release our refcount on the tl_tpg_port_count.
> 
> Hi Mike,
> 
> Does this mean that you think that this patch is the right way to
> address the reported issue?
> 

It fixes that very specific issue, but it leaves others. Maybe it could
be used in a patchset that builds on it?

I think we could hit issues like:

1. tcm_loop_port_unlink runs and releases tl_tpg_port_count count.
2. userspace initiated scan commands were in progress and complete.
target_fabric_port_unlink->core_dev_del_lun->core_tpg_remove_lun was
waiting for those last IOs and now completes and deletes the mapped lun
from lio.
3. scsi-ml scan completed before the unmapping and so now we have a
scsi_device but no lio lun mapping.


The problem with this is that:

1. We can hit mismatched settings like this:

A. We now have this scsi_device at LUN0, but no lio mapping. User thinks
everything got cleaned up ok, so they decide to map another lio lun.
B. tcm_loop_port_link just does a scsi_add_device which does
scsi_probe_and_add_lun with rescan=true. So the original scsi_device is
returned. It is not reprobed so whatever settings the old device has
will be used. Maybe that scsi_device was for a disk and now the user was
adding a CD.

2. And hit races like:

A. tl_tpg_port_count might now be zero so the tpg and nexus can be removed.
B. That removal can race with IO being sent to the scsi_device. If a
command has passed tcm_loop_submission_work's NULL tl_nexus check then
we will hit a NULL pointer crash later in the function.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/2] scsi: target/tcm_loop: ignore already deleted scsi device
@ 2019-08-20 17:19         ` Mike Christie
  0 siblings, 0 replies; 14+ messages in thread
From: Mike Christie @ 2019-08-20 17:19 UTC (permalink / raw)
  To: Bart Van Assche, Naohiro Aota, linux-scsi, target-devel
  Cc: Nicholas Bellinger, Martin K . Petersen

On 08/20/2019 10:43 AM, Bart Van Assche wrote:
> On 8/20/19 8:27 AM, Mike Christie wrote:
>> tcm loop does not take a reference to the scsi_device at creation/link
>> time then need to release at removal/unlink time. The above
>> scsi_device_put is for the successful scsi_device_lookup call. tcm loop
>> works like a scsi host driver that does its own scanning via
>> scsi_add_device (maybe similar to scsi drivers that are raid cards).
>> Like other host drivers it does not take a reference to the device when
>> it is added and relies on scsi-ml to handle all that for it before doing
>> operations like queuecommand.
>>
>> The leak is if you removed the scsi_device via the scsi ml sysfs
>> interface then there is no way to completely unlink the lio port because
>> if scsi_device_lookup fails we return from the function and do not do
>> not release our refcount on the tl_tpg_port_count.
> 
> Hi Mike,
> 
> Does this mean that you think that this patch is the right way to
> address the reported issue?
> 

It fixes that very specific issue, but it leaves others. Maybe it could
be used in a patchset that builds on it?

I think we could hit issues like:

1. tcm_loop_port_unlink runs and releases tl_tpg_port_count count.
2. userspace initiated scan commands were in progress and complete.
target_fabric_port_unlink->core_dev_del_lun->core_tpg_remove_lun was
waiting for those last IOs and now completes and deletes the mapped lun
from lio.
3. scsi-ml scan completed before the unmapping and so now we have a
scsi_device but no lio lun mapping.


The problem with this is that:

1. We can hit mismatched settings like this:

A. We now have this scsi_device at LUN0, but no lio mapping. User thinks
everything got cleaned up ok, so they decide to map another lio lun.
B. tcm_loop_port_link just does a scsi_add_device which does
scsi_probe_and_add_lun with rescan=true. So the original scsi_device is
returned. It is not reprobed so whatever settings the old device has
will be used. Maybe that scsi_device was for a disk and now the user was
adding a CD.

2. And hit races like:

A. tl_tpg_port_count might now be zero so the tpg and nexus can be removed.
B. That removal can race with IO being sent to the scsi_device. If a
command has passed tcm_loop_submission_work's NULL tl_nexus check then
we will hit a NULL pointer crash later in the function.



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/2] scsi: target/tcm_loop: ignore already deleted scsi device
  2019-08-20 17:19         ` Mike Christie
@ 2019-08-22  6:51           ` Naohiro Aota
  -1 siblings, 0 replies; 14+ messages in thread
From: Naohiro Aota @ 2019-08-22  6:51 UTC (permalink / raw)
  To: Mike Christie, Bart Van Assche
  Cc: linux-scsi, target-devel, Nicholas Bellinger, Martin K . Petersen

On Tue, Aug 20, 2019 at 12:19:25PM -0500, Mike Christie wrote:
>CAUTION: This email originated from outside of Western Digital. Do not click on links or open attachments unless you recognize the sender and know that the content is safe.
>
>
>On 08/20/2019 10:43 AM, Bart Van Assche wrote:
>> On 8/20/19 8:27 AM, Mike Christie wrote:
>>> tcm loop does not take a reference to the scsi_device at creation/link
>>> time then need to release at removal/unlink time. The above
>>> scsi_device_put is for the successful scsi_device_lookup call. tcm loop
>>> works like a scsi host driver that does its own scanning via
>>> scsi_add_device (maybe similar to scsi drivers that are raid cards).
>>> Like other host drivers it does not take a reference to the device when
>>> it is added and relies on scsi-ml to handle all that for it before doing
>>> operations like queuecommand.
>>>
>>> The leak is if you removed the scsi_device via the scsi ml sysfs
>>> interface then there is no way to completely unlink the lio port because
>>> if scsi_device_lookup fails we return from the function and do not do
>>> not release our refcount on the tl_tpg_port_count.
>>
>> Hi Mike,
>>
>> Does this mean that you think that this patch is the right way to
>> address the reported issue?
>>
>
>It fixes that very specific issue, but it leaves others. Maybe it could
>be used in a patchset that builds on it?
>
>I think we could hit issues like:
>
>1. tcm_loop_port_unlink runs and releases tl_tpg_port_count count.
>2. userspace initiated scan commands were in progress and complete.
>target_fabric_port_unlink->core_dev_del_lun->core_tpg_remove_lun was
>waiting for those last IOs and now completes and deletes the mapped lun
>from lio.
>3. scsi-ml scan completed before the unmapping and so now we have a
>scsi_device but no lio lun mapping.

Right, this can happen. Actually, this can happen even without my
patch if the scan can occur between scsi_remove_device() in
tcm_loop_port_unlink() and core_tpg_remove_lun(), right?

I presumed we need to use some lock around here. I digged around the
code but cannot figure out a suitable lock here. Actually, I tried
(struct Scsi_Host)->scan_mutex, but it didn't work.

Or, we should block tcm_loop_queuecommand() to proceed the INQUIRY
commads on this LUN? move/introduce new hook after
transport_clear_lun_ref(lun)?

Any idea?

>The problem with this is that:
>
>1. We can hit mismatched settings like this:
>
>A. We now have this scsi_device at LUN0, but no lio mapping. User thinks
>everything got cleaned up ok, so they decide to map another lio lun.
>B. tcm_loop_port_link just does a scsi_add_device which does
>scsi_probe_and_add_lun with rescan=true. So the original scsi_device is
>returned. It is not reprobed so whatever settings the old device has
>will be used. Maybe that scsi_device was for a disk and now the user was
>adding a CD.
>
>2. And hit races like:
>
>A. tl_tpg_port_count might now be zero so the tpg and nexus can be removed.
>B. That removal can race with IO being sent to the scsi_device. If a
>command has passed tcm_loop_submission_work's NULL tl_nexus check then
>we will hit a NULL pointer crash later in the function.
>
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/2] scsi: target/tcm_loop: ignore already deleted scsi device
@ 2019-08-22  6:51           ` Naohiro Aota
  0 siblings, 0 replies; 14+ messages in thread
From: Naohiro Aota @ 2019-08-22  6:51 UTC (permalink / raw)
  To: Mike Christie, Bart Van Assche
  Cc: linux-scsi, target-devel, Nicholas Bellinger, Martin K . Petersen

On Tue, Aug 20, 2019 at 12:19:25PM -0500, Mike Christie wrote:
>CAUTION: This email originated from outside of Western Digital. Do not click on links or open attachments unless you recognize the sender and know that the content is safe.
>
>
>On 08/20/2019 10:43 AM, Bart Van Assche wrote:
>> On 8/20/19 8:27 AM, Mike Christie wrote:
>>> tcm loop does not take a reference to the scsi_device at creation/link
>>> time then need to release at removal/unlink time. The above
>>> scsi_device_put is for the successful scsi_device_lookup call. tcm loop
>>> works like a scsi host driver that does its own scanning via
>>> scsi_add_device (maybe similar to scsi drivers that are raid cards).
>>> Like other host drivers it does not take a reference to the device when
>>> it is added and relies on scsi-ml to handle all that for it before doing
>>> operations like queuecommand.
>>>
>>> The leak is if you removed the scsi_device via the scsi ml sysfs
>>> interface then there is no way to completely unlink the lio port because
>>> if scsi_device_lookup fails we return from the function and do not do
>>> not release our refcount on the tl_tpg_port_count.
>>
>> Hi Mike,
>>
>> Does this mean that you think that this patch is the right way to
>> address the reported issue?
>>
>
>It fixes that very specific issue, but it leaves others. Maybe it could
>be used in a patchset that builds on it?
>
>I think we could hit issues like:
>
>1. tcm_loop_port_unlink runs and releases tl_tpg_port_count count.
>2. userspace initiated scan commands were in progress and complete.
>target_fabric_port_unlink->core_dev_del_lun->core_tpg_remove_lun was
>waiting for those last IOs and now completes and deletes the mapped lun
>from lio.
>3. scsi-ml scan completed before the unmapping and so now we have a
>scsi_device but no lio lun mapping.

Right, this can happen. Actually, this can happen even without my
patch if the scan can occur between scsi_remove_device() in
tcm_loop_port_unlink() and core_tpg_remove_lun(), right?

I presumed we need to use some lock around here. I digged around the
code but cannot figure out a suitable lock here. Actually, I tried
(struct Scsi_Host)->scan_mutex, but it didn't work.

Or, we should block tcm_loop_queuecommand() to proceed the INQUIRY
commads on this LUN? move/introduce new hook after
transport_clear_lun_ref(lun)?

Any idea?

>The problem with this is that:
>
>1. We can hit mismatched settings like this:
>
>A. We now have this scsi_device at LUN0, but no lio mapping. User thinks
>everything got cleaned up ok, so they decide to map another lio lun.
>B. tcm_loop_port_link just does a scsi_add_device which does
>scsi_probe_and_add_lun with rescan=true. So the original scsi_device is
>returned. It is not reprobed so whatever settings the old device has
>will be used. Maybe that scsi_device was for a disk and now the user was
>adding a CD.
>
>2. And hit races like:
>
>A. tl_tpg_port_count might now be zero so the tpg and nexus can be removed.
>B. That removal can race with IO being sent to the scsi_device. If a
>command has passed tcm_loop_submission_work's NULL tl_nexus check then
>we will hit a NULL pointer crash later in the function.
>
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2019-08-22  6:51 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-20  9:04 [PATCH v2 1/2] scsi: target/tcm_loop: ignore already deleted scsi device Naohiro Aota
2019-08-20  9:04 ` Naohiro Aota
2019-08-20  9:04 ` [PATCH v2 2/2] scsi: target/tcm_loop: update upper limit of LUN Naohiro Aota
2019-08-20  9:04   ` Naohiro Aota
2019-08-20 14:02 ` [PATCH v2 1/2] scsi: target/tcm_loop: ignore already deleted scsi device Bart Van Assche
2019-08-20 14:02   ` Bart Van Assche
2019-08-20 15:27   ` Mike Christie
2019-08-20 15:27     ` Mike Christie
2019-08-20 15:43     ` Bart Van Assche
2019-08-20 15:43       ` Bart Van Assche
2019-08-20 17:19       ` Mike Christie
2019-08-20 17:19         ` Mike Christie
2019-08-22  6:51         ` Naohiro Aota
2019-08-22  6:51           ` Naohiro Aota

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.