All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] scsi: libsas: abort all inflight requests when device is gone
@ 2023-03-28 11:15 Jason Yan
  2023-03-28 12:35 ` John Garry
  0 siblings, 1 reply; 4+ messages in thread
From: Jason Yan @ 2023-03-28 11:15 UTC (permalink / raw)
  To: martin.petersen, jejb
  Cc: linux-scsi, hare, hch, bvanassche, jinpu.wang, damien.lemoal,
	john.g.garry, Jason Yan, Xingui Yang

When a disk is removed with inflight IO, the userspace application need
to wait for 30 senconds(depends on the timeout configuration) to getback
from the kernel. Xingui tried to fix this issue by aborting the ata link
for SATA devices[1]. However this approach left the SAS devices unresolved.

This patch try to fix this issue by aborting all inflight requests while
the device is gone. This is implemented by itering the tagset.

[1] https://lore.kernel.org/lkml/234e04db-7539-07e4-a6b8-c6b05f78193d@opensource.wdc.com/T/

Cc: Xingui Yang <yangxingui@huawei.com>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Cc: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jason Yan <yanaijie@huawei.com>
---
 drivers/scsi/libsas/sas_discover.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/drivers/scsi/libsas/sas_discover.c b/drivers/scsi/libsas/sas_discover.c
index 72fdb2e5d047..d2be67f348e0 100644
--- a/drivers/scsi/libsas/sas_discover.c
+++ b/drivers/scsi/libsas/sas_discover.c
@@ -360,8 +360,28 @@ static void sas_destruct_ports(struct asd_sas_port *port)
 	}
 }
 
+static bool sas_abort_cmd(struct request *req, void *data)
+{
+	struct scsi_cmnd *cmd = blk_mq_rq_to_pdu(req);
+	struct domain_device *dev = data;
+
+	if (dev == cmd_to_domain_dev(cmd))
+		blk_abort_request(req);
+	return true;
+}
+
+static void sas_abort_domain_cmds(struct domain_device *dev)
+{
+	struct sas_ha_struct *sas_ha = dev->port->ha;
+	struct Scsi_Host *shost = sas_ha->core.shost;
+	blk_mq_tagset_busy_iter(&shost->tag_set, sas_abort_cmd, dev);
+}
+
 void sas_unregister_dev(struct asd_sas_port *port, struct domain_device *dev)
 {
+	if (test_bit(SAS_DEV_GONE, &dev->state))
+		sas_abort_domain_cmds(dev);
+
 	if (!test_bit(SAS_DEV_DESTROY, &dev->state) &&
 	    !list_empty(&dev->disco_list_node)) {
 		/* this rphy never saw sas_rphy_add */
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] scsi: libsas: abort all inflight requests when device is gone
  2023-03-28 11:15 [PATCH] scsi: libsas: abort all inflight requests when device is gone Jason Yan
@ 2023-03-28 12:35 ` John Garry
  2023-03-28 14:25   ` Jason Yan
  0 siblings, 1 reply; 4+ messages in thread
From: John Garry @ 2023-03-28 12:35 UTC (permalink / raw)
  To: Jason Yan, martin.petersen, jejb
  Cc: linux-scsi, hare, hch, bvanassche, jinpu.wang, damien.lemoal,
	Xingui Yang

On 28/03/2023 12:15, Jason Yan wrote:
> When a disk is removed with inflight IO, the userspace application need
> to wait for 30 senconds(depends on the timeout configuration) to getback
> from the kernel. Xingui tried to fix this issue by aborting the ata link
> for SATA devices[1]. However this approach left the SAS devices unresolved.
> 
> This patch try to fix this issue by aborting all inflight requests while
> the device is gone. This is implemented by itering the tagset.
> 
> [1] https://lore.kernel.org/lkml/234e04db-7539-07e4-a6b8-c6b05f78193d@opensource.wdc.com/T/
> 
> Cc: Xingui Yang <yangxingui@huawei.com>
> Cc: John Garry <john.g.garry@oracle.com>
> Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
> Cc: Hannes Reinecke <hare@suse.com>
> Signed-off-by: Jason Yan <yanaijie@huawei.com>
> ---
>   drivers/scsi/libsas/sas_discover.c | 20 ++++++++++++++++++++
>   1 file changed, 20 insertions(+)
> 
> diff --git a/drivers/scsi/libsas/sas_discover.c b/drivers/scsi/libsas/sas_discover.c
> index 72fdb2e5d047..d2be67f348e0 100644
> --- a/drivers/scsi/libsas/sas_discover.c
> +++ b/drivers/scsi/libsas/sas_discover.c
> @@ -360,8 +360,28 @@ static void sas_destruct_ports(struct asd_sas_port *port)
>   	}
>   }
>   
> +static bool sas_abort_cmd(struct request *req, void *data)
> +{
> +	struct scsi_cmnd *cmd = blk_mq_rq_to_pdu(req);
> +	struct domain_device *dev = data;
> +
> +	if (dev == cmd_to_domain_dev(cmd))
> +		blk_abort_request(req);

I suppose that this is ok, but we're not dealing with libsas "internal" 
commands or libata internal commands, though. What about them?

I suppose my series here would help:

https://lore.kernel.org/linux-scsi/1666693096-180008-1-git-send-email-john.garry@huawei.com/

Along with Part II

> +	return true;
> +}
> +
> +static void sas_abort_domain_cmds(struct domain_device *dev)
> +{
> +	struct sas_ha_struct *sas_ha = dev->port->ha;
> +	struct Scsi_Host *shost = sas_ha->core.shost;
> +	blk_mq_tagset_busy_iter(&shost->tag_set, sas_abort_cmd, dev);

blk_mq_queue_tag_busy_iter() would be nicer to use here, but it's not 
exported - I am not advocating exporting it either. And we don't have 
direct access to the scsi device pointer (from which we can look up the 
request queue pointer), either.

> +}
> +
>   void sas_unregister_dev(struct asd_sas_port *port, struct domain_device *dev)
>   {
> +	if (test_bit(SAS_DEV_GONE, &dev->state))
> +		sas_abort_domain_cmds(dev);

This code is common to expanders. Should we really be calling this for 
an expander, even if it is harmless (as it does nothing currently)?

And we also seem to be calling this for rphy "which never saw 
sas_rphy_add" (see code comment, not shown below), which is 
questionable. Should we really do that?

> +
>   	if (!test_bit(SAS_DEV_DESTROY, &dev->state) &&
>   	    !list_empty(&dev->disco_list_node)) {
>   		/* this rphy never saw sas_rphy_add */


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] scsi: libsas: abort all inflight requests when device is gone
  2023-03-28 12:35 ` John Garry
@ 2023-03-28 14:25   ` Jason Yan
  2023-03-29  8:01     ` John Garry
  0 siblings, 1 reply; 4+ messages in thread
From: Jason Yan @ 2023-03-28 14:25 UTC (permalink / raw)
  To: John Garry, martin.petersen, jejb
  Cc: linux-scsi, hare, hch, bvanassche, jinpu.wang, damien.lemoal,
	Xingui Yang

On 2023/3/28 20:35, John Garry wrote:
> On 28/03/2023 12:15, Jason Yan wrote:
>> When a disk is removed with inflight IO, the userspace application need
>> to wait for 30 senconds(depends on the timeout configuration) to getback
>> from the kernel. Xingui tried to fix this issue by aborting the ata link
>> for SATA devices[1]. However this approach left the SAS devices 
>> unresolved.
>>
>> This patch try to fix this issue by aborting all inflight requests while
>> the device is gone. This is implemented by itering the tagset.
>>
>> [1] 
>> https://lore.kernel.org/lkml/234e04db-7539-07e4-a6b8-c6b05f78193d@opensource.wdc.com/T/ 
>>
>>
>> Cc: Xingui Yang <yangxingui@huawei.com>
>> Cc: John Garry <john.g.garry@oracle.com>
>> Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
>> Cc: Hannes Reinecke <hare@suse.com>
>> Signed-off-by: Jason Yan <yanaijie@huawei.com>
>> ---
>>   drivers/scsi/libsas/sas_discover.c | 20 ++++++++++++++++++++
>>   1 file changed, 20 insertions(+)
>>
>> diff --git a/drivers/scsi/libsas/sas_discover.c 
>> b/drivers/scsi/libsas/sas_discover.c
>> index 72fdb2e5d047..d2be67f348e0 100644
>> --- a/drivers/scsi/libsas/sas_discover.c
>> +++ b/drivers/scsi/libsas/sas_discover.c
>> @@ -360,8 +360,28 @@ static void sas_destruct_ports(struct 
>> asd_sas_port *port)
>>       }
>>   }
>> +static bool sas_abort_cmd(struct request *req, void *data)
>> +{
>> +    struct scsi_cmnd *cmd = blk_mq_rq_to_pdu(req);
>> +    struct domain_device *dev = data;
>> +
>> +    if (dev == cmd_to_domain_dev(cmd))
>> +        blk_abort_request(req);
> 
> I suppose that this is ok, but we're not dealing with libsas "internal" 
> commands or libata internal commands, though. What about them?

Hi John,

Most of the time user space applications are not sensitive to libsas(or 
libata) "internal" commands. The applications only need the kernel 
response quickly on their "read" or "write" syscalls. This patch aborts 
all the application's IO and they will return immediately. We have 
tested this patch for more than 300 times. The "internal" commands may 
affects the EH process but in out test it does not affect the 
application's IO return.

> 
> I suppose my series here would help:
> 
> https://lore.kernel.org/linux-scsi/1666693096-180008-1-git-send-email-john.garry@huawei.com/ 

This is great. After your series we can manage "internal" commands more 
effectively, I think. And we can easily control all the commands 
including the "internal" commands.

> 
> 
> Along with Part II
> 
>> +    return true;
>> +}
>> +
>> +static void sas_abort_domain_cmds(struct domain_device *dev)
>> +{
>> +    struct sas_ha_struct *sas_ha = dev->port->ha;
>> +    struct Scsi_Host *shost = sas_ha->core.shost;
>> +    blk_mq_tagset_busy_iter(&shost->tag_set, sas_abort_cmd, dev);
> 
> blk_mq_queue_tag_busy_iter() would be nicer to use here, but it's not 
> exported - I am not advocating exporting it either. And we don't have 
> direct access to the scsi device pointer (from which we can look up the 
> request queue pointer), either.
> 
>> +}
>> +
>>   void sas_unregister_dev(struct asd_sas_port *port, struct 
>> domain_device *dev)
>>   {
>> +    if (test_bit(SAS_DEV_GONE, &dev->state))
>> +        sas_abort_domain_cmds(dev);
> 
> This code is common to expanders. Should we really be calling this for 
> an expander, even if it is harmless (as it does nothing currently)?

Yes, It does nothing now for expanders. I can filter out the expander 
devices in advance to avoid the wasting tag itering.

> 
> And we also seem to be calling this for rphy "which never saw 
> sas_rphy_add" (see code comment, not shown below), which is 
> questionable. Should we really do that?

This device has not been initialized completely if "never saw 
sas_rphy_add", and there will be no IO from the block layer. I think 
there is no real bug but we need to avoid to iter the tag set too.

Thanks,
Jason

> 
>> +
>>       if (!test_bit(SAS_DEV_DESTROY, &dev->state) &&
>>           !list_empty(&dev->disco_list_node)) {
>>           /* this rphy never saw sas_rphy_add */
> 
> 
> .

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] scsi: libsas: abort all inflight requests when device is gone
  2023-03-28 14:25   ` Jason Yan
@ 2023-03-29  8:01     ` John Garry
  0 siblings, 0 replies; 4+ messages in thread
From: John Garry @ 2023-03-29  8:01 UTC (permalink / raw)
  To: Jason Yan, martin.petersen, jejb
  Cc: linux-scsi, hare, hch, bvanassche, jinpu.wang, damien.lemoal,
	Xingui Yang

On 28/03/2023 15:25, Jason Yan wrote:
>>>
>>> +static bool sas_abort_cmd(struct request *req, void *data)

Maybe sas_abort_cmd_iter() would be a better name, but see comment on 
naming, below.

>>> +{
>>> +    struct scsi_cmnd *cmd = blk_mq_rq_to_pdu(req);
>>> +    struct domain_device *dev = data;
>>> +
>>> +    if (dev == cmd_to_domain_dev(cmd))
>>> +        blk_abort_request(req);
>>
>> I suppose that this is ok, but we're not dealing with libsas 
>> "internal" commands or libata internal commands, though. What about them?
> 
> Hi John,
> 
> Most of the time user space applications are not sensitive to libsas(or 
> libata) "internal" commands. The applications only need the kernel 
> response quickly on their "read" or "write" syscalls. This patch aborts 
> all the application's IO and they will return immediately. We have 
> tested this patch for more than 300 times. The "internal" commands may 
> affects the EH process but in out test it does not affect the 
> application's IO return.
> 
>>
>> I suppose my series here would help:
>>
>> https://urldefense.com/v3/__https://lore.kernel.org/linux-scsi/1666693096-180008-1-git-send-email-john.garry@huawei.com/__;!!ACWV5N9M2RV99hQ!KG7BcS-ahRI8lUq_H-dXiVqMuVRP3VUDz2q5XnAkQkUCPGuKRAOXQlo7UmuHnS3P5ibgF89UER7UUNO2hfmp$ 
> 
> This is great. After your series we can manage "internal" commands more 
> effectively, I think. And we can easily control all the commands 
> including the "internal" commands.
> 
>>
>>
>> Along with Part II
>>
>>> +    return true;
>>> +}
>>> +
>>> +static void sas_abort_domain_cmds(struct domain_device *dev)

Let's make it clear that it is for a specific domain device and only 
affects scsi_cmds, i.e. not libsas internal or libata internal, so maybe 
sas_abort_device_scsi_cmds() or sas_abort_device_rqs() or something like 
that.

Please also add some sort of comment to say that we just want EH to kick 
in quickly.

>>> +{
>>> +    struct sas_ha_struct *sas_ha = dev->port->ha;
>>> +    struct Scsi_Host *shost = sas_ha->core.shost;
>>> +    blk_mq_tagset_busy_iter(&shost->tag_set, sas_abort_cmd, dev);
>>
>> blk_mq_queue_tag_busy_iter() would be nicer to use here, but it's not 
>> exported - I am not advocating exporting it either. And we don't have 
>> direct access to the scsi device pointer (from which we can look up 
>> the request queue pointer), either.
>>
>>> +}
>>> +
>>>   void sas_unregister_dev(struct asd_sas_port *port, struct 
>>> domain_device *dev)
>>>   {
>>> +    if (test_bit(SAS_DEV_GONE, &dev->state))
>>> +        sas_abort_domain_cmds(dev);
>>
>> This code is common to expanders. Should we really be calling this for 
>> an expander, even if it is harmless (as it does nothing currently)?
> 
> Yes, It does nothing now for expanders. I can filter out the expander 
> devices in advance to avoid the wasting tag itering.

ok, maybe you can put the check (for an expander) in sas_abort_domain_cmds()

> 
>>
>> And we also seem to be calling this for rphy "which never saw 
>> sas_rphy_add" (see code comment, not shown below), which is 
>> questionable. Should we really do that?
> 
> This device has not been initialized completely if "never saw 
> sas_rphy_add", and there will be no IO from the block layer. I think 
> there is no real bug but we need to avoid to iter the tag set too.

Thanks,
John


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2023-03-29  8:02 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-28 11:15 [PATCH] scsi: libsas: abort all inflight requests when device is gone Jason Yan
2023-03-28 12:35 ` John Garry
2023-03-28 14:25   ` Jason Yan
2023-03-29  8:01     ` John Garry

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.