Re: [PATCH v3 4/4] scsi: scsi_core: Fix IO hang when device removing

From: Wenchao Hao <haowenchao22@gmail.com>
To: Mike Christie <michael.christie@oracle.com>,
	Wenchao Hao <haowenchao2@huawei.com>,
	"James E . J . Bottomley" <jejb@linux.ibm.com>,
	"Martin K . Petersen" <martin.petersen@oracle.com>,
	linux-scsi@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 4/4] scsi: scsi_core: Fix IO hang when device removing
Date: Thu, 16 Nov 2023 00:24:54 +0800	[thread overview]
Message-ID: <b33a6052-572e-4e4b-b521-dd407c8b32b6@gmail.com> (raw)
In-Reply-To: <b00a9920-f16f-4187-9a7c-6083c5d98fb8@oracle.com>

On 11/15/23 5:47 AM, Mike Christie wrote:
> On 11/14/23 3:23 PM, Mike Christie wrote:
>> On 10/15/23 9:03 PM, Wenchao Hao wrote:
>>> shost_for_each_device() would skip devices which is in progress of
>>> removing, so scsi_run_queue() for these devices would be skipped in
>>> scsi_run_host_queues() after blocking hosts' IO.
>>>
>>> IO hang would be caused if return true when state is SDEV_CANCEL with
>>> following order:
>>>
>>> T1:					    T2:scsi_error_handler
>>> __scsi_remove_device()
>>>   scsi_device_set_state(sdev, SDEV_CANCEL)
>>>   ...
>>>   sd_remove()
>>>   del_gendisk()
>>>   blk_mq_freeze_queue_wait()
>>>   					    scsi_eh_flush_done_q()
>>> 					      scsi_queue_insert(scmd,...)
>>>
>>> scsi_queue_insert() would not kick device's queue since commit
>>> 8b566edbdbfb ("scsi: core: Only kick the requeue list if necessary")
>>>
>>> After scsi_unjam_host(), the scsi error handler would call
>>> scsi_run_host_queues() to trigger run queue for devices, while it
>>> would not run queue for devices which is in progress of removing
>>> because shost_for_each_device() would skip them.
>>>
>>> So the requests added to these queues would not be handled any more,
>>> and the removing device process would hang too.
>>>
>>> Fix this issue by using shost_for_each_device_include_deleted() in
>>> scsi_run_host_queues() to trigger a run queue for devices in removing.
>>>
>>> Signed-off-by: Wenchao Hao <haowenchao2@huawei.com>
>>> ---
>>>  drivers/scsi/scsi_lib.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
>>> index 195ca80667d0..40f407ffd26f 100644
>>> --- a/drivers/scsi/scsi_lib.c
>>> +++ b/drivers/scsi/scsi_lib.c
>>> @@ -466,7 +466,7 @@ void scsi_run_host_queues(struct Scsi_Host *shost)
>>>  {
>>>  	struct scsi_device *sdev;
>>>  
>>> -	shost_for_each_device(sdev, shost)
>>> +	shost_for_each_device_include_deleted(sdev, shost)
>>>  		scsi_run_queue(sdev->request_queue);
>>
>> What happens if there were no commands for the device that
>> was destroyed and we race with this code and device deletion?
>>
>> So thread1 has set the device state tp SDEV_DEL and has finished
>> blk_mq_destroy_queue because there were no commands running.
>>
>> The above eh thread, then is calling:
>>
>> scsi_run_queue -> blk_mq_kick_requeue_list
>>
>> and that queues the requeue work.
>>
>> blk_mq_destroy_queue had done blk_mq_cancel_work_sync but
>> blk_mq_kick_requeue_list just added it back on the kblockd_workqueue.
>>
>> When __scsi_iterate_devices does scsi_device_put it would call
>> scsi_device_dev_release and call blk_put_queue which frees the
>> request_queue while it's requeue work might still be queued on
>> kblockd_workqueue.
>>

Hi Mike, thank you for the review.

Sorry I did not take the above flow into consideration and it's a bug
should be fixed in next version.

> 
> Oh yeah, for your other lun/target reset patches were you trying to
> do something where you have a list for each scsi_device or a list of
> scsi_devices that needed error handler work? If so, maybe break that
> part out and use it here first.
> 

The lun/target reset changes are not general for all drivers in my
design, so it should not work here.

> You can then just loop over the list of devices that needed work and
> start those above.

What about introduce a new flag "recovery" for each scsi_device to mark
if there is error command happened on it, the new flag is set in
scsi_eh_scmd_add() and cleared after error handle finished. 

Since clear is always after scsi_error_handle() is waked up and no more
scsi_eh_scmd_add() would be called after scsi_error_handle() is waked
up, we do not need lock between set and clear this flag.

This change can help me to fix the issue you described above too.

Here is a brief changes:

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index c67cdcdc3ba8..36af294c2cef 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -310,6 +310,8 @@ void scsi_eh_scmd_add(struct scsi_cmnd *scmd)
 	if (shost->eh_deadline != -1 && !shost->last_reset)
 		shost->last_reset = jiffies;

+	scmd->device->recovery = 1;
+
 	scsi_eh_reset(scmd);
 	list_add_tail(&scmd->eh_entry, &shost->eh_cmd_q);
 	spin_unlock_irqrestore(shost->host_lock, flags);
@@ -2149,7 +2151,7 @@ static void scsi_restart_operations(struct Scsi_Host *shost)
 	 * now that error recovery is done, we will need to ensure that these
 	 * requests are started.
 	 */
-	scsi_run_host_queues(shost);
+	scsi_run_host_recovery_queues(shost);

 	/*
 	 * if eh is active and host_eh_scheduled is pending we need to re-run
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index cf3864f72093..0bf4423b6b9a 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -470,6 +470,17 @@ void scsi_run_host_queues(struct Scsi_Host *shost)
 		scsi_run_queue(sdev->request_queue);
 }

+void scsi_run_host_recovery_queues(struct Scsi_Host *shost)
+{
+	struct scsi_device *sdev;
+
+	shost_for_each_device_include_deleted(sdev, shost)
+		if (sdev->recovery) {
+			scsi_run_queue(sdev->request_queue);
+			sdev->recovery = 0;
+		}
+}
+
 static void scsi_uninit_cmd(struct scsi_cmnd *cmd)
 {
 	if (!blk_rq_is_passthrough(scsi_cmd_to_rq(cmd))) {
diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h
index 3f0dfb97db6b..3aba8ddd0101 100644
--- a/drivers/scsi/scsi_priv.h
+++ b/drivers/scsi/scsi_priv.h
@@ -107,6 +107,7 @@ extern void scsi_device_unbusy(struct scsi_device *sdev, struct scsi_cmnd *cmd);
 extern void scsi_queue_insert(struct scsi_cmnd *cmd, int reason);
 extern void scsi_io_completion(struct scsi_cmnd *, unsigned int);
 extern void scsi_run_host_queues(struct Scsi_Host *shost);
+extern void scsi_run_host_recovery_queues(struct Scsi_Host *shost);
 extern void scsi_requeue_run_queue(struct work_struct *work);
 extern void scsi_start_queue(struct scsi_device *sdev);
 extern int scsi_mq_setup_tags(struct Scsi_Host *shost);
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index 10480eb582b2..b730ceab9996 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -239,6 +239,7 @@ struct scsi_device {

 	unsigned cdl_supported:1;	/* Command duration limits supported */
 	unsigned cdl_enable:1;		/* Enable/disable Command duration limits */
+	unsigned recovery;		/* Mark it error command happened */

 	unsigned int queue_stopped;	/* request queue is quiesced */