* [PATCH 0/2] Fix a deadlock in the UFS error handler @ 2021-11-03 0:05 Bart Van Assche 2021-11-03 0:05 ` [PATCH 1/2] scsi: core: Add support for reserved tags Bart Van Assche 2021-11-03 0:05 ` [PATCH 2/2] scsi: ufs: Fix a deadlock in the error handler Bart Van Assche 0 siblings, 2 replies; 23+ messages in thread From: Bart Van Assche @ 2021-11-03 0:05 UTC (permalink / raw) To: Martin K . Petersen; +Cc: Jaegeuk Kim, linux-scsi, Bart Van Assche This patch series fixes a deadlock in the UFS error handler. I will repost this patch series after the merge window has closed. Bart Van Assche (2): scsi: core: Add support for reserved tags scsi: ufs: Fix a deadlock in the error handler drivers/scsi/scsi_lib.c | 4 +++- drivers/scsi/ufs/ufshcd.c | 20 ++++++-------------- include/scsi/scsi_host.h | 9 ++++++++- 3 files changed, 17 insertions(+), 16 deletions(-) ^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 1/2] scsi: core: Add support for reserved tags 2021-11-03 0:05 [PATCH 0/2] Fix a deadlock in the UFS error handler Bart Van Assche @ 2021-11-03 0:05 ` Bart Van Assche 2021-11-09 21:09 ` Bart Van Assche 2021-11-10 6:59 ` Hannes Reinecke 2021-11-03 0:05 ` [PATCH 2/2] scsi: ufs: Fix a deadlock in the error handler Bart Van Assche 1 sibling, 2 replies; 23+ messages in thread From: Bart Van Assche @ 2021-11-03 0:05 UTC (permalink / raw) To: Martin K . Petersen Cc: Jaegeuk Kim, linux-scsi, Bart Van Assche, Hannes Reinecke, John Garry, James E.J. Bottomley Allow SCSI LLDs to allocate reserved tags by passing the BLK_MQ_REQ_RESERVED flag to blk_get_request(). Cc: Hannes Reinecke <hare@suse.de> Cc: John Garry <john.garry@huawei.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> --- drivers/scsi/scsi_lib.c | 4 +++- include/scsi/scsi_host.h | 9 ++++++++- 2 files changed, 11 insertions(+), 2 deletions(-) diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index d0b7c6dc74f8..1cbade5fe990 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -1917,6 +1917,7 @@ int scsi_mq_setup_tags(struct Scsi_Host *shost) { unsigned int cmd_size, sgl_size; struct blk_mq_tag_set *tag_set = &shost->tag_set; + unsigned int reserved_tags = shost->hostt->reserved_tags; sgl_size = max_t(unsigned int, sizeof(struct scatterlist), scsi_mq_inline_sgl_size(shost)); @@ -1932,7 +1933,8 @@ int scsi_mq_setup_tags(struct Scsi_Host *shost) tag_set->ops = &scsi_mq_ops_no_commit; tag_set->nr_hw_queues = shost->nr_hw_queues ? : 1; tag_set->nr_maps = shost->nr_maps ? : 1; - tag_set->queue_depth = shost->can_queue; + tag_set->queue_depth = shost->can_queue + reserved_tags; + tag_set->reserved_tags = reserved_tags; tag_set->cmd_size = cmd_size; tag_set->numa_node = NUMA_NO_NODE; tag_set->flags = BLK_MQ_F_SHOULD_MERGE; diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h index 97cdad14de56..5502fcb306db 100644 --- a/include/scsi/scsi_host.h +++ b/include/scsi/scsi_host.h @@ -367,10 +367,17 @@ struct scsi_host_template { /* * This determines if we will use a non-interrupt driven * or an interrupt driven scheme. It is set to the maximum number - * of simultaneous commands a single hw queue in HBA will accept. + * of simultaneous commands a single hw queue in HBA will accept. Does + * not include @reserved_tags. */ int can_queue; + /* + * Number of tags to reserve. A reserved tag can be allocated by passing + * the BLK_MQ_REQ_RESERVED flag to blk_get_request(). + */ + unsigned reserved_tags; + /* * In many instances, especially where disconnect / reconnect are * supported, our host also has an ID on the SCSI bus. If this is ^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH 1/2] scsi: core: Add support for reserved tags 2021-11-03 0:05 ` [PATCH 1/2] scsi: core: Add support for reserved tags Bart Van Assche @ 2021-11-09 21:09 ` Bart Van Assche 2021-11-10 6:59 ` Hannes Reinecke 1 sibling, 0 replies; 23+ messages in thread From: Bart Van Assche @ 2021-11-09 21:09 UTC (permalink / raw) To: Hannes Reinecke, John Garry Cc: Martin K . Petersen, Jaegeuk Kim, linux-scsi, James E.J. Bottomley On 11/2/21 5:05 PM, Bart Van Assche wrote: > Allow SCSI LLDs to allocate reserved tags by passing the BLK_MQ_REQ_RESERVED > flag to blk_get_request(). (replying to my own email) Hannes and John, please help with reviewing this patch. Thanks, Bart. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 1/2] scsi: core: Add support for reserved tags 2021-11-03 0:05 ` [PATCH 1/2] scsi: core: Add support for reserved tags Bart Van Assche 2021-11-09 21:09 ` Bart Van Assche @ 2021-11-10 6:59 ` Hannes Reinecke 2021-11-10 19:19 ` Bart Van Assche 1 sibling, 1 reply; 23+ messages in thread From: Hannes Reinecke @ 2021-11-10 6:59 UTC (permalink / raw) To: Bart Van Assche, Martin K . Petersen Cc: Jaegeuk Kim, linux-scsi, John Garry, James E.J. Bottomley On 11/3/21 1:05 AM, Bart Van Assche wrote: > Allow SCSI LLDs to allocate reserved tags by passing the BLK_MQ_REQ_RESERVED > flag to blk_get_request(). > > Cc: Hannes Reinecke <hare@suse.de> > Cc: John Garry <john.garry@huawei.com> > Signed-off-by: Bart Van Assche <bvanassche@acm.org> > --- > drivers/scsi/scsi_lib.c | 4 +++- > include/scsi/scsi_host.h | 9 ++++++++- > 2 files changed, 11 insertions(+), 2 deletions(-) > > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c > index d0b7c6dc74f8..1cbade5fe990 100644 > --- a/drivers/scsi/scsi_lib.c > +++ b/drivers/scsi/scsi_lib.c > @@ -1917,6 +1917,7 @@ int scsi_mq_setup_tags(struct Scsi_Host *shost) > { > unsigned int cmd_size, sgl_size; > struct blk_mq_tag_set *tag_set = &shost->tag_set; > + unsigned int reserved_tags = shost->hostt->reserved_tags; > > sgl_size = max_t(unsigned int, sizeof(struct scatterlist), > scsi_mq_inline_sgl_size(shost)); > @@ -1932,7 +1933,8 @@ int scsi_mq_setup_tags(struct Scsi_Host *shost) > tag_set->ops = &scsi_mq_ops_no_commit; > tag_set->nr_hw_queues = shost->nr_hw_queues ? : 1; > tag_set->nr_maps = shost->nr_maps ? : 1; > - tag_set->queue_depth = shost->can_queue; > + tag_set->queue_depth = shost->can_queue + reserved_tags; > + tag_set->reserved_tags = reserved_tags; > tag_set->cmd_size = cmd_size; > tag_set->numa_node = NUMA_NO_NODE; > tag_set->flags = BLK_MQ_F_SHOULD_MERGE; > diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h > index 97cdad14de56..5502fcb306db 100644 > --- a/include/scsi/scsi_host.h > +++ b/include/scsi/scsi_host.h > @@ -367,10 +367,17 @@ struct scsi_host_template { > /* > * This determines if we will use a non-interrupt driven > * or an interrupt driven scheme. It is set to the maximum number > - * of simultaneous commands a single hw queue in HBA will accept. > + * of simultaneous commands a single hw queue in HBA will accept. Does > + * not include @reserved_tags. > */ > int can_queue; > > + /* > + * Number of tags to reserve. A reserved tag can be allocated by passing > + * the BLK_MQ_REQ_RESERVED flag to blk_get_request(). > + */ > + unsigned reserved_tags; > + > /* > * In many instances, especially where disconnect / reconnect are > * supported, our host also has an ID on the SCSI bus. If this is > This is essentially the same patch as I posted a while ago (cf https://lore.kernel.org/linux-scsi/20210222132405.91369-1- hare@suse.de/). But there had been push-back on that series as it would also try to use a scsi host device for driver-internal commands. Maybe we should combine our efforts; patch 2 is equivalent to your patch, and 3-5 are a conversion of the fnic driver to use those commands. So we should merge our efforts to get this thing off the ground. For the remainder of the patchset I'm currently working on a solution to address the upstream concerns. Cheers, Hannes -- Dr. Hannes Reinecke Kernel Storage Architect hare@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 1/2] scsi: core: Add support for reserved tags 2021-11-10 6:59 ` Hannes Reinecke @ 2021-11-10 19:19 ` Bart Van Assche 2021-11-11 6:54 ` Hannes Reinecke 0 siblings, 1 reply; 23+ messages in thread From: Bart Van Assche @ 2021-11-10 19:19 UTC (permalink / raw) To: Hannes Reinecke, Martin K . Petersen Cc: Jaegeuk Kim, linux-scsi, John Garry, James E.J. Bottomley On 11/9/21 10:59 PM, Hannes Reinecke wrote: > This is essentially the same patch as I posted a while ago (cf > https://lore.kernel.org/linux-scsi/20210222132405.91369-1- hare@suse.de/). > > But there had been push-back on that series as it would also try to use > a scsi host device for driver-internal commands. > > Maybe we should combine our efforts; patch 2 is equivalent to your > patch, and 3-5 are a conversion of the fnic driver to use those > commands. So we should merge our efforts to get this thing off the ground. > > For the remainder of the patchset I'm currently working on a solution to > address the upstream concerns. Hi Hannes, In the UFS driver we are using a request queue that is not associated with any SCSI device for allocating driver-internal tags from the same tags space as SCSI commands. Is this a solution that is generic enough to be re-used by other SCSI drivers? See also the output of git grep -nH 'hba->cmd_queue'. Thanks, Bart. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 1/2] scsi: core: Add support for reserved tags 2021-11-10 19:19 ` Bart Van Assche @ 2021-11-11 6:54 ` Hannes Reinecke 2021-11-15 18:46 ` Bart Van Assche 2021-11-17 0:46 ` Bart Van Assche 0 siblings, 2 replies; 23+ messages in thread From: Hannes Reinecke @ 2021-11-11 6:54 UTC (permalink / raw) To: Bart Van Assche, Martin K . Petersen Cc: Jaegeuk Kim, linux-scsi, John Garry, James E.J. Bottomley On 11/10/21 8:19 PM, Bart Van Assche wrote: > On 11/9/21 10:59 PM, Hannes Reinecke wrote: >> This is essentially the same patch as I posted a while ago (cf >> https://lore.kernel.org/linux-scsi/20210222132405.91369-1- >> hare@suse.de/). >> >> But there had been push-back on that series as it would also try to >> use a scsi host device for driver-internal commands. >> >> Maybe we should combine our efforts; patch 2 is equivalent to your >> patch, and 3-5 are a conversion of the fnic driver to use those >> commands. So we should merge our efforts to get this thing off the >> ground. >> >> For the remainder of the patchset I'm currently working on a solution >> to address the upstream concerns. > > Hi Hannes, > > In the UFS driver we are using a request queue that is not associated > with any SCSI device for allocating driver-internal tags from the same > tags space as SCSI commands. Is this a solution that is generic enough > to be re-used by other SCSI drivers? See also the output of git grep -nH > 'hba->cmd_queue'. > Ah. Even easier. I've made a prototype for this kind of operation in git.kernel.org/pub/scm/linux/kernel/git/hare/scsi-devel.git branch scsi-internal.v1 That introduces a function 'scsi_host_get_internal_tag()' to retrieve a tag from the reserved pool of the host tagset. Would that work for you? Cheers, Hannes -- Dr. Hannes Reinecke Kernel Storage Architect hare@suse.de +49 911 74053 688 SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 1/2] scsi: core: Add support for reserved tags 2021-11-11 6:54 ` Hannes Reinecke @ 2021-11-15 18:46 ` Bart Van Assche 2021-11-17 0:46 ` Bart Van Assche 1 sibling, 0 replies; 23+ messages in thread From: Bart Van Assche @ 2021-11-15 18:46 UTC (permalink / raw) To: Hannes Reinecke, Martin K . Petersen Cc: Jaegeuk Kim, linux-scsi, John Garry, James E.J. Bottomley On 11/10/21 10:54 PM, Hannes Reinecke wrote: > On 11/10/21 8:19 PM, Bart Van Assche wrote: >> In the UFS driver we are using a request queue that is not associated >> with any SCSI device for allocating driver-internal tags from the same >> tags space as SCSI commands. Is this a solution that is generic enough >> to be re-used by other SCSI drivers? See also the output of git grep >> -nH 'hba->cmd_queue'. > > Ah. Even easier. > I've made a prototype for this kind of operation in > git.kernel.org/pub/scm/linux/kernel/git/hare/scsi-devel.git > branch scsi-internal.v1 > > That introduces a function 'scsi_host_get_internal_tag()' > to retrieve a tag from the reserved pool of the host tagset. > Would that work for you? That works for me. Thanks for the feedback! Bart. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 1/2] scsi: core: Add support for reserved tags 2021-11-11 6:54 ` Hannes Reinecke 2021-11-15 18:46 ` Bart Van Assche @ 2021-11-17 0:46 ` Bart Van Assche 1 sibling, 0 replies; 23+ messages in thread From: Bart Van Assche @ 2021-11-17 0:46 UTC (permalink / raw) To: Hannes Reinecke, Martin K . Petersen Cc: Jaegeuk Kim, linux-scsi, John Garry, James E.J. Bottomley On 11/10/21 22:54, Hannes Reinecke wrote: > Ah. Even easier. > I've made a prototype for this kind of operation in > git.kernel.org/pub/scm/linux/kernel/git/hare/scsi-devel.git > branch scsi-internal.v1 > > That introduces a function 'scsi_host_get_internal_tag()' > to retrieve a tag from the reserved pool of the host tagset. > Would that work for you? Hi Hannes, Thanks for the pointer, that patch looks very interesting. However, in the UFS driver we need both a request pointer and a tag. Since converting a unique tag into a request pointer is much more work than converting a request or SCSI command pointer into a tag, I prefer functions that return a request pointer or SCSI command pointer. Thanks, Bart. ^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 2/2] scsi: ufs: Fix a deadlock in the error handler 2021-11-03 0:05 [PATCH 0/2] Fix a deadlock in the UFS error handler Bart Van Assche 2021-11-03 0:05 ` [PATCH 1/2] scsi: core: Add support for reserved tags Bart Van Assche @ 2021-11-03 0:05 ` Bart Van Assche 2021-11-03 6:56 ` Adrian Hunter 2021-11-03 7:40 ` Christoph Hellwig 1 sibling, 2 replies; 23+ messages in thread From: Bart Van Assche @ 2021-11-03 0:05 UTC (permalink / raw) To: Martin K . Petersen Cc: Jaegeuk Kim, linux-scsi, Bart Van Assche, James E.J. Bottomley, Can Guo, Bean Huo, Stanley Chu, Adrian Hunter, Asutosh Das The following deadlock has been observed on a test setup: * All tags allocated. * The SCSI error handler calls ufshcd_eh_host_reset_handler() * ufshcd_eh_host_reset_handler() queues work that calls ufshcd_err_handler() * ufshcd_err_handler() locks up as follows: Workqueue: ufs_eh_wq_0 ufshcd_err_handler.cfi_jt Call trace: __switch_to+0x298/0x5d8 __schedule+0x6cc/0xa94 schedule+0x12c/0x298 blk_mq_get_tag+0x210/0x480 __blk_mq_alloc_request+0x1c8/0x284 blk_get_request+0x74/0x134 ufshcd_exec_dev_cmd+0x68/0x640 ufshcd_verify_dev_init+0x68/0x35c ufshcd_probe_hba+0x12c/0x1cb8 ufshcd_host_reset_and_restore+0x88/0x254 ufshcd_reset_and_restore+0xd0/0x354 ufshcd_err_handler+0x408/0xc58 process_one_work+0x24c/0x66c worker_thread+0x3e8/0xa4c kthread+0x150/0x1b4 ret_from_fork+0x10/0x30 Fix this lockup by making ufshcd_exec_dev_cmd() allocate a reserved request. Signed-off-by: Bart Van Assche <bvanassche@acm.org> --- drivers/scsi/ufs/ufshcd.c | 20 ++++++-------------- 1 file changed, 6 insertions(+), 14 deletions(-) diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c index 9d964b979aa2..6b0101169974 100644 --- a/drivers/scsi/ufs/ufshcd.c +++ b/drivers/scsi/ufs/ufshcd.c @@ -2904,12 +2904,7 @@ static int ufshcd_exec_dev_cmd(struct ufs_hba *hba, down_read(&hba->clk_scaling_lock); - /* - * Get free slot, sleep if slots are unavailable. - * Even though we use wait_event() which sleeps indefinitely, - * the maximum wait time is bounded by SCSI request timeout. - */ - req = blk_get_request(q, REQ_OP_DRV_OUT, 0); + req = blk_mq_alloc_request(q, REQ_OP_DRV_OUT, BLK_MQ_REQ_RESERVED); if (IS_ERR(req)) { err = PTR_ERR(req); goto out_unlock; @@ -4919,11 +4914,7 @@ static int ufshcd_slave_alloc(struct scsi_device *sdev) */ static int ufshcd_change_queue_depth(struct scsi_device *sdev, int depth) { - struct ufs_hba *hba = shost_priv(sdev->host); - - if (depth > hba->nutrs) - depth = hba->nutrs; - return scsi_change_queue_depth(sdev, depth); + return scsi_change_queue_depth(sdev, min(depth, sdev->host->can_queue)); } static void ufshcd_hpb_destroy(struct ufs_hba *hba, struct scsi_device *sdev) @@ -8124,7 +8115,8 @@ static struct scsi_host_template ufshcd_driver_template = { .this_id = -1, .sg_tablesize = SG_ALL, .cmd_per_lun = UFSHCD_CMD_PER_LUN, - .can_queue = UFSHCD_CAN_QUEUE, + .can_queue = UFSHCD_CAN_QUEUE - 1, + .reserved_tags = 1, .max_segment_size = PRDT_DATA_BYTE_COUNT_MAX, .max_host_blocked = 1, .track_queue_depth = 1, @@ -9485,8 +9477,8 @@ int ufshcd_init(struct ufs_hba *hba, void __iomem *mmio_base, unsigned int irq) /* Configure LRB */ ufshcd_host_memory_configure(hba); - host->can_queue = hba->nutrs; - host->cmd_per_lun = hba->nutrs; + host->can_queue = hba->nutrs - 1; + host->cmd_per_lun = hba->nutrs - 1; host->max_id = UFSHCD_MAX_ID; host->max_lun = UFS_MAX_LUNS; host->max_channel = UFSHCD_MAX_CHANNEL; ^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH 2/2] scsi: ufs: Fix a deadlock in the error handler 2021-11-03 0:05 ` [PATCH 2/2] scsi: ufs: Fix a deadlock in the error handler Bart Van Assche @ 2021-11-03 6:56 ` Adrian Hunter 2021-11-03 13:40 ` Bart Van Assche 2021-11-07 11:07 ` Avri Altman 2021-11-03 7:40 ` Christoph Hellwig 1 sibling, 2 replies; 23+ messages in thread From: Adrian Hunter @ 2021-11-03 6:56 UTC (permalink / raw) To: Bart Van Assche, Martin K . Petersen Cc: Jaegeuk Kim, linux-scsi, James E.J. Bottomley, Can Guo, Bean Huo, Stanley Chu, Asutosh Das On 03/11/2021 02:05, Bart Van Assche wrote: > The following deadlock has been observed on a test setup: > * All tags allocated. > * The SCSI error handler calls ufshcd_eh_host_reset_handler() > * ufshcd_eh_host_reset_handler() queues work that calls ufshcd_err_handler() > * ufshcd_err_handler() locks up as follows: > > Workqueue: ufs_eh_wq_0 ufshcd_err_handler.cfi_jt > Call trace: > __switch_to+0x298/0x5d8 > __schedule+0x6cc/0xa94 > schedule+0x12c/0x298 > blk_mq_get_tag+0x210/0x480 > __blk_mq_alloc_request+0x1c8/0x284 > blk_get_request+0x74/0x134 > ufshcd_exec_dev_cmd+0x68/0x640 > ufshcd_verify_dev_init+0x68/0x35c > ufshcd_probe_hba+0x12c/0x1cb8 > ufshcd_host_reset_and_restore+0x88/0x254 > ufshcd_reset_and_restore+0xd0/0x354 > ufshcd_err_handler+0x408/0xc58 > process_one_work+0x24c/0x66c > worker_thread+0x3e8/0xa4c > kthread+0x150/0x1b4 > ret_from_fork+0x10/0x30 > > Fix this lockup by making ufshcd_exec_dev_cmd() allocate a reserved > request. It is worth noting that the error handler itself could always find a free slot, either by waiting for one, or by taking the reset path which clears all slots. However, the problem would then be places that cause the error handler to wait, like sysfs (due to hba->host_sem), exception event handler (due to cancel_work_sync(&hba->eeh_work)), or potentially any other dev cmd user (due to hba->dev_cmd.lock). Once the layering and locking is sorted out, it might be possible to get rid of the reserved tag, if there was a performance benefit. More to the point though, for the reasons above, you need to change the other dev cmd path also i.e. ufshcd_issue_devman_upiu_cmd() For UFS-specific patch sets please always cc me on all patches in a series including the cover letter. > > Signed-off-by: Bart Van Assche <bvanassche@acm.org> > --- > drivers/scsi/ufs/ufshcd.c | 20 ++++++-------------- > 1 file changed, 6 insertions(+), 14 deletions(-) > > diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c > index 9d964b979aa2..6b0101169974 100644 > --- a/drivers/scsi/ufs/ufshcd.c > +++ b/drivers/scsi/ufs/ufshcd.c > @@ -2904,12 +2904,7 @@ static int ufshcd_exec_dev_cmd(struct ufs_hba *hba, > > down_read(&hba->clk_scaling_lock); > > - /* > - * Get free slot, sleep if slots are unavailable. > - * Even though we use wait_event() which sleeps indefinitely, > - * the maximum wait time is bounded by SCSI request timeout. > - */ > - req = blk_get_request(q, REQ_OP_DRV_OUT, 0); > + req = blk_mq_alloc_request(q, REQ_OP_DRV_OUT, BLK_MQ_REQ_RESERVED); ufshcd_issue_devman_upiu_cmd() needs this also. > if (IS_ERR(req)) { > err = PTR_ERR(req); > goto out_unlock; > @@ -4919,11 +4914,7 @@ static int ufshcd_slave_alloc(struct scsi_device *sdev) > */ > static int ufshcd_change_queue_depth(struct scsi_device *sdev, int depth) > { > - struct ufs_hba *hba = shost_priv(sdev->host); > - > - if (depth > hba->nutrs) > - depth = hba->nutrs; > - return scsi_change_queue_depth(sdev, depth); > + return scsi_change_queue_depth(sdev, min(depth, sdev->host->can_queue)); > } > > static void ufshcd_hpb_destroy(struct ufs_hba *hba, struct scsi_device *sdev) > @@ -8124,7 +8115,8 @@ static struct scsi_host_template ufshcd_driver_template = { > .this_id = -1, > .sg_tablesize = SG_ALL, > .cmd_per_lun = UFSHCD_CMD_PER_LUN, > - .can_queue = UFSHCD_CAN_QUEUE, > + .can_queue = UFSHCD_CAN_QUEUE - 1, > + .reserved_tags = 1, Number of reserved tags needs to be #define'd > .max_segment_size = PRDT_DATA_BYTE_COUNT_MAX, > .max_host_blocked = 1, > .track_queue_depth = 1, > @@ -9485,8 +9477,8 @@ int ufshcd_init(struct ufs_hba *hba, void __iomem *mmio_base, unsigned int irq) > /* Configure LRB */ > ufshcd_host_memory_configure(hba); > > - host->can_queue = hba->nutrs; > - host->cmd_per_lun = hba->nutrs; > + host->can_queue = hba->nutrs - 1; > + host->cmd_per_lun = hba->nutrs - 1; Number of reserved tags needs to be #define'd > host->max_id = UFSHCD_MAX_ID; > host->max_lun = UFS_MAX_LUNS; > host->max_channel = UFSHCD_MAX_CHANNEL; > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 2/2] scsi: ufs: Fix a deadlock in the error handler 2021-11-03 6:56 ` Adrian Hunter @ 2021-11-03 13:40 ` Bart Van Assche 2021-11-03 14:03 ` Adrian Hunter 2021-11-07 11:07 ` Avri Altman 1 sibling, 1 reply; 23+ messages in thread From: Bart Van Assche @ 2021-11-03 13:40 UTC (permalink / raw) To: Adrian Hunter, Martin K . Petersen Cc: Jaegeuk Kim, linux-scsi, James E.J. Bottomley, Can Guo, Bean Huo, Stanley Chu, Asutosh Das On 11/2/21 23:56, Adrian Hunter wrote: > On 03/11/2021 02:05, Bart Van Assche wrote: >> The following deadlock has been observed on a test setup: >> * All tags allocated. >> * The SCSI error handler calls ufshcd_eh_host_reset_handler() >> * ufshcd_eh_host_reset_handler() queues work that calls ufshcd_err_handler() >> * ufshcd_err_handler() locks up as follows: >> >> Workqueue: ufs_eh_wq_0 ufshcd_err_handler.cfi_jt >> Call trace: >> __switch_to+0x298/0x5d8 >> __schedule+0x6cc/0xa94 >> schedule+0x12c/0x298 >> blk_mq_get_tag+0x210/0x480 >> __blk_mq_alloc_request+0x1c8/0x284 >> blk_get_request+0x74/0x134 >> ufshcd_exec_dev_cmd+0x68/0x640 >> ufshcd_verify_dev_init+0x68/0x35c >> ufshcd_probe_hba+0x12c/0x1cb8 >> ufshcd_host_reset_and_restore+0x88/0x254 >> ufshcd_reset_and_restore+0xd0/0x354 >> ufshcd_err_handler+0x408/0xc58 >> process_one_work+0x24c/0x66c >> worker_thread+0x3e8/0xa4c >> kthread+0x150/0x1b4 >> ret_from_fork+0x10/0x30 >> >> Fix this lockup by making ufshcd_exec_dev_cmd() allocate a reserved >> request. > > It is worth noting that the error handler itself could always find > a free slot, either by waiting for one, or by taking the reset > path which clears all slots. I do not agree. As mentioned in the patch description, this patch is a fix for a scenario in which ufshcd_eh_host_reset_handler() waits until ufshcd_err_handler() finishes. ufshcd_err_handler() does not finish since there are no tags and no tags will be freed since that is the responsibility of ufshcd_eh_host_reset_handler() but it is blocked ... > For UFS-specific patch sets please always cc me on all patches > in a series including the cover letter. I will do that. Thanks, Bart. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 2/2] scsi: ufs: Fix a deadlock in the error handler 2021-11-03 13:40 ` Bart Van Assche @ 2021-11-03 14:03 ` Adrian Hunter 0 siblings, 0 replies; 23+ messages in thread From: Adrian Hunter @ 2021-11-03 14:03 UTC (permalink / raw) To: Bart Van Assche, Martin K . Petersen Cc: Jaegeuk Kim, linux-scsi, James E.J. Bottomley, Can Guo, Bean Huo, Stanley Chu, Asutosh Das On 03/11/2021 15:40, Bart Van Assche wrote: > On 11/2/21 23:56, Adrian Hunter wrote: >> On 03/11/2021 02:05, Bart Van Assche wrote: >>> The following deadlock has been observed on a test setup: >>> * All tags allocated. >>> * The SCSI error handler calls ufshcd_eh_host_reset_handler() >>> * ufshcd_eh_host_reset_handler() queues work that calls ufshcd_err_handler() >>> * ufshcd_err_handler() locks up as follows: >>> >>> Workqueue: ufs_eh_wq_0 ufshcd_err_handler.cfi_jt >>> Call trace: >>> __switch_to+0x298/0x5d8 >>> __schedule+0x6cc/0xa94 >>> schedule+0x12c/0x298 >>> blk_mq_get_tag+0x210/0x480 >>> __blk_mq_alloc_request+0x1c8/0x284 >>> blk_get_request+0x74/0x134 >>> ufshcd_exec_dev_cmd+0x68/0x640 >>> ufshcd_verify_dev_init+0x68/0x35c >>> ufshcd_probe_hba+0x12c/0x1cb8 >>> ufshcd_host_reset_and_restore+0x88/0x254 >>> ufshcd_reset_and_restore+0xd0/0x354 >>> ufshcd_err_handler+0x408/0xc58 >>> process_one_work+0x24c/0x66c >>> worker_thread+0x3e8/0xa4c >>> kthread+0x150/0x1b4 >>> ret_from_fork+0x10/0x30 >>> >>> Fix this lockup by making ufshcd_exec_dev_cmd() allocate a reserved >>> request. >> >> It is worth noting that the error handler itself could always find >> a free slot, either by waiting for one, or by taking the reset >> path which clears all slots. > > I do not agree. As mentioned in the patch description, this patch is a fix for a scenario in which ufshcd_eh_host_reset_handler() waits until ufshcd_err_handler() finishes. ufshcd_err_handler() does not finish since there are no tags and no tags will be freed since that is the responsibility of ufshcd_eh_host_reset_handler() but it is blocked ... I am referring to the host controller slots, not block layer tags. The error handler does not need a free tag, it only needs a free slot. ^ permalink raw reply [flat|nested] 23+ messages in thread
* RE: [PATCH 2/2] scsi: ufs: Fix a deadlock in the error handler 2021-11-03 6:56 ` Adrian Hunter 2021-11-03 13:40 ` Bart Van Assche @ 2021-11-07 11:07 ` Avri Altman 2021-11-08 18:26 ` Bart Van Assche 1 sibling, 1 reply; 23+ messages in thread From: Avri Altman @ 2021-11-07 11:07 UTC (permalink / raw) To: Adrian Hunter, Bart Van Assche, Martin K . Petersen Cc: Jaegeuk Kim, linux-scsi, James E.J. Bottomley, Can Guo, Bean Huo, Stanley Chu, Asutosh Das > > Fix this lockup by making ufshcd_exec_dev_cmd() allocate a reserved > > request. > > It is worth noting that the error handler itself could always find > a free slot, either by waiting for one, or by taking the reset > path which clears all slots. > > However, the problem would then be places that cause the error > handler to wait, like sysfs (due to hba->host_sem), exception > event handler (due to cancel_work_sync(&hba->eeh_work)), or > potentially any other dev cmd user (due to hba->dev_cmd.lock). Instead of the reserved tag mechanism, since we are in reset-and-restore path, how about forcing one slot out if the doorbell is full? > > Once the layering and locking is sorted out, it might be possible > to get rid of the reserved tag, if there was a performance > benefit. > > More to the point though, for the reasons above, you need to > change the other dev cmd path also > i.e. ufshcd_issue_devman_upiu_cmd() Maybe we can just return ebusy if ufshcd_eh_in_progress()? Thanks, Avri ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 2/2] scsi: ufs: Fix a deadlock in the error handler 2021-11-07 11:07 ` Avri Altman @ 2021-11-08 18:26 ` Bart Van Assche 0 siblings, 0 replies; 23+ messages in thread From: Bart Van Assche @ 2021-11-08 18:26 UTC (permalink / raw) To: Avri Altman, Adrian Hunter, Martin K . Petersen Cc: Jaegeuk Kim, linux-scsi, James E.J. Bottomley, Can Guo, Bean Huo, Stanley Chu, Asutosh Das On 11/7/21 3:07 AM, Avri Altman wrote: > Instead of the reserved tag mechanism, since we are in reset-and-restore path, > how about forcing one slot out if the doorbell is full? That would result in significantly more complex code. I prefer the reserved tag approach since it results in elegant code and since this approach does not affect performance significantly. Thanks, Bart. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 2/2] scsi: ufs: Fix a deadlock in the error handler 2021-11-03 0:05 ` [PATCH 2/2] scsi: ufs: Fix a deadlock in the error handler Bart Van Assche 2021-11-03 6:56 ` Adrian Hunter @ 2021-11-03 7:40 ` Christoph Hellwig 2021-11-03 8:37 ` Adrian Hunter 2021-11-03 13:45 ` Bart Van Assche 1 sibling, 2 replies; 23+ messages in thread From: Christoph Hellwig @ 2021-11-03 7:40 UTC (permalink / raw) To: Bart Van Assche Cc: Martin K . Petersen, Jaegeuk Kim, linux-scsi, James E.J. Bottomley, Can Guo, Bean Huo, Stanley Chu, Adrian Hunter, Asutosh Das, Hannes Reinecke, John Garry On Tue, Nov 02, 2021 at 05:05:29PM -0700, Bart Van Assche wrote: > - req = blk_get_request(q, REQ_OP_DRV_OUT, 0); > + req = blk_mq_alloc_request(q, REQ_OP_DRV_OUT, BLK_MQ_REQ_RESERVED); blk_get_request will be gone in 5.16-rc, so this won't apply. But more importantly: SCSI LLDDs have absolutel no business calling blk_get_request or blk_mq_alloc_request directly, but as usual UFS is completely fucked up here. Please add a SCSI midlayer helper to allocate the reserved tags, and switch _all_ of this UFS we're sending our own commands crap to it so it doesn't mix with the actual SCSI requests. We might or might not want a separate request_queue for them as well as non-SCSI requests really should not show up in ->queuecommand. Hannes and John have been looking into this for a while and we need to sort this out properly. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 2/2] scsi: ufs: Fix a deadlock in the error handler 2021-11-03 7:40 ` Christoph Hellwig @ 2021-11-03 8:37 ` Adrian Hunter 2021-11-03 16:33 ` Christoph Hellwig 2021-11-03 13:45 ` Bart Van Assche 1 sibling, 1 reply; 23+ messages in thread From: Adrian Hunter @ 2021-11-03 8:37 UTC (permalink / raw) To: Christoph Hellwig, Bart Van Assche Cc: Martin K . Petersen, Jaegeuk Kim, linux-scsi, James E.J. Bottomley, Can Guo, Bean Huo, Stanley Chu, Asutosh Das, Hannes Reinecke, John Garry On 03/11/2021 09:40, Christoph Hellwig wrote: > On Tue, Nov 02, 2021 at 05:05:29PM -0700, Bart Van Assche wrote: >> - req = blk_get_request(q, REQ_OP_DRV_OUT, 0); >> + req = blk_mq_alloc_request(q, REQ_OP_DRV_OUT, BLK_MQ_REQ_RESERVED); > > blk_get_request will be gone in 5.16-rc, so this won't apply. > > But more importantly: SCSI LLDDs have absolutel no business calling > blk_get_request or blk_mq_alloc_request directly, but as usual UFS is > completely fucked up here. The UFS driver does not issue device commands to the block layer. blk_get_request() is used only to get a free slot. > Please add a SCSI midlayer helper to allocate the reserved tags, and > switch _all_ of this UFS we're sending our own commands crap to it > so it doesn't mix with the actual SCSI requests. It doesn't mix them. > We might or might not > want a separate request_queue for them as well as non-SCSI requests > really should not show up in ->queuecommand. Already has one (hba->cmd_queue), but as noted above, device commands aren't issued to it at the moment - they are sent directly. > Hannes and John have been > looking into this for a while and we need to sort this out properly. To avoid involving SCSI, since device commands aren't issued to the block layer, the UFS driver could instead keep 1 slot for itself (i.e. reduce the number of tags by one), and drop blk_get_request(); noting that currently only 1 device command is sent at a time due to hba->dev_cmd.lock mutex. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 2/2] scsi: ufs: Fix a deadlock in the error handler 2021-11-03 8:37 ` Adrian Hunter @ 2021-11-03 16:33 ` Christoph Hellwig 2021-11-03 16:48 ` Bart Van Assche 0 siblings, 1 reply; 23+ messages in thread From: Christoph Hellwig @ 2021-11-03 16:33 UTC (permalink / raw) To: Adrian Hunter Cc: Christoph Hellwig, Bart Van Assche, Martin K . Petersen, Jaegeuk Kim, linux-scsi, James E.J. Bottomley, Can Guo, Bean Huo, Stanley Chu, Asutosh Das, Hannes Reinecke, John Garry On Wed, Nov 03, 2021 at 10:37:30AM +0200, Adrian Hunter wrote: > The UFS driver does not issue device commands to the block layer. > blk_get_request() is used only to get a free slot. That is indeed the case for the code touched here, but not in general: ch@brick:~/work/linux$ git-grep blk_execute_rq drivers/scsi/ufs/ drivers/scsi/ufs/ufshcd.c: blk_execute_rq_nowait(/*bd_disk=*/NULL, req, /*at_head=*/true, drivers/scsi/ufs/ufshpb.c: blk_execute_rq_nowait(NULL, req, 1, ufshpb_umap_req_compl_fn); drivers/scsi/ufs/ufshpb.c: blk_execute_rq_nowait(NULL, req, 1, ufshpb_map_req_compl_fn); ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 2/2] scsi: ufs: Fix a deadlock in the error handler 2021-11-03 16:33 ` Christoph Hellwig @ 2021-11-03 16:48 ` Bart Van Assche 0 siblings, 0 replies; 23+ messages in thread From: Bart Van Assche @ 2021-11-03 16:48 UTC (permalink / raw) To: Christoph Hellwig, Adrian Hunter Cc: Martin K . Petersen, Jaegeuk Kim, linux-scsi, James E.J. Bottomley, Can Guo, Bean Huo, Stanley Chu, Asutosh Das, Hannes Reinecke, John Garry On 11/3/21 9:33 AM, Christoph Hellwig wrote: > On Wed, Nov 03, 2021 at 10:37:30AM +0200, Adrian Hunter wrote: >> The UFS driver does not issue device commands to the block layer. >> blk_get_request() is used only to get a free slot. > > That is indeed the case for the code touched here, but not in general: > > ch@brick:~/work/linux$ git-grep blk_execute_rq drivers/scsi/ufs/ > drivers/scsi/ufs/ufshcd.c: blk_execute_rq_nowait(/*bd_disk=*/NULL, req, /*at_head=*/true, > drivers/scsi/ufs/ufshpb.c: blk_execute_rq_nowait(NULL, req, 1, ufshpb_umap_req_compl_fn); > drivers/scsi/ufs/ufshpb.c: blk_execute_rq_nowait(NULL, req, 1, ufshpb_map_req_compl_fn); Hi Christoph, The blk_execute_rq_nowait() call in drivers/scsi/ufs/ufshcd.c will disappear from Linus' master branch once Linus merges the v5.16 scsi core pull request. See also the following patch authored by myself: "[PATCH 2/2] scsi: ufs: Stop clearing unit attentions" (https://lore.kernel.org/linux-scsi/20211001182015.1347587-3-jaegeuk@kernel.org/). Thanks, Bart. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 2/2] scsi: ufs: Fix a deadlock in the error handler 2021-11-03 7:40 ` Christoph Hellwig 2021-11-03 8:37 ` Adrian Hunter @ 2021-11-03 13:45 ` Bart Van Assche 2021-11-03 16:27 ` Christoph Hellwig 1 sibling, 1 reply; 23+ messages in thread From: Bart Van Assche @ 2021-11-03 13:45 UTC (permalink / raw) To: Christoph Hellwig Cc: Martin K . Petersen, Jaegeuk Kim, linux-scsi, James E.J. Bottomley, Can Guo, Bean Huo, Stanley Chu, Adrian Hunter, Asutosh Das, Hannes Reinecke, John Garry On 11/3/21 00:40, Christoph Hellwig wrote: > On Tue, Nov 02, 2021 at 05:05:29PM -0700, Bart Van Assche wrote: >> - req = blk_get_request(q, REQ_OP_DRV_OUT, 0); >> + req = blk_mq_alloc_request(q, REQ_OP_DRV_OUT, BLK_MQ_REQ_RESERVED); > > blk_get_request will be gone in 5.16-rc, so this won't apply. Thanks for the reminder Christoph. This is something I am aware of. Hence the promise in the cover letter to rebase and repost this patch series after the merge window has closed. > But more importantly: SCSI LLDDs have absolutel no business calling > blk_get_request or blk_mq_alloc_request directly, but as usual UFS is > completely fucked up here. As explained by Adrian, the UFS protocol uses a single tag space for SCSI commands and UFS device commands. blk_mq_alloc_request() is used in this context to allocate a tag only from the shared tag space only. I think using blk_mq_alloc_request() for that purpose is fine. Thanks, Bart. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 2/2] scsi: ufs: Fix a deadlock in the error handler 2021-11-03 13:45 ` Bart Van Assche @ 2021-11-03 16:27 ` Christoph Hellwig 2021-11-03 16:39 ` Bart Van Assche 0 siblings, 1 reply; 23+ messages in thread From: Christoph Hellwig @ 2021-11-03 16:27 UTC (permalink / raw) To: Bart Van Assche Cc: Christoph Hellwig, Martin K . Petersen, Jaegeuk Kim, linux-scsi, James E.J. Bottomley, Can Guo, Bean Huo, Stanley Chu, Adrian Hunter, Asutosh Das, Hannes Reinecke, John Garry On Wed, Nov 03, 2021 at 06:45:34AM -0700, Bart Van Assche wrote: > > But more importantly: SCSI LLDDs have absolutel no business calling > > blk_get_request or blk_mq_alloc_request directly, but as usual UFS is > > completely fucked up here. > > As explained by Adrian, the UFS protocol uses a single tag space for SCSI > commands and UFS device commands. blk_mq_alloc_request() is used in this > context to allocate a tag only from the shared tag space only. I think using > blk_mq_alloc_request() for that purpose is fine. The problem is not the shared tag space, the problem is that it is poking through layers. IFF we have good use cases for just allocating a tag (and we don't want to reserve it, which does make sense for many things like task management) we need a SCSI layer API that returns just the tag (and preferably also poisons the request in some way). ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 2/2] scsi: ufs: Fix a deadlock in the error handler 2021-11-03 16:27 ` Christoph Hellwig @ 2021-11-03 16:39 ` Bart Van Assche 2021-11-03 16:41 ` Christoph Hellwig 0 siblings, 1 reply; 23+ messages in thread From: Bart Van Assche @ 2021-11-03 16:39 UTC (permalink / raw) To: Christoph Hellwig Cc: Martin K . Petersen, Jaegeuk Kim, linux-scsi, James E.J. Bottomley, Can Guo, Bean Huo, Stanley Chu, Adrian Hunter, Asutosh Das, Hannes Reinecke, John Garry On 11/3/21 9:27 AM, Christoph Hellwig wrote: > On Wed, Nov 03, 2021 at 06:45:34AM -0700, Bart Van Assche wrote: >>> But more importantly: SCSI LLDDs have absolutel no business calling >>> blk_get_request or blk_mq_alloc_request directly, but as usual UFS is >>> completely fucked up here. >> >> As explained by Adrian, the UFS protocol uses a single tag space for SCSI >> commands and UFS device commands. blk_mq_alloc_request() is used in this >> context to allocate a tag only from the shared tag space only. I think using >> blk_mq_alloc_request() for that purpose is fine. > > The problem is not the shared tag space, the problem is that it is > poking through layers. IFF we have good use cases for just allocating > a tag (and we don't want to reserve it, which does make sense for many > things like task management) we need a SCSI layer API that returns just > the tag (and preferably also poisons the request in some way). How about changing the blk_mq_alloc_request() / blk_put_request() calls in ufshcd_exec_dev_cmd() and __ufshcd_issue_tm_cmd() into scsi_{get,put}_internal_command() calls once Hannes' patch series that introduces these functions is upstream? See also "[PATCH 03/18] scsi: add scsi_{get,put}_internal_cmd() helper" (https://lore.kernel.org/linux-scsi/20210503150333.130310-4-hare@suse.de/). See also patch "[PATCH 10/18] scsi: implement reserved command handling" (https://lore.kernel.org/linux-scsi/20210503150333.130310-11-hare@suse.de/). Thanks, Bart. ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH 2/2] scsi: ufs: Fix a deadlock in the error handler 2021-11-03 16:39 ` Bart Van Assche @ 2021-11-03 16:41 ` Christoph Hellwig 0 siblings, 0 replies; 23+ messages in thread From: Christoph Hellwig @ 2021-11-03 16:41 UTC (permalink / raw) To: Bart Van Assche Cc: Christoph Hellwig, Martin K . Petersen, Jaegeuk Kim, linux-scsi, James E.J. Bottomley, Can Guo, Bean Huo, Stanley Chu, Adrian Hunter, Asutosh Das, Hannes Reinecke, John Garry On Wed, Nov 03, 2021 at 09:39:32AM -0700, Bart Van Assche wrote: > How about changing the blk_mq_alloc_request() / blk_put_request() calls in > ufshcd_exec_dev_cmd() and __ufshcd_issue_tm_cmd() into > scsi_{get,put}_internal_command() calls once Hannes' patch series that > introduces these functions is upstream? See also "[PATCH 03/18] scsi: add > scsi_{get,put}_internal_cmd() helper" > (https://lore.kernel.org/linux-scsi/20210503150333.130310-4-hare@suse.de/). > See also patch "[PATCH 10/18] scsi: implement reserved command handling" > (https://lore.kernel.org/linux-scsi/20210503150333.130310-11-hare@suse.de/). Well, if they don't even need the command at all an API that just returns the tag might be even better. But yes, we need some SCSI layer abstraction and Hannes and John are currently looking into that. ^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 0/2] Backport an UFS error handler fix to the 5.15 stable tree @ 2022-02-18 18:04 Bart Van Assche 2022-02-18 18:04 ` [PATCH 2/2] scsi: ufs: Fix a deadlock in the error handler Bart Van Assche 0 siblings, 1 reply; 23+ messages in thread From: Bart Van Assche @ 2022-02-18 18:04 UTC (permalink / raw) To: Greg Kroah-Hartman; +Cc: Jaegeuk Kim, Kiwoong Kim, stable, Bart Van Assche Hi Greg, This series includes two patches: - A backport of "scsi: ufs: Remove dead code". - A backport of "scsi: ufs: Fix a deadlock in the error handler" Although the first patch is a code cleanup patch, I have included it in this series because the second patch depends on it. An Android developer requested a backport of the second patch to the android13-5.15 stable tree. Please consider these patches for inclusion in the 5.15 stable tree. Thanks, Bart. Bart Van Assche (2): scsi: ufs: Remove dead code scsi: ufs: Fix a deadlock in the error handler drivers/scsi/ufs/ufshcd.c | 58 ++++++++++----------------------------- drivers/scsi/ufs/ufshcd.h | 2 ++ 2 files changed, 16 insertions(+), 44 deletions(-) ^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH 2/2] scsi: ufs: Fix a deadlock in the error handler 2022-02-18 18:04 [PATCH 0/2] Backport an UFS error handler fix to the 5.15 stable tree Bart Van Assche @ 2022-02-18 18:04 ` Bart Van Assche 0 siblings, 0 replies; 23+ messages in thread From: Bart Van Assche @ 2022-02-18 18:04 UTC (permalink / raw) To: Greg Kroah-Hartman Cc: Jaegeuk Kim, Kiwoong Kim, stable, Bart Van Assche, Bean Huo, Adrian Hunter, Martin K . Petersen commit 945c3cca05d78351bba29fa65d93834cb7934c7b upstream. The following deadlock has been observed on a test setup: - All tags allocated - The SCSI error handler calls ufshcd_eh_host_reset_handler() - ufshcd_eh_host_reset_handler() queues work that calls ufshcd_err_handler() - ufshcd_err_handler() locks up as follows: Workqueue: ufs_eh_wq_0 ufshcd_err_handler.cfi_jt Call trace: __switch_to+0x298/0x5d8 __schedule+0x6cc/0xa94 schedule+0x12c/0x298 blk_mq_get_tag+0x210/0x480 __blk_mq_alloc_request+0x1c8/0x284 blk_get_request+0x74/0x134 ufshcd_exec_dev_cmd+0x68/0x640 ufshcd_verify_dev_init+0x68/0x35c ufshcd_probe_hba+0x12c/0x1cb8 ufshcd_host_reset_and_restore+0x88/0x254 ufshcd_reset_and_restore+0xd0/0x354 ufshcd_err_handler+0x408/0xc58 process_one_work+0x24c/0x66c worker_thread+0x3e8/0xa4c kthread+0x150/0x1b4 ret_from_fork+0x10/0x30 Fix this lockup by making ufshcd_exec_dev_cmd() allocate a reserved request. Link: https://lore.kernel.org/r/20211203231950.193369-10-bvanassche@acm.org Tested-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Reviewed-by: Bean Huo <beanhuo@micron.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> --- drivers/scsi/ufs/ufshcd.c | 53 +++++++++++---------------------------- drivers/scsi/ufs/ufshcd.h | 2 ++ 2 files changed, 16 insertions(+), 39 deletions(-) diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c index 92e4f0dbb791..cdec85bcc4cc 100644 --- a/drivers/scsi/ufs/ufshcd.c +++ b/drivers/scsi/ufs/ufshcd.c @@ -125,8 +125,9 @@ EXPORT_SYMBOL_GPL(ufshcd_dump_regs); enum { UFSHCD_MAX_CHANNEL = 0, UFSHCD_MAX_ID = 1, - UFSHCD_CMD_PER_LUN = 32, - UFSHCD_CAN_QUEUE = 32, + UFSHCD_NUM_RESERVED = 1, + UFSHCD_CMD_PER_LUN = 32 - UFSHCD_NUM_RESERVED, + UFSHCD_CAN_QUEUE = 32 - UFSHCD_NUM_RESERVED, }; /* UFSHCD error handling flags */ @@ -2185,6 +2186,7 @@ static inline int ufshcd_hba_capabilities(struct ufs_hba *hba) hba->nutrs = (hba->capabilities & MASK_TRANSFER_REQUESTS_SLOTS) + 1; hba->nutmrs = ((hba->capabilities & MASK_TASK_MANAGEMENT_REQUEST_SLOTS) >> 16) + 1; + hba->reserved_slot = hba->nutrs - 1; /* Read crypto capabilities */ err = ufshcd_hba_init_crypto_capabilities(hba); @@ -2910,30 +2912,15 @@ static int ufshcd_wait_for_dev_cmd(struct ufs_hba *hba, static int ufshcd_exec_dev_cmd(struct ufs_hba *hba, enum dev_cmd_type cmd_type, int timeout) { - struct request_queue *q = hba->cmd_queue; DECLARE_COMPLETION_ONSTACK(wait); - struct request *req; + const u32 tag = hba->reserved_slot; struct ufshcd_lrb *lrbp; int err; - int tag; - down_read(&hba->clk_scaling_lock); + /* Protects use of hba->reserved_slot. */ + lockdep_assert_held(&hba->dev_cmd.lock); - /* - * Get free slot, sleep if slots are unavailable. - * Even though we use wait_event() which sleeps indefinitely, - * the maximum wait time is bounded by SCSI request timeout. - */ - req = blk_get_request(q, REQ_OP_DRV_OUT, 0); - if (IS_ERR(req)) { - err = PTR_ERR(req); - goto out_unlock; - } - tag = req->tag; - WARN_ONCE(tag < 0, "Invalid tag %d\n", tag); - /* Set the timeout such that the SCSI error handler is not activated. */ - req->timeout = msecs_to_jiffies(2 * timeout); - blk_mq_start_request(req); + down_read(&hba->clk_scaling_lock); lrbp = &hba->lrb[tag]; WARN_ON(lrbp->cmd); @@ -2951,8 +2938,6 @@ static int ufshcd_exec_dev_cmd(struct ufs_hba *hba, (struct utp_upiu_req *)lrbp->ucd_rsp_ptr); out: - blk_put_request(req); -out_unlock: up_read(&hba->clk_scaling_lock); return err; } @@ -6640,23 +6625,16 @@ static int ufshcd_issue_devman_upiu_cmd(struct ufs_hba *hba, enum dev_cmd_type cmd_type, enum query_opcode desc_op) { - struct request_queue *q = hba->cmd_queue; DECLARE_COMPLETION_ONSTACK(wait); - struct request *req; + const u32 tag = hba->reserved_slot; struct ufshcd_lrb *lrbp; int err = 0; - int tag; u8 upiu_flags; - down_read(&hba->clk_scaling_lock); + /* Protects use of hba->reserved_slot. */ + lockdep_assert_held(&hba->dev_cmd.lock); - req = blk_get_request(q, REQ_OP_DRV_OUT, 0); - if (IS_ERR(req)) { - err = PTR_ERR(req); - goto out_unlock; - } - tag = req->tag; - WARN_ONCE(tag < 0, "Invalid tag %d\n", tag); + down_read(&hba->clk_scaling_lock); lrbp = &hba->lrb[tag]; WARN_ON(lrbp->cmd); @@ -6725,9 +6703,6 @@ static int ufshcd_issue_devman_upiu_cmd(struct ufs_hba *hba, ufshcd_add_query_upiu_trace(hba, err ? UFS_QUERY_ERR : UFS_QUERY_COMP, (struct utp_upiu_req *)lrbp->ucd_rsp_ptr); - blk_put_request(req); - -out_unlock: up_read(&hba->clk_scaling_lock); return err; } @@ -9418,8 +9393,8 @@ int ufshcd_init(struct ufs_hba *hba, void __iomem *mmio_base, unsigned int irq) /* Configure LRB */ ufshcd_host_memory_configure(hba); - host->can_queue = hba->nutrs; - host->cmd_per_lun = hba->nutrs; + host->can_queue = hba->nutrs - UFSHCD_NUM_RESERVED; + host->cmd_per_lun = hba->nutrs - UFSHCD_NUM_RESERVED; host->max_id = UFSHCD_MAX_ID; host->max_lun = UFS_MAX_LUNS; host->max_channel = UFSHCD_MAX_CHANNEL; diff --git a/drivers/scsi/ufs/ufshcd.h b/drivers/scsi/ufs/ufshcd.h index 07ada6676c3b..d470a52ff24c 100644 --- a/drivers/scsi/ufs/ufshcd.h +++ b/drivers/scsi/ufs/ufshcd.h @@ -725,6 +725,7 @@ struct ufs_hba_monitor { * @capabilities: UFS Controller Capabilities * @nutrs: Transfer Request Queue depth supported by controller * @nutmrs: Task Management Queue depth supported by controller + * @reserved_slot: Used to submit device commands. Protected by @dev_cmd.lock. * @ufs_version: UFS Version to which controller complies * @vops: pointer to variant specific operations * @priv: pointer to variant specific private data @@ -813,6 +814,7 @@ struct ufs_hba { u32 capabilities; int nutrs; int nutmrs; + u32 reserved_slot; u32 ufs_version; const struct ufs_hba_variant_ops *vops; struct ufs_hba_variant_params *vps; ^ permalink raw reply related [flat|nested] 23+ messages in thread
end of thread, other threads:[~2022-02-18 18:04 UTC | newest] Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-11-03 0:05 [PATCH 0/2] Fix a deadlock in the UFS error handler Bart Van Assche 2021-11-03 0:05 ` [PATCH 1/2] scsi: core: Add support for reserved tags Bart Van Assche 2021-11-09 21:09 ` Bart Van Assche 2021-11-10 6:59 ` Hannes Reinecke 2021-11-10 19:19 ` Bart Van Assche 2021-11-11 6:54 ` Hannes Reinecke 2021-11-15 18:46 ` Bart Van Assche 2021-11-17 0:46 ` Bart Van Assche 2021-11-03 0:05 ` [PATCH 2/2] scsi: ufs: Fix a deadlock in the error handler Bart Van Assche 2021-11-03 6:56 ` Adrian Hunter 2021-11-03 13:40 ` Bart Van Assche 2021-11-03 14:03 ` Adrian Hunter 2021-11-07 11:07 ` Avri Altman 2021-11-08 18:26 ` Bart Van Assche 2021-11-03 7:40 ` Christoph Hellwig 2021-11-03 8:37 ` Adrian Hunter 2021-11-03 16:33 ` Christoph Hellwig 2021-11-03 16:48 ` Bart Van Assche 2021-11-03 13:45 ` Bart Van Assche 2021-11-03 16:27 ` Christoph Hellwig 2021-11-03 16:39 ` Bart Van Assche 2021-11-03 16:41 ` Christoph Hellwig 2022-02-18 18:04 [PATCH 0/2] Backport an UFS error handler fix to the 5.15 stable tree Bart Van Assche 2022-02-18 18:04 ` [PATCH 2/2] scsi: ufs: Fix a deadlock in the error handler Bart Van Assche
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.