From: Adrian Hunter <adrian.hunter@intel.com>
To: Bart Van Assche <bvanassche@acm.org>,
	"Martin K . Petersen" <martin.petersen@oracle.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>,
	linux-scsi@vger.kernel.org,
	"James E.J. Bottomley" <jejb@linux.ibm.com>,
	Can Guo <cang@codeaurora.org>, Bean Huo <beanhuo@micron.com>,
	Stanley Chu <stanley.chu@mediatek.com>,
	Asutosh Das <asutoshd@codeaurora.org>
Subject: Re: [PATCH 2/2] scsi: ufs: Fix a deadlock in the error handler
Date: Wed, 3 Nov 2021 08:56:07 +0200	[thread overview]
Message-ID: <82dffddc-15e8-dc1a-abda-e84e7e441d87@intel.com> (raw)
In-Reply-To: <20211103000529.1549411-3-bvanassche@acm.org>

On 03/11/2021 02:05, Bart Van Assche wrote:
> The following deadlock has been observed on a test setup:
> * All tags allocated.
> * The SCSI error handler calls ufshcd_eh_host_reset_handler()
> * ufshcd_eh_host_reset_handler() queues work that calls ufshcd_err_handler()
> * ufshcd_err_handler() locks up as follows:
> 
> Workqueue: ufs_eh_wq_0 ufshcd_err_handler.cfi_jt
> Call trace:
>  __switch_to+0x298/0x5d8
>  __schedule+0x6cc/0xa94
>  schedule+0x12c/0x298
>  blk_mq_get_tag+0x210/0x480
>  __blk_mq_alloc_request+0x1c8/0x284
>  blk_get_request+0x74/0x134
>  ufshcd_exec_dev_cmd+0x68/0x640
>  ufshcd_verify_dev_init+0x68/0x35c
>  ufshcd_probe_hba+0x12c/0x1cb8
>  ufshcd_host_reset_and_restore+0x88/0x254
>  ufshcd_reset_and_restore+0xd0/0x354
>  ufshcd_err_handler+0x408/0xc58
>  process_one_work+0x24c/0x66c
>  worker_thread+0x3e8/0xa4c
>  kthread+0x150/0x1b4
>  ret_from_fork+0x10/0x30
> 
> Fix this lockup by making ufshcd_exec_dev_cmd() allocate a reserved
> request.

It is worth noting that the error handler itself could always find
a free slot, either by waiting for one, or by taking the reset
path which clears all slots.

However, the problem would then be the places that make the error
handler itself wait, such as sysfs (due to hba->host_sem), the
exception event handler (due to cancel_work_sync(&hba->eeh_work)),
or potentially any other dev cmd user (due to hba->dev_cmd.lock).

Once the layering and locking are sorted out, it might be possible
to get rid of the reserved tag, if there were a performance benefit
to doing so.

More to the point, though: for the reasons above, the other dev cmd
path, i.e. ufshcd_issue_devman_upiu_cmd(), needs the same change.

For UFS-specific patch sets, please always cc me on all patches in
a series, including the cover letter.

> 
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  drivers/scsi/ufs/ufshcd.c | 20 ++++++--------------
>  1 file changed, 6 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
> index 9d964b979aa2..6b0101169974 100644
> --- a/drivers/scsi/ufs/ufshcd.c
> +++ b/drivers/scsi/ufs/ufshcd.c
> @@ -2904,12 +2904,7 @@ static int ufshcd_exec_dev_cmd(struct ufs_hba *hba,
>  
>  	down_read(&hba->clk_scaling_lock);
>  
> -	/*
> -	 * Get free slot, sleep if slots are unavailable.
> -	 * Even though we use wait_event() which sleeps indefinitely,
> -	 * the maximum wait time is bounded by SCSI request timeout.
> -	 */
> -	req = blk_get_request(q, REQ_OP_DRV_OUT, 0);
> +	req = blk_mq_alloc_request(q, REQ_OP_DRV_OUT, BLK_MQ_REQ_RESERVED);

ufshcd_issue_devman_upiu_cmd() needs this also.
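
An untested sketch of what I mean, i.e. the same one-line
substitution as in the hunk above, so that the dev cmd tag comes
from the reserved pool there as well:

-	req = blk_get_request(q, REQ_OP_DRV_OUT, 0);
+	req = blk_mq_alloc_request(q, REQ_OP_DRV_OUT, BLK_MQ_REQ_RESERVED);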

>  	if (IS_ERR(req)) {
>  		err = PTR_ERR(req);
>  		goto out_unlock;
> @@ -4919,11 +4914,7 @@ static int ufshcd_slave_alloc(struct scsi_device *sdev)
>   */
>  static int ufshcd_change_queue_depth(struct scsi_device *sdev, int depth)
>  {
> -	struct ufs_hba *hba = shost_priv(sdev->host);
> -
> -	if (depth > hba->nutrs)
> -		depth = hba->nutrs;
> -	return scsi_change_queue_depth(sdev, depth);
> +	return scsi_change_queue_depth(sdev, min(depth, sdev->host->can_queue));
>  }
>  
>  static void ufshcd_hpb_destroy(struct ufs_hba *hba, struct scsi_device *sdev)
> @@ -8124,7 +8115,8 @@ static struct scsi_host_template ufshcd_driver_template = {
>  	.this_id		= -1,
>  	.sg_tablesize		= SG_ALL,
>  	.cmd_per_lun		= UFSHCD_CMD_PER_LUN,
> -	.can_queue		= UFSHCD_CAN_QUEUE,
> +	.can_queue		= UFSHCD_CAN_QUEUE - 1,
> +	.reserved_tags		= 1,

Number of reserved tags needs to be #define'd
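
Untested sketch; the name UFSHCD_NUM_RESERVED is only a suggestion:

	/* e.g. in ufshcd.h */
	#define UFSHCD_NUM_RESERVED	1

and then here:

	.can_queue		= UFSHCD_CAN_QUEUE - UFSHCD_NUM_RESERVED,
	.reserved_tags		= UFSHCD_NUM_RESERVED,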

>  	.max_segment_size	= PRDT_DATA_BYTE_COUNT_MAX,
>  	.max_host_blocked	= 1,
>  	.track_queue_depth	= 1,
> @@ -9485,8 +9477,8 @@ int ufshcd_init(struct ufs_hba *hba, void __iomem *mmio_base, unsigned int irq)
>  	/* Configure LRB */
>  	ufshcd_host_memory_configure(hba);
>  
> -	host->can_queue = hba->nutrs;
> -	host->cmd_per_lun = hba->nutrs;
> +	host->can_queue = hba->nutrs - 1;
> +	host->cmd_per_lun = hba->nutrs - 1;

Number of reserved tags needs to be #define'd
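
Same #define here, e.g. (again just a sketch, mirroring the hunk
above):

	host->can_queue = hba->nutrs - UFSHCD_NUM_RESERVED;
	host->cmd_per_lun = hba->nutrs - UFSHCD_NUM_RESERVED;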

>  	host->max_id = UFSHCD_MAX_ID;
>  	host->max_lun = UFS_MAX_LUNS;
>  	host->max_channel = UFSHCD_MAX_CHANNEL;
> 

