All of lore.kernel.org
 help / color / mirror / Atom feed
From: Adrian Hunter <adrian.hunter@intel.com>
To: Bart Van Assche <bvanassche@acm.org>,
	"Martin K . Petersen" <martin.petersen@oracle.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>,
	linux-scsi@vger.kernel.org, dh0421.hwang@samsung.com,
	Asutosh Das <asutoshd@codeaurora.org>,
	"James E.J. Bottomley" <jejb@linux.ibm.com>,
	Bean Huo <beanhuo@micron.com>, Avri Altman <avri.altman@wdc.com>,
	Jinyoung Choi <j-young.choi@samsung.com>
Subject: Re: [PATCH] scsi: ufs: Fix deadlocks between power management and error handler
Date: Mon, 19 Sep 2022 14:34:03 +0300	[thread overview]
Message-ID: <3712590b-20cb-7d27-3017-4567f1fcddc2@intel.com> (raw)
In-Reply-To: <20220916184220.867535-1-bvanassche@acm.org>

On 16/09/22 21:42, Bart Van Assche wrote:
> The following deadlocks have been observed on multiple test setups:
> 
> * ufshcd_wl_suspend() is waiting for blk_execute_rq() to complete while it
>   holds host_sem.
> * ufshcd_eh_host_reset_handler() invokes ufshcd_err_handler() and the
>   latter function tries to obtain host_sem.
> This is a deadlock because blk_execute_rq() can't execute SCSI commands
> while the host is in the SHOST_RECOVERY state and because the error
> handler cannot make progress either.

Hi Bart

Did you consider something like:

diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index 7256e6c43ca6..dc83b38dfde9 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -7374,6 +7374,9 @@ static int ufshcd_eh_host_reset_handler(struct scsi_cmnd *cmd)
 
 	hba = shost_priv(cmd->device->host);
 
+	if (hba->pm_op_in_progress)
+		return FAST_IO_FAIL;
+
 	spin_lock_irqsave(hba->host->host_lock, flags);
 	hba->force_reset = true;
 	ufshcd_schedule_eh_work(hba);

> 
> * ufshcd_wl_runtime_resume() is waiting for blk_execute_rq() to finish
>   while it holds host_sem.
> * ufshcd_eh_host_reset_handler() invokes ufshcd_err_handler() and the
>   latter function calls pm_runtime_resume().
> This is a deadlock because of the same reason as the previous scenario.
> 
> Fix both deadlocks by not obtaining host_sem from the power management
> code paths. Removing the host_sem locking from the power management code
> is safe because the ufshcd_err_handler() is already serialized against
> SCSI command execution.

The original commit for host_sem was aimed at sysfs (see commit below).
Did you consider how sysfs access is affected?

  commit 9cd20d3f473619d8d482551d15d4cebfb3ce73c8
  Author: Can Guo <cang@codeaurora.org>
  Date:   Wed Jan 13 19:13:28 2021 -0800

    scsi: ufs: Protect PM ops and err_handler from user access through sysfs
    
    User layer may access sysfs nodes when system PM ops or error handling is
    running. This can cause various problems. Rename eh_sem to host_sem and use
    it to protect PM ops and error handling from user layer intervention.

> 
> The ufshcd_rpm_get_sync() call at the start of
> ufshcd_err_handling_prepare() may deadlock since calling scsi_execute()
> is required by the UFS runtime resume implementation. Fixing that
> deadlock falls outside the scope of this patch.

Do you mean:

static void ufshcd_err_handling_prepare(struct ufs_hba *hba)
{
	ufshcd_rpm_get_sync(hba);

because that is the host controller, not the UFS device, that is
being resumed.

> 
> Cc: dh0421.hwang@samsung.com
> Cc: Asutosh Das <asutoshd@codeaurora.org>
> Fixes: b294ff3e3449 ("scsi: ufs: core: Enable power management for wlun")
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
>  drivers/ufs/core/ufshcd.c | 6 +-----
>  1 file changed, 1 insertion(+), 5 deletions(-)
> 
> diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
> index 7c15cbc737b4..cd3c2aa981c6 100644
> --- a/drivers/ufs/core/ufshcd.c
> +++ b/drivers/ufs/core/ufshcd.c
> @@ -9254,16 +9254,13 @@ static int ufshcd_wl_suspend(struct device *dev)
>  	ktime_t start = ktime_get();
>  
>  	hba = shost_priv(sdev->host);
> -	down(&hba->host_sem);
>  
>  	if (pm_runtime_suspended(dev))
>  		goto out;
>  
>  	ret = __ufshcd_wl_suspend(hba, UFS_SYSTEM_PM);
> -	if (ret) {
> +	if (ret)
>  		dev_err(&sdev->sdev_gendev, "%s failed: %d\n", __func__,  ret);
> -		up(&hba->host_sem);
> -	}
>  
>  out:
>  	if (!ret)
> @@ -9296,7 +9293,6 @@ static int ufshcd_wl_resume(struct device *dev)
>  		hba->curr_dev_pwr_mode, hba->uic_link_state);
>  	if (!ret)
>  		hba->is_sys_suspended = false;
> -	up(&hba->host_sem);
>  	return ret;
>  }
>  #endif


  parent reply	other threads:[~2022-09-19 11:34 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-16 18:42 [PATCH] scsi: ufs: Fix deadlocks between power management and error handler Bart Van Assche
2022-09-19  3:10 ` Asutosh Das (asd)
2022-09-19 23:17   ` Bart Van Assche
2022-09-19 11:34 ` Adrian Hunter [this message]
2022-09-19 13:54   ` Bart Van Assche
2022-09-19 17:21     ` Adrian Hunter
2022-09-19 23:22       ` Bart Van Assche

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3712590b-20cb-7d27-3017-4567f1fcddc2@intel.com \
    --to=adrian.hunter@intel.com \
    --cc=asutoshd@codeaurora.org \
    --cc=avri.altman@wdc.com \
    --cc=beanhuo@micron.com \
    --cc=bvanassche@acm.org \
    --cc=dh0421.hwang@samsung.com \
    --cc=j-young.choi@samsung.com \
    --cc=jaegeuk@kernel.org \
    --cc=jejb@linux.ibm.com \
    --cc=linux-scsi@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.