From: Adrian Hunter <adrian.hunter@intel.com>
To: Bart Van Assche <bvanassche@acm.org>,
	"Martin K . Petersen" <martin.petersen@oracle.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>,
	linux-scsi@vger.kernel.org, dh0421.hwang@samsung.com,
	Asutosh Das <asutoshd@codeaurora.org>,
	"James E.J. Bottomley" <jejb@linux.ibm.com>,
	Bean Huo <beanhuo@micron.com>, Avri Altman <avri.altman@wdc.com>,
	Jinyoung Choi <j-young.choi@samsung.com>
Subject: Re: [PATCH] scsi: ufs: Fix deadlocks between power management and error handler
Date: Mon, 19 Sep 2022 20:21:07 +0300
Message-ID: <c98a4226-f1be-f84b-267c-5ce4e6c387d7@intel.com>
In-Reply-To: <913f72ad-7f6f-9067-df36-f9507359c816@acm.org>

On 19/09/22 16:54, Bart Van Assche wrote:
> On 9/19/22 04:34, Adrian Hunter wrote:
>> Did you consider something like:
>>
>> diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
>> index 7256e6c43ca6..dc83b38dfde9 100644
>> --- a/drivers/ufs/core/ufshcd.c
>> +++ b/drivers/ufs/core/ufshcd.c
>> @@ -7374,6 +7374,9 @@ static int ufshcd_eh_host_reset_handler(struct scsi_cmnd *cmd)
>>         hba = shost_priv(cmd->device->host);
>>
>> +    if (hba->pm_op_in_progress)
>> +        return FAST_IO_FAIL;
>> +
>>       spin_lock_irqsave(hba->host->host_lock, flags);
>>       hba->force_reset = true;
>>       ufshcd_schedule_eh_work(hba);
> 
> The above change could cause error handling to be skipped if an error happened that requires the link to be reset. That seems wrong to me.

Hopefully, if a PM op hit an error that really needed a host reset,
it would also show up as a UFS error that the error handler could
then fix successfully.

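Alternatively, what about the same check but placed after scheduling
the error handler, so that error handling still runs but the reset
handler does not wait for it during a PM op?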

> 
>> The original commit for host_sem was aimed at sysfs (see commit below).
>> Did you consider how sysfs access is affected?
>>
>>    commit 9cd20d3f473619d8d482551d15d4cebfb3ce73c8
>>    Author: Can Guo <cang@codeaurora.org>
>>    Date:   Wed Jan 13 19:13:28 2021 -0800
>>
>>      scsi: ufs: Protect PM ops and err_handler from user access through sysfs
>>
>>      User layer may access sysfs nodes when system PM ops or error handling is
>>      running. This can cause various problems. Rename eh_sem to host_sem and use
>>      it to protect PM ops and error handling from user layer intervention.
> 
> The sysfs and debugfs attribute callback methods already call pm_runtime_get_sync() and pm_runtime_put_sync() so how could the power state change while a sysfs or debugfs attribute callback method is in progress?

If PM did not hold host_sem, sysfs access might give a deadlock
similar to the one described:

ufs_sysfs_read_desc_param
down(&hba->host_sem); <------------------------------------
ufshcd_rpm_get_sync(hba);
	waits for blk_execute_rq()
	waits for ufshcd_eh_host_reset_handler()
	waits for ufshcd_err_handler()
	waits for down(&hba->host_sem); <------------------

> 
>>> The ufshcd_rpm_get_sync() call at the start of
>>> ufshcd_err_handling_prepare() may deadlock since calling scsi_execute()
>>> is required by the UFS runtime resume implementation. Fixing that
>>> deadlock falls outside the scope of this patch.
>>
>> Do you mean:
>>
>> static void ufshcd_err_handling_prepare(struct ufs_hba *hba)
>> {
>>     ufshcd_rpm_get_sync(hba);
>>
>> because that is the host controller, not the UFS device, that is
>> being resumed.
> 
> Hmm ... I think that ufshcd_rpm_get_sync() affects the power state of the UFS device and not the power state of the UFS host controller. From ufshcd-priv.h:
> 
> static inline int ufshcd_rpm_get_sync(struct ufs_hba *hba)
> {
>     return pm_runtime_get_sync(&hba->ufs_device_wlun->sdev_gendev);
> }

Yes, I misread that, sorry.

I guess it goes unnoticed because it is very unlikely, i.e. the UFS
device would need to be suspending but not yet have claimed host_sem.
There would not be any outstanding requests, otherwise the suspend
would not have started, so the chance of errors at that point is very low.

Maybe the deadlock could be sidestepped by changing:

diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index 7256e6c43ca6..9cb04c6f8dc3 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -9258,7 +9261,10 @@ static int ufshcd_wl_suspend(struct device *dev)
 	ktime_t start = ktime_get();
 
 	hba = shost_priv(sdev->host);
-	down(&hba->host_sem);
+	if (down_trylock(&hba->host_sem)) {
+		ret = -EBUSY;
+		goto out;
+	}
 
 	if (pm_runtime_suspended(dev))
 		goto out;



Thread overview: 7+ messages
2022-09-16 18:42 [PATCH] scsi: ufs: Fix deadlocks between power management and error handler Bart Van Assche
2022-09-19  3:10 ` Asutosh Das (asd)
2022-09-19 23:17   ` Bart Van Assche
2022-09-19 11:34 ` Adrian Hunter
2022-09-19 13:54   ` Bart Van Assche
2022-09-19 17:21     ` Adrian Hunter [this message]
2022-09-19 23:22       ` Bart Van Assche
