linux-kernel.vger.kernel.org archive mirror
From: Can Guo <cang@codeaurora.org>
To: Adrian Hunter <adrian.hunter@intel.com>
Cc: asutoshd@codeaurora.org, nguyenb@codeaurora.org,
	hongwus@codeaurora.org, ziqichen@codeaurora.org,
	linux-scsi@vger.kernel.org, kernel-team@android.com,
	Alim Akhtar <alim.akhtar@samsung.com>,
	Avri Altman <avri.altman@wdc.com>,
	"James E.J. Bottomley" <jejb@linux.ibm.com>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	Stanley Chu <stanley.chu@mediatek.com>,
	Bean Huo <beanhuo@micron.com>, Jaegeuk Kim <jaegeuk@kernel.org>,
	open list <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v4 06/10] scsi: ufs: Remove host_sem used in suspend/resume
Date: Mon, 28 Jun 2021 15:26:34 +0800
Message-ID: <7c6e2baa3578eb30f2d4bd1696e800eb@codeaurora.org>
In-Reply-To: <f1c997f3-66e4-3f1f-08f5-83449b65c397@intel.com>

On 2021-06-24 18:04, Adrian Hunter wrote:
> On 24/06/21 9:31 am, Can Guo wrote:
>> On 2021-06-24 14:23, Adrian Hunter wrote:
>>> On 24/06/21 9:12 am, Can Guo wrote:
>>>> On 2021-06-24 13:52, Adrian Hunter wrote:
>>>>> On 24/06/21 5:16 am, Can Guo wrote:
>>>>>> On 2021-06-23 22:30, Adrian Hunter wrote:
>>>>>>> On 23/06/21 10:35 am, Can Guo wrote:
>>>>>>>> To protect system suspend/resume from being disturbed by error 
>>>>>>>> handling,
>>>>>>>> instead of using host_sem, let error handler call 
>>>>>>>> lock_system_sleep() and
>>>>>>>> unlock_system_sleep() which achieve the same purpose. Remove the 
>>>>>>>> host_sem
>>>>>>>> used in suspend/resume paths to make the code more readable.
>>>>>>>> 
>>>>>>>> Suggested-by: Bart Van Assche <bvanassche@acm.org>
>>>>>>>> Signed-off-by: Can Guo <cang@codeaurora.org>
>>>>>>>> ---
>>>>>>>>  drivers/scsi/ufs/ufshcd.c | 12 +++++++-----
>>>>>>>>  1 file changed, 7 insertions(+), 5 deletions(-)
>>>>>>>> 
>>>>>>>> diff --git a/drivers/scsi/ufs/ufshcd.c 
>>>>>>>> b/drivers/scsi/ufs/ufshcd.c
>>>>>>>> index 3695dd2..a09e4a2 100644
>>>>>>>> --- a/drivers/scsi/ufs/ufshcd.c
>>>>>>>> +++ b/drivers/scsi/ufs/ufshcd.c
>>>>>>>> @@ -5907,6 +5907,11 @@ static void 
>>>>>>>> ufshcd_clk_scaling_suspend(struct ufs_hba *hba, bool suspend)
>>>>>>>> 
>>>>>>>>  static void ufshcd_err_handling_prepare(struct ufs_hba *hba)
>>>>>>>>  {
>>>>>>>> +    /*
>>>>>>>> +     * It is not safe to perform error handling while suspend 
>>>>>>>> or resume is
>>>>>>>> +     * in progress. Hence the lock_system_sleep() call.
>>>>>>>> +     */
>>>>>>>> +    lock_system_sleep();
>>>>>>> 
>>>>>>> It looks to me like the system takes this lock quite early, even 
>>>>>>> before
>>>>>>> freezing tasks, so if anything needs the error handler to run it 
>>>>>>> will
>>>>>>> deadlock.
>>>>>> 
>>>>>> Hi Adrian,
>>>>>> 
>>>>>> UFS/hba system suspend/resume does not invoke or call error 
>>>>>> handling in a
>>>>>> synchronous way. So, whatever UFS errors (which schedules the 
>>>>>> error handler)
>>>>>> happens during suspend/resume, error handler will just wait here 
>>>>>> till system
>>>>>> suspend/resume release the lock. Hence no worries of deadlock 
>>>>>> here.
>>>>> 
>>>>> It looks to me like the state can change to 
>>>>> UFSHCD_STATE_EH_SCHEDULED_FATAL
>>>>> and since user processes are not frozen, nor file systems sync'ed, 
>>>>> everything
>>>>> is going to deadlock.
>>>>> i.e.
>>>>> I/O is blocked waiting on error handling
>>>>> error handling is blocked waiting on lock_system_sleep()
>>>>> suspend is blocked waiting on I/O
>>>>> 
>>>> 
>>>> Hi Adrian,
>>>> 
>>>> First of all, enter_state(suspend_state_t state) uses 
>>>> mutex_trylock(&system_transition_mutex).
>>> 
>>> Yes, in the case I am outlining it gets the mutex.
>>> 
>>>> Second, even that happens, in ufshcd_queuecommand(), below logic 
>>>> will break the cycle, by
>>>> fast failing the PM request (below codes are from the code tip with 
>>>> this whole series applied).
>>> 
>>> It won't get that far because the suspend will be waiting to sync 
>>> filesystems.
>>> Filesystems will be waiting on I/O.
>>> I/O will be waiting on the error handler.
>>> The error handler will be waiting on system_transition_mutex.
>>> But system_transition_mutex is already held by PM core.
>> 
>> Hi Adrian,
>> 
>> You are right.... I missed the action of syncing filesystems...
>> 
>> Using back host_sem in suspend_prepare()/resume_complete() won't have 
>> this
>> problem of deadlock, right?
> 
> I am not sure, but what was problem that the V3 patch was fixing?
> Can you give an example?

V3 moved host_sem from wl_system_suspend/resume() to
ufshcd_suspend_prepare()/ufshcd_resume_complete(). The purpose is to
make sure error handling does not run concurrently with system PM,
because error handling recovers/clears the runtime PM errors of all
the scsi devices under hba (in patch #8). Error handling has to do
that because the runtime PM framework may save a supplier's runtime
error to one or more of its consumers (unlike the child - parent
relationship). For example, if wlu resume fails, sda and/or other scsi
devices may save the resume error, and they will then be left runtime
suspended permanently.
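
To illustrate the last point, here is a minimal user-space sketch of
the effect. It is a toy model, not kernel code: struct toy_dev,
toy_resume(), toy_recover() and toy_demo() are made-up names, and the
only behaviour it assumes is that a failed runtime resume leaves a
sticky error on the device which makes later resume attempts fail
until the error is cleared explicitly.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these types or helpers are kernel APIs. */
struct toy_dev {
	const char *name;
	struct toy_dev *supplier;	/* NULL if the device has no supplier */
	bool suspended;
	bool resume_broken;		/* simulates a failing resume callback */
	int runtime_error;		/* sticky error, 0 if none */
};

/* Resume a device; its supplier (if any) is resumed first. */
int toy_resume(struct toy_dev *dev)
{
	int ret;

	if (dev->runtime_error)
		return -EINVAL;		/* a saved error blocks every new attempt */
	if (!dev->suspended)
		return 0;

	if (dev->supplier) {
		ret = toy_resume(dev->supplier);
		if (ret) {
			/* The consumer saves the supplier's failure. */
			dev->runtime_error = ret;
			return ret;
		}
	}

	if (dev->resume_broken) {
		dev->runtime_error = -EIO;
		return -EIO;
	}

	dev->suspended = false;
	return 0;
}

/* What recovery has to do for each device: clear the saved error. */
void toy_recover(struct toy_dev *dev)
{
	dev->runtime_error = 0;
}

/* Scenario from the text: wlu is the supplier, sda is a consumer. */
void toy_demo(void)
{
	struct toy_dev wlu = { "wlu", NULL, true, true, 0 };
	struct toy_dev sda = { "sda", &wlu, true, false, 0 };

	/* wlu resume fails, and sda saves the error as well. */
	assert(toy_resume(&sda) == -EIO);

	/* Even after the underlying problem is fixed, sda stays stuck. */
	wlu.resume_broken = false;
	assert(toy_resume(&sda) == -EINVAL);
	assert(sda.suspended);

	/* Only clearing the errors on both devices lets sda resume. */
	toy_recover(&wlu);
	toy_recover(&sda);
	assert(toy_resume(&sda) == 0);
	assert(!sda.suspended);
}
```

Calling toy_demo() from a main() runs the scenario. In this model,
clearing the error on wlu alone is not enough, because sda keeps its
own saved copy. That is why the error handler has to walk all the scsi
devices under hba.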

Thanks,

Can Guo.

> 
>> 
>> Thanks,
>> 
>> Can Guo.
>> 
>>> 
>>>> 
>>>>         case UFSHCD_STATE_EH_SCHEDULED_FATAL:
>>>>                 /*
>>>>                  * ufshcd_rpm_get_sync() is used at error handling 
>>>> preparation
>>>>                  * stage. If a scsi cmd, e.g., the SSU cmd, is sent 
>>>> from the
>>>>                  * PM ops, it can never be finished if we let SCSI 
>>>> layer keep
>>>>                  * retrying it, which gets err handler stuck 
>>>> forever. Neither
>>>>                  * can we let the scsi cmd pass through, because UFS 
>>>> is in bad
>>>>                  * state, the scsi cmd may eventually time out, 
>>>> which will get
>>>>                  * err handler blocked for too long. So, just fail 
>>>> the scsi cmd
>>>>                  * sent from PM ops, err handler can recover PM 
>>>> error anyways.
>>>>                  */
>>>>                 if (cmd->request->rq_flags & RQF_PM) {
>>>>                         hba->force_reset = true;
>>>>                         set_host_byte(cmd, DID_BAD_TARGET);
>>>>                         cmd->scsi_done(cmd);
>>>>                         goto out;
>>>>                 }
>>>>                 fallthrough;
>>>>         case UFSHCD_STATE_RESET:
>>>> 
>>>> Thanks,
>>>> 
>>>> Can Guo.
>>>> 
>>>>>> 
>>>>>> Thanks,
>>>>>> 
>>>>>> Can Guo.
>>>>>> 
>>>>>>> 
>>>>>>>>      ufshcd_rpm_get_sync(hba);
>>>>>>>>      if 
>>>>>>>> (pm_runtime_status_suspended(&hba->sdev_ufs_device->sdev_gendev) 
>>>>>>>> ||
>>>>>>>>          hba->is_wlu_sys_suspended) {
>>>>>>>> @@ -5951,6 +5956,7 @@ static void 
>>>>>>>> ufshcd_err_handling_unprepare(struct ufs_hba *hba)
>>>>>>>>          ufshcd_clk_scaling_suspend(hba, false);
>>>>>>>>      ufshcd_clear_ua_wluns(hba);
>>>>>>>>      ufshcd_rpm_put(hba);
>>>>>>>> +    unlock_system_sleep();
>>>>>>>>  }
>>>>>>>> 
>>>>>>>>  static inline bool ufshcd_err_handling_should_stop(struct 
>>>>>>>> ufs_hba *hba)
>>>>>>>> @@ -9053,16 +9059,13 @@ static int ufshcd_wl_suspend(struct 
>>>>>>>> device *dev)
>>>>>>>>      ktime_t start = ktime_get();
>>>>>>>> 
>>>>>>>>      hba = shost_priv(sdev->host);
>>>>>>>> -    down(&hba->host_sem);
>>>>>>>> 
>>>>>>>>      if (pm_runtime_suspended(dev))
>>>>>>>>          goto out;
>>>>>>>> 
>>>>>>>>      ret = __ufshcd_wl_suspend(hba, UFS_SYSTEM_PM);
>>>>>>>> -    if (ret) {
>>>>>>>> +    if (ret)
>>>>>>>>          dev_err(&sdev->sdev_gendev, "%s failed: %d\n", 
>>>>>>>> __func__,  ret);
>>>>>>>> -        up(&hba->host_sem);
>>>>>>>> -    }
>>>>>>>> 
>>>>>>>>  out:
>>>>>>>>      if (!ret)
>>>>>>>> @@ -9095,7 +9098,6 @@ static int ufshcd_wl_resume(struct device 
>>>>>>>> *dev)
>>>>>>>>          hba->curr_dev_pwr_mode, hba->uic_link_state);
>>>>>>>>      if (!ret)
>>>>>>>>          hba->is_wlu_sys_suspended = false;
>>>>>>>> -    up(&hba->host_sem);
>>>>>>>>      return ret;
>>>>>>>>  }
>>>>>>>>  #endif
>>>>>>>> 
