linux-kernel.vger.kernel.org archive mirror
From: Adrian Hunter <adrian.hunter@intel.com>
To: Can Guo <cang@codeaurora.org>
Cc: asutoshd@codeaurora.org, nguyenb@codeaurora.org,
	hongwus@codeaurora.org, ziqichen@codeaurora.org,
	linux-scsi@vger.kernel.org, kernel-team@android.com,
	Alim Akhtar <alim.akhtar@samsung.com>,
	Avri Altman <avri.altman@wdc.com>,
	"James E.J. Bottomley" <jejb@linux.ibm.com>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	Stanley Chu <stanley.chu@mediatek.com>,
	Bean Huo <beanhuo@micron.com>, Jaegeuk Kim <jaegeuk@kernel.org>,
	open list <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v4 06/10] scsi: ufs: Remove host_sem used in suspend/resume
Date: Wed, 7 Jul 2021 22:04:06 +0300
Message-ID: <bd464b9f-b6d5-cd52-7377-c64c0cf933ff@intel.com>
In-Reply-To: <7c6e2baa3578eb30f2d4bd1696e800eb@codeaurora.org>

On 28/06/21 10:26 am, Can Guo wrote:
> On 2021-06-24 18:04, Adrian Hunter wrote:
>> On 24/06/21 9:31 am, Can Guo wrote:
>>> On 2021-06-24 14:23, Adrian Hunter wrote:
>>>> On 24/06/21 9:12 am, Can Guo wrote:
>>>>> On 2021-06-24 13:52, Adrian Hunter wrote:
>>>>>> On 24/06/21 5:16 am, Can Guo wrote:
>>>>>>> On 2021-06-23 22:30, Adrian Hunter wrote:
>>>>>>>> On 23/06/21 10:35 am, Can Guo wrote:
>>>>>>>>> To protect system suspend/resume from being disturbed by error handling,
>>>>>>>>> instead of using host_sem, let error handler call lock_system_sleep() and
>>>>>>>>> unlock_system_sleep() which achieve the same purpose. Remove the host_sem
>>>>>>>>> used in suspend/resume paths to make the code more readable.
>>>>>>>>>
>>>>>>>>> Suggested-by: Bart Van Assche <bvanassche@acm.org>
>>>>>>>>> Signed-off-by: Can Guo <cang@codeaurora.org>
>>>>>>>>> ---
>>>>>>>>>  drivers/scsi/ufs/ufshcd.c | 12 +++++++-----
>>>>>>>>>  1 file changed, 7 insertions(+), 5 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
>>>>>>>>> index 3695dd2..a09e4a2 100644
>>>>>>>>> --- a/drivers/scsi/ufs/ufshcd.c
>>>>>>>>> +++ b/drivers/scsi/ufs/ufshcd.c
>>>>>>>>> @@ -5907,6 +5907,11 @@ static void ufshcd_clk_scaling_suspend(struct ufs_hba *hba, bool suspend)
>>>>>>>>>
>>>>>>>>>  static void ufshcd_err_handling_prepare(struct ufs_hba *hba)
>>>>>>>>>  {
>>>>>>>>> +    /*
>>>>>>>>> +     * It is not safe to perform error handling while suspend or resume is
>>>>>>>>> +     * in progress. Hence the lock_system_sleep() call.
>>>>>>>>> +     */
>>>>>>>>> +    lock_system_sleep();
>>>>>>>>
>>>>>>>> It looks to me like the system takes this lock quite early, even before
>>>>>>>> freezing tasks, so if anything needs the error handler to run it will
>>>>>>>> deadlock.
>>>>>>>
>>>>>>> Hi Adrian,
>>>>>>>
>>>>>>> UFS/hba system suspend/resume does not invoke error handling
>>>>>>> synchronously. So whatever UFS errors (which schedule the error handler)
>>>>>>> happen during suspend/resume, the error handler will just wait here until
>>>>>>> system suspend/resume releases the lock. Hence there is no risk of deadlock here.
>>>>>>
>>>>>> It looks to me like the state can change to UFSHCD_STATE_EH_SCHEDULED_FATAL
>>>>>> and since user processes are not frozen, nor file systems sync'ed, everything
>>>>>> is going to deadlock.
>>>>>> i.e.
>>>>>> I/O is blocked waiting on error handling
>>>>>> error handling is blocked waiting on lock_system_sleep()
>>>>>> suspend is blocked waiting on I/O
>>>>>>
>>>>>
>>>>> Hi Adrian,
>>>>>
>>>>> First of all, enter_state(suspend_state_t state) uses mutex_trylock(&system_transition_mutex).
>>>>
>>>> Yes, in the case I am outlining it gets the mutex.
>>>>
>>>>> Second, even if that happens, the logic below in ufshcd_queuecommand() will break the
>>>>> cycle by fast-failing the PM request (the code below is from the tip with this whole series applied).
>>>>
>>>> It won't get that far because the suspend will be waiting to sync filesystems.
>>>> Filesystems will be waiting on I/O.
>>>> I/O will be waiting on the error handler.
>>>> The error handler will be waiting on system_transition_mutex.
>>>> But system_transition_mutex is already held by PM core.
>>>
>>> Hi Adrian,
>>>
>>> You are right.... I missed the action of syncing filesystems...
>>>
>>> Going back to using host_sem in suspend_prepare()/resume_complete() won't
>>> have this deadlock problem, right?
>>
>> I am not sure, but what was the problem that the V3 patch was fixing?
>> Can you give an example?
> 
> V3 was moving host_sem from wl_system_suspend/resume() to
> ufshcd_suspend_prepare()/ufshcd_resume_complete(). This is to
> make sure error handling does not run concurrently with system
> PM, since error handling recovers/clears the runtime PM
> errors of all the scsi devices under hba (in patch #8). Error
> handling does so (in patch #8) because the runtime PM framework
> may save the runtime errors of the supplier to one or more consumers
> (unlike the children - parent relationship). For example, if wlu resume
> fails, sda and/or other scsi devices may save the resume error, and then
> they will be left runtime suspended permanently.

Sorry for the slow reply.  I was going to do some more investigation but
never found time.

I was wondering if it would be simpler to do the error recovery for
wl_system_suspend/resume() before exiting wl_system_suspend/resume().

Then it would be possible to do something along these lines:
	- prevent runtime suspend while the error handler is outstanding
	- at suspend, block queuing of the error handler work and flush it
	- at resume, allow queuing of the error handler work
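
To make the three steps above a bit more concrete, here is a rough
user-space model of the bookkeeping. It is only a sketch, not driver
code: struct eh_model and every eh_*() name below are hypothetical and
do not exist in ufshcd.c. It also assumes that an error reported while
queuing is blocked is simply refused, so the caller would have to
recover in place, as suggested for wl_system_suspend/resume() above.

```c
#include <stdbool.h>

/* Hypothetical user-space model; none of these names exist in ufshcd.c. */
struct eh_model {
	bool queuing_blocked;	/* true between suspend and resume */
	bool work_pending;	/* error handler work queued, not yet run */
	int rpm_block_count;	/* > 0 means runtime suspend is prevented */
	int runs;		/* number of times the handler has run */
};

/*
 * Queue the error handler work. While the work is outstanding, runtime
 * suspend is held off (step 1). Returns false if queuing is blocked.
 */
static bool eh_queue(struct eh_model *m)
{
	if (m->queuing_blocked)
		return false;	/* assumption: caller recovers in place */
	if (!m->work_pending) {
		m->work_pending = true;
		m->rpm_block_count++;
	}
	return true;
}

/* Run the pending work, if any, and let runtime suspend proceed again. */
static void eh_run(struct eh_model *m)
{
	if (!m->work_pending)
		return;
	m->runs++;
	m->work_pending = false;
	m->rpm_block_count--;
}

static bool eh_runtime_suspend_allowed(const struct eh_model *m)
{
	return m->rpm_block_count == 0;
}

/* Step 2: at suspend, block further queuing and then flush the work. */
static void eh_suspend(struct eh_model *m)
{
	m->queuing_blocked = true;
	eh_run(m);
}

/* Step 3: at resume, allow queuing again. */
static void eh_resume(struct eh_model *m)
{
	m->queuing_blocked = false;
}
```

In this model the suspend path never waits on anything that itself
waits for suspend to finish: it flushes work that was already queued and
refuses new work, so the handler does not need to take a system-wide
sleep lock. In the driver the equivalent pieces would presumably be
built on the workqueue and runtime PM helpers, but that mapping is not
worked out here.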
