Re: [PATCH v3 5/9] scsi: ufs: Simplify error handling preparation

From: Can Guo <cang@codeaurora.org>
To: Bart Van Assche <bvanassche@acm.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>,
	asutoshd@codeaurora.org, nguyenb@codeaurora.org,
	hongwus@codeaurora.org, ziqichen@codeaurora.org,
	linux-scsi@vger.kernel.org, kernel-team@android.com,
	Alim Akhtar <alim.akhtar@samsung.com>,
	Avri Altman <avri.altman@wdc.com>,
	"James E.J. Bottomley" <jejb@linux.ibm.com>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	Stanley Chu <stanley.chu@mediatek.com>,
	Bean Huo <beanhuo@micron.com>, Jaegeuk Kim <jaegeuk@kernel.org>,
	open list <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v3 5/9] scsi: ufs: Simplify error handling preparation
Date: Sat, 12 Jun 2021 17:49:26 +0800
Message-ID: <d3c57c8e52f7a251b5c536a893b1f101@codeaurora.org>
In-Reply-To: <645c0e3c83c8917a8fd5c0493c5815a0@codeaurora.org>

Hi Bart,

On 2021-06-12 14:46, Can Guo wrote:
> On 2021-06-12 04:58, Bart Van Assche wrote:
>> On 6/10/21 8:01 PM, Can Guo wrote:
>>> Previously, without commit cb7e6f05fce67c965194ac04467e1ba7bc70b069,
>>> ufshcd_resume() could turn off power and clocks because of UFS errors,
>>> e.g., link transition failures and SSU errors/aborts (these UFS errors
>>> also invoke error handling). When error handling kicks in, it should
>>> therefore re-enable power and clocks before proceeding. Now, commit
>>> cb7e6f05fce67c965194ac04467e1ba7bc70b069 makes ufshcd_resume()
>>> purely control power and clocks, meaning that if ufshcd_resume() fails,
>>> there is nothing we can do about it: enabling power or clocks must have
>>> failed, and not because of a UFS error. This is why I am removing the
>>> re-enabling of power/clocks from error handling preparation.
>> 
>> Why are link transition failures handled in the error handler instead
>> of in the context where these errors are detected (ufshcd_resume())?
>> Is it even possible to recover from a link transition failure or does
>> this perhaps indicate a broken UFS controller?
> 
> Basically, almost all UFS failures are caused by errors in underlying
> layers, i.e., UIC errors, including link transition failures. According
> to the UFSHCI spec, SW should do a full reset to recover from them, just
> as it handles any other fatal UIC error. All UIC errors are detected by
> HW and reported through the IRQ handler.
> 
> UFSHCI Spec Ver. 3.1
> 8.2.7 Hibernate Enter/Exit Error Handling
> Hibernate Enter/Exit Error occurs when the UniPro link is broken. When
> this condition occurs, host software should reset the host controller
> by setting register HCE to ‘0’, re-initialize the host controller by
> setting register HCE to ‘1’, and then start link startup sequence as
> shown in Figure 16.
> 
>> 
>>>> but what I really wonder is why we don't just do recovery directly
>>>> in __ufshcd_wl_suspend() and __ufshcd_wl_resume() and strip all
>>>> the PM complexity out of ufshcd_err_handling()?
>> 
>> +1
> 
> I explained why I chose not to do this in my last reply to Adrian;
> please see that reply.
> 
>> 
>>> For system suspend/resume, since error handling is similar in nature
>>> to user access, we use host_sem to prevent error handling from running
>>> concurrently with system suspend/resume.
>> 
>> Why is host_sem used for that purpose instead of lock_system_sleep()
>> and unlock_system_sleep()?
>> 
> 
> I was aware of it, but host_sem is also used to prevent concurrency
> among user access, error handling and shutdown, so I thought it simpler
> to use host_sem for all of these cases. Otherwise, user access and error
> handling would have to take both system_transition_mutex and host_sem.

On second thought, I will take your suggestion to use lock_system_sleep()
and unlock_system_sleep() in the error handler and remove the use of
host_sem in suspend/resume, which makes the code more readable by keeping
the changes within the error handler itself. However, please note that
host_sem will still be used to prevent concurrency among user access, the
error handler and shutdown.

Thanks,
Can Guo.

> 
> Thanks,
> 
> Can Guo.
> 
>> Thanks,
>> 
>> Bart.