linux-arm-kernel.lists.infradead.org archive mirror
From: Stanley Chu <stanley.chu@mediatek.com>
To: Avri Altman <Avri.Altman@wdc.com>
Cc: "linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
	"martin.petersen@oracle.com" <martin.petersen@oracle.com>,
	"andy.teng@mediatek.com" <andy.teng@mediatek.com>,
	"jejb@linux.ibm.com" <jejb@linux.ibm.com>,
	"chun-hung.wu@mediatek.com" <chun-hung.wu@mediatek.com>,
	"kuohong.wang@mediatek.com" <kuohong.wang@mediatek.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"cc.chou@mediatek.com" <cc.chou@mediatek.com>,
	"cang@codeaurora.org" <cang@codeaurora.org>,
	"linux-mediatek@lists.infradead.org"
	<linux-mediatek@lists.infradead.org>,
	"peter.wang@mediatek.com" <peter.wang@mediatek.com>,
	"alim.akhtar@samsung.com" <alim.akhtar@samsung.com>,
	"matthias.bgg@gmail.com" <matthias.bgg@gmail.com>,
	"asutoshd@codeaurora.org" <asutoshd@codeaurora.org>,
	"chaotian.jing@mediatek.com" <chaotian.jing@mediatek.com>,
	"bvanassche@acm.org" <bvanassche@acm.org>,
	"linux-arm-kernel@lists.infradead.org"
	<linux-arm-kernel@lists.infradead.org>,
	"beanhuo@micron.com" <beanhuo@micron.com>
Subject: RE: [PATCH v3] scsi: ufs: Cleanup completed request without interrupt notification
Date: Tue, 14 Jul 2020 16:48:47 +0800
Message-ID: <1594716527.22878.28.camel@mtkswgap22>
In-Reply-To: <SN6PR04MB4640F34CAA25B3CB58F94CABFC630@SN6PR04MB4640.namprd04.prod.outlook.com>

Hi Avri,

Sorry for the late response.

On Sun, 2020-07-12 at 10:04 +0000, Avri Altman wrote:
> 
> > 
> > Hi Avri,
> > 
> > On Thu, 2020-07-09 at 08:31 +0000, Avri Altman wrote:
> > > >
> > > > If somehow no interrupt notification is raised for a completed request
> > > > and its doorbell bit is cleared by the host, the UFS driver needs to
> > > > clean up its outstanding bit in ufshcd_abort().
> > > Theoretically, this case is already accounted for -
> > > See line 6407: a proper error is issued and eventually outstanding req is cleared.
> > >
> > > Can you go over the scenario you are attending line by line,
> > > And explain why ufshcd_abort does not account for it?
> > 
> > Sure.
> > 
> > If a request using tag N is completed by UFS device without interrupt
> > notification till timeout happens, ufshcd_abort() will be invoked.
> > 
> > Since request completion flow is not executed, current status may be
> > 
> > - Tag N in hba->outstanding_reqs is set
> > - Tag N in doorbell register is not set
> > 
> > In this case, ufshcd_abort() flow would be
> > 
> > - This log is printed: "ufshcd_abort: cmd was completed, but without a
> > notifying intr, tag = N"
> > - This log is printed: "ufshcd_abort: Device abort task at tag N"
> > - If hba->req_abort_skip is zero, QUERY_TASK command is sent
> > - Device responds "UPIU_TASK_MANAGEMENT_FUNC_COMPL"
> > - This log is printed: "ufshcd_abort: cmd at tag N not pending in the
> > device."
> > - Doorbell tells that tag N is not set, so the driver goes to label
> > "out" with this log printed: "ufshcd_abort: cmd at tag %d successfully
> > cleared from DB."
> > - In label "out" section, no cleanup will be made, and then ufshcd_abort
> > exits
> > - This request will be re-queued to request queue by SCSI timeout
> > handler
> > 
> > Now an inconsistent state shows up: the request is re-queued, but its
> > corresponding resource in the UFS layer is not cleared. The flow below
> > will then trigger bad things:
> > 
> > - A new request with tag M is finished
> > - Interrupt is raised and ufshcd_transfer_req_compl() found both tag N
> > and M can process the completion flow
> > - The post-processing flow for tag N will be executed while its request
> > is still alive
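
The stale-bit scenario above can be modelled in a few lines of userspace C.
This is only a sketch: struct hba_model, completed_tags() and abort_cleanup()
are hypothetical names, and taking the XOR of the two bitmaps is an assumption
about how the completion path selects tags, not a copy of the driver code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace model; not the kernel's struct ufs_hba. */
struct hba_model {
	uint32_t outstanding_reqs; /* bit N set: driver still tracks tag N */
	uint32_t doorbell;         /* bit N set: controller still owns tag N */
};

/*
 * Tags the completion path would post-process: tracked by the driver
 * but no longer present in the doorbell (assumed selection rule).
 */
static uint32_t completed_tags(const struct hba_model *hba)
{
	return hba->outstanding_reqs ^ hba->doorbell;
}

/* What the proposed cleanup does for a timed-out tag: drop the stale bit. */
static void abort_cleanup(struct hba_model *hba, unsigned int tag)
{
	assert(tag < 32); /* the model only has 32 slots */
	hba->outstanding_reqs &= ~(1u << tag);
}
```

With tags 3 (N) and 5 (M) outstanding and an empty doorbell, completed_tags()
reports both tags, so the re-queued tag 3 would be post-processed together
with tag 5. After abort_cleanup(&hba, 3), only tag 5 is reported.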
> > 
> > I am sorry that the messages below apply only to old kernels in the
> > non-blk-mq case. However, the above scenario will also trigger bad
> > things in the blk-mq case.
> 
> Ok.  Thanks.
> 
> > 
> > >
> > > >
> > > > Otherwise, system may crash by below abnormal flow:
> > > >
> > > > After this request is requeued by SCSI layer with its
> > > > outstanding bit set, the next completed request will trigger
> > > > ufshcd_transfer_req_compl() to handle all "completed outstanding
> > > > bits". In this time, the "abnormal outstanding bit" will be detected
> > > > and the "requeued request" will be chosen to execute request
> > > > post-processing flow. This is wrong and blk_finish_request() will
> > > > BUG_ON because this request is still "alive".
> > > >
> > > > It is worth mentioning that before ufshcd_abort() cleans the timed-out
> > > > request, the driver needs to check again whether this request has
> > > > really not been handled by __ufshcd_transfer_req_compl() yet, because
> > > > the interrupt may arrive very late, just before the cleaning.
> > > What do you mean? Why checking the outstanding reqs isn't enough?
> > >
> > > >
> > > > Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
> > > > ---
> > > >  drivers/scsi/ufs/ufshcd.c | 9 +++++++--
> > > >  1 file changed, 7 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
> > > > index 8603b07045a6..f23fb14df9f6 100644
> > > > --- a/drivers/scsi/ufs/ufshcd.c
> > > > +++ b/drivers/scsi/ufs/ufshcd.c
> > > > @@ -6462,7 +6462,7 @@ static int ufshcd_abort(struct scsi_cmnd *cmd)
> > > >                         /* command completed already */
> > > >                         dev_err(hba->dev, "%s: cmd at tag %d successfully cleared from DB.\n",
> > > >                                 __func__, tag);
> > > > -                       goto out;
> > > > +                       goto cleanup;
> > > But you've arrived here only if (!(test_bit(tag, &hba->outstanding_reqs))) -
> > > See line 6400.
> > >
> > > >                 } else {
> > > >                         dev_err(hba->dev,
> > > >                                 "%s: no response from device. tag = %d, err %d\n",
> > > > @@ -6496,9 +6496,14 @@ static int ufshcd_abort(struct scsi_cmnd *cmd)
> > > >                 goto out;
> > > >         }
> > > >
> > > > +cleanup:
> > > > +       spin_lock_irqsave(host->host_lock, flags);
> > > > +       if (!test_bit(tag, &hba->outstanding_reqs)) {
> Is this needed?  It was already checked in line 6439.
> 

I am worried about the case where the interrupt arrives very late. For
example, if the interrupt finally arrives while ufshcd_abort() is handling
this command, the command may be completed first by the interrupt
handler. In that case, ufshcd_abort() shall not clear this command
again. Conversely, if ufshcd_abort() clears this command first, the
interrupt handler shall not complete it. Thus checking
hba->outstanding_reqs here with the host lock held is required to prevent
the above race.
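
To illustrate the idea, here is a minimal userspace sketch of "test and
clear under one lock". Everything in it is hypothetical: struct
hba_race_model, claim_tag() and the atomic_flag spinlock merely stand in for
the host state and host->host_lock, and the model is limited to 32 tags.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the host state; not the kernel's struct ufs_hba. */
struct hba_race_model {
	atomic_flag lock;          /* stands in for host->host_lock */
	uint32_t outstanding_reqs; /* bit N set: tag N not yet completed */
};

static void hba_race_init(struct hba_race_model *hba, uint32_t outstanding)
{
	atomic_flag_clear(&hba->lock);
	hba->outstanding_reqs = outstanding;
}

static void model_lock(struct hba_race_model *hba)
{
	while (atomic_flag_test_and_set_explicit(&hba->lock, memory_order_acquire))
		; /* spin until the flag is released */
}

static void model_unlock(struct hba_race_model *hba)
{
	atomic_flag_clear_explicit(&hba->lock, memory_order_release);
}

/*
 * Called by both the (modelled) interrupt path and the abort path.
 * Returns true only for the caller that found the bit still set, so
 * exactly one of them goes on to complete the command.
 */
static bool claim_tag(struct hba_race_model *hba, unsigned int tag)
{
	bool claimed = false;

	assert(tag < 32);
	model_lock(hba);
	if (hba->outstanding_reqs & (1u << tag)) {
		hba->outstanding_reqs &= ~(1u << tag);
		claimed = true;
	}
	model_unlock(hba);
	return claimed;
}
```

Whichever path calls claim_tag() first for a given tag gets true and
performs the cleanup; the later caller gets false and must leave the command
alone.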

Thanks,
Stanley Chu


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
