From: Ulf Hansson <ulf.hansson@linaro.org>
To: 陈文超 <wenchao.chen666@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>,
baolin.wang@linux.alibaba.com, zhang.lyra@gmail.com,
linux-mmc@vger.kernel.org, linux-kernel@vger.kernel.org,
megoo.tang@gmail.com, lzx.stg@gmail.com
Subject: Re: [PATCH] mmc: host: Fix data stomping during mmc recovery
Date: Tue, 27 Sep 2022 14:13:34 +0200 [thread overview]
Message-ID: <CAPDyKFqZUEse-bzNjdAX-jNDCq6qOhg9q7h8qMTZyLm9nPL5tQ@mail.gmail.com> (raw)
In-Reply-To: <CA+Da2qxNU9cuYLm3-RcTeqpSszeCtHw4bzQ2xe_=76mwaKiKSQ@mail.gmail.com>
On Tue, 27 Sept 2022 at 07:45, 陈文超 <wenchao.chen666@gmail.com> wrote:
>
> On Tue, Sep 20, 2022 at 6:19 PM Adrian Hunter <adrian.hunter@intel.com> wrote:
> >
> > On 20/09/22 12:32, Ulf Hansson wrote:
> > > + Adrian
> > >
> > > On Fri, 16 Sept 2022 at 11:05, Wenchao Chen <wenchao.chen666@gmail.com> wrote:
> > >>
> > >> From: Wenchao Chen <wenchao.chen@unisoc.com>
> > >>
> > >> The block device uses multiple queues to access emmc. There will be up to 3
> > >> requests in the hsq of the host. The current code will check whether there
> > >> is a request doing recovery before entering the queue, but it will not check
> > >> whether there is a request when the lock is issued. The request is in recovery
> > >> mode. If there is a request in recovery, then a read and write request is
> > >> initiated at this time, and the conflict between the request and the recovery
> > >> request will cause the data to be trampled.
> > >>
> > >> Signed-off-by: Wenchao Chen <wenchao.chen@unisoc.com>
> > >
> > > Looks like we should consider tagging this for stable kernels too, right?
> Yes,
>
> Kernel crash log:
> [ 242.987575] process reclaim queue work at vmpressure 79
> [ 243.041611] CPU: 0 PID: 5 Comm: kworker/0:0 Tainted: G WC O
> 5.4.147-ab07227 #1
> [ 243.041615] Hardware name: Generic DT based system
> [ 243.041628] Workqueue: events mmc_mq_recovery_handler
> [ 243.041638] PC is at mmc_blk_mq_recovery+0x2c/0x120
> [ 243.041643] LR is at mmc_mq_recovery_handler+0x54/0xb8
> [ 243.041648] pc : [<c0b9511c>] lr : [<c06e331c>] psr: 20030013
> [ 243.041651] sp : ee941f00 ip : eed191a0 fp : ee941f08
> [ 243.041655] r10: 00000000 r9 : e00aca0c r8 : e0243fc0
> [ 243.041659] r7 : e0096000 r6 : eed1b280 r5 : 00000000 r4 : e00aca08
> [ 243.041667] r3 : c0cb7fd0 r2 : 00000000 r1 : a68e26a3 r0 : e0096000
> [ 243.059012] process reclaim queue work at vmpressure 72
>
> dis -lx mmc_blk_mq_recovery
> 0xc0b950f0 <mmc_blk_mq_recovery>: push {r4, r5, r11, lr}
> 0xc0b950f4 <mmc_blk_mq_recovery+0x4>: add r11, sp, #8
> 0xc0b950f8 <mmc_blk_mq_recovery+0x8>: mov r4, r0
> 0xc0b950fc <mmc_blk_mq_recovery+0xc>: stmfd sp!, {lr}
> 0xc0b95100 <mmc_blk_mq_recovery+0x10>: ldmfd sp!, {lr}
> 0xc0b95104 <mmc_blk_mq_recovery+0x14>: ldr r0, [r4]
> 0xc0b95108 <mmc_blk_mq_recovery+0x18>: mov r2, #0
> 0xc0b9510c <mmc_blk_mq_recovery+0x1c>: ldr r5, [r4, #196] ; 0xc4
> 0xc0b95110 <mmc_blk_mq_recovery+0x20>: ldr r0, [r0]
> 0xc0b95114 <mmc_blk_mq_recovery+0x24>: str r2, [r4, #196] ; 0xc4
> 0xc0b95118 <mmc_blk_mq_recovery+0x28>: strb r2, [r4, #148] ; 0x94
> 0xc0b9511c <mmc_blk_mq_recovery+0x2c>: ldr r1, [r5, #404] ; 0x194
>
> Analyze:
> 0xc0b9510c <mmc_blk_mq_recovery+0x1c>: ldr r5, [r4, #196] ; 0xc4
> r4 = e00aca08
> r4 + 0xc4 = E00ACACC
> crash_arm> rd E00ACACC
> e00acacc: ec00cc00
> But r5 is 0, the correct value should be ec00cc00
>
> Code:
> void mmc_blk_mq_recovery(struct mmc_queue *mq)
> {
> struct request *req = mq->recovery_req;
> struct mmc_host *host = mq->card->host;
> struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
>
> mq->recovery_req = NULL;//<2>
> mq->rw_wait = false;
>
> if (mmc_blk_rq_error(&mqrq->brq)) {
> mmc_retune_hold_now(host);
> mmc_blk_mq_rw_recovery(mq, req);
> }
>
> mmc_blk_urgent_bkops(mq, mqrq);
>
> mmc_blk_mq_post_req(mq, req, true);
> }
>
> static void mmc_blk_hsq_req_done(struct mmc_request *mrq)
> {
> struct mmc_queue_req *mqrq =
> container_of(mrq, struct mmc_queue_req, brq.mrq);
> struct request *req = mmc_queue_req_to_req(mqrq);
> struct request_queue *q = req->q;
> struct mmc_queue *mq = q->queuedata;
> struct mmc_host *host = mq->card->host;
> unsigned long flags;
>
> if (mmc_blk_rq_error(&mqrq->brq) ||
> mmc_blk_urgent_bkops_needed(mq, mqrq)) {
> spin_lock_irqsave(&mq->lock, flags);
> mq->recovery_needed = true;
> mq->recovery_req = req; //<1>
> spin_unlock_irqrestore(&mq->lock, flags);
>
> host->cqe_ops->cqe_recovery_start(host);
>
> schedule_work(&mq->recovery_work);
> return;
> }
>
> mmc_blk_rw_reset_success(mq, req);
>
> /*
> * Block layer timeouts race with completions which means the normal
> * completion path cannot be used during recovery.
> */
> if (mq->in_recovery)
> mmc_blk_cqe_complete_rq(mq, req);
> else if (likely(!blk_should_fake_timeout(req->q)))
> blk_mq_complete_request(req);
> }
>
> When <1> is executed, just after the previous work is executed <2>, at
> this time, mq->recovery_req will be reassigned to NULL, causing the
> kernel to crash.
> So we need to wait for the recovery to complete before continuing to issue req.
>
> > >
> > >> ---
> > >> drivers/mmc/host/mmc_hsq.c | 2 +-
> > >> 1 file changed, 1 insertion(+), 1 deletion(-)
> > >>
> > >> diff --git a/drivers/mmc/host/mmc_hsq.c b/drivers/mmc/host/mmc_hsq.c
> > >> index a5e05ed0fda3..9d35453e7371 100644
> > >> --- a/drivers/mmc/host/mmc_hsq.c
> > >> +++ b/drivers/mmc/host/mmc_hsq.c
> > >> @@ -34,7 +34,7 @@ static void mmc_hsq_pump_requests(struct mmc_hsq *hsq)
> > >> spin_lock_irqsave(&hsq->lock, flags);
> > >>
> > >> /* Make sure we are not already running a request now */
> > >> - if (hsq->mrq) {
> > >> + if (hsq->mrq || hsq->recovery_halt) {
> > >
> > > This still looks a bit odd to me, but I may not fully understand the
> > > code, as it's been a while since I looked at this.
> > >
> > > In particular, I wonder why the callers of mmc_hsq_pump_requests()
> > > need to release the spin_lock before they call
> > > mmc_hsq_pump_requests()? Is it because we want to allow some other
> > > code that may be waiting for the spin_lock to be released, to run too?
> >
> > FWIW, I am not aware of any reason.
> >
> > >
> > > If that isn't the case, it seems better to let the callers of
> > > mmc_hsq_pump_requests() to keep holding the lock - and thus we can
> > > avoid the additional check(s). In that case, it means the
> > > "recovery_halt" flag has already been checked, for example.
> > >
> > >> spin_unlock_irqrestore(&hsq->lock, flags);
> > >> return;
> > >> }
> > >> --
> > >> 2.17.1
> > >>
> > >
> > > Kind regards
> > > Uffe
> >
Alright, as I am tagging this for stable it's nice to keep the change
small and simple. So I decided to pick $subject patch as is and
applied it on my fixes branch.
That said, would you mind having a look at whether it makes sense to
change the locking paths, as suggested earlier? Note that, this is
better done as a separate change on top (if even possible).
Thanks and kind regards
Uffe
prev parent reply other threads:[~2022-09-27 12:14 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-09-16 9:05 [PATCH] mmc: host: Fix data stomping during mmc recovery Wenchao Chen
2022-09-20 9:32 ` Ulf Hansson
2022-09-20 10:19 ` Adrian Hunter
2022-09-27 5:45 ` 陈文超
2022-09-27 12:13 ` Ulf Hansson [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAPDyKFqZUEse-bzNjdAX-jNDCq6qOhg9q7h8qMTZyLm9nPL5tQ@mail.gmail.com \
--to=ulf.hansson@linaro.org \
--cc=adrian.hunter@intel.com \
--cc=baolin.wang@linux.alibaba.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mmc@vger.kernel.org \
--cc=lzx.stg@gmail.com \
--cc=megoo.tang@gmail.com \
--cc=wenchao.chen666@gmail.com \
--cc=zhang.lyra@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.