From: 陈文超 <wenchao.chen666@gmail.com>
To: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ulf Hansson <ulf.hansson@linaro.org>,
baolin.wang@linux.alibaba.com, zhang.lyra@gmail.com,
linux-mmc@vger.kernel.org, linux-kernel@vger.kernel.org,
megoo.tang@gmail.com, lzx.stg@gmail.com
Subject: Re: [PATCH] mmc: host: Fix data stomping during mmc recovery
Date: Tue, 27 Sep 2022 13:45:21 +0800 [thread overview]
Message-ID: <CA+Da2qxNU9cuYLm3-RcTeqpSszeCtHw4bzQ2xe_=76mwaKiKSQ@mail.gmail.com> (raw)
In-Reply-To: <25977700-7ab7-9a3b-5c67-c6c5fe35a13a@intel.com>
On Tue, Sep 20, 2022 at 6:19 PM Adrian Hunter <adrian.hunter@intel.com> wrote:
>
> On 20/09/22 12:32, Ulf Hansson wrote:
> > + Adrian
> >
> > On Fri, 16 Sept 2022 at 11:05, Wenchao Chen <wenchao.chen666@gmail.com> wrote:
> >>
> >> From: Wenchao Chen <wenchao.chen@unisoc.com>
> >>
> >> The block device uses multiple queues to access emmc. There will be up to 3
> >> requests in the hsq of the host. The current code will check whether there
> >> is a request doing recovery before entering the queue, but it will not check
> >> whether there is a request when the lock is issued. The request is in recovery
> >> mode. If there is a request in recovery, then a read and write request is
> >> initiated at this time, and the conflict between the request and the recovery
> >> request will cause the data to be trampled.
> >>
> >> Signed-off-by: Wenchao Chen <wenchao.chen@unisoc.com>
> >
> > Looks like we should consider tagging this for stable kernels too, right?
Yes,
Kernel crash log:
[ 242.987575] process reclaim queue work at vmpressure 79
[ 243.041611] CPU: 0 PID: 5 Comm: kworker/0:0 Tainted: G WC O
5.4.147-ab07227 #1
[ 243.041615] Hardware name: Generic DT based system
[ 243.041628] Workqueue: events mmc_mq_recovery_handler
[ 243.041638] PC is at mmc_blk_mq_recovery+0x2c/0x120
[ 243.041643] LR is at mmc_mq_recovery_handler+0x54/0xb8
[ 243.041648] pc : [<c0b9511c>] lr : [<c06e331c>] psr: 20030013
[ 243.041651] sp : ee941f00 ip : eed191a0 fp : ee941f08
[ 243.041655] r10: 00000000 r9 : e00aca0c r8 : e0243fc0
[ 243.041659] r7 : e0096000 r6 : eed1b280 r5 : 00000000 r4 : e00aca08
[ 243.041667] r3 : c0cb7fd0 r2 : 00000000 r1 : a68e26a3 r0 : e0096000
[ 243.059012] process reclaim queue work at vmpressure 72
dis -lx mmc_blk_mq_recovery
0xc0b950f0 <mmc_blk_mq_recovery>: push {r4, r5, r11, lr}
0xc0b950f4 <mmc_blk_mq_recovery+0x4>: add r11, sp, #8
0xc0b950f8 <mmc_blk_mq_recovery+0x8>: mov r4, r0
0xc0b950fc <mmc_blk_mq_recovery+0xc>: stmfd sp!, {lr}
0xc0b95100 <mmc_blk_mq_recovery+0x10>: ldmfd sp!, {lr}
0xc0b95104 <mmc_blk_mq_recovery+0x14>: ldr r0, [r4]
0xc0b95108 <mmc_blk_mq_recovery+0x18>: mov r2, #0
0xc0b9510c <mmc_blk_mq_recovery+0x1c>: ldr r5, [r4, #196] ; 0xc4
0xc0b95110 <mmc_blk_mq_recovery+0x20>: ldr r0, [r0]
0xc0b95114 <mmc_blk_mq_recovery+0x24>: str r2, [r4, #196] ; 0xc4
0xc0b95118 <mmc_blk_mq_recovery+0x28>: strb r2, [r4, #148] ; 0x94
0xc0b9511c <mmc_blk_mq_recovery+0x2c>: ldr r1, [r5, #404] ; 0x194
Analyze:
0xc0b9510c <mmc_blk_mq_recovery+0x1c>: ldr r5, [r4, #196] ; 0xc4
r4 = e00aca08
r4 + 0xc4 = E00ACACC
crash_arm> rd E00ACACC
e00acacc: ec00cc00
But r5 is 0, the correct value should be ec00cc00
Code:
void mmc_blk_mq_recovery(struct mmc_queue *mq)
{
struct request *req = mq->recovery_req;
struct mmc_host *host = mq->card->host;
struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
mq->recovery_req = NULL;//<2>
mq->rw_wait = false;
if (mmc_blk_rq_error(&mqrq->brq)) {
mmc_retune_hold_now(host);
mmc_blk_mq_rw_recovery(mq, req);
}
mmc_blk_urgent_bkops(mq, mqrq);
mmc_blk_mq_post_req(mq, req, true);
}
static void mmc_blk_hsq_req_done(struct mmc_request *mrq)
{
struct mmc_queue_req *mqrq =
container_of(mrq, struct mmc_queue_req, brq.mrq);
struct request *req = mmc_queue_req_to_req(mqrq);
struct request_queue *q = req->q;
struct mmc_queue *mq = q->queuedata;
struct mmc_host *host = mq->card->host;
unsigned long flags;
if (mmc_blk_rq_error(&mqrq->brq) ||
mmc_blk_urgent_bkops_needed(mq, mqrq)) {
spin_lock_irqsave(&mq->lock, flags);
mq->recovery_needed = true;
mq->recovery_req = req; //<1>
spin_unlock_irqrestore(&mq->lock, flags);
host->cqe_ops->cqe_recovery_start(host);
schedule_work(&mq->recovery_work);
return;
}
mmc_blk_rw_reset_success(mq, req);
/*
* Block layer timeouts race with completions which means the normal
* completion path cannot be used during recovery.
*/
if (mq->in_recovery)
mmc_blk_cqe_complete_rq(mq, req);
else if (likely(!blk_should_fake_timeout(req->q)))
blk_mq_complete_request(req);
}
When <1> is executed, just after the previous work is executed <2>, at
this time, mq->recovery_req will be reassigned to NULL, causing the
kernel to crash.
So we need to wait for the recovery to complete before continuing to issue req.
> >
> >> ---
> >> drivers/mmc/host/mmc_hsq.c | 2 +-
> >> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/mmc/host/mmc_hsq.c b/drivers/mmc/host/mmc_hsq.c
> >> index a5e05ed0fda3..9d35453e7371 100644
> >> --- a/drivers/mmc/host/mmc_hsq.c
> >> +++ b/drivers/mmc/host/mmc_hsq.c
> >> @@ -34,7 +34,7 @@ static void mmc_hsq_pump_requests(struct mmc_hsq *hsq)
> >> spin_lock_irqsave(&hsq->lock, flags);
> >>
> >> /* Make sure we are not already running a request now */
> >> - if (hsq->mrq) {
> >> + if (hsq->mrq || hsq->recovery_halt) {
> >
> > This still looks a bit odd to me, but I may not fully understand the
> > code, as it's been a while since I looked at this.
> >
> > In particular, I wonder why the callers of mmc_hsq_pump_requests()
> > need to release the spin_lock before they call
> > mmc_hsq_pump_requests()? Is it because we want to allow some other
> > code that may be waiting for the spin_lock to be released, to run too?
>
> FWIW, I am not aware of any reason.
>
> >
> > If that isn't the case, it seems better to let the callers of
> > mmc_hsq_pump_requests() to keep holding the lock - and thus we can
> > avoid the additional check(s). In that case, it means the
> > "recovery_halt" flag has already been checked, for example.
> >
> >> spin_unlock_irqrestore(&hsq->lock, flags);
> >> return;
> >> }
> >> --
> >> 2.17.1
> >>
> >
> > Kind regards
> > Uffe
>
next prev parent reply other threads:[~2022-09-27 5:45 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-09-16 9:05 [PATCH] mmc: host: Fix data stomping during mmc recovery Wenchao Chen
2022-09-20 9:32 ` Ulf Hansson
2022-09-20 10:19 ` Adrian Hunter
2022-09-27 5:45 ` 陈文超 [this message]
2022-09-27 12:13 ` Ulf Hansson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CA+Da2qxNU9cuYLm3-RcTeqpSszeCtHw4bzQ2xe_=76mwaKiKSQ@mail.gmail.com' \
--to=wenchao.chen666@gmail.com \
--cc=adrian.hunter@intel.com \
--cc=baolin.wang@linux.alibaba.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mmc@vger.kernel.org \
--cc=lzx.stg@gmail.com \
--cc=megoo.tang@gmail.com \
--cc=ulf.hansson@linaro.org \
--cc=zhang.lyra@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.