linux-block.vger.kernel.org archive mirror
From: Ming Lei <ming.lei@redhat.com>
To: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Jens Axboe <axboe@kernel.dk>,
	linux-block@vger.kernel.org, Christoph Hellwig <hch@lst.de>,
	Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Subject: Re: [PATCH] blk-mq: put driver tag when this request is completed
Date: Thu, 2 Jul 2020 19:48:48 +0800
Message-ID: <20200702114848.GE2452799@T590>
In-Reply-To: <5acf69fb-04b2-8649-1fc4-2cfe8aa8b9c7@samsung.com>

On Thu, Jul 02, 2020 at 12:19:08PM +0200, Marek Szyprowski wrote:
> On 02.07.2020 11:23, Ming Lei wrote:
> > On Thu, Jul 02, 2020 at 10:04:38AM +0200, Marek Szyprowski wrote:
> >> On 02.07.2020 03:22, Ming Lei wrote:
> >>> On Wed, Jul 01, 2020 at 04:16:32PM +0200, Marek Szyprowski wrote:
> >>>> On 01.07.2020 15:45, Ming Lei wrote:
> >>>>> On Wed, Jul 01, 2020 at 03:01:03PM +0200, Marek Szyprowski wrote:
> >>>>>> On 29.06.2020 11:47, Ming Lei wrote:
> >>>>>>> It is natural to release driver tag when this request is completed by
> >>>>>>> LLD or device since its purpose is for LLD use.
> >>>>>>>
> >>>>>>> One big benefit is that the released tag can be re-used quicker since
> >>>>>>> bio_endio() may take too long.
> >>>>>>>
> >>>>>>> Meantime we don't need to release driver tag for flush request.
> >>>>>>>
> >>>>>>> Cc: Christoph Hellwig <hch@lst.de>
> >>>>>>> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> >>>>>> This patch landed recently in linux-next as commit 36a3df5a4574. Sadly
> >>>>>> it causes a regression on one of my test systems (ARM 32bit, Samsung
> >>>>>> Exynos5422 SoC based Odroid XU3 board with eMMC). The system boots fine
> >>>>>> and then after a few seconds every executed command hangs. No
> >>>>>> panic/oops/any other message. I will try to provide more information as
> >>>>>> soon as I find something to share. Simply reverting it in linux-next is
> >>>>>> not possible due to dependencies.
> >>>>> Which eMMC driver code exactly is in use (including the host driver)?
> >>>> dwmmc-exynos (drivers/mmc/host/dw_mmc-exynos.c)
> >>> Hi,
> >>>
> >>> I just took a quick look at the mmc code; there are only two req->tag
> >>> consumers:
> >>>
> >>> 1) cqhci_tag
> >>> cqhci_tag
> >>> 	cqhci_request
> >>> 		host->cqe_ops->cqe_request
> >>> 			mmc_cqe_start_req
> >>> 	cqhci_timeout
> >>>
> >>> 2) mmc_hsq_request
> >>> mmc_hsq_request
> >>> 	host->cqe_ops->cqe_request
> >>> 		mmc_cqe_start_req
> >>>
> >>> mmc_cqe_start_req() is called before issuing this request to hardware,
> >>> so completion won't happen when the tag is used in mmc_cqe_start_req().
> >>>
> >>> cqhci_timeout() may race with normal completion; however, it looks like
> >>> the following code handles the race correctly:
> >>>
> >>>           spin_lock_irqsave(&cq_host->lock, flags);
> >>>           timed_out = slot->mrq == mrq;
> >>>
> >>> So I still have no idea why the commit causes trouble for mmc.
> >>>
> >>> Do you know whether cqhci or mmc_hsq is used with dw_mmc-exynos?
> >>> And can you apply the following patch and see if the warning is
> >>> triggered?
> >>>
> >>> diff --git a/drivers/mmc/host/cqhci.c b/drivers/mmc/host/cqhci.c
> >>> index 75934f3c117e..2cb49ecfbf34 100644
> >>> --- a/drivers/mmc/host/cqhci.c
> >>> +++ b/drivers/mmc/host/cqhci.c
> >>> @@ -612,6 +612,7 @@ static int cqhci_request(struct mmc_host *mmc, struct mmc_request *mrq)
> >>>    		goto out_unlock;
> >>>    	}
> >>>    
> >>> +	WARN_ON_ONCE(cq_host->slot[tag].mrq);
> >>>    	cq_host->slot[tag].mrq = mrq;
> >>>    	cq_host->slot[tag].flags = 0;
> >>>    
> >>> diff --git a/drivers/mmc/host/mmc_hsq.c b/drivers/mmc/host/mmc_hsq.c
> >>> index a5e05ed0fda3..11a4c1f3a970 100644
> >>> --- a/drivers/mmc/host/mmc_hsq.c
> >>> +++ b/drivers/mmc/host/mmc_hsq.c
> >>> @@ -227,6 +227,7 @@ static int mmc_hsq_request(struct mmc_host *mmc, struct mmc_request *mrq)
> >>>    		return -EBUSY;
> >>>    	}
> >>>    
> >>> +	WARN_ON_ONCE(hsq->slot[tag].mrq);
> >>>    	hsq->slot[tag].mrq = mrq;
> >>>    
> >>>    	/*
> >> None of the above is even compiled for my system (I'm using
> >> arm/exynos_defconfig), so this must be something else.
> > Hello Marek,
> >
> > Can you boot the system from another working disk (usb, nand, ...),
> > then run some IO test on this eMMC, and collect the debugfs log via the
> > following command after the hang is triggered?
> >
> > (cd /sys/kernel/debug/block/$MMC && find . -type f -exec grep -aH . {} \;)
> >
> > $MMC is the name of the mmc disk.
> 
> 
> I hope it helps.

It does help, :-)

Thanks for collecting the log. Now I understand the reason: a flush
request's driver tag is leaked when the request isn't completed via
blk_mq_complete_request(), for example when it is ended directly via
blk_mq_end_request().
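
To illustrate the leak, here is a minimal userspace sketch. It is not
kernel code: the tag pool, struct request, and every function name in it
are hypothetical stand-ins, and it only models the assumption that a
request with an end_io callback, when ended directly, skips the driver
tag release unless the end path puts the tag itself.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_TAGS 4
#define NO_TAG  (-1)	/* stand-in for BLK_MQ_NO_TAG */

/* Toy tag pool: in_use[i] is true while tag i is held by a request. */
struct tag_pool {
	bool in_use[NR_TAGS];
};

struct request;
typedef void (*end_io_fn)(struct request *rq);

/* Toy request: a driver tag, an optional end_io callback, and its pool. */
struct request {
	int tag;
	end_io_fn end_io;
	struct tag_pool *pool;
};

/* Return the first free tag, or NO_TAG when the pool is exhausted. */
static int get_tag(struct tag_pool *pool)
{
	for (int i = 0; i < NR_TAGS; i++) {
		if (!pool->in_use[i]) {
			pool->in_use[i] = true;
			return i;
		}
	}
	return NO_TAG;
}

/* Release the request's tag back to the pool, if it holds one. */
static void put_driver_tag(struct request *rq)
{
	if (rq->tag == NO_TAG)
		return;
	assert(rq->pool->in_use[rq->tag]);
	rq->pool->in_use[rq->tag] = false;
	rq->tag = NO_TAG;
}

/* Flush-like end_io callback: it does not release the tag itself. */
static void flush_end_io(struct request *rq)
{
	(void)rq;
}

/*
 * Direct end path. With put_tag_first == false the end_io branch never
 * releases the tag (the leak); with true the tag is put before end_io.
 */
static void end_request(struct request *rq, bool put_tag_first)
{
	if (rq->end_io) {
		if (put_tag_first)
			put_driver_tag(rq);
		rq->end_io(rq);
	} else {
		put_driver_tag(rq);
	}
}

/*
 * Issue and directly end up to NR_TAGS + 1 flush-like requests and
 * return how many of them managed to get a tag.
 */
static int count_issued(bool put_tag_first)
{
	struct tag_pool pool = { { false } };
	int issued = 0;

	for (int i = 0; i < NR_TAGS + 1; i++) {
		struct request rq = { get_tag(&pool), flush_end_io, &pool };

		if (rq.tag == NO_TAG)
			break;	/* pool exhausted: later IO would hang */
		issued++;
		end_request(&rq, put_tag_first);
	}
	return issued;
}
```

In this model count_issued(false) stops once all NR_TAGS tags have been
leaked, while count_issued(true) can issue every request, which matches
the kind of hang you saw after a few seconds of IO.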

Please try the following patch, which would have been a two-line
change if the driver tag cleanup patch hadn't been merged.

diff --git a/block/blk-mq.c b/block/blk-mq.c
index ebab8f1044cb..7d62e9e5972e 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -532,6 +532,26 @@ void blk_mq_free_request(struct request *rq)
 }
 EXPORT_SYMBOL_GPL(blk_mq_free_request);
 
+static void __blk_mq_put_driver_tag(struct blk_mq_hw_ctx *hctx,
+		struct request *rq)
+{
+	blk_mq_put_tag(hctx->tags, rq->mq_ctx, rq->tag);
+	rq->tag = BLK_MQ_NO_TAG;
+
+	if (rq->rq_flags & RQF_MQ_INFLIGHT) {
+		rq->rq_flags &= ~RQF_MQ_INFLIGHT;
+		atomic_dec(&hctx->nr_active);
+	}
+}
+
+static inline void blk_mq_put_driver_tag(struct request *rq)
+{
+	if (rq->tag == BLK_MQ_NO_TAG || rq->internal_tag == BLK_MQ_NO_TAG)
+		return;
+
+	__blk_mq_put_driver_tag(rq->mq_hctx, rq);
+}
+
 inline void __blk_mq_end_request(struct request *rq, blk_status_t error)
 {
 	u64 now = 0;
@@ -551,6 +571,7 @@ inline void __blk_mq_end_request(struct request *rq, blk_status_t error)
 
 	if (rq->end_io) {
 		rq_qos_done(rq->q, rq);
+		blk_mq_put_driver_tag(rq);
 		rq->end_io(rq, error);
 	} else {
 		blk_mq_free_request(rq);
@@ -862,26 +883,6 @@ static inline bool blk_mq_complete_need_ipi(struct request *rq)
 	return cpu_online(rq->mq_ctx->cpu);
 }
 
-static void __blk_mq_put_driver_tag(struct blk_mq_hw_ctx *hctx,
-		struct request *rq)
-{
-	blk_mq_put_tag(hctx->tags, rq->mq_ctx, rq->tag);
-	rq->tag = BLK_MQ_NO_TAG;
-
-	if (rq->rq_flags & RQF_MQ_INFLIGHT) {
-		rq->rq_flags &= ~RQF_MQ_INFLIGHT;
-		atomic_dec(&hctx->nr_active);
-	}
-}
-
-static inline void blk_mq_put_driver_tag(struct request *rq)
-{
-	if (rq->tag == BLK_MQ_NO_TAG || rq->internal_tag == BLK_MQ_NO_TAG)
-		return;
-
-	__blk_mq_put_driver_tag(rq->mq_hctx, rq);
-}
-
 bool blk_mq_complete_request_remote(struct request *rq)
 {
 	WRITE_ONCE(rq->state, MQ_RQ_COMPLETE);
@@ -1185,9 +1186,10 @@ static bool blk_mq_check_expired(struct blk_mq_hw_ctx *hctx,
 	if (blk_mq_req_expired(rq, next))
 		blk_mq_rq_timed_out(rq, reserved);
 
-	if (is_flush_rq(rq, hctx))
+	if (is_flush_rq(rq, hctx)) {
+		blk_mq_put_driver_tag(rq);
 		rq->end_io(rq, 0);
-	else if (refcount_dec_and_test(&rq->ref))
+	} else if (refcount_dec_and_test(&rq->ref))
 		__blk_mq_free_request(rq);
 
 	return true;

Thanks,
Ming