From: Marek Szyprowski <m.szyprowski@samsung.com>
To: Ming Lei <ming.lei@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>,
	linux-block@vger.kernel.org, Christoph Hellwig <hch@lst.de>,
	Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Subject: Re: [PATCH] blk-mq: put driver tag when this request is completed
Date: Thu, 2 Jul 2020 14:12:13 +0200
Message-ID: <e339f9ec-6b0a-dc9b-707b-1f871ac0863b@samsung.com>
In-Reply-To: <20200702114848.GE2452799@T590>

Hi Ming,

On 02.07.2020 13:48, Ming Lei wrote:
> On Thu, Jul 02, 2020 at 12:19:08PM +0200, Marek Szyprowski wrote:
>> On 02.07.2020 11:23, Ming Lei wrote:
>>> On Thu, Jul 02, 2020 at 10:04:38AM +0200, Marek Szyprowski wrote:
>>>> On 02.07.2020 03:22, Ming Lei wrote:
>>>>> On Wed, Jul 01, 2020 at 04:16:32PM +0200, Marek Szyprowski wrote:
>>>>>> On 01.07.2020 15:45, Ming Lei wrote:
>>>>>>> On Wed, Jul 01, 2020 at 03:01:03PM +0200, Marek Szyprowski wrote:
>>>>>>>> On 29.06.2020 11:47, Ming Lei wrote:
>>>>>>>>> It is natural to release driver tag when this request is completed by
>>>>>>>>> LLD or device since its purpose is for LLD use.
>>>>>>>>>
>>>>>>>>> One big benefit is that the released tag can be re-used quicker since
>>>>>>>>> bio_endio() may take too long.
>>>>>>>>>
>>>>>>>>> Meantime we don't need to release driver tag for flush request.
>>>>>>>>>
>>>>>>>>> Cc: Christoph Hellwig <hch@lst.de>
>>>>>>>>> Signed-off-by: Ming Lei <ming.lei@redhat.com>
>>>>>>>> This patch landed recently in linux-next as commit 36a3df5a4574. Sadly
>>>>>>>> it causes a regression on one of my test systems (ARM 32bit, Samsung
>>>>>>>> Exynos5422 SoC based Odroid XU3 board with eMMC). The system boots fine
>>>>>>>> and then after a few seconds every executed command hangs. No
>>>>>>>> panic/oops/any other message. I will try to provide more information as
>>>>>>>> soon as I find something to share. Simply reverting it in linux-next is
>>>>>>>> not possible due to dependencies.
>>>>>>> Which driver code exactly does this eMMC use (including the host driver)?
>>>>>> dwmmc-exynos (drivers/mmc/host/dw_mmc-exynos.c)
>>>>> Hi,
>>>>>
>>>>> I just took a quick look at the mmc code; there are only two req->tag
>>>>> consumers:
>>>>>
>>>>> 1) cqhci_tag
>>>>> cqhci_tag
>>>>> 	cqhci_request
>>>>> 		host->cqe_ops->cqe_request
>>>>> 			mmc_cqe_start_req
>>>>> 	cqhci_timeout
>>>>>
>>>>> 2) mmc_hsq_request
>>>>> mmc_hsq_request
>>>>> 	host->cqe_ops->cqe_request
>>>>> 		mmc_cqe_start_req
>>>>>
>>>>> mmc_cqe_start_req() is called before issuing this request to hardware,
>>>>> so completion won't happen when the tag is used in mmc_cqe_start_req().
>>>>>
>>>>> cqhci_timeout() may race with normal completion; however, it looks like
>>>>> the following code handles the race correctly:
>>>>>
>>>>>            spin_lock_irqsave(&cq_host->lock, flags);
>>>>>            timed_out = slot->mrq == mrq;
>>>>>
>>>>> So I still have no idea why the commit causes trouble for mmc.
>>>>>
>>>>> Do you know whether it is cqhci or mmc_hsq that is used with
>>>>> dw_mmc-exynos? And can you apply the following patch and see if a
>>>>> warning is triggered?
>>>>>
>>>>> diff --git a/drivers/mmc/host/cqhci.c b/drivers/mmc/host/cqhci.c
>>>>> index 75934f3c117e..2cb49ecfbf34 100644
>>>>> --- a/drivers/mmc/host/cqhci.c
>>>>> +++ b/drivers/mmc/host/cqhci.c
>>>>> @@ -612,6 +612,7 @@ static int cqhci_request(struct mmc_host *mmc, struct mmc_request *mrq)
>>>>>     		goto out_unlock;
>>>>>     	}
>>>>>     
>>>>> +	WARN_ON_ONCE(cq_host->slot[tag].mrq);
>>>>>     	cq_host->slot[tag].mrq = mrq;
>>>>>     	cq_host->slot[tag].flags = 0;
>>>>>     
>>>>> diff --git a/drivers/mmc/host/mmc_hsq.c b/drivers/mmc/host/mmc_hsq.c
>>>>> index a5e05ed0fda3..11a4c1f3a970 100644
>>>>> --- a/drivers/mmc/host/mmc_hsq.c
>>>>> +++ b/drivers/mmc/host/mmc_hsq.c
>>>>> @@ -227,6 +227,7 @@ static int mmc_hsq_request(struct mmc_host *mmc, struct mmc_request *mrq)
>>>>>     		return -EBUSY;
>>>>>     	}
>>>>>     
>>>>> +	WARN_ON_ONCE(hsq->slot[tag].mrq);
>>>>>     	hsq->slot[tag].mrq = mrq;
>>>>>     
>>>>>     	/*
>>>> None of the above is even compiled for my system (I'm using
>>>> arm/exynos_defconfig), so this must be something else.
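
For readers following the thread, the idea behind the two WARN_ON_ONCE()
checks above can be modelled in userspace. This is only a sketch under
stated assumptions, not kernel code: toy_host, toy_mrq, toy_request() and
toy_complete() are hypothetical names, and the one-time warning is emulated
with a flag and fprintf(). It shows how a slot indexed by tag reveals tag
reuse while an earlier request is still outstanding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define TOY_NR_SLOTS 4

struct toy_mrq {
	int id;
};

/* Hypothetical host state: one request pointer per tag. */
struct toy_host {
	struct toy_mrq *slot[TOY_NR_SLOTS];
	int warned;	/* emulates the "once" part of WARN_ON_ONCE() */
	int collisions;	/* how often an occupied slot was overwritten */
};

/* Store mrq in the slot selected by tag; returns -1 for an invalid tag. */
static int toy_request(struct toy_host *host, int tag, struct toy_mrq *mrq)
{
	assert(host && mrq);

	if (tag < 0 || tag >= TOY_NR_SLOTS)
		return -1;

	/* The debug check: the slot should be empty when a tag is handed out. */
	if (host->slot[tag]) {
		host->collisions++;
		if (!host->warned) {
			host->warned = 1;
			fprintf(stderr, "WARNING: slot %d still holds request %d\n",
				tag, host->slot[tag]->id);
		}
	}

	host->slot[tag] = mrq;
	return 0;
}

/* Completion clears the slot so the tag can be reused safely. */
static void toy_complete(struct toy_host *host, int tag)
{
	assert(host);

	if (tag >= 0 && tag < TOY_NR_SLOTS)
		host->slot[tag] = NULL;
}
```

If a tag is released before the request stored under it has been completed,
the next toy_request() with the same tag finds the slot occupied and the
warning fires.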
>>> Hello Marek,
>>>
>>> Or can you boot the system from another working disk (usb, nand, ...),
>>> then run some IO test on this eMMC, and collect the debugfs log via the
>>> following command after the hang is triggered:
>>>
>>> (cd /sys/kernel/debug/block/$MMC && find . -type f -exec grep -aH . {} \;)
>>>
>>> $MMC is the name of this mmc disk.
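
Where find and grep are not available on the target, roughly the same dump
can be produced by a small program. The following is a sketch assuming POSIX
nftw(); dump_tree() and dump_file() are hypothetical helpers, not an existing
tool. Like `grep -aH .`, it prints every non-empty line prefixed with the
file name, except that paths are relative to the root argument rather than
to ".".

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <ftw.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* nftw() has no user-data argument, so the sink is kept in statics. */
static FILE *dump_out;
static int dump_lines;

/* Called by nftw() for every entry; prints "path:line" for regular files. */
static int dump_file(const char *path, const struct stat *sb, int type,
		     struct FTW *ftw)
{
	char line[4096];	/* longer lines are emitted in several pieces */
	FILE *f;

	(void)sb;
	(void)ftw;

	if (type != FTW_F)
		return 0;

	f = fopen(path, "r");
	if (!f)
		return 0;	/* skip unreadable attributes, keep walking */

	while (fgets(line, sizeof(line), f)) {
		line[strcspn(line, "\n")] = '\0';
		if (line[0] == '\0')
			continue;	/* "grep ." drops empty lines */
		fprintf(dump_out, "%s:%s\n", path, line);
		dump_lines++;
	}

	fclose(f);
	return 0;
}

/* Walk root and dump all files to out; returns lines printed or -1. */
int dump_tree(const char *root, FILE *out)
{
	assert(root && out);

	dump_out = out;
	dump_lines = 0;

	if (nftw(root, dump_file, 16, FTW_PHYS) != 0)
		return -1;

	return dump_lines;
}
```

A call such as dump_tree("/sys/kernel/debug/block/<disk>", stdout) would be
the counterpart of the command above, with <disk> standing for the same
$MMC name.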
>>
>> I hope it helps.
> It does help, :-)
>
> Thanks for collecting the log. Now I understand the reason: the flush
> request's driver tag is leaked when the request isn't completed via
> blk_mq_complete_request(), for example when it is freed via
> blk_mq_end_request() directly.
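
(Illustration only, not part of any patch in this thread.) The leak described
above can be modelled in userspace with a tiny tag pool. This is a sketch
under stated assumptions: every toy_* name is hypothetical and none of it
mirrors blk-mq internals. One completion path puts the driver tag, a second
end path forgets to, and a third shows the kind of fix that puts the tag on
that path as well.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_TAGS 2
#define TOY_NO_TAG (-1)

/* Toy driver tag pool: in_use[i] is true while tag i is held. */
struct toy_tags {
	bool in_use[TOY_NR_TAGS];
};

struct toy_request {
	int tag;	/* TOY_NO_TAG when no driver tag is held */
};

/* Grab a free driver tag; returns false when the pool is exhausted. */
static bool toy_get_driver_tag(struct toy_tags *tags, struct toy_request *rq)
{
	for (int i = 0; i < TOY_NR_TAGS; i++) {
		if (!tags->in_use[i]) {
			tags->in_use[i] = true;
			rq->tag = i;
			return true;
		}
	}
	return false;
}

/* Return the tag to the pool, if the request holds one. */
static void toy_put_driver_tag(struct toy_tags *tags, struct toy_request *rq)
{
	if (rq->tag == TOY_NO_TAG)
		return;
	assert(rq->tag >= 0 && rq->tag < TOY_NR_TAGS);
	tags->in_use[rq->tag] = false;
	rq->tag = TOY_NO_TAG;
}

/* Completion path that releases the driver tag. */
static void toy_complete_request(struct toy_tags *tags, struct toy_request *rq)
{
	toy_put_driver_tag(tags, rq);
}

/* End path that forgets the tag: the pool entry stays in use forever. */
static void toy_end_request_buggy(struct toy_tags *tags, struct toy_request *rq)
{
	(void)tags;
	rq->tag = TOY_NO_TAG;
}

/* End path with the fix applied: the tag is put here as well. */
static void toy_end_request_fixed(struct toy_tags *tags, struct toy_request *rq)
{
	toy_put_driver_tag(tags, rq);
}

/* Number of tags currently available. */
static int toy_free_tags(const struct toy_tags *tags)
{
	int n = 0;

	for (int i = 0; i < TOY_NR_TAGS; i++)
		n += !tags->in_use[i];
	return n;
}
```

Once every tag has gone through the buggy end path, toy_get_driver_tag()
fails for all later requests, which matches the symptom of every command
hanging a few seconds after boot.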
>
> Please try the following patch, which would have been a two-line
> change if the driver tag cleanup patch hadn't been merged.


Yes, this fixes the issue on my test system! :)

Feel free to add:

Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>

Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>

to the final patch.


Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


