linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Dmitry Osipenko <digetx@gmail.com>
To: Adrian Hunter <adrian.hunter@intel.com>,
	Ulf Hansson <ulf.hansson@linaro.org>
Cc: linux-mmc <linux-mmc@vger.kernel.org>,
	linux-block <linux-block@vger.kernel.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Bough Chen <haibo.chen@nxp.com>,
	Alex Lemberg <alex.lemberg@sandisk.com>,
	Mateusz Nowak <mateusz.nowak@intel.com>,
	Yuliy Izrailov <Yuliy.Izrailov@sandisk.com>,
	Jaehoon Chung <jh80.chung@samsung.com>,
	Dong Aisheng <dongas86@gmail.com>,
	Das Asutosh <asutoshd@codeaurora.org>,
	Zhangfei Gao <zhangfei.gao@gmail.com>,
	Sahitya Tummala <stummala@codeaurora.org>,
	Harjani Ritesh <riteshh@codeaurora.org>,
	Venu Byravarasu <vbyravarasu@nvidia.com>,
	Linus Walleij <linus.walleij@linaro.org>,
	Shawn Lin <shawn.lin@rock-chips.com>,
	Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>,
	Christoph Hellwig <hch@lst.de>
Subject: Re: [PATCH V15 06/22] mmc: block: Add blk-mq support
Date: Wed, 21 Feb 2018 23:50:01 +0300	[thread overview]
Message-ID: <bc65b542-9703-aa04-f1c7-b10584e44391@gmail.com> (raw)
In-Reply-To: <1511962879-24262-7-git-send-email-adrian.hunter@intel.com>

On 29.11.2017 16:41, Adrian Hunter wrote:
> Define and use a blk-mq queue. Discards and flushes are processed
> synchronously, but reads and writes asynchronously. In order to support
> slow DMA unmapping, DMA unmapping is not done until after the next request
> is started. That means the request is not completed until then. If there is
> no next request then the completion is done by queued work.

Hello,

I'm using (running linux-next and doing some upstream development for) some old
NVIDIA Tegra tablet that has built-in (internal) and external MMC's and with the
blk-mq being enabled I'm observing a soft lockup. The lockup seems is
reproducible quite reliably by running fsck on any MMC partition, sometimes
kernels lockups on boot during probing partitions table (weirdly only when both
SDHCI's are present, i.e. internal storage enabled in DT and external SD is
inserted/enabled) and it also lockups pretty quickly in a case of just a general
use. Reverting mmc/ commits up to 1bec43a3b18 ("Remove option not to use
blk-mq") and disabling CONFIG_MMC_MQ_DEFAULT makes everything working fine
again. There is also a third SDHCI populated with built-in WiFi/Bluetooth SDIO
and I'm observing odd MMC timeouts with the blk-mq enabled, disabling
CONFIG_MMC_MQ_DEFAULT fixes these timeouts as well.

Any thoughts?

WiFi issue
========================

[   38.247006] mmc2: Timeout waiting for hardware interrupt.
[   38.247027] brcmfmac: brcmf_escan_timeout: timer expired
[   38.247036] mmc2: sdhci: ============ SDHCI REGISTER DUMP ===========
[   38.247047] mmc2: sdhci: Sys addr:  0x00000000 | Version:  0x00000001
[   38.247055] mmc2: sdhci: Blk size:  0x00007008 | Blk cnt:  0x00000000
[   38.247062] mmc2: sdhci: Argument:  0x21000008 | Trn mode: 0x00000013
[   38.247070] mmc2: sdhci: Present:   0x01d70000 | Host ctl: 0x00000013
[   38.247077] mmc2: sdhci: Power:     0x00000001 | Blk gap:  0x00000000
[   38.247084] mmc2: sdhci: Wake-up:   0x00000000 | Clock:    0x00000007
[   38.247091] mmc2: sdhci: Timeout:   0x0000000e | Int stat: 0x00000000
[   38.247098] mmc2: sdhci: Int enab:  0x02ff000b | Sig enab: 0x02fc000b
[   38.247105] mmc2: sdhci: AC12 err:  0x00000000 | Slot int: 0x00000000
[   38.247112] mmc2: sdhci: Caps:      0x61ff30b0 | Caps_1:   0x00000000
[   38.247119] mmc2: sdhci: Cmd:       0x0000353a | Max curr: 0x00000001
[   38.247126] mmc2: sdhci: Resp[0]:   0x00001800 | Resp[1]:  0x08002db5
[   38.247133] mmc2: sdhci: Resp[2]:   0x16da8000 | Resp[3]:  0x00000400
[   38.247139] mmc2: sdhci: Host ctl2: 0x00000000
[   38.247146] mmc2: sdhci: ADMA Err:  0x00000000 | ADMA Ptr: 0x17c47200
[   38.247152] mmc2: sdhci: ============================================
[   38.247250] brcmfmac: brcmf_sdio_readframes: read 520 bytes from channel 1
failed: -84
[   38.247274] brcmfmac: brcmf_sdio_rxfail: abort command, terminate frame, send NAK
[   40.807019] brcmfmac: brcmf_sdio_bus_rxctl: resumed on timeout
[   40.807042] brcmfmac: brcmf_notify_escan_complete: Scan abort failed
[   48.487007] mmc2: Timeout waiting for hardware interrupt.
[   48.487057] mmc2: sdhci: ============ SDHCI REGISTER DUMP ===========
[   48.487096] mmc2: sdhci: Sys addr:  0x00000000 | Version:  0x00000001
[   48.487128] mmc2: sdhci: Blk size:  0x00007040 | Blk cnt:  0x00000001
[   48.487160] mmc2: sdhci: Argument:  0x21000040 | Trn mode: 0x00000013
[   48.487191] mmc2: sdhci: Present:   0x01d70000 | Host ctl: 0x00000013
[   48.487221] mmc2: sdhci: Power:     0x00000001 | Blk gap:  0x00000000
[   48.487251] mmc2: sdhci: Wake-up:   0x00000000 | Clock:    0x00000007
[   48.487281] mmc2: sdhci: Timeout:   0x0000000e | Int stat: 0x00000000
[   48.487313] mmc2: sdhci: Int enab:  0x02ff000b | Sig enab: 0x02fc000b
[   48.487343] mmc2: sdhci: AC12 err:  0x00000000 | Slot int: 0x00000000
[   48.487374] mmc2: sdhci: Caps:      0x61ff30b0 | Caps_1:   0x00000000
[   48.487404] mmc2: sdhci: Cmd:       0x0000353a | Max curr: 0x00000001
[   48.487435] mmc2: sdhci: Resp[0]:   0x00001000 | Resp[1]:  0x08002db5
[   48.487466] mmc2: sdhci: Resp[2]:   0x16da8000 | Resp[3]:  0x00000400
[   48.487493] mmc2: sdhci: Host ctl2: 0x00000000
[   48.487525] mmc2: sdhci: ADMA Err:  0x00000000 | ADMA Ptr: 0x17c47200
[   48.487552] mmc2: sdhci: ============================================
[   48.487749] brcmfmac: brcmf_sdio_readframes: read 480 bytes from channel 1
failed: -84
[   48.487822] brcmfmac: brcmf_sdio_rxfail: abort command, terminate frame, send NAK


Soft lockup issue
========================

# fsck -f /dev/disk/by-uuid/6768309f-3545-49d5-9ac7-d5be24d35ef2
fsck из util-linux 2.30.2
e2fsck 1.43.9 (8-Feb-2018)
Проход 1: Проверка inodes, блокs, а также размеров
Pass 2: Checking каталог structure
Pass 3: Checking каталог connectivity
Pass 4: Checking reference counts
Pass 5: Checking группа summary information
....

[  125.925436] INFO: task kworker/0:3H:263 blocked for more than 60 seconds.
[  125.925496]       Not tainted
4.16.0-rc2-next-20180220-00101-gaefde91c3955-dirty #2041
[  125.925530] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[  125.925562] kworker/0:3H    D    0   263      2 0x00000000
[  125.925653] Workqueue: kblockd mmc_blk_mq_complete_work
[  125.925747] [<c0b991cc>] (__schedule) from [<c0b998f0>] (schedule+0x60/0xcc)
[  125.925805] [<c0b998f0>] (schedule) from [<c086c870>]
(__mmc_claim_host+0xdc/0x24c)
[  125.925849] [<c086c870>] (__mmc_claim_host) from [<c08750bc>]
(mmc_start_bkops+0x4c/0x190)
[  125.925895] [<c08750bc>] (mmc_start_bkops) from [<c087d254>]
(mmc_blk_urgent_bkops+0x48/0x5c)
[  125.925945] [<c087d254>] (mmc_blk_urgent_bkops) from [<c087f2e0>]
(mmc_blk_mq_complete_prev_req.part.5+0x74/0x210)
[  125.925995] [<c087f2e0>] (mmc_blk_mq_complete_prev_req.part.5) from
[<c08813d0>] (mmc_blk_mq_complete_work+0x30/0x34)
[  125.926049] [<c08813d0>] (mmc_blk_mq_complete_work) from [<c014306c>]
(process_one_work+0x1f8/0x584)
[  125.926093] [<c014306c>] (process_one_work) from [<c014402c>]
(worker_thread+0x68/0x5d4)
[  125.926144] [<c014402c>] (worker_thread) from [<c014965c>] (kthread+0x178/0x184)
[  125.926188] [<c014965c>] (kthread) from [<c01010e8>] (ret_from_fork+0x14/0x2c)
[  125.926209] Exception stack(0xd579ffb0 to 0xd579fff8)
[  125.926239] ffa0:                                     00000000 00000000
00000000 00000000
[  125.926272] ffc0: 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000
[  125.926301] ffe0: 00000000 00000000 00000000 00000000 00000013 00000000
[  125.926361] INFO: task fsck.ext4:471 blocked for more than 60 seconds.
[  125.926399]       Not tainted
4.16.0-rc2-next-20180220-00101-gaefde91c3955-dirty #2041
[  125.926427] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[  125.926455] fsck.ext4       D    0   471    470 0x00000000
[  125.926536] [<c0b991cc>] (__schedule) from [<c0b998f0>] (schedule+0x60/0xcc)
[  125.926593] [<c0b998f0>] (schedule) from [<c015823c>] (io_schedule+0x20/0x40)
[  125.926654] [<c015823c>] (io_schedule) from [<c0220acc>]
(wait_on_page_bit+0x120/0x144)
[  125.926705] [<c0220acc>] (wait_on_page_bit) from [<c0220bc4>]
(__filemap_fdatawait_range+0xd4/0x114)
[  125.926757] [<c0220bc4>] (__filemap_fdatawait_range) from [<c0223784>]
(file_write_and_wait_range+0x98/0xb4)
[  125.926810] [<c0223784>] (file_write_and_wait_range) from [<c02c0300>]
(blkdev_fsync+0x2c/0x5c)
[  125.926867] [<c02c0300>] (blkdev_fsync) from [<c02b7ce8>]
(vfs_fsync_range+0x4c/0xb0)
[  125.926912] [<c02b7ce8>] (vfs_fsync_range) from [<c02b7dd4>] (do_fsync+0x4c/0x74)
[  125.926954] [<c02b7dd4>] (do_fsync) from [<c02b80a0>] (SyS_fsync+0x1c/0x20)
[  125.926996] [<c02b80a0>] (SyS_fsync) from [<c0101000>]
(ret_fast_syscall+0x0/0x54)
[  125.927016] Exception stack(0xd5631fa8 to 0xd5631ff0)
[  125.927050] 1fa0:                   00480fc8 00481108 00000004 00481108
00000000 00000000
[  125.927086] 1fc0: 00480fc8 00481108 00000000 00000076 00000000 7f2bb750
00483268 bec602e8
[  125.927113] 1fe0: 00000076 bec60288 b6de5e8b b6d67cf6
[  177.015618] random: crng init done
[  187.365434] INFO: task kworker/0:3H:263 blocked for more than 60 seconds.
[  187.365491]       Not tainted
4.16.0-rc2-next-20180220-00101-gaefde91c3955-dirty #2041
[  187.365524] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[  187.365556] kworker/0:3H    D    0   263      2 0x00000000
[  187.365647] Workqueue: kblockd mmc_blk_mq_complete_work
[  187.365741] [<c0b991cc>] (__schedule) from [<c0b998f0>] (schedule+0x60/0xcc)
[  187.365798] [<c0b998f0>] (schedule) from [<c086c870>]
(__mmc_claim_host+0xdc/0x24c)
[  187.365842] [<c086c870>] (__mmc_claim_host) from [<c08750bc>]
(mmc_start_bkops+0x4c/0x190)
[  187.365887] [<c08750bc>] (mmc_start_bkops) from [<c087d254>]
(mmc_blk_urgent_bkops+0x48/0x5c)
[  187.365936] [<c087d254>] (mmc_blk_urgent_bkops) from [<c087f2e0>]
(mmc_blk_mq_complete_prev_req.part.5+0x74/0x210)
[  187.365986] [<c087f2e0>] (mmc_blk_mq_complete_prev_req.part.5) from
[<c08813d0>] (mmc_blk_mq_complete_work+0x30/0x34)
[  187.366039] [<c08813d0>] (mmc_blk_mq_complete_work) from [<c014306c>]
(process_one_work+0x1f8/0x584)
[  187.366083] [<c014306c>] (process_one_work) from [<c014402c>]
(worker_thread+0x68/0x5d4)
[  187.366134] [<c014402c>] (worker_thread) from [<c014965c>] (kthread+0x178/0x184)
[  187.366178] [<c014965c>] (kthread) from [<c01010e8>] (ret_from_fork+0x14/0x2c)
[  187.366200] Exception stack(0xd579ffb0 to 0xd579fff8)
[  187.366229] ffa0:                                     00000000 00000000
00000000 00000000
[  187.366262] ffc0: 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000
[  187.366291] ffe0: 00000000 00000000 00000000 00000000 00000013 00000000
[  187.366350] INFO: task fsck.ext4:471 blocked for more than 60 seconds.
[  187.366388]       Not tainted
4.16.0-rc2-next-20180220-00101-gaefde91c3955-dirty #2041
[  187.366416] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
[  187.366445] fsck.ext4       D    0   471    470 0x00000000
[  187.366526] [<c0b991cc>] (__schedule) from [<c0b998f0>] (schedule+0x60/0xcc)
[  187.366582] [<c0b998f0>] (schedule) from [<c015823c>] (io_schedule+0x20/0x40)
[  187.366644] [<c015823c>] (io_schedule) from [<c0220acc>]
(wait_on_page_bit+0x120/0x144)
[  187.366693] [<c0220acc>] (wait_on_page_bit) from [<c0220bc4>]
(__filemap_fdatawait_range+0xd4/0x114)
[  187.366745] [<c0220bc4>] (__filemap_fdatawait_range) from [<c0223784>]
(file_write_and_wait_range+0x98/0xb4)
[  187.366799] [<c0223784>] (file_write_and_wait_range) from [<c02c0300>]
(blkdev_fsync+0x2c/0x5c)
[  187.366857] [<c02c0300>] (blkdev_fsync) from [<c02b7ce8>]
(vfs_fsync_range+0x4c/0xb0)
[  187.366902] [<c02b7ce8>] (vfs_fsync_range) from [<c02b7dd4>] (do_fsync+0x4c/0x74)
[  187.366944] [<c02b7dd4>] (do_fsync) from [<c02b80a0>] (SyS_fsync+0x1c/0x20)
[  187.366987] [<c02b80a0>] (SyS_fsync) from [<c0101000>]
(ret_fast_syscall+0x0/0x54)
[  187.367008] Exception stack(0xd5631fa8 to 0xd5631ff0)
[  187.367041] 1fa0:                   00480fc8 00481108 00000004 00481108
00000000 00000000
[  187.367078] 1fc0: 00480fc8 00481108 00000000 00000076 00000000 7f2bb750
00483268 bec602e8
[  187.367103] 1fe0: 00000076 bec60288 b6de5e8b b6d67cf6

-- 
Dmitry

  reply	other threads:[~2018-02-21 20:50 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-11-29 13:40 [PATCH V15 00/22] mmc: Add Command Queue support Adrian Hunter
2017-11-29 13:40 ` [PATCH V15 01/22] mmc: block: No need to export mmc_cleanup_queue() Adrian Hunter
2017-11-29 13:40 ` [PATCH V15 02/22] mmc: block: Simplify cleaning up the queue Adrian Hunter
2017-11-29 13:41 ` [PATCH V15 03/22] mmc: core: Make mmc_pre_req() and mmc_post_req() available Adrian Hunter
2017-11-29 13:41 ` [PATCH V15 04/22] mmc: block: Add error-handling comments Adrian Hunter
2017-11-29 13:41 ` [PATCH V15 05/22] mmc: core: Add parameter use_blk_mq Adrian Hunter
2017-11-29 13:41 ` [PATCH V15 06/22] mmc: block: Add blk-mq support Adrian Hunter
2018-02-21 20:50   ` Dmitry Osipenko [this message]
2018-02-22  7:42     ` Adrian Hunter
2018-02-22 17:54       ` Dmitry Osipenko
2018-02-26 21:48         ` Dmitry Osipenko
2018-02-27  8:57           ` Linus Walleij
2018-02-27 12:04             ` Dmitry Osipenko
2018-02-27  9:28           ` Adrian Hunter
2018-03-01  8:55             ` EXT4 Oops (Re: [PATCH V15 06/22] mmc: block: Add blk-mq support) Adrian Hunter
2018-03-01  9:15               ` Jose R R
2018-03-01 16:07                 ` Theodore Ts'o
2018-03-01 16:04               ` Theodore Ts'o
2018-03-01 20:20                 ` Andreas Dilger
2018-03-02 16:39                   ` Dmitry Osipenko
2018-03-06  0:48                 ` Dmitry Osipenko
2017-11-29 13:41 ` [PATCH V15 07/22] mmc: block: Add CQE support Adrian Hunter
2017-11-29 13:41 ` [PATCH V15 08/22] mmc: cqhci: support for command queue enabled host Adrian Hunter
2017-11-29 13:41 ` [PATCH V15 09/22] mmc: sdhci-pci: Add CQHCI support for Intel GLK Adrian Hunter
2017-11-29 13:41 ` [PATCH V15 10/22] mmc: block: blk-mq: Add support for direct completion Adrian Hunter
2017-11-29 13:41 ` [PATCH V15 11/22] mmc: block: blk-mq: Separate card polling from recovery Adrian Hunter
2017-11-29 13:41 ` [PATCH V15 12/22] mmc: block: Make card_busy_detect() accumulate all response error bits Adrian Hunter
2017-11-29 13:41 ` [PATCH V15 13/22] mmc: block: blk-mq: Check error bits and save the exception bit when polling card busy Adrian Hunter
2017-11-29 13:41 ` [PATCH V15 14/22] mmc: block: Check the timeout correctly in card_busy_detect() Adrian Hunter
2017-11-29 13:41 ` [PATCH V15 15/22] mmc: block: Check for transfer state " Adrian Hunter
2017-11-29 13:41 ` [PATCH V15 16/22] mmc: block: Add timeout_clks when calculating timeout Adrian Hunter
2017-11-29 13:41 ` [PATCH V15 17/22] mmc: block: Reduce polling timeout from 10 minutes to 10 seconds Adrian Hunter
2017-11-29 13:41 ` [PATCH V15 18/22] mmc: block: blk-mq: Stop using legacy recovery Adrian Hunter
2017-11-29 13:41 ` [PATCH V15 19/22] mmc: mmc_test: Do not use mmc_start_areq() anymore Adrian Hunter
2017-11-29 13:41 ` [PATCH V15 20/22] mmc: core: Remove option not to use blk-mq Adrian Hunter
2017-11-29 13:41 ` [PATCH V15 21/22] mmc: block: Remove code no longer needed after the switch to blk-mq Adrian Hunter
2017-11-29 13:41 ` [PATCH V15 22/22] mmc: core: " Adrian Hunter
2017-11-29 15:47 ` [PATCH V15 00/22] mmc: Add Command Queue support Ulf Hansson
2017-12-01 13:13   ` Adrian Hunter
2017-12-05 10:10   ` Linus Walleij
2017-12-05 15:53     ` Ulf Hansson
2017-12-11 12:28   ` Ulf Hansson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bc65b542-9703-aa04-f1c7-b10584e44391@gmail.com \
    --to=digetx@gmail.com \
    --cc=Yuliy.Izrailov@sandisk.com \
    --cc=adrian.hunter@intel.com \
    --cc=alex.lemberg@sandisk.com \
    --cc=asutoshd@codeaurora.org \
    --cc=b.zolnierkie@samsung.com \
    --cc=dongas86@gmail.com \
    --cc=haibo.chen@nxp.com \
    --cc=hch@lst.de \
    --cc=jh80.chung@samsung.com \
    --cc=linus.walleij@linaro.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mmc@vger.kernel.org \
    --cc=mateusz.nowak@intel.com \
    --cc=riteshh@codeaurora.org \
    --cc=shawn.lin@rock-chips.com \
    --cc=stummala@codeaurora.org \
    --cc=ulf.hansson@linaro.org \
    --cc=vbyravarasu@nvidia.com \
    --cc=zhangfei.gao@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).