From: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
To: "linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>,
	Tejun Heo <tj@kernel.org>,
	Johannes Thumshirn <Johannes.Thumshirn@wdc.com>,
	Damien Le Moal <damien.lemoal@opensource.wdc.com>
Subject: lockdep WARNING at blktests block/011
Date: Fri, 30 Sep 2022 00:19:45 +0000
Message-ID: <20220930001943.zdbvolc3gkekfmcv@shindev>

Kernel v6.0-rc7 triggered the WARN "possible circular locking dependency
detected" at blktests test case block/011 for NVME devices [1]. The trigger
commit was c0feea594e05 ("workqueue: don't skip lockdep work dependency in
cancel_work_sync()"). The commit was backported to the v5.15.71 stable kernel,
and the WARN is observed with that stable kernel as well. It looks like the
possible deadlock scenario had been hidden for a long time and the commit has
now revealed it.

As the commit title says, it restored the lockdep work dependency in
cancel_work_sync(), which is called from nvme_reset_work() via
blk_sync_queue(). The test case repeatedly disables the PCI device during I/O
on the PCI-NVME device, so nvme_reset_work() is called repeatedly.
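
Condensed from the stack traces in the splat below, the reset side does
roughly the following while nvme_reset_work() runs (an illustrative sketch,
not the literal kernel source):

  struct nvme_ns *ns;

  /* nvme_reset_work() -> nvme_sync_queues() */
  down_read(&ctrl->namespaces_rwsem);
  list_for_each_entry(ns, &ctrl->namespaces, list)
          blk_sync_queue(ns->queue);  /* cancel_work_sync(&q->timeout_work) */
  up_read(&ctrl->namespaces_rwsem);

With the restored annotation, cancel_work_sync() records a lockdep dependency
from ctrl->namespaces_rwsem to q->timeout_work.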

The WARN was observed with a QEMU NVME emulation device. I also tried two of
my real NVME devices: one of them recreated the WARN, the other did not.

I tried to understand the deadlock scenario, but I cannot tell whether it is
real or a false positive. Any help understanding the scenario would be
appreciated. The lockdep splat mentions three locks (ctrl->namespaces_rwsem,
dev->shutdown_lock and q->timeout_work). In the related function call paths,
ctrl->namespaces_rwsem appears to be locked only for read, so the deadlock
scenario does not look possible to me. (Maybe I'm misunderstanding
something...)
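
For reference, here is how I read the other two dependencies in the splat:
#2 and #1 are both recorded on the timeout side, which runs inside
q->timeout_work itself (call chain condensed from [1], annotations mine):

  blk_mq_timeout_work()                        /* q->timeout_work */
    nvme_timeout()
      nvme_dev_disable()
        mutex_lock(&dev->shutdown_lock);       /* timeout_work -> shutdown_lock */
        nvme_start_freeze()
          down_read(&ctrl->namespaces_rwsem);  /* shutdown_lock -> namespaces_rwsem */

Together with the reset side above (namespaces_rwsem -> timeout_work via
cancel_work_sync()), this closes the circle that lockdep complains about.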

[1]

[ 3033.791353] run blktests block/011 at 2022-09-27 13:24:34
[ 3064.944407] nvme nvme5: controller is down; will reset: CSTS=0x2, PCI_STATUS=0x10

[ 3066.531825] ======================================================
[ 3066.533372] WARNING: possible circular locking dependency detected
[ 3066.534914] 6.0.0-rc7 #1 Not tainted
[ 3066.535891] ------------------------------------------------------
[ 3066.537492] kworker/u8:0/904 is trying to acquire lock:
[ 3066.538807] ffff888102f24788 ((work_completion)(&q->timeout_work)){+.+.}-{0:0}, at: __flush_work+0xc2/0x900
[ 3066.541161] 
               but task is already holding lock:
[ 3066.542793] ffff888116840528 (&ctrl->namespaces_rwsem){++++}-{3:3}, at: nvme_sync_queues+0x26/0x100 [nvme_core]
[ 3066.545294] 
               which lock already depends on the new lock.

[ 3066.547563] 
               the existing dependency chain (in reverse order) is:
[ 3066.549530] 
               -> #2 (&ctrl->namespaces_rwsem){++++}-{3:3}:
[ 3066.551431]        down_read+0x3e/0x50
[ 3066.552421]        nvme_start_freeze+0x24/0xb0 [nvme_core]
[ 3066.553862]        nvme_dev_disable+0x4ee/0x660 [nvme]
[ 3066.555208]        nvme_timeout.cold+0x3f2/0x699 [nvme]
[ 3066.556575]        blk_mq_check_expired+0x19a/0x270
[ 3066.557865]        bt_iter+0x21e/0x300
[ 3066.558935]        blk_mq_queue_tag_busy_iter+0x7fd/0x1240
[ 3066.560356]        blk_mq_timeout_work+0x143/0x380
[ 3066.562602]        process_one_work+0x816/0x1300
[ 3066.563847]        worker_thread+0xfc/0x1270
[ 3066.564942]        kthread+0x29b/0x340
[ 3066.565973]        ret_from_fork+0x1f/0x30
[ 3066.567083] 
               -> #1 (&dev->shutdown_lock){+.+.}-{3:3}:
[ 3066.568851]        __mutex_lock+0x14c/0x12b0
[ 3066.570010]        nvme_dev_disable+0x5d/0x660 [nvme]
[ 3066.571309]        nvme_timeout.cold+0x3f2/0x699 [nvme]
[ 3066.572678]        blk_mq_check_expired+0x19a/0x270
[ 3066.573972]        bt_iter+0x21e/0x300
[ 3066.575024]        blk_mq_queue_tag_busy_iter+0x7fd/0x1240
[ 3066.576431]        blk_mq_timeout_work+0x143/0x380
[ 3066.577688]        process_one_work+0x816/0x1300
[ 3066.578882]        worker_thread+0xfc/0x1270
[ 3066.580046]        kthread+0x29b/0x340
[ 3066.581081]        ret_from_fork+0x1f/0x30
[ 3066.582194] 
               -> #0 ((work_completion)(&q->timeout_work)){+.+.}-{0:0}:
[ 3066.584265]        __lock_acquire+0x2843/0x5520
[ 3066.585426]        lock_acquire+0x194/0x4f0
[ 3066.586458]        __flush_work+0xe2/0x900
[ 3066.587484]        __cancel_work_timer+0x27a/0x3a0
[ 3066.588568]        nvme_sync_queues+0x71/0x100 [nvme_core]
[ 3066.589782]        nvme_reset_work+0x137/0x39f0 [nvme]
[ 3066.590896]        process_one_work+0x816/0x1300
[ 3066.591895]        worker_thread+0xfc/0x1270
[ 3066.592795]        kthread+0x29b/0x340
[ 3066.593654]        ret_from_fork+0x1f/0x30
[ 3066.594569] 
               other info that might help us debug this:

[ 3066.596290] Chain exists of:
                 (work_completion)(&q->timeout_work) --> &dev->shutdown_lock --> &ctrl->namespaces_rwsem

[ 3066.598960]  Possible unsafe locking scenario:

[ 3066.600167]        CPU0                    CPU1
[ 3066.600985]        ----                    ----
[ 3066.601779]   lock(&ctrl->namespaces_rwsem);
[ 3066.602506]                                lock(&dev->shutdown_lock);
[ 3066.603597]                                lock(&ctrl->namespaces_rwsem);
[ 3066.604739]   lock((work_completion)(&q->timeout_work));
[ 3066.605653] 
                *** DEADLOCK ***

[ 3066.606728] 3 locks held by kworker/u8:0/904:
[ 3066.607436]  #0: ffff888111fe3138 ((wq_completion)nvme-reset-wq){+.+.}-{0:0}, at: process_one_work+0x73c/0x1300
[ 3066.608959]  #1: ffff88811ddcfdd0 ((work_completion)(&dev->ctrl.reset_work)){+.+.}-{0:0}, at: process_one_work+0x769/0x1300
[ 3066.610491]  #2: ffff888116840528 (&ctrl->namespaces_rwsem){++++}-{3:3}, at: nvme_sync_queues+0x26/0x100 [nvme_core]
[ 3066.612010] 
               stack backtrace:
[ 3066.612819] CPU: 1 PID: 904 Comm: kworker/u8:0 Not tainted 6.0.0-rc7 #1
[ 3066.613951] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[ 3066.615613] Workqueue: nvme-reset-wq nvme_reset_work [nvme]
[ 3066.616532] Call Trace:
[ 3066.616985]  <TASK>
[ 3066.617378]  dump_stack_lvl+0x5b/0x77
[ 3066.618057]  check_noncircular+0x242/0x2f0
[ 3066.618734]  ? lock_chain_count+0x20/0x20
[ 3066.727613]  ? print_circular_bug+0x1e0/0x1e0
[ 3066.950865]  ? unwind_next_frame+0x21b/0x1e60
[ 3066.951388]  ? ret_from_fork+0x1f/0x30
[ 3066.951855]  ? lockdep_lock+0xbc/0x1a0
[ 3066.952361]  ? call_rcu_zapped+0xc0/0xc0
[ 3066.952835]  __lock_acquire+0x2843/0x5520
[ 3066.953315]  ? stack_trace_save+0x81/0xa0
[ 3066.953845]  ? lockdep_hardirqs_on_prepare+0x410/0x410
[ 3066.954439]  lock_acquire+0x194/0x4f0
[ 3066.954931]  ? __flush_work+0xc2/0x900
[ 3066.955389]  ? lock_downgrade+0x6b0/0x6b0
[ 3066.955865]  ? lockdep_unlock+0xf2/0x250
[ 3066.956380]  ? __lock_acquire+0x2080/0x5520
[ 3066.956878]  __flush_work+0xe2/0x900
[ 3066.957316]  ? __flush_work+0xc2/0x900
[ 3066.957813]  ? mark_lock+0xee/0x1610
[ 3066.958249]  ? queue_delayed_work_on+0x90/0x90
[ 3066.958774]  ? lock_chain_count+0x20/0x20
[ 3066.959312]  ? lock_acquire+0x194/0x4f0
[ 3066.959776]  ? lock_acquire+0x1a4/0x4f0
[ 3066.960233]  ? lock_is_held_type+0xe3/0x140
[ 3066.960777]  ? find_held_lock+0x2d/0x110
[ 3066.961247]  ? mark_held_locks+0x9e/0xe0
[ 3066.961707]  ? lockdep_hardirqs_on_prepare+0x17b/0x410
[ 3066.962330]  __cancel_work_timer+0x27a/0x3a0
[ 3066.962832]  ? cancel_delayed_work+0x10/0x10
[ 3066.963381]  ? _raw_spin_unlock_irqrestore+0x40/0x60
[ 3066.963952]  ? try_to_del_timer_sync+0xd0/0xd0
[ 3066.964467]  nvme_sync_queues+0x71/0x100 [nvme_core]
[ 3066.965087]  nvme_reset_work+0x137/0x39f0 [nvme]
[ 3066.965627]  ? __lock_acquire+0x1593/0x5520
[ 3066.966173]  ? nvme_queue_rqs+0xa80/0xa80 [nvme]
[ 3066.966713]  ? lock_acquire+0x194/0x4f0
[ 3066.967180]  ? lock_acquire+0x1a4/0x4f0
[ 3066.967698]  ? lock_downgrade+0x6b0/0x6b0
[ 3066.968176]  ? reacquire_held_locks+0x4e0/0x4e0
[ 3066.968751]  ? lock_is_held_type+0xe3/0x140
[ 3066.969247]  process_one_work+0x816/0x1300
[ 3066.969746]  ? lock_downgrade+0x6b0/0x6b0
[ 3066.970279]  ? pwq_dec_nr_in_flight+0x230/0x230
[ 3066.970818]  ? rwlock_bug.part.0+0x90/0x90
[ 3066.971303]  worker_thread+0xfc/0x1270
[ 3066.971818]  ? process_one_work+0x1300/0x1300
[ 3066.972330]  kthread+0x29b/0x340
[ 3066.972746]  ? kthread_complete_and_exit+0x20/0x20
[ 3066.973345]  ret_from_fork+0x1f/0x30
[ 3066.973791]  </TASK>
[ 3067.963008] nvme nvme5: 4/0/0 default/read/poll queues
[ 3067.972454] nvme nvme5: Ignoring bogus Namespace Identifiers
[ 3098.224067] nvme nvme5: I/O 936 (I/O Cmd) QID 2 timeout, aborting
[ 3098.225903] nvme nvme5: I/O 598 (I/O Cmd) QID 4 timeout, aborting
[ 3098.226229] nvme nvme5: Abort status: 0x0
[ 3098.227499] nvme nvme5: I/O 599 (I/O Cmd) QID 4 timeout, aborting
[ 3098.227872] nvme nvme5: Abort status: 0x0
[ 3098.231401] nvme nvme5: Abort status: 0x0
[ 3128.432035] nvme nvme5: I/O 936 QID 2 timeout, reset controller
[ 3130.808541] nvme nvme5: 4/0/0 default/read/poll queues
[ 3130.816416] nvme nvme5: Ignoring bogus Namespace Identifiers
[ 3161.199887] nvme nvme5: I/O 936 (I/O Cmd) QID 2 timeout, aborting
[ 3161.201909] nvme nvme5: Abort status: 0x0
[ 3191.407811] nvme nvme5: I/O 936 QID 2 timeout, reset controller
[ 3254.895644] nvme nvme5: controller is down; will reset: CSTS=0x2, PCI_STATUS=0x10
[ 3257.299155] nvme nvme5: 4/0/0 default/read/poll queues
[ 3257.312480] nvme nvme5: Ignoring bogus Namespace Identifiers

-- 
Shin'ichiro Kawasaki
