From: Sagi Grimberg <sagi@grimberg.me>
To: Yi Zhang <yi.zhang@redhat.com>
Cc: linux-nvme@lists.infradead.org, Christoph Hellwig <hch@lst.de>,
	Keith Busch <kbusch@kernel.org>,
	Chaitanya Kulkarni <Chaitanya.Kulkarni@wdc.com>
Subject: Re: [PATCH 2/2] nvmet-tcp: Fix incorrect locking in state_change sk callback
Date: Thu, 25 Mar 2021 15:39:55 -0700
Message-ID: <318e8d36-ffea-8add-ef72-d66836c07693@grimberg.me>
In-Reply-To: <4fe7519f-b93e-a9b9-841d-56f0e3b647c4@redhat.com>


> Hi Sagi
> With the two patches applied, I reproduced another lock dependency issue; here is

Hey Yi,

This one is different, but the fixes for the other one are still valid.

I'll look into this one as well...

> the full log:
> 
> [  143.310362] run blktests nvme/003 at 2021-03-23 21:52:15
> [  143.927284] loop: module loaded
> [  144.027532] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
> [  144.059070] nvmet_tcp: enabling port 0 (127.0.0.1:4420)
> [  144.201559] nvmet: creating controller 1 for subsystem nqn.2014-08.org.nvmexpress.discovery for NQN nqn.2014-08.org.nvmexpress:uuid:e25db33098f14032b70b755db1976647.
> [  144.211644] nvme nvme1: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 127.0.0.1:4420
> [  154.400575] nvme nvme1: Removing ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery"
> 
> [  154.407970] ======================================================
> [  154.414871] WARNING: possible circular locking dependency detected
> [  154.421765] 5.12.0-rc3.fix+ #2 Not tainted
> [  154.426340] ------------------------------------------------------
> [  154.433232] kworker/7:2/260 is trying to acquire lock:
> [  154.438972] ffff888288e92030 ((work_completion)(&queue->io_work)){+.+.}-{0:0}, at: __flush_work+0x118/0x1a0
> [  154.449882]
>                 but task is already holding lock:
> [  154.456395] ffffc90002b57db0 ((work_completion)(&queue->release_work)){+.+.}-{0:0}, at: process_one_work+0x7c1/0x1480
> [  154.468263]
>                 which lock already depends on the new lock.
> 
> [  154.477393]
>                 the existing dependency chain (in reverse order) is:
> [  154.485739]
>                 -> #2 ((work_completion)(&queue->release_work)){+.+.}-{0:0}:
> [  154.494884]        __lock_acquire+0xb77/0x18d0
> [  154.499853]        lock_acquire+0x1ca/0x480
> [  154.504528]        process_one_work+0x813/0x1480
> [  154.509688]        worker_thread+0x590/0xf80
> [  154.514458]        kthread+0x368/0x440
> [  154.518650]        ret_from_fork+0x22/0x30
> [  154.523232]
>                 -> #1 ((wq_completion)events){+.+.}-{0:0}:
> [  154.530633]        __lock_acquire+0xb77/0x18d0
> [  154.535597]        lock_acquire+0x1ca/0x480
> [  154.540272]        flush_workqueue+0x101/0x1250
> [  154.545334]        nvmet_tcp_install_queue+0x22c/0x2a0 [nvmet_tcp]
> [  154.552242]        nvmet_install_queue+0x2a3/0x360 [nvmet]
> [  154.558387]        nvmet_execute_admin_connect+0x321/0x420 [nvmet]
> [  154.565305]        nvmet_tcp_io_work+0xa04/0xcfb [nvmet_tcp]
> [  154.571629]        process_one_work+0x8b2/0x1480
> [  154.576787]        worker_thread+0x590/0xf80
> [  154.581560]        kthread+0x368/0x440
> [  154.585749]        ret_from_fork+0x22/0x30
> [  154.590328]
>                 -> #0 ((work_completion)(&queue->io_work)){+.+.}-{0:0}:
> [  154.598989]        check_prev_add+0x15e/0x20f0
> [  154.603953]        validate_chain+0xec9/0x19c0
> [  154.608918]        __lock_acquire+0xb77/0x18d0
> [  154.613883]        lock_acquire+0x1ca/0x480
> [  154.618556]        __flush_work+0x139/0x1a0
> [  154.623229]        nvmet_tcp_release_queue_work+0x2e5/0xcb0 [nvmet_tcp]
> [  154.630621]        process_one_work+0x8b2/0x1480
> [  154.635780]        worker_thread+0x590/0xf80
> [  154.640549]        kthread+0x368/0x440
> [  154.644741]        ret_from_fork+0x22/0x30
> [  154.649321]
>                 other info that might help us debug this:
> 
> [  154.658257] Chain exists of:
>                   (work_completion)(&queue->io_work) --> (wq_completion)events --> (work_completion)(&queue->release_work)
> 
> [  154.675070]  Possible unsafe locking scenario:
> 
> [  154.681679]        CPU0                    CPU1
> [  154.686728]        ----                    ----
> [  154.691776]   lock((work_completion)(&queue->release_work));
> [  154.698102]                                lock((wq_completion)events);
> [  154.705493]                                lock((work_completion)(&queue->release_work));
> [  154.714631]   lock((work_completion)(&queue->io_work));
> [  154.720470]
>                  *** DEADLOCK ***
> 
> [  154.727080] 2 locks held by kworker/7:2/260:
> [  154.731849]  #0: ffff888100053148 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x792/0x1480
> [  154.742458]  #1: ffffc90002b57db0 ((work_completion)(&queue->release_work)){+.+.}-{0:0}, at: process_one_work+0x7c1/0x1480
> [  154.754809]
>                 stack backtrace:
> [  154.759674] CPU: 7 PID: 260 Comm: kworker/7:2 Not tainted 5.12.0-rc3.fix+ #2
> [  154.767549] Hardware name: Dell Inc. PowerEdge R730xd/\xc9\xb2\xdePow, BIOS 2.12.1 12/04/2020
> [  154.776197] Workqueue: events nvmet_tcp_release_queue_work [nvmet_tcp]
> [  154.783497] Call Trace:
> [  154.786231]  dump_stack+0x93/0xc2
> [  154.789942]  check_noncircular+0x26a/0x310
> [  154.794521]  ? print_circular_bug+0x460/0x460
> [  154.799391]  ? deref_stack_reg+0x170/0x170
> [  154.803967]  ? alloc_chain_hlocks+0x1de/0x520
> [  154.808843]  check_prev_add+0x15e/0x20f0
> [  154.813231]  validate_chain+0xec9/0x19c0
> [  154.817611]  ? check_prev_add+0x20f0/0x20f0
> [  154.822286]  ? save_trace+0x88/0x5e0
> [  154.826290]  __lock_acquire+0xb77/0x18d0
> [  154.830682]  lock_acquire+0x1ca/0x480
> [  154.834775]  ? __flush_work+0x118/0x1a0
> [  154.839066]  ? rcu_read_unlock+0x40/0x40
> [  154.843455]  ? __lock_acquire+0xb77/0x18d0
> [  154.848036]  __flush_work+0x139/0x1a0
> [  154.852120]  ? __flush_work+0x118/0x1a0
> [  154.856409]  ? start_flush_work+0x810/0x810
> [  154.861084]  ? mark_lock+0xd3/0x1470
> [  154.865082]  ? mark_lock_irq+0x1d10/0x1d10
> [  154.869662]  ? lock_downgrade+0x100/0x100
> [  154.874147]  ? mark_held_locks+0xa5/0xe0
> [  154.878522]  ? sk_stream_wait_memory+0xe40/0xe40
> [  154.883686]  ? lockdep_hardirqs_on_prepare.part.0+0x198/0x340
> [  154.890394]  ? __local_bh_enable_ip+0xa2/0x100
> [  154.895358]  ? trace_hardirqs_on+0x1c/0x160
> [  154.900034]  ? sk_stream_wait_memory+0xe40/0xe40
> [  154.905192]  nvmet_tcp_release_queue_work+0x2e5/0xcb0 [nvmet_tcp]
> [  154.911999]  ? lock_is_held_type+0x9a/0x110
> [  154.916676]  process_one_work+0x8b2/0x1480
> [  154.921255]  ? pwq_dec_nr_in_flight+0x260/0x260
> [  154.926315]  ? __lock_contended+0x910/0x910
> [  154.930990]  ? worker_thread+0x150/0xf80
> [  154.935374]  worker_thread+0x590/0xf80
> [  154.939564]  ? __kthread_parkme+0xcb/0x1b0
> [  154.944140]  ? process_one_work+0x1480/0x1480
> [  154.949007]  kthread+0x368/0x440
> [  154.952615]  ? _raw_spin_unlock_irq+0x24/0x30
> [  154.957482]  ? __kthread_bind_mask+0x90/0x90
> [  154.962255]  ret_from_fork+0x22/0x30
> 
> 
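To spell out what lockdep is complaining about here: nvmet_tcp_release_queue_work()
runs on the system "events" workqueue and flush_work()s queue->io_work, while
queue->io_work, via the flush in nvmet_tcp_install_queue() (the #1 trace above),
flushes the entire "events" workqueue, which may still hold the very release_work
that is waiting on it. Below is a minimal, self-contained sketch of just that
pattern; it is a hypothetical demo module, and io_fn/release_fn are stand-ins for
the real handlers, not the actual nvmet-tcp code:

  #include <linux/module.h>
  #include <linux/workqueue.h>

  static struct work_struct io_work;      /* stands in for queue->io_work */
  static struct work_struct release_work; /* stands in for queue->release_work */

  static void io_fn(struct work_struct *w)
  {
          /*
           * Like the flush in nvmet_tcp_install_queue(): wait for everything
           * queued on the system "events" workqueue, including release_work
           * if it is queued or running behind us.
           */
          flush_workqueue(system_wq);
  }

  static void release_fn(struct work_struct *w)
  {
          /*
           * Like nvmet_tcp_release_queue_work(): runs *on* the "events"
           * workqueue and waits for io_work, closing the
           * io_work -> (wq_completion)events -> release_work -> io_work cycle.
           */
          flush_work(&io_work);
  }

  static int __init flush_cycle_demo_init(void)
  {
          INIT_WORK(&io_work, io_fn);
          INIT_WORK(&release_work, release_fn);
          schedule_work(&io_work);        /* both land on system_wq ("events") */
          schedule_work(&release_work);
          return 0;
  }
  module_init(flush_cycle_demo_init);

  static void __exit flush_cycle_demo_exit(void)
  {
          cancel_work_sync(&release_work);
          cancel_work_sync(&io_work);
  }
  module_exit(flush_cycle_demo_exit);

  MODULE_LICENSE("GPL");

The cycle only turns into an actual hang when both work items are in flight at
the same time, which is why lockdep reports it as a possible circular dependency
rather than this particular run deadlocking.
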
> On 3/21/21 3:08 PM, Sagi Grimberg wrote:
>> We are not changing anything in the TCP connection state, so we
>> should take a read lock rather than a write lock.
>>
>> The write lock caused a deadlock when running nvmet-tcp and nvme-tcp
>> on the same system, where the state_change callbacks on the host side
>> and on the controller side have a causal relationship, which made
>> lockdep report it with blktests:
> 
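
For reference, the change quoted above boils down to taking sk_callback_lock
for read instead of write in the nvmet-tcp state_change callback, since the
callback does not modify the TCP connection state; it only looks at
sk_user_data and sk->sk_state. A simplified sketch of that shape (trimmed,
not the literal patch):

  static void nvmet_tcp_state_change(struct sock *sk)
  {
          struct nvmet_tcp_queue *queue;

          read_lock_bh(&sk->sk_callback_lock);    /* was write_lock_bh() */
          queue = sk->sk_user_data;
          if (queue && (sk->sk_state == TCP_FIN_WAIT1 ||
                        sk->sk_state == TCP_CLOSE_WAIT ||
                        sk->sk_state == TCP_CLOSE))
                  /* actual teardown happens later, from release_work */
                  nvmet_tcp_schedule_release_queue(queue);
          read_unlock_bh(&sk->sk_callback_lock);  /* was write_unlock_bh() */
  }

The callback only schedules the release work; the teardown itself still runs
from nvmet_tcp_release_queue_work() outside the callback, as the trace above
shows.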
