* rdma-for-next, rdma_rxe: inconsistent lock state
@ 2022-05-31 20:46 Bart Van Assche
2022-05-31 20:55 ` Pearson, Robert B
0 siblings, 1 reply; 3+ messages in thread
From: Bart Van Assche @ 2022-05-31 20:46 UTC (permalink / raw)
To: Bob Pearson; +Cc: linux-rdma
Hi Bob,
With the rdma-for-next branch (commit 9c477178a0a1 ("RDMA/rtrs-clt: Fix one
kernel-doc comment")) I see the following:
================================
WARNING: inconsistent lock state
5.18.0-dbg #4 Not tainted
--------------------------------
inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
ksoftirqd/2/25 [HC0[0]:SC1[1]:HE0:SE0] takes:
ffff888116f0d350 (&xa->xa_lock#12){+.?.}-{2:2}, at: rxe_pool_get_index+0x73/0x170 [rdma_rxe]
{SOFTIRQ-ON-W} state was registered at:
__lock_acquire+0x45b/0xce0
lock_acquire+0x18a/0x450
_raw_spin_lock+0x34/0x50
__rxe_add_to_pool+0xcc/0x140 [rdma_rxe]
rxe_alloc_pd+0x2d/0x40 [rdma_rxe]
__ib_alloc_pd+0xa3/0x270 [ib_core]
ib_mad_port_open+0x44a/0x790 [ib_core]
ib_mad_init_device+0x8e/0x110 [ib_core]
add_client_context+0x26a/0x330 [ib_core]
enable_device_and_get+0x169/0x2b0 [ib_core]
ib_register_device+0x26f/0x330 [ib_core]
rxe_register_device+0x1b4/0x1d0 [rdma_rxe]
rxe_add+0x8c/0xc0 [rdma_rxe]
rxe_net_add+0x5b/0x90 [rdma_rxe]
rxe_newlink+0x71/0x80 [rdma_rxe]
nldev_newlink+0x21e/0x370 [ib_core]
rdma_nl_rcv_msg+0x200/0x410 [ib_core]
rdma_nl_rcv+0x140/0x220 [ib_core]
netlink_unicast+0x307/0x460
netlink_sendmsg+0x422/0x750
__sys_sendto+0x1c2/0x250
__x64_sys_sendto+0x7f/0x90
do_syscall_64+0x35/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xae
irq event stamp: 71543
hardirqs last enabled at (71542): [<ffffffff810cdc28>] __local_bh_enable_ip+0x88/0xf0
hardirqs last disabled at (71543): [<ffffffff81e9d67d>] _raw_spin_lock_irqsave+0x5d/0x60
softirqs last enabled at (71532): [<ffffffff82200467>] __do_softirq+0x467/0x6e1
softirqs last disabled at (71537): [<ffffffff810cda47>] run_ksoftirqd+0x37/0x60
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&xa->xa_lock#12);
<Interrupt>
lock(&xa->xa_lock#12);
*** DEADLOCK ***
no locks held by ksoftirqd/2/25.
stack backtrace:
CPU: 2 PID: 25 Comm: ksoftirqd/2 Not tainted 5.18.0-dbg #4
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b-rebuilt.opensuse.org 04/01/2014
Call Trace:
<TASK>
show_stack+0x52/0x58
dump_stack_lvl+0x5b/0x82
dump_stack+0x10/0x12
print_usage_bug.part.0+0x29c/0x2ab
mark_lock_irq.cold+0x54/0xbf
mark_lock.part.0+0x3f5/0xa70
mark_usage+0x74/0x1a0
__lock_acquire+0x45b/0xce0
lock_acquire+0x18a/0x450
_raw_spin_lock_irqsave+0x43/0x60
rxe_pool_get_index+0x73/0x170 [rdma_rxe]
rxe_get_av+0xcc/0x140 [rdma_rxe]
rxe_requester+0x34c/0xe60 [rdma_rxe]
rxe_do_task+0xcc/0x140 [rdma_rxe]
tasklet_action_common.constprop.0+0x168/0x1b0
tasklet_action+0x42/0x60
__do_softirq+0x1d8/0x6e1
run_ksoftirqd+0x37/0x60
smpboot_thread_fn+0x302/0x410
kthread+0x183/0x1c0
ret_from_fork+0x1f/0x30
</TASK>
Is this perhaps the same issue as what I reported on May 6
(https://lore.kernel.org/all/cf8b9980-3965-a4f6-07e0-d4b25755b0db@acm.org/)?
Thanks,
Bart.
* RE: rdma-for-next, rdma_rxe: inconsistent lock state
2022-05-31 20:46 rdma-for-next, rdma_rxe: inconsistent lock state Bart Van Assche
@ 2022-05-31 20:55 ` Pearson, Robert B
2022-05-31 22:24 ` Yanjun Zhu
0 siblings, 1 reply; 3+ messages in thread
From: Pearson, Robert B @ 2022-05-31 20:55 UTC (permalink / raw)
To: Bart Van Assche, Bob Pearson; +Cc: linux-rdma
-----Original Message-----
From: Bart Van Assche <bvanassche@acm.org>
Sent: Tuesday, May 31, 2022 3:47 PM
To: Bob Pearson <rpearsonhpe@gmail.com>
Cc: linux-rdma@vger.kernel.org
Subject: rdma-for-next, rdma_rxe: inconsistent lock state
Hi Bob,
With the rdma-for-next branch (commit 9c477178a0a1 ("RDMA/rtrs-clt: Fix one kernel-doc comment")) I see the following:
================================
WARNING: inconsistent lock state
5.18.0-dbg #4 Not tainted
--------------------------------
inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
ksoftirqd/2/25 [HC0[0]:SC1[1]:HE0:SE0] takes:
ffff888116f0d350 (&xa->xa_lock#12){+.?.}-{2:2}, at: rxe_pool_get_index+0x73/0x170 [rdma_rxe]
{SOFTIRQ-ON-W} state was registered at:
__lock_acquire+0x45b/0xce0
lock_acquire+0x18a/0x450
_raw_spin_lock+0x34/0x50
__rxe_add_to_pool+0xcc/0x140 [rdma_rxe]
rxe_alloc_pd+0x2d/0x40 [rdma_rxe]
__ib_alloc_pd+0xa3/0x270 [ib_core]
ib_mad_port_open+0x44a/0x790 [ib_core]
ib_mad_init_device+0x8e/0x110 [ib_core]
add_client_context+0x26a/0x330 [ib_core]
enable_device_and_get+0x169/0x2b0 [ib_core]
ib_register_device+0x26f/0x330 [ib_core]
rxe_register_device+0x1b4/0x1d0 [rdma_rxe]
rxe_add+0x8c/0xc0 [rdma_rxe]
rxe_net_add+0x5b/0x90 [rdma_rxe]
rxe_newlink+0x71/0x80 [rdma_rxe]
nldev_newlink+0x21e/0x370 [ib_core]
rdma_nl_rcv_msg+0x200/0x410 [ib_core]
rdma_nl_rcv+0x140/0x220 [ib_core]
netlink_unicast+0x307/0x460
netlink_sendmsg+0x422/0x750
__sys_sendto+0x1c2/0x250
__x64_sys_sendto+0x7f/0x90
do_syscall_64+0x35/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xae
irq event stamp: 71543
hardirqs last enabled at (71542): [<ffffffff810cdc28>] __local_bh_enable_ip+0x88/0xf0
hardirqs last disabled at (71543): [<ffffffff81e9d67d>] _raw_spin_lock_irqsave+0x5d/0x60
softirqs last enabled at (71532): [<ffffffff82200467>] __do_softirq+0x467/0x6e1
softirqs last disabled at (71537): [<ffffffff810cda47>] run_ksoftirqd+0x37/0x60
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&xa->xa_lock#12);
<Interrupt>
lock(&xa->xa_lock#12);
*** DEADLOCK ***
no locks held by ksoftirqd/2/25.
stack backtrace:
CPU: 2 PID: 25 Comm: ksoftirqd/2 Not tainted 5.18.0-dbg #4
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b-rebuilt.opensuse.org 04/01/2014
Call Trace:
<TASK>
show_stack+0x52/0x58
dump_stack_lvl+0x5b/0x82
dump_stack+0x10/0x12
print_usage_bug.part.0+0x29c/0x2ab
mark_lock_irq.cold+0x54/0xbf
mark_lock.part.0+0x3f5/0xa70
mark_usage+0x74/0x1a0
__lock_acquire+0x45b/0xce0
lock_acquire+0x18a/0x450
_raw_spin_lock_irqsave+0x43/0x60
rxe_pool_get_index+0x73/0x170 [rdma_rxe]
rxe_get_av+0xcc/0x140 [rdma_rxe]
rxe_requester+0x34c/0xe60 [rdma_rxe]
rxe_do_task+0xcc/0x140 [rdma_rxe]
tasklet_action_common.constprop.0+0x168/0x1b0
tasklet_action+0x42/0x60
__do_softirq+0x1d8/0x6e1
run_ksoftirqd+0x37/0x60
smpboot_thread_fn+0x302/0x410
kthread+0x183/0x1c0
ret_from_fork+0x1f/0x30
</TASK>
Is this perhaps the same issue as what I reported on May 6 (https://lore.kernel.org/all/cf8b9980-3965-a4f6-07e0-d4b25755b0db@acm.org/)?
Thanks,
Bart.
(from windows)
Yes. There is a lock-level bug in rxe_pool.c that requires a patch to fix. I have one that is a temporary fix.
Zhu posted one a while ago, but it was never accepted. I don't want to step on his toes.
This is related to the "AH bug", i.e. rdmacm holding locks while calling into the verbs APIs, which is just plain evil.
I'll send you my patch.
Bob
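For readers unfamiliar with this class of lockdep report, the rule it enforces can be sketched in a few lines of userspace C. This is only a toy model, not kernel code and not the rxe_pool.c fix: the names `lock_class_model`, `ctx_model`, `model_acquire` and the `ctx_*` helpers are hypothetical, and only the two usage bits named in the report ({SOFTIRQ-ON-W} and {IN-SOFTIRQ-W}) are modelled. The first context corresponds to the plain _raw_spin_lock in __rxe_add_to_pool, the second to the tasklet path through rxe_pool_get_index.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy model (hypothetical names) of two lockdep usage bits for one lock class. */
struct lock_class_model {
	const char *name;
	bool softirq_on_w;	/* taken while softirqs were enabled */
	bool in_softirq_w;	/* taken from softirq (tasklet) context */
};

/* Execution context at the moment the lock is taken. */
struct ctx_model {
	bool in_softirq;	/* running a softirq/tasklet */
	bool softirqs_enabled;	/* bottom halves not disabled */
};

/* Process context, plain spin_lock(): softirqs can still run on this CPU. */
static struct ctx_model ctx_process_plain(void)
{
	return (struct ctx_model){ .in_softirq = false, .softirqs_enabled = true };
}

/* Process context with bottom halves or interrupts disabled first. */
static struct ctx_model ctx_process_bh_off(void)
{
	return (struct ctx_model){ .in_softirq = false, .softirqs_enabled = false };
}

/* Tasklet context, as in the rxe_requester path of the trace. */
static struct ctx_model ctx_tasklet(void)
{
	return (struct ctx_model){ .in_softirq = true, .softirqs_enabled = false };
}

/*
 * Record one acquisition. Returns false once the class has been seen both
 * with softirqs enabled and inside a softirq: a softirq could then interrupt
 * the holder on the same CPU and spin on the lock forever.
 */
static bool model_acquire(struct lock_class_model *lock, struct ctx_model ctx)
{
	assert(lock != NULL);

	if (ctx.in_softirq)
		lock->in_softirq_w = true;
	else if (ctx.softirqs_enabled)
		lock->softirq_on_w = true;

	if (lock->softirq_on_w && lock->in_softirq_w) {
		fprintf(stderr, "inconsistent lock state: %s\n", lock->name);
		return false;
	}
	return true;
}
```

In this model, an acquisition from `ctx_process_plain()` followed by one from `ctx_tasklet()` is flagged, while `ctx_process_bh_off()` followed by `ctx_tasklet()` is not. In the kernel the second pattern corresponds to using spin_lock_bh() or spin_lock_irqsave() in process context; whether that is the appropriate change for rxe_pool.c is not settled in this thread.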
* Re: rdma-for-next, rdma_rxe: inconsistent lock state
2022-05-31 20:55 ` Pearson, Robert B
@ 2022-05-31 22:24 ` Yanjun Zhu
0 siblings, 0 replies; 3+ messages in thread
From: Yanjun Zhu @ 2022-05-31 22:24 UTC (permalink / raw)
To: Pearson, Robert B, Bart Van Assche, Bob Pearson; +Cc: linux-rdma
On 2022/6/1 4:55, Pearson, Robert B wrote:
>
>
> -----Original Message-----
> From: Bart Van Assche <bvanassche@acm.org>
> Sent: Tuesday, May 31, 2022 3:47 PM
> To: Bob Pearson <rpearsonhpe@gmail.com>
> Cc: linux-rdma@vger.kernel.org
> Subject: rdma-for-next, rdma_rxe: inconsistent lock state
>
> Hi Bob,
>
> With the rdma-for-next branch (commit 9c477178a0a1 ("RDMA/rtrs-clt: Fix one kernel-doc comment")) I see the following:
>
> ================================
> WARNING: inconsistent lock state
> 5.18.0-dbg #4 Not tainted
> --------------------------------
> inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
> ksoftirqd/2/25 [HC0[0]:SC1[1]:HE0:SE0] takes:
> ffff888116f0d350 (&xa->xa_lock#12){+.?.}-{2:2}, at: rxe_pool_get_index+0x73/0x170 [rdma_rxe]
> {SOFTIRQ-ON-W} state was registered at:
> __lock_acquire+0x45b/0xce0
> lock_acquire+0x18a/0x450
> _raw_spin_lock+0x34/0x50
> __rxe_add_to_pool+0xcc/0x140 [rdma_rxe]
> rxe_alloc_pd+0x2d/0x40 [rdma_rxe]
> __ib_alloc_pd+0xa3/0x270 [ib_core]
> ib_mad_port_open+0x44a/0x790 [ib_core]
> ib_mad_init_device+0x8e/0x110 [ib_core]
> add_client_context+0x26a/0x330 [ib_core]
> enable_device_and_get+0x169/0x2b0 [ib_core]
> ib_register_device+0x26f/0x330 [ib_core]
> rxe_register_device+0x1b4/0x1d0 [rdma_rxe]
> rxe_add+0x8c/0xc0 [rdma_rxe]
> rxe_net_add+0x5b/0x90 [rdma_rxe]
> rxe_newlink+0x71/0x80 [rdma_rxe]
> nldev_newlink+0x21e/0x370 [ib_core]
> rdma_nl_rcv_msg+0x200/0x410 [ib_core]
> rdma_nl_rcv+0x140/0x220 [ib_core]
> netlink_unicast+0x307/0x460
> netlink_sendmsg+0x422/0x750
> __sys_sendto+0x1c2/0x250
> __x64_sys_sendto+0x7f/0x90
> do_syscall_64+0x35/0x80
> entry_SYSCALL_64_after_hwframe+0x44/0xae
> irq event stamp: 71543
> hardirqs last enabled at (71542): [<ffffffff810cdc28>] __local_bh_enable_ip+0x88/0xf0
> hardirqs last disabled at (71543): [<ffffffff81e9d67d>] _raw_spin_lock_irqsave+0x5d/0x60
> softirqs last enabled at (71532): [<ffffffff82200467>] __do_softirq+0x467/0x6e1
> softirqs last disabled at (71537): [<ffffffff810cda47>] run_ksoftirqd+0x37/0x60
>
> other info that might help us debug this:
> Possible unsafe locking scenario:
> CPU0
> ----
> lock(&xa->xa_lock#12);
> <Interrupt>
> lock(&xa->xa_lock#12);
>
> *** DEADLOCK ***
> no locks held by ksoftirqd/2/25.
>
> stack backtrace:
> CPU: 2 PID: 25 Comm: ksoftirqd/2 Not tainted 5.18.0-dbg #4
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b-rebuilt.opensuse.org 04/01/2014
> Call Trace:
> <TASK>
> show_stack+0x52/0x58
> dump_stack_lvl+0x5b/0x82
> dump_stack+0x10/0x12
> print_usage_bug.part.0+0x29c/0x2ab
> mark_lock_irq.cold+0x54/0xbf
> mark_lock.part.0+0x3f5/0xa70
> mark_usage+0x74/0x1a0
> __lock_acquire+0x45b/0xce0
> lock_acquire+0x18a/0x450
> _raw_spin_lock_irqsave+0x43/0x60
> rxe_pool_get_index+0x73/0x170 [rdma_rxe]
> rxe_get_av+0xcc/0x140 [rdma_rxe]
> rxe_requester+0x34c/0xe60 [rdma_rxe]
> rxe_do_task+0xcc/0x140 [rdma_rxe]
> tasklet_action_common.constprop.0+0x168/0x1b0
> tasklet_action+0x42/0x60
> __do_softirq+0x1d8/0x6e1
> run_ksoftirqd+0x37/0x60
> smpboot_thread_fn+0x302/0x410
> kthread+0x183/0x1c0
> ret_from_fork+0x1f/0x30
> </TASK>
>
> Is this perhaps the same issue as what I reported on May 6 (https://lore.kernel.org/all/cf8b9980-3965-a4f6-07e0-d4b25755b0db@acm.org/)?
>
> Thanks,
>
> Bart.
>
> (from windows)
>
> Yes. There is a lock-level bug in rxe_pool.c that requires a patch to fix. I have one that is a temporary fix.
> Zhu posted one a while ago, but it was never accepted. I don't want to step on his toes.
> This is related to the "AH bug", i.e. rdmacm holding locks while calling into the verbs APIs, which is just plain evil.
Yes, that patch was not accepted. It seems everyone expects this
problem to be fixed in your RCU patch series.
Zhu Yanjun
>
> I'll send you my patch.
>
> Bob