* rdma-for-next, rdma_rxe: inconsistent lock state
@ 2022-05-31 20:46 Bart Van Assche
  2022-05-31 20:55 ` Pearson, Robert B
  0 siblings, 1 reply; 3+ messages in thread
From: Bart Van Assche @ 2022-05-31 20:46 UTC (permalink / raw)
  To: Bob Pearson; +Cc: linux-rdma

Hi Bob,

With the rdma-for-next branch (commit 9c477178a0a1 ("RDMA/rtrs-clt: Fix one
kernel-doc comment")) I see the following:

================================
WARNING: inconsistent lock state
5.18.0-dbg #4 Not tainted
--------------------------------
inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
ksoftirqd/2/25 [HC0[0]:SC1[1]:HE0:SE0] takes:
ffff888116f0d350 (&xa->xa_lock#12){+.?.}-{2:2}, at: rxe_pool_get_index+0x73/0x170 [rdma_rxe]
{SOFTIRQ-ON-W} state was registered at:
   __lock_acquire+0x45b/0xce0
   lock_acquire+0x18a/0x450
   _raw_spin_lock+0x34/0x50
   __rxe_add_to_pool+0xcc/0x140 [rdma_rxe]
   rxe_alloc_pd+0x2d/0x40 [rdma_rxe]
   __ib_alloc_pd+0xa3/0x270 [ib_core]
   ib_mad_port_open+0x44a/0x790 [ib_core]
   ib_mad_init_device+0x8e/0x110 [ib_core]
   add_client_context+0x26a/0x330 [ib_core]
   enable_device_and_get+0x169/0x2b0 [ib_core]
   ib_register_device+0x26f/0x330 [ib_core]
   rxe_register_device+0x1b4/0x1d0 [rdma_rxe]
   rxe_add+0x8c/0xc0 [rdma_rxe]
   rxe_net_add+0x5b/0x90 [rdma_rxe]
   rxe_newlink+0x71/0x80 [rdma_rxe]
   nldev_newlink+0x21e/0x370 [ib_core]
   rdma_nl_rcv_msg+0x200/0x410 [ib_core]
   rdma_nl_rcv+0x140/0x220 [ib_core]
   netlink_unicast+0x307/0x460
   netlink_sendmsg+0x422/0x750
   __sys_sendto+0x1c2/0x250
   __x64_sys_sendto+0x7f/0x90
   do_syscall_64+0x35/0x80
   entry_SYSCALL_64_after_hwframe+0x44/0xae
irq event stamp: 71543
hardirqs last  enabled at (71542): [<ffffffff810cdc28>] __local_bh_enable_ip+0x88/0xf0
hardirqs last disabled at (71543): [<ffffffff81e9d67d>] _raw_spin_lock_irqsave+0x5d/0x60
softirqs last  enabled at (71532): [<ffffffff82200467>] __do_softirq+0x467/0x6e1
softirqs last disabled at (71537): [<ffffffff810cda47>] run_ksoftirqd+0x37/0x60

other info that might help us debug this:
  Possible unsafe locking scenario:
        CPU0
        ----
   lock(&xa->xa_lock#12);
   <Interrupt>
     lock(&xa->xa_lock#12);

  *** DEADLOCK ***
no locks held by ksoftirqd/2/25.

stack backtrace:
CPU: 2 PID: 25 Comm: ksoftirqd/2 Not tainted 5.18.0-dbg #4
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b-rebuilt.opensuse.org 04/01/2014
Call Trace:
  <TASK>
  show_stack+0x52/0x58
  dump_stack_lvl+0x5b/0x82
  dump_stack+0x10/0x12
  print_usage_bug.part.0+0x29c/0x2ab
  mark_lock_irq.cold+0x54/0xbf
  mark_lock.part.0+0x3f5/0xa70
  mark_usage+0x74/0x1a0
  __lock_acquire+0x45b/0xce0
  lock_acquire+0x18a/0x450
  _raw_spin_lock_irqsave+0x43/0x60
  rxe_pool_get_index+0x73/0x170 [rdma_rxe]
  rxe_get_av+0xcc/0x140 [rdma_rxe]
  rxe_requester+0x34c/0xe60 [rdma_rxe]
  rxe_do_task+0xcc/0x140 [rdma_rxe]
  tasklet_action_common.constprop.0+0x168/0x1b0
  tasklet_action+0x42/0x60
  __do_softirq+0x1d8/0x6e1
  run_ksoftirqd+0x37/0x60
  smpboot_thread_fn+0x302/0x410
  kthread+0x183/0x1c0
  ret_from_fork+0x1f/0x30
  </TASK>
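
To spell out the scenario lockdep is predicting, here is a minimal sketch of the
two paths with a stand-in lock (function and lock names are illustrative only,
not the actual rxe_pool.c code):

#include <linux/spinlock.h>

static DEFINE_SPINLOCK(pool_lock);      /* stands in for &xa->xa_lock */

/* Process-context path (cf. __rxe_add_to_pool above): the lock is taken
 * with plain spin_lock(), so softirqs stay enabled while it is held. */
static void setup_path(void)
{
        spin_lock(&pool_lock);
        /* ... insert object into the pool ... */
        spin_unlock(&pool_lock);
}

/* Softirq path (cf. rxe_pool_get_index from the tasklet): the same lock is
 * taken in softirq context.  If the tasklet runs on the CPU that is inside
 * setup_path(), it spins on a lock held by the task it just interrupted --
 * the deadlock shown in the scenario above. */
static void softirq_path(void)
{
        unsigned long flags;

        spin_lock_irqsave(&pool_lock, flags);
        /* ... look up object by index ... */
        spin_unlock_irqrestore(&pool_lock, flags);
}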

Is this perhaps the same issue as what I reported on May 6
(https://lore.kernel.org/all/cf8b9980-3965-a4f6-07e0-d4b25755b0db@acm.org/)?

Thanks,

Bart.

* RE: rdma-for-next, rdma_rxe: inconsistent lock state
  2022-05-31 20:46 rdma-for-next, rdma_rxe: inconsistent lock state Bart Van Assche
@ 2022-05-31 20:55 ` Pearson, Robert B
  2022-05-31 22:24   ` Yanjun Zhu
  0 siblings, 1 reply; 3+ messages in thread
From: Pearson, Robert B @ 2022-05-31 20:55 UTC (permalink / raw)
  To: Bart Van Assche, Bob Pearson; +Cc: linux-rdma



-----Original Message-----
From: Bart Van Assche <bvanassche@acm.org> 
Sent: Tuesday, May 31, 2022 3:47 PM
To: Bob Pearson <rpearsonhpe@gmail.com>
Cc: linux-rdma@vger.kernel.org
Subject: rdma-for-next, rdma_rxe: inconsistent lock state

Hi Bob,

With the rdma-for-next branch (commit 9c477178a0a1 ("RDMA/rtrs-clt: Fix one kernel-doc comment")) I see the following:

================================
WARNING: inconsistent lock state
5.18.0-dbg #4 Not tainted
--------------------------------
inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
ksoftirqd/2/25 [HC0[0]:SC1[1]:HE0:SE0] takes:
ffff888116f0d350 (&xa->xa_lock#12){+.?.}-{2:2}, at: rxe_pool_get_index+0x73/0x170 [rdma_rxe]
{SOFTIRQ-ON-W} state was registered at:
   __lock_acquire+0x45b/0xce0
   lock_acquire+0x18a/0x450
   _raw_spin_lock+0x34/0x50
   __rxe_add_to_pool+0xcc/0x140 [rdma_rxe]
   rxe_alloc_pd+0x2d/0x40 [rdma_rxe]
   __ib_alloc_pd+0xa3/0x270 [ib_core]
   ib_mad_port_open+0x44a/0x790 [ib_core]
   ib_mad_init_device+0x8e/0x110 [ib_core]
   add_client_context+0x26a/0x330 [ib_core]
   enable_device_and_get+0x169/0x2b0 [ib_core]
   ib_register_device+0x26f/0x330 [ib_core]
   rxe_register_device+0x1b4/0x1d0 [rdma_rxe]
   rxe_add+0x8c/0xc0 [rdma_rxe]
   rxe_net_add+0x5b/0x90 [rdma_rxe]
   rxe_newlink+0x71/0x80 [rdma_rxe]
   nldev_newlink+0x21e/0x370 [ib_core]
   rdma_nl_rcv_msg+0x200/0x410 [ib_core]
   rdma_nl_rcv+0x140/0x220 [ib_core]
   netlink_unicast+0x307/0x460
   netlink_sendmsg+0x422/0x750
   __sys_sendto+0x1c2/0x250
   __x64_sys_sendto+0x7f/0x90
   do_syscall_64+0x35/0x80
   entry_SYSCALL_64_after_hwframe+0x44/0xae
irq event stamp: 71543
hardirqs last  enabled at (71542): [<ffffffff810cdc28>] __local_bh_enable_ip+0x88/0xf0
hardirqs last disabled at (71543): [<ffffffff81e9d67d>] _raw_spin_lock_irqsave+0x5d/0x60
softirqs last  enabled at (71532): [<ffffffff82200467>] __do_softirq+0x467/0x6e1
softirqs last disabled at (71537): [<ffffffff810cda47>] run_ksoftirqd+0x37/0x60

other info that might help us debug this:
  Possible unsafe locking scenario:
        CPU0
        ----
   lock(&xa->xa_lock#12);
   <Interrupt>
     lock(&xa->xa_lock#12);

  *** DEADLOCK ***
no locks held by ksoftirqd/2/25.

stack backtrace:
CPU: 2 PID: 25 Comm: ksoftirqd/2 Not tainted 5.18.0-dbg #4
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b-rebuilt.opensuse.org 04/01/2014
Call Trace:
  <TASK>
  show_stack+0x52/0x58
  dump_stack_lvl+0x5b/0x82
  dump_stack+0x10/0x12
  print_usage_bug.part.0+0x29c/0x2ab
  mark_lock_irq.cold+0x54/0xbf
  mark_lock.part.0+0x3f5/0xa70
  mark_usage+0x74/0x1a0
  __lock_acquire+0x45b/0xce0
  lock_acquire+0x18a/0x450
  _raw_spin_lock_irqsave+0x43/0x60
  rxe_pool_get_index+0x73/0x170 [rdma_rxe]
  rxe_get_av+0xcc/0x140 [rdma_rxe]
  rxe_requester+0x34c/0xe60 [rdma_rxe]
  rxe_do_task+0xcc/0x140 [rdma_rxe]
  tasklet_action_common.constprop.0+0x168/0x1b0
  tasklet_action+0x42/0x60
  __do_softirq+0x1d8/0x6e1
  run_ksoftirqd+0x37/0x60
  smpboot_thread_fn+0x302/0x410
  kthread+0x183/0x1c0
  ret_from_fork+0x1f/0x30
  </TASK>

Is this perhaps the same issue as what I reported on May 6 (https://lore.kernel.org/all/cf8b9980-3965-a4f6-07e0-d4b25755b0db@acm.org/)?

Thanks,

Bart.

(from Windows)

Yes. There is a lock-level bug in rxe_pool.c that requires a patch to fix. I have one that is a temporary fix.
Zhu had one that he posted a while ago, but it was never accepted. I don't want to step on his toes.
This is related to the "AH bug", i.e. rdmacm holding locks while calling into the verbs APIs, which is just plain evil.
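
For context, the usual cure for this kind of splat is to take the xa_lock with
softirqs disabled on the process-context side too, along these lines (a rough
sketch with illustrative names, not taken from either pending patch):

#include <linux/xarray.h>

/* Sketch only: allocate a pool index while holding xa_lock with softirqs
 * disabled (the _bh variant), so a tasklet that takes the same lock can
 * never interrupt a holder on this CPU.  Names are illustrative. */
static int pool_add(struct xarray *xa, void *obj, u32 *index, u32 *next)
{
        return xa_alloc_cyclic_bh(xa, index, obj, XA_LIMIT(1, U32_MAX - 1),
                                  next, GFP_KERNEL);
}

The _irq variants would work as well; the point is that the softirq-enabled and
softirq-disabled users of the same lock have to agree.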

I'll send you my patch.

Bob

* Re: rdma-for-next, rdma_rxe: inconsistent lock state
  2022-05-31 20:55 ` Pearson, Robert B
@ 2022-05-31 22:24   ` Yanjun Zhu
  0 siblings, 0 replies; 3+ messages in thread
From: Yanjun Zhu @ 2022-05-31 22:24 UTC (permalink / raw)
  To: Pearson, Robert B, Bart Van Assche, Bob Pearson; +Cc: linux-rdma

On 2022/6/1 4:55, Pearson, Robert B wrote:
> 
> 
> -----Original Message-----
> From: Bart Van Assche <bvanassche@acm.org>
> Sent: Tuesday, May 31, 2022 3:47 PM
> To: Bob Pearson <rpearsonhpe@gmail.com>
> Cc: linux-rdma@vger.kernel.org
> Subject: rdma-for-next, rdma_rxe: inconsistent lock state
> 
> Hi Bob,
> 
> With the rdma-for-next branch (commit 9c477178a0a1 ("RDMA/rtrs-clt: Fix one kernel-doc comment")) I see the following:
> 
> ================================
> WARNING: inconsistent lock state
> 5.18.0-dbg #4 Not tainted
> --------------------------------
> inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
> ksoftirqd/2/25 [HC0[0]:SC1[1]:HE0:SE0] takes:
> ffff888116f0d350 (&xa->xa_lock#12){+.?.}-{2:2}, at: rxe_pool_get_index+0x73/0x170 [rdma_rxe]
> {SOFTIRQ-ON-W} state was registered at:
>     __lock_acquire+0x45b/0xce0
>     lock_acquire+0x18a/0x450
>     _raw_spin_lock+0x34/0x50
>     __rxe_add_to_pool+0xcc/0x140 [rdma_rxe]
>     rxe_alloc_pd+0x2d/0x40 [rdma_rxe]
>     __ib_alloc_pd+0xa3/0x270 [ib_core]
>     ib_mad_port_open+0x44a/0x790 [ib_core]
>     ib_mad_init_device+0x8e/0x110 [ib_core]
>     add_client_context+0x26a/0x330 [ib_core]
>     enable_device_and_get+0x169/0x2b0 [ib_core]
>     ib_register_device+0x26f/0x330 [ib_core]
>     rxe_register_device+0x1b4/0x1d0 [rdma_rxe]
>     rxe_add+0x8c/0xc0 [rdma_rxe]
>     rxe_net_add+0x5b/0x90 [rdma_rxe]
>     rxe_newlink+0x71/0x80 [rdma_rxe]
>     nldev_newlink+0x21e/0x370 [ib_core]
>     rdma_nl_rcv_msg+0x200/0x410 [ib_core]
>     rdma_nl_rcv+0x140/0x220 [ib_core]
>     netlink_unicast+0x307/0x460
>     netlink_sendmsg+0x422/0x750
>     __sys_sendto+0x1c2/0x250
>     __x64_sys_sendto+0x7f/0x90
>     do_syscall_64+0x35/0x80
>     entry_SYSCALL_64_after_hwframe+0x44/0xae
> irq event stamp: 71543
> hardirqs last  enabled at (71542): [<ffffffff810cdc28>] __local_bh_enable_ip+0x88/0xf0
> hardirqs last disabled at (71543): [<ffffffff81e9d67d>] _raw_spin_lock_irqsave+0x5d/0x60
> softirqs last  enabled at (71532): [<ffffffff82200467>] __do_softirq+0x467/0x6e1
> softirqs last disabled at (71537): [<ffffffff810cda47>] run_ksoftirqd+0x37/0x60
> 
> other info that might help us debug this:
>    Possible unsafe locking scenario:
>          CPU0
>          ----
>     lock(&xa->xa_lock#12);
>     <Interrupt>
>       lock(&xa->xa_lock#12);
> 
>    *** DEADLOCK ***
> no locks held by ksoftirqd/2/25.
> 
> stack backtrace:
> CPU: 2 PID: 25 Comm: ksoftirqd/2 Not tainted 5.18.0-dbg #4
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b-rebuilt.opensuse.org 04/01/2014
> Call Trace:
>    <TASK>
>    show_stack+0x52/0x58
>    dump_stack_lvl+0x5b/0x82
>    dump_stack+0x10/0x12
>    print_usage_bug.part.0+0x29c/0x2ab
>    mark_lock_irq.cold+0x54/0xbf
>    mark_lock.part.0+0x3f5/0xa70
>    mark_usage+0x74/0x1a0
>    __lock_acquire+0x45b/0xce0
>    lock_acquire+0x18a/0x450
>    _raw_spin_lock_irqsave+0x43/0x60
>    rxe_pool_get_index+0x73/0x170 [rdma_rxe]
>    rxe_get_av+0xcc/0x140 [rdma_rxe]
>    rxe_requester+0x34c/0xe60 [rdma_rxe]
>    rxe_do_task+0xcc/0x140 [rdma_rxe]
>    tasklet_action_common.constprop.0+0x168/0x1b0
>    tasklet_action+0x42/0x60
>    __do_softirq+0x1d8/0x6e1
>    run_ksoftirqd+0x37/0x60
>    smpboot_thread_fn+0x302/0x410
>    kthread+0x183/0x1c0
>    ret_from_fork+0x1f/0x30
>    </TASK>
> 
> Is this perhaps the same issue as what I reported on May 6 (https://lore.kernel.org/all/cf8b9980-3965-a4f6-07e0-d4b25755b0db@acm.org/)?
> 
> Thanks,
> 
> Bart.
> 
> (from Windows)
> 
> Yes. There is a lock-level bug in rxe_pool.c that requires a patch to fix. I have one that is a temporary fix.
> Zhu had one that he posted a while ago, but it was never accepted. I don't want to step on his toes.
> This is related to the "AH bug", i.e. rdmacm holding locks while calling into the verbs APIs, which is just plain evil.

Yes. That patch was not accepted. It seems that everyone expects this
problem to be fixed in your RCU patch series.
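
For reference, the RCU approach would make the softirq-side lookup lockless,
roughly like the sketch below (struct and field names are illustrative, not
the actual series):

#include <linux/kref.h>
#include <linux/rcupdate.h>
#include <linux/xarray.h>

struct pool_obj {                       /* illustrative stand-in for a pool element */
        struct kref ref;
};

/* Sketch of an RCU-protected index lookup: no spinlock in the softirq
 * path; the object is pinned with a reference count and is assumed to be
 * freed elsewhere only after an RCU grace period (e.g. kfree_rcu()). */
static struct pool_obj *pool_get_index(struct xarray *xa, unsigned long index)
{
        struct pool_obj *obj;

        rcu_read_lock();
        obj = xa_load(xa, index);               /* xa_load() is RCU-safe */
        if (obj && !kref_get_unless_zero(&obj->ref))
                obj = NULL;                     /* object being torn down */
        rcu_read_unlock();

        return obj;
}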

Zhu Yanjun

> 
> I'll send you my patch.
> 
> Bob

