* Lockdep splat in RXE (softRoCE) driver in xarray accesses
@ 2022-05-27  9:57 David Howells
  2022-05-28  0:23 ` Yanjun Zhu
  0 siblings, 1 reply; 2+ messages in thread
From: David Howells @ 2022-05-27  9:57 UTC (permalink / raw)
  To: Zhu Yanjun, Bob Pearson, Steve French
  Cc: dhowells, willy, Tom Talpey, Namjae Jeon, linux-rdma, linux-cifs,
	linux-kernel

Hi Zhu, Bob, Steve,

There seems to be a locking bug in the softRoCE driver when mounting a cifs
share.  See attached trace.  I'm guessing the problem is that a softirq
handler is accessing the xarray, but other accesses to the xarray aren't
guarded by _bh or _irq markers on the lock primitives.

I wonder if rxe_pool_get_index() should just rely on the RCU read lock and not
take the spinlock.

Alternatively, __rxe_add_to_pool() should be using xa_alloc_cyclic_bh() or
xa_alloc_cyclic_irq().
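
For illustration, a rough sketch of the RCU approach (not actual driver
code; it assumes pool elements are freed after an RCU grace period,
e.g. with kfree_rcu(), so that xa_load() under the RCU read lock is
safe):

   /* Sketch: look the element up under rcu_read_lock() instead of
    * taking xa_lock; kref_get_unless_zero() skips objects that are
    * already on their way to being freed. */
   void *rxe_pool_get_index(struct rxe_pool *pool, u32 index)
   {
           struct rxe_pool_elem *elem;
           void *obj = NULL;

           rcu_read_lock();
           elem = xa_load(&pool->xa, index);
           if (elem && kref_get_unless_zero(&elem->ref_cnt))
                   obj = elem->obj;
           rcu_read_unlock();

           return obj;
   }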

I used the following commands:

   rdma link add rxe0 type rxe netdev enp6s0 # andromeda, softRoCE
   mount //192.168.6.1/scratch /xfstest.scratch -o user=shares,rdma,pass=...

talking to ksmbd on the other side.

Kernel is v5.18-rc6.

David
---
infiniband rxe0: set active
infiniband rxe0: added enp6s0
RDS/IB: rxe0: added
CIFS: No dialect specified on mount. Default has changed to a more secure dialect, SMB2.1 or later (e.g. SMB3.1.1), from CIFS (SMB1). To use the less secure SMB1 dialect to access old servers which do not support SMB3.1.1 (or even SMB3 or SMB2.1) specify vers=1.0 on mount.
CIFS: Attempting to mount \\192.168.6.1\scratch

================================
WARNING: inconsistent lock state
5.18.0-rc6-build2+ #465 Not tainted
--------------------------------
inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
ksoftirqd/1/20 [HC0[0]:SC1[1]:HE0:SE0] takes:
ffff888134d11310 (&xa->xa_lock#12){+.?.}-{2:2}, at: rxe_pool_get_index+0x19/0x69
{SOFTIRQ-ON-W} state was registered at:
  mark_usage+0x169/0x17b
  __lock_acquire+0x50c/0x96a
  lock_acquire+0x2f4/0x37b
  _raw_spin_lock+0x2f/0x39
  xa_alloc_cyclic.constprop.0+0x20/0x55
  __rxe_add_to_pool+0xe3/0xf2
  __ib_alloc_pd+0xa2/0x26b
  ib_mad_port_open+0x1ac/0x4a1
  ib_mad_init_device+0x9b/0x1b9
  add_client_context+0x133/0x1b3
  enable_device_and_get+0x129/0x248
  ib_register_device+0x256/0x2fd
  rxe_register_device+0x18e/0x1b7
  rxe_net_add+0x57/0x71
  rxe_newlink+0x71/0x8e
  nldev_newlink+0x200/0x26a
  rdma_nl_rcv_msg+0x260/0x2ab
  rdma_nl_rcv+0x108/0x1a7
  netlink_unicast+0x1fc/0x2b3
  netlink_sendmsg+0x4ce/0x51b
  sock_sendmsg_nosec+0x41/0x4f
  __sys_sendto+0x157/0x1cc
  __x64_sys_sendto+0x76/0x82
  do_syscall_64+0x39/0x46
  entry_SYSCALL_64_after_hwframe+0x44/0xae
irq event stamp: 194111
hardirqs last  enabled at (194110): [<ffffffff81094eb2>] __local_bh_enable_ip+0xb8/0xcc
hardirqs last disabled at (194111): [<ffffffff82040077>] _raw_spin_lock_irqsave+0x1b/0x51
softirqs last  enabled at (194100): [<ffffffff8240043a>] __do_softirq+0x43a/0x489
softirqs last disabled at (194105): [<ffffffff81094d30>] run_ksoftirqd+0x31/0x56

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&xa->xa_lock#12);
  <Interrupt>
    lock(&xa->xa_lock#12);

 *** DEADLOCK ***

no locks held by ksoftirqd/1/20.

stack backtrace:
CPU: 1 PID: 20 Comm: ksoftirqd/1 Not tainted 5.18.0-rc6-build2+ #465
Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0x45/0x59
 valid_state+0x56/0x61
 mark_lock_irq+0x9b/0x2ec
 ? ret_from_fork+0x1f/0x30
 ? valid_state+0x61/0x61
 ? stack_trace_save+0x8f/0xbe
 ? filter_irq_stacks+0x58/0x58
 ? jhash.constprop.0+0x1ad/0x202
 ? save_trace+0x17c/0x196
 mark_lock.part.0+0x10c/0x164
 mark_usage+0xe6/0x17b
 __lock_acquire+0x50c/0x96a
 lock_acquire+0x2f4/0x37b
 ? rxe_pool_get_index+0x19/0x69
 ? rcu_read_unlock+0x52/0x52
 ? jhash.constprop.0+0x1ad/0x202
 ? lockdep_unlock+0xde/0xe6
 ? validate_chain+0x44a/0x4a8
 ? req_next_wqe+0x312/0x363
 _raw_spin_lock_irqsave+0x41/0x51
 ? rxe_pool_get_index+0x19/0x69
 rxe_pool_get_index+0x19/0x69
 rxe_get_av+0xbe/0x14b
 rxe_requester+0x6b5/0xbb0
 ? rnr_nak_timer+0x16/0x16
 ? lock_downgrade+0xad/0xad
 ? rcu_read_lock_bh_held+0xab/0xab
 ? __wake_up+0xf/0xf
 ? mark_held_locks+0x1f/0x78
 ? __local_bh_enable_ip+0xb8/0xcc
 ? rnr_nak_timer+0x16/0x16
 rxe_do_task+0xb5/0x13d
 ? rxe_detach_mcast+0x1d6/0x1d6
 tasklet_action_common.constprop.0+0xda/0x145
 __do_softirq+0x202/0x489
 ? __irq_exit_rcu+0x108/0x108
 ? _local_bh_enable+0x1c/0x1c
 run_ksoftirqd+0x31/0x56
 smpboot_thread_fn+0x35c/0x376
 ? sort_range+0x1c/0x1c
 kthread+0x164/0x173
 ? kthread_complete_and_exit+0x20/0x20
 ret_from_fork+0x1f/0x30
 </TASK>
CIFS: VFS: RDMA transport established



* Re: Lockdep splat in RXE (softRoCE) driver in xarray accesses
  2022-05-27  9:57 Lockdep splat in RXE (softRoCE) driver in xarray accesses David Howells
@ 2022-05-28  0:23 ` Yanjun Zhu
  0 siblings, 0 replies; 2+ messages in thread
From: Yanjun Zhu @ 2022-05-28  0:23 UTC (permalink / raw)
  To: David Howells, Zhu Yanjun, Bob Pearson, Steve French
  Cc: willy, Tom Talpey, Namjae Jeon, linux-rdma, linux-cifs, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 4814 bytes --]

On 2022/5/27 17:57, David Howells wrote:
> Hi Zhu, Bob, Steve,
> 
> There seems to be a locking bug in the softRoCE driver when mounting a cifs
> share.  See attached trace.  I'm guessing the problem is that a softirq
> handler is accessing the xarray, but other accesses to the xarray aren't
> guarded by _bh or _irq markers on the lock primitives.
> 
> I wonder if rxe_pool_get_index() should just rely on the RCU read lock and not
> take the spinlock.
> 
> Alternatively, __rxe_add_to_pool() should be using xa_alloc_cyclic_bh() or
> xa_alloc_cyclic_irq().
> 
> I used the following commands:
> 
>     rdma link add rxe0 type rxe netdev enp6s0 # andromeda, softRoCE
>     mount //192.168.6.1/scratch /xfstest.scratch -o user=shares,rdma,pass=...
> 
> talking to ksmbd on the other side.

This seems to be a known bug. Please test with the patches in the
attachment. If they do not work, please let me know.

Thanks a lot.
Zhu Yanjun

> [... remainder of the report and lockdep trace snipped; quoted in full above ...]

[-- Attachment #2: PATCHv6-1-4-RDMA-rxe-Fix-dead-lock-caused-by-__rxe_add_to_pool-interrupted-by-rxe_pool_get_index.patch --]
[-- Type: text/plain, Size: 21178 bytes --]

From patchwork Fri Apr 22 19:44:13 2022
X-Patchwork-Submitter: Yanjun Zhu <yanjun.zhu@linux.dev>
X-Patchwork-Id: 12822712
From: yanjun.zhu@linux.dev
To: jgg@ziepe.ca, leon@kernel.org, linux-rdma@vger.kernel.org,
        yanjun.zhu@linux.dev
Cc: Yi Zhang <yi.zhang@redhat.com>
Subject: [PATCHv6 1/4] RDMA/rxe: Fix dead lock caused by __rxe_add_to_pool
 interrupted by rxe_pool_get_index
Date: Fri, 22 Apr 2022 15:44:13 -0400
Message-Id: <20220422194416.983549-1-yanjun.zhu@linux.dev>

From: Zhu Yanjun <yanjun.zhu@linux.dev>

This is a deadlock problem.
The ah_pool xa_lock is first acquired here:

{SOFTIRQ-ON-W} state was registered at:

  lock_acquire+0x1d2/0x5a0
  _raw_spin_lock+0x33/0x80
  __rxe_add_to_pool+0x183/0x230 [rdma_rxe]

Then the ah_pool xa_lock is acquired again here:

{IN-SOFTIRQ-W}:

Call Trace:
 <TASK>
  dump_stack_lvl+0x44/0x57
  mark_lock.part.52.cold.79+0x3c/0x46
  __lock_acquire+0x1565/0x34a0
  lock_acquire+0x1d2/0x5a0
  _raw_spin_lock_irqsave+0x42/0x90
  rxe_pool_get_index+0x72/0x1d0 [rdma_rxe]
  rxe_get_av+0x168/0x2a0 [rdma_rxe]
</TASK>

From the above: the function __rxe_add_to_pool acquires the
xa_lock and is then interrupted by a softirq. The softirq
handler calls rxe_pool_get_index, which tries to acquire the
same xa_lock.

Finally, the deadlock appears.

        CPU0
        ----
   lock(&xa->xa_lock#15);  <----- __rxe_add_to_pool
   <Interrupt>
     lock(&xa->xa_lock#15); <---- rxe_pool_get_index

                 *** DEADLOCK ***

Fixes: 3225717f6dfa ("RDMA/rxe: Replace red-black trees by xarrays")
Reported-and-tested-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
---
V5->V6: One deadlock fix per commit
V4->V5: Commit logs are changed.
V3->V4: xa_lock_irq locks are used.
V2->V3: The allocation in __rxe_add_to_pool is between spin_lock and
        spin_unlock, so GFP_ATOMIC is used in __rxe_add_to_pool.
V1->V2: Replace GFP_KERNEL with GFP_ATOMIC
---
 drivers/infiniband/sw/rxe/rxe_pool.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
index 87066d04ed18..67f1d4733682 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.c
+++ b/drivers/infiniband/sw/rxe/rxe_pool.c
@@ -106,7 +106,7 @@ void rxe_pool_init(struct rxe_dev *rxe, struct rxe_pool *pool,
 
 	atomic_set(&pool->num_elem, 0);
 
-	xa_init_flags(&pool->xa, XA_FLAGS_ALLOC);
+	xa_init_flags(&pool->xa, XA_FLAGS_ALLOC | XA_FLAGS_LOCK_IRQ);
 	pool->limit.min = info->min_index;
 	pool->limit.max = info->max_index;
 }
@@ -155,6 +155,7 @@ void *rxe_alloc(struct rxe_pool *pool)
 int __rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_elem *elem)
 {
 	int err;
+	unsigned long flags;
 
 	if (WARN_ON(pool->flags & RXE_POOL_ALLOC))
 		return -EINVAL;
@@ -166,8 +167,10 @@ int __rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_elem *elem)
 	elem->obj = (u8 *)elem - pool->elem_offset;
 	kref_init(&elem->ref_cnt);
 
-	err = xa_alloc_cyclic(&pool->xa, &elem->index, elem, pool->limit,
-			      &pool->next, GFP_KERNEL);
+	xa_lock_irqsave(&pool->xa, flags);
+	err = __xa_alloc_cyclic(&pool->xa, &elem->index, elem, pool->limit,
+				&pool->next, GFP_ATOMIC);
+	xa_unlock_irqrestore(&pool->xa, flags);
 	if (err)
 		goto err_cnt;
 
@@ -201,7 +204,7 @@ static void rxe_elem_release(struct kref *kref)
 	struct rxe_pool_elem *elem = container_of(kref, typeof(*elem), ref_cnt);
 	struct rxe_pool *pool = elem->pool;
 
-	xa_erase(&pool->xa, elem->index);
+	xa_erase_irq(&pool->xa, elem->index);
 
 	if (pool->cleanup)
 		pool->cleanup(elem);
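
(Note on the hunk above: XA_FLAGS_LOCK_IRQ marks the XArray's internal
spinlock as one that is taken with interrupts disabled, so every access
must use the _irq lock variants, which is why plain xa_erase() becomes
xa_erase_irq(). As a sketch of the pattern, not new driver code,
xa_erase_irq() is roughly equivalent to this open-coded form:

   xa_lock_irq(&pool->xa);
   __xa_erase(&pool->xa, elem->index);
   xa_unlock_irq(&pool->xa);

so the erase path now matches the IRQ-disabled locking used in
__rxe_add_to_pool above.)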

From patchwork Fri Apr 22 19:44:14 2022
X-Patchwork-Submitter: Yanjun Zhu <yanjun.zhu@linux.dev>
X-Patchwork-Id: 12822713
From: yanjun.zhu@linux.dev
To: jgg@ziepe.ca, leon@kernel.org, linux-rdma@vger.kernel.org,
        yanjun.zhu@linux.dev
Cc: Yi Zhang <yi.zhang@redhat.com>
Subject: [PATCH 2/4] RDMA/rxe: Fix dead lock caused by rxe_alloc interrupted
 by rxe_pool_get_index
Date: Fri, 22 Apr 2022 15:44:14 -0400
Message-Id: <20220422194416.983549-2-yanjun.zhu@linux.dev>
In-Reply-To: <20220422194416.983549-1-yanjun.zhu@linux.dev>
References: <20220422194416.983549-1-yanjun.zhu@linux.dev>

From: Zhu Yanjun <yanjun.zhu@linux.dev>

The ah_pool xa_lock is first acquired here:

{SOFTIRQ-ON-W} state was registered at:
  lock_acquire+0x1d2/0x5a0
  _raw_spin_lock+0x33/0x80
  rxe_alloc+0x1be/0x290 [rdma_rxe]

Then the ah_pool xa_lock is acquired again here:

{IN-SOFTIRQ-W}:
  <TASK>
  __lock_acquire+0x1565/0x34a0
  lock_acquire+0x1d2/0x5a0
  _raw_spin_lock_irqsave+0x42/0x90
  rxe_pool_get_index+0x72/0x1d0 [rdma_rxe]
  </TASK>

From the above: the function rxe_alloc acquires the xa_lock
and is then interrupted by a softirq. The softirq handler
calls rxe_pool_get_index, which tries to acquire the same
xa_lock.

Finally, the deadlock appears.

        CPU0
        ----
   lock(&xa->xa_lock#15);  <----- rxe_alloc
   <Interrupt>
     lock(&xa->xa_lock#15); <---- rxe_pool_get_index

    *** DEADLOCK ***

Fixes: 3225717f6dfa ("RDMA/rxe: Replace red-black trees by xarrays")
Reported-and-tested-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
---
 drivers/infiniband/sw/rxe/rxe_pool.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
index 67f1d4733682..7b12a52fed35 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.c
+++ b/drivers/infiniband/sw/rxe/rxe_pool.c
@@ -138,8 +138,8 @@ void *rxe_alloc(struct rxe_pool *pool)
 	elem->obj = obj;
 	kref_init(&elem->ref_cnt);
 
-	err = xa_alloc_cyclic(&pool->xa, &elem->index, elem, pool->limit,
-			      &pool->next, GFP_KERNEL);
+	err = xa_alloc_cyclic_irq(&pool->xa, &elem->index, elem, pool->limit,
+				  &pool->next, GFP_KERNEL);
 	if (err)
 		goto err_free;
 

From patchwork Fri Apr 22 19:44:15 2022
X-Patchwork-Submitter: Yanjun Zhu <yanjun.zhu@linux.dev>
X-Patchwork-Id: 12822714
From: yanjun.zhu@linux.dev
To: jgg@ziepe.ca, leon@kernel.org, linux-rdma@vger.kernel.org,
        yanjun.zhu@linux.dev
Subject: [PATCH 3/4] RDMA/rxe: Use different xa locks on different path
Date: Fri, 22 Apr 2022 15:44:15 -0400
Message-Id: <20220422194416.983549-3-yanjun.zhu@linux.dev>
In-Reply-To: <20220422194416.983549-1-yanjun.zhu@linux.dev>
References: <20220422194416.983549-1-yanjun.zhu@linux.dev>

From: Zhu Yanjun <yanjun.zhu@linux.dev>

The function __rxe_add_to_pool is called on different paths with
different locking requirements: rxe_create_ah requires
xa_lock_irqsave/irqrestore, while the other callers only require
xa_lock_irq.

Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
---
 drivers/infiniband/sw/rxe/rxe_pool.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
index 7b12a52fed35..3f3fa2123f30 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.c
+++ b/drivers/infiniband/sw/rxe/rxe_pool.c
@@ -155,7 +155,6 @@ void *rxe_alloc(struct rxe_pool *pool)
 int __rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_elem *elem)
 {
 	int err;
-	unsigned long flags;
 
 	if (WARN_ON(pool->flags & RXE_POOL_ALLOC))
 		return -EINVAL;
@@ -167,10 +166,17 @@ int __rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_elem *elem)
 	elem->obj = (u8 *)elem - pool->elem_offset;
 	kref_init(&elem->ref_cnt);
 
-	xa_lock_irqsave(&pool->xa, flags);
-	err = __xa_alloc_cyclic(&pool->xa, &elem->index, elem, pool->limit,
-				&pool->next, GFP_ATOMIC);
-	xa_unlock_irqrestore(&pool->xa, flags);
+	if (pool->type == RXE_TYPE_AH) {
+		unsigned long flags;
+
+		xa_lock_irqsave(&pool->xa, flags);
+		err = __xa_alloc_cyclic(&pool->xa, &elem->index, elem, pool->limit,
+					&pool->next, GFP_ATOMIC);
+		xa_unlock_irqrestore(&pool->xa, flags);
+	} else {
+		err = xa_alloc_cyclic_irq(&pool->xa, &elem->index, elem, pool->limit,
+					  &pool->next, GFP_KERNEL);
+	}
 	if (err)
 		goto err_cnt;
 

From patchwork Fri Apr 22 19:44:16 2022
X-Patchwork-Submitter: Yanjun Zhu <yanjun.zhu@linux.dev>
X-Patchwork-Id: 12822715
From: yanjun.zhu@linux.dev
To: jgg@ziepe.ca, leon@kernel.org, linux-rdma@vger.kernel.org,
        yanjun.zhu@linux.dev
Subject: [PATCH 4/4] RDMA/rxe: Check RDMA_CREATE_AH_SLEEPABLE in creating AH
Date: Fri, 22 Apr 2022 15:44:16 -0400
Message-Id: <20220422194416.983549-4-yanjun.zhu@linux.dev>
In-Reply-To: <20220422194416.983549-1-yanjun.zhu@linux.dev>
References: <20220422194416.983549-1-yanjun.zhu@linux.dev>

From: Zhu Yanjun <yanjun.zhu@linux.dev>

When creating an AH, the RDMA_CREATE_AH_SLEEPABLE flag should be
tested: if it is set, the caller may sleep and GFP_KERNEL can be
used; otherwise the allocation must be atomic (GFP_ATOMIC).

Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
---
 drivers/infiniband/sw/rxe/rxe_mw.c    |  2 +-
 drivers/infiniband/sw/rxe/rxe_pool.c  | 14 ++++++++------
 drivers/infiniband/sw/rxe/rxe_pool.h  |  4 ++--
 drivers/infiniband/sw/rxe/rxe_verbs.c | 18 ++++++++++++------
 4 files changed, 23 insertions(+), 15 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_mw.c b/drivers/infiniband/sw/rxe/rxe_mw.c
index c86b2efd58f2..9d72dcc9060d 100644
--- a/drivers/infiniband/sw/rxe/rxe_mw.c
+++ b/drivers/infiniband/sw/rxe/rxe_mw.c
@@ -14,7 +14,7 @@ int rxe_alloc_mw(struct ib_mw *ibmw, struct ib_udata *udata)
 
 	rxe_get(pd);
 
-	ret = rxe_add_to_pool(&rxe->mw_pool, mw);
+	ret = rxe_add_to_pool(&rxe->mw_pool, mw, GFP_KERNEL);
 	if (ret) {
 		rxe_put(pd);
 		return ret;
diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c
index 3f3fa2123f30..5555060702fd 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.c
+++ b/drivers/infiniband/sw/rxe/rxe_pool.c
@@ -152,7 +152,7 @@ void *rxe_alloc(struct rxe_pool *pool)
 	return NULL;
 }
 
-int __rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_elem *elem)
+int __rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_elem *elem, gfp_t gfp)
 {
 	int err;
 
@@ -166,16 +166,18 @@ int __rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_elem *elem)
 	elem->obj = (u8 *)elem - pool->elem_offset;
 	kref_init(&elem->ref_cnt);
 
-	if (pool->type == RXE_TYPE_AH) {
+	if ((pool->type == RXE_TYPE_AH) && (gfp & GFP_ATOMIC)) {
 		unsigned long flags;
 
 		xa_lock_irqsave(&pool->xa, flags);
-		err = __xa_alloc_cyclic(&pool->xa, &elem->index, elem, pool->limit,
-					&pool->next, GFP_ATOMIC);
+		err = __xa_alloc_cyclic(&pool->xa, &elem->index, elem,
+					pool->limit, &pool->next,
+					GFP_ATOMIC);
 		xa_unlock_irqrestore(&pool->xa, flags);
 	} else {
-		err = xa_alloc_cyclic_irq(&pool->xa, &elem->index, elem, pool->limit,
-					  &pool->next, GFP_KERNEL);
+		err = xa_alloc_cyclic_irq(&pool->xa, &elem->index, elem,
+					  pool->limit, &pool->next,
+					  GFP_KERNEL);
 	}
 	if (err)
 		goto err_cnt;
diff --git a/drivers/infiniband/sw/rxe/rxe_pool.h b/drivers/infiniband/sw/rxe/rxe_pool.h
index 24bcc786c1b3..12986622088b 100644
--- a/drivers/infiniband/sw/rxe/rxe_pool.h
+++ b/drivers/infiniband/sw/rxe/rxe_pool.h
@@ -62,9 +62,9 @@ void rxe_pool_cleanup(struct rxe_pool *pool);
 void *rxe_alloc(struct rxe_pool *pool);
 
 /* connect already allocated object to pool */
-int __rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_elem *elem);
+int __rxe_add_to_pool(struct rxe_pool *pool, struct rxe_pool_elem *elem, gfp_t gfp);
 
-#define rxe_add_to_pool(pool, obj) __rxe_add_to_pool(pool, &(obj)->elem)
+#define rxe_add_to_pool(pool, obj, gfp) __rxe_add_to_pool(pool, &(obj)->elem, gfp)
 
 /* lookup an indexed object from index. takes a reference on object */
 void *rxe_pool_get_index(struct rxe_pool *pool, u32 index);
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
index 67184b0281a0..dce665e74fa7 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
@@ -108,7 +108,7 @@ static int rxe_alloc_ucontext(struct ib_ucontext *ibuc, struct ib_udata *udata)
 	struct rxe_dev *rxe = to_rdev(ibuc->device);
 	struct rxe_ucontext *uc = to_ruc(ibuc);
 
-	return rxe_add_to_pool(&rxe->uc_pool, uc);
+	return rxe_add_to_pool(&rxe->uc_pool, uc, GFP_KERNEL);
 }
 
 static void rxe_dealloc_ucontext(struct ib_ucontext *ibuc)
@@ -142,7 +142,7 @@ static int rxe_alloc_pd(struct ib_pd *ibpd, struct ib_udata *udata)
 	struct rxe_dev *rxe = to_rdev(ibpd->device);
 	struct rxe_pd *pd = to_rpd(ibpd);
 
-	return rxe_add_to_pool(&rxe->pd_pool, pd);
+	return rxe_add_to_pool(&rxe->pd_pool, pd, GFP_KERNEL);
 }
 
 static int rxe_dealloc_pd(struct ib_pd *ibpd, struct ib_udata *udata)
@@ -162,6 +162,7 @@ static int rxe_create_ah(struct ib_ah *ibah,
 	struct rxe_ah *ah = to_rah(ibah);
 	struct rxe_create_ah_resp __user *uresp = NULL;
 	int err;
+	gfp_t gfp;
 
 	if (udata) {
 		/* test if new user provider */
@@ -176,7 +177,12 @@ static int rxe_create_ah(struct ib_ah *ibah,
 	if (err)
 		return err;
 
-	err = rxe_add_to_pool(&rxe->ah_pool, ah);
+	if (init_attr->flags & RDMA_CREATE_AH_SLEEPABLE)
+		gfp = GFP_KERNEL;
+	else
+		gfp = GFP_ATOMIC;
+
+	err = rxe_add_to_pool(&rxe->ah_pool, ah, gfp);
 	if (err)
 		return err;
 
@@ -299,7 +305,7 @@ static int rxe_create_srq(struct ib_srq *ibsrq, struct ib_srq_init_attr *init,
 	if (err)
 		goto err1;
 
-	err = rxe_add_to_pool(&rxe->srq_pool, srq);
+	err = rxe_add_to_pool(&rxe->srq_pool, srq, GFP_KERNEL);
 	if (err)
 		goto err1;
 
@@ -431,7 +437,7 @@ static int rxe_create_qp(struct ib_qp *ibqp, struct ib_qp_init_attr *init,
 		qp->is_user = false;
 	}
 
-	err = rxe_add_to_pool(&rxe->qp_pool, qp);
+	err = rxe_add_to_pool(&rxe->qp_pool, qp, GFP_KERNEL);
 	if (err)
 		return err;
 
@@ -800,7 +806,7 @@ static int rxe_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
 	if (err)
 		return err;
 
-	return rxe_add_to_pool(&rxe->cq_pool, cq);
+	return rxe_add_to_pool(&rxe->cq_pool, cq, GFP_KERNEL);
 }
 
 static int rxe_destroy_cq(struct ib_cq *ibcq, struct ib_udata *udata)
