* [PATCH rdma-rc 0/3] Fixes to mlx5_ib driver
@ 2020-07-07 11:06 Leon Romanovsky
  2020-07-07 11:06 ` [PATCH rdma-rc 1/3] RDMA/mlx5: Use xa_lock_irqsave when access to SRQ table Leon Romanovsky
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Leon Romanovsky @ 2020-07-07 11:06 UTC
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, Aya Levin, Eran Ben Elisha, linux-kernel,
	linux-rdma, Maor Gottlieb, Matthew Wilcox, Saeed Mahameed

From: Leon Romanovsky <leonro@mellanox.com>

Hi,

This is a patchset of independent fixes to the mlx5_ib driver.

Thanks

Aya Levin (1):
  IB/mlx5: Fix 50G per lane indication

Leon Romanovsky (1):
  RDMA/mlx5: Set PD pointers for the error flow unwind

Maor Gottlieb (1):
  RDMA/mlx5: Use xa_lock_irqsave when access to SRQ table

 drivers/infiniband/hw/mlx5/main.c    |  2 +-
 drivers/infiniband/hw/mlx5/qp.c      |  3 ++-
 drivers/infiniband/hw/mlx5/srq_cmd.c | 10 ++++++----
 3 files changed, 9 insertions(+), 6 deletions(-)

--
2.26.2



* [PATCH rdma-rc 1/3] RDMA/mlx5: Use xa_lock_irqsave when access to SRQ table
  2020-07-07 11:06 [PATCH rdma-rc 0/3] Fixes to mlx5_ib driver Leon Romanovsky
@ 2020-07-07 11:06 ` Leon Romanovsky
  2020-07-07 11:43   ` Jason Gunthorpe
  2020-07-07 11:06 ` [PATCH rdma-rc 2/3] IB/mlx5: Fix 50G per lane indication Leon Romanovsky
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 7+ messages in thread
From: Leon Romanovsky @ 2020-07-07 11:06 UTC
  To: Doug Ledford, Jason Gunthorpe; +Cc: Maor Gottlieb, linux-rdma, Matthew Wilcox

From: Maor Gottlieb <maorg@mellanox.com>

The SRQ table is accessed from both interrupt and process context;
therefore we must use xa_lock_irqsave().
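
For orientation, here is the pattern lockdep flags, reduced to a
self-contained sketch (illustrative names, not the driver's actual
structures):

  #include <linux/xarray.h>

  static DEFINE_XARRAY(srq_xa);	/* stands in for table->array */

  /* Process-context lookup. The same lock is also taken from the
   * hard-IRQ notifier path, so IRQs must be disabled while it is
   * held; otherwise an IRQ arriving on this CPU can try to take
   * the lock again and deadlock.
   */
  static void *srq_lookup(unsigned long srqn)
  {
  	unsigned long flags;
  	void *srq;

  	xa_lock_irqsave(&srq_xa, flags);
  	srq = xa_load(&srq_xa, srqn);
  	xa_unlock_irqrestore(&srq_xa, flags);
  	return srq;
  }

The report produced when plain xa_lock() is used instead: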

 [ 9878.321379] --------------------------------
 [ 9878.322349] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
 [ 9878.323667] kworker/u17:9/8573 [HC0[0]:SC0[0]:HE1:SE1] takes:
 [ 9878.324894] ffff8883e3503d30 (&xa->xa_lock#13){?...}-{2:2}, at: mlx5_cmd_get_srq+0x18/0x70 [mlx5_ib]
 [ 9878.326816] {IN-HARDIRQ-W} state was registered at:
 [ 9878.327905]   lock_acquire+0xb9/0x3a0
 [ 9878.328720]   _raw_spin_lock+0x25/0x30
 [ 9878.329475]   srq_event_notifier+0x2b/0xc0 [mlx5_ib]
 [ 9878.330433]   notifier_call_chain+0x45/0x70
 [ 9878.331393]   __atomic_notifier_call_chain+0x69/0x100
 [ 9878.332530]   forward_event+0x36/0xc0 [mlx5_core]
 [ 9878.333558]   notifier_call_chain+0x45/0x70
 [ 9878.334418]   __atomic_notifier_call_chain+0x69/0x100
 [ 9878.335498]   mlx5_eq_async_int+0xc5/0x160 [mlx5_core]
 [ 9878.336543]   notifier_call_chain+0x45/0x70
 [ 9878.337354]   __atomic_notifier_call_chain+0x69/0x100
 [ 9878.338337]   mlx5_irq_int_handler+0x19/0x30 [mlx5_core]
 [ 9878.339369]   __handle_irq_event_percpu+0x43/0x2a0
 [ 9878.340382]   handle_irq_event_percpu+0x30/0x70
 [ 9878.341252]   handle_irq_event+0x34/0x60
 [ 9878.342020]   handle_edge_irq+0x7c/0x1b0
 [ 9878.342788]   do_IRQ+0x60/0x110
 [ 9878.343482]   ret_from_intr+0x0/0x2a
 [ 9878.344251]   default_idle+0x34/0x160
 [ 9878.344996]   do_idle+0x1ec/0x220
 [ 9878.345682]   cpu_startup_entry+0x19/0x20
 [ 9878.346511]   start_secondary+0x153/0x1a0
 [ 9878.347328]   secondary_startup_64+0xa4/0xb0
 [ 9878.348226] irq event stamp: 20907
 [ 9878.348953] hardirqs last  enabled at (20907): [<ffffffff819f0eb4>] _raw_spin_unlock_irq+0x24/0x30
 [ 9878.350599] hardirqs last disabled at (20906): [<ffffffff819f0cbf>] _raw_spin_lock_irq+0xf/0x40
 [ 9878.352300] softirqs last  enabled at (20746): [<ffffffff81c002c9>] __do_softirq+0x2c9/0x436
 [ 9878.353859] softirqs last disabled at (20681): [<ffffffff81139543>] irq_exit+0xb3/0xc0
 [ 9878.355365]
 [ 9878.355365] other info that might help us debug this:
 [ 9878.356703]  Possible unsafe locking scenario:
 [ 9878.356703]
 [ 9878.357941]        CPU0
 [ 9878.358522]        ----
 [ 9878.359109]   lock(&xa->xa_lock#13);
 [ 9878.359875]   <Interrupt>
 [ 9878.360504]     lock(&xa->xa_lock#13);
 [ 9878.361315]
 [ 9878.361315]  *** DEADLOCK ***
 [ 9878.361315]
 [ 9878.362632] 2 locks held by kworker/u17:9/8573:
 [ 9878.374883]  #0: ffff888295218d38 ((wq_completion)mlx5_ib_page_fault){+.+.}-{0:0}, at: process_one_work+0x1f1/0x5f0
 [ 9878.376728]  #1: ffff888401647e78 ((work_completion)(&pfault->work)){+.+.}-{0:0}, at: process_one_work+0x1f1/0x5f0
 [ 9878.378550]
 [ 9878.378550] stack backtrace:
 [ 9878.379489] CPU: 0 PID: 8573 Comm: kworker/u17:9 Tainted: G O      5.7.0_for_upstream_min_debug_2020_06_14_11_31_46_41 #1
 [ 9878.381730] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
 [ 9878.383940] Workqueue: mlx5_ib_page_fault mlx5_ib_eqe_pf_action [mlx5_ib]
 [ 9878.385239] Call Trace:
 [ 9878.385822]  dump_stack+0x71/0x9b
 [ 9878.386519]  mark_lock+0x4f2/0x590
 [ 9878.387263]  ? print_shortest_lock_dependencies+0x200/0x200
 [ 9878.388362]  __lock_acquire+0xa00/0x1eb0
 [ 9878.389133]  lock_acquire+0xb9/0x3a0
 [ 9878.389854]  ? mlx5_cmd_get_srq+0x18/0x70 [mlx5_ib]
 [ 9878.390796]  _raw_spin_lock+0x25/0x30
 [ 9878.391533]  ? mlx5_cmd_get_srq+0x18/0x70 [mlx5_ib]
 [ 9878.392455]  mlx5_cmd_get_srq+0x18/0x70 [mlx5_ib]
 [ 9878.393351]  mlx5_ib_eqe_pf_action+0x257/0xa30 [mlx5_ib]
 [ 9878.394337]  ? process_one_work+0x209/0x5f0
 [ 9878.395150]  process_one_work+0x27b/0x5f0
 [ 9878.395939]  ? __schedule+0x280/0x7e0
 [ 9878.396683]  worker_thread+0x2d/0x3c0
 [ 9878.397424]  ? process_one_work+0x5f0/0x5f0
 [ 9878.398249]  kthread+0x111/0x130
 [ 9878.398926]  ? kthread_park+0x90/0x90
 [ 9878.399709]  ret_from_fork+0x24/0x30

Fixes: b02a29eb8841 ("mlx5: Convert mlx5_srq_table to XArray")
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/hw/mlx5/srq_cmd.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/srq_cmd.c b/drivers/infiniband/hw/mlx5/srq_cmd.c
index 6f5eadc4d183..be0e5469dad0 100644
--- a/drivers/infiniband/hw/mlx5/srq_cmd.c
+++ b/drivers/infiniband/hw/mlx5/srq_cmd.c
@@ -82,12 +82,13 @@ struct mlx5_core_srq *mlx5_cmd_get_srq(struct mlx5_ib_dev *dev, u32 srqn)
 {
 	struct mlx5_srq_table *table = &dev->srq_table;
 	struct mlx5_core_srq *srq;
+	unsigned long flags;
 
-	xa_lock(&table->array);
+	xa_lock_irqsave(&table->array, flags);
 	srq = xa_load(&table->array, srqn);
 	if (srq)
 		refcount_inc(&srq->common.refcount);
-	xa_unlock(&table->array);
+	xa_unlock_irqrestore(&table->array, flags);
 
 	return srq;
 }
@@ -644,6 +645,7 @@ static int srq_event_notifier(struct notifier_block *nb,
 	struct mlx5_srq_table *table;
 	struct mlx5_core_srq *srq;
 	struct mlx5_eqe *eqe;
+	unsigned long flags;
 	u32 srqn;
 
 	if (type != MLX5_EVENT_TYPE_SRQ_CATAS_ERROR &&
@@ -655,11 +657,11 @@ static int srq_event_notifier(struct notifier_block *nb,
 	eqe = data;
 	srqn = be32_to_cpu(eqe->data.qp_srq.qp_srq_n) & 0xffffff;
 
-	xa_lock(&table->array);
+	xa_lock_irqsave(&table->array, flags);
 	srq = xa_load(&table->array, srqn);
 	if (srq)
 		refcount_inc(&srq->common.refcount);
-	xa_unlock(&table->array);
+	xa_unlock_irqrestore(&table->array, flags);
 
 	if (!srq)
 		return NOTIFY_OK;
-- 
2.26.2



* [PATCH rdma-rc 2/3] IB/mlx5: Fix 50G per lane indication
  2020-07-07 11:06 [PATCH rdma-rc 0/3] Fixes to mlx5_ib driver Leon Romanovsky
  2020-07-07 11:06 ` [PATCH rdma-rc 1/3] RDMA/mlx5: Use xa_lock_irqsave when access to SRQ table Leon Romanovsky
@ 2020-07-07 11:06 ` Leon Romanovsky
  2020-07-07 11:06 ` [PATCH rdma-rc 3/3] RDMA/mlx5: Set PD pointers for the error flow unwind Leon Romanovsky
  2020-07-08 23:26 ` [PATCH rdma-rc 0/3] Fixes to mlx5_ib driver Jason Gunthorpe
  3 siblings, 0 replies; 7+ messages in thread
From: Leon Romanovsky @ 2020-07-07 11:06 UTC
  To: Doug Ledford, Jason Gunthorpe
  Cc: Aya Levin, Eran Ben Elisha, linux-rdma, Saeed Mahameed

From: Aya Levin <ayal@mellanox.com>

Some released FW versions mistakenly don't set the capability bit
indicating that 50G-per-lane link modes are supported for VFs (the
ptys_extended_ethernet capability bit). Use
PTYS.ext_eth_proto_capability instead, as that indication is always
accurate: if PTYS.ext_eth_proto_capability is valid (has a non-zero
value), conclude that the HCA supports 50G per lane; otherwise,
conclude that it doesn't.
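
In code terms (a sketch annotating the one-line change in the diff
below; MLX5_GET_ETH_PROTO is the driver's existing accessor):

  /* Reading with ext=true returns the ext_eth_proto_capability field.
   * A non-zero mask is the definitive sign that the extended
   * (50G-per-lane) modes exist, replacing the unreliable PCAM bit;
   * !! collapses the mask into the boolean used for later reads.
   */
  ext = !!MLX5_GET_ETH_PROTO(ptys_reg, out, true, eth_proto_capability);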

Fixes: 08e8676f1607 ("IB/mlx5: Add support for 50Gbps per lane link modes")
Signed-off-by: Aya Levin <ayal@mellanox.com>
Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/hw/mlx5/main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 343a8b8361e7..6f99ed03d88e 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -511,7 +511,7 @@ static int mlx5_query_port_roce(struct ib_device *device, u8 port_num,
 					   mdev_port_num);
 	if (err)
 		goto out;
-	ext = MLX5_CAP_PCAM_FEATURE(dev->mdev, ptys_extended_ethernet);
+	ext = !!MLX5_GET_ETH_PROTO(ptys_reg, out, true, eth_proto_capability);
 	eth_prot_oper = MLX5_GET_ETH_PROTO(ptys_reg, out, ext, eth_proto_oper);
 
 	props->active_width     = IB_WIDTH_4X;
-- 
2.26.2



* [PATCH rdma-rc 3/3] RDMA/mlx5: Set PD pointers for the error flow unwind
  2020-07-07 11:06 [PATCH rdma-rc 0/3] Fixes to mlx5_ib driver Leon Romanovsky
  2020-07-07 11:06 ` [PATCH rdma-rc 1/3] RDMA/mlx5: Use xa_lock_irqsave when access to SRQ table Leon Romanovsky
  2020-07-07 11:06 ` [PATCH rdma-rc 2/3] IB/mlx5: Fix 50G per lane indication Leon Romanovsky
@ 2020-07-07 11:06 ` Leon Romanovsky
  2020-07-08 23:26 ` [PATCH rdma-rc 0/3] Fixes to mlx5_ib driver Jason Gunthorpe
  3 siblings, 0 replies; 7+ messages in thread
From: Leon Romanovsky @ 2020-07-07 11:06 UTC
  To: Doug Ledford, Jason Gunthorpe; +Cc: Leon Romanovsky, linux-rdma

From: Leon Romanovsky <leonro@mellanox.com>

ib_pd is accessed internally during destroy of the TIR/TIS, but the PD
may not be set yet. This leads to the following kernel panic.
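
The shape of the failure, sketched (names paraphrased from the trace
below; to_mpd() is the driver's container_of() helper):

  /* On mlx5_ib_create_qp()'s error path, destroy_qp_common() runs
   * before qp->ibqp.pd was ever assigned, and the raw-packet QP
   * teardown reads through that pointer:
   */
  struct ib_pd *pd = qp->ibqp.pd;	/* still NULL on this path */
  u16 uid = to_mpd(pd)->uid;		/* NULL deref, CR2 = 0x74  */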

BUG: kernel NULL pointer dereference, address: 0000000000000074
PGD 8000000079eaa067 P4D 8000000079eaa067 PUD 7ae81067 PMD 0
Oops: 0000 [#1] SMP PTI
CPU: 1 PID: 709 Comm: syz-executor.0 Not tainted 5.8.0-rc3 #41
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
RIP: 0010:destroy_raw_packet_qp_tis drivers/infiniband/hw/mlx5/qp.c:1189 [inline]
RIP: 0010:destroy_raw_packet_qp drivers/infiniband/hw/mlx5/qp.c:1527 [inline]
RIP: 0010:destroy_qp_common+0x2ca/0x4f0 drivers/infiniband/hw/mlx5/qp.c:2397
Code: 00 85 c0 74 2e e8 56 18 55 ff 48 8d b3 28 01 00 00 48 89 ef e8 d7 d3 ff ff 48 8b 43 08 8b b3 c0 01 00 00 48 8b bd a8 0a 00 00 <0f> b7 50 74 e8 0d 6a fe ff e8 28 18 55 ff 49 8d 55 50 4c 89 f1 48
RSP: 0018:ffffc900007bbac8 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff88807949e800 RCX: 0000000000000998
RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88807c180140
RBP: ffff88807b50c000 R08: 000000000002d379 R09: ffffc900007bba00
R10: 0000000000000001 R11: 000000000002d358 R12: ffff888076f37000
R13: ffff88807949e9c8 R14: ffffc900007bbe08 R15: ffff888076f37000
FS:  00000000019bf940(0000) GS:ffff88807dd00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000074 CR3: 0000000076d68004 CR4: 0000000000360ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 mlx5_ib_create_qp+0xf36/0xf90 drivers/infiniband/hw/mlx5/qp.c:3014
 _ib_create_qp drivers/infiniband/core/core_priv.h:333 [inline]
 create_qp+0x57f/0xd20 drivers/infiniband/core/uverbs_cmd.c:1443
 ib_uverbs_create_qp+0xcf/0x100 drivers/infiniband/core/uverbs_cmd.c:1564
 ib_uverbs_write+0x5fa/0x780 drivers/infiniband/core/uverbs_main.c:664
 __vfs_write+0x3f/0x90 fs/read_write.c:495
 vfs_write+0xc7/0x1f0 fs/read_write.c:559
 ksys_write+0x5e/0x110 fs/read_write.c:612
 do_syscall_64+0x3e/0x70 arch/x86/entry/common.c:359
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x466479
Code: Bad RIP value.
RSP: 002b:00007ffd057b62b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000466479
RDX: 0000000000000070 RSI: 0000000020000240 RDI: 0000000000000003
RBP: 00000000019bf8fc R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 0000000000000bf6 R14: 00000000004cb859 R15: 00000000006fefc0
Modules linked in:
Dumping ftrace buffer:
   (ftrace buffer empty)
CR2: 0000000000000074
---[ end trace d1e9f6724bb6ea83 ]---
RIP: 0010:destroy_raw_packet_qp_tis drivers/infiniband/hw/mlx5/qp.c:1189 [inline]
RIP: 0010:destroy_raw_packet_qp drivers/infiniband/hw/mlx5/qp.c:1527 [inline]
RIP: 0010:destroy_qp_common+0x2ca/0x4f0 drivers/infiniband/hw/mlx5/qp.c:2397
Code: 00 85 c0 74 2e e8 56 18 55 ff 48 8d b3 28 01 00 00 48 89 ef e8 d7 d3 ff ff 48 8b 43 08 8b b3 c0 01 00 00 48 8b bd a8 0a 00 00 <0f> b7 50 74 e8 0d 6a fe ff e8 28 18 55 ff 49 8d 55 50 4c 89 f1 48
RSP: 0018:ffffc900007bbac8 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff88807949e800 RCX: 0000000000000998
RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88807c180140
RBP: ffff88807b50c000 R08: 000000000002d379 R09: ffffc900007bba00
R10: 0000000000000001 R11: 000000000002d358 R12: ffff888076f37000
R13: ffff88807949e9c8 R14: ffffc900007bbe08 R15: ffff888076f37000
FS:  00000000019bf940(0000) GS:ffff88807dd00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000074 CR3: 0000000076d68004 CR4: 0000000000360ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

Fixes: 3f5efb458639 ("RDMA/mlx5: Don't access ib_qp fields in internal destroy QP path")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/hw/mlx5/qp.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index b316c9cafbc5..e050eade97a1 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -3005,11 +3005,12 @@ struct ib_qp *mlx5_ib_create_qp(struct ib_pd *pd, struct ib_qp_init_attr *attr,
 		mlx5_ib_destroy_dct(qp);
 	} else {
 		/*
-		 * The two lines below are temp solution till QP allocation
+		 * These lines below are temp solution till QP allocation
 		 * will be moved to be under IB/core responsiblity.
 		 */
 		qp->ibqp.send_cq = attr->send_cq;
 		qp->ibqp.recv_cq = attr->recv_cq;
+		qp->ibqp.pd = pd;
 		destroy_qp_common(dev, qp, udata);
 	}
 
-- 
2.26.2



* Re: [PATCH rdma-rc 1/3] RDMA/mlx5: Use xa_lock_irqsave when access to SRQ table
  2020-07-07 11:06 ` [PATCH rdma-rc 1/3] RDMA/mlx5: Use xa_lock_irqsave when access to SRQ table Leon Romanovsky
@ 2020-07-07 11:43   ` Jason Gunthorpe
  2020-07-07 12:09     ` Leon Romanovsky
  0 siblings, 1 reply; 7+ messages in thread
From: Jason Gunthorpe @ 2020-07-07 11:43 UTC
  To: Leon Romanovsky; +Cc: Doug Ledford, Maor Gottlieb, linux-rdma, Matthew Wilcox

On Tue, Jul 07, 2020 at 02:06:10PM +0300, Leon Romanovsky wrote:
> Fixes: b02a29eb8841 ("mlx5: Convert mlx5_srq_table to XArray")

This didn't introduce the bug; when things were converted to XArray,
it already had the wrong spinlock type.

I'm surprised this has only been found now, since it has been wrong
for years. Did something else change?

> Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
>  drivers/infiniband/hw/mlx5/srq_cmd.c | 10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/mlx5/srq_cmd.c b/drivers/infiniband/hw/mlx5/srq_cmd.c
> index 6f5eadc4d183..be0e5469dad0 100644
> +++ b/drivers/infiniband/hw/mlx5/srq_cmd.c
> @@ -82,12 +82,13 @@ struct mlx5_core_srq *mlx5_cmd_get_srq(struct mlx5_ib_dev *dev, u32 srqn)
>  {
>  	struct mlx5_srq_table *table = &dev->srq_table;
>  	struct mlx5_core_srq *srq;
> +	unsigned long flags;
>  
> -	xa_lock(&table->array);
> +	xa_lock_irqsave(&table->array, flags);
>  	srq = xa_load(&table->array, srqn);
>  	if (srq)
>  		refcount_inc(&srq->common.refcount);
> -	xa_unlock(&table->array);
> +	xa_unlock_irqrestore(&table->array, flags);

This and the other places can just use xa_lock_irq(), as we are not in
atomic context here.
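
Roughly (a sketch of the distinction, not part of the patch):

  void get_ref(struct mlx5_srq_table *t, u32 srqn)
  {
  	xa_lock_irq(&t->array);		/* IRQs known to be enabled */
  	/* ... xa_load() + refcount_inc() ... */
  	xa_unlock_irq(&t->array);
  }

  void get_ref_any_context(struct mlx5_srq_table *t, u32 srqn)
  {
  	unsigned long flags;

  	xa_lock_irqsave(&t->array, flags);	/* caller's IRQ state unknown */
  	/* ... xa_load() + refcount_inc() ... */
  	xa_unlock_irqrestore(&t->array, flags);
  }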
  
>  	return srq;
>  }
> @@ -644,6 +645,7 @@ static int srq_event_notifier(struct notifier_block *nb,
>  	struct mlx5_srq_table *table;
>  	struct mlx5_core_srq *srq;
>  	struct mlx5_eqe *eqe;
> +	unsigned long flags;
>  	u32 srqn;
>  
>  	if (type != MLX5_EVENT_TYPE_SRQ_CATAS_ERROR &&
> @@ -655,11 +657,11 @@ static int srq_event_notifier(struct notifier_block *nb,
>  	eqe = data;
>  	srqn = be32_to_cpu(eqe->data.qp_srq.qp_srq_n) & 0xffffff;
>  
> -	xa_lock(&table->array);
> +	xa_lock_irqsave(&table->array, flags);
>  	srq = xa_load(&table->array, srqn);
>  	if (srq)
>  		refcount_inc(&srq->common.refcount);
> -	xa_unlock(&table->array);
> +	xa_unlock_irqrestore(&table->array, flags);

This change isn't needed; the notifier is always called from IRQ
context, where interrupts are already disabled, so the plain xa_lock()
is sufficient.

Jason


* Re: [PATCH rdma-rc 1/3] RDMA/mlx5: Use xa_lock_irqsave when access to SRQ table
  2020-07-07 11:43   ` Jason Gunthorpe
@ 2020-07-07 12:09     ` Leon Romanovsky
  0 siblings, 0 replies; 7+ messages in thread
From: Leon Romanovsky @ 2020-07-07 12:09 UTC
  To: Jason Gunthorpe; +Cc: Doug Ledford, Maor Gottlieb, linux-rdma, Matthew Wilcox

On Tue, Jul 07, 2020 at 08:43:03AM -0300, Jason Gunthorpe wrote:
> On Tue, Jul 07, 2020 at 02:06:10PM +0300, Leon Romanovsky wrote:
> > Fixes: b02a29eb8841 ("mlx5: Convert mlx5_srq_table to XArray")
>
> This didn't introduce the bug, when things were converted to xarray it
> already had the wrong spinlock type.
>
> I'm surprised this is only been found now since it has been wrong for
> years. Did something else change?

I checked our internal bug tracker to see when this bug was
discovered. It looks like enabling more debug kernel config options by
default helped to find it.

>
> > Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
> > Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
> >  drivers/infiniband/hw/mlx5/srq_cmd.c | 10 ++++++----
> >  1 file changed, 6 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/infiniband/hw/mlx5/srq_cmd.c b/drivers/infiniband/hw/mlx5/srq_cmd.c
> > index 6f5eadc4d183..be0e5469dad0 100644
> > +++ b/drivers/infiniband/hw/mlx5/srq_cmd.c
> > @@ -82,12 +82,13 @@ struct mlx5_core_srq *mlx5_cmd_get_srq(struct mlx5_ib_dev *dev, u32 srqn)
> >  {
> >  	struct mlx5_srq_table *table = &dev->srq_table;
> >  	struct mlx5_core_srq *srq;
> > +	unsigned long flags;
> >
> > -	xa_lock(&table->array);
> > +	xa_lock_irqsave(&table->array, flags);
> >  	srq = xa_load(&table->array, srqn);
> >  	if (srq)
> >  		refcount_inc(&srq->common.refcount);
> > -	xa_unlock(&table->array);
> > +	xa_unlock_irqrestore(&table->array, flags);
>
> This and other places can just be xa_lock_irq as we are not in an atomic
> context here.

OK, let's change it for clarity.
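
i.e. presumably something like this in mlx5_cmd_get_srq() (a sketch of
the planned follow-up, not the posted v2):

  	xa_lock_irq(&table->array);
  	srq = xa_load(&table->array, srqn);
  	if (srq)
  		refcount_inc(&srq->common.refcount);
  	xa_unlock_irq(&table->array);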

Thanks


* Re: [PATCH rdma-rc 0/3] Fixes to mlx5_ib driver
  2020-07-07 11:06 [PATCH rdma-rc 0/3] Fixes to mlx5_ib driver Leon Romanovsky
                   ` (2 preceding siblings ...)
  2020-07-07 11:06 ` [PATCH rdma-rc 3/3] RDMA/mlx5: Set PD pointers for the error flow unwind Leon Romanovsky
@ 2020-07-08 23:26 ` Jason Gunthorpe
  3 siblings, 0 replies; 7+ messages in thread
From: Jason Gunthorpe @ 2020-07-08 23:26 UTC
  To: Leon Romanovsky
  Cc: Doug Ledford, Leon Romanovsky, Aya Levin, Eran Ben Elisha,
	linux-kernel, linux-rdma, Maor Gottlieb, Matthew Wilcox,
	Saeed Mahameed

On Tue, Jul 07, 2020 at 02:06:09PM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@mellanox.com>
> 
> Hi,
> 
> This is patchset of independent fixes to mlx5_ib driver.
> 
> Thanks
> 
> Aya Levin (1):
>   IB/mlx5: Fix 50G per lane indication
> 
> Leon Romanovsky (1):
>   RDMA/mlx5: Set PD pointers for the error flow unwind

Applied to for-rc

> Maor Gottlieb (1):
>   RDMA/mlx5: Use xa_lock_irqsave when access to SRQ table

This one needs resending

Thanks,
Jason

