linux-rdma.vger.kernel.org archive mirror
* [PATCH rdma-rc 0/5] RDMA fixes
@ 2021-05-11  5:48 Leon Romanovsky
  2021-05-11  5:48 ` [PATCH rdma-rc 1/5] RDMA/mlx5: Verify that DM operation is reasonable Leon Romanovsky
                   ` (5 more replies)
  0 siblings, 6 replies; 10+ messages in thread
From: Leon Romanovsky @ 2021-05-11  5:48 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, Dan Carpenter, linux-rdma, Maor Gottlieb,
	Shay Drory, Zhu Yanjun

From: Leon Romanovsky <leonro@nvidia.com>

Batch of completely unrelated fixes.

Thanks

Leon Romanovsky (2):
  RDMA/core: Simplify addition of restrack object
  RDMA/rxe: Return CQE error if invalid lkey was supplied

Maor Gottlieb (2):
  RDMA/mlx5: Verify that DM operation is reasonable
  RDMA/mlx5: Recover from fatal event in dual port mode

Shay Drory (1):
  RDMA/core: Don't access cm_id after its destruction

 drivers/infiniband/core/cma.c        | 18 +++++++-----------
 drivers/infiniband/hw/mlx5/dm.c      |  3 +++
 drivers/infiniband/hw/mlx5/main.c    |  1 +
 drivers/infiniband/sw/rxe/rxe_comp.c | 16 ++++++++++------
 4 files changed, 21 insertions(+), 17 deletions(-)

-- 
2.31.1



* [PATCH rdma-rc 1/5] RDMA/mlx5: Verify that DM operation is reasonable
  2021-05-11  5:48 [PATCH rdma-rc 0/5] RDMA fixes Leon Romanovsky
@ 2021-05-11  5:48 ` Leon Romanovsky
  2021-05-11  5:48 ` [PATCH rdma-rc 2/5] RDMA/core: Don't access cm_id after its destruction Leon Romanovsky
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Leon Romanovsky @ 2021-05-11  5:48 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Maor Gottlieb, Dan Carpenter, linux-rdma, Shay Drory, Zhu Yanjun

From: Maor Gottlieb <maorg@nvidia.com>

Fix the smatch complaint below by verifying that the DM operation is not
greater than 31.

drivers/infiniband/hw/mlx5/dm.c:220 mlx5_ib_handler_MLX5_IB_METHOD_DM_MAP_OP_ADDR()
error: undefined (user controlled) shift '(((1))) << op'

Fixes: cea85fa5dbc2 ("RDMA/mlx5: Add support in MEMIC operations")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/hw/mlx5/dm.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/infiniband/hw/mlx5/dm.c b/drivers/infiniband/hw/mlx5/dm.c
index 094bf85589db..001d766cf291 100644
--- a/drivers/infiniband/hw/mlx5/dm.c
+++ b/drivers/infiniband/hw/mlx5/dm.c
@@ -217,6 +217,9 @@ static int UVERBS_HANDLER(MLX5_IB_METHOD_DM_MAP_OP_ADDR)(
 	if (err)
 		return err;
 
+	if (op >= BITS_PER_TYPE(u32))
+		return -EOPNOTSUPP;
+
 	if (!(MLX5_CAP_DEV_MEM(dev->mdev, memic_operations) & BIT(op)))
 		return -EOPNOTSUPP;
 
-- 
2.31.1
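
For readers following along, the range check in this patch can be
sketched in userspace C. This is only an illustration, not the driver
code: check_dm_op(), its caps argument and the local BITS_PER_TYPE()
stand-in are assumptions made for the example. The point is that a
user-controlled shift count has to be validated before it reaches
"1 << op", because shifting a 32-bit value by 32 or more is undefined.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's BITS_PER_TYPE() macro. */
#define BITS_PER_TYPE(type) (sizeof(type) * CHAR_BIT)

/*
 * Hypothetical helper: returns 0 if operation `op` is advertised in the
 * 32-bit capability mask `caps`, and -EOPNOTSUPP otherwise.
 */
static int check_dm_op(uint32_t caps, unsigned int op)
{
	/* Reject out-of-range shift counts before shifting. */
	if (op >= BITS_PER_TYPE(uint32_t))
		return -EOPNOTSUPP;

	/* Safe now: op is in the range 0..31. */
	if (!(caps & (UINT32_C(1) << op)))
		return -EOPNOTSUPP;

	return 0;
}

/* Small usage example for the helper above. */
static void check_dm_op_example(void)
{
	assert(check_dm_op(0x5, 0) == 0);            /* bit 0 set */
	assert(check_dm_op(0x5, 1) == -EOPNOTSUPP);  /* bit 1 clear */
	assert(check_dm_op(0x80000000u, 31) == 0);   /* highest valid bit */
	assert(check_dm_op(0xffffffffu, 32) == -EOPNOTSUPP); /* out of range */
}
```

Without the first check, the last call in the example would shift by 32,
which is the case smatch flags.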



* [PATCH rdma-rc 2/5] RDMA/core: Don't access cm_id after its destruction
  2021-05-11  5:48 [PATCH rdma-rc 0/5] RDMA fixes Leon Romanovsky
  2021-05-11  5:48 ` [PATCH rdma-rc 1/5] RDMA/mlx5: Verify that DM operation is reasonable Leon Romanovsky
@ 2021-05-11  5:48 ` Leon Romanovsky
  2021-05-11  5:48 ` [PATCH rdma-rc 3/5] RDMA/mlx5: Recover from fatal event in dual port mode Leon Romanovsky
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Leon Romanovsky @ 2021-05-11  5:48 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Shay Drory, Dan Carpenter, linux-rdma, Maor Gottlieb, Zhu Yanjun

From: Shay Drory <shayd@nvidia.com>

The idea that restrack needs an asymmetrical delete routine was proven
wrong. It caused an access to ib_device after it was already
disconnected from the cma_dev.

As a solution, move rdma_restrack_del() before cma_release_dev(), while
id_priv is still attached to the device.

Found by syzkaller:
BUG: KASAN: wild-memory-access in __list_del include/linux/list.h:112 [inline]
BUG: KASAN: wild-memory-access in __list_del_entry include/linux/list.h:135 [inline]
BUG: KASAN: wild-memory-access in list_del include/linux/list.h:146 [inline]
BUG: KASAN: wild-memory-access in cma_cancel_listens drivers/infiniband/core/cma.c:1767 [inline]
BUG: KASAN: wild-memory-access in cma_cancel_operation drivers/infiniband/core/cma.c:1795 [inline]
BUG: KASAN: wild-memory-access in cma_cancel_operation+0x1f4/0x4b0 drivers/infiniband/core/cma.c:1783
Write of size 8 at addr dead000000000108 by task syz-executor716/334

CPU: 0 PID: 334 Comm: syz-executor716 Not tainted 5.11.0+ #271
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Call Trace:
 __dump_stack lib/dump_stack.c:79 [inline]
 dump_stack+0xbe/0xf9 lib/dump_stack.c:120
 __kasan_report mm/kasan/report.c:400 [inline]
 kasan_report.cold+0x5f/0xd5 mm/kasan/report.c:413
 __list_del include/linux/list.h:112 [inline]
 __list_del_entry include/linux/list.h:135 [inline]
 list_del include/linux/list.h:146 [inline]
 cma_cancel_listens drivers/infiniband/core/cma.c:1767 [inline]
 cma_cancel_operation drivers/infiniband/core/cma.c:1795 [inline]
 cma_cancel_operation+0x1f4/0x4b0 drivers/infiniband/core/cma.c:1783
 _destroy_id+0x29/0x460 drivers/infiniband/core/cma.c:1862
 ucma_close_id+0x36/0x50 drivers/infiniband/core/ucma.c:185
 ucma_destroy_private_ctx+0x58d/0x5b0 drivers/infiniband/core/ucma.c:576
 ucma_close+0x91/0xd0 drivers/infiniband/core/ucma.c:1797
 __fput+0x169/0x540 fs/file_table.c:280
 task_work_run+0xb7/0x100 kernel/task_work.c:140
 exit_task_work include/linux/task_work.h:30 [inline]
 do_exit+0x7da/0x17f0 kernel/exit.c:825
 do_group_exit+0x9e/0x190 kernel/exit.c:922
 __do_sys_exit_group kernel/exit.c:933 [inline]
 __se_sys_exit_group kernel/exit.c:931 [inline]
 __x64_sys_exit_group+0x2d/0x30 kernel/exit.c:931
 do_syscall_64+0x2d/0x40 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x44a326
Code: Unable to access opcode bytes at RIP 0x44a2fc.
RSP: 002b:00007ffd14306748 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
RAX: ffffffffffffffda RBX: 00000000006c4490 RCX: 000000000044a326
RDX: 0000000000000001 RSI: 000000000000003c RDI: 0000000000000001
RBP: 0000000000000001 R08: 00000000000000e7 R09: ffffffffffffffc0
R10: bb1414ac000000c2 R11: 0000000000000246 R12: 00000000006c4490
R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000001
==================================================================

Fixes: 255d0c14b375 ("RDMA/cma: rdma_bind_addr() leaks a cma_dev reference count")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/core/cma.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 2b9ffc21cbc4..ab148a696c0c 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -473,6 +473,7 @@ static void cma_release_dev(struct rdma_id_private *id_priv)
 	list_del(&id_priv->list);
 	cma_dev_put(id_priv->cma_dev);
 	id_priv->cma_dev = NULL;
+	id_priv->id.device = NULL;
 	if (id_priv->id.route.addr.dev_addr.sgid_attr) {
 		rdma_put_gid_attr(id_priv->id.route.addr.dev_addr.sgid_attr);
 		id_priv->id.route.addr.dev_addr.sgid_attr = NULL;
@@ -1860,6 +1861,7 @@ static void _destroy_id(struct rdma_id_private *id_priv,
 				iw_destroy_cm_id(id_priv->cm_id.iw);
 		}
 		cma_leave_mc_groups(id_priv);
+		rdma_restrack_del(&id_priv->res);
 		cma_release_dev(id_priv);
 	}
 
@@ -1873,7 +1875,6 @@ static void _destroy_id(struct rdma_id_private *id_priv,
 	kfree(id_priv->id.route.path_rec);
 
 	put_net(id_priv->id.route.addr.dev_addr.net);
-	rdma_restrack_del(&id_priv->res);
 	kfree(id_priv);
 }
 
@@ -3774,7 +3775,7 @@ int rdma_listen(struct rdma_cm_id *id, int backlog)
 	}
 
 	id_priv->backlog = backlog;
-	if (id->device) {
+	if (id_priv->cma_dev) {
 		if (rdma_cap_ib_cm(id->device, 1)) {
 			ret = cma_ib_listen(id_priv);
 			if (ret)
-- 
2.31.1
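
The ordering rule behind this fix can be sketched with a toy model in
userspace C. Everything below (the toy_* names and fields) is made up
for illustration and is not the cma.c code. It assumes only what the
commit message states: removing the object from tracking touches the
device, so it has to happen before the device is released.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins; all names here are hypothetical. */
struct toy_device {
	int tracked;		/* number of objects tracked on this device */
};

struct toy_id {
	struct toy_device *dev;	/* NULL once the device is released */
	int in_tracker;
};

static void toy_track_add(struct toy_id *id)
{
	id->dev->tracked++;
	id->in_tracker = 1;
}

/* Untracking needs id->dev; returns -1 where real code would crash. */
static int toy_track_del(struct toy_id *id)
{
	if (!id->in_tracker)
		return 0;
	if (!id->dev)
		return -1;
	id->dev->tracked--;
	id->in_tracker = 0;
	return 0;
}

static void toy_release_dev(struct toy_id *id)
{
	id->dev = NULL;
}

/* Correct order: untrack first, then drop the device. */
static int toy_destroy(struct toy_id *id)
{
	int ret = toy_track_del(id);

	toy_release_dev(id);
	return ret;
}

static void toy_destroy_example(void)
{
	struct toy_device dev = { 0 };
	struct toy_id good = { &dev, 0 };
	struct toy_id bad = { &dev, 0 };

	toy_track_add(&good);
	toy_track_add(&bad);
	assert(dev.tracked == 2);

	assert(toy_destroy(&good) == 0);	/* del before release works */
	assert(dev.tracked == 1);

	toy_release_dev(&bad);			/* wrong order: release first */
	assert(toy_track_del(&bad) == -1);	/* device is already gone */
}
```

In the toy model the wrong order only returns an error. In the kernel
the equivalent mistake shows up as the KASAN report quoted above.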



* [PATCH rdma-rc 3/5] RDMA/mlx5: Recover from fatal event in dual port mode
  2021-05-11  5:48 [PATCH rdma-rc 0/5] RDMA fixes Leon Romanovsky
  2021-05-11  5:48 ` [PATCH rdma-rc 1/5] RDMA/mlx5: Verify that DM operation is reasonable Leon Romanovsky
  2021-05-11  5:48 ` [PATCH rdma-rc 2/5] RDMA/core: Don't access cm_id after its destruction Leon Romanovsky
@ 2021-05-11  5:48 ` Leon Romanovsky
  2021-05-11  5:48 ` [PATCH rdma-rc 4/5] RDMA/core: Simplify addition of restrack object Leon Romanovsky
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Leon Romanovsky @ 2021-05-11  5:48 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Maor Gottlieb, Dan Carpenter, linux-rdma, Shay Drory, Zhu Yanjun

From: Maor Gottlieb <maorg@nvidia.com>

When a fatal event occurs on the slave port, the device is marked
as not active. We need to mark it as active again when the slave
recovers.

Fixes: d69a24e03659 ("IB/mlx5: Move IB event processing onto a workqueue")
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/hw/mlx5/main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 6d1dd09a4388..644d5d0ac544 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -4419,6 +4419,7 @@ static int mlx5r_mp_probe(struct auxiliary_device *adev,
 
 		if (bound) {
 			rdma_roce_rescan_device(&dev->ib_dev);
+			mpi->ibdev->ib_active = true;
 			break;
 		}
 	}
-- 
2.31.1
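
A minimal sketch of the state change, in userspace C with hypothetical
toy_* names (only the ib_active field name is taken from the diff). It
assumes the behaviour described in the commit message: a fatal event
clears the flag, and a successful re-bind of the port has to set it
again, otherwise the device stays inactive forever.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the IB device; only the flag matters here. */
struct toy_ibdev {
	bool ib_active;
};

/* A fatal event on the port marks the device as not active. */
static void toy_fatal_event(struct toy_ibdev *dev)
{
	dev->ib_active = false;
}

/* On probe after recovery, a successful bind re-activates the device. */
static void toy_port_probe(struct toy_ibdev *dev, bool bound)
{
	if (bound)
		dev->ib_active = true;
}

static void toy_recover_example(void)
{
	struct toy_ibdev dev = { true };

	toy_fatal_event(&dev);
	assert(!dev.ib_active);

	toy_port_probe(&dev, false);	/* bind failed: still inactive */
	assert(!dev.ib_active);

	toy_port_probe(&dev, true);	/* bind succeeded: active again */
	assert(dev.ib_active);
}
```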



* [PATCH rdma-rc 4/5] RDMA/core: Simplify addition of restrack object
  2021-05-11  5:48 [PATCH rdma-rc 0/5] RDMA fixes Leon Romanovsky
                   ` (2 preceding siblings ...)
  2021-05-11  5:48 ` [PATCH rdma-rc 3/5] RDMA/mlx5: Recover from fatal event in dual port mode Leon Romanovsky
@ 2021-05-11  5:48 ` Leon Romanovsky
  2021-05-25 13:42   ` Jason Gunthorpe
  2021-05-11  5:48 ` [PATCH rdma-rc 5/5] RDMA/rxe: Return CQE error if invalid lkey was supplied Leon Romanovsky
  2021-05-18 17:40 ` [PATCH rdma-rc 0/5] RDMA fixes Jason Gunthorpe
  5 siblings, 1 reply; 10+ messages in thread
From: Leon Romanovsky @ 2021-05-11  5:48 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, Dan Carpenter, linux-rdma, Maor Gottlieb,
	Shay Drory, Zhu Yanjun

From: Leon Romanovsky <leonro@nvidia.com>

Move the rdma_restrack_add() call into cma_attach_to_dev(), next to the
device attachment logic, instead of repeating it in every caller.

Fixes: cb5cd0ea4eb3 ("RDMA/core: Add CM to restrack after successful attachment to a device")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/core/cma.c | 13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index ab148a696c0c..bdc645f9c692 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -462,6 +462,7 @@ static void cma_attach_to_dev(struct rdma_id_private *id_priv,
 			      struct cma_device *cma_dev)
 {
 	_cma_attach_to_dev(id_priv, cma_dev);
+	rdma_restrack_add(&id_priv->res);
 	id_priv->gid_type =
 		cma_dev->default_gid_type[id_priv->id.port_num -
 					  rdma_start_port(cma_dev->device)];
@@ -691,7 +692,6 @@ static int cma_ib_acquire_dev(struct rdma_id_private *id_priv,
 	mutex_lock(&lock);
 	cma_attach_to_dev(id_priv, listen_id_priv->cma_dev);
 	mutex_unlock(&lock);
-	rdma_restrack_add(&id_priv->res);
 	return 0;
 }
 
@@ -746,10 +746,8 @@ static int cma_iw_acquire_dev(struct rdma_id_private *id_priv,
 	}
 
 out:
-	if (!ret) {
+	if (!ret)
 		cma_attach_to_dev(id_priv, cma_dev);
-		rdma_restrack_add(&id_priv->res);
-	}
 
 	mutex_unlock(&lock);
 	return ret;
@@ -810,7 +808,6 @@ static int cma_resolve_ib_dev(struct rdma_id_private *id_priv)
 
 found:
 	cma_attach_to_dev(id_priv, cma_dev);
-	rdma_restrack_add(&id_priv->res);
 	mutex_unlock(&lock);
 	addr = (struct sockaddr_ib *)cma_src_addr(id_priv);
 	memcpy(&addr->sib_addr, &sgid, sizeof(sgid));
@@ -3208,7 +3205,6 @@ static int cma_bind_loopback(struct rdma_id_private *id_priv)
 	ib_addr_set_pkey(&id_priv->id.route.addr.dev_addr, pkey);
 	id_priv->id.port_num = p;
 	cma_attach_to_dev(id_priv, cma_dev);
-	rdma_restrack_add(&id_priv->res);
 	cma_set_loopback(cma_src_addr(id_priv));
 out:
 	mutex_unlock(&lock);
@@ -3241,7 +3237,6 @@ static void addr_handler(int status, struct sockaddr *src_addr,
 		if (status)
 			pr_debug_ratelimited("RDMA CM: ADDR_ERROR: failed to acquire device. status %d\n",
 					     status);
-		rdma_restrack_add(&id_priv->res);
 	} else if (status) {
 		pr_debug_ratelimited("RDMA CM: ADDR_ERROR: failed to resolve IP. status %d\n", status);
 	}
@@ -3853,12 +3848,12 @@ int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr)
 	if (ret)
 		goto err2;
 
-	if (!cma_any_addr(addr))
-		rdma_restrack_add(&id_priv->res);
 	return 0;
 err2:
 	if (id_priv->cma_dev)
 		cma_release_dev(id_priv);
+	if (!cma_any_addr(addr))
+		rdma_restrack_del(&id_priv->res);
 err1:
 	cma_comp_exch(id_priv, RDMA_CM_ADDR_BOUND, RDMA_CM_IDLE);
 	return ret;
-- 
2.31.1
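
The refactoring idea can be sketched in userspace C as follows. The
toy_* names are hypothetical and the model ignores locking and error
paths. It only shows the shape of the change: the tracking call lives
in the attach helper, so each caller attaches and gets tracked in one
step.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins; all names here are hypothetical. */
struct toy_dev {
	int attached_ids;
};

struct toy_id {
	struct toy_dev *dev;
	int tracked;
};

static void toy_track_add(struct toy_id *id)
{
	id->tracked++;
}

/* Single place that attaches the id and registers it for tracking. */
static void toy_attach_to_dev(struct toy_id *id, struct toy_dev *dev)
{
	id->dev = dev;
	dev->attached_ids++;
	toy_track_add(id);
}

/* Callers no longer repeat the tracking call after attaching. */
static void toy_bind_loopback(struct toy_id *id, struct toy_dev *dev)
{
	toy_attach_to_dev(id, dev);
}

static void toy_resolve_dev(struct toy_id *id, struct toy_dev *dev)
{
	toy_attach_to_dev(id, dev);
}

static void toy_attach_example(void)
{
	struct toy_dev dev = { 0 };
	struct toy_id a = { NULL, 0 };
	struct toy_id b = { NULL, 0 };

	toy_bind_loopback(&a, &dev);
	toy_resolve_dev(&b, &dev);

	/* Each id is tracked exactly once, whichever path attached it. */
	assert(a.tracked == 1 && b.tracked == 1);
	assert(dev.attached_ids == 2);
}
```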



* [PATCH rdma-rc 5/5] RDMA/rxe: Return CQE error if invalid lkey was supplied
  2021-05-11  5:48 [PATCH rdma-rc 0/5] RDMA fixes Leon Romanovsky
                   ` (3 preceding siblings ...)
  2021-05-11  5:48 ` [PATCH rdma-rc 4/5] RDMA/core: Simplify addition of restrack object Leon Romanovsky
@ 2021-05-11  5:48 ` Leon Romanovsky
  2021-05-11  6:12   ` Zhu Yanjun
  2021-05-18 17:40 ` [PATCH rdma-rc 0/5] RDMA fixes Jason Gunthorpe
  5 siblings, 1 reply; 10+ messages in thread
From: Leon Romanovsky @ 2021-05-11  5:48 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, Dan Carpenter, linux-rdma, Maor Gottlieb,
	Shay Drory, Zhu Yanjun

From: Leon Romanovsky <leonro@nvidia.com>

RXE does not update the WQE status on LOCAL_WRITE failures.
This caused the following kernel panic when someone sent an atomic
operation with an explicitly wrong lkey.

[leonro@vm ~]$ mkt test
test_atomic_invalid_lkey (tests.test_atomic.AtomicTest) ... [   43.860977] ------------[ cut here ]------------
 WARNING: CPU: 5 PID: 263 at drivers/infiniband/sw/rxe/rxe_comp.c:740 rxe_completer+0x1a6d/0x2e30 [rdma_rxe]
 Modules linked in: crc32_generic rdma_rxe ip6_udp_tunnel udp_tunnel rdma_ucm rdma_cm ib_umad ib_ipoib iw_cm ib_cm mlx5_ib ib_uverbs ib_core mlx5_core ptp pps_core
 CPU: 5 PID: 263 Comm: python3 Not tainted 5.13.0-rc1+ #2936
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
 RIP: 0010:rxe_completer+0x1a6d/0x2e30 [rdma_rxe]
 Code: 03 0f 8e 65 0e 00 00 3b 93 10 06 00 00 0f 84 82 0a 00 00 4c 89 ff 4c 89 44 24 38 e8 2d 74 a9 e1 4c 8b 44 24 38 e9 1c f5 ff ff <0f> 0b e9 0c e8 ff ff b8 05 00 00 00 41 bf 05 00 00 00 e9 ab e7 ff
 RSP: 0018:ffff8880158af090 EFLAGS: 00010246
 RAX: 0000000000000000 RBX: ffff888016a78000 RCX: ffffffffa0cf1652
 RDX: 1ffff9200004b442 RSI: 0000000000000004 RDI: ffffc9000025a210
 RBP: dffffc0000000000 R08: 00000000ffffffea R09: ffff88801617740b
 R10: ffffed1002c2ee81 R11: 0000000000000007 R12: ffff88800f3b63e8
 R13: ffff888016a78008 R14: ffffc9000025a180 R15: 000000000000000c
 FS:  00007f88b622a740(0000) GS:ffff88806d540000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007f88b5a1fa10 CR3: 000000000d848004 CR4: 0000000000370ea0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 Call Trace:
  ? lock_release+0x1f9/0x6c0
  ? rxe_comp_queue_pkt+0xb0/0xb0 [rdma_rxe]
  ? lock_downgrade+0x6d0/0x6d0
  ? lock_downgrade+0x6d0/0x6d0
  ? lockdep_hardirqs_on_prepare+0x273/0x3e0
  ? _raw_spin_unlock_irqrestore+0x2d/0x40
  ? rxe_comp_queue_pkt+0xb0/0xb0 [rdma_rxe]
  rxe_do_task+0x130/0x230 [rdma_rxe]
  ? _raw_spin_unlock_irqrestore+0x2d/0x40
  rxe_rcv+0xb11/0x1df0 [rdma_rxe]
  ? rxe_crc32.isra.0+0x120/0x120 [rdma_rxe]
  ? prepare_ack_packet+0x50b/0xa20 [rdma_rxe]
  rxe_loopback+0x157/0x1e0 [rdma_rxe]
  ? rxe_send+0x520/0x520 [rdma_rxe]
  ? lockdep_hardirqs_on_prepare+0x273/0x3e0
  rxe_responder+0x5532/0x7620 [rdma_rxe]
  ? rxe_resp_queue_pkt+0xa0/0xa0 [rdma_rxe]
  ? lock_downgrade+0x6d0/0x6d0
  ? rxe_crc32.isra.0+0x8e/0x120 [rdma_rxe]
  ? lock_is_held_type+0x98/0x110
  ? find_held_lock+0x2d/0x110
  ? lock_release+0x1f9/0x6c0
  ? rxe_do_task+0xe5/0x230 [rdma_rxe]
  ? lock_downgrade+0x6d0/0x6d0
  ? rxe_resp_queue_pkt+0x19/0xa0 [rdma_rxe]
  ? lockdep_hardirqs_on_prepare+0x273/0x3e0
  ? _raw_spin_unlock_irqrestore+0x2d/0x40
  ? rxe_resp_queue_pkt+0xa0/0xa0 [rdma_rxe]
  rxe_do_task+0x130/0x230 [rdma_rxe]
  rxe_rcv+0x9c8/0x1df0 [rdma_rxe]
  ? __kmalloc_track_caller+0x174/0x390
  ? rxe_crc32.isra.0+0x120/0x120 [rdma_rxe]
  rxe_loopback+0x157/0x1e0 [rdma_rxe]
  ? rxe_send+0x520/0x520 [rdma_rxe]
  rxe_requester+0x1efd/0x58c0 [rdma_rxe]
  ? lock_is_held_type+0x98/0x110
  ? find_held_lock+0x2d/0x110
  ? rnr_nak_timer+0x70/0x70 [rdma_rxe]
  ? lock_release+0x1f9/0x6c0
  ? rxe_do_task+0xe5/0x230 [rdma_rxe]
  ? lock_downgrade+0x6d0/0x6d0
  ? lockdep_hardirqs_on_prepare+0x273/0x3e0
  ? _raw_spin_unlock_irqrestore+0x2d/0x40
  ? rnr_nak_timer+0x70/0x70 [rdma_rxe]
  rxe_do_task+0x130/0x230 [rdma_rxe]
  ? rxe_poll_cq+0x450/0x450 [rdma_rxe]
  rxe_post_send+0x998/0x1860 [rdma_rxe]
  ? lock_is_held_type+0x98/0x110
  ? lock_is_held_type+0x98/0x110
  ? rdma_lookup_get_uobject+0x22c/0x4a0 [ib_uverbs]
  ? rxe_poll_cq+0x450/0x450 [rdma_rxe]
  ib_uverbs_post_send+0xd5f/0x1220 [ib_uverbs]
  ? lock_acquire+0x1a9/0x6d0
  ? lock_is_held_type+0x98/0x110
  ? ib_uverbs_ex_create_wq+0xb00/0xb00 [ib_uverbs]
  ? lock_release+0x1f9/0x6c0
  ? __might_fault+0xba/0x160
  ? lock_downgrade+0x6d0/0x6d0
  ib_uverbs_write+0x847/0xc80 [ib_uverbs]
  ? ib_uverbs_open+0x810/0x810 [ib_uverbs]
  ? vfs_fileattr_set+0x990/0x990
  ? __up_read+0x1a1/0x7b0
  vfs_write+0x1c5/0x840
  ksys_write+0x176/0x1d0
  ? __x64_sys_read+0xb0/0xb0
  ? lockdep_hardirqs_on_prepare+0x273/0x3e0
  ? syscall_enter_from_user_mode+0x1d/0x50
  do_syscall_64+0x3f/0x80
  entry_SYSCALL_64_after_hwframe+0x44/0xae
 RIP: 0033:0x7f88b64917a7
 Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
 RSP: 002b:00007ffee189e6c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
 RAX: ffffffffffffffda RBX: 000055c17c8b2660 RCX: 00007f88b64917a7
 RDX: 0000000000000020 RSI: 00007ffee189e6e0 RDI: 0000000000000003
 RBP: 0000000000000000 R08: 00007f88b5cbbc80 R09: 00007f88b689fdc0
 R10: 00007f88b66a0510 R11: 0000000000000246 R12: 00007f88b6072180
 R13: 0000000000000000 R14: 00007f88b5e6e9d0 R15: 0000000000000008
 irq event stamp: 809719
 hardirqs last  enabled at (809727): [<ffffffff813990c4>] console_unlock+0x434/0x850
 hardirqs last disabled at (809734): [<ffffffff81399279>] console_unlock+0x5e9/0x850
 softirqs last  enabled at (809016): [<ffffffff8129089e>] irq_exit_rcu+0x11e/0x1a0
 softirqs last disabled at (808963): [<ffffffff8129089e>] irq_exit_rcu+0x11e/0x1a0
 ---[ end trace 1e302e4b7857843b ]---

Fixes: 8700e3e7c485 ("Soft RoCE driver")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/sw/rxe/rxe_comp.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c
index 2af26737d32d..a6712e373eed 100644
--- a/drivers/infiniband/sw/rxe/rxe_comp.c
+++ b/drivers/infiniband/sw/rxe/rxe_comp.c
@@ -346,13 +346,15 @@ static inline enum comp_state do_read(struct rxe_qp *qp,
 	ret = copy_data(qp->pd, IB_ACCESS_LOCAL_WRITE,
 			&wqe->dma, payload_addr(pkt),
 			payload_size(pkt), to_mr_obj, NULL);
-	if (ret)
+	if (ret) {
+		wqe->status = IB_WC_LOC_PROT_ERR;
 		return COMPST_ERROR;
+	}
 
 	if (wqe->dma.resid == 0 && (pkt->mask & RXE_END_MASK))
 		return COMPST_COMP_ACK;
-	else
-		return COMPST_UPDATE_COMP;
+
+	return COMPST_UPDATE_COMP;
 }
 
 static inline enum comp_state do_atomic(struct rxe_qp *qp,
@@ -366,10 +368,12 @@ static inline enum comp_state do_atomic(struct rxe_qp *qp,
 	ret = copy_data(qp->pd, IB_ACCESS_LOCAL_WRITE,
 			&wqe->dma, &atomic_orig,
 			sizeof(u64), to_mr_obj, NULL);
-	if (ret)
+	if (ret) {
+		wqe->status = IB_WC_LOC_PROT_ERR;
 		return COMPST_ERROR;
-	else
-		return COMPST_COMP_ACK;
+	}
+
+	return COMPST_COMP_ACK;
 }
 
 static void make_send_cqe(struct rxe_qp *qp, struct rxe_send_wqe *wqe,
-- 
2.31.1
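
The pattern in this patch can be sketched in userspace C. The toy_*
names are hypothetical and toy_copy() is a stand-in, not the real
copy_data(). The sketch also assumes that the warning in the trace
above fires because the completer reaches its error state while the
WQE status still says success. Setting the status at the point of
failure keeps the two consistent.

```c
#include <assert.h>

/* Toy stand-ins; all names here are hypothetical. */
enum toy_status { TOY_WC_SUCCESS = 0, TOY_WC_LOC_PROT_ERR };
enum toy_state { TOY_COMP_ACK, TOY_ERROR };

struct toy_wqe {
	enum toy_status status;
};

/* Fails with a non-zero value when the supplied key is not valid. */
static int toy_copy(unsigned int lkey, unsigned int valid_lkey)
{
	return lkey == valid_lkey ? 0 : -1;
}

/* Record the error in the WQE before reporting the error state. */
static enum toy_state toy_do_atomic(struct toy_wqe *wqe, unsigned int lkey,
				    unsigned int valid_lkey)
{
	if (toy_copy(lkey, valid_lkey)) {
		wqe->status = TOY_WC_LOC_PROT_ERR;
		return TOY_ERROR;
	}

	return TOY_COMP_ACK;
}

/* Returns -1 for the inconsistent case: error state, success status. */
static int toy_check(const struct toy_wqe *wqe, enum toy_state state)
{
	if (state == TOY_ERROR && wqe->status == TOY_WC_SUCCESS)
		return -1;
	return 0;
}

static void toy_atomic_example(void)
{
	struct toy_wqe wqe = { TOY_WC_SUCCESS };
	enum toy_state state;

	state = toy_do_atomic(&wqe, 0x1234, 0x1234);	/* valid key */
	assert(state == TOY_COMP_ACK && wqe.status == TOY_WC_SUCCESS);

	state = toy_do_atomic(&wqe, 0xdead, 0x1234);	/* wrong key */
	assert(state == TOY_ERROR);
	assert(wqe.status == TOY_WC_LOC_PROT_ERR);
	assert(toy_check(&wqe, state) == 0);
}
```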



* Re: [PATCH rdma-rc 5/5] RDMA/rxe: Return CQE error if invalid lkey was supplied
  2021-05-11  5:48 ` [PATCH rdma-rc 5/5] RDMA/rxe: Return CQE error if invalid lkey was supplied Leon Romanovsky
@ 2021-05-11  6:12   ` Zhu Yanjun
  0 siblings, 0 replies; 10+ messages in thread
From: Zhu Yanjun @ 2021-05-11  6:12 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, Jason Gunthorpe, Leon Romanovsky, Dan Carpenter,
	RDMA mailing list, Maor Gottlieb, Shay Drory

On Tue, May 11, 2021 at 1:48 PM Leon Romanovsky <leon@kernel.org> wrote:
>
> From: Leon Romanovsky <leonro@nvidia.com>
>
> The RXE is missing update of WQE status in LOCAL_WRITE failures.
> This caused to the following kernel panic if someone sent atomic
> operation with explicitly wrong lkey.
>
> [...]
>
> Fixes: 8700e3e7c485 ("Soft RoCE driver")
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>

Thanks,
Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com>

Zhu Yanjun

> ---
>  drivers/infiniband/sw/rxe/rxe_comp.c | 16 ++++++++++------
>  1 file changed, 10 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c
> index 2af26737d32d..a6712e373eed 100644
> --- a/drivers/infiniband/sw/rxe/rxe_comp.c
> +++ b/drivers/infiniband/sw/rxe/rxe_comp.c
> @@ -346,13 +346,15 @@ static inline enum comp_state do_read(struct rxe_qp *qp,
>         ret = copy_data(qp->pd, IB_ACCESS_LOCAL_WRITE,
>                         &wqe->dma, payload_addr(pkt),
>                         payload_size(pkt), to_mr_obj, NULL);
> -       if (ret)
> +       if (ret) {
> +               wqe->status = IB_WC_LOC_PROT_ERR;
>                 return COMPST_ERROR;
> +       }
>
>         if (wqe->dma.resid == 0 && (pkt->mask & RXE_END_MASK))
>                 return COMPST_COMP_ACK;
> -       else
> -               return COMPST_UPDATE_COMP;
> +
> +       return COMPST_UPDATE_COMP;
>  }
>
>  static inline enum comp_state do_atomic(struct rxe_qp *qp,
> @@ -366,10 +368,12 @@ static inline enum comp_state do_atomic(struct rxe_qp *qp,
>         ret = copy_data(qp->pd, IB_ACCESS_LOCAL_WRITE,
>                         &wqe->dma, &atomic_orig,
>                         sizeof(u64), to_mr_obj, NULL);
> -       if (ret)
> +       if (ret) {
> +               wqe->status = IB_WC_LOC_PROT_ERR;
>                 return COMPST_ERROR;
> -       else
> -               return COMPST_COMP_ACK;
> +       }
> +
> +       return COMPST_COMP_ACK;
>  }
>
>  static void make_send_cqe(struct rxe_qp *qp, struct rxe_send_wqe *wqe,
> --
> 2.31.1
>


* Re: [PATCH rdma-rc 0/5] RDMA fixes
  2021-05-11  5:48 [PATCH rdma-rc 0/5] RDMA fixes Leon Romanovsky
                   ` (4 preceding siblings ...)
  2021-05-11  5:48 ` [PATCH rdma-rc 5/5] RDMA/rxe: Return CQE error if invalid lkey was supplied Leon Romanovsky
@ 2021-05-18 17:40 ` Jason Gunthorpe
  5 siblings, 0 replies; 10+ messages in thread
From: Jason Gunthorpe @ 2021-05-18 17:40 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, Leon Romanovsky, Dan Carpenter, linux-rdma,
	Maor Gottlieb, Shay Drory, Zhu Yanjun

On Tue, May 11, 2021 at 08:48:26AM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
> 
> Batch of completely unrelated fixes.
> 
> Thanks
> 
> Leon Romanovsky (2):
>   RDMA/rxe: Return CQE error if invalid lkey was supplied
> 
> Maor Gottlieb (2):
>   RDMA/mlx5: Verify that DM operation is reasonable
>   RDMA/mlx5: Recover from fatal event in dual port mode
> 
> Shay Drory (1):
>   RDMA/core: Don't access cm_id after its destruction

These are applied to for-rc

>   RDMA/core: Simplify addition of restrack object

This one will have to go to -next

Thanks,
Jason


* Re: [PATCH rdma-rc 4/5] RDMA/core: Simplify addition of restrack object
  2021-05-11  5:48 ` [PATCH rdma-rc 4/5] RDMA/core: Simplify addition of restrack object Leon Romanovsky
@ 2021-05-25 13:42   ` Jason Gunthorpe
  2021-06-02  6:36     ` Leon Romanovsky
  0 siblings, 1 reply; 10+ messages in thread
From: Jason Gunthorpe @ 2021-05-25 13:42 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, Leon Romanovsky, Dan Carpenter, linux-rdma,
	Maor Gottlieb, Shay Drory, Zhu Yanjun

On Tue, May 11, 2021 at 08:48:30AM +0300, Leon Romanovsky wrote:

> @@ -3853,12 +3848,12 @@ int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr)
>  	if (ret)
>  		goto err2;
>  
> -	if (!cma_any_addr(addr))
> -		rdma_restrack_add(&id_priv->res);
>  	return 0;
>  err2:
>  	if (id_priv->cma_dev)
>  		cma_release_dev(id_priv);
> +	if (!cma_any_addr(addr))
> +		rdma_restrack_del(&id_priv->res);

But this whole thing is reverting an earlier patch - the whole point
was to avoid the restrack_del().

Plus this is out of order: the del has to come before the release, due
to the other recent patch.

Jason


* Re: [PATCH rdma-rc 4/5] RDMA/core: Simplify addition of restrack object
  2021-05-25 13:42   ` Jason Gunthorpe
@ 2021-06-02  6:36     ` Leon Romanovsky
  0 siblings, 0 replies; 10+ messages in thread
From: Leon Romanovsky @ 2021-06-02  6:36 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Doug Ledford, Dan Carpenter, linux-rdma, Maor Gottlieb,
	Shay Drory, Zhu Yanjun

On Tue, May 25, 2021 at 10:42:53AM -0300, Jason Gunthorpe wrote:
> On Tue, May 11, 2021 at 08:48:30AM +0300, Leon Romanovsky wrote:
> 
> > @@ -3853,12 +3848,12 @@ int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr)
> >  	if (ret)
> >  		goto err2;
> >  
> > -	if (!cma_any_addr(addr))
> > -		rdma_restrack_add(&id_priv->res);
> >  	return 0;
> >  err2:
> >  	if (id_priv->cma_dev)
> >  		cma_release_dev(id_priv);
> > +	if (!cma_any_addr(addr))
> > +		rdma_restrack_del(&id_priv->res);
> 
> But this whole thing is reverting an earlier patch - the whole point
> was to avoid the restrack_del().

Not really, we call rdma_restrack_add() only in the "if (!cma_any_addr(addr))"
flow, so the patch does the same as the earlier one, but in a more compact way.

> 
> Plus this is out of order the del has to be before the release, due to
> the other recent patch.

Shay pointed it out to me too; I will fix it.

Thanks

> 
> Jason

