* [PATCH rdma-rc 0/5] RDMA fixes
@ 2021-05-11  5:48 Leon Romanovsky
  2021-05-11  5:48 ` [PATCH rdma-rc 1/5] RDMA/mlx5: Verify that DM operation is reasonable Leon Romanovsky
                   ` (5 more replies)
  0 siblings, 6 replies; 10+ messages in thread
From: Leon Romanovsky @ 2021-05-11  5:48 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, Dan Carpenter, linux-rdma, Maor Gottlieb,
	Shay Drory, Zhu Yanjun

From: Leon Romanovsky <leonro@nvidia.com>

Batch of completely unrelated fixes.

Thanks

Leon Romanovsky (2):
  RDMA/core: Simplify addition of restrack object
  RDMA/rxe: Return CQE error if invalid lkey was supplied

Maor Gottlieb (2):
  RDMA/mlx5: Verify that DM operation is reasonable
  RDMA/mlx5: Recover from fatal event in dual port mode

Shay Drory (1):
  RDMA/core: Don't access cm_id after its destruction

 drivers/infiniband/core/cma.c        | 18 +++++++-----------
 drivers/infiniband/hw/mlx5/dm.c      |  3 +++
 drivers/infiniband/hw/mlx5/main.c    |  1 +
 drivers/infiniband/sw/rxe/rxe_comp.c | 16 ++++++++++------
 4 files changed, 21 insertions(+), 17 deletions(-)

-- 
2.31.1



* [PATCH rdma-rc 1/5] RDMA/mlx5: Verify that DM operation is reasonable
  2021-05-11  5:48 [PATCH rdma-rc 0/5] RDMA fixes Leon Romanovsky
@ 2021-05-11  5:48 ` Leon Romanovsky
  2021-05-11  5:48 ` [PATCH rdma-rc 2/5] RDMA/core: Don't access cm_id after its destruction Leon Romanovsky
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Leon Romanovsky @ 2021-05-11  5:48 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Maor Gottlieb, Dan Carpenter, linux-rdma, Shay Drory, Zhu Yanjun

From: Maor Gottlieb <maorg@nvidia.com>

Fix the smatch complaint below by verifying that the DM operation is not
greater than 31.

drivers/infiniband/hw/mlx5/dm.c:220 mlx5_ib_handler_MLX5_IB_METHOD_DM_MAP_OP_ADDR()
error: undefined (user controlled) shift '(((1))) << op'

Fixes: cea85fa5dbc2 ("RDMA/mlx5: Add support in MEMIC operations")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/hw/mlx5/dm.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/infiniband/hw/mlx5/dm.c b/drivers/infiniband/hw/mlx5/dm.c
index 094bf85589db..001d766cf291 100644
--- a/drivers/infiniband/hw/mlx5/dm.c
+++ b/drivers/infiniband/hw/mlx5/dm.c
@@ -217,6 +217,9 @@ static int UVERBS_HANDLER(MLX5_IB_METHOD_DM_MAP_OP_ADDR)(
 	if (err)
 		return err;
 
+	if (op >= BITS_PER_TYPE(u32))
+		return -EOPNOTSUPP;
+
 	if (!(MLX5_CAP_DEV_MEM(dev->mdev, memic_operations) & BIT(op)))
 		return -EOPNOTSUPP;
 
-- 
2.31.1
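
The guard added in patch 1/5 can be illustrated outside the kernel. The following is a minimal userspace sketch, not the driver's code: `check_op_supported()`, `CAP_MASK_BITS`, and the `cap_mask` parameter are hypothetical names standing in for the handler, `BITS_PER_TYPE(u32)`, and the `memic_operations` capability. It shows the range check being done before the shift, since shifting a 32-bit value by 32 or more is undefined behaviour in C.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Width of the capability mask in bits; stands in for BITS_PER_TYPE(u32). */
#define CAP_MASK_BITS (sizeof(uint32_t) * 8)

/*
 * Hypothetical helper (an assumption for illustration only): returns 0 if
 * bit `op` is set in `cap_mask`, or -EOPNOTSUPP otherwise.
 */
int check_op_supported(uint32_t cap_mask, unsigned int op)
{
	/* Range check first: a shift count >= 32 on a 32-bit value is undefined. */
	if (op >= CAP_MASK_BITS)
		return -EOPNOTSUPP;

	/* Now the shift is well defined for every remaining value of op. */
	if (!(cap_mask & (UINT32_C(1) << op)))
		return -EOPNOTSUPP;

	return 0;
}
```

With this ordering, a user-controlled `op` of 32 or more is rejected with the same error code as an unsupported operation, so callers need no extra error path.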



* [PATCH rdma-rc 2/5] RDMA/core: Don't access cm_id after its destruction
  2021-05-11  5:48 [PATCH rdma-rc 0/5] RDMA fixes Leon Romanovsky
  2021-05-11  5:48 ` [PATCH rdma-rc 1/5] RDMA/mlx5: Verify that DM operation is reasonable Leon Romanovsky
@ 2021-05-11  5:48 ` Leon Romanovsky
  2021-05-11  5:48 ` [PATCH rdma-rc 3/5] RDMA/mlx5: Recover from fatal event in dual port mode Leon Romanovsky
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Leon Romanovsky @ 2021-05-11  5:48 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Shay Drory, Dan Carpenter, linux-rdma, Maor Gottlieb, Zhu Yanjun

From: Shay Drory <shayd@nvidia.com>

The idea that restrack needs an asymmetrical delete routine was proven
wrong. It led to an access to the ib_device after it had already been
disconnected from the cma_dev.

As a solution, move rdma_restrack_del() before the id_priv destruction.

Found by syzkaller:
BUG: KASAN: wild-memory-access in __list_del include/linux/list.h:112 [inline]
BUG: KASAN: wild-memory-access in __list_del_entry include/linux/list.h:135 [inline]
BUG: KASAN: wild-memory-access in list_del include/linux/list.h:146 [inline]
BUG: KASAN: wild-memory-access in cma_cancel_listens drivers/infiniband/core/cma.c:1767 [inline]
BUG: KASAN: wild-memory-access in cma_cancel_operation drivers/infiniband/core/cma.c:1795 [inline]
BUG: KASAN: wild-memory-access in cma_cancel_operation+0x1f4/0x4b0 drivers/infiniband/core/cma.c:1783
Write of size 8 at addr dead000000000108 by task syz-executor716/334

CPU: 0 PID: 334 Comm: syz-executor716 Not tainted 5.11.0+ #271
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Call Trace:
 __dump_stack lib/dump_stack.c:79 [inline]
 dump_stack+0xbe/0xf9 lib/dump_stack.c:120
 __kasan_report mm/kasan/report.c:400 [inline]
 kasan_report.cold+0x5f/0xd5 mm/kasan/report.c:413
 __list_del include/linux/list.h:112 [inline]
 __list_del_entry include/linux/list.h:135 [inline]
 list_del include/linux/list.h:146 [inline]
 cma_cancel_listens drivers/infiniband/core/cma.c:1767 [inline]
 cma_cancel_operation drivers/infiniband/core/cma.c:1795 [inline]
 cma_cancel_operation+0x1f4/0x4b0 drivers/infiniband/core/cma.c:1783
 _destroy_id+0x29/0x460 drivers/infiniband/core/cma.c:1862
 ucma_close_id+0x36/0x50 drivers/infiniband/core/ucma.c:185
 ucma_destroy_private_ctx+0x58d/0x5b0 drivers/infiniband/core/ucma.c:576
 ucma_close+0x91/0xd0 drivers/infiniband/core/ucma.c:1797
 __fput+0x169/0x540 fs/file_table.c:280
 task_work_run+0xb7/0x100 kernel/task_work.c:140
 exit_task_work include/linux/task_work.h:30 [inline]
 do_exit+0x7da/0x17f0 kernel/exit.c:825
 do_group_exit+0x9e/0x190 kernel/exit.c:922
 __do_sys_exit_group kernel/exit.c:933 [inline]
 __se_sys_exit_group kernel/exit.c:931 [inline]
 __x64_sys_exit_group+0x2d/0x30 kernel/exit.c:931
 do_syscall_64+0x2d/0x40 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x44a326
Code: Unable to access opcode bytes at RIP 0x44a2fc.
RSP: 002b:00007ffd14306748 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
RAX: ffffffffffffffda RBX: 00000000006c4490 RCX: 000000000044a326
RDX: 0000000000000001 RSI: 000000000000003c RDI: 0000000000000001
RBP: 0000000000000001 R08: 00000000000000e7 R09: ffffffffffffffc0
R10: bb1414ac000000c2 R11: 0000000000000246 R12: 00000000006c4490
R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000001
==================================================================

Fixes: 255d0c14b375 ("RDMA/cma: rdma_bind_addr() leaks a cma_dev reference count")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/core/cma.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 2b9ffc21cbc4..ab148a696c0c 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -473,6 +473,7 @@ static void cma_release_dev(struct rdma_id_private *id_priv)
 	list_del(&id_priv->list);
 	cma_dev_put(id_priv->cma_dev);
 	id_priv->cma_dev = NULL;
+	id_priv->id.device = NULL;
 	if (id_priv->id.route.addr.dev_addr.sgid_attr) {
 		rdma_put_gid_attr(id_priv->id.route.addr.dev_addr.sgid_attr);
 		id_priv->id.route.addr.dev_addr.sgid_attr = NULL;
@@ -1860,6 +1861,7 @@ static void _destroy_id(struct rdma_id_private *id_priv,
 				iw_destroy_cm_id(id_priv->cm_id.iw);
 		}
 		cma_leave_mc_groups(id_priv);
+		rdma_restrack_del(&id_priv->res);
 		cma_release_dev(id_priv);
 	}
 
@@ -1873,7 +1875,6 @@ static void _destroy_id(struct rdma_id_private *id_priv,
 	kfree(id_priv->id.route.path_rec);
 
 	put_net(id_priv->id.route.addr.dev_addr.net);
-	rdma_restrack_del(&id_priv->res);
 	kfree(id_priv);
 }
 
@@ -3774,7 +3775,7 @@ int rdma_listen(struct rdma_cm_id *id, int backlog)
 	}
 
 	id_priv->backlog = backlog;
-	if (id->device) {
+	if (id_priv->cma_dev) {
 		if (rdma_cap_ib_cm(id->device, 1)) {
 			ret = cma_ib_listen(id_priv);
 			if (ret)
-- 
2.31.1
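
The ordering issue addressed by patch 2/5 can be modelled with a toy example. This is only a sketch under stated assumptions: `struct device`, `struct conn_id`, `track_add()`, `track_del()`, `release_dev()`, and `destroy_id()` are hypothetical stand-ins and do not reproduce the restrack or CMA implementation. The model assumes that untracking needs the device pointer, so it has to run before the pointer is cleared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy device that counts how many IDs are currently tracked on it. */
struct device {
	int tracked_count;
};

/* Toy connection ID; `dev` is cleared when the device is released. */
struct conn_id {
	struct device *dev;
	bool tracked;
};

/* Start tracking the ID on its device (no-op if detached or already tracked). */
void track_add(struct conn_id *id)
{
	if (id->dev && !id->tracked) {
		id->dev->tracked_count++;
		id->tracked = true;
	}
}

/*
 * Stop tracking. Returns -1 where real code would dereference a stale or
 * NULL device pointer, i.e. when the device was released first.
 */
int track_del(struct conn_id *id)
{
	if (!id->tracked)
		return 0;
	if (!id->dev)
		return -1;
	id->dev->tracked_count--;
	id->tracked = false;
	return 0;
}

/* Detach the ID from its device. */
void release_dev(struct conn_id *id)
{
	id->dev = NULL;
}

/* Teardown in the order the patch establishes: untrack, then release. */
int destroy_id(struct conn_id *id)
{
	int ret = track_del(id);

	release_dev(id);
	return ret;
}
```

In this model, calling `release_dev()` before `track_del()` makes the untrack step fail, which mirrors the class of bug the KASAN report describes.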



* [PATCH rdma-rc 3/5] RDMA/mlx5: Recover from fatal event in dual port mode
  2021-05-11  5:48 [PATCH rdma-rc 0/5] RDMA fixes Leon Romanovsky
  2021-05-11  5:48 ` [PATCH rdma-rc 1/5] RDMA/mlx5: Verify that DM operation is reasonable Leon Romanovsky
  2021-05-11  5:48 ` [PATCH rdma-rc 2/5] RDMA/core: Don't access cm_id after its destruction Leon Romanovsky
@ 2021-05-11  5:48 ` Leon Romanovsky
  2021-05-11  5:48 ` [PATCH rdma-rc 4/5] RDMA/core: Simplify addition of restrack object Leon Romanovsky
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Leon Romanovsky @ 2021-05-11  5:48 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Maor Gottlieb, Dan Carpenter, linux-rdma, Shay Drory, Zhu Yanjun

From: Maor Gottlieb <maorg@nvidia.com>

When there is a fatal event on the slave port, the device is marked
as not active. We need to mark it as active again when the slave
has recovered.

Fixes: d69a24e03659 ("IB/mlx5: Move IB event processing onto a workqueue")
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/hw/mlx5/main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 6d1dd09a4388..644d5d0ac544 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -4419,6 +4419,7 @@ static int mlx5r_mp_probe(struct auxiliary_device *adev,
 
 		if (bound) {
 			rdma_roce_rescan_device(&dev->ib_dev);
+			mpi->ibdev->ib_active = true;
 			break;
 		}
 	}
-- 
2.31.1



* [PATCH rdma-rc 4/5] RDMA/core: Simplify addition of restrack object
  2021-05-11  5:48 [PATCH rdma-rc 0/5] RDMA fixes Leon Romanovsky
                   ` (2 preceding siblings ...)
  2021-05-11  5:48 ` [PATCH rdma-rc 3/5] RDMA/mlx5: Recover from fatal event in dual port mode Leon Romanovsky
@ 2021-05-11  5:48 ` Leon Romanovsky
  2021-05-25 13:42   ` Jason Gunthorpe
  2021-05-11  5:48 ` [PATCH rdma-rc 5/5] RDMA/rxe: Return CQE error if invalid lkey was supplied Leon Romanovsky
  2021-05-18 17:40 ` [PATCH rdma-rc 0/5] RDMA fixes Jason Gunthorpe
  5 siblings, 1 reply; 10+ messages in thread
From: Leon Romanovsky @ 2021-05-11  5:48 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, Dan Carpenter, linux-rdma, Maor Gottlieb,
	Shay Drory, Zhu Yanjun

From: Leon Romanovsky <leonro@nvidia.com>

Move the rdma_restrack_add() calls next to the device attachment
logic.

Fixes: cb5cd0ea4eb3 ("RDMA/core: Add CM to restrack after successful attachment to a device")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/core/cma.c | 13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index ab148a696c0c..bdc645f9c692 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -462,6 +462,7 @@ static void cma_attach_to_dev(struct rdma_id_private *id_priv,
 			      struct cma_device *cma_dev)
 {
 	_cma_attach_to_dev(id_priv, cma_dev);
+	rdma_restrack_add(&id_priv->res);
 	id_priv->gid_type =
 		cma_dev->default_gid_type[id_priv->id.port_num -
 					  rdma_start_port(cma_dev->device)];
@@ -691,7 +692,6 @@ static int cma_ib_acquire_dev(struct rdma_id_private *id_priv,
 	mutex_lock(&lock);
 	cma_attach_to_dev(id_priv, listen_id_priv->cma_dev);
 	mutex_unlock(&lock);
-	rdma_restrack_add(&id_priv->res);
 	return 0;
 }
 
@@ -746,10 +746,8 @@ static int cma_iw_acquire_dev(struct rdma_id_private *id_priv,
 	}
 
 out:
-	if (!ret) {
+	if (!ret)
 		cma_attach_to_dev(id_priv, cma_dev);
-		rdma_restrack_add(&id_priv->res);
-	}
 
 	mutex_unlock(&lock);
 	return ret;
@@ -810,7 +808,6 @@ static int cma_resolve_ib_dev(struct rdma_id_private *id_priv)
 
 found:
 	cma_attach_to_dev(id_priv, cma_dev);
-	rdma_restrack_add(&id_priv->res);
 	mutex_unlock(&lock);
 	addr = (struct sockaddr_ib *)cma_src_addr(id_priv);
 	memcpy(&addr->sib_addr, &sgid, sizeof(sgid));
@@ -3208,7 +3205,6 @@ static int cma_bind_loopback(struct rdma_id_private *id_priv)
 	ib_addr_set_pkey(&id_priv->id.route.addr.dev_addr, pkey);
 	id_priv->id.port_num = p;
 	cma_attach_to_dev(id_priv, cma_dev);
-	rdma_restrack_add(&id_priv->res);
 	cma_set_loopback(cma_src_addr(id_priv));
 out:
 	mutex_unlock(&lock);
@@ -3241,7 +3237,6 @@ static void addr_handler(int status, struct sockaddr *src_addr,
 		if (status)
 			pr_debug_ratelimited("RDMA CM: ADDR_ERROR: failed to acquire device. status %d\n",
 					     status);
-		rdma_restrack_add(&id_priv->res);
 	} else if (status) {
 		pr_debug_ratelimited("RDMA CM: ADDR_ERROR: failed to resolve IP. status %d\n", status);
 	}
@@ -3853,12 +3848,12 @@ int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr)
 	if (ret)
 		goto err2;
 
-	if (!cma_any_addr(addr))
-		rdma_restrack_add(&id_priv->res);
 	return 0;
 err2:
 	if (id_priv->cma_dev)
 		cma_release_dev(id_priv);
+	if (!cma_any_addr(addr))
+		rdma_restrack_del(&id_priv->res);
 err1:
 	cma_comp_exch(id_priv, RDMA_CM_ADDR_BOUND, RDMA_CM_IDLE);
 	return ret;
-- 
2.31.1



* [PATCH rdma-rc 5/5] RDMA/rxe: Return CQE error if invalid lkey was supplied
  2021-05-11  5:48 [PATCH rdma-rc 0/5] RDMA fixes Leon Romanovsky
                   ` (3 preceding siblings ...)
  2021-05-11  5:48 ` [PATCH rdma-rc 4/5] RDMA/core: Simplify addition of restrack object Leon Romanovsky
@ 2021-05-11  5:48 ` Leon Romanovsky
  2021-05-11  6:12   ` Zhu Yanjun
  2021-05-18 17:40 ` [PATCH rdma-rc 0/5] RDMA fixes Jason Gunthorpe
  5 siblings, 1 reply; 10+ messages in thread
From: Leon Romanovsky @ 2021-05-11  5:48 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, Dan Carpenter, linux-rdma, Maor Gottlieb,
	Shay Drory, Zhu Yanjun

From: Leon Romanovsky <leonro@nvidia.com>

RXE does not update the WQE status on LOCAL_WRITE failures.
This caused the following kernel panic when someone sent an atomic
operation with an explicitly wrong lkey.

[leonro@vm ~]$ mkt test
test_atomic_invalid_lkey (tests.test_atomic.AtomicTest) ... [   43.860977] ------------[ cut here ]------------
 WARNING: CPU: 5 PID: 263 at drivers/infiniband/sw/rxe/rxe_comp.c:740 rxe_completer+0x1a6d/0x2e30 [rdma_rxe]
 Modules linked in: crc32_generic rdma_rxe ip6_udp_tunnel udp_tunnel rdma_ucm rdma_cm ib_umad ib_ipoib iw_cm ib_cm mlx5_ib ib_uverbs ib_core mlx5_core ptp pps_core
 CPU: 5 PID: 263 Comm: python3 Not tainted 5.13.0-rc1+ #2936
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
 RIP: 0010:rxe_completer+0x1a6d/0x2e30 [rdma_rxe]
 Code: 03 0f 8e 65 0e 00 00 3b 93 10 06 00 00 0f 84 82 0a 00 00 4c 89 ff 4c 89 44 24 38 e8 2d 74 a9 e1 4c 8b 44 24 38 e9 1c f5 ff ff <0f> 0b e9 0c e8 ff ff b8 05 00 00 00 41 bf 05 00 00 00 e9 ab e7 ff
 RSP: 0018:ffff8880158af090 EFLAGS: 00010246
 RAX: 0000000000000000 RBX: ffff888016a78000 RCX: ffffffffa0cf1652
 RDX: 1ffff9200004b442 RSI: 0000000000000004 RDI: ffffc9000025a210
 RBP: dffffc0000000000 R08: 00000000ffffffea R09: ffff88801617740b
 R10: ffffed1002c2ee81 R11: 0000000000000007 R12: ffff88800f3b63e8
 R13: ffff888016a78008 R14: ffffc9000025a180 R15: 000000000000000c
 FS:  00007f88b622a740(0000) GS:ffff88806d540000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007f88b5a1fa10 CR3: 000000000d848004 CR4: 0000000000370ea0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 Call Trace:
  ? lock_release+0x1f9/0x6c0
  ? rxe_comp_queue_pkt+0xb0/0xb0 [rdma_rxe]
  ? lock_downgrade+0x6d0/0x6d0
  ? lock_downgrade+0x6d0/0x6d0
  ? lockdep_hardirqs_on_prepare+0x273/0x3e0
  ? _raw_spin_unlock_irqrestore+0x2d/0x40
  ? rxe_comp_queue_pkt+0xb0/0xb0 [rdma_rxe]
  rxe_do_task+0x130/0x230 [rdma_rxe]
  ? _raw_spin_unlock_irqrestore+0x2d/0x40
  rxe_rcv+0xb11/0x1df0 [rdma_rxe]
  ? rxe_crc32.isra.0+0x120/0x120 [rdma_rxe]
  ? prepare_ack_packet+0x50b/0xa20 [rdma_rxe]
  rxe_loopback+0x157/0x1e0 [rdma_rxe]
  ? rxe_send+0x520/0x520 [rdma_rxe]
  ? lockdep_hardirqs_on_prepare+0x273/0x3e0
  rxe_responder+0x5532/0x7620 [rdma_rxe]
  ? rxe_resp_queue_pkt+0xa0/0xa0 [rdma_rxe]
  ? lock_downgrade+0x6d0/0x6d0
  ? rxe_crc32.isra.0+0x8e/0x120 [rdma_rxe]
  ? lock_is_held_type+0x98/0x110
  ? find_held_lock+0x2d/0x110
  ? lock_release+0x1f9/0x6c0
  ? rxe_do_task+0xe5/0x230 [rdma_rxe]
  ? lock_downgrade+0x6d0/0x6d0
  ? rxe_resp_queue_pkt+0x19/0xa0 [rdma_rxe]
  ? lockdep_hardirqs_on_prepare+0x273/0x3e0
  ? _raw_spin_unlock_irqrestore+0x2d/0x40
  ? rxe_resp_queue_pkt+0xa0/0xa0 [rdma_rxe]
  rxe_do_task+0x130/0x230 [rdma_rxe]
  rxe_rcv+0x9c8/0x1df0 [rdma_rxe]
  ? __kmalloc_track_caller+0x174/0x390
  ? rxe_crc32.isra.0+0x120/0x120 [rdma_rxe]
  rxe_loopback+0x157/0x1e0 [rdma_rxe]
  ? rxe_send+0x520/0x520 [rdma_rxe]
  rxe_requester+0x1efd/0x58c0 [rdma_rxe]
  ? lock_is_held_type+0x98/0x110
  ? find_held_lock+0x2d/0x110
  ? rnr_nak_timer+0x70/0x70 [rdma_rxe]
  ? lock_release+0x1f9/0x6c0
  ? rxe_do_task+0xe5/0x230 [rdma_rxe]
  ? lock_downgrade+0x6d0/0x6d0
  ? lockdep_hardirqs_on_prepare+0x273/0x3e0
  ? _raw_spin_unlock_irqrestore+0x2d/0x40
  ? rnr_nak_timer+0x70/0x70 [rdma_rxe]
  rxe_do_task+0x130/0x230 [rdma_rxe]
  ? rxe_poll_cq+0x450/0x450 [rdma_rxe]
  rxe_post_send+0x998/0x1860 [rdma_rxe]
  ? lock_is_held_type+0x98/0x110
  ? lock_is_held_type+0x98/0x110
  ? rdma_lookup_get_uobject+0x22c/0x4a0 [ib_uverbs]
  ? rxe_poll_cq+0x450/0x450 [rdma_rxe]
  ib_uverbs_post_send+0xd5f/0x1220 [ib_uverbs]
  ? lock_acquire+0x1a9/0x6d0
  ? lock_is_held_type+0x98/0x110
  ? ib_uverbs_ex_create_wq+0xb00/0xb00 [ib_uverbs]
  ? lock_release+0x1f9/0x6c0
  ? __might_fault+0xba/0x160
  ? lock_downgrade+0x6d0/0x6d0
  ib_uverbs_write+0x847/0xc80 [ib_uverbs]
  ? ib_uverbs_open+0x810/0x810 [ib_uverbs]
  ? vfs_fileattr_set+0x990/0x990
  ? __up_read+0x1a1/0x7b0
  vfs_write+0x1c5/0x840
  ksys_write+0x176/0x1d0
  ? __x64_sys_read+0xb0/0xb0
  ? lockdep_hardirqs_on_prepare+0x273/0x3e0
  ? syscall_enter_from_user_mode+0x1d/0x50
  do_syscall_64+0x3f/0x80
  entry_SYSCALL_64_after_hwframe+0x44/0xae
 RIP: 0033:0x7f88b64917a7
 Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
 RSP: 002b:00007ffee189e6c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
 RAX: ffffffffffffffda RBX: 000055c17c8b2660 RCX: 00007f88b64917a7
 RDX: 0000000000000020 RSI: 00007ffee189e6e0 RDI: 0000000000000003
 RBP: 0000000000000000 R08: 00007f88b5cbbc80 R09: 00007f88b689fdc0
 R10: 00007f88b66a0510 R11: 0000000000000246 R12: 00007f88b6072180
 R13: 0000000000000000 R14: 00007f88b5e6e9d0 R15: 0000000000000008
 irq event stamp: 809719
 hardirqs last  enabled at (809727): [<ffffffff813990c4>] console_unlock+0x434/0x850
 hardirqs last disabled at (809734): [<ffffffff81399279>] console_unlock+0x5e9/0x850
 softirqs last  enabled at (809016): [<ffffffff8129089e>] irq_exit_rcu+0x11e/0x1a0
 softirqs last disabled at (808963): [<ffffffff8129089e>] irq_exit_rcu+0x11e/0x1a0
 ---[ end trace 1e302e4b7857843b ]---

Fixes: 8700e3e7c485 ("Soft RoCE driver")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/sw/rxe/rxe_comp.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c
index 2af26737d32d..a6712e373eed 100644
--- a/drivers/infiniband/sw/rxe/rxe_comp.c
+++ b/drivers/infiniband/sw/rxe/rxe_comp.c
@@ -346,13 +346,15 @@ static inline enum comp_state do_read(struct rxe_qp *qp,
 	ret = copy_data(qp->pd, IB_ACCESS_LOCAL_WRITE,
 			&wqe->dma, payload_addr(pkt),
 			payload_size(pkt), to_mr_obj, NULL);
-	if (ret)
+	if (ret) {
+		wqe->status = IB_WC_LOC_PROT_ERR;
 		return COMPST_ERROR;
+	}
 
 	if (wqe->dma.resid == 0 && (pkt->mask & RXE_END_MASK))
 		return COMPST_COMP_ACK;
-	else
-		return COMPST_UPDATE_COMP;
+
+	return COMPST_UPDATE_COMP;
 }
 
 static inline enum comp_state do_atomic(struct rxe_qp *qp,
@@ -366,10 +368,12 @@ static inline enum comp_state do_atomic(struct rxe_qp *qp,
 	ret = copy_data(qp->pd, IB_ACCESS_LOCAL_WRITE,
 			&wqe->dma, &atomic_orig,
 			sizeof(u64), to_mr_obj, NULL);
-	if (ret)
+	if (ret) {
+		wqe->status = IB_WC_LOC_PROT_ERR;
 		return COMPST_ERROR;
-	else
-		return COMPST_COMP_ACK;
+	}
+
+	return COMPST_COMP_ACK;
 }
 
 static void make_send_cqe(struct rxe_qp *qp, struct rxe_send_wqe *wqe,
-- 
2.31.1
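
The pattern applied in patch 5/5 can be sketched in isolation. This is a minimal userspace model, not the rxe code: the enums, `struct wqe`, `do_atomic_sketch()`, and `error_state_is_consistent()` are hypothetical names, and the assumption (suggested by the warning in rxe_completer, but not verified here) is that the error state expects the work request to carry a non-success status.

```c
#include <assert.h>
#include <stdbool.h>

/* Reduced stand-ins for the completer states and completion statuses. */
enum comp_state { ST_COMP_ACK, ST_ERROR };
enum wc_status { WC_SUCCESS, WC_LOC_PROT_ERR };

/* Toy work queue element carrying only the status field. */
struct wqe {
	enum wc_status status;
};

/*
 * Models the fixed handler: `copy_ret` plays the role of the copy_data()
 * return value. On failure the status is recorded before the state
 * machine is sent to the error state.
 */
enum comp_state do_atomic_sketch(struct wqe *wqe, int copy_ret)
{
	if (copy_ret) {
		wqe->status = WC_LOC_PROT_ERR;
		return ST_ERROR;
	}

	return ST_COMP_ACK;
}

/* Assumed invariant: the error state never sees a "success" status. */
bool error_state_is_consistent(enum comp_state st, const struct wqe *wqe)
{
	return st != ST_ERROR || wqe->status != WC_SUCCESS;
}
```

Without the status assignment, the failure path would enter the error state while the status still reads as success, which is the inconsistency this model's invariant checks for.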



* Re: [PATCH rdma-rc 5/5] RDMA/rxe: Return CQE error if invalid lkey was supplied
  2021-05-11  5:48 ` [PATCH rdma-rc 5/5] RDMA/rxe: Return CQE error if invalid lkey was supplied Leon Romanovsky
@ 2021-05-11  6:12   ` Zhu Yanjun
  0 siblings, 0 replies; 10+ messages in thread
From: Zhu Yanjun @ 2021-05-11  6:12 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, Jason Gunthorpe, Leon Romanovsky, Dan Carpenter,
	RDMA mailing list, Maor Gottlieb, Shay Drory

On Tue, May 11, 2021 at 1:48 PM Leon Romanovsky <leon@kernel.org> wrote:
>
> From: Leon Romanovsky <leonro@nvidia.com>
>
> RXE does not update the WQE status on LOCAL_WRITE failures.
> This caused the following kernel panic when someone sent an atomic
> operation with an explicitly wrong lkey.
>
> [leonro@vm ~]$ mkt test
> test_atomic_invalid_lkey (tests.test_atomic.AtomicTest) ... [   43.860977] ------------[ cut here ]------------
>  WARNING: CPU: 5 PID: 263 at drivers/infiniband/sw/rxe/rxe_comp.c:740 rxe_completer+0x1a6d/0x2e30 [rdma_rxe]
>  Modules linked in: crc32_generic rdma_rxe ip6_udp_tunnel udp_tunnel rdma_ucm rdma_cm ib_umad ib_ipoib iw_cm ib_cm mlx5_ib ib_uverbs ib_core mlx5_core ptp pps_core
>  CPU: 5 PID: 263 Comm: python3 Not tainted 5.13.0-rc1+ #2936
>  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
>  RIP: 0010:rxe_completer+0x1a6d/0x2e30 [rdma_rxe]
>  Code: 03 0f 8e 65 0e 00 00 3b 93 10 06 00 00 0f 84 82 0a 00 00 4c 89 ff 4c 89 44 24 38 e8 2d 74 a9 e1 4c 8b 44 24 38 e9 1c f5 ff ff <0f> 0b e9 0c e8 ff ff b8 05 00 00 00 41 bf 05 00 00 00 e9 ab e7 ff
>  RSP: 0018:ffff8880158af090 EFLAGS: 00010246
>  RAX: 0000000000000000 RBX: ffff888016a78000 RCX: ffffffffa0cf1652
>  RDX: 1ffff9200004b442 RSI: 0000000000000004 RDI: ffffc9000025a210
>  RBP: dffffc0000000000 R08: 00000000ffffffea R09: ffff88801617740b
>  R10: ffffed1002c2ee81 R11: 0000000000000007 R12: ffff88800f3b63e8
>  R13: ffff888016a78008 R14: ffffc9000025a180 R15: 000000000000000c
>  FS:  00007f88b622a740(0000) GS:ffff88806d540000(0000) knlGS:0000000000000000
>  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>  CR2: 00007f88b5a1fa10 CR3: 000000000d848004 CR4: 0000000000370ea0
>  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>  Call Trace:
>   ? lock_release+0x1f9/0x6c0
>   ? rxe_comp_queue_pkt+0xb0/0xb0 [rdma_rxe]
>   ? lock_downgrade+0x6d0/0x6d0
>   ? lock_downgrade+0x6d0/0x6d0
>   ? lockdep_hardirqs_on_prepare+0x273/0x3e0
>   ? _raw_spin_unlock_irqrestore+0x2d/0x40
>   ? rxe_comp_queue_pkt+0xb0/0xb0 [rdma_rxe]
>   rxe_do_task+0x130/0x230 [rdma_rxe]
>   ? _raw_spin_unlock_irqrestore+0x2d/0x40
>   rxe_rcv+0xb11/0x1df0 [rdma_rxe]
>   ? rxe_crc32.isra.0+0x120/0x120 [rdma_rxe]
>   ? prepare_ack_packet+0x50b/0xa20 [rdma_rxe]
>   rxe_loopback+0x157/0x1e0 [rdma_rxe]
>   ? rxe_send+0x520/0x520 [rdma_rxe]
>   ? lockdep_hardirqs_on_prepare+0x273/0x3e0
>   rxe_responder+0x5532/0x7620 [rdma_rxe]
>   ? rxe_resp_queue_pkt+0xa0/0xa0 [rdma_rxe]
>   ? lock_downgrade+0x6d0/0x6d0
>   ? rxe_crc32.isra.0+0x8e/0x120 [rdma_rxe]
>   ? lock_is_held_type+0x98/0x110
>   ? find_held_lock+0x2d/0x110
>   ? lock_release+0x1f9/0x6c0
>   ? rxe_do_task+0xe5/0x230 [rdma_rxe]
>   ? lock_downgrade+0x6d0/0x6d0
>   ? rxe_resp_queue_pkt+0x19/0xa0 [rdma_rxe]
>   ? lockdep_hardirqs_on_prepare+0x273/0x3e0
>   ? _raw_spin_unlock_irqrestore+0x2d/0x40
>   ? rxe_resp_queue_pkt+0xa0/0xa0 [rdma_rxe]
>   rxe_do_task+0x130/0x230 [rdma_rxe]
>   rxe_rcv+0x9c8/0x1df0 [rdma_rxe]
>   ? __kmalloc_track_caller+0x174/0x390
>   ? rxe_crc32.isra.0+0x120/0x120 [rdma_rxe]
>   rxe_loopback+0x157/0x1e0 [rdma_rxe]
>   ? rxe_send+0x520/0x520 [rdma_rxe]
>   rxe_requester+0x1efd/0x58c0 [rdma_rxe]
>   ? lock_is_held_type+0x98/0x110
>   ? find_held_lock+0x2d/0x110
>   ? rnr_nak_timer+0x70/0x70 [rdma_rxe]
>   ? lock_release+0x1f9/0x6c0
>   ? rxe_do_task+0xe5/0x230 [rdma_rxe]
>   ? lock_downgrade+0x6d0/0x6d0
>   ? lockdep_hardirqs_on_prepare+0x273/0x3e0
>   ? _raw_spin_unlock_irqrestore+0x2d/0x40
>   ? rnr_nak_timer+0x70/0x70 [rdma_rxe]
>   rxe_do_task+0x130/0x230 [rdma_rxe]
>   ? rxe_poll_cq+0x450/0x450 [rdma_rxe]
>   rxe_post_send+0x998/0x1860 [rdma_rxe]
>   ? lock_is_held_type+0x98/0x110
>   ? lock_is_held_type+0x98/0x110
>   ? rdma_lookup_get_uobject+0x22c/0x4a0 [ib_uverbs]
>   ? rxe_poll_cq+0x450/0x450 [rdma_rxe]
>   ib_uverbs_post_send+0xd5f/0x1220 [ib_uverbs]
>   ? lock_acquire+0x1a9/0x6d0
>   ? lock_is_held_type+0x98/0x110
>   ? ib_uverbs_ex_create_wq+0xb00/0xb00 [ib_uverbs]
>   ? lock_release+0x1f9/0x6c0
>   ? __might_fault+0xba/0x160
>   ? lock_downgrade+0x6d0/0x6d0
>   ib_uverbs_write+0x847/0xc80 [ib_uverbs]
>   ? ib_uverbs_open+0x810/0x810 [ib_uverbs]
>   ? vfs_fileattr_set+0x990/0x990
>   ? __up_read+0x1a1/0x7b0
>   vfs_write+0x1c5/0x840
>   ksys_write+0x176/0x1d0
>   ? __x64_sys_read+0xb0/0xb0
>   ? lockdep_hardirqs_on_prepare+0x273/0x3e0
>   ? syscall_enter_from_user_mode+0x1d/0x50
>   do_syscall_64+0x3f/0x80
>   entry_SYSCALL_64_after_hwframe+0x44/0xae
>  RIP: 0033:0x7f88b64917a7
>  Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
>  RSP: 002b:00007ffee189e6c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
>  RAX: ffffffffffffffda RBX: 000055c17c8b2660 RCX: 00007f88b64917a7
>  RDX: 0000000000000020 RSI: 00007ffee189e6e0 RDI: 0000000000000003
>  RBP: 0000000000000000 R08: 00007f88b5cbbc80 R09: 00007f88b689fdc0
>  R10: 00007f88b66a0510 R11: 0000000000000246 R12: 00007f88b6072180
>  R13: 0000000000000000 R14: 00007f88b5e6e9d0 R15: 0000000000000008
>  irq event stamp: 809719
>  hardirqs last  enabled at (809727): [<ffffffff813990c4>] console_unlock+0x434/0x850
>  hardirqs last disabled at (809734): [<ffffffff81399279>] console_unlock+0x5e9/0x850
>  softirqs last  enabled at (809016): [<ffffffff8129089e>] irq_exit_rcu+0x11e/0x1a0
>  softirqs last disabled at (808963): [<ffffffff8129089e>] irq_exit_rcu+0x11e/0x1a0
>  ---[ end trace 1e302e4b7857843b ]---
>
> Fixes: 8700e3e7c485 ("Soft RoCE driver")
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>

Thanks,
Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com>

Zhu Yanjun

> ---
>  drivers/infiniband/sw/rxe/rxe_comp.c | 16 ++++++++++------
>  1 file changed, 10 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c
> index 2af26737d32d..a6712e373eed 100644
> --- a/drivers/infiniband/sw/rxe/rxe_comp.c
> +++ b/drivers/infiniband/sw/rxe/rxe_comp.c
> @@ -346,13 +346,15 @@ static inline enum comp_state do_read(struct rxe_qp *qp,
>         ret = copy_data(qp->pd, IB_ACCESS_LOCAL_WRITE,
>                         &wqe->dma, payload_addr(pkt),
>                         payload_size(pkt), to_mr_obj, NULL);
> -       if (ret)
> +       if (ret) {
> +               wqe->status = IB_WC_LOC_PROT_ERR;
>                 return COMPST_ERROR;
> +       }
>
>         if (wqe->dma.resid == 0 && (pkt->mask & RXE_END_MASK))
>                 return COMPST_COMP_ACK;
> -       else
> -               return COMPST_UPDATE_COMP;
> +
> +       return COMPST_UPDATE_COMP;
>  }
>
>  static inline enum comp_state do_atomic(struct rxe_qp *qp,
> @@ -366,10 +368,12 @@ static inline enum comp_state do_atomic(struct rxe_qp *qp,
>         ret = copy_data(qp->pd, IB_ACCESS_LOCAL_WRITE,
>                         &wqe->dma, &atomic_orig,
>                         sizeof(u64), to_mr_obj, NULL);
> -       if (ret)
> +       if (ret) {
> +               wqe->status = IB_WC_LOC_PROT_ERR;
>                 return COMPST_ERROR;
> -       else
> -               return COMPST_COMP_ACK;
> +       }
> +
> +       return COMPST_COMP_ACK;
>  }
>
>  static void make_send_cqe(struct rxe_qp *qp, struct rxe_send_wqe *wqe,
> --
> 2.31.1
>


* Re: [PATCH rdma-rc 0/5] RDMA fixes
  2021-05-11  5:48 [PATCH rdma-rc 0/5] RDMA fixes Leon Romanovsky
                   ` (4 preceding siblings ...)
  2021-05-11  5:48 ` [PATCH rdma-rc 5/5] RDMA/rxe: Return CQE error if invalid lkey was supplied Leon Romanovsky
@ 2021-05-18 17:40 ` Jason Gunthorpe
  5 siblings, 0 replies; 10+ messages in thread
From: Jason Gunthorpe @ 2021-05-18 17:40 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, Leon Romanovsky, Dan Carpenter, linux-rdma,
	Maor Gottlieb, Shay Drory, Zhu Yanjun

On Tue, May 11, 2021 at 08:48:26AM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
> 
> Batch of completely unrelated fixes.
> 
> Thanks
> 
> Leon Romanovsky (2):
>   RDMA/rxe: Return CQE error if invalid lkey was supplied
> 
> Maor Gottlieb (2):
>   RDMA/mlx5: Verify that DM operation is reasonable
>   RDMA/mlx5: Recover from fatal event in dual port mode
> 
> Shay Drory (1):
>   RDMA/core: Don't access cm_id after its destruction

These are applied to for-rc

>   RDMA/core: Simplify addition of restrack object

This one will have to go to -next

Thanks,
Jason


* Re: [PATCH rdma-rc 4/5] RDMA/core: Simplify addition of restrack object
  2021-05-11  5:48 ` [PATCH rdma-rc 4/5] RDMA/core: Simplify addition of restrack object Leon Romanovsky
@ 2021-05-25 13:42   ` Jason Gunthorpe
  2021-06-02  6:36     ` Leon Romanovsky
  0 siblings, 1 reply; 10+ messages in thread
From: Jason Gunthorpe @ 2021-05-25 13:42 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, Leon Romanovsky, Dan Carpenter, linux-rdma,
	Maor Gottlieb, Shay Drory, Zhu Yanjun

On Tue, May 11, 2021 at 08:48:30AM +0300, Leon Romanovsky wrote:

> @@ -3853,12 +3848,12 @@ int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr)
>  	if (ret)
>  		goto err2;
>  
> -	if (!cma_any_addr(addr))
> -		rdma_restrack_add(&id_priv->res);
>  	return 0;
>  err2:
>  	if (id_priv->cma_dev)
>  		cma_release_dev(id_priv);
> +	if (!cma_any_addr(addr))
> +		rdma_restrack_del(&id_priv->res);

But this whole thing is reverting an earlier patch - the whole point
was to avoid the restrack_del().

Plus this is out of order: the del has to come before the release, due to
the other recent patch.

Jason


* Re: [PATCH rdma-rc 4/5] RDMA/core: Simplify addition of restrack object
  2021-05-25 13:42   ` Jason Gunthorpe
@ 2021-06-02  6:36     ` Leon Romanovsky
  0 siblings, 0 replies; 10+ messages in thread
From: Leon Romanovsky @ 2021-06-02  6:36 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Doug Ledford, Dan Carpenter, linux-rdma, Maor Gottlieb,
	Shay Drory, Zhu Yanjun

On Tue, May 25, 2021 at 10:42:53AM -0300, Jason Gunthorpe wrote:
> On Tue, May 11, 2021 at 08:48:30AM +0300, Leon Romanovsky wrote:
> 
> > @@ -3853,12 +3848,12 @@ int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr)
> >  	if (ret)
> >  		goto err2;
> >  
> > -	if (!cma_any_addr(addr))
> > -		rdma_restrack_add(&id_priv->res);
> >  	return 0;
> >  err2:
> >  	if (id_priv->cma_dev)
> >  		cma_release_dev(id_priv);
> > +	if (!cma_any_addr(addr))
> > +		rdma_restrack_del(&id_priv->res);
> 
> But this whole thing is reverting an earlier patch - the whole point
> was to avoid the restrack_del().

Not really, we call rdma_restrack_add() only in the "if (!cma_any_addr(addr))"
flow, so the patch does the same as the earlier one, but in a more compact way.

> 
> Plus this is out of order: the del has to come before the release, due to
> the other recent patch.

Shay pointed it out to me too; I will fix it.

Thanks

> 
> Jason

