* [PATCH rdma-rc 0/5] RDMA fixes
@ 2021-05-11 5:48 Leon Romanovsky
2021-05-11 5:48 ` [PATCH rdma-rc 1/5] RDMA/mlx5: Verify that DM operation is reasonable Leon Romanovsky
` (5 more replies)
0 siblings, 6 replies; 10+ messages in thread
From: Leon Romanovsky @ 2021-05-11 5:48 UTC (permalink / raw)
To: Doug Ledford, Jason Gunthorpe
Cc: Leon Romanovsky, Dan Carpenter, linux-rdma, Maor Gottlieb,
Shay Drory, Zhu Yanjun
From: Leon Romanovsky <leonro@nvidia.com>
Batch of completely unrelated fixes.
Thanks
Leon Romanovsky (2):
RDMA/core: Simplify addition of restrack object
RDMA/rxe: Return CQE error if invalid lkey was supplied
Maor Gottlieb (2):
RDMA/mlx5: Verify that DM operation is reasonable
RDMA/mlx5: Recover from fatal event in dual port mode
Shay Drory (1):
RDMA/core: Don't access cm_id after its destruction
drivers/infiniband/core/cma.c | 18 +++++++-----------
drivers/infiniband/hw/mlx5/dm.c | 3 +++
drivers/infiniband/hw/mlx5/main.c | 1 +
drivers/infiniband/sw/rxe/rxe_comp.c | 16 ++++++++++------
4 files changed, 21 insertions(+), 17 deletions(-)
--
2.31.1
* [PATCH rdma-rc 1/5] RDMA/mlx5: Verify that DM operation is reasonable
2021-05-11 5:48 [PATCH rdma-rc 0/5] RDMA fixes Leon Romanovsky
@ 2021-05-11 5:48 ` Leon Romanovsky
2021-05-11 5:48 ` [PATCH rdma-rc 2/5] RDMA/core: Don't access cm_id after its destruction Leon Romanovsky
` (4 subsequent siblings)
5 siblings, 0 replies; 10+ messages in thread
From: Leon Romanovsky @ 2021-05-11 5:48 UTC (permalink / raw)
To: Doug Ledford, Jason Gunthorpe
Cc: Maor Gottlieb, Dan Carpenter, linux-rdma, Shay Drory, Zhu Yanjun
From: Maor Gottlieb <maorg@nvidia.com>
Fix the smatch complaint below by verifying that the DM operation is not
greater than 31.
drivers/infiniband/hw/mlx5/dm.c:220 mlx5_ib_handler_MLX5_IB_METHOD_DM_MAP_OP_ADDR()
error: undefined (user controlled) shift '(((1))) << op'
Fixes: cea85fa5dbc2 ("RDMA/mlx5: Add support in MEMIC operations")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
drivers/infiniband/hw/mlx5/dm.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/infiniband/hw/mlx5/dm.c b/drivers/infiniband/hw/mlx5/dm.c
index 094bf85589db..001d766cf291 100644
--- a/drivers/infiniband/hw/mlx5/dm.c
+++ b/drivers/infiniband/hw/mlx5/dm.c
@@ -217,6 +217,9 @@ static int UVERBS_HANDLER(MLX5_IB_METHOD_DM_MAP_OP_ADDR)(
if (err)
return err;
+ if (op >= BITS_PER_TYPE(u32))
+ return -EOPNOTSUPP;
+
if (!(MLX5_CAP_DEV_MEM(dev->mdev, memic_operations) & BIT(op)))
return -EOPNOTSUPP;
--
2.31.1
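
The bound check added by this patch can be sketched in userspace C. This is a minimal illustration, not the driver code: `check_op_supported()` and its `caps` argument are hypothetical names, the macros are userspace stand-ins for the kernel's `BITS_PER_TYPE()` and `BIT()`, and the 8-bit type of `op` is an assumption. Shifting a 32-bit value by 32 or more is undefined behavior in C, so the range test has to come before the shift.

```c
#include <errno.h>
#include <limits.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's BITS_PER_TYPE() and BIT() macros. */
#define BITS_PER_TYPE(type) (sizeof(type) * CHAR_BIT)
#define BIT(nr) (UINT32_C(1) << (nr))

/*
 * Hypothetical helper: returns 0 when operation 'op' is set in the
 * 32-bit capability mask 'caps', and -EOPNOTSUPP otherwise.
 */
static int check_op_supported(uint32_t caps, uint8_t op)
{
	/* Reject out-of-range values first, so the shift below is well defined. */
	if (op >= BITS_PER_TYPE(uint32_t))
		return -EOPNOTSUPP;

	if (!(caps & BIT(op)))
		return -EOPNOTSUPP;

	return 0;
}
```

With a mask of 0x5, operation 0 is accepted and operation 1 is rejected. Operation 32 is rejected by the range test without ever reaching the shift.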
* [PATCH rdma-rc 2/5] RDMA/core: Don't access cm_id after its destruction
2021-05-11 5:48 [PATCH rdma-rc 0/5] RDMA fixes Leon Romanovsky
2021-05-11 5:48 ` [PATCH rdma-rc 1/5] RDMA/mlx5: Verify that DM operation is reasonable Leon Romanovsky
@ 2021-05-11 5:48 ` Leon Romanovsky
2021-05-11 5:48 ` [PATCH rdma-rc 3/5] RDMA/mlx5: Recover from fatal event in dual port mode Leon Romanovsky
` (3 subsequent siblings)
5 siblings, 0 replies; 10+ messages in thread
From: Leon Romanovsky @ 2021-05-11 5:48 UTC (permalink / raw)
To: Doug Ledford, Jason Gunthorpe
Cc: Shay Drory, Dan Carpenter, linux-rdma, Maor Gottlieb, Zhu Yanjun
From: Shay Drory <shayd@nvidia.com>
The idea that restrack needs an asymmetrical delete routine was proven
wrong. It caused an access to ib_device after it was already
disconnected from the cma_dev.
As a solution, move rdma_restrack_del() to be before id_priv destruction.
Found by syzkaller:
BUG: KASAN: wild-memory-access in __list_del include/linux/list.h:112 [inline]
BUG: KASAN: wild-memory-access in __list_del_entry include/linux/list.h:135 [inline]
BUG: KASAN: wild-memory-access in list_del include/linux/list.h:146 [inline]
BUG: KASAN: wild-memory-access in cma_cancel_listens drivers/infiniband/core/cma.c:1767 [inline]
BUG: KASAN: wild-memory-access in cma_cancel_operation drivers/infiniband/core/cma.c:1795 [inline]
BUG: KASAN: wild-memory-access in cma_cancel_operation+0x1f4/0x4b0 drivers/infiniband/core/cma.c:1783
Write of size 8 at addr dead000000000108 by task syz-executor716/334
CPU: 0 PID: 334 Comm: syz-executor716 Not tainted 5.11.0+ #271
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Call Trace:
__dump_stack lib/dump_stack.c:79 [inline]
dump_stack+0xbe/0xf9 lib/dump_stack.c:120
__kasan_report mm/kasan/report.c:400 [inline]
kasan_report.cold+0x5f/0xd5 mm/kasan/report.c:413
__list_del include/linux/list.h:112 [inline]
__list_del_entry include/linux/list.h:135 [inline]
list_del include/linux/list.h:146 [inline]
cma_cancel_listens drivers/infiniband/core/cma.c:1767 [inline]
cma_cancel_operation drivers/infiniband/core/cma.c:1795 [inline]
cma_cancel_operation+0x1f4/0x4b0 drivers/infiniband/core/cma.c:1783
_destroy_id+0x29/0x460 drivers/infiniband/core/cma.c:1862
ucma_close_id+0x36/0x50 drivers/infiniband/core/ucma.c:185
ucma_destroy_private_ctx+0x58d/0x5b0 drivers/infiniband/core/ucma.c:576
ucma_close+0x91/0xd0 drivers/infiniband/core/ucma.c:1797
__fput+0x169/0x540 fs/file_table.c:280
task_work_run+0xb7/0x100 kernel/task_work.c:140
exit_task_work include/linux/task_work.h:30 [inline]
do_exit+0x7da/0x17f0 kernel/exit.c:825
do_group_exit+0x9e/0x190 kernel/exit.c:922
__do_sys_exit_group kernel/exit.c:933 [inline]
__se_sys_exit_group kernel/exit.c:931 [inline]
__x64_sys_exit_group+0x2d/0x30 kernel/exit.c:931
do_syscall_64+0x2d/0x40 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x44a326
Code: Unable to access opcode bytes at RIP 0x44a2fc.
RSP: 002b:00007ffd14306748 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
RAX: ffffffffffffffda RBX: 00000000006c4490 RCX: 000000000044a326
RDX: 0000000000000001 RSI: 000000000000003c RDI: 0000000000000001
RBP: 0000000000000001 R08: 00000000000000e7 R09: ffffffffffffffc0
R10: bb1414ac000000c2 R11: 0000000000000246 R12: 00000000006c4490
R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000001
==================================================================
Fixes: 255d0c14b375 ("RDMA/cma: rdma_bind_addr() leaks a cma_dev reference count")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
drivers/infiniband/core/cma.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 2b9ffc21cbc4..ab148a696c0c 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -473,6 +473,7 @@ static void cma_release_dev(struct rdma_id_private *id_priv)
list_del(&id_priv->list);
cma_dev_put(id_priv->cma_dev);
id_priv->cma_dev = NULL;
+ id_priv->id.device = NULL;
if (id_priv->id.route.addr.dev_addr.sgid_attr) {
rdma_put_gid_attr(id_priv->id.route.addr.dev_addr.sgid_attr);
id_priv->id.route.addr.dev_addr.sgid_attr = NULL;
@@ -1860,6 +1861,7 @@ static void _destroy_id(struct rdma_id_private *id_priv,
iw_destroy_cm_id(id_priv->cm_id.iw);
}
cma_leave_mc_groups(id_priv);
+ rdma_restrack_del(&id_priv->res);
cma_release_dev(id_priv);
}
@@ -1873,7 +1875,6 @@ static void _destroy_id(struct rdma_id_private *id_priv,
kfree(id_priv->id.route.path_rec);
put_net(id_priv->id.route.addr.dev_addr.net);
- rdma_restrack_del(&id_priv->res);
kfree(id_priv);
}
@@ -3774,7 +3775,7 @@ int rdma_listen(struct rdma_cm_id *id, int backlog)
}
id_priv->backlog = backlog;
- if (id->device) {
+ if (id_priv->cma_dev) {
if (rdma_cap_ib_cm(id->device, 1)) {
ret = cma_ib_listen(id_priv);
if (ret)
--
2.31.1
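
The ordering constraint described in this patch can be illustrated with a toy model. This is a sketch under stated assumptions, not the kernel's restrack or CMA code: every `toy_*` name is hypothetical, and the only property modeled is that untracking an ID needs the device the ID is attached to.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; these are NOT the kernel's structures. */
struct toy_device {
	int tracked_ids;	/* number of IDs currently tracked on this device */
};

struct toy_id {
	struct toy_device *dev;	/* cleared when the ID is released from the device */
	bool tracked;
};

/* Untracking needs the device, so it fails once the device pointer is gone. */
static int toy_track_del(struct toy_id *id)
{
	if (!id->tracked)
		return 0;
	if (!id->dev)
		return -ENODEV;	/* toy substitute for the invalid device access */

	id->dev->tracked_ids--;
	id->tracked = false;
	return 0;
}

static void toy_release_dev(struct toy_id *id)
{
	id->dev = NULL;
}

/* Ordering used by the fix: untrack first, then detach from the device. */
static int toy_destroy_fixed(struct toy_id *id)
{
	int ret = toy_track_del(id);

	toy_release_dev(id);
	return ret;
}

/* Old ordering: detach first, untrack afterwards. */
static int toy_destroy_old(struct toy_id *id)
{
	toy_release_dev(id);
	return toy_track_del(id);
}
```

In this model `toy_destroy_fixed()` untracks cleanly, while `toy_destroy_old()` reports `-ENODEV` because the device pointer is already gone by the time the tracker entry is removed.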
* [PATCH rdma-rc 3/5] RDMA/mlx5: Recover from fatal event in dual port mode
2021-05-11 5:48 [PATCH rdma-rc 0/5] RDMA fixes Leon Romanovsky
2021-05-11 5:48 ` [PATCH rdma-rc 1/5] RDMA/mlx5: Verify that DM operation is reasonable Leon Romanovsky
2021-05-11 5:48 ` [PATCH rdma-rc 2/5] RDMA/core: Don't access cm_id after its destruction Leon Romanovsky
@ 2021-05-11 5:48 ` Leon Romanovsky
2021-05-11 5:48 ` [PATCH rdma-rc 4/5] RDMA/core: Simplify addition of restrack object Leon Romanovsky
` (2 subsequent siblings)
5 siblings, 0 replies; 10+ messages in thread
From: Leon Romanovsky @ 2021-05-11 5:48 UTC (permalink / raw)
To: Doug Ledford, Jason Gunthorpe
Cc: Maor Gottlieb, Dan Carpenter, linux-rdma, Shay Drory, Zhu Yanjun
From: Maor Gottlieb <maorg@nvidia.com>
When there is a fatal event on the slave port, the device is marked
as not active. We need to mark it as active again when the slave
has recovered.
Fixes: d69a24e03659 ("IB/mlx5: Move IB event processing onto a workqueue")
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
drivers/infiniband/hw/mlx5/main.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 6d1dd09a4388..644d5d0ac544 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -4419,6 +4419,7 @@ static int mlx5r_mp_probe(struct auxiliary_device *adev,
if (bound) {
rdma_roce_rescan_device(&dev->ib_dev);
+ mpi->ibdev->ib_active = true;
break;
}
}
--
2.31.1
* [PATCH rdma-rc 4/5] RDMA/core: Simplify addition of restrack object
2021-05-11 5:48 [PATCH rdma-rc 0/5] RDMA fixes Leon Romanovsky
` (2 preceding siblings ...)
2021-05-11 5:48 ` [PATCH rdma-rc 3/5] RDMA/mlx5: Recover from fatal event in dual port mode Leon Romanovsky
@ 2021-05-11 5:48 ` Leon Romanovsky
2021-05-25 13:42 ` Jason Gunthorpe
2021-05-11 5:48 ` [PATCH rdma-rc 5/5] RDMA/rxe: Return CQE error if invalid lkey was supplied Leon Romanovsky
2021-05-18 17:40 ` [PATCH rdma-rc 0/5] RDMA fixes Jason Gunthorpe
5 siblings, 1 reply; 10+ messages in thread
From: Leon Romanovsky @ 2021-05-11 5:48 UTC (permalink / raw)
To: Doug Ledford, Jason Gunthorpe
Cc: Leon Romanovsky, Dan Carpenter, linux-rdma, Maor Gottlieb,
Shay Drory, Zhu Yanjun
From: Leon Romanovsky <leonro@nvidia.com>
Move the rdma_restrack_add() call into cma_attach_to_dev(), next to the
device attachment logic, instead of repeating it in every caller.
Fixes: cb5cd0ea4eb3 ("RDMA/core: Add CM to restrack after successful attachment to a device")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
drivers/infiniband/core/cma.c | 13 ++++---------
1 file changed, 4 insertions(+), 9 deletions(-)
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index ab148a696c0c..bdc645f9c692 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -462,6 +462,7 @@ static void cma_attach_to_dev(struct rdma_id_private *id_priv,
struct cma_device *cma_dev)
{
_cma_attach_to_dev(id_priv, cma_dev);
+ rdma_restrack_add(&id_priv->res);
id_priv->gid_type =
cma_dev->default_gid_type[id_priv->id.port_num -
rdma_start_port(cma_dev->device)];
@@ -691,7 +692,6 @@ static int cma_ib_acquire_dev(struct rdma_id_private *id_priv,
mutex_lock(&lock);
cma_attach_to_dev(id_priv, listen_id_priv->cma_dev);
mutex_unlock(&lock);
- rdma_restrack_add(&id_priv->res);
return 0;
}
@@ -746,10 +746,8 @@ static int cma_iw_acquire_dev(struct rdma_id_private *id_priv,
}
out:
- if (!ret) {
+ if (!ret)
cma_attach_to_dev(id_priv, cma_dev);
- rdma_restrack_add(&id_priv->res);
- }
mutex_unlock(&lock);
return ret;
@@ -810,7 +808,6 @@ static int cma_resolve_ib_dev(struct rdma_id_private *id_priv)
found:
cma_attach_to_dev(id_priv, cma_dev);
- rdma_restrack_add(&id_priv->res);
mutex_unlock(&lock);
addr = (struct sockaddr_ib *)cma_src_addr(id_priv);
memcpy(&addr->sib_addr, &sgid, sizeof(sgid));
@@ -3208,7 +3205,6 @@ static int cma_bind_loopback(struct rdma_id_private *id_priv)
ib_addr_set_pkey(&id_priv->id.route.addr.dev_addr, pkey);
id_priv->id.port_num = p;
cma_attach_to_dev(id_priv, cma_dev);
- rdma_restrack_add(&id_priv->res);
cma_set_loopback(cma_src_addr(id_priv));
out:
mutex_unlock(&lock);
@@ -3241,7 +3237,6 @@ static void addr_handler(int status, struct sockaddr *src_addr,
if (status)
pr_debug_ratelimited("RDMA CM: ADDR_ERROR: failed to acquire device. status %d\n",
status);
- rdma_restrack_add(&id_priv->res);
} else if (status) {
pr_debug_ratelimited("RDMA CM: ADDR_ERROR: failed to resolve IP. status %d\n", status);
}
@@ -3853,12 +3848,12 @@ int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr)
if (ret)
goto err2;
- if (!cma_any_addr(addr))
- rdma_restrack_add(&id_priv->res);
return 0;
err2:
if (id_priv->cma_dev)
cma_release_dev(id_priv);
+ if (!cma_any_addr(addr))
+ rdma_restrack_del(&id_priv->res);
err1:
cma_comp_exch(id_priv, RDMA_CM_ADDR_BOUND, RDMA_CM_IDLE);
return ret;
--
2.31.1
* [PATCH rdma-rc 5/5] RDMA/rxe: Return CQE error if invalid lkey was supplied
2021-05-11 5:48 [PATCH rdma-rc 0/5] RDMA fixes Leon Romanovsky
` (3 preceding siblings ...)
2021-05-11 5:48 ` [PATCH rdma-rc 4/5] RDMA/core: Simplify addition of restrack object Leon Romanovsky
@ 2021-05-11 5:48 ` Leon Romanovsky
2021-05-11 6:12 ` Zhu Yanjun
2021-05-18 17:40 ` [PATCH rdma-rc 0/5] RDMA fixes Jason Gunthorpe
5 siblings, 1 reply; 10+ messages in thread
From: Leon Romanovsky @ 2021-05-11 5:48 UTC (permalink / raw)
To: Doug Ledford, Jason Gunthorpe
Cc: Leon Romanovsky, Dan Carpenter, linux-rdma, Maor Gottlieb,
Shay Drory, Zhu Yanjun
From: Leon Romanovsky <leonro@nvidia.com>
The RXE is missing an update of the WQE status on LOCAL_WRITE failures.
This caused the following kernel panic if someone sent an atomic
operation with an explicitly wrong lkey.
[leonro@vm ~]$ mkt test
test_atomic_invalid_lkey (tests.test_atomic.AtomicTest) ... [ 43.860977] ------------[ cut here ]------------
WARNING: CPU: 5 PID: 263 at drivers/infiniband/sw/rxe/rxe_comp.c:740 rxe_completer+0x1a6d/0x2e30 [rdma_rxe]
Modules linked in: crc32_generic rdma_rxe ip6_udp_tunnel udp_tunnel rdma_ucm rdma_cm ib_umad ib_ipoib iw_cm ib_cm mlx5_ib ib_uverbs ib_core mlx5_core ptp pps_core
CPU: 5 PID: 263 Comm: python3 Not tainted 5.13.0-rc1+ #2936
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:rxe_completer+0x1a6d/0x2e30 [rdma_rxe]
Code: 03 0f 8e 65 0e 00 00 3b 93 10 06 00 00 0f 84 82 0a 00 00 4c 89 ff 4c 89 44 24 38 e8 2d 74 a9 e1 4c 8b 44 24 38 e9 1c f5 ff ff <0f> 0b e9 0c e8 ff ff b8 05 00 00 00 41 bf 05 00 00 00 e9 ab e7 ff
RSP: 0018:ffff8880158af090 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff888016a78000 RCX: ffffffffa0cf1652
RDX: 1ffff9200004b442 RSI: 0000000000000004 RDI: ffffc9000025a210
RBP: dffffc0000000000 R08: 00000000ffffffea R09: ffff88801617740b
R10: ffffed1002c2ee81 R11: 0000000000000007 R12: ffff88800f3b63e8
R13: ffff888016a78008 R14: ffffc9000025a180 R15: 000000000000000c
FS: 00007f88b622a740(0000) GS:ffff88806d540000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f88b5a1fa10 CR3: 000000000d848004 CR4: 0000000000370ea0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
? lock_release+0x1f9/0x6c0
? rxe_comp_queue_pkt+0xb0/0xb0 [rdma_rxe]
? lock_downgrade+0x6d0/0x6d0
? lock_downgrade+0x6d0/0x6d0
? lockdep_hardirqs_on_prepare+0x273/0x3e0
? _raw_spin_unlock_irqrestore+0x2d/0x40
? rxe_comp_queue_pkt+0xb0/0xb0 [rdma_rxe]
rxe_do_task+0x130/0x230 [rdma_rxe]
? _raw_spin_unlock_irqrestore+0x2d/0x40
rxe_rcv+0xb11/0x1df0 [rdma_rxe]
? rxe_crc32.isra.0+0x120/0x120 [rdma_rxe]
? prepare_ack_packet+0x50b/0xa20 [rdma_rxe]
rxe_loopback+0x157/0x1e0 [rdma_rxe]
? rxe_send+0x520/0x520 [rdma_rxe]
? lockdep_hardirqs_on_prepare+0x273/0x3e0
rxe_responder+0x5532/0x7620 [rdma_rxe]
? rxe_resp_queue_pkt+0xa0/0xa0 [rdma_rxe]
? lock_downgrade+0x6d0/0x6d0
? rxe_crc32.isra.0+0x8e/0x120 [rdma_rxe]
? lock_is_held_type+0x98/0x110
? find_held_lock+0x2d/0x110
? lock_release+0x1f9/0x6c0
? rxe_do_task+0xe5/0x230 [rdma_rxe]
? lock_downgrade+0x6d0/0x6d0
? rxe_resp_queue_pkt+0x19/0xa0 [rdma_rxe]
? lockdep_hardirqs_on_prepare+0x273/0x3e0
? _raw_spin_unlock_irqrestore+0x2d/0x40
? rxe_resp_queue_pkt+0xa0/0xa0 [rdma_rxe]
rxe_do_task+0x130/0x230 [rdma_rxe]
rxe_rcv+0x9c8/0x1df0 [rdma_rxe]
? __kmalloc_track_caller+0x174/0x390
? rxe_crc32.isra.0+0x120/0x120 [rdma_rxe]
rxe_loopback+0x157/0x1e0 [rdma_rxe]
? rxe_send+0x520/0x520 [rdma_rxe]
rxe_requester+0x1efd/0x58c0 [rdma_rxe]
? lock_is_held_type+0x98/0x110
? find_held_lock+0x2d/0x110
? rnr_nak_timer+0x70/0x70 [rdma_rxe]
? lock_release+0x1f9/0x6c0
? rxe_do_task+0xe5/0x230 [rdma_rxe]
? lock_downgrade+0x6d0/0x6d0
? lockdep_hardirqs_on_prepare+0x273/0x3e0
? _raw_spin_unlock_irqrestore+0x2d/0x40
? rnr_nak_timer+0x70/0x70 [rdma_rxe]
rxe_do_task+0x130/0x230 [rdma_rxe]
? rxe_poll_cq+0x450/0x450 [rdma_rxe]
rxe_post_send+0x998/0x1860 [rdma_rxe]
? lock_is_held_type+0x98/0x110
? lock_is_held_type+0x98/0x110
? rdma_lookup_get_uobject+0x22c/0x4a0 [ib_uverbs]
? rxe_poll_cq+0x450/0x450 [rdma_rxe]
ib_uverbs_post_send+0xd5f/0x1220 [ib_uverbs]
? lock_acquire+0x1a9/0x6d0
? lock_is_held_type+0x98/0x110
? ib_uverbs_ex_create_wq+0xb00/0xb00 [ib_uverbs]
? lock_release+0x1f9/0x6c0
? __might_fault+0xba/0x160
? lock_downgrade+0x6d0/0x6d0
ib_uverbs_write+0x847/0xc80 [ib_uverbs]
? ib_uverbs_open+0x810/0x810 [ib_uverbs]
? vfs_fileattr_set+0x990/0x990
? __up_read+0x1a1/0x7b0
vfs_write+0x1c5/0x840
ksys_write+0x176/0x1d0
? __x64_sys_read+0xb0/0xb0
? lockdep_hardirqs_on_prepare+0x273/0x3e0
? syscall_enter_from_user_mode+0x1d/0x50
do_syscall_64+0x3f/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f88b64917a7
Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
RSP: 002b:00007ffee189e6c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 000055c17c8b2660 RCX: 00007f88b64917a7
RDX: 0000000000000020 RSI: 00007ffee189e6e0 RDI: 0000000000000003
RBP: 0000000000000000 R08: 00007f88b5cbbc80 R09: 00007f88b689fdc0
R10: 00007f88b66a0510 R11: 0000000000000246 R12: 00007f88b6072180
R13: 0000000000000000 R14: 00007f88b5e6e9d0 R15: 0000000000000008
irq event stamp: 809719
hardirqs last enabled at (809727): [<ffffffff813990c4>] console_unlock+0x434/0x850
hardirqs last disabled at (809734): [<ffffffff81399279>] console_unlock+0x5e9/0x850
softirqs last enabled at (809016): [<ffffffff8129089e>] irq_exit_rcu+0x11e/0x1a0
softirqs last disabled at (808963): [<ffffffff8129089e>] irq_exit_rcu+0x11e/0x1a0
---[ end trace 1e302e4b7857843b ]---
Fixes: 8700e3e7c485 ("Soft RoCE driver")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
drivers/infiniband/sw/rxe/rxe_comp.c | 16 ++++++++++------
1 file changed, 10 insertions(+), 6 deletions(-)
diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c
index 2af26737d32d..a6712e373eed 100644
--- a/drivers/infiniband/sw/rxe/rxe_comp.c
+++ b/drivers/infiniband/sw/rxe/rxe_comp.c
@@ -346,13 +346,15 @@ static inline enum comp_state do_read(struct rxe_qp *qp,
ret = copy_data(qp->pd, IB_ACCESS_LOCAL_WRITE,
&wqe->dma, payload_addr(pkt),
payload_size(pkt), to_mr_obj, NULL);
- if (ret)
+ if (ret) {
+ wqe->status = IB_WC_LOC_PROT_ERR;
return COMPST_ERROR;
+ }
if (wqe->dma.resid == 0 && (pkt->mask & RXE_END_MASK))
return COMPST_COMP_ACK;
- else
- return COMPST_UPDATE_COMP;
+
+ return COMPST_UPDATE_COMP;
}
static inline enum comp_state do_atomic(struct rxe_qp *qp,
@@ -366,10 +368,12 @@ static inline enum comp_state do_atomic(struct rxe_qp *qp,
ret = copy_data(qp->pd, IB_ACCESS_LOCAL_WRITE,
&wqe->dma, &atomic_orig,
sizeof(u64), to_mr_obj, NULL);
- if (ret)
+ if (ret) {
+ wqe->status = IB_WC_LOC_PROT_ERR;
return COMPST_ERROR;
- else
- return COMPST_COMP_ACK;
+ }
+
+ return COMPST_COMP_ACK;
}
static void make_send_cqe(struct rxe_qp *qp, struct rxe_send_wqe *wqe,
--
2.31.1
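
The error path fixed by this patch can be sketched as a small userspace model. It is an illustration under stated assumptions, not the rxe driver: the enums and the `sketch_*` helpers are hypothetical, and the consistency check stands in for the warning in rxe_completer() reported above, on the assumption that it fires when the completer reaches its error state with a WQE that still reports success.

```c
#include <stdbool.h>

/* Hypothetical, reduced versions of the completion status and state enums. */
enum sketch_wc_status { SKETCH_WC_SUCCESS = 0, SKETCH_WC_LOC_PROT_ERR };
enum sketch_comp_state { SKETCH_COMPST_COMP_ACK, SKETCH_COMPST_ERROR };

struct sketch_wqe {
	enum sketch_wc_status status;
};

/* Stand-in for the local copy: fails when the lkey does not validate. */
static int sketch_copy_data(bool lkey_valid)
{
	return lkey_valid ? 0 : -1;
}

/* Mirrors the shape of the fixed handler: record a status before erroring out. */
static enum sketch_comp_state sketch_do_atomic(struct sketch_wqe *wqe,
					       bool lkey_valid)
{
	if (sketch_copy_data(lkey_valid)) {
		wqe->status = SKETCH_WC_LOC_PROT_ERR;
		return SKETCH_COMPST_ERROR;
	}

	return SKETCH_COMPST_COMP_ACK;
}

/* The error state is only consistent when the WQE carries a failure status. */
static bool sketch_state_is_consistent(const struct sketch_wqe *wqe,
				       enum sketch_comp_state state)
{
	return state != SKETCH_COMPST_ERROR ||
	       wqe->status != SKETCH_WC_SUCCESS;
}
```

Without the status assignment, an invalid lkey would return the error state while the WQE still reported success, which is the inconsistent combination the model flags.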
* Re: [PATCH rdma-rc 5/5] RDMA/rxe: Return CQE error if invalid lkey was supplied
2021-05-11 5:48 ` [PATCH rdma-rc 5/5] RDMA/rxe: Return CQE error if invalid lkey was supplied Leon Romanovsky
@ 2021-05-11 6:12 ` Zhu Yanjun
0 siblings, 0 replies; 10+ messages in thread
From: Zhu Yanjun @ 2021-05-11 6:12 UTC (permalink / raw)
To: Leon Romanovsky
Cc: Doug Ledford, Jason Gunthorpe, Leon Romanovsky, Dan Carpenter,
RDMA mailing list, Maor Gottlieb, Shay Drory
On Tue, May 11, 2021 at 1:48 PM Leon Romanovsky <leon@kernel.org> wrote:
>
> From: Leon Romanovsky <leonro@nvidia.com>
>
> The RXE is missing an update of the WQE status on LOCAL_WRITE failures.
> This caused the following kernel panic if someone sent an atomic
> operation with an explicitly wrong lkey.
>
> [leonro@vm ~]$ mkt test
> test_atomic_invalid_lkey (tests.test_atomic.AtomicTest) ... [ 43.860977] ------------[ cut here ]------------
> WARNING: CPU: 5 PID: 263 at drivers/infiniband/sw/rxe/rxe_comp.c:740 rxe_completer+0x1a6d/0x2e30 [rdma_rxe]
> Modules linked in: crc32_generic rdma_rxe ip6_udp_tunnel udp_tunnel rdma_ucm rdma_cm ib_umad ib_ipoib iw_cm ib_cm mlx5_ib ib_uverbs ib_core mlx5_core ptp pps_core
> CPU: 5 PID: 263 Comm: python3 Not tainted 5.13.0-rc1+ #2936
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
> RIP: 0010:rxe_completer+0x1a6d/0x2e30 [rdma_rxe]
> Code: 03 0f 8e 65 0e 00 00 3b 93 10 06 00 00 0f 84 82 0a 00 00 4c 89 ff 4c 89 44 24 38 e8 2d 74 a9 e1 4c 8b 44 24 38 e9 1c f5 ff ff <0f> 0b e9 0c e8 ff ff b8 05 00 00 00 41 bf 05 00 00 00 e9 ab e7 ff
> RSP: 0018:ffff8880158af090 EFLAGS: 00010246
> RAX: 0000000000000000 RBX: ffff888016a78000 RCX: ffffffffa0cf1652
> RDX: 1ffff9200004b442 RSI: 0000000000000004 RDI: ffffc9000025a210
> RBP: dffffc0000000000 R08: 00000000ffffffea R09: ffff88801617740b
> R10: ffffed1002c2ee81 R11: 0000000000000007 R12: ffff88800f3b63e8
> R13: ffff888016a78008 R14: ffffc9000025a180 R15: 000000000000000c
> FS: 00007f88b622a740(0000) GS:ffff88806d540000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f88b5a1fa10 CR3: 000000000d848004 CR4: 0000000000370ea0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
> ? lock_release+0x1f9/0x6c0
> ? rxe_comp_queue_pkt+0xb0/0xb0 [rdma_rxe]
> ? lock_downgrade+0x6d0/0x6d0
> ? lock_downgrade+0x6d0/0x6d0
> ? lockdep_hardirqs_on_prepare+0x273/0x3e0
> ? _raw_spin_unlock_irqrestore+0x2d/0x40
> ? rxe_comp_queue_pkt+0xb0/0xb0 [rdma_rxe]
> rxe_do_task+0x130/0x230 [rdma_rxe]
> ? _raw_spin_unlock_irqrestore+0x2d/0x40
> rxe_rcv+0xb11/0x1df0 [rdma_rxe]
> ? rxe_crc32.isra.0+0x120/0x120 [rdma_rxe]
> ? prepare_ack_packet+0x50b/0xa20 [rdma_rxe]
> rxe_loopback+0x157/0x1e0 [rdma_rxe]
> ? rxe_send+0x520/0x520 [rdma_rxe]
> ? lockdep_hardirqs_on_prepare+0x273/0x3e0
> rxe_responder+0x5532/0x7620 [rdma_rxe]
> ? rxe_resp_queue_pkt+0xa0/0xa0 [rdma_rxe]
> ? lock_downgrade+0x6d0/0x6d0
> ? rxe_crc32.isra.0+0x8e/0x120 [rdma_rxe]
> ? lock_is_held_type+0x98/0x110
> ? find_held_lock+0x2d/0x110
> ? lock_release+0x1f9/0x6c0
> ? rxe_do_task+0xe5/0x230 [rdma_rxe]
> ? lock_downgrade+0x6d0/0x6d0
> ? rxe_resp_queue_pkt+0x19/0xa0 [rdma_rxe]
> ? lockdep_hardirqs_on_prepare+0x273/0x3e0
> ? _raw_spin_unlock_irqrestore+0x2d/0x40
> ? rxe_resp_queue_pkt+0xa0/0xa0 [rdma_rxe]
> rxe_do_task+0x130/0x230 [rdma_rxe]
> rxe_rcv+0x9c8/0x1df0 [rdma_rxe]
> ? __kmalloc_track_caller+0x174/0x390
> ? rxe_crc32.isra.0+0x120/0x120 [rdma_rxe]
> rxe_loopback+0x157/0x1e0 [rdma_rxe]
> ? rxe_send+0x520/0x520 [rdma_rxe]
> rxe_requester+0x1efd/0x58c0 [rdma_rxe]
> ? lock_is_held_type+0x98/0x110
> ? find_held_lock+0x2d/0x110
> ? rnr_nak_timer+0x70/0x70 [rdma_rxe]
> ? lock_release+0x1f9/0x6c0
> ? rxe_do_task+0xe5/0x230 [rdma_rxe]
> ? lock_downgrade+0x6d0/0x6d0
> ? lockdep_hardirqs_on_prepare+0x273/0x3e0
> ? _raw_spin_unlock_irqrestore+0x2d/0x40
> ? rnr_nak_timer+0x70/0x70 [rdma_rxe]
> rxe_do_task+0x130/0x230 [rdma_rxe]
> ? rxe_poll_cq+0x450/0x450 [rdma_rxe]
> rxe_post_send+0x998/0x1860 [rdma_rxe]
> ? lock_is_held_type+0x98/0x110
> ? lock_is_held_type+0x98/0x110
> ? rdma_lookup_get_uobject+0x22c/0x4a0 [ib_uverbs]
> ? rxe_poll_cq+0x450/0x450 [rdma_rxe]
> ib_uverbs_post_send+0xd5f/0x1220 [ib_uverbs]
> ? lock_acquire+0x1a9/0x6d0
> ? lock_is_held_type+0x98/0x110
> ? ib_uverbs_ex_create_wq+0xb00/0xb00 [ib_uverbs]
> ? lock_release+0x1f9/0x6c0
> ? __might_fault+0xba/0x160
> ? lock_downgrade+0x6d0/0x6d0
> ib_uverbs_write+0x847/0xc80 [ib_uverbs]
> ? ib_uverbs_open+0x810/0x810 [ib_uverbs]
> ? vfs_fileattr_set+0x990/0x990
> ? __up_read+0x1a1/0x7b0
> vfs_write+0x1c5/0x840
> ksys_write+0x176/0x1d0
> ? __x64_sys_read+0xb0/0xb0
> ? lockdep_hardirqs_on_prepare+0x273/0x3e0
> ? syscall_enter_from_user_mode+0x1d/0x50
> do_syscall_64+0x3f/0x80
> entry_SYSCALL_64_after_hwframe+0x44/0xae
> RIP: 0033:0x7f88b64917a7
> Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
> RSP: 002b:00007ffee189e6c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> RAX: ffffffffffffffda RBX: 000055c17c8b2660 RCX: 00007f88b64917a7
> RDX: 0000000000000020 RSI: 00007ffee189e6e0 RDI: 0000000000000003
> RBP: 0000000000000000 R08: 00007f88b5cbbc80 R09: 00007f88b689fdc0
> R10: 00007f88b66a0510 R11: 0000000000000246 R12: 00007f88b6072180
> R13: 0000000000000000 R14: 00007f88b5e6e9d0 R15: 0000000000000008
> irq event stamp: 809719
> hardirqs last enabled at (809727): [<ffffffff813990c4>] console_unlock+0x434/0x850
> hardirqs last disabled at (809734): [<ffffffff81399279>] console_unlock+0x5e9/0x850
> softirqs last enabled at (809016): [<ffffffff8129089e>] irq_exit_rcu+0x11e/0x1a0
> softirqs last disabled at (808963): [<ffffffff8129089e>] irq_exit_rcu+0x11e/0x1a0
> ---[ end trace 1e302e4b7857843b ]---
>
> Fixes: 8700e3e7c485 ("Soft RoCE driver")
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Thanks,
Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Zhu Yanjun
> ---
> drivers/infiniband/sw/rxe/rxe_comp.c | 16 ++++++++++------
> 1 file changed, 10 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c
> index 2af26737d32d..a6712e373eed 100644
> --- a/drivers/infiniband/sw/rxe/rxe_comp.c
> +++ b/drivers/infiniband/sw/rxe/rxe_comp.c
> @@ -346,13 +346,15 @@ static inline enum comp_state do_read(struct rxe_qp *qp,
> ret = copy_data(qp->pd, IB_ACCESS_LOCAL_WRITE,
> &wqe->dma, payload_addr(pkt),
> payload_size(pkt), to_mr_obj, NULL);
> - if (ret)
> + if (ret) {
> + wqe->status = IB_WC_LOC_PROT_ERR;
> return COMPST_ERROR;
> + }
>
> if (wqe->dma.resid == 0 && (pkt->mask & RXE_END_MASK))
> return COMPST_COMP_ACK;
> - else
> - return COMPST_UPDATE_COMP;
> +
> + return COMPST_UPDATE_COMP;
> }
>
> static inline enum comp_state do_atomic(struct rxe_qp *qp,
> @@ -366,10 +368,12 @@ static inline enum comp_state do_atomic(struct rxe_qp *qp,
> ret = copy_data(qp->pd, IB_ACCESS_LOCAL_WRITE,
> &wqe->dma, &atomic_orig,
> sizeof(u64), to_mr_obj, NULL);
> - if (ret)
> + if (ret) {
> + wqe->status = IB_WC_LOC_PROT_ERR;
> return COMPST_ERROR;
> - else
> - return COMPST_COMP_ACK;
> + }
> +
> + return COMPST_COMP_ACK;
> }
>
> static void make_send_cqe(struct rxe_qp *qp, struct rxe_send_wqe *wqe,
> --
> 2.31.1
>
* Re: [PATCH rdma-rc 0/5] RDMA fixes
2021-05-11 5:48 [PATCH rdma-rc 0/5] RDMA fixes Leon Romanovsky
` (4 preceding siblings ...)
2021-05-11 5:48 ` [PATCH rdma-rc 5/5] RDMA/rxe: Return CQE error if invalid lkey was supplied Leon Romanovsky
@ 2021-05-18 17:40 ` Jason Gunthorpe
5 siblings, 0 replies; 10+ messages in thread
From: Jason Gunthorpe @ 2021-05-18 17:40 UTC (permalink / raw)
To: Leon Romanovsky
Cc: Doug Ledford, Leon Romanovsky, Dan Carpenter, linux-rdma,
Maor Gottlieb, Shay Drory, Zhu Yanjun
On Tue, May 11, 2021 at 08:48:26AM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
>
> Batch of completely unrelated fixes.
>
> Thanks
>
> Leon Romanovsky (2):
> RDMA/rxe: Return CQE error if invalid lkey was supplied
>
> Maor Gottlieb (2):
> RDMA/mlx5: Verify that DM operation is reasonable
> RDMA/mlx5: Recover from fatal event in dual port mode
>
> Shay Drory (1):
> RDMA/core: Don't access cm_id after its destruction
These are applied to for-rc
> RDMA/core: Simplify addition of restrack object
This one will have to go to -next
Thanks,
Jason
* Re: [PATCH rdma-rc 4/5] RDMA/core: Simplify addition of restrack object
2021-05-11 5:48 ` [PATCH rdma-rc 4/5] RDMA/core: Simplify addition of restrack object Leon Romanovsky
@ 2021-05-25 13:42 ` Jason Gunthorpe
2021-06-02 6:36 ` Leon Romanovsky
0 siblings, 1 reply; 10+ messages in thread
From: Jason Gunthorpe @ 2021-05-25 13:42 UTC (permalink / raw)
To: Leon Romanovsky
Cc: Doug Ledford, Leon Romanovsky, Dan Carpenter, linux-rdma,
Maor Gottlieb, Shay Drory, Zhu Yanjun
On Tue, May 11, 2021 at 08:48:30AM +0300, Leon Romanovsky wrote:
> @@ -3853,12 +3848,12 @@ int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr)
> if (ret)
> goto err2;
>
> - if (!cma_any_addr(addr))
> - rdma_restrack_add(&id_priv->res);
> return 0;
> err2:
> if (id_priv->cma_dev)
> cma_release_dev(id_priv);
> + if (!cma_any_addr(addr))
> + rdma_restrack_del(&id_priv->res);
But this whole thing is reverting an earlier patch - the whole point
was to avoid the restrack_del().
Plus this is out of order: the del has to be before the release, due to
the other recent patch.
Jason
* Re: [PATCH rdma-rc 4/5] RDMA/core: Simplify addition of restrack object
2021-05-25 13:42 ` Jason Gunthorpe
@ 2021-06-02 6:36 ` Leon Romanovsky
0 siblings, 0 replies; 10+ messages in thread
From: Leon Romanovsky @ 2021-06-02 6:36 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Doug Ledford, Dan Carpenter, linux-rdma, Maor Gottlieb,
Shay Drory, Zhu Yanjun
On Tue, May 25, 2021 at 10:42:53AM -0300, Jason Gunthorpe wrote:
> On Tue, May 11, 2021 at 08:48:30AM +0300, Leon Romanovsky wrote:
>
> > @@ -3853,12 +3848,12 @@ int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr)
> > if (ret)
> > goto err2;
> >
> > - if (!cma_any_addr(addr))
> > - rdma_restrack_add(&id_priv->res);
> > return 0;
> > err2:
> > if (id_priv->cma_dev)
> > cma_release_dev(id_priv);
> > + if (!cma_any_addr(addr))
> > + rdma_restrack_del(&id_priv->res);
>
> But this whole thing is reverting an earlier patch - the whole point
> was to avoid the restrack_del().
Not really, we call rdma_restrack_add() only in the "if (!cma_any_addr(addr))"
flow, so the patch does the same as the earlier one, but in a more compact way.
>
> Plus this is out of order: the del has to be before the release, due to
> the other recent patch.
Shay pointed it out to me too, I will fix it.
Thanks
>
> Jason