* [PATCH rdma-rc 1/5] RDMA/mlx5: Verify that DM operation is reasonable
2021-05-11 5:48 [PATCH rdma-rc 0/5] RDMA fixes Leon Romanovsky
@ 2021-05-11 5:48 ` Leon Romanovsky
2021-05-11 5:48 ` [PATCH rdma-rc 2/5] RDMA/core: Don't access cm_id after its destruction Leon Romanovsky
` (4 subsequent siblings)
5 siblings, 0 replies; 10+ messages in thread
From: Leon Romanovsky @ 2021-05-11 5:48 UTC
To: Doug Ledford, Jason Gunthorpe
Cc: Maor Gottlieb, Dan Carpenter, linux-rdma, Shay Drory, Zhu Yanjun
From: Maor Gottlieb <maorg@nvidia.com>
Fix the smatch complaint below by verifying that the DM operation is not
greater than 31.
drivers/infiniband/hw/mlx5/dm.c:220 mlx5_ib_handler_MLX5_IB_METHOD_DM_MAP_OP_ADDR()
error: undefined (user controlled) shift '(((1))) << op'
Fixes: cea85fa5dbc2 ("RDMA/mlx5: Add support in MEMIC operations")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
drivers/infiniband/hw/mlx5/dm.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/infiniband/hw/mlx5/dm.c b/drivers/infiniband/hw/mlx5/dm.c
index 094bf85589db..001d766cf291 100644
--- a/drivers/infiniband/hw/mlx5/dm.c
+++ b/drivers/infiniband/hw/mlx5/dm.c
@@ -217,6 +217,9 @@ static int UVERBS_HANDLER(MLX5_IB_METHOD_DM_MAP_OP_ADDR)(
if (err)
return err;
+ if (op >= BITS_PER_TYPE(u32))
+ return -EOPNOTSUPP;
+
if (!(MLX5_CAP_DEV_MEM(dev->mdev, memic_operations) & BIT(op)))
return -EOPNOTSUPP;
--
2.31.1
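For readers following along, the check added in patch 1/5 can be tried in
userspace. The snippet below is only a minimal sketch: BITS_PER_TYPE() and
BIT() are local stand-ins for the kernel macros of the same name, and
dm_op_supported() plus its cap_mask argument are hypothetical names that do
not appear in the driver. It shows why the range test has to come first:
shifting a 32-bit value by 32 or more is undefined in C.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdint.h>

/* Local stand-ins for the kernel macros (assumption, not the kernel code). */
#define BITS_PER_TYPE(type) (sizeof(type) * CHAR_BIT)
#define BIT(nr) (UINT32_C(1) << (nr))

static_assert(BITS_PER_TYPE(uint32_t) == 32, "u32 must be 32 bits wide");

/*
 * Hypothetical helper: returns 0 when 'op' is set in 'cap_mask',
 * -EOPNOTSUPP otherwise.
 */
static int dm_op_supported(uint32_t cap_mask, unsigned int op)
{
	/* Reject out-of-range values first so the shift below is defined. */
	if (op >= BITS_PER_TYPE(uint32_t))
		return -EOPNOTSUPP;

	if (!(cap_mask & BIT(op)))
		return -EOPNOTSUPP;

	return 0;
}
```

With a capability mask of 0x5 (bits 0 and 2 set), dm_op_supported(0x5, 2)
returns 0, while both dm_op_supported(0x5, 1) and dm_op_supported(0x5, 40)
return -EOPNOTSUPP; the last call never reaches the shift.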
* [PATCH rdma-rc 2/5] RDMA/core: Don't access cm_id after its destruction
2021-05-11 5:48 [PATCH rdma-rc 0/5] RDMA fixes Leon Romanovsky
2021-05-11 5:48 ` [PATCH rdma-rc 1/5] RDMA/mlx5: Verify that DM operation is reasonable Leon Romanovsky
@ 2021-05-11 5:48 ` Leon Romanovsky
2021-05-11 5:48 ` [PATCH rdma-rc 3/5] RDMA/mlx5: Recover from fatal event in dual port mode Leon Romanovsky
` (3 subsequent siblings)
5 siblings, 0 replies; 10+ messages in thread
From: Leon Romanovsky @ 2021-05-11 5:48 UTC
To: Doug Ledford, Jason Gunthorpe
Cc: Shay Drory, Dan Carpenter, linux-rdma, Maor Gottlieb, Zhu Yanjun
From: Shay Drory <shayd@nvidia.com>
The idea that restrack needs an asymmetrical delete routine proved to be
wrong. It led to an attempt to access the ib_device after it had already
been disconnected from the cma_dev.
As a solution, move rdma_restrack_del() to before the id_priv destruction.
Found by syzkaller:
BUG: KASAN: wild-memory-access in __list_del include/linux/list.h:112 [inline]
BUG: KASAN: wild-memory-access in __list_del_entry include/linux/list.h:135 [inline]
BUG: KASAN: wild-memory-access in list_del include/linux/list.h:146 [inline]
BUG: KASAN: wild-memory-access in cma_cancel_listens drivers/infiniband/core/cma.c:1767 [inline]
BUG: KASAN: wild-memory-access in cma_cancel_operation drivers/infiniband/core/cma.c:1795 [inline]
BUG: KASAN: wild-memory-access in cma_cancel_operation+0x1f4/0x4b0 drivers/infiniband/core/cma.c:1783
Write of size 8 at addr dead000000000108 by task syz-executor716/334
CPU: 0 PID: 334 Comm: syz-executor716 Not tainted 5.11.0+ #271
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Call Trace:
__dump_stack lib/dump_stack.c:79 [inline]
dump_stack+0xbe/0xf9 lib/dump_stack.c:120
__kasan_report mm/kasan/report.c:400 [inline]
kasan_report.cold+0x5f/0xd5 mm/kasan/report.c:413
__list_del include/linux/list.h:112 [inline]
__list_del_entry include/linux/list.h:135 [inline]
list_del include/linux/list.h:146 [inline]
cma_cancel_listens drivers/infiniband/core/cma.c:1767 [inline]
cma_cancel_operation drivers/infiniband/core/cma.c:1795 [inline]
cma_cancel_operation+0x1f4/0x4b0 drivers/infiniband/core/cma.c:1783
_destroy_id+0x29/0x460 drivers/infiniband/core/cma.c:1862
ucma_close_id+0x36/0x50 drivers/infiniband/core/ucma.c:185
ucma_destroy_private_ctx+0x58d/0x5b0 drivers/infiniband/core/ucma.c:576
ucma_close+0x91/0xd0 drivers/infiniband/core/ucma.c:1797
__fput+0x169/0x540 fs/file_table.c:280
task_work_run+0xb7/0x100 kernel/task_work.c:140
exit_task_work include/linux/task_work.h:30 [inline]
do_exit+0x7da/0x17f0 kernel/exit.c:825
do_group_exit+0x9e/0x190 kernel/exit.c:922
__do_sys_exit_group kernel/exit.c:933 [inline]
__se_sys_exit_group kernel/exit.c:931 [inline]
__x64_sys_exit_group+0x2d/0x30 kernel/exit.c:931
do_syscall_64+0x2d/0x40 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x44a326
Code: Unable to access opcode bytes at RIP 0x44a2fc.
RSP: 002b:00007ffd14306748 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
RAX: ffffffffffffffda RBX: 00000000006c4490 RCX: 000000000044a326
RDX: 0000000000000001 RSI: 000000000000003c RDI: 0000000000000001
RBP: 0000000000000001 R08: 00000000000000e7 R09: ffffffffffffffc0
R10: bb1414ac000000c2 R11: 0000000000000246 R12: 00000000006c4490
R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000001
==================================================================
Fixes: 255d0c14b375 ("RDMA/cma: rdma_bind_addr() leaks a cma_dev reference count")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
drivers/infiniband/core/cma.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 2b9ffc21cbc4..ab148a696c0c 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -473,6 +473,7 @@ static void cma_release_dev(struct rdma_id_private *id_priv)
list_del(&id_priv->list);
cma_dev_put(id_priv->cma_dev);
id_priv->cma_dev = NULL;
+ id_priv->id.device = NULL;
if (id_priv->id.route.addr.dev_addr.sgid_attr) {
rdma_put_gid_attr(id_priv->id.route.addr.dev_addr.sgid_attr);
id_priv->id.route.addr.dev_addr.sgid_attr = NULL;
@@ -1860,6 +1861,7 @@ static void _destroy_id(struct rdma_id_private *id_priv,
iw_destroy_cm_id(id_priv->cm_id.iw);
}
cma_leave_mc_groups(id_priv);
+ rdma_restrack_del(&id_priv->res);
cma_release_dev(id_priv);
}
@@ -1873,7 +1875,6 @@ static void _destroy_id(struct rdma_id_private *id_priv,
kfree(id_priv->id.route.path_rec);
put_net(id_priv->id.route.addr.dev_addr.net);
- rdma_restrack_del(&id_priv->res);
kfree(id_priv);
}
@@ -3774,7 +3775,7 @@ int rdma_listen(struct rdma_cm_id *id, int backlog)
}
id_priv->backlog = backlog;
- if (id->device) {
+ if (id_priv->cma_dev) {
if (rdma_cap_ib_cm(id->device, 1)) {
ret = cma_ib_listen(id_priv);
if (ret)
--
2.31.1
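The ordering constraint behind patch 2/5 can be illustrated with a toy
model. This is a sketch under stated assumptions, not the cma.c code: every
toy_* name is hypothetical, and the only property borrowed from the commit
message is that removing an object from the tracker needs the device, so it
has to happen before the device pointer is dropped.

```c
#include <assert.h>
#include <stddef.h>

/* Toy device that counts how many ids are tracked on it. */
struct toy_device {
	int tracked;
};

/* Toy id holding a device pointer and a "registered in tracker" flag. */
struct toy_id {
	struct toy_device *device;
	int in_tracker;
};

/* Tracker removal needs the device; returns -1 if it is already gone. */
static int toy_restrack_del(struct toy_id *id)
{
	if (!id->in_tracker)
		return 0;
	if (!id->device)
		return -1;	/* real code would touch stale memory here */
	id->device->tracked--;
	id->in_tracker = 0;
	return 0;
}

/* Mirrors the idea of clearing the device pointer on release. */
static void toy_release_dev(struct toy_id *id)
{
	id->device = NULL;
}

/* Teardown in the order the patch establishes: del first, then release. */
static int toy_destroy_id(struct toy_id *id)
{
	int ret;

	assert(id != NULL);
	ret = toy_restrack_del(id);
	toy_release_dev(id);
	return ret;
}
```

Calling toy_release_dev() before toy_restrack_del() makes the latter fail
with -1 in this model, which is the toy equivalent of the access reported
by KASAN above.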
* [PATCH rdma-rc 3/5] RDMA/mlx5: Recover from fatal event in dual port mode
2021-05-11 5:48 [PATCH rdma-rc 0/5] RDMA fixes Leon Romanovsky
2021-05-11 5:48 ` [PATCH rdma-rc 1/5] RDMA/mlx5: Verify that DM operation is reasonable Leon Romanovsky
2021-05-11 5:48 ` [PATCH rdma-rc 2/5] RDMA/core: Don't access cm_id after its destruction Leon Romanovsky
@ 2021-05-11 5:48 ` Leon Romanovsky
2021-05-11 5:48 ` [PATCH rdma-rc 4/5] RDMA/core: Simplify addition of restrack object Leon Romanovsky
` (2 subsequent siblings)
5 siblings, 0 replies; 10+ messages in thread
From: Leon Romanovsky @ 2021-05-11 5:48 UTC
To: Doug Ledford, Jason Gunthorpe
Cc: Maor Gottlieb, Dan Carpenter, linux-rdma, Shay Drory, Zhu Yanjun
From: Maor Gottlieb <maorg@nvidia.com>
When there is a fatal event on the slave port, the device is marked
as not active. We need to mark it as active again when the slave
has recovered.
Fixes: d69a24e03659 ("IB/mlx5: Move IB event processing onto a workqueue")
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
drivers/infiniband/hw/mlx5/main.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 6d1dd09a4388..644d5d0ac544 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -4419,6 +4419,7 @@ static int mlx5r_mp_probe(struct auxiliary_device *adev,
if (bound) {
rdma_roce_rescan_device(&dev->ib_dev);
+ mpi->ibdev->ib_active = true;
break;
}
}
--
2.31.1
* [PATCH rdma-rc 4/5] RDMA/core: Simplify addition of restrack object
2021-05-11 5:48 [PATCH rdma-rc 0/5] RDMA fixes Leon Romanovsky
` (2 preceding siblings ...)
2021-05-11 5:48 ` [PATCH rdma-rc 3/5] RDMA/mlx5: Recover from fatal event in dual port mode Leon Romanovsky
@ 2021-05-11 5:48 ` Leon Romanovsky
2021-05-25 13:42 ` Jason Gunthorpe
2021-05-11 5:48 ` [PATCH rdma-rc 5/5] RDMA/rxe: Return CQE error if invalid lkey was supplied Leon Romanovsky
2021-05-18 17:40 ` [PATCH rdma-rc 0/5] RDMA fixes Jason Gunthorpe
5 siblings, 1 reply; 10+ messages in thread
From: Leon Romanovsky @ 2021-05-11 5:48 UTC
To: Doug Ledford, Jason Gunthorpe
Cc: Leon Romanovsky, Dan Carpenter, linux-rdma, Maor Gottlieb,
Shay Drory, Zhu Yanjun
From: Leon Romanovsky <leonro@nvidia.com>
Move the rdma_restrack_add() calls so that they sit next to the
device attachment logic.
Fixes: cb5cd0ea4eb3 ("RDMA/core: Add CM to restrack after successful attachment to a device")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
drivers/infiniband/core/cma.c | 13 ++++---------
1 file changed, 4 insertions(+), 9 deletions(-)
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index ab148a696c0c..bdc645f9c692 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -462,6 +462,7 @@ static void cma_attach_to_dev(struct rdma_id_private *id_priv,
struct cma_device *cma_dev)
{
_cma_attach_to_dev(id_priv, cma_dev);
+ rdma_restrack_add(&id_priv->res);
id_priv->gid_type =
cma_dev->default_gid_type[id_priv->id.port_num -
rdma_start_port(cma_dev->device)];
@@ -691,7 +692,6 @@ static int cma_ib_acquire_dev(struct rdma_id_private *id_priv,
mutex_lock(&lock);
cma_attach_to_dev(id_priv, listen_id_priv->cma_dev);
mutex_unlock(&lock);
- rdma_restrack_add(&id_priv->res);
return 0;
}
@@ -746,10 +746,8 @@ static int cma_iw_acquire_dev(struct rdma_id_private *id_priv,
}
out:
- if (!ret) {
+ if (!ret)
cma_attach_to_dev(id_priv, cma_dev);
- rdma_restrack_add(&id_priv->res);
- }
mutex_unlock(&lock);
return ret;
@@ -810,7 +808,6 @@ static int cma_resolve_ib_dev(struct rdma_id_private *id_priv)
found:
cma_attach_to_dev(id_priv, cma_dev);
- rdma_restrack_add(&id_priv->res);
mutex_unlock(&lock);
addr = (struct sockaddr_ib *)cma_src_addr(id_priv);
memcpy(&addr->sib_addr, &sgid, sizeof(sgid));
@@ -3208,7 +3205,6 @@ static int cma_bind_loopback(struct rdma_id_private *id_priv)
ib_addr_set_pkey(&id_priv->id.route.addr.dev_addr, pkey);
id_priv->id.port_num = p;
cma_attach_to_dev(id_priv, cma_dev);
- rdma_restrack_add(&id_priv->res);
cma_set_loopback(cma_src_addr(id_priv));
out:
mutex_unlock(&lock);
@@ -3241,7 +3237,6 @@ static void addr_handler(int status, struct sockaddr *src_addr,
if (status)
pr_debug_ratelimited("RDMA CM: ADDR_ERROR: failed to acquire device. status %d\n",
status);
- rdma_restrack_add(&id_priv->res);
} else if (status) {
pr_debug_ratelimited("RDMA CM: ADDR_ERROR: failed to resolve IP. status %d\n", status);
}
@@ -3853,12 +3848,12 @@ int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr)
if (ret)
goto err2;
- if (!cma_any_addr(addr))
- rdma_restrack_add(&id_priv->res);
return 0;
err2:
if (id_priv->cma_dev)
cma_release_dev(id_priv);
+ if (!cma_any_addr(addr))
+ rdma_restrack_del(&id_priv->res);
err1:
cma_comp_exch(id_priv, RDMA_CM_ADDR_BOUND, RDMA_CM_IDLE);
return ret;
--
2.31.1
* Re: [PATCH rdma-rc 4/5] RDMA/core: Simplify addition of restrack object
2021-05-11 5:48 ` [PATCH rdma-rc 4/5] RDMA/core: Simplify addition of restrack object Leon Romanovsky
@ 2021-05-25 13:42 ` Jason Gunthorpe
2021-06-02 6:36 ` Leon Romanovsky
0 siblings, 1 reply; 10+ messages in thread
From: Jason Gunthorpe @ 2021-05-25 13:42 UTC
To: Leon Romanovsky
Cc: Doug Ledford, Leon Romanovsky, Dan Carpenter, linux-rdma,
Maor Gottlieb, Shay Drory, Zhu Yanjun
On Tue, May 11, 2021 at 08:48:30AM +0300, Leon Romanovsky wrote:
> @@ -3853,12 +3848,12 @@ int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr)
> if (ret)
> goto err2;
>
> - if (!cma_any_addr(addr))
> - rdma_restrack_add(&id_priv->res);
> return 0;
> err2:
> if (id_priv->cma_dev)
> cma_release_dev(id_priv);
> + if (!cma_any_addr(addr))
> + rdma_restrack_del(&id_priv->res);
But this whole thing is reverting an earlier patch - the whole point
was to avoid the restrack_del().
Plus this is out of order: the del has to be before the release, due to
the other recent patch.
Jason
* Re: [PATCH rdma-rc 4/5] RDMA/core: Simplify addition of restrack object
2021-05-25 13:42 ` Jason Gunthorpe
@ 2021-06-02 6:36 ` Leon Romanovsky
0 siblings, 0 replies; 10+ messages in thread
From: Leon Romanovsky @ 2021-06-02 6:36 UTC
To: Jason Gunthorpe
Cc: Doug Ledford, Dan Carpenter, linux-rdma, Maor Gottlieb,
Shay Drory, Zhu Yanjun
On Tue, May 25, 2021 at 10:42:53AM -0300, Jason Gunthorpe wrote:
> On Tue, May 11, 2021 at 08:48:30AM +0300, Leon Romanovsky wrote:
>
> > @@ -3853,12 +3848,12 @@ int rdma_bind_addr(struct rdma_cm_id *id, struct sockaddr *addr)
> > if (ret)
> > goto err2;
> >
> > - if (!cma_any_addr(addr))
> > - rdma_restrack_add(&id_priv->res);
> > return 0;
> > err2:
> > if (id_priv->cma_dev)
> > cma_release_dev(id_priv);
> > + if (!cma_any_addr(addr))
> > + rdma_restrack_del(&id_priv->res);
>
> But this whole thing is reverting an earlier patch - the whole point
> was to avoid the restrack_del().
Not really, we call rdma_restrack_add() only in the "if (!cma_any_addr(addr))"
flow, so the patch does the same as the earlier one, but in a more compact way.
>
> Plus this is out of order: the del has to be before the release, due to
> the other recent patch.
Shay pointed it out to me too; I will fix it.
Thanks
>
> Jason
* [PATCH rdma-rc 5/5] RDMA/rxe: Return CQE error if invalid lkey was supplied
2021-05-11 5:48 [PATCH rdma-rc 0/5] RDMA fixes Leon Romanovsky
` (3 preceding siblings ...)
2021-05-11 5:48 ` [PATCH rdma-rc 4/5] RDMA/core: Simplify addition of restrack object Leon Romanovsky
@ 2021-05-11 5:48 ` Leon Romanovsky
2021-05-11 6:12 ` Zhu Yanjun
2021-05-18 17:40 ` [PATCH rdma-rc 0/5] RDMA fixes Jason Gunthorpe
5 siblings, 1 reply; 10+ messages in thread
From: Leon Romanovsky @ 2021-05-11 5:48 UTC
To: Doug Ledford, Jason Gunthorpe
Cc: Leon Romanovsky, Dan Carpenter, linux-rdma, Maor Gottlieb,
Shay Drory, Zhu Yanjun
From: Leon Romanovsky <leonro@nvidia.com>
RXE is missing an update of the WQE status on LOCAL_WRITE failures.
This caused the following kernel warning if someone sent an atomic
operation with an explicitly wrong lkey.
[leonro@vm ~]$ mkt test
test_atomic_invalid_lkey (tests.test_atomic.AtomicTest) ... [ 43.860977] ------------[ cut here ]------------
WARNING: CPU: 5 PID: 263 at drivers/infiniband/sw/rxe/rxe_comp.c:740 rxe_completer+0x1a6d/0x2e30 [rdma_rxe]
Modules linked in: crc32_generic rdma_rxe ip6_udp_tunnel udp_tunnel rdma_ucm rdma_cm ib_umad ib_ipoib iw_cm ib_cm mlx5_ib ib_uverbs ib_core mlx5_core ptp pps_core
CPU: 5 PID: 263 Comm: python3 Not tainted 5.13.0-rc1+ #2936
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:rxe_completer+0x1a6d/0x2e30 [rdma_rxe]
Code: 03 0f 8e 65 0e 00 00 3b 93 10 06 00 00 0f 84 82 0a 00 00 4c 89 ff 4c 89 44 24 38 e8 2d 74 a9 e1 4c 8b 44 24 38 e9 1c f5 ff ff <0f> 0b e9 0c e8 ff ff b8 05 00 00 00 41 bf 05 00 00 00 e9 ab e7 ff
RSP: 0018:ffff8880158af090 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff888016a78000 RCX: ffffffffa0cf1652
RDX: 1ffff9200004b442 RSI: 0000000000000004 RDI: ffffc9000025a210
RBP: dffffc0000000000 R08: 00000000ffffffea R09: ffff88801617740b
R10: ffffed1002c2ee81 R11: 0000000000000007 R12: ffff88800f3b63e8
R13: ffff888016a78008 R14: ffffc9000025a180 R15: 000000000000000c
FS: 00007f88b622a740(0000) GS:ffff88806d540000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f88b5a1fa10 CR3: 000000000d848004 CR4: 0000000000370ea0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
? lock_release+0x1f9/0x6c0
? rxe_comp_queue_pkt+0xb0/0xb0 [rdma_rxe]
? lock_downgrade+0x6d0/0x6d0
? lock_downgrade+0x6d0/0x6d0
? lockdep_hardirqs_on_prepare+0x273/0x3e0
? _raw_spin_unlock_irqrestore+0x2d/0x40
? rxe_comp_queue_pkt+0xb0/0xb0 [rdma_rxe]
rxe_do_task+0x130/0x230 [rdma_rxe]
? _raw_spin_unlock_irqrestore+0x2d/0x40
rxe_rcv+0xb11/0x1df0 [rdma_rxe]
? rxe_crc32.isra.0+0x120/0x120 [rdma_rxe]
? prepare_ack_packet+0x50b/0xa20 [rdma_rxe]
rxe_loopback+0x157/0x1e0 [rdma_rxe]
? rxe_send+0x520/0x520 [rdma_rxe]
? lockdep_hardirqs_on_prepare+0x273/0x3e0
rxe_responder+0x5532/0x7620 [rdma_rxe]
? rxe_resp_queue_pkt+0xa0/0xa0 [rdma_rxe]
? lock_downgrade+0x6d0/0x6d0
? rxe_crc32.isra.0+0x8e/0x120 [rdma_rxe]
? lock_is_held_type+0x98/0x110
? find_held_lock+0x2d/0x110
? lock_release+0x1f9/0x6c0
? rxe_do_task+0xe5/0x230 [rdma_rxe]
? lock_downgrade+0x6d0/0x6d0
? rxe_resp_queue_pkt+0x19/0xa0 [rdma_rxe]
? lockdep_hardirqs_on_prepare+0x273/0x3e0
? _raw_spin_unlock_irqrestore+0x2d/0x40
? rxe_resp_queue_pkt+0xa0/0xa0 [rdma_rxe]
rxe_do_task+0x130/0x230 [rdma_rxe]
rxe_rcv+0x9c8/0x1df0 [rdma_rxe]
? __kmalloc_track_caller+0x174/0x390
? rxe_crc32.isra.0+0x120/0x120 [rdma_rxe]
rxe_loopback+0x157/0x1e0 [rdma_rxe]
? rxe_send+0x520/0x520 [rdma_rxe]
rxe_requester+0x1efd/0x58c0 [rdma_rxe]
? lock_is_held_type+0x98/0x110
? find_held_lock+0x2d/0x110
? rnr_nak_timer+0x70/0x70 [rdma_rxe]
? lock_release+0x1f9/0x6c0
? rxe_do_task+0xe5/0x230 [rdma_rxe]
? lock_downgrade+0x6d0/0x6d0
? lockdep_hardirqs_on_prepare+0x273/0x3e0
? _raw_spin_unlock_irqrestore+0x2d/0x40
? rnr_nak_timer+0x70/0x70 [rdma_rxe]
rxe_do_task+0x130/0x230 [rdma_rxe]
? rxe_poll_cq+0x450/0x450 [rdma_rxe]
rxe_post_send+0x998/0x1860 [rdma_rxe]
? lock_is_held_type+0x98/0x110
? lock_is_held_type+0x98/0x110
? rdma_lookup_get_uobject+0x22c/0x4a0 [ib_uverbs]
? rxe_poll_cq+0x450/0x450 [rdma_rxe]
ib_uverbs_post_send+0xd5f/0x1220 [ib_uverbs]
? lock_acquire+0x1a9/0x6d0
? lock_is_held_type+0x98/0x110
? ib_uverbs_ex_create_wq+0xb00/0xb00 [ib_uverbs]
? lock_release+0x1f9/0x6c0
? __might_fault+0xba/0x160
? lock_downgrade+0x6d0/0x6d0
ib_uverbs_write+0x847/0xc80 [ib_uverbs]
? ib_uverbs_open+0x810/0x810 [ib_uverbs]
? vfs_fileattr_set+0x990/0x990
? __up_read+0x1a1/0x7b0
vfs_write+0x1c5/0x840
ksys_write+0x176/0x1d0
? __x64_sys_read+0xb0/0xb0
? lockdep_hardirqs_on_prepare+0x273/0x3e0
? syscall_enter_from_user_mode+0x1d/0x50
do_syscall_64+0x3f/0x80
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f88b64917a7
Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
RSP: 002b:00007ffee189e6c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 000055c17c8b2660 RCX: 00007f88b64917a7
RDX: 0000000000000020 RSI: 00007ffee189e6e0 RDI: 0000000000000003
RBP: 0000000000000000 R08: 00007f88b5cbbc80 R09: 00007f88b689fdc0
R10: 00007f88b66a0510 R11: 0000000000000246 R12: 00007f88b6072180
R13: 0000000000000000 R14: 00007f88b5e6e9d0 R15: 0000000000000008
irq event stamp: 809719
hardirqs last enabled at (809727): [<ffffffff813990c4>] console_unlock+0x434/0x850
hardirqs last disabled at (809734): [<ffffffff81399279>] console_unlock+0x5e9/0x850
softirqs last enabled at (809016): [<ffffffff8129089e>] irq_exit_rcu+0x11e/0x1a0
softirqs last disabled at (808963): [<ffffffff8129089e>] irq_exit_rcu+0x11e/0x1a0
---[ end trace 1e302e4b7857843b ]---
Fixes: 8700e3e7c485 ("Soft RoCE driver")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
drivers/infiniband/sw/rxe/rxe_comp.c | 16 ++++++++++------
1 file changed, 10 insertions(+), 6 deletions(-)
diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c
index 2af26737d32d..a6712e373eed 100644
--- a/drivers/infiniband/sw/rxe/rxe_comp.c
+++ b/drivers/infiniband/sw/rxe/rxe_comp.c
@@ -346,13 +346,15 @@ static inline enum comp_state do_read(struct rxe_qp *qp,
ret = copy_data(qp->pd, IB_ACCESS_LOCAL_WRITE,
&wqe->dma, payload_addr(pkt),
payload_size(pkt), to_mr_obj, NULL);
- if (ret)
+ if (ret) {
+ wqe->status = IB_WC_LOC_PROT_ERR;
return COMPST_ERROR;
+ }
if (wqe->dma.resid == 0 && (pkt->mask & RXE_END_MASK))
return COMPST_COMP_ACK;
- else
- return COMPST_UPDATE_COMP;
+
+ return COMPST_UPDATE_COMP;
}
static inline enum comp_state do_atomic(struct rxe_qp *qp,
@@ -366,10 +368,12 @@ static inline enum comp_state do_atomic(struct rxe_qp *qp,
ret = copy_data(qp->pd, IB_ACCESS_LOCAL_WRITE,
&wqe->dma, &atomic_orig,
sizeof(u64), to_mr_obj, NULL);
- if (ret)
+ if (ret) {
+ wqe->status = IB_WC_LOC_PROT_ERR;
return COMPST_ERROR;
- else
- return COMPST_COMP_ACK;
+ }
+
+ return COMPST_COMP_ACK;
}
static void make_send_cqe(struct rxe_qp *qp, struct rxe_send_wqe *wqe,
--
2.31.1
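The pattern applied in patch 5/5 is to record a failure status on the WQE
before handing control to the error state. The snippet below is a
simplified sketch, not the rxe code: all toy_* names are hypothetical, and
it assumes (this is not shown in the thread) that the warning at
rxe_comp.c:740 fires when the error path finds a WQE whose status still
says success.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the completer states and WC status codes. */
enum toy_comp_state { TOY_COMPST_COMP_ACK, TOY_COMPST_ERROR };
enum toy_wc_status { TOY_WC_SUCCESS, TOY_WC_LOC_PROT_ERR };

struct toy_wqe {
	enum toy_wc_status status;
};

/*
 * Models do_atomic() after the patch: 'copy_ret' plays the role of the
 * copy_data() return value, non-zero meaning the local write failed.
 */
static enum toy_comp_state toy_do_atomic(struct toy_wqe *wqe, int copy_ret)
{
	assert(wqe != NULL);

	if (copy_ret) {
		/* Record why the WQE failed before entering the error state. */
		wqe->status = TOY_WC_LOC_PROT_ERR;
		return TOY_COMPST_ERROR;
	}

	return TOY_COMPST_COMP_ACK;
}

/* Assumed invariant of the error path: status must not be success. */
static int toy_error_path_ok(const struct toy_wqe *wqe)
{
	return wqe->status != TOY_WC_SUCCESS;
}
```

Without the status assignment, toy_error_path_ok() would return 0 for a
failed copy, which corresponds to the condition the WARNING in the trace
above complains about under the stated assumption.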
* Re: [PATCH rdma-rc 5/5] RDMA/rxe: Return CQE error if invalid lkey was supplied
2021-05-11 5:48 ` [PATCH rdma-rc 5/5] RDMA/rxe: Return CQE error if invalid lkey was supplied Leon Romanovsky
@ 2021-05-11 6:12 ` Zhu Yanjun
0 siblings, 0 replies; 10+ messages in thread
From: Zhu Yanjun @ 2021-05-11 6:12 UTC
To: Leon Romanovsky
Cc: Doug Ledford, Jason Gunthorpe, Leon Romanovsky, Dan Carpenter,
RDMA mailing list, Maor Gottlieb, Shay Drory
On Tue, May 11, 2021 at 1:48 PM Leon Romanovsky <leon@kernel.org> wrote:
>
> From: Leon Romanovsky <leonro@nvidia.com>
>
> RXE is missing an update of the WQE status on LOCAL_WRITE failures.
> This caused the following kernel warning if someone sent an atomic
> operation with an explicitly wrong lkey.
>
> [leonro@vm ~]$ mkt test
> test_atomic_invalid_lkey (tests.test_atomic.AtomicTest) ... [ 43.860977] ------------[ cut here ]------------
> WARNING: CPU: 5 PID: 263 at drivers/infiniband/sw/rxe/rxe_comp.c:740 rxe_completer+0x1a6d/0x2e30 [rdma_rxe]
> Modules linked in: crc32_generic rdma_rxe ip6_udp_tunnel udp_tunnel rdma_ucm rdma_cm ib_umad ib_ipoib iw_cm ib_cm mlx5_ib ib_uverbs ib_core mlx5_core ptp pps_core
> CPU: 5 PID: 263 Comm: python3 Not tainted 5.13.0-rc1+ #2936
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
> RIP: 0010:rxe_completer+0x1a6d/0x2e30 [rdma_rxe]
> Code: 03 0f 8e 65 0e 00 00 3b 93 10 06 00 00 0f 84 82 0a 00 00 4c 89 ff 4c 89 44 24 38 e8 2d 74 a9 e1 4c 8b 44 24 38 e9 1c f5 ff ff <0f> 0b e9 0c e8 ff ff b8 05 00 00 00 41 bf 05 00 00 00 e9 ab e7 ff
> RSP: 0018:ffff8880158af090 EFLAGS: 00010246
> RAX: 0000000000000000 RBX: ffff888016a78000 RCX: ffffffffa0cf1652
> RDX: 1ffff9200004b442 RSI: 0000000000000004 RDI: ffffc9000025a210
> RBP: dffffc0000000000 R08: 00000000ffffffea R09: ffff88801617740b
> R10: ffffed1002c2ee81 R11: 0000000000000007 R12: ffff88800f3b63e8
> R13: ffff888016a78008 R14: ffffc9000025a180 R15: 000000000000000c
> FS: 00007f88b622a740(0000) GS:ffff88806d540000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f88b5a1fa10 CR3: 000000000d848004 CR4: 0000000000370ea0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
> ? lock_release+0x1f9/0x6c0
> ? rxe_comp_queue_pkt+0xb0/0xb0 [rdma_rxe]
> ? lock_downgrade+0x6d0/0x6d0
> ? lock_downgrade+0x6d0/0x6d0
> ? lockdep_hardirqs_on_prepare+0x273/0x3e0
> ? _raw_spin_unlock_irqrestore+0x2d/0x40
> ? rxe_comp_queue_pkt+0xb0/0xb0 [rdma_rxe]
> rxe_do_task+0x130/0x230 [rdma_rxe]
> ? _raw_spin_unlock_irqrestore+0x2d/0x40
> rxe_rcv+0xb11/0x1df0 [rdma_rxe]
> ? rxe_crc32.isra.0+0x120/0x120 [rdma_rxe]
> ? prepare_ack_packet+0x50b/0xa20 [rdma_rxe]
> rxe_loopback+0x157/0x1e0 [rdma_rxe]
> ? rxe_send+0x520/0x520 [rdma_rxe]
> ? lockdep_hardirqs_on_prepare+0x273/0x3e0
> rxe_responder+0x5532/0x7620 [rdma_rxe]
> ? rxe_resp_queue_pkt+0xa0/0xa0 [rdma_rxe]
> ? lock_downgrade+0x6d0/0x6d0
> ? rxe_crc32.isra.0+0x8e/0x120 [rdma_rxe]
> ? lock_is_held_type+0x98/0x110
> ? find_held_lock+0x2d/0x110
> ? lock_release+0x1f9/0x6c0
> ? rxe_do_task+0xe5/0x230 [rdma_rxe]
> ? lock_downgrade+0x6d0/0x6d0
> ? rxe_resp_queue_pkt+0x19/0xa0 [rdma_rxe]
> ? lockdep_hardirqs_on_prepare+0x273/0x3e0
> ? _raw_spin_unlock_irqrestore+0x2d/0x40
> ? rxe_resp_queue_pkt+0xa0/0xa0 [rdma_rxe]
> rxe_do_task+0x130/0x230 [rdma_rxe]
> rxe_rcv+0x9c8/0x1df0 [rdma_rxe]
> ? __kmalloc_track_caller+0x174/0x390
> ? rxe_crc32.isra.0+0x120/0x120 [rdma_rxe]
> rxe_loopback+0x157/0x1e0 [rdma_rxe]
> ? rxe_send+0x520/0x520 [rdma_rxe]
> rxe_requester+0x1efd/0x58c0 [rdma_rxe]
> ? lock_is_held_type+0x98/0x110
> ? find_held_lock+0x2d/0x110
> ? rnr_nak_timer+0x70/0x70 [rdma_rxe]
> ? lock_release+0x1f9/0x6c0
> ? rxe_do_task+0xe5/0x230 [rdma_rxe]
> ? lock_downgrade+0x6d0/0x6d0
> ? lockdep_hardirqs_on_prepare+0x273/0x3e0
> ? _raw_spin_unlock_irqrestore+0x2d/0x40
> ? rnr_nak_timer+0x70/0x70 [rdma_rxe]
> rxe_do_task+0x130/0x230 [rdma_rxe]
> ? rxe_poll_cq+0x450/0x450 [rdma_rxe]
> rxe_post_send+0x998/0x1860 [rdma_rxe]
> ? lock_is_held_type+0x98/0x110
> ? lock_is_held_type+0x98/0x110
> ? rdma_lookup_get_uobject+0x22c/0x4a0 [ib_uverbs]
> ? rxe_poll_cq+0x450/0x450 [rdma_rxe]
> ib_uverbs_post_send+0xd5f/0x1220 [ib_uverbs]
> ? lock_acquire+0x1a9/0x6d0
> ? lock_is_held_type+0x98/0x110
> ? ib_uverbs_ex_create_wq+0xb00/0xb00 [ib_uverbs]
> ? lock_release+0x1f9/0x6c0
> ? __might_fault+0xba/0x160
> ? lock_downgrade+0x6d0/0x6d0
> ib_uverbs_write+0x847/0xc80 [ib_uverbs]
> ? ib_uverbs_open+0x810/0x810 [ib_uverbs]
> ? vfs_fileattr_set+0x990/0x990
> ? __up_read+0x1a1/0x7b0
> vfs_write+0x1c5/0x840
> ksys_write+0x176/0x1d0
> ? __x64_sys_read+0xb0/0xb0
> ? lockdep_hardirqs_on_prepare+0x273/0x3e0
> ? syscall_enter_from_user_mode+0x1d/0x50
> do_syscall_64+0x3f/0x80
> entry_SYSCALL_64_after_hwframe+0x44/0xae
> RIP: 0033:0x7f88b64917a7
> Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
> RSP: 002b:00007ffee189e6c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> RAX: ffffffffffffffda RBX: 000055c17c8b2660 RCX: 00007f88b64917a7
> RDX: 0000000000000020 RSI: 00007ffee189e6e0 RDI: 0000000000000003
> RBP: 0000000000000000 R08: 00007f88b5cbbc80 R09: 00007f88b689fdc0
> R10: 00007f88b66a0510 R11: 0000000000000246 R12: 00007f88b6072180
> R13: 0000000000000000 R14: 00007f88b5e6e9d0 R15: 0000000000000008
> irq event stamp: 809719
> hardirqs last enabled at (809727): [<ffffffff813990c4>] console_unlock+0x434/0x850
> hardirqs last disabled at (809734): [<ffffffff81399279>] console_unlock+0x5e9/0x850
> softirqs last enabled at (809016): [<ffffffff8129089e>] irq_exit_rcu+0x11e/0x1a0
> softirqs last disabled at (808963): [<ffffffff8129089e>] irq_exit_rcu+0x11e/0x1a0
> ---[ end trace 1e302e4b7857843b ]---
>
> Fixes: 8700e3e7c485 ("Soft RoCE driver")
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Thanks,
Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Zhu Yanjun
> ---
> drivers/infiniband/sw/rxe/rxe_comp.c | 16 ++++++++++------
> 1 file changed, 10 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c
> index 2af26737d32d..a6712e373eed 100644
> --- a/drivers/infiniband/sw/rxe/rxe_comp.c
> +++ b/drivers/infiniband/sw/rxe/rxe_comp.c
> @@ -346,13 +346,15 @@ static inline enum comp_state do_read(struct rxe_qp *qp,
> ret = copy_data(qp->pd, IB_ACCESS_LOCAL_WRITE,
> &wqe->dma, payload_addr(pkt),
> payload_size(pkt), to_mr_obj, NULL);
> - if (ret)
> + if (ret) {
> + wqe->status = IB_WC_LOC_PROT_ERR;
> return COMPST_ERROR;
> + }
>
> if (wqe->dma.resid == 0 && (pkt->mask & RXE_END_MASK))
> return COMPST_COMP_ACK;
> - else
> - return COMPST_UPDATE_COMP;
> +
> + return COMPST_UPDATE_COMP;
> }
>
> static inline enum comp_state do_atomic(struct rxe_qp *qp,
> @@ -366,10 +368,12 @@ static inline enum comp_state do_atomic(struct rxe_qp *qp,
> ret = copy_data(qp->pd, IB_ACCESS_LOCAL_WRITE,
> &wqe->dma, &atomic_orig,
> sizeof(u64), to_mr_obj, NULL);
> - if (ret)
> + if (ret) {
> + wqe->status = IB_WC_LOC_PROT_ERR;
> return COMPST_ERROR;
> - else
> - return COMPST_COMP_ACK;
> + }
> +
> + return COMPST_COMP_ACK;
> }
>
> static void make_send_cqe(struct rxe_qp *qp, struct rxe_send_wqe *wqe,
> --
> 2.31.1
>
* Re: [PATCH rdma-rc 0/5] RDMA fixes
2021-05-11 5:48 [PATCH rdma-rc 0/5] RDMA fixes Leon Romanovsky
` (4 preceding siblings ...)
2021-05-11 5:48 ` [PATCH rdma-rc 5/5] RDMA/rxe: Return CQE error if invalid lkey was supplied Leon Romanovsky
@ 2021-05-18 17:40 ` Jason Gunthorpe
5 siblings, 0 replies; 10+ messages in thread
From: Jason Gunthorpe @ 2021-05-18 17:40 UTC
To: Leon Romanovsky
Cc: Doug Ledford, Leon Romanovsky, Dan Carpenter, linux-rdma,
Maor Gottlieb, Shay Drory, Zhu Yanjun
On Tue, May 11, 2021 at 08:48:26AM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
>
> Batch of completely unrelated fixes.
>
> Thanks
>
> Leon Romanovsky (2):
> RDMA/rxe: Return CQE error if invalid lkey was supplied
>
> Maor Gottlieb (2):
> RDMA/mlx5: Verify that DM operation is reasonable
> RDMA/mlx5: Recover from fatal event in dual port mode
>
> Shay Drory (1):
> RDMA/core: Don't access cm_id after its destruction
These applied to for-rc
> RDMA/core: Simplify addition of restrack object
This one will have to go to -next
Thanks,
Jason