* [PATCH rdma-rc 0/3] Collection of small fixes to mlx5_ib
@ 2021-06-10  7:34 Leon Romanovsky
  2021-06-10  7:34 ` [PATCH rdma-rc 1/3] RDMA: Verify port when creating flow rule Leon Romanovsky
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Leon Romanovsky @ 2021-06-10  7:34 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, Aharon Landau, Alaa Hleihel, linux-kernel,
	linux-rdma, Maor Gottlieb

From: Leon Romanovsky <leonro@nvidia.com>

Hi,

This is a batch of unrelated fixes to mlx5_ib.

Thanks

Aharon Landau (1):
  RDMA/mlx5: Delete right entry from MR signature database

Alaa Hleihel (1):
  IB/mlx5: Fix initializing CQ fragments buffer

Maor Gottlieb (1):
  RDMA: Verify port when creating flow rule

 drivers/infiniband/core/uverbs_cmd.c | 5 +++++
 drivers/infiniband/hw/mlx4/main.c    | 3 ---
 drivers/infiniband/hw/mlx5/cq.c      | 9 ++++-----
 drivers/infiniband/hw/mlx5/fs.c      | 5 ++---
 drivers/infiniband/hw/mlx5/mr.c      | 4 ++--
 5 files changed, 13 insertions(+), 13 deletions(-)

-- 
2.31.1



* [PATCH rdma-rc 1/3] RDMA: Verify port when creating flow rule
  2021-06-10  7:34 [PATCH rdma-rc 0/3] Collection of small fixes to mlx5_ib Leon Romanovsky
@ 2021-06-10  7:34 ` Leon Romanovsky
  2021-06-10  7:34 ` [PATCH rdma-rc 2/3] RDMA/mlx5: Delete right entry from MR signature database Leon Romanovsky
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Leon Romanovsky @ 2021-06-10  7:34 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Maor Gottlieb, Aharon Landau, Alaa Hleihel, linux-kernel,
	linux-rdma, Mark Bloch

From: Maor Gottlieb <maorg@nvidia.com>

Validate the port value provided by the user in the uverbs core, and with
that remove the validation from the drivers, which is no longer needed.
The missing check in the mlx5_ib driver could cause the oops below.

Call trace:
  _create_flow_rule+0x2d4/0xf28 [mlx5_ib]
  mlx5_ib_create_flow+0x2d0/0x5b0 [mlx5_ib]
  ib_uverbs_ex_create_flow+0x4cc/0x624 [ib_uverbs]
  ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0xd4/0x150 [ib_uverbs]
  ib_uverbs_cmd_verbs.isra.7+0xb28/0xc50 [ib_uverbs]
  ib_uverbs_ioctl+0x158/0x1d0 [ib_uverbs]
  do_vfs_ioctl+0xd0/0xaf0
  ksys_ioctl+0x84/0xb4
  __arm64_sys_ioctl+0x28/0xc4
  el0_svc_common.constprop.3+0xa4/0x254
  el0_svc_handler+0x84/0xa0
  el0_svc+0x10/0x26c
 Code: b9401260 f9615681 51000400 8b001c20 (f9403c1a)
 ---[ end trace 1b5ffb34e3a14d2b ]---

Fixes: 436f2ad05a0b ("IB/core: Export ib_create/destroy_flow through uverbs")
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
This is a new version of the patch previously posted here:
https://lore.kernel.org/linux-rdma/07ddc8516a0e53e54e3cf5cbbff19cac6cda3d82.1623129061.git.leonro@nvidia.com
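
For readers following the change, here is a minimal userspace sketch of
the range check that now runs once in the core. It is modelled on
rdma_is_port_valid(); struct model_device and every model_* name are
hypothetical stand-ins, not kernel API. The last helper mirrors the
upper-bound-only test removed from mlx5_ib_create_flow(), which lets
port 0 through on a non-switch device.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct ib_device; only what the check needs. */
struct model_device {
	bool is_switch;         /* a switch exposes a single port numbered 0 */
	uint32_t phys_port_cnt; /* number of physical ports */
};

/* First valid port number: 0 on a switch, 1 otherwise. */
static uint32_t model_start_port(const struct model_device *dev)
{
	return dev->is_switch ? 0 : 1;
}

/* Last valid port number: 0 on a switch, phys_port_cnt otherwise. */
static uint32_t model_end_port(const struct model_device *dev)
{
	return dev->is_switch ? 0 : dev->phys_port_cnt;
}

/* Inclusive range check, modelled on rdma_is_port_valid(). */
static bool model_port_valid(const struct model_device *dev, uint32_t port)
{
	return port >= model_start_port(dev) && port <= model_end_port(dev);
}

/* The test removed from mlx5_ib_create_flow(): only the upper bound. */
static bool old_mlx5_check_passes(uint32_t num_ports, uint32_t port)
{
	return !(port > num_ports);
}
```

With two physical ports, model_port_valid() accepts only ports 1 and 2,
while old_mlx5_check_passes(2, 0) still returns true.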
---
 drivers/infiniband/core/uverbs_cmd.c | 5 +++++
 drivers/infiniband/hw/mlx4/main.c    | 3 ---
 drivers/infiniband/hw/mlx5/fs.c      | 5 ++---
 3 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index d5e15a8c870d..64e4be1cbec7 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -3248,6 +3248,11 @@ static int ib_uverbs_ex_create_flow(struct uverbs_attr_bundle *attrs)
 		goto err_free_attr;
 	}
 
+	if (!rdma_is_port_valid(uobj->context->device, cmd.flow_attr.port)) {
+		err = -EINVAL;
+		goto err_uobj;
+	}
+
 	qp = uobj_get_obj_read(qp, UVERBS_OBJECT_QP, cmd.qp_handle, attrs);
 	if (!qp) {
 		err = -EINVAL;
diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 16704262fc3a..230a6ae0ab5a 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -1699,9 +1699,6 @@ static struct ib_flow *mlx4_ib_create_flow(struct ib_qp *qp,
 	struct mlx4_dev *dev = (to_mdev(qp->device))->dev;
 	int is_bonded = mlx4_is_bonded(dev);
 
-	if (!rdma_is_port_valid(qp->device, flow_attr->port))
-		return ERR_PTR(-EINVAL);
-
 	if (flow_attr->flags & ~IB_FLOW_ATTR_FLAGS_DONT_TRAP)
 		return ERR_PTR(-EOPNOTSUPP);
 
diff --git a/drivers/infiniband/hw/mlx5/fs.c b/drivers/infiniband/hw/mlx5/fs.c
index f84441ff0c81..18ee2f293825 100644
--- a/drivers/infiniband/hw/mlx5/fs.c
+++ b/drivers/infiniband/hw/mlx5/fs.c
@@ -1194,9 +1194,8 @@ static struct ib_flow *mlx5_ib_create_flow(struct ib_qp *qp,
 		goto free_ucmd;
 	}
 
-	if (flow_attr->port > dev->num_ports ||
-	    (flow_attr->flags &
-	     ~(IB_FLOW_ATTR_FLAGS_DONT_TRAP | IB_FLOW_ATTR_FLAGS_EGRESS))) {
+	if (flow_attr->flags &
+	    ~(IB_FLOW_ATTR_FLAGS_DONT_TRAP | IB_FLOW_ATTR_FLAGS_EGRESS)) {
 		err = -EINVAL;
 		goto free_ucmd;
 	}
-- 
2.31.1



* [PATCH rdma-rc 2/3] RDMA/mlx5: Delete right entry from MR signature database
  2021-06-10  7:34 [PATCH rdma-rc 0/3] Collection of small fixes to mlx5_ib Leon Romanovsky
  2021-06-10  7:34 ` [PATCH rdma-rc 1/3] RDMA: Verify port when creating flow rule Leon Romanovsky
@ 2021-06-10  7:34 ` Leon Romanovsky
  2021-06-10  7:34 ` [PATCH rdma-rc 3/3] IB/mlx5: Fix initializing CQ fragments buffer Leon Romanovsky
  2021-06-10 12:38 ` [PATCH rdma-rc 0/3] Collection of small fixes to mlx5_ib Jason Gunthorpe
  3 siblings, 0 replies; 5+ messages in thread
From: Leon Romanovsky @ 2021-06-10  7:34 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Aharon Landau, Alaa Hleihel, linux-kernel, linux-rdma, Maor Gottlieb

From: Aharon Landau <aharonl@nvidia.com>

The value mr->sig is stored in the entry upon MR allocation. However, ibmr
is passed here as "old", so xa_cmpxchg() finds no match and does not
replace the entry with NULL, which leads to the following trace:

 WARNING: CPU: 28 PID: 2078 at drivers/infiniband/hw/mlx5/main.c:3643 mlx5_ib_stage_init_cleanup+0x4d/0x60 [mlx5_ib]
 Modules linked in: nvme_rdma nvme_fabrics nvme_core 8021q garp mrp bonding bridge stp llc rfkill rpcrdma sunrpc rdma_ucm ib_srpt ib_isert iscsi_tad
 CPU: 28 PID: 2078 Comm: reboot Tainted: G               X --------- ---  5.13.0-0.rc2.19.el9.x86_64 #1
 Hardware name: Dell Inc. PowerEdge R430/03XKDV, BIOS 2.9.1 12/07/2018
 RIP: 0010:mlx5_ib_stage_init_cleanup+0x4d/0x60 [mlx5_ib]
 Code: 8d bb 70 1f 00 00 be 00 01 00 00 e8 9d 94 ce da 48 3d 00 01 00 00 75 02 5b c3 0f 0b 5b c3 0f 0b 48 83 bb b0 20 00 00 00 74 d5 <0f> 0b eb d1 4
 RSP: 0018:ffffa8db06d33c90 EFLAGS: 00010282
 RAX: 0000000000000000 RBX: ffff97f890a44000 RCX: ffff97f900ec0160
 RDX: 0000000000000000 RSI: 0000000080080001 RDI: ffff97f890a44000
 RBP: ffffffffc0c189b8 R08: 0000000000000001 R09: 0000000000000000
 R10: 0000000000000001 R11: 0000000000000300 R12: ffff97f890a44000
 R13: ffffffffc0c36030 R14: 00000000fee1dead R15: 0000000000000000
 FS:  00007f0d5a8a3b40(0000) GS:ffff98077fb80000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000555acbf4f450 CR3: 00000002a6f56002 CR4: 00000000001706e0
 Call Trace:
  mlx5r_remove+0x39/0x60 [mlx5_ib]
  auxiliary_bus_remove+0x1b/0x30
  __device_release_driver+0x17a/0x230
  device_release_driver+0x24/0x30
  bus_remove_device+0xdb/0x140
  device_del+0x18b/0x3e0
  mlx5_detach_device+0x59/0x90 [mlx5_core]
  mlx5_unload_one+0x22/0x60 [mlx5_core]
  shutdown+0x31/0x3a [mlx5_core]
  pci_device_shutdown+0x34/0x60
  device_shutdown+0x15b/0x1c0
  __do_sys_reboot.cold+0x2f/0x5b
  ? vfs_writev+0xc7/0x140
  ? handle_mm_fault+0xc5/0x290
  ? do_writev+0x6b/0x110
  do_syscall_64+0x40/0x80
  entry_SYSCALL_64_after_hwframe+0x44/0xae
 RIP: 0033:0x7f0d5b5132e7
 Code: 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 89 fa be 69 19 12 28 bf ad de e1 fe b8 a9 00 00 00 0f 05 <48> 3d 00 f0 8
 RSP: 002b:00007ffd7c7b8388 EFLAGS: 00000246 ORIG_RAX: 00000000000000a9
 RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f0d5b5132e7
 RDX: 0000000001234567 RSI: 0000000028121969 RDI: 00000000fee1dead
 RBP: 00007ffd7c7b83d0 R08: 000000000000000a R09: 0000000000000000
 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000006
 R13: 00007ffd7c7b8578 R14: 0000564da83690bd R15: 000000000000000

Fixes: e6fb246ccafb ("RDMA/mlx5: Consolidate MR destruction to mlx5_ib_dereg_mr()")
Signed-off-by: Aharon Landau <aharonl@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
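
To illustrate why the wrong "old" argument leaves the entry in place,
here is a minimal single-slot model of the compare-and-exchange
semantics. It is a sketch, not the XArray implementation: slot_cmpxchg(),
struct model_sig and struct model_mr are hypothetical names, and locking
and error handling are left out.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the signature context and the MR. */
struct model_sig {
	int unused;
};

struct model_mr {
	struct model_sig *sig; /* the pointer stored in the database */
};

/*
 * Store new_entry only if the slot currently holds old.
 * The previous content of the slot is returned in both cases.
 */
static void *slot_cmpxchg(void **slot, void *old, void *new_entry)
{
	void *cur = *slot;

	if (cur == old)
		*slot = new_entry;
	return cur;
}
```

If the slot holds mr->sig, slot_cmpxchg(&slot, mr, NULL) leaves it
untouched because the MR pointer never matches, while
slot_cmpxchg(&slot, mr->sig, NULL) clears it.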
 drivers/infiniband/hw/mlx5/mr.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index 9662cd39c7ff..425423dfac72 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -1940,8 +1940,8 @@ int mlx5_ib_dereg_mr(struct ib_mr *ibmr, struct ib_udata *udata)
 		mlx5r_deref_wait_odp_mkey(&mr->mmkey);
 
 	if (ibmr->type == IB_MR_TYPE_INTEGRITY) {
-		xa_cmpxchg(&dev->sig_mrs, mlx5_base_mkey(mr->mmkey.key), ibmr,
-			   NULL, GFP_KERNEL);
+		xa_cmpxchg(&dev->sig_mrs, mlx5_base_mkey(mr->mmkey.key),
+			   mr->sig, NULL, GFP_KERNEL);
 
 		if (mr->mtt_mr) {
 			rc = mlx5_ib_dereg_mr(&mr->mtt_mr->ibmr, NULL);
-- 
2.31.1



* [PATCH rdma-rc 3/3] IB/mlx5: Fix initializing CQ fragments buffer
  2021-06-10  7:34 [PATCH rdma-rc 0/3] Collection of small fixes to mlx5_ib Leon Romanovsky
  2021-06-10  7:34 ` [PATCH rdma-rc 1/3] RDMA: Verify port when creating flow rule Leon Romanovsky
  2021-06-10  7:34 ` [PATCH rdma-rc 2/3] RDMA/mlx5: Delete right entry from MR signature database Leon Romanovsky
@ 2021-06-10  7:34 ` Leon Romanovsky
  2021-06-10 12:38 ` [PATCH rdma-rc 0/3] Collection of small fixes to mlx5_ib Jason Gunthorpe
  3 siblings, 0 replies; 5+ messages in thread
From: Leon Romanovsky @ 2021-06-10  7:34 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Alaa Hleihel, Aharon Landau, linux-kernel, linux-rdma, Maor Gottlieb

From: Alaa Hleihel <alaa@nvidia.com>

The function init_cq_frag_buf() can be called to initialize the current CQ
fragments buffer cq->buf, or the temporary cq->resize_buf that is filled
during a CQ resize operation.

However, the offending commit started to use get_cqe() for getting the
CQEs. The issue with this change is that get_cqe() always returns CQEs
from cq->buf, so we initialize the wrong buffer. When enlarging the CQ,
we access elements beyond the size of the current cq->buf and eventually
hit a kernel panic.

 [exception RIP: init_cq_frag_buf+103]
  [ffff9f799ddcbcd8] mlx5_ib_resize_cq at ffffffffc0835d60 [mlx5_ib]
  [ffff9f799ddcbdb0] ib_resize_cq at ffffffffc05270df [ib_core]
  [ffff9f799ddcbdc0] llt_rdma_setup_qp at ffffffffc0a6a712 [llt]
  [ffff9f799ddcbe10] llt_rdma_cc_event_action at ffffffffc0a6b411 [llt]
  [ffff9f799ddcbe98] llt_rdma_client_conn_thread at ffffffffc0a6bb75 [llt]
  [ffff9f799ddcbec8] kthread at ffffffffa66c5da1
  [ffff9f799ddcbf50] ret_from_fork_nospec_begin at ffffffffa6d95ddd

Fix it by getting the needed CQE with mlx5_frag_buf_get_wqe(), which
takes the correct source buffer as a parameter.

Fixes: 388ca8be0037 ("IB/mlx5: Implement fragmented completion queue (CQ)")
Signed-off-by: Alaa Hleihel <alaa@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
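
A minimal userspace sketch of the fixed shape, for illustration only:
the init helper walks the buffer it is handed and never reaches back
into the CQ. struct model_buf, struct model_cq and model_init_buf() are
hypothetical, each entry is reduced to its ownership byte, and the value
of MODEL_CQE_INVALID is an assumption made for the model.

```c
#include <stdint.h>

#define MODEL_CQE_INVALID 0xf /* assumed value, for the model only */

/* Hypothetical buffer: one ownership byte per entry. */
struct model_buf {
	uint8_t *op_own;
	int nent; /* number of entries in this buffer */
};

/* Hypothetical CQ with a current buffer and an optional resize buffer. */
struct model_cq {
	struct model_buf buf;
	struct model_buf *resize_buf;
};

/*
 * Mark every entry of the given buffer as invalid. The loop bound and
 * the storage both come from buf, so they cannot disagree.
 */
static void model_init_buf(struct model_buf *buf)
{
	int i;

	for (i = 0; i < buf->nent; i++)
		buf->op_own[i] = MODEL_CQE_INVALID << 4;
}
```

Calling model_init_buf(cq.resize_buf) with an 8-entry resize buffer
touches only that buffer. The pre-fix shape took the bound from the
resize buffer but the entries from cq->buf, so it ran past the end of a
smaller current buffer.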
 drivers/infiniband/hw/mlx5/cq.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
index eb92cefffd77..9ce01f729673 100644
--- a/drivers/infiniband/hw/mlx5/cq.c
+++ b/drivers/infiniband/hw/mlx5/cq.c
@@ -849,15 +849,14 @@ static void destroy_cq_user(struct mlx5_ib_cq *cq, struct ib_udata *udata)
 	ib_umem_release(cq->buf.umem);
 }
 
-static void init_cq_frag_buf(struct mlx5_ib_cq *cq,
-			     struct mlx5_ib_cq_buf *buf)
+static void init_cq_frag_buf(struct mlx5_ib_cq_buf *buf)
 {
 	int i;
 	void *cqe;
 	struct mlx5_cqe64 *cqe64;
 
 	for (i = 0; i < buf->nent; i++) {
-		cqe = get_cqe(cq, i);
+		cqe = mlx5_frag_buf_get_wqe(&buf->fbc, i);
 		cqe64 = buf->cqe_size == 64 ? cqe : cqe + 64;
 		cqe64->op_own = MLX5_CQE_INVALID << 4;
 	}
@@ -883,7 +882,7 @@ static int create_cq_kernel(struct mlx5_ib_dev *dev, struct mlx5_ib_cq *cq,
 	if (err)
 		goto err_db;
 
-	init_cq_frag_buf(cq, &cq->buf);
+	init_cq_frag_buf(&cq->buf);
 
 	*inlen = MLX5_ST_SZ_BYTES(create_cq_in) +
 		 MLX5_FLD_SZ_BYTES(create_cq_in, pas[0]) *
@@ -1184,7 +1183,7 @@ static int resize_kernel(struct mlx5_ib_dev *dev, struct mlx5_ib_cq *cq,
 	if (err)
 		goto ex;
 
-	init_cq_frag_buf(cq, cq->resize_buf);
+	init_cq_frag_buf(cq->resize_buf);
 
 	return 0;
 
-- 
2.31.1



* Re: [PATCH rdma-rc 0/3] Collection of small fixes to mlx5_ib
  2021-06-10  7:34 [PATCH rdma-rc 0/3] Collection of small fixes to mlx5_ib Leon Romanovsky
                   ` (2 preceding siblings ...)
  2021-06-10  7:34 ` [PATCH rdma-rc 3/3] IB/mlx5: Fix initializing CQ fragments buffer Leon Romanovsky
@ 2021-06-10 12:38 ` Jason Gunthorpe
  3 siblings, 0 replies; 5+ messages in thread
From: Jason Gunthorpe @ 2021-06-10 12:38 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, Leon Romanovsky, Aharon Landau, Alaa Hleihel,
	linux-kernel, linux-rdma, Maor Gottlieb

On Thu, Jun 10, 2021 at 10:34:24AM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
> 
> Hi,
> 
> This is a batch of unrelated fixes to mlx5_ib.
> 
> Thanks
> 
> Aharon Landau (1):
>   RDMA/mlx5: Delete right entry from MR signature database
> 
> Alaa Hleihel (1):
>   IB/mlx5: Fix initializing CQ fragments buffer
> 
> Maor Gottlieb (1):
>   RDMA: Verify port when creating flow rule

Applied to for-rc, thanks

Jason


