* [PATCH rdma-rc 0/5] Fixes to v5.10
@ 2020-12-13 13:29 Leon Romanovsky
  2020-12-13 13:29 ` [PATCH rdma-rc 1/5] RDMA/mlx5: Fix MR cache memory leak Leon Romanovsky
                   ` (5 more replies)
  0 siblings, 6 replies; 12+ messages in thread
From: Leon Romanovsky @ 2020-12-13 13:29 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, Daniel Jurgens, linux-kernel, linux-rdma,
	Maor Gottlieb, Parav Pandit, Shay Drory

From: Leon Romanovsky <leonro@nvidia.com>

Hi,

This is another series with various fixes that can easily go to -next too.

Thanks

Leon Romanovsky (1):
  RDMA/cma: Don't overwrite sgid_attr after device is released

Maor Gottlieb (2):
  RDMA/mlx5: Fix MR cache memory leak
  RDMA/ucma: Fix memory leak of connection request

Shay Drory (2):
  IB/umad: Return EIO in case of when device disassociated
  IB/umad: Return EPOLLERR in case of when device disassociated

 drivers/infiniband/core/cma.c      | 7 ++++---
 drivers/infiniband/core/ucma.c     | 4 +++-
 drivers/infiniband/core/user_mad.c | 6 +++++-
 drivers/infiniband/hw/mlx5/mr.c    | 1 +
 4 files changed, 13 insertions(+), 5 deletions(-)

--
2.29.2



* [PATCH rdma-rc 1/5] RDMA/mlx5: Fix MR cache memory leak
  2020-12-13 13:29 [PATCH rdma-rc 0/5] Fixes to v5.10 Leon Romanovsky
@ 2020-12-13 13:29 ` Leon Romanovsky
  2020-12-13 13:29 ` [PATCH rdma-rc 2/5] IB/umad: Return EIO in case of when device disassociated Leon Romanovsky
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Leon Romanovsky @ 2020-12-13 13:29 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe; +Cc: Maor Gottlieb, linux-rdma

From: Maor Gottlieb <maorg@nvidia.com>

If the MR cache entry invalidation fails, the entry is detached from the
cache, so its memory must be freed as well.

  comm "python3", pid 15325, jiffies 4298483783 (age 1787.596s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 40 62 12 f4 82 88 ff ff  ........@b......
    00 22 08 00 00 22 08 00 00 00 00 00 00 00 00 00  ."..."..........

  backtrace:

    [<00000000d8e423b0>] alloc_cache_mr+0x23/0xc0 [mlx5_ib]
    [<000000001f21304c>] create_cache_mr+0x3f/0xf0 [mlx5_ib]
    [<000000009d6b45dc>] mlx5_ib_alloc_implicit_mr+0x41/0x210 [mlx5_ib]
    [<00000000879d0d68>] mlx5_ib_reg_user_mr+0x9e/0x6e0 [mlx5_ib]
    [<00000000be74bf89>] create_qp+0x2fc/0xf00 [ib_uverbs]
    [<000000001a532d22>] ib_uverbs_handler_UVERBS_METHOD_COUNTERS_READ+0x1d9/0x230 [ib_uverbs]
    [<0000000070f46001>] rdma_alloc_commit_uobject+0xb5/0x120 [ib_uverbs]
    [<000000006d8a0b38>] uverbs_alloc+0x2b/0xf0 [ib_uverbs]
    [<00000000075217c9>] ksysioctl+0x234/0x7d0
    [<00000000eb5c120b>] __x64_sys_ioctl+0x16/0x20
    [<00000000db135b48>] do_syscall_64+0x59/0x2e0

Fixes: 1769c4c57548 ("RDMA/mlx5: Always remove MRs from the cache before destroying them")
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/hw/mlx5/mr.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c
index 80e3047110a8..479543ebf697 100644
--- a/drivers/infiniband/hw/mlx5/mr.c
+++ b/drivers/infiniband/hw/mlx5/mr.c
@@ -644,6 +644,7 @@ void mlx5_mr_cache_free(struct mlx5_ib_dev *dev, struct mlx5_ib_mr *mr)
 	if (mlx5_mr_cache_invalidate(mr)) {
 		detach_mr_from_cache(mr);
 		destroy_mkey(dev, mr);
+		kfree(mr);
 		return;
 	}

--
2.29.2
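
The bug class fixed by this patch can be modeled in userspace. The sketch
below is only an illustration: struct entry, struct cache, and the cache_*()
helpers are hypothetical stand-ins, not the mlx5 types or functions. It shows
that once an entry is detached from its cache on the error path, no other
owner remains, so that path has to free the memory itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for an MR and its cache; not the mlx5 types. */
struct entry {
	struct entry *next;
	int key;
};

struct cache {
	struct entry *head;
	int live_allocs;	/* entries allocated and not yet freed */
};

/* Allocate an entry and link it at the head of the cache list. */
static struct entry *cache_alloc(struct cache *c, int key)
{
	struct entry *e = calloc(1, sizeof(*e));

	if (!e)
		return NULL;
	e->key = key;
	e->next = c->head;
	c->head = e;
	c->live_allocs++;
	return e;
}

/* Unlink the entry; after this the cache no longer owns it. */
static void cache_detach(struct cache *c, struct entry *e)
{
	struct entry **pp;

	for (pp = &c->head; *pp; pp = &(*pp)->next) {
		if (*pp == e) {
			*pp = e->next;
			e->next = NULL;
			return;
		}
	}
}

/*
 * Hand an entry back to the cache. If invalidation failed, the entry is
 * detached and must be freed here, because nothing else refers to it.
 */
static void cache_release(struct cache *c, struct entry *e,
			  bool invalidate_failed)
{
	if (invalidate_failed) {
		assert(c->live_allocs > 0);
		cache_detach(c, e);
		free(e);	/* the step whose absence causes the leak */
		c->live_allocs--;
		return;
	}
	/* Success path: the entry stays linked and can be reused. */
}
```

Dropping the free() call in the failure branch leaves live_allocs one higher
than the number of reachable entries, which is the userspace analogue of the
kmemleak report above.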



* [PATCH rdma-rc 2/5] IB/umad: Return EIO in case of when device disassociated
  2020-12-13 13:29 [PATCH rdma-rc 0/5] Fixes to v5.10 Leon Romanovsky
  2020-12-13 13:29 ` [PATCH rdma-rc 1/5] RDMA/mlx5: Fix MR cache memory leak Leon Romanovsky
@ 2020-12-13 13:29 ` Leon Romanovsky
  2020-12-13 13:29 ` [PATCH rdma-rc 3/5] IB/umad: Return EPOLLERR " Leon Romanovsky
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Leon Romanovsky @ 2020-12-13 13:29 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe; +Cc: Shay Drory, linux-rdma

From: Shay Drory <shayd@nvidia.com>

The user gets an EINVAL error for a MAD message in all failure flows,
including when the device is disassociated. That makes it impossible
for applications to treat the disassociation flow differently.

Return EIO in that case, so that applications can perform
disassociation recovery.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/core/user_mad.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/user_mad.c b/drivers/infiniband/core/user_mad.c
index 19104a675691..b671d4aede77 100644
--- a/drivers/infiniband/core/user_mad.c
+++ b/drivers/infiniband/core/user_mad.c
@@ -524,7 +524,7 @@ static ssize_t ib_umad_write(struct file *filp, const char __user *buf,

 	agent = __get_agent(file, packet->mad.hdr.id);
 	if (!agent) {
-		ret = -EINVAL;
+		ret = -EIO;
 		goto err_up;
 	}

--
2.29.2
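
To illustrate why a distinct errno matters to applications, here is a hedged
userspace sketch. classify_umad_write_error() and enum umad_action are
hypothetical names invented for this example and are not part of libibumad;
only the errno constants come from the patch. The sketch assumes that EIO
signals disassociation, as the commit message intends.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical recovery actions; these names are not from libibumad. */
enum umad_action {
	UMAD_OK,		/* write succeeded */
	UMAD_FIX_REQUEST,	/* caller passed bad arguments */
	UMAD_REOPEN_DEVICE,	/* device went away; run recovery */
	UMAD_FAIL,		/* any other error */
};

/*
 * Map the result of a write() on a umad file descriptor to an action.
 * 'ret' is the return value of write() and 'err' is errno captured
 * immediately after the call.
 */
static enum umad_action classify_umad_write_error(long ret, int err)
{
	/* A failed call must come with a positive errno value. */
	assert(ret >= 0 || err > 0);

	if (ret >= 0)
		return UMAD_OK;

	switch (err) {
	case EINVAL:
		return UMAD_FIX_REQUEST;
	case EIO:
		return UMAD_REOPEN_DEVICE;
	default:
		return UMAD_FAIL;
	}
}
```

With EINVAL returned in every flow, the EIO branch could never be taken and
the application would have no way to choose recovery over a plain failure.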



* [PATCH rdma-rc 3/5] IB/umad: Return EPOLLERR in case of when device disassociated
  2020-12-13 13:29 [PATCH rdma-rc 0/5] Fixes to v5.10 Leon Romanovsky
  2020-12-13 13:29 ` [PATCH rdma-rc 1/5] RDMA/mlx5: Fix MR cache memory leak Leon Romanovsky
  2020-12-13 13:29 ` [PATCH rdma-rc 2/5] IB/umad: Return EIO in case of when device disassociated Leon Romanovsky
@ 2020-12-13 13:29 ` Leon Romanovsky
  2021-01-12 19:25   ` Jason Gunthorpe
  2020-12-13 13:29 ` [PATCH rdma-rc 4/5] RDMA/cma: Don't overwrite sgid_attr after device is released Leon Romanovsky
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 12+ messages in thread
From: Leon Romanovsky @ 2020-12-13 13:29 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe; +Cc: Shay Drory, linux-rdma

From: Shay Drory <shayd@nvidia.com>

Currently, polling a umad device will always works, even if the device
was disassociated. Hence, returning EPOLLERR if device was
disassociated.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/core/user_mad.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/infiniband/core/user_mad.c b/drivers/infiniband/core/user_mad.c
index b671d4aede77..6681e9cf8a18 100644
--- a/drivers/infiniband/core/user_mad.c
+++ b/drivers/infiniband/core/user_mad.c
@@ -653,10 +653,14 @@ static __poll_t ib_umad_poll(struct file *filp, struct poll_table_struct *wait)
 	/* we will always be able to post a MAD send */
 	__poll_t mask = EPOLLOUT | EPOLLWRNORM;

+	mutex_lock(&file->mutex);
 	poll_wait(filp, &file->recv_wait, wait);

 	if (!list_empty(&file->recv_list))
 		mask |= EPOLLIN | EPOLLRDNORM;
+	if (file->agents_dead)
+		mask = EPOLLERR;
+	mutex_unlock(&file->mutex);

 	return mask;
 }
--
2.29.2
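
From the application side, the behavior this patch aims for could be consumed
roughly as follows. This is a sketch only: umad_poll_action(),
umad_check_fd(), and enum poll_action are hypothetical names, and the sketch
assumes the file descriptor reports POLLERR once the device is disassociated.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>

/* Hypothetical actions an application might take after poll(). */
enum poll_action {
	POLL_ACT_NONE,		/* nothing to read yet */
	POLL_ACT_READ,		/* a MAD is waiting to be read */
	POLL_ACT_DEVICE_GONE,	/* stop polling and start recovery */
};

/* Decide what to do with the revents reported by poll() for one fd. */
static enum poll_action umad_poll_action(short revents)
{
	/* Check the error bit first: it overrides the readiness bits. */
	if (revents & POLLERR)
		return POLL_ACT_DEVICE_GONE;
	if (revents & POLLIN)
		return POLL_ACT_READ;
	return POLL_ACT_NONE;
}

/* Convenience wrapper for an entry of the array passed to poll(). */
static enum poll_action umad_check_fd(const struct pollfd *pfd)
{
	assert(pfd != NULL);
	return umad_poll_action(pfd->revents);
}
```

POLLERR is always reported by poll() regardless of the requested events, so
the caller does not need to add it to pollfd.events.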



* [PATCH rdma-rc 4/5] RDMA/cma: Don't overwrite sgid_attr after device is released
  2020-12-13 13:29 [PATCH rdma-rc 0/5] Fixes to v5.10 Leon Romanovsky
                   ` (2 preceding siblings ...)
  2020-12-13 13:29 ` [PATCH rdma-rc 3/5] IB/umad: Return EPOLLERR " Leon Romanovsky
@ 2020-12-13 13:29 ` Leon Romanovsky
  2020-12-14 19:26   ` Jason Gunthorpe
  2020-12-13 13:29 ` [PATCH rdma-rc 5/5] RDMA/ucma: Fix memory leak of connection request Leon Romanovsky
  2020-12-14 19:27 ` [PATCH rdma-rc 0/5] Fixes to v5.10 Jason Gunthorpe
  5 siblings, 1 reply; 12+ messages in thread
From: Leon Romanovsky @ 2020-12-13 13:29 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, Daniel Jurgens, linux-rdma, Parav Pandit

From: Leon Romanovsky <leonro@nvidia.com>

As part of the cma_dev release, the cma_dev pointer is set to NULL.
If that happens in rdma_bind_addr() (as part of an error flow), the
next call to addr_handler() calls cma_acquire_dev_by_src_ip(), which
overwrites sgid_attr without releasing it.

WARNING: CPU: 2 PID: 108 at drivers/infiniband/core/cma.c:606 cma_bind_sgid_attr drivers/infiniband/core/cma.c:606 [inline]
WARNING: CPU: 2 PID: 108 at drivers/infiniband/core/cma.c:606 cma_acquire_dev_by_src_ip+0x470/0x4b0 drivers/infiniband/core/cma.c:649
CPU: 2 PID: 108 Comm: kworker/u8:1 Not tainted 5.10.0-rc6+ #257
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Workqueue: ib_addr process_one_req
RIP: 0010:cma_bind_sgid_attr drivers/infiniband/core/cma.c:606 [inline]
RIP: 0010:cma_acquire_dev_by_src_ip+0x470/0x4b0 drivers/infiniband/core/cma.c:649
Code: 66 d9 4a ff 4d 8b 6e 10 49 8d bd 1c 08 00 00 e8 b6 d6 4a ff 45 0f b6 bd 1c 08 00 00 41 83 e7 01 e9 49 fd ff ff e8 90 c5 29 ff <0f> 0b e9 80 fe ff ff e8 84 c5 29 ff 4c 89 f7 e8 2c d9 4a ff 4d 8b

RSP: 0018:ffff8881047c7b40 EFLAGS: 00010293
RAX: ffff888104789c80 RBX: 0000000000000001 RCX: ffffffff820b8ef8
RDX: 0000000000000000 RSI: ffffffff820b9080 RDI: ffff88810cd4c998
RBP: ffff8881047c7c08 R08: ffff888104789c80 R09: ffffed10209f4036
R10: ffff888104fa01ab R11: ffffed10209f4035 R12: ffff88810cd4c800
R13: ffff888105750e28 R14: ffff888108f0a100 R15: ffff88810cd4c998
FS:  0000000000000000(0000) GS:ffff888119c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000104e60005 CR4: 0000000000370ea0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

Call Trace:
 addr_handler+0x266/0&times;350 drivers/infiniband/core/cma.c:3190
 process_one_req+0xa3/0&times;300 drivers/infiniband/core/addr.c:645
 process_one_work+0x54c/0&times;930 kernel/workqueue.c:2272
 worker_thread+0x82/0&times;830 kernel/workqueue.c:2418
 kthread+0x1ca/0&times;220 kernel/kthread.c:292
 ret_from_fork+0x1f/0&times;30 arch/x86/entry/entry_64.S:296

Fixes: ff11c6cd521f ("RDMA/cma: Introduce and use cma_acquire_dev_by_src_ip()")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/core/cma.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index bfdc2eaee351..e17ba841e204 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -491,6 +491,10 @@ static void cma_release_dev(struct rdma_id_private *id_priv)
 	list_del(&id_priv->list);
 	cma_dev_put(id_priv->cma_dev);
 	id_priv->cma_dev = NULL;
+	if (id_priv->id.route.addr.dev_addr.sgid_attr) {
+		rdma_put_gid_attr(id_priv->id.route.addr.dev_addr.sgid_attr);
+		id_priv->id.route.addr.dev_addr.sgid_attr = NULL;
+	}
 	mutex_unlock(&lock);
 }

@@ -1877,9 +1881,6 @@ static void _destroy_id(struct rdma_id_private *id_priv,

 	kfree(id_priv->id.route.path_rec);

-	if (id_priv->id.route.addr.dev_addr.sgid_attr)
-		rdma_put_gid_attr(id_priv->id.route.addr.dev_addr.sgid_attr);
-
 	put_net(id_priv->id.route.addr.dev_addr.net);
 	rdma_restrack_del(&id_priv->res);
 	kfree(id_priv);
--
2.29.2
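
The ownership rule behind this change can be modeled in userspace. The sketch
below is an illustration under stated assumptions: struct gid_attr, gid_get(),
gid_put(), struct id_model, id_acquire(), and id_release() are hypothetical
stand-ins for the kernel's reference-counted GID attribute and the id private
data, not the real API. It shows why the held reference is dropped and the
pointer cleared at release time, so that a later acquire does not find (and
overwrite) a reference that is still held.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference-counted attribute. */
struct gid_attr {
	int refcount;
};

static struct gid_attr *gid_get(struct gid_attr *a)
{
	a->refcount++;
	return a;
}

static void gid_put(struct gid_attr *a)
{
	assert(a->refcount > 0);
	a->refcount--;
}

/* Hypothetical model of an id that may hold one attribute reference. */
struct id_model {
	int has_dev;
	struct gid_attr *sgid_attr;
};

/* Bind to a device: take a reference unless one is already held. */
static int id_acquire(struct id_model *id, struct gid_attr *attr)
{
	if (id->sgid_attr)
		return -1;	/* would overwrite a held reference */
	id->sgid_attr = gid_get(attr);
	id->has_dev = 1;
	return 0;
}

/* Release the device together with the attribute reference. */
static void id_release(struct id_model *id)
{
	id->has_dev = 0;
	if (id->sgid_attr) {
		gid_put(id->sgid_attr);
		id->sgid_attr = NULL;
	}
}
```

If id_release() cleared only has_dev, the next id_acquire() would hit the
"already held" branch, which corresponds to the warning in the trace above.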



* [PATCH rdma-rc 5/5] RDMA/ucma: Fix memory leak of connection request
  2020-12-13 13:29 [PATCH rdma-rc 0/5] Fixes to v5.10 Leon Romanovsky
                   ` (3 preceding siblings ...)
  2020-12-13 13:29 ` [PATCH rdma-rc 4/5] RDMA/cma: Don't overwrite sgid_attr after device is released Leon Romanovsky
@ 2020-12-13 13:29 ` Leon Romanovsky
  2020-12-14 15:34   ` Jason Gunthorpe
  2020-12-14 19:27 ` [PATCH rdma-rc 0/5] Fixes to v5.10 Jason Gunthorpe
  5 siblings, 1 reply; 12+ messages in thread
From: Leon Romanovsky @ 2020-12-13 13:29 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe; +Cc: Maor Gottlieb, linux-rdma

From: Maor Gottlieb <maorg@nvidia.com>

Add the missing call to xa_erase() when destroying a connection request.
This fixes the memory leak below.

unreferenced object 0xffff88812a340490 (size 576):
  comm "kworker/5:0", pid 96291, jiffies 4296565270 (age 1835.596s)
  hex dump (first 32 bytes):
    00 20 03 00 00 00 00 00 00 00 00 00 00 00 00 00  . ..............
    a0 d3 1a a0 ff ff ff ff a8 04 34 2a 81 88 ff ff  ..........4*....
  backtrace:
    [<0000000059399d4c>] xas_alloc+0x94/0xb0
    [<00000000d855673c>] xas_create+0x1f4/0x4c0
    [<00000000336166d1>] xas_store+0x52/0x5e0
    [<000000006b811da0>] __xa_alloc+0xab/0x140
    [<00000000cf0e9936>] ucma_alloc_ctx+0x197/0x1f0 [rdma_ucm]
    [<000000008f99b6bb>] ucma_event_handler+0x17b/0x2e0 [rdma_ucm]
    [<000000000a07fc34>] cma_cm_event_handler+0x6f/0x390 [rdma_cm]
    [<00000000fe05d574>] cma_ib_req_handler+0x1163/0x2370 [rdma_cm]
    [<000000004516baf4>] cm_work_handler+0xeda/0x2340 [ib_cm]
    [<000000008a83945b>] process_one_work+0x27c/0x610
    [<00000000b71b71e2>] worker_thread+0x2d/0x3c0
    [<00000000caab54ff>] kthread+0x125/0x140
    [<000000004303d699>] ret_from_fork+0x1f/0x30

Fixes: a1d33b70dbbc ("RDMA/ucma: Rework how new connections are passed through event delivery")
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/infiniband/core/ucma.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
index 7dab9a27a145..b0b9ea90a27d 100644
--- a/drivers/infiniband/core/ucma.c
+++ b/drivers/infiniband/core/ucma.c
@@ -549,8 +549,10 @@ static int ucma_free_ctx(struct ucma_context *ctx)
 	list_for_each_entry_safe(uevent, tmp, &list, list) {
 		list_del(&uevent->list);
 		if (uevent->resp.event == RDMA_CM_EVENT_CONNECT_REQUEST &&
-		    uevent->conn_req_ctx != ctx)
+		    uevent->conn_req_ctx != ctx) {
+			xa_erase(&ctx_table, uevent->conn_req_ctx->id);
 			__destroy_id(uevent->conn_req_ctx);
+		}
 		kfree(uevent);
 	}

--
2.29.2
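
The bookkeeping rule this patch relies on can be sketched in userspace. The
model below is hypothetical throughout: struct id_table, struct ctx,
ctx_alloc(), ctx_destroy(), and table_used() are invented for illustration
and a fixed array stands in for the xarray. It is single-threaded and makes no
attempt to handle concurrent access to the table. It shows that an object
published in an ID table has to be erased from the table when it is destroyed,
otherwise the slot stays occupied after the object is gone.

```c
#include <assert.h>
#include <stdlib.h>

#define TABLE_SLOTS 8

/* Hypothetical context that remembers the ID it was published under. */
struct ctx {
	int id;
};

/* Fixed-size array standing in for an ID-indexed table. */
struct id_table {
	struct ctx *slot[TABLE_SLOTS];
};

/* Allocate a context and publish it in the first free slot. */
static struct ctx *ctx_alloc(struct id_table *t)
{
	for (int i = 0; i < TABLE_SLOTS; i++) {
		if (!t->slot[i]) {
			struct ctx *c = malloc(sizeof(*c));

			if (!c)
				return NULL;
			c->id = i;
			t->slot[i] = c;
			return c;
		}
	}
	return NULL;	/* table is full */
}

/* Erase the table entry first, then free the object itself. */
static void ctx_destroy(struct id_table *t, struct ctx *c)
{
	assert(t->slot[c->id] == c);
	t->slot[c->id] = NULL;	/* the erase step */
	free(c);
}

/* Count occupied slots. */
static int table_used(const struct id_table *t)
{
	int used = 0;

	for (int i = 0; i < TABLE_SLOTS; i++)
		if (t->slot[i])
			used++;
	return used;
}
```

Skipping the erase step in ctx_destroy() would leave a stale pointer in the
table and keep the slot from being reused, which is the array-model analogue
of the leaked table memory reported above.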



* Re: [PATCH rdma-rc 5/5] RDMA/ucma: Fix memory leak of connection request
  2020-12-13 13:29 ` [PATCH rdma-rc 5/5] RDMA/ucma: Fix memory leak of connection request Leon Romanovsky
@ 2020-12-14 15:34   ` Jason Gunthorpe
  0 siblings, 0 replies; 12+ messages in thread
From: Jason Gunthorpe @ 2020-12-14 15:34 UTC (permalink / raw)
  To: Leon Romanovsky; +Cc: Doug Ledford, Maor Gottlieb, linux-rdma

On Sun, Dec 13, 2020 at 03:29:40PM +0200, Leon Romanovsky wrote:
> From: Maor Gottlieb <maorg@nvidia.com>
> 
> Add missing call to xa_erase when destroy connection request.
> It fixes the below memory leak.
> 
> unreferenced object 0xffff88812a340490 (size 576):
>   comm "kworker/5:0", pid 96291, jiffies 4296565270 (age 1835.596s)
>   hex dump (first 32 bytes):
>     00 20 03 00 00 00 00 00 00 00 00 00 00 00 00 00  . ..............
>     a0 d3 1a a0 ff ff ff ff a8 04 34 2a 81 88 ff ff  ..........4*....
>   backtrace:
>     [<0000000059399d4c>] xas_alloc+0x94/0xb0
>     [<00000000d855673c>] xas_create+0x1f4/0x4c0
>     [<00000000336166d1>] xas_store+0x52/0x5e0
>     [<000000006b811da0>] __xa_alloc+0xab/0x140
>     [<00000000cf0e9936>] ucma_alloc_ctx+0x197/0x1f0 [rdma_ucm]
>     [<000000008f99b6bb>] ucma_event_handler+0x17b/0x2e0 [rdma_ucm]
>     [<000000000a07fc34>] cma_cm_event_handler+0x6f/0x390 [rdma_cm]
>     [<00000000fe05d574>] cma_ib_req_handler+0x1163/0x2370 [rdma_cm]
>     [<000000004516baf4>] cm_work_handler+0xeda/0x2340 [ib_cm]
>     [<000000008a83945b>] process_one_work+0x27c/0x610
>     [<00000000b71b71e2>] worker_thread+0x2d/0x3c0
>     [<00000000caab54ff>] kthread+0x125/0x140
>     [<000000004303d699>] ret_from_fork+0x1f/0x30
> 
> Fixes: a1d33b70dbbc ("RDMA/ucma: Rework how new connections are passed through event delivery")
> Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
>  drivers/infiniband/core/ucma.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
> index 7dab9a27a145..b0b9ea90a27d 100644
> +++ b/drivers/infiniband/core/ucma.c
> @@ -549,8 +549,10 @@ static int ucma_free_ctx(struct ucma_context *ctx)
>  	list_for_each_entry_safe(uevent, tmp, &list, list) {
>  		list_del(&uevent->list);
>  		if (uevent->resp.event == RDMA_CM_EVENT_CONNECT_REQUEST &&
> -		    uevent->conn_req_ctx != ctx)
> +		    uevent->conn_req_ctx != ctx) {
> +			xa_erase(&ctx_table, uevent->conn_req_ctx->id);
>  			__destroy_id(uevent->conn_req_ctx);

Oooh, yes this is wrong, but this fix isn't right.

At this point ucma_finish_ctx() has been called so this must be
careful to avoid racing with parallel access from the FD side.

Actually all this destroy stuff still looks subtly wrong, sigh. Let me
try again.

Jason


* Re: [PATCH rdma-rc 4/5] RDMA/cma: Don't overwrite sgid_attr after device is released
  2020-12-13 13:29 ` [PATCH rdma-rc 4/5] RDMA/cma: Don't overwrite sgid_attr after device is released Leon Romanovsky
@ 2020-12-14 19:26   ` Jason Gunthorpe
  2020-12-15  5:23     ` Leon Romanovsky
  0 siblings, 1 reply; 12+ messages in thread
From: Jason Gunthorpe @ 2020-12-14 19:26 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, Leon Romanovsky, Daniel Jurgens, linux-rdma, Parav Pandit

On Sun, Dec 13, 2020 at 03:29:39PM +0200, Leon Romanovsky wrote:
 
> Call Trace:
>  addr_handler+0x266/0&times;350 drivers/infiniband/core/cma.c:3190
>  process_one_req+0xa3/0&times;300 drivers/infiniband/core/addr.c:645
>  process_one_work+0x54c/0&times;930 kernel/workqueue.c:2272
>  worker_thread+0x82/0&times;830 kernel/workqueue.c:2418
>  kthread+0x1ca/0&times;220 kernel/kthread.c:292
>  ret_from_fork+0x1f/0&times;30 arch/x86/entry/entry_64.S:296

Why has this been weirdly HTML escaped??? I fixed it..

Jason


* Re: [PATCH rdma-rc 0/5] Fixes to v5.10
  2020-12-13 13:29 [PATCH rdma-rc 0/5] Fixes to v5.10 Leon Romanovsky
                   ` (4 preceding siblings ...)
  2020-12-13 13:29 ` [PATCH rdma-rc 5/5] RDMA/ucma: Fix memory leak of connection request Leon Romanovsky
@ 2020-12-14 19:27 ` Jason Gunthorpe
  2020-12-15  5:25   ` Leon Romanovsky
  5 siblings, 1 reply; 12+ messages in thread
From: Jason Gunthorpe @ 2020-12-14 19:27 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, Leon Romanovsky, Daniel Jurgens, linux-kernel,
	linux-rdma, Maor Gottlieb, Parav Pandit, Shay Drory

On Sun, Dec 13, 2020 at 03:29:35PM +0200, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@nvidia.com>
> 
> Hi,
> 
> This is another series with various fixes that can easily go to -next too.
> 
> Thanks
> 
> Leon Romanovsky (1):
>   RDMA/cma: Don't overwrite sgid_attr after device is released
> 
> Maor Gottlieb (2):
>   RDMA/mlx5: Fix MR cache memory leak

Applied these two to for-next, thanks

>   RDMA/ucma: Fix memory leak of connection request
>   IB/umad: Return EIO in case of when device disassociated
>   IB/umad: Return EPOLLERR in case of when device disassociated

These ones can wait till next cycle

Jason


* Re: [PATCH rdma-rc 4/5] RDMA/cma: Don't overwrite sgid_attr after device is released
  2020-12-14 19:26   ` Jason Gunthorpe
@ 2020-12-15  5:23     ` Leon Romanovsky
  0 siblings, 0 replies; 12+ messages in thread
From: Leon Romanovsky @ 2020-12-15  5:23 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Doug Ledford, Daniel Jurgens, linux-rdma, Parav Pandit

On Mon, Dec 14, 2020 at 03:26:36PM -0400, Jason Gunthorpe wrote:
> On Sun, Dec 13, 2020 at 03:29:39PM +0200, Leon Romanovsky wrote:
>
> > Call Trace:
> >  addr_handler+0x266/0&times;350 drivers/infiniband/core/cma.c:3190
> >  process_one_req+0xa3/0&times;300 drivers/infiniband/core/addr.c:645
> >  process_one_work+0x54c/0&times;930 kernel/workqueue.c:2272
> >  worker_thread+0x82/0&times;830 kernel/workqueue.c:2418
> >  kthread+0x1ca/0&times;220 kernel/kthread.c:292
> >  ret_from_fork+0x1f/0&times;30 arch/x86/entry/entry_64.S:296
>
> Why has this been weirdly HTML escaped??? I fixed it..

Ahh sorry, I fixed the lines in beginning of the dump, but missed these lines.

Thanks

>
> Jason


* Re: [PATCH rdma-rc 0/5] Fixes to v5.10
  2020-12-14 19:27 ` [PATCH rdma-rc 0/5] Fixes to v5.10 Jason Gunthorpe
@ 2020-12-15  5:25   ` Leon Romanovsky
  0 siblings, 0 replies; 12+ messages in thread
From: Leon Romanovsky @ 2020-12-15  5:25 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Doug Ledford, Daniel Jurgens, linux-kernel, linux-rdma,
	Maor Gottlieb, Parav Pandit, Shay Drory

On Mon, Dec 14, 2020 at 03:27:22PM -0400, Jason Gunthorpe wrote:
> On Sun, Dec 13, 2020 at 03:29:35PM +0200, Leon Romanovsky wrote:
> > From: Leon Romanovsky <leonro@nvidia.com>
> >
> > Hi,
> >
> > This is another series with various fixes that can easily go to -next too.
> >
> > Thanks
> >
> > Leon Romanovsky (1):
> >   RDMA/cma: Don't overwrite sgid_attr after device is released
> >
> > Maor Gottlieb (2):
> >   RDMA/mlx5: Fix MR cache memory leak
>
> Applied these two to for-next, thanks
>
> >   RDMA/ucma: Fix memory leak of connection request
> >   IB/umad: Return EIO in case of when device disassociated
> >   IB/umad: Return EPOLLERR in case of when device disassociated
>
> These ones can wait till next cycle

Thanks

>
> Jason


* Re: [PATCH rdma-rc 3/5] IB/umad: Return EPOLLERR in case of when device disassociated
  2020-12-13 13:29 ` [PATCH rdma-rc 3/5] IB/umad: Return EPOLLERR " Leon Romanovsky
@ 2021-01-12 19:25   ` Jason Gunthorpe
  0 siblings, 0 replies; 12+ messages in thread
From: Jason Gunthorpe @ 2021-01-12 19:25 UTC (permalink / raw)
  To: Leon Romanovsky; +Cc: Doug Ledford, Shay Drory, linux-rdma

On Sun, Dec 13, 2020 at 03:29:38PM +0200, Leon Romanovsky wrote:
> From: Shay Drory <shayd@nvidia.com>
> 
> Currently, polling a umad device will always works, even if the device
> was disassociated. Hence, returning EPOLLERR if device was
> disassociated.

Grammar:

    Currently, polling a umad device will always work, even if the device was
    disassociated. A disassociated device should immediately return EPOLLERR
    from poll() and EIO from any read()/write(). Otherwise userspace is
    endlessly hung on poll() with no idea that the device has been removed
    from the system.
    

> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> Signed-off-by: Shay Drory <shayd@nvidia.com>
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
>  drivers/infiniband/core/user_mad.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/infiniband/core/user_mad.c b/drivers/infiniband/core/user_mad.c
> index b671d4aede77..6681e9cf8a18 100644
> +++ b/drivers/infiniband/core/user_mad.c
> @@ -653,10 +653,14 @@ static __poll_t ib_umad_poll(struct file *filp, struct poll_table_struct *wait)
>  	/* we will always be able to post a MAD send */
>  	__poll_t mask = EPOLLOUT | EPOLLWRNORM;
> 
> +	mutex_lock(&file->mutex);
>  	poll_wait(filp, &file->recv_wait, wait);
> 
>  	if (!list_empty(&file->recv_list))
>  		mask |= EPOLLIN | EPOLLRDNORM;
> +	if (file->agents_dead)
> +		mask = EPOLLERR;

This also needs to trigger the recv_wait when agents_dead is set in
ib_umad_kill_port()

Jason

