From: Jason Gunthorpe <jgg@ziepe.ca>
To: Leon Romanovsky <leon@kernel.org>
Cc: Doug Ledford <dledford@redhat.com>,
	Yishai Hadas <yishaih@mellanox.com>,
	RDMA mailing list <linux-rdma@vger.kernel.org>,
	Artemy Kovalyov <artemyko@mellanox.com>,
	Leon Romanovsky <leonro@mellanox.com>
Subject: Re: [PATCH rdma-rc] IB/mlx5: Fix implicit MR release flow
Date: Wed, 7 Aug 2019 15:37:36 -0300
Message-ID: <20190807183736.GA26235@ziepe.ca>
In-Reply-To: <20190805083010.21777-1-leon@kernel.org>

On Mon, Aug 05, 2019 at 11:30:10AM +0300, Leon Romanovsky wrote:
> From: Yishai Hadas <yishaih@mellanox.com>
> 
> Once an implicit MR is released through ib_umem_notifier_release(),
> its leaves are marked as "dying".
> 
> However, when dereg_mr()->mlx5_ib_free_implicit_mr()->mr_leaf_free() is
> called, it skips running the mr_leaf_free_action (i.e. umem_odp->work)
> for leaves that are marked as "dying".
> 
> As a result, ib_umem_release() is never called for those leaves and
> their MRs are leaked as well.
> 
> An application that exits or is killed without calling dereg_mr can
> hit the above flow.
> 
> This fatal scenario is reported by a WARN_ON() in
> mlx5_ib_dealloc_ucontext() because ibcontext->per_mm_list is not
> empty; the call trace can be seen below.
> 
> Originally the "dying" mark in ib_umem_notifier_release() was
> introduced to prevent pagefault_mr() from returning a success response
> once the release had happened. However, today we already have the
> completion mechanism, so the mark is no longer needed in those flows.
> Even if a success response is returned, the firmware will not find the
> pages and the following call will return an error, because a released
> mm causes ib_umem_odp_map_dma_pages() to permanently fail
> mmget_not_zero().
> 
> Fix the above issue by dropping the "dying" mark from the above flows.
> The other flows that use "dying" still need it for their
> synchronization purposes.
> 
> WARNING: CPU: 1 PID: 7218 at
> drivers/infiniband/hw/mlx5/main.c:2004
>                mlx5_ib_dealloc_ucontext+0x84/0x90 [mlx5_ib]
> CPU: 1 PID: 7218 Comm: ibv_rc_pingpong Tainted: G     E
>             5.2.0-rc6+ #13
> Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
> RIP: 0010:mlx5_ib_dealloc_ucontext+0x84/0x90 [mlx5_ib]
> Code: 8d bd e8 09 00 00 48 89 de e8 58 a1 ff ff 48 8b bb
>       c8 00 00 00 e8 ec 8b 3a c9 48 8b bb d8 00 00 00 5b 5d 41
>       5c e9 dc 8b 3a c9 <0f> 0b eb a0 0f 1f 84 00 00 00 00 00
>       66 66 66 66 90 41 57 b9 09 00
> RSP: 0018:ffffb8e4c0adbc48 EFLAGS: 00010297
> RAX: ffff9e1a791a65b8 RBX: ffff9e1a643c1e00 RCX:
>      0000000000000000
> RDX: ffff9e1a643c1e40 RSI: 0000000000000246 RDI:
>      ffff9e1a643c1e20
> RBP: ffff9e1a75b70000 R08: 0000000000000000 R09:
>      ffff9e1a643c1e50
> R10: 0000000000000000 R11: 0000000000000001 R12:
>      ffff9e1a643c1e20
> R13: ffff9e1a5da6bc10 R14: ffff9e1a5da6bc70 R15:
>      ffff9e1a75b70000
> FS:  00007ff61835d740(0000) GS:ffff9e1a7bb00000(0000)
>      knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f9e6ac34000 CR3: 000000011e41e000 CR4:
>      00000000000006e0
> Call Trace:
> uverbs_destroy_ufile_hw+0xb5/0x120 [ib_uverbs]
> ib_uverbs_close+0x1f/0x80 [ib_uverbs]
> __fput+0xbe/0x250
> task_work_run+0x88/0xa0
> do_exit+0x2cb/0xc30
> ? __fput+0x14b/0x250
> do_group_exit+0x39/0xb0
> get_signal+0x191/0x920
> ? _raw_spin_unlock_bh+0xa/0x20
> ? inet_csk_accept+0x229/0x2f0
> do_signal+0x36/0x5e0
> ? put_unused_fd+0x5b/0x70
> ? __sys_accept4+0x1a6/0x1e0
> ? inet_hash+0x35/0x40
> ? release_sock+0x43/0x90
> ? _raw_spin_unlock_bh+0xa/0x20
> ? inet_listen+0x9f/0x120
> exit_to_usermode_loop+0x5c/0xc6
> do_syscall_64+0x182/0x1b0
> entry_SYSCALL_64_after_hwframe+0x44/0xa9
> RIP: 0033:0x7ff617c807d0
> Code: Bad RIP value.
> RSP: 002b:00007ffd1f4f7c68 EFLAGS: 00000246 ORIG_RAX:
>      000000000000002b
> RAX: fffffffffffffe00 RBX: 00007ffd1f4f7dd0 RCX:
>      00007ff617c807d0
> RDX: 0000000000000000 RSI: 0000000000000000 RDI:
>      0000000000000005
> RBP: 00007ffd1f4f7fd0 R08: 0000000000000000 R09:
>      0000000001327f50
> R10: 00007ffd1f4f7830 R11: 0000000000000246 R12:
>      0000000001327600
> R13: 00007ffd1f4f7e10 R14: 0000000001327fb0 R15:
>      0000000000000005
> [ end trace 4fa29cb158fefa46 ]
> 
> Fixes: 81713d3788d2 ("IB/mlx5: Add implicit MR support")
> Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
> Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
> ---
>  drivers/infiniband/core/umem_odp.c |  4 ----
>  drivers/infiniband/hw/mlx5/odp.c   | 23 ++++++++---------------
>  2 files changed, 8 insertions(+), 19 deletions(-)

Applied to for-rc

Thanks,
Jason
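
For readers following the thread, the leak described in the quoted commit
message can be modelled in a few lines of userspace C. This is only a
sketch under stated assumptions: struct leaf, model_notifier_release() and
model_free_implicit_mr() are hypothetical names invented for illustration
and are not the kernel's types or functions. The skip_dying flag stands in
for the pre-fix behaviour of skipping the free action for "dying" leaves;
the model does not try to reproduce which call sites the patch changes.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one leaf of an implicit MR (not a kernel type). */
struct leaf {
	bool dying;    /* set by the modelled notifier-release path */
	bool released; /* set once the modelled free action has run */
};

/* Models the release notifier marking every leaf as "dying". */
static void model_notifier_release(struct leaf *leaves, size_t n)
{
	for (size_t i = 0; i < n; i++)
		leaves[i].dying = true;
}

/*
 * Models the free walk over the leaves.
 * skip_dying == true  mimics the behaviour described as buggy: the free
 *                     action is skipped for leaves already marked "dying".
 * skip_dying == false mimics the behaviour once "dying" is ignored here.
 * Returns the number of leaves that were never released (i.e. leaked).
 */
static size_t model_free_implicit_mr(struct leaf *leaves, size_t n,
				     bool skip_dying)
{
	size_t leaked = 0;

	for (size_t i = 0; i < n; i++) {
		if (skip_dying && leaves[i].dying)
			continue; /* free action never runs for this leaf */
		leaves[i].released = true;
	}

	for (size_t i = 0; i < n; i++)
		if (!leaves[i].released)
			leaked++;

	return leaked;
}
```

In this model, calling model_notifier_release() on three leaves and then
model_free_implicit_mr() with skip_dying set leaves all three unreleased,
which corresponds to the non-empty per_mm_list reported by the WARN_ON().
Running the same walk with skip_dying cleared releases every leaf.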
