linux-rdma.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] xprtrdma: make sure MRs are unmapped before freeing them
@ 2020-08-14 17:37 Dan Aloni
  2020-08-14 18:12 ` Chuck Lever
  0 siblings, 1 reply; 6+ messages in thread
From: Dan Aloni @ 2020-08-14 17:37 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-rdma, linux-nfs

It was observed that on disconnections, these unmaps don't occur. The
relevant path is rpcrdma_mrs_destroy(), being called from
rpcrdma_xprt_disconnect().

Signed-off-by: Dan Aloni <dan@kernelim.com>
---
 net/sunrpc/xprtrdma/frwr_ops.c | 22 ++++++++++++++++------
 1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
index 7f94c9a19fd3..3899a5310214 100644
--- a/net/sunrpc/xprtrdma/frwr_ops.c
+++ b/net/sunrpc/xprtrdma/frwr_ops.c
@@ -49,6 +49,19 @@
 # define RPCDBG_FACILITY	RPCDBG_TRANS
 #endif
 
+static void frwr_mr_unmap(struct rpcrdma_mr *mr)
+{
+	struct rpcrdma_xprt *r_xprt = mr->mr_xprt;
+
+	if (mr->mr_dir == DMA_NONE)
+		return;
+
+	trace_xprtrdma_mr_unmap(mr);
+	ib_dma_unmap_sg(r_xprt->rx_ep->re_id->device,
+			mr->mr_sg, mr->mr_nents, mr->mr_dir);
+	mr->mr_dir = DMA_NONE;
+}
+
 /**
  * frwr_release_mr - Destroy one MR
  * @mr: MR allocated by frwr_mr_init
@@ -58,6 +71,8 @@ void frwr_release_mr(struct rpcrdma_mr *mr)
 {
 	int rc;
 
+	frwr_mr_unmap(mr);
+
 	rc = ib_dereg_mr(mr->frwr.fr_mr);
 	if (rc)
 		trace_xprtrdma_frwr_dereg(mr, rc);
@@ -71,12 +86,7 @@ static void frwr_mr_recycle(struct rpcrdma_mr *mr)
 
 	trace_xprtrdma_mr_recycle(mr);
 
-	if (mr->mr_dir != DMA_NONE) {
-		trace_xprtrdma_mr_unmap(mr);
-		ib_dma_unmap_sg(r_xprt->rx_ep->re_id->device,
-				mr->mr_sg, mr->mr_nents, mr->mr_dir);
-		mr->mr_dir = DMA_NONE;
-	}
+	frwr_mr_unmap(mr);
 
 	spin_lock(&r_xprt->rx_buf.rb_lock);
 	list_del(&mr->mr_all);
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] xprtrdma: make sure MRs are unmapped before freeing them
  2020-08-14 17:37 [PATCH] xprtrdma: make sure MRs are unmapped before freeing them Dan Aloni
@ 2020-08-14 18:12 ` Chuck Lever
  2020-08-14 19:10   ` Dan Aloni
  0 siblings, 1 reply; 6+ messages in thread
From: Chuck Lever @ 2020-08-14 18:12 UTC (permalink / raw)
  To: Dan Aloni; +Cc: linux-rdma, Linux NFS Mailing List

Hi Dan-

> On Aug 14, 2020, at 1:37 PM, Dan Aloni <dan@kernelim.com> wrote:
> 
> It was observed that on disconnections, these unmaps don't occur. The
> relevant path is rpcrdma_mrs_destroy(), being called from
> rpcrdma_xprt_disconnect().

MRs are supposed to be unmapped right after they are used, so
during disconnect they should all be unmapped already. How often
do you see a DMA mapped MR in this code path? Do you have a
reproducer I can try?


> Signed-off-by: Dan Aloni <dan@kernelim.com>
> ---
> net/sunrpc/xprtrdma/frwr_ops.c | 22 ++++++++++++++++------
> 1 file changed, 16 insertions(+), 6 deletions(-)
> 
> diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c
> index 7f94c9a19fd3..3899a5310214 100644
> --- a/net/sunrpc/xprtrdma/frwr_ops.c
> +++ b/net/sunrpc/xprtrdma/frwr_ops.c
> @@ -49,6 +49,19 @@
> # define RPCDBG_FACILITY	RPCDBG_TRANS
> #endif
> 
> +static void frwr_mr_unmap(struct rpcrdma_mr *mr)
> +{
> +	struct rpcrdma_xprt *r_xprt = mr->mr_xprt;
> +
> +	if (mr->mr_dir == DMA_NONE)
> +		return;
> +
> +	trace_xprtrdma_mr_unmap(mr);
> +	ib_dma_unmap_sg(r_xprt->rx_ep->re_id->device,
> +			mr->mr_sg, mr->mr_nents, mr->mr_dir);
> +	mr->mr_dir = DMA_NONE;
> +}
> +
> /**
>  * frwr_release_mr - Destroy one MR
>  * @mr: MR allocated by frwr_mr_init
> @@ -58,6 +71,8 @@ void frwr_release_mr(struct rpcrdma_mr *mr)
> {
> 	int rc;
> 
> +	frwr_mr_unmap(mr);
> +
> 	rc = ib_dereg_mr(mr->frwr.fr_mr);
> 	if (rc)
> 		trace_xprtrdma_frwr_dereg(mr, rc);
> @@ -71,12 +86,7 @@ static void frwr_mr_recycle(struct rpcrdma_mr *mr)
> 
> 	trace_xprtrdma_mr_recycle(mr);
> 
> -	if (mr->mr_dir != DMA_NONE) {
> -		trace_xprtrdma_mr_unmap(mr);
> -		ib_dma_unmap_sg(r_xprt->rx_ep->re_id->device,
> -				mr->mr_sg, mr->mr_nents, mr->mr_dir);
> -		mr->mr_dir = DMA_NONE;
> -	}
> +	frwr_mr_unmap(mr);
> 
> 	spin_lock(&r_xprt->rx_buf.rb_lock);
> 	list_del(&mr->mr_all);
> -- 
> 2.25.4
> 

--
Chuck Lever




^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] xprtrdma: make sure MRs are unmapped before freeing them
  2020-08-14 18:12 ` Chuck Lever
@ 2020-08-14 19:10   ` Dan Aloni
  2020-08-14 20:21     ` Chuck Lever
  0 siblings, 1 reply; 6+ messages in thread
From: Dan Aloni @ 2020-08-14 19:10 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-rdma, Linux NFS Mailing List

On Fri, Aug 14, 2020 at 02:12:48PM -0400, Chuck Lever wrote:
> Hi Dan-
> 
> > On Aug 14, 2020, at 1:37 PM, Dan Aloni <dan@kernelim.com> wrote:
> > 
> > It was observed that on disconnections, these unmaps don't occur. The
> > relevant path is rpcrdma_mrs_destroy(), being called from
> > rpcrdma_xprt_disconnect().
> 
> MRs are supposed to be unmapped right after they are used, so
> during disconnect they should all be unmapped already. How often
> do you see a DMA mapped MR in this code path? Do you have a
> reproducer I can try?

These are not graceful disconnections but abnormal ones, where many large
IOs are still in flight, while the remote server suddenly breaks the
connection, the remote IP is still reachable but refusing to accept new
connections only for a few seconds.

We may also need reconnection attempts in the background trying to
recover the xprt, so that with short reconnect timeouts it may be enough
for xprt_rdma_close() to be triggered from xprt_rdma_connect_worker(),
leading up to rpcrdma_xprt_disconnect().

-- 
Dan Aloni

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] xprtrdma: make sure MRs are unmapped before freeing them
  2020-08-14 19:10   ` Dan Aloni
@ 2020-08-14 20:21     ` Chuck Lever
  2020-08-15  5:45       ` Dan Aloni
  0 siblings, 1 reply; 6+ messages in thread
From: Chuck Lever @ 2020-08-14 20:21 UTC (permalink / raw)
  To: Dan Aloni; +Cc: linux-rdma, Linux NFS Mailing List



> On Aug 14, 2020, at 3:10 PM, Dan Aloni <dan@kernelim.com> wrote:
> 
> On Fri, Aug 14, 2020 at 02:12:48PM -0400, Chuck Lever wrote:
>> Hi Dan-
>> 
>>> On Aug 14, 2020, at 1:37 PM, Dan Aloni <dan@kernelim.com> wrote:
>>> 
>>> It was observed that on disconnections, these unmaps don't occur. The
>>> relevant path is rpcrdma_mrs_destroy(), being called from
>>> rpcrdma_xprt_disconnect().
>> 
>> MRs are supposed to be unmapped right after they are used, so
>> during disconnect they should all be unmapped already. How often
>> do you see a DMA mapped MR in this code path? Do you have a
>> reproducer I can try?
> 
> These are not graceful disconnections but abnormal ones, where many large
> IOs are still in flight, while the remote server suddenly breaks the
> connection, the remote IP is still reachable but refusing to accept new
> connections only for a few seconds.

Ideally that's not supposed to matter. I'll see if I can reproduce
with my usual tricks.

Why is your server behaving this way?


> We may also need reconnection attempts in the background trying to
> recover the xprt, so that with short reconnect timeouts it may be enough
> for xprt_rdma_close() to be triggered from xprt_rdma_connect_worker(),
> leading up to rpcrdma_xprt_disconnect().

--
Chuck Lever




^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] xprtrdma: make sure MRs are unmapped before freeing them
  2020-08-14 20:21     ` Chuck Lever
@ 2020-08-15  5:45       ` Dan Aloni
  2020-08-16 22:28         ` Chuck Lever
  0 siblings, 1 reply; 6+ messages in thread
From: Dan Aloni @ 2020-08-15  5:45 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-rdma, Linux NFS Mailing List

On Fri, Aug 14, 2020 at 04:21:54PM -0400, Chuck Lever wrote:
> 
> 
> > On Aug 14, 2020, at 3:10 PM, Dan Aloni <dan@kernelim.com> wrote:
> > 
> > On Fri, Aug 14, 2020 at 02:12:48PM -0400, Chuck Lever wrote:
> >> Hi Dan-
> >> 
> >>> On Aug 14, 2020, at 1:37 PM, Dan Aloni <dan@kernelim.com> wrote:
> >>> 
> >>> It was observed that on disconnections, these unmaps don't occur. The
> >>> relevant path is rpcrdma_mrs_destroy(), being called from
> >>> rpcrdma_xprt_disconnect().
> >> 
> >> MRs are supposed to be unmapped right after they are used, so
> >> during disconnect they should all be unmapped already. How often
> >> do you see a DMA mapped MR in this code path? Do you have a
> >> reproducer I can try?
> > 
> > These are not graceful disconnections but abnormal ones, where many large
> > IOs are still in flight, while the remote server suddenly breaks the
> > connection, the remote IP is still reachable but refusing to accept new
> > connections only for a few seconds.
> 
> Ideally that's not supposed to matter. I'll see if I can reproduce
> with my usual tricks.
> 
> Why is your server behaving this way?

It's a dedicated storage cluster under a specific testing scenario,
implementing floating IPs.  Haven't tried, but maybe the same scenario
can be reproduced with a standard single Linux NFSv3 server by fiddling
with nfsd open ports.

-- 
Dan Aloni

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] xprtrdma: make sure MRs are unmapped before freeing them
  2020-08-15  5:45       ` Dan Aloni
@ 2020-08-16 22:28         ` Chuck Lever
  0 siblings, 0 replies; 6+ messages in thread
From: Chuck Lever @ 2020-08-16 22:28 UTC (permalink / raw)
  To: Dan Aloni; +Cc: linux-rdma, Linux NFS Mailing List



> On Aug 15, 2020, at 1:45 AM, Dan Aloni <dan@kernelim.com> wrote:
> 
> On Fri, Aug 14, 2020 at 04:21:54PM -0400, Chuck Lever wrote:
>> 
>> 
>>> On Aug 14, 2020, at 3:10 PM, Dan Aloni <dan@kernelim.com> wrote:
>>> 
>>> On Fri, Aug 14, 2020 at 02:12:48PM -0400, Chuck Lever wrote:
>>>> Hi Dan-
>>>> 
>>>>> On Aug 14, 2020, at 1:37 PM, Dan Aloni <dan@kernelim.com> wrote:
>>>>> 
>>>>> It was observed that on disconnections, these unmaps don't occur. The
>>>>> relevant path is rpcrdma_mrs_destroy(), being called from
>>>>> rpcrdma_xprt_disconnect().
>>>> 
>>>> MRs are supposed to be unmapped right after they are used, so
>>>> during disconnect they should all be unmapped already. How often
>>>> do you see a DMA mapped MR in this code path? Do you have a
>>>> reproducer I can try?
>>> 
>>> These are not graceful disconnections but abnormal ones, where many large
>>> IOs are still in flight, while the remote server suddenly breaks the
>>> connection, the remote IP is still reachable but refusing to accept new
>>> connections only for a few seconds.
>> 
>> Ideally that's not supposed to matter. I'll see if I can reproduce
>> with my usual tricks.
>> 
>> Why is your server behaving this way?
> 
> It's a dedicated storage cluster under a specific testing scenario,
> implementing floating IPs.  Haven't tried, but maybe the same scenario
> can be reproduced with a standard single Linux NFSv3 server by fiddling
> with nfsd open ports.

Hi Dan, I was able to reproduce the DMA-map leak with a simple server-side
disconnect injection test. I'll try some root cause analysis tomorrow.


--
Chuck Lever




^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2020-08-16 22:28 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-14 17:37 [PATCH] xprtrdma: make sure MRs are unmapped before freeing them Dan Aloni
2020-08-14 18:12 ` Chuck Lever
2020-08-14 19:10   ` Dan Aloni
2020-08-14 20:21     ` Chuck Lever
2020-08-15  5:45       ` Dan Aloni
2020-08-16 22:28         ` Chuck Lever

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).