Linux-NFS Archive on lore.kernel.org
 help / color / Atom feed
* NFSv4.1 backchannel xprt problems.
@ 2019-10-09  5:15 NeilBrown
  2019-10-11 16:56 ` bfields
  0 siblings, 1 reply; 8+ messages in thread
From: NeilBrown @ 2019-10-09  5:15 UTC (permalink / raw)
  To: Linux NFS Mailing List

[-- Attachment #1: Type: text/plain, Size: 1441 bytes --]


Hi,
 I have a customer with a 4.12-based kernel who is experiencing memory
 exhaustion.

 There are over 100,000 rpc_rqst structures queue on sv_cb_list for
 handing by the NFSv4 callback, which is idle.

 The rpc_rqst.rq_xprt pointer points to freed memory.

 I notice that that server code calls svc_xprt_get() on the xprt
 before storing it in rq_xprt, but the client/backchannel code doesn't.

 I'm wondering if the following might be useful.

 I plan to explore the code a bit more tomorrow and if this  still seems
 likely I get the customer to test this change, but I thought I would
 ask here as well incase someone more knowledgeable can give me any
 pointers.

Thanks,
NeilBrown

diff --git a/net/sunrpc/backchannel_rqst.c b/net/sunrpc/backchannel_rqst.c
index 339e8c077c2d..c95ca39688b6 100644
--- a/net/sunrpc/backchannel_rqst.c
+++ b/net/sunrpc/backchannel_rqst.c
@@ -61,6 +61,7 @@ static void xprt_free_allocation(struct rpc_rqst *req)
 	free_page((unsigned long)xbufp->head[0].iov_base);
 	xbufp = &req->rq_snd_buf;
 	free_page((unsigned long)xbufp->head[0].iov_base);
+	xprt_put(req->rq_xprt);
 	kfree(req);
 }
 
@@ -85,7 +86,7 @@ struct rpc_rqst *xprt_alloc_bc_req(struct rpc_xprt *xprt, gfp_t gfp_flags)
 	if (req == NULL)
 		return NULL;
 
-	req->rq_xprt = xprt;
+	req->rq_xprt = xprt_get(xprt);
 	INIT_LIST_HEAD(&req->rq_bc_list);
 
 	/* Preallocate one XDR receive buffer */

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: NFSv4.1 backchannel xprt problems.
  2019-10-09  5:15 NFSv4.1 backchannel xprt problems NeilBrown
@ 2019-10-11 16:56 ` bfields
  2019-10-14 23:36   ` [PATCH] SUNRPC: backchannel RPC request must reference XPRT NeilBrown
  0 siblings, 1 reply; 8+ messages in thread
From: bfields @ 2019-10-11 16:56 UTC (permalink / raw)
  To: NeilBrown; +Cc: Linux NFS Mailing List

On Wed, Oct 09, 2019 at 04:15:04PM +1100, NeilBrown wrote:
>  I have a customer with a 4.12-based kernel who is experiencing memory
>  exhaustion.
> 
>  There are over 100,000 rpc_rqst structures queue on sv_cb_list for
>  handing by the NFSv4 callback, which is idle.
> 
>  The rpc_rqst.rq_xprt pointer points to freed memory.
> 
>  I notice that that server code calls svc_xprt_get() on the xprt
>  before storing it in rq_xprt, but the client/backchannel code doesn't.
> 
>  I'm wondering if the following might be useful.
> 
>  I plan to explore the code a bit more tomorrow and if this  still seems
>  likely I get the customer to test this change, but I thought I would
>  ask here as well incase someone more knowledgeable can give me any
>  pointers.

Looks sensible.  But if I ever understood how this works, I've got no
memory of it now....  Good luck.

> 
> Thanks,
> NeilBrown
> 
> diff --git a/net/sunrpc/backchannel_rqst.c b/net/sunrpc/backchannel_rqst.c
> index 339e8c077c2d..c95ca39688b6 100644
> --- a/net/sunrpc/backchannel_rqst.c
> +++ b/net/sunrpc/backchannel_rqst.c
> @@ -61,6 +61,7 @@ static void xprt_free_allocation(struct rpc_rqst *req)
>  	free_page((unsigned long)xbufp->head[0].iov_base);
>  	xbufp = &req->rq_snd_buf;
>  	free_page((unsigned long)xbufp->head[0].iov_base);
> +	xprt_put(req->rq_xprt);
>  	kfree(req);
>  }
>  
> @@ -85,7 +86,7 @@ struct rpc_rqst *xprt_alloc_bc_req(struct rpc_xprt *xprt, gfp_t gfp_flags)
>  	if (req == NULL)
>  		return NULL;
>  
> -	req->rq_xprt = xprt;
> +	req->rq_xprt = xprt_get(xprt);
>  	INIT_LIST_HEAD(&req->rq_bc_list);
>  
>  	/* Preallocate one XDR receive buffer */



^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH] SUNRPC: backchannel RPC request must reference XPRT
  2019-10-11 16:56 ` bfields
@ 2019-10-14 23:36   ` NeilBrown
  2019-10-15 21:16     ` Trond Myklebust
  0 siblings, 1 reply; 8+ messages in thread
From: NeilBrown @ 2019-10-14 23:36 UTC (permalink / raw)
  To: J. Bruce Fields, Trond Myklebust, Anna Schumaker; +Cc: Linux NFS Mailing List

[-- Attachment #1: Type: text/plain, Size: 2089 bytes --]


The backchannel RPC requests - that are queued waiting
for the reply to be sent by the "NFSv4 callback" thread -
have a pointer to the xprt, but it is not reference counted.
It is possible for the xprt to be freed while there are
still queued requests.

I think this has been a problem since
Commit fb7a0b9addbd ("nfs41: New backchannel helper routines")
when the code was introduced, but I suspect it became more of
a problem after
Commit 80b14d5e61ca ("SUNRPC: Add a structure to track multiple transports")
(or there abouts).
Before this second patch, the nfs client would hold a reference to
the xprt to keep it alive.  After multipath was introduced,
a client holds a reference to a swtich, and the switch can have multiple
xprts which can be added and removed.

I'm not sure of all the causal issues, but this patch has
fixed a customer problem were an NFSv4.1 client would run out
of memory with tens of thousands of backchannel rpc requests
queued for an xprt that had been freed.  This was a 64K-page
machine so each rpc_rqst consumed more than 128K of memory.

Fixes: 80b14d5e61ca ("SUNRPC: Add a structure to track multiple transports")
cc: stable@vger.kernel.org (v4.6)
Signed-off-by: NeilBrown <neilb@suse.de>
---
 net/sunrpc/backchannel_rqst.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/sunrpc/backchannel_rqst.c b/net/sunrpc/backchannel_rqst.c
index 339e8c077c2d..c95ca39688b6 100644
--- a/net/sunrpc/backchannel_rqst.c
+++ b/net/sunrpc/backchannel_rqst.c
@@ -61,6 +61,7 @@ static void xprt_free_allocation(struct rpc_rqst *req)
 	free_page((unsigned long)xbufp->head[0].iov_base);
 	xbufp = &req->rq_snd_buf;
 	free_page((unsigned long)xbufp->head[0].iov_base);
+	xprt_put(req->rq_xprt);
 	kfree(req);
 }
 
@@ -85,7 +86,7 @@ struct rpc_rqst *xprt_alloc_bc_req(struct rpc_xprt *xprt, gfp_t gfp_flags)
 	if (req == NULL)
 		return NULL;
 
-	req->rq_xprt = xprt;
+	req->rq_xprt = xprt_get(xprt);
 	INIT_LIST_HEAD(&req->rq_bc_list);
 
 	/* Preallocate one XDR receive buffer */
-- 
2.23.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] SUNRPC: backchannel RPC request must reference XPRT
  2019-10-14 23:36   ` [PATCH] SUNRPC: backchannel RPC request must reference XPRT NeilBrown
@ 2019-10-15 21:16     ` Trond Myklebust
  2019-10-15 21:47       ` Chuck Lever
  2019-10-15 23:23       ` NeilBrown
  0 siblings, 2 replies; 8+ messages in thread
From: Trond Myklebust @ 2019-10-15 21:16 UTC (permalink / raw)
  To: bfields, neilb, anna.schumaker; +Cc: linux-nfs

Hi Neil,

On Tue, 2019-10-15 at 10:36 +1100, NeilBrown wrote:
> The backchannel RPC requests - that are queued waiting
> for the reply to be sent by the "NFSv4 callback" thread -
> have a pointer to the xprt, but it is not reference counted.
> It is possible for the xprt to be freed while there are
> still queued requests.
> 
> I think this has been a problem since
> Commit fb7a0b9addbd ("nfs41: New backchannel helper routines")
> when the code was introduced, but I suspect it became more of
> a problem after
> Commit 80b14d5e61ca ("SUNRPC: Add a structure to track multiple
> transports")
> (or there abouts).
> Before this second patch, the nfs client would hold a reference to
> the xprt to keep it alive.  After multipath was introduced,
> a client holds a reference to a swtich, and the switch can have
> multiple
> xprts which can be added and removed.
> 
> I'm not sure of all the causal issues, but this patch has
> fixed a customer problem were an NFSv4.1 client would run out
> of memory with tens of thousands of backchannel rpc requests
> queued for an xprt that had been freed.  This was a 64K-page
> machine so each rpc_rqst consumed more than 128K of memory.
> 
> Fixes: 80b14d5e61ca ("SUNRPC: Add a structure to track multiple
> transports")
> cc: stable@vger.kernel.org (v4.6)
> Signed-off-by: NeilBrown <neilb@suse.de>
> ---
>  net/sunrpc/backchannel_rqst.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/net/sunrpc/backchannel_rqst.c
> b/net/sunrpc/backchannel_rqst.c
> index 339e8c077c2d..c95ca39688b6 100644
> --- a/net/sunrpc/backchannel_rqst.c
> +++ b/net/sunrpc/backchannel_rqst.c
> @@ -61,6 +61,7 @@ static void xprt_free_allocation(struct rpc_rqst
> *req)
>  	free_page((unsigned long)xbufp->head[0].iov_base);
>  	xbufp = &req->rq_snd_buf;
>  	free_page((unsigned long)xbufp->head[0].iov_base);
> +	xprt_put(req->rq_xprt);
>  	kfree(req);
>  }

Would it perhaps make better sense to move the xprt_get() to
xprt_lookup_bc_request() and to release it in xprt_free_bc_rqst()? 

Otherwise as far as I can tell, we will have freed slots on the xprt-
>bc_pa_list that hold a reference to the transport itself, meaning that
the latter never gets released.

>  
> @@ -85,7 +86,7 @@ struct rpc_rqst *xprt_alloc_bc_req(struct rpc_xprt
> *xprt, gfp_t gfp_flags)
>  	if (req == NULL)
>  		return NULL;
>  
> -	req->rq_xprt = xprt;
> +	req->rq_xprt = xprt_get(xprt);
>  	INIT_LIST_HEAD(&req->rq_bc_list);
>  
>  	/* Preallocate one XDR receive buffer */
-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] SUNRPC: backchannel RPC request must reference XPRT
  2019-10-15 21:16     ` Trond Myklebust
@ 2019-10-15 21:47       ` Chuck Lever
  2019-10-15 23:23       ` NeilBrown
  1 sibling, 0 replies; 8+ messages in thread
From: Chuck Lever @ 2019-10-15 21:47 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Bruce Fields, Neil Brown, Anna Schumaker, Linux NFS Mailing List



> On Oct 15, 2019, at 5:16 PM, Trond Myklebust <trondmy@hammerspace.com> wrote:
> 
> Hi Neil,
> 
> On Tue, 2019-10-15 at 10:36 +1100, NeilBrown wrote:
>> The backchannel RPC requests - that are queued waiting
>> for the reply to be sent by the "NFSv4 callback" thread -
>> have a pointer to the xprt, but it is not reference counted.
>> It is possible for the xprt to be freed while there are
>> still queued requests.
>> 
>> I think this has been a problem since
>> Commit fb7a0b9addbd ("nfs41: New backchannel helper routines")
>> when the code was introduced, but I suspect it became more of
>> a problem after
>> Commit 80b14d5e61ca ("SUNRPC: Add a structure to track multiple
>> transports")
>> (or there abouts).
>> Before this second patch, the nfs client would hold a reference to
>> the xprt to keep it alive.  After multipath was introduced,
>> a client holds a reference to a swtich, and the switch can have
>> multiple
>> xprts which can be added and removed.
>> 
>> I'm not sure of all the causal issues, but this patch has
>> fixed a customer problem were an NFSv4.1 client would run out
>> of memory with tens of thousands of backchannel rpc requests
>> queued for an xprt that had been freed.  This was a 64K-page
>> machine so each rpc_rqst consumed more than 128K of memory.
>> 
>> Fixes: 80b14d5e61ca ("SUNRPC: Add a structure to track multiple
>> transports")
>> cc: stable@vger.kernel.org (v4.6)
>> Signed-off-by: NeilBrown <neilb@suse.de>
>> ---
>> net/sunrpc/backchannel_rqst.c | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>> 
>> diff --git a/net/sunrpc/backchannel_rqst.c
>> b/net/sunrpc/backchannel_rqst.c
>> index 339e8c077c2d..c95ca39688b6 100644
>> --- a/net/sunrpc/backchannel_rqst.c
>> +++ b/net/sunrpc/backchannel_rqst.c
>> @@ -61,6 +61,7 @@ static void xprt_free_allocation(struct rpc_rqst
>> *req)
>> 	free_page((unsigned long)xbufp->head[0].iov_base);
>> 	xbufp = &req->rq_snd_buf;
>> 	free_page((unsigned long)xbufp->head[0].iov_base);
>> +	xprt_put(req->rq_xprt);
>> 	kfree(req);
>> }
> 
> Would it perhaps make better sense to move the xprt_get() to
> xprt_lookup_bc_request() and to release it in xprt_free_bc_rqst()?

/me wonders if the same problem exists for the RPC/RDMA backchannel....


> Otherwise as far as I can tell, we will have freed slots on the xprt-
>> bc_pa_list that hold a reference to the transport itself, meaning that
> the latter never gets released.
> 
>> 
>> @@ -85,7 +86,7 @@ struct rpc_rqst *xprt_alloc_bc_req(struct rpc_xprt
>> *xprt, gfp_t gfp_flags)
>> 	if (req == NULL)
>> 		return NULL;
>> 
>> -	req->rq_xprt = xprt;
>> +	req->rq_xprt = xprt_get(xprt);
>> 	INIT_LIST_HEAD(&req->rq_bc_list);
>> 
>> 	/* Preallocate one XDR receive buffer */
> -- 
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@hammerspace.com

--
Chuck Lever




^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] SUNRPC: backchannel RPC request must reference XPRT
  2019-10-15 21:16     ` Trond Myklebust
  2019-10-15 21:47       ` Chuck Lever
@ 2019-10-15 23:23       ` NeilBrown
  2019-10-16  3:04         ` Trond Myklebust
  1 sibling, 1 reply; 8+ messages in thread
From: NeilBrown @ 2019-10-15 23:23 UTC (permalink / raw)
  To: Trond Myklebust, bfields\, anna.schumaker\; +Cc: linux-nfs\

[-- Attachment #1: Type: text/plain, Size: 4253 bytes --]

On Tue, Oct 15 2019, Trond Myklebust wrote:

> Hi Neil,
>
> On Tue, 2019-10-15 at 10:36 +1100, NeilBrown wrote:
>> The backchannel RPC requests - that are queued waiting
>> for the reply to be sent by the "NFSv4 callback" thread -
>> have a pointer to the xprt, but it is not reference counted.
>> It is possible for the xprt to be freed while there are
>> still queued requests.
>> 
>> I think this has been a problem since
>> Commit fb7a0b9addbd ("nfs41: New backchannel helper routines")
>> when the code was introduced, but I suspect it became more of
>> a problem after
>> Commit 80b14d5e61ca ("SUNRPC: Add a structure to track multiple
>> transports")
>> (or there abouts).
>> Before this second patch, the nfs client would hold a reference to
>> the xprt to keep it alive.  After multipath was introduced,
>> a client holds a reference to a swtich, and the switch can have
>> multiple
>> xprts which can be added and removed.
>> 
>> I'm not sure of all the causal issues, but this patch has
>> fixed a customer problem were an NFSv4.1 client would run out
>> of memory with tens of thousands of backchannel rpc requests
>> queued for an xprt that had been freed.  This was a 64K-page
>> machine so each rpc_rqst consumed more than 128K of memory.
>> 
>> Fixes: 80b14d5e61ca ("SUNRPC: Add a structure to track multiple
>> transports")
>> cc: stable@vger.kernel.org (v4.6)
>> Signed-off-by: NeilBrown <neilb@suse.de>
>> ---
>>  net/sunrpc/backchannel_rqst.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>> 
>> diff --git a/net/sunrpc/backchannel_rqst.c
>> b/net/sunrpc/backchannel_rqst.c
>> index 339e8c077c2d..c95ca39688b6 100644
>> --- a/net/sunrpc/backchannel_rqst.c
>> +++ b/net/sunrpc/backchannel_rqst.c
>> @@ -61,6 +61,7 @@ static void xprt_free_allocation(struct rpc_rqst
>> *req)
>>  	free_page((unsigned long)xbufp->head[0].iov_base);
>>  	xbufp = &req->rq_snd_buf;
>>  	free_page((unsigned long)xbufp->head[0].iov_base);
>> +	xprt_put(req->rq_xprt);
>>  	kfree(req);
>>  }
>
> Would it perhaps make better sense to move the xprt_get() to
> xprt_lookup_bc_request() and to release it in xprt_free_bc_rqst()? 

... maybe ...

>
> Otherwise as far as I can tell, we will have freed slots on the xprt-
>>bc_pa_list that hold a reference to the transport itself, meaning that
> the latter never gets released.

Apart from cleanup-on-error paths, xprt_free_allocation() is called:
 - in xprt_destroy_bc() - at most 'max_reqs' times.
 - in xprt_free_bc_rqst(), if the bc_alloc_count >= bc_alloc_max

So when xprt_destroy_bc() is called (from nfs4_destroy_session, via
xprt_destroy_backchannel()), bc_alloc_max() is reduced, and possibly
the requests are freed and bc_alloc_count is reduced accordingly.
If the requests were busy, they won't have been freed then, but will
then be freed when xprt_free_bc_reqst is called - because bc_alloc_max
has been reduced.

Once nfs4_destroy_session() has been called on all session, and
xprt_free_bc_rqst() has been called on all active requests, I think
they should be no requests left to be holding a reference to
the xprt.

And if there were requests left, and if we change the refcount code
(like you suggest) so that they weren't holding a reference, then they
would never be freed. - freeing an xprt doesn't clean out the bc_pa_list.

Though ... the connection from a session to an xprt isn't permanent (I
guess, based on the rcu_read_lock in nfs4_destroy_session... which
itself seems a bit odd as it doesn't inc a refcount while holding the
lock).

So maybe the counts can get out of balance.

Conclusion: I'm not sure the ref counts are entirely correct, but I
   cannot see a benefit from moving the xprt_get/put requests like you
   suggest.

Thanks,
NeilBrown


>
>>  
>> @@ -85,7 +86,7 @@ struct rpc_rqst *xprt_alloc_bc_req(struct rpc_xprt
>> *xprt, gfp_t gfp_flags)
>>  	if (req == NULL)
>>  		return NULL;
>>  
>> -	req->rq_xprt = xprt;
>> +	req->rq_xprt = xprt_get(xprt);
>>  	INIT_LIST_HEAD(&req->rq_bc_list);
>>  
>>  	/* Preallocate one XDR receive buffer */
> -- 
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@hammerspace.com

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] SUNRPC: backchannel RPC request must reference XPRT
  2019-10-15 23:23       ` NeilBrown
@ 2019-10-16  3:04         ` Trond Myklebust
  2019-10-16  4:38           ` NeilBrown
  0 siblings, 1 reply; 8+ messages in thread
From: Trond Myklebust @ 2019-10-16  3:04 UTC (permalink / raw)
  To: NeilBrown, bfields, anna.schumaker; +Cc: linux-nfs

On Wed, 2019-10-16 at 10:23 +1100, NeilBrown wrote:
> On Tue, Oct 15 2019, Trond Myklebust wrote:
> 
> > Hi Neil,
> > 
> > On Tue, 2019-10-15 at 10:36 +1100, NeilBrown wrote:
> > > The backchannel RPC requests - that are queued waiting
> > > for the reply to be sent by the "NFSv4 callback" thread -
> > > have a pointer to the xprt, but it is not reference counted.
> > > It is possible for the xprt to be freed while there are
> > > still queued requests.
> > > 
> > > I think this has been a problem since
> > > Commit fb7a0b9addbd ("nfs41: New backchannel helper routines")
> > > when the code was introduced, but I suspect it became more of
> > > a problem after
> > > Commit 80b14d5e61ca ("SUNRPC: Add a structure to track multiple
> > > transports")
> > > (or there abouts).
> > > Before this second patch, the nfs client would hold a reference
> > > to
> > > the xprt to keep it alive.  After multipath was introduced,
> > > a client holds a reference to a swtich, and the switch can have
> > > multiple
> > > xprts which can be added and removed.
> > > 
> > > I'm not sure of all the causal issues, but this patch has
> > > fixed a customer problem were an NFSv4.1 client would run out
> > > of memory with tens of thousands of backchannel rpc requests
> > > queued for an xprt that had been freed.  This was a 64K-page
> > > machine so each rpc_rqst consumed more than 128K of memory.
> > > 
> > > Fixes: 80b14d5e61ca ("SUNRPC: Add a structure to track multiple
> > > transports")
> > > cc: stable@vger.kernel.org (v4.6)
> > > Signed-off-by: NeilBrown <neilb@suse.de>
> > > ---
> > >  net/sunrpc/backchannel_rqst.c | 3 ++-
> > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/net/sunrpc/backchannel_rqst.c
> > > b/net/sunrpc/backchannel_rqst.c
> > > index 339e8c077c2d..c95ca39688b6 100644
> > > --- a/net/sunrpc/backchannel_rqst.c
> > > +++ b/net/sunrpc/backchannel_rqst.c
> > > @@ -61,6 +61,7 @@ static void xprt_free_allocation(struct
> > > rpc_rqst
> > > *req)
> > >  	free_page((unsigned long)xbufp->head[0].iov_base);
> > >  	xbufp = &req->rq_snd_buf;
> > >  	free_page((unsigned long)xbufp->head[0].iov_base);
> > > +	xprt_put(req->rq_xprt);
> > >  	kfree(req);
> > >  }
> > 
> > Would it perhaps make better sense to move the xprt_get() to
> > xprt_lookup_bc_request() and to release it in xprt_free_bc_rqst()? 
> 
> ... maybe ...
> 
> > Otherwise as far as I can tell, we will have freed slots on the
> > xprt-
> > > bc_pa_list that hold a reference to the transport itself, meaning
> > > that
> > the latter never gets released.
> 
> Apart from cleanup-on-error paths, xprt_free_allocation() is called:
>  - in xprt_destroy_bc() - at most 'max_reqs' times.
>  - in xprt_free_bc_rqst(), if the bc_alloc_count >= bc_alloc_max
> 
> So when xprt_destroy_bc() is called (from nfs4_destroy_session, via
> xprt_destroy_backchannel()), bc_alloc_max() is reduced, and possibly
> the requests are freed and bc_alloc_count is reduced accordingly.
> If the requests were busy, they won't have been freed then, but will
> then be freed when xprt_free_bc_reqst is called - because
> bc_alloc_max
> has been reduced.
> 
> Once nfs4_destroy_session() has been called on all session, and
> xprt_free_bc_rqst() has been called on all active requests, I think
> they should be no requests left to be holding a reference to
> the xprt.
> 
> And if there were requests left, and if we change the refcount code
> (like you suggest) so that they weren't holding a reference, then
> they
> would never be freed. - freeing an xprt doesn't clean out the
> bc_pa_list.
> 
> Though ... the connection from a session to an xprt isn't permanent
> (I
> guess, based on the rcu_read_lock in nfs4_destroy_session... which
> itself seems a bit odd as it doesn't inc a refcount while holding the
> lock).
> 
> So maybe the counts can get out of balance.
> 
> Conclusion: I'm not sure the ref counts are entirely correct, but I
>    cannot see a benefit from moving the xprt_get/put requests like
> you
>    suggest.
> 

I don't buy that conclusion.

Nothing stops me from changing the value of NFS41_BC_MIN_CALLBACKS to
zero. Why should that affect the existence or not of the transport by
changing the number of references held? That parameter is supposed to
determine the number of pre-allocated requests and nothing else.

I do agree with your assessment that destroying the xprt does not
currrently destroy the contents of xprt->bc_pa_list if they are non-
zero, but that would be a bug, and an easy one to fix.


-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com




^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] SUNRPC: backchannel RPC request must reference XPRT
  2019-10-16  3:04         ` Trond Myklebust
@ 2019-10-16  4:38           ` NeilBrown
  0 siblings, 0 replies; 8+ messages in thread
From: NeilBrown @ 2019-10-16  4:38 UTC (permalink / raw)
  To: Trond Myklebust, bfields\, anna.schumaker\; +Cc: linux-nfs\

[-- Attachment #1: Type: text/plain, Size: 5697 bytes --]

On Tue, Oct 15 2019, Trond Myklebust wrote:

> On Wed, 2019-10-16 at 10:23 +1100, NeilBrown wrote:
>> On Tue, Oct 15 2019, Trond Myklebust wrote:
>> 
>> > Hi Neil,
>> > 
>> > On Tue, 2019-10-15 at 10:36 +1100, NeilBrown wrote:
>> > > The backchannel RPC requests - that are queued waiting
>> > > for the reply to be sent by the "NFSv4 callback" thread -
>> > > have a pointer to the xprt, but it is not reference counted.
>> > > It is possible for the xprt to be freed while there are
>> > > still queued requests.
>> > > 
>> > > I think this has been a problem since
>> > > Commit fb7a0b9addbd ("nfs41: New backchannel helper routines")
>> > > when the code was introduced, but I suspect it became more of
>> > > a problem after
>> > > Commit 80b14d5e61ca ("SUNRPC: Add a structure to track multiple
>> > > transports")
>> > > (or there abouts).
>> > > Before this second patch, the nfs client would hold a reference
>> > > to
>> > > the xprt to keep it alive.  After multipath was introduced,
>> > > a client holds a reference to a swtich, and the switch can have
>> > > multiple
>> > > xprts which can be added and removed.
>> > > 
>> > > I'm not sure of all the causal issues, but this patch has
>> > > fixed a customer problem were an NFSv4.1 client would run out
>> > > of memory with tens of thousands of backchannel rpc requests
>> > > queued for an xprt that had been freed.  This was a 64K-page
>> > > machine so each rpc_rqst consumed more than 128K of memory.
>> > > 
>> > > Fixes: 80b14d5e61ca ("SUNRPC: Add a structure to track multiple
>> > > transports")
>> > > cc: stable@vger.kernel.org (v4.6)
>> > > Signed-off-by: NeilBrown <neilb@suse.de>
>> > > ---
>> > >  net/sunrpc/backchannel_rqst.c | 3 ++-
>> > >  1 file changed, 2 insertions(+), 1 deletion(-)
>> > > 
>> > > diff --git a/net/sunrpc/backchannel_rqst.c
>> > > b/net/sunrpc/backchannel_rqst.c
>> > > index 339e8c077c2d..c95ca39688b6 100644
>> > > --- a/net/sunrpc/backchannel_rqst.c
>> > > +++ b/net/sunrpc/backchannel_rqst.c
>> > > @@ -61,6 +61,7 @@ static void xprt_free_allocation(struct
>> > > rpc_rqst
>> > > *req)
>> > >  	free_page((unsigned long)xbufp->head[0].iov_base);
>> > >  	xbufp = &req->rq_snd_buf;
>> > >  	free_page((unsigned long)xbufp->head[0].iov_base);
>> > > +	xprt_put(req->rq_xprt);
>> > >  	kfree(req);
>> > >  }
>> > 
>> > Would it perhaps make better sense to move the xprt_get() to
>> > xprt_lookup_bc_request() and to release it in xprt_free_bc_rqst()? 
>> 
>> ... maybe ...
>> 
>> > Otherwise as far as I can tell, we will have freed slots on the
>> > xprt-
>> > > bc_pa_list that hold a reference to the transport itself, meaning
>> > > that
>> > the latter never gets released.
>> 
>> Apart from cleanup-on-error paths, xprt_free_allocation() is called:
>>  - in xprt_destroy_bc() - at most 'max_reqs' times.
>>  - in xprt_free_bc_rqst(), if the bc_alloc_count >= bc_alloc_max
>> 
>> So when xprt_destroy_bc() is called (from nfs4_destroy_session, via
>> xprt_destroy_backchannel()), bc_alloc_max() is reduced, and possibly
>> the requests are freed and bc_alloc_count is reduced accordingly.
>> If the requests were busy, they won't have been freed then, but will
>> then be freed when xprt_free_bc_reqst is called - because
>> bc_alloc_max
>> has been reduced.
>> 
>> Once nfs4_destroy_session() has been called on all session, and
>> xprt_free_bc_rqst() has been called on all active requests, I think
>> they should be no requests left to be holding a reference to
>> the xprt.
>> 
>> And if there were requests left, and if we change the refcount code
>> (like you suggest) so that they weren't holding a reference, then
>> they
>> would never be freed. - freeing an xprt doesn't clean out the
>> bc_pa_list.
>> 
>> Though ... the connection from a session to an xprt isn't permanent
>> (I
>> guess, based on the rcu_read_lock in nfs4_destroy_session... which
>> itself seems a bit odd as it doesn't inc a refcount while holding the
>> lock).
>> 
>> So maybe the counts can get out of balance.
>> 
>> Conclusion: I'm not sure the ref counts are entirely correct, but I
>>    cannot see a benefit from moving the xprt_get/put requests like
>> you
>>    suggest.
>> 
>
> I don't buy that conclusion.
>
> Nothing stops me from changing the value of NFS41_BC_MIN_CALLBACKS to
> zero. Why should that affect the existence or not of the transport by
> changing the number of references held? That parameter is supposed to
> determine the number of pre-allocated requests and nothing else.

If you reduce NFS41_BC_MIN_CALLBACKS to zero, bc_alloc_max won't be
increased and will remain at zero.  So xprt_get_bc_request() will
always need to allocate a new request (bc_pa_list will be empty),
and xprt_free_bc_rqst() will always free the request - never putting it
on the bc_pa_list.

Maybe it would make sense to do exactly that, get rid of bc_pa_list, and
possibly use a mempool if there is any concern about delays when
allocating.

I agree that it is a little odd for requests on the pa list to hold a
reference to the xprt, and I don't object at all to changing that.
My only point is that I think my patch as it stands is correct, that it
doesn't introduce new problems (and so might be the simplest thing for
-stable).

Thanks,
NeilBrown


>
> I do agree with your assessment that destroying the xprt does not
> currrently destroy the contents of xprt->bc_pa_list if they are non-
> zero, but that would be a bug, and an easy one to fix.
>
>
> -- 
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@hammerspace.com

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, back to index

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-09  5:15 NFSv4.1 backchannel xprt problems NeilBrown
2019-10-11 16:56 ` bfields
2019-10-14 23:36   ` [PATCH] SUNRPC: backchannel RPC request must reference XPRT NeilBrown
2019-10-15 21:16     ` Trond Myklebust
2019-10-15 21:47       ` Chuck Lever
2019-10-15 23:23       ` NeilBrown
2019-10-16  3:04         ` Trond Myklebust
2019-10-16  4:38           ` NeilBrown

Linux-NFS Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-nfs/0 linux-nfs/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-nfs linux-nfs/ https://lore.kernel.org/linux-nfs \
		linux-nfs@vger.kernel.org
	public-inbox-index linux-nfs

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-nfs


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git