On Tue, Oct 15 2019, Trond Myklebust wrote: > Hi Neil, > > On Tue, 2019-10-15 at 10:36 +1100, NeilBrown wrote: >> The backchannel RPC requests - that are queued waiting >> for the reply to be sent by the "NFSv4 callback" thread - >> have a pointer to the xprt, but it is not reference counted. >> It is possible for the xprt to be freed while there are >> still queued requests. >> >> I think this has been a problem since >> Commit fb7a0b9addbd ("nfs41: New backchannel helper routines") >> when the code was introduced, but I suspect it became more of >> a problem after >> Commit 80b14d5e61ca ("SUNRPC: Add a structure to track multiple >> transports") >> (or there abouts). >> Before this second patch, the nfs client would hold a reference to >> the xprt to keep it alive. After multipath was introduced, >> a client holds a reference to a swtich, and the switch can have >> multiple >> xprts which can be added and removed. >> >> I'm not sure of all the causal issues, but this patch has >> fixed a customer problem were an NFSv4.1 client would run out >> of memory with tens of thousands of backchannel rpc requests >> queued for an xprt that had been freed. This was a 64K-page >> machine so each rpc_rqst consumed more than 128K of memory. >> >> Fixes: 80b14d5e61ca ("SUNRPC: Add a structure to track multiple >> transports") >> cc: stable@vger.kernel.org (v4.6) >> Signed-off-by: NeilBrown >> --- >> net/sunrpc/backchannel_rqst.c | 3 ++- >> 1 file changed, 2 insertions(+), 1 deletion(-) >> >> diff --git a/net/sunrpc/backchannel_rqst.c >> b/net/sunrpc/backchannel_rqst.c >> index 339e8c077c2d..c95ca39688b6 100644 >> --- a/net/sunrpc/backchannel_rqst.c >> +++ b/net/sunrpc/backchannel_rqst.c >> @@ -61,6 +61,7 @@ static void xprt_free_allocation(struct rpc_rqst >> *req) >> free_page((unsigned long)xbufp->head[0].iov_base); >> xbufp = &req->rq_snd_buf; >> free_page((unsigned long)xbufp->head[0].iov_base); >> + xprt_put(req->rq_xprt); >> kfree(req); >> } > > Would it perhaps make better sense to move the xprt_get() to > xprt_lookup_bc_request() and to release it in xprt_free_bc_rqst()? ... maybe ... > > Otherwise as far as I can tell, we will have freed slots on the xprt- >>bc_pa_list that hold a reference to the transport itself, meaning that > the latter never gets released. Apart from cleanup-on-error paths, xprt_free_allocation() is called: - in xprt_destroy_bc() - at most 'max_reqs' times. - in xprt_free_bc_rqst(), if the bc_alloc_count >= bc_alloc_max So when xprt_destroy_bc() is called (from nfs4_destroy_session, via xprt_destroy_backchannel()), bc_alloc_max() is reduced, and possibly the requests are freed and bc_alloc_count is reduced accordingly. If the requests were busy, they won't have been freed then, but will then be freed when xprt_free_bc_reqst is called - because bc_alloc_max has been reduced. Once nfs4_destroy_session() has been called on all session, and xprt_free_bc_rqst() has been called on all active requests, I think they should be no requests left to be holding a reference to the xprt. And if there were requests left, and if we change the refcount code (like you suggest) so that they weren't holding a reference, then they would never be freed. - freeing an xprt doesn't clean out the bc_pa_list. Though ... the connection from a session to an xprt isn't permanent (I guess, based on the rcu_read_lock in nfs4_destroy_session... which itself seems a bit odd as it doesn't inc a refcount while holding the lock). So maybe the counts can get out of balance. Conclusion: I'm not sure the ref counts are entirely correct, but I cannot see a benefit from moving the xprt_get/put requests like you suggest. Thanks, NeilBrown > >> >> @@ -85,7 +86,7 @@ struct rpc_rqst *xprt_alloc_bc_req(struct rpc_xprt >> *xprt, gfp_t gfp_flags) >> if (req == NULL) >> return NULL; >> >> - req->rq_xprt = xprt; >> + req->rq_xprt = xprt_get(xprt); >> INIT_LIST_HEAD(&req->rq_bc_list); >> >> /* Preallocate one XDR receive buffer */ > -- > Trond Myklebust > Linux NFS client maintainer, Hammerspace > trond.myklebust@hammerspace.com