linux-nfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 0/2] Fix request deferral for NFS/RDMA
@ 2022-04-07 17:43 Chuck Lever
  2022-04-07 17:43 ` [PATCH v3 1/2] SUNRPC: Fix NFSD's request deferral on RDMA transports Chuck Lever
  2022-04-07 17:43 ` [PATCH v3 2/2] SUNRPC: Fix the svc_deferred_event trace class Chuck Lever
  0 siblings, 2 replies; 6+ messages in thread
From: Chuck Lever @ 2022-04-07 17:43 UTC (permalink / raw)
  To: linux-nfs; +Cc: rostedt

Last week Trond reported a server crash in svc_rdma_sendto() that
seemed related to deferral of a request incoming on an RPC/RDMA
transport. A link to that report appears in the first patch below.

The first patch in this series addresses the crash, and the second
fixes a related observability crash. The other patches from v1
of this series have been postponed to 5.19.

Comments, opinions, and Reviewed-by's are appreciated!


Since v2:
- Fix "has unsafe dereference" warnings in patch 2/2

Since v1:
- Series trimmed down to just what will go to 5.18-rc
- 2nd patch re-worked so it can be backported to recent stable
- Addressed a few checkpatch.pl nits

---

Chuck Lever (2):
      SUNRPC: Fix NFSD's request deferral on RDMA transports
      SUNRPC: Fix the svc_deferred_event trace class


 include/linux/sunrpc/svc.h              | 1 +
 include/trace/events/sunrpc.h           | 7 ++++---
 net/sunrpc/svc_xprt.c                   | 3 +++
 net/sunrpc/xprtrdma/svc_rdma_recvfrom.c | 2 +-
 4 files changed, 9 insertions(+), 4 deletions(-)

--
Chuck Lever


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH v3 1/2] SUNRPC: Fix NFSD's request deferral on RDMA transports
  2022-04-07 17:43 [PATCH v3 0/2] Fix request deferral for NFS/RDMA Chuck Lever
@ 2022-04-07 17:43 ` Chuck Lever
  2022-04-07 20:31   ` Trond Myklebust
  2023-05-08  0:20   ` NeilBrown
  2022-04-07 17:43 ` [PATCH v3 2/2] SUNRPC: Fix the svc_deferred_event trace class Chuck Lever
  1 sibling, 2 replies; 6+ messages in thread
From: Chuck Lever @ 2022-04-07 17:43 UTC (permalink / raw)
  To: linux-nfs; +Cc: rostedt

Trond Myklebust reports an NFSD crash in svc_rdma_sendto(). Further
investigation shows that the crash occurred while NFSD was handling
a deferred request.

This patch addresses two inter-related issues that prevent request
deferral from working correctly for RPC/RDMA requests:

1. Prevent the crash by ensuring that the original
   svc_rqst::rq_xprt_ctxt value is available when the request is
   revisited. Otherwise svc_rdma_sendto() does not have a Receive
   context available with which to construct its reply.

2. Possibly since before commit 71641d99ce03 ("svcrdma: Properly
   compute .len and .buflen for received RPC Calls"),
   svc_rdma_recvfrom() did not include the transport header in the
   returned xdr_buf. There should have been no need for svc_defer()
   and friends to save and restore that header, as of that commit.
   This issue is addressed in a backport-friendly way by simply
   having svc_rdma_recvfrom() set rq_xprt_hlen to zero
   unconditionally, just as svc_tcp_recvfrom() does. This enables
   svc_deferred_recv() to correctly reconstruct an RPC message
   received via RPC/RDMA.

Reported-by: Trond Myklebust <trondmy@hammerspace.com>
Link: https://lore.kernel.org/linux-nfs/82662b7190f26fb304eb0ab1bb04279072439d4e.camel@hammerspace.com/
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: <stable@vger.kernel.org>
---
 include/linux/sunrpc/svc.h              |    1 +
 net/sunrpc/svc_xprt.c                   |    3 +++
 net/sunrpc/xprtrdma/svc_rdma_recvfrom.c |    2 +-
 3 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
index a5dda4987e8b..217711fc9cac 100644
--- a/include/linux/sunrpc/svc.h
+++ b/include/linux/sunrpc/svc.h
@@ -395,6 +395,7 @@ struct svc_deferred_req {
 	size_t			addrlen;
 	struct sockaddr_storage	daddr;	/* where reply must come from */
 	size_t			daddrlen;
+	void			*xprt_ctxt;
 	struct cache_deferred_req handle;
 	size_t			xprt_hlen;
 	int			argslen;
diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index 0c117d3bfda8..b42cfffa7395 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -1231,6 +1231,8 @@ static struct cache_deferred_req *svc_defer(struct cache_req *req)
 		dr->daddr = rqstp->rq_daddr;
 		dr->argslen = rqstp->rq_arg.len >> 2;
 		dr->xprt_hlen = rqstp->rq_xprt_hlen;
+		dr->xprt_ctxt = rqstp->rq_xprt_ctxt;
+		rqstp->rq_xprt_ctxt = NULL;
 
 		/* back up head to the start of the buffer and copy */
 		skip = rqstp->rq_arg.len - rqstp->rq_arg.head[0].iov_len;
@@ -1269,6 +1271,7 @@ static noinline int svc_deferred_recv(struct svc_rqst *rqstp)
 	rqstp->rq_xprt_hlen   = dr->xprt_hlen;
 	rqstp->rq_daddr       = dr->daddr;
 	rqstp->rq_respages    = rqstp->rq_pages;
+	rqstp->rq_xprt_ctxt   = dr->xprt_ctxt;
 	svc_xprt_received(rqstp->rq_xprt);
 	return (dr->argslen<<2) - dr->xprt_hlen;
 }
diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
index cf76a6ad127b..864131a9fc6e 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
@@ -831,7 +831,7 @@ int svc_rdma_recvfrom(struct svc_rqst *rqstp)
 		goto out_err;
 	if (ret == 0)
 		goto out_drop;
-	rqstp->rq_xprt_hlen = ret;
+	rqstp->rq_xprt_hlen = 0;
 
 	if (svc_rdma_is_reverse_direction_reply(xprt, ctxt))
 		goto out_backchannel;



^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH v3 2/2] SUNRPC: Fix the svc_deferred_event trace class
  2022-04-07 17:43 [PATCH v3 0/2] Fix request deferral for NFS/RDMA Chuck Lever
  2022-04-07 17:43 ` [PATCH v3 1/2] SUNRPC: Fix NFSD's request deferral on RDMA transports Chuck Lever
@ 2022-04-07 17:43 ` Chuck Lever
  1 sibling, 0 replies; 6+ messages in thread
From: Chuck Lever @ 2022-04-07 17:43 UTC (permalink / raw)
  To: linux-nfs; +Cc: rostedt

Fix a NULL deref crash that occurs when an svc_rqst is deferred
while the sunrpc tracing subsystem is enabled. svc_revisit() sets
dr->xprt to NULL, so it can't be relied upon in the tracepoint to
provide the remote's address.

Unfortunately we can't revert the "svc_deferred_class" hunk in
commit ece200ddd54b ("sunrpc: Save remote presentation address in
svc_xprt for trace events") because there is now a specific check
of event format specifiers for unsafe dereferences. The warning
that check emits is:

  event svc_defer_recv has unsafe dereference of argument 1

A "%pISpc" format specifier with a "struct sockaddr *" is indeed
flagged by this check.

Instead, take the brute-force approach used by the svcrdma_qp_error
tracepoint. Convert the dr::addr field into a presentation address
in the TP_fast_assign() arm of the trace event, and store that as
a string. This fix can be backported to -stable kernels.

In the meantime, commit c6ced22997ad ("tracing: Update print fmt
check to handle new __get_sockaddr() macro") is now in v5.18, so
this wonky fix can be replaced with __sockaddr() and friends
properly during the v5.19 merge window.

Fixes: ece200ddd54b ("sunrpc: Save remote presentation address in svc_xprt for trace events")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 include/trace/events/sunrpc.h |    7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/include/trace/events/sunrpc.h b/include/trace/events/sunrpc.h
index ab8ae1f6ba84..4eb706fa5825 100644
--- a/include/trace/events/sunrpc.h
+++ b/include/trace/events/sunrpc.h
@@ -2017,17 +2017,18 @@ DECLARE_EVENT_CLASS(svc_deferred_event,
 	TP_STRUCT__entry(
 		__field(const void *, dr)
 		__field(u32, xid)
-		__string(addr, dr->xprt->xpt_remotebuf)
+		__array(__u8, addr, INET6_ADDRSTRLEN + 10)
 	),
 
 	TP_fast_assign(
 		__entry->dr = dr;
 		__entry->xid = be32_to_cpu(*(__be32 *)(dr->args +
 						       (dr->xprt_hlen>>2)));
-		__assign_str(addr, dr->xprt->xpt_remotebuf);
+		snprintf(__entry->addr, sizeof(__entry->addr) - 1,
+			 "%pISpc", (struct sockaddr *)&dr->addr);
 	),
 
-	TP_printk("addr=%s dr=%p xid=0x%08x", __get_str(addr), __entry->dr,
+	TP_printk("addr=%s dr=%p xid=0x%08x", __entry->addr, __entry->dr,
 		__entry->xid)
 );
 



^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v3 1/2] SUNRPC: Fix NFSD's request deferral on RDMA transports
  2022-04-07 17:43 ` [PATCH v3 1/2] SUNRPC: Fix NFSD's request deferral on RDMA transports Chuck Lever
@ 2022-04-07 20:31   ` Trond Myklebust
  2023-05-08  0:20   ` NeilBrown
  1 sibling, 0 replies; 6+ messages in thread
From: Trond Myklebust @ 2022-04-07 20:31 UTC (permalink / raw)
  To: linux-nfs, chuck.lever; +Cc: rostedt

On Thu, 2022-04-07 at 13:43 -0400, Chuck Lever wrote:
> Trond Myklebust reports an NFSD crash in svc_rdma_sendto(). Further
> investigation shows that the crash occurred while NFSD was handling
> a deferred request.
> 
> This patch addresses two inter-related issues that prevent request
> deferral from working correctly for RPC/RDMA requests:
> 
> 1. Prevent the crash by ensuring that the original
>    svc_rqst::rq_xprt_ctxt value is available when the request is
>    revisited. Otherwise svc_rdma_sendto() does not have a Receive
>    context available with which to construct its reply.
> 
> 2. Possibly since before commit 71641d99ce03 ("svcrdma: Properly
>    compute .len and .buflen for received RPC Calls"),
>    svc_rdma_recvfrom() did not include the transport header in the
>    returned xdr_buf. There should have been no need for svc_defer()
>    and friends to save and restore that header, as of that commit.
>    This issue is addressed in a backport-friendly way by simply
>    having svc_rdma_recvfrom() set rq_xprt_hlen to zero
>    unconditionally, just as svc_tcp_recvfrom() does. This enables
>    svc_deferred_recv() to correctly reconstruct an RPC message
>    received via RPC/RDMA.
> 
> Reported-by: Trond Myklebust <trondmy@hammerspace.com>
> Link:
> https://lore.kernel.org/linux-nfs/82662b7190f26fb304eb0ab1bb04279072439d4e.camel@hammerspace.com/
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> Cc: <stable@vger.kernel.org>
> ---
>  include/linux/sunrpc/svc.h              |    1 +
>  net/sunrpc/svc_xprt.c                   |    3 +++
>  net/sunrpc/xprtrdma/svc_rdma_recvfrom.c |    2 +-
>  3 files changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
> index a5dda4987e8b..217711fc9cac 100644
> --- a/include/linux/sunrpc/svc.h
> +++ b/include/linux/sunrpc/svc.h
> @@ -395,6 +395,7 @@ struct svc_deferred_req {
>         size_t                  addrlen;
>         struct sockaddr_storage daddr;  /* where reply must come from
> */
>         size_t                  daddrlen;
> +       void                    *xprt_ctxt;
>         struct cache_deferred_req handle;
>         size_t                  xprt_hlen;
>         int                     argslen;
> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
> index 0c117d3bfda8..b42cfffa7395 100644
> --- a/net/sunrpc/svc_xprt.c
> +++ b/net/sunrpc/svc_xprt.c
> @@ -1231,6 +1231,8 @@ static struct cache_deferred_req
> *svc_defer(struct cache_req *req)
>                 dr->daddr = rqstp->rq_daddr;
>                 dr->argslen = rqstp->rq_arg.len >> 2;
>                 dr->xprt_hlen = rqstp->rq_xprt_hlen;
> +               dr->xprt_ctxt = rqstp->rq_xprt_ctxt;
> +               rqstp->rq_xprt_ctxt = NULL;
>  
>                 /* back up head to the start of the buffer and copy
> */
>                 skip = rqstp->rq_arg.len - rqstp-
> >rq_arg.head[0].iov_len;
> @@ -1269,6 +1271,7 @@ static noinline int svc_deferred_recv(struct
> svc_rqst *rqstp)
>         rqstp->rq_xprt_hlen   = dr->xprt_hlen;
>         rqstp->rq_daddr       = dr->daddr;
>         rqstp->rq_respages    = rqstp->rq_pages;
> +       rqstp->rq_xprt_ctxt   = dr->xprt_ctxt;
>         svc_xprt_received(rqstp->rq_xprt);
>         return (dr->argslen<<2) - dr->xprt_hlen;
>  }
> diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> index cf76a6ad127b..864131a9fc6e 100644
> --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> @@ -831,7 +831,7 @@ int svc_rdma_recvfrom(struct svc_rqst *rqstp)
>                 goto out_err;
>         if (ret == 0)
>                 goto out_drop;
> -       rqstp->rq_xprt_hlen = ret;
> +       rqstp->rq_xprt_hlen = 0;
>  
>         if (svc_rdma_is_reverse_direction_reply(xprt, ctxt))
>                 goto out_backchannel;
> 
> 

Thanks for the patch, Chuck!

Given how there doesn't appear to be any possible alternative ways to
generate that particular stack trace, I'm fairly sure this will fix the
issue we saw, but I'll make sure we give it a thorough testing.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v3 1/2] SUNRPC: Fix NFSD's request deferral on RDMA transports
  2022-04-07 17:43 ` [PATCH v3 1/2] SUNRPC: Fix NFSD's request deferral on RDMA transports Chuck Lever
  2022-04-07 20:31   ` Trond Myklebust
@ 2023-05-08  0:20   ` NeilBrown
  2023-05-08  4:10     ` Chuck Lever III
  1 sibling, 1 reply; 6+ messages in thread
From: NeilBrown @ 2023-05-08  0:20 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-nfs, rostedt

On Fri, 08 Apr 2022, Chuck Lever wrote:
> Trond Myklebust reports an NFSD crash in svc_rdma_sendto(). Further
> investigation shows that the crash occurred while NFSD was handling
> a deferred request.
> 
> This patch addresses two inter-related issues that prevent request
> deferral from working correctly for RPC/RDMA requests:
> 
> 1. Prevent the crash by ensuring that the original
>    svc_rqst::rq_xprt_ctxt value is available when the request is
>    revisited. Otherwise svc_rdma_sendto() does not have a Receive
>    context available with which to construct its reply.
> 
> 2. Possibly since before commit 71641d99ce03 ("svcrdma: Properly
>    compute .len and .buflen for received RPC Calls"),
>    svc_rdma_recvfrom() did not include the transport header in the
>    returned xdr_buf. There should have been no need for svc_defer()
>    and friends to save and restore that header, as of that commit.
>    This issue is addressed in a backport-friendly way by simply
>    having svc_rdma_recvfrom() set rq_xprt_hlen to zero
>    unconditionally, just as svc_tcp_recvfrom() does. This enables
>    svc_deferred_recv() to correctly reconstruct an RPC message
>    received via RPC/RDMA.

I'm a bit late to this party but .... this patch is bad and I don't know
how best to fix it.
It is bad for two reasons.
1/ It can leak memory.  When a deferral happens, the context is moved
   into an svc_deferred_req.  There are a couple of places which assume
   that an svc_deferred_req can be freed with kfree(), such as
   svc_delete_xprt() and svc_revisit().  These will now leak the
   context.  This is the bit that is hard to fix.

2/ It can cause a double-free with UDP (and maybe rdma).
   When a request is deferred, the ctxt is moved to the dreq.
   When that request is revisited the ctxt is copied back into the rqst.
   If the rqst is again deferred then the old dreq is reused and,
   importantly, the rq_xprt_ctxt is not cleared.  So when the release
   function is called the ctxt is freed.
   When the request is revisited a second time that ctxt (now pointing
   to freed and possibly reused memory) is copied back into the rqst.
   When the request completes the ctxt is again freed - double free
   oops.

For UDP there is no value in saving the ctxt in the dreq - the content
of the ctxt, which is an skbuf, has been copied into the dreq.  So maybe
the UDB ctxt is a very different beast than the rdma ctxt and needs to
be handled completely differently.

We can fix the UDP double-free by always doing
		rqstp->rq_xprt_ctxt = NULL;
whether a new dreq was needed or not.  But that doesn't fix the leaking
of ctxts and is really just a band-aid.

Thoughts?
Do we need to preserve ALL of the svc_rdma_recv_ctxt for deferred
requests?  Could we just copy some bits into the dreq (allocate a bit
more space somehow) so that a simple kfree() is still enough?
Or do we need a free_ctxt() handler attached to the xprt?

Thanks,
NeilBrown




> 
> Reported-by: Trond Myklebust <trondmy@hammerspace.com>
> Link: https://lore.kernel.org/linux-nfs/82662b7190f26fb304eb0ab1bb04279072439d4e.camel@hammerspace.com/
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> Cc: <stable@vger.kernel.org>
> ---
>  include/linux/sunrpc/svc.h              |    1 +
>  net/sunrpc/svc_xprt.c                   |    3 +++
>  net/sunrpc/xprtrdma/svc_rdma_recvfrom.c |    2 +-
>  3 files changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
> index a5dda4987e8b..217711fc9cac 100644
> --- a/include/linux/sunrpc/svc.h
> +++ b/include/linux/sunrpc/svc.h
> @@ -395,6 +395,7 @@ struct svc_deferred_req {
>  	size_t			addrlen;
>  	struct sockaddr_storage	daddr;	/* where reply must come from */
>  	size_t			daddrlen;
> +	void			*xprt_ctxt;
>  	struct cache_deferred_req handle;
>  	size_t			xprt_hlen;
>  	int			argslen;
> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
> index 0c117d3bfda8..b42cfffa7395 100644
> --- a/net/sunrpc/svc_xprt.c
> +++ b/net/sunrpc/svc_xprt.c
> @@ -1231,6 +1231,8 @@ static struct cache_deferred_req *svc_defer(struct cache_req *req)
>  		dr->daddr = rqstp->rq_daddr;
>  		dr->argslen = rqstp->rq_arg.len >> 2;
>  		dr->xprt_hlen = rqstp->rq_xprt_hlen;
> +		dr->xprt_ctxt = rqstp->rq_xprt_ctxt;
> +		rqstp->rq_xprt_ctxt = NULL;
>  
>  		/* back up head to the start of the buffer and copy */
>  		skip = rqstp->rq_arg.len - rqstp->rq_arg.head[0].iov_len;
> @@ -1269,6 +1271,7 @@ static noinline int svc_deferred_recv(struct svc_rqst *rqstp)
>  	rqstp->rq_xprt_hlen   = dr->xprt_hlen;
>  	rqstp->rq_daddr       = dr->daddr;
>  	rqstp->rq_respages    = rqstp->rq_pages;
> +	rqstp->rq_xprt_ctxt   = dr->xprt_ctxt;
>  	svc_xprt_received(rqstp->rq_xprt);
>  	return (dr->argslen<<2) - dr->xprt_hlen;
>  }
> diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> index cf76a6ad127b..864131a9fc6e 100644
> --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> @@ -831,7 +831,7 @@ int svc_rdma_recvfrom(struct svc_rqst *rqstp)
>  		goto out_err;
>  	if (ret == 0)
>  		goto out_drop;
> -	rqstp->rq_xprt_hlen = ret;
> +	rqstp->rq_xprt_hlen = 0;
>  
>  	if (svc_rdma_is_reverse_direction_reply(xprt, ctxt))
>  		goto out_backchannel;
> 
> 
> 


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v3 1/2] SUNRPC: Fix NFSD's request deferral on RDMA transports
  2023-05-08  0:20   ` NeilBrown
@ 2023-05-08  4:10     ` Chuck Lever III
  0 siblings, 0 replies; 6+ messages in thread
From: Chuck Lever III @ 2023-05-08  4:10 UTC (permalink / raw)
  To: Neil Brown; +Cc: Linux NFS Mailing List, rostedt



> On May 7, 2023, at 5:20 PM, NeilBrown <neilb@suse.de> wrote:
> 
> On Fri, 08 Apr 2022, Chuck Lever wrote:
>> Trond Myklebust reports an NFSD crash in svc_rdma_sendto(). Further
>> investigation shows that the crash occurred while NFSD was handling
>> a deferred request.
>> 
>> This patch addresses two inter-related issues that prevent request
>> deferral from working correctly for RPC/RDMA requests:
>> 
>> 1. Prevent the crash by ensuring that the original
>>   svc_rqst::rq_xprt_ctxt value is available when the request is
>>   revisited. Otherwise svc_rdma_sendto() does not have a Receive
>>   context available with which to construct its reply.
>> 
>> 2. Possibly since before commit 71641d99ce03 ("svcrdma: Properly
>>   compute .len and .buflen for received RPC Calls"),
>>   svc_rdma_recvfrom() did not include the transport header in the
>>   returned xdr_buf. There should have been no need for svc_defer()
>>   and friends to save and restore that header, as of that commit.
>>   This issue is addressed in a backport-friendly way by simply
>>   having svc_rdma_recvfrom() set rq_xprt_hlen to zero
>>   unconditionally, just as svc_tcp_recvfrom() does. This enables
>>   svc_deferred_recv() to correctly reconstruct an RPC message
>>   received via RPC/RDMA.
> 
> I'm a bit late to this party but .... this patch is bad and I don't know
> how best to fix it.
> It is bad for two reasons.
> 1/ It can leak memory.  When a deferral happens, the context is moved
>   into an svc_deferred_req.  There are a couple of places which assume
>   that an svc_deferred_req can be freed with kfree(), such as
>   svc_delete_xprt() and svc_revisit().  These will now leak the
>   context.  This is the bit that is hard to fix.
> 
> 2/ It can cause a double-free with UDP (and maybe rdma).
>   When a request is deferred, the ctxt is moved to the dreq.
>   When that request is revisited the ctxt is copied back into the rqst.
>   If the rqst is again deferred then the old dreq is reused and,
>   importantly, the rq_xprt_ctxt is not cleared.  So when the release
>   function is called the ctxt is freed.
>   When the request is revisited a second time that ctxt (now pointing
>   to freed and possibly reused memory) is copied back into the rqst.
>   When the request completes the ctxt is again freed - double free
>   oops.
> 
> For UDP there is no value in saving the ctxt in the dreq - the content
> of the ctxt, which is an skbuf, has been copied into the dreq.  So maybe
> the UDB ctxt is a very different beast than the rdma ctxt and needs to
> be handled completely differently.
> 
> We can fix the UDP double-free by always doing
> rqstp->rq_xprt_ctxt = NULL;
> whether a new dreq was needed or not.  But that doesn't fix the leaking
> of ctxts and is really just a band-aid.

A double-free is potentially catastrophic, so I would
say this is a reasonable change that can be back-ported
without fuss (while noting that more is needed).

The RDMA recv_ctxt is persistent for the life of the
connection. Releasing one of those just puts it back on
a free list. So, maybe the second free could cause free
list corruption?


> Thoughts?
> Do we need to preserve ALL of the svc_rdma_recv_ctxt for deferred
> requests?  Could we just copy some bits into the dreq (allocate a bit
> more space somehow) so that a simple kfree() is still enough?
> Or do we need a free_ctxt() handler attached to the xprt?

While I take some time to review code and options, did
you know there is a deferral fault injector that might
possibly help you sort through some of this? I doubt I
took the time to try forcing a second deferral of the
same request.


> Thanks,
> NeilBrown
> 
> 
> 
> 
>> 
>> Reported-by: Trond Myklebust <trondmy@hammerspace.com>
>> Link: https://lore.kernel.org/linux-nfs/82662b7190f26fb304eb0ab1bb04279072439d4e.camel@hammerspace.com/
>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>> Cc: <stable@vger.kernel.org>
>> ---
>> include/linux/sunrpc/svc.h              |    1 +
>> net/sunrpc/svc_xprt.c                   |    3 +++
>> net/sunrpc/xprtrdma/svc_rdma_recvfrom.c |    2 +-
>> 3 files changed, 5 insertions(+), 1 deletion(-)
>> 
>> diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
>> index a5dda4987e8b..217711fc9cac 100644
>> --- a/include/linux/sunrpc/svc.h
>> +++ b/include/linux/sunrpc/svc.h
>> @@ -395,6 +395,7 @@ struct svc_deferred_req {
>> size_t addrlen;
>> struct sockaddr_storage daddr; /* where reply must come from */
>> size_t daddrlen;
>> + void *xprt_ctxt;
>> struct cache_deferred_req handle;
>> size_t xprt_hlen;
>> int argslen;
>> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
>> index 0c117d3bfda8..b42cfffa7395 100644
>> --- a/net/sunrpc/svc_xprt.c
>> +++ b/net/sunrpc/svc_xprt.c
>> @@ -1231,6 +1231,8 @@ static struct cache_deferred_req *svc_defer(struct cache_req *req)
>> dr->daddr = rqstp->rq_daddr;
>> dr->argslen = rqstp->rq_arg.len >> 2;
>> dr->xprt_hlen = rqstp->rq_xprt_hlen;
>> + dr->xprt_ctxt = rqstp->rq_xprt_ctxt;
>> + rqstp->rq_xprt_ctxt = NULL;
>> 
>> /* back up head to the start of the buffer and copy */
>> skip = rqstp->rq_arg.len - rqstp->rq_arg.head[0].iov_len;
>> @@ -1269,6 +1271,7 @@ static noinline int svc_deferred_recv(struct svc_rqst *rqstp)
>> rqstp->rq_xprt_hlen   = dr->xprt_hlen;
>> rqstp->rq_daddr       = dr->daddr;
>> rqstp->rq_respages    = rqstp->rq_pages;
>> + rqstp->rq_xprt_ctxt   = dr->xprt_ctxt;
>> svc_xprt_received(rqstp->rq_xprt);
>> return (dr->argslen<<2) - dr->xprt_hlen;
>> }
>> diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>> index cf76a6ad127b..864131a9fc6e 100644
>> --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>> +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>> @@ -831,7 +831,7 @@ int svc_rdma_recvfrom(struct svc_rqst *rqstp)
>> goto out_err;
>> if (ret == 0)
>> goto out_drop;
>> - rqstp->rq_xprt_hlen = ret;
>> + rqstp->rq_xprt_hlen = 0;
>> 
>> if (svc_rdma_is_reverse_direction_reply(xprt, ctxt))
>> goto out_backchannel;


--
Chuck Lever



^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2023-05-08  4:10 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-07 17:43 [PATCH v3 0/2] Fix request deferral for NFS/RDMA Chuck Lever
2022-04-07 17:43 ` [PATCH v3 1/2] SUNRPC: Fix NFSD's request deferral on RDMA transports Chuck Lever
2022-04-07 20:31   ` Trond Myklebust
2023-05-08  0:20   ` NeilBrown
2023-05-08  4:10     ` Chuck Lever III
2022-04-07 17:43 ` [PATCH v3 2/2] SUNRPC: Fix the svc_deferred_event trace class Chuck Lever

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).