All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v1 0/3] Handle NFSv4 operations in xdr_buf tail
@ 2017-08-18 15:12 Chuck Lever
  2017-08-18 15:12 ` [PATCH v1 1/3] nfsd: Limit end of page list when decoding NFSv4 WRITE Chuck Lever
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Chuck Lever @ 2017-08-18 15:12 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs

Hi Bruce-

A little light weekend reading. I got around to implementing
support in the NFSv4 WRITE decoder for dealing with trailing
operations that reside in the receiver buffer's tail iovec.
Let me know what you think of this solution.

---

Chuck Lever (3):
      nfsd: Limit end of page list when decoding NFSv4 WRITE
      nfsd: Incoming xdr_bufs may have content in tail buffer
      svcrdma: Populate tail iovec when receiving


 fs/nfsd/nfs4xdr.c                 |   26 ++++++++-
 fs/nfsd/xdr4.h                    |    1 
 net/sunrpc/xprtrdma/svc_rdma_rw.c |  108 +++++--------------------------------
 3 files changed, 38 insertions(+), 97 deletions(-)

--
Chuck Lever

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v1 1/3] nfsd: Limit end of page list when decoding NFSv4 WRITE
  2017-08-18 15:12 [PATCH v1 0/3] Handle NFSv4 operations in xdr_buf tail Chuck Lever
@ 2017-08-18 15:12 ` Chuck Lever
  2017-08-21 21:13   ` J. Bruce Fields
  2017-08-18 15:12 ` [PATCH v1 2/3] nfsd: Incoming xdr_bufs may have content in tail buffer Chuck Lever
  2017-08-18 15:12 ` [PATCH v1 3/3] svcrdma: Populate tail iovec when receiving Chuck Lever
  2 siblings, 1 reply; 13+ messages in thread
From: Chuck Lever @ 2017-08-18 15:12 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs

When processing an NFSv4 WRITE operation, argp->end should never
point past the end of the data in the final page of the page list.
Otherwise, nfsd4_decode_compound can walk into uninitialized memory.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/nfs4xdr.c |    6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 51e729a..7c48d68 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -144,7 +144,7 @@ static void next_decode_page(struct nfsd4_compoundargs *argp)
 	argp->p = page_address(argp->pagelist[0]);
 	argp->pagelist++;
 	if (argp->pagelen < PAGE_SIZE) {
-		argp->end = argp->p + (argp->pagelen>>2);
+		argp->end = argp->p + XDR_QUADLEN(argp->pagelen);
 		argp->pagelen = 0;
 	} else {
 		argp->end = argp->p + (PAGE_SIZE>>2);
@@ -1279,9 +1279,7 @@ static __be32 nfsd4_decode_opaque(struct nfsd4_compoundargs *argp, struct xdr_ne
 		argp->pagelen -= pages * PAGE_SIZE;
 		len -= pages * PAGE_SIZE;
 
-		argp->p = (__be32 *)page_address(argp->pagelist[0]);
-		argp->pagelist++;
-		argp->end = argp->p + XDR_QUADLEN(PAGE_SIZE);
+		next_decode_page(argp);
 	}
 	argp->p += XDR_QUADLEN(len);
 


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v1 2/3] nfsd: Incoming xdr_bufs may have content in tail buffer
  2017-08-18 15:12 [PATCH v1 0/3] Handle NFSv4 operations in xdr_buf tail Chuck Lever
  2017-08-18 15:12 ` [PATCH v1 1/3] nfsd: Limit end of page list when decoding NFSv4 WRITE Chuck Lever
@ 2017-08-18 15:12 ` Chuck Lever
  2017-08-25 17:46   ` J. Bruce Fields
  2017-08-18 15:12 ` [PATCH v1 3/3] svcrdma: Populate tail iovec when receiving Chuck Lever
  2 siblings, 1 reply; 13+ messages in thread
From: Chuck Lever @ 2017-08-18 15:12 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs

Since the beginning, svcsock has built a received RPC Call message
by populating the xdr_buf's head, then placing the remaining
message bytes in the xdr_buf's page list. The xdr_buf's tail is
never populated.

This means that an NFSv4 COMPOUND containing an NFS WRITE operation
plus trailing operations has a page list that contains the WRITE
data payload followed by the trailing operations. NFSv4 XDR decoders
will not look in the xdr_buf's tail, ever, because svcsock never put
anything there.

To support transports that can pass the write payload in the
xdr_buf's pagelist and trailing content in the xdr_buf's tail,
introduce logic in READ_BUF that switches to the xdr_buf's tail vec
when the decoder runs out of content in rq_arg.pages.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/nfs4xdr.c |   20 ++++++++++++++++++++
 fs/nfsd/xdr4.h    |    1 +
 2 files changed, 21 insertions(+)

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 7c48d68..a9f88cf 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -159,6 +159,25 @@ static __be32 *read_buf(struct nfsd4_compoundargs *argp, u32 nbytes)
 	 */
 	unsigned int avail = (char *)argp->end - (char *)argp->p;
 	__be32 *p;
+
+	if (argp->pagelen == 0) {
+		struct kvec *vec = &argp->rqstp->rq_arg.tail[0];
+
+		if (!argp->tail) {
+			argp->tail = true;
+			avail = vec->iov_len;
+			argp->p = vec->iov_base;
+			argp->end = vec->iov_base + avail;
+		}
+
+		if (avail < nbytes)
+			return NULL;
+
+		p = argp->p;
+		argp->p += XDR_QUADLEN(nbytes);
+		return p;
+	}
+
 	if (avail + argp->pagelen < nbytes)
 		return NULL;
 	if (avail + PAGE_SIZE < nbytes) /* need more than a page !! */
@@ -4573,6 +4592,7 @@ void nfsd4_release_compoundargs(struct svc_rqst *rqstp)
 	args->end = rqstp->rq_arg.head[0].iov_base + rqstp->rq_arg.head[0].iov_len;
 	args->pagelist = rqstp->rq_arg.pages;
 	args->pagelen = rqstp->rq_arg.page_len;
+	args->tail = false;
 	args->tmpp = NULL;
 	args->to_free = NULL;
 	args->ops = args->iops;
diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
index 72c6ad1..bac91b1 100644
--- a/fs/nfsd/xdr4.h
+++ b/fs/nfsd/xdr4.h
@@ -614,6 +614,7 @@ struct nfsd4_compoundargs {
 	__be32 *			end;
 	struct page **			pagelist;
 	int				pagelen;
+	bool				tail;
 	__be32				tmp[8];
 	__be32 *			tmpp;
 	struct svcxdr_tmpbuf		*to_free;


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v1 3/3] svcrdma: Populate tail iovec when receiving
  2017-08-18 15:12 [PATCH v1 0/3] Handle NFSv4 operations in xdr_buf tail Chuck Lever
  2017-08-18 15:12 ` [PATCH v1 1/3] nfsd: Limit end of page list when decoding NFSv4 WRITE Chuck Lever
  2017-08-18 15:12 ` [PATCH v1 2/3] nfsd: Incoming xdr_bufs may have content in tail buffer Chuck Lever
@ 2017-08-18 15:12 ` Chuck Lever
  2 siblings, 0 replies; 13+ messages in thread
From: Chuck Lever @ 2017-08-18 15:12 UTC (permalink / raw)
  To: bfields; +Cc: linux-nfs

So that NFS WRITE payloads can eventually be placed directly into a
file's page cache, enable the RPC-over-RDMA transport to present
these payloads in the xdr_buf's page list, while placing trailing
content (such as a GETATTR operation) in the xdr_buf's tail.

After this change, the RPC-over-RDMA's "copy tail" hack, added by
commit a97c331f9aa9 ("svcrdma: Handle additional inline content"),
is no longer needed and can be removed.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 net/sunrpc/xprtrdma/svc_rdma_rw.c |  108 +++++--------------------------------
 1 file changed, 15 insertions(+), 93 deletions(-)

diff --git a/net/sunrpc/xprtrdma/svc_rdma_rw.c b/net/sunrpc/xprtrdma/svc_rdma_rw.c
index 1f34fae..7dcda45 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_rw.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_rw.c
@@ -691,78 +691,6 @@ static int svc_rdma_build_read_chunk(struct svc_rqst *rqstp,
 	return ret;
 }
 
-/* If there is inline content following the Read chunk, append it to
- * the page list immediately following the data payload. This has to
- * be done after the reader function has determined how many pages
- * were consumed for RDMA Read.
- *
- * On entry, ri_pageno and ri_pageoff point directly to the end of the
- * page list. On exit, both have been updated to the new "next byte".
- *
- * Assumptions:
- *	- Inline content fits entirely in rq_pages[0]
- *	- Trailing content is only a handful of bytes
- */
-static int svc_rdma_copy_tail(struct svc_rqst *rqstp,
-			      struct svc_rdma_read_info *info)
-{
-	struct svc_rdma_op_ctxt *head = info->ri_readctxt;
-	unsigned int tail_length, remaining;
-	u8 *srcp, *destp;
-
-	/* Assert that all inline content fits in page 0. This is an
-	 * implementation limit, not a protocol limit.
-	 */
-	if (head->arg.head[0].iov_len > PAGE_SIZE) {
-		pr_warn_once("svcrdma: too much trailing inline content\n");
-		return -EINVAL;
-	}
-
-	srcp = head->arg.head[0].iov_base;
-	srcp += info->ri_position;
-	tail_length = head->arg.head[0].iov_len - info->ri_position;
-	remaining = tail_length;
-
-	/* If there is room on the last page in the page list, try to
-	 * fit the trailing content there.
-	 */
-	if (info->ri_pageoff > 0) {
-		unsigned int len;
-
-		len = min_t(unsigned int, remaining,
-			    PAGE_SIZE - info->ri_pageoff);
-		destp = page_address(rqstp->rq_pages[info->ri_pageno]);
-		destp += info->ri_pageoff;
-
-		memcpy(destp, srcp, len);
-		srcp += len;
-		destp += len;
-		info->ri_pageoff += len;
-		remaining -= len;
-
-		if (info->ri_pageoff == PAGE_SIZE) {
-			info->ri_pageno++;
-			info->ri_pageoff = 0;
-		}
-	}
-
-	/* Otherwise, a fresh page is needed. */
-	if (remaining) {
-		head->arg.pages[info->ri_pageno] =
-				rqstp->rq_pages[info->ri_pageno];
-		head->count++;
-
-		destp = page_address(rqstp->rq_pages[info->ri_pageno]);
-		memcpy(destp, srcp, remaining);
-		info->ri_pageoff += remaining;
-	}
-
-	head->arg.page_len += tail_length;
-	head->arg.len += tail_length;
-	head->arg.buflen += tail_length;
-	return 0;
-}
-
 /* Construct RDMA Reads to pull over a normal Read chunk. The chunk
  * data lands in the page list of head->arg.pages.
  *
@@ -787,34 +715,28 @@ static int svc_rdma_build_normal_read_chunk(struct svc_rqst *rqstp,
 	if (ret < 0)
 		goto out;
 
-	/* Read chunk may need XDR round-up (see RFC 5666, s. 3.7).
+	/* Split the Receive buffer between the head and tail
+	 * buffers at Read chunk's position. XDR roundup of the
+	 * chunk is not included in either the pagelist or in
+	 * the tail.
 	 */
-	if (info->ri_chunklen & 3) {
-		u32 padlen = 4 - (info->ri_chunklen & 3);
-
-		info->ri_chunklen += padlen;
+	head->arg.tail[0].iov_base =
+		head->arg.head[0].iov_base + info->ri_position;
+	head->arg.tail[0].iov_len =
+		head->arg.head[0].iov_len - info->ri_position;
+	head->arg.head[0].iov_len = info->ri_position;
 
-		/* NB: data payload always starts on XDR alignment,
-		 * thus the pad can never contain a page boundary.
-		 */
-		info->ri_pageoff += padlen;
-		if (info->ri_pageoff == PAGE_SIZE) {
-			info->ri_pageno++;
-			info->ri_pageoff = 0;
-		}
-	}
+	/* Read chunk may need XDR roundup (see RFC 5666, s. 3.7).
+	 *
+	 * NFSv2/3 write decoders need the length of the tail to
+	 * contain the size of the roundup padding.
+	 */
+	head->arg.tail[0].iov_len += 4 - (info->ri_chunklen & 3);
 
 	head->arg.page_len = info->ri_chunklen;
 	head->arg.len += info->ri_chunklen;
 	head->arg.buflen += info->ri_chunklen;
 
-	if (info->ri_position < head->arg.head[0].iov_len) {
-		ret = svc_rdma_copy_tail(rqstp, info);
-		if (ret < 0)
-			goto out;
-	}
-	head->arg.head[0].iov_len = info->ri_position;
-
 out:
 	return ret;
 }


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v1 1/3] nfsd: Limit end of page list when decoding NFSv4 WRITE
  2017-08-18 15:12 ` [PATCH v1 1/3] nfsd: Limit end of page list when decoding NFSv4 WRITE Chuck Lever
@ 2017-08-21 21:13   ` J. Bruce Fields
  2017-08-21 21:15     ` Chuck Lever
  0 siblings, 1 reply; 13+ messages in thread
From: J. Bruce Fields @ 2017-08-21 21:13 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-nfs

On Fri, Aug 18, 2017 at 11:12:19AM -0400, Chuck Lever wrote:
> When processing an NFSv4 WRITE operation, argp->end should never
> point past the end of the data in the final page of the page list.
> Otherwise, nfsd4_decode_compound can walk into uninitialized memory.
> 
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>  fs/nfsd/nfs4xdr.c |    6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> index 51e729a..7c48d68 100644
> --- a/fs/nfsd/nfs4xdr.c
> +++ b/fs/nfsd/nfs4xdr.c
> @@ -144,7 +144,7 @@ static void next_decode_page(struct nfsd4_compoundargs *argp)
>  	argp->p = page_address(argp->pagelist[0]);
>  	argp->pagelist++;
>  	if (argp->pagelen < PAGE_SIZE) {
> -		argp->end = argp->p + (argp->pagelen>>2);
> +		argp->end = argp->p + XDR_QUADLEN(argp->pagelen);
>  		argp->pagelen = 0;
>  	} else {
>  		argp->end = argp->p + (PAGE_SIZE>>2);
> @@ -1279,9 +1279,7 @@ static __be32 nfsd4_decode_opaque(struct nfsd4_compoundargs *argp, struct xdr_ne
>  		argp->pagelen -= pages * PAGE_SIZE;
>  		len -= pages * PAGE_SIZE;
>  
> -		argp->p = (__be32 *)page_address(argp->pagelist[0]);
> -		argp->pagelist++;
> -		argp->end = argp->p + XDR_QUADLEN(PAGE_SIZE);
> +		next_decode_page(argp);

I think there's no change in behavior here *except* for adding a new
argp->pagelen=0 (or argp->pagelen -= PAGE_SIZE).

--b.

>  	}
>  	argp->p += XDR_QUADLEN(len);
>  

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v1 1/3] nfsd: Limit end of page list when decoding NFSv4 WRITE
  2017-08-21 21:13   ` J. Bruce Fields
@ 2017-08-21 21:15     ` Chuck Lever
  2017-08-21 21:21       ` J. Bruce Fields
  0 siblings, 1 reply; 13+ messages in thread
From: Chuck Lever @ 2017-08-21 21:15 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Linux NFS Mailing List


> On Aug 21, 2017, at 5:13 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> 
> On Fri, Aug 18, 2017 at 11:12:19AM -0400, Chuck Lever wrote:
>> When processing an NFSv4 WRITE operation, argp->end should never
>> point past the end of the data in the final page of the page list.
>> Otherwise, nfsd4_decode_compound can walk into uninitialized memory.
>> 
>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>> ---
>> fs/nfsd/nfs4xdr.c |    6 ++----
>> 1 file changed, 2 insertions(+), 4 deletions(-)
>> 
>> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
>> index 51e729a..7c48d68 100644
>> --- a/fs/nfsd/nfs4xdr.c
>> +++ b/fs/nfsd/nfs4xdr.c
>> @@ -144,7 +144,7 @@ static void next_decode_page(struct nfsd4_compoundargs *argp)
>> 	argp->p = page_address(argp->pagelist[0]);
>> 	argp->pagelist++;
>> 	if (argp->pagelen < PAGE_SIZE) {
>> -		argp->end = argp->p + (argp->pagelen>>2);
>> +		argp->end = argp->p + XDR_QUADLEN(argp->pagelen);
>> 		argp->pagelen = 0;
>> 	} else {
>> 		argp->end = argp->p + (PAGE_SIZE>>2);
>> @@ -1279,9 +1279,7 @@ static __be32 nfsd4_decode_opaque(struct nfsd4_compoundargs *argp, struct xdr_ne
>> 		argp->pagelen -= pages * PAGE_SIZE;
>> 		len -= pages * PAGE_SIZE;
>> 
>> -		argp->p = (__be32 *)page_address(argp->pagelist[0]);
>> -		argp->pagelist++;
>> -		argp->end = argp->p + XDR_QUADLEN(PAGE_SIZE);
>> +		next_decode_page(argp);
> 
> I think there's no change in behavior here *except* for adding a new
> argp->pagelen=0 (or argp->pagelen -= PAGE_SIZE).

The code around this change is currently working correctly,
so there is no change in behavior AFAICT. This is a defensive
change, but it also replaces duplicate code.


> --b.
> 
>> 	}
>> 	argp->p += XDR_QUADLEN(len);
>> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
Chuck Lever




^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v1 1/3] nfsd: Limit end of page list when decoding NFSv4 WRITE
  2017-08-21 21:15     ` Chuck Lever
@ 2017-08-21 21:21       ` J. Bruce Fields
  2017-08-21 22:08         ` Chuck Lever
  0 siblings, 1 reply; 13+ messages in thread
From: J. Bruce Fields @ 2017-08-21 21:21 UTC (permalink / raw)
  To: Chuck Lever; +Cc: Linux NFS Mailing List

On Mon, Aug 21, 2017 at 05:15:38PM -0400, Chuck Lever wrote:
> 
> > On Aug 21, 2017, at 5:13 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> > 
> > On Fri, Aug 18, 2017 at 11:12:19AM -0400, Chuck Lever wrote:
> >> When processing an NFSv4 WRITE operation, argp->end should never
> >> point past the end of the data in the final page of the page list.
> >> Otherwise, nfsd4_decode_compound can walk into uninitialized memory.
> >> 
> >> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> >> ---
> >> fs/nfsd/nfs4xdr.c |    6 ++----
> >> 1 file changed, 2 insertions(+), 4 deletions(-)
> >> 
> >> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> >> index 51e729a..7c48d68 100644
> >> --- a/fs/nfsd/nfs4xdr.c
> >> +++ b/fs/nfsd/nfs4xdr.c
> >> @@ -144,7 +144,7 @@ static void next_decode_page(struct nfsd4_compoundargs *argp)
> >> 	argp->p = page_address(argp->pagelist[0]);
> >> 	argp->pagelist++;
> >> 	if (argp->pagelen < PAGE_SIZE) {
> >> -		argp->end = argp->p + (argp->pagelen>>2);
> >> +		argp->end = argp->p + XDR_QUADLEN(argp->pagelen);
> >> 		argp->pagelen = 0;
> >> 	} else {
> >> 		argp->end = argp->p + (PAGE_SIZE>>2);
> >> @@ -1279,9 +1279,7 @@ static __be32 nfsd4_decode_opaque(struct nfsd4_compoundargs *argp, struct xdr_ne
> >> 		argp->pagelen -= pages * PAGE_SIZE;
> >> 		len -= pages * PAGE_SIZE;
> >> 
> >> -		argp->p = (__be32 *)page_address(argp->pagelist[0]);
> >> -		argp->pagelist++;
> >> -		argp->end = argp->p + XDR_QUADLEN(PAGE_SIZE);
> >> +		next_decode_page(argp);
> > 
> > I think there's no change in behavior here *except* for adding a new
> > argp->pagelen=0 (or argp->pagelen -= PAGE_SIZE).
> 
> The code around this change is currently working correctly,
> so there is no change in behavior AFAICT. This is a defensive
> change, but it also replaces duplicate code.

I don't understand.  I'm saying that by calling next_decode_page() there
you've added a new argp->pagelen assignment.  I don't understand how
that can't change behavior, unless there's another bug in our bounds
checking someplace.

Most likely it could cause subsequent op parsers to believe there's less
space in the argument buffer than there really is, so it might fail to
parse a compound with a write plus some other ops, if that puts the
total call close to the maximum size?

--b.

> 
> 
> > --b.
> > 
> >> 	}
> >> 	argp->p += XDR_QUADLEN(len);
> >> 
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> --
> Chuck Lever
> 
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v1 1/3] nfsd: Limit end of page list when decoding NFSv4 WRITE
  2017-08-21 21:21       ` J. Bruce Fields
@ 2017-08-21 22:08         ` Chuck Lever
  2017-08-22 21:45           ` J. Bruce Fields
  0 siblings, 1 reply; 13+ messages in thread
From: Chuck Lever @ 2017-08-21 22:08 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Linux NFS Mailing List


> On Aug 21, 2017, at 5:21 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> 
> On Mon, Aug 21, 2017 at 05:15:38PM -0400, Chuck Lever wrote:
>> 
>>> On Aug 21, 2017, at 5:13 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
>>> 
>>> On Fri, Aug 18, 2017 at 11:12:19AM -0400, Chuck Lever wrote:
>>>> When processing an NFSv4 WRITE operation, argp->end should never
>>>> point past the end of the data in the final page of the page list.
>>>> Otherwise, nfsd4_decode_compound can walk into uninitialized memory.
>>>> 
>>>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>>>> ---
>>>> fs/nfsd/nfs4xdr.c |    6 ++----
>>>> 1 file changed, 2 insertions(+), 4 deletions(-)
>>>> 
>>>> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
>>>> index 51e729a..7c48d68 100644
>>>> --- a/fs/nfsd/nfs4xdr.c
>>>> +++ b/fs/nfsd/nfs4xdr.c
>>>> @@ -144,7 +144,7 @@ static void next_decode_page(struct nfsd4_compoundargs *argp)
>>>> 	argp->p = page_address(argp->pagelist[0]);
>>>> 	argp->pagelist++;
>>>> 	if (argp->pagelen < PAGE_SIZE) {
>>>> -		argp->end = argp->p + (argp->pagelen>>2);
>>>> +		argp->end = argp->p + XDR_QUADLEN(argp->pagelen);

                     ^^^^^^^^^^^^^^ A

>>>> 		argp->pagelen = 0;
>>>> 	} else {
>>>> 		argp->end = argp->p + (PAGE_SIZE>>2);
>>>> @@ -1279,9 +1279,7 @@ static __be32 nfsd4_decode_opaque(struct nfsd4_compoundargs *argp, struct xdr_ne
>>>> 		argp->pagelen -= pages * PAGE_SIZE;
>>>> 		len -= pages * PAGE_SIZE;
>>>> 
>>>> -		argp->p = (__be32 *)page_address(argp->pagelist[0]);
>>>> -		argp->pagelist++;
>>>> -		argp->end = argp->p + XDR_QUADLEN(PAGE_SIZE);

                     ^^^^^^^^^^^^^^ B

>>>> +		next_decode_page(argp);
>>> 
>>> I think there's no change in behavior here *except* for adding a new
>>> argp->pagelen=0 (or argp->pagelen -= PAGE_SIZE).
>> 
>> The code around this change is currently working correctly,
>> so there is no change in behavior AFAICT. This is a defensive
>> change, but it also replaces duplicate code.
> 
> I don't understand.  I'm saying that by calling next_decode_page() there
> you've added a new argp->pagelen assignment.  I don't understand how
> that can't change behavior, unless there's another bug in our bounds
> checking someplace.

Because of line B above, argp->end always points to the
end of the final page in the page list. However, the
buffer might end somewhere in the middle of that page,
in which case, the transport hasn't initialized any of
the bytes between the end of the buffer and the end of
the page.

As long as the other fields in the xdr_buf are set up
properly, the XDR decoder will not walk into that uninit-
ialized section of the last page. But there's nothing
preventing a decoder or transport bug from causing it
to walk into the uninitialized area. And always setting
to the end of the page is confusing when the buffer
itself is actually shorter.

The key is to replace line B above with line A. argp->end
is advanced by the remaining part of the final page rather
than by a whole page.

The next patch uses this new behavior to signal precisely
when it has to move from the page list to the tail iovec.


> Most likely it could cause subsequent op parsers to believe there's less
> space in the argument buffer than there really is, so it might fail to
> parse a compound with a write plus some other ops, if that puts the
> total call close to the maximum size?

Where is argp->pagelen used after the final next_decode_page
call?


> --b.
> 
>> 
>> 
>>> --b.
>>> 
>>>> 	}
>>>> 	argp->p += XDR_QUADLEN(len);
>>>> 
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> 
>> --
>> Chuck Lever
>> 
>> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
Chuck Lever




^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v1 1/3] nfsd: Limit end of page list when decoding NFSv4 WRITE
  2017-08-21 22:08         ` Chuck Lever
@ 2017-08-22 21:45           ` J. Bruce Fields
  2017-08-23 18:36             ` Chuck Lever
  0 siblings, 1 reply; 13+ messages in thread
From: J. Bruce Fields @ 2017-08-22 21:45 UTC (permalink / raw)
  To: Chuck Lever; +Cc: Linux NFS Mailing List

On Mon, Aug 21, 2017 at 06:08:15PM -0400, Chuck Lever wrote:
> 
> > On Aug 21, 2017, at 5:21 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> > 
> > On Mon, Aug 21, 2017 at 05:15:38PM -0400, Chuck Lever wrote:
> >> 
> >>> On Aug 21, 2017, at 5:13 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> >>> 
> >>> On Fri, Aug 18, 2017 at 11:12:19AM -0400, Chuck Lever wrote:
> >>>> When processing an NFSv4 WRITE operation, argp->end should never
> >>>> point past the end of the data in the final page of the page list.
> >>>> Otherwise, nfsd4_decode_compound can walk into uninitialized memory.
> >>>> 
> >>>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> >>>> ---
> >>>> fs/nfsd/nfs4xdr.c |    6 ++----
> >>>> 1 file changed, 2 insertions(+), 4 deletions(-)
> >>>> 
> >>>> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> >>>> index 51e729a..7c48d68 100644
> >>>> --- a/fs/nfsd/nfs4xdr.c
> >>>> +++ b/fs/nfsd/nfs4xdr.c
> >>>> @@ -144,7 +144,7 @@ static void next_decode_page(struct nfsd4_compoundargs *argp)
> >>>> 	argp->p = page_address(argp->pagelist[0]);
> >>>> 	argp->pagelist++;
> >>>> 	if (argp->pagelen < PAGE_SIZE) {
> >>>> -		argp->end = argp->p + (argp->pagelen>>2);
> >>>> +		argp->end = argp->p + XDR_QUADLEN(argp->pagelen);
> 
>                      ^^^^^^^^^^^^^^ A
> 
> >>>> 		argp->pagelen = 0;
> >>>> 	} else {
> >>>> 		argp->end = argp->p + (PAGE_SIZE>>2);
> >>>> @@ -1279,9 +1279,7 @@ static __be32 nfsd4_decode_opaque(struct nfsd4_compoundargs *argp, struct xdr_ne
> >>>> 		argp->pagelen -= pages * PAGE_SIZE;
> >>>> 		len -= pages * PAGE_SIZE;
> >>>> 
> >>>> -		argp->p = (__be32 *)page_address(argp->pagelist[0]);
> >>>> -		argp->pagelist++;
> >>>> -		argp->end = argp->p + XDR_QUADLEN(PAGE_SIZE);
> 
>                      ^^^^^^^^^^^^^^ B
> 
> >>>> +		next_decode_page(argp);
> >>> 
> >>> I think there's no change in behavior here *except* for adding a new
> >>> argp->pagelen=0 (or argp->pagelen -= PAGE_SIZE).
> >> 
> >> The code around this change is currently working correctly,
> >> so there is no change in behavior AFAICT. This is a defensive
> >> change, but it also replaces duplicate code.
> > 
> > I don't understand.  I'm saying that by calling next_decode_page() there
> > you've added a new argp->pagelen assignment.  I don't understand how
> > that can't change behavior, unless there's another bug in our bounds
> > checking someplace.
> 
> Because of line B above, argp->end always points to the
> end of the final page in the page list. However, the
> buffer might end somewhere in the middle of that page,
> in which case, the transport hasn't initialized any of
> the bytes between the end of the buffer and the end of
> the page.
> 
> As long as the other fields in the xdr_buf are set up
> properly, the XDR decoder will not walk into that uninit-
> ialized section of the last page. But there's nothing
> preventing a decoder or transport bug from causing it
> to walk into the uninitialized area. And always setting
> to the end of the page is confusing when the buffer
> itself is actually shorter.
> 
> The key is to replace line B above with line A. argp->end
> is advanced by the remaining part of the final page rather
> than by a whole page.

Got it, I agree with that part of the change, it's the pagelen change I
was having trouble with.

But looking at it more, I think your patch is a fix and the current code
is wrong.

> The next patch uses this new behavior to signal precisely
> when it has to move from the page list to the tail iovec.
> 
> 
> > Most likely it could cause subsequent op parsers to believe there's less
> > space in the argument buffer than there really is, so it might fail to
> > parse a compound with a write plus some other ops, if that puts the
> > total call close to the maximum size?
> 
> Where is argp->pagelen used after the final next_decode_page
> call?

Well, it's checked in every read_buf and next_decode_page to decide how
much space is left.

It looks to me like the current code is wrong not to be decreasing
page_len at the end there.  So I wonder if there's a bug right now.
E.g. maybe a compound with multiple writes could leave the xdr decoding
thinking it has more space than it does and allow someone to write
unrelated memory to some file.

--b.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v1 1/3] nfsd: Limit end of page list when decoding NFSv4 WRITE
  2017-08-22 21:45           ` J. Bruce Fields
@ 2017-08-23 18:36             ` Chuck Lever
  2017-08-24  1:18               ` J. Bruce Fields
  0 siblings, 1 reply; 13+ messages in thread
From: Chuck Lever @ 2017-08-23 18:36 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Linux NFS Mailing List


> On Aug 22, 2017, at 5:45 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> 
> On Mon, Aug 21, 2017 at 06:08:15PM -0400, Chuck Lever wrote:
>> 
>>> On Aug 21, 2017, at 5:21 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
>>> 
>>> On Mon, Aug 21, 2017 at 05:15:38PM -0400, Chuck Lever wrote:
>>>> 
>>>>> On Aug 21, 2017, at 5:13 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
>>>>> 
>>>>> On Fri, Aug 18, 2017 at 11:12:19AM -0400, Chuck Lever wrote:
>>>>>> When processing an NFSv4 WRITE operation, argp->end should never
>>>>>> point past the end of the data in the final page of the page list.
>>>>>> Otherwise, nfsd4_decode_compound can walk into uninitialized memory.
>>>>>> 
>>>>>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>>>>>> ---
>>>>>> fs/nfsd/nfs4xdr.c |    6 ++----
>>>>>> 1 file changed, 2 insertions(+), 4 deletions(-)
>>>>>> 
>>>>>> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
>>>>>> index 51e729a..7c48d68 100644
>>>>>> --- a/fs/nfsd/nfs4xdr.c
>>>>>> +++ b/fs/nfsd/nfs4xdr.c
>>>>>> @@ -144,7 +144,7 @@ static void next_decode_page(struct nfsd4_compoundargs *argp)
>>>>>> 	argp->p = page_address(argp->pagelist[0]);
>>>>>> 	argp->pagelist++;
>>>>>> 	if (argp->pagelen < PAGE_SIZE) {
>>>>>> -		argp->end = argp->p + (argp->pagelen>>2);
>>>>>> +		argp->end = argp->p + XDR_QUADLEN(argp->pagelen);
>> 
>>                     ^^^^^^^^^^^^^^ A
>> 
>>>>>> 		argp->pagelen = 0;
>>>>>> 	} else {
>>>>>> 		argp->end = argp->p + (PAGE_SIZE>>2);
>>>>>> @@ -1279,9 +1279,7 @@ static __be32 nfsd4_decode_opaque(struct nfsd4_compoundargs *argp, struct xdr_ne
>>>>>> 		argp->pagelen -= pages * PAGE_SIZE;
>>>>>> 		len -= pages * PAGE_SIZE;
>>>>>> 
>>>>>> -		argp->p = (__be32 *)page_address(argp->pagelist[0]);
>>>>>> -		argp->pagelist++;
>>>>>> -		argp->end = argp->p + XDR_QUADLEN(PAGE_SIZE);
>> 
>>                     ^^^^^^^^^^^^^^ B
>> 
>>>>>> +		next_decode_page(argp);
>>>>> 
>>>>> I think there's no change in behavior here *except* for adding a new
>>>>> argp->pagelen=0 (or argp->pagelen -= PAGE_SIZE).
>>>> 
>>>> The code around this change is currently working correctly,
>>>> so there is no change in behavior AFAICT. This is a defensive
>>>> change, but it also replaces duplicate code.
>>> 
>>> I don't understand.  I'm saying that by calling next_decode_page() there
>>> you've added a new argp->pagelen assignment.  I don't understand how
>>> that can't change behavior, unless there's another bug in our bounds
>>> checking someplace.
>> 
>> Because of line B above, argp->end always points to the
>> end of the final page in the page list. However, the
>> buffer might end somewhere in the middle of that page,
>> in which case, the transport hasn't initialized any of
>> the bytes between the end of the buffer and the end of
>> the page.
>> 
>> As long as the other fields in the xdr_buf are set up
>> properly, the XDR decoder will not walk into that uninit-
>> ialized section of the last page. But there's nothing
>> preventing a decoder or transport bug from causing it
>> to walk into the uninitialized area. And always setting
>> to the end of the page is confusing when the buffer
>> itself is actually shorter.
>> 
>> The key is to replace line B above with line A. argp->end
>> is advanced by the remaining part of the final page rather
>> than by a whole page.
> 
> Got it, I agree with that part of the change, it's the pagelen change I
> was having trouble with.
> 
> But looking at it more, I think your patch is a fix and the current code
> is wrong.
> 
>> The next patch uses this new behavior to signal precisely
>> when it has to move from the page list to the tail iovec.
>> 
>> 
>>> Most likely it could cause subsequent op parsers to believe there's less
>>> space in the argument buffer than there really is, so it might fail to
>>> parse a compound with a write plus some other ops, if that puts the
>>> total call close to the maximum size?
>> 
>> Where is argp->pagelen used after the final next_decode_page
>> call?
> 
> Well, it's checked in every read_buf and next_decode_page to decide how
> much space is left.

Right, and I didn't change the pagelen adjustment that occurs
in the loop. Just the final adjustment should be different.


> It looks to me like the current code is wrong not to be decreasing
> page_len at the end there.  So I wonder if there's a bug right now.
> E.g. maybe a compound with multiple writes could leave the xdr decoding
> thinking it has more space than it does and allow someone to write
> unrelated memory to some file.

I believe that's possible.


--
Chuck Lever




^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v1 1/3] nfsd: Limit end of page list when decoding NFSv4 WRITE
  2017-08-23 18:36             ` Chuck Lever
@ 2017-08-24  1:18               ` J. Bruce Fields
  2017-08-24  2:52                 ` Weston Andros Adamson
  0 siblings, 1 reply; 13+ messages in thread
From: J. Bruce Fields @ 2017-08-24  1:18 UTC (permalink / raw)
  To: Chuck Lever; +Cc: Linux NFS Mailing List

On Wed, Aug 23, 2017 at 02:36:33PM -0400, Chuck Lever wrote:
> 
> > On Aug 22, 2017, at 5:45 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> > 
> > On Mon, Aug 21, 2017 at 06:08:15PM -0400, Chuck Lever wrote:
> >> 
> >>> On Aug 21, 2017, at 5:21 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> >>> 
> >>> On Mon, Aug 21, 2017 at 05:15:38PM -0400, Chuck Lever wrote:
> >>>> 
> >>>>> On Aug 21, 2017, at 5:13 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> >>>>> 
> >>>>> On Fri, Aug 18, 2017 at 11:12:19AM -0400, Chuck Lever wrote:
> >>>>>> When processing an NFSv4 WRITE operation, argp->end should never
> >>>>>> point past the end of the data in the final page of the page list.
> >>>>>> Otherwise, nfsd4_decode_compound can walk into uninitialized memory.
> >>>>>> 
> >>>>>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> >>>>>> ---
> >>>>>> fs/nfsd/nfs4xdr.c |    6 ++----
> >>>>>> 1 file changed, 2 insertions(+), 4 deletions(-)
> >>>>>> 
> >>>>>> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> >>>>>> index 51e729a..7c48d68 100644
> >>>>>> --- a/fs/nfsd/nfs4xdr.c
> >>>>>> +++ b/fs/nfsd/nfs4xdr.c
> >>>>>> @@ -144,7 +144,7 @@ static void next_decode_page(struct nfsd4_compoundargs *argp)
> >>>>>> 	argp->p = page_address(argp->pagelist[0]);
> >>>>>> 	argp->pagelist++;
> >>>>>> 	if (argp->pagelen < PAGE_SIZE) {
> >>>>>> -		argp->end = argp->p + (argp->pagelen>>2);
> >>>>>> +		argp->end = argp->p + XDR_QUADLEN(argp->pagelen);
> >> 
> >>                     ^^^^^^^^^^^^^^ A
> >> 
> >>>>>> 		argp->pagelen = 0;
> >>>>>> 	} else {
> >>>>>> 		argp->end = argp->p + (PAGE_SIZE>>2);
> >>>>>> @@ -1279,9 +1279,7 @@ static __be32 nfsd4_decode_opaque(struct nfsd4_compoundargs *argp, struct xdr_ne
> >>>>>> 		argp->pagelen -= pages * PAGE_SIZE;
> >>>>>> 		len -= pages * PAGE_SIZE;
> >>>>>> 
> >>>>>> -		argp->p = (__be32 *)page_address(argp->pagelist[0]);
> >>>>>> -		argp->pagelist++;
> >>>>>> -		argp->end = argp->p + XDR_QUADLEN(PAGE_SIZE);
> >> 
> >>                     ^^^^^^^^^^^^^^ B
> >> 
> >>>>>> +		next_decode_page(argp);
> >>>>> 
> >>>>> I think there's no change in behavior here *except* for adding a new
> >>>>> argp->pagelen=0 (or argp->pagelen -= PAGE_SIZE).
> >>>> 
> >>>> The code around this change is currently working correctly,
> >>>> so there is no change in behavior AFAICT. This is a defensive
> >>>> change, but it also replaces duplicate code.
> >>> 
> >>> I don't understand.  I'm saying that by calling next_decode_page() there
> >>> you've added a new argp->pagelen assignment.  I don't understand how
> >>> that can't change behavior, unless there's another bug in our bounds
> >>> checking someplace.
> >> 
> >> Because of line B above, argp->end always points to the
> >> end of the final page in the page list. However, the
> >> buffer might end somewhere in the middle of that page,
> >> in which case, the transport hasn't initialized any of
> >> the bytes between the end of the buffer and the end of
> >> the page.
> >> 
> >> As long as the other fields in the xdr_buf are set up
> >> properly, the XDR decoder will not walk into that uninit-
> >> ialized section of the last page. But there's nothing
> >> preventing a decoder or transport bug from causing it
> >> to walk into the uninitialized area. And always setting
> >> to the end of the page is confusing when the buffer
> >> itself is actually shorter.
> >> 
> >> The key is to replace line B above with line A. argp->end
> >> is advanced by the remaining part of the final page rather
> >> than by a whole page.
> > 
> > Got it, I agree with that part of the change, it's the pagelen change I
> > was having trouble with.
> > 
> > But looking at it more, I think your patch is a fix and the current code
> > is wrong.
> > 
> >> The next patch uses this new behavior to signal precisely
> >> when it has to move from the page list to the tail iovec.
> >> 
> >> 
> >>> Most likely it could cause subsequent op parsers to believe there's less
> >>> space in the argument buffer than there really is, so it might fail to
> >>> parse a compound with a write plus some other ops, if that puts the
> >>> total call close to the maximum size?
> >> 
> >> Where is argp->pagelen used after the final next_decode_page
> >> call?
> > 
> > Well, it's checked in every read_buf and next_decode_page to decide how
> > much space is left.
> 
> Right, and I didn't change the pagelen adjustment that occurs
> in the loop. Just the final adjustment should be different.
> 
> 
> > It looks to me like the current code is wrong not to be decreasing
> > page_len at the end there.  So I wonder if there's a bug right now.
> > E.g. maybe a compound with multiple writes could leave the xdr decoding
> > thinking it has more space than it does and allow someone to write
> > unrelated memory to some file.
> 
> I believe that's possible.

Yeah, I can totally crash the server this way.

I'll send this along for stable.  (And see if I can figure out how to get
a test for this kind of thing into pynfs.  Unfortunately it's seems to
require some contortions to get it to produce bad xdr.)

--b.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v1 1/3] nfsd: Limit end of page list when decoding NFSv4 WRITE
  2017-08-24  1:18               ` J. Bruce Fields
@ 2017-08-24  2:52                 ` Weston Andros Adamson
  0 siblings, 0 replies; 13+ messages in thread
From: Weston Andros Adamson @ 2017-08-24  2:52 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Chuck Lever, linux-nfs list


> On Aug 23, 2017, at 9:18 PM, J. Bruce Fields <bfields@fieldses.org> =
wrote:
>=20
> On Wed, Aug 23, 2017 at 02:36:33PM -0400, Chuck Lever wrote:
>>=20
>>> On Aug 22, 2017, at 5:45 PM, J. Bruce Fields <bfields@fieldses.org> =
wrote:
>>>=20
>>> On Mon, Aug 21, 2017 at 06:08:15PM -0400, Chuck Lever wrote:
>>>>=20
>>>>> On Aug 21, 2017, at 5:21 PM, J. Bruce Fields =
<bfields@fieldses.org> wrote:
>>>>>=20
>>>>> On Mon, Aug 21, 2017 at 05:15:38PM -0400, Chuck Lever wrote:
>>>>>>=20
>>>>>>> On Aug 21, 2017, at 5:13 PM, J. Bruce Fields =
<bfields@fieldses.org> wrote:
>>>>>>>=20
>>>>>>> On Fri, Aug 18, 2017 at 11:12:19AM -0400, Chuck Lever wrote:
>>>>>>>> When processing an NFSv4 WRITE operation, argp->end should =
never
>>>>>>>> point past the end of the data in the final page of the page =
list.
>>>>>>>> Otherwise, nfsd4_decode_compound can walk into uninitialized =
memory.
>>>>>>>>=20
>>>>>>>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>>>>>>>> ---
>>>>>>>> fs/nfsd/nfs4xdr.c |    6 ++----
>>>>>>>> 1 file changed, 2 insertions(+), 4 deletions(-)
>>>>>>>>=20
>>>>>>>> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
>>>>>>>> index 51e729a..7c48d68 100644
>>>>>>>> --- a/fs/nfsd/nfs4xdr.c
>>>>>>>> +++ b/fs/nfsd/nfs4xdr.c
>>>>>>>> @@ -144,7 +144,7 @@ static void next_decode_page(struct =
nfsd4_compoundargs *argp)
>>>>>>>> 	argp->p =3D page_address(argp->pagelist[0]);
>>>>>>>> 	argp->pagelist++;
>>>>>>>> 	if (argp->pagelen < PAGE_SIZE) {
>>>>>>>> -		argp->end =3D argp->p + (argp->pagelen>>2);
>>>>>>>> +		argp->end =3D argp->p + =
XDR_QUADLEN(argp->pagelen);
>>>>=20
>>>>                    ^^^^^^^^^^^^^^ A
>>>>=20
>>>>>>>> 		argp->pagelen =3D 0;
>>>>>>>> 	} else {
>>>>>>>> 		argp->end =3D argp->p + (PAGE_SIZE>>2);
>>>>>>>> @@ -1279,9 +1279,7 @@ static __be32 nfsd4_decode_opaque(struct =
nfsd4_compoundargs *argp, struct xdr_ne
>>>>>>>> 		argp->pagelen -=3D pages * PAGE_SIZE;
>>>>>>>> 		len -=3D pages * PAGE_SIZE;
>>>>>>>>=20
>>>>>>>> -		argp->p =3D (__be32 =
*)page_address(argp->pagelist[0]);
>>>>>>>> -		argp->pagelist++;
>>>>>>>> -		argp->end =3D argp->p + XDR_QUADLEN(PAGE_SIZE);
>>>>=20
>>>>                    ^^^^^^^^^^^^^^ B
>>>>=20
>>>>>>>> +		next_decode_page(argp);
>>>>>>>=20
>>>>>>> I think there's no change in behavior here *except* for adding a =
new
>>>>>>> argp->pagelen=3D0 (or argp->pagelen -=3D PAGE_SIZE).
>>>>>>=20
>>>>>> The code around this change is currently working correctly,
>>>>>> so there is no change in behavior AFAICT. This is a defensive
>>>>>> change, but it also replaces duplicate code.
>>>>>=20
>>>>> I don't understand.  I'm saying that by calling next_decode_page() =
there
>>>>> you've added a new argp->pagelen assignment.  I don't understand =
how
>>>>> that can't change behavior, unless there's another bug in our =
bounds
>>>>> checking someplace.
>>>>=20
>>>> Because of line B above, argp->end always points to the
>>>> end of the final page in the page list. However, the
>>>> buffer might end somewhere in the middle of that page,
>>>> in which case, the transport hasn't initialized any of
>>>> the bytes between the end of the buffer and the end of
>>>> the page.
>>>>=20
>>>> As long as the other fields in the xdr_buf are set up
>>>> properly, the XDR decoder will not walk into that uninit-
>>>> ialized section of the last page. But there's nothing
>>>> preventing a decoder or transport bug from causing it
>>>> to walk into the uninitialized area. And always setting
>>>> to the end of the page is confusing when the buffer
>>>> itself is actually shorter.
>>>>=20
>>>> The key is to replace line B above with line A. argp->end
>>>> is advanced by the remaining part of the final page rather
>>>> than by a whole page.
>>>=20
>>> Got it, I agree with that part of the change, it's the pagelen =
change I
>>> was having trouble with.
>>>=20
>>> But looking at it more, I think your patch is a fix and the current =
code
>>> is wrong.
>>>=20
>>>> The next patch uses this new behavior to signal precisely
>>>> when it has to move from the page list to the tail iovec.
>>>>=20
>>>>=20
>>>>> Most likely it could cause subsequent op parsers to believe =
there's less
>>>>> space in the argument buffer than there really is, so it might =
fail to
>>>>> parse a compound with a write plus some other ops, if that puts =
the
>>>>> total call close to the maximum size?
>>>>=20
>>>> Where is argp->pagelen used after the final next_decode_page
>>>> call?
>>>=20
>>> Well, it's checked in every read_buf and next_decode_page to decide =
how
>>> much space is left.
>>=20
>> Right, and I didn't change the pagelen adjustment that occurs
>> in the loop. Just the final adjustment should be different.
>>=20
>>=20
>>> It looks to me like the current code is wrong not to be decreasing
>>> page_len at the end there.  So I wonder if there's a bug right now.
>>> E.g. maybe a compound with multiple writes could leave the xdr =
decoding
>>> thinking it has more space than it does and allow someone to write
>>> unrelated memory to some file.
>>=20
>> I believe that's possible.
>=20
> Yeah, I can totally crash the server this way.
>=20
> I'll send this along for stable.  (And see if I can figure out how to =
get
> a test for this kind of thing into pynfs.  Unfortunately it's seems to
> require some contortions to get it to produce bad xdr.)

Having a generic way to include bad XDR would be a really useful feature =
for pynfs.

-dros

>=20
> --b.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" =
in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v1 2/3] nfsd: Incoming xdr_bufs may have content in tail buffer
  2017-08-18 15:12 ` [PATCH v1 2/3] nfsd: Incoming xdr_bufs may have content in tail buffer Chuck Lever
@ 2017-08-25 17:46   ` J. Bruce Fields
  0 siblings, 0 replies; 13+ messages in thread
From: J. Bruce Fields @ 2017-08-25 17:46 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-nfs

On Fri, Aug 18, 2017 at 11:12:27AM -0400, Chuck Lever wrote:
> Since the beginning, svcsock has built a received RPC Call message
> by populating the xdr_buf's head, then placing the remaining
> message bytes in the xdr_buf's page list. The xdr_buf's tail is
> never populated.
> 
> This means that an NFSv4 COMPOUND containing an NFS WRITE operation
> plus trailing operations has a page list that contains the WRITE
> data payload followed by the trailing operations. NFSv4 XDR decoders
> will not look in the xdr_buf's tail, ever, because svcsock never put
> anything there.
> 
> To support transports that can pass the write payload in the
> xdr_buf's pagelist and trailing content in the xdr_buf's tail,
> introduce logic in READ_BUF that switches to the xdr_buf's tail vec
> when the decoder runs out of content in rq_arg.pages.

This is very specialized: it assumes an xdr buffer will never cross the
boundary from pages into the tail, for example.  But, I guess we do in
fact get that kind of guarantee from the rdma code, so fine.  Might be
worth a comment.

> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>  fs/nfsd/nfs4xdr.c |   20 ++++++++++++++++++++
>  fs/nfsd/xdr4.h    |    1 +
>  2 files changed, 21 insertions(+)
> 
> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> index 7c48d68..a9f88cf 100644
> --- a/fs/nfsd/nfs4xdr.c
> +++ b/fs/nfsd/nfs4xdr.c
> @@ -159,6 +159,25 @@ static __be32 *read_buf(struct nfsd4_compoundargs *argp, u32 nbytes)
>  	 */
>  	unsigned int avail = (char *)argp->end - (char *)argp->p;
>  	__be32 *p;
> +
> +	if (argp->pagelen == 0) {
> +		struct kvec *vec = &argp->rqstp->rq_arg.tail[0];
> +
> +		if (!argp->tail) {

I think we may have other code that does this by checking whether
argp->p is in the range covered by that iovec, but your approach is
probably cleaner, OK.

--b.

> +			argp->tail = true;
> +			avail = vec->iov_len;
> +			argp->p = vec->iov_base;
> +			argp->end = vec->iov_base + avail;
> +		}
> +
> +		if (avail < nbytes)
> +			return NULL;
> +
> +		p = argp->p;
> +		argp->p += XDR_QUADLEN(nbytes);
> +		return p;
> +	}
> +
>  	if (avail + argp->pagelen < nbytes)
>  		return NULL;
>  	if (avail + PAGE_SIZE < nbytes) /* need more than a page !! */
> @@ -4573,6 +4592,7 @@ void nfsd4_release_compoundargs(struct svc_rqst *rqstp)
>  	args->end = rqstp->rq_arg.head[0].iov_base + rqstp->rq_arg.head[0].iov_len;
>  	args->pagelist = rqstp->rq_arg.pages;
>  	args->pagelen = rqstp->rq_arg.page_len;
> +	args->tail = false;
>  	args->tmpp = NULL;
>  	args->to_free = NULL;
>  	args->ops = args->iops;
> diff --git a/fs/nfsd/xdr4.h b/fs/nfsd/xdr4.h
> index 72c6ad1..bac91b1 100644
> --- a/fs/nfsd/xdr4.h
> +++ b/fs/nfsd/xdr4.h
> @@ -614,6 +614,7 @@ struct nfsd4_compoundargs {
>  	__be32 *			end;
>  	struct page **			pagelist;
>  	int				pagelen;
> +	bool				tail;
>  	__be32				tmp[8];
>  	__be32 *			tmpp;
>  	struct svcxdr_tmpbuf		*to_free;

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2017-08-25 17:46 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-18 15:12 [PATCH v1 0/3] Handle NFSv4 operations in xdr_buf tail Chuck Lever
2017-08-18 15:12 ` [PATCH v1 1/3] nfsd: Limit end of page list when decoding NFSv4 WRITE Chuck Lever
2017-08-21 21:13   ` J. Bruce Fields
2017-08-21 21:15     ` Chuck Lever
2017-08-21 21:21       ` J. Bruce Fields
2017-08-21 22:08         ` Chuck Lever
2017-08-22 21:45           ` J. Bruce Fields
2017-08-23 18:36             ` Chuck Lever
2017-08-24  1:18               ` J. Bruce Fields
2017-08-24  2:52                 ` Weston Andros Adamson
2017-08-18 15:12 ` [PATCH v1 2/3] nfsd: Incoming xdr_bufs may have content in tail buffer Chuck Lever
2017-08-25 17:46   ` J. Bruce Fields
2017-08-18 15:12 ` [PATCH v1 3/3] svcrdma: Populate tail iovec when receiving Chuck Lever

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.