* [PATCH v3 0/6] NFSD: Add support for the v4.2 READ_PLUS operation
@ 2020-08-03 16:59 schumaker.anna
  2020-08-03 16:59 ` [PATCH v3 1/6] SUNRPC: Implement xdr_reserve_space_vec() schumaker.anna
                   ` (6 more replies)
  0 siblings, 7 replies; 14+ messages in thread
From: schumaker.anna @ 2020-08-03 16:59 UTC (permalink / raw)
  To: bfields, linux-nfs; +Cc: Anna.Schumaker

From: Anna Schumaker <Anna.Schumaker@Netapp.com>

These patches add server support for the READ_PLUS operation, which
breaks read requests into several "data" and "hole" segments when
replying to the client.
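
For anyone less familiar with the operation, here is a rough sketch of the
reply shape being encoded. The type and field names are mine and only
loosely follow RFC 7862; this is an illustration, not the kernel code:

/* Illustration only: approximate shape of one READ_PLUS reply. */
enum content_type {
	CONTENT_DATA = 0,	/* segment carries file data */
	CONTENT_HOLE = 1,	/* segment describes a hole; no data follows */
};

struct segment_sketch {
	enum content_type type;
	unsigned long long offset;	/* file offset this segment covers */
	unsigned long long length;	/* data byte count or hole length */
	/* CONTENT_DATA segments are followed by 'length' bytes of file data */
};

struct read_plus_reply_sketch {
	unsigned int eof;	/* non-zero when the read reached end of file */
	unsigned int count;	/* number of segments that follow */
	/* 'count' segments follow, mixing data and hole entries as needed */
};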

Here are the results of some performance tests I ran on my own virtual
machines, which do still have the slow SEEK_HOLE/SEEK_DATA behavior.
A recent minimum-compiler-version patch makes installing newer kernels
difficult on the lab machines I have access to, but I'll still try to
collect that data since it seemed promising during my last posting
(https://www.spinics.net/lists/linux-nfs/msg76487.html).

I tested by reading various 2G files from a few different underlying
filesystems and across several NFS versions. I used the `vmtouch` utility
to make sure files were only cached when we wanted them to be. In addition
to 100% data and 100% hole cases, I also tested with files that alternate
between data and hole segments. These files have either 4K, 8K, 16K, or 32K
segment sizes and start with either data or hole segments. So the file
mixed-4d has a 4K segment size beginning with a data segment, but mixed-32h
has 32K segments beginning with a hole (a rough sketch of how such files
can be generated follows the tables below). The units are seconds: for each
NFS version, the first number is the uncached read time and the second is
the read time when the file is already cached on the server.


          |         v3        |        v4.0       |        v4.1       |        v4.2       |
ext4      | uncached:  cached | uncached:  cached | uncached:  cached | uncached:  cached |
----------|-------------------|-------------------|-------------------|-------------------|
data      |   2.697 :   1.365 |   2.539 :   1.482 |   2.246 :   1.595 |   2.516 :   1.716 |
hole      |   2.389 :   1.480 |   1.692 :   1.444 |   1.904 :   1.539 |   1.042 :   1.005 |
mixed-4d  |   2.426 :   1.448 |   2.468 :   1.480 |   2.258 :   1.731 |   2.408 :   1.674 |
mixed-8d  |   2.446 :   1.592 |   2.497 :   1.523 |   2.240 :   1.736 |   2.179 :   1.547 |
mixed-16d |   2.608 :   1.575 |   2.446 :   1.582 |   2.330 :   1.670 |   2.192 :   1.488 |
mixed-32d |   2.544 :   1.587 |   2.164 :   1.499 |   2.076 :   1.452 |   1.982 :   1.346 |
mixed-4h  |   2.591 :   1.486 |   2.233 :   1.503 |   2.190 :   1.520 |   3.679 :   1.815 |
mixed-8h  |   2.498 :   1.547 |   2.168 :   1.510 |   2.098 :   1.511 |   2.815 :   1.484 |
mixed-16h |   2.480 :   1.664 |   2.152 :   1.597 |   2.096 :   1.537 |   2.288 :   1.580 |
mixed-32h |   2.520 :   1.467 |   2.171 :   1.496 |   2.246 :   1.593 |   2.066 :   1.389 |

          |         v3        |        v4.0       |        v4.1       |        v4.2       |
xfs       | uncached:  cached | uncached:  cached | uncached:  cached | uncached:  cached |
----------|-------------------|-------------------|-------------------|-------------------|
data      |   2.074 :   1.353 |   2.129 :   1.489 |   2.198 :   1.564 |   2.579 :   1.719 |
hole      |   1.714 :   1.430 |   1.647 :   1.440 |   1.748 :   1.464 |   1.019 :   1.028 |
mixed-4d  |   2.699 :   1.561 |   2.782 :   1.657 |   2.800 :   1.619 |   2.848 :   2.166 |
mixed-8d  |   2.204 :   1.540 |   2.346 :   1.595 |   2.356 :   1.589 |   2.335 :   1.809 |
mixed-16d |   2.034 :   1.445 |   2.212 :   1.561 |   2.172 :   1.546 |   2.127 :   1.658 |
mixed-32d |   1.982 :   1.480 |   2.135 :   1.544 |   2.136 :   1.565 |   2.024 :   1.555 |
mixed-4h  |   2.600 :   1.549 |   2.790 :   1.660 |   2.745 :   1.634 |   8.949 :   2.529 |
mixed-8h  |   2.283 :   1.555 |   2.414 :   1.605 |   2.373 :   1.607 |   5.626 :   2.015 |
mixed-16h |   2.115 :   1.512 |   2.247 :   1.593 |   2.217 :   1.608 |   3.740 :   1.803 |
mixed-32h |   2.069 :   1.499 |   2.212 :   1.582 |   2.235 :   1.631 |   2.819 :   1.542 |

          |         v3        |        v4.0       |        v4.1       |        v4.2       |
btrfs     | uncached:  cached | uncached:  cached | uncached:  cached | uncached:  cached |
----------|-------------------|-------------------|-------------------|-------------------|
data      |   2.417 :   1.317 |   2.095 :   1.445 |   2.145 :   1.523 |   2.615 :   1.713 |
hole      |   2.107 :   1.483 |   2.121 :   1.496 |   2.106 :   1.461 |   1.011 :   1.061 |
mixed-4d  |   2.348 :   1.471 |   2.370 :   1.523 |   2.379 :   1.499 |   3.028 : 225.812 |
mixed-8d  |   2.227 :   1.476 |   2.231 :   1.467 |   2.272 :   1.529 |   2.723 :  36.179 |
mixed-16d |   2.175 :   1.460 |   2.208 :   1.457 |   2.200 :   1.464 |   2.526 :  10.371 |
mixed-32d |   2.148 :   1.501 |   2.191 :   1.468 |   2.167 :   1.471 |   2.455 :   3.367 |
mixed-4h  |   2.362 :   1.561 |   2.387 :   1.513 |   2.352 :   1.536 |   5.935 :  41.494 |
mixed-8h  |   2.241 :   1.477 |   2.251 :   1.503 |   2.256 :   1.496 |   3.672 :  10.261 |
mixed-16h |   2.238 :   1.477 |   2.188 :   1.496 |   2.183 :   1.503 |   2.955 :   3.809 |
mixed-32h |   2.146 :   1.490 |   2.135 :   1.523 |   2.157 :   1.557 |   2.327 :   2.088 |
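
For reference, here is a minimal sketch of how one of the alternating
data/hole test files above (e.g. mixed-4d) could be generated. This is not
part of the series; the file name, segment size, and fill pattern are just
illustrative, and it assumes the underlying filesystem leaves unwritten
ranges as holes:

#define _FILE_OFFSET_BITS 64
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	const size_t seg = 4096;		/* 4K segments, as in "mixed-4d" */
	const off_t total = 2LL << 30;		/* 2G file, matching the tests */
	char *buf = malloc(seg);
	off_t off;
	int fd;

	if (!buf)
		return 1;
	memset(buf, 0xab, seg);

	fd = open("mixed-4d", O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return 1;

	/* Write data into even segments only; skipped odd segments stay holes. */
	for (off = 0; off < total; off += 2 * seg)
		if (pwrite(fd, buf, seg, off) != (ssize_t)seg)
			return 1;

	/* Extend the file so a trailing hole still counts toward the 2G size. */
	if (ftruncate(fd, total) < 0)
		return 1;

	close(fd);
	free(buf);
	return 0;
}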

Anna Schumaker (6):
  SUNRPC: Implement xdr_reserve_space_vec()
  NFSD: nfsd4_encode_readv() can use xdr_reserve_space_vec()
  NFSD: Add READ_PLUS data support
  NFSD: Add READ_PLUS hole segment encoding
  NFSD: Return both a hole and a data segment
  NFSD: Encode a full READ_PLUS reply

 fs/nfsd/nfs4proc.c         |  17 ++++
 fs/nfsd/nfs4xdr.c          | 169 +++++++++++++++++++++++++++++++------
 include/linux/sunrpc/xdr.h |   2 +
 net/sunrpc/xdr.c           |  45 ++++++++++
 4 files changed, 206 insertions(+), 27 deletions(-)

-- 
2.27.0


* [PATCH v3 1/6] SUNRPC: Implement xdr_reserve_space_vec()
  2020-08-03 16:59 [PATCH v3 0/6] NFSD: Add support for the v4.2 READ_PLUS operation schumaker.anna
@ 2020-08-03 16:59 ` schumaker.anna
  2020-08-03 19:19   ` Chuck Lever
  2020-08-03 16:59 ` [PATCH v3 2/6] NFSD: nfsd4_encode_readv() can use xdr_reserve_space_vec() schumaker.anna
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 14+ messages in thread
From: schumaker.anna @ 2020-08-03 16:59 UTC (permalink / raw)
  To: bfields, linux-nfs; +Cc: Anna.Schumaker

From: Anna Schumaker <Anna.Schumaker@Netapp.com>

Reserving space for a large READ payload requires special handling when
reserving space in the xdr buffer pages. One problem we can have is use
of the scratch buffer, which is used to get a pointer to a contiguous
region of data up to PAGE_SIZE. When using the scratch buffer, calls to
xdr_commit_encode() shift the data to its proper alignment in the xdr
buffer. If we've reserved several pages in a vector, then this could
potentially invalidate earlier pointers and result in incorrect READ
data being sent to the client.

I get around this by looking at the amount of space left in the current
page and never reserving more than that for each entry in the read
vector. This lets us place data directly where it needs to go in the
buffer pages.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
---
 include/linux/sunrpc/xdr.h |  2 ++
 net/sunrpc/xdr.c           | 45 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 47 insertions(+)

diff --git a/include/linux/sunrpc/xdr.h b/include/linux/sunrpc/xdr.h
index 22c207b2425f..bac459584dd0 100644
--- a/include/linux/sunrpc/xdr.h
+++ b/include/linux/sunrpc/xdr.h
@@ -234,6 +234,8 @@ typedef int	(*kxdrdproc_t)(struct rpc_rqst *rqstp, struct xdr_stream *xdr,
 extern void xdr_init_encode(struct xdr_stream *xdr, struct xdr_buf *buf,
 			    __be32 *p, struct rpc_rqst *rqst);
 extern __be32 *xdr_reserve_space(struct xdr_stream *xdr, size_t nbytes);
+extern int xdr_reserve_space_vec(struct xdr_stream *xdr, struct kvec *vec,
+		size_t nbytes);
 extern void xdr_commit_encode(struct xdr_stream *xdr);
 extern void xdr_truncate_encode(struct xdr_stream *xdr, size_t len);
 extern int xdr_restrict_buflen(struct xdr_stream *xdr, int newbuflen);
diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c
index be11d672b5b9..6dfe5dc8b35f 100644
--- a/net/sunrpc/xdr.c
+++ b/net/sunrpc/xdr.c
@@ -648,6 +648,51 @@ __be32 * xdr_reserve_space(struct xdr_stream *xdr, size_t nbytes)
 }
 EXPORT_SYMBOL_GPL(xdr_reserve_space);
 
+
+/**
+ * xdr_reserve_space_vec - Reserves a large amount of buffer space for sending
+ * @xdr: pointer to xdr_stream
+ * @vec: pointer to a kvec array
+ * @nbytes: number of bytes to reserve
+ *
+ * Reserves enough buffer space to encode 'nbytes' of data and stores the
+ * pointers in 'vec'. The size argument passed to xdr_reserve_space() is
+ * determined based on the number of bytes remaining in the current page to
+ * avoid invalidating iov_base pointers when xdr_commit_encode() is called.
+ */
+int xdr_reserve_space_vec(struct xdr_stream *xdr, struct kvec *vec, size_t nbytes)
+{
+	int thislen;
+	int v = 0;
+	__be32 *p;
+
+	/*
+	 * svcrdma requires every READ payload to start somewhere
+	 * in xdr->pages.
+	 */
+	if (xdr->iov == xdr->buf->head) {
+		xdr->iov = NULL;
+		xdr->end = xdr->p;
+	}
+
+	while (nbytes) {
+		thislen = xdr->buf->page_len % PAGE_SIZE;
+		thislen = min_t(size_t, nbytes, PAGE_SIZE - thislen);
+
+		p = xdr_reserve_space(xdr, thislen);
+		if (!p)
+			return -EIO;
+
+		vec[v].iov_base = p;
+		vec[v].iov_len = thislen;
+		v++;
+		nbytes -= thislen;
+	}
+
+	return v;
+}
+EXPORT_SYMBOL_GPL(xdr_reserve_space_vec);
+
 /**
  * xdr_truncate_encode - truncate an encode buffer
  * @xdr: pointer to xdr_stream
-- 
2.27.0


* [PATCH v3 2/6] NFSD: nfsd4_encode_readv() can use xdr_reserve_space_vec()
  2020-08-03 16:59 [PATCH v3 0/6] NFSD: Add support for the v4.2 READ_PLUS operation schumaker.anna
  2020-08-03 16:59 ` [PATCH v3 1/6] SUNRPC: Implement xdr_reserve_space_vec() schumaker.anna
@ 2020-08-03 16:59 ` schumaker.anna
  2020-08-03 16:59 ` [PATCH v3 3/6] NFSD: Add READ_PLUS data support schumaker.anna
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 14+ messages in thread
From: schumaker.anna @ 2020-08-03 16:59 UTC (permalink / raw)
  To: bfields, linux-nfs; +Cc: Anna.Schumaker

From: Anna Schumaker <Anna.Schumaker@Netapp.com>

Switch over to the new code so we don't need to reproduce it on the NFSD
side.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
---
 fs/nfsd/nfs4xdr.c | 28 +++-------------------------
 1 file changed, 3 insertions(+), 25 deletions(-)

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 996ac01ee977..6a1c0a7fae05 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -3584,36 +3584,14 @@ static __be32 nfsd4_encode_readv(struct nfsd4_compoundres *resp,
 {
 	struct xdr_stream *xdr = &resp->xdr;
 	u32 eof;
-	int v;
 	int starting_len = xdr->buf->len - 8;
-	long len;
-	int thislen;
 	__be32 nfserr;
 	__be32 tmp;
-	__be32 *p;
 	int pad;
 
-	/*
-	 * svcrdma requires every READ payload to start somewhere
-	 * in xdr->pages.
-	 */
-	if (xdr->iov == xdr->buf->head) {
-		xdr->iov = NULL;
-		xdr->end = xdr->p;
-	}
-
-	len = maxcount;
-	v = 0;
-	while (len) {
-		thislen = min_t(long, len, PAGE_SIZE);
-		p = xdr_reserve_space(xdr, thislen);
-		WARN_ON_ONCE(!p);
-		resp->rqstp->rq_vec[v].iov_base = p;
-		resp->rqstp->rq_vec[v].iov_len = thislen;
-		v++;
-		len -= thislen;
-	}
-	read->rd_vlen = v;
+	read->rd_vlen = xdr_reserve_space_vec(xdr, resp->rqstp->rq_vec, maxcount);
+	if (read->rd_vlen < 0)
+		return nfserr_resource;
 
 	nfserr = nfsd_readv(resp->rqstp, read->rd_fhp, file, read->rd_offset,
 			    resp->rqstp->rq_vec, read->rd_vlen, &maxcount,
-- 
2.27.0


* [PATCH v3 3/6] NFSD: Add READ_PLUS data support
  2020-08-03 16:59 [PATCH v3 0/6] NFSD: Add support for the v4.2 READ_PLUS operation schumaker.anna
  2020-08-03 16:59 ` [PATCH v3 1/6] SUNRPC: Implement xdr_reserve_space_vec() schumaker.anna
  2020-08-03 16:59 ` [PATCH v3 2/6] NFSD: nfsd4_encode_readv() can use xdr_reserve_space_vec() schumaker.anna
@ 2020-08-03 16:59 ` schumaker.anna
  2020-08-03 19:17   ` Chuck Lever
  2020-08-03 16:59 ` [PATCH v3 4/6] NFSD: Add READ_PLUS hole segment encoding schumaker.anna
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 14+ messages in thread
From: schumaker.anna @ 2020-08-03 16:59 UTC (permalink / raw)
  To: bfields, linux-nfs; +Cc: Anna.Schumaker

From: Anna Schumaker <Anna.Schumaker@Netapp.com>

This patch adds READ_PLUS support for returning a single
NFS4_CONTENT_DATA segment to the client. This is basically the same as
the READ operation, only with the extra information about data segments.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
---
 fs/nfsd/nfs4proc.c | 17 ++++++++++
 fs/nfsd/nfs4xdr.c  | 85 ++++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 100 insertions(+), 2 deletions(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index a09c35f0f6f0..9630d33211f2 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -2523,6 +2523,16 @@ static inline u32 nfsd4_read_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op)
 	return (op_encode_hdr_size + 2 + XDR_QUADLEN(rlen)) * sizeof(__be32);
 }
 
+static inline u32 nfsd4_read_plus_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op)
+{
+	u32 maxcount = svc_max_payload(rqstp);
+	u32 rlen = min(op->u.read.rd_length, maxcount);
+	/* enough extra xdr space for encoding either a hole or data segment. */
+	u32 segments = 1 + 2 + 2;
+
+	return (op_encode_hdr_size + 2 + segments + XDR_QUADLEN(rlen)) * sizeof(__be32);
+}
+
 static inline u32 nfsd4_readdir_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op)
 {
 	u32 maxcount = 0, rlen = 0;
@@ -3059,6 +3069,13 @@ static const struct nfsd4_operation nfsd4_ops[] = {
 		.op_name = "OP_COPY",
 		.op_rsize_bop = nfsd4_copy_rsize,
 	},
+	[OP_READ_PLUS] = {
+		.op_func = nfsd4_read,
+		.op_release = nfsd4_read_release,
+		.op_name = "OP_READ_PLUS",
+		.op_rsize_bop = nfsd4_read_plus_rsize,
+		.op_get_currentstateid = nfsd4_get_readstateid,
+	},
 	[OP_SEEK] = {
 		.op_func = nfsd4_seek,
 		.op_name = "OP_SEEK",
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 6a1c0a7fae05..1d143bf02b83 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -1957,7 +1957,7 @@ static const nfsd4_dec nfsd4_dec_ops[] = {
 	[OP_LAYOUTSTATS]	= (nfsd4_dec)nfsd4_decode_notsupp,
 	[OP_OFFLOAD_CANCEL]	= (nfsd4_dec)nfsd4_decode_offload_status,
 	[OP_OFFLOAD_STATUS]	= (nfsd4_dec)nfsd4_decode_offload_status,
-	[OP_READ_PLUS]		= (nfsd4_dec)nfsd4_decode_notsupp,
+	[OP_READ_PLUS]		= (nfsd4_dec)nfsd4_decode_read,
 	[OP_SEEK]		= (nfsd4_dec)nfsd4_decode_seek,
 	[OP_WRITE_SAME]		= (nfsd4_dec)nfsd4_decode_notsupp,
 	[OP_CLONE]		= (nfsd4_dec)nfsd4_decode_clone,
@@ -4367,6 +4367,87 @@ nfsd4_encode_offload_status(struct nfsd4_compoundres *resp, __be32 nfserr,
 		return nfserr_resource;
 	p = xdr_encode_hyper(p, os->count);
 	*p++ = cpu_to_be32(0);
+	return nfserr;
+}
+
+static __be32
+nfsd4_encode_read_plus_data(struct nfsd4_compoundres *resp,
+			    struct nfsd4_read *read,
+			    unsigned long maxcount,  u32 *eof)
+{
+	struct xdr_stream *xdr = &resp->xdr;
+	struct file *file = read->rd_nf->nf_file;
+	int starting_len = xdr->buf->len;
+	__be32 nfserr;
+	__be32 *p, tmp;
+	__be64 tmp64;
+
+	/* Content type, offset, byte count */
+	p = xdr_reserve_space(xdr, 4 + 8 + 4);
+	if (!p)
+		return nfserr_resource;
+
+	read->rd_vlen = xdr_reserve_space_vec(xdr, resp->rqstp->rq_vec, maxcount);
+	if (read->rd_vlen < 0)
+		return nfserr_resource;
+
+	nfserr = nfsd_readv(resp->rqstp, read->rd_fhp, file, read->rd_offset,
+			    resp->rqstp->rq_vec, read->rd_vlen, &maxcount, eof);
+	if (nfserr)
+		return nfserr;
+	if (svc_encode_read_payload(resp->rqstp, starting_len + 16, maxcount))
+		return nfserr_io;
+
+	tmp = htonl(NFS4_CONTENT_DATA);
+	write_bytes_to_xdr_buf(xdr->buf, starting_len,      &tmp,   4);
+	tmp64 = cpu_to_be64(read->rd_offset);
+	write_bytes_to_xdr_buf(xdr->buf, starting_len + 4,  &tmp64, 8);
+	tmp = htonl(maxcount);
+	write_bytes_to_xdr_buf(xdr->buf, starting_len + 12, &tmp,   4);
+	return nfs_ok;
+}
+
+static __be32
+nfsd4_encode_read_plus(struct nfsd4_compoundres *resp, __be32 nfserr,
+		       struct nfsd4_read *read)
+{
+	unsigned long maxcount;
+	struct xdr_stream *xdr = &resp->xdr;
+	struct file *file;
+	int starting_len = xdr->buf->len;
+	int segments = 0;
+	__be32 *p, tmp;
+	u32 eof;
+
+	if (nfserr)
+		return nfserr;
+	file = read->rd_nf->nf_file;
+
+	/* eof flag, segment count */
+	p = xdr_reserve_space(xdr, 4 + 4);
+	if (!p)
+		return nfserr_resource;
+	xdr_commit_encode(xdr);
+
+	maxcount = svc_max_payload(resp->rqstp);
+	maxcount = min_t(unsigned long, maxcount,
+			 (xdr->buf->buflen - xdr->buf->len));
+	maxcount = min_t(unsigned long, maxcount, read->rd_length);
+
+	eof = read->rd_offset >= i_size_read(file_inode(file));
+	if (!eof) {
+		nfserr = nfsd4_encode_read_plus_data(resp, read, maxcount, &eof);
+		segments++;
+	}
+
+	if (nfserr)
+		xdr_truncate_encode(xdr, starting_len);
+	else {
+		tmp = htonl(eof);
+		write_bytes_to_xdr_buf(xdr->buf, starting_len,     &tmp, 4);
+		tmp = htonl(segments);
+		write_bytes_to_xdr_buf(xdr->buf, starting_len + 4, &tmp, 4);
+	}
 
 	return nfserr;
 }
@@ -4509,7 +4590,7 @@ static const nfsd4_enc nfsd4_enc_ops[] = {
 	[OP_LAYOUTSTATS]	= (nfsd4_enc)nfsd4_encode_noop,
 	[OP_OFFLOAD_CANCEL]	= (nfsd4_enc)nfsd4_encode_noop,
 	[OP_OFFLOAD_STATUS]	= (nfsd4_enc)nfsd4_encode_offload_status,
-	[OP_READ_PLUS]		= (nfsd4_enc)nfsd4_encode_noop,
+	[OP_READ_PLUS]		= (nfsd4_enc)nfsd4_encode_read_plus,
 	[OP_SEEK]		= (nfsd4_enc)nfsd4_encode_seek,
 	[OP_WRITE_SAME]		= (nfsd4_enc)nfsd4_encode_noop,
 	[OP_CLONE]		= (nfsd4_enc)nfsd4_encode_noop,
-- 
2.27.0


* [PATCH v3 4/6] NFSD: Add READ_PLUS hole segment encoding
  2020-08-03 16:59 [PATCH v3 0/6] NFSD: Add support for the v4.2 READ_PLUS operation schumaker.anna
                   ` (2 preceding siblings ...)
  2020-08-03 16:59 ` [PATCH v3 3/6] NFSD: Add READ_PLUS data support schumaker.anna
@ 2020-08-03 16:59 ` schumaker.anna
  2020-08-03 16:59 ` [PATCH v3 5/6] NFSD: Return both a hole and a data segment schumaker.anna
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 14+ messages in thread
From: schumaker.anna @ 2020-08-03 16:59 UTC (permalink / raw)
  To: bfields, linux-nfs; +Cc: Anna.Schumaker

From: Anna Schumaker <Anna.Schumaker@Netapp.com>

Note that we still reply to the READ_PLUS call with only a single segment
at this time.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
---
 fs/nfsd/nfs4proc.c |  2 +-
 fs/nfsd/nfs4xdr.c  | 40 +++++++++++++++++++++++++++++++++++++++-
 2 files changed, 40 insertions(+), 2 deletions(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 9630d33211f2..36305b846f5a 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -2528,7 +2528,7 @@ static inline u32 nfsd4_read_plus_rsize(struct svc_rqst *rqstp, struct nfsd4_op
 	u32 maxcount = svc_max_payload(rqstp);
 	u32 rlen = min(op->u.read.rd_length, maxcount);
 	/* enough extra xdr space for encoding either a hole or data segment. */
-	u32 segments = 1 + 2 + 2;
+	u32 segments = 2 * (1 + 2 + 2);
 
 	return (op_encode_hdr_size + 2 + segments + XDR_QUADLEN(rlen)) * sizeof(__be32);
 }
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 1d143bf02b83..292819eafcac 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -4378,10 +4378,14 @@ nfsd4_encode_read_plus_data(struct nfsd4_compoundres *resp,
 	struct xdr_stream *xdr = &resp->xdr;
 	struct file *file = read->rd_nf->nf_file;
 	int starting_len = xdr->buf->len;
+	loff_t hole_pos = vfs_llseek(file, read->rd_offset, SEEK_HOLE);
 	__be32 nfserr;
 	__be32 *p, tmp;
 	__be64 tmp64;
 
+	if (hole_pos > read->rd_offset)
+		maxcount = min_t(unsigned long, maxcount, hole_pos - read->rd_offset);
+
 	/* Content type, offset, byte count */
 	p = xdr_reserve_space(xdr, 4 + 8 + 4);
 	if (!p)
@@ -4407,6 +4411,27 @@ nfsd4_encode_read_plus_data(struct nfsd4_compoundres *resp,
 	return nfs_ok;
 }
 
+static __be32
+nfsd4_encode_read_plus_hole(struct nfsd4_compoundres *resp,
+			    struct nfsd4_read *read,
+			    unsigned long maxcount, u32 *eof)
+{
+	struct file *file = read->rd_nf->nf_file;
+	__be32 *p;
+
+	/* Content type, offset, byte count */
+	p = xdr_reserve_space(&resp->xdr, 4 + 8 + 8);
+	if (!p)
+		return nfserr_resource;
+
+	*p++ = htonl(NFS4_CONTENT_HOLE);
+	 p   = xdr_encode_hyper(p, read->rd_offset);
+	 p   = xdr_encode_hyper(p, maxcount);
+
+	*eof = (read->rd_offset + maxcount) >= i_size_read(file_inode(file));
+	return nfs_ok;
+}
+
 static __be32
 nfsd4_encode_read_plus(struct nfsd4_compoundres *resp, __be32 nfserr,
 		       struct nfsd4_read *read)
@@ -4417,6 +4442,7 @@ nfsd4_encode_read_plus(struct nfsd4_compoundres *resp, __be32 nfserr,
 	int starting_len = xdr->buf->len;
 	int segments = 0;
 	__be32 *p, tmp;
+	loff_t pos;
 	u32 eof;
 
 	if (nfserr)
@@ -4435,11 +4461,23 @@ nfsd4_encode_read_plus(struct nfsd4_compoundres *resp, __be32 nfserr,
 	maxcount = min_t(unsigned long, maxcount, read->rd_length);
 
 	eof = read->rd_offset >= i_size_read(file_inode(file));
-	if (!eof) {
+	if (eof)
+		goto out;
+
+	pos = vfs_llseek(file, read->rd_offset, SEEK_DATA);
+	if (pos == -ENXIO)
+		pos = i_size_read(file_inode(file));
+
+	if (pos > read->rd_offset) {
+		maxcount = pos - read->rd_offset;
+		nfserr = nfsd4_encode_read_plus_hole(resp, read, maxcount, &eof);
+		segments++;
+	} else {
 		nfserr = nfsd4_encode_read_plus_data(resp, read, maxcount, &eof);
 		segments++;
 	}
 
+out:
 	if (nfserr)
 		xdr_truncate_encode(xdr, starting_len);
 	else {
-- 
2.27.0


* [PATCH v3 5/6] NFSD: Return both a hole and a data segment
  2020-08-03 16:59 [PATCH v3 0/6] NFSD: Add support for the v4.2 READ_PLUS operation schumaker.anna
                   ` (3 preceding siblings ...)
  2020-08-03 16:59 ` [PATCH v3 4/6] NFSD: Add READ_PLUS hole segment encoding schumaker.anna
@ 2020-08-03 16:59 ` schumaker.anna
  2020-08-03 16:59 ` [PATCH v3 6/6] NFSD: Encode a full READ_PLUS reply schumaker.anna
  2020-08-03 19:26 ` [PATCH v3 0/6] NFSD: Add support for the v4.2 READ_PLUS operation Chuck Lever
  6 siblings, 0 replies; 14+ messages in thread
From: schumaker.anna @ 2020-08-03 16:59 UTC (permalink / raw)
  To: bfields, linux-nfs; +Cc: Anna.Schumaker

From: Anna Schumaker <Anna.Schumaker@Netapp.com>

For now we return at most one of each; the next patch expands this into a
full multi-segment reply.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
---
 fs/nfsd/nfs4xdr.c | 53 +++++++++++++++++++++++++++++++++--------------
 1 file changed, 38 insertions(+), 15 deletions(-)

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 292819eafcac..c01bf6241c71 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -4373,7 +4373,7 @@ nfsd4_encode_offload_status(struct nfsd4_compoundres *resp, __be32 nfserr,
 static __be32
 nfsd4_encode_read_plus_data(struct nfsd4_compoundres *resp,
 			    struct nfsd4_read *read,
-			    unsigned long maxcount,  u32 *eof)
+			    unsigned long *maxcount, u32 *eof)
 {
 	struct xdr_stream *xdr = &resp->xdr;
 	struct file *file = read->rd_nf->nf_file;
@@ -4384,29 +4384,29 @@ nfsd4_encode_read_plus_data(struct nfsd4_compoundres *resp,
 	__be64 tmp64;
 
 	if (hole_pos > read->rd_offset)
-		maxcount = min_t(unsigned long, maxcount, hole_pos - read->rd_offset);
+		*maxcount = min_t(unsigned long, *maxcount, hole_pos - read->rd_offset);
 
 	/* Content type, offset, byte count */
 	p = xdr_reserve_space(xdr, 4 + 8 + 4);
 	if (!p)
 		return nfserr_resource;
 
-	read->rd_vlen = xdr_reserve_space_vec(xdr, resp->rqstp->rq_vec, maxcount);
+	read->rd_vlen = xdr_reserve_space_vec(xdr, resp->rqstp->rq_vec, *maxcount);
 	if (read->rd_vlen < 0)
 		return nfserr_resource;
 
 	nfserr = nfsd_readv(resp->rqstp, read->rd_fhp, file, read->rd_offset,
-			    resp->rqstp->rq_vec, read->rd_vlen, &maxcount, eof);
+			    resp->rqstp->rq_vec, read->rd_vlen, maxcount, eof);
 	if (nfserr)
 		return nfserr;
-	if (svc_encode_read_payload(resp->rqstp, starting_len + 16, maxcount))
+	if (svc_encode_read_payload(resp->rqstp, starting_len + 16, *maxcount))
 		return nfserr_io;
 
 	tmp = htonl(NFS4_CONTENT_DATA);
 	write_bytes_to_xdr_buf(xdr->buf, starting_len,      &tmp,   4);
 	tmp64 = cpu_to_be64(read->rd_offset);
 	write_bytes_to_xdr_buf(xdr->buf, starting_len + 4,  &tmp64, 8);
-	tmp = htonl(maxcount);
+	tmp = htonl(*maxcount);
 	write_bytes_to_xdr_buf(xdr->buf, starting_len + 12, &tmp,   4);
 	return nfs_ok;
 }
@@ -4414,11 +4414,19 @@ nfsd4_encode_read_plus_data(struct nfsd4_compoundres *resp,
 static __be32
 nfsd4_encode_read_plus_hole(struct nfsd4_compoundres *resp,
 			    struct nfsd4_read *read,
-			    unsigned long maxcount, u32 *eof)
+			    unsigned long *maxcount, u32 *eof)
 {
 	struct file *file = read->rd_nf->nf_file;
+	loff_t data_pos = vfs_llseek(file, read->rd_offset, SEEK_DATA);
+	unsigned long count;
 	__be32 *p;
 
+	if (data_pos == -ENXIO)
+		data_pos = i_size_read(file_inode(file));
+	else if (data_pos <= read->rd_offset)
+		return nfserr_resource;
+	count = data_pos - read->rd_offset;
+
 	/* Content type, offset, byte count */
 	p = xdr_reserve_space(&resp->xdr, 4 + 8 + 8);
 	if (!p)
@@ -4426,9 +4434,10 @@ nfsd4_encode_read_plus_hole(struct nfsd4_compoundres *resp,
 
 	*p++ = htonl(NFS4_CONTENT_HOLE);
 	 p   = xdr_encode_hyper(p, read->rd_offset);
-	 p   = xdr_encode_hyper(p, maxcount);
+	 p   = xdr_encode_hyper(p, count);
 
-	*eof = (read->rd_offset + maxcount) >= i_size_read(file_inode(file));
+	*eof = (read->rd_offset + count) >= i_size_read(file_inode(file));
+	*maxcount = min_t(unsigned long, count, *maxcount);
 	return nfs_ok;
 }
 
@@ -4436,7 +4445,7 @@ static __be32
 nfsd4_encode_read_plus(struct nfsd4_compoundres *resp, __be32 nfserr,
 		       struct nfsd4_read *read)
 {
-	unsigned long maxcount;
+	unsigned long maxcount, count;
 	struct xdr_stream *xdr = &resp->xdr;
 	struct file *file;
 	int starting_len = xdr->buf->len;
@@ -4459,6 +4468,7 @@ nfsd4_encode_read_plus(struct nfsd4_compoundres *resp, __be32 nfserr,
 	maxcount = min_t(unsigned long, maxcount,
 			 (xdr->buf->buflen - xdr->buf->len));
 	maxcount = min_t(unsigned long, maxcount, read->rd_length);
+	count    = maxcount;
 
 	eof = read->rd_offset >= i_size_read(file_inode(file));
 	if (eof)
@@ -4467,13 +4477,26 @@ nfsd4_encode_read_plus(struct nfsd4_compoundres *resp, __be32 nfserr,
 	pos = vfs_llseek(file, read->rd_offset, SEEK_DATA);
 	if (pos == -ENXIO)
 		pos = i_size_read(file_inode(file));
+	else if (pos < 0)
+		pos = read->rd_offset;
 
-	if (pos > read->rd_offset) {
-		maxcount = pos - read->rd_offset;
-		nfserr = nfsd4_encode_read_plus_hole(resp, read, maxcount, &eof);
+	if (pos == read->rd_offset) {
+		maxcount = count;
+		nfserr = nfsd4_encode_read_plus_data(resp, read, &maxcount, &eof);
+		if (nfserr)
+			goto out;
+		count -= maxcount;
+		read->rd_offset += maxcount;
 		segments++;
-	} else {
-		nfserr = nfsd4_encode_read_plus_data(resp, read, maxcount, &eof);
+	}
+
+	if (count > 0 && !eof) {
+		maxcount = count;
+		nfserr = nfsd4_encode_read_plus_hole(resp, read, &maxcount, &eof);
+		if (nfserr)
+			goto out;
+		count -= maxcount;
+		read->rd_offset += maxcount;
 		segments++;
 	}
 
-- 
2.27.0


* [PATCH v3 6/6] NFSD: Encode a full READ_PLUS reply
  2020-08-03 16:59 [PATCH v3 0/6] NFSD: Add support for the v4.2 READ_PLUS operation schumaker.anna
                   ` (4 preceding siblings ...)
  2020-08-03 16:59 ` [PATCH v3 5/6] NFSD: Return both a hole and a data segment schumaker.anna
@ 2020-08-03 16:59 ` schumaker.anna
  2020-08-03 19:26 ` [PATCH v3 0/6] NFSD: Add support for the v4.2 READ_PLUS operation Chuck Lever
  6 siblings, 0 replies; 14+ messages in thread
From: schumaker.anna @ 2020-08-03 16:59 UTC (permalink / raw)
  To: bfields, linux-nfs; +Cc: Anna.Schumaker

From: Anna Schumaker <Anna.Schumaker@Netapp.com>

Reply to the client with multiple hole and data segments. I use the
result of the first vfs_llseek() call for encoding as an optimization so
we don't have to immediately repeat the call.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
---
 fs/nfsd/nfs4xdr.c | 33 ++++++++++++++-------------------
 1 file changed, 14 insertions(+), 19 deletions(-)

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index c01bf6241c71..8e334f6da8e4 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -4373,16 +4373,18 @@ nfsd4_encode_offload_status(struct nfsd4_compoundres *resp, __be32 nfserr,
 static __be32
 nfsd4_encode_read_plus_data(struct nfsd4_compoundres *resp,
 			    struct nfsd4_read *read,
-			    unsigned long *maxcount, u32 *eof)
+			    unsigned long *maxcount, u32 *eof,
+			    loff_t *pos)
 {
 	struct xdr_stream *xdr = &resp->xdr;
 	struct file *file = read->rd_nf->nf_file;
 	int starting_len = xdr->buf->len;
-	loff_t hole_pos = vfs_llseek(file, read->rd_offset, SEEK_HOLE);
+	loff_t hole_pos;
 	__be32 nfserr;
 	__be32 *p, tmp;
 	__be64 tmp64;
 
+	hole_pos = pos ? *pos : vfs_llseek(file, read->rd_offset, SEEK_HOLE);
 	if (hole_pos > read->rd_offset)
 		*maxcount = min_t(unsigned long, *maxcount, hole_pos - read->rd_offset);
 
@@ -4451,6 +4453,7 @@ nfsd4_encode_read_plus(struct nfsd4_compoundres *resp, __be32 nfserr,
 	int starting_len = xdr->buf->len;
 	int segments = 0;
 	__be32 *p, tmp;
+	bool is_data;
 	loff_t pos;
 	u32 eof;
 
@@ -4474,29 +4477,21 @@ nfsd4_encode_read_plus(struct nfsd4_compoundres *resp, __be32 nfserr,
 	if (eof)
 		goto out;
 
-	pos = vfs_llseek(file, read->rd_offset, SEEK_DATA);
-	if (pos == -ENXIO)
-		pos = i_size_read(file_inode(file));
-	else if (pos < 0)
-		pos = read->rd_offset;
+	pos = vfs_llseek(file, read->rd_offset, SEEK_HOLE);
+	is_data = pos > read->rd_offset;
 
-	if (pos == read->rd_offset) {
+	while (count > 0 && !eof) {
 		maxcount = count;
-		nfserr = nfsd4_encode_read_plus_data(resp, read, &maxcount, &eof);
-		if (nfserr)
-			goto out;
-		count -= maxcount;
-		read->rd_offset += maxcount;
-		segments++;
-	}
-
-	if (count > 0 && !eof) {
-		maxcount = count;
-		nfserr = nfsd4_encode_read_plus_hole(resp, read, &maxcount, &eof);
+		if (is_data)
+			nfserr = nfsd4_encode_read_plus_data(resp, read, &maxcount, &eof,
+						segments == 0 ? &pos : NULL);
+		else
+			nfserr = nfsd4_encode_read_plus_hole(resp, read, &maxcount, &eof);
 		if (nfserr)
 			goto out;
 		count -= maxcount;
 		read->rd_offset += maxcount;
+		is_data = !is_data;
 		segments++;
 	}
 
-- 
2.27.0


* Re: [PATCH v3 3/6] NFSD: Add READ_PLUS data support
  2020-08-03 16:59 ` [PATCH v3 3/6] NFSD: Add READ_PLUS data support schumaker.anna
@ 2020-08-03 19:17   ` Chuck Lever
  2020-08-03 19:41     ` Anna Schumaker
  0 siblings, 1 reply; 14+ messages in thread
From: Chuck Lever @ 2020-08-03 19:17 UTC (permalink / raw)
  To: Anna Schumaker; +Cc: Bruce Fields, Linux NFS Mailing List, Anna Schumaker

Hi Anna-

> On Aug 3, 2020, at 12:59 PM, schumaker.anna@gmail.com wrote:
> 
> From: Anna Schumaker <Anna.Schumaker@Netapp.com>
> 
> This patch adds READ_PLUS support for returning a single
> NFS4_CONTENT_DATA segment to the client. This is basically the same as
> the READ operation, only with the extra information about data segments.
> 
> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
> ---
> fs/nfsd/nfs4proc.c | 17 ++++++++++
> fs/nfsd/nfs4xdr.c  | 85 ++++++++++++++++++++++++++++++++++++++++++++--
> 2 files changed, 100 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> index a09c35f0f6f0..9630d33211f2 100644
> --- a/fs/nfsd/nfs4proc.c
> +++ b/fs/nfsd/nfs4proc.c
> @@ -2523,6 +2523,16 @@ static inline u32 nfsd4_read_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op)
> 	return (op_encode_hdr_size + 2 + XDR_QUADLEN(rlen)) * sizeof(__be32);
> }
> 
> +static inline u32 nfsd4_read_plus_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op)
> +{
> +	u32 maxcount = svc_max_payload(rqstp);
> +	u32 rlen = min(op->u.read.rd_length, maxcount);
> +	/* enough extra xdr space for encoding either a hole or data segment. */
> +	u32 segments = 1 + 2 + 2;
> +
> +	return (op_encode_hdr_size + 2 + segments + XDR_QUADLEN(rlen)) * sizeof(__be32);
> +}
> +
> static inline u32 nfsd4_readdir_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op)
> {
> 	u32 maxcount = 0, rlen = 0;
> @@ -3059,6 +3069,13 @@ static const struct nfsd4_operation nfsd4_ops[] = {
> 		.op_name = "OP_COPY",
> 		.op_rsize_bop = nfsd4_copy_rsize,
> 	},
> +	[OP_READ_PLUS] = {
> +		.op_func = nfsd4_read,
> +		.op_release = nfsd4_read_release,
> +		.op_name = "OP_READ_PLUS",
> +		.op_rsize_bop = nfsd4_read_plus_rsize,
> +		.op_get_currentstateid = nfsd4_get_readstateid,
> +	},
> 	[OP_SEEK] = {
> 		.op_func = nfsd4_seek,
> 		.op_name = "OP_SEEK",
> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> index 6a1c0a7fae05..1d143bf02b83 100644
> --- a/fs/nfsd/nfs4xdr.c
> +++ b/fs/nfsd/nfs4xdr.c
> @@ -1957,7 +1957,7 @@ static const nfsd4_dec nfsd4_dec_ops[] = {
> 	[OP_LAYOUTSTATS]	= (nfsd4_dec)nfsd4_decode_notsupp,
> 	[OP_OFFLOAD_CANCEL]	= (nfsd4_dec)nfsd4_decode_offload_status,
> 	[OP_OFFLOAD_STATUS]	= (nfsd4_dec)nfsd4_decode_offload_status,
> -	[OP_READ_PLUS]		= (nfsd4_dec)nfsd4_decode_notsupp,
> +	[OP_READ_PLUS]		= (nfsd4_dec)nfsd4_decode_read,
> 	[OP_SEEK]		= (nfsd4_dec)nfsd4_decode_seek,
> 	[OP_WRITE_SAME]		= (nfsd4_dec)nfsd4_decode_notsupp,
> 	[OP_CLONE]		= (nfsd4_dec)nfsd4_decode_clone,
> @@ -4367,6 +4367,87 @@ nfsd4_encode_offload_status(struct nfsd4_compoundres *resp, __be32 nfserr,
> 		return nfserr_resource;
> 	p = xdr_encode_hyper(p, os->count);
> 	*p++ = cpu_to_be32(0);
> +	return nfserr;
> +}
> +
> +static __be32
> +nfsd4_encode_read_plus_data(struct nfsd4_compoundres *resp,
> +			    struct nfsd4_read *read,
> +			    unsigned long maxcount,  u32 *eof)
> +{
> +	struct xdr_stream *xdr = &resp->xdr;
> +	struct file *file = read->rd_nf->nf_file;
> +	int starting_len = xdr->buf->len;
> +	__be32 nfserr;
> +	__be32 *p, tmp;
> +	__be64 tmp64;
> +
> +	/* Content type, offset, byte count */
> +	p = xdr_reserve_space(xdr, 4 + 8 + 4);
> +	if (!p)
> +		return nfserr_resource;
> +
> +	read->rd_vlen = xdr_reserve_space_vec(xdr, resp->rqstp->rq_vec, maxcount);
> +	if (read->rd_vlen < 0)
> +		return nfserr_resource;
> +
> +	nfserr = nfsd_readv(resp->rqstp, read->rd_fhp, file, read->rd_offset,
> +			    resp->rqstp->rq_vec, read->rd_vlen, &maxcount, eof);

nfsd4_encode_read() has this:

3904         if (file->f_op->splice_read &&
3905             test_bit(RQ_SPLICE_OK, &resp->rqstp->rq_flags))
3906                 nfserr = nfsd4_encode_splice_read(resp, read, file, maxcount);
3907         else
3908                 nfserr = nfsd4_encode_readv(resp, read, file, maxcount);

So READ_PLUS never uses ->splice_read. 

readv is known to be less efficient than splice. Is there a measurable
server performance impact (either latency or CPU utilization) when
reading a file with no holes?


> +	if (nfserr)
> +		return nfserr;
> +	if (svc_encode_read_payload(resp->rqstp, starting_len + 16, maxcount))
> +		return nfserr_io;

Not sure you want a read_payload call here. This is to notify the
transport that there is a section of the message that can be sent
via RDMA, but READ_PLUS has no DDP-eligible data items; especially
not holes!

Although, the call is not likely to be much more than a no-op,
since clients won't be providing any Write chunks for READ_PLUS.


> +
> +	tmp = htonl(NFS4_CONTENT_DATA);
> +	write_bytes_to_xdr_buf(xdr->buf, starting_len,      &tmp,   4);
> +	tmp64 = cpu_to_be64(read->rd_offset);
> +	write_bytes_to_xdr_buf(xdr->buf, starting_len + 4,  &tmp64, 8);
> +	tmp = htonl(maxcount);
> +	write_bytes_to_xdr_buf(xdr->buf, starting_len + 12, &tmp,   4);
> +	return nfs_ok;
> +}
> +
> +static __be32
> +nfsd4_encode_read_plus(struct nfsd4_compoundres *resp, __be32 nfserr,
> +		       struct nfsd4_read *read)
> +{
> +	unsigned long maxcount;
> +	struct xdr_stream *xdr = &resp->xdr;
> +	struct file *file;
> +	int starting_len = xdr->buf->len;
> +	int segments = 0;
> +	__be32 *p, tmp;
> +	u32 eof;
> +
> +	if (nfserr)
> +		return nfserr;
> +	file = read->rd_nf->nf_file;
> +
> +	/* eof flag, segment count */
> +	p = xdr_reserve_space(xdr, 4 + 4);
> +	if (!p)
> +		return nfserr_resource;
> +	xdr_commit_encode(xdr);
> +
> +	maxcount = svc_max_payload(resp->rqstp);
> +	maxcount = min_t(unsigned long, maxcount,
> +			 (xdr->buf->buflen - xdr->buf->len));
> +	maxcount = min_t(unsigned long, maxcount, read->rd_length);
> +
> +	eof = read->rd_offset >= i_size_read(file_inode(file));
> +	if (!eof) {
> +		nfserr = nfsd4_encode_read_plus_data(resp, read, maxcount, &eof);
> +		segments++;
> +	}
> +
> +	if (nfserr)
> +		xdr_truncate_encode(xdr, starting_len);
> +	else {
> +		tmp = htonl(eof);
> +		write_bytes_to_xdr_buf(xdr->buf, starting_len,     &tmp, 4);
> +		tmp = htonl(segments);
> +		write_bytes_to_xdr_buf(xdr->buf, starting_len + 4, &tmp, 4);
> +	}
> 
> 	return nfserr;
> }
> @@ -4509,7 +4590,7 @@ static const nfsd4_enc nfsd4_enc_ops[] = {
> 	[OP_LAYOUTSTATS]	= (nfsd4_enc)nfsd4_encode_noop,
> 	[OP_OFFLOAD_CANCEL]	= (nfsd4_enc)nfsd4_encode_noop,
> 	[OP_OFFLOAD_STATUS]	= (nfsd4_enc)nfsd4_encode_offload_status,
> -	[OP_READ_PLUS]		= (nfsd4_enc)nfsd4_encode_noop,
> +	[OP_READ_PLUS]		= (nfsd4_enc)nfsd4_encode_read_plus,
> 	[OP_SEEK]		= (nfsd4_enc)nfsd4_encode_seek,
> 	[OP_WRITE_SAME]		= (nfsd4_enc)nfsd4_encode_noop,
> 	[OP_CLONE]		= (nfsd4_enc)nfsd4_encode_noop,
> -- 
> 2.27.0
> 

--
Chuck Lever




* Re: [PATCH v3 1/6] SUNRPC: Implement xdr_reserve_space_vec()
  2020-08-03 16:59 ` [PATCH v3 1/6] SUNRPC: Implement xdr_reserve_space_vec() schumaker.anna
@ 2020-08-03 19:19   ` Chuck Lever
  2020-08-03 19:37     ` Anna Schumaker
  0 siblings, 1 reply; 14+ messages in thread
From: Chuck Lever @ 2020-08-03 19:19 UTC (permalink / raw)
  To: Anna Schumaker; +Cc: Bruce Fields, Linux NFS Mailing List, Anna Schumaker

Hi Anna-

> On Aug 3, 2020, at 12:59 PM, schumaker.anna@gmail.com wrote:
> 
> From: Anna Schumaker <Anna.Schumaker@Netapp.com>
> 
> Reserving space for a large READ payload requires special handling when
> reserving space in the xdr buffer pages. One problem we can have is use
> of the scratch buffer, which is used to get a pointer to a contiguous
> region of data up to PAGE_SIZE. When using the scratch buffer, calls to
> xdr_commit_encode() shift the data to it's proper alignment in the xdr
> buffer. If we've reserved several pages in a vector, then this could
> potentially invalidate earlier pointers and result in incorrect READ
> data being sent to the client.
> 
> I get around this by looking at the amount of space left in the current
> page, and never reserve more than that for each entry in the read
> vector. This lets us place data directly where it needs to go in the
> buffer pages.

Nit: This appears to be a refactoring change that should be squashed
together with 2/6.


> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
> ---
> include/linux/sunrpc/xdr.h |  2 ++
> net/sunrpc/xdr.c           | 45 ++++++++++++++++++++++++++++++++++++++
> 2 files changed, 47 insertions(+)
> 
> diff --git a/include/linux/sunrpc/xdr.h b/include/linux/sunrpc/xdr.h
> index 22c207b2425f..bac459584dd0 100644
> --- a/include/linux/sunrpc/xdr.h
> +++ b/include/linux/sunrpc/xdr.h
> @@ -234,6 +234,8 @@ typedef int	(*kxdrdproc_t)(struct rpc_rqst *rqstp, struct xdr_stream *xdr,
> extern void xdr_init_encode(struct xdr_stream *xdr, struct xdr_buf *buf,
> 			    __be32 *p, struct rpc_rqst *rqst);
> extern __be32 *xdr_reserve_space(struct xdr_stream *xdr, size_t nbytes);
> +extern int xdr_reserve_space_vec(struct xdr_stream *xdr, struct kvec *vec,
> +		size_t nbytes);
> extern void xdr_commit_encode(struct xdr_stream *xdr);
> extern void xdr_truncate_encode(struct xdr_stream *xdr, size_t len);
> extern int xdr_restrict_buflen(struct xdr_stream *xdr, int newbuflen);
> diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c
> index be11d672b5b9..6dfe5dc8b35f 100644
> --- a/net/sunrpc/xdr.c
> +++ b/net/sunrpc/xdr.c
> @@ -648,6 +648,51 @@ __be32 * xdr_reserve_space(struct xdr_stream *xdr, size_t nbytes)
> }
> EXPORT_SYMBOL_GPL(xdr_reserve_space);
> 
> +
> +/**
> + * xdr_reserve_space_vec - Reserves a large amount of buffer space for sending
> + * @xdr: pointer to xdr_stream
> + * @vec: pointer to a kvec array
> + * @nbytes: number of bytes to reserve
> + *
> + * Reserves enough buffer space to encode 'nbytes' of data and stores the
> + * pointers in 'vec'. The size argument passed to xdr_reserve_space() is
> + * determined based on the number of bytes remaining in the current page to
> + * avoid invalidating iov_base pointers when xdr_commit_encode() is called.
> + */
> +int xdr_reserve_space_vec(struct xdr_stream *xdr, struct kvec *vec, size_t nbytes)
> +{
> +	int thislen;
> +	int v = 0;
> +	__be32 *p;
> +
> +	/*
> +	 * svcrdma requires every READ payload to start somewhere
> +	 * in xdr->pages.
> +	 */
> +	if (xdr->iov == xdr->buf->head) {
> +		xdr->iov = NULL;
> +		xdr->end = xdr->p;
> +	}
> +
> +	while (nbytes) {
> +		thislen = xdr->buf->page_len % PAGE_SIZE;
> +		thislen = min_t(size_t, nbytes, PAGE_SIZE - thislen);
> +
> +		p = xdr_reserve_space(xdr, thislen);
> +		if (!p)
> +			return -EIO;
> +
> +		vec[v].iov_base = p;
> +		vec[v].iov_len = thislen;
> +		v++;
> +		nbytes -= thislen;
> +	}
> +
> +	return v;
> +}
> +EXPORT_SYMBOL_GPL(xdr_reserve_space_vec);
> +
> /**
>  * xdr_truncate_encode - truncate an encode buffer
>  * @xdr: pointer to xdr_stream
> -- 
> 2.27.0
> 

--
Chuck Lever




* Re: [PATCH v3 0/6] NFSD: Add support for the v4.2 READ_PLUS operation
  2020-08-03 16:59 [PATCH v3 0/6] NFSD: Add support for the v4.2 READ_PLUS operation schumaker.anna
                   ` (5 preceding siblings ...)
  2020-08-03 16:59 ` [PATCH v3 6/6] NFSD: Encode a full READ_PLUS reply schumaker.anna
@ 2020-08-03 19:26 ` Chuck Lever
  2020-08-03 19:36   ` Anna Schumaker
  6 siblings, 1 reply; 14+ messages in thread
From: Chuck Lever @ 2020-08-03 19:26 UTC (permalink / raw)
  To: Anna Schumaker; +Cc: Bruce Fields, Linux NFS Mailing List, Anna Schumaker

Hi Anna-

> On Aug 3, 2020, at 12:59 PM, schumaker.anna@gmail.com wrote:
> 
> From: Anna Schumaker <Anna.Schumaker@Netapp.com>
> 
> These patches add server support for the READ_PLUS operation, which
> breaks read requests into several "data" and "hole" segments when
> replying to the client.

At this point, I've wrapped up the content of the "feature" pull
request for v5.9, so Bruce and I are assuming this series is not
for v5.9.


> Here are the results of some performance tests I ran on my own virtual
> machines, which does still have the slow SEEK_HOLE/SEEK_DATA behavior.
> A recent minimum-compiler-version patch makes installing newer kernels
> difficult on the lab machines I have access to, but I'll still try to
> collect that data since it seemed promising during my last posting
> (https://www.spinics.net/lists/linux-nfs/msg76487.html).

Nit: I've seen other maintainers prefer the use of lore.kernel.org
over spinics or mail.info for patch descriptions and cover letters.


> I tested by reading various 2G files from a few different underlying
> filesystems and across several NFS versions. I used the `vmtouch` utility
> to make sure files were only cached when we wanted them to be. In addition
> to 100% data and 100% hole cases, I also tested with files that alternate
> between data and hole segments. These files have either 4K, 8K, 16K, or 32K
> segment sizes and start with either data or hole segments. So the file
> mixed-4d has a 4K segment size beginning with a data segment, but mixed-32h
> has 32K segments beginning with a hole. The units are in seconds, with the
> first number for each NFS version being the uncached read time and the second
> number is for when the file is cached on the server.

In a VM environment, counting bytes transferred might be a more
useful comparison than seconds. I'm not sure if the table reports
wall clock time or something else. At what layer was the timing
measured?

CPU utilization might be interesting to report on the client
because clients would have to do more work assembling a file with
holes in it.

Also the benefit of READ_PLUS might be more plain if you chose a
larger file size (by perhaps a factor of 100).

Not clear what the purpose of the v3, v4.0, and v4.1 columns is.


>          |         v3        |        v4.0       |        v4.1       |        v4.2       |
> ext4      | uncached:  cached | uncached:  cached | uncached:  cached | uncached:  cached |
> ----------|-------------------|-------------------|-------------------|-------------------|
> data      |   2.697 :   1.365 |   2.539 :   1.482 |   2.246 :   1.595 |   2.516 :   1.716 |
> hole      |   2.389 :   1.480 |   1.692 :   1.444 |   1.904 :   1.539 |   1.042 :   1.005 |
> mixed-4d  |   2.426 :   1.448 |   2.468 :   1.480 |   2.258 :   1.731 |   2.408 :   1.674 |
> mixed-8d  |   2.446 :   1.592 |   2.497 :   1.523 |   2.240 :   1.736 |   2.179 :   1.547 |
> mixed-16d |   2.608 :   1.575 |   2.446 :   1.582 |   2.330 :   1.670 |   2.192 :   1.488 |
> mixed-32d |   2.544 :   1.587 |   2.164 :   1.499 |   2.076 :   1.452 |   1.982 :   1.346 |
> mixed-4h  |   2.591 :   1.486 |   2.233 :   1.503 |   2.190 :   1.520 |   3.679 :   1.815 |
> mixed-8h  |   2.498 :   1.547 |   2.168 :   1.510 |   2.098 :   1.511 |   2.815 :   1.484 |
> mixed-16h |   2.480 :   1.664 |   2.152 :   1.597 |   2.096 :   1.537 |   2.288 :   1.580 |
> mixed-32h |   2.520 :   1.467 |   2.171 :   1.496 |   2.246 :   1.593 |   2.066 :   1.389 |
> 
>          |         v3        |        v4.0       |        v4.1       |        v4.2       |
> xfs       | uncached:  cached | uncached:  cached | uncached:  cached | uncached:  cached |
> ----------|-------------------|-------------------|-------------------|-------------------|
> data      |   2.074 :   1.353 |   2.129 :   1.489 |   2.198 :   1.564 |   2.579 :   1.719 |
> hole      |   1.714 :   1.430 |   1.647 :   1.440 |   1.748 :   1.464 |   1.019 :   1.028 |
> mixed-4d  |   2.699 :   1.561 |   2.782 :   1.657 |   2.800 :   1.619 |   2.848 :   2.166 |
> mixed-8d  |   2.204 :   1.540 |   2.346 :   1.595 |   2.356 :   1.589 |   2.335 :   1.809 |
> mixed-16d |   2.034 :   1.445 |   2.212 :   1.561 |   2.172 :   1.546 |   2.127 :   1.658 |
> mixed-32d |   1.982 :   1.480 |   2.135 :   1.544 |   2.136 :   1.565 |   2.024 :   1.555 |
> mixed-4h  |   2.600 :   1.549 |   2.790 :   1.660 |   2.745 :   1.634 |   8.949 :   2.529 |
> mixed-8h  |   2.283 :   1.555 |   2.414 :   1.605 |   2.373 :   1.607 |   5.626 :   2.015 |
> mixed-16h |   2.115 :   1.512 |   2.247 :   1.593 |   2.217 :   1.608 |   3.740 :   1.803 |
> mixed-32h |   2.069 :   1.499 |   2.212 :   1.582 |   2.235 :   1.631 |   2.819 :   1.542 |
> 
>          |         v3        |        v4.0       |        v4.1       |        v4.2       |
> btrfs     | uncached:  cached | uncached:  cached | uncached:  cached | uncached:  cached |
> ----------|-------------------|-------------------|-------------------|-------------------|
> data      |   2.417 :   1.317 |   2.095 :   1.445 |   2.145 :   1.523 |   2.615 :   1.713 |
> hole      |   2.107 :   1.483 |   2.121 :   1.496 |   2.106 :   1.461 |   1.011 :   1.061 |
> mixed-4d  |   2.348 :   1.471 |   2.370 :   1.523 |   2.379 :   1.499 |   3.028 : 225.812 |
> mixed-8d  |   2.227 :   1.476 |   2.231 :   1.467 |   2.272 :   1.529 |   2.723 :  36.179 |
> mixed-16d |   2.175 :   1.460 |   2.208 :   1.457 |   2.200 :   1.464 |   2.526 :  10.371 |
> mixed-32d |   2.148 :   1.501 |   2.191 :   1.468 |   2.167 :   1.471 |   2.455 :   3.367 |
> mixed-4h  |   2.362 :   1.561 |   2.387 :   1.513 |   2.352 :   1.536 |   5.935 :  41.494 |
> mixed-8h  |   2.241 :   1.477 |   2.251 :   1.503 |   2.256 :   1.496 |   3.672 :  10.261 |
> mixed-16h |   2.238 :   1.477 |   2.188 :   1.496 |   2.183 :   1.503 |   2.955 :   3.809 |
> mixed-32h |   2.146 :   1.490 |   2.135 :   1.523 |   2.157 :   1.557 |   2.327 :   2.088 |
> 
> Anna Schumaker (6):
>  SUNRPC: Implement xdr_reserve_space_vec()
>  NFSD: nfsd4_encode_readv() can use xdr_reserve_space_vec()
>  NFSD: Add READ_PLUS data support
>  NFSD: Add READ_PLUS hole segment encoding
>  NFSD: Return both a hole and a data segment
>  NFSD: Encode a full READ_PLUS reply
> 
> fs/nfsd/nfs4proc.c         |  17 ++++
> fs/nfsd/nfs4xdr.c          | 169 +++++++++++++++++++++++++++++++------
> include/linux/sunrpc/xdr.h |   2 +
> net/sunrpc/xdr.c           |  45 ++++++++++
> 4 files changed, 206 insertions(+), 27 deletions(-)
> 
> -- 
> 2.27.0
> 

--
Chuck Lever




* Re: [PATCH v3 0/6] NFSD: Add support for the v4.2 READ_PLUS operation
  2020-08-03 19:26 ` [PATCH v3 0/6] NFSD: Add support for the v4.2 READ_PLUS operation Chuck Lever
@ 2020-08-03 19:36   ` Anna Schumaker
  0 siblings, 0 replies; 14+ messages in thread
From: Anna Schumaker @ 2020-08-03 19:36 UTC (permalink / raw)
  To: Chuck Lever; +Cc: Bruce Fields, Linux NFS Mailing List

Hi Chuck,

On Mon, Aug 3, 2020 at 3:26 PM Chuck Lever <chuck.lever@oracle.com> wrote:
>
> Hi Anna-
>
> > On Aug 3, 2020, at 12:59 PM, schumaker.anna@gmail.com wrote:
> >
> > From: Anna Schumaker <Anna.Schumaker@Netapp.com>
> >
> > These patches add server support for the READ_PLUS operation, which
> > breaks read requests into several "data" and "hole" segments when
> > replying to the client.
>
> At this point, I've wrapped up the content of the "feature" pull
> request for v5.9, so Bruce and I are assuming this series is not
> for v5.9.

It's definitely too close to the merge for 5.9.

>
>
> > Here are the results of some performance tests I ran on my own virtual
> > machines, which does still have the slow SEEK_HOLE/SEEK_DATA behavior.
> > A recent minimum-compiler-version patch makes installing newer kernels
> > difficult on the lab machines I have access to, but I'll still try to
> > collect that data since it seemed promising during my last posting
> > (https://www.spinics.net/lists/linux-nfs/msg76487.html).
>
> Nit: I've seen other maintainers prefer the use of lore.kernel.org
> over spinics or mail.info for patch descriptions and cover letters.

I'll have to remember lore.kernel.org exists. I usually go to whatever
is listed under "Archives" for a mailing list on vger.kernel.org.

>
>
> > I tested by reading various 2G files from a few different underlying
> > filesystems and across several NFS versions. I used the `vmtouch` utility
> > to make sure files were only cached when we wanted them to be. In addition
> > to 100% data and 100% hole cases, I also tested with files that alternate
> > between data and hole segments. These files have either 4K, 8K, 16K, or 32K
> > segment sizes and start with either data or hole segments. So the file
> > mixed-4d has a 4K segment size beginning with a data segment, but mixed-32h
> > has 32K segments beginning with a hole. The units are in seconds, with the
> > first number for each NFS version being the uncached read time and the second
> > number is for when the file is cached on the server.
>
> In a VM environment, counting bytes transferred might be a more
> useful comparison than seconds. I'm not sure if the table reports
> wall clock time or something else. At what layer was the timing
> measured?

I used the time reported by dd. I can see what other data I can pull out of it.

>
> CPU utilization might be interesting to report on the client
> because clients would have to more work assembling a file with
> holes in it.
>
> Also the benefit of READ_PLUS might be more plain if you chose a
> larger file size (by perhaps a factor of 100).

I'll try, but the way my test is set up now, all of the files exist on
the disk at the same time (so I don't need to spend forever recreating
them), which puts an upper limit on how big I can make things.

>
> Not clear what the purpose of the v3, v4.0, and v4.1 columns is.

It's to compare v4.2 performance against earlier versions. v4.1 gets
us a baseline, since it uses the same codepaths as v4.2 without read
plus.

Anna
>
>
> >          |         v3        |        v4.0       |        v4.1       |        v4.2       |
> > ext4      | uncached:  cached | uncached:  cached | uncached:  cached | uncached:  cached |
> > ----------|-------------------|-------------------|-------------------|-------------------|
> > data      |   2.697 :   1.365 |   2.539 :   1.482 |   2.246 :   1.595 |   2.516 :   1.716 |
> > hole      |   2.389 :   1.480 |   1.692 :   1.444 |   1.904 :   1.539 |   1.042 :   1.005 |
> > mixed-4d  |   2.426 :   1.448 |   2.468 :   1.480 |   2.258 :   1.731 |   2.408 :   1.674 |
> > mixed-8d  |   2.446 :   1.592 |   2.497 :   1.523 |   2.240 :   1.736 |   2.179 :   1.547 |
> > mixed-16d |   2.608 :   1.575 |   2.446 :   1.582 |   2.330 :   1.670 |   2.192 :   1.488 |
> > mixed-32d |   2.544 :   1.587 |   2.164 :   1.499 |   2.076 :   1.452 |   1.982 :   1.346 |
> > mixed-4h  |   2.591 :   1.486 |   2.233 :   1.503 |   2.190 :   1.520 |   3.679 :   1.815 |
> > mixed-8h  |   2.498 :   1.547 |   2.168 :   1.510 |   2.098 :   1.511 |   2.815 :   1.484 |
> > mixed-16h |   2.480 :   1.664 |   2.152 :   1.597 |   2.096 :   1.537 |   2.288 :   1.580 |
> > mixed-32h |   2.520 :   1.467 |   2.171 :   1.496 |   2.246 :   1.593 |   2.066 :   1.389 |
> >
> >          |         v3        |        v4.0       |        v4.1       |        v4.2       |
> > xfs       | uncached:  cached | uncached:  cached | uncached:  cached | uncached:  cached |
> > ----------|-------------------|-------------------|-------------------|-------------------|
> > data      |   2.074 :   1.353 |   2.129 :   1.489 |   2.198 :   1.564 |   2.579 :   1.719 |
> > hole      |   1.714 :   1.430 |   1.647 :   1.440 |   1.748 :   1.464 |   1.019 :   1.028 |
> > mixed-4d  |   2.699 :   1.561 |   2.782 :   1.657 |   2.800 :   1.619 |   2.848 :   2.166 |
> > mixed-8d  |   2.204 :   1.540 |   2.346 :   1.595 |   2.356 :   1.589 |   2.335 :   1.809 |
> > mixed-16d |   2.034 :   1.445 |   2.212 :   1.561 |   2.172 :   1.546 |   2.127 :   1.658 |
> > mixed-32d |   1.982 :   1.480 |   2.135 :   1.544 |   2.136 :   1.565 |   2.024 :   1.555 |
> > mixed-4h  |   2.600 :   1.549 |   2.790 :   1.660 |   2.745 :   1.634 |   8.949 :   2.529 |
> > mixed-8h  |   2.283 :   1.555 |   2.414 :   1.605 |   2.373 :   1.607 |   5.626 :   2.015 |
> > mixed-16h |   2.115 :   1.512 |   2.247 :   1.593 |   2.217 :   1.608 |   3.740 :   1.803 |
> > mixed-32h |   2.069 :   1.499 |   2.212 :   1.582 |   2.235 :   1.631 |   2.819 :   1.542 |
> >
> >          |         v3        |        v4.0       |        v4.1       |        v4.2       |
> > btrfs     | uncached:  cached | uncached:  cached | uncached:  cached | uncached:  cached |
> > ----------|-------------------|-------------------|-------------------|-------------------|
> > data      |   2.417 :   1.317 |   2.095 :   1.445 |   2.145 :   1.523 |   2.615 :   1.713 |
> > hole      |   2.107 :   1.483 |   2.121 :   1.496 |   2.106 :   1.461 |   1.011 :   1.061 |
> > mixed-4d  |   2.348 :   1.471 |   2.370 :   1.523 |   2.379 :   1.499 |   3.028 : 225.812 |
> > mixed-8d  |   2.227 :   1.476 |   2.231 :   1.467 |   2.272 :   1.529 |   2.723 :  36.179 |
> > mixed-16d |   2.175 :   1.460 |   2.208 :   1.457 |   2.200 :   1.464 |   2.526 :  10.371 |
> > mixed-32d |   2.148 :   1.501 |   2.191 :   1.468 |   2.167 :   1.471 |   2.455 :   3.367 |
> > mixed-4h  |   2.362 :   1.561 |   2.387 :   1.513 |   2.352 :   1.536 |   5.935 :  41.494 |
> > mixed-8h  |   2.241 :   1.477 |   2.251 :   1.503 |   2.256 :   1.496 |   3.672 :  10.261 |
> > mixed-16h |   2.238 :   1.477 |   2.188 :   1.496 |   2.183 :   1.503 |   2.955 :   3.809 |
> > mixed-32h |   2.146 :   1.490 |   2.135 :   1.523 |   2.157 :   1.557 |   2.327 :   2.088 |
> >
> > Anna Schumaker (6):
> >  SUNRPC: Implement xdr_reserve_space_vec()
> >  NFSD: nfsd4_encode_readv() can use xdr_reserve_space_vec()
> >  NFSD: Add READ_PLUS data support
> >  NFSD: Add READ_PLUS hole segment encoding
> >  NFSD: Return both a hole and a data segment
> >  NFSD: Encode a full READ_PLUS reply
> >
> > fs/nfsd/nfs4proc.c         |  17 ++++
> > fs/nfsd/nfs4xdr.c          | 169 +++++++++++++++++++++++++++++++------
> > include/linux/sunrpc/xdr.h |   2 +
> > net/sunrpc/xdr.c           |  45 ++++++++++
> > 4 files changed, 206 insertions(+), 27 deletions(-)
> >
> > --
> > 2.27.0
> >
>
> --
> Chuck Lever
>
>
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 1/6] SUNRPC: Implement xdr_reserve_space_vec()
  2020-08-03 19:19   ` Chuck Lever
@ 2020-08-03 19:37     ` Anna Schumaker
  2020-08-03 19:44       ` Chuck Lever
  0 siblings, 1 reply; 14+ messages in thread
From: Anna Schumaker @ 2020-08-03 19:37 UTC (permalink / raw)
  To: Chuck Lever; +Cc: Bruce Fields, Linux NFS Mailing List

Hi Chuck,

On Mon, Aug 3, 2020 at 3:21 PM Chuck Lever <chuck.lever@oracle.com> wrote:
>
> Hi Anna-
>
> > On Aug 3, 2020, at 12:59 PM, schumaker.anna@gmail.com wrote:
> >
> > From: Anna Schumaker <Anna.Schumaker@Netapp.com>
> >
> > Reserving space for a large READ payload requires special handling when
> > reserving space in the xdr buffer pages. One problem we can have is use
> > of the scratch buffer, which is used to get a pointer to a contiguous
> > region of data up to PAGE_SIZE. When using the scratch buffer, calls to
> > xdr_commit_encode() shift the data to its proper alignment in the xdr
> > buffer. If we've reserved several pages in a vector, then this could
> > potentially invalidate earlier pointers and result in incorrect READ
> > data being sent to the client.
> >
> > I get around this by looking at the amount of space left in the current
> > page and never reserving more than that for each entry in the read
> > vector. This lets us place data directly where it needs to go in the
> > buffer pages.
>
> Nit: This appears to be a refactoring change that should be squashed
> together with 2/6.

My default was to leave sunrpc and nfs changes as separate patches,
but I can squash them together if you want me to!

Anna
>
>
> > Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
> > ---
> > include/linux/sunrpc/xdr.h |  2 ++
> > net/sunrpc/xdr.c           | 45 ++++++++++++++++++++++++++++++++++++++
> > 2 files changed, 47 insertions(+)
> >
> > diff --git a/include/linux/sunrpc/xdr.h b/include/linux/sunrpc/xdr.h
> > index 22c207b2425f..bac459584dd0 100644
> > --- a/include/linux/sunrpc/xdr.h
> > +++ b/include/linux/sunrpc/xdr.h
> > @@ -234,6 +234,8 @@ typedef int       (*kxdrdproc_t)(struct rpc_rqst *rqstp, struct xdr_stream *xdr,
> > extern void xdr_init_encode(struct xdr_stream *xdr, struct xdr_buf *buf,
> >                           __be32 *p, struct rpc_rqst *rqst);
> > extern __be32 *xdr_reserve_space(struct xdr_stream *xdr, size_t nbytes);
> > +extern int xdr_reserve_space_vec(struct xdr_stream *xdr, struct kvec *vec,
> > +             size_t nbytes);
> > extern void xdr_commit_encode(struct xdr_stream *xdr);
> > extern void xdr_truncate_encode(struct xdr_stream *xdr, size_t len);
> > extern int xdr_restrict_buflen(struct xdr_stream *xdr, int newbuflen);
> > diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c
> > index be11d672b5b9..6dfe5dc8b35f 100644
> > --- a/net/sunrpc/xdr.c
> > +++ b/net/sunrpc/xdr.c
> > @@ -648,6 +648,51 @@ __be32 * xdr_reserve_space(struct xdr_stream *xdr, size_t nbytes)
> > }
> > EXPORT_SYMBOL_GPL(xdr_reserve_space);
> >
> > +
> > +/**
> > + * xdr_reserve_space_vec - Reserves a large amount of buffer space for sending
> > + * @xdr: pointer to xdr_stream
> > + * @vec: pointer to a kvec array
> > + * @nbytes: number of bytes to reserve
> > + *
> > + * Reserves enough buffer space to encode 'nbytes' of data and stores the
> > + * pointers in 'vec'. The size argument passed to xdr_reserve_space() is
> > + * determined based on the number of bytes remaining in the current page to
> > + * avoid invalidating iov_base pointers when xdr_commit_encode() is called.
> > + */
> > +int xdr_reserve_space_vec(struct xdr_stream *xdr, struct kvec *vec, size_t nbytes)
> > +{
> > +     int thislen;
> > +     int v = 0;
> > +     __be32 *p;
> > +
> > +     /*
> > +      * svcrdma requires every READ payload to start somewhere
> > +      * in xdr->pages.
> > +      */
> > +     if (xdr->iov == xdr->buf->head) {
> > +             xdr->iov = NULL;
> > +             xdr->end = xdr->p;
> > +     }
> > +
> > +     while (nbytes) {
> > +             thislen = xdr->buf->page_len % PAGE_SIZE;
> > +             thislen = min_t(size_t, nbytes, PAGE_SIZE - thislen);
> > +
> > +             p = xdr_reserve_space(xdr, thislen);
> > +             if (!p)
> > +                     return -EIO;
> > +
> > +             vec[v].iov_base = p;
> > +             vec[v].iov_len = thislen;
> > +             v++;
> > +             nbytes -= thislen;
> > +     }
> > +
> > +     return v;
> > +}
> > +EXPORT_SYMBOL_GPL(xdr_reserve_space_vec);
> > +
> > /**
> >  * xdr_truncate_encode - truncate an encode buffer
> >  * @xdr: pointer to xdr_stream
> > --
> > 2.27.0
> >
>
> --
> Chuck Lever
>
>
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 3/6] NFSD: Add READ_PLUS data support
  2020-08-03 19:17   ` Chuck Lever
@ 2020-08-03 19:41     ` Anna Schumaker
  0 siblings, 0 replies; 14+ messages in thread
From: Anna Schumaker @ 2020-08-03 19:41 UTC (permalink / raw)
  To: Chuck Lever; +Cc: Bruce Fields, Linux NFS Mailing List

Hi Chuck,

On Mon, Aug 3, 2020 at 3:19 PM Chuck Lever <chuck.lever@oracle.com> wrote:
>
> Hi Anna-
>
> > On Aug 3, 2020, at 12:59 PM, schumaker.anna@gmail.com wrote:
> >
> > From: Anna Schumaker <Anna.Schumaker@Netapp.com>
> >
> > This patch adds READ_PLUS support for returning a single
> > NFS4_CONTENT_DATA segment to the client. This is basically the same as
> > the READ operation, only with the extra information about data segments.
> >
> > Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
> > ---
> > fs/nfsd/nfs4proc.c | 17 ++++++++++
> > fs/nfsd/nfs4xdr.c  | 85 ++++++++++++++++++++++++++++++++++++++++++++--
> > 2 files changed, 100 insertions(+), 2 deletions(-)
> >
> > diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> > index a09c35f0f6f0..9630d33211f2 100644
> > --- a/fs/nfsd/nfs4proc.c
> > +++ b/fs/nfsd/nfs4proc.c
> > @@ -2523,6 +2523,16 @@ static inline u32 nfsd4_read_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op)
> >       return (op_encode_hdr_size + 2 + XDR_QUADLEN(rlen)) * sizeof(__be32);
> > }
> >
> > +static inline u32 nfsd4_read_plus_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op)
> > +{
> > +     u32 maxcount = svc_max_payload(rqstp);
> > +     u32 rlen = min(op->u.read.rd_length, maxcount);
> > +     /* enough extra xdr space for encoding either a hole or data segment. */
> > +     u32 segments = 1 + 2 + 2;
> > +
> > +     return (op_encode_hdr_size + 2 + segments + XDR_QUADLEN(rlen)) * sizeof(__be32);
> > +}
> > +
> > static inline u32 nfsd4_readdir_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op)
> > {
> >       u32 maxcount = 0, rlen = 0;
> > @@ -3059,6 +3069,13 @@ static const struct nfsd4_operation nfsd4_ops[] = {
> >               .op_name = "OP_COPY",
> >               .op_rsize_bop = nfsd4_copy_rsize,
> >       },
> > +     [OP_READ_PLUS] = {
> > +             .op_func = nfsd4_read,
> > +             .op_release = nfsd4_read_release,
> > +             .op_name = "OP_READ_PLUS",
> > +             .op_rsize_bop = nfsd4_read_plus_rsize,
> > +             .op_get_currentstateid = nfsd4_get_readstateid,
> > +     },
> >       [OP_SEEK] = {
> >               .op_func = nfsd4_seek,
> >               .op_name = "OP_SEEK",
> > diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> > index 6a1c0a7fae05..1d143bf02b83 100644
> > --- a/fs/nfsd/nfs4xdr.c
> > +++ b/fs/nfsd/nfs4xdr.c
> > @@ -1957,7 +1957,7 @@ static const nfsd4_dec nfsd4_dec_ops[] = {
> >       [OP_LAYOUTSTATS]        = (nfsd4_dec)nfsd4_decode_notsupp,
> >       [OP_OFFLOAD_CANCEL]     = (nfsd4_dec)nfsd4_decode_offload_status,
> >       [OP_OFFLOAD_STATUS]     = (nfsd4_dec)nfsd4_decode_offload_status,
> > -     [OP_READ_PLUS]          = (nfsd4_dec)nfsd4_decode_notsupp,
> > +     [OP_READ_PLUS]          = (nfsd4_dec)nfsd4_decode_read,
> >       [OP_SEEK]               = (nfsd4_dec)nfsd4_decode_seek,
> >       [OP_WRITE_SAME]         = (nfsd4_dec)nfsd4_decode_notsupp,
> >       [OP_CLONE]              = (nfsd4_dec)nfsd4_decode_clone,
> > @@ -4367,6 +4367,87 @@ nfsd4_encode_offload_status(struct nfsd4_compoundres *resp, __be32 nfserr,
> >               return nfserr_resource;
> >       p = xdr_encode_hyper(p, os->count);
> >       *p++ = cpu_to_be32(0);
> > +     return nfserr;
> > +}
> > +
> > +static __be32
> > +nfsd4_encode_read_plus_data(struct nfsd4_compoundres *resp,
> > +                         struct nfsd4_read *read,
> > +                         unsigned long maxcount,  u32 *eof)
> > +{
> > +     struct xdr_stream *xdr = &resp->xdr;
> > +     struct file *file = read->rd_nf->nf_file;
> > +     int starting_len = xdr->buf->len;
> > +     __be32 nfserr;
> > +     __be32 *p, tmp;
> > +     __be64 tmp64;
> > +
> > +     /* Content type, offset, byte count */
> > +     p = xdr_reserve_space(xdr, 4 + 8 + 4);
> > +     if (!p)
> > +             return nfserr_resource;
> > +
> > +     read->rd_vlen = xdr_reserve_space_vec(xdr, resp->rqstp->rq_vec, maxcount);
> > +     if (read->rd_vlen < 0)
> > +             return nfserr_resource;
> > +
> > +     nfserr = nfsd_readv(resp->rqstp, read->rd_fhp, file, read->rd_offset,
> > +                         resp->rqstp->rq_vec, read->rd_vlen, &maxcount, eof);
>
> nfsd4_encode_read() has this:
>
> 3904         if (file->f_op->splice_read &&
> 3905             test_bit(RQ_SPLICE_OK, &resp->rqstp->rq_flags))
> 3906                 nfserr = nfsd4_encode_splice_read(resp, read, file, maxcount);
> 3907         else
> 3908                 nfserr = nfsd4_encode_readv(resp, read, file, maxcount);
>
> So READ_PLUS never uses ->splice_read.
>
> readv is known to be less efficient than splice. Is there a measurable
> server performance impact (either latency or CPU utilization) when
> reading a file with no holes?

Good question, and I haven't measured that. For a while I was doing
the first read data segment with a splice, but that was having issues.
I should circle back and see if that's still a problem now that I've
fixed a bunch of other bugs.
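
Roughly, the splice experiment looked like the sketch below, mirroring the
nfsd4_encode_read() snippet you quoted. This is untested, and
nfsd4_encode_read_plus_data_splice() is a hypothetical helper name used only
for illustration; it isn't part of this series:

	eof = read->rd_offset >= i_size_read(file_inode(file));
	if (!eof) {
		/* Splice can only cover a single contiguous data segment,
		 * so anything beyond the first segment still goes through
		 * the readv-based encoder from patch 3/6. */
		if (file->f_op->splice_read &&
		    test_bit(RQ_SPLICE_OK, &resp->rqstp->rq_flags))
			nfserr = nfsd4_encode_read_plus_data_splice(resp, read,
								maxcount, &eof);
		else
			nfserr = nfsd4_encode_read_plus_data(resp, read,
								maxcount, &eof);
		segments++;
	}

The splice side is where I was hitting the issues, which is why this posting
only uses the readv path.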

>
>
> > +     if (nfserr)
> > +             return nfserr;
> > +     if (svc_encode_read_payload(resp->rqstp, starting_len + 16, maxcount))
> > +             return nfserr_io;
>
> Not sure you want a read_payload call here. This is to notify the
> transport that there is a section of the message that can be sent
> via RDMA, but READ_PLUS has no DDP-eligible data items; especially
> not holes!

Sure thing. I'll remove that for the next version.

Anna
>
> Although, the call is not likely to be much more than a no-op,
> since clients won't be providing any Write chunks for READ_PLUS.
>
>
> > +
> > +     tmp = htonl(NFS4_CONTENT_DATA);
> > +     write_bytes_to_xdr_buf(xdr->buf, starting_len,      &tmp,   4);
> > +     tmp64 = cpu_to_be64(read->rd_offset);
> > +     write_bytes_to_xdr_buf(xdr->buf, starting_len + 4,  &tmp64, 8);
> > +     tmp = htonl(maxcount);
> > +     write_bytes_to_xdr_buf(xdr->buf, starting_len + 12, &tmp,   4);
> > +     return nfs_ok;
> > +}
> > +
> > +static __be32
> > +nfsd4_encode_read_plus(struct nfsd4_compoundres *resp, __be32 nfserr,
> > +                    struct nfsd4_read *read)
> > +{
> > +     unsigned long maxcount;
> > +     struct xdr_stream *xdr = &resp->xdr;
> > +     struct file *file;
> > +     int starting_len = xdr->buf->len;
> > +     int segments = 0;
> > +     __be32 *p, tmp;
> > +     u32 eof;
> > +
> > +     if (nfserr)
> > +             return nfserr;
> > +     file = read->rd_nf->nf_file;
> > +
> > +     /* eof flag, segment count */
> > +     p = xdr_reserve_space(xdr, 4 + 4);
> > +     if (!p)
> > +             return nfserr_resource;
> > +     xdr_commit_encode(xdr);
> > +
> > +     maxcount = svc_max_payload(resp->rqstp);
> > +     maxcount = min_t(unsigned long, maxcount,
> > +                      (xdr->buf->buflen - xdr->buf->len));
> > +     maxcount = min_t(unsigned long, maxcount, read->rd_length);
> > +
> > +     eof = read->rd_offset >= i_size_read(file_inode(file));
> > +     if (!eof) {
> > +             nfserr = nfsd4_encode_read_plus_data(resp, read, maxcount, &eof);
> > +             segments++;
> > +     }
> > +
> > +     if (nfserr)
> > +             xdr_truncate_encode(xdr, starting_len);
> > +     else {
> > +             tmp = htonl(eof);
> > +             write_bytes_to_xdr_buf(xdr->buf, starting_len,     &tmp, 4);
> > +             tmp = htonl(segments);
> > +             write_bytes_to_xdr_buf(xdr->buf, starting_len + 4, &tmp, 4);
> > +     }
> >
> >       return nfserr;
> > }
> > @@ -4509,7 +4590,7 @@ static const nfsd4_enc nfsd4_enc_ops[] = {
> >       [OP_LAYOUTSTATS]        = (nfsd4_enc)nfsd4_encode_noop,
> >       [OP_OFFLOAD_CANCEL]     = (nfsd4_enc)nfsd4_encode_noop,
> >       [OP_OFFLOAD_STATUS]     = (nfsd4_enc)nfsd4_encode_offload_status,
> > -     [OP_READ_PLUS]          = (nfsd4_enc)nfsd4_encode_noop,
> > +     [OP_READ_PLUS]          = (nfsd4_enc)nfsd4_encode_read_plus,
> >       [OP_SEEK]               = (nfsd4_enc)nfsd4_encode_seek,
> >       [OP_WRITE_SAME]         = (nfsd4_enc)nfsd4_encode_noop,
> >       [OP_CLONE]              = (nfsd4_enc)nfsd4_encode_noop,
> > --
> > 2.27.0
> >
>
> --
> Chuck Lever
>
>
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 1/6] SUNRPC: Implement xdr_reserve_space_vec()
  2020-08-03 19:37     ` Anna Schumaker
@ 2020-08-03 19:44       ` Chuck Lever
  0 siblings, 0 replies; 14+ messages in thread
From: Chuck Lever @ 2020-08-03 19:44 UTC (permalink / raw)
  To: Anna Schumaker; +Cc: Bruce Fields, Linux NFS Mailing List



> On Aug 3, 2020, at 3:37 PM, Anna Schumaker <schumaker.anna@gmail.com> wrote:
> 
> Hi Chuck,
> 
> On Mon, Aug 3, 2020 at 3:21 PM Chuck Lever <chuck.lever@oracle.com> wrote:
>> 
>> Hi Anna-
>> 
>>> On Aug 3, 2020, at 12:59 PM, schumaker.anna@gmail.com wrote:
>>> 
>>> From: Anna Schumaker <Anna.Schumaker@Netapp.com>
>>> 
>>> Reserving space for a large READ payload requires special handling when
>>> reserving space in the xdr buffer pages. One problem we can have is use
>>> of the scratch buffer, which is used to get a pointer to a contiguous
>>> region of data up to PAGE_SIZE. When using the scratch buffer, calls to
>>> xdr_commit_encode() shift the data to its proper alignment in the xdr
>>> buffer. If we've reserved several pages in a vector, then this could
>>> potentially invalidate earlier pointers and result in incorrect READ
>>> data being sent to the client.
>>> 
>>> I get around this by looking at the amount of space left in the current
>>> page and never reserving more than that for each entry in the read
>>> vector. This lets us place data directly where it needs to go in the
>>> buffer pages.
>> 
>> Nit: This appears to be a refactoring change that should be squashed
>> together with 2/6.
> 
> My default was to leave sunrpc and nfs changes as separate patches,
> but I can squash them together if you want me to!

IMO, in this case the rule about introducing and using a new helper
in the same patch takes precedence.


> Anna
>> 
>> 
>>> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
>>> ---
>>> include/linux/sunrpc/xdr.h |  2 ++
>>> net/sunrpc/xdr.c           | 45 ++++++++++++++++++++++++++++++++++++++
>>> 2 files changed, 47 insertions(+)
>>> 
>>> diff --git a/include/linux/sunrpc/xdr.h b/include/linux/sunrpc/xdr.h
>>> index 22c207b2425f..bac459584dd0 100644
>>> --- a/include/linux/sunrpc/xdr.h
>>> +++ b/include/linux/sunrpc/xdr.h
>>> @@ -234,6 +234,8 @@ typedef int       (*kxdrdproc_t)(struct rpc_rqst *rqstp, struct xdr_stream *xdr,
>>> extern void xdr_init_encode(struct xdr_stream *xdr, struct xdr_buf *buf,
>>>                          __be32 *p, struct rpc_rqst *rqst);
>>> extern __be32 *xdr_reserve_space(struct xdr_stream *xdr, size_t nbytes);
>>> +extern int xdr_reserve_space_vec(struct xdr_stream *xdr, struct kvec *vec,
>>> +             size_t nbytes);
>>> extern void xdr_commit_encode(struct xdr_stream *xdr);
>>> extern void xdr_truncate_encode(struct xdr_stream *xdr, size_t len);
>>> extern int xdr_restrict_buflen(struct xdr_stream *xdr, int newbuflen);
>>> diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c
>>> index be11d672b5b9..6dfe5dc8b35f 100644
>>> --- a/net/sunrpc/xdr.c
>>> +++ b/net/sunrpc/xdr.c
>>> @@ -648,6 +648,51 @@ __be32 * xdr_reserve_space(struct xdr_stream *xdr, size_t nbytes)
>>> }
>>> EXPORT_SYMBOL_GPL(xdr_reserve_space);
>>> 
>>> +
>>> +/**
>>> + * xdr_reserve_space_vec - Reserves a large amount of buffer space for sending
>>> + * @xdr: pointer to xdr_stream
>>> + * @vec: pointer to a kvec array
>>> + * @nbytes: number of bytes to reserve
>>> + *
>>> + * Reserves enough buffer space to encode 'nbytes' of data and stores the
>>> + * pointers in 'vec'. The size argument passed to xdr_reserve_space() is
>>> + * determined based on the number of bytes remaining in the current page to
>>> + * avoid invalidating iov_base pointers when xdr_commit_encode() is called.
>>> + */
>>> +int xdr_reserve_space_vec(struct xdr_stream *xdr, struct kvec *vec, size_t nbytes)
>>> +{
>>> +     int thislen;
>>> +     int v = 0;
>>> +     __be32 *p;
>>> +
>>> +     /*
>>> +      * svcrdma requires every READ payload to start somewhere
>>> +      * in xdr->pages.
>>> +      */
>>> +     if (xdr->iov == xdr->buf->head) {
>>> +             xdr->iov = NULL;
>>> +             xdr->end = xdr->p;
>>> +     }
>>> +
>>> +     while (nbytes) {
>>> +             thislen = xdr->buf->page_len % PAGE_SIZE;
>>> +             thislen = min_t(size_t, nbytes, PAGE_SIZE - thislen);
>>> +
>>> +             p = xdr_reserve_space(xdr, thislen);
>>> +             if (!p)
>>> +                     return -EIO;
>>> +
>>> +             vec[v].iov_base = p;
>>> +             vec[v].iov_len = thislen;
>>> +             v++;
>>> +             nbytes -= thislen;
>>> +     }
>>> +
>>> +     return v;
>>> +}
>>> +EXPORT_SYMBOL_GPL(xdr_reserve_space_vec);
>>> +
>>> /**
>>> * xdr_truncate_encode - truncate an encode buffer
>>> * @xdr: pointer to xdr_stream
>>> --
>>> 2.27.0
>>> 
>> 
>> --
>> Chuck Lever

--
Chuck Lever




^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2020-08-03 19:44 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-03 16:59 [PATCH v3 0/6] NFSD: Add support for the v4.2 READ_PLUS operation schumaker.anna
2020-08-03 16:59 ` [PATCH v3 1/6] SUNRPC: Implement xdr_reserve_space_vec() schumaker.anna
2020-08-03 19:19   ` Chuck Lever
2020-08-03 19:37     ` Anna Schumaker
2020-08-03 19:44       ` Chuck Lever
2020-08-03 16:59 ` [PATCH v3 2/6] NFSD: nfsd4_encode_readv() can use xdr_reserve_space_vec() schumaker.anna
2020-08-03 16:59 ` [PATCH v3 3/6] NFSD: Add READ_PLUS data support schumaker.anna
2020-08-03 19:17   ` Chuck Lever
2020-08-03 19:41     ` Anna Schumaker
2020-08-03 16:59 ` [PATCH v3 4/6] NFSD: Add READ_PLUS hole segment encoding schumaker.anna
2020-08-03 16:59 ` [PATCH v3 5/6] NFSD: Return both a hole and a data segment schumaker.anna
2020-08-03 16:59 ` [PATCH v3 6/6] NFSD: Encode a full READ_PLUS reply schumaker.anna
2020-08-03 19:26 ` [PATCH v3 0/6] NFSD: Add support for the v4.2 READ_PLUS operation Chuck Lever
2020-08-03 19:36   ` Anna Schumaker
