* [PATCH v5 0/5] NFSD: Add support for the v4.2 READ_PLUS operation
@ 2020-09-08 16:25 schumaker.anna
  2020-09-08 16:25 ` [PATCH v5 1/5] SUNRPC/NFSD: Implement xdr_reserve_space_vec() schumaker.anna
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: schumaker.anna @ 2020-09-08 16:25 UTC (permalink / raw)
  To: bfields, chuck.lever, linux-nfs; +Cc: Anna.Schumaker

From: Anna Schumaker <Anna.Schumaker@Netapp.com>

These patches add server support for the READ_PLUS operation, which
breaks read requests into several "data" and "hole" segments when
replying to the client.
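As background for how such a reply is structured (a sketch for illustration, not code from the patches): given the data extents of a sparse file, a read range decomposes into alternating NFS4_CONTENT_DATA and NFS4_CONTENT_HOLE segments. The userspace C below uses an assumed in-memory extent map standing in for what the server learns via SEEK_DATA/SEEK_HOLE:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extent map: sorted, non-overlapping data extents. */
struct extent { uint64_t offset, length; };

enum seg_type { SEG_HOLE = 0, SEG_DATA = 1 };

struct segment { enum seg_type type; uint64_t offset, length; };

/*
 * Split a read request [offset, offset+count) into alternating data
 * and hole segments, the way a READ_PLUS reply describes them.
 * Returns the number of segments written to 'out'.
 */
static size_t segment_read(const struct extent *map, size_t nmap,
			   uint64_t offset, uint64_t count,
			   struct segment *out, size_t max_out)
{
	size_t n = 0;
	uint64_t end = offset + count;

	for (size_t i = 0; i < nmap && offset < end && n < max_out; i++) {
		uint64_t dstart = map[i].offset;
		uint64_t dend = dstart + map[i].length;

		if (dend <= offset)
			continue;	/* extent entirely before the request */
		if (dstart >= end)
			break;		/* extent entirely after the request */

		if (dstart > offset) {	/* hole before this data extent */
			uint64_t len = (dstart < end ? dstart : end) - offset;
			out[n++] = (struct segment){ SEG_HOLE, offset, len };
			offset += len;
		}
		if (offset < end && n < max_out) {
			uint64_t len = (dend < end ? dend : end) - offset;
			out[n++] = (struct segment){ SEG_DATA, offset, len };
			offset += len;
		}
	}
	if (offset < end && n < max_out)	/* trailing hole */
		out[n++] = (struct segment){ SEG_HOLE, offset, end - offset };
	return n;
}
```

A file with 4K data at offsets 0 and 8K, read as one 16K request, yields four segments: data, hole, data, hole.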

- Changes since v4:
  - Fix up nfsd4_read_plus_rsize() calculation
  - Return a short read if we detect the file has changed during
    encoding
  - Rebase to v5.9-rc4

Here are the results of some performance tests I ran on some lab
machines. I tested by reading various 2G files from a few different underlying
filesystems and across several NFS versions. I used the `vmtouch` utility
to make sure files were only cached when we wanted them to be. In addition
to 100% data and 100% hole cases, I also tested with files that alternate
between data and hole segments. These files have either 4K, 8K, 16K, or 32K
segment sizes and start with either data or hole segments. So the file
mixed-4d has a 4K segment size beginning with a data segment, while mixed-32h
has 32K segments beginning with a hole. The units are seconds; for each NFS
version, the first number is the uncached read time and the second is the
read time when the file is cached on the server.
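Test files like these can be generated with a small sparse-file writer. A sketch of one way to do it (the helper, file names, and sizes are illustrative, not the actual tooling used for these runs):

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Create a sparse file alternating data and hole segments of 'seg'
 * bytes, like the mixed-* test files described above. With
 * start_with_data set the file begins with data (a mixed-*d file);
 * otherwise it begins with a hole (a mixed-*h file).
 */
static int make_mixed_file(const char *path, uint64_t total,
			   uint64_t seg, int start_with_data)
{
	int fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0644);
	if (fd < 0)
		return -1;

	char buf[4096];
	memset(buf, 0xaa, sizeof(buf));

	for (uint64_t off = 0; off < total; off += seg) {
		int is_data = (((off / seg) % 2) == 0) == (start_with_data != 0);
		if (!is_data)
			continue;	/* never written: the gap stays a hole */
		uint64_t left = seg;
		while (left > 0) {
			size_t chunk = left < sizeof(buf) ? left : sizeof(buf);
			ssize_t w = pwrite(fd, buf, chunk, off + seg - left);
			if (w <= 0)
				goto fail;
			left -= (size_t)w;
		}
	}
	if (ftruncate(fd, total) < 0)	/* extend past a trailing hole */
		goto fail;
	return close(fd);
fail:
	close(fd);
	return -1;
}
```

Reading back a hole region returns zeroes, while written regions return the data pattern, so segment placement is easy to verify.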

I added some extra data collection (client cpu percentage and sys time),
but the extra data means I couldn't figure out a way to break this down
into a concise table. I cut out v3 and v4.0 performance numbers to get
the size down, but I kept v4.1 for comparison because it uses the same
code path that v4.2 follows when READ_PLUS is not used.


Read Plus Results (ext4):
  data
   :... v4.1 ... Uncached ... 20.540 s, 105 MB/s, 0.65 s kern, 3% cpu
   :    :....... Cached ..... 18.252 s, 118 MB/s, 0.70 s kern, 3% cpu
   :... v4.2 ... Uncached ... 20.605 s, 104 MB/s, 0.65 s kern, 3% cpu
        :....... Cached ..... 18.253 s, 118 MB/s, 0.67 s kern, 3% cpu
  hole
   :... v4.1 ... Uncached ... 18.255 s, 118 MB/s, 0.72 s kern,  3% cpu
   :    :....... Cached ..... 18.252 s, 118 MB/s, 0.72 s kern,  3% cpu
   :... v4.2 ... Uncached ...  0.847 s, 2.5 GB/s, 0.73 s kern, 86% cpu
        :....... Cached .....  0.845 s, 2.5 GB/s, 0.72 s kern, 85% cpu
  mixed-4d
   :... v4.1 ... Uncached ... 54.691 s,  39 MB/s, 0.75 s kern, 1% cpu
   :    :....... Cached ..... 18.252 s, 118 MB/s, 0.71 s kern, 3% cpu
   :... v4.2 ... Uncached ... 51.587 s,  42 MB/s, 0.75 s kern, 1% cpu
        :....... Cached .....  9.215 s, 233 MB/s, 0.67 s kern, 7% cpu
  mixed-8d
   :... v4.1 ... Uncached ... 37.072 s,  58 MB/s, 0.67 s kern, 1% cpu
   :    :....... Cached ..... 18.252 s, 118 MB/s, 0.71 s kern, 3% cpu
   :... v4.2 ... Uncached ... 33.259 s,  65 MB/s, 0.68 s kern, 2% cpu
        :....... Cached .....  9.172 s, 234 MB/s, 0.67 s kern, 7% cpu
  mixed-16d
   :... v4.1 ... Uncached ... 27.138 s,  79 MB/s, 0.73 s kern, 2% cpu
   :    :....... Cached ..... 18.252 s, 118 MB/s, 0.71 s kern, 3% cpu
   :... v4.2 ... Uncached ... 23.042 s,  93 MB/s, 0.73 s kern, 3% cpu
        :....... Cached .....  9.150 s, 235 MB/s, 0.66 s kern, 7% cpu
  mixed-32d
   :... v4.1 ... Uncached ... 25.326 s,  85 MB/s, 0.68 s kern, 2% cpu
   :    :....... Cached ..... 18.252 s, 118 MB/s, 0.70 s kern, 3% cpu
   :... v4.2 ... Uncached ... 21.125 s, 102 MB/s, 0.69 s kern, 3% cpu
        :....... Cached .....  9.140 s, 235 MB/s, 0.67 s kern, 7% cpu
  mixed-4h
   :... v4.1 ... Uncached ... 58.317 s,  37 MB/s, 0.75 s kern, 1% cpu
   :    :....... Cached ..... 18.252 s, 118 MB/s, 0.70 s kern, 3% cpu
   :... v4.2 ... Uncached ... 51.878 s,  41 MB/s, 0.74 s kern, 1% cpu
        :....... Cached .....  9.215 s, 233 MB/s, 0.68 s kern, 7% cpu
  mixed-8h
   :... v4.1 ... Uncached ... 36.855 s,  58 MB/s, 0.68 s kern, 1% cpu
   :    :....... Cached ..... 18.252 s, 118 MB/s, 0.72 s kern, 3% cpu
   :... v4.2 ... Uncached ... 29.457 s,  73 MB/s, 0.68 s kern, 2% cpu
        :....... Cached .....  9.172 s, 234 MB/s, 0.67 s kern, 7% cpu
  mixed-16h
   :... v4.1 ... Uncached ... 26.460 s,  81 MB/s, 0.74 s kern, 2% cpu
   :    :....... Cached ..... 18.252 s, 118 MB/s, 0.71 s kern, 3% cpu
   :... v4.2 ... Uncached ... 19.587 s, 110 MB/s, 0.74 s kern, 3% cpu
        :....... Cached .....  9.150 s, 235 MB/s, 0.67 s kern, 7% cpu
  mixed-32h
   :... v4.1 ... Uncached ... 25.495 s,  84 MB/s, 0.69 s kern, 2% cpu
   :    :....... Cached ..... 18.252 s, 118 MB/s, 0.65 s kern, 3% cpu
   :... v4.2 ... Uncached ... 17.634 s, 122 MB/s, 0.69 s kern, 3% cpu
        :....... Cached .....  9.140 s, 235 MB/s, 0.68 s kern, 7% cpu



Read Plus Results (xfs):
  data
   :... v4.1 ... Uncached ... 20.230 s, 106 MB/s, 0.65 s kern, 3% cpu
   :    :....... Cached ..... 18.252 s, 118 MB/s, 0.68 s kern, 3% cpu
   :... v4.2 ... Uncached ... 20.724 s, 104 MB/s, 0.65 s kern, 3% cpu
        :....... Cached ..... 18.253 s, 118 MB/s, 0.67 s kern, 3% cpu
  hole
   :... v4.1 ... Uncached ... 18.255 s, 118 MB/s, 0.68 s kern,  3% cpu
   :    :....... Cached ..... 18.252 s, 118 MB/s, 0.69 s kern,  3% cpu
   :... v4.2 ... Uncached ...  0.904 s, 2.4 GB/s, 0.72 s kern, 79% cpu
        :....... Cached .....  0.908 s, 2.4 GB/s, 0.73 s kern, 80% cpu
  mixed-4d
   :... v4.1 ... Uncached ... 57.553 s,  37 MB/s, 0.77 s kern, 1% cpu
   :    :....... Cached ..... 18.252 s, 118 MB/s, 0.70 s kern, 3% cpu
   :... v4.2 ... Uncached ... 37.162 s,  58 MB/s, 0.73 s kern, 1% cpu
        :....... Cached .....  9.215 s, 233 MB/s, 0.67 s kern, 7% cpu
  mixed-8d
   :... v4.1 ... Uncached ... 36.754 s,  58 MB/s, 0.69 s kern, 1% cpu
   :    :....... Cached ..... 18.252 s, 118 MB/s, 0.68 s kern, 3% cpu
   :... v4.2 ... Uncached ... 24.454 s,  88 MB/s, 0.69 s kern, 2% cpu
        :....... Cached .....  9.172 s, 234 MB/s, 0.66 s kern, 7% cpu
  mixed-16d
   :... v4.1 ... Uncached ... 27.156 s,  79 MB/s, 0.73 s kern, 2% cpu
   :    :....... Cached ..... 18.252 s, 118 MB/s, 0.71 s kern, 3% cpu
   :... v4.2 ... Uncached ... 22.934 s,  94 MB/s, 0.72 s kern, 3% cpu
        :....... Cached .....  9.150 s, 235 MB/s, 0.68 s kern, 7% cpu
  mixed-32d
   :... v4.1 ... Uncached ... 27.849 s,  77 MB/s, 0.68 s kern, 2% cpu
   :    :....... Cached ..... 18.252 s, 118 MB/s, 0.72 s kern, 3% cpu
   :... v4.2 ... Uncached ... 23.670 s,  91 MB/s, 0.67 s kern, 2% cpu
        :....... Cached .....  9.139 s, 235 MB/s, 0.64 s kern, 7% cpu
  mixed-4h
   :... v4.1 ... Uncached ... 57.639 s,  37 MB/s, 0.72 s kern, 1% cpu
   :    :....... Cached ..... 18.252 s, 118 MB/s, 0.69 s kern, 3% cpu
   :... v4.2 ... Uncached ... 35.503 s,  61 MB/s, 0.72 s kern, 2% cpu
        :....... Cached .....  9.215 s, 233 MB/s, 0.66 s kern, 7% cpu
  mixed-8h
   :... v4.1 ... Uncached ... 37.044 s,  58 MB/s, 0.71 s kern, 1% cpu
   :    :....... Cached ..... 18.252 s, 118 MB/s, 0.68 s kern, 3% cpu
   :... v4.2 ... Uncached ... 23.779 s,  90 MB/s, 0.69 s kern, 2% cpu
        :....... Cached .....  9.172 s, 234 MB/s, 0.65 s kern, 7% cpu
  mixed-16h
   :... v4.1 ... Uncached ... 27.167 s,  79 MB/s, 0.73 s kern, 2% cpu
   :    :....... Cached ..... 18.252 s, 118 MB/s, 0.67 s kern, 3% cpu
   :... v4.2 ... Uncached ... 19.088 s, 113 MB/s, 0.75 s kern, 3% cpu
        :....... Cached .....  9.159 s, 234 MB/s, 0.66 s kern, 7% cpu
  mixed-32h
   :... v4.1 ... Uncached ... 27.592 s,  78 MB/s, 0.71 s kern, 2% cpu
   :    :....... Cached ..... 18.252 s, 118 MB/s, 0.68 s kern, 3% cpu
   :... v4.2 ... Uncached ... 19.682 s, 109 MB/s, 0.67 s kern, 3% cpu
        :....... Cached .....  9.140 s, 235 MB/s, 0.67 s kern, 7% cpu



Read Plus Results (btrfs):
  data
   :... v4.1 ... Uncached ... 21.317 s, 101 MB/s, 0.63 s kern, 2% cpu
   :    :....... Cached ..... 18.252 s, 118 MB/s, 0.67 s kern, 3% cpu
   :... v4.2 ... Uncached ... 28.665 s,  75 MB/s, 0.65 s kern, 2% cpu
        :....... Cached ..... 18.253 s, 118 MB/s, 0.66 s kern, 3% cpu
  hole
   :... v4.1 ... Uncached ... 18.256 s, 118 MB/s, 0.70 s kern,  3% cpu
   :    :....... Cached ..... 18.254 s, 118 MB/s, 0.73 s kern,  4% cpu
   :... v4.2 ... Uncached ...  0.851 s, 2.5 GB/s, 0.72 s kern, 84% cpu
        :....... Cached .....  0.847 s, 2.5 GB/s, 0.73 s kern, 86% cpu
  mixed-4d
   :... v4.1 ... Uncached ... 56.857 s,  38 MB/s, 0.76 s kern, 1% cpu
   :    :....... Cached ..... 18.252 s, 118 MB/s, 0.72 s kern, 3% cpu
   :... v4.2 ... Uncached ... 54.455 s,  39 MB/s, 0.73 s kern, 1% cpu
        :....... Cached .....  9.215 s, 233 MB/s, 0.68 s kern, 7% cpu
  mixed-8d
   :... v4.1 ... Uncached ... 36.641 s,  59 MB/s, 0.68 s kern, 1% cpu
   :    :....... Cached ..... 18.252 s, 118 MB/s, 0.70 s kern, 3% cpu
   :... v4.2 ... Uncached ... 33.205 s,  65 MB/s, 0.67 s kern, 2% cpu
        :....... Cached .....  9.172 s, 234 MB/s, 0.65 s kern, 7% cpu
  mixed-16d
   :... v4.1 ... Uncached ... 28.653 s,  75 MB/s, 0.72 s kern, 2% cpu
   :    :....... Cached ..... 18.252 s, 118 MB/s, 0.70 s kern, 3% cpu
   :... v4.2 ... Uncached ... 25.748 s,  83 MB/s, 0.71 s kern, 2% cpu
        :....... Cached .....  9.150 s, 235 MB/s, 0.64 s kern, 7% cpu
  mixed-32d
   :... v4.1 ... Uncached ... 28.886 s,  74 MB/s, 0.67 s kern, 2% cpu
   :    :....... Cached ..... 18.252 s, 118 MB/s, 0.71 s kern, 3% cpu
   :... v4.2 ... Uncached ... 24.724 s,  87 MB/s, 0.74 s kern, 2% cpu
        :....... Cached .....  9.140 s, 235 MB/s, 0.63 s kern, 6% cpu
  mixed-4h
   :... v4.1 ... Uncached ...  52.181 s,  41 MB/s, 0.73 s kern, 1% cpu
   :    :....... Cached .....  18.252 s, 118 MB/s, 0.66 s kern, 3% cpu
   :... v4.2 ... Uncached ... 150.341 s,  14 MB/s, 0.72 s kern, 0% cpu
        :....... Cached .....   9.216 s, 233 MB/s, 0.63 s kern, 6% cpu
  mixed-8h
   :... v4.1 ... Uncached ... 36.945 s,  58 MB/s, 0.68 s kern, 1% cpu
   :    :....... Cached ..... 18.252 s, 118 MB/s, 0.65 s kern, 3% cpu
   :... v4.2 ... Uncached ... 79.781 s,  27 MB/s, 0.68 s kern, 0% cpu
        :....... Cached .....  9.172 s, 234 MB/s, 0.66 s kern, 7% cpu
  mixed-16h
   :... v4.1 ... Uncached ... 28.651 s,  75 MB/s, 0.73 s kern, 2% cpu
   :    :....... Cached ..... 18.252 s, 118 MB/s, 0.66 s kern, 3% cpu
   :... v4.2 ... Uncached ... 47.428 s,  45 MB/s, 0.71 s kern, 1% cpu
        :....... Cached .....  9.150 s, 235 MB/s, 0.67 s kern, 7% cpu
  mixed-32h
   :... v4.1 ... Uncached ... 28.618 s,  75 MB/s, 0.69 s kern, 2% cpu
   :    :....... Cached ..... 18.252 s, 118 MB/s, 0.70 s kern, 3% cpu
   :... v4.2 ... Uncached ... 38.813 s,  55 MB/s, 0.67 s kern, 1% cpu
        :....... Cached .....  9.140 s, 235 MB/s, 0.61 s kern, 6% cpu



Thoughts?
Anna


Anna Schumaker (5):
  SUNRPC/NFSD: Implement xdr_reserve_space_vec()
  NFSD: Add READ_PLUS data support
  NFSD: Add READ_PLUS hole segment encoding
  NFSD: Return both a hole and a data segment
  NFSD: Encode a full READ_PLUS reply

 fs/nfsd/nfs4proc.c         |  20 +++++
 fs/nfsd/nfs4xdr.c          | 173 +++++++++++++++++++++++++++++++------
 include/linux/sunrpc/xdr.h |   2 +
 net/sunrpc/xdr.c           |  45 ++++++++++
 4 files changed, 213 insertions(+), 27 deletions(-)

-- 
2.28.0



* [PATCH v5 1/5] SUNRPC/NFSD: Implement xdr_reserve_space_vec()
  2020-09-08 16:25 [PATCH v5 0/5] NFSD: Add support for the v4.2 READ_PLUS operation schumaker.anna
@ 2020-09-08 16:25 ` schumaker.anna
  2020-09-08 16:25 ` [PATCH v5 2/5] NFSD: Add READ_PLUS data support schumaker.anna
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 14+ messages in thread
From: schumaker.anna @ 2020-09-08 16:25 UTC (permalink / raw)
  To: bfields, chuck.lever, linux-nfs; +Cc: Anna.Schumaker

From: Anna Schumaker <Anna.Schumaker@Netapp.com>

Reserving space for a large READ payload requires special handling when
reserving space in the xdr buffer pages. One problem we can have is use
of the scratch buffer, which is used to get a pointer to a contiguous
region of data up to PAGE_SIZE. When using the scratch buffer, calls to
xdr_commit_encode() shift the data to its proper alignment in the xdr
buffer. If we've reserved several pages in a vector, then this could
potentially invalidate earlier pointers and result in incorrect READ
data being sent to the client.

I get around this by looking at the amount of space left in the current
page and never reserving more than that for each entry in the read
vector. This lets us place data directly where it needs to go in the
buffer pages.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
---
 fs/nfsd/nfs4xdr.c          | 28 +++---------------------
 include/linux/sunrpc/xdr.h |  2 ++
 net/sunrpc/xdr.c           | 45 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 50 insertions(+), 25 deletions(-)

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 259d5ad0e3f4..0be194de4888 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -3814,36 +3814,14 @@ static __be32 nfsd4_encode_readv(struct nfsd4_compoundres *resp,
 {
 	struct xdr_stream *xdr = &resp->xdr;
 	u32 eof;
-	int v;
 	int starting_len = xdr->buf->len - 8;
-	long len;
-	int thislen;
 	__be32 nfserr;
 	__be32 tmp;
-	__be32 *p;
 	int pad;
 
-	/*
-	 * svcrdma requires every READ payload to start somewhere
-	 * in xdr->pages.
-	 */
-	if (xdr->iov == xdr->buf->head) {
-		xdr->iov = NULL;
-		xdr->end = xdr->p;
-	}
-
-	len = maxcount;
-	v = 0;
-	while (len) {
-		thislen = min_t(long, len, PAGE_SIZE);
-		p = xdr_reserve_space(xdr, thislen);
-		WARN_ON_ONCE(!p);
-		resp->rqstp->rq_vec[v].iov_base = p;
-		resp->rqstp->rq_vec[v].iov_len = thislen;
-		v++;
-		len -= thislen;
-	}
-	read->rd_vlen = v;
+	read->rd_vlen = xdr_reserve_space_vec(xdr, resp->rqstp->rq_vec, maxcount);
+	if (read->rd_vlen < 0)
+		return nfserr_resource;
 
 	nfserr = nfsd_readv(resp->rqstp, read->rd_fhp, file, read->rd_offset,
 			    resp->rqstp->rq_vec, read->rd_vlen, &maxcount,
diff --git a/include/linux/sunrpc/xdr.h b/include/linux/sunrpc/xdr.h
index 5a6a81b7cd9f..6613d96a3029 100644
--- a/include/linux/sunrpc/xdr.h
+++ b/include/linux/sunrpc/xdr.h
@@ -234,6 +234,8 @@ typedef int	(*kxdrdproc_t)(struct rpc_rqst *rqstp, struct xdr_stream *xdr,
 extern void xdr_init_encode(struct xdr_stream *xdr, struct xdr_buf *buf,
 			    __be32 *p, struct rpc_rqst *rqst);
 extern __be32 *xdr_reserve_space(struct xdr_stream *xdr, size_t nbytes);
+extern int xdr_reserve_space_vec(struct xdr_stream *xdr, struct kvec *vec,
+		size_t nbytes);
 extern void xdr_commit_encode(struct xdr_stream *xdr);
 extern void xdr_truncate_encode(struct xdr_stream *xdr, size_t len);
 extern int xdr_restrict_buflen(struct xdr_stream *xdr, int newbuflen);
diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c
index be11d672b5b9..6dfe5dc8b35f 100644
--- a/net/sunrpc/xdr.c
+++ b/net/sunrpc/xdr.c
@@ -648,6 +648,51 @@ __be32 * xdr_reserve_space(struct xdr_stream *xdr, size_t nbytes)
 }
 EXPORT_SYMBOL_GPL(xdr_reserve_space);
 
+
+/**
+ * xdr_reserve_space_vec - Reserves a large amount of buffer space for sending
+ * @xdr: pointer to xdr_stream
+ * @vec: pointer to a kvec array
+ * @nbytes: number of bytes to reserve
+ *
+ * Reserves enough buffer space to encode 'nbytes' of data and stores the
+ * pointers in 'vec'. The size argument passed to xdr_reserve_space() is
+ * determined based on the number of bytes remaining in the current page to
+ * avoid invalidating iov_base pointers when xdr_commit_encode() is called.
+ */
+int xdr_reserve_space_vec(struct xdr_stream *xdr, struct kvec *vec, size_t nbytes)
+{
+	int thislen;
+	int v = 0;
+	__be32 *p;
+
+	/*
+	 * svcrdma requires every READ payload to start somewhere
+	 * in xdr->pages.
+	 */
+	if (xdr->iov == xdr->buf->head) {
+		xdr->iov = NULL;
+		xdr->end = xdr->p;
+	}
+
+	while (nbytes) {
+		thislen = xdr->buf->page_len % PAGE_SIZE;
+		thislen = min_t(size_t, nbytes, PAGE_SIZE - thislen);
+
+		p = xdr_reserve_space(xdr, thislen);
+		if (!p)
+			return -EIO;
+
+		vec[v].iov_base = p;
+		vec[v].iov_len = thislen;
+		v++;
+		nbytes -= thislen;
+	}
+
+	return v;
+}
+EXPORT_SYMBOL_GPL(xdr_reserve_space_vec);
+
 /**
  * xdr_truncate_encode - truncate an encode buffer
  * @xdr: pointer to xdr_stream
-- 
2.28.0



* [PATCH v5 2/5] NFSD: Add READ_PLUS data support
  2020-09-08 16:25 [PATCH v5 0/5] NFSD: Add support for the v4.2 READ_PLUS operation schumaker.anna
  2020-09-08 16:25 ` [PATCH v5 1/5] SUNRPC/NFSD: Implement xdr_reserve_space_vec() schumaker.anna
@ 2020-09-08 16:25 ` schumaker.anna
  2020-09-08 19:42   ` J. Bruce Fields
  2020-09-08 16:25 ` [PATCH v5 3/5] NFSD: Add READ_PLUS hole segment encoding schumaker.anna
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: schumaker.anna @ 2020-09-08 16:25 UTC (permalink / raw)
  To: bfields, chuck.lever, linux-nfs; +Cc: Anna.Schumaker

From: Anna Schumaker <Anna.Schumaker@Netapp.com>

This patch adds READ_PLUS support for returning a single
NFS4_CONTENT_DATA segment to the client. This is basically the same as
the READ operation, only with the extra information about data segments.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

---
v5: Fix up nfsd4_read_plus_rsize() calculation
---
 fs/nfsd/nfs4proc.c | 20 +++++++++++
 fs/nfsd/nfs4xdr.c  | 83 ++++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 101 insertions(+), 2 deletions(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index eaf50eafa935..0a3df5f10501 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -2591,6 +2591,19 @@ static inline u32 nfsd4_read_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op)
 	return (op_encode_hdr_size + 2 + XDR_QUADLEN(rlen)) * sizeof(__be32);
 }
 
+static inline u32 nfsd4_read_plus_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op)
+{
+	u32 maxcount = svc_max_payload(rqstp);
+	u32 rlen = min(op->u.read.rd_length, maxcount);
+	/*
+	 * Worst case is a single large data segment, so make
+	 * sure we have enough room to encode that
+	 */
+	u32 seg_len = 1 + 2 + 1;
+
+	return (op_encode_hdr_size + 2 + seg_len + XDR_QUADLEN(rlen)) * sizeof(__be32);
+}
+
 static inline u32 nfsd4_readdir_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op)
 {
 	u32 maxcount = 0, rlen = 0;
@@ -3163,6 +3176,13 @@ static const struct nfsd4_operation nfsd4_ops[] = {
 		.op_name = "OP_COPY",
 		.op_rsize_bop = nfsd4_copy_rsize,
 	},
+	[OP_READ_PLUS] = {
+		.op_func = nfsd4_read,
+		.op_release = nfsd4_read_release,
+		.op_name = "OP_READ_PLUS",
+		.op_rsize_bop = nfsd4_read_plus_rsize,
+		.op_get_currentstateid = nfsd4_get_readstateid,
+	},
 	[OP_SEEK] = {
 		.op_func = nfsd4_seek,
 		.op_name = "OP_SEEK",
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 0be194de4888..26d12e3edf33 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -2173,7 +2173,7 @@ static const nfsd4_dec nfsd4_dec_ops[] = {
 	[OP_LAYOUTSTATS]	= (nfsd4_dec)nfsd4_decode_notsupp,
 	[OP_OFFLOAD_CANCEL]	= (nfsd4_dec)nfsd4_decode_offload_status,
 	[OP_OFFLOAD_STATUS]	= (nfsd4_dec)nfsd4_decode_offload_status,
-	[OP_READ_PLUS]		= (nfsd4_dec)nfsd4_decode_notsupp,
+	[OP_READ_PLUS]		= (nfsd4_dec)nfsd4_decode_read,
 	[OP_SEEK]		= (nfsd4_dec)nfsd4_decode_seek,
 	[OP_WRITE_SAME]		= (nfsd4_dec)nfsd4_decode_notsupp,
 	[OP_CLONE]		= (nfsd4_dec)nfsd4_decode_clone,
@@ -4597,6 +4597,85 @@ nfsd4_encode_offload_status(struct nfsd4_compoundres *resp, __be32 nfserr,
 		return nfserr_resource;
 	p = xdr_encode_hyper(p, os->count);
 	*p++ = cpu_to_be32(0);
+	return nfserr;
+}
+
+static __be32
+nfsd4_encode_read_plus_data(struct nfsd4_compoundres *resp,
+			    struct nfsd4_read *read,
+			    unsigned long maxcount,  u32 *eof)
+{
+	struct xdr_stream *xdr = &resp->xdr;
+	struct file *file = read->rd_nf->nf_file;
+	int starting_len = xdr->buf->len;
+	__be32 nfserr;
+	__be32 *p, tmp;
+	__be64 tmp64;
+
+	/* Content type, offset, byte count */
+	p = xdr_reserve_space(xdr, 4 + 8 + 4);
+	if (!p)
+		return nfserr_resource;
+
+	read->rd_vlen = xdr_reserve_space_vec(xdr, resp->rqstp->rq_vec, maxcount);
+	if (read->rd_vlen < 0)
+		return nfserr_resource;
+
+	nfserr = nfsd_readv(resp->rqstp, read->rd_fhp, file, read->rd_offset,
+			    resp->rqstp->rq_vec, read->rd_vlen, &maxcount, eof);
+	if (nfserr)
+		return nfserr;
+
+	tmp = htonl(NFS4_CONTENT_DATA);
+	write_bytes_to_xdr_buf(xdr->buf, starting_len,      &tmp,   4);
+	tmp64 = cpu_to_be64(read->rd_offset);
+	write_bytes_to_xdr_buf(xdr->buf, starting_len + 4,  &tmp64, 8);
+	tmp = htonl(maxcount);
+	write_bytes_to_xdr_buf(xdr->buf, starting_len + 12, &tmp,   4);
+	return nfs_ok;
+}
+
+static __be32
+nfsd4_encode_read_plus(struct nfsd4_compoundres *resp, __be32 nfserr,
+		       struct nfsd4_read *read)
+{
+	unsigned long maxcount;
+	struct xdr_stream *xdr = &resp->xdr;
+	struct file *file;
+	int starting_len = xdr->buf->len;
+	int segments = 0;
+	__be32 *p, tmp;
+	u32 eof;
+
+	if (nfserr)
+		return nfserr;
+	file = read->rd_nf->nf_file;
+
+	/* eof flag, segment count */
+	p = xdr_reserve_space(xdr, 4 + 4);
+	if (!p)
+		return nfserr_resource;
+	xdr_commit_encode(xdr);
+
+	maxcount = svc_max_payload(resp->rqstp);
+	maxcount = min_t(unsigned long, maxcount,
+			 (xdr->buf->buflen - xdr->buf->len));
+	maxcount = min_t(unsigned long, maxcount, read->rd_length);
+
+	eof = read->rd_offset >= i_size_read(file_inode(file));
+	if (!eof) {
+		nfserr = nfsd4_encode_read_plus_data(resp, read, maxcount, &eof);
+		segments++;
+	}
+
+	if (nfserr)
+		xdr_truncate_encode(xdr, starting_len);
+	else {
+		tmp = htonl(eof);
+		write_bytes_to_xdr_buf(xdr->buf, starting_len,     &tmp, 4);
+		tmp = htonl(segments);
+		write_bytes_to_xdr_buf(xdr->buf, starting_len + 4, &tmp, 4);
+	}
 
 	return nfserr;
 }
@@ -4974,7 +5053,7 @@ static const nfsd4_enc nfsd4_enc_ops[] = {
 	[OP_LAYOUTSTATS]	= (nfsd4_enc)nfsd4_encode_noop,
 	[OP_OFFLOAD_CANCEL]	= (nfsd4_enc)nfsd4_encode_noop,
 	[OP_OFFLOAD_STATUS]	= (nfsd4_enc)nfsd4_encode_offload_status,
-	[OP_READ_PLUS]		= (nfsd4_enc)nfsd4_encode_noop,
+	[OP_READ_PLUS]		= (nfsd4_enc)nfsd4_encode_read_plus,
 	[OP_SEEK]		= (nfsd4_enc)nfsd4_encode_seek,
 	[OP_WRITE_SAME]		= (nfsd4_enc)nfsd4_encode_noop,
 	[OP_CLONE]		= (nfsd4_enc)nfsd4_encode_noop,
-- 
2.28.0



* [PATCH v5 3/5] NFSD: Add READ_PLUS hole segment encoding
  2020-09-08 16:25 [PATCH v5 0/5] NFSD: Add support for the v4.2 READ_PLUS operation schumaker.anna
  2020-09-08 16:25 ` [PATCH v5 1/5] SUNRPC/NFSD: Implement xdr_reserve_space_vec() schumaker.anna
  2020-09-08 16:25 ` [PATCH v5 2/5] NFSD: Add READ_PLUS data support schumaker.anna
@ 2020-09-08 16:25 ` schumaker.anna
  2020-09-08 16:25 ` [PATCH v5 4/5] NFSD: Return both a hole and a data segment schumaker.anna
  2020-09-08 16:25 ` [PATCH v5 5/5] NFSD: Encode a full READ_PLUS reply schumaker.anna
  4 siblings, 0 replies; 14+ messages in thread
From: schumaker.anna @ 2020-09-08 16:25 UTC (permalink / raw)
  To: bfields, chuck.lever, linux-nfs; +Cc: Anna.Schumaker

From: Anna Schumaker <Anna.Schumaker@Netapp.com>

This patch adds the ability to encode hole segments in a READ_PLUS
reply. However, we still only reply to the READ_PLUS call with a single
segment at this time.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
---
 fs/nfsd/nfs4xdr.c | 40 +++++++++++++++++++++++++++++++++++++++-
 1 file changed, 39 insertions(+), 1 deletion(-)

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 26d12e3edf33..45159bd9e9a4 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -4608,10 +4608,14 @@ nfsd4_encode_read_plus_data(struct nfsd4_compoundres *resp,
 	struct xdr_stream *xdr = &resp->xdr;
 	struct file *file = read->rd_nf->nf_file;
 	int starting_len = xdr->buf->len;
+	loff_t hole_pos = vfs_llseek(file, read->rd_offset, SEEK_HOLE);
 	__be32 nfserr;
 	__be32 *p, tmp;
 	__be64 tmp64;
 
+	if (hole_pos > read->rd_offset)
+		maxcount = min_t(unsigned long, maxcount, hole_pos - read->rd_offset);
+
 	/* Content type, offset, byte count */
 	p = xdr_reserve_space(xdr, 4 + 8 + 4);
 	if (!p)
@@ -4635,6 +4639,27 @@ nfsd4_encode_read_plus_data(struct nfsd4_compoundres *resp,
 	return nfs_ok;
 }
 
+static __be32
+nfsd4_encode_read_plus_hole(struct nfsd4_compoundres *resp,
+			    struct nfsd4_read *read,
+			    unsigned long maxcount, u32 *eof)
+{
+	struct file *file = read->rd_nf->nf_file;
+	__be32 *p;
+
+	/* Content type, offset, byte count */
+	p = xdr_reserve_space(&resp->xdr, 4 + 8 + 8);
+	if (!p)
+		return nfserr_resource;
+
+	*p++ = htonl(NFS4_CONTENT_HOLE);
+	 p   = xdr_encode_hyper(p, read->rd_offset);
+	 p   = xdr_encode_hyper(p, maxcount);
+
+	*eof = (read->rd_offset + maxcount) >= i_size_read(file_inode(file));
+	return nfs_ok;
+}
+
 static __be32
 nfsd4_encode_read_plus(struct nfsd4_compoundres *resp, __be32 nfserr,
 		       struct nfsd4_read *read)
@@ -4645,6 +4670,7 @@ nfsd4_encode_read_plus(struct nfsd4_compoundres *resp, __be32 nfserr,
 	int starting_len = xdr->buf->len;
 	int segments = 0;
 	__be32 *p, tmp;
+	loff_t pos;
 	u32 eof;
 
 	if (nfserr)
@@ -4663,11 +4689,23 @@ nfsd4_encode_read_plus(struct nfsd4_compoundres *resp, __be32 nfserr,
 	maxcount = min_t(unsigned long, maxcount, read->rd_length);
 
 	eof = read->rd_offset >= i_size_read(file_inode(file));
-	if (!eof) {
+	if (eof)
+		goto out;
+
+	pos = vfs_llseek(file, read->rd_offset, SEEK_DATA);
+	if (pos == -ENXIO)
+		pos = i_size_read(file_inode(file));
+
+	if (pos > read->rd_offset) {
+		maxcount = pos - read->rd_offset;
+		nfserr = nfsd4_encode_read_plus_hole(resp, read, maxcount, &eof);
+		segments++;
+	} else {
 		nfserr = nfsd4_encode_read_plus_data(resp, read, maxcount, &eof);
 		segments++;
 	}
 
+out:
 	if (nfserr)
 		xdr_truncate_encode(xdr, starting_len);
 	else {
-- 
2.28.0



* [PATCH v5 4/5] NFSD: Return both a hole and a data segment
  2020-09-08 16:25 [PATCH v5 0/5] NFSD: Add support for the v4.2 READ_PLUS operation schumaker.anna
                   ` (2 preceding siblings ...)
  2020-09-08 16:25 ` [PATCH v5 3/5] NFSD: Add READ_PLUS hole segment encoding schumaker.anna
@ 2020-09-08 16:25 ` schumaker.anna
  2020-09-08 19:49   ` J. Bruce Fields
  2020-09-08 16:25 ` [PATCH v5 5/5] NFSD: Encode a full READ_PLUS reply schumaker.anna
  4 siblings, 1 reply; 14+ messages in thread
From: schumaker.anna @ 2020-09-08 16:25 UTC (permalink / raw)
  To: bfields, chuck.lever, linux-nfs; +Cc: Anna.Schumaker

From: Anna Schumaker <Anna.Schumaker@Netapp.com>

This patch lets the server return both a hole segment and a data
segment, but only one of each right now. We'll expand on this in the
next patch.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
---
v5: If we've already encoded a segment, then return a short read if
    later segments return an error for some reason.
---
 fs/nfsd/nfs4xdr.c | 54 ++++++++++++++++++++++++++++++++++-------------
 1 file changed, 39 insertions(+), 15 deletions(-)

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 45159bd9e9a4..856606263c1d 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -4603,7 +4603,7 @@ nfsd4_encode_offload_status(struct nfsd4_compoundres *resp, __be32 nfserr,
 static __be32
 nfsd4_encode_read_plus_data(struct nfsd4_compoundres *resp,
 			    struct nfsd4_read *read,
-			    unsigned long maxcount,  u32 *eof)
+			    unsigned long *maxcount, u32 *eof)
 {
 	struct xdr_stream *xdr = &resp->xdr;
 	struct file *file = read->rd_nf->nf_file;
@@ -4614,19 +4614,19 @@ nfsd4_encode_read_plus_data(struct nfsd4_compoundres *resp,
 	__be64 tmp64;
 
 	if (hole_pos > read->rd_offset)
-		maxcount = min_t(unsigned long, maxcount, hole_pos - read->rd_offset);
+		*maxcount = min_t(unsigned long, *maxcount, hole_pos - read->rd_offset);
 
 	/* Content type, offset, byte count */
 	p = xdr_reserve_space(xdr, 4 + 8 + 4);
 	if (!p)
 		return nfserr_resource;
 
-	read->rd_vlen = xdr_reserve_space_vec(xdr, resp->rqstp->rq_vec, maxcount);
+	read->rd_vlen = xdr_reserve_space_vec(xdr, resp->rqstp->rq_vec, *maxcount);
 	if (read->rd_vlen < 0)
 		return nfserr_resource;
 
 	nfserr = nfsd_readv(resp->rqstp, read->rd_fhp, file, read->rd_offset,
-			    resp->rqstp->rq_vec, read->rd_vlen, &maxcount, eof);
+			    resp->rqstp->rq_vec, read->rd_vlen, maxcount, eof);
 	if (nfserr)
 		return nfserr;
 
@@ -4634,7 +4634,7 @@ nfsd4_encode_read_plus_data(struct nfsd4_compoundres *resp,
 	write_bytes_to_xdr_buf(xdr->buf, starting_len,      &tmp,   4);
 	tmp64 = cpu_to_be64(read->rd_offset);
 	write_bytes_to_xdr_buf(xdr->buf, starting_len + 4,  &tmp64, 8);
-	tmp = htonl(maxcount);
+	tmp = htonl(*maxcount);
 	write_bytes_to_xdr_buf(xdr->buf, starting_len + 12, &tmp,   4);
 	return nfs_ok;
 }
@@ -4642,11 +4642,19 @@ nfsd4_encode_read_plus_data(struct nfsd4_compoundres *resp,
 static __be32
 nfsd4_encode_read_plus_hole(struct nfsd4_compoundres *resp,
 			    struct nfsd4_read *read,
-			    unsigned long maxcount, u32 *eof)
+			    unsigned long *maxcount, u32 *eof)
 {
 	struct file *file = read->rd_nf->nf_file;
+	loff_t data_pos = vfs_llseek(file, read->rd_offset, SEEK_DATA);
+	unsigned long count;
 	__be32 *p;
 
+	if (data_pos == -ENXIO)
+		data_pos = i_size_read(file_inode(file));
+	else if (data_pos <= read->rd_offset)
+		return nfserr_resource;
+	count = data_pos - read->rd_offset;
+
 	/* Content type, offset, byte count */
 	p = xdr_reserve_space(&resp->xdr, 4 + 8 + 8);
 	if (!p)
@@ -4654,9 +4662,10 @@ nfsd4_encode_read_plus_hole(struct nfsd4_compoundres *resp,
 
 	*p++ = htonl(NFS4_CONTENT_HOLE);
 	 p   = xdr_encode_hyper(p, read->rd_offset);
-	 p   = xdr_encode_hyper(p, maxcount);
+	 p   = xdr_encode_hyper(p, count);
 
-	*eof = (read->rd_offset + maxcount) >= i_size_read(file_inode(file));
+	*eof = (read->rd_offset + count) >= i_size_read(file_inode(file));
+	*maxcount = min_t(unsigned long, count, *maxcount);
 	return nfs_ok;
 }
 
@@ -4664,7 +4673,7 @@ static __be32
 nfsd4_encode_read_plus(struct nfsd4_compoundres *resp, __be32 nfserr,
 		       struct nfsd4_read *read)
 {
-	unsigned long maxcount;
+	unsigned long maxcount, count;
 	struct xdr_stream *xdr = &resp->xdr;
 	struct file *file;
 	int starting_len = xdr->buf->len;
@@ -4687,6 +4696,7 @@ nfsd4_encode_read_plus(struct nfsd4_compoundres *resp, __be32 nfserr,
 	maxcount = min_t(unsigned long, maxcount,
 			 (xdr->buf->buflen - xdr->buf->len));
 	maxcount = min_t(unsigned long, maxcount, read->rd_length);
+	count    = maxcount;
 
 	eof = read->rd_offset >= i_size_read(file_inode(file));
 	if (eof)
@@ -4695,24 +4705,38 @@ nfsd4_encode_read_plus(struct nfsd4_compoundres *resp, __be32 nfserr,
 	pos = vfs_llseek(file, read->rd_offset, SEEK_DATA);
 	if (pos == -ENXIO)
 		pos = i_size_read(file_inode(file));
+	else if (pos < 0)
+		pos = read->rd_offset;
 
-	if (pos > read->rd_offset) {
-		maxcount = pos - read->rd_offset;
-		nfserr = nfsd4_encode_read_plus_hole(resp, read, maxcount, &eof);
+	if (pos == read->rd_offset) {
+		maxcount = count;
+		nfserr = nfsd4_encode_read_plus_data(resp, read, &maxcount, &eof);
+		if (nfserr)
+			goto out;
+		count -= maxcount;
+		read->rd_offset += maxcount;
 		segments++;
-	} else {
-		nfserr = nfsd4_encode_read_plus_data(resp, read, maxcount, &eof);
+	}
+
+	if (count > 0 && !eof) {
+		maxcount = count;
+		nfserr = nfsd4_encode_read_plus_hole(resp, read, &maxcount, &eof);
+		if (nfserr)
+			goto out;
+		count -= maxcount;
+		read->rd_offset += maxcount;
 		segments++;
 	}
 
 out:
-	if (nfserr)
+	if (nfserr && segments == 0)
 		xdr_truncate_encode(xdr, starting_len);
 	else {
 		tmp = htonl(eof);
 		write_bytes_to_xdr_buf(xdr->buf, starting_len,     &tmp, 4);
 		tmp = htonl(segments);
 		write_bytes_to_xdr_buf(xdr->buf, starting_len + 4, &tmp, 4);
+		nfserr = nfs_ok;
 	}
 
 	return nfserr;
-- 
2.28.0



* [PATCH v5 5/5] NFSD: Encode a full READ_PLUS reply
  2020-09-08 16:25 [PATCH v5 0/5] NFSD: Add support for the v4.2 READ_PLUS operation schumaker.anna
                   ` (3 preceding siblings ...)
  2020-09-08 16:25 ` [PATCH v5 4/5] NFSD: Return both a hole and a data segment schumaker.anna
@ 2020-09-08 16:25 ` schumaker.anna
  4 siblings, 0 replies; 14+ messages in thread
From: schumaker.anna @ 2020-09-08 16:25 UTC (permalink / raw)
  To: bfields, chuck.lever, linux-nfs; +Cc: Anna.Schumaker

From: Anna Schumaker <Anna.Schumaker@Netapp.com>

Reply to the client with multiple hole and data segments. I use the
result of the first vfs_llseek() call for encoding as an optimization so
we don't have to immediately repeat the call.
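The alternating data/hole walk this patch introduces can be modeled with a small user-space sketch. This is only an illustration of the loop shape in nfsd4_encode_read_plus(); walk_segments() and extents are illustrative names, not part of the patch, and the real code flips between the data and hole encoders rather than indexing an array:

```c
#include <stddef.h>

/* Toy model of the encode loop: extents[] holds the lengths of the
 * file's alternating data/hole segments.  Consume up to 'count' bytes,
 * flipping segment type each iteration, and return how many segments
 * would be encoded. */
static int walk_segments(const unsigned long *extents, size_t n,
			 unsigned long count)
{
	int segments = 0;
	size_t i;

	for (i = 0; count > 0 && i < n; i++) {
		unsigned long maxcount = extents[i]; /* this segment's span */

		if (maxcount > count)	/* clamp to the remaining budget */
			maxcount = count;
		count -= maxcount;	/* read->rd_offset += maxcount; */
		segments++;		/* is_data = !is_data; in the patch */
	}
	return segments;
}
```

A request that ends mid-segment simply encodes a final clamped segment, which is why the loop tracks the remaining count rather than whole extents.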

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

---
v5: Truncate the encode to the last segment length if we're returning a
    short read
---
 fs/nfsd/nfs4xdr.c | 40 ++++++++++++++++++++--------------------
 1 file changed, 20 insertions(+), 20 deletions(-)

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 856606263c1d..eec23a7d5ca0 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -4603,16 +4603,18 @@ nfsd4_encode_offload_status(struct nfsd4_compoundres *resp, __be32 nfserr,
 static __be32
 nfsd4_encode_read_plus_data(struct nfsd4_compoundres *resp,
 			    struct nfsd4_read *read,
-			    unsigned long *maxcount, u32 *eof)
+			    unsigned long *maxcount, u32 *eof,
+			    loff_t *pos)
 {
 	struct xdr_stream *xdr = &resp->xdr;
 	struct file *file = read->rd_nf->nf_file;
 	int starting_len = xdr->buf->len;
-	loff_t hole_pos = vfs_llseek(file, read->rd_offset, SEEK_HOLE);
+	loff_t hole_pos;
 	__be32 nfserr;
 	__be32 *p, tmp;
 	__be64 tmp64;
 
+	hole_pos = pos ? *pos : vfs_llseek(file, read->rd_offset, SEEK_HOLE);
 	if (hole_pos > read->rd_offset)
 		*maxcount = min_t(unsigned long, *maxcount, hole_pos - read->rd_offset);
 
@@ -4677,8 +4679,10 @@ nfsd4_encode_read_plus(struct nfsd4_compoundres *resp, __be32 nfserr,
 	struct xdr_stream *xdr = &resp->xdr;
 	struct file *file;
 	int starting_len = xdr->buf->len;
+	int last_segment = xdr->buf->len;
 	int segments = 0;
 	__be32 *p, tmp;
+	bool is_data;
 	loff_t pos;
 	u32 eof;
 
@@ -4702,29 +4706,22 @@ nfsd4_encode_read_plus(struct nfsd4_compoundres *resp, __be32 nfserr,
 	if (eof)
 		goto out;
 
-	pos = vfs_llseek(file, read->rd_offset, SEEK_DATA);
-	if (pos == -ENXIO)
-		pos = i_size_read(file_inode(file));
-	else if (pos < 0)
-		pos = read->rd_offset;
+	pos = vfs_llseek(file, read->rd_offset, SEEK_HOLE);
+	is_data = pos > read->rd_offset;
 
-	if (pos == read->rd_offset) {
+	while (count > 0 && !eof) {
 		maxcount = count;
-		nfserr = nfsd4_encode_read_plus_data(resp, read, &maxcount, &eof);
-		if (nfserr)
-			goto out;
-		count -= maxcount;
-		read->rd_offset += maxcount;
-		segments++;
-	}
-
-	if (count > 0 && !eof) {
-		maxcount = count;
-		nfserr = nfsd4_encode_read_plus_hole(resp, read, &maxcount, &eof);
+		if (is_data)
+			nfserr = nfsd4_encode_read_plus_data(resp, read, &maxcount, &eof,
+						segments == 0 ? &pos : NULL);
+		else
+			nfserr = nfsd4_encode_read_plus_hole(resp, read, &maxcount, &eof);
 		if (nfserr)
 			goto out;
 		count -= maxcount;
 		read->rd_offset += maxcount;
+		is_data = !is_data;
+		last_segment = xdr->buf->len;
 		segments++;
 	}
 
@@ -4736,7 +4733,10 @@ nfsd4_encode_read_plus(struct nfsd4_compoundres *resp, __be32 nfserr,
 		write_bytes_to_xdr_buf(xdr->buf, starting_len,     &tmp, 4);
 		tmp = htonl(segments);
 		write_bytes_to_xdr_buf(xdr->buf, starting_len + 4, &tmp, 4);
-		nfserr = nfs_ok;
+		if (nfserr) {
+			xdr_truncate_encode(xdr, last_segment);
+			nfserr = nfs_ok;
+		}
 	}
 
 	return nfserr;
-- 
2.28.0



* Re: [PATCH v5 2/5] NFSD: Add READ_PLUS data support
  2020-09-08 16:25 ` [PATCH v5 2/5] NFSD: Add READ_PLUS data support schumaker.anna
@ 2020-09-08 19:42   ` J. Bruce Fields
  2020-09-09 16:53     ` Anna Schumaker
  0 siblings, 1 reply; 14+ messages in thread
From: J. Bruce Fields @ 2020-09-08 19:42 UTC (permalink / raw)
  To: schumaker.anna; +Cc: chuck.lever, linux-nfs, Anna.Schumaker

On Tue, Sep 08, 2020 at 12:25:56PM -0400, schumaker.anna@gmail.com wrote:
> From: Anna Schumaker <Anna.Schumaker@Netapp.com>
> 
> This patch adds READ_PLUS support for returning a single
> NFS4_CONTENT_DATA segment to the client. This is basically the same as
> the READ operation, only with the extra information about data segments.
> 
> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
> 
> ---
> v5: Fix up nfsd4_read_plus_rsize() calculation
> ---
>  fs/nfsd/nfs4proc.c | 20 +++++++++++
>  fs/nfsd/nfs4xdr.c  | 83 ++++++++++++++++++++++++++++++++++++++++++++--
>  2 files changed, 101 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> index eaf50eafa935..0a3df5f10501 100644
> --- a/fs/nfsd/nfs4proc.c
> +++ b/fs/nfsd/nfs4proc.c
> @@ -2591,6 +2591,19 @@ static inline u32 nfsd4_read_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op)
>  	return (op_encode_hdr_size + 2 + XDR_QUADLEN(rlen)) * sizeof(__be32);
>  }
>  
> +static inline u32 nfsd4_read_plus_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op)
> +{
> +	u32 maxcount = svc_max_payload(rqstp);
> +	u32 rlen = min(op->u.read.rd_length, maxcount);
> +	/*
> +	 * Worst case is a single large data segment, so make
> +	 * sure we have enough room to encode that
> +	 */

After the last patch we allow an unlimited number of segments.  So a
zillion 1-byte segments is also possible, and is a worse case.

Possible ways to avoid that kind of thing:

	- when encoding, stop and return a short read if the xdr-encoded
	  result would exceed the limit calculated here.
	- round SEEK_HOLE results up to the nearest (say) 512 bytes, and
	  SEEK_DATA results down to the nearest 512 bytes.  May also need
	  some logic to avoid encoding 0-length segments.
	- talk to linux-fsdevel, see if we can get a minimum hole
	  alignment guarantee from filesystems.

I'm thinking #1 is simplest.
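To make the worst case concrete: each data segment carries a 16-byte header (content type, offset, byte count), so a reply built from 1-byte segments is roughly twenty times larger than the payload it describes. A rough user-space size model, assuming the 4 + 8 + 4 header layout from the quoted encode functions (xdr_pad4, data_segment_size and reply_size are illustrative names):

```c
/* Rough XDR size model for a READ_PLUS reply made of data segments. */
static unsigned long xdr_pad4(unsigned long n)
{
	return (n + 3) & ~3UL;		/* XDR pads payloads to 4 bytes */
}

static unsigned long data_segment_size(unsigned long payload)
{
	/* content type + offset + byte count + padded data */
	return 4 + 8 + 4 + xdr_pad4(payload);
}

static unsigned long reply_size(unsigned long nsegs,
				unsigned long payload_per_seg)
{
	/* eof flag + segment count, then the segments themselves */
	return 4 + 4 + nsegs * data_segment_size(payload_per_seg);
}
```

Under this model a single 4096-byte segment encodes in 4120 bytes, while 4096 one-byte segments need 81928 bytes, which is the blow-up the rsize estimate has to guard against.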

--b.

> +	u32 seg_len = 1 + 2 + 1;
> +
> +	return (op_encode_hdr_size + 2 + seg_len + XDR_QUADLEN(rlen)) * sizeof(__be32);
> +}
> +
>  static inline u32 nfsd4_readdir_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op)
>  {
>  	u32 maxcount = 0, rlen = 0;
> @@ -3163,6 +3176,13 @@ static const struct nfsd4_operation nfsd4_ops[] = {
>  		.op_name = "OP_COPY",
>  		.op_rsize_bop = nfsd4_copy_rsize,
>  	},
> +	[OP_READ_PLUS] = {
> +		.op_func = nfsd4_read,
> +		.op_release = nfsd4_read_release,
> +		.op_name = "OP_READ_PLUS",
> +		.op_rsize_bop = nfsd4_read_plus_rsize,
> +		.op_get_currentstateid = nfsd4_get_readstateid,
> +	},
>  	[OP_SEEK] = {
>  		.op_func = nfsd4_seek,
>  		.op_name = "OP_SEEK",
> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> index 0be194de4888..26d12e3edf33 100644
> --- a/fs/nfsd/nfs4xdr.c
> +++ b/fs/nfsd/nfs4xdr.c
> @@ -2173,7 +2173,7 @@ static const nfsd4_dec nfsd4_dec_ops[] = {
>  	[OP_LAYOUTSTATS]	= (nfsd4_dec)nfsd4_decode_notsupp,
>  	[OP_OFFLOAD_CANCEL]	= (nfsd4_dec)nfsd4_decode_offload_status,
>  	[OP_OFFLOAD_STATUS]	= (nfsd4_dec)nfsd4_decode_offload_status,
> -	[OP_READ_PLUS]		= (nfsd4_dec)nfsd4_decode_notsupp,
> +	[OP_READ_PLUS]		= (nfsd4_dec)nfsd4_decode_read,
>  	[OP_SEEK]		= (nfsd4_dec)nfsd4_decode_seek,
>  	[OP_WRITE_SAME]		= (nfsd4_dec)nfsd4_decode_notsupp,
>  	[OP_CLONE]		= (nfsd4_dec)nfsd4_decode_clone,
> @@ -4597,6 +4597,85 @@ nfsd4_encode_offload_status(struct nfsd4_compoundres *resp, __be32 nfserr,
>  		return nfserr_resource;
>  	p = xdr_encode_hyper(p, os->count);
>  	*p++ = cpu_to_be32(0);
> +	return nfserr;
> +}
> +
> +static __be32
> +nfsd4_encode_read_plus_data(struct nfsd4_compoundres *resp,
> +			    struct nfsd4_read *read,
> +			    unsigned long maxcount,  u32 *eof)
> +{
> +	struct xdr_stream *xdr = &resp->xdr;
> +	struct file *file = read->rd_nf->nf_file;
> +	int starting_len = xdr->buf->len;
> +	__be32 nfserr;
> +	__be32 *p, tmp;
> +	__be64 tmp64;
> +
> +	/* Content type, offset, byte count */
> +	p = xdr_reserve_space(xdr, 4 + 8 + 4);
> +	if (!p)
> +		return nfserr_resource;
> +
> +	read->rd_vlen = xdr_reserve_space_vec(xdr, resp->rqstp->rq_vec, maxcount);
> +	if (read->rd_vlen < 0)
> +		return nfserr_resource;
> +
> +	nfserr = nfsd_readv(resp->rqstp, read->rd_fhp, file, read->rd_offset,
> +			    resp->rqstp->rq_vec, read->rd_vlen, &maxcount, eof);
> +	if (nfserr)
> +		return nfserr;
> +
> +	tmp = htonl(NFS4_CONTENT_DATA);
> +	write_bytes_to_xdr_buf(xdr->buf, starting_len,      &tmp,   4);
> +	tmp64 = cpu_to_be64(read->rd_offset);
> +	write_bytes_to_xdr_buf(xdr->buf, starting_len + 4,  &tmp64, 8);
> +	tmp = htonl(maxcount);
> +	write_bytes_to_xdr_buf(xdr->buf, starting_len + 12, &tmp,   4);
> +	return nfs_ok;
> +}
> +
> +static __be32
> +nfsd4_encode_read_plus(struct nfsd4_compoundres *resp, __be32 nfserr,
> +		       struct nfsd4_read *read)
> +{
> +	unsigned long maxcount;
> +	struct xdr_stream *xdr = &resp->xdr;
> +	struct file *file;
> +	int starting_len = xdr->buf->len;
> +	int segments = 0;
> +	__be32 *p, tmp;
> +	u32 eof;
> +
> +	if (nfserr)
> +		return nfserr;
> +	file = read->rd_nf->nf_file;
> +
> +	/* eof flag, segment count */
> +	p = xdr_reserve_space(xdr, 4 + 4);
> +	if (!p)
> +		return nfserr_resource;
> +	xdr_commit_encode(xdr);
> +
> +	maxcount = svc_max_payload(resp->rqstp);
> +	maxcount = min_t(unsigned long, maxcount,
> +			 (xdr->buf->buflen - xdr->buf->len));
> +	maxcount = min_t(unsigned long, maxcount, read->rd_length);
> +
> +	eof = read->rd_offset >= i_size_read(file_inode(file));
> +	if (!eof) {
> +		nfserr = nfsd4_encode_read_plus_data(resp, read, maxcount, &eof);
> +		segments++;
> +	}
> +
> +	if (nfserr)
> +		xdr_truncate_encode(xdr, starting_len);
> +	else {
> +		tmp = htonl(eof);
> +		write_bytes_to_xdr_buf(xdr->buf, starting_len,     &tmp, 4);
> +		tmp = htonl(segments);
> +		write_bytes_to_xdr_buf(xdr->buf, starting_len + 4, &tmp, 4);
> +	}
>  
>  	return nfserr;
>  }
> @@ -4974,7 +5053,7 @@ static const nfsd4_enc nfsd4_enc_ops[] = {
>  	[OP_LAYOUTSTATS]	= (nfsd4_enc)nfsd4_encode_noop,
>  	[OP_OFFLOAD_CANCEL]	= (nfsd4_enc)nfsd4_encode_noop,
>  	[OP_OFFLOAD_STATUS]	= (nfsd4_enc)nfsd4_encode_offload_status,
> -	[OP_READ_PLUS]		= (nfsd4_enc)nfsd4_encode_noop,
> +	[OP_READ_PLUS]		= (nfsd4_enc)nfsd4_encode_read_plus,
>  	[OP_SEEK]		= (nfsd4_enc)nfsd4_encode_seek,
>  	[OP_WRITE_SAME]		= (nfsd4_enc)nfsd4_encode_noop,
>  	[OP_CLONE]		= (nfsd4_enc)nfsd4_encode_noop,
> -- 
> 2.28.0
> 



* Re: [PATCH v5 4/5] NFSD: Return both a hole and a data segment
  2020-09-08 16:25 ` [PATCH v5 4/5] NFSD: Return both a hole and a data segment schumaker.anna
@ 2020-09-08 19:49   ` J. Bruce Fields
  2020-09-09 16:51     ` Anna Schumaker
  0 siblings, 1 reply; 14+ messages in thread
From: J. Bruce Fields @ 2020-09-08 19:49 UTC (permalink / raw)
  To: schumaker.anna; +Cc: chuck.lever, linux-nfs, Anna.Schumaker

On Tue, Sep 08, 2020 at 12:25:58PM -0400, schumaker.anna@gmail.com wrote:
> From: Anna Schumaker <Anna.Schumaker@Netapp.com>
> 
> But only one of each right now. We'll expand on this in the next patch.
> 
> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
> ---
> v5: If we've already encoded a segment, then return a short read if
>     later segments return an error for some reason.
> ---
>  fs/nfsd/nfs4xdr.c | 54 ++++++++++++++++++++++++++++++++++-------------
>  1 file changed, 39 insertions(+), 15 deletions(-)
> 
> diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> index 45159bd9e9a4..856606263c1d 100644
> --- a/fs/nfsd/nfs4xdr.c
> +++ b/fs/nfsd/nfs4xdr.c
> @@ -4603,7 +4603,7 @@ nfsd4_encode_offload_status(struct nfsd4_compoundres *resp, __be32 nfserr,
>  static __be32
>  nfsd4_encode_read_plus_data(struct nfsd4_compoundres *resp,
>  			    struct nfsd4_read *read,
> -			    unsigned long maxcount,  u32 *eof)
> +			    unsigned long *maxcount, u32 *eof)
>  {
>  	struct xdr_stream *xdr = &resp->xdr;
>  	struct file *file = read->rd_nf->nf_file;
> @@ -4614,19 +4614,19 @@ nfsd4_encode_read_plus_data(struct nfsd4_compoundres *resp,
>  	__be64 tmp64;
>  
>  	if (hole_pos > read->rd_offset)
> -		maxcount = min_t(unsigned long, maxcount, hole_pos - read->rd_offset);
> +		*maxcount = min_t(unsigned long, *maxcount, hole_pos - read->rd_offset);
>  
>  	/* Content type, offset, byte count */
>  	p = xdr_reserve_space(xdr, 4 + 8 + 4);
>  	if (!p)
>  		return nfserr_resource;
>  
> -	read->rd_vlen = xdr_reserve_space_vec(xdr, resp->rqstp->rq_vec, maxcount);
> +	read->rd_vlen = xdr_reserve_space_vec(xdr, resp->rqstp->rq_vec, *maxcount);
>  	if (read->rd_vlen < 0)
>  		return nfserr_resource;
>  
>  	nfserr = nfsd_readv(resp->rqstp, read->rd_fhp, file, read->rd_offset,
> -			    resp->rqstp->rq_vec, read->rd_vlen, &maxcount, eof);
> +			    resp->rqstp->rq_vec, read->rd_vlen, maxcount, eof);
>  	if (nfserr)
>  		return nfserr;
>  
> @@ -4634,7 +4634,7 @@ nfsd4_encode_read_plus_data(struct nfsd4_compoundres *resp,
>  	write_bytes_to_xdr_buf(xdr->buf, starting_len,      &tmp,   4);
>  	tmp64 = cpu_to_be64(read->rd_offset);
>  	write_bytes_to_xdr_buf(xdr->buf, starting_len + 4,  &tmp64, 8);
> -	tmp = htonl(maxcount);
> +	tmp = htonl(*maxcount);
>  	write_bytes_to_xdr_buf(xdr->buf, starting_len + 12, &tmp,   4);
>  	return nfs_ok;
>  }
> @@ -4642,11 +4642,19 @@ nfsd4_encode_read_plus_data(struct nfsd4_compoundres *resp,
>  static __be32
>  nfsd4_encode_read_plus_hole(struct nfsd4_compoundres *resp,
>  			    struct nfsd4_read *read,
> -			    unsigned long maxcount, u32 *eof)
> +			    unsigned long *maxcount, u32 *eof)
>  {
>  	struct file *file = read->rd_nf->nf_file;
> +	loff_t data_pos = vfs_llseek(file, read->rd_offset, SEEK_DATA);
> +	unsigned long count;
>  	__be32 *p;
>  
> +	if (data_pos == -ENXIO)
> +		data_pos = i_size_read(file_inode(file));
> +	else if (data_pos <= read->rd_offset)
> +		return nfserr_resource;

I think there's still a race here:

	vfs_llseek(.,0,SEEK_HOLE) returns 1024
	read 1024 bytes of data
					another process fills the hole
	vfs_llseek(.,1024,SEEK_DATA) returns 1024
	code above returns nfserr_resource

We end up returning an error to the client when we should have just
returned more data.
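The failing check can be reduced to a few lines of user-space C. This is a toy reproduction of the guard in nfsd4_encode_read_plus_hole() as quoted above; hole_count() is an illustrative name and -1 stands in for -ENXIO. When another process fills the hole between the two llseek calls, SEEK_DATA returns the current offset itself and the error branch fires:

```c
/* Toy model of the hole-segment guard: return the hole length, or -1
 * where the patch would return nfserr_resource. */
static long hole_count(long data_pos, long rd_offset, long file_size)
{
	if (data_pos == -1)		/* -ENXIO: no more data, hole to EOF */
		data_pos = file_size;
	else if (data_pos <= rd_offset)
		return -1;		/* the patch returns nfserr_resource */
	return data_pos - rd_offset;	/* length of the hole segment */
}
```

With the race in play, hole_count(1024, 1024, ...) hits the error branch even though the bytes at offset 1024 are now perfectly readable data.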

--b.

> +	count = data_pos - read->rd_offset;
> +
>  	/* Content type, offset, byte count */
>  	p = xdr_reserve_space(&resp->xdr, 4 + 8 + 8);
>  	if (!p)
> @@ -4654,9 +4662,10 @@ nfsd4_encode_read_plus_hole(struct nfsd4_compoundres *resp,
>  
>  	*p++ = htonl(NFS4_CONTENT_HOLE);
>  	 p   = xdr_encode_hyper(p, read->rd_offset);
> -	 p   = xdr_encode_hyper(p, maxcount);
> +	 p   = xdr_encode_hyper(p, count);
>  
> -	*eof = (read->rd_offset + maxcount) >= i_size_read(file_inode(file));
> +	*eof = (read->rd_offset + count) >= i_size_read(file_inode(file));
> +	*maxcount = min_t(unsigned long, count, *maxcount);
>  	return nfs_ok;
>  }
>  
> @@ -4664,7 +4673,7 @@ static __be32
>  nfsd4_encode_read_plus(struct nfsd4_compoundres *resp, __be32 nfserr,
>  		       struct nfsd4_read *read)
>  {
> -	unsigned long maxcount;
> +	unsigned long maxcount, count;
>  	struct xdr_stream *xdr = &resp->xdr;
>  	struct file *file;
>  	int starting_len = xdr->buf->len;
> @@ -4687,6 +4696,7 @@ nfsd4_encode_read_plus(struct nfsd4_compoundres *resp, __be32 nfserr,
>  	maxcount = min_t(unsigned long, maxcount,
>  			 (xdr->buf->buflen - xdr->buf->len));
>  	maxcount = min_t(unsigned long, maxcount, read->rd_length);
> +	count    = maxcount;
>  
>  	eof = read->rd_offset >= i_size_read(file_inode(file));
>  	if (eof)
> @@ -4695,24 +4705,38 @@ nfsd4_encode_read_plus(struct nfsd4_compoundres *resp, __be32 nfserr,
>  	pos = vfs_llseek(file, read->rd_offset, SEEK_DATA);
>  	if (pos == -ENXIO)
>  		pos = i_size_read(file_inode(file));
> +	else if (pos < 0)
> +		pos = read->rd_offset;
>  
> -	if (pos > read->rd_offset) {
> -		maxcount = pos - read->rd_offset;
> -		nfserr = nfsd4_encode_read_plus_hole(resp, read, maxcount, &eof);
> +	if (pos == read->rd_offset) {
> +		maxcount = count;
> +		nfserr = nfsd4_encode_read_plus_data(resp, read, &maxcount, &eof);
> +		if (nfserr)
> +			goto out;
> +		count -= maxcount;
> +		read->rd_offset += maxcount;
>  		segments++;
> -	} else {
> -		nfserr = nfsd4_encode_read_plus_data(resp, read, maxcount, &eof);
> +	}
> +
> +	if (count > 0 && !eof) {
> +		maxcount = count;
> +		nfserr = nfsd4_encode_read_plus_hole(resp, read, &maxcount, &eof);
> +		if (nfserr)
> +			goto out;
> +		count -= maxcount;
> +		read->rd_offset += maxcount;
>  		segments++;
>  	}
>  
>  out:
> -	if (nfserr)
> +	if (nfserr && segments == 0)
>  		xdr_truncate_encode(xdr, starting_len);
>  	else {
>  		tmp = htonl(eof);
>  		write_bytes_to_xdr_buf(xdr->buf, starting_len,     &tmp, 4);
>  		tmp = htonl(segments);
>  		write_bytes_to_xdr_buf(xdr->buf, starting_len + 4, &tmp, 4);
> +		nfserr = nfs_ok;
>  	}
>  
>  	return nfserr;
> -- 
> 2.28.0
> 



* Re: [PATCH v5 4/5] NFSD: Return both a hole and a data segment
  2020-09-08 19:49   ` J. Bruce Fields
@ 2020-09-09 16:51     ` Anna Schumaker
  2020-09-09 20:24       ` J. Bruce Fields
  0 siblings, 1 reply; 14+ messages in thread
From: Anna Schumaker @ 2020-09-09 16:51 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Chuck Lever, Linux NFS Mailing List

On Tue, Sep 8, 2020 at 3:49 PM J. Bruce Fields <bfields@redhat.com> wrote:
>
> On Tue, Sep 08, 2020 at 12:25:58PM -0400, schumaker.anna@gmail.com wrote:
> > From: Anna Schumaker <Anna.Schumaker@Netapp.com>
> >
> > But only one of each right now. We'll expand on this in the next patch.
> >
> > Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
> > ---
> > v5: If we've already encoded a segment, then return a short read if
> >     later segments return an error for some reason.
> > ---
> >  fs/nfsd/nfs4xdr.c | 54 ++++++++++++++++++++++++++++++++++-------------
> >  1 file changed, 39 insertions(+), 15 deletions(-)
> >
> > diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> > index 45159bd9e9a4..856606263c1d 100644
> > --- a/fs/nfsd/nfs4xdr.c
> > +++ b/fs/nfsd/nfs4xdr.c
> > @@ -4603,7 +4603,7 @@ nfsd4_encode_offload_status(struct nfsd4_compoundres *resp, __be32 nfserr,
> >  static __be32
> >  nfsd4_encode_read_plus_data(struct nfsd4_compoundres *resp,
> >                           struct nfsd4_read *read,
> > -                         unsigned long maxcount,  u32 *eof)
> > +                         unsigned long *maxcount, u32 *eof)
> >  {
> >       struct xdr_stream *xdr = &resp->xdr;
> >       struct file *file = read->rd_nf->nf_file;
> > @@ -4614,19 +4614,19 @@ nfsd4_encode_read_plus_data(struct nfsd4_compoundres *resp,
> >       __be64 tmp64;
> >
> >       if (hole_pos > read->rd_offset)
> > -             maxcount = min_t(unsigned long, maxcount, hole_pos - read->rd_offset);
> > +             *maxcount = min_t(unsigned long, *maxcount, hole_pos - read->rd_offset);
> >
> >       /* Content type, offset, byte count */
> >       p = xdr_reserve_space(xdr, 4 + 8 + 4);
> >       if (!p)
> >               return nfserr_resource;
> >
> > -     read->rd_vlen = xdr_reserve_space_vec(xdr, resp->rqstp->rq_vec, maxcount);
> > +     read->rd_vlen = xdr_reserve_space_vec(xdr, resp->rqstp->rq_vec, *maxcount);
> >       if (read->rd_vlen < 0)
> >               return nfserr_resource;
> >
> >       nfserr = nfsd_readv(resp->rqstp, read->rd_fhp, file, read->rd_offset,
> > -                         resp->rqstp->rq_vec, read->rd_vlen, &maxcount, eof);
> > +                         resp->rqstp->rq_vec, read->rd_vlen, maxcount, eof);
> >       if (nfserr)
> >               return nfserr;
> >
> > @@ -4634,7 +4634,7 @@ nfsd4_encode_read_plus_data(struct nfsd4_compoundres *resp,
> >       write_bytes_to_xdr_buf(xdr->buf, starting_len,      &tmp,   4);
> >       tmp64 = cpu_to_be64(read->rd_offset);
> >       write_bytes_to_xdr_buf(xdr->buf, starting_len + 4,  &tmp64, 8);
> > -     tmp = htonl(maxcount);
> > +     tmp = htonl(*maxcount);
> >       write_bytes_to_xdr_buf(xdr->buf, starting_len + 12, &tmp,   4);
> >       return nfs_ok;
> >  }
> > @@ -4642,11 +4642,19 @@ nfsd4_encode_read_plus_data(struct nfsd4_compoundres *resp,
> >  static __be32
> >  nfsd4_encode_read_plus_hole(struct nfsd4_compoundres *resp,
> >                           struct nfsd4_read *read,
> > -                         unsigned long maxcount, u32 *eof)
> > +                         unsigned long *maxcount, u32 *eof)
> >  {
> >       struct file *file = read->rd_nf->nf_file;
> > +     loff_t data_pos = vfs_llseek(file, read->rd_offset, SEEK_DATA);
> > +     unsigned long count;
> >       __be32 *p;
> >
> > +     if (data_pos == -ENXIO)
> > +             data_pos = i_size_read(file_inode(file));
> > +     else if (data_pos <= read->rd_offset)
> > +             return nfserr_resource;
>
> I think there's still a race here:
>
>         vfs_llseek(.,0,SEEK_HOLE) returns 1024
>         read 1024 bytes of data
>                                         another process fills the hole
>         vfs_llseek(.,1024,SEEK_DATA) returns 1024
>         code above returns nfserr_resource
>
> We end up returning an error to the client when we should have just
> returned more data.

As long as we've encoded at least one segment successfully, we'll
actually return a short read in this case (as of the most recent
patches). I tried implementing a check for what each segment actually
was before encoding, but it led to a lot of extra lseeks (so
potential for races / performance problems on btrfs). Returning a
short read seemed like a better approach to me.
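The fallback Anna describes lands in patch 5/5 and can be modeled in a few lines; struct outcome and finish_encode() are illustrative names for the logic at the tail of nfsd4_encode_read_plus(), where 0 stands in for nfs_ok:

```c
/* Toy model of the v5 short-read fallback: an error with no segments
 * encoded wipes the reply and propagates; an error after at least one
 * segment truncates back to the last complete segment and reports
 * success (a short read). */
struct outcome {
	long truncate_to;	/* xdr_truncate_encode() target, -1 = none */
	int status;		/* 0 stands in for nfs_ok */
};

static struct outcome finish_encode(int nfserr, int segments,
				    long starting_len, long last_segment)
{
	struct outcome o;

	if (nfserr && segments == 0) {
		o.truncate_to = starting_len;	/* drop the partial reply */
		o.status = nfserr;		/* propagate the error */
	} else {
		o.truncate_to = nfserr ? last_segment : -1;
		o.status = 0;			/* short read, not an error */
	}
	return o;
}
```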

Anna

>
> --b.
>
> > +     count = data_pos - read->rd_offset;
> > +
> >       /* Content type, offset, byte count */
> >       p = xdr_reserve_space(&resp->xdr, 4 + 8 + 8);
> >       if (!p)
> > @@ -4654,9 +4662,10 @@ nfsd4_encode_read_plus_hole(struct nfsd4_compoundres *resp,
> >
> >       *p++ = htonl(NFS4_CONTENT_HOLE);
> >        p   = xdr_encode_hyper(p, read->rd_offset);
> > -      p   = xdr_encode_hyper(p, maxcount);
> > +      p   = xdr_encode_hyper(p, count);
> >
> > -     *eof = (read->rd_offset + maxcount) >= i_size_read(file_inode(file));
> > +     *eof = (read->rd_offset + count) >= i_size_read(file_inode(file));
> > +     *maxcount = min_t(unsigned long, count, *maxcount);
> >       return nfs_ok;
> >  }
> >
> > @@ -4664,7 +4673,7 @@ static __be32
> >  nfsd4_encode_read_plus(struct nfsd4_compoundres *resp, __be32 nfserr,
> >                      struct nfsd4_read *read)
> >  {
> > -     unsigned long maxcount;
> > +     unsigned long maxcount, count;
> >       struct xdr_stream *xdr = &resp->xdr;
> >       struct file *file;
> >       int starting_len = xdr->buf->len;
> > @@ -4687,6 +4696,7 @@ nfsd4_encode_read_plus(struct nfsd4_compoundres *resp, __be32 nfserr,
> >       maxcount = min_t(unsigned long, maxcount,
> >                        (xdr->buf->buflen - xdr->buf->len));
> >       maxcount = min_t(unsigned long, maxcount, read->rd_length);
> > +     count    = maxcount;
> >
> >       eof = read->rd_offset >= i_size_read(file_inode(file));
> >       if (eof)
> > @@ -4695,24 +4705,38 @@ nfsd4_encode_read_plus(struct nfsd4_compoundres *resp, __be32 nfserr,
> >       pos = vfs_llseek(file, read->rd_offset, SEEK_DATA);
> >       if (pos == -ENXIO)
> >               pos = i_size_read(file_inode(file));
> > +     else if (pos < 0)
> > +             pos = read->rd_offset;
> >
> > -     if (pos > read->rd_offset) {
> > -             maxcount = pos - read->rd_offset;
> > -             nfserr = nfsd4_encode_read_plus_hole(resp, read, maxcount, &eof);
> > +     if (pos == read->rd_offset) {
> > +             maxcount = count;
> > +             nfserr = nfsd4_encode_read_plus_data(resp, read, &maxcount, &eof);
> > +             if (nfserr)
> > +                     goto out;
> > +             count -= maxcount;
> > +             read->rd_offset += maxcount;
> >               segments++;
> > -     } else {
> > -             nfserr = nfsd4_encode_read_plus_data(resp, read, maxcount, &eof);
> > +     }
> > +
> > +     if (count > 0 && !eof) {
> > +             maxcount = count;
> > +             nfserr = nfsd4_encode_read_plus_hole(resp, read, &maxcount, &eof);
> > +             if (nfserr)
> > +                     goto out;
> > +             count -= maxcount;
> > +             read->rd_offset += maxcount;
> >               segments++;
> >       }
> >
> >  out:
> > -     if (nfserr)
> > +     if (nfserr && segments == 0)
> >               xdr_truncate_encode(xdr, starting_len);
> >       else {
> >               tmp = htonl(eof);
> >               write_bytes_to_xdr_buf(xdr->buf, starting_len,     &tmp, 4);
> >               tmp = htonl(segments);
> >               write_bytes_to_xdr_buf(xdr->buf, starting_len + 4, &tmp, 4);
> > +             nfserr = nfs_ok;
> >       }
> >
> >       return nfserr;
> > --
> > 2.28.0
> >
>


* Re: [PATCH v5 2/5] NFSD: Add READ_PLUS data support
  2020-09-08 19:42   ` J. Bruce Fields
@ 2020-09-09 16:53     ` Anna Schumaker
  2020-09-09 20:25       ` J. Bruce Fields
  0 siblings, 1 reply; 14+ messages in thread
From: Anna Schumaker @ 2020-09-09 16:53 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Chuck Lever, Linux NFS Mailing List

On Tue, Sep 8, 2020 at 3:42 PM J. Bruce Fields <bfields@redhat.com> wrote:
>
> On Tue, Sep 08, 2020 at 12:25:56PM -0400, schumaker.anna@gmail.com wrote:
> > From: Anna Schumaker <Anna.Schumaker@Netapp.com>
> >
> > This patch adds READ_PLUS support for returning a single
> > NFS4_CONTENT_DATA segment to the client. This is basically the same as
> > the READ operation, only with the extra information about data segments.
> >
> > Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
> >
> > ---
> > v5: Fix up nfsd4_read_plus_rsize() calculation
> > ---
> >  fs/nfsd/nfs4proc.c | 20 +++++++++++
> >  fs/nfsd/nfs4xdr.c  | 83 ++++++++++++++++++++++++++++++++++++++++++++--
> >  2 files changed, 101 insertions(+), 2 deletions(-)
> >
> > diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> > index eaf50eafa935..0a3df5f10501 100644
> > --- a/fs/nfsd/nfs4proc.c
> > +++ b/fs/nfsd/nfs4proc.c
> > @@ -2591,6 +2591,19 @@ static inline u32 nfsd4_read_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op)
> >       return (op_encode_hdr_size + 2 + XDR_QUADLEN(rlen)) * sizeof(__be32);
> >  }
> >
> > +static inline u32 nfsd4_read_plus_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op)
> > +{
> > +     u32 maxcount = svc_max_payload(rqstp);
> > +     u32 rlen = min(op->u.read.rd_length, maxcount);
> > +     /*
> > +      * Worst case is a single large data segment, so make
> > +      * sure we have enough room to encode that
> > +      */
>
> After the last patch we allow an unlimited number of segments.  So a
> zillion 1-byte segments is also possible, and is a worse case.
>
> Possible ways to avoid that kind of thing:
>
>         - when encoding, stop and return a short read if the xdr-encoded
>           result would exceed the limit calculated here.

Doesn't this happen automatically through calls to xdr_reserve_space()?

>         - round SEEK_HOLE results up to the nearest (say) 512 bytes, and
>           SEEK_DATA results down to the nearest 512 bytes.  May also need
>           some logic to avoid encoding 0-length segments.
>         - talk to linux-fsdevel, see if we can get a minimum hole
>           alignment guarantee from filesystems.
>
> I'm thinking #1 is simplest.
>
> --b.
>
> > +     u32 seg_len = 1 + 2 + 1;
> > +
> > +     return (op_encode_hdr_size + 2 + seg_len + XDR_QUADLEN(rlen)) * sizeof(__be32);
> > +}
> > +
> >  static inline u32 nfsd4_readdir_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op)
> >  {
> >       u32 maxcount = 0, rlen = 0;
> > @@ -3163,6 +3176,13 @@ static const struct nfsd4_operation nfsd4_ops[] = {
> >               .op_name = "OP_COPY",
> >               .op_rsize_bop = nfsd4_copy_rsize,
> >       },
> > +     [OP_READ_PLUS] = {
> > +             .op_func = nfsd4_read,
> > +             .op_release = nfsd4_read_release,
> > +             .op_name = "OP_READ_PLUS",
> > +             .op_rsize_bop = nfsd4_read_plus_rsize,
> > +             .op_get_currentstateid = nfsd4_get_readstateid,
> > +     },
> >       [OP_SEEK] = {
> >               .op_func = nfsd4_seek,
> >               .op_name = "OP_SEEK",
> > diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> > index 0be194de4888..26d12e3edf33 100644
> > --- a/fs/nfsd/nfs4xdr.c
> > +++ b/fs/nfsd/nfs4xdr.c
> > @@ -2173,7 +2173,7 @@ static const nfsd4_dec nfsd4_dec_ops[] = {
> >       [OP_LAYOUTSTATS]        = (nfsd4_dec)nfsd4_decode_notsupp,
> >       [OP_OFFLOAD_CANCEL]     = (nfsd4_dec)nfsd4_decode_offload_status,
> >       [OP_OFFLOAD_STATUS]     = (nfsd4_dec)nfsd4_decode_offload_status,
> > -     [OP_READ_PLUS]          = (nfsd4_dec)nfsd4_decode_notsupp,
> > +     [OP_READ_PLUS]          = (nfsd4_dec)nfsd4_decode_read,
> >       [OP_SEEK]               = (nfsd4_dec)nfsd4_decode_seek,
> >       [OP_WRITE_SAME]         = (nfsd4_dec)nfsd4_decode_notsupp,
> >       [OP_CLONE]              = (nfsd4_dec)nfsd4_decode_clone,
> > @@ -4597,6 +4597,85 @@ nfsd4_encode_offload_status(struct nfsd4_compoundres *resp, __be32 nfserr,
> >               return nfserr_resource;
> >       p = xdr_encode_hyper(p, os->count);
> >       *p++ = cpu_to_be32(0);
> > +     return nfserr;
> > +}
> > +
> > +static __be32
> > +nfsd4_encode_read_plus_data(struct nfsd4_compoundres *resp,
> > +                         struct nfsd4_read *read,
> > +                         unsigned long maxcount,  u32 *eof)
> > +{
> > +     struct xdr_stream *xdr = &resp->xdr;
> > +     struct file *file = read->rd_nf->nf_file;
> > +     int starting_len = xdr->buf->len;
> > +     __be32 nfserr;
> > +     __be32 *p, tmp;
> > +     __be64 tmp64;
> > +
> > +     /* Content type, offset, byte count */
> > +     p = xdr_reserve_space(xdr, 4 + 8 + 4);
> > +     if (!p)
> > +             return nfserr_resource;
> > +
> > +     read->rd_vlen = xdr_reserve_space_vec(xdr, resp->rqstp->rq_vec, maxcount);
> > +     if (read->rd_vlen < 0)
> > +             return nfserr_resource;
> > +
> > +     nfserr = nfsd_readv(resp->rqstp, read->rd_fhp, file, read->rd_offset,
> > +                         resp->rqstp->rq_vec, read->rd_vlen, &maxcount, eof);
> > +     if (nfserr)
> > +             return nfserr;
> > +
> > +     tmp = htonl(NFS4_CONTENT_DATA);
> > +     write_bytes_to_xdr_buf(xdr->buf, starting_len,      &tmp,   4);
> > +     tmp64 = cpu_to_be64(read->rd_offset);
> > +     write_bytes_to_xdr_buf(xdr->buf, starting_len + 4,  &tmp64, 8);
> > +     tmp = htonl(maxcount);
> > +     write_bytes_to_xdr_buf(xdr->buf, starting_len + 12, &tmp,   4);
> > +     return nfs_ok;
> > +}
> > +
> > +static __be32
> > +nfsd4_encode_read_plus(struct nfsd4_compoundres *resp, __be32 nfserr,
> > +                    struct nfsd4_read *read)
> > +{
> > +     unsigned long maxcount;
> > +     struct xdr_stream *xdr = &resp->xdr;
> > +     struct file *file;
> > +     int starting_len = xdr->buf->len;
> > +     int segments = 0;
> > +     __be32 *p, tmp;
> > +     u32 eof;
> > +
> > +     if (nfserr)
> > +             return nfserr;
> > +     file = read->rd_nf->nf_file;
> > +
> > +     /* eof flag, segment count */
> > +     p = xdr_reserve_space(xdr, 4 + 4);
> > +     if (!p)
> > +             return nfserr_resource;
> > +     xdr_commit_encode(xdr);
> > +
> > +     maxcount = svc_max_payload(resp->rqstp);
> > +     maxcount = min_t(unsigned long, maxcount,
> > +                      (xdr->buf->buflen - xdr->buf->len));
> > +     maxcount = min_t(unsigned long, maxcount, read->rd_length);
> > +
> > +     eof = read->rd_offset >= i_size_read(file_inode(file));
> > +     if (!eof) {
> > +             nfserr = nfsd4_encode_read_plus_data(resp, read, maxcount, &eof);
> > +             segments++;
> > +     }
> > +
> > +     if (nfserr)
> > +             xdr_truncate_encode(xdr, starting_len);
> > +     else {
> > +             tmp = htonl(eof);
> > +             write_bytes_to_xdr_buf(xdr->buf, starting_len,     &tmp, 4);
> > +             tmp = htonl(segments);
> > +             write_bytes_to_xdr_buf(xdr->buf, starting_len + 4, &tmp, 4);
> > +     }
> >
> >       return nfserr;
> >  }
> > @@ -4974,7 +5053,7 @@ static const nfsd4_enc nfsd4_enc_ops[] = {
> >       [OP_LAYOUTSTATS]        = (nfsd4_enc)nfsd4_encode_noop,
> >       [OP_OFFLOAD_CANCEL]     = (nfsd4_enc)nfsd4_encode_noop,
> >       [OP_OFFLOAD_STATUS]     = (nfsd4_enc)nfsd4_encode_offload_status,
> > -     [OP_READ_PLUS]          = (nfsd4_enc)nfsd4_encode_noop,
> > +     [OP_READ_PLUS]          = (nfsd4_enc)nfsd4_encode_read_plus,
> >       [OP_SEEK]               = (nfsd4_enc)nfsd4_encode_seek,
> >       [OP_WRITE_SAME]         = (nfsd4_enc)nfsd4_encode_noop,
> >       [OP_CLONE]              = (nfsd4_enc)nfsd4_encode_noop,
> > --
> > 2.28.0
> >
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 4/5] NFSD: Return both a hole and a data segment
  2020-09-09 16:51     ` Anna Schumaker
@ 2020-09-09 20:24       ` J. Bruce Fields
  2020-09-09 20:44         ` J. Bruce Fields
  0 siblings, 1 reply; 14+ messages in thread
From: J. Bruce Fields @ 2020-09-09 20:24 UTC (permalink / raw)
  To: Anna Schumaker; +Cc: J. Bruce Fields, Chuck Lever, Linux NFS Mailing List

On Wed, Sep 09, 2020 at 12:51:38PM -0400, Anna Schumaker wrote:
> On Tue, Sep 8, 2020 at 3:49 PM J. Bruce Fields <bfields@redhat.com> wrote:
> >
> > On Tue, Sep 08, 2020 at 12:25:58PM -0400, schumaker.anna@gmail.com wrote:
> > > From: Anna Schumaker <Anna.Schumaker@Netapp.com>
> > >
> > > But only one of each right now. We'll expand on this in the next patch.
> > >
> > > Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
> > > ---
> > > v5: If we've already encoded a segment, then return a short read if
> > >     later segments return an error for some reason.
> > > ---
> > >  fs/nfsd/nfs4xdr.c | 54 ++++++++++++++++++++++++++++++++++-------------
> > >  1 file changed, 39 insertions(+), 15 deletions(-)
> > >
> > > diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> > > index 45159bd9e9a4..856606263c1d 100644
> > > --- a/fs/nfsd/nfs4xdr.c
> > > +++ b/fs/nfsd/nfs4xdr.c
> > > @@ -4603,7 +4603,7 @@ nfsd4_encode_offload_status(struct nfsd4_compoundres *resp, __be32 nfserr,
> > >  static __be32
> > >  nfsd4_encode_read_plus_data(struct nfsd4_compoundres *resp,
> > >                           struct nfsd4_read *read,
> > > -                         unsigned long maxcount,  u32 *eof)
> > > +                         unsigned long *maxcount, u32 *eof)
> > >  {
> > >       struct xdr_stream *xdr = &resp->xdr;
> > >       struct file *file = read->rd_nf->nf_file;
> > > @@ -4614,19 +4614,19 @@ nfsd4_encode_read_plus_data(struct nfsd4_compoundres *resp,
> > >       __be64 tmp64;
> > >
> > >       if (hole_pos > read->rd_offset)
> > > -             maxcount = min_t(unsigned long, maxcount, hole_pos - read->rd_offset);
> > > +             *maxcount = min_t(unsigned long, *maxcount, hole_pos - read->rd_offset);
> > >
> > >       /* Content type, offset, byte count */
> > >       p = xdr_reserve_space(xdr, 4 + 8 + 4);
> > >       if (!p)
> > >               return nfserr_resource;
> > >
> > > -     read->rd_vlen = xdr_reserve_space_vec(xdr, resp->rqstp->rq_vec, maxcount);
> > > +     read->rd_vlen = xdr_reserve_space_vec(xdr, resp->rqstp->rq_vec, *maxcount);
> > >       if (read->rd_vlen < 0)
> > >               return nfserr_resource;
> > >
> > >       nfserr = nfsd_readv(resp->rqstp, read->rd_fhp, file, read->rd_offset,
> > > -                         resp->rqstp->rq_vec, read->rd_vlen, &maxcount, eof);
> > > +                         resp->rqstp->rq_vec, read->rd_vlen, maxcount, eof);
> > >       if (nfserr)
> > >               return nfserr;
> > >
> > > @@ -4634,7 +4634,7 @@ nfsd4_encode_read_plus_data(struct nfsd4_compoundres *resp,
> > >       write_bytes_to_xdr_buf(xdr->buf, starting_len,      &tmp,   4);
> > >       tmp64 = cpu_to_be64(read->rd_offset);
> > >       write_bytes_to_xdr_buf(xdr->buf, starting_len + 4,  &tmp64, 8);
> > > -     tmp = htonl(maxcount);
> > > +     tmp = htonl(*maxcount);
> > >       write_bytes_to_xdr_buf(xdr->buf, starting_len + 12, &tmp,   4);
> > >       return nfs_ok;
> > >  }
> > > @@ -4642,11 +4642,19 @@ nfsd4_encode_read_plus_data(struct nfsd4_compoundres *resp,
> > >  static __be32
> > >  nfsd4_encode_read_plus_hole(struct nfsd4_compoundres *resp,
> > >                           struct nfsd4_read *read,
> > > -                         unsigned long maxcount, u32 *eof)
> > > +                         unsigned long *maxcount, u32 *eof)
> > >  {
> > >       struct file *file = read->rd_nf->nf_file;
> > > +     loff_t data_pos = vfs_llseek(file, read->rd_offset, SEEK_DATA);
> > > +     unsigned long count;
> > >       __be32 *p;
> > >
> > > +     if (data_pos == -ENXIO)
> > > +             data_pos = i_size_read(file_inode(file));
> > > +     else if (data_pos <= read->rd_offset)
> > > +             return nfserr_resource;
> >
> > I think there's still a race here:
> >
> >         vfs_llseek(.,0,SEEK_HOLE) returns 1024
> >         read 1024 bytes of data
> >                                         another process fills the hole
> >         vfs_llseek(.,1024,SEEK_DATA) returns 1024
> >         code above returns nfserr_resource
> >
> > We end up returning an error to the client when we should have just
> > returned more data.
> 
> As long as we've encoded at least one segment successfully, we'll
> actually return a short read in this case (as of the most recent
> patches). I tried implementing a check for what each segment actually
> was before encoding, but it led to a lot of extra lseeks (so
> potential for races / performance problems on btrfs). Returning a
> short read seemed like a better approach to me.

Argh, right, I forgot the "if (nfserr && segments == 0)" at the end of
nfsd4_encode_read_plus().

It's still possible to get a spurious error return if this happens at
the very start of the READ_PLUS, though.  Hm.  Might be better to just
encode another data segment?

--b.

> 
> Anna
> 
> >
> > --b.
> >
> > > +     count = data_pos - read->rd_offset;
> > > +
> > >       /* Content type, offset, byte count */
> > >       p = xdr_reserve_space(&resp->xdr, 4 + 8 + 8);
> > >       if (!p)
> > > @@ -4654,9 +4662,10 @@ nfsd4_encode_read_plus_hole(struct nfsd4_compoundres *resp,
> > >
> > >       *p++ = htonl(NFS4_CONTENT_HOLE);
> > >        p   = xdr_encode_hyper(p, read->rd_offset);
> > > -      p   = xdr_encode_hyper(p, maxcount);
> > > +      p   = xdr_encode_hyper(p, count);
> > >
> > > -     *eof = (read->rd_offset + maxcount) >= i_size_read(file_inode(file));
> > > +     *eof = (read->rd_offset + count) >= i_size_read(file_inode(file));
> > > +     *maxcount = min_t(unsigned long, count, *maxcount);
> > >       return nfs_ok;
> > >  }
> > >
> > > @@ -4664,7 +4673,7 @@ static __be32
> > >  nfsd4_encode_read_plus(struct nfsd4_compoundres *resp, __be32 nfserr,
> > >                      struct nfsd4_read *read)
> > >  {
> > > -     unsigned long maxcount;
> > > +     unsigned long maxcount, count;
> > >       struct xdr_stream *xdr = &resp->xdr;
> > >       struct file *file;
> > >       int starting_len = xdr->buf->len;
> > > @@ -4687,6 +4696,7 @@ nfsd4_encode_read_plus(struct nfsd4_compoundres *resp, __be32 nfserr,
> > >       maxcount = min_t(unsigned long, maxcount,
> > >                        (xdr->buf->buflen - xdr->buf->len));
> > >       maxcount = min_t(unsigned long, maxcount, read->rd_length);
> > > +     count    = maxcount;
> > >
> > >       eof = read->rd_offset >= i_size_read(file_inode(file));
> > >       if (eof)
> > > @@ -4695,24 +4705,38 @@ nfsd4_encode_read_plus(struct nfsd4_compoundres *resp, __be32 nfserr,
> > >       pos = vfs_llseek(file, read->rd_offset, SEEK_DATA);
> > >       if (pos == -ENXIO)
> > >               pos = i_size_read(file_inode(file));
> > > +     else if (pos < 0)
> > > +             pos = read->rd_offset;
> > >
> > > -     if (pos > read->rd_offset) {
> > > -             maxcount = pos - read->rd_offset;
> > > -             nfserr = nfsd4_encode_read_plus_hole(resp, read, maxcount, &eof);
> > > +     if (pos == read->rd_offset) {
> > > +             maxcount = count;
> > > +             nfserr = nfsd4_encode_read_plus_data(resp, read, &maxcount, &eof);
> > > +             if (nfserr)
> > > +                     goto out;
> > > +             count -= maxcount;
> > > +             read->rd_offset += maxcount;
> > >               segments++;
> > > -     } else {
> > > -             nfserr = nfsd4_encode_read_plus_data(resp, read, maxcount, &eof);
> > > +     }
> > > +
> > > +     if (count > 0 && !eof) {
> > > +             maxcount = count;
> > > +             nfserr = nfsd4_encode_read_plus_hole(resp, read, &maxcount, &eof);
> > > +             if (nfserr)
> > > +                     goto out;
> > > +             count -= maxcount;
> > > +             read->rd_offset += maxcount;
> > >               segments++;
> > >       }
> > >
> > >  out:
> > > -     if (nfserr)
> > > +     if (nfserr && segments == 0)
> > >               xdr_truncate_encode(xdr, starting_len);
> > >       else {
> > >               tmp = htonl(eof);
> > >               write_bytes_to_xdr_buf(xdr->buf, starting_len,     &tmp, 4);
> > >               tmp = htonl(segments);
> > >               write_bytes_to_xdr_buf(xdr->buf, starting_len + 4, &tmp, 4);
> > > +             nfserr = nfs_ok;
> > >       }
> > >
> > >       return nfserr;
> > > --
> > > 2.28.0
> > >
> >


* Re: [PATCH v5 2/5] NFSD: Add READ_PLUS data support
  2020-09-09 16:53     ` Anna Schumaker
@ 2020-09-09 20:25       ` J. Bruce Fields
  2020-09-09 20:50         ` J. Bruce Fields
  0 siblings, 1 reply; 14+ messages in thread
From: J. Bruce Fields @ 2020-09-09 20:25 UTC (permalink / raw)
  To: Anna Schumaker; +Cc: J. Bruce Fields, Chuck Lever, Linux NFS Mailing List

On Wed, Sep 09, 2020 at 12:53:18PM -0400, Anna Schumaker wrote:
> On Tue, Sep 8, 2020 at 3:42 PM J. Bruce Fields <bfields@redhat.com> wrote:
> >
> > On Tue, Sep 08, 2020 at 12:25:56PM -0400, schumaker.anna@gmail.com wrote:
> > > From: Anna Schumaker <Anna.Schumaker@Netapp.com>
> > >
> > > This patch adds READ_PLUS support for returning a single
> > > NFS4_CONTENT_DATA segment to the client. This is basically the same as
> > > the READ operation, only with the extra information about data segments.
> > >
> > > Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
> > >
> > > ---
> > > v5: Fix up nfsd4_read_plus_rsize() calculation
> > > ---
> > >  fs/nfsd/nfs4proc.c | 20 +++++++++++
> > >  fs/nfsd/nfs4xdr.c  | 83 ++++++++++++++++++++++++++++++++++++++++++++--
> > >  2 files changed, 101 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> > > index eaf50eafa935..0a3df5f10501 100644
> > > --- a/fs/nfsd/nfs4proc.c
> > > +++ b/fs/nfsd/nfs4proc.c
> > > @@ -2591,6 +2591,19 @@ static inline u32 nfsd4_read_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op)
> > >       return (op_encode_hdr_size + 2 + XDR_QUADLEN(rlen)) * sizeof(__be32);
> > >  }
> > >
> > > +static inline u32 nfsd4_read_plus_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op)
> > > +{
> > > +     u32 maxcount = svc_max_payload(rqstp);
> > > +     u32 rlen = min(op->u.read.rd_length, maxcount);
> > > +     /*
> > > +      * Worst case is a single large data segment, so make
> > > +      * sure we have enough room to encode that
> > > +      */
> >
> > After the last patch we allow an unlimited number of segments.  So a
> > zillion 1-byte segments is also possible, and is a worse case.
> >
> > Possible ways to avoid that kind of thing:
> >
> >         - when encoding, stop and return a short read if the xdr-encoded
> >           result would exceed the limit calculated here.
> 
> Doesn't this happen automatically through calls to xdr_reserve_space()?

No, xdr_reserve_space() will keep us from running out of buffer
completely, but it won't check that ops come in under the estimates made
in read_plus_rsize().

--b.

> 
> >         - round SEEK_HOLE results up to the nearest (say) 512 bytes, and
> >           SEEK_DATA result down to the nearest 512 bytes.  May also need
> >           some logic to avoid encoding 0-length segments.
> >         - talk to linux-fsdevel, see if we can get a minimum hole
> >           alignment guarantee from filesystems.
> >
> > I'm thinking #1 is simplest.
> >
> > --b.
> >
> > > +     u32 seg_len = 1 + 2 + 1;
> > > +
> > > +     return (op_encode_hdr_size + 2 + seg_len + XDR_QUADLEN(rlen)) * sizeof(__be32);
> > > +}
> > > +
> > >  static inline u32 nfsd4_readdir_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op)
> > >  {
> > >       u32 maxcount = 0, rlen = 0;
> > > @@ -3163,6 +3176,13 @@ static const struct nfsd4_operation nfsd4_ops[] = {
> > >               .op_name = "OP_COPY",
> > >               .op_rsize_bop = nfsd4_copy_rsize,
> > >       },
> > > +     [OP_READ_PLUS] = {
> > > +             .op_func = nfsd4_read,
> > > +             .op_release = nfsd4_read_release,
> > > +             .op_name = "OP_READ_PLUS",
> > > +             .op_rsize_bop = nfsd4_read_plus_rsize,
> > > +             .op_get_currentstateid = nfsd4_get_readstateid,
> > > +     },
> > >       [OP_SEEK] = {
> > >               .op_func = nfsd4_seek,
> > >               .op_name = "OP_SEEK",
> > > diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
> > > index 0be194de4888..26d12e3edf33 100644
> > > --- a/fs/nfsd/nfs4xdr.c
> > > +++ b/fs/nfsd/nfs4xdr.c
> > > @@ -2173,7 +2173,7 @@ static const nfsd4_dec nfsd4_dec_ops[] = {
> > >       [OP_LAYOUTSTATS]        = (nfsd4_dec)nfsd4_decode_notsupp,
> > >       [OP_OFFLOAD_CANCEL]     = (nfsd4_dec)nfsd4_decode_offload_status,
> > >       [OP_OFFLOAD_STATUS]     = (nfsd4_dec)nfsd4_decode_offload_status,
> > > -     [OP_READ_PLUS]          = (nfsd4_dec)nfsd4_decode_notsupp,
> > > +     [OP_READ_PLUS]          = (nfsd4_dec)nfsd4_decode_read,
> > >       [OP_SEEK]               = (nfsd4_dec)nfsd4_decode_seek,
> > >       [OP_WRITE_SAME]         = (nfsd4_dec)nfsd4_decode_notsupp,
> > >       [OP_CLONE]              = (nfsd4_dec)nfsd4_decode_clone,
> > > @@ -4597,6 +4597,85 @@ nfsd4_encode_offload_status(struct nfsd4_compoundres *resp, __be32 nfserr,
> > >               return nfserr_resource;
> > >       p = xdr_encode_hyper(p, os->count);
> > >       *p++ = cpu_to_be32(0);
> > > +     return nfserr;
> > > +}
> > > +
> > > +static __be32
> > > +nfsd4_encode_read_plus_data(struct nfsd4_compoundres *resp,
> > > +                         struct nfsd4_read *read,
> > > +                         unsigned long maxcount,  u32 *eof)
> > > +{
> > > +     struct xdr_stream *xdr = &resp->xdr;
> > > +     struct file *file = read->rd_nf->nf_file;
> > > +     int starting_len = xdr->buf->len;
> > > +     __be32 nfserr;
> > > +     __be32 *p, tmp;
> > > +     __be64 tmp64;
> > > +
> > > +     /* Content type, offset, byte count */
> > > +     p = xdr_reserve_space(xdr, 4 + 8 + 4);
> > > +     if (!p)
> > > +             return nfserr_resource;
> > > +
> > > +     read->rd_vlen = xdr_reserve_space_vec(xdr, resp->rqstp->rq_vec, maxcount);
> > > +     if (read->rd_vlen < 0)
> > > +             return nfserr_resource;
> > > +
> > > +     nfserr = nfsd_readv(resp->rqstp, read->rd_fhp, file, read->rd_offset,
> > > +                         resp->rqstp->rq_vec, read->rd_vlen, &maxcount, eof);
> > > +     if (nfserr)
> > > +             return nfserr;
> > > +
> > > +     tmp = htonl(NFS4_CONTENT_DATA);
> > > +     write_bytes_to_xdr_buf(xdr->buf, starting_len,      &tmp,   4);
> > > +     tmp64 = cpu_to_be64(read->rd_offset);
> > > +     write_bytes_to_xdr_buf(xdr->buf, starting_len + 4,  &tmp64, 8);
> > > +     tmp = htonl(maxcount);
> > > +     write_bytes_to_xdr_buf(xdr->buf, starting_len + 12, &tmp,   4);
> > > +     return nfs_ok;
> > > +}
> > > +
> > > +static __be32
> > > +nfsd4_encode_read_plus(struct nfsd4_compoundres *resp, __be32 nfserr,
> > > +                    struct nfsd4_read *read)
> > > +{
> > > +     unsigned long maxcount;
> > > +     struct xdr_stream *xdr = &resp->xdr;
> > > +     struct file *file;
> > > +     int starting_len = xdr->buf->len;
> > > +     int segments = 0;
> > > +     __be32 *p, tmp;
> > > +     u32 eof;
> > > +
> > > +     if (nfserr)
> > > +             return nfserr;
> > > +     file = read->rd_nf->nf_file;
> > > +
> > > +     /* eof flag, segment count */
> > > +     p = xdr_reserve_space(xdr, 4 + 4);
> > > +     if (!p)
> > > +             return nfserr_resource;
> > > +     xdr_commit_encode(xdr);
> > > +
> > > +     maxcount = svc_max_payload(resp->rqstp);
> > > +     maxcount = min_t(unsigned long, maxcount,
> > > +                      (xdr->buf->buflen - xdr->buf->len));
> > > +     maxcount = min_t(unsigned long, maxcount, read->rd_length);
> > > +
> > > +     eof = read->rd_offset >= i_size_read(file_inode(file));
> > > +     if (!eof) {
> > > +             nfserr = nfsd4_encode_read_plus_data(resp, read, maxcount, &eof);
> > > +             segments++;
> > > +     }
> > > +
> > > +     if (nfserr)
> > > +             xdr_truncate_encode(xdr, starting_len);
> > > +     else {
> > > +             tmp = htonl(eof);
> > > +             write_bytes_to_xdr_buf(xdr->buf, starting_len,     &tmp, 4);
> > > +             tmp = htonl(segments);
> > > +             write_bytes_to_xdr_buf(xdr->buf, starting_len + 4, &tmp, 4);
> > > +     }
> > >
> > >       return nfserr;
> > >  }
> > > @@ -4974,7 +5053,7 @@ static const nfsd4_enc nfsd4_enc_ops[] = {
> > >       [OP_LAYOUTSTATS]        = (nfsd4_enc)nfsd4_encode_noop,
> > >       [OP_OFFLOAD_CANCEL]     = (nfsd4_enc)nfsd4_encode_noop,
> > >       [OP_OFFLOAD_STATUS]     = (nfsd4_enc)nfsd4_encode_offload_status,
> > > -     [OP_READ_PLUS]          = (nfsd4_enc)nfsd4_encode_noop,
> > > +     [OP_READ_PLUS]          = (nfsd4_enc)nfsd4_encode_read_plus,
> > >       [OP_SEEK]               = (nfsd4_enc)nfsd4_encode_seek,
> > >       [OP_WRITE_SAME]         = (nfsd4_enc)nfsd4_encode_noop,
> > >       [OP_CLONE]              = (nfsd4_enc)nfsd4_encode_noop,
> > > --
> > > 2.28.0
> > >
> >


* Re: [PATCH v5 4/5] NFSD: Return both a hole and a data segment
  2020-09-09 20:24       ` J. Bruce Fields
@ 2020-09-09 20:44         ` J. Bruce Fields
  0 siblings, 0 replies; 14+ messages in thread
From: J. Bruce Fields @ 2020-09-09 20:44 UTC (permalink / raw)
  To: Anna Schumaker; +Cc: J. Bruce Fields, Chuck Lever, Linux NFS Mailing List

On Wed, Sep 09, 2020 at 04:24:00PM -0400, bfields wrote:
> On Wed, Sep 09, 2020 at 12:51:38PM -0400, Anna Schumaker wrote:
> > On Tue, Sep 8, 2020 at 3:49 PM J. Bruce Fields <bfields@redhat.com> wrote:
> > > I think there's still a race here:
> > >
> > >         vfs_llseek(.,0,SEEK_HOLE) returns 1024
> > >         read 1024 bytes of data
> > >                                         another process fills the hole
> > >         vfs_llseek(.,1024,SEEK_DATA) returns 1024
> > >         code above returns nfserr_resource
> > >
> > > We end up returning an error to the client when we should have just
> > > returned more data.
> > 
> > As long as we've encoded at least one segment successfully, we'll
> > actually return a short read in this case (as of the most recent
> > patches). I tried implementing a check for what each segment actually
> > was before encoding, but it led to a lot of extra lseeks (so
> > potential for races / performance problems on btrfs). Returning a
> > short read seemed like a better approach to me.
> 
> Argh, right, I forgot the "if (nfserr && segments == 0)" at the end of
> nfsd4_encode_read_plus().

(Oops, it's actually the "if (nfserr)" below that handles that case.)

> It's still possible to get a spurious error return if this happens at
> the very start of the READ_PLUS, though.  Hm.  Might be better to just
> encode another data segment?

So, I mean, if nfsd4_encode_read_plus_hole() doesn't find a hole after
all, just keep going, and create a data segment.

--b.

> 
> --b.
> 
> > 
> > Anna
> > 
> > >
> > > --b.
> > >
> > > > +     count = data_pos - read->rd_offset;
> > > > +
> > > >       /* Content type, offset, byte count */
> > > >       p = xdr_reserve_space(&resp->xdr, 4 + 8 + 8);
> > > >       if (!p)
> > > > @@ -4654,9 +4662,10 @@ nfsd4_encode_read_plus_hole(struct nfsd4_compoundres *resp,
> > > >
> > > >       *p++ = htonl(NFS4_CONTENT_HOLE);
> > > >        p   = xdr_encode_hyper(p, read->rd_offset);
> > > > -      p   = xdr_encode_hyper(p, maxcount);
> > > > +      p   = xdr_encode_hyper(p, count);
> > > >
> > > > -     *eof = (read->rd_offset + maxcount) >= i_size_read(file_inode(file));
> > > > +     *eof = (read->rd_offset + count) >= i_size_read(file_inode(file));
> > > > +     *maxcount = min_t(unsigned long, count, *maxcount);
> > > >       return nfs_ok;
> > > >  }
> > > >
> > > > @@ -4664,7 +4673,7 @@ static __be32
> > > >  nfsd4_encode_read_plus(struct nfsd4_compoundres *resp, __be32 nfserr,
> > > >                      struct nfsd4_read *read)
> > > >  {
> > > > -     unsigned long maxcount;
> > > > +     unsigned long maxcount, count;
> > > >       struct xdr_stream *xdr = &resp->xdr;
> > > >       struct file *file;
> > > >       int starting_len = xdr->buf->len;
> > > > @@ -4687,6 +4696,7 @@ nfsd4_encode_read_plus(struct nfsd4_compoundres *resp, __be32 nfserr,
> > > >       maxcount = min_t(unsigned long, maxcount,
> > > >                        (xdr->buf->buflen - xdr->buf->len));
> > > >       maxcount = min_t(unsigned long, maxcount, read->rd_length);
> > > > +     count    = maxcount;
> > > >
> > > >       eof = read->rd_offset >= i_size_read(file_inode(file));
> > > >       if (eof)
> > > > @@ -4695,24 +4705,38 @@ nfsd4_encode_read_plus(struct nfsd4_compoundres *resp, __be32 nfserr,
> > > >       pos = vfs_llseek(file, read->rd_offset, SEEK_DATA);
> > > >       if (pos == -ENXIO)
> > > >               pos = i_size_read(file_inode(file));
> > > > +     else if (pos < 0)
> > > > +             pos = read->rd_offset;
> > > >
> > > > -     if (pos > read->rd_offset) {
> > > > -             maxcount = pos - read->rd_offset;
> > > > -             nfserr = nfsd4_encode_read_plus_hole(resp, read, maxcount, &eof);
> > > > +     if (pos == read->rd_offset) {
> > > > +             maxcount = count;
> > > > +             nfserr = nfsd4_encode_read_plus_data(resp, read, &maxcount, &eof);
> > > > +             if (nfserr)
> > > > +                     goto out;
> > > > +             count -= maxcount;
> > > > +             read->rd_offset += maxcount;
> > > >               segments++;
> > > > -     } else {
> > > > -             nfserr = nfsd4_encode_read_plus_data(resp, read, maxcount, &eof);
> > > > +     }
> > > > +
> > > > +     if (count > 0 && !eof) {
> > > > +             maxcount = count;
> > > > +             nfserr = nfsd4_encode_read_plus_hole(resp, read, &maxcount, &eof);
> > > > +             if (nfserr)
> > > > +                     goto out;
> > > > +             count -= maxcount;
> > > > +             read->rd_offset += maxcount;
> > > >               segments++;
> > > >       }
> > > >
> > > >  out:
> > > > -     if (nfserr)
> > > > +     if (nfserr && segments == 0)
> > > >               xdr_truncate_encode(xdr, starting_len);
> > > >       else {
> > > >               tmp = htonl(eof);
> > > >               write_bytes_to_xdr_buf(xdr->buf, starting_len,     &tmp, 4);
> > > >               tmp = htonl(segments);
> > > >               write_bytes_to_xdr_buf(xdr->buf, starting_len + 4, &tmp, 4);
> > > > +             nfserr = nfs_ok;
> > > >       }
> > > >
> > > >       return nfserr;
> > > > --
> > > > 2.28.0
> > > >
> > >


* Re: [PATCH v5 2/5] NFSD: Add READ_PLUS data support
  2020-09-09 20:25       ` J. Bruce Fields
@ 2020-09-09 20:50         ` J. Bruce Fields
  0 siblings, 0 replies; 14+ messages in thread
From: J. Bruce Fields @ 2020-09-09 20:50 UTC (permalink / raw)
  To: Anna Schumaker; +Cc: J. Bruce Fields, Chuck Lever, Linux NFS Mailing List

On Wed, Sep 09, 2020 at 04:25:34PM -0400, bfields wrote:
> On Wed, Sep 09, 2020 at 12:53:18PM -0400, Anna Schumaker wrote:
> > On Tue, Sep 8, 2020 at 3:42 PM J. Bruce Fields <bfields@redhat.com> wrote:
> > >
> > > On Tue, Sep 08, 2020 at 12:25:56PM -0400, schumaker.anna@gmail.com wrote:
> > > > From: Anna Schumaker <Anna.Schumaker@Netapp.com>
> > > >
> > > > This patch adds READ_PLUS support for returning a single
> > > > NFS4_CONTENT_DATA segment to the client. This is basically the same as
> > > > the READ operation, only with the extra information about data segments.
> > > >
> > > > Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
> > > >
> > > > ---
> > > > v5: Fix up nfsd4_read_plus_rsize() calculation
> > > > ---
> > > >  fs/nfsd/nfs4proc.c | 20 +++++++++++
> > > >  fs/nfsd/nfs4xdr.c  | 83 ++++++++++++++++++++++++++++++++++++++++++++--
> > > >  2 files changed, 101 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> > > > index eaf50eafa935..0a3df5f10501 100644
> > > > --- a/fs/nfsd/nfs4proc.c
> > > > +++ b/fs/nfsd/nfs4proc.c
> > > > @@ -2591,6 +2591,19 @@ static inline u32 nfsd4_read_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op)
> > > >       return (op_encode_hdr_size + 2 + XDR_QUADLEN(rlen)) * sizeof(__be32);
> > > >  }
> > > >
> > > > +static inline u32 nfsd4_read_plus_rsize(struct svc_rqst *rqstp, struct nfsd4_op *op)
> > > > +{
> > > > +     u32 maxcount = svc_max_payload(rqstp);
> > > > +     u32 rlen = min(op->u.read.rd_length, maxcount);
> > > > +     /*
> > > > +      * Worst case is a single large data segment, so make
> > > > +      * sure we have enough room to encode that
> > > > +      */
> > >
> > > After the last patch we allow an unlimited number of segments.  So a
> > > zillion 1-byte segments is also possible, and is a worse case.
> > >
> > > Possible ways to avoid that kind of thing:
> > >
> > >         - when encoding, stop and return a short read if the xdr-encoded
> > >           result would exceed the limit calculated here.
> > 
> > Doesn't this happen automatically through calls to xdr_reserve_space()?
> 
> No, xdr_reserve_space() will keep us from running out of buffer
> completely, but it won't check that ops come in under the estimates made
> in read_plus_rsize().

If it's easier, another option might be just: if we ever get a "small"
hole (say, less than 512 bytes), just give up and encode the rest of the
result as a single big data segment.

--b.


end of thread, other threads:[~2020-09-09 20:50 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-08 16:25 [PATCH v5 0/5] NFSD: Add support for the v4.2 READ_PLUS operation schumaker.anna
2020-09-08 16:25 ` [PATCH v5 1/5] SUNRPC/NFSD: Implement xdr_reserve_space_vec() schumaker.anna
2020-09-08 16:25 ` [PATCH v5 2/5] NFSD: Add READ_PLUS data support schumaker.anna
2020-09-08 19:42   ` J. Bruce Fields
2020-09-09 16:53     ` Anna Schumaker
2020-09-09 20:25       ` J. Bruce Fields
2020-09-09 20:50         ` J. Bruce Fields
2020-09-08 16:25 ` [PATCH v5 3/5] NFSD: Add READ_PLUS hole segment encoding schumaker.anna
2020-09-08 16:25 ` [PATCH v5 4/5] NFSD: Return both a hole and a data segment schumaker.anna
2020-09-08 19:49   ` J. Bruce Fields
2020-09-09 16:51     ` Anna Schumaker
2020-09-09 20:24       ` J. Bruce Fields
2020-09-09 20:44         ` J. Bruce Fields
2020-09-08 16:25 ` [PATCH v5 5/5] NFSD: Encode a full READ_PLUS reply schumaker.anna
